Calibration and analysis of genome-based models for microbial ecology

Microbial ecosystem modeling is complicated by the large number of unknown parameters and the lack of appropriate calibration tools. Here we present a novel computational framework for modeling microbial ecosystems, which combines genome-based model construction with statistical analysis and calibration to experimental data. Using this framework, we examined the dynamics of a community of Escherichia coli strains that emerged in laboratory evolution experiments, during which an ancestral strain diversified into two coexisting ecotypes. We constructed a microbial community model comprising the ancestral and the evolved strains, which we calibrated using separate monoculture experiments. Simulations reproduced the successional dynamics in the evolution experiments, and pathway activation patterns observed in microarray transcript profiles. Our approach yielded detailed insights into the metabolic processes that drove bacterial diversification, involving acetate cross-feeding and competition for organic carbon and oxygen. Our framework provides a missing link towards a data-driven mechanistic microbial ecology. DOI: http://dx.doi.org/10.7554/eLife.08208.001


Overview
MCM (Microbial Community Modeler) is a computational tool for modeling the metabolism and population dynamics of multiple unicellular organisms in a realistic environmental context. Cells interact with a dynamical environment in which metabolite concentrations and other environmental variables influence, and are influenced by, microbial metabolism. For example, cells can excrete inhibitory toxins, deplete oxygen by respiration or alter pH by lactate fermentation.
The underlying concept is known as Dynamic Flux Balance Analysis (DFBA) [93,49,10,31]. In DFBA, individual cell metabolism is modeled using conventional FBA [83,67,57], and exported or absorbed metabolites are added to or removed from a common (environmental ) metabolite pool available to all cell species. Environmental metabolite concentrations, in turn, influence metabolite uptake kinetics by individual cells. The model framework is dynamical because cell concentrations and metabolite concentrations can both change with time, and the rate of change of each cell concentration or metabolite concentration can depend on the current cell concentrations and metabolite concentrations. While FBA has long been used to understand individual cellular metabolism [93,21,38,20,87,44], recent work has shown that FBA can in fact be scaled up to entire microbial communities to understand their distributed metabolic activity [87,44,10,31].
MCM allows the specification of almost arbitrary microbial community (MC) models within the DFBA model framework (see sections 1.2 and 1.3 for a detailed description of the framework). In the simplest case, an MC model is defined by (i) a set of metabolites, (ii) a set of (reversible or irreversible) chemical reactions between any of the metabolites and (iii) a set of cell species, each being able to perform any subset of reactions and exchange any subset of metabolites with its environment. Optionally, an additional set of environmental variables (e.g. light intensity, temperature or pH) can be included in the model, for example to model light-limitation of photosynthesis or increased cell mortality at high temperatures.
MC models are highly configurable and can accommodate complex microbial communities with several environmental variables, large cell-metabolic networks and complex metabolite exchange kinetics. For example, environmental variables might be given as explicit functions of time, stochastic processes or interpolated from available data. Alternatively, environmental variables can be dynamic, for example with a rate of change depending on metabolite concentrations. Constraints for reaction rates and metabolite exchange rates can depend in an arbitrary manner on any metabolite concentrations and environmental variables. Metabolite concentrations can be explicit functions of time and environmental variables, interpolated from available data or depend dynamically on microbial export, biomass recycling and other measured environmental fluxes. Cell models can include dynamical internal variables that influence, and are influenced by, cell metabolism, thus allowing for so called regulatory FBA approaches [14,13].
MCM keeps track of a multitude of output variables, such as cell concentrations (dead or alive), gene concentrations (as part of dead or living cells), cell-specific or community-wide reaction rates, metabolite concentrations and cell-specific or community-wide metabolite uptake/export rates. Apart from providing a powerful tool for simulating complex and realistic microbial communities, MCM supports: (i) local and global sensitivity analysis with respect to any set of numerical model parameters, (ii) uncertainty analysis in the presence of random model parameters or stochastic dynamics, (iii) quantitative comparison and statistical evaluation against time series data, (iv) maximum-likelihood estimation (fitting) of any set of numerical model parameters and their confidence intervals using available data, and (v) maximum-likelihood estimation of unknown data units (calibration).
MCM can be used to estimate and analyze the effects of poorly quantified cell-metabolic, cell-physiological and even environmental parameters using a multitude of temporal data, ranging from chemical concentration profiles, reaction rate measurements to optical cell concentrations and metagenomics. Because MCM can estimate unknown measurement units, raw uncalibrated data (e.g. optical densities with no calibration to CFUs) can also be used for comparison and parameter fitting. In practice, model calibra-tion becomes analogous to coefficient estimation in conventional multivariate regression. Fig. 1 gives an overview of MCM's working principle.
While MCM is designed for genome-based metabolic models, it can also accommodate conventional functional group models. In these models, different ecological functions such as primary production, heterotrophy or nitrification are performed by distinct populations whose metabolic activity is determined, for example, by Michaelis-Menten kinetics and whose growth is described by simple substrate-biomass yield factors [35,69]. Hence, natural microbial communities could be modeled even if annotated genomes are not available for each member species. MCM's potential applicability ranges from the gut microbiome [42], soil or groundwater microbial communities [97,78] to marine oil spills [74], laboratory cultures [47] and bioreactors [45,48].
MCM is designed to be primarily controlled via the command line through custom scripts, i.e. plain text files containing a set of special MCM commands. This facilitates archiving, task automation, highthroughput execution of multiple simulations and incorporation into other computational pipelines (e.g. through automatic MCM script generation). MCM also includes a python script (micog.py) for the automated conversion of multiple FBA models in SBML file format, such as generated by the Model SEED pipeline [32], into an MC model suitable for MCM. On the other hand, MCM produces output in a multitude of file formats, ranging from raw data files, PDF and SVG figures, gnuplot plotting scripts [96] to FBA models as SBML files [37] and GEXF graph files [3].   This document is organized as follows: The underlying model framework is described in detail in sections 1.2 and 1. 3. Section 2 provides a step-by-step introductory example that demonstrates MCM's working principles. The only prerequisites for this example are sections 1. 2  Section 13 serves as a reference of all major MCM output files. Section 14 contains a list of frequently asked questions and issues that you are likely to run into. Finally, section 15 lists our disclaimer and license agreement (you must read this!).

General model structure
The model framework is formulated as a combination of differential equations and optimization problems. The model considers the population dynamics of S unicellular species, the concentrations of M chemical substances (metabolites) in the environment and the per-cell rates of R biologically catalyzed reactions involving these metabolites. Each reaction is formally identified with a unique gene. Each species is characterized by its metabolic potential, that is, the subset of reactions it can catalyze as well as any metabolite transport mechanisms available. The environment is assumed to be well mixed (although compartmentalized ecosystem models are also possible, see section 5.11). Optionally, the microbial community's dynamics can be influenced by additional environmental variables (such as temperature or light intensity), which might be, for example, deterministic functions of time or themselves depend on other environmental variables.
At any point in time, the reaction rates and metabolite exchange rates (i.e. metabolism) of each cell is assumed to depend only on its metabolic potential, metabolite concentrations and environmental variables 1 . A commonly used mathematical framework for predicting cell metabolism based on environmental nutrient concentrations and metabolic potential is Flux Balance Analysis (FBA) [83,67,57]. In this framework, cell metabolism is assumed to be regulated in such a way that some objective function, commonly identified with biomass synthesis [93,38,25], is maximized. The latter is assumed to be a linear function of reaction rates and/or metabolite exchange rates. The chemical state of cells is assumed to be at dynamic equilibrium, that is, metabolite concentrations within individual cells do not change with time. This assumption of balanced fluxes leads to stoichiometric constraints that need to be satisfied for any particular combination of intracellular reaction rates. Reaction rates are limited due to finite enzyme capacities, but might also be limited by inhibitors or toxins in the environment [2] or environmental conditions like temperature and light. For example, anammox (the oxidation of ammonium with nitrite) is obligately anaerobic, i.e., is inhibited by oxygen [95]. Uptake/export rate limits generally depend on environmental nutrient concentrations, due to finite diffusion rates and limited transmembrane transporter efficiency. For example, nutrient uptake rates by phytoplankton are often approximated by linear or Monod-like functions of nutrient concentration [22,54]. Taken together, cell-metabolic potential, stoichiometric consistency, reaction rate limitations and transport rate limitations define the constraints for a linear optimization problem for each cell species and at each time point.
The assumption of individual cells optimizing biomass synthesis, subject to environmental and physiological constraints, is rooted in the idea that evolution has shaped regulatory mechanisms of unicellular organisms in such a way that they strive for maximum growth whenever possible. This conclusion is less valid for genetically engineered organisms or those exposed to environments that are radically different from the environments that shaped their evolution [83]. Despite its limitations, FBA has contributed to the understanding of several genome-scale metabolic networks [93,20,44], metabolic interactions between cells [26,10,31] and the outcome of microbial evolution experiments with directional selection [38,30], and provides a generic, intuitive starting point for modeling cellular metabolism.
Through their metabolism, cells act as sinks and sources of metabolites in the environmental pool. The total metabolite uptake and export rates across all species define an out-and in-flux into the shared metabolite pool. Other environmental fluxes (e.g. due to sedimentation or precipitation), as well as metabolite recycling due to cell death, can be additional terms in the differential equations for the environmental metabolite concentrations. Because metabolites can be used or produced by several cell species, the environmental metabolite pool provides the interaction hub for species with shared metabolic pathways [81].
The following section describes the MCM model framework in more mathematical detail. A rough understanding of the included mechanisms is required for the configuration and interpretation of microbial community models in MCM.

Mathematical model
Let E 1 , E 2 , .. be any independent environmental variables as functions or stochastic processes of time, summarized as a vector E. Let C 1 , .., C M be the concentrations of all considered metabolites in the environmental pool, summarized as a vector C ∈ R M . Similarly, let N = (N 1 , .., N S ) ∈ R S be the concentrations of individual cell species. Each species s is described by a subset of available genes (i.e. the ability to perform certain reactions), as well as a list of metabolite uptake or export mechanisms (collectively referred to as its metabolic potential ).
Technical note: MCM distinguishes between active and passive metabolite uptake/export mechanisms, for example representing transport membrane proteins and diffusion, respectively. If a cell lacks an active transport mechanism for a particular metabolite, its transport is constrained by the passive transport rate limits for that metabolite.
Let S ∈ R M ×R be the stoichiometric matrix for all possible metabolic reactions, where S mr is the stoichiometric coefficient for metabolite m in reaction r. For example, the reaction (1) might be formally associated with the presence of the cytochrome c oxidase operon, and its stoichiometric coefficients are −1, −6, −30, +6, +6 and +30 for the compounds C 6 H 12 O 6 , O 2 , ADP, CO 2 , H 2 O and ATP, respectively. Let H rs be the rate of reaction r within cells of species s. For the reaction in Eq. (1), a rate of 1 mol/(cell × day) means one mole molecules C 6 H 12 O 6 are being catabolized per day and per cell. Let F s ∈ R M be the corresponding net metabolite fluxes across the cell membrane, from the environment to the cell's interior, that is, F m,s is the net uptake rate of metabolite m by species s. The objective function (i.e. biomass production rate) is assumed to be a linear function of the species-specific reaction rates and metabolite uptake/export rates [73,39]: Each reaction r contributes to biomass production by a constant proportionality factor Z r ; similarly, the uptake (or export) of a metabolite m contributes to the objective function by a factor Z m,u (or Z m,e ) so that the total biomass production rate per cell is where F m,s,u = F m,s if F m,s > 0 (and zero otherwise) and F m,s,e = −F m,s if F m,s < 0 (and zero otherwise). The differentiation between positive and negative fluxes allows the incorporation of different costs associated with the uptake or export of a particular metabolite.
In the context of FBA, intracellular metabolite concentrations are assumed to be at equilibrium, which implies F s = −SH s . In particular, the objective function in Eq. (2) is a just linear function of H s . For each species s, the reaction rates H s are determined by optimizing the species' biomass production rate 2 B(H s ) under the constraints imposed on F s and H s by the cell's metabolic potential and the environment 3 . For each species one thus obtains an optimization problem with linear constraints and a linear objective function, also known as a linear programming problem [41]. A global optimum to such a problem always exists, provided that a sufficiently high number of constraints is specified. Fig. 2 exemplifies the above concepts for a simple MC model.

Nitrobacter
Maximize Nitrob biomass production rate = ZamoHamo,Nitrob + ZnxrHnxr,Nitrob 3 , O2 and H2O), 2 redox reactions (amo and nxr) and 2 species (Nitrosomonas and Nitrobacter ). Biomass synthesis is formally associated with each of the two reactions (e.g. 4.6 g dry weight/mol for amo). amo is only performed by Nitrosomonas and nxr is only performed by Nitrobacter. Both reactions are non-reversible, amo can be performed arbitrarily fast and nxr has a maximum rate of 10 −12 mol/(cell · day). The maximum NH + 4 and NO − 2 uptake rates are Monod-like functions of substrate concentrations, while O2 uptake is limited to below 1 × 10 −13 mol/(cell · day). Water can be freely exchanged with the environment. Nitrosomonas is unable to take up NO − 2 or NO − 3 while Nitrobacter cannot take up NH + 4 nor NO − 3 . Metabolite exchange, if at all possible, does not impose any explicit cost on a cell (Zu = Ze = 0).
In a nutshell, at each time point FBA defines how the metabolite concentrations C, environmental variables E and cell-metabolic potential determine cell metabolism H s (C, E) as well as cell-environment exchange rates F s (C, E) for each cell species s. Calculating H s and F s translates to solving the underlying linear optimization problem for each cell species and at each time step. The biomass production rate B(H s ) translates into a cell birth rate B(H s )/µ s , where µ s is the dry cell mass. Concretely, where τ s is the expected cell life time in the absence of any metabolism 4 . The latter can also be interpreted as a measure of minimum maintenance requirements [34] that would be added as a constant negative term to the objective function.
Metabolite uptake and export by cells directly affects environmental metabolite concentrations, which change as The 1st sum in Eq. (4) iterates over all species and represents the net metabolite uptake by the entire microbial community. The 2nd sum in Eq. (4) accounts for nutrient recycling through cell death. The constant vector b ∈ R M quantifies released metabolites per dead dry biomass and can be adjusted to empirical biomass stoichiometry [36]. For example, cell death might be associated with an instantaneous release of C 6 H 12 O 6 , NH + 4 and PO 3− 4 into the environment, at proportions given by the Redfield ratio [68,23,12]. The vector f in Eq. (4) accounts for possible external fluxes into and out of the system, for example due to sedimentation, precipitation or primary production. Environmental stochasticity can be incorporated by defining f as a stochastic process. Each of the terms in Eq. (4) can be omitted for individual metabolites. In fact, any metabolite concentration C i can alternatively be specified as an independent function of time, a stochastic process or a function of E and other C j (j = i), independent of any microbial activity. For example, some metabolites can be defined through conjugate acid-base relationships (see section 5.2.2 for options regarding these so called environmental metabolite dynamics) Finally, each environmental variable E i can itself be an explicit function of time or a stochastic process, or depend on C and other E j (j = i). In fact, E i can be dynamic, with a rate of change dE i /dt specified as a function of time, C and E.
Technical note: The model does not necessarily preserve total mass or chemical elements, unless it is carefully configured.
In fact, MCM has no way of telling whether a chemical reaction makes any physical sense.
An overview of the model framework is given in Fig. 3. MCM allows the specification of arbitrary models of this structure, with any number of involved metabolites, reactions, cell species and environmental variables (see section 5). Simulations of the model are performed by solving the differential equations (3) and (4) together with the cell-metabolic optimization problems at each time step (see Fig. 4 for an illustration). Figure 3: Illustration of the microbial community model framework described in section 1.3, here for 3 cell species. Thick arrows represent metabolite fluxes into and out of the ecosystem (e.g. due to fertilization in the case of a soil microbiome). Thin arrows represent per-cell metabolite fluxes across cell membranes. The graphs inside the cells represent the cell-metabolic models. Individual cell metabolism is predicted using flux balance analysis.
growth phase, τs can also be set to infinity (no cell death). It is also possible to include the death term, −Ns/τs, only when the cell production rate, B/µs, falls below a certain threshold. (1) into a linear optimization problem (LOP) for the biomass production rate of each cell species (2). Each LOP optimizes a linear function constrained to a polytope in high-dimensional space. Optimizing biomass production rates yields predictions on intracellular reaction rates (3) and thus microbial metabolite uptake/export rates (4). Metabolite fluxes are used to predict metabolite concentrations and biomass production rates are used to predict cell population sizes in the next time step (1). Optional environmental variables are also updated.

Introductory example
In this section we shall construct and analyze a simple model of a two-species nitrifier community. Our model will only comprise 5 metabolites (ammonium NH4, nitrite NO2, nitrate NO3, oxygen O2 and water H2O), 2 redox reactions (aerobic ammonium oxidation to nitrite, or amo, and aerobic nitrite oxidation to nitrate, or nxr) and 2 cell species (Nitrosomonas and Nitrobacter). amo and nxr will only be performed by Nitrosomonas and Nitrobacter, respectively, and both reactions will be identified with biomass synthesis. An overview of the metabolic network in terms of FBA is given in Fig. 2. This simplistic model should serve as an illustrative example of MCM's working principles. For an elaborate documentation of MCM please consult the remaining sections, in particular sections 5, 6, 7, 8 and 9.
More realistic examples using published cell-metabolic models are given in sections 10, 11 and 12. For instructions on installing MCM consult section 3.

Constructing a microbial community model
MC models are defined using a human-readable formal syntax in a series of special MC model configuration files. In a new empty directory (let's call it MC_intro_model), create 4 empty plain text files and name them metabolites, reactions, species and focals (these file names must be used exactly). In the first 3 files we will be defining all relevant metabolites, reactions and cell species, respectively. The focals file will list the model objects (i.e. metabolites, reactions and species) that we wish to focus on in our analysis. Whether a metabolite, species or reaction is focal or not does only affect the amount of output produced by MCM but has no effect on the microbial community dynamics. While not particularly obvious in this simplistic example, this is very useful for more realistic large-scale metabolic models that can comprise several hundreds of metabolites and reactions [1,44], only a few of which might be of real interest.
Let's start by defining all relevant metabolites: In the file metabolites, write: Observe that each metabolite is defined by a unique name (see section 5.13.1 for name restrictions) and a list of mandatory key:value pairs (one per line). Comments are preceded by the "#" symbol and are ignored by MCM. Excessive whitespace is also ignored. The description of a metabolite can be empty or contain a short descriptive text to be included in MCM's latter output.
In general, metabolites can be transported between cells and environment either actively (e.g. using a transport membrane protein) or passively (e.g. via diffusion). In MCM, active transport differs from passive transport in that the availability of an active transport mechanism needs to be explicitly specified for a cell species (see below), while passive transport kinetics are applicable whenever an active transport mechanism is absent. In the example above, all max passive uptake/export rates except for water are set to zero, so that exchange only takes place if a cell is explicitly specified to possess the appropriate active transport mechanism. Most active uptake/export rates are assumed to be unlimited (i.e. are not explicitly constrained), except for the O2, NH4, NO2 and NO3 uptake rates, whose limits are either set to a constant value or depend on substrate concentrations in a Monod-like manner. Other more complicated transport kinetics are also possible and are described in full detail in section 5.2.1.
The environmental dynamics of a metabolite specify how its concentration in the environmental pool changes with time. In the example above, the concentrations of NH4, NO2 and NO3 are dynamical and their 11 click here for source code rate of change depends on net microbial export (which might be negative). Replacing microbial_export in their environmental dynamics with a numerical constant (or a function of time, t) would mean that their concentration changes at that constant (or time-dependent) rate. While the NH4 concentration is initially 10 µM, both NO2 and NO3 are initially absent from the environment. The O2 concentration is explicitly set to be constant and at 100% saturation with the atmosphere (at 37 • C). Alternatively, the O2 concentration could have been an explicit function of time (t) or interpolated from available measurements. We arbitrarily set the concentration of water to constant 0, however water is only considered as a metabolic by-product and will not affect our model's dynamics. The full range of possible environmental metabolite dynamics is described in section 5.2.2.
Technical note: All physical quantities and constants must be given in specific (typically SI-derived) physical units. For example, time is given in days, half-saturation constants and metabolite concentrations are given in mol/L and metabolite export rates are given in mol/(cell · day). See section 5.13.4 for a full list of appropriate units.
Next, let's define the two redox reactions, ammonium oxidation (amo) and nitrite oxidation (nxr). In the reactions file, insert the following: Similarly to metabolites, each reaction is defined by a unique name and a set of mandatory key:value pairs (one per line). Each reaction is characterized by a chemical equation that describes the transformation of a set of reactants into a set of products. Only pre-defined metabolites can be used in chemical equations and names are case-sensitive (i.e. nh3 is not the same as NH3). Reactants and products are separated by "->" and can be preceded by an optional stoichiometric coefficient (note the space between the latter and the metabolite name). In the above example, both reactions are non-reversible. amo has an unlimited (i.e. unconstrained) max forward rate (in our model amo will, however, be limited by the finite NH4 uptake rate). On the other hand, nxr has a finite max forward rate due to (hypothetical) finite enzyme capacities. In general, reaction rate limits can be functions of metabolite concentrations (even of nonparticipating metabolites such as inhibitors) or other environmental variables (see section 5.3.3). The objective of a reaction (e.g. 4.6 for amo) specifies the amount of biomass produced per reaction flux (g dry weight per mol).
In the species file, define the two cell species, Nitrosomonas and Nitrobacter: Similarly to metabolites and reactions, each cell species is defined by a unique name and a list of mandatory key:value pairs. The initial concentration specifies the cell concentration at time 0. The life time specifies the expected time until cell death in the absence of metabolic activity. Here, we have omitted to model cell death for Nitrosomonas while we assume that a metabolically inactive Nitrobacter cell lives on average 20 days. In general, a cell's life time may explicitly depend on time, metabolite (e.g. toxin) concentrations or environmental variables such as temperature (see section 5.4.1).
The active uptakes and active exports lists specify the metabolites (separated by a comma) for which active uptake and export mechanisms are available, respectively. The corresponding uptake/export rate limits for each actively transported metabolite are taken as specified in the metabolites file (see above).
Since we specified the max passive uptake/export rate of H2O as unlimited (see above), both species can also freely exchange water with their environment.
In the above example, each species can only perform one of two redox reactions, amo or nxr. If a cell can perform several reactions, these need to be separated by a comma. Note that Nitrobacter is specified to have two copies of the nxr gene. While this has no effect on the cell's metabolic activity (which is determined according to FBA), specifying gene copy numbers (by default assumed to be one) is useful when comparing simulations of gene concentrations to quantitative metagenomic time series [16].
Finally, let us specify focal (i.e. particularly interesting) metabolites, reactions and species. In the focals file, insert the following: In this example we do not include O2 nor H2O in the focals list because their dynamics are trivial anyway.
On the other hand, we included all cell species using the special wildcard "*".

Running a single simulation
Now that we've defined our MC model, let us prepare an MCM script that will contain all necessary commands to run our simulation. In the parent directory containing our MC_intro_model directory, create a plain text file (let's call it script_intro) containing the following: Notice that most MCM commands fall into one of three categories: • set <control variable name> <some value>, for setting the value of a control variable. For example, "set maxSimulationTime 15" sets the time span of simulations to 15 days.
• set (or unset) <flag name>, for toggling a boolean flag on or off. For example, "set parallel" enables parallel computation, i.e. using multiple CPUs or cores to solve FBA models in parallel.
• <some standalone command>, for invoking a simulation or other comparable tasks.
Notable exceptions exist, such as setodnew (written as a single word and followed by a quoted file path), which creates a new non-existing output directory with the given path (a number is appended to the directory name to ensure non-existence). The command actually invoking the simulation in the example above is runMCM; the remaining commands simply modify control variables or prepare the output directory into which results are to be saved. The above script can be executed in the terminal by calling MCM with the script file path as an argument, e.g.
./MCM script_intro Upon completion, you should obtain a new directory simulations_intro/run_01, into which all simulation results are saved. MCM saves all results in the form of strictly structured data and plot files, though some overview output is reproduced in the command line. Within the specified output directory, all output files have a fixed name and location. However, which files are created depends both on the invoked MCM function as well as current control variable and flag values. In our case the sub-output directory simulation_summary will contain the predicted temporal profiles of several community-wide summary variables, such as metabolite concentrations, reaction rates or cell concentrations (see Fig. 5 for an example). The sub-output directory metabolic_activity_heatmaps will contain metabolic activity heatmaps generated at regular time intervals (see Fig. 6 for an example). See section 13 for a list of possible output files and section 6 for detailed simulation options.
14 click here for source code Note: At this point you should make a backup of the model and script, as we will be reverting to it in subsequent examples. The initially present NH4 is quickly oxidized by Nitrosomonas to NO2, which is in turn oxidized by Nitrobacter to NO3, albeit with a certain time lag due to initially low Nitrobacter populations.

Uncertainty analysis
As simple as the example above is, it already requires the specification of a respectable number of numeric parameters (5 uptake kinetic parameters, 2 initial metabolite concentrations, 2 cell masses, 2 initial cell concentrations and 1 cell life expectancy). In practice, several parameters might be poorly quantified or even random. If the probability distribution of parameters is known (e.g. as a Bayesian posterior), MCM can project the uncertainty of input parameters to an uncertainty of the microbial community's behavior. MCM's uncertainty analysis involves several Monte Carlo simulations of the MC model and aims to estimate the probability distribution of model predictions corresponding to the pre-defined probability distribution of model parameters. Uncertainty analysis is described in detail in section 9.
Model parameters included in uncertainty analysis need to be defined as symbolic. In short, this means assigning them a name, a value range, a default value and a probability distribution. In general, symbolic parameters allow a higher level analysis of microbial communities, including sensitivity analysis, uncertainty analysis and parameter estimation from data. See section 5.7 for more details on symbolic parameters.
Let us define two symbolic parameters, Khalf_NH4 and initial_Nb, representing the NH4 uptake halfsaturation constant and initial Nitrobacter cell concentration, respectively. Create a plain text file parameters, place it in the MC model directory used in the previous example (MC_intro_model), and fill in the following content: Notice the familiar syntax: Each symbolic parameters is defined by a unique name and a list of key:value pairs (one per line). The first parameter is uniformly distributed within its permissible range (as defined by its minimum and maximum), while the 2nd parameter is log-uniformly distributed. For purposes other than uncertainty analysis or fitting, the distribution of a parameter can also be omitted. We defined both parameters to be non-fixed, because we will be varying them in the following analysis (fixed symbolic parameters always evaluate to their default value). We set their default value to the numeric values used in our original model in section 2.2.
So far MCM has no way of knowing what these parameters actually mean. To include them in our MC model we refer to them by their name, enclosed between two "$" symbols: In the metabolites configuration file (located in your MC model directory, MC_intro_model), change the active uptake kinetics of NH4 to the following:

Sensitivity analysis
An alternative to uncertainty analysis, sensitivity analysis (SA) aims to quantify the dependence of a system's behavior on individual parameters, either when these are only slightly varied (local SA), or when these vary between extreme values (global SA) [7]. In MCM, local and global sensitivity analysis are invoked using the commands LSAMCM and GSAMCM, respectively.
Similarly to uncertainty analysis, SA requires the specification of non-fixed symbolic parameters, the sensitivity to which is to be evaluated. Contrary to uncertainty analysis, SA does not require (nor does it use) the probability distributions of parameters. Instead, local SA evaluates symbolic parameters close to their default value in order to estimate the partial derivatives of output variables with respect to these parameters, while global SA evaluates symbolic parameters at their default, minimum and maximum values (as defined in the parameters file). SA is described in detail in section 8.1.
For this example, we can just leave our MC model configuration, including the parameters file, unchanged from the previous section. In the script file script_intro, replace all uncertainty analysis-associated commands (the ones we added in the previous section) with the following commands: Executing the new script will result in two sub-output directories, LSA and GSA, containing detailed reports and graphical summaries of the local and global sensitivity analysis, respectively. Fig. 8 exemplifies some of the generated output.

Comparing and fitting the model to data
As valuable as it is to make predictions based on mechanistic models, the loop of scientific enquiry is only closed when predictions are validated against, and models are improved in view of, real measurements. MCM allows the statistical evaluation of model predictions against time series data (such as optical cell concentrations, rate measurements or metagenomics) as well as the calibration (fitting) of model parameters to the data using maximum-likelihood estimation. For more details on the underlying statistical theory as well as the fitting process consult section 7.
Simulation results are automatically compared to any time series data available. The latter needs to be given in tabular format in plain text files with a particular name structure. Consult section 14.3.6 for details on data file formats. In this example, we will be using an artificial time series data set for (a) Nitrobacter cell concentrations and (b) NO3 concentration. The former will be in optical density units, i.e. not calibrated to actual cell counts. As a result, MCM will estimate the appropriate linear unit scaling that best matches the simulation to the data. Create a new directory (let's call it data_intro) and two subdirectories, CellConcentrationsDeadAlive and MetaboliteConcentrations. The names of these subdirectories are used by MCM for identification purposes and must be as given here. Create two plain text files, "Nitrobacter [au]" and "NO3", and place them in the first and second data subdirectory that you created, respectively. These files will contain our two time series. We designate the first time series 18 click here for source code as dead+alive cell concentrations because suspended dead cells also contribute to light scattering during optical measurement and can thus not be differentiated from living cells. The " [au]" suffix in the file name tells MCM that the data is in arbitrary (unknown) units.
In the "Nitrobacter [au]" time series file, fill in an artificial time series similar to the following: Observe that multiple values (e.g. from repeated measurements) can be given for the same time points. These will be averaged if the flag averageAmbiguousData is set or treated as independent time points otherwise. As in all MCM files, comments preceded by the "#" sign as well as excessive whitespace are ignored.
Having prepared our data sets, we now need to point MCM to the root data directory (which we called data_intro). In the original MCM script from section 2.2 we add the specification of our data directory, ending up with the following script: Observe that we also specify a log-normal error distribution for cell concentrations and a normal error distribution for metabolite concentrations. Simply put, this means that deviations of cell concentrations and metabolite concentrations from data are statistically evaluated on a logarithmic and linear scale, respectively. For a mathematical explanation of log-normal and normal error structures see section 7.
Executing the above script will run a single simulation similar to the one in section 2.2, using the default values of our symbolic parameters. However, now MCM finishes the simulation with a comparison of its predictions to the provided data. All related reports and plots (exemplified in Fig. 9) are saved to the sub-output directory comparison_to_data. For example, the file comparison_to_data/comparison.pdf shows an overview of the data comparison, including coefficients of determination, log-likelihoods and estimates of noise variance.   Having compared our model to the available data (and most likely having obtained a moderate to bad match), let's go a step further and adjust the two symbolic parameters, Khalf_NH4 and initial_Nb, to better match our data. A sensitivity analysis of the individual log-likelihoods (similar to the sensitivity analysis demonstrated in section 2.4 above), would help us infer the direction to which parameters should be adjusted in order to improve the match (see section 8.2 for details). We shall skip this intermediate step and directly attempt an automated maximum-likelihood estimation using MCM's fitting function.
Since we have already (i) specified an MC model, (ii) defined a set of non-fixed symbolic parameters and their permitted ranges (see section 2.3) and (iii) provided MCM with appropriate data, we are ready to proceed with the fitting. Modify the last script by replacing runMCM with the command fitMCM 20 click here for source code and execute the script in the terminal as usual. The invoked parameter fitting is an iterative process and involves several repeated simulations with slightly varying parameter values, starting at their default values. A final simulation with the fitted parameters (i.e. maximizing the log-likelihood) and a comparison to the data are saved to the sub-output directory run_final (see Fig. 10). Following fitting, MCM will also estimate confidence intervals for the fitted parameters (this involves several more simulations). The final fit report (saved in the file fit_report.txt) is exemplified below:  Figure 10: Comparison of the nitrifier community model to time series data for (a) Nitrobacter cell concentrations (dead+alive) and (b) NO3 concentrations, following a maximum-likelihood fit of the parameters Khalf_NH4 and initial_Nb (section 2.5). Note that since the Nitrobacter cell concentration data was provided in unknown units, a calibration to actual cell counts was also performed (one optical density unit corresponding to 6.46 × 10 8 cells/L). Fig. (c) summarizes the normalized log-likelihoods of the evaluated variables. The overall log-likelihood after fitting is 17.9, compared to −1.84 prior to fitting (see Fig. 9). Excerpt from the file run_final/comparison.pdf.
Note that the example model considered here is very simple and well-behaved. For realistic large-scale models parameter fitting might (a) take much longer to converge, (b) fail for unexpected reasons or (c) fail to estimate parameter confidence intervals. This is particularly likely when many parameters are fitted simultaneously or when the model responds highly non-linearly to parameter changes. These issues are characteristic of the (dreaded) inverse problem, i.e. the calibration of mechanistic models to experimental data [90]. Resolving them typically requires a good understanding of the model and extensive tweaking of the fitting process. For a detailed explanation of MCM's fitting options consult section 7.3.

Fitting the model using arbitrary composite data
In the previous example we calibrated our model using available data for two variables: Nitrobacter cell concentrations and NO3 concentrations. In practice some variables may not be observed directly, but only inferred indirectly from other composite or observable variables. For example, the cell concentrations of the nitrifiers Nitrosomonas and Nitrobacter might not be separately measurable, but the total concentration of nitrifiers might be. In principle, such composite data can still provide valuable information on an ecosystem's state.
In the following example we shall define a new observable, nitrifier_concentration, as the total concentration of Nitrosomonas and Nitrobacter. Create a new text file observables, place it into the existing MC model directory (MC_intro_model), and fill in the following: In general, observables can be functions of any other variables such as metabolite concentrations, reaction rates or even other observables (section 5.5). While observables do not themselves influence the behavior of a model, they can be used for uncertainty analysis, sensitivity analysis and model fitting just like any other variable (e.g. as in the examples above, sections 2.3, 2.4 and 2.5).
Observe that we specified the error model of nitrifier_concentration as normal, which means that this particular observable shall be compared to data assuming a normal error distribution. Let us compare and then calibrate our model using a (hypothetical) time series for nitrifier concentrations. Similarly to the example in section 2.5, create a text file nitrifier_concentration, place it into a new directory data_intro/Observables/, and fill in a time series similar to the following:

Including environmental stochasticity
Natural microbiomes are inevitably subject to fluctuations in environmental conditions, such as precipitation, temperature or nutrient influx. In the following example, we shall extend our nitrifier model to include two sources of environmental stochasticity: (a) fluctuations in NH4 supply rates and (b) fluctuations in temperature, which in turn influence reaction rates. We shall then perform an uncertainty analysis of the resulting (stochastic) model, similar to the one in section 2.3.
To eliminate any stochasticity due to parameter uncertainty, we set both symbolic parameters (Khalf_NH4 and initial_Nb) to fixed. Hence, the parameters file should now look as follows: As a result, Khalf_NH4 and initial_Nb will always evaluate to their default values. We then modify the environmental dynamics of NH4 to include a random external supply rate: In the metabolites file, modify the NH4 record to the following: NH4 description: ammonium max active uptake rate: Monod 1.23e-13 $Khalf_NH4$ max active export rate: unlimited max passive uptake rate: 0 max passive export rate: 0 environmental dynamics: initial 1e-4 flux logOU 1e-4 1e-5 5 and microbial_export NH4 is now replenished at a rate which is itself a stochastic log-Ornstein-Uhlenbeck process 5 with a mean of 1 × 10 −4 mol/(L · day), a standard deviation of 1 × 10 −5 mol/(L · day) and a correlation time of 5 days.
To define temperature as an environmental variable, create a new plain text file called environment, place it in the MC model directory (which we called MC_intro_model) and fill in the following: Notice the familiar syntax: Each environmental variable is defined by a unique name and a set of key:value pairs (one per line). The option units (optional) merely specifies axis labels during graphical output, and sign:positive (optional) ensures that temperature is always positive regardless of numerical errors or flaws in the model. The most important aspect of an environmental variable is its dynamics, which either specifies its rate of change, or its explicit value as is the case above. value can be, for example, a function of time (t), metabolite concentrations and other environmental variables, or an interpolation of a measured time series, or even a stochastic process. Consult section 5.1 for more details on environmental variables. In our case, temperature fluctuates around 293 K as an Ornstein-Uhlenbeck stochastic process with a standard deviation of 5 K and a correlation time of 10 days. For completion, let's tell MCM that we find temperature interesting: At the end of the focals file, add the following two lines:

FOCAL_ENVIRONMENT temperature
So far temperature has no effect on our model's dynamics. Let's make the maximum amo reaction rate scale exponentially with temperature by changing the amo definition in the reactions file to the following: Observe that the max forward rate of amo is now a custom function of temperature (T). The custom keyword can generally be followed by any mathematical expression that depends on time (t), metabolite concentrations or environmental variables (see section 5.3.3 for more details).
Summarizing, we have so far (a) set all symbolic parameters to fixed, (b) added a stochastic influx of NH4 to the system, (c) defined temperature as a stochastic environmental variable and (d) included temperature effects on the amo reaction rate limits. Let us now modify the MCM script to perform an uncertainty analysis of the model in face of the introduced stochasticity. We can use the same script as in section 2.3, with the difference that we choose to run simulations a bit longer and use more Monte Carlo trials for higher accuracy. Hence, our new MCM script should look as follows: # specify the MC model directory set MCmodelDir "MC_intro_model" # create a new non-existing output directory # all simulation results will be saved in here setodnew "simulations_intro/run_" Executing the above script will perform an uncertainty analysis similar to section 2.3, the results of which are exemplified in Fig. 13. Typical temperature profiles and the resulting maximum amo reaction rates are shown in Fig      . Also shown are the mean and a random example simulation. Excerpt from the output file UA.pdf.

Including bacteriophages
While long overseen, bacteriophages are increasingly perceived as important regulators of microbial communities, for example in the ocean [88] or bioreactors [84]. Observe that free viral particles are released at a rate of 20 particles per infected host cell per day, are lost at a rate of 0.1/day and have an initial concentration of 1/L. We specified the sign of the phage concentration as positive to ensure that these never become negative (e.g. due to numerical errors). Defining "infected" versions of the two cell species is straightforward: In the species configuration file add the two additional species Nitrosomonas_infected and Nitrobacter_infected and modify the existing Nitrosomonas and Nitrobacter to account for phage infections, obtaining the following: To include phage concentrations in graphical simulation output, we specify them as focal by adding the following to the focals file: FOCAL_ENVIRONMENT phage* Notice that we use the wild card '*' to include both phage_Ns and phage_Nb. Running a simulation using the original script from section 2.2 produces the output illustrated in Fig. 14. Notice the initial grow of both cell species, triggering the proliferation of bacteriophages and the subsequent infection of cells.

Environmental (abiotic) reactions
Some reactions can take place abiotically in the environment, i.e. independent of microbial metabolism. For example, nitrate reduction to ammonium in soils can occur abiotically in the presence of sulfate green rust [29,28] stoichiometrically as follows: Experimentally determined rates follow the first-order kinetics where k ≈ 4.26 L · mol −1 · d −1 at 25 • C [28]. Let us incorporate this environmental reaction into the original microbial model from section 2.1. Hence, we start with the first version of the model files metabolites, reactions, species and focals. We begin by adding the three metabolites, FeIIGr, Fe3O4 and SO4 to the metabolites file: FeIIGr description: sulfate green rust max passive uptake rate: 0 max passive export rate: 0 environmental dynamics: initial 0.012 flux environmental_production Fe3O4 description: iron (II,III) oxide max passive uptake rate: 0 max passive export rate: 0 environmental dynamics: initial 0 flux environmental_production SO4 description: sulfate max passive uptake rate: 0 max passive export rate: 0 environmental dynamics: initial 0 flux environmental_production Observe that we have specified the environmental dynamics of all three metabolites as dynamical, with a rate of change according to environmental_production. This tells MCM to update FeIIGr, Fe3O4 and SO4 concentrations according to the net amounts produced (or consumed) by environmental reactions. Let's also adjust the environmental dynamics of NH4 and NO3 to account for fluxes from abiotic reactions (in addition to microbial metabolism): NH4 description: ammonium max active uptake rate: Monod 1.23e-13 2.6e-5 max active export rate: unlimited max passive uptake rate: 0 max passive export rate: 0 environmental dynamics: initial 1e-4 flux microbial_export and environmental_production NO3 description: nitrate max active uptake rate: 0 max active export rate: unlimited max passive uptake rate: 0 max passive export rate: 0 environmental dynamics: initial 0 flux microbial_export and environmental_production

Next, let's add protons (H) as an additional metabolite:
H description: protons max passive uptake rate: 0 max passive export rate: 0 environmental dynamics: concentration 10^(-7) Note that we assumed a fixed pH of 7 to calculate proton concentrations. More generally, we could have defined pH as an environmental variable that depends, for example, on NH4 concentrations. To summarize, the metabolites file should now contain the following: In the reactions file, add the following reaction with first-order kinetics: Observe that we used the wildcard '*' to designate all iron-compounds and all cell species as focal.
Because biocatalyzed nitrification oxidizes ammonium to nitrate and abiotic ammonification slowly reduces nitrate back to ammonium, nitrogen cycling is expected in the system (as long as sulfate green rust is available as an electron donor). To visualize nitrogen fluxes between reactions at particular time points (e.g. at the end of day 2 and day 14), we use the MCM control variable plotMetabolicFluxDiagrams (see section 6 for further options). Concretely, in the original script from section 2.2 we add the following line anywhere prior to the runMCM command: set plotMetabolicFluxDiagrams at 2 14 Running the modified script produces the output illustrated in Figs , microbial net export rates (center row) and environmental net production rates (bottom row) over time for NH4, NO2 and NO3 (columns 1-3, respectively), in the nitrifier model with abiotic ammonification (section 2.9). Observe that as NO3 accumulates, the system shifts from mostly nitrifying to a balance between nitrification and abiotic ammonification. In the latter state, nitrogen cycling is sustained by the presence of reducing sulfate green rust. Excerpts from the output files metabolite_concentrations.pdf, metabolite_net_environmental_production.pdf and metabolite_net_export_community_wide.pdf.
(a) (b) Current system-wide metabolic fluxes across reactions at time t=2.12 days. Chords connect reactants to reactions and reactions to products. Chords are colored according to their origin. Chord widths are proportional to fluxes. Only focal metabolites and reactions are shown. Environmental reactions are indicated by (env.) Microbial reactions are indicated by (micr.) Data file: data/current_at_time_2. 12 Figure 16: System-wide metabolic fluxes between reactions and metabolites after 2 days (a) and 14 days (b), in the nitrifier model with abiotic ammonification (section 2.9). Chords connect reactants to reactions and reactions to products. Chords are colored according to their origin. Chord widths are proportional to flux rates. Environmental and microbial rates are indicated by "env." and "micr.", respectively. Observe that on day 2 the system is dominated by nitrification, while on day 14 nitrification is balanced by abiotic (environmental) ammonification. Excerpts from the output files metabolic_flux_diagrams/current_at_time_2.html and metabolic_flux_diagrams/current_at_time_14.html.

System requirements and dependencies
MCM is mostly self-contained and should run without problems on any typical UNIX-like machine (including at least Mac OS X and Linux). Plotting graphs requires additional free 3rd party software (see section 3.5), the absence of which, however, does not interfere with MCM's computational functions. Note that compiling MCM from the sources (e.g. if a compiled executable is not available for your platform) also imposes additional requirements on the system (see section 3.4). MCM has been successfully compiled and tested on Mac OS 10.5 -10.8, Ubuntu Linux 8.04 LTS and Fedora Linux 7.
Hardware requirements depend on model size. For example, memory requirements increase linearly with species number and average metabolic model size (number of non-zero entries in the stoichiometric matrix). Computing time increases linearly with species number and polynomially with average cell-model size 6 . A single-species simulation involving 1000 reactions and lasting 1000 time steps will typically take a few seconds on a modern laptop (as of 2014).

3rd party components
In addition to its own roughly 50 000 lines of code, MCM also includes the following 3rd party code: • NLopt 2.4.2, an open source nonlinear optimization library [40,75,56,65,64]. NLopt is distributed under the GNU Lesser General Public License 2.1 or later.
• lpsolve 5.5.2, an open source linear programming solver [53]. lpsolve is distributed under the GNU Lesser General Public License 2.1 or later.
• D3, a free JavaScript library for data visualization, is distributed under the BSD 3-Clause license.
• Underscore, a free general-purpose JavaScript library, is distributed under the MIT license.
Detailed descriptions and license agreements for the above libraries can be found in their respective subdirectories. No separate installation or configuration is needed for the above libraries, since they come bundled with MCM.

Installing MCM
Compiled binaries (executables) of recent MCM releases might be available for your operating system at the project's website: http://www.zoology.ubc.ca/MCM (and might even be included with this manual).
If that is the case, simply download the suitable binary and place it in a location of your choice. You might need to adjust the file's execute permissions to be able to run it as a program. On a typical UNIX-like machine, the required terminal commands might look as follows: chmod ugo+x MCM # make MCM executable by all users mv MCM /usr/local/bin/ # make MCM accessible system-wide Note that while MCM is mostly a standalone program, it does require certain (free) 3rd party software for graphing, described in detail in section 3.5.

Compiling MCM from the sources
If a recent binary is not available for your system you will need to compile MCM from the raw C++ sources. Compiling MCM requires a recent version of the GNU C++ compiler (g++), by default present on most modern UNIX-like systems. Note that earlier versions of g++ (<4.2) do not support OpenMP, so MCM binaries compiled with such a compiler will not allow parallel (multithreaded) computations.
Mac OS X users, please also see the FAQ in section 14.1.3.
The latest MCM sources are available at the project's website (or might even be included with this manual). If the downloaded file is a compressed .tar.gz file, you can unpack it via the terminal, e.g.
gunzip -c MCM_1.0.tar.gz | tar xopf -(or by double-clicking on the file on a modern Mac). After unpacking, you should obtain a source_code directory containing at least the file MAKEFILE and the subdirectory sources. In the terminal, change to the source_code directory and run make to compile: To compile MCM without support for parallel computing use 'make openmp=0' instead of 'make' in the last command.
Attention: The MCM sources include 3rd party code which is also automatically compiled (for a detailed list see section 3.2). Some of the included 3rd party code requires that during compilation the entire path to the MCM directory does not contain any whitespace characters. This may not be obvious from the errors you would otherwise get.

Plotting
MCM plots graphics using gnuplot, a free and powerful command-line graphing utility (http://www. gnuplot.info). All data and plotting commands are transparently passed to gnuplot and plots are automatically saved into files. For each plot file, the raw plot data and gnuplot scripts are always saved as separate files (ending with _data.txt and _plot_script.gp, respectively), allowing the reproduction and manual adjustment of plots (e.g. for publication).
An installation of gnuplot is required for plotting. However, data and plot scripts are still saved even if gnuplot is absent. You can check if MCM was able to locate the correct gnuplot version on your system using the command checkGnuplot in the MCM command line (see section 4.1 for details on MCM's command line interface). MCM 1.3 is most compatible with gnuplot 4.6 (the latest release as of September 6, 2014), the sources and installation scripts of which should be included with the MCM package. If this is not the case, you can obtain gnuplot here.
Gnuplot has to be compiled and installed separately from MCM, which is typically done using the command line tools config and make 7 . On Mac OS X, this would look similar to gunzip -c gnuplot-4.6.5.tar.gz | tar xopfcd gnuplot-4.6.5 ./configure --with-readline=builtin --enable-qt make sudo make install If you run into troubles during compilation or installation, consult the gnuplot FAQ. On Mac OS X, gnuplot can also be installed via the homebrew package manager (if available) using the command brew install gnuplot If you already have a gnuplot version other than 4.6 installed, it is recommended that you also install gnuplot 4.6 in a separate location and tell MCM where to find it using the setGnuplot command, e.g.
# explicitly specify path to gnuplot version 4.6 setGnuplot "/usr/local/bin/gnuplot4.6" Omitting the above command, or calling setGnuplot "", will result in MCM silently trying to locate and use the local gnuplot installation.
If a compatible gnuplot version is available, MCM generates plots in one of the following vector graphics formats: PDF: Requires one of the three command line programs epstopdf, ps2pdf or pstopdf, all of which are free and at least one of which will typically be installed on modern UNIX-like systems. epstopdf can be retrieved at http://ctan.org/pkg/epstopdf. ps2pdf is part of the Ghostscript package which can be obtained at http://pages.cs.wisc.edu/~ghost/. pstopdf is part of standard Mac OS X installations (as of Mac OS 10.4). If neither epstopdf, ps2pdf nor pstopdf can be found by MCM, plots are saved in postscript (EPS) format instead. printod: Print the current output directory. Note that the macro $od$ also evaluates to the current output directory (see section 4.3 on macros).
openod: Open the current output directory in the default file browser.
changeod: Change output directory to the path following the command. If the path is a relative one, it is interpreted relative to the current output directory.
setod: Change output directory to the path following the command. For example, each of the following four variants result in the same output directory: setodnew: Similar to setod, but the target directory name is extended by a integer that ensures that the new output directory does not yet exist.
saveContext: Save the current values of all flags, macros and variables as a script with the name context into the current output directory. If followed by a file path, saveContext saves the script to the specified path instead (relative to the current output directory, if the given path is relative).
loadContext: Followed by a path to a script previously saved by saveContext. Reverts all flags, macros and variables to the state saved in the script. For example, after the following commands the flag parallel will be set:  All command, flag, macro and control variable names are case sensitive, i.e. QuiT is not valid (but quit is). File paths given as values to a control variable name must be enclosed in quotes if they contain any whitespace.

Reading syntax overviews
MCM (and this user guide) provides simple overviews of the input syntax required for various commands and model configurations. These overviews are in turn shown in a particular format, which uses special brackets to differentiate between input structures. More precisely, curly brackets "{}" enclose a commaseparated list of alternative blocks, of which exactly one needs to be present. Angular brackets "<>" enclose the unformatted description (not to be taken literally) of a block. Everything outside of brackets is to be taken literally. For example, represents a block that can either be empty, "dog", "cat" or an arbitrary space-separated list of plant names. Round brackets "()" enclose a comma-separated list of blocks, any non-empty subset of which can be present (in any order). For example, can either be "A B", "A C", "A B C" or "A C B". To see the syntax of all MCM commands and control variables, call help in the MCM command line. To see the syntax of a particular MCM command or control variable, call help followed by the name of the command or control variable (e.g. "help fit_objective").

MCM command line macros
Simple text macros can be defined using the macro command and evaluated thereafter by enclosing their name in "$" symbols. For example, the following commands set the MC model configuration directory to "MC models/bioreactor": macro parent:MC models # general syntax: define <name>=<value> macro child:bioreactor set MCmodelDir "$parent$/$child$" The macros today (evaluating to the current date), now (evaluating to the current date and time) and od (evaluating to the current output directory) are reserved. For example, the following command sets the output directory to "simulations/2014.09.06_15.39.49.863" (at the time of this writing): setod "simulations/$now$" The current value of a macro can be viewed by calling state, followed by the macro name. See section 5.13.1 for macro naming rules. Command line macros also apply to MC models, and provide a way to modify MC models through the MCM command line (section 5.8).

MCM scripts
MCM is meant to be primarily controlled through human-readable scripts with a simple syntax. This facilitates archiving, task automation, high-throughput execution of multiple simulations and incorporation into other computational pipelines (e.g. through automatic MCM script generation). Scripts can be loaded using the command loadScript (followed by the script path), or by providing the script path as an optional argument when calling MCM from the terminal, e.g. as A microbial community (MC) model is defined by (i) an arbitrary number of metabolites, (ii) an arbitrary number of (reversible or irreversible) chemical reactions between any of the metabolites and (iii) an arbitrary number of (cellular) species, each species being able to perform any subset of the reactions and exchange any of the metabolites with its environment. Optionally, an additional set of numerical environmental variables (such as light intensity, temperature or pH) can also be included in the model, for example to model light-limitation of photosynthesis.
It is also possible to define a list of model objects (i.e. metabolites, reactions and species) that one wishes to particularly focus on in the analysis. Whether a metabolite, species or reaction is focal or not does only affect the amount of output produced by MCM but has no effect on the microbial community dynamics. This is very useful for large-scale metabolic models that can comprise several hundreds of metabolites and reactions [1,44], only a few of which might be of real interest.
Environmental variables, metabolites, reactions, species and focals need to be defined in designated text files of a particular format, explained in detail in sections 5.1, 5.2, 5.3, 5.4 and 5.6 below. An overview of the MC model configuration syntax can be obtained using the command modelSyntax. The syntax is human readable and fairly easy to understand through the snippets given below (also see the complete examples in sections 10, 11 and 12). All configuration files need to be located in a single MC model configuration directory, to which MCM needs to be pointed using the MCmodelDir control variable, e.g. as follows set MCmodelDir "microbial_community_models/bioreactor" In addition, MC models can include symbolic (or abstract) parameters, with a predefined range, default value and optional probability distribution. Symbolic parameters play a central role in parameter estimation, sensitivity analysis and uncertainty analysis, and are elaborated on in section 5.7.
Technical note: All physical quantities specified in the model configuration are assumed to be given in specific physical units, summarized in section 5.13.4.

Introduction
MCM allows the inclusion of any number of environmental variables. Environmental variables can be (i) any explicit function of time, metabolite concentrations and other environmental variables, (ii) interpolated from available data, (iii) a stochastic process or (iv) dynamic, whose rate of change is any of (i), (ii) or (iii). Environmental variables, in turn, can be specified to influence metabolite uptake/export limits of cells, reaction rate limits, environmental metabolite dynamics or even other environmental variables. For example, pH in a bioreactor might be set to a measured profile and in turn used as a parameter in ammonia-ammonium dissociation equilibria. Alternatively, pH might be specified as a function of organic acid concentrations, and in turn influence the growth rate of cells. Similarly, light might be defined as a periodic function of time which in turn influences maximum photosynthesis rates in photosynthesizing cells. Furthermore, environmental variables might be defined as convenient summary variables explicitly depending on the system's state. For example, the concentration ratio of two particular metabolites, if of interest, can be defined as an environmental variable the temporal profile of which can be included in a simulation's output.

Specifying environmental variables
All environmental variables need to be defined in the optional environment configuration file, located in the MC model directory. Each environmental variable definition consists of a unique name on a single line, followed by several key:value pairs given on consecutive lines (in any order): • description (optional) • An environmental variable's description, units and scaling can be empty or omitted, and are only relevant to graphical output style. sign can either be omitted, or set to any (default), negative or positive. These choices will affect wether the sign of the environmental variable will be enforced during simulations, e.g. in the case of numerical inaccuracies. error model can either be omitted, or set to none, normal (default) or logNormal. These choices affect the assumed error structure when comparing the environmental variable's time course to experimental data (e.g. for model fitting, see section 7).
value interpolate "data/measured_O2.txt" File paths must either be absolute (e.g. /users/mikey_mouse/my_time_series) or given relative to the MC model directory (e.g. ../my_time_series). See section 14.3.6 on format requirements for time series data files. A combination of (i) and (ii) is also possible, e.g.
value interpolate "measured_O2.txt" and OU 10 5 # superimpose artificial noise # on the measured data (iii) a dynamical variable whose rate of change can depend, among others, on time (t), metabolite concentrations, cell concentrations, environmental variables and biomass death rate. The general syntax is initial <initial concentration> rate <additive flux components separated by " and "> where each of the rate components can be one of (without duplication): OU <standard deviation> <correlation time> logOU <positive mean> <standard deviation> <correlation time> periodic <amplitude> <period> <phase lag> biomass_death <change per (g dry weight dead biomass)/L> <a custom function of time (t), metabolite concentrations, environmental variables, cell concentrations, reaction Gibbs free energies, cell growth, birth and death rates, environmental reaction rates, environmental metabolite production rates, community-wide reaction rates, community-wide metabolite export rates, cell-specific reaction rates, cell-specific metabolite fluxes> Examples are given below for the environmental variable TSS (total suspended solids): initial 0 rate biomass_death 0.06 # TSS increases by 0.06 per g dry weight dead biomass initial 100 rate biomass_death 0.06 and -0.1*TSS # account for sedimentation initial 0 rate logOU 1 0. 2   (iv) a dynamical variable whose rate of change is a piecewise linear interpolation of a given (e.g. measured) time series, e.g.
initial 0 rate interpolate "data/TSS_flux_through_pump.txt" Note that the file path must either be absolute or given relative to the MC model directory. A combination of (iii) and (iv) is also possible, e.g.
initial 0 rate interpolate "TSS_flux.txt" and biomass_death 0.06 (v) the acid/base/acid+base component of a conjugate acid-base pair. This requires the specification of a paired metabolite and an acid dissociation constant, e.g. Circular dependencies among environmental variables and metabolite concentrations are not allowed and will be detected automatically.

Temperature and pH as special cases
In general, environmental variables can have any meaning and their proper (or improper) use (e.g. for regulating reaction rates and metabolite uptake kinetics) is the responsibility of the user. However, if an environmental variable is named "pH", it is interpreted as such and is used by MCM to calculate acid-base dissociation equilibria between metabolites whenever applicable (see option v in section 5.1.3). Similarly, if temperature is defined as an environmental variable (with units C, K or F), it is used to correct acid-base dissociation constants at non-standard conditions, as well as to calculate Gibbs free energies of reactions (section 5.3.2).

Metabolites
All considered metabolites need to be defined in the metabolites configuration file, located in the MC model directory. Each metabolite definition (record ) consists of a unique metabolite name on a single line (see section 5.13.1 for naming rules), followed by several key:value pairs given on consecutive lines (in any order): • description (optional) • active uptake objective (optional) • active export objective (optional) • max active uptake rate (optional) • max active export rate (optional) • max passive uptake rate (required) • max passive export rate (required) • DeltaGf0 (optional): Standard Gibbs free energy of formation (kJ/mol) Any excessive white space, empty lines or comments preceded by the '#' symbol are ignored. The first non-empty line in the file must indicate the file version (1.3). An example metabolites file is given below (see thereafter for details): file_version: 1.3 # do not remove, move or change this line O2 description: dissolved oxygen active uptake objective: 0 active export objective: 0 max active uptake rate: Monod 1.2e-13 3.9e-6 # Monod uptake kinetics, needs Vmax and Khalf max active export rate: unlimited max passive uptake rate: custom 1e-15 * O2 # proportional to O2 concentration max passive export rate: 0 # oxygen concentration given as a periodic function of time (t, in days) environmental dynamics: concentration 280e-6 + 10*sin(2*pi*t)

NH3_NH4
description: ammonia+ammonium # linear coefficients for active cell-environment transport in FBA objective function # (g dry-weight biomass per mol transported) active uptake objective: 0 active export objective: 0 # maximum uptake rate (mol/(cell*day)) depends on environmental NH3 concentration (mol/L) max active uptake rate: custom 1.2342e-13 * NH3/(NH3 + 26e-6) # if a cell has an active export mechanism, assume unlimited capacity max active export rate: unlimited # assume no passive (i.e. without explicit mechanism) uptake nor export max passive uptake rate: 0 max passive export rate: 0 # initial concentration changes according to microbial uptake/export environmental dynamics: initial 0.916e-3 flux microbial_export NH3 description: # description can also be blank active uptake objective: 0 active export objective: 0 max active uptake rate: 1e-2 max active export rate: unlimited max passive uptake rate: 0 max passive export rate: 0 DeltaGf0: -26.7 # standard Gibbs free energy of formation # NH3 concentration is determined by acid dissociation of ammonia # and therefore also depends on pH environmental dynamics: base_of_acid_plus_base NH3_NH4 5.62e-10

Metabolite transport kinetics
In the previous example you will notice that maximum uptake/export rates can be defined in several ways: • As unlimited (i.e. unconstrained).
• As a non-negative numerical constant.
• According to Monod uptake kinetics, by specifying the maximum transport rate at saturation ("V max ") and half-saturation concentration ("K half "), e.g.
Active metabolite transport limits apply to species for which the presence of an active transport mechanism is explicitly specified (see section 5.4), while passive metabolite transport limits apply to all other species. To prevent exchange of a metabolite in cells for which no active transport mechanism has been specified, set the max passive uptake/export rate to 0.
It is possible to define multiple versions of active metabolite transport limits, for example corresponding to different alleles of the same transporter enzyme or different transport mechanisms. For example, the following record defines a metabolite with three different versions of max active uptake rate.
NH3 max active uptake rate: 1e-2 # wild type version max active uptake rate allele01: 1e-3 # alternative version 1 max active uptake rate allele02: unlimited # alternative version 2 max active export rate: unlimited max passive uptake rate: 0 max passive export rate: 0 Active transport versions follow similar naming conventions as metabolite names (section 5.13.1) and are distinguished using the "." separator, e.g. NH3 (wild type), NH3.allele01 and NH3.allele02. This allows 45 click here for source code for inter-specific variations of uptake constraints for the same metabolites (see section 5.4 for details).
concentration interpolate "data/measured_O2.txt" Note that file paths must either be absolute or given relative to the MC model directory. See section 14.3.6 on time series data file format requirements. A combination of (i) and (ii) is also possible, e.g.
concentration interpolate "measured_O2.txt" and OU 10 5 # superimpose artificial noise # on the measured data (iii) a dynamical variable whose rate of change (flux) can be an explicit function of, among others, time (t), metabolite concentrations, cell concentrations, environmental variables, reaction Gibbs free energies and others (see table 1 for a complete list of model variables that can be included). Microbial uptake/export, production/consumption by environmental (abiotic) reactions (section 5.3.5) as well as biomass recycling can also be included. The general syntax is initial <initial concentration> flux <additive flux components separated by " and "> where each of the flux components can be one of (without duplication): OU <standard deviation> <correlation time> logOU <positive mean> <standard deviation> <correlation time> periodic <amplitude> <period> <phase lag> microbial_export environmental_production biomass_death <mol of metabolite released per g dry weight dead biomass> <a custom function of time (t), metabolite concentrations, environmental variables, cell concentrations, reaction Gibbs free energies, cell growth, birth and death rates, environmental reaction rates, environmental metabolite production rates, community-wide reaction rates, community-wide metabolite export rates, cell-specific reaction rates, cell-specific metabolite fluxes> (iv) a dynamical variable whose rate of change is a piecewise linear interpolation of a given (e.g. measured) flux time series, e.g.
initial 0 flux interpolate "data/O2_flux_through_pump.txt" Note that the file path must either be absolute or given relative to the MC model directory. A combination of (iii) and (iv) is also possible, e.g.
initial 0 flux interpolate "fertilization.txt" and microbial_export and biomass_death 0.06 In all of these cases, pH must be defined as an environmental variable (section 5.1.4). If temperature is also defined as an environmental variable (in • C, • K or • F), then acid dissociation constants are corrected for non-standard temperatures.
Circular dependencies among environmental variables and metabolite concentrations are not allowed and will be detected automatically.

Reactions
Reactions are defined in the reactions configuration file, located in the MC model directory. Each reaction definition consists of a unique reaction name on a single line (see section 5.13.1 for naming rules), followed by several key:value pairs given on consecutive lines (in any order): • description (optional) • equation (required) • max forward rate (optional, defaults to unlimited) • max reverse rate (optional, defaults to unlimited) • DeltaG0 (optional): Standard Gibbs free energy (kJ/mol) • objective (optional, defaults to 0) • environmental rate (optional, defaults to 0): Extracellular reaction rate (mol/(L · day)) Excessive whitespace, empty lines and comments preceded by the "#" symbol are ignored. The first non-empty line in the file must indicate the file version (1.3). An example reactions file with 2 reactions is given below: The description of a reaction can be left blank or omitted. equation and max forward/reverse rates are explained in more detail below.

Chemical equation
The general format for chemical equations is <list of reactants> -> <list of products> as demonstrated in the example above. At least one metabolite needs to be included in the reaction. All involved metabolites need to be defined in the metabolites file as described in section 5.2. Reactants are to be separated by single "+" signs, and may optionally be preceded by a numerical stoichiometric coefficient. The same holds for products. Stoichiometric coefficients and metabolite names need to be separated either by empty space (" ") and/or an asterisk ("*" To define bidirectional reactions, do not use "<->". Instead, use a positive value for max reverse rate.

Gibbs free energy
The Gibbs free energy of biocatalyzed redox reactions plays a central role in microbial spatial zonation [6] and has been quantitatively related to microbial growth rates [73]. The Gibbs free energy of any reaction is given by where ∆G 0 is the standard Gibbs free energy of the reaction, R is the gas constant, T is the temperature and Q is the reaction quotient [18].
MCM supports the incorporation of reaction energetics in microbial metabolism, by allowing reaction rate limits or objective coefficients to depend on ∆G (sections 5.3.4 and 5.3.3). MCM calculates a reaction's ∆G according to Eq. (7), using the current metabolite concentrations, current temperature and the reaction's ∆G 0 . A reaction's ∆G 0 can be specified using the reaction's DeltaG0 attribute. If the latter is omitted, MCM attempts to calculate ∆G 0 using the standard Gibbs free energies of formation (DeltaGf0, see section 5.2) of all involved metabolites.
Technical note: In order to use a reaction's DeltaG in an MC model, temperature must be defined as an environmental variable in one of the units • K, • C or • F.

Reaction rate limits
Cell-specific reaction rate limits (in either direction) are specified through the max forward rate and max reverse rate values. These can be any of the following: • unlimited (i.e. unconstrained). This is the default if unspecified.
• A non-negative number, optionally followed by a list of limiting and/or inhibiting metabolites and corresponding half-saturation/half inhibition constants, e.g.
1e-14 1e-14 inhibited_by NH2OH 1e-6 1e-14 inhibited_by NH2OH 1e-6 inhibited_by NO2 1e-7 limited_by O2 1e-5 Multiple limitations and inhibitions are combined in a multiplicative manner. Hence, the last case corresponds to a maximum reaction rate given by It is also possible to define multiple versions of reaction rate limits, e.g. corresponding to different alleles of the same enzyme. For example, the following record defines a reaction with three different versions of max forward rate: Reaction rate limit versions follow similar naming conventions as reaction names (section 5.13.1) and are distinguished to using the "." separator, e.g. amo (wild type), amo.order1 and amo.constant. This allows for inter-specific variations of reaction rate constraints (see section 5.4 for details).

Objective coefficients
The objective of a reaction is its contribution to biomass synthesis per intracellular reaction flux (g dry weight yield per mol Omitting a reaction's objective is equivalent to setting it to zero. Conventionally, FBA objectives are proportional to the flux through a single formal biomass reaction (as exemplified above) [93,38,25]. In that case, all other reactions have a zero objective coefficient but are required for the synthesis of biomass precursors.
Technical note: In certain circumstances a reaction's DeltaG can be +∞ or −∞ (e.g. if some products or reactants are absent, respectively). In these cases MCM may be unable to calculate cell metabolism, since infinite objective coefficients can lead to undefined FBA problems. It is therefore recommended to avoid such singularities using the functions max, min, escapeInf, escapeInf2, escapeNAN (see section 5.13.2). For example, escapeInf2(DeltaG,1e5,-1e5) evaluates to DeltaG or ±10 5 if DeltaG = ±∞ or DeltaG= ±∞, respectively.

Environmental rates
Some reactions can occur both as part of a cell's metabolism as well as abiotically in the environment. For example, the oxidation of H 2 S via reduction of NO − 3 to NO − 2 can be catalyzed by certain microbial clades but can also occur abiotically, albeit very slowly [77]. Similarly, nitrate reduction to ammonium in soils can occur biologically using organic carbon as electron donor as well as abiotically in the presence of sulfate Green rust [29,28]. In MCM, the environmental rate of a reaction can be, for example, a function of substrate concentrations or temperature. Some examples follow below: environmental rate: 0 environmental rate: 1e-5 * exp(-temperature/290) # temperature-dependent environmental rate: NH4 * O2/(O2 + 1e-8) # linear in NH4 and Michaelis-Menten in O2 environmental rate: NH4 * O2/(O2 + 1e-8) and OU 10 100 # same as previous + correlated noise In general, population growth specifications can comprise any of the following additive components: • interpolate: A piecewise linear interpolation of a given (e.g. measured) time series, e.g.
interpolate "data/anammox_rate_measurements.txt" Note that the file path must either be absolute or given relative to the MC model directory.
• An arbitrary custom function of time ('t'), metabolite concentrations, environmental variables, cell concentrations and the reaction's Gibbs free energy (DeltaG, if available), e.g.

Species
Cell species are defined in the species configuration file, located in the MC model directory. Each species is defined by a unique name (see section 5.13.1 for allowed names), a set of reactions it can perform, a set of available active metabolite uptake and export mechanisms, its dry cell mass and its expected life time under metabolic inactivity. In addition, an environment-dependent modulation of metabolic activity, as well as custom population growth terms, can be defined. Species records thus consist of a species name on a single line, followed by several key:value pairs on consecutive lines (in any order).
• initial concentration (required) • mass (required) • life time (required) • metabolic modulation (optional) • population growth (optional) • active uptakes (optional) • active exports (optional) • reactions (required) Any excessive white space, empty lines and comments preceded by the '#' symbol are ignored. The first non-empty line in the file must indicate the file version (1.3). An example species file is given below: The initial concentration gives the cell concentration at time 0 of a simulation and mass is the cell's mass in g dry weight (needed to convert biomass synthesis to cell production rates). The species' life time, metabolic modulation and population growth are explained in detail in the next section.
Metabolites specified as active uptakes or active exports are separated by a comma. Uptake/export rate limits for these metabolites are determined during simulation by their max active uptake/export rates as specified in the metabolites configuration file (see section 5.2.1). For all other metabolites, uptake/export rate limits are determined by their max passive uptake/export rates. Active uptake/export versions (if available) are specified using the "." separator, e.g. Each reaction can be preceded by an optional gene copy number (GCN), separated from the reaction name by either an asterisk ("*") or white space (" "). The default gene copy number is 1. In the examples above, Nitrosomonas has 5 copies of the amo gene, while Nitrobacter has 10 copies of the nxr gene. Non-positive GCNs are interpreted as gene absence (and thus reaction unavailability), but GCNs have no effect on metabolic activity otherwise. GCNs are merely used to translate predicted cell concentrations into gene concentrations, for example for comparison with metagenomic data (see section 7). Of course, the very assumption that each reaction corresponds to one gene via one enzyme can be questioned, e.g. when protein complexes are encoded by several genes. Hence, the final interpretation or use of MCM's predictions about gene densities should be case-dependent and subject to the user's judgement.
Note: You should include at least one reaction with non-zero objective per species. This will typically be a formal species-specific biomass synthesis reaction [93,38,25].

Cell life time
Cell death is modeled as an exponential decay process. The

Metabolic modulation
Apart from from reaction and metabolite transport rate constraints, a general inhibition of cell metabolism depending on environmental conditions can be specified in a species' metabolic modulation. The latter is a factor applied to the cell's growth rate, reaction rates and metabolite fluxes post-FBA, i.e. after calculation of the optimal metabolic activity. For example, the following modulation factor specifies that cells become inactive once lactate concentrations exceed the threshold 1 mM: metabolic modulation: custom theta(1e-3 -lactate) The syntax for metabolic modulation is almost identical to that of life time (section 5.4.1), with the exception that infinite is not permissible. In particular, any function of time, metabolite concentrations, environmental variables and cell concentrations can be used. By default, metabolic modulation is 1 (i.e. no modulation).

Population growth
By default, cell population growth dynamics comprise cell production and exponential decay ("birth and death"). However, arbitrary dynamics can be specified instead through population growth. Some examples follow below: population growth: birth # birth but no death population growth: birth and death # default population growth: birth and death and -Nitrobacter*amoebae/10 # include predation by amoebae population growth: birth and death and comb(t,0.1,10) # inoculate every 10 days In general, population growth specifications can comprise any of the following additive components: • birth: Growth by biomass production.
• death: Exponential death at a rate equal to the inverse life time.
interpolate "data/Ecoli_inoculation.txt" Note that the file path must either be absolute or given relative to the MC model directory. • An Ornstein-Uhlenbeck [91,27] stochastic process of given standard deviation and autocorrelation time (zero mean), e.g. At most one of each of the above components is allowed. Components are separated by " and ".

The role of cell mass
MCM measures all reaction and metabolite transport rate limits on a per-cell basis. In contrast, measured metabolite uptake rates are often reported with respect to dry biomass (i.e. without any reference to actual cell counts). The same holds for reaction rate limits given in many cell-metabolic models, such as the ones published by the Model SEED project [32]. This technical difference might seem inhibitory to the parameterization of the model because all per-dry-biomass rates need to be converted to per-cell rates. However, due to the underlying model structure (Eq. (3) and (4)), the model predictions are in fact invariant to the choice of cell mass used to perform the transformation, as long as predicted cell concentrations (if of interest at all) are converted back to dry biomass in retrospect.

Compartmentalization within cells
While not immediately apparent, MCM can accommodate compartmentalized cell-metabolic models of arbitrary complexity. To compartmentalize reactions within cells, one can define formal variants of the same metabolite, corresponding to different compartments, and define formal transport reactions between compartments. For example, to differentiate between protons in the cytosol, the periplasm and the environment, one would define three formal metabolites, H_c, H_p and H_e, that participate in different reactions. Proton pumping from the cytosol to the periplasm might then be represented by a reaction of the form 0.5 O2 + 4 H_c + 2 Cyt552e -> H2O + 2 H_p + 2 Cyt552 (ETC proton pump by cytochrome aa3 quinol oxidase, oxygen as final electron acceptor [60]).

Species-specialization
Enzymatic capacities and transport efficiencies can differ among microbial strains, even if these share homologous genes. For example, a microbial community might include E. coli strains with different terminal oxidases of varying oxygen affinities.
MCM allows the differentiation of reactions, regulatory mechanisms and metabolite transport kinetics between cell species. To specialize reactions (such as biomass production) to individual species, one can define multiple versions of a reaction (using different names) and then selectively make these reactions available to different species. To specify different metabolite exchange or reaction rate limits for different species, there are two options: Option 1 (recommended): Define multiple alleles (versions) of the enzyme performing the reaction or of the active metabolite transporter. For example, to specify different O2 uptake rate limits for two E. coli strains, define two versions of the active O2 uptake mechanism, e.g. Option 2: One can define an intracellular and extracellular (or environmental ) variant of the metabolite, e.g. NH3_i and NH3_e, with the conversion between the two limited by species-specific transport reactions. For example, uptake of NH3_e might be set to unlimited, and NH3_e is transformed into NH3_i (which is used in the cell's internal reactions) by a reaction whose maximum rate depends on NH3_e concentration, i.e. reactions transport_NH3_Nitrosomonas description: Nitrosomonas-specific NH3 transport reaction equation: NH3_e -> NH3_i max forward rate: 1e-10 limited_by NH3_e 26e-6 # Monod-like uptake kinetics max reverse rate: unlimited # export not explicitly constrained objective: 0

Enforcing individual reactions
Experimental data might suggest that a particular reaction is activated during growth, contrary to FBA predictions. For example, certain S. enterica strains have evolved the capacity to secrete methionine during growth, a trait not captured by standard S. enterica FBA models [31]. To enforce the performance of reactions at a rate proportional to biomass production (with fixed proportionality constant), one can include in the reaction a dummy reactant produced exclusively by the biomass synthesis reaction. Maximum uptake/export rates for that dummy metabolite should be set to zero. For example, suppose that X + Y -> Z + W .. 0.1 DNA + 0.2 protein + 0.01 lipids -> are the equations for the enforced and biomass synthesis reactions, respectively. Then modifying these to X + Y + dummy -> Z + W .. 0.1 DNA + 0.2 protein + 0.01 lipids -> 100 dummy will result in 100 flux units through the first reaction per flux unit through the biomass synthesis reaction. To selectively deactivate this coupling in an other species, set the max active uptake/export rates for dummy to unlimited and include dummy in the species' active uptakes/exports lists.
Alternatively, you can also enforce the performance of a reaction at a constant rate, for example representing minimum maintenance requirements in the form of ATP leakage: Simply specify a positive max forward rate and set the max reverse rate to the negative max forward rate, e.g. as follows: Of course, the value 5.64e-14 can be replaced by a symbolic parameter or even an arbitrary function of metabolite concentrations and environmental variables. If the resulting FBA problem cannot be solved anymore (e.g. if energy metabolism is too weak to maintain the above reaction in terms of ATP production), MCM will assume zero metabolic activity and thus zero biomass production.

Introduction
Within MCM's model framework, at any given point in time the state of a microbial community is fully described by its environmental variables, metabolite concentrations and cell concentrations. However, in practice these so called state space variables (or independent variables) can often not be observed directly, but only inferred from other response or observable variables. While observables do not define a system's state per se, they can provide valuable information on the actual state of an ecosystem. For example, the cell concentrations of the ammonia oxidizers Nitrosomonas and Nitrosospira might not be separately measurable, but the total number of ammonia oxidizing bacteria might be.
To address the conceptual and practical difference between state variables and observations MCM allows the definition of separate observables. Observables can be functions of several state space variables such as metabolite or cell concentrations, or derived variables such as reaction rates and gene concentrations. Observables can even be functions of other observables. While observables do not themselves influence the course of a simulation, they can be used for model fitting (sections 2.6 and 7), sensitivity analysis (section 8) and statistical analysis (section 9).

Defining observables
Observables are are defined in the optional observables file, located in the MC model directory. Each observable has a unique name (see section 5.13.1 for allowed names), followed by several key:value pairs given on consecutive lines (in any order): • description (optional) • Observables can also depend on non-state variables such as reaction rates and gene concentrations. For example, the following observable is defined as the community-wide ammonia oxidation rate multiplied by the reaction's Gibbs free energy and divided by the amo gene concentration: Notice that when specifying an observable's value, some variables (e.g. reaction rates or Gibbs free energies) require the addition of a prefix (e.g. "rate_cw." or "DeltaG.") to avoid ambiguities. A detailed list of available variables and required prefixes is provided in table 2. For cell-specific reaction rates or cell-specific metabolite export rates the species name is also required, e.g.  Focal names can also include the wildcard symbol "*", which represents any possible text (including empty). For example, FOCAL_REACTIONS amo: ammonia oxidation to hydroxylamine nirBD: nitrite reduction to ammonium * sets all reactions to focal, and additionally specifies information tags for the reactions amo and nirBD. Alternatively, FOCAL_METABOLITES *N* specifies all metabolites containing "N" as focals. The actual value of the symbolic parameters Vmax_ammonium and Khalf_ammonium can vary depending on case and might even be random. Aside from their practical convenience, symbolic parameters are a prerequisite for higher level model analysis, such as automatic parameter fitting (section 7.3), sensitivity analysis (section 8) and uncertainty analysis (section 9). Furthermore, they provide an easy interface to automated MC model specification (e.g. by 3rd party software).
Symbolic parameters are defined in the optional file parameters, located in the MC model directory. Each parameter is characterized by (i) a unique name (see section 5.13.1 for allowed names), (ii) an optional description, (iii) a default numerical value, (iv) numerical lower and upper bounds, (v) an optional probability distribution (used for uncertainty analysis and randomly initialized fitting) and (vi) whether it is fixed, i.e. should always evaluate to its default value. Parameter records thus consist of a name on a single line, followed by several key:value pairs given on consecutive lines (in any order): • description (optional) • default (required) • minimum (required) • maximum (required) • distribution (optional) • fixed (required) Any excessive white space, empty lines and comments preceded by the '#' symbol are ignored. The first non-empty line in the file must indicate the file version (1.3). An example parameters file follows below: The default value of a symbolic parameter must be between its minimum and maximum, and the latter must always be finite. A symbolic parameter's distribution can be one of the following: 59 click here for source code • Empty (not specified) • uniform (i.e. uniformly distributed between minimum and maximum), • logUniform (i.e. log-uniformly distributed between minimum and maximum), • triangular (i.e. with a triangular probability density between minimum and maximum, maximized at default and zero at minimum and maximum), • normal, followed by the standard deviation (i.e. parameter follows a truncated-normal distribution between minimum and maximum, with mode at default), e.g.

QF -20*log(1-p) # exponentially distributed with expectation 20
This option allows the specification of an arbitrary probability distribution, as long as its quantile function can be expressed in basic mathematical terms (see section 5.13.2 on valid mathematical functions).
• RG, followed by an arbitrary random generator, i.e. a mathematical expression involving one or more independent random variables, e.g. Symbolic parameters can be used in any of the MC model configuration files environment, metabolites, reactions and species in place of any numerical value, by enclosing the parameter name between two dollar signs. For example, the symbolic parameters Vmax_ammonium and Khalf_ammonium might be used in the specification of ammonium uptake kinetics in the metabolites file: NH4 description: ammonium active uptake objective: 0 active export objective: 0 max active uptake rate: Monod $Vmax_ammonium$ $Khalf_ammonium$ max active export rate: unlimited max passive uptake rate: 0 max passive export rate: 0 environmental dynamics: initial 0.916e-3 flux microbial_export    has one template variable (V) ranging from 1 to 10, and a template body defining a Nitrosomonas cell variant for each value of V. Both the cell name and cell mass depend on the value of V, which is represented in the template body by $V$. The template thus defines 10 Nitrosomonas variants with names Nitrosomonas_1, .., Nitrosomonas_10 and cell masses 1 pg, .., 10 pg, respectively.

Macros
Similarly, the following template defines three metabolites (NH4, O2 and NO3) with similar properties: template{name:NH4,O2,NO3}{ $name$ max active uptake rate: 1e-14 max passive uptake rate: unlimited max passive export rate: unlimited environmental dynamics: initial 0 flux microbial_export } As the above examples show, a template variable's range can either be an integer interval (e.g. 1-10) or a comma-separated list of text values (e.g. NH4, O2 and NO3). To include arbitrary text in a variable's value 8 , enclose it in single or double quotes.
Any arbitrary text block can be templated in an MC model 9 . Text not included in a template is kept as-is. Template variables follow certain naming rules (explained in section 5.13.1) and are evaluated before symbolic parameters (section 5.7) and macros (section 5.8).
A template may specify multiple template variables (separated by ";"), in which case each possible combination of their values defines a separate replicate of the template body. For example, the following block defines a metabolite with 15 versions of active uptake rate limits (section 5.2.1): The files reactions and environment have been generated using the corresponding template files, and together with the pre-existing metabolites, species and observables files, define a complete MC model.

Periodic dilutions
MCM allows the specification of periodic dilutions of microbial communities, e.g. to simulate long-term experiments involving batch culture growth with periodic dilutions into identical media. Dilutions are performed at regular time intervals and result in a reduction of cell concentration by a fixed ratio. Metabolites whose environmental concentration is a dynamic variable, i.e. specified via a rate of change (see options iii and iv in section 5.2.2), can also optionally be diluted at the same ratio. Diluted metabolites are diluted into ambient medium in which their concentration is their initial concentration at time 0. Hence, a dilution factor of 1/10 will reduce all cell concentrations by a factor of 10 and set dynamical metabolite concentrations to 0.1 × C + 0.9 × C i (where C and C i are the current and initial metabolite concentrations, respectively). Dynamic environmental variables, i.e. specified via a rate of change (see options iii and iv in section 5.1.3), can be diluted in a similar way.
Dilutions can be specified using the following commands, preceding any simulation:

Compartmentalized ecosystem models
MCM can accommodate microbial ecosystem models consisting of multiple compartments, each of which is characterized by different cell and metabolite concentrations as well as environmental variables. For example, a simplified lake ecosystem model could consist of three compartments corresponding to the oxic, suboxic and anoxic zones (Fig. 17). Compartments can interact through inter-compartmental fluxes of cells and metabolites, such as oxygen diffusing from the oxic layer to the suboxic layer or cells sinking from each layer to the layer below. To construct a compartmentalized ecosystem model, each metabolite and cell species must be defined separately for each compartment in which they are present. In the above example, each metabolite would have up to three variants corresponding to the oxic, suboxic and anoxic layer. Environmental variables (such as temperature) that differ between compartments must also be defined separately for each compartment. Fluxes between compartments are then specified as environmental metabolite fluxes (section 5.2.2) or additional terms in cell population growth (section 5.4.3). For example, the following metabolites file defines oxygen for each of the three compartments, subject to exchange with the atmosphere as well as exchange between compartments: Reactions can either (i) be the same for all compartments, or (ii) be specified separately for each compartment. In case (i), cells will need to include reactions that formally convert between compartment-specific metabolites and the metabolite names used in the reactions. In case (ii), each cell species variant will need to include the appropriate reaction variants. This approach has the benefit that community-wide reaction rates are predicted separately for each compartment. For example, aerobic glucose catabolism may be defined by the reactions The construction of compartmentalized MC models can be significantly streamlined using model templates (see section 5.9). For example, the above reactions may be defined through the template template{C:ox,sub,an}{ glc_catabolism_$C$ equation: glucose_$C$ + 6 O2_$C$ -> 6 CO2 + 6 H2O max forward rate: unlimited max reverse rate: 0 }

Non-stationary cell models
Conventional FBA assumes internally quasi-stationary cells, i.e. in which all fluxes are stoichiometrically balanced by metabolite uptake/export rates, and whose metabolic activity only depends on the external environment but not on intracellular dynamics. This assumption is usually justified by the general observation that intracellular metabolic transients are much shorter than the time scales associated with cell growth [51,14].
However, in certain cases it may be desirable to extend conventional FBA models to include intracellular dynamical state variables, for example to account for delays in enzyme synthesis during response to external perturbations [14,4]. Non-stationary cell models also enable so called regulatory FBA (rFBA) approaches [14,13,33], in which dynamical regulatory constraints restrict the FBA solution space depending on the abundance of internally produced inhibitors or activators.
MCM allows the incorporation of cell-internal dynamical variables whose rate of change can depend, for example, on the cell's current metabolic activity or external factors such as temperature and metabolite concentrations. In turn, cell-internal variables can, for example, influence a cell's metabolism or death rate. Dynamical cell internal variables are formally defined as environmental variables (section 5.1), as illustrated in the examples below.
Example 1: Let us consider a hypothetical enzyme produced at a rate proportional to the rate of reaction amo in species Nitrosomonas. The environmental variable representing the cell-internal abundance of that enzyme could be similar to the following: Residual enzymes can also contribute to post-transcriptional or post-translational regulation [4]. For example, if amo_enzyme_Ns acts as an inhibitor to the reaction nir, then one might specify a Nitrosomonasspecific version of nir whose maximum forward rate decreases with increasing amo_enzyme_Ns: nir max forward rate: 1e-14 # default rate limit max forward rate Ns: 1e-14/(1e-10 + amo_enzyme_Ns) # rate limit in Nitrosomonas max reverse rate: 0 ... Care should be taken to assign the appropriate nir version to the appropriate cell species (i.e. nir.Ns only to Nitrosomonas; see section 5.4). See section 5.3.3 for details on defining reactions.
Example 2: Let us consider a hypothetical freeze-shock resistance protein that is produced by Nitrosomonas during exposure to low temperatures at a small cost (in terms of ATP). We shall assume that in the absence of this protein, a cell's metabolism is reduced at temperatures below 10 • C. The appropriate environmental variable might look as follows: Observe that freeze_protein_Ns is only produced at temperatures below 10 • C, and at a rate proportional to the difference 10-temperature. Furthermore, freeze_protein_Ns decays at an exponential rate (0.2/day) following production. Also note that the environmental variable temperature needs to be defined separately (in this case in • C; see section 5.1.4 for details and section 2.7 for an example).
To account for freeze_protein_Ns production costs, we specify an additional reaction (section 5.3) that consumes ATP at a rate proportional to the protein's production rate: Observe that by choice of the reactions rate limits, we are enforcing the performance of this reaction at a rate equal to the rate of freeze_protein_Ns production. We assumed that 15 mole ATP are needed to produce one mole of freeze_protein_Ns. To model the opposite effects of temperature and freeze_protein_Ns on cell metabolism, we can adjust the metabolic modulation of Nitrosomonas cells, e.g. as follows: Nitrosomonas metabolic modulation: exp(-max(0,10-temperature) * 1e-15/ (1e-15+freeze_protein_Ns)) ...
Observe that in the absence of freeze_protein_Ns, a cell's metabolic activity drops exponentially with every • C below 10 • C, however the strength of that effect is weakened by freeze_protein_Ns. Alternatively, freeze_protein_Ns could also influence specific reactions, e.g. by modifying their rate limits (section 5.3.3).

Naming rules
Environmental variables, metabolite, reaction, species, observable, symbolic parameter and macro names must only contain letters (a-z, A-Z), digits (0-9) and/or underscores (_), must not begin with a digit and must not be "and" or "end". Environmental variable, metabolite, reaction, species and observables names must not clash and cannot be "t" (which is reserved for time) nor "DeltaG". Metabolites, reactions, species and observables must also not be named "pH" nor "temperature" (which are reserved for environmental variables).
All names are case-sensitive (i.e. NH3 is not the same as nh3).

Custom functions
Many biochemical and physiological specifications (e.g. maximum metabolite uptake rates, maximum reaction rates) can include custom analytic functions of time, environmental variables or metabolite concentrations (e.g. see sections 5.2.1, 5.2.2, 5.3.3 and 5.4.1). The general syntax for analytic functions follows standard mathematical conventions, as known for example from C-like programming languages or MATLAB [50]. Only round bracket types ("()") are allowed. Variables and function names are separated by spaces and/or binary operators (implicit multiplication is not supported). Recognized standard function names are sin, cos, tan, cot, acos, asin, atan, acot, cosh, sinh, tanh, coth, exp, log, log10, sqrt, abs, min, max, atan2, pulse, comb, theta, ceil and floor. The functions ceil and floor return the next-largest and next-lowest integer to their argument, respectively. theta(x) returns In addition to standard mathematical functions, several functions associated with acid dissociation and pH are also supported. A detailed list is given in Table 3. These functions can be used, for example, to define the environmental variable pH (section 5.1) as a function of various acid concentrations in water: The function escapeNAN(x, y) can be used to define an alternative expression (y) in case the first expression (x) evaluates to NaN. For example the function escapeNAN(NH3/O2, 1) will evaluate to 1 if both NH3 and O2 are zero. More generally, the function escapeNAN2(x, y 1 , y 2 ) can be used to choose between y 1 (if x is not NaN) or y 2 (if x is NaN). Similarly, the function escapeInf(x, y) can be used to define an alternative expression (y) in case x evaluates to ±∞. More generally, the function escapeInf2(x, y, z) evaluates to x, y or z if x = ±∞, x = +∞ or x = −∞, respectively.

Random generators
Random generators (e.g. for the random distribution of symbolic parameters, section 5.7) follow a similar syntax to custom functions (section 5.13.2), but can additionally contain one or more independent random variables, e.g. rchisquared(2) # chi-squared with 2 degrees of freedom rnormal(1,0)^2 + rnormal(1,0)^2 # also chi-squared with 2 degrees of freedom asin(runiform(-1,1)) # arcsin of a random value picked uniformly within [-1,1] Table 4 lists all available random variables.  quantity unit time days cell dry mass g cell and gene concentration 1/L cell growth rates 1/day frequency 1/day metabolite concentration mol/L metabolite fluxes (uptake/export) per cell mol/(cell · day) community-wide and environmental metabolite fluxes mol/(L · day) metabolite release via biomass recycling mol/(g dry biomass) reaction rate per cell mol/(cell · day) community-wide reaction rates mol/(L · day) acid dissociation constants mol/L objective coefficients of reactions (g dry biomass)/(mol flux) objective coefficients of metabolite transports (g dry biomass)/(mol transported) Gibbs free energy of formation of a metabolite kJ/mol Gibbs free energy of a reaction kJ/mol Gibbs free energy rate (community-wide) An overview of physical units can also be obtained with the MCM command modelSyntax. In general, environmental variables can have arbitrary units and their interpretation is the responsibility of the model designer. In fact, MCM has no way of checking whether an environmental variable makes any physical sense. The sole exception is pH, which is interpreted as − log 10 [H + ]. Furthermore, for some models temperature can only be defined in one of the units • K, • C or • F (an error message will indicate so otherwise).
Technical note: By convention, MCM considers microbial communities in a 3D setting, so that cell concentrations and metabolite concentrations are measured with respect to 3D volume units (L). However, the model can also be adapted to other dimensions (e.g. two), by formally reinterpreting the volume unit L accordingly (e.g. as surface area). Of course, then, all volume-specific model parameters (e.g. metabolite half-saturation concentrations, environmental metabolite fluxes, acid dissociation constants, pH) will also need to be adapted to the new volume unit.

micog: Converting conventional FBA models into MC models
Together with MCM, you should have obtained a python script called micog.py (microbial community generator). This script converts conventional FBA models in SBML file format 10 [37], such as generated by the Model SEED pipeline [32], into a draft MC model that can be used with MCM. Multiple input SBML files can be specified simultaneously using conventional shell wildcards (in quotes); all metabolites and reactions are pooled together and cell metabolic potentials are defined according to where the different metabolites and reactions were found 11 . Metabolite uptake/export mechanisms are detected through reactions with either no reactants or no products 12 ; for the involved metabolites the cell is assumed to have active uptake or export mechanisms, limited according to the original reaction rate limits. Note that micog will correct all non-MCM compliant names (in SBML referred to as "IDs") by replacing invalid characters with and inserting underscores ("_") where necessary. micog will print out several reports documenting the conversion process. The generated draft MC model typically requires further manual curation, for example to adjust uptake/export rate limits. A detailed description of micog options can be obtained by calling Note: You might want to install micog to /usr/local/bin/ (or similar) for system wide access. It might also be necessary to give micog execute permissions before running, e.g. using chmod u+x micog.py in the terminal.
Technical note: There exists no standardized way to save cell-metabolic FBA models in SBML format (despite its widespread use), so you might need to modify the micog script to convert your favorite files. micog has been successfully used to convert SBML files published by the Systems Biology Research Group [70,24], the BiGG metabolic reconstruction database [79] and files generated by the Model SEED pipeline [32].

Example 1
In this example we will be converting a cell metabolic model of M. tuberculosis, obtained from the BiGG database [79]. Let us download the compartmentalized M. tuberculosis iNJ661 model in SBML format from the BiGG wegpage. You should obtain the file SBML_export.xml. We can now convert the file to a draft MC model using the command: micog.py -i SBML_export.xml \ -o MC_model_tuperc \ --species_naming name_tag \ --cell_concentration 1e3 \ --life_time "infinite" \ --cell_mass 1e-13 \ --objective original \ --environmental_metabolite_dynamics "concentration 0" \ --verbose ("\" denotes a line continuation). The first two options (-i and -o) define the input SBML file(s) and output directory, respectively. The option --species_naming defines how names are assigned to cell species in the MC model: In our case, the choice name_tag specifies that micog should use the model 'name' tag in the SBML file (which in our case is "M. tuberculosis iNJ661", and which will be converted to the MCM-compatible name "M__tuberculosis_iNJ661"). We arbitrarily set the initial cell concentration to 1 × 10 3 cells/L. The cell life time is set to infinite (i.e. no cell death), although this can be changed in the draft MC model later on.
Since the iNJ661 model defines rates with respect to dry biomass, we need to provide micog with the dry cell mass (here arbitrarily chosen to be 1 × 10 −13 g) in order to convert these to per-cell rates (as required by MCM). The model already defines an objective function in terms of biomass synthesis (g dry weight per mmol reaction flux), which is why we chose --objective original.
Technical note: The alternative choice to original is biomass, in which case micog tries to detect a biomass metabolite (whose name is specified using the --biomass_name option) and a biomass synthesis function, and sets the objective to the production of the biomass metabolite. This is more suitable for SBML files generated using the Model SEED pipeline [32] (see the example in section 5.14.2).
We set the concentration of all metabolites to zero using the --environmental_metabolite_dynamics option. However, this will not affect the cell's metabolism, because the iNJ661 model specifies all metabolite uptake/export constraints as constants regardless of substrate concentrations. It is then our responsibility to change these to substrate-dependent functions. Furthermore, individual metabolite dynamics can of course be adjusted in the draft MC model. The --verbose option tells micog to be verbose, i.e. print out detailed information along the way.
The generated draft MC model (comprising 937 reactions, 826 metabolites and 1 species) will be written to the MC_model_tuperc directory. Simulating the model in MCM without any further adjustments (see section 6) predicts a per-capita growth rate of 1.285 day −1 .
70 click here for source code 5.14.2 Example 2 In this example, we shall convert two cell-metabolic models for Nitrobacter winogradskyi Nb-255 and Nitrosomonas europaea obtained as SBML files from the Model SEED pipeline [32] (models Seed323098. 3 and Seed228410.1, respectively).
Recall that the cell mass will influence the conversion from mass-specific constraints in the original models to cell-specific constraints in the MC model. Since the model will include different cell species, you should specify their mass individually for each species. To do so, rename the two downloaded SBML files, Seed323098.3.xml and Seed228410.1.xml, to Nitrobacter_mass=4e-13.xml Nitrosomonas_mass=3e-13.xml respectively. As a result, micog will infer cell masses from file names. Place the two SBML files into a new directory (let's call it nitrifier_sbmls).
Converting the two models into an MC model using micog is straightforward: micog.py -i "nitrifier_sbmls/*.xml" \ -o MC_model_nitrifiers \ --species_naming file_name \ --life_time "infinite" \ --cell_concentration 1e3 \ --environmental_metabolite_dynamics "initial 0 flux microbial_export" \ --objective biomass \ --biomass_name Biomass_b \ --verbose Note the necessary quotes around the input file path and the wildcard "*" representing any file name. Via the --species_naming option we tell micog that species names are to be inferred from the SBML file names (i.e. Nitrobacter and Nitrosomonas). We set --life_time to infinite since we're not interested in modeling cell death and we set the initial cell concentration for both species arbitrarily to 1×10 3 cells/L. In contrast to the previous example, we use the --environmental_metabolite_dynamics option to specify all metabolites as dynamical with an initial concentration of zero and a rate of change determined by the net microbial export (see section 5.2.2 for other options).
In contrast to the previous examples, --objective is now set to biomass, which means that micog will try to identify a biomass metabolite and define the objective function according to which reactions produce that metabolite. The name by which the latter is identified, is specified using the --biomass_name option (in our case "Biomass_b", which is used by all SEED models). This is the appropriate choice for Model SEED FBA models, because the latter do not define an objective function per-se.
The generated draft MC model will comprise 1001 metabolites, 1009 reactions and 2 species. You will likely see a long list of warnings, telling you that reaction rate limits are inconsistent between Nitrobacter and Nitrosomonas (e.g. 9.6e-12 vs 7.2e-12). This is because we used different cell masses to convert from mass-specific rate limits to cell-specific rate limits, while the two original models define the same (arbitrary) rate limits for all reactions. Note that this (common) practice is advised against for MC models, i.e. unconstrained reaction rates should be set to unlimited rather than just a large number representing infinity. You should therefore change all max forward rates and max reverse rates of values 9.6e-12 or 7.2e-12 to unlimited in the draft MC model. Single simulations of a microbial community model can be performed using the runMCM command. Running a simulation requires the specification of an MC model in terms of environment, metabolites, reactions, species and observables files as described in section 5.
MCM will generate predictions of the time course of several output variables at the community level, such as reaction rates (total as well as average per cell), metabolite concentrations in the environment, cell concentrations (alive or dead+alive), gene concentrations (corresponding to cell concentrations) and metabolite fluxes (total as well as average per cell). If the control variable output_timeSeriesAnalysis is set to save (saveAndPlot), MCM also saves (and plots) all periodograms and autocovariances for all time courses [61,76]. In addition, MCM can plot the time course of several biodiversity measures and rank-abundance relationships for cells as well as genes (depending on the control variables output_cellBiodiversity and output_geneBiodiversity). The predicted time courses can optionally be statistically evaluated against available time series data, which can be provided in the form of simple tabular text files (see section 7).
At the beginning of or during a simulation, MCM can optionally produce several additional outputs: • If the control variable output_speciesSpecificMetabolism is set to save (saveAndPlot), MCM will also save (and plot) the time courses of reaction rates and metabolite fluxes for each individual species 13 .
• If the flag plotMetabolicPotentialHeatmap is set, MCM will generate heatmaps of the metabolic potential of each species. If the flags • metabolicHeatmaps_onlyFocalMetabolites, • metabolicHeatmaps_onlyFocalReactions and/or • metabolicHeatmaps_onlyFocalSpecies are set, only focal metabolites, reactions and/or species are included, respectively.
• MCM can also generate heatmaps of the current metabolic activity of each species at arbitrary times specified via the control variable plotMetabolicActivityHeatmaps, e.g. as • cumulative_metabolic_activity.txt, or • average_metabolic_activity.txt.
• MCM can also generate chord diagrams of metabolic fluxes across reactions (or cell species). These so called metabolic flux diagrams (or microbial flux diagrams) can be viewed and interacted with in any modern web browser. See Fig. 29 for an example. Similarly to metabolic activity heatmaps (see above), the times at which metabolic flux diagrams are generated are specified via the control variables • plotMetabolicFluxDiagrams, • plotMetabolicFluxDiagrams, and • plotAverageMetabolicFluxDiagrams.
Similarly, for microbial flux diagrams • plotMicrobialFluxDiagrams, • plotMicrobialFluxDiagrams and • plotAverageMicrobialFluxDiagrams. • MCM can save cell-metabolic FBA models with calculated constraints and fluxes in SBML file format [37] at times specified via the control variable saveCellMetabolicNetworksSBML (same syntax as for plotMetabolicActivityHeatmaps).
• If the flag saveNetworkStructure is set, MCM saves the community wide metabolic network as well as the implied gene-gene and metabolite-metabolite interaction networks, into common file formats (ARFF, BIOM [52], GEXF and SBML [37]). This can be useful for visualization with 3rd party tools [3,94].
• If the flag checkTheoreticalSystemLimits is set, MCM tries to estimate maximum cell growth rates based on the cell-metabolic model structure 14 .
The following MCM commands demonstrate typical choices regarding simulation output: set output_timeSeriesAnalysis saveAndPlot # save and plot periodograms and autocorrelations # alternative choices are skip' and save' set output_speciesSpecificMetabolism save # save (but don't plot) reaction rates # and metabolite fluxes individually # for each focal species set maxRARank 20 # maximum species rank to show in rank-abundance curves set maxACRShift 10 # maximum time-lag to show in autocorrelations set maxFourierFrequency 10 # maximum frequency to show in periodograms set statisticsInterval [20:*] # only use times>20 for statistical evaluation set checkTheoreticalSystemLimits # try to estimate maximum possible cell growth rates # might be useful for spotting bad FBA models unset saveNetworkStructure # don't save overall metabolic network structure set plotMetabolicPotentialHeatmap # save heatmap of metabolic potential per species set plotMetabolicActivityHeatmaps start 0 step 5 # save metabolic activity heatmap # every 5 days starting at day 0 set saveCellMetabolicNetworksSBML at 10 # save FBA models in SBML format at day 10 14 These estimates are to be taken with a grain of salt.

set saveCellMetabolicNetworksSBML_onlyFocalSpecies # only do so for focal cell species
The duration of the simulation, the size of the time series (i.e. the maximum number of recorded points during the simulation) as well as the integration time step can be controlled as follows: set integrationTimeStep 0.01 # this controls the numerical accuracy of the simulation set maxTimeSeriesSize 100 # only record at most that many points set maxSimulationTime 20 # simulate 20 days It is recommended to keep integrationTimeStep well below the typical time scales at which you expect your model variables (i.e. cell concentrations, metabolite concentrations and environmental variables) to change.
Technical note: MCM uses the explicit two-stage Heun integration scheme for all simulations. Particularly short time steps might be required near the boundary of the permissible state space, e.g. when substrate concentrations rapidly approach zero (discontinuous jumps of metabolic activity might be observed in that case). Near such boundaries, MCM will temporarily and iteratively refine (halve) the integration time step as needed. The maximum number of permissible refinements is set using the variable maxIntegrationTimeStepRefinements. 7 Comparing and calibrating models to data

Introduction
Following a simulation, model predictions for the following quantities can be statistically evaluated against available time series data: • metabolite concentrations, • net community wide metabolite export rates, • metabolite export rates per cell (averaged over cells using or producing them), • net environmental (i.e. extracellular) metabolite production rates, • gene concentrations, • gene concentrations (dead+alive), • community wide reaction rates, • reaction rates per cell (averaged over cells capable of performing them), • environmental (i.e. extracellular) reaction rates, • cell concentrations, • cell concentrations (dead+alive), • cell (per capita) growth rates, • environmental variables, • observables.
Statistical comparisons are performed in terms of the log-likelihood of the observed data, but other measures of goodness of fit are also available. The log-likelihood is the probability density at the observed data, assuming a particular underlying stochastic model, typically split into a deterministic and an error (or random) part.
Each variable is assumed to exhibit either (a) a normal or (b) a log-normal error distribution, as specified beforehand by user. The deterministic part (or zero-noise limit) of a variable is given by the model prediction. More concretely, case (a) assumes that at any point in time t, any model variable V i (e.g. NO 2 concentration) is related to its measured valueṼ i viã where E i (t) are uncorrelated standard-normal random variables and σ i is the constant (but unknown) standard deviation of measurement errors (henceforth error amplitude). Similarly, case (b) assumes that i.e. errors are normally distributed on a logarithmic scale. The latter error model is recommended for positive variables possibly spanning multiple orders of magnitude over time (such as cell concentrations), and is motivated by the central limit theorem (in the logarithmic domain) in the case of multiplicative products of several positive random variables. For example, measurements retrieved through a sequence of multiple (e.g. electronic, chemical, biological) amplification steps are subject to random errors at any point in the amplification sequence that are subsequently amplified further downstream [46]. In contrast to the widely used normal error model, the log-normal error model accounts for the heteroscedastic error structure expected for quantities that can range multiple orders of magnitude. In the above model, the standard deviation of the measurementsṼ i scales linearly with the deterministic value V i , with the scaling factor determined by σ i . In both error models, the deterministic part is in fact the median of the random measurment.
The log-likelihood of a particular variable V i , with measured valuesṼ i1 , ..,Ṽ iM and predicted to have values V i (t 1 ), .., V i (t M ) at times t 1 , .., t M , is given by in case of a normal error structure, and by in case of a log-normal error structure. MCM calculates the log-likelihood LL i for each model variable for which time series data are available. The unknown error amplitudes σ i are automatically estimated using maximum-likelihood estimation, i.e. by maximizing LL i . Note that the last terms in Eq. (11) (or Eq. (12)) resemble the classical sum of squared errors between model predictions and data on a linear (or logarithmic) scale. Hence, the normalized log-likelihood, NLL i = LL i /M , might be seen as an average goodness of fit for the variable V i evaluated on a linear (or logarithmic) scale. The overall log-likelihood of the model is the sum of log-likelihoods of all model predictions for which data is available. Maximizing the log-likelihood by suitable choice of unknown (or free) model parameters yields an estimate for those parameters (see section 7.3 on maximum-likelihood fitting).
It is also possible to provide data in a priori unknown (or arbitrary) units, such as cell concentrations measured in terms of optical densities with unknown calibration to actual cell counts. In that case, V i is conceptually replaced by V i /α i in the above error models, where α i is an unknown scaling factor also estimated using maximum-likelihood estimation. Hence, MCM also allows the calibration of unknown linear measurement units by fitting of cell-metabolic models. Obviously, such calibrations are only as reliable as the model is suitable for the system at hand.
Technical note: Likelihoods are, a priori, probability densities over permissible value space and hence depend on measurement units (see Eq. (11) and (12)). To allow for a comparison of log-likelihoods between different variables, MCM makes likelihoods unit-less by multiplying them with the arithmetic mean of the provided time series data. This rescaling has no effect on maximum-likelihood parameter estimation (see section 7.3).

Comparing simulations to data
Model predictions generated by a simulation (invoked by the runMCM command) are automatically compared to available time series data. Error models (either normal or logNormal, see the previous section) are specified in the MCM command line, e.g. as # specify the error model for each variable class # only 'normal' error structure is allowed for variables possibly becoming negative set errorModel_MetaboliteConcentrations logNormal set errorModel_MetaboliteExportCommunityWide normal set errorModel_MetaboliteExportPerCell none # disable data comparison set errorModel_MetaboliteProductionEnvironmental normal set errorModel_GeneConcentrations logNormal set errorModel_GeneConcentrationsDeadAlive logNormal set errorModel_ReactionRatesCommunityWide normal set errorModel_ReactionRatesPerCell normal set errorModel_ReactionRatesEnvironmental normal set errorModel_CellConcentrations logNormal set errorModel_CellConcentrationsDeadAlive logNormal set errorModel_CellGrowthRates normal The error model of environmental variables and observables is specified separately for each of them in the MC model files environment and observables, respectively (see sections 5.1.2 and 5.5.2).
Output variables can be excluded from any data comparison by specifying their error model as none.
Available time series data must be provided to MCM in a specific format, in a data directory specified by the MCdataDir option, e.g.
set MCdataDir "data/bioreactor" Time series data must be provided in a separate plain text file for each variable. Each data file must be named exactly as the corresponding environmental variable, metabolite, reaction, species or observable (e.g. "NH4", no extension) and placed within a subdirectory in MCdataDir corresponding to the general observable class (see Table 6). If both data files (e.g. Nitrosomonas as well as Nitrosomonas [au]) are provided, the latter is ignored. Data directories may also contain other unrelated files. Setting MCdataDir to empty (this is the default) will disable any data comparisons.
Model predictions are linearly interpolated between recorded time points during the simulation and compared at the exact time points for which data are provided. No extrapolation and thus no comparison is done for data points outside the simulated time frame. Data points can also be excluded using the control variable statisticsInterval. The latter option might be useful if one only wants to evaluate the model within a particular time interval, for example after reaching an equilibrium. Multiple values at identical time points are allowed and are either evaluated as independent data points or averaged, depending on the flag averageAmbiguousData. In the case of a log-normal error structure, all nonpositive data points or time points at which the model prediction is non-positive, are ignored.
For each evaluated model observable V i , MCM estimates the error standard deviation σ i (and data units, if unknown), calculates the log-likelihood and plots an overview of the time series data and the simulated time course. MCM also shows the estimated 95% centile region around the model prediction.
MCM calculates the overall model log-likelihood and plots an overview histogram of the normalized loglikelihoods for all variables. Similarly, MCM calculates and plots an overview histogram of the average normalized squared residuals 15 as well as the coefficients of determination (R 2 ) for each observable (see Fig. 35 for an example). All related output is written to the directory comparison_to_data. For easy further processing by other pipelines (e.g. likelihood optimizers calling MCM), the overall log-likelihood and the number of evaluated data points are also written into the separate file loglikelihood.txt.

Estimating (fitting) unknown model parameters
In the presence of measured data, the overall log-likelihood (LL) of the model (see section 7.1) is a measure for the goodness of fit to the data. Maximizing the LL by suitable choice (fitting) of unknown model parameters yields statistical estimates for these parameters, so called maximum-likelihood (ML) estimates.
In general, i.e. under mild regularity conditions, ML estimates approach an increasingly peaked normal distribution around the true (but unknown) parameter values. The inverse observed Fisher information, −1 (where LL is the overall log-likelihood and q are the free parameters), evaluated at the fitted parameter values, is a consistent estimator of the asymptotic covariance matrix of the ML estimates [17, §10.4]. As such, the observed Fisher information can be used to estimate confidence intervals for each fitted parameter [8].
Technical note: For the normal and log-normal error models (see section 7.1), ML estimation resembles conventional least-squares estimation on a linear and logarithmic scale [59], respectively, if one assumes a common error amplitude for all measurements. While convenient (because numerical minimization of squared errors is easy), this assumption is unjustified as soon as one considers multiple quantities, potentially with different physical units. ML estimation weights errors with respect to the estimated intrinsic error amplitude σ i of each observable, while also penalizing overestimates for σ i .
MCM automates the maximum-likelihood parameter estimation as well as the calculation of confidence intervals. Parameters to be fitted need to be specified as symbolic, as described in section 5.7. MCM uses generic optimization routines to maximize the fitting objective via small stepwise variations of symbolic parameters and repeated simulations. If the flag fit_calculateConfidenceIntervals is set, MCM will also estimate confidence intervals for the fitted parameters, as described above. The observed Fisher information is calculated by approximating the second derivative of the LL (known as the Hessian) using finite differences [55]. Finite difference steps are repeatedly halved (refined ) as required to achieve a satisfactory accuracy 16 .
By default, MCM maximizes the normalized log-likelihood (NLL) instead of the regular log-likelihood, to account for varying numbers of evaluated data points across different simulations. Typically, however, the number of evaluated data points will be constant over multiple fitting iterations, particularly in the proximity of an optimum, yielding the same parameter estimates as a maximization of the log-likelihood would. To maximize the non-normalized log-likelihood instead, set the control variable fit_objective to LL. Alternatively, if fit_objective is set to meanR2, fitting is performed by maximizing the mean coefficient of determination (R 2 ) of model predictions (this objective function, however, does not support the estimation of confidence intervals).
Parameter fitting is performed using the fitMCM command. fitMCM requires the presence of time series data for at least one model observable (see section 7) and at least one symbolic parameter to fit. Observables whose error model is set to none are not considered. Parameters specified as fixed are not fitted; rather, their default value is used throughout. Fitted parameters are only varied within their minimum and maximum bounds (which must be finite). Only data points within the interval specified by statisticsInterval are considered.
Note that parameter fitting, the so called inverse problem, can be littered with surprises and technical obstacles, particularly for nonlinear high-dimensional models [90]. MCM provides several technical fitting options to experiment with if necessary. To avoid the danger of reaching a locally but not globally optimal fit, it is advised to start the fitting at different parameter values. By default, MCM starts at the default parameter values. If the control variable fit_randomRepeats is set to a positive number, MCM will repeat the fitting that many times starting at random parameter values, chosen independently and according to their probability distribution (see section 5.7). MCM will then choose the fitting outcome that yielded the best fit. If fit_randomRepeats is 0, the probability distributions of parameters are irrelevant.
Fitting options (additional to standard simulation options) are exemplified below: # the following options control the maximum effort put into fitting set fit_maxSimulations 500 # stop fitting algorithm beyond this number of simulations set fit_randomRepeats 5 # additional fitting attempts # starting at random parameter values set fit_maxTime 1000 # stop after 1000 seconds of execution time # the following options control the targeted fitting accuracy set fit_objAbsTolerance 1e-5 # find NLL maximum until within this tolerance set fit_PrelativeTolerance 1e-5 # fit parameters to within this # relative distance from their optimal value set fit_PrelativeInitialStep 1e-2 # first attempted parameter variation # (relative to their value) # the following options control the calculation of confidence intervals after fitting set fit_calculateConfidenceIntervals # also include confidence intervals when fitting set fit_confidenceLevel 0. The following two parameters affect the finite difference approximation of derivatives, used to calculate the observed Fisher information: set FD_PrelativeStep 1e-4 # initially attempt to estimate partial derivatives # with this relative finite difference step set FD_maxRefinements 10 # don't refine finite difference steps more than 10 times In principle, any number of parameters can be estimated simultaneously. Furthermore, fitted parameters need not be directly connected to the data used for calibration, as long as a change in the parameters influences the variables that are being compared to the data. Nevertheless, typical limitations of inverse problems [90] apply: (i) The fitting problem should not be degenerate (i.e. all parameters must influence the "goodness of fit" measure being optimized). While such parameters might still be fitted, the calculation of confidence intervals will likely fail. (ii) Fitted parameters should be independent, i.e. alternative similar parameter combinations must not yield the same optimum. Mathematically, this means that the gradient matrix at the optimum must have full rank. (iii) In order to avoid over-fitting, the number of data points should be much higher than the number of parameters to be fitted. In many cases good knowledge of the system or previous literature may be required to identify implausible calibrations.
The fitting routine writes a complete report into the file fit_report.txt. The fitted MC model is simulated one last time and simulation results and data comparison reports are saved into the run_final directory. A new parameters file (named parameters_fitted) is also generated, with the original default values replaced by the fitted values. This file can directly replace the original parameters file for subsequent simulations of the fitted MC model. If the flag fit_plotProgress is set, the LL, NLL and mean R 2 values encountered during fitting are plotted to the file fit_progress.pdf (see Fig. 33 for an example). If confidence intervals are calculated, MCM also plots a heatmap of the estimated parameter covariance matrix (see Fig. 34 for an example). The appearance of the latter can be adjusted as described in section 14.3.4. For a simple fitting example see section 2, for a more advanced example see section 12.
8 Sensitivity analysis 8.1 Sensitivity of model predictions MCM can perform local sensitivity analysis (LSA) of an MC model with respect to symbolic parameters using normalized local sensitivity coefficients (NLSC) [11]. NLSCs compare the relative changes (averaged over time) in model variables (V i ) to the relative changes of individual parameters (p j ) by means of partial derivatives, evaluated at some default parameter choice: 79 click here for source code Here, f q = q 1 t2−t1 t2 t1 |f (t)| q dt is the L q -norm 17 of any function of time, f (t), over a considered time interval [t 1 , t 2 ] (note that MCM predictions are time courses of particular quantities). Hence, NLSC ij is a measure for the relative effects of small changes of the parameter p j on the observable V i . Note that in Eq. (13) every perturbation of the observable V i is compared to its original unperturped value at the same time point (this is called a pointwise normalization), making NLSCs particularly sensitive to changes at lower values of V i (e.g. near the onset of growth, if V i is a cell concentration). Alternatively, one can consider the modified version where changes of the observable are first averaged and then compared to the averaged observable magnitude, V i 1 . To toggle between the two NLSC versions, use the MCM flag SA_normalizePointwise.
Similarly, MCM can perform global sensitivity analysis (GSA) of an MC model using normalized global sensitivity coefficients (NGSC). The latter measure the relative changes of model predictions as individual symbolic parameters range from their minimum to their maximum, while all other parameters are fixed to their default values: Similarly to LSA, one can modify the flag SA_normalizePointwise to toggle between pointwise or nonpointwise normalization.
LSA and GSA are performed using the commands LSAMCM and GSAMCM, respectively. Both LSA and GSA require the specification of an MC model with at least one non-fixed symbolic parameter (see section 5.7). In the case of LSA, non-fixed parameters must have a non-zero default value. In the case of GSA, non-fixed parameters need to vary within a non-trivial interval (as specified by their minimum and maximum). The number of simulations required increases linearly with the number of perturbed parameters.
For MCM-generated time series, · 2 is approximated using a trapezoid integration scheme over the considered time period. Using the control variable statisticsInterval one can control the time range within which to evaluate the model's sensitivity: Partial derivatives, as used for the NLSCs, are approximated numerically by finite differences [55]. Finite difference steps are repeatedly halved (refined ) if needed to achieve satisfactory accuracy. The following two parameters affect the finite difference approximation of derivatives: A summarizing report is written to the file LSA_report.txt. MCM also generates a heatmap of the sensitivity matrix, the appearance of which can be modified as described in section 14.3.4. See figures 18 and 23 for example LSA and GSA heatmaps generated by MCM. GSA will also produce time series plots of focal environmental variables, metabolites, reactions, species and observables and their responses to parameter changes (see Fig. 24 for an example).
Technical note: Some MC models can be very sensitive to certain parameters, to the point where numerical differentiation becomes problematic. Similarly, if an observable depends very weakly on a parameter, numerical errors can mask any existing linear responses, particularly at small difference steps. In both cases, the finite differences scheme can fail to produce an accurate estimate of derivatives. An affine-linear extrapolation of previous finite difference approximations is then used to estimate the true derivatives. This will be indicated in the LSA or GSA report file. Figure 18: Example of a local sensitivity heatmap generated by MCM for a two-species microbial nitrifier community feeding on ammonium and nitrite. A brighter color corresponds to a greater normalized local sensitivity coefficient.

81
click here for source code

Sensitivity of log-likelihoods
When time series data are available for comparison, MCM can perform local and global sensitivity analysis of the log-likelihoods of model predictions (see section 7 on log-likelihoods and data comparisons), similarly to the sensitivity analysis described for model predictions in section 8.1. Local sensitivity analysis of the log-likelihoods, performed using the command LSAMCM_LL, involves the calculation of the partial derivatives with respect to symbolic model parameters, evaluated at some default parameter choice: Partial derivatives are rescaled by the default value of the perturbed parameter to account for different scales among parameters. Hence, the values (16) define a local sensitivity matrix which quantifies the change of the log-likelihoods LL i as a result of small relative changes of model parameters p j . Partial derivatives are calculated using finite differences with repeated step refinements, as described in section 8.1.
Similarly, global sensitivity analysis of the log-likelihoods is performed using the command GSAMCM_LL and quantifies the changes of the LL i as model parameters are varied from their minimum to their maximum values: Both LSAMCM_LL and GSAMCM_LL require an MC model with at least one free (i.e. non-fixed) symbolic parameter (see section 5.7). LSAMCM_LL requires that non-fixed parameters have non-zero default values. GSAMCM_LL requires that non-fixed parameters have a non-trivial range (as defined by their minimum and maximum). The distribution of symbolic parameters is irrelevant. Both LSAMCM_LL and GSAMCM_LL require that data be available for at least one of the quantities listed in section 7.1 for which the error model is not set to "none". For example, the following commands tell MCM to only consider log-likelihoods of metabolite concentrations (for which data is available): The overall log-likelihood of the model is always included in the analysis. The entire analysis is repeated for the normalized log-likelihoods NLL i (the log-likelihoods divided by the number of evaluated data points, per observable), allowing a comparison of the effects of parameter variation on the goodness-of-fit of various model observables. A summarizing report is written to LSA_LL_report.txt or GSA_LL_report.txt. MCM also generates heatmap plots of the local or global sensitivity matrix (see Fig. 31 for an example). The appearance of heatmaps can be modified as described in section 14.3.4. Intermediate simulation are saved if the flag SA_LL_saveIntermediateSimulations is set. The amount of information printed out by MCM during computation is set using the control variable SA_LL_verbosity. See section 12 for a complete example that includes an LSA and GSA of the log-likelihoods.

Uncertainty analysis
In certain cases the parameters of an MC model are not precisely known, but are assumed to be distributed according to some probability distribution. The latter can be, for example, the posterior of a preceding bayesian parameter estimation. In that case, the behavior of the microbial community is stochastic as well. Stochasticity in model predictions can also emerge from intrinsic stochastic dynamics, e.g. when environmental variables or environmental metabolite fluxes are stochastic processes (see sections 5.1 and 5.2.2, respectively) The projection of parameter uncertainty or dynamic stochasticity to stochasticity in model predictions is called uncertainty analysis (UA). MCM performs UA using repeated independent simulations while randomly sampling uncertain parameters. Generated simulations are statistically evaluated to estimate the resulting probability distribution of observables across time. Any non-fixed symbolic parameter is considered uncertain, and its distribution is specified via the distribution tag in the parameters file (see section 5.7). For example, the following entries define two uncertain symbolic parameters, assumed to be log-uniformly and uniformly distributed, respectively, between their minimum and maximum: Technical note: Since UA is based on random Monte Carlo simulations, the outcome will vary each time UA is performed, particularly when UA_MonteCarloTrials is low. To ensure independent results among repeated UAs, you should randomize MCM's random number generator prior to UAMCM using the command seed.
10 Example 1 -Simulating a single-species model In the following step-by-step example, we will setup and run a single-species simulation of Escherichia coli strain K-12 MG1655 growing on minimal medium. We will be using the metabolic model iAF1260 by Feist et al. [24], obtained from the group's website (file Ec_iAF1260_flux1.xml as of Sept. 13, 2014; potentially also included with this manual). Oxygen will be the limiting resource, and in the course of its depletion cell metabolism will shift from aerobic respiration to fermentative growth, producing ethanol as a byproduct. Eventually, increased ethanol levels will lead to stagnation of growth and culture death.

Configuring the MC model
We start by converting the E. coli SBML model file (Ec_iAF1260_flux1.xml) into a draft MC model compatible with MCM, using the included python script micog (see section 5.14 for details). Since we are only converting one SBML, our MC model will consist of one cell species including all reactions. In the terminal the appropriate command might look as follows: micog.py -i Ec_iAF1260_flux1.xml\ -o microbial_community_01\ --output_rate_units mol_per_cell_per_day\ --objective original\ --cell_mass 2.8e-13\ --cell_concentration 1e3\ --environmental_metabolite_dynamics "concentration 0" The first two options (-i and -o) define the input file and output directory, respectively. The option -output_rate_units specifies the rate units to be used for the MC model (this must always be mol_per_cell_per_day). We set -objective to original because iAF1260 already defines the objective in terms of biomass synthesis. Since iAF1260 defines rates with respect to dry biomass, we need to provide micog with the dry cell mass (2.8 × 10 −13 g) in order to convert these to cell-specific rates (as required by MCM). The initial cell concentration is set to 10 3 , although this can easily be changed later on. Finally, we're providing a default value for the environmental dynamics of metabolites: In our case all metabolites are by default absent from the environment (see section 5.2.2 for more options). However, several metabolites can still be taken up by cells, as defined in the original SBML file. Individual metabolite dynamics and uptake kinetics will be manually adjusted below. In the metabolites file, adjust the glucose's max active uptake/export rate, environmental dynamics and environmental dynamics as follows: M_glc_D_e description: M_D_Glucose_C6H12O6 max active uptake rate: Monod 7.056e-14 0.38e-6 max active export rate: unlimited max passive uptake rate: 0 max passive export rate: 0 environmental dynamics: initial 0.0556 flux microbial_export The initial glucose concentration is chosen to match minimal growth medium with 10 g glucose/L, while the uptake kinetic parameters are taken from Varma and Palsson [93] and Owens and Legan [58]. Note that the iAF1260 FBA model includes two formal metabolites representing glucose, M_glc_D_e (external ) and M_glc_D_p (periplasmic). Such a distinction is common for transported metabolites in compartmentalized FBA models, and the "_e" version is conventionally (but not always) the one that is actually exchanged with the environment.
Similarly, adjust the oxygen uptake rate limits and environmental dynamics: M_o2_e description: M_O2_O2 max active uptake rate: Monod 1.008e-13 121e-9 max active export rate: unlimited max passive uptake rate: 0 max passive export rate: 0 environmental dynamics: initial 0.217e-3 flux microbial_export Oxygen is initially at 100% saturation with the atmosphere (at 37 • C), while uptake kinetic parameters are taken from Varma and Palsson [93] and Stolper et al. [86]. Because we want to keep track of the produced acetate (M_ac_e), formate (M_for_e), ethanol (M_etoh_e) and succinate (M_succ_e), set the environmental dynamics for each of these metabolites to "initial 0 flux microbial_export".
Finally, let us specify the metabolites and reactions that we're particularly interested in (see section 5.6), by adjusting the focals file to something like the following: This should execute all commands in the script and result in the output directory simulations_01/run_01 (or simulations_01/run_<N> if you repeat this action N times). All simulation results and reports are saved to this output directory (see section 13 for an overview of output files). Temporal profiles of summary quantities such as community-wide reaction rates are saved into the subdirectory simulation_summary, while species-specific output (irrelevant for our single-species model) is saved into simulation_species_specific.  Figure 19: Metabolite export rates (per cell), predicted for the E. coli culture model described in section 10.
Observe the transition to ethanol-producing fermentative growth after 1 day, triggered by oxygen depletion (see Fig. 20b). Excerpt from the output file metabolite_net_export_per_cell.pdf.   Observe the diauxic growth phase until day 1.5, characterized by a transition from aerobic to fermentative growth after 1 day (Fig. a and c). The rapid decline of living cells after day 1.5 is triggered by the sudden ethanol accumulation (Fig. 20c), resulting from fermentative ethanol production (Fig. 19c). While glucose concentration does decrease (Fig. 20a), it is not a limiting factor to cell growth, as seen from the unchanged glucose uptake rate limits (plot not shown). Excerpts from the output files cell_concentrations.pdf, cell_concentrations_dead_alive.pdf and cell_growth_rates.pdf.

Example 2 -Sensitivity and uncertainty analysis of the E. coli model
In this example we demonstrate the use of symbolic parameters (see section 5.7) to perform an automatic local and global sensitivity analysis, as well as an uncertainty analysis of the E. coli model from the previous example (section 10). More precisely, we will look at the responses of several model predictions (e.g. cell concentrations, final ethanol concentration) to variations of the maximum glucose uptake rate, the cell's maximum life expectancy and the ethanol half-inhibition concentration. For general information on sensitivity analysis and uncertainty analysis see sections 8 and 9, respectively.

Defining the symbolic parameters
Create a copy of the previous MC model directory (section 10) and call it microbial_community_02. Inside the latter, modify the copied parameters file to define the following three symbolic parameters: The above script performs two separate computational tasks: a local sensitivity analysis using LSAMCM, followed by a global sensitivity analysis using GSAMCM. Because MCM overwrites any existing output files without warnings, we change the output directory between the two commands using changeod.
Having adjusted our MC model and our script, we are now ready to perform a sensitivity analysis of the model with respect to the three perturbed parameters, Vmax_glucose, life_expectancy and Kinh_ethanol. Execute the script script_02 in your command line, e.g. by calling 90 click here for source code

MCM script_02
MCM will first perform a simulation of the MC model with the default parameter values, the results of which are saved into the LSA/run_default subdirectory. What follows is a series of almost identical simulations of the model, with slightly varied parameter values for the LSA or broadly varied parameter values for the GSA. This is a computationally expensive process; the number of required simulations might range anywhere between 10, 100 or more simulations, depending on the number of perturbed parameters, the smoothness of the model dynamics and the control variable FD_maxRefinements. In our particular case and on a modern laptop (as of 2014), this might take 10-20 minutes. Intermediate simulation results are not saved unless you set the flag SA_saveIntermediateSimulations. Once all sensitivity coefficients have been calculated, MCM creates heatmaps of the LSA and GSA sensitivity matrices (see Fig. 23) and saves detailed reports in LSA_report.txt and GSA_report.txt. For example, the report written to LSA_report.txt should be similar to the following (numbers might differ slightly): Finished local sensitivity analysis after 30 simulations Final relative finite difference steps: Vmax_glucose: finite differences prematurely stopped at relative step 9.765625e-06 life_expectancy: achieved 0.02 relative finite difference accuracy, at relative step 0.  Fig. 24).
Technical note: Note that finite differentiation might fail to estimate the true partial derivatives with respect to certain parameters, especially if the model's dependency on that parameter is highly non-linear. In that case MCM will try to extrapolate available finite-difference approximations to estimate the true derivatives (as seen in the case of Vmax_glucose).

Uncertainty analysis
The sensitivity analysis demonstrated in the previous section allows for a rough evaluation of the influence of parameters on the microbial community. However, neither LSA nor GSA will take into account possible additional information on the probability distribution of uncertain parameters. In this section, we shall use uncertainty analysis (UA) to translate uncertainties of input parameters into uncertainties of model predictions. Let us specify probability distributions for the symbolic parameters by modifying the parameters file to the following: Executing the new script will result in about 100 simulations (Monte Carlo trials) with parameter values drawn randomly from the distributions specified in the parameters file. Upon completion, the sample distributions of model predictions will be plotted into the file UA/UA.pdf (an excerpt of which is shown in Fig. 25

Example 3 -Comparing a two-species model to data
In this example we will examine a two-species model of the nitrifying bacteria Nitrosomonas spp. and Nitrobacter spp.. The main initial substrate will be ammonium, leading to the chemoautotrophic aerobic growth of Nitrosomonas and the production of nitrite as a by-product. The accumulation of nitrite triggers the delayed growth of Nitrobacter, which oxidizes the nitrite to nitrate. We will be using a combination of the core (i.e. energy metabolism) cell-metabolic models published by Poughon et al. [63], Starkenburg et al. [85] and Perez-Garcia et al. [60] 18 , comprising 24 reactions and 39 metabolites. We will be comparing the predictions of our model to experimental time series data (see section 7), measured by de Boer and Laanbroek [5] in an ammonium-fed mixed Nitrosospira 19 and Nitrobacter culture. The example will also demonstrate sensitivity analysis of the log-likelihoods (see section 8.2) and the use of a custom environmental variable (pH) defined through a time series measured during the same experiment (see section 5.1).

Configuring the MC model
The example's MC model and data files should be included with this document. If not, they may be obtained at MCM's official website. Observe that the parameters file defines four symbolic parameters, Vmax_NH3, Vmax_NO2, init_Ns_concentration and init_Nb_concentration, i.e. the maximum percell ammonia and nitrite uptake rates as well as the initial cell concentrations for Nitrosomonas and Nitrobacter, respectively [72,5] This will perform a single simulation of the model using default parameter values, writing all output to the sub-output directory default_simulation. Fig. 26 shows the cell concentrations predicted by the simulation. Fig. 27 shows the metabolic potential (available reactions) and metabolic activity (reaction rates) heatmaps for each species on day 10. Similarly, Fig. 28 shows the relevant metabolites and metabolite uptake/export rates for each species on day 10. Fig. 29 shows a chord diagram of current metabolic fluxes on day 1.   Observe that HNO2_NO2 is exported by Nitrosomonas and taken up by Nitrobacter.  Figure 29: Chord diagram of community-wide metabolic fluxes between reactions and metabolites on day 1. Generated by simulation of the nitrifier community described in section 12.2. Chords connect reactants to reactions and reactions to products. Chords are colored according to their origin. Chord widths are proportional to flux rates. Excerpt from the output file default_simulation/metabolic_flux_diagrams/current_at_time_1.html.

Comparing the simulation to data
As part of the simulation described in the previous section, MCM should automatically detect two data files, NH4 and NO3, located in data_03/MetaboliteConcentrations and containing measured time series for the NH4 and NO3 concentrations, respectively. At the end of the simulation, the predicted NH4 and NO3 concentrations are compared to the two provided time series. The log-likelihood is calculated for each compared quantity as well as the entire model, using the log-normal error model. All relevant output is written to the sub-output directory default_simulation/comparison_to_data. Fig. 30

Local sensitivity analysis of the log-likelihoods
Having observed the poor comparison of the model predictions to the experimental data (Fig. 30), let us perform a local sensitivity analysis (LSA) of the log-likelihoods (see section 8.2) with respect to our symbolic parameters, Vmax_NH3, Vmax_NO2, init_Ns_concentration and init_Nb_concentration. This might give us clues about the direction towards which we should adjust these parameters, in order to achieve a better fit. Let us modify the script from section 12.2 by inserting the following commands right after the runMCM command: # set a high verbosity to print plenty of information set SA_LL_verbosity 3 # change to a different sub-output directory changeod ../LSA_LL # perform LSA of the log-likelihoods LSAMCM_LL Execute the script as before. The sensitivity analysis invoked by the script involves several similar simulations of the model with slightly varied parameters. Once the partial derivatives of the log-likelihoods have been estimated, MCM creates a heatmap of the sensitivity matrix (LSA_LL_heatmap.pdf, shown in Fig. 31). MCM also writes a detailed report to LSA_LL_report.txt, the last bit of which is exemplified below (actual numbers may vary slightly): Finished local sensitivity analysis of log-likelihood after 16 Figure 31: Local sensitivity heatmaps of the log-likelihoods (a) and normalized log-likelihoods (b) for the NO3 and NH4 concentrations, with respect to perturbed symbolic parameters (see section 12.4). A positive sensitivity coefficient between a log-likelihood and a parameter means that the prediction would match the data better if the parameter was slightly increased. Observe that Vmax_NO2 and init_Nb_concentration do not seem to influence the goodness of fit of the NH4 concentration, in line with expectations (since Nitrobacter consumes NO2, which is merely the oxidation product of NH4).

Global sensitivity analysis of the log-likelihoods
As suggested by the LSA of the log-likelihoods, we should rather increase all four parameters to achieve a better match with the data. Let's double them and see how that would affect the log-likelihoods using MCM's global sensitivity analysis. In the parameters file, set the minimum and maximum parameter values equal to their default and double that, respectively: Execute the new script as before. The script will invoke a global sensitivity analysis of the log-likelihoods, by testing the effects of choosing each of the parameters at its minimum and maximum value. Observe that we set the flag SA_LL_saveIntermediateSimulations because we're interested in the full simulation outputs for all tested parameter values. After a few simulations, MCM saves a detailed report in the file GSA_LL_report.txt and creates the sensitivity heatmap shown in Fig. 32. In line with our previous findings, we conclude that an increased value for all four parameters would improve the fit to the data.
103 click here for source code click here for source code Figure 32: Global sensitivity heatmaps of the log-likelihoods (a) and normalized log-likelihoods (b) for the NO3 and NH4 concentrations, with respect to the parameters Vmax_NH3, Vmax_NO2, init_Ns_concentration and init_Nb_concentration (see section 12.4). A positive sensitivity coefficient between a log-likelihood and a parameter means that the prediction matches the data better at the parameter's maximum value, rather than its minimum (all other parameters being set to their default).

Calibrating parameters to data
In section 12.3 we have seen that the default parameter choices (taken from the literature) fail to reproduce the experimental time series data (Fig. 30). Through a local and global sensitivity analysis (sections 12.4 and 12.5) we have convinced ourselves that increasing Vmax_NH3, Vmax_NO2, init_Ns_concentration and init_Nb_concentration would improve the goodness of fit, but the optimal parameter choice still remains unknown. In this section we will use MCM's fitting routine to estimate the true values of our parameters, by maximizing the normalized log-likelihood (NLL) of the model (see section 7.3). In script_03, replace all previous sensitivity analysis associated additions with the following commands: # change to a different sub-output directory changeod ../fitting # run fitting routine fitMCM Don't forget to revert the parameters file to its original content (as given in section 12.1).
Executing the new script will run a simulation with the default parameter values, followed by a series of simulations with iteratively adjusted parameter values. Once an optimal fit has been reached 20 (in terms of a maximized NLL), MCM performs a final simulation with the fitted parameters and compares it to the data, writing all output to the sub-output directory fitting/run_final. MCM will also plot the log-likelihoods encountered during the fitting process, allowing an evaluation of the convergence (Fig.  33). 20 This can take anywhere from 10 to 1000 or more simulations, depending on the model and default parameter values. Fitting might also prematurely halt if the maximum number of allowed simulations or fitting time has been reached (set using the control variables fit_maxSimulations and fit_maxTime), or even due to other errors. MCM then tries to estimate 95% confidence intervals (and in fact the entire covariance matrix) for the fitted parameters, a process which involves several additional simulations. The estimated covariance matrix of the fitted parameters is saved as a heatmap (see Fig. 34) in the file covariance_matrix.pdf. Figure 34: (a) Estimated covariance matrix (Cov(pi, pj)) for the fitted parameters, Vmax_NH3 and Vmax_NO2, for the nitrifier model considered in section 12.6. (b) Covariance matrix, with each entry normalized by the fitted parameter values (Cov(pi, pj)/ |pi · pj|). Excerpts from the output file covariance_matrix.pdf.

Calibrating measurement units
As described in section 7, MCM supports the estimation of data unit scalings in the case that data is given in arbitrary (unknown) linear units. To illustrate this possibility, suppose that the NH4 measurements have unknown units, as might be the case with non-calibrated assays.

Finished after 435 simulations
Observe that the fitted parameter values differ slightly from the ones estimated previously, achieving an improved NLL. This was of course expected: Since there's one additional parameter to calibrate (the NH4 measurement units), MCM achieves a better fit to the data (Fig. 37). You might need to rename the installed flex executable to lex.

When I compile I get an error that omp.h was not found
OpenMP is used to enable parallel computing in MCM. The easiest solution is to compile MCM without OpenMP, i.e. using make openmp=0 in the terminal. Also consider switching (upgrading) to a compiler version that supports OpenMP.
Note to Mac OS X users: Recent versions of Apple's XCode install the clang compiler (instead of GCC), which has limited support for OpenMP and has been found to fail to compile the MCM code even with OpenMP disabled. In this case, you need to compile MCM using the non-XCode GCC 21 . Older XCode versions, on the other hand, install Apple's GCC version which has a known OpenMP bug. In this case, you can either compile MCM without OpenMP or using the non-Apple GCC. In either case, consider using one of the pre-compiled executable binaries available on the MCM website.

I get an error that the libgomp library was not found
When compiling MCM with support for parallel computing, the binary is dynamically linked to the OpenMP library libgomp. On some systems, that library might not exist, or may not be in the location that MCM expects. The generated error might look similar to dyld: Library not loaded: /opt/local/lib/gcc47/libgomp.1.dylib There are at least 3 solutions to the problem: • You can compile MCM from the sources without OpenMP, by using make openmp=0 in the terminal. Also see section 3.4.
• You can use one of the provided non-parallel binaries.
• You can install a recent GCC version (e.g. GCC 4.7), which comes with the appropriate library.

Heatmap PDF plots appear blurred
This is a known rendering problem of some PDF viewers (like Apple's Preview). There exist at least three solutions: • Try an alternative PDF viewer (like Acrobat Reader).
• Plot in SVG format instead (use the command set plotType SVG).
• Install the command line tool eps2eps, which MCM uses to fix blurred heatmaps. See section 3.5 for more details on plotting.
14.1.6 MCM doesn't generate any plots You might not have the (proper) gnuplot version installed on your system. If you are getting all the corresponding _plot_script.gp and _data.txt files, but no plots, this is most likely the case. The current MCM version (1.3) is most compatible with gnuplot 4.6; other gnuplot versions might or might not work. To check if MCM was able to find the correct gnuplot version on your system you can use the command checkGnuplot in the MCM command line. To point MCM to the correct gnuplot installation path use the command setGnuplot. See section 3.5 for more details. 21 The latter can be installed using homebrew or MacPorts.
112 click here for source code 14.1.7 MCM hangs during a simulation for no apparent reason This problem can occur on Macs during parallel computation (multithreading) due to an OpenMP bug in Apple's GCC compiler. Try to disable parallel computation (disabled by default) or compile MCM with a different GCC compiler (GCC version ≥ 4.7). Also see the FAQ in section 14.1.3.
14.1.8 Are there any risks associated with using MCM?
MCM is a standalone program that should not interfere with your operating system. That being said, please read the disclaimer in section 15. You should also know that the MCM commands execute and executeinod (see section 4.1) allow the execution of any shell command with the permissions specified for the user account running MCM. You should therefore restrict the permissions of MCM users appropriately (e.g. don't allow the control of MCM through a public web interface).
14.1.9 Will MCM run on Windows?
Most likely not. We have not tested MCM on any Windows machine, nor did we develop MCM with Windows in mind. Frankly, Windows is not an operating system for high-performance scientific computing.

How do I include gene regulation?
MCM models all cell metabolism via optimization of a user-provided objective function under flux balance constraints (section 1.3). However, regulation can be emulated by specifying particular reaction rate and metabolite transport rate limits as appropriate functions of metabolite concentrations or environmental variables [15]. For example, to model oxygen inhibition of nitrogen fixation [62], the maximum nitrogen fixation rate might be specified as a decreasing function of oxygen concentration (section 5.3.3). It is also possible to enforce the performance of a reaction, either at a constant rate or as a function of metabolite concentrations or environmental variables (section 5.4.7).
Alternatively, FBA models can be extended to include dynamical internal variables which influence, and are influenced by, a cell's metabolism (see examples in section 5.12). Such non-stationary cell models could accommodate dynamical transcriptional regulatory constraints, that restrict the FBA solution space depending on the abundance of internally produced inhibitors or activators. This approach, known as regulatory FBA (rFBA), is described in detail by [14,13].

MCM falsely predicts zero cell metabolism
Make sure your FBA models are feasible at the current environmental conditions and metabolite concentrations. The command runMCM will produce an output file called FBA_state_initial.txt, which lists a detailed description of all FBA models and their optimal solution. If the flag includeLowLevelLPdetails is set, MCM includes specifications of the derived linear programming problems in the common CPLEX-LP and lp_solve LP formats, allowing the independent verification of their feasibility 22 . runMCM also tries to detect dead-end metabolites in FBA models and prints the results to the file FBA_structure_check.txt.
Depending on the control variable saveCellMetabolicNetworksSBML, MCM will also save cell-metabolic FBA models as SBML files [37] that can be visualized or analyzed with other tools 23 [94,80].
Similarly, depending on the control variable plotMetabolicActivityHeatmaps, MCM will plot metabolic activity heatmaps, and write a detailed report on the metabolic activity and limiting metabolites and reactions into the file metabolic_activity.txt (see section 6).
If the flag checkTheoreticalSystemLimits is set during runMCM, MCM will try to estimate the maximum cell growth rate based on FBA model structure, by setting all metabolite transport and reaction rate limits to their most relaxed value. The results are reported in the file theoretical_limits.txt.
Note that cell-metabolic models often involve thousands of constraints and algebraic coefficients spanning several orders of magnitude. To avoid serious rounding errors, non-trivial (i.e. non-zero and non-infinite) FBA model parameters (e.g. metabolite uptake/export rate limits) of similar physical units must be of similar scales (i.e. less than 8 orders of magnitude apart). If reaction rate or metabolite transport rate limits are not known, do not set them to a large number (as is typically done in FBA models to represent infinity). Instead, use the unlimited keyword to denote the absence of a constraint (see sections 5.3.3 and 5.2.1).
14.2.3 My organism should perform reaction X, but FBA does not predict it FBA predicts metabolism based on optimality principles (in the case of MCM, maximization of biomass production). This assumption is at least questionable for genetically engineered organisms or those exposed to environments that are radically different from the environments that shaped their evolution [83]. Nevertheless, you can force an organism to perform a certain reaction as part of its metabolism through a series of modifications to the original metabolic model, as described in section 5.4.7.

What if an FBA problem has no solution or a negative optimum?
If a cell-metabolic FBA model is not solvable or predicts a negative biomass production rate, MCM assumes zero metabolic activity and zero biomass production.

Can I use different error models for different variables?
Yes. Using the control variables listed in section 7.2 you can specify a different error model (e.g. normal or logNormal) for each type of variables such as metabolite concentrations, reaction rates and so on. If you need to use a different error model for similar variable types (e.g. for two different metabolites), you can define an observable for each variable (section 5.5) and explicitly specify the appropriate error model for each observable. For example, to use a normal error model for NH4 and a logNormal error model for NO3, you can define the following two observables: 14.2.6 How do I keep track of species-specific activity?
To keep track of species-and cell-specific reaction rates and metabolite fluxes during simulations set the control variable output_speciesSpecificMetabolism to either save or saveAndPlot. This will save (and optionally plot) all reaction rates and all metabolite fluxes for all species into the sub-output directory simulation_species_specific. Depending on the model size, the output might use up a significant amount of disk space.
Alternatively, one can keep track of custom observables (section 5.5) focused on individual aspects of a cell's metabolic activity. For example, to keep track of the rate of reaction amo per Nitrosomonas cell one can define the following observable: amo_per_Nitrosomonas_cell value: rate_pc.amo.Nitrosomonas units: mol/(cell*day) See Table 2 for additional options for species-specific observables. Similarly to other model variables, observables can be compared to data and used for model calibration (section 7). To actually plot an observable's time course, make sure to specify it as focal (section 5.6) and to set the control flag summary_includeObservables (section 4.1).

14.
3.1 What's a flag or a control variable and how do I use them?
An MCM flag is a boolean option (which can only be on or off, set or unset) specified in the MCM command line, which allows the user to toggle between two alternative (typically opposite) behaviors. For example, to toggle between high-contrast and linear-colorspace heatmaps you can either write set highContrastHeatmaps or unset highContrastHeatmaps respectively. A control variable, similarly to flags, controls MCM's subsequent behavior but can have more complicated or specialized values (such as a number, a set of numbers or a file path). For example, maxSimulationTime is used to specify the duration of a single simulation, e.g. as set maxSimulationTime 10 A more complicated example is plotMetabolicActivityHeatmaps, which holds a sequence of times at which MCM is to plot metabolic activity heatmaps during a simulation. Setting the value of this variable requires a special syntax. For example, writing set plotMetabolicActivityHeatmaps start 0 step 2 end 10 will assign to the variable the sequence {0, 2, 4, 6, 8, 10}. One can always obtain an overview of a variable's purpose and syntax using the help command, e.g.

help plotMetabolicActivityHeatmaps
Consult section 4.2 for an interpretation of syntax overviews.
14.3.2 Can I analyze or vizualize MC models using 3rd party tools?
The short answer is yes. The long answer is that you need to save the MC model into another common file format like ARFF, BIOM [52], GEXF [3] or SBML [37]. This can be done as part of a regular simulation, as described in section 6. Note that when exporting MC models into other file formats some information is potentially lost, for example if the target file format does not support the full complexity of MC models.

How can I improve the quality of plots?
Every plot generated by MCM (e.g. UA.pdf) is accompanied by the corresponding raw data (UA_data.txt), as well as a self-contained gnuplot script (UA_plot_script.gp) used to generate the plot. Gnuplot itself is a sophisticated graphing tool capable of producing high quality plots (consult the gnuplot homepage for usage instructions). You can either adjust and re-run the gnuplot script to produce a modified version of the plot, or use any graphing tool of your choice with the raw data. Alternatively, you can choose to plot in SVG file format and directly edit the generated plots in any SVG editor. See section 3.5 for more details on plotting.
14.3.4 Can I change the appearance of heatmaps?
The coloring of heatmaps can be modified through the control variable heatmapColorPalette and the flag highContrastHeatmaps, e.g. as: