Emergence and propagation of epistasis in metabolic networks

Epistasis is often used to probe functional relationships between genes, and it plays an important role in evolution. However, we lack theory to understand how functional relationships at the molecular level translate into epistasis at the level of whole-organism phenotypes, such as fitness. Here, I derive two rules for how epistasis between mutations with small effects propagates from lower- to higher-level phenotypes in a hierarchical metabolic network with first-order kinetics and how such epistasis depends on topology. Most importantly, weak epistasis at a lower level may be distorted as it propagates to higher levels. Computational analyses show that epistasis in more realistic models likely follows similar, albeit more complex, patterns. These results suggest that pairwise inter-gene epistasis should be common, and it should generically depend on the genetic background and environment. Furthermore, the epistasis coefficients measured for high-level phenotypes may not be sufficient to fully infer the underlying functional relationships.


Introduction
Life emerges from an orchestrated performance of complex regulatory and metabolic networks within cells. The blueprint for these networks is encoded in the genome. Mutations alter the genome. Some of them, once decoded by the cell, perturb cellular networks and thereby change the phenotypes important for life. Understanding how mutations affect the function of cellular networks is key to solving many practical and fundamental problems, such as finding mechanistic causes of genetic disorders (Hu et al., 2011;Fang et al., 2019), deciphering the architecture of complex traits (Zuk et al., 2012;Mackay, 2014;Wei et al., 2014), building artificial cells (Hutchison et al., 2016), explaining past, and predicting future evolution (Blount et al., 2008;Wiser et al., 2013;de Visser and Krug, 2014;Harms and Thornton, 2014;Kryazhimskiy et al., 2014;Sailer and Harms, 2017a;Sohail et al., 2017). Conversely, mutations can help us learn how cellular networks are organized (Phillips, 2008;van Opijnen and Camilli, 2013).
To infer the wiring diagram of a cellular network that produces a certain phenotype, one approach in genetics is to measure the pairwise and higher-order genetic interactions (or 'epistasis') between mutations that perturb it (Phillips, 2008). Much effort has been devoted in the past 20 years to a systematic collection of such genetic interaction data for several model organisms and cell lines (Kelley and Ideker, 2005;Lehner et al., 2006;Jasnos and Korona, 2007;Collins et al., 2007;St Onge et al., 2007;Typas et al., 2008;Roguev et al., 2008;Costanzo et al., 2010;Szappanos et al., 2011;Huang et al., 2012;Roguev et al., 2013;Bassik et al., 2013;Babu et al., 2014;Costanzo et al., 2016;Skwark et al., 2017;Du et al., 2017;Heigwer et al., 2018;Horlbeck et al., 2018;Norman et al., 2019;Liu et al., 2019;Kuzmin et al., 2018;New and Lehner, 2019;Celaj et al., 2020). This approach is particulary powerful when the phenotypic effect of one mutation changes qualitatively depending on the presence or absence of a second mutation in another gene, for example when a mutation has no effect on the phenotype in the wildtype background, but abolishes the phenotype when introduced together with another mutation, such as synthetic lethality (Tong et al., 2001). Such qualitative genetic interactions can often be directly interpreted in terms of a functional relationship between gene products (Tong et al., 2001;Davierwala et al., 2005;Phillips, 2008).
Most pairs of mutations do not exhibit qualitative genetic interactions. Instead, the phenotypic effect of a mutation may change measurably but not qualitatively depending on the presence or absence of other mutations in the genome (Babu et al., 2014;Costanzo et al., 2016). The genetic interactions can in this case be quantified with one of several metrics that are termed 'epistasis coefficients' (Wagner et al., 1998;Hansen and Wagner, 2001;Mani et al., 2008;Wagner, 2015;Poelwijk et al., 2016). Although some rules have been proposed for interpreting epistasis coefficients, in particular, their sign (Dixon et al., 2009;Lehner, 2011;Baryshnikova et al., 2013), the validity, and robustness of these rules are unknown because there is no theory for how functional relationships translate into measurable epistasis coefficients in any system (Lehner, 2011;Domingo et al., 2019). To avoid this major difficulty, most large-scale empirical studies focus on correlations between epistasis coefficients rather than on their actual values (but see Velenich and Gore, 2013, for a notable exception). Genes with highly correlated epistasis profiles are then interpreted as being functionally related (Segrè et al., 2005;Bellay et al., 2011;Babu et al., 2014;Costanzo et al., 2016;Horlbeck et al., 2018). Although this approach successfully groups genes into protein complexes and larger functional modules Bellay et al., 2011), it does not reveal the functional relationships themselves. As a result, many if not most, genetic interactions between genes and modules still await their biological interpretation (Costanzo et al., 2016;Fang et al., 2019).
While geneticists measure epistasis to learn the architecture of biological networks, evolutionary biologists face the reverse problem: they need to know how the genetic architecture constrains epistasis at the level of fitness. Epistasis determines the structure of fitness landscapes on which populations evolve (Fragata et al., 2019). Understanding it would bear on many important evolutionary questions, such as why so many organisms reproduce sexually (Kondrashov, 2018), how novel phenotypes evolve (Blount et al., 2008;Bridgham et al., 2009;Natarajan et al., 2013;Harms and Thornton, 2014), how predictable evolution is (Weinreich et al., 2006;Tenaillon et al., 2012;Wiser et al., 2013;Kryazhimskiy et al., 2014), etc. So far, evolutionary biologists have relied primarily on abstract models of fitness landscapes (see Orr, 2005, for a review), rather than those firmly grounded in organismal biochemistry and physiology (e.g. Dykhuizen et al., 1987;Das et al., 2020). For example, Fisher's geometric model-one of the most widely used fitness landscape models-is explicitly devoid of the physiological and biochemical details (Fisher, 1930;Tenaillon, 2014;Martin, 2014).
A theory of epistasis must address two challenges. First, it must specify how the architecture of a biological network constrains epistasis. Such knowledge is important not only for evolutionary questions, but also for the inference problem in genetics. Consider a biological network module that produces a phenotype of interest but whose internal structure is unknown. By genetically perturbing all genes within the module and measuring the phenotype in all single, double and possibly some higher-order mutants, we can obtain the matrix of epistasis coefficients. In principle, we can then fit a network topology and parameters to these data. However, without knowing what information about the network is contained in the matrix in the first place, we cannot be sure whether the inferred topology and parameters are close to their true values or represent one of many possible solutions consistent with the data.
The second challenge is that epistasis may arise at a different level of biological organization than where it is measured by the experimentalist or by natural selection. For example, geneticists are often interested in understanding the structures of specific regulatory or metabolic network modules (Collins et al., 2007;Costanzo et al., 2010). However, measuring the peformance of a module directly is often experimentally difficult or impossible. Then epistasis is measured for an experimentally accessible 'high-level' phenotype, such as fitness, which depends on the performance of the focal 'lower-level' module, but also on other unrelated modules. However, if we do not know how epistasis that originally emerged in one module maps onto epistasis that is measured, it is unclear what we can infer about module's internal structure.
Evolutionary biologists encounter a related problem when they wish to learn the evolutionary history of a protein or a larger cellular module. To do so, they would in principle need to know how different mutations in this module affected fitness of the whole organism in its past environment. But such information is rarely available. Instead, it is sometimes possible to reconstruct past mutations and measure their biochemical effects in the lab (Lunzer et al., 2005;Bridgham et al., 2009;Natarajan et al., 2013;Sarkisyan et al., 2016). When interesting patterns of epistasis are identified at the biochemical level, it is usually assumed that the same patterns manifested themselved at the level of fitness and drove module's evolution. However, this is not obvious. If interactions with other modules distort epistasis as it propagates from the biochemical level to the level of fitness (Snitkin and Segrè, 2011), our ability to infer past evolutionary history from in vitro biochemical measurements could be diminished. Therefore, the second challenge that a theory of epistasis must address is how epistasis propagates from lower-level phenotypes to higher-level phenotypes.
There is a large body of theoretical and computational literature on epistasis. As early as 1934, Sewall Wright realized that epistasis naturally emerges in molecular networks (Wright, 1934). This was later explicitly demonstrated in many mathematical and computational models (e.g. Kacser and Burns, 1981;Keightley, 1989;Szathmáry, 1993;Gibson, 1996;Keightley, 1996;Wagner et al., 1998;Omholt et al., 2000;Peccoud et al., 2004;Gjuvsland et al., 2007;Gertz et al., 2010;Fiévet et al., 2010;Pumir and Shraiman, 2011). Metabolic control analysis became one of the most successful and general frameworks for understanding epistasis between metabolic genes (Kacser and Burns, 1973). Dean et al., 1986, Dykhuizen et al., 1987, Dean, 1989, Lunzer et al., 2005, MacLean, 2010 used it to interpret the empirically measured fitness effects of mutations and their interactions in terms of the metabolic relationships between the products of mutated genes. Kacser and Burns, 1981, Hartl et al., 1985, Keightley, 1989, Clark, 1991, Keightley, 1996, Bagheri-Chaichian et al., 2003, Bagheri and Wagner, 2004, Fiévet et al., 2010 explored the implications of epistasis in metabolism for genetic variation in populations, their response to selection, long-term evolutionary dynamics and outcomes, such as the evolution of dominance. However, most studies analyzed only the linear metabolic pathway (but see Keightley, 1989) and assumed that fitness equals flux through the pathway (but see Szathmáry, 1993), thereby bypassing the problem of epistasis propagation.
There have been few attempts to theoretically relate the molecular architecture of an organism to the types of epistasis that would arise for its high-level phenotypes, such as fitness. Segrè et al., 2005 andHe et al., 2010 used flux balance analysis (FBA, Orth et al., 2010) to compute genomewide distributions of epistasis coefficients in metabolic models of Escherichia coli and Saccharomyces cerevisiae and arrived at starkly discordant conclusions. Recently, Alzoubi et al., 2019 showed that FBA is generally poor in predicting experimentally measured genetic interactions, suggesting that it might be difficult to understand the emergence and propagation of epistasis by relying exclusively on genome-scale computational models. Sanjuán andNebot, 2008 andMacía et al., 2012 modeled various abstract metabolic and regulatory networks and found a possible link between epistasis and network complexity. The work by Chiu et al., 2012 is a more systematic attempt to develop a general theory of epistasis. They established a fundamental connection between epistasis and the curvature of the function that maps lower-level phenotypes onto a higher-level phenotype. However, further progress has been so far hindered by uncertainty in what types of functions map phenotypes onto one another in real biological systems. Previous studies made various idiosyncratic choices with respect to this mapping, leaving us without a clear guidance as to the conditions or systems where they are expected to hold.
To overcome this problem, here I consider a whole class of hierarchical metabolic networks and obtain the family of all functions that determine how the effective activity of a larger metabolic module can depend on the activities of smaller constituent modules. There are several advantages to this approach. First, it leads to an intuitive understanding of how the structure of the network influences epistasis emergence and propagation. Second, my approach is based on basic biochemical principles, so it should be relevant for many phenotypes. For example, epistasis is often measured at the level of growth rate (Jasnos and Korona, 2007;St Onge et al., 2007;Babu et al., 2014;Costanzo et al., 2016), and metabolism fuels growth. Moreover, metabolic genes occupy a large fraction of most genomes (Orth et al., 2011) and the general organization of metabolism is conserved throughout life (Csete and Doyle, 2004). Thus, by understanding genetic interactions between metabolic genes, we will gain an understanding of a large fraction of all genetic interactions.
In my model, I consider a hierarchical network with first-order kinetics but arbitrary topology, and ask two questions related to the two challenges mentioned above. (1) How does an epistasis coefficient that arose at some level of the metabolic hierarchy propagate to higher levels of the hierarchy?
(2) How does the network topology constrain the value of an epistasis coefficient between two mutations that affect different enzymes in this network? I obtain answers to these questions analytically in the limiting case when the effects of mutations are vanishingly small. I then computationally probe the validity of the conclusions outside of the domain where they are expected to hold.
My model is not intended to generate predictions of epistasis for any specific organism. Instead, its main purpose is to provide a baseline expectation for how epistasis that emerges at lower-level phenotypes manifests itself at higher-level whole-organism phenotypes, such as fitness, and what kind of information may be gained from measurements of such higher-level epistasis. One possible outcome of this analysis is that there may be fundamental limitations to what an epistasis measurement at one level of biological organization can tell us about epistasis at another level. On the other hand, if it turns out that there is a general correspondence between epistasis coefficients at different levels in this simple model, then it may be worth developing more sophisticated and general models on which inference from data can be based.

Hierarchical metabolic network
Consider a set of metabolites A ¼ f1; 2; . . . ; ng with concentrations S 1 ; . . . ; S n which can be interconverted by reversible first-order biochemical reactions. The rate of the reaction converting metabolite i into metabolite j is x ij S i À S j =K ij À Á where K ij is the equilibrium constant. The rate constants x ij , which satisfy the Haldane relationships x ji ¼ x ij =K ij (Cornish-Bowden, 2013), form the matrixx ¼ x ij n i;j¼1 . The metabolite set A and the rate matrixx define a biochemical network N ¼ A;x ð Þ. The first-order kinetics assumption makes the model analytically tractable, as discussed below; biologically, it is equivalent to assuming that all enzymes are far from saturation. The rate constants x ij depend on the concentrations and the specific activities of enzymes and therefore can be altered by mutations. K ij characterize the fundamental chemical nature of metabolites i and j and cannot be altered by mutations (Savageau, 1976).
The whole-cell metabolic network is large, and it is often useful to divide it into subnetworks that carry out certain functions important for the cell. I define subnetworks mathematically as follows. I say that two metabolites i and j are adjacent (in the graph-theoretic sense) if there exists an enzyme that catalyzes a biochemical reaction between them, that is, if x ij >0. Now consider a subset of metabolites A & A. For this subset, let A IO be the set of all metabolites that do not belong to A but are adjacent to at least one metabolite from A . Letx be the submatrix ofx which corresponds to all reactions where both the product and the substrate belong to either A or A IO . The metabolite subset A and the rate matrixx form a subnetwork ¼ ðA ;x Þ of network N . I refer to A and A IO as the sets of internal and 'input/output' ('I/O' for short) metabolites for subnetwork m, respectively. Thus, all internal metabolites and all reactions that involve only internal and I/O metabolites are part of the subnetwork. Note that the I/O metabolites do not themselves belong to the subnetwork, but reactions between them, if they exist, are part of the subnetwork. Metabolites that are neither internal nor I/O for m are referred to as external to subnetwork m. These definitions are illustrated in Figure 1A.
The main objects in this work are biochemical modules, which are a special type of subnetworks. To define modules, I introduce some auxiliary concepts. I say that two metabolites i and j are connected if there exists a series of enzymes that interconvert i and j, possibly through a series of intermediates. Mathematically, i and j are connected if there exists a simple (i.e. non-self-intersecting) path between them. If all metabolites in this path are internal to the subnetwork m (possibly excluding the terminal metabolites i and j themselves) then i and j are connected within the subnetwork m, and such path is said to lie within m. By this definition, metabolites i and j can be connected within m only if they are either internal or I/O metabolites for m. metabolites increases. Moreover, modules with just two I/O metabolites already capture two most salient features of metabolism: its directionality, and its complex branched topology (Csete and Doyle, 2004). Such modules are a natural generalization of the linear metabolic pathway which has been extensively studied in the previous literature (Kacser and Burns, 1973;Szathmáry, 1993;Bagheri-Chaichian et al., 2003;MacLean, 2010).
Modules have two important properties. First, for any given concentrations of the two I/O metabolites, all internal metabolites in the module can achieve a unique steady state which depends only on concentrations of these I/O metabolites but not on the concentrations of any other metabolites in the network (see Proposition 1 in Materials and methods). Now consider a module m whose I/O metabolites are (without loss of generality) labeled 1 and 2 ( Figure 1A). The second property is that, at steady state, the flux through this module is J ¼ y S 1 À S 2 =K 12 ð Þ, where is the effective reaction rate constant of module m ( Figure 1B). Importantly, y depends only on the rate matrixx , but not on any other rate constants (see Corollary 2 in Materials and methods), and it can be recursively computed for any module, as described in Materials and methods. In other words, metabolic network N can be coarse-grained by replacing module m at steady state with a single first-order biochemical reaction with rate y . Importantly, such coarse-graining does not alter the dynamics of any metabolites outside of module m (see Proposition 1 in Materials and methods). This statement is the biochemical analog of the star-mesh transformation (and its generalization, Kron reduction, Rao et al., 2014) well known in the theory of electric circuits (Versfeld, 1970). The biological interpretation of these properties is that a module is somewhat isolated from the rest of the metabolic network. And vice versa, the larger network (i.e. the cell) 'cares' only about the total rate at which the I/O metabolites are interconverted by the module but 'does not care' about the details of how this conversion is enzymatically implemented. In this sense, the effective rate y quantifies the function of module m (a macroscopic parameter) while the ratesx describe the specific biochemical implementation of the module (microscopic parameters).
The effective rate constant y of module m depends on the entire rate matrixx . In general, a single mutation may perturb several rate constants within a module, so that the entire shape of the function F may be important. Here, I focus on a special case when each mutation perturbs one reaction (real or effective) within a module, while all others remain constant. To examine epistasis between mutations, I will also consider two different mutations that perturb two separate reactions within a module. In these special cases, we do not need to know the entire function F. We only need to know how module's effective rate constant y depends on the one or two rate constants of the perturbed reactions. When y is considered as a function of the rate constant x of one reaction, I write y ¼ f 1 ðÞ; (2) and when y is considered as a function of the rate constants x and h of two reactions, I write The rate constants of all other reactions within module m play a role of parameters in functions f 1 and f 2 .
Consider now a network N that has a hierarchical structure, such that there is a series of nested Figure 1A). Since any module at steady state can be replaced with an effective first-order biochemical reaction, there exists a hierarchy of quantitative metabolic phenotypes y ; y n ; . . . (Figure 1B,C). These phenotypes are of course functionally related to each other. Specifically, because n is a 'higher-level' module (in the sense that it contains a 'lower-level' module m), the matrixx n can be decomposed into two submatricesx andx nn where the latter is the matrix of rate constants of reactions that belong to module n but not to module m. Since replacing the lower-level module m with an effective reaction with rate constant y does not alter the dynamics of metabolites outside of m, y n must depend on all elements ofx only through y , that is, where ratesx nn act as parameters of function f 1 . Thus, in the hierarchy of metabolic phenotypes y ; y n ; . . ., a phenotype at each subsequent level depends on the phenotype at the preceding level according to Equation 4, and the lowest level phenotype y depends on the actual rate constants accroding to Equation 1. This hierarchy of functionally nested phenotypes is conceptually similar to the hierarchical 'ontotype' representation of genomic data proposed recently by Yu et al., 2016.

Quantification of epistasis
Consider a mutation A that perturbs only one rate constant x ij , such that the wildtype value x 0 ij changes to x A ij . This mutation can be quantified at the microscopic level by its relative effect If the reaction between metabolites i and j belongs to nested modules ; n; . . ., then mutation A may impact the functions of these modules, which can be quantified by the relative effects d A y , d A y n , etc. at each level of the hierarchy.
Consider now another mutation B that only perturbs the rate constant x k' of another reaction. Since mutations A and B perturb distinct enzymes, they by definition do not genetically interact at the microscopic level. However, if both perturbed reactions belong to the metabolic module m (and, as a consequence, to all higher-level modules which contain m), they may interact at the level of the function of this module, in the sense that the effect of mutation B on the effective rate y may depend on whether mutation A is present or not. Such epistasis between mutations A and B can be quantified at the level m of the metabolic hierarchy by a number of various epistasis coefficients (Wagner et al., 1998;Hansen and Wagner, 2001;Mani et al., 2008). I will quantify it with the epistasis coefficient where d AB y denotes the effect of the combination of mutations A and B on phenotype y relative to the wildtype. Since I only consider two mutations A and B, I will write "y instead of " AB y to simplify notations. Note that other epistasis coefficients can always be computed from "y , d A y and d B y , if necessary. Expressions for epistasis coefficients at other levels of the metabolic hierarchy are analogous.

Results
The central goal of this paper is to understand the patterns of epistasis between mutations that affect reaction rates in the hieararchical metabolic network described above. Specifically, I am interested in two questions. (1) Given that two mutations A and B have an epistasis coefficient "y at a lower level m of the metabolic hierarchy, what can we say about their epistasis coefficient "y n at a higher level n of the hierarchy? In other words, how does epistasis propagate through the metabolic hierarchy? (2) If mutation A only perturbs the activity x ij of one enzyme and mutation B only perturbs the activity x k' of another enzyme that belongs to the same module m, then what values of "y can we expect to observe based on the topological relationship between the two perturbed reactions within module m? In other words, what kinds of epistasis emerge in a metabolic network?

Propagation of epistasis through the hierarchy of metabolic phenotypes
Assuming that the effects of both individual mutations and their combined effect at the lower-level m are small, it follows from Equation 4 and Equation 5 that where C ¼ f 0 1 y =y n and H ¼ f 00 1 y 2 =y n are the first-and second-order control coefficients of the lower-level module m with respect to the flux through the higher-level module n and o 1 ð Þ denotes all terms that vanish as the effects of mutations tend to zero (see Materials and methods for details). Note that Equation 6 is a special case of a more general Equation 49 which describes the case when mutations affect multiple enzymes. Equation 6 defines a linear map f with slope 1=C and a fixed point Þ À1 , which both depend on the topology of the higher-level module n and the rate constantsx nn .
To gain some intuition for how the map f governs the propagation of epistasis from a lower level m to a higher level n, suppose that module n is a linear metabolic pathway. In this case, it is intuitively clear that function f 1 is monotonically increasing (i.e. the higher y , the more flux can pass through the linear pathway n) and concave (i.e. as y grows, other reactions in n become increasingly more limiting, such that further gains in y yield smaller gains in y n ).
Indeed, it is easy to show that C ¼ 1 þ ay À Á À1 >0 and H ¼ À2a y 1 þ ay À Á À2 <0, where a is a positive constant that depends on other reactions in the pathway (see Materials and methods for details). It then immediately follows that any zero or negative epistasis "y that already arose at the lower level would propagate to negative epistasis "y n at the level of the linear pathway n. Moreover, since C<1, the fixed point of the map in Equation 6 is unstable. Therefore, if epistasis "y was already sufficiently large at the lower level, it would induce even larger positive epistasis "y n at the level of the linear pathway n. In fact, when module n is a linear pathway, " ¼ 1, so that "y n >1 whenever "y >1.
The first result of this paper is the following theorem, which shows that the same rules of propagation of epistasis hold not only for a linear pathway but for any module (Figure 2). that maps lower-level epistasis "y onto higher-level epistasis "y n . Slope 1=C and fixed point " depend on the topology and the rate constants of the higher-level module n, but they are bounded, as shown. Thus, the fixed point " of this map lies between 0 and 1 and is always unstable (open circle).

Theorem 1
For any module n, and 0 " 1: The proof of Theorem 1 is given in Materials and methods. Its main idea is the following. The functional form of f 1 in Equation 4 depends on the topology of module n. Since the number of topologies of n is infinite, we might a priori expect that there is also an infinite number of functional forms of f 1 . However, this is not the case. In fact, all higher-level modules that contain a lower-level module fall into three topological classes defined by the location of the lower-level module with respect to the I/O metabolites of the higher-level module (see Proposition 2 and Figure 7 in Materials and methods). To each topological class corresponds a parametric family of the function f 1 , so that there are only three such families. For each family, the values of C and H can be explicitly calculated, yielding the bounds in Equation 7 and Equation 8.
Equation 6 together with Equation 7 and Equation 8 show that the linear map f from epistasis at a lower-level to epistasis at the higher-level has an unstable fixed point between 0 and 1 (Figure 2). This implies that negative epistasis at a lower level of the metabolic hierarchy necessarily induces negative epistasis of larger magnitude at the next level of the hierarchy, that is, "y n "y <0. Therefore, once negative epistasis emerges somewhere along the hierarchy, it will induce negative epistasis at all higher levels of the hierarchy, irrespectively of the topology or the kinetic parameters of the network.
Similarly, if epistasis at the lower level of the metabolic hierarchy is positive and strong, "y >1, it will induce even stronger positive epistasis at the next level of the hierarchy, that is, "y n ! "y >1. Therefore, once strong positive epistasis emerges somewhere in the metabolic hierarchy, it will induce strong positive epistasis of larger magnitude at all higher levels of the hierarchy, irrespectively of the topology or the kinetic parameters of the network. If positive epistasis at a lower level of the hierarchy is weak, 0<"y <1, it could induce either negative, weak positive or strong positive epistasis at the higher level of the hierarchy, depending on the precise value of "y , the topology of the higher-level module n and the microscopic rate constantsx nn .
In summary, there are three regimes of how epistasis propagates through a hierarchical metabolic network. Negative and strong positive epistasis propagate robustly irrespectively of the topology and kinetic parameters of the metabolic network, whereas the propagation of weakly positive epistasis depends on these details. The strongest qualitative prediction that follows from Theorem 1 is that negative epistasis for a lower-level phenotype cannot turn into positive epistasis for a higherlevel phenotype, but the converse is possible.

Emergence of epistasis between mutations affecting different enzymes
Which of the three regimes described above can emerge in metabolic networks and under what circumstances? In other words, if two mutations affect the same module, are there any constraints on epistasis that might arise at the level of the effective rate constant of this module? To address this question, I consider two mutations A and B that affect the rate constants of different single reactions within a given module.
Consider a relatively simple module n shown in Figure 1A and two mutations A and B that affect the reactions, as shown in Figure 3A. I will now show that the epistasis coefficient "y n can take values in all three domains described above, depending on the biochemical details of this module. Using the recursive procedure for evaluating y described in Materials and methods, it is straightforward to obtain an analytical expression for y n as a function of the rate matrixx n , from which "y can also be obtained (see Materials and methods for details). To demonstrate that "y can take values below 0, between 0 and 1, and above 1, it is convenient to keep all of the rate constants fixed except for the rate constant z x 34 of a reaction that is not affected by mutations A or B, as shown in Figure 3A. Figure 3B then shows how the epistasis coefficient "y varies as a function of z for one particular choice of all other rate constants. When z is small, "y <0. As z increases, it becomes weakly positive (0<"y <1) and eventually strongly positive ("y >1). Thus, in my model, there are no fundamental constraints on the types of epistasis that can emerge between mutations.
This simple example also reveals that not only the value but also the sign of epistasis generically depend on the rates of other reactions in the network, such that other mutations or physiological changes in enzyme expression levels can modulate epistasis sign and strength. In other words, 'higher-order' and 'environmental' epistasis are generic features of metabolic networks.
Upon closer examination, the toy example in Figure 3 also suggests that the sign of "y n may depend predictably on the topological relationship between the affected reactions. When z ¼ 0, the two reactions affected by mutations are parallel, and epistasis is negative. When z is very large, most of the flux between the I/O metabolites passes through z such that the two reactions affected by mutations become effectively serial, and epistasis is strongly positive. Other toy models show consistent results: epistasis between mutations affecting different reactions in a linear pathway is always positive and epistasis between mutations affecting parallel reactions is negative (see Materials and methods for details). These observation suggest an interesting conjecture. Do mutations affecting parallel reactions always exhibit negative epistasis and do mutations affecting serial reactions always exhibit positive epistasis? In fact, such relationship between sign of epistasis and topology has been previously suggested in the literature (e.g. Dixon et al., 2009;Lehner, 2011).
To formalize and mathematically prove this hypothesis, I first define two reactions as parallel within a given module if there exist at least two distinct simple (i.e. non-self-intersecting) paths that connect the I/O metabolites, such that each path lies within the module and contains only one of the two focal reactions. Analogously, two reactions are serial within a given module if there exists at least one simple path that connects the I/O metabolites, lies within the module and contains both focal reactions. According to these definitions, two reactions can be simultaneously parallel and serial, as, for example, the reactions affected by mutations A and B in Figure 3A. I call such reaction pairs serialparallel. I define two reactions to be strictly parallel if they are parallel but not serial ( Figure 3C) and I define two reactions to be strictly serial if they are serial but not parallel ( Figure 3D). Thus, each pair of reactions within a module can be classified as either strictly parallel, strictly serial or serialparallel.
The second result of this paper is the following theorem.

Theorem 2
Let x and h be the rate constants of two different reactions in module m. Suppose that mutation A perturbs only one of these reactions by d A and mutation B perturbs only the other reaction by d B h.
In the limit d A ! 0 and d B h ! 0, the following statements are true. If the affected reactions are strictly parallel then "y 0. If the affected reactions are strictly serial, then "y ! 1.
The detailed proof of this theorem is given in Materials and methods. Its key ideas and the logic are the following. It follows from Equation 3 and Equation 5 that q qh h y are the first-and second-order control coefficients of the affected reactions with respect to the flux through module m and oð1Þ denotes terms that vanish when d A and d B h approach zero (see Materials and methods for details). Note that Equation 9 was previously derived by Chiu et al., 2012. To compute the epistasis coefficient "y for an arbitrary module m, we need to know the first and second derivatives of function f 2 . Analogous to function f 1 , there is a finite number of parametric families to which f 2 can belong. Specifically, all modules fall into nine topological classes with respect to the locations of the affected reactions within the module (see Figure 8), and each of these topologies defines a parametric family of function f 2 (see Proposition 3 and its Corollary 3 in Materials and methods). Most of these topological classes are broad and contain modules where the affected reactions are strictly parallel, those where they are strictly serial as well as those where they are serialparallel. And it is easy to show that not all members of each topological class have the same sign of "y . However, modules from the same topological class where the affected reactions are strictly parallel or strictly serial fall into a finite number of topological sub-classes (see Figure 10 through Figure  14, Table 2 and Table 3). Overall, there are only 17 distinct topologies where the affected reactions are strictly parallel (Table 2), which define 17 parametric sub-families of function f 2 . For all members of these sub-families, Equation 9 yields "y 0 (see Proposition 7 in Materials and methods). Similarly, there are only 11 distinct topologies where the affected reactions are strictly serial (Table 3), which define 11 parametric sub-families of function f 2 . For all members of these sub-families, Equation 9 yields "y ! 1 (see Proposition 8 in Materials and methods).
The results of Theorem 1 and Theorem 2 together imply that the topological relationship at the microscopic level between two reactions affected by mutations constrains the values of their epistasis coefficient at all higher phenotypic levels. Specifically, if negative epistasis is detected at any phenotypic level, the affected reactions cannot be strictly serial. And conversely, if strong positive epistasis is detected at any phenotypic level, the affected reactions cannot be strictly parallel. In this model, weak positive epistasis in the absence of any additional information does not imply any specific topological relationship between the affected reactions.
Sensitivity of results with respect to the magnitude of mutational effects depending on the location of the lower-level module within the higher-level module (see Figure 7). The topological class specifies the parametric family of the function f 1 which maps the effective rate constant y onto the effective rate constant y n (see Equation 4). For all modules from the topological class M b , function f 1 is linear (see Equation 29), which implies that the results of Theorem 1 hold exactly even when the effects of mutations are finite. For modules from the topological classes M io and M i , function f 1 is hyperbolic (see Equation 30 and Equation 31), so that the results of Theorem 1 may not hold when the effects of mutations are finite. To test the validity of Theorem 1 in these cases, I calculated the non-linear function f that maps the epistasis coefficient "y onto the epistasis coefficient "y n for 1000 randomly generated modules from each of the two topological classes and for mutations that increase or decrease the effective rate constant of the lower-level module y by up to 50% (see Materials and methods for details).
The validity of Theorem 1 depended on the sign of mutational effects. When at least one of the two mutations had a negative effect on y , map f had the same properties as described in Theorem 1, even for mutations with large effect, that is, it had a fixed point " in the interval ½0; 1 and this fixed point was unstable. When the effects of both mutations on y were positive and small, these results also held in about 82% of sampled modules (see Figure 4A, Figure  In the remaining~18% of sampled modules, the fixed point " shifted slightly above 1. As the magnitude of mutational effects increased, the fraction of sampled modules with ">1 grew, reaching 42% when both mutations increased y by 50%. In most of these cases, " remained below 2, and I found only one module with ">4 ( Figure 4A, Figure 4-figure supplement 1, Figure 4-figure supplement 2). Whenever the fixed point existed, it was unstable, with the exception of 12 modules for which f was very close to the identity map. For 289 modules (14.5%), the fixed point disappeared when both mutations increased y by 50%. In all these cases, "y n <"y , indicating that even large positive epistasis may decline as it propagates through the metabolic hierarchy when the effects of mutations are finite.
Fraction of modules with 0.01 < y < 0.99

Magnitude of mutational effects
10% 50%  " of the function f that maps lower-level epistasis "y onto higher-level epistasis "y n in modules with random parameters and for mutations with positive effects on y (see text and Materials and methods for details). All cases are shown in    Emergence of epistasis between mutations with finite effect sizes As mentioned above and discussed in detail in Materials and methods, modules where the reactions affected by mutations are strictly parallel fall into 17 topological classes (see Table 2) and modules where the reactions affected by mutations are strictly serial fall into 11 topological classes (see Table  3). The topological class specifies the parametric family of the function f 2 which maps the rate constants x and h of the affected reactions onto the effective rate constant y . To test how well Theorem 2 holds when the effects of mutations are finite, I calculated "y for randomly generated modules from these topological classes and for mutations increasing or decreasing x and h by up to 50% (see Materials and methods for details).
The validity of Theorem 2 depended most strongly on the topological relationship between the reaction affected by mutations. Whenever the affected reactions were strictly parallel, the epistasis coefficient at the level of module m was always less than or equal to zero, even when mutations perturbed the rate constants by as much as 50%, consistent with Theorem 2. This was also true for strictly serial reactions, as long as both mutations had positive effects. When the affected reaction were strictly serial and at least one of the mutations had a negative effect, the epistasis coefficient was always positive, but in some cases it was less than 1 (see Figure 4B, Figure 4-figure supplement 3), in disagreement with Theorem 2. This indicates that when the effects of mutations are not infinitesimal, even mutations that affect strictly serial reactions can potentially produce negative epistasis for higher-level phenotypes.
Taken together, these results suggest that both Theorem 1 and Theorem 2 extend reasonably well, but not perfectly, to mutations with finite effect sizes. The domains of validity of both theorems appear to depend on the sign of mutational effects. The way in which the theorems break down as their assumptions are violated appears to be stereotypical: when the mutational effects increase, more types of mutations produce weak epistasis, and the bias toward negative epistasis increases during propagation from lower to higher levels of the metabolic hierarchy.

Beyond first-order kinetics: epistasis in a kinetic model of glycolysis
The results of previous sections revealed a relationship between network topology and the ensuing epistasis coefficients in an analytically tractable model. However, the assumptions of this model are most certainly violated in many realistic situations. It is therefore important to know whether the same or similar rules of epistasis emergence and propagation hold beyond the scope of this model. I address this question here by analyzing a computational kinetic model of glycolysis developed by Chassagnole et al., 2002. This model keeps track of the concentrations of 17 metabolites, reactions between which are catalyzed by 18 enzymes ( Figure 5A and Figure 5-figure supplement 1; see Materials and methods for details). This model falls far outside of the analytical framework introduced in this paper: some reactions are second-order, reaction kinetics are non-linear, and in several cases the reaction rates are modulated by other metabolites (Chassagnole et al., 2002).
Testing the predictions of the analytical theory in this computational model faces two complications. First, in a non-linear model, modules are no longer fully characterized by their effective rate constants, even at steady state. Instead, each module is described by the flux between its I/O metabolites which non-linearly depends on the concentrations of these metabolites. Consequently, the effects of mutations and epistasis coefficients also become functions of the I/O metabolite concentrations. An epistasis coefficient at the level of module n can still be evaluated according to Equation 5, with y n now representing the flux through module n evaluated at a particular concentration of the I/O metabolites. For simplicity, I computationally find the steady state of the full glycolysis network and evaluate the epistasis coefficients only at this steady state, that is, for each module, I keep the concentrations of the I/O metabolites fixed at their steady-state values for the full network (see Materials and methods for details).
The second complication is that some control coefficients are so small that they fall below the threshold of numerical precision. Perturbing such reactions has no detectable effect on flux (Figure 5-figure supplement 2). In the analysis that follows, I ignore such reactions because the epistasis coefficient defined by Equation 5 can only be computed for mutations with non-zero effects on flux. In addition, the control coefficients of some reactions are negative, which implies that an increase in the rate of such reaction decreases the flux through the module (Figure 5-figure supplement 2). I also ignore such reactions because there is no analog for them in the analytical theory presented above. After excluding seven reactions for these reasons, I examine epistasis in 55 pairs of mutations that affect the remaining 11 reactions.
The glycolysis network shown in Figure 5A (see also Figure 5-figure supplement 1) can be naturally partitioned into four modules which I name 'LG' (lower glycolysis), 'UGPP' (upper glycolysis and pentose phosphate), 'GPP' (glycolysis and pentose phosophate), and 'FULL'. Modules LG and UGPP are non-overlappng and both of them are nested in module GPP which in turn is nested in the FULL module. Thus, at least for some reaction pairs it is possible to calculate epistasis coefficients at three levels of metabolic hierarchy. There are three such pairs, and the results for them are shown in Figure 5B. Epistasis for the remaining pairs of reactions can be evaluated only at one or two levels of the hierarchy because these reactions belong to different modules at the lowest levels or because their individual effects are too small. The results for all reaction pairs are shown in Figure 5 The enzymes catalyzing reactions whose control coefficients with respect to the flux through the module are positive are named; other enzyme names are ommitted for clarity (see Table 5 and Table 6 for abbreviations). Three reactions, catalyzed by PGI, PFK, PGDH, for which the epistasis coefficients are shown in panel B are highlighted in dark blue, red, and orange, respectively. (B) Epistasis coefficients for flux through each module between mutations perturbing the respective reactions, computed at steady state (see text and Materials and methods for details). Reactions catalyzed by PGI and PGDH are strictly parallel (path g6p-f6p-fdp-gap contains only PGI, path g6p-6pg-ribu5p-gap contains only PGDH and there is no simple path in UGPP between g6p and gap that contains both PGI and PGDH). Reactions catalyzed by PGI and PFK are serial-parallel (path g6p-f6p-fdpgap contains both reactions, path g6p-f6p-gap contains only PGI, path g6p-6pg-ribu5p-f6p-fdp-gap contains only PFK). Reactions catalyzed by PFK and PGDH are also serial-parallel (path g6p-6pg-ribu5p-f6p-fdp-gap contains both reactions, path g6p-f6p-fdp-gap contains only PFK, path g6p-6pg-ribu5p-gap contains only PGDH). The online version of this article includes the following figure supplement(s) for figure 5:  The strongest qualitative prediction of the analytical theory described above is that negative epistasis for a lower-level phenotype cannot turn into positive epistasis for a higher-level phenotype, while the converse is possible. Figure 5B and Figure 5-figure supplement 3 show that the data are consistent with this prediction. Another prediction is that epistasis between strictly parallel reactions should be negative. There is only one pair of reactions that are strictly parallel, those catalyzed by glucose-6-phosphate isomerase (PGI) and 6-phosphogluconate dehydrogenase (PGDH), and indeed the epistasis coefficients between mutations affecting these reactions are negative at all levels of the hierarchy ( Figure 5B). Finally, the analytical theory predicts that mutations affecting strictly serial reactions should exhibit strong positive epistasis. There are 36 reaction pairs that are strictly serial. Epistasis is positive between mutations in 33 of them, and it is strongly positive in 17 of them ( Figure 5-figure supplement 3). Three pairs of strictly serial reactions (those where one reaction is catalyzed by PK and the other is catalyzed by PGI, PGDH, or PFK) exhibit negative epistasis (Figure 5-figure supplement 3). These results suggest that, although one may not be able to naively extrapolate the rules of emergence and propagation of epistasis derived in the simple analytical model to more complex networks, some generalized versions of these rules may nevertheless hold more broadly.

Discussion
Genetic interactions are a powerful tool in genetics, and they play an important role in evolution. Yet, how epistasis emerges from the molecular architecture of the cell and how it propagates to higher-level phenotypes, such as fitness, remains largely unknown. Several recent studies made a statistical argument that the structure of the fitness landscape (and, as a consequence, the epistatic interactions between mutations at the level of fitness) may be largely independent of the underlying molecular architecture of the organism (Martin, 2014;Lyons et al., 2020;Reddy and Desai, 2020). If mutations are typically highly pleiotropic (i.e. affect many independent phenotypes relevant for fitness) or are engaged in a large number of idiosyncratic epistatic interactions with other mutations in the genome, the resulting fitness landscapes converge to certain limiting shapes, such as the Fisher's geometric model (Martin, 2014;Tenaillon, 2014). To what extent these arguments indeed apply in practice is unclear. But if they do, most genetic interactions detected at the fitness level may be uninformative about the architecture of the underlying biological networks.
In this paper, I took a 'mechanistic' approach, which is in a sense orthogonal to the statistical one. In my model of a hierarchical metabolic network, mutations are highly pleiotropic (a mutation in any enzyme affects all the fluxes in the module) and highly epistatic (a mutation in any enzyme interacts with mutations in any other enzyme). Yet, these pleiotropic and epistatic effects appear to be sufficiently structured that some information about the topology of the network is preserved through all levels of the hierarchy. Indeed, the emergence and propagation of epistasis follow two simple rules in my model. First, once epistasis emerges at some level of the hierarchy, its propagation through the higher levels of the hierarchy depends weakly on the details of the network. Specifically, negative epistasis at a lower level induces negative epistasis at all higher levels and strong positive epistasis induces strong positive epistasis at all higher levels, irrespectively of the topology or the kinetic parameters of the network. Second, what type of epistasis emerges in the first place depends on the topological relationship between the reactions affected by mutations. In particular, negative epistasis emerges between mutations that affect strictly parallel reactions and positive epistasis emerges between mutations that affect strictly serial reactions. Insofar as my model is relevant to nature, the key conclusion from it is that epistasis at high-level phenotypes carries some, albeit incomplete, information about the underlying topological relationship between the affected reactions.
These results have implications for the interpretation of empirically measured epistasis coefficients. It is often assumed that a positive epistasis coefficient between mutations that affect distinct genes signals that their gene products act in some sense serially, whereas a negative epistasis coefficient is a signal of genetic redundency, that is, a parallel relationship between gene products (Dixon et al., 2009). My results suggest that this reasoning is generally correct, but that the relationship between epistasis and topology is more nuanced. In particular, the sign of the epistasis coefficient in my model constrains but does not uniquely specify the topological relationship, such that a negative epistasis coefficient implies that the affected reactions are not strictly serial (but may or may not be strictly parallel) and an epistasis coefficient exceeding unity excludes a strictly parallel relationship (but does not necessarily imply a strictly serial relationship). My model suggests that one should also be careful with inferences going in the other direction, that is, extrapolating the patterns of epistasis measured at the biochemical level to those at the level of fitness. For example, if one wishes to infer the past evolutionary trajectory of an enzyme and finds two amino acid changes that exhibit a positive interaction at the level of enzymatic activity, it does not automatically imply that these mutations will exhibit a positive interaction at the level of fitness.
The strongest results presented here rely on several assumptions. I proved Theorem 1 and Theorem 2 in the limit of vanishingly small mutational effects. Some results of the metabolic control analysis, notably the summation theorem, are sensitive to this assumption (Bagheri-Chaichian et al., 2003;Bagheri and Wagner, 2004). To test the sensitivity of my analytical results with respect to this assumption, I used numerical simulations of networks with randomly sampled kinetic parameters and found that the results hold reasonably well when the effects of mutations are not infinitesimal.
The most restrictive assumption in the present work is that of first-order kinetics. Networks with only first-order kinetics clearly fail to capture some biologically important phenomena, such as sign epistasis Chou et al., 2014;Ewald et al., 2017;Kemble et al., 2020). I discuss possible ways to relax this assumption below. But at present, a major question remains whether the rules of epistasis and propagation described here hold for realistic biological networks and whether they can be directly used to interpret empirical epistasis coefficients. My analysis of a fairly realistic computational model of glycolysis cautions against overinterpreting empirical epistasis coefficients using the rules derived here. But it also suggests that more general rules of propagation and emergence of epistasis may be found for more realistic networks. Thus, the simple rules derived here should probably be thought of as null expectations.
Relaxing the first-order kinetics assumption is analytically challenging because it is critical for replacing a module with a single effective reaction without altering the dynamics of the rest of the network. Although such lossless replacement is almost certainly not possible in networks with more complex kinetics, advanced network coarse-graining techniques may offer a promising way forward (Rao et al., 2014). Flux balance analysis (FBA) is an alternative approach (Orth et al., 2010). FBA is appealing because it entirely removes the dependence of the model on reaction kinetics. However, this comes at a substantial cost. In FBA models, fitness and other high-level phenotypes become independent of the internal kinetic parameters, which is clearly unrealistic. Nevertheless, FBA is often very good at capturing the effects of mutations that change the topology of metabolic networks, such as reaction additions and deletions (reviewed in Gu et al., 2019). At the same time, there is no natural way within FBA to model mutations that perturb reaction kinetics (He et al., 2010;Alzoubi et al., 2019). In short, FBA and my approach are complementary (see Appendix 5 for a more detailed discussion).

Generic properties of epistasis in biological systems
Simple models help us identify generic phenomena-those that are shared by a large class of systems-which should inform our 'null' expectations in empirical studies. Deviations from such null in a given system under examination inform us about potentially interesting peculiarities of this system. The model presented here suggests several generic features of epistasis between genome-wide mutations.

Epistasis has two contributions
My analysis shows that the value of an epistasis coefficient measured for a higher level phenotype is a result of two contributions (Domingo et al., 2019), propagation and emergence, which correspond to two terms in Equation 6 (or the more general Equation 49). The first term, propagation, shows that if two mutations exhibit epistasis for a lower-level phenotype they also generally exhibit epistasis for a higher-level phenotype. The second contribution comes from the fact that lower-level phenotypes map onto higher-level phenotypes via non-linear functions. This is true even in a simple model with linear kinetics considered here. As a result, even if two mutations exhibit no epistasis at the lower-level phenotype, epistasis must emerge for the higher-level phenotype, as previously pointed out by multiple authors (e.g. Kacser and Burns, 1981;DePristo et al., 2005;Martin et al., 2007;Chiu et al., 2012;Otwinowski et al., 2018;Domingo et al., 2019;Husain and Murugan, 2020).

Epistasis depends on the genetic background and environment
My analysis shows that the value of an epistasis coefficient for a particular pair of mutations is in large part determined by the topological relationship between reactions affected by them. Since the topology of the metabolic network itself depends on the genotype (which genes are present in the genome) and on the environment (which enzymes are active or not), the topological relationship between two specific reactions might change if, for example, a third mutation knocks out another enzyme or if an enzyme is up-or down-regulated due to an environmental change (see Figure 3). Thus, we should generically expect epistasis between mutations to depend on the environment and on the presence or absence of other mutations in the genome. In other words, G Â G Â G interactions (higher-oder epistasis) and G Â G Â E interactions (environmental epistasis) should be common (Snitkin and Segrè, 2011;Flynn et al., 2013;Lindsey et al., 2013;Taylor and Ehrenreich, 2015;Sailer and Harms, 2017a). This fact complicates the interpretation of inter-gene epistasis since mutations in the same pair of genes can exhibit qualitatively different genetic interactions in different strains, organisms and environments, as has been observed (St Onge et al., 2007;Musso et al., 2008;Tischler et al., 2008;Dowell et al., 2010;Heigwer et al., 2018;. However, the situation may not be hopeless because the topological relationship between two reactions cannot change arbitrarily after addition or removal of a single reaction. For example, if two reactions are strictly parallel, removing a third reaction does not alter their relationship (see Proposition 5). Thus, comparing matrices of epistasis coefficients measured in different environments or genetic backgrounds could inform us about how the organism rewires its metabolic network in response to these perturbations (St Onge et al., 2007;Musso et al., 2008;Heigwer et al., 2018;.

Skew in the distribution of epistasis coefficients
Studies that measure epistasis for fitness-related phenotypes among genome-wide mutations usually find both positive and negative epistases, but the preponderance of positive and negative epistasis varies. Some authors reported a skew toward positive interactions among deleterious mutations (Jasnos and Korona, 2007;He et al., 2010;Johnson et al., 2019), whereas others reported a skew toward negative interactions (Szappanos et al., 2011;Costanzo et al., 2016). Beneficial mutations appear to predominantly exhibit negative epistasis, also known as 'diminishing returns' epistasis (e.g. Martin et al., 2007;Khan et al., 2011;Chou et al., 2011;Kryazhimskiy et al., 2014;Schoustra et al., 2016). The reasons for these patterns are currently unclear. Several recent theoretical papers offer possible statistical explanations for them (Martin, 2014;Lyons et al., 2020;Reddy and Desai, 2020). On the other hand, mechanistic predictions for the distribution of epistasis coefficients are not yet available (but see Sanjuán and Nebot, 2008;Macía et al., 2012;Chiu et al., 2012). The present work does not directly address this problem either, but it provides some additional clues. First, my model shows that the sign of epistasis at least to some extent reflects the topology of the network. Thus, the distribution of epistasis coefficients at high-level phenotypes in real organisms should ultimately depend on the preponderance of different topological relationships between the edges in biological networks. It then seems a priori unlikely that positive and negative interactions would be exactly balanced. Thus, we should expect the distribution of epistasis coefficients to be skewed in one or another direction.
The second observation is that in the metabolic model considered here a positive epistasis coefficient at one level of the hierarchy can turn into a negative one at a higher level, but the reverse is not possible. This bias toward negative epistasis at higher-level phenotypes appears to be even stronger in networks with saturating kinetics ( Figure 5 and Figure 5-figure supplement 3).
The third observation is that epistasis among beneficial and deleterious mutations affecting metabolic genes should be identical at the level where they arise, provided that beneficial and deleterious mutations are identically distributed among metabolic reactions. Thus, a stronger skew toward negative epistasis among beneficial mutations at the level of fitness could arise in my model for two mutually non-exclusive reasons. One possibility is that beneficial mutations tend to affect certain special subsets of genes, those that predominantly give rise to negative epistasis. For example, beneficial mutations may for some reason predominantly arise in enzymes that catalyze strictly parallel reactions. Another possibility is that when epistasis between beneficial mutations propagates through the metabolic hierarchy it tends to exhibit a stronger negative bias compared to epistasis between deleterious mutations. Indeed, this phenomenon arises in my model among mutations with large effect (see Figure 4A, Figure

Epistasis is generic
Perhaps the most important-and also the most intuitive-conclusion of this work is that we should expect epistasis for high-level phenotypes, such as fitness, to be extremely common. Consider first a unicellular organism growing exponentially. Its fitness is fully determined by its growth rate, which can be thought of as the rate constant of an effective biochemical reaction that converts external nutrients into cells (see Appendix 6 for a simple mathematical model of this statement). In other words, growth rate is the most coarse-grained description of a metabolic network and, as such, it depends on the rate constants of all underlying biochemical reactions. Many previous studies have shown that within-protein epistasis is extremely common (e.g. Lunzer et al., 2005;DePristo et al., 2005;Sailer and Harms, 2017b;Husain and Murugan, 2020). Present work shows that, once epistasis arises at the level of protein activity, it will propagate all the way up the metabolic hierarchy and will manifest itself as epistasis for growth rate. It also suggests that growth rate is a generically non-linear function of the rate constants of the underlying biochemical reactions, such that all mutations that affect growth rate individually would also exhibit pairwise epistasis for growth rate with each other (Kacser and Burns, 1981;DePristo et al., 2005;Martin et al., 2007;Chiu et al., 2012;Otwinowski et al., 2018;Husain and Murugan, 2020).
In more complex organisms and/or in certain variable environments, it may be possible to decompose fitness into multiplicative or additive components, for example, plant's fitness may be equal to the product of the number of seeds it produces and their germination probability, as pointed out by Chou et al., 2011. Then, mutations that affect different components of fitness would exhibit no epistasis. However, such situations should be considered exceptional, as they require fitness to be decomposable and mutations to be non-pleiotropic.
If epistasis is in fact generic for high-level phenotypes, why do we not observe it more frequently? For example, a recent study that tested almost all pairs of gene knock-out mutations in yeast found genetic interactions for fitness for only about 4% of them (Costanzo et al., 2016). One possibility is that many pairs of mutations exhibit epistasis that is simply too small to detect with current methods. As the precision of fitness measurements improves, we would then expect the fraction of interacting gene pairs to grow. Another possibility is that systematic shifts in the distribution of estimated epistasis coefficients away from zero are taken by researchers as systematic errors rather than real phenomena, and are normalized out. Thus, some epistasis that would otherwise be detectable may be lost during data processing.
If epistasis is indeed as ubiquituous as the present analysis suggests, it would call into question how observations of inter-gene epistasis are interpreted. In particular, contrary to a common belief, a non-zero epistasis coefficient does not necessarily imply any specific functional relationship between the components affected by mutations beyond the fact that both components somehow contribute to the measured phenotype (Boyle et al., 2017). The focus of future research should then be not merely on documenting epistasis but on developing theory and methods for a robust inference of biological relationships from measured epistasis coefficients.

Materials and methods
Key ideas and logic of proofs of Theorems 1 and 2 Before proceeding to the detailed proofs of Theorem 1 and Theorem 2, I informally outline some key ideas and the basic logic.
The central object of the theory is a metabolic module. Modules have two key properties. First, a module is somewhat isolated from the rest of the metabolic network, in the sense that all metabolites inside it interact with only two metabolites outside, the I/O metabolites. The second property is that the metabolites within the module are sufficiently connected that each of them individually as well as any subset of them collectively can achieve a quasi-steady state (QSS), given the concentrations of the remaining metabolites. This property is proven in Proposition 1.
When some metabolites are at QSS, they can be effectively removed from the network and replaced with effective reactions among the remaining metabolites. In other words, one can 'coarsegrain' the network by removing metabolites. This approach is a standard biochemical-network reduction technique (Segel, 1988); for example, the Briggs-Haldane derivation of the Michaelis-Menten formula is based on this idea. In general, the resulting effective reactions have more complex (non-linear) kinetics than the original reactions. However, when the original reactions are firstorder, the effective reactions are also first-order, that is, there is no increase in complexity. In Network coarse graining and an algorithm for evaluating the effective rate constant for an arbitrary module, I formally define the coarse-graining procedure (CGP) that eliminates one or multiple metabolites and replaces them with effective reactions.
CGP is an essential concept in my theory. I use it to compute the QSS concentrations for internal metabolites within a module (Corollary 1) and thereby prove Proposition 1, mentioned above. Since any module m can achieve a QSS at any concentrations of its I/O metabolites and since any module has only two I/O metabolites, it can be replaced with a single effective reaction (Corollary 2). CGP provides a way to calculate the rate constant y of this reaction. In other words, the CGP is an algorithm for evaluating function Fðx Þ in Equation 1 for any module m.
CGP has an important property: its result does not depend on the order in which metabolites are eliminated. Therefore, in computing the effective rate constant of a module, we can choose any convinient way to eliminate its metabolites. Suppose that one module m is nested within another module n as in Figure 1A. A convenient way to compute the effective rate y n of the larger module is to first coarse-grain the smaller module m, replacing it with an effective rate y , and then eliminate all the remaining metabolites in n. Since effective rates after coarse-graining do not depend on the order of metabolite elimination, y n must depend on the rate constantsx only indirectly, through y . In other words, all the information about the smaller module m that is relevant for the performance of the larger module n is contained in y . Therefore, if a mutation or mutations perturb only reactions inside of the smaller module m, we only need to know their effects on the effective rate constant y to completely understand how they will perturb the performance of the larger module n. Specifically, if we have two such mutations A and B, all the information about them is contained in three numbers, d A y , d B y and "y . Theorem 1 then describes how epistasis at the level of module m propagates to epistasis at the level of module n.
The proof of Theorem 1 proceeds as follows. Let a be the effective reaction with rate constant y that represents module m within the larger module n, and consider y n as a function of y , as in Equation 4. To obtain f 1 ðy Þ, it is convenient to first eliminate all metabolites that do not participate in reaction a. No matter what the initial structure of module n is, such coarse-graining will produce only one of three topologically distinct 'minimal' modules, which differ by the location of reaction a with respect to the I/O metabolites of module n (Figure 7). This implies that the function f 1 can belong to three parameteric families, where the parameters are the effective rate constants of reactions other than a in each of the minimal modules. This is the essence of Proposition 2. Then Theorem 1 can be easily proven by explicitly evaluating the first-and second-order control coefficients for each of the three functions and showing that the statements of the theorem hold for all of them, irrespectively of the function's parameters. Now consider two reactions a and b with rate constants x and h, and imagine the two mutations A and B that affect these reactions. To understand what value of "y will occur, we need to obtain y as a function of x and h (Equation 3). To do so, it is convenient to first eliminate all metabolites that do not participate in reactions a or b. No matter what the initial structure of module m is, such coarse-graining will produce only one of nine topologically distinct minimal modules, which differ by the location of reactions a and b with respect to the I/O metabolites of module m and each other (Figure 8). This implies that the function f 2 can belong to nine parameteric families. This is the essence of Proposition 3 and Corollary 3.
How does the topological relationship between reactions a and b translate into epistasis? First, there are only three types of relationships between any pair of reactions in a module: strictly serial, strictly parallel, or serial-parallel (see Figure 3). Second, Proposition 4 and Corollary 4 show that coarse-graining does not alter the strict relationships, that is, if reactions a and b are strictly serial or strictly parallel before coarse-graining they will remain so after coarse-graining. This is important because it implies that to prove Theorem 2 we do not need to consider an infinitely large space of all modules but only a much smaller space of all minimal modules, that is, those that have only those metabolites that participate in the affected reactions a and b. Although the space of all minimal modules is still infinite, the space of their topologies is finite (see Figure 8). For some minimal topologies, the connection between the strictly serial or strictly parallel relationship and epistasis can be established very easily. For example, if reaction a and reaction b both share an I/O metabolite as a substrate (see topological class M io;io;IO in Figure 8), then they are always strictly parallel, no matter what the rest of the module looks like. Evaluating the first-and second-order control coefficients for the function f 2 that corresponds to this topological class reveals that "y 0 for any parameter values of f 2 .
Unfortunately, most topological classes are too broad and include modules where reactions a and b are strictly serial as well as modules where they are strictly parallel or serial-parallel (e.g., class M io;io;; ). Consequently, the sign of "y for such modules can change depending on the values of the rate constants. However, since the number of distinct minimal topologies is finite, it is possible to identify all minimal topologies where the reactions a and b are strictly serial or strictly parallel. These topological sub-classes define parametric sub-families of function f 2 , and we can explicitly calculate "y for all such functions. However, such brute-force approach is extremely cumbersome because the number of distinct minimal topologies is very large.
Fortunately, the following simple and intuitive fact greatly simplifies this problem. If two reactions are strictly serial or strictly parallel, this relationship does not change if a third reaction is removed from the module. This statement is the essence of Proposition 5. However, if the two reactions are serial-parallel, removal of a third reaction can change the relationship to a strictly serial or a strictly parallel one. As a consequence, there exist certain most connected 'generating' topologies where the relationship between the focal reactions is strictly parallel or strictly serial, and any other strictly serial minimal topology can be produced from at least one of the generating topologies by removal of reactions. This is the essence of Proposition 6. All generating topologies can be discovered by a simple algorithm provided in Appendix 3. They are listed in Table 2 and Table 3 and shown in Figure  10 through Figure 14. Each generating topology defines a parametric sub-family of function f 2 , and I explicitly evaluate the first-and second-order control coefficients for all these sub-families (see Proposition 7 and Proposition 8) which essentially completes the proof of Theorem 2.

Network coarse-graining Notations and definitions
Here, I give a more precise definition of the model and introduce additional notations and definitions. As mentioned above, all reactions are first order and reversible. Thus, each reaction i $ j has one substrate i 2 A and one product j 2 A, and it is fully described by its rate constant x ij . By definition, x ii ¼ 0. I denote the set of all reactions by R ¼ i $ j : i; j 2 A; x ij >0 È É . The dynamics of metabolite concentrations S 1 ; . . . ; S n in the network N are governed by equations where In what follows, it will be important to distinguish three types of reactions within a module, based on their topological relationship to the I/O metabolites of that module. The topology of the module m is determined by its set of reactions R  Figure 1A has three i/o reactions 1 $ 3, 1 $ 4 and 2 $ 4. Finally, I refer to reactions between any two I/O metabolites for module m as bypass reactions for module m, and I denote the set of all such reactions by R b & R . For example, module m in Figure 1A has no bypass reactions but reaction 1 $ 5 is a bypass reaction for module n. By definition, all these three sets of reactions R i , R io and R b are non-overlapping, and . Another important concept are the simple paths that lie within a module. For any two metabolites i; j 2 A [ A IO , I denote a simple path between them that lies within m as p ij or, equivalently as i $ k 1 $ . . . $ k m $ j (where all k ' 2 A ). I say that each of the metabolites k ' belongs to (or is contained in) path p ij (denoted by k ' 2 p ij ). Similarly, I say that each of the reactions k ' $ k 'þ1 belong to (or are contained in) path p ij (denoted by k ' $ k 'þ1 2 p ij ). I will drop superindex m from p ij if it is clear what module is being referred to.
Network coarse graining and an algorithm for evaluating the effective rate constant for an arbitrary module In this section, I formally introduce and characterize the coarse-graining procedure (CGP). First, I introduce the main idea, which is to eliminate a metabolite that is at QSS and to replace it with a set of new reactions between metabolites adjacent to the eliminated one. This is exactly analogous to the star-mesh transformation in the theory of electric circuits (Versfeld, 1970). The resulting network is a coarse-grained version of the original network in the sense that it has one less metabolite. Next, I define the CGP, which is simply multiple metabolite eliminations applied successively. I prove Proposition 1, which justifies applying the CGP to a whole module and replacing it with a single effective reaction (Corollary 2). Finally, I show how to apply the CGP in practice to compute function F from Equation 1 for modules with some simple topologies.
Elimination of a single metabolite I begin by outlining the main idea behind the CGP, which is to replace one metabolite internal to a module with a series of effective reactions between metabolites adjacent to it. If the effective rate constants are defined appropriately, the dynamics of all metabolites in the resulting coarse-grained network are the same as in the original network, provided that the eliminated metabolite is at QSS in the original network.
To formalize this idea, suppose that module ¼ A ;x À Á contains m internal metabolites. Let k 2 A be the internal metabolites that will be eliminated. Let A fkg ¼ A n fkg be the reduced metabolite set and letx fkg be the reduced ðn À 1Þ Â ðn À 1Þ matrix of rate constants defined by where D k is given by Equation 11. Such metabolite elimination has three properties that follow immediately from Equation 12 and Equation 13. First, the rate constants of reactions do not change as long as the eliminated metabolite does not participate in them. Mathematically, x fkg ij ¼ x ij for all metabolites i and j that are not adjacent to the eliminated metabolite k. In particular, this is true for all metabolites external to module m. Second, because equilibrium constants have the property K ij ¼ K i' K 'j for any metabolites i; j; ', the rate constants x fkg ij obey Haldane's relationships. Therefore, the reduced metabolite set A fkg and the reduced rate matrixx fkg define a new 'coarse-grained' metabolic network N fkg ¼ A fkg ;x fkg À Á . It is easy to show that subnetwork m after the elimination of metabolite k is still a module. Third, the reaction set of module m (i.e., its topology) in the coarse-grained network N fkg depends only on the reaction set of this module in the original network N , but not on the particular values in the rate matrix x .
Next, I will show that the dynamics of metabolites in the coarse-grained network N fkg are identical to the dynamics of metabolites in the original network N where metabolite k is at QSS. Note that if metabolite k is at QSS in the network N , its concentration is given by which follows from Equation 10. Now, the dynamics of metabolites in the network N fkg are governed by equations where As mentioned above, x fkg ij ¼ x ij for all pairs of metabolites where at least one metabolite is external to module m. Therefore, Equation 15 for the external metabolites are identical to Equation 10 that govern the dynamics of these metabolites in the original network N . Next, consider the dynamics of the I/O and internal metabolites for module m in the coarse-grained network N fkg , that is, those in the set A IO [ A n fkg. For any such metabolite i, the sum in the righthand side of Equation 15 can be re-written as According to Equation 14, the second term in Equation 16 equals x ki S k , so that Equation 16 becomes For any metabolite i 2 A IO [ A n fkg, the second term in the righthand side of Equation 15 can be re-written as The coarse-graining procedure (CGP) Here, I define the CGP for an arbitrary set of internal metabolites by applying metabolite elimination recursively.
Let E A be an arbitrary subset of metabolites internal to module m in the metabolic network N and let n E be the number of elements in E. Let A E ¼ A n E be the reduced metabolite set after the metabolites have been eliminated. I define the reduced ðn À n E Þ Â ðn À n E Þ matrix of rate constants x E as follows. If n E ¼ 1, the matrixx E is defined by Equation 12 and Equation 13. If n E >1, then I define it recursively. Suppose that all metabolites in E other than some metabolite k 2 E have been previously eliminated, such that the coarse-grained network , and the known matrixx E 0 . Then, I define the matrixx E through the elimination of metabolite k from N E 0 , that is, with I define the the coarse-graining procedure that eliminates the metabolite set E as a map As with the elimination of a single metabolite, it is straightforward to show that the rate constants x E ij obey Haldane's relationships, so that N E is indeed a metabolic network. CG E maps module m within the metabolic network N onto a subnetwork 0 within the metabolic network N E . It is straightforward to show that 0 is a module. Whenever there is no ambiguity, I will label both the original and the coarse-grained versions of the module by m. To simplify notations, if the CGP Box 1. Properties of the coarse-graining procedure.
The coarse-graining procedure CG E eliminating metabolites in set E A has the following useful properties which follow from Equation 24.
1. The effective rate constants x E ij do not depend on the order in which metabolites are eliminated. Therefore, the composition of coarse-graining procedures is commutative, that is, if E ¼ E 1 [ E 2 , where E 1 and E 2 are two non-overlapping subsets of A , then 2. If at least one of the metabolites i or j is not adjacent to any of the eliminated metabolites, then x E ij ¼ x ij . In particular, x E ij ¼ x ij if either i and/or j are external to m.
3. The topology of module m after the application of CG E depends only on its original topology but not on the values of its rate constantsx .
4. If both metabolites i and j are adjacent to at least one eliminated metabolite, then x E ij ¼ x ij þ a; where a ! 0 depends only on the rate constants of reactions that involve at least one eliminated metabolite. In particular, if both k and ' are from A n E, then x E ij is independent of x k' . 5. If E ¼ A , then the effective rate constant y of module m depends on the rate matrixx but does not depend on any other reaction rates, that is, Equation 1 holds. Furthermore, the functional form of Fðx Þ depends only on the topology of module m, that is, all modules with the same topology are mapped onto y by the same function F.
6. Suppose that module m is nested in a larger module n ¼ A n ;x n ð Þ (see Figure 1A). It follows from Property #1 that CG n ¼ CG CG AnnA , that is, y n can be obtained by carrying out the CGP in two stages, by first eliminating module m and replacing it with the effective reaction with the rate constant y and then eliminating the remaining metabolites in A n . In the network N after the first stage of coarse-graining, all rate constantsx nn are identical to those in the original network, that is, they are independent ofx , by virtue of Property #2. Therefore y n depends on the rate constantsx of reactions within module m only through the effective rate constant y of module m. In other words, Equation 4 holds.
7. If, in the original network N , metabolites i and j are adjacent or connected by a simple path that contains only the eliminated metabolites, then metabolites i and j are adjacent in the coarse-grained network N E .
8. If, in the metabolic network N , metabolites i and j are not adjacent (i.e. x ij ¼ 0) and no simple path exists within the set E (i.e. such that all non-terminal metabolites in this path are from E) that connects metabolites i and j, then metabolites i and j are also not adjacent in the coarse-grained network N E (i.e., x E ij ¼ 0). 9. It follows from properties #7 and #8 that for a simple path p '1 'm ¼ ' 1 $ ' 2 $ Á Á Á $ ' m to exist in module m after the application of CG , it is neccessary and sufficient that for each i ¼ 1; . . . ; m À 1, either ' i and ' iþ1 are adjacent in the original module m or there exists a simple path ' i $ Á Á Á $ ' iþ1 within the original module m all of whose non-terminal metabolites are from E.
eliminates the entire module m (i.e., if E ¼ A ), I label it CG . I label the coarse-grained network that restults from the application of CG by N and I label the effective rate of the reaction substituting module m in network N as y . Intuitively, the result of coarse-graining should not depend on the order in which the metabolites are eliminated. To see this, let us obtain explicit (i.e. not recursive) expressions for x E ij . First, by applying the recursion Equation 19, it is easy to show that elimination of two metabolites E ¼ fk; 'g yields effective rate constants As expected, Equation 22 and Equation 23 are symmetric with respect to the eliminated metabolites k and '. Extrapolating from Equation 23, it is possible to show that for an arbitrary metabolite subset E A that contains n E metabolites, D En k1;k2;...;kL f g kL ; i; j 2 A n E; i 6 ¼ j: Here, the second sum is taken over all n E !=ðn E À LÞ! ordered lists of metabolites k 1 ; . . . ; k L ð Þ from E. Each list can be thought of as a simple path within E that connects metabolites i and j. The proof of Equation 24 can be found in Appendix 1. As expected, Equation 24 shows that the effective reation rate x E ij does not depend on the order in which metabolites are eliminated. This and other properties of the CGP are listed in Box 1.
One of key building blocks of the proofs of Theorem 1 and Theorem 2 is the fact that modules can be classified into a finite number of topological classes (see below). To arrive at this classification, it will be convenient to define a composition of coarse-graining procedures, as follows. Suppose that CG E1 and CG E2 are two coarse-graining procedures of network N for two subsets of metabolites E 1 & A and E 2 & A . If the sets E 1 and E 2 are non-overlapping, CG E2 is also defined for the coarse-grained network N E1 which is the result of applying CG E1 to the original network N . The result of applying CG E2 to the N E1 is called the composition of coarse-graining procedures CG E1 and CG E2 of the original network N and is denoted as CG E1 CG E2 .
As defined above, coarse-graining is a formal procedure, and there is no a priori guarantee that (a) it can in fact be carried out for every set of metabolites and (for example, because a metabolite set does not have a steady-state solution); and (b) it will not distort the dynamics in the rest of the network. The following proposition alleviates both of these concerns and thereby justifies the use of the CGP for any subset of internal metabolites within a module (including the entire module m). It is straightforward to prove it by induction, using the same logic as in the elimination of a single metabolite.

Proposition 1
Let E be any subset of metabolites internal to module m. Then, 1. There exists a joint QSS solution S i for all metabolites i 2 E, given the concentrations of the remaining internal and I/O metabolites for module m. 2. The dynamics of all remaining metabolites in A n E in the coarse-grained metabolic network N E are the same as in N where all metabolites in E are at QSS.

Corollary 1
Without loss of generality, suppose that the I/O metabolites for module m are labeled 1 and 2 and its internal metabolites are labeled A ¼ f3; 4; . . . ; mg. There exists a unique QSS S i for all i 2 A . The QSS concentrations can be obtained by recursively applying equation.

Corollary 2
Without loss of generality, suppose that the I/O metabolites for module m are labeled 1 and 2. Module m can be replaced with a single effective reaction between its I/O metabolites, whose rate constant y can be calculated using Equation 19

Computation of effective rate constants for simple modules
Corollary 2 provides a method for replacing any module m at QSS with an effective rate y ¼ Fðx Þ, which can be calculated using Equation 19 and Equation 20 or Equation 24. Here, I show how to implement this calculation for three simple metabolic modules.

Linear pathway
Consider a linear pathway with I/O metabolites 1 and m and internal metabolites 2; 3; . . . ; m À 1 ( Figure 6A). This labeling of metabolites is more convenient for the linear pathway. To calculate y , I will apply recursion Equation 19 and Equation 20. I start by eliminating metabolite 2. After this initial coarse-graining step, the resulting module is still a linear pathway, where two reactions 1 $ 2 $ 3 were replaced with a single reaction 1 $ 3 with the effective rate constant.
All other rate constants remain unchanged. Next, I eliminate metabolite 3. The resulting module is still a linear pathway, where now three reactions 1 $ 2 $ 3 $ 4 were replaced with a single reaction 1 $ 4 with the effective rate constant All other rate constants remain unchanged. Continuing this process until all internal metabolites are eliminated, I obtain which is identical to the expression originally obtained by Kacser and Burns, 1973.

Two parallel pathways
Consider two parallel pathways with I/O metabolites 1 and 2 and internal metabolites 3 and 4 ( Figure 6B). I obtain the effective rate constant using Equation 22 Thus, the contributions of parallel pathways are simply added.
Module m in Figure 1 To obtain the effective rate constant for module m shown in Figure 1, I again use Equation 22 with Classification of modules with respect to 'marked' reactions, and the parametric families of functions f 1 and f 2 The CGP described above allows us to calculate the function F that maps the rate matrixx for an arbitrary module m onto the module's effective rate constant y . F is a multivariate function of the entire matrixx . However, in many applications, only one or two reactions are varied at a time while all others remain constant, and we want to know how module's effective parameter y depends on these one or two perturbed reactions. I refer to such singled-out reactions as 'marked'. When y is considered as a function of the rate constant x of one marked reaction, I write y ¼ f 1 ðÞ, as in Equation 2. When y is considered as a function of the rate constants x and h of two marked reactions, I write y ¼ f 2 ð; hÞ as in Equation 3. The functional form of F and, as a consequence, the functional forms of f 1 and f 2 depend only on the topology of module m (see Property #5 of the CGP in Box 1). In other words, modules with identical topologies have the same functional forms of f 1 and f 2 , such that each topology of module m defines a parametric family of functions f 1 and f 2 , where all rate constants within module m other than x, or x and h, play a role of parameters.
Since the number of possible topologies is infinite, there is an infinite number of functional forms of F. However, the number of parameteric families of functions f 1 and f 2 is finite, and it turns out to be small. To see this, notice that for any module with a single marked reaction, the CGP can be carried out in two stages. In the first stage, we can eliminate all metabolites that do not participate in the marked reaction. The resulting coarse-grained module is minimal in the sense that it can have at most two internal metabolites. Such minimal modules (and, as a consequence, all modules with one marked reaction) fall into three distinct topological classes, which are specified by the location of the marked reaction with respect to the I/O metabolites, as shown in Figure 7. This implies that there are only three parameteric families of the function f 1 . The topologies of the three minimal modules are sufficiently simple that the three corresponding parametric functional forms of f 1 can be easily computed by applying the coarse-graining Equation 19 or Equation 22. This result is formulated in Proposition 2.
The same logic applies to modules with two marked reactions. CGP that eliminates all metabolites that do not participate in the marked reactions maps all such modules onto respective minimal modules, which can have at most four internal metabolites (see Figure 8). This result is formulated in Proposition 3. Minimal modules (and, as a consequence, all modules with two marked reactions) fall into nine distinct topological classes, which are specified by the locations of the marked reactions.
All modules from the same topological class are described by functions f 2 from the same parametric family. These families are characterized in Corollary 3.
These topological classifications are extremely useful for the following reason. If we can show that all functions from the same parameteric family (corresponding to a given topological class) have some common property irrespectively of the values of the parameters, it would imply that this property holds for all modules from the corresponding topological class. This logic is a key part of the proofs of both Theorem 1 and Theorem 2.
To formalize this reasoning, consider module ¼ A ;x À Á and let a ¼ i a $ j a and b ¼ i b $ j b be two reactions from its set of reactions R . I call a pair ð; aÞ a single-marked module and I call a triplet ð; a; bÞ a double-marked module, and I refer to reactions a and b as marked within module m. The topology of a single-marked module ð; aÞ is determined not only by the reaction matrix R , but also by the position of the marked reaction, so I refer to the pair ðR ; aÞ as the topology of the single-marked module ð; aÞ. Similarly, I refer to the triplet ðR ; a; bÞ as the topology of the doublemarked module ð; a; bÞ. I denote byx na the matrix of rate constants of all reactions in module m other than reaction a and I denote byx nfa;bg the matrix of all rate constants in module m other than reactions a and b. I denote the sets of all single-and double-marked modules by M 1 and M 2 , respectively. To avoid metabolite labeling ambiguities, I adopt the following conventions: i. The I/O metabolites are labeled 1 and 2 and the internal metabolites are labeled 3; 4; . . .; ii. i a ; j a 2 f1; 2; 3; 4g; i b ; j b 2 f1; 2; 3; 4; 5; 6g; iii. i a <j a , i b <j b , i a i b , j a j b .
It is easy to see that the set of all single-marked modules M 1 can be partitioned into three nonoverlapping topological classes depending on the type of the marked reaction a. I denote the classes of all single-marked modules where the marked reaction is bypass, i/o or internal (see Notations and definitions) by M b , M io and M i , respectively (Figure 7). Similarly, the set M 2 can be partitioned into nine non-overlapping topological classes according to the types of marked reactions and the type of metabolite that is shared by both of these reactions (I/O, internal, or none). These classes are listed in Table 1 and illustrated in Figure 8.

Minimal fully connected
Before coarse-graining Class  Figure 7). For the double-marked modules, m M and A M are given in Table 1 and illustrated in Figure 8.
If a single-marked module from the topological class M has the minimum number of metabolites n M in that class, I call such module and its topology minimal in M. There may be several minimal topologies in a topological class, but there is only one minimal topology that is fully connected. A topology ðR ; aÞ is called fully connected if the reaction set R is complete in the sense that it contains reactions between all pairs of metabolites in the minimal metabolite set A M . I denote such complete reaction set for the class M by R M , and I denote the respective fully connected topology by ðR M ; aÞ. I employ the same terminology and analogous notations for double-marked modules. The minimal fully connected topologies are shown in the second column in Figure 7 and Figure 8.
Next, I prove Proposition 2, which is the key step toward the proof of Theorem 1. This proposition states that there are only three functional forms for the function f 1 and characterizes them. The idea of the proof is the following. According to Property 5 (Box 1), all single-marked modules that are mapped by the CGP onto a minimal module with the same topology ðR ; aÞ have the same functional form of f 1 . In other words, each minimal topology ðR ; aÞ specifies a parameteric family of the function f 1 . Since the number of possible minimal topologies is finite, the claim of Theorem 1 can be tested for each corresponding functional form of f 1 . However, the number of minimal topologies is rather large. Fortunately, another simplification is possible. Since the reaction set R of any minimal single-marked module is a subset of the complete reaction set R M , the fully connected topology ðR M ; aÞ specifies the largest parametric family of the function f 1 for the class M, such that all other families can be obtained from it by setting some parameters to zero, which is equivalent to removing reactions from the fully connected topology. In other words, all single-marked modules that belong to the topological class M are described by functions f 1 that belong to one parameteric family corresponding to the fully connected topology minimal in M. The three parameteric families of f 1 are characterized by Proposition 2.
One important consequence of Proposition 2 is that it is not necessary to test the claim of Theorem 1 for each family of f 1 that corresponds to each minimal topology. Instead, it is sufficient to test it for the three families that correspond to the fully connected minimal topologies in each class. Let ð; aÞ be a single-marked module, and let x be the rate constant of reaction a. Then y ¼ f 1 ðuÞ, where u ¼ þ a for some a ! 0, and the function f 1 is given by one of the following expressions.

Proof
Since any single-marked module ð; aÞ belongs to exactly one of three classes M b , M io and M io , I consider these three cases one by one. Case ð; aÞ 2 M b . According to the labeling conventions outlined above, a ¼ 1 $ 2 (see Figure 7). Equation 29 follows directly from Property #4 of the CGP (Box 1).
Case ð; aÞ 2 M io . According to the labeling conventions, a ¼ 1 $ 3 (see Figure 7). According to Property #1 of the CGP, module m can be coarse-grained in two stages, by first applying CG Anf3g which eliminates metabolites 4; . . . ; m (those that do not participate in the marked reaction) and then applying CG f3g which eliminates the remaining metabolite 3. Mathematically, After applying CG Anf3g , the resulting coarse-grained module 0 has a single internal metabolite 3 and at most three effective reactions 1 $ 2, 1 $ 3 and 2 $ 3 (Figure 7), that is, it is minimal in M io . By virtue of Properties #2 and #4 of the CGP, the effective rate constants w 12 , w 23 are independent of x and u w 13 ¼ þ a. Note that w 12 may equal zero, but w 23 6 ¼ 0 because 0 is a module. Regardless, the reaction set R 0 of module 0 is always a subset of the complete reaction set R M io . Thus, to obtain the effective rate constant y , I consider the most general case when 0 is fully connected and eliminate the remaining internal metabolite 3, which leads to Equation 30. Case ð; aÞ 2 M i . According to the labeling conventions, a ¼ 3 $ 4 (see Figure 7). Otherwise, the logic of the proof is exactly the same as for the case ð; aÞ 2 M io .
Next I prove Proposition 3 which states that, for any double-marked module that belongs to a given topological class, there exists a double-marked module that is minimal in the same class, such that both modules have the same function f 2 . The corresponding minimal module is obtained from the original module by applying the CGP. This proposition is important because it implies that all functions f 2 can be completely characterized by only examing minimal modules. Then, analogously to single-marked modules, Corollary 3 states that function f 2 can belong to one of nine parameteric families which are defined by the fully connected minimal topologies in each topological class.

Proposition 3
Let ð; a; bÞ be a double-marked module that belongs to the topological class M, and let x and h be the rate constants of reactions a and b, respectively. Then there exist non-negative constants a and b and a double-marked module ð 0 ; a; bÞ minimal in M such that y ¼ y are the rate constants of the marked reactions a and b in 0 , respectively, and all other rate constants in 0 are independent of x or h. Module 0 is obtained from m by the coarse-graining procedure CG nfa;bg that eliminates all metabolites internal to module m that do not participate in reactions a or b.

Proof
To prove this proposition, I will construct the double-marked module ð 0 ; a; bÞ minimal in M by applying CG nfa;bg . Let m M be the mimimal number of internal metabolites in class M (see Table 1). According to the metabolite labeling conventions, metabolites n M þ 3; n M þ 4; . . . are neither I/O nor do they participate in the marked reactions. CG nfa;bg eliminates all these metabolites and maps module m onto module 0 , all of whose internal metabolites participate in reactions a and/or b. Therefore, ð 0 ; a; bÞ is minimal in class M (Figure 8). According to Properties #2 and #4 of the CGP (Box 1), the effective rate constants u and v of reactions a and b in module 0 are given by linear relationships in Equation 32 and Equation 33, and the remaining effective rate constants are independent of x and h. The fact that y ¼ y 0 follows from Property #1 of the CGP, CG ¼ CG nfa;bg CG A 0 .

Corollary 3
Let ð; a; bÞ be a double-marked module, and let x and h be the rate constants of reactions a and b, respectively. The function f 2 that maps x and h onto module's effective rate constant y belongs to one of nine parametric families. If ð; a; bÞ 2 M b;io;IO , then If ð; a; bÞ 2 M b;i;; , then If ð; a; bÞ 2 M io;io;I , then If ð; a; bÞ 2 M io;io;IO , then If ð; a; bÞ 2 M io;io;; , then f 2 ðu; vÞ ¼ w 12 þ D 3 w 14 v=K 24 þ D 4 u w 32 þ u w 34 v=K 24 þ w 14 w 43 w 32 D 3 D 4 À w 34 w 43 ; If ð; a; bÞ 2 M io;i;I , then If ð; a; bÞ 2 M i;i;I , then If ð; a; bÞ 2 M i;i;; , then

Proof
This statement and Equation 34 through Equation 35 follow directly from Proposition 3 and the fact that the reaction set of any double-marked module in any given topological class is a subset of the complete reaction set in that topological class.

Derivation of Equation 6 and Equation 9
Consider a higher-level phenotype y, such as the effective activity of a module, which is function of a multivariate lower-level phenotypex ¼ ðx 1 ; x 2 ; . . . ; x n Þ, such as the rates of individual reactions within the module, y ¼ Fx ð Þ. Denote the wildtype values of the phenotypes asx 0 ¼ ðx 0 1 ; x 0 2 ; . . . ; x 0 n Þ and y 0 ¼ Fx 0 ð Þ. Consider a mutation that perturbes these values, so that the mutant has lower-level phe- The relative effect of the mutation on phenotype x i is If all dx i k k ( 1 wherex k k denotes the length of vectorx, then the value of the higherlevel phenotype y 0 in the mutant is given by where which I refer to as first-and second-order control coefficients of the lower-level phenotypes x i and x j with respect to the higher-level phenotype y. Now consider two single mutants, A and B, and the double-mutant AB. Each mutation A and B and their combination may perturb all x i phenotypes such that Assuming that d Ax ( 1, d Bx ( 1 and d ABx ( 1, using the approximation in Equation 44 and the definition of "x i (analogous to Equation 5), I obtain where oð1Þ refers to terms that are vanishingly small as . . . n. I examine two special cases of Equation 49. The first special case is when both mutations affect a single phenotype x k , that is, when all d A x i ¼ 0 and all d B x i ¼ 0 except for i ¼ k.
Equation 52 is equivalent to Equation 6. The second special case is when mutation A affects a single phenotypes x k and mutation B affects a single phenotype Equation 55 is equivalent to Equation 9.

Calculation of epistasis in simple modules
Equation 52 and Equation 55 allow me to compute how epistasis propagates and emerges in arbitrary metabolic networks. In this section, I show how to implement these calculations for three simple metabolic modules considered above and in module n shown in Figure 3.

Linear pathway
First, consider how epistasis propagates through a linear pathway ( Figure 6A). For simplicity, assume that both mutations A and B affect the same reaction 1 $ 2. It follows from Equation 26 that where a is a positive constant. Therefore, the first-and second-order control coefficients of reaction 1 $ 2 with respect to the flux through the linear pathway m are given by Substituting these expressions into the expression for the fixed point " ¼ ÀH 2C 1 À C ð Þ ð Þ À1 , I find that " ¼ 1, irrespectively of the rates of other reactions in the linear pathway. This implies that epistasis "<1 at the level of reaction 1 $ 2 would induce a lower value of epistasis "y <"<1 at the level of the entire linear pathway, any value ">1 would induce a higher value of epistasis "y >">1, and " ¼ 1 would induce "y ¼ 1.
Now consider emergence of epistasis in a linear pathway. Suppose that mutation A affects reaction 1 $ 2 and mutation B affects reaction 2 $ 3. Denote the rate constant of reactions 1 $ 2 and 2 $ 3 by x 12 and h x 23 , respectively. It follows from Equation 26 that where b is a positive constant. Therefore, which, after substituting into Equation 9, yield "y ¼ 1. Together with the fact that " ¼ 1 (see above), this result implies that epistasis coefficient between any two mutations that affect different reactions in a linear pathway equals 1.

Two parallel pathways
Suppose that mutation A affects reaction 1 $ 3 and mutation B affects reaction 1 $ 4 in the linear metabolic pathway shown in Figure 6B. Denote the rate constants of reaction 1 $ 3 and 1 $ 4 by x 13 and h x 14 . It follows from Equation 27 that where a ¼ 1=ðK 13 x 32 Þ and b ¼ 1=ðK 14 x 42 Þ. Thus, we have H h ¼ 0, and there is no epistasis between such mutations. Figure 3A I denote the rate of the reactions affected by mutations A and B by ¼ x 13 and h ¼ x 42 , and I also denote z ¼ x 34 . I will calculate the epistasis coefficient "y n in two stages, by first calculating the epistasis coefficient "y and then propagating it to "y n using Equation 6. Here I am specifically interested in how "y n depends on the rate constant z.

Module n in
To compute epistasis between mutations A and B at the level of module m, I rewrite Equation 28 as . I obtain the following expressions for the first-and second-order control coefficients.
. To obtain the expression for "y n , I coarse-grain module n by eliminating the only remaining internal metabolite 2 and obtain y n ¼ x 15 þ y x 25 y =K 12 þ x 25 : I then apply equation Equation 6 with Figure 3B illustrates how "y n changes as a function of z. It was generated using the following matrix of rate constants:x Next, I consider thress special cases of the toy network depicted in Figure 3A that relate this network to those in Figure 3C and D.
1. z ¼ 0. According to Equation 61, "y ¼ 0 and hence "y n ¼ H=2C 2 0, with C and H given by Equation 62 and Equation 63. It is easy to see that in this case the reactions 1 $ 3 and 2 $ 4 that are affected by mutations are strictly parallel because there is a simple path 1 $ 3 $ 2 $ 5 that contains only reaction 1 $ 3 and there is a simple path 1 $ 4 $ 2 $ 5 that contains only reaction 2 $ 4 ( Figure 3C). 2. x 15 ¼ x 32 ¼ x 14 ¼ 0. In this case, module n is a linear pathway. Therefore, "y n ¼ 1 independently of z, as shown above. This fact can also be obtained directly from Equation 61, Equation 62, Equation 63 and Equation 6. 3. z ! ¥. In this case, module m becomes an effectively linear pathway because most of the metabolic flux between the I/O metabolites 1 and 2 passes through reaction 3 $ 4. Thus, it follows from Equation 61 that "y !ã=c 1c2 ð Þ ¼ 1, as expected. Then, according to Theorem 1, "y n ! 1.

Proof of Theorem 1
As discussed above, the key step toward the proof is Proposition 2, which states that the function f 1 belongs to one of three parameteric families, given by Equation 29

Proof of Theorem 1
Let a be the effective reaction within higher-level module n that represents the lower-level module m. To simplify notations, I denote y . According to Proposition 2, the functional from of f 1 depends only on the topological class of the single-marked module ðn; aÞ. So, I consider the three classes one by one.
Case ðn; aÞ 2 M io . From Equation 30, where D ¼ ð þ aÞ=K 13 þ w 32 . From Equation 64, it is clear that C ! 0. Re-writing Equation 64 as it is also clear that C 1 since both ratios in this expression do not exceed 1. From Equation 65 and the fact that 0 C 1, it follows that " ! 0. To show that " 1, note that Next, it is straightforward to show thatÃD ÀBC ¼ w 41 w 32 À w 31 w 42 ð Þ 2 =K 31 ! 0, which implies that C ! 0. To show that C 1, I expand the denominator in Equation 66 and obtain Therefore, numerator in Equation 66 cannot exceed the denominator. The fact that " ! 0 follows directly from Equation 67 together with C 1. To show that " 1, first note that Therefore, D ð1 À CÞ ¼C þC a þD 1 ÀÃ =D y n þ DBC y n !C : Hence, " ¼C Dð1 À CÞ

1:
Therefore, inequalities in Equation 7 and Equation 8 hold in this case as well, which completes the proof.

Proof of Theorem 2
Proving Theorem 2 involves several auxiliary steps. First, I note that any two reactions a and b within any module m can be either strictly serial, strictly parallel or serial-parallel. Then, Proposition 4 and its Corollary 4 establish that strictly parallel (serial) reactions in ð; a; bÞ are also strictly parallel (serial) in a minimal module ð 0 ; a; bÞ, which is obtained from m by eliminating all metabolites that do not participate in the marked reactions. Next, recall that in both modules ð; a; bÞ and ð 0 ; a; bÞ the same function f 2 maps the rate constants of two marked reactions onto module's effective rate constant (Proposition 3). Since the epistasis coefficient depends only on the shape of this function, we only need to consider all minimal modules in order to understand what kinds of epistasis may arise between mutations affecting strictly serial and strictly parallel reactions in any module. This is a big simplification because the number of different minimal topologies is finite and the parameteric families of function f 2 are known for all of them (see Corollary 3). As a consequence, to prove Theorem 2, we could in principle list all of the minimal topologies, identify those where the marked reactions are strictly serial or strictly parallel and evalulate the epistasis coefficient using Equation 9 for every respective function f 2 .
Unfortunately, the number of minimal topologies is very large, so that such brute-force approach would be quite cumbersome. I take a less cumbersome approach which is based on the realization that a strictly serial or strictly parallel relationship between two reactions cannot be altered by removing a third reaction from the module (Proposition 5). This implies that every minimal topology where the two reactions are strictly serial can be produced from another, more connected, 'generating' topology by removing some reactions; and similarly for minimal modules where the reactions are strictly parallel (Proposition 6). All generating topologies can be identified by a simple algorithm given in Appendix 3.
Finally, I prove Theorem 2 in three steps. First, Proposition 7 shows that "y 0 for any minimal module m with any strictly parallel generating topology. Second, Proposition 8 shows that that "y ! 1 for any minimal module m with any strictly serial generating topology. Third, the proof of Theorem 2 formally extends this argument to all modules with strictly serial and strictly parallel reactions.

Topological relationships between reactions within a module
Consider module m with the I/O metabolites 1 and 2. As described above, a simple path connecting two metabolites i and j within module m is denoted by If such path contains reactions a 1 ; a 2 ; . . . and does not contain reactions b 1 ; b 2 ; . . ., I denote it as p ij ða 1 ; a 2 ; . . . ; b 1 ; b 2 ; . . .Þ. I denote the set of all paths p ij ða 1 ; a 2 ; . . . ; b 1 ; b 2 ; . . .Þ by P ij ða 1 ; a 2 ; . . . ; b 1 ; b 2 ; . . .Þ. According to Lemma 1 proven in Appendix 2, any reaction in the module belongs to at least one simple path within module m that connects the I/O metabolites. Mathematically, P 12 ðaÞ 6 ¼ ; for any reaction a 2 R . Thus, we can define different topological relationships between any two reactions within a module based on the existence of various paths that do or do not contain them. Consequently, we can classify any double-marked module ð; a; bÞ based on the toplogocial relationship between its marked reactions. This classification is orthogonal to that given in Table 1.
Two reactions a 2 R and b 2 R are called parallel within module m if there exists a simple path p 12 ða; bÞ and a simple path p 12 ðb; aÞ. Two reactions a 2 R and b 2 R are called serial within module m if there exist at least one simple path p 12 ða; bÞ. Two reactions are called strictly parallel within module m if they are parallel but not serial, they are called strictly serial within module m if they are serial but not parallel, and they are called serial-parallel within module m if they are both serial and parallel. It is straightforward to show that there are no other logical possibilities for any two reactions to be anything other than strictly serial, strictly parallel or serial-parallel. This implies that two reactions are strictly parallel if they are not serial, and they are strictly serial if they are not parallel. If reactions a and b are serial, parallel, strictly serial, strictly parallel or serial-parallel within module m, I also refer to the double-marked module ð; a; bÞ as serial, parallel, etc. Since the relationship between reactions depends only on the topology of the module, but not on its rate constants, I also refer to the topology ðR ; a; bÞ as serial, parallel, etc.
Recall that coarse-graining procedure CG nfa;bg eliminates all metabolites internal to module m other than those participating in reactions a and b. If the double-marked module ð; a; bÞ belongs to the topological class M, then, according to Proposition 3, CG nfa;bg maps ð; a; bÞ onto a minimal double-marked module ð 0 ; a; bÞ from the same class M. The following proposition, which is easy to prove using Property #9 of the CGP (see Box 1), establishes how this procedure alters the topological relationship between reactions a and b.

Proposition 4
Let ð; a; bÞ be a double-marked module from the topological class M ( Table 1)  Note that the converse of the second claim in Proposition 4 is not true. In other words, if two reactions a and b are parallel in ð; a; bÞ, they may not be parallel in ð 0 ; a; bÞ. Figure 9 shows a counter-example illustrating this.

Corollary 4
1. If reactions a and b are strictly serial in ð; a; bÞ, they are also strictly serial in ð 0 ; a; bÞ.
2. If reactions a and b are strictly parallel in ð; a; bÞ, they are also strictly parallel in ð 0 ; a; bÞ. 3. If reactions a and b are serial-parallel in ð; a; bÞ, they are either strictly serial or serial-parallel in ð 0 ; a; bÞ.
Corollary 4 is an important step toward proving Theorem 2. According to Proposition 3, the function that maps the rate constants x and h of the reactions a and b in module m onto the effective rate constant y is the same function that maps the rate constants u and v of these reactions in the minimal module 0 onto the effective rate constant y 0 . It then immediately follows from Equation 9 that the epistasis coefficient "y between mutations affecting reactions a and b in the original module m is the same as the epistasis coefficient "y 0 in the minimal module 0 . Now, if the reactions a and b are strictly parallel in ð; a; bÞ, then, according to Corollary 4, these reactions are also strictly parallel in ð 0 ; a; bÞ. Hence, to demonstrate that "y 0 for any such double-marked module ð; a; bÞ, it is sufficient to show that "y 0 0 for all double-marked modules ð 0 ; a; bÞ that are minimal in M and where the reactions a and b are strictly parallel. And similarly for the strictly serial reactions.
According to this logic, Theorem 2 can be proven by identifying all double-marked modules that are minimal in each of the topological classes listed in Table 1 and where the marked reactions are strictly parallel, evaluating epistasis for all of them, and showing that it is non-positive, irrespectively of the rate constants of any reactions, and similarly for the strictly serial reactions.

Generating topologies
Since the number of topologically distinct minimal double-marked modules is finite, the approach outlined above is in principle feasible. Unfortunately, the number of topologies to be considered is very large, so in practice it is very cumbersome. To avoid this complication, I take an alternative approach that is based on the same key idea as the proof of Theorem 1. Rather than considering one by one, each minimal topology where the marked reactions are strictly serial or strictly parallel (and the corresponding parametric families of f 2 ), the idea is to identify the most connected minimal topologies (and the corresponding largest parametric families of f 2 ) such that all the other minimal topologies with the strictly serial or strictly parallel reactions (and the corresponding parametric families) can be obtained from them by removing reactions (i.e. setting some parameters to zero).
This idea can be implemented using the following observations. If the two marked reactions are strictly parallel or strictly serial in a minimal module, then removing a third reaction from it does not change this relationship. This statement is proven in Proposition 5. As a consequence, all minimal modules in the topological classes M b;io;IO , M b;i;; and M io;io;IO must be strictly parallel because the fully connected minimal topologies are strictly parallel (Figure 8). Similarly, all minimal modules in the topological class M io;io;I must be strictly serial because the fully connected minimal topology is strictly serial (Figure 8). The fully connected minimal topologies in all other topological classes are serial-parallel. If the two reactions are serial-parallel, removing a third reaction can change their relationship into a strictly serial or strictly parallel one, depending on which reaction is removed, as shown for example in Figure 3A,C and D. In fact, by removing reactions from the fully connected minimal modules shown in Figure 8, it is easy to show that the topological classes M io;io;; , M io;i;I , M io;i;; , M i;i;I , M i;i;; contain both strictly serial and strictly parallel modules.
These observations suggests that adding reactions to a minimal module where the marked reactions are strictly serial or strictly parallel will either change their relationship into serial-parallel or will preserve their relationship until the minimal module is fully connected. Therefore, among all minimal modules in a topological class, there must exist the most connected modules where the marked reactions are strictly parallel or strictly serial, such that all other less connected strictly serial or strictly parallel modules can be produced from the most connected ones by removal of reactions. This statement is proven in Proposition 6. Such most connected strictly parallel and strictly serial minimal topologies, which I refer to as 'generating', are listed in Table 2 and Table 3. They define the largest parameteric familes of functions f 2 which I then examine for the value of "y .

Proposition 5
Let ð; a; bÞ and ð 0 ; a; bÞ be two minimal double-marked modules from the same topological class whose sets of reactions are R and R 0 , respectively, and R 0 ¼ R n fcg where c 2 R n fa; bg.
1. If reactions a and b are strictly parallel in ð; a; bÞ, they are also strictly parallel in ð 0 ; a; bÞ.

Definition 3
Topology ðR; a; bÞ minimal in M is called a strictly parallel generating topology in M if it is strictly parallel (i.e. ðR; a; bÞ 2 R par M ) and either it is fully connected (i.e. R ¼ R M ) or ðR [ fcg; a; bÞ 2 R sp M for any reaction c 2 R M n R.
Clearly, a topological class M may have multiple generating topologies, and it is easy to show that every topological class has at least one generating topology. I denote the set of all strictly serial generating topologies for the class M by G ser M and I denote the set of all strictly parallel generating topologies for class M by G par M . The following proposition justifies the name 'generating topology'. It states that any strictly serial minimal topology can be produced from some strictly serial generating topology by removing one or multiple reactions, and similarly for any strictly parallel minimal topology.
io,i,ø,F . Fully connected topology io; i; ;; F is shown for reference (same as in Figure 8).
I provide an algorithm that discovers all strictly serial and strictly parallel generating topologies by sequentially removing reactions from the fully connected minimal topology in each topological class. The code implementing this algorithm in Matlab is available at https://github.com/skryazhi/epista-sis_theory. All strictly parallel generating topologies are listed in Table 2 and all strictly serial generating topologies are listed in Table 3. They are also illustrated in Figure 8 and Figure 10 through Figure 14. I label each generating topology by a four-letter combination (see column 4 in Table 2 and Table 3): the first three letters denote the topological class and the last letter (F, P, or S) denotes the specific generating topology within that class. Letter 'F' (stands for 'full') denotes the fact that the reaction set in the generating topology is complete. Letters 'P' (for 'parallel') and 'S' (stands for 'serial') denote strictly parallel and strictly serial generating topologies, respectively; if there are a multiple generating topologies within the same class, they are distinguished by subindices, for example, io; i; ;; P 1 ; io; i; ;; P 2 , etc.

Topological relationship between reactions and epistasis
Each strictly serial and strictly parallel generating topology in a given class M (listed in Table 2 and Table 3) is produced by removing reactions from the fully connected topology minimal in M (shown in Figure 8). This implies that the parametric family of function f 2 that corresponds to any generating topology is obtained from Equation 34 through Equation 35 by setting some parameters w ij to zero. In other words, these parametric families are known. Next, I prove Proposition 7 that shows that "y 0 for every member of every parameteric family of f 2 that corresponds to a strictly parallel generating topology and the analogous Proposition 8 for strictly serial topologies. Now, any minimal strictly parallel topology can in turn be produced by removing reactions from some strictly parallel generating topology, and any minimal strictly serial topology can be produced by removing reactions from some strictly serial generating topology. This implies that any function f 2 that corresponds to any strictly parallel minimal topology belongs to the parametric family that corresponds to some strictly parallel generating topology; and any function f 2 that corresponds to any strictly serial minimal topology belongs to the parametric family that corresponds to some strictly serial generating topology. Therefore, Propositions 7 and 8 imply that "y 0 for any minimal strictly parallel topology and that "y ! 1 for any minimal strictly serial topology. The proof of Theorem 2 is then concluded by recalling that every strictly parallel module is mapped onto its effective rate constant via function f 2 that corresponds to some minimal strictly parallel module, and similarly for strictly serial modules. Generating topology io; i; ;; P 3 ( Figure 12). Notice that metabolites 4 and 5 together with reactions a, b, 1 $ 4, 1 $ 5, 3 $ 4 and 3 $ 5 form a double-marked module ð 0 ; a; bÞ whose I/O metabolites are 1 and 3 and which is minimal in the topological calss M b;i;; . The rest of the proof for this topology is analogous to that for the topology io; i; I; P.
Generating topology i; i; I; P 1 ( Figure 13). Notice that metabolites 4 and 5 together with reactions a, b, 1 $ 3, 1 $ 4, 1 $ 5 and 4 $ 5 form a double-marked module ð 0 ; a; bÞ whose I/O metabolites are 1 and 3 and which is minimal in the topological calss M io;io;IO . The rest of the proof for this topology is analogous to that for the topology io; i; I; P.
Generating topology i; i; I; P 2 ( Figure 13). Notice that metabolite 5 together with reactions a, b, and 4 $ 5 form a double-marked module ð 0 ; a; bÞ whose I/O metabolites are 3 and 4 and which is minimal in the topological calss M b;io;IO . The rest of the proof for this topology is analogous to that for the topology io; i; I; P.
Generating topology i; i; ;; P 2 ( Figure 14). Notice that metabolites 5 and 6 together with reactions a, b, 3 $ 5, 3 $ 6, 4 $ 5 and 5 $ 6 form a double-marked module ð 0 ; a; bÞ whose I/O metabolites are 3 and 4 and which is minimal in the topological calss M b;i;; . The rest of the proof for this topology is analogous to that for the topology io; i; I; P.
Generating topology i; i; ;; P 3 ( Figure 14). Module m can be coarse-grained by first eliminating metabolites 4 and 6, which will result in a double-marked module ð 0 ; a 0 ; b 0 Þ that is minimal in the topological class M io;io;IO . The rest of the proof for this topology is analogous to that for the topology io; i; ;; P 1 .
Generating topology i; i; ;; P 4 ( Figure 14). Module m can be coarse-grained by first eliminating metabolite 4, which will result in a double-marked module ð 0 ; a 0 ; bÞ that is minimal in the topological class M io;i;; with a strictly parallel generating topology io; i; ;; P 3 . The rest of the proof for this topology is analogous to that for the topology io; i; ;; P 1 .
Generating topology i; i; ;; P 5 ( Figure 14). Module m can be coarse-grained by first eliminating metabolite 6, which will result in a double-marked module ð 0 ; a; b 0 Þ that is minimal in the topological i,i,ø,F . Fully connected topology i; i; ;; F is shown for reference (same as in Figure 8).
class M i;i;I with a strictly parallel generating topology i; i; I; P 1 . The rest of the proof for this topology is analogous to that for the topology io; i; ;; P 1 . Generating topology i; i; ;; P 6 ( Figure 14). Notice that metabolites 4 and 6 together with reactions a, b, 3 $ 5, 3 $ 6, 4 $ 5 form a double-marked module ð 0 ; a; bÞ whose I/O metabolites are 3 and 5 and which is minimal in the topological calss M io;io;; . The rest of the proof for this topology is analogous to that for the topology io; i; I; P.
Generating topology i; i; ;; P 7 ( Figure 14). Using Equation 42, I show in Appendix 4 that where A v , B v , E u , F u are all non-negative constants and b 0.

Proposition 8
Let ð; a; bÞ be a minimal double-marked module in the topological class M, with u and v being the rates of reactions a and b, respectively, and let y be the effective rate constant of this module. Suppose that mutation A perturbs only reaction a by d A u, and mutation B perturbs only reaction b by If reactions a and b are strictly serial in ð; a; bÞ, then "y ! 1.

Proof
The logic of the proof is the same as for Proposition 7, that is, it is sufficient to show that "y ! 1 for any double-marked module ð; a; bÞ with any strictly serial generating topology listed in Table 3. Generating topology io; io; I; F ( Figure 8). According to Equation 36, where D ¼ u=K 13 þ v. Therefore, Substituting these expressions into Equation 68, I obtain "y ¼ y u v=D ! 1 because y ! u v=D according to Equation 73. Generating topology io; io; ;; S ( Figure 10). According to Property 1 of the CGP (Box 1), module m can be coarse-grained by first eliminating metabolite 3. In the resulting module 0 , mutation A perturbs only the rate constant u 0 of the effective reaction a 0 1 $ 4 (by Properties 2 and 4 of the CGP). Then, according to Equation 50 and Theorem 1, d A u 0 ( 1. The double-marked module ð 0 ; a 0 ; bÞ is minimal in the topological class M io;io;I which implies that "y ! 1, as shown above. Generating topology io; i; I; S 1 ( Figure 11). Notice that metabolite 3 together with reactions a, b, and 1 $ 4 form a double-marked module ð 0 ; a; bÞ whose I/O metabolites are 1 and 4 and which is minimal in the topological calss M io;io;I . Therefore, if the effective reaction rate of module 0 is y 0 , "y 0 ! 1, as shown above. According to Equation 50, Equation 51 and Theorem 1, d A y 0 ( 1, d B y 0 ( 1. Since module 0 is contained in m, by Theorem 1, "y ! 1. Generating topology io; i; I; S 2 ( Figure 11). Module m can be coarse-grained by first eliminating metabolite 4, which results in a double-marked module ð 0 ; a; b 0 Þ that is minimal in the topological class M io;io;I . The rest of the proof for this topology is analogous to that for the topology io; io; ;; S.
Generating topology io; i; ;; S 1 ( Figure 12). Module m can be coarse-grained by first eliminating metabolites 4 and 5, which results in a double-marked module ð 0 ; a; b 0 Þ that is minimal in the topological class M io;io;I . The rest of the proof for this topology is analogous to that for the topology io; io; ;; S.
Generating topology io; i; ;; S 2 ( Figure 12). Notice that metabolites 3 and 4 together with reactions a, b, 1 $ 4, 1 $ 5 and 3 $ 4 form a double-marked module ð 0 ; a; bÞ whose I/O metabolites are 1 and 5 and which is minimal in the topological calss M io;io;; with the strictly serial generating topology io; io; ;; S. The rest of the proof for this topology is analogous to that for the topology io; i; I; S 1 .
Generating topology io; i; ;; S 3 ( Figure 12). Notice that metabolites 3 and 5 together with reactions a, b, 1 $ 4, 3 $ 4, and 3 $ 5 form a double-marked module ð 0 ; a; bÞ whose I/O metabolites are 1 and 4 and which is minimal in the topological calss M io;io;; with the strictly serial generating topology io; io; ;; S. The rest of the proof for this topology is analogous to that for the topology io; i; I; S 1 .
Generating topology i; i; I; S 1 ( Figure 13). Notice that metabolite 3 together with reactions a, b, and 4 $ 5 form a double-marked module ð 0 ; a; bÞ whose I/O metabolites are 4 and 5 and which is minimal in the topological calss M io;io;I . The rest of the proof for this topology is analogous to that for the topology io; i; I; S 1 .
Generating topology i; i; I; S 2 ( Figure 13). Notice that metabolites 3 and 5 together with reactions a, b, 1 $ 3, 1 $ 4, and 1 $ 5 form a double-marked module ð 0 ; a; bÞ whose I/O metabolites are 1 and 4 and which is minimal in the topological calss M io;i;I with the strictly serial generating topology io; i; I; S 2 . The rest of the proof for this topology is analogous to that for the topology io; i; I; S 1 .
Generating topology i; i; ;; S 1 ( Figure 14). Module m can be coarse-grained by first eliminating metabolites 5 and 6, which results in a double-marked module ð 0 ; a; b 0 Þ that is minimal in the topological class M io;i;I with the strictly serial generating topology io; i; I; S 1 . The rest of the proof for this topology is analogous to that for the topology io; io; ;; S.
Generating topology i; i; ;; S 2 ( Figure 14). Module m can be coarse-grained by first eliminating metabolite 6, which results in a double-marked module ð 0 ; a; b 0 Þ that is minimal in the topological class M i;i;I with the strictly serial generating topology i; i; I; S 1 . The rest of the proof for this topology is analogous to that for the topology io; io; ;; S.

Proof of Theorem 2
According to Proposition 3, the coarse-graining procedure CG nfa;bg maps the double-marked module ð; a; bÞ onto a double-marked module ð 0 ; a; bÞ that is minimal in the same topological class as ð; a; bÞ, and the rates u, v of reactions a, b in 0 are given by linear relations in Equation 32 and Equation 33. Clearly, d A u ( 1 and d B v ( 1. Furthermore, none of the other reaction rates w ij in 0 depend on x or h, so that d A w ij ¼ 0 and d B w ij ¼ 0 for all w ij other than u and v, and "w ij ¼ 0 for all w ij including u and v. It then follows from Proposition 3 that "y ¼ "y 0 . Now, according to Corollary 4, if reactions a and b are strictly parallel in ð; a; bÞ, they are also strictly parallel in ð 0 ; a; bÞ. Therefore, by Proposition 7, "y 0 0. Analogously, if reactions a and b are strictly serial in ð; a; bÞ, they are also strictly serial in ð 0 ; a; bÞ. Therefore, by Proposition 8, "y 0 ! 1.

Sensitivity of Theorem 1 and Theorem 2 with respect to the magnitude of mutational effects
According to Proposition 2, function f 1 for any module belongs to one of three parametric families, which correspond to the three minimal fully connected modules shown in Figure 7. As mentioned in the Results, for modules in the class M b , function f 1 is linear, so that the claims of Theorem 1 continue to hold for mutations with finite effects. To evaluate the sensitivity of Theorem 1 with respect to the effect sizes of mutations for the topological classes M io and M i , I generated 1000 minimal single-marked modules n from each of these topological classes with random parameters. Evaluating only minimal modules is sufficient because for any module from a given topological class there exists a minimal module from the same class, such that both of them map the lower level phenotype y onto the higher-level phenotype y n via the same function f 1 (see Proposition 2).
To this end, I drew each x ij (i<j) from a mixture of a point measure at 0 (with weight 0.25) and an exponential distribution with mean 1 (with weight 0.75). The point measure at 0 ensures that minimal modules that are not fully connected are represented in the sample. I drew each K ij (i<j) as a ratio of two random numbers from an exponential distribution with mean 1. As a result, the distribution of non-zero x ij values had the interdecile range of ð5:7 Â 10 À2 ; 3:91Þ with median 0.65. I denote the effective rate constant of the reaction that represents the lower-level module m by y . In modules from the topological class M io , it is reaction 1 $ 3 and in modules from the topo- the resulting values d A y n , d B y n and "y n at the level of the effective rate constant y n of the higher-level module n.
Using these data, I inferred the function f that maps lower-level epistasis " onto higher-level epistasis "y n , as follows. For any minimal single-marked module from the topological classes M io or M i , the effective rate constant y n can be written as Þðx 41 þ x 42 Þ for modules from the topological class M i (see Equation 31). Therefore, for any perturbation d, we have Since d AB is a linear function of ", d AB y n is a hyperbolic function of ". Therefore, "y n is also a hyperbolic function of ", where constants a, b and c depend on the parameters of module n and on the mutational effect sizes d A and d B . I numerically calculated these parameters for each sampled module and each pair of mutational effects.
The main results of Theorem 1 are that, when the effects of mutations are infinitesimal, the map f has a fixed point ", this fixed point is located between 0 and 1, and it is unstable. I use equation Equation 74 to test whether these statements also hold when the effects of mutations are finite. Specifically, it is easy to see that the map f has a fixed point " if the discriminant d ¼ ða À cÞ 2 À 4 ðb À acÞ is positive. In this case, I designate " as the one of two roots 1=2 a À c AE ffiffiffi d p À Á that is closer to zero. I then check whether this fixed point is located between 0 and 1. I check whether it is unstable by comparing the derivative of f at " with 1. According to Proposition 6, function f 2 for any module where the reactions affected by mutations are strictly parallel belongs to one of 17 parameteric families, which correspond to the strictly parallel generating topologies listed in Table 2. And similarly, function f 2 for any module where the reactions affected by mutations are strictly serial belongs to one of 11 parameteric families, which correspond to the strictly serial generating topologies listed in Table 3. Therefore, to evaluate the sensitivity of Theorem 2 with respect to the effect sizes of mutations I generated 10 4 double-marked modules ð; a; bÞ with each of the strictly serial and strictly parallel topologies with random parameters. I drew x ij and K ij as described above. I chose the same nine pairs of mutational effects ðd A ; d B hÞ as above, where x and h are the rate constants of reactions affected by mutations A and B: ðÀ0:01; À0:01Þ, ðÀ0:1; À0:1Þ, ðÀ0:5; À0:5Þ, ð0:01; 0:01Þ, ð0:1; 0:1Þ, ð0:5; 0:5Þ, ðÀ0:01; 0:01Þ, ðÀ0:1; 0:1Þ, ðÀ0:5; 0:5Þ.
I found that, for some modules, individual mutational perturbations d A y and/or d B y at the level of the whole module were too small, which resulted in numerical instabilities. To avoid them, I calculated epistasis "y only for cases where the effects of both mutations d A y and d B y exceeded the precision threshold of 10 À5 . As a result, I evaluated epistasis in less than 10 4 modules per generating topology and pair of mutational effects, but this number never fell below 1000. When comparing the values of epistasis with 0 and 1, I used the same precision threshold of 10 À5 to avoid numerical problems. In addition, I found that for mutations affecting strictly serial reactions there is a substantial fraction of modules where "y falls between 0.99 and 1 (see Figure 4-figure supplement 3). This is not a numerical artifact, but probably reflects real clustering of epistasis coefficients around 1, which is expected for the linear pathway irrespective of its parameters (see above).
The Matlab code for this analysis is available at https://github.com/skryazhi/epistasis_theory.  Table V of Chassagnole et al., 2002 and (b) removing dilution by growth. I then created four models of sub-modules of glycolysis by retaining the subsets of metabolites and enzymes shown in Figure 5-figure supplement 1 and Table 4 and removing other metabolites and enzymes. Each sub-module has one input and one output metabolite. Note that, since some reactions are irreversible, it is important to distinguish the input metabolite from the output metabolite. The concentrations of the input and the output metabolites in each model are held constant at their steady-state values given in Table 4. I defined the flux through the sub-module as the flux toward the output metabolite contributed by the sub-module (Table 4). This flux is the equivalent of the quantitative phenotype y of a module in the analytical model. In addition, I made the following modifications specific to individual sub-modules.

Kinetic model of glycolysis
1. In the FULL model, the stoichiometry of the PTS reaction was changed to ½Extglu þ ½pep $ ½g6p þ ½pyr and the value of the constant K PTS;a1 was set to 0.02 mM, based on the values found in the literature (Stock et al., 1982;Natarajan and Srienc, 1999). 2. In all models other than FULL, the extracellular compartment was deleted. 3. In all models, the concentrations of the I/O metabolites were set to values shown in Table 4, which are the steady-state concentrations achieved in the FULL model with the concentration of extracellular glucose being 2 mM and pyruvate concentration being 10 mM. Table 4. Definition of modules in the glycolysis network shown in Figure 5-figure supplement 1. Enzyme abbreviations are listed in Table 6. Metabolite abbreviations are listed in Table 5. Calculation of flux control coefficients and epistasis coefficients I calculate the first-and second-order flux control coefficients (FCC) C i and H ij for flux J with respect to reactions i and j as follows (see Equation 45 and Equation 46). I perturb the r max;i of reaction i by factor between 0.75 and 1.25 (10 values in a uniformly-spaced grid), such that dr max;i 2 ½À0:25; 0:25. Then, I obtain the steady-state flux J 0 in each perturbed model and calculate the flux perturbations dJ ¼ J 0 =J 0 À 1, where J 0 is the corresponding flux in the unperturbed model. Then, to obtain C i and H ii , I fit the linear model  by least squares. If the estimated value of C i was below 10 À4 for a given sub-module, I set C i to zero and exclude this reaction from further consideration in that sub-module because it does not affect flux to the degree that is accurately measurable. If the estimated value of H ii is below 10 À4 , I set H ii to zero.
To calculate the non-diagonal second-order control coefficients H ij , I create a 4 Â 4 grid of perturbations of dr max;i and dr max;j and calculate the resulting flux perturbations dJ (16 perturbations total). Since C i , C j , H ii and H jj are known, I obtain H ij , by regressing dJ À C i dr max;i À Á þ H ii 2 dr max;i À Á 2 À C j dr max;j À Á þ H jj 2 dr max;j À Á 2 against dr max;i À Á dr max;j À Á : If the estimated value of H ij is below 10 À4 , I set H ij to zero. I estimate the epistasis coefficient "J between mutations affecting reactions i and j as "J ¼ H ij 2 C i C j :

Establishing the topological relationships between pairs of reactions
To establish the topological relationship (strictly serial, strictly parallel, or serial-parallel) between two reactions, I consider the smallest module (LG, UGPP, GPPP, or FULL) which contains both reactions. I then manually identify whether there exists a simple path connecting the input metabolite with the output metabolite for that module that passes through both reactions. (Note that, since some reactions are irreversible in this model, it is important to distinguish the input metabolite from the output metabolite). If such path does not exist, I classify the topological relationship between the two reactions as strictly parallel. If such path exists, I check if there are two paths connecting the input to the output metabolites such that each path contains only one of the two focal reactions. If such paths do not exist, I classify the topological relationship between the two reactions as strictly serial. Otherwise, I classify it as serial-parallel.
Recalling that E ¼ E 0 n fkg, I re-write Equation 75 as Since the first and third terms of Equation 77 and Equation 78 are identical, to complete the proof, it is sufficient to show that for any ' 2 E 0 , The path p 0 2i does not intersect the path p 1i because its first segment p 2' belongs to path p 2i and its second segment p 'j belongs to the segment p kj of path p 00 1i (and, as mentioned above, segment p kj does not intersect p 1i ). Thus, the statement holds for case (ii) as well. The only effective activity that depends on v is V 56 , and qV56 qv ¼ 1. Thus, isolating the term V 56 (and recalling that V 65 ¼ V 56 =K 56 ), we obtain the following expression for y, which is easy to differentiate with respect to v.
Also notice that, any module with the reaction set i; i; ;; P 7 is symmetric with respect to swapping metabolite labels 1 with 2, 3 with 5, and 4 with 6. It is easy to check that Equations (81) where We can now obtain the second derivative q 2 y qu qv , taking into account Equation 86. Alternatively, we can obtain q 2 y qu qv by differentiating qy qv with respect to u, taking into account Equation 87. The denominators in both expressions would be identical due to Equation 83. Therefore the expression for the second derivative must have the form given by Equation 72, that is, where b is independent of u and v. Thus, according to Equation 84, Equation 86, which is verified in Mathematica notebook (Supplementary file 1) and Mathematica notebook pdf (Supplementary file 2), p. 4. Similarly, according to Equation 85, Equation 87, which is verified in Mathematica notebook (Supplementary file 1) and Mathematica notebook pdf (Supplementary file 2), p. 5.
concentrations S 1 and S 2 of the external nutrient and the biomass. However, in the degenerate case where all uptake reactions are irreversible (i.e., K 1i ¼ ¥ for all i 2 A in ), J ¼ J in ¼ P i2Ain x 1i S 1 , such that it is independent of the internal structure of the module. In this sense, this special case of my model is equivalent to FBA. However, my model and FBA are not equivalent in terms of the distribution of internal fluxes. My model still produces a unique steady sate and the corresponding flux distribution, whereas FBA does not specify a unique flux distribution in this case without additional auxiliary conditions.