Structural conserved moiety splitting of a stoichiometric matrix

Characterising biochemical reaction network structure in mathematical terms enables the inference of functional biochemical consequences from network structure with existing techniques and spurs the development of new mathematics that exploits the peculiarities of biochemical network struc- ture. The structure of a biochemical network may be speciﬁed by reaction stoichiometry, that is, the relative quantities of each molecule produced and consumed in each reaction of the network. A biochemical network may also be speciﬁed at a higher level of resolution in terms of the internal structure of each molecule and how molecular structures are transformed by each reaction in a network. The stoichiometry for a set of reactions can be compiled into a stoichiometric matrix N ∈ Z m × n , where each row corresponds to a molecule and each column corresponds to a reaction. We demonstrate that a stoichiometric matrix may be split into the sum of m − rank ( N ) moiety transition matrices, each of which corresponds to a subnetwork accessible to a structurally identiﬁable conserved moiety. The existence of this moiety ma- trix splitting is a property that distinguishes a stoichiometric matrix from an arbitrary rectangular matrix.


Introduction
Understanding biochemical networks is of great practical importance in systems biology. A variety of approaches for mathematical modelling of reaction networks have been developed, including topological ( Barabási and Oltvai, 2004 ), stochastic, deterministic ( Ingalls, 2013 ) and constraint-based modelling ( Palsson, 2015 ). Before any biological application of any of these modelling approaches, an abstract representation of the relative quantities of molecules produced and consumed in each reaction of a reaction network is reconstructed from experimental literature. A key output of this reconstruction process is a stoichiometric matrix, where every row corresponds to a molecule, every column corresponds to a reaction, and each entry corresponds to the relative quantity of a molecule produced or consumed in a reac- Fig. 1. Constraint-based reconstruction and analysis. Constraint-Based Reconstruction and Analysis (COBRA) is an example of a systems biology approach, carried out in an iterative cycle, where the aim is to increase the predictive accuracy of a constraint-based computational model. Quality controlled reconstruction of prior literature information generates a draft model, which includes a stoichiometric matrix (upper left). This is followed by mathematical modelling using optimisation methodologies (upper right), enabling hypothesis generation in the form of model predictions (lower right). These predictions are testing against experimental data (lower left) and any discrepancy is used to refinement the reconstruction. This iterative cycle is repeated until a desired accuracy is reached.
There is a pressing need for contributions from the graph and hypergraph theory community to establish connections between the form of hypergraph observed in applications to (bio)chemical reaction networks.
Among all modelling approaches for reaction networks, a particular emphasis of Constraint-Based Reconstruction and Analysis (COBRA, Fig. 1 ) ( Palsson, 2015 ) is reconstruction and modelling of biochemical networks at genome-scale. Such models contain the majority of the known reactions in an organism, within a scope based on considerations of the application domain, and give rise to stoichiometric matrices with a large number of rows and columns. Almost every biochemical constraint-based modelling problem is posed as an optimisation problem involving a stoichiometric matrix ( Palsson, 2015 ), and thus obtaining solutions to high-dimensional optimisation problems is essential to generate model predictions. This emphasis on optimisation has lead to an increasing interest in biochemical constraint-based modelling from the mathematical and numerical optimisation community ( Ma et al., 2017 ).
A stoichiometric matrix may be distinguished from an arbitrary rectangular matrix by mathematical properties arising from its biochemical origins. This matrix is may be fully specified by the known biochemistry of an organism, cell, organelle or biochemical subsystem being modelled. The last universal common ancestor from which all organisms now living on Earth have common descent is hypothesised to have lived over three billion years ago. The complete biochemical network of this organism, and every descendant thereof, is not known. Therefore, we do not yet, and may never have, a complete mathematical classification that specifies the subset of rectangular matrices to which every stoichiometric matrix belongs. What we do have is a certainty that this class is restricted by the physicochemical and biological principles that govern all living systems. The main purpose of this paper is to emphasise certain special mathematical properties of stoichiometric matrices that arise from physicochemical principles.
To date, much of the focus has been on characterisation of mathematical properties shared by stoichiometric matrices and arbitrary rectangular matrices, e.g., ( Papin et al., 2004 ). However, certain mathematical properties are known to distinguish a stoichiometric matrix from an arbitrary rectangular matrix. In chemistry, a moiety is a subunit of a molecule and conserved moiety is one that is invariant with respect to a defined set of chemical transformations. Clarke (1988) proposed that each basis vector for the left nullspace of a stoichiometric matrix corresponds to an independent conserved moiety. Famili and Palsson (2003) computed a convex basis, of extreme rays that may be linearly dependent, for the left null space and classified (conserved) moieties according to their relationship with cofactors and the boundary of the system. However, establishing a correspondence between each extreme ray and the structure of a moiety was not automatic. Householder QR factorisation ( Vallabhajosyula et al., 2006 ) and sparse LU factorisation ( Gill et al., 1987 ) are efficient methods for computing for basis vectors for the left nullspace of a large stoichiometric matrix but it is challengeing to interpret a linearly independent basis vector in terms of chemistry if it contains negative entries. Plasson et al. (2008) defined a reacton (conserved moiety) , as a subpart of a molecule that is never broken into smaller parts by any of the reactions composing the network. Based on this definition, it was proposed that a chemical reaction network be interpreted as simple recombinations of reactons, where each reaction could be represented by partial reactions, each one describing the transfer of reactons from one compound to another. Furthermore, examples were given of splitting a stoichiometric matrix into a sum of incidence matrices, each representing a directed graph of reacton transfers. Various approaches, none efficient at genome-scale, were considered to compute a non-negative basis for the left nullspace of a stoichiometric matrix, each of which was then manually identified with a reacton. Haraldsdóttir and Fleming (2016) defined a conserved moiety as a group of atoms that remains intact in all reactions of a network. They then showed that the structure of each conserved moiety and the corresponding non-negative left nullspace basis vector, could be efficiently identified at genome-scale by graph theoretical analysis of an atom transition graph, which required atom mappings for each reaction ( Rahman et al., 2016 ). It is needed to clearly specify, in graph and hypergraph theoretical terms, the mathematical relationship between atom transition graphs, chemical reaction hypergraphs and conserved moieties. Furthermore, it is necessary to investigate the properties that this relationship endows on a stoichiometric matrix that distinguish it from an arbitrary rectangular matrix. This paper has three objectives. The first objective, in Section 2 , is to briefly introduces some basic concepts from graph and hypergraph theory ( Voloshin, 2009 ). The second objective, in Sections 3 -5 , is to introduce the established concepts of a molecule, reaction and network, respectively, in terms of graph and hypergraph theory. This introduction is given at a high level in terms of a hypergraph where each vertex is a molecule, and each edge is a reaction, and also at a lower level in terms of graphs where each vertex is an atom embedded in a molecular structure and each edge is a bijection between atoms in separate molecules. The third objective, in Sections 6 and 7 , is to introduce the concept of a conserved moiety in terms of graph and hypergraph theory, split a stoichiometric matrix for a network into the sum of a set of subnetwork incidence matrices, each of which is an incidence matrix for a moiety subnetwork, then relate this to the mathematical properties of a stoichiometric matrix.
Notation Throughout this paper, R , R n , and R m ×n denote the field of real numbers, the vector space of n -tuples of real numbers, and the space of m × n matrices with entries in R , respectively. Similarly, Z , Z n , Z m ×n stand for integer numbers, the vector space of n -tuples of integer number, and the space of matrices with entries in Z , respectively. N T denotes the transpose of a matrix N in R m ×n . R n + and R n ++ display non-negative real n -tuples and positive real n -tuples in R n , respectively, and Z n + and Z n ++ display non-negative integer n -tuples and positive integer n -tuples in Z n , respectively. Let 1 be the vector of all ones. For a matrix A ∈ R m ×n , A i and A : j denote the i th row and the j th column of A , respectively, where i ∈ 1 , . . . , m and j ∈ 1 , . . . , n . The exponential or natural logarithm of a vector is meant component-wise and exp (log (0)) := 0. Further, [ · , · ] stands for the horizontal concatenation operator, and I denotes an identity matrix.
A calligraphic, uppercase, roman letter, e.g., A , denotes a set, multiset or sequence, with { · , · } denoting an unordered pair, ( · , · ) denoting an ordered pair and (·, . . . , ·) denoting a sequence. Let | A | denote the cardinality of the set A . A multiset is a modification of the concept of a set that, unlike a set, allows for multiple instances for each of its elements. In a multiset M := (A , f ) , A is a set and f : A → Z + is a function from A to the set of positive integers giving the multiplicity of the i th element A i in the multiset as the number f (A i ) . In multiset { a, a, b }, the element a has multiplicity 2, and b has multiplicity 1. The cardinality of a multiset is constructed by summing up the multiplicities of all its elements. The cardinality of sets, multisets and sequences is all assumed to be finite.
In illustrative examples, all metabolic species and reactions are annotated with their abbreviated identifier used in the Virtual Metabolic Human database ( http://vmh.life ), e.g., the crn abbreviation for the metabolite L-carnitine (crn).

Graph and hypergraph theory
There exist various excellent introductory textbooks on graph theory, e.g., Wilson (2020) and hypergraph theory, e.g., Voloshin (2009) . Nevertheless, for completeness we introduce key terms in graph and hypergraph theory next. A graph G(V, E ) is a mathematical object which consists of a set of vertices V and a set of edges E, where V : whence E j is said to join the head vertex V i to the tail vertex V k . An orientation of an undirected edge is an assignment of a direction to that edge, turning it into a directed edge. An inverted edge swaps the order of a pair of vertices in a directed edge. A subgraph G of a graph G is a graph whose vertex set and edge set are subsets of those of G.
A graph can be represented by an incidence matrix B ∈ Z m ×n , where each row corresponds to a vertex, each column corresponds to an edge and the entries are given by otherwise , or by its adjacency matrix A ∈ Z m ×m given by . . , m and j = 1 , . . . , n . An incidence matrix B ∈ R m ×n is said to be conserved if the summation of each column of B vanishes, that is A labelled graph is a graph that associates each vertex with one of a set of vertex labels and associates each edge with one of a set of edge labels. A vertex-labelled graph is a graph that associates each vertex with one of a set of vertex labels. An edge-labelled graph is a graph that associates each edge with one of a set of edge labels.
An isomorphism between two graphs G 1 (V 1 , E 1 ) and G 2 (V 2 , E 2 ) is a bijection ψ : V 1 → V 2 and θ : E 1 → E 2 . If the graphs are labelled, an isomorphism also preserves labelling. A set of graphs isomorphic to each other is called an isomorphism class of graphs. A path is a finite sequence of edges which connect a sequence of vertices. A pair of vertices is connected if there exists a path between them.
A component of a graph is a subgraph with a path between any two of its vertices and without a path to any vertex in the remainder of the supergraph. A vertex with no incident edges is itself a component.
A hypergraph H(V, S ) is a generalisation of a graph in which the j th hyperedge S j := {A j , B j } ∈ S is a pair of multisets of vertices A j ⊂ V and B j ⊂ V. A directed hypergraph H(V, S ) is a generalisation of a directed graph in which the j th directed hyperedge S j := (F j , R j ) ∈ S is an ordered pair of subsets of vertices, where F j ⊂ V and R j ⊂ V denote subsets of vertices corresponding to the tail and head of the j th hyperedge. A network is either a graph or a hypergraph.

Molecules
Strictly speaking, a molecule is an electrically neutral group of two or more atoms held together by chemical bonds. However, henceforth, for the sake of simplicity, we stretch this definition to also encompass an electrically charged molecule (ion) and a molecule with one atom. This is akin allowing a single isolated vertex to be defined as a graph. A molecule may be represented at multiple levels of abstraction. First, Section 3.1 introduces a molecule at a high level of abstraction, where each molecule is only represented by a chemical formula. Then, Section 3.2 introduces a molecule at a low level of abstraction in terms of its topological structure.

Molecules
A high level abstract representation of a molecule is to associate it a unique label. Unless otherwise specified, a molecule is assumed to mean a biochemical, that is, a chemical that is found in a biological system. A molecule could be a protein, a carbohydrate, an ion, a water molecule, or any other singular instance of a chemical found in a living being.

Definition 2.
A compartment, is a distinct, finite, contiguous subdivision of the three-dimensional space of a biochemical system that is demarcated by a boundary that selectively permeable to certain molecules.
All biochemical systems occupy at least one compartment ( Lane and Pariseau, 2016 ), and often multiple hierarchically embedded compartments. For instance, a eukaryotic cell consists of several compartments such as mitochondria, cytosol, nucleus and endoplasmic reticulum. A selectively permeable boundary prevents the diffusive exchange of certain molecules across the boundary of a compartment.

Definition 3.
A molecular species is a finite set of identical molecules, labelled with a single compartment.
Unless otherwise specified, a molecule is assumed to mean biomolecule. Two molecules in separate compartments, that are otherwise identical, are still considered distinct species. Compartmentalisation is denoted with a bracketed suffix to the abbreviated species label, e.g., crn [ c ] and crn [ m ] are the labels for the molecule L-carnitine (crn) in the cytosolic [ c ] and mitochondrial [ m ] compartments, respectively.

Molecular graphs
Although there is a rich literature on the representation of chemistry in terms of graphs ( Trinajsti ć, 1992 ), we only introduce some basic concepts in chemical graph theory here as our focus is on the mathematical structure of stoichiometric matrices, rather than the structure of individual molecules. Each molecule consists of a set of atoms. Each atom consists of a nucleus, with subatomic entities termed protons and neutrons, surrounded by electrons. Protons have positive electrical charge, neutrons have neutral charge and electrons have negative charge. We assume that biological systems conserve atomic nuclear structure, but they can change the number of electrons associated with an atomic nucleus, therefore each molecule is assigned a net electrical charge.

Definition 4.
An atom is a singular instance of a chemical element.
Unless otherwise specified, an atom is assumed to mean an atom of an element that is found in a biological system. Of the ~118 known chemical elements only ~27 are known to be incorporated into biochemical systems.

Definition 5.
A molecular formula, is the natural number of atoms of each element in a molecule.
For example, the molecular formula of a citrate molecule with charge −3 (cit) is C 6 H 5 O 7 . That is, it consists of 6 carbon atoms (C), 5 hydrogen atoms (H) and 7 oxygen atoms (O). The mass of a molecule is given by the sum of the strictly positive masses of each of its constituent atoms. The (mono-isotopic) molecular mass of a citric acid molecule with charge −3 is 192.0270026 Da.
Definition 6. Given a molecule V k , its atomic cardinality n (V k ) is sum of the number of atoms, irrespective of element label, in that molecule. Given a set of molecules V its atomic cardinality is the sum of the cardinality of each molecule, that is For example, citrate has atomic cardinality 18, while the molecular formula of L-carnitine is C 7 H 15 NO 3 and therefore its atomic cardinality is 26, therefore the atomic cardinality of the set A = { citrate , L − carnitine } is 44.

Definition 7.
A chemical bond is a singular instance of a pair of atoms.
In chemical terminology, a chemical bond is a lasting attraction between two atoms.
Definition 8. Given a set of molecules V, the molecular graph of and each edge Y j is a chemical bond in a molecule. A molecular graph represents the complete set of | X | atoms and | Y | bonds in a molecule as a single connected component. Each vertex is triply labelled, with (i) an element label, which is a type of chemical element, (ii) a molecular label, which uniquely identifies the molecule, and (iii) an atomic label i ∈ 1 . . . n (V ) , which uniquely identifies each of the n (V ) atoms in V. Each edge is labelled with a type of chemical bond.
In chemistry, a molecule must have at least one bond between two atoms. However, for the sake of consistency with graph theory, a chemical entity that consists of a single atom and no bond is also referred to as a molecule, as it corresponds to a graph with one vertex and no edge. Certain chemical assumptions are used to define the conditions for two molecules to be considered identical or distinct. These assumptions arise from topological and geometric considerations as to the structure of a molecule. However, with respect to the structural representation of a molecule considered here, it is to necessary and sufficient to consider that two molecules are of the same molecule if and only if both molecules are labelled with the same compartment and their corresponding molecular graphs are isomorphic.

Fig. 2.
A molecule of citrate represented as a vertex and a molecular structure. Citrate represented as a node with its molecular formula (left) and represented as a molecular graph (right). The three types of element are oxygen (red), carbon (green) and hydrogen (blue). The two types of bond are illustrated, single (-) and double ( = ).
(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Reactions
A reaction is a process that leads to the chemical transformation of one set of molecular entities to another. Unless otherwise specified, a reaction is assumed to be a biochemical reaction, that is, a reaction that is found in a biological system. This excludes reactions that involve changes to nuclear structure, e.g., nuclear fusion. A reaction may be represented at multiple levels of abstraction. First, Section 4.1 introduces a reaction at a high level of abstraction, in terms of molecules and reaction stoichiometry. Then Section 4.2 introduces a reaction at a low level of abstraction in terms of molecular structures and atom mappings.

Reaction stoichiometry
At a high level of abstraction, a reaction may be represented by a reaction equation, described below, which only specifies the quantities associated with each molecule involved and whether they are consumed or produced in the reaction. The concept of a hyperedge, as a pair of vertex subsets, is well established. However, before we mathematically define a reaction, we must generalise the concept of a vertex subset, to allow a natural number weight on each vertex in a subset. This permits a generalisation of the concept of a hyperedge, where each involved vertex is associated with a natural number weight.

Definition 9.
A chemical complex C(V ) is a subset or multiset of molecules, drawn from a set of molecules V. A stoichiometric number is the multiplicity of molecules of a molecular species in a complex.
The term stoichiometry is derived from the ancient Greek origins of stoicheion meaning element and metron meaning measure.
The cardinality of a chemical complex | C | is the sum of the multiplicities of each of its constituent molecule. The set of molecules V may be the same for both chemical complexes in a hyperedge, but in that case their multiplicity must differ. In a chemical complex, the entities may be distinct or identical, that is corresponding to distinct species, or a single species, respectively. If there is no molecule of a species in a complex, then the stoichiometric number is trivially zero. If a reaction involves a molecule with a multiplicity greater than one, then this is represented by multiple instances of the same molecule, rather than a single molecular species, as is often the approach taken in stoichiometric modelling ( Palsson, 2015 ). We are interested in the relationship between mathematical modelling of a biochemical network at stoichiometric and atomic levels of resolution and atom mappings are between molecules, rather than molecular species, so we also represent reaction stoichiometry in terms of molecules.
In chemistry, thermodynamics dictates an to the complexes in reaction, leading to a directed reaction (hyperedge). In graph theory, an orientation of an (undirected) graph is an assignment of a direction to each edge, turning a graph into a directed graph.
Definition 11. A directed reaction is a directed hyperedge Y := (F (V ) , R (V )) , formed from an ordered pair of complexes, where F is the tail complex and R is the head complex.
By the principle of microscopic reversibility ( Lewis, 1925 ), each reaction is reversible, therefore when representing a real reaction we always have a pair of symmetric directed hyperedges.
In an undirected reaction, V i denotes the i th molecule in complex F and V k the k th molecule in the complex R , whereas for a directed reaction, F and R are referred to as the tail and head complexes, respectively.
In a reaction equation the symbol signifies an equivalence relation. This is consistent with the chemistry literature. Once an orientation is chosen, it is conventional to write a directed reaction equation with the tail complex (substrate complex) to the left and the head complex (product complex) to the right. The use of a union symbol is mathematically correct, but the use of a summation symbol is far more commonly observed in the chemistry literature.

Example reactions
In complex Q , the stoichiometric number is 2 for V 1 , and in complex Q the stoichiometric number is 1 for V 2 and 1 for V 3 . Fig. 4 provides a toy biochemical example, consisting of a cell with one sub-cellular compartment, several molecules and one directed reaction whose equation is Each molecule corresponds to a vertex, and the set of vertices

Atom mappings
At a low level of abstraction, a reaction can be represented as a mapping between pairs of atoms, where one atom is in a substrate complex and another atom is in a product complex.
Definition 13. Given a set of molecules V and a chemical complex is the disjoint union of a multiset of | C | molecular graphs, where each molecular graph corresponds to one molecule V k ∈ C. Let n (V k ) denote the atomic cardinality molecule of molecule V k , then the total number of vertices in complex graph C is Each vertex is triply labelled with (i) an element label, (ii) a molecular label, and (iii) an atomic label.
The number of connected components of a complex graph is equal to the number of molecules in that complex. For example, a complex graph will contain two connected components that are isomorphic up to vertex labelling, if the complex consists of two identical molecules, that is a molecular species with stoichiometric number (multiplicity) two.

Definition 14. Given a reaction
The edge is labelled with a reaction label, which uniquely identifies a reaction. Both vertices must have the same element label, but the molecular and atomic labels may be different.
The element label of the vertex X i ∈ G(X , Y, P ) is the same as the element label of the vertex X k ∈ G(X , Y, Q ) . That is, an atom transition is an edge between a pair of atoms of the same element, one in each of the pair of complexes involved in a reaction. Therefore, in a reaction, the total number of atoms of each element in both complexes is the same. For example, in reaction (1) the molecular formula of citrate is C 6 H 5 O 7 while the molecular formula of water is H 2 O and the molecular formula of cis-aconitic acid is C 6 H 3 O 6 . The element specific sum of atoms in the latter two molecules is C 6 H 3 O 7 , which is the same as the molecular formula for citrate. The atomic label of both vertices is generally not the same, because typically reactions involve transformation of one set of molecules into another set of molecules and, within a given set of molecules, atomic labels are unique, by definition.
Definition 15. Given a set of molecules V and a reaction H := formed by the disjoint union of the set of Each edge is labelled with an identical reaction label. Each vertex is labelled with an element label, a molecular label and an atomic label.
Note that an atom mapping consists of | Y | connected components, each of which contains one edge and two vertices with identical element labels. That is, all edges of the molecular graphs of each molecule in V are omitted. One reaction may correspond to multiple alternate atom mappings, e.g., if a molecular structure has a symmetrical subgraph, this may permit multiple alternate atom mappings that are equivalent with respect to element vertex labelling, but not with respect to atomic vertex labelling. Fig. 5 illustrates an atom mapping for the citrate hydro-lyase reaction (link).

Networks
A biochemical network consists of a set of molecules that are chemically transformed into one another by a set of reactions. A biochemical network may be represented at multiple levels of abstraction. First, Section 4.1 introduces a biochemical network at a high level of abstraction, in terms of molecules and reaction stoichiometry. Then Section 4.2 introduces a biochemical network at a low level of abstraction in terms of molecular graphs and atom mappings.

Stoichiometric hypergraphs
A stoichiometric hypergraph is a network of reactions expressed in terms of molecules and reaction stoichiometry.
Note that in this definition of a stoichiometric network, each vertex corresponds to a molecule, which is a singular instance of a distinct chemical, rather than a molecular species, which is a finite set of identical molecules. If one replaces each reaction with a symmetric pair of directed hyperedges then a stoichiometric network can also be represented by a directed hypergraph.  5. The citrate hydro-lyase reaction. The chemical conversion of citrate into water and cis-aconitic acid represented as a hyperedge (reaction) and a set of edges (atom mapping). The citrate hydro-lyase reaction (link) involves three molecules (a), each of which may be represented by a molecular graph (b), where each vertex is an atom and each edge is a chemical bond between atoms. The reaction is between two complexes (c), one complex P consisting of citrate (left, cit) and one complex Q consisting of water (middle, h2o) and cis-aconitic acid (right, cisa). A reaction may be considered as a hyperedge Y, where each vertex is defined by a molecular graph (e). Each atom in complex P (citrate) corresponds to one atom in complex Q (water and cis-aconitic acid) and together they constitute the atom mapping (corresponding atoms are individually labelled with numerical superscripts and connected by dotted lines) that represents the hyperedge Y as a set of disconnected edges. Definition 17. A directed stoichiometric hypergraph H(V , Y (F, R )) is an oriented stoichiometric hypergraph, that consists of a set of m vertices V := {V 1 , . . . , V m } , and sequence of n directed hyperedges Y := (Y 1 , . . . , Y n ) . In the j th reaction Y j := (F j , R j ) the tail complex is is a reverse stoichiometric matrix , with F and R being two sequences of cardinality n .
The entry F i,j is the stoichiometric number of molecule i consumed in the j th directed reaction and the entry R i,j is the stoichiometric number of molecule i produced in the j th directed reaction. If the i th molecule is neither produced, nor consumed in the j th directed reaction, then F i, j = R i, j = 0 . If F j, j = R i, j > 0 then the i th molecule is termed a catalyst of the j th directed reaction as it is chemically invariant with respect to that chemical transformation.
Let us know introduce the main mathematical object that is the main focus of attention in this paper. If and only if the i th molecule is a catalyst in the j th directed reaction then N i, j = 0 yet F j, j = R i, j > 0 . If the i th molecule does not participate in the j th directed reaction then N i, j = F j, j = R i, j = 0 . Therefore, N can be defined in terms of F and R while the opposite is not the case. We have introduced a stoichiometric matrix with a conjecture, rather than a definition as the construction of a complete mathematical definition of a stoichiometric matrix is an open problem. Note that in this definition of a stoichiometric hypergraph, each vertex corresponds to a molecule, which is a singular instance of a distinct chemical, rather than a molecular species, which is a finite set of identical molecules. Therefore, a stoichiometric matrix is a sign matrix, i.e., N ∈ {−1 , 0 , 1 } m ×n , while forward and reverse stoichiometric matrices are binary matrices F,

Conjecture 18. Given a directed stoichiometric hypergraph
Certain key topological features of a stoichiometric hypergraph can be discerned from its stoichiometric matrix ( Palsson, 2015 ). Given a stoichiometric matrix N ∈ R m ×n , its zero pattern N { 0 , 1 } m ×n is the binary matrix obtained by replacing each non-zero entry of Fig. 6. A directed stoichiometric hypergraph. The four molecules (vertices) are citrate (cit, C 6 H 5 O 7 ), isocitrate (icit, C 6 H 5 O 7 ), cis-aconitic acid (cisa, C 6 H 3 O 6 ) and water (h2o, H 2 O ). In biochemical terms, the reactions (black hyperedges) are Y 1 : aconitate hydratase (ACONTm), Y 2 : citrate hydro-lyase (link) and Y 3 : isocitrate hydro-lyase (link). Although each reaction is, in principle, reversible, the directions of each hyperedge are given in the conventional orientation, consistent with the corresponding stoichiometric matrix.
N by 1. The number of non-zero entries in each column, N T 1 , gives the molecular cardinality for each reaction. The number of nonzero entries in each row, N 1 , gives the reaction cardinality for each molecule. The molecularadjacency matrix is given by B := N N T . Each diagonal element of the molecule adjacency matrix gives the reaction cardinality of a molecule, and each off-diagonal element gives the number of reactions in which two molecules participate together. The reactionadjacency matri x is given by A := N T N . Each diagonal element of the reaction adjacency matrix gives the number of molecules that participate in a reaction while each off-diagonal element gives the number of molecules shared by two reactions. Therefore, the vectors a N T 1 and N 1 and matrices A := N T N and B := N N T can provide us valuable information about the sparsity pattern of stoichiometric matrices. Such issues become especially important in practical applications involving numerical computing with high dimensional stoichiometric matrices.

A directed stoichiometric hypergraph with three reactions
Consider a directed stoichiometric hypergraph with 4 molecules representation of which is illustrated in Fig. 6 . The 3 reaction equations are Y 1 : cit icit , Y 2 is the citrate hydro-lyase reaction introduced in Section 4.1 . The forward and reverse stoichiometric matrix of this network are In these matrices, each row is labelled with the corresponding molecule and each column is labelled with the corresponding reaction. The net stoichiometric matrix of the mitochondrial subnetwork is where again rows and columns correspond to molecules and reactions, respectively. The molecular adjacency matrix is while the reaction adjacency matrix is

A stoichiometric hypergraph of human metabolism
Metabolism refers to the set of reactions necessary to sustain the life of a single organism. Metabolism extracts energy and material precursors from food, and uses them to synthesises the macromolecules, e.g., proteins, that make up an organism. Metabolism also degrades macromolecules and eliminates waste. A stoichiometric hypergraph of metabolism is a reaction network representing metabolism where the molecules are metabolites (low molecular mass organic chemicals) and the reactions are metabolic reactions ( Fig. 7 ).
The latest comprehensive reconstruction of human metabolism, Recon3D ( Brunk et al., 2018 ), accounts for 17% of the functionally annotated genes in the human genome, and consists of 5,835 rows (molecular species) and 10,600 columns (reactions) in 9 compartments. Certain key topological features of the human metabolic network can be discerned from analysis of its corresponding stoichiometric matrix. The sparsity pattern for the stoichiometric matrix of the Recon3D reconstruction is illustrated in Fig. 8 . Reaction cardinality can vary widely depending on the molecular species concerned, with some molecular species participating in many reactions and others at least two, but perhaps only two reactions. For all genome-scale metabolic networks known there is an approximately linear relationship between the logarithm of reaction cardinality and the rank ordered reaction cardinality. That is, reaction cardinality approximates a power law distribution ( Palsson, 2015 ). Figure 9 illustrates the molecular and reaction cardinality of Recon3D. Fig. 10 illustrates the molecular and reaction adjacency matrices of Recon3D.

Atom transition graphs
An atom transition graph is a representation of a reaction network in terms of atoms and atom mappings.
Definition 19. Given a set of molecules V and a stoichiometric hypergraph H(X , Y{F (V ) , R (V ) } ) , an atom transition graph is a graph G(X , E, H) formed by uniting a set of | Y | atom mappings, each of which corresponds to a reaction. The union merges vertices of atom mappings that have identical elemental and atomic labels.
Each of the p := | X | vertices corresponds to an atom of an element in one of the m := | V | molecules. Each of the q := | E | edges corresponds to an atom transition in an atom mapping corresponding to one of the n := | Y | reactions. Each vertex is labelled with elemental, molecular and atomic labels, while each edge is labelled with a reaction label.
In a molecular graph, each vertex is triply labelled, with (i) an element label, which is a type of chemical element, (ii) a molecular label, which uniquely identifies the molecule, and (iii) an atomic label i ∈ 1 . . . n (V ) , which uniquely identifies each of the n (V ) atoms in V. If a pair of reactions share at least one molecule in common, then they share vertices from the same molecular graph, therefore the corresponding pair of atom mappings can be united in an atom transition graph, by merging vertices with identical elemental and atomic labelling, but possibly different molecular labels. In an atom transition graph, each edge is labelled with a reaction label.

Definition 20. A directed atom transition graph G(V, E, H) is an ori-
ented atom transition graph, with p := | X | vertices and q := | E | directed edges, with topology represented by the incidence matrix

An example of an atom transition graph
We now provide an example of an atom transition graph corresponding to the 3 reaction biochemical network introduced in Section 5.1.1 . This atom transition graph is formed by uniting identical vertices from an atom mapping for Y 2 , which is the citrate hydro-lyase reaction illustrated in Fig. 5 , and with identical vertices from atom mappings for Y 1 and Y 3 . The atom transition graph is illustrated in Fig. 11 .

Moieties
Each connected component of an atom transition graph corresponds to a set of atoms that have identical elemental labels, but may have different molecular and atomic labels. Each path in a connected component of an atom transition graph corresponds to the trajectory that a single instance of an atom could take, via a sequence of atom transitions, each of which corresponds to a reaction. It is of interest to group connected components that are the same throughout an atom transition network, because they identify conserved molecular substructures, as defined below. Herein we assume a time invariant representation of a reaction network at atomic resolution, that is, every chemical transformation corresponding to a reaction has occurred sufficiently that an atom in every position of every molecule of a substrate complex has been mapped to every chemically feasible position of every molecule in a product complex.

Conserved moieties
Definition 21. A conserved moiety is a set of atoms, where each atom belongs to one connected component of an isomorphism class of connected components of an atom transition graph G(X , E, H) ( Haraldsdóttir and Fleming, 2016 ). The isomorphism class is of maximum cardinality and the isomorphism is label preserving with respect to molecular labelling of vertices and reaction labelling of edges.  ( Brunk et al., 2018 ), termed Recon3Map ( Noronha et al., 2017 ), which was manually drawn using the network layout editor CellDesigner (version 4.4) ( Funahashi et al., 2008 ). To avoid excessive crossing of hyperedges, certain molecules that are involved in many reactions have been duplicated at different positions in the network. Fig. 8. The stoichiometric matrix of Recon3D. This stoichiometric matrix consists of 5835 rows (molecular species not molecules) and 10,600 columns (reactions). Only 0.065% (40, 425/61, 851, 0 0 0) of entries are non-zero (nz). The approximate upper diagonal appearance is due to the ordering of the reactions, rather than an intrinsic feature of a stoichiometric matrix. For genome-scale biochemical networks, stoichiometric matrices are sparse because molecular cardinality is typically less than 10 for most reactions.
Each connected component of an atom transition graph consists of vertices with the same elemental label. However, a pair of connected components of an atom transition graph may still be isomorphic with respect to Definition 21 even though they might correspond to different elements. For example, one connected component might correspond to an oxygen atom, while another con-nected component might correspond to a carbon atom. The atoms of a conserved moiety always corresponds to a subgraph of a molecular substructure. Often this subgraph consists of a single connected component, but it may consist of multiple connected components. For example, a pair of isomorphic connected components may correspond to a pair of atoms in the same molecule Fig. 9. The rank-ordered molecular cardinality molecular (species) and reaction cardinality of Recon3D. Fig. 10. The molecular and reaction adjacency matrices of a stoichiometric hypergraph. The sparsity patterns of the molecular (species) adjacency matrix ( NN T , left) and the reaction adjacency matrix ( N T N , right) for Recon3D, have 0.23% and 6.61% non-zero elements (nz), respectively. The fraction of blue is an overestimate of the actual sparsity pattern due as the minimum size of a coloured pixel is greater than the size of an element. Nevertheless, one can observe that it is less common for a pair of molecular species to participate in the same reaction (off-diagonals in NN T , left) than it is for a pair of reactions to involve the same molecular species (off-diagonals in N T N , right). . (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) but without a bond between them and therefore the corresponding conserved moiety consists of a set of vertices, but more than one connected component. If L k,i = 0 then the k th conserved moiety is not incident in molecule V i . There may be more than one instance of a conserved moiety in a molecule, so L k,i ∈ Z + rather than L k,i ∈ {0, 1}. To see this, consider a connected component in an atom transition graph that is incident more than once in the same molecule. In this case there will be more than one instance of the corresponding conserved moiety in the same molecule, and therefore L k,i > 1.

Corollary 23.
Let N ∈ Z m ×n be a stoichiometric matrix corresponding to a directed stoichiometric hypergraph H (V , Y (A , B)) with m molecules and n reactions . The conserved moiety matrix L ∈ Z t×m + derived from the corresponding atom transition graph G (X , E, H) is orthogonal to R (N) , that is L · N = 0 .
Proof. In an atom mapping, the number of atoms of each element is the same in both tail and head complexes (cf (15) ). Therefore, the number of instances of the k th conserved moiety in tail complex of the j th reaction L k · F : j is the same as the number of instances of the identical conserved moiety in a molecule of a head complex L k · R : j , that is L k · F : j = L k · R : j , so L k · N : j = 0 . Each column of N represents the transformation of a tail complex into a Fig. 11. An atom transition graph for 3 reactions. The molecular structures of each molecule (blue disks) are those of citrate (left, cit), isocitrate (right, cit), water (middle, h2o) and cis-aconitic acid (bottom, cisa). Each atom in each complex is individually labelled (numerical superscripts). The labelling of each atom is invariant with respect to each atom transition. This is a sufficient condition to ensure that an atom transition is always between atoms of the same element. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) Fig. 12. Connected components and conserved moieties of an atom transition graph. Each molecule in { icit , h 2 o , cit , cisa } is displayed as a set of atoms. Atom transitions are labelled with colours corresponding to reactions R 1 (black), R 2 (blue) and R 3 (red). Connected components corresponding to atoms 1, 2 and 15 (green, also in Fig. 11 ) belong to one isomorphism class, that is label preserving with respect to molecular labelling of vertices and reaction labelling of edges. The set of atoms { H 1 , O 2 , H 15 } are therefore a conserved moiety. Connected components corresponding to atoms 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 3, 16, 17, and 18 (yellow, also in Fig. 11 ) belong to a different isomorphism class and make up a second conserved moiety. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) head complex, leaving the number of each conserved moiety invariant in every reaction, that is L · N = 0 . Fig. 12 illustrates the connected components and conserved moieties of the 3 reaction biochemical network introduced in Section 5.1.1 . In the atom transition graph introduced in Section 5.2.1 , there are two moieties and each of their atoms are labelled green and yellow in Fig. 11 and as sets of connected components in Fig. 12 . The conserved moiety matrix corresponding to Fig. 12 is

Example conserved moieties
The first and second conserved moiety vectors, L 1 and L 2 correspond to two isomorphism classes (green and yellow) in Fig. 11 . The invariance of the number of moieties with respect to each reaction is illustrated with .

Redundancy of conserved moiety vectors
The following example illustrates that there may exist more than m − rank ( N ) conserved moiety vectors orthogonal to R (N) that are linearly dependent. Consider a single reaction of the form ab + cd ac + bd, where a, b, c and d are moieties. The stoichiometric matrix N ∈ Z m ×n and conserved moiety matrix L ∈ Z t×m + , respectively, are given in Table 1 a and b. The number of moieties is k = 4 , the number of molecules is m = 4 and rank ( N ) = 1. Therefore, t > m − rank ( N ) . A moiety basis for N can be formed by selecting any three of the four conserved moiety vectors in L , giving a total of four possible combinations. A real example reaction of this form is shown in Fig. 13 .

Moiety splitting
Given a stoichiometric hypergraph and its corresponding atom transition graph, subject to certain assumptions, we now show how to split a stoichiometric matrix into a non-negative sum of incidence matrices, each of which corresponds to a compartmental network.

Moiety splitting of a stoichiometric matrix
Theorem 24. (Moiety splitting) Let N ∈ Z m ×n be a stoichiometric matrix, with r = rank (N) , such that there exists an L ∈ Z m −r×m + and LN = 0 , where each L k is a moiety vector, for all k ∈ 1 , . . . , m − r, then the following matrix splitting exists where N(k ) ∈ Z m ×n is a moiety transition matrix, given by Proof. Substituting (4) into (5) , it is enough to show := L T 1 ∈ Z m ++ and that The expression on the left sums each column of L then places it on the diagonal of an m × m matrix. The expression on the right places each row of L on the diagonal of a matrix, and sums the matrices, which is equivalent to the expression on the left as the operations involved are commutative. Each entry of L is nonnegative so ≥ 0, therefore it remains to show that L T 1 ∈ Z m ++ . By Definition 19 , an atom transition graph G(X , E, H) is formed by joining all atom mappings corresponding to a stoichiometric hypergraph H(X , Y{A , B} ) . Every molecule is therefore part of some atom transition graph, and therefore some isomorphism class, so L T 1 ∈ Z m ++ , giving the desired result.
It is an open question as to the biochemically interpretable conditions required to be satisfied for there to exist an L ∈ Z m −r×m + and LN = 0 , where each L k is a moiety vector, for all k ∈ 1 , . . . , m − r. In general, each stoichometric matrix for a moiety subnetwork N ( k ) is significantly more sparse than N . Since a molecule may contain more than one type of conserved moiety, some of rows, or multiplications thereof, are often repeated in several moiety transition matrices N ( k ). The splitting formalised in Theorem 24 exists for any matrix N ∈ Z m ×n such that, there exists an L ∈ Z t×m + satisfying L T 1 ∈ Z m ++ .

Example of moiety splitting
Moiety splitting of a stoichiometric matrix, by application of Theorem 24 , for the three reaction Eq. (3) , using the conserved moiety vectors given in Section 6.2 , is illustrated in Table 2 .
The two corresponding moiety subnetworks are both graphs, as illustrated in Fig. 14 .

Moiety transition matrices
We next provide some technical properties of the matrices N ( k ), k ∈ 1 , . . . , m − r, which shows that N ( k ) is conserved and the rank of N ( k ) is related to the number of components of the associated subnetwork. Recall that a vertex without any incident edges is considered a (trivial) component.

Theorem 25.
Let N ∈ Z m ×n be a stoichiometric matrix, with r = rank (N) , such that there exists an L ∈ Z m −r×m + with LN = 0 , where each L k ∈ Z 1 ×m + is a moiety vector and N k := diag( L k ) N is an incidence matrix for a moiety subnetwork, for all k ∈ 1 , . . . , m − r, then the following assertions hold:  Fig. 14. A stoichiometric hypergraph split into two moiety graphs. The three reaction network introduced in Section 5.1.1 , is split into two moiety subnetworks. The N (1) moiety subnetwork (a) is a graph with that omits the vertex V 4 , while the N (2) moiety subnetwork (b) is a graph with that omits the vertex V 1 from the stoichiometric hypergraph N .
giving Assertion (i). To prove Assertion (ii), recall that each moiety corresponds to an isomorphism class of connected components in an atom transition graph, so each moiety subnetwork is a connected component, therefore in N ( k ) there is only one connected component which could be a graph or a hypergraph. In order to prove Assertion (iii), without loss of generality by a suitable reordering the rows of N ( k ), we rewrite N ( k ) in the following form, where d i for i ∈ 1 , . . . , m − c + 1 corresponding to nonzero rows of N ( k ) (representing the incidence of the connected component) and . . , m (standing for unconnected component that is a single vertex without any edge). For sake of simplicity, we denote the first m − c + 1 rows of N ( k ) as N ( k ) (1) . If the connected component N ( k ) (1) is a graph, since there is just one +1 and just one −1 in each column of N ( k ) (1) , it follows that the sum of the rows of N ( k ) (1) is the zero row vector, and that the rank of N ( k ) (1) is at most m − c. On the other hand, if N ( k ) (1) represents a hypergraph, also because of being conserved moiety the sum of the rows of N ( k ) (1) is the zero row vector, and the rank of N ( k ) (1) is at most m − c. We now show, the rank of N ( k ) (1) is exactly m − c. To do so, suppose we have a linear relation γ j d j = 0 , where the summation is over all rows of N ( k ) (1) , and not all of the coefficients γ j are zero. Choose a row d k for which γ k = 0. If N ( k ) (1) represents a graph then this row has non-zero entries in those columns corresponding to the directed edges incident with v k . For each such column, there is just one other row d l with a non-zero entry in that column, and in order that the given linear relation should hold, we must have γ l = γ k . Thus, if γ k = 0, then γ l = γ k for all vertices v l adjacent to v k . Since N ( k ) (1) is a connected component, it follows that all coefficients γ j are equal, i.e., the given linear relation is just multiple of which proves Assertion (ii). The results of Assertion (iv) are straightforward from (iii).
Note that the connected component of N ( k ) given in this theorem can be a graph or a hypergraph. Given N with r = rank (N) , and assuming there exists an L ∈ Z m −r×m + satisfying LN = 0 , and where each L k is a moiety vector, for all k ∈ 1 , . . . , m − r, it is an open question as to the biochemically interpretable conditions required to be satisfied for N ( k ) := diag( L k ) N to always result in an incidence matrix for a graph, as opposed to a hypergraph. Theorem 25 (i) clearly implies that the vector of 1 is in the left nullspace of N ( k ), i.e., 1 ∈ N (N(k ) T ) , which has been a known result for incidence matrix of a graph; however, the subnetwork associated to N ( k ), k ∈ 1 , . . . , m − r, can be a hypergraph, but N ( k ) is still conserved.

Example moiety transition matrices
Consider the matrices N (1) and N (2) in Table 2 , where it can be seen that the summation of elements of each column is zero, consistent with Theorem 25 (i). In Fig. 14 (a) and (b), the N (1) and N (2) moiety graphs each consists of 3 vertices and one component therefore rank (N(1) For a slightly larger example, consider the directed stoichiometric hypergraph corresponding to part of human dopamine synthesis, investigated in Haraldsdóttir and Fleming (2016) . It consists of four reactions and eleven molecules given in Table 3 . The stoichiometric matrix and a conserved moiety basis for this network is given in Table 4 . Since N ∈ Z 4 ×11 and rank (N) = 4 , it follows that dim (N (N)) = 11 − 4 = 7 , so L ∈ Z 7 ×11 + and therefore this stoichiometric hypergraph may be split into 7 moiety subnetworks. Consider the N (6) moiety subnetwork in Fig. 15 . The sum of elements of each column of the N (6) stoichiometric matrix is zero, consistent with Theorem 25 (i). There is only one non-trivial component, which is a hypergraph, since it contains one directed hyperedge (F{V 2 , V 9 } , R{V 3 } ) . From Fig. 15 (b), one observes c = 8 components, m = 11 vertices, and n = 3 (2 edges and one hyperedge). Hence, Theorem 25 (iii) im-

Moieties in thermodynamically closed and open systems
In all of the examples presented thus far, we assume we are given a stoichiometric matrix N ∈ Z m ×n , with r = rank (N) , such that there exists an L ∈ Z m −r×m + and LN = 0 , where each L k is a moiety vector, for all k ∈ 1 , . . . , m − r. As a consequence, we are only considering reactions that are mass balanced. This begs the question, with these assumptions how one can model the chemical reaction network of a living system, which is a thermodynamically open system, if every reaction in the system is mass balanced? In most of the literature on mathematical modelling of living systems, Table 4 The stoichiometric matrix, a moiety basis matrix and its column sum, for part of human dopamine synthesis. thermodynamic forcing by the environment is represented by mass imbalanced reactions that inject mass across the boundary of a model. That is, one can inject mass across the boundary of a model by augmenting a stoichiometric matrix with a set of faux reactions that are mass imbalanced. This condition is sufficient but not necessary for modelling a living system. Alternatively, it is sufficient if all reactions are mass balanced, but a subset of faux elementary reactions are given thermodynamically infeasible kinetic parameters ( Fleming and Thiele, 2012 ). In ( Fleming and Thiele, 2012 ), such reactions were termed perpetireactions, where perpeti is from the latin perpes meaning perpetual. Any vector in the range of N may be used to represent the stoichiometry of a perpetireaction ( Fleming and Thiele, 2012 ). With respect to conserved moieties, the following cyclic stoichiometric matrix C := N −I 0 L augments a stoichiometric matrix N ∈ Z m ×n with m perpetireactions, each of which consumes one molecule and produces its constituent set of conserved moieties, corresponding to one column of the conserved moiety basis L ∈ Z m −r×m + . In an electrical network, there is one moiety, corresponding to an electron, and electrical networks are shown as closed loops, where the circulation of electrons must be driven by some energy source. Likewise, a living sys-tem and the part of its environment that it exchanges mass with, may be considered as a set of m − r cycles, one for each of its constituent moieties. Ultimately these cycles must each be driven by an external energy input. Since [ L , I ] is a non-negative left nullspace basis for C , if N admits a moiety splitting, C satisfies the conditions for Theorem 24 and so also admits a moiety splitting.

Discussion
While many biological discoveries have been made possible by applying established mathematical theory and algorithms, the reverse is also true. That is, the biology itself can also inspire the development of novel mathematical theory and algorithms. It is therefore important to keep in mind the principles that govern the behaviour of a biological system being modelled. In biology, mathematical and computational modelling play complimentary roles. Given certain assumptions, a mathematical model of a reaction network allows one to reason about biochemical networks in general. However, each mathematical model depends on a set of assumptions and, even if each of these assumptions are appropriate, it is hard to be sure that any set of assumptions is sufficient. Computational examples of real world biochemical networks complement mathematical models of generic biochemical networks by suggesting new assumptions that might otherwise not be consid- Fig. 15. A moiety subnetwork that is a hypergraph. A moiety subnetwork of a directed stoichiometric hypergraph corresponding to part of human dopamine synthesis ( Haraldsdóttir and Fleming, 2016 ). There are 8 components ered. For example, it is often assumed that a stoichiometric matrix is an arbitrary rectangular matrix with integer entries, that is N ∈ Z m ×n . However, from the perspective of this paper, which is motivated by example biochemical networks, that assumption is insufficient. Stoichiometric matrices are central in a number of modelling paradigms for reaction networks. These include optimisation, ordinary differential equation, and stochastic models. Since the complexity of networks for biological, medical, and technological applications is already straining existing solution algorithms and is only expected to grow further, special properties of the stoichiometric matrix that are rooted in the underlying biochemistry can inform the design of efficient ad hoc algorithms to overcome this hurdle.
In this paper, and in previous effort s ( Fleming, 2016 ), we have focussed on mathematical properties that seem to be peculiar to stoichiometric matrices. These properties should not be considered the definitive set of properties. We are still in the process of identifying the mathematical properties of stoichiometric matrices and there are very likely to be additional properties to discover. Furthermore, it is an open problem to establish which known properties are fundamental, and which are the consequences of other properties, known or unknown. With each new insight into the properties of stoichiometric matrices, there will be more opportunities to use their special structure in mathematical modelling of reaction networks. This has been the motivation for exploring these properties that we believe will help us to work towards a full definition and thereby characterisation of biochemical network topology.
The size of genome-scale biochemical network models has been growing exponentially, in response to the big data revolution in molecular systems biology in the last two decades. The generation of biological understanding from multiple omic datasets requires integrative analysis that compares this new data with hypotheses derived from prior information, in the form of computational model predictions. As discussed, the key element in developing mathematical models for biochemical networks is the stoichiometric matrix constructed from such biological data, which has considerable effects on properties of mathematical models. Therefore, emerging big data in biology brings many opportunities as well as many challenges to scientists in the fields of statistics, applied mathematics, and system biology. This increasingly demands developing new mathematical models that are biologically meaningful and computationally tractable. To this end, the specific structure of stoichiometric matrices should be taken into account to de-sign novel context-specific algorithmic methodologies for such big data problems.
Several classes of optimisation models can be developed for responding to different problems arising in system biology such as linear ( Ma et al., 2017 ), quadratic ( Segré et al., 2002 ), convex , duplomonotone ( Artacho and Fleming, 2014 ) and other nonlinear problems ( Ahookhosh et al., 2019, Ahookhosh et al. 2020. For each class of optimisation models, the specific structure of the stoichiometric matrices might be used to decompose the original problem to several computationally tractable subproblems using the moiety matrix splitting described in Section 7 . This may lead to algorithms with lower analytic and computational complexities. One of the main properties of stoichiometric matrices that can be effectively used in development of novel optimisation methodology is their sparsity pattern, i.e., most of the elements of large stoichiometric matrices are zero. Here, we emphasise that for implementations of optimisation methodologies, one needs to know about function values and gradients (subgradients in nonsmooth cases) of the objective function of the optimisation problems, which require some matrix-vector products. In order to decrease the computational complexity of these matrixvector products, one can exploit the sparsity of stoichiometric matrices combined with high-performance computing to efficiently provide required information for the corresponding methodology. We assert that by combining existing techniques from these two approaches, and developing new tailored techniques that mix elements of these two approaches, will provide a secure path toward solving high dimensional optimisation problems arising in the study of biochemical networks.

Conclusion
A biochemical network with m molecules and n reactions may be expressed as a hypergraph H(V , Y ) that consists of a set of m vertices V := {V 1 , . . . , V m } , each corresponding to one molecule, and a set of n hyperedges Y := {Y 1 , . . . , Y n } , each corresponding to one reaction. Once an orientation is chosen for each reaction, topology of such a network may be represented by a stoichiometric matrix N ∈ {−1 , 0 , 1 } m ×n , with r = rank (N) , where N i,j < 0 if molecule i is consumed in reaction j and N i,j > 0 if molecule i is produced in reaction j . However, N is not an arbitrary rectangular sign matrix because there exists a set of non-negative vectors L ∈ Z m −r×m such that (i) L T 1 > 0 , (ii) LN = 0 , (iii) N ( k ) := diag( L k ) N corresponds to an incidence matrix for a graph, or a hypergraph, with one connected component with the property that 1 N(k ) = 0 and (iv) the following matrix splitting exists Fundamentally, these properties arise due to the fact that, at a higher level of resolution: (i) a molecule may be represented as a molecular graph where each vertex represents an atom and each edge represents a chemical bond between a pair of atoms in a molecule, and (ii) a reaction may be represented as a graph where each vertex represents an atom and each edge represents a transition between an atom in a substrate complex and an atom of the same element in a product complex. It is an open problem to establish whether these properties are particular to stoichiometric matrices alone, or whether they have been studied in graph or hypergraph theory but not yet applied to biology.