A Newton Cooperative Genetic Algorithm Method for In Silico Optimization of Metabolic Pathway Production

This paper presents an in silico optimization method of metabolic pathway production. The metabolic pathway can be represented by a mathematical model known as the generalized mass action model, which leads to a complex nonlinear equations system. The optimization process becomes difficult when steady state and the constraints of the components in the metabolic pathway are involved. To deal with this situation, this paper presents an in silico optimization method, namely the Newton Cooperative Genetic Algorithm (NCGA). The NCGA used Newton method in dealing with the metabolic pathway, and then integrated genetic algorithm and cooperative co-evolutionary algorithm. The proposed method was experimentally applied on the benchmark metabolic pathways, and the results showed that the NCGA achieved better results compared to the existing methods.


Introduction
Recently, computational system biology has gained attention from many researchers and become an important research area. The computational system biology has contributed to the understanding on the process of the complex biology where it enables the biological process to act as a system. It can be achieved by utilizing the techniques and knowledge in the molecular biology and genetics where it allows living cell to be manipulated as a real factory and gives insights to researchers into ways to improve cell production. One way to improve cell production is by in silico optimization of metabolic pathway production. Metabolic pathway can be defined as a series of chemical reactions that occur within the microorganism cell. The computational system biology has enable the metabolic pathway to be represented by mathematical model. Due to that, it is possible to perform the optimization process using computer simulation (in silico optimization).
The in silico optimization of metabolic pathway production can be seen as the search for a set of components (chemical reactions) for maximizing production rate. It works by altering and tuning the value of the components in the metabolic pathway in order to find the maximum production rate. Similar to the production, the total of the component concentrations involved also needs to be considered. When minimum value of component concentrations are involved, the production cost can be reduced [1,2]. Recently, many successful works have been published on the in silico optimization of metabolic pathway production. Most of these works used continuous optimization approach, such as linear programming method [2][3][4] and geometric programming method [5,6]. These approaches usually involve equality and inequality constraints. They also require the definition of the decision variable that requires expert knowledge. Moreover, the search process in the continuous optimization approach totally depends on the decision variable, and this can lead to convergence problem if the decision variable is not accurate [7].
In contrast to the continuous optimization approach, the combinatorial optimization approach works by finding the optimal solution from a finite set of objects. The method functions by using a stochastic operator on a pool of candidate solutions for the optimization problem and makes it more efficient and robust [8]. Due to its stochastic nature, the method does not require any assumption regarding the structure of the model and the definition of the decision variable. This is because the stochastic operator uses random method to determine the search direction, which makes it more robust. For this reason, the present study uses the combinatorial optimization approach for the in silico optimization of metabolic pathway production. In this optimization process, metabolic pathway can be viewed as a nonlinear equations system. This is because the metabolic pathway can be described as mathematical model. There are many methods that can be used in dealing with the nonlinear equations system, including numerical, evolutionary and swarm intelligence methods. The Newton method, which is a numerical method, is the most popular method used [9,10] Newton method is an iterative method used to find an optimum point to real-valued roots. The utilization of the Newton method for the in silico optimization of metabolic pathway production is a good choice because of the fast convergence speed of the Newton method [11]. In this work, Newton method views metabolic pathway as a nonlinear equations system. Applying only the Newton method is not sufficient for the optimization process because the variables in the nonlinear equations system need to be tuned. This is because all the metabolic pathway components are represented by many variables in the nonlinear equations system. Therefore, a method is needed to represent and tune the variables. To overcome this problem, genetic algorithm (GA) is applied. This approach can be performed by representing the variables in the nonlinear equations system as a chromosome. Then, the chromosome is evolved and tuned. However, several issues occur when this method is applied on a complex metabolic pathway that involves many metabolic pathway components. This can cause complex representation of the solution as many metabolic pathway components need to be represented by GA. In addition, evaluating the solution is time consuming. Therefore, a method needs to be embodied into the GA to simplify the representation of the solution. It has been found that applying the cooperative co-evolutionary algorithm (CCA) is the best choice, where the CCA method works by dividing the representation of the solution into multiple sub-solutions [12][13][14].
In this paper, an improved method, namely the Newton Cooperative Genetic Algorithm (NCGA), was utilized for the in silico optimization of metabolic pathway production. This method uses the Newton method to deal with metabolic pathway. GA is then used in the optimization process and CCA is embodied into the GA to decompose the solution into multiple sub-solutions. Furthermore, this study introduced concepts for representing the solution, namely NCGA representation which includes sub-chromosome representation and cooperative chromosome representation. The sub-chromosome is part of the cooperative chromosome, while the cooperative chromosome is a complete solution. The NCGA representation concept ensures that the NCGA is able to increase the production. In addition, this study also introduced a two-level evaluation of the solution, namely sub-chromosome evaluation and cooperative chromosome evaluation where this concept minimizes the total of the component concentrations involved. In the following section, the theory of the metabolic pathway is discussed, where the modelling of the metabolic pathway and the optimization problem are described. This is followed by a discussion of the proposed method and two case studies of the optimization of ethanol production in Saccharomyces cerevisiae (S. cerevisiae) pathway and the optimization of tryptophan (trp) biosynthesis in Escherichia coli (E. Coli) pathway. Finally, the results are presented and discussed before a brief conclusion is made.

Modeling of metabolic pathway
In this study, the mathematical model used to represent the metabolic pathway is generalized mass action (GMA) models. The representation of GMA has the following form: where s represents the stoichiometric matrix of the system and v(x) denotes the vector that contains reaction rate. There are two types of reactions, which are dependent and independent variables. The dependent variable usually represents the metabolite concentrations, while the independent variable is the enzyme activity. These variables are usually in the form of nonlinear functions. The reaction rate v(x) can be represented by using the power-law function, which has the following form [6,15]: In this representation, coefficient γ i is denoted as the rate constant for v i and coefficient f ij is the kinetic order. These two coeffcients are derived from the Taylor series in the logarithmic space around a steady state [6,16]. They can be defined as follows: where the subscript 0 refers to the value at the steady state condition.

Problem statement for optimization
The in silico optimization of metabolic pathway production is always constrained by steady state condition. In the steady state condition, all the variables in the metabolic pathway are in static values. This condition forces all GMA models to be equal to 0, and thus produces models as follows: This leads to the solving of a nonlinear equations system, which can be defined as follows [1,9,10,17]: where x = (x 1 , x 2 Á Á Áx n ) denotes n equations and n variables, while f 1 , f 2 Á Á Áf n are the nonlinear functions. In order to solve a nonlinear equations system, all the equations f(x) 1 , f(x) 2 Á Á Áf(x) n must be equal to 0. This situation is similar to the in silico optimization of metabolic pathway production. Therefore, the in silico optimization of metabolic pathway production can be considered as a method for solving a nonlinear equations system. Besides the steady state condition, the constraint of the components in the metabolic pathway also needs to be considered. This constraint exists in order to ensure the concentrations of the components remain within the boundaries to maintain the survival of the microorganism cell [1,6]. Thus, the in silico optimization of metabolic pathway production involves improving the pathway production and at the same time tries to reduce the component concentration involved. The optimization problem can be written as follows: where objective function F 1 is the production (maximum function) and objective function, while F 2 is the total of the component concentrations involved.

Newton cooperative genetic algorithm
In this section, NCGA method is discussed. The NCGA is used for the in silico optimization of metabolic pathway production. The proposed method treats the metabolic pathway as a nonlinear equations system. In dealing with a nonlinear equations system, the Newton method is applied, and then GA is used to represent the variables in the system. This is necessary in order to tune the variables to search for the optimum set. GA loses its effectiveness when it is applied in a complex metabolic pathway as the representation of the solution becomes complex and difficult to evaluate. Hence, CCA is applied to overcome this particular problem. The present study introduces a modified GA, which includes the NGCA representation concepts and the two-level fitness evaluation. Fig 1 shows the simplified flow chart of the NCGA. In the figure, the concepts of the NCGA representation and the two-level fitness evaluation are highlighted by the dashed lines. The detailed flow of the algorithm is as follows: Step 1. Generate the initial N sub-chromosome in N sub-population. Each sub-chromosome represents each variable in the nonlinear equations system. The number of N sub-populations depends on the number of variables in the nonlinear equations system. The example of representation of all variables in nonlinear equations system by sub-chromosomes can be viewed in Fig 2. The sub-chromosome is in binary representation.
Step 2. Evaluate the sub-chromosomes. In this step, a representative of the sub-chromosome from all the sub-populations is selected based on the fitness value, where the sub-solution with the lowest fitness value is selected first. This is done to ensure that the representatives from all the sub-populations will combine with each other in order to minimize the total component concentrations involved. This step is referred to as the sub-chromosome evaluation.
Step 3. Form the cooperative chromosome. After all the representatives are selected from their sub-populations, they are combined with each other to form a complete solution known as the cooperative chromosome. This concept, referred to as the cooperative chromosome representation, is pictured in Fig 3. Step 4. Evaluate the cooperative chromosome. In this step, the cooperative chromosome is tested by the Newton method. A condition at this stage is whether or not termination has occurred. The termination condition can occur in two conditions; when the maximum number of generations has been reached, and when the component concentration constraint is in the optimum range. If the termination condition is achieved, then the process skips directly to Step 8.   Step 5. Transform the cooperative chromosome back into sub-chromosomes. In this step, the cooperative chromosome is decomposed back into sub-solutions with all the sub-chromosomes are going back into their own sub-population. The purpose of this step is to select subchromosomes in order to perform the reproduction process.
Step 6. Select the sub-chromosome. The selection process is based on the fitness values of the sub-chromosome, whereby the sub-chromosome with the lowest fitness value is selected first. This is intended to minimize the total of the component concentrations involved by making all sub-solutions that have lowest fitness value combine with each other. The process is performed until the last sub-chromosome has been selected.
Step 7. Reproduce a new generation. Crossover and mutation procedures are applied on the selected sub-chromosomes. This is performed in order to produce a new generation that has better quality compared to the previous generation. Then, the new generation goes back to Step 2.
Step 8. Return the best solution. This step decodes the cooperative chromosome into the solution. Fig 4 shows the NCGA algorithm in pseudocode format.

Case studies
To prove the effectiveness of the NCGA, a program was used and tested on the metabolic pathway benchmark, namely the optimization of ethanol production in S. cerevisiae and the optimization of trp production in E. coli pathway. The program written in Java language that is known as jMetal [18] was used. Besides that, JAMA version 1.0.3 [19] was used to handle the nonlinear equations system and integrate with the jMetal program. The jMetal program can be downloaded from http://jmetal.sourceforge.net/index.html while JAMA version 1.0.3 from http://math.nist.gov/javanumerics/jama/.
Case study 1: Optimization of ethanol production in S. cerevisiae pathway Modelling framework. The GMA model derived from the ethanol production by S. cerevisiae was used as the case study in order to observe the performance of NCGA. The GMA model derived from the S. cerevisiae pathway was suspended in a cell culture at pH 4.5 as recommended by Galazzo and Bailey [20]. This pathway has been applied in many studies over the years. Fig 5 depicts a schematic representation of the S. cerevisiae pathway. The GMA model of this pathway is described as follows: The variables and the corresponding steady state are summarized in Table 1. At the initial steady state, all the fluxes in the model were formulated in the following form:  Optimization problem. The performance of the method proposed in this work can be assessed by the rate of ethanol production given by the flux of pyruvate kinase, V PK . In addition, the total of the component concentrations involved must be considered. Thus, the optimization problem of this case study can be defined as follows: The optimization problem was subjected to the steady state condition, where all the GMA models were equal to 0: The constraint of the component concentrations involved was also considered. This is to ensure that the microorganism is still workable. In this case study, the components were categorized into two groups, which are metabolite and enzyme. The metabolites (X 1 − X 5 ) were set approximately 20% from their steady state values, which were in the range of 0.8-1.2. For the enzymes, not all enzymes were tuned as only Y 1 − Y 6 were involved. The values for enzymes were set in the range of 0-50 [1,21].

Case study 2: Optimization of trp biosynthesis in E. Coli pathway
Modelling framework. In this pathway, NCGA was used to optimize the trp production. This pathway is introduced by Xiu et al. and detailed description can be found in [22]. This pathway is depicted in Fig 6 and the detail is given in Table 2. This pathway has the following GMA model: where X 1 is the mRNA concentration, X 2 is the enzyme concentration and X 3 is the trp concentration. In steady state condition, all components in this pathway (variables in Table 2) have the following rate: Optimization problem. The first objective function was to maximize the trp production given by the reaction of V 34 [23]. In addition, the second objective function was to minimize the total concentrations involved. Thus, the optimization problem of this case study can be defined as follows: Similar to case study 1, the optimization problem was subjected to the steady state condition. Hence, this produced GMA models with the following form: In this case study, not all of the components were tuned. Only seven components were tuned, which are X 1 − X 6 and X 8 . For X 1 − X 3 , the component concentration constraints were in the range of 0.8-1.2; for X 4 , the component concentration constraints were in the range of 0-0.00624; for X 5 , the component concentration constraints were in the range of 5-10; for X 6 , the component concentration constraints were in the range of 500-5000 and for X 8 , the component concentration constraints were in the range of 0-1000 [4][5][6].

Results and discussion
Results and discussion Many parameter settings were used while performing the experiments. Table 3 summarizes the parameter settings used when the best result was obtained. For the Newton method, fixed parameters were used; the maximum number of iterations was fixed to 50 and the tolerance was set to 10 −6 .
In case study 1, the full results obtained by the NCGA are presented in Table 4. The table gave the best result achieved, the average of 100 experiments, the standard deviation and the comparison with other works. For the best ethanol production, the NCGA was able to increase the ethanol production up to 52.91 times from its initial steady state value. Several other works were compared with the NCGA in order to assess its performance. As seen in the table, the NCGA produced the highest amount of ethanol compared to the others. Besides that, the NCGA was able to reduce the total amount of the component concentrations involved, with the total concentration was only 294.80. The complete result achieved using NCGA in case study 2 is given by Table 5. The table summarizes the best result achieved, the average of 100 experiments, the standard deviation and the comparison with other works. For the best solution, NCGA was able to produce trp up to 3.9759 times the initial steady state value. When the comparison was made with other works, the result produced by NCGA was the highest. For the total component concentrations involved, NCGA was able to reduce the total amount of the component concentrations involved to 6006.5581, where it was the lowest value compared to other works.
In evaluating the concept of the NCGA representation, it was compared with the single chromosome representation (traditional GA). Several experiments were conducted in evaluating this concept and the parameter settings as indicated in Table 3 were used. Fig 7 and Fig 8 give the comparison of the results. From both figures, it can be observed that all the production results that utilized the NCGA representation concept were higher compared to the results that only used single chromosome representation. This is because each variable in a nonlinear equations system was represented by multiple sub-chromosomes. In contrast to that, a single representation of chromosome represents all the variables of nonlinear equations system into a single chromosome. As the concept of NCGA representation allowed each variable in the nonlinear equations system to be represented by many sub-chromosomes and reproduced in their own sub-population, this made all variables were tuned in order to produce the optimum result. This did not happen in the single representation of the solution, as there were possibilities that not all variables were tuned because all variables were grouped together into only a single representation. As a conclusion, applying the concept of sub-chromosome and cooperative chromosome representation enables the improvement of the production. In validating the two-level evaluation concept that was introduced in this study, several experiments were conducted where the representatives from all sub-populations were selected randomly based on their fitness value. The experiments used the parameter settings in Table 3 and the number of maximum generation was fixed to 300. The graph in Fig 9 and Fig 10 depict the results obtained in case study 1 and case study 2 respectively. In both figures, it was found that the representatives selected based on their fitness value were able to minimize the total of the component concentrations involved compared to the representatives that were selected randomly. Furthermore, the total of the component concentrations involved for the representatives that were selected based on the fitness value in Fig 9 and Fig 10 decreased slightly from the 1 st generation to the 300 th generation. For the total of the component concentrations involved of the randomly selected representatives, it was found that the total of component concentrations involved in the next generation sometimes were higher than the previous generation. This might be due to the random element in selecting the representatives to form cooperative chromosome. This could happen when a sub-chromosome in the current generation with higher fitness value compared to others in their sub-population was selected  randomly as a representative, and then combined with other representatives from other subpopulations. This made the total of the component concentrations involved became higher in the next generation. In conclusion, the two-level evaluation concept that was introduced in this study was able to reduce the total of components concentrations involved and allow the NCGA to perform the optimization process smoothly.
Several experiments were carried out in order to investigate the reliability of the results obtained by the NCGA. The reliability of the results can be assessed by the average results that are given in Table 4 for case study 1 and Table 5 for case study 2. For case study 1, it could be observed that all component concentrations involved were in their optimum range, thus proved that the NCGA was able to produce reliable results. In addition, the ethanol production rate was higher and the total of the component concentrations involved was the lowest compared to the previous works. Based on this observation, it can be concluded that the NCGA demonstrated the reliability in handling the optimization problem in case study 1. In case study 2, it was found that all component concentrations involved were placed in their optimal range, therefore confirmed that the NCGA was capable of producing reliable result. Besides that, the trp production rate was higher and the NCGA was able to reduce more for the total of the component concentrations involved compared to previous works. Based on this finding, the NCGA has shown its ability in the optimization of case study 2.
Besides comparing with previous works, the production of NCGA was also being compared with the Newton method with traditional GA (single chromosome representation). 100 experiments were performed and the best result, the average and the standard deviation were recorded. For the results that were produced by NCGA in case study 1 and case study 2 are given in Table 4 and Table 5 5 respectively, while the result that was produced by the single chromosome representation is as follows; for case study 1, the best solution result for the ethanol production rate is 52.57 and the total of the component concentrations involved is 297.38, the average for the ethanol production rate is 52.40 and the total of the component concentrations involved is 296.65, and the standard deviation for the ethanol production rate is 0.0826 and the total of the component concentrations involved is 0.3944; for case study 2, the best result for the trp production rate is 3.9570 and the total of the component concentrations involved is 6007.7884, the average for the trp production rate is 3.9510 and the total of the component concentrations involved is 6007.9811, and the standard deviation for the trp production rate is 0.0049 and the total of the component concentrations involved is 0.4872. It can be observed that all the production result produced by NCGA was higher compared to the result that was produced by traditional GA in terms of the best solution and the average for all case studies. For the total of the component concentrations involved results (the best solution and the average), NCGA was able to reduce more compared to the Newton method with traditional GA. For the standard deviation, NCGA was able to achieve smaller value of standard deviation compared to the Newton method with traditional GA. This shows that the NCGA is able to find the similar result. This is because with smaller value of the standard deviation points out the consistency and precision of the NCGA.
Other than the production and the total of the component concentrations involved, the computation time (in second) in performing experiment also needs to take into account. Table 6 gives the comparison of the computation time taken in performing experiment for NCGA and Newton method with traditional GA. Generally, the result shows that the NCGA requires more computation time compared to the Newton method with traditional GA. This might be due to the concept of the NCGA representation and the two-level evaluation concept where NCGA needs more time to perform these two processes in order to obtain good results compared to the Newton method with traditional GA.

Conclusion
In this paper, an improved method for in silico optimization of metabolic pathway production known as NCGA has been proposed. The NCGA was developed to improve the metabolic pathway production and at the same time minimize the total concentration of the components involved. The NCGA comprises a combination of Newton method, GA and CCA. The Newton method deals with the metabolic pathway and the GA is employed in the optimization process, with the GA representing the components in the metabolic pathway as a solution. However, as explained in this paper, the representation of the solution becomes complex when it is applied in a complex metabolic pathway. To overcome this situation, the use of the CCA was proposed in order to simplify the representation of the solution by decomposing the solution into multiple sub-solutions. In addition, this study introduced two concepts to make the NCGA perform well, namely the concept of representing the solution and a two-level evaluation of the solution. The proposed method was applied in two case studies, and it showed better results than those reported in previous studies. We would also like to express our gratitude to the editor and anonymous reviewers who reviewed this paper.