Robust Optimal Design of Experiments for Model Discrimination Using an Interactive Software Tool

Mathematical modeling of biochemical processes contributes significantly to a better understanding of biological functionality and the underlying dynamic mechanisms. To support time-consuming and costly lab experiments, kinetic reaction equations can be formulated as a set of ordinary differential equations, which in turn makes it possible to simulate and compare hypothetical models in silico. To identify new experimental designs that are able to discriminate between the investigated models, the approach used in this work solves a semi-infinite constrained nonlinear optimization problem using derivative-based numerical algorithms. The method takes parameter variability into account, so that new experimental designs are robust against parameter changes while maintaining the optimal potential to discriminate between the hypothetical models. In this contribution we present a newly developed software tool that offers a convenient graphical user interface for model discrimination. We demonstrate the beneficial operation of the discrimination approach and the usefulness of the software tool by analyzing a realistic benchmark experiment from the literature. New robust optimal designs that allow discrimination between the investigated model hypotheses of the benchmark experiment are successfully calculated and yield promising results. The robustification approach provides maximally discriminating experiments for the worst parameter configurations, which can be used to estimate the meaningfulness of upcoming experiments. Major benefits of the graphical user interface are the ability to interactively investigate the model behavior and the clear arrangement of the numerous variables. In addition to a brief theoretical overview of the discrimination method and the functionality of the software tool, the importance of robustness of experimental designs against parameter variability is demonstrated on a biochemical benchmark problem.
The software is licensed under the GNU General Public License and freely available at http://sourceforge.net/projects/mdtgui/.


Introduction
The investigation of biochemical processes inherent to any biological system, such as bacteria, plants, animals or even humans, is an important component of current systems biology approaches. To simulate such dynamic processes in silico, mathematical modelling based on ordinary differential equations (ODEs) represents a powerful and broadly used tool [1]. Although these models mostly do not represent the exact biochemical mechanisms of the investigated phenomena, they are useful to gain deeper insights into the underlying processes and to derive proposals for future experiments [2,3]. The identification of suitable hypothetical models that simultaneously perform well on acquired measurements from different experiments is a challenging task. If a model is over-parametrized, it is possible to perfectly reproduce observed measurement data while concurrently losing the generalization capabilities of the model [4]. On the other hand, if the number of model parameters is not sufficient, the model might be unable to fit the data at all. A good model is therefore a trade-off: as simple as possible while maintaining a good approximation of the observed measurement data [5].
Frequently, several models perform equally well on an initial measurement data set. Instead of performing a huge number of different experiments to determine the appropriate hypothesis, simulating and re-designing experiments in silico before moving back to real experiments is a reasonable approach. It is therefore desirable that newly designed experiments make it possible to evaluate the appropriateness of the investigated models and possibly to discard incorrect ones. This process is also referred to as optimal experimental design for model discrimination or model selection. Current methods for model discrimination range from exhaustive search methods [6] and probabilistic approaches [7] to optimization-based approaches [6,8,9]. The approach considered in this work is based on reformulating the calculation of new experimental designs as a constrained nonlinear optimization problem. To this end, an optimality criterion is maximized with respect to the involved experimental design variables, such as the arrangement of measurement time points, the initial species concentrations, the experimental duration and external perturbations of the system. The concentration time courses of the participating species are obtained by efficiently solving an initial value problem for the ODE-based formulation of the biochemical reaction equations using numerical integration. Derivatives are generated using algorithmic differentiation, which in turn allows the use of efficient derivative-based optimization algorithms [10]. The interior point optimization algorithm Ipopt is used to solve the constrained nonlinear optimization problem [11,12].
Another aspect, which arises from incomplete a priori knowledge of certain involved parameters, is the calculation of robust optimal designs. The intention of robust optimal designs for model discrimination is to maintain the power to select between several hypothetical models even for the most unfavorable parameters that can occur in a given setup. This is achieved by re-estimating parameter values in addition to calculating the discriminating designs.
In this article we present a comprehensive extension of the previously presented command line tool ModelDiscriminationToolkit by Skanda and Lebiedz [13]. The extension comprises a convenient graphical user interface for the interactive analysis of biochemical models as well as an implementation of the robustification approach described in [10]. Initially, we provide the reader with a brief introduction to the discrimination approach presented in [10,13], introduce the basic concepts of the robustification method and discuss its realization in the ModelDiscriminationToolkitGUI. The results section emphasizes the importance of robust experimental design and demonstrates the capabilities of the ModelDiscriminationToolkitGUI on two examples, including a realistic benchmark problem on a biochemical network in an artificial organism [6]. Finally, the work is summarized and an outlook on further work is given in the last section.

Model representation
A widely used approach to modeling biochemical reaction networks is their formulation as a set of coupled ordinary differential equations (ODEs). The temporal concentration change dy/dt of a certain species y is defined as a function of the concentrations of the involved species with respect to the corresponding rate constants according to the law of mass action [14]. A comprehensive treatment of the modeling of biochemical processes using ordinary differential equations can be found e.g. in [14,15].
As models specified this way usually do not have an analytical solution, the equation system has to be solved numerically [16]. If the kinetic parameters of the reactions are not known in advance, they have to be estimated from given measurement data. By specifying the initial concentrations y_I of all participating species, the ODEs can be solved using numerical procedures for initial value problems as described later. All models considered in this work are formulated as such a set of ordinary differential equations. The next sections show how to calculate new experimental designs for the discrimination of several plausible hypotheses formulated as ODE models.
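As an illustration of this workflow, the following sketch integrates a minimal mass-action model (a hypothetical bimolecular reaction A + B → C with rate constant k, not one of the models discussed in this article) with a classical fixed-step Runge-Kutta scheme; the toolkit itself uses an adaptive BDF integrator, as described later.

```python
def rk4(f, y0, t0, t1, n_steps):
    """Integrate dy/dt = f(t, y) with the classical 4th-order Runge-Kutta scheme."""
    h = (t1 - t0) / n_steps
    t, y = t0, list(y0)
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y

def mass_action(t, y, k=0.5):
    """Law of mass action for A + B -> C: dA/dt = dB/dt = -k*A*B, dC/dt = k*A*B."""
    a, b, c = y
    r = k * a * b
    return [-r, -r, r]

# initial value problem: A(0) = B(0) = 1, C(0) = 0, integrated to t = 10
y_end = rk4(mass_action, [1.0, 1.0, 0.0], 0.0, 10.0, 1000)
```

The stoichiometry implies the linear invariants A + C and B + C, which such an integration preserves and which provide a quick sanity check of the right-hand side.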

Model discrimination
The central goal of model discrimination is to calculate new experimental designs in such a way that the time courses of the rival model responses are maximally separated. Hence, an objective function is needed that evaluates the distance between the trajectories of the model hypotheses in a suitable way. In our case the objective function is derived from the Kullback-Leibler (KL) divergence, which is a non-symmetric measure for the distance between two probability density functions, as described in [17]. A detailed derivation of the adjusted optimization criterion as well as the statistical background can be found in [10,13]. For a discrete set of measurement points and possibly unequal variance functions σ^1_j(t_i, ξ, θ_1), σ^2_j(t_i, ξ, θ_2) of the two models, the general objective function reads

J(ξ, θ_1, θ_2) = Σ_{i=1}^{n} Σ_{j=1}^{m} H(Δt_i) H̃(c_i) (y^1_j(t_i, ξ, θ_1) − y^2_j(t_i, ξ, θ_2))² / (σ^2_j(t_i, ξ, θ_2))².   (1)

In Eq. (1), the sum of squared differences is scaled by the corresponding variance functions of the models. For the homoscedastic case, Eq. (1) reduces to the sum of squared differences between the responses of the two models. Here, m is the number of species, n is the number of measurement time points and y^k_j(t_i, ξ, θ_k) with k ∈ {1,2} is the value of the time course trajectory of species y_j, j = 1,…,m, of model k. The variables t_i with i = 1,…,n represent the measurement time points, ξ ∈ Ξ ⊂ R^d is a d-dimensional experimental design vector from the design space Ξ, and θ_k ∈ Θ_k are the model parameters contained in the respective parameter spaces Θ_k. Note that the objective function in Eq. (1) is non-symmetric and therefore depends on the ordering of the models. In the following, the ordering of the models in the objective function is disregarded for convenience.
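A minimal sketch of such a criterion, assuming the homoscedastic case with a constant standard deviation sigma and omitting the weighting functions introduced below (model responses stored as nested lists indexed by species and time point; all values illustrative):

```python
def discrimination_objective(y1, y2, sigma=1.0):
    """Homoscedastic discrimination criterion: sum of squared differences
    between the responses of two rival models, scaled by the variance.
    y1[j][i], y2[j][i]: response of species j at measurement time point i."""
    total = 0.0
    for traj1, traj2 in zip(y1, y2):
        for v1, v2 in zip(traj1, traj2):
            total += (v1 - v2) ** 2 / sigma ** 2
    return total

# two toy "model responses" for one species at three time points
model1 = [[0.0, 1.0, 2.0]]
model2 = [[0.0, 0.5, 1.0]]
J_value = discrimination_objective(model1, model2, sigma=0.5)  # = (0.25 + 1.0) / 0.25 = 5.0
```

Maximizing this value over the design variables separates the trajectories; minimizing it over the parameters yields the most similar model responses, as discussed next.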
An optimal experimental design ξ̂ maximizes the objective function J(ξ, θ_1, θ_2) with respect to constant parameter vectors θ_1 and θ_2. On the other hand, the objective function J(ξ, θ_1, θ_2) can also be used to determine worst-case parameters for a given design ξ, corresponding to the most similar model responses. In this case, the experimental design is considered constant and the desired parameter vectors are subject to optimization. Of course, the criterion has to be minimized if worst-case parameters, which produce the most similar model responses, are to be identified. The functions H(Δt_i) and H̃(c_i) weight the model differences at a single measurement point depending on the time interval Δt_i = t_i − t_{i−1} and the current species quantity perturbations c_i applied to the system. H(Δt_i) is used to prevent time intervals from falling below a predefined minimum measurement interval Δt_min, which models restrictions of the measurement method. In turn, the function H̃(c_i) prevents simultaneously performed measurements and perturbations, which is demanded in some of the considered experiments. Mathematically, this can be modeled by so-called Heaviside functions that map from the real numbers to the set {0,1}.
The definitions of the Heaviside functions can be found in Eq. (2) and Eq. (3). These Heaviside functions are not differentiable, and therefore continuously differentiable approximations are needed in order to apply derivative-based optimization algorithms. An approximation of these Heaviside functions based on parametrized hyperbolic tangent functions is given in [10,13].
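A small sketch of such a smooth step, assuming an illustrative steepness parameter (the exact parametrization used by the toolkit is given in [10,13]):

```python
import math

def smooth_step(x, threshold, steepness=50.0):
    """Continuously differentiable approximation of the Heaviside step
    H(x - threshold), based on a parametrized hyperbolic tangent:
    close to 0 below the threshold, close to 1 above it."""
    return 0.5 * (1.0 + math.tanh(steepness * (x - threshold)))
```

For example, smooth_step(dt, dt_min) would weight a time interval dt against a minimum measurement interval dt_min, suppressing the contribution of intervals that are too short while remaining differentiable everywhere.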

Robust optimal designs
The parameters of the investigated models are usually estimated on available measurement data. Due to measurement inaccuracies, there is uncertainty about these parameters up to a confidence region representing a level of significance. Additionally, biological systems might involve intrinsically distributed parameters, leading to the effect of distributed determinism as pointed out in [18,19]. Hence, it is desirable to have new experimental designs with great discrimination power that keep the ability to distinguish between the models even for unfavorable parameter settings within some confidence region. These demands naturally lead to the problem of finding a worst-case optimal experimental design ξ̂ with respect to appropriate parameter spaces Θ_1, Θ_2; ξ̂ is therefore referred to as a robust optimal design. Mathematically, the problem of finding a robust optimal design ξ̂ can be formulated as a max-min optimization problem in the following way:

ξ̂ = arg max_{ξ ∈ Ξ} min_{θ_1 ∈ Θ_1, θ_2 ∈ Θ_2} J(ξ, θ_1, θ_2).   (4)

Here, the experimental design vector ξ = (y_I, t, c) ∈ Ξ ⊂ R^d consists of the initial species concentrations y_I, the perturbations c and an optimal arrangement of measurement time points t. The set Ξ represents the space of feasible experimental designs and θ_{1,2} ∈ Θ_{1,2} are the feasible model parameters for the current design ξ. We solve the problem stated in Eq. (4) iteratively by an outer approximation algorithm, see e.g. [20]. Each iteration N of the outer approximation algorithm consists of two phases. In the first phase, worst-case parameters θ̄^N_{1,2} ∈ Θ_{1,2} for the current optimal experimental design ξ̂^N ∈ Ξ are re-estimated. (For the first iteration with N = 0, ξ̂^0 corresponds to the user-supplied initial experimental design.) These worst-case parameters are added to the finite set of parameter scenarios collected so far; in the second phase, a new design is computed by maximizing the minimum of the objective function over all collected worst-case parameter sets. It can be shown [20,21] under reasonable assumptions that ξ̂^N converges to a critical point ξ̂ of the problem stated in Eq. (4), i.e. ξ̂^N → ξ̂ as N → ∞.
Thus, the objective function values obtained in the discrimination run of phase two and in the parameter re-estimation of phase one of the outer approximation algorithm converge to the same value. The absolute value of the difference of these objective function values, which we call the robustification gap Δ_RG, can be used as a stopping criterion for the max-min optimization problem:

Δ_RG = |J^N_II − J^N_I| < δ,   (5)

where J^N_I and J^N_II denote the objective function values of phase one and phase two in iteration N. Eq. (5) shows the definition of the robustification gap as well as the stopping condition. The value of δ is usually chosen to be a small positive number, e.g. 10^(-6), depending on the demanded accuracy and the problem under consideration. As soon as the robustification gap falls below the desired value, the discrimination algorithm can be stopped. To lower the chance of obtaining only local worst-case parameters, several random initializations of the parameter estimation within a feasible range can be performed.
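The two-phase loop can be sketched on a toy problem, with simple grid searches standing in for the derivative-based NLP solves of the actual implementation (the criterion J, the design grid and the parameter grid are all illustrative):

```python
def robust_design_loop(J, designs, params, delta=1e-9, max_iter=50):
    """Outer approximation sketch: alternate worst-case parameter estimation
    (phase one) with design optimization against all worst-case scenarios
    found so far (phase two), until the robustification gap falls below delta."""
    xi = designs[0]          # user-supplied initial design
    scenarios = []           # worst-case parameter sets collected so far
    for _ in range(max_iter):
        # phase one: re-estimate worst-case parameters for the current design
        theta = min(params, key=lambda th: J(xi, th))
        j_phase1 = J(xi, theta)
        scenarios.append(theta)
        # phase two: best design against all collected scenarios
        xi = max(designs, key=lambda x: min(J(x, th) for th in scenarios))
        j_phase2 = min(J(xi, th) for th in scenarios)
        if abs(j_phase2 - j_phase1) < delta:   # robustification gap
            break
    return xi

# toy criterion: squared distance between a scalar design and a parameter
J = lambda xi, th: (xi - th) ** 2
designs = [i / 10 - 1.0 for i in range(31)]   # candidate designs in [-1, 2]
params = [i / 10 for i in range(11)]          # feasible parameters in [0, 1]
xi_star = robust_design_loop(J, designs, params)
```

Here the robust design is pushed to a boundary of the design space, where the worst parameter in the box can no longer collapse the criterion; the same mechanism keeps the discriminating power of the experimental designs in the biochemical examples below.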

The optimization problem
The full robust optimal design problem considered in this paper can be formulated in the following way:

max_{ξ ∈ Ξ} min_{θ_{1,2} ∈ Θ_{1,2}} J(ξ, θ_1, θ_2)   (6)

subject to

y_I^min ≤ y_I ≤ y_I^max,   (7)
Δt_i^min ≤ Δt_i ≤ Δt_i^max, i = 1,…,n,   (8)
c_i^min ≤ c_i ≤ c_i^max, i = 0,…,n−1.   (9)

The constraints for the initial species vector y_I are necessary to avoid infeasible species concentrations, e.g. negative concentration values. Since we demand that an experiment be performed in a fixed time span, the time intervals Δt_i = t_i − t_{i−1}, i = 1,…,n, have to sum up to the fixed experimental duration T_end and have to lie in the range [Δt_i^min, Δt_i^max]. Finally, the constraints for the perturbations c_i ensure that these values also lie in a reasonable range. The numerical realization and a more thorough mathematical formulation of this semi-infinite inequality and equality constrained optimization problem (SIECP, [20]) for the purpose of robust optimal design of experiments is given in [10,21].
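When sampling initial guesses for such a design, the equality constraint on the experimental duration can, for example, be handled by rescaling a raw interval vector; the helper below is a hypothetical illustration (not part of the toolkit) that leaves the bound constraints to the solver:

```python
def project_to_duration(dt_raw, t_end):
    """Rescale a vector of positive measurement intervals so that they
    sum exactly to the fixed experimental duration t_end."""
    total = sum(dt_raw)
    if total <= 0.0:
        raise ValueError("intervals must have a positive sum")
    return [dt * t_end / total for dt in dt_raw]

# four raw intervals rescaled to an experimental duration of 300 time units
dt = project_to_duration([1.0, 2.0, 3.0, 4.0], t_end=300.0)
```

The rescaling preserves the relative spacing of the measurement time points while satisfying the equality constraint exactly.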

Numerical tools
To solve the initial value problems for the ODE models, a backward differentiation formula (BDF) integrator is used, implemented in a multiple shooting setup [21,22]. Derivatives needed by the optimization algorithm are calculated using the automatic differentiation software CppAD, and the optimization problem is solved using the interior point approach implemented in the Ipopt package [11,23]. Additionally, a so-called homotopy method, which slowly introduces new constraints on the parameter values in subsequent optimization steps, is used to improve the convergence of the outer approximation algorithm [10,21].

The graphical user interface
To make the functionality of the ModelDiscriminationToolkit available to non-programmers, a convenient graphical user interface, the ModelDiscriminationToolkitGUI, has been developed, delivering easy and interactive access to the above-mentioned model discrimination techniques. An exemplary screenshot of the ModelDiscriminationToolkitGUI is given in Fig. 1. The application is separated into three labeled areas. The plot window (1) displays all time courses and derivatives of the currently enabled species. It provides direct feedback on the optimization steps performed during the discrimination and can also be used to gain a better understanding of the model behavior. Experimental conditions like initial species concentrations, measurement intervals and perturbations can be altered directly within the plot window. The models are immediately simulated again, so that different possible discrimination scenarios can be tested before starting the actual optimization routine. The console window (2) displays the output generated by the ModelDiscriminationToolkit and the optimization package Ipopt. Additionally, the intermediate experimental designs and worst-case parameters as well as the final results are printed there. The components of the experimental design, such as time intervals and perturbations, are given in MATLAB matrix notation and stored on disk in order to facilitate further processing of the data. The settings window (3) can be used to adapt e.g. the experimental conditions, robustification properties, Ipopt parameters and integrator settings. A threaded implementation ensures that the GUI remains responsive to the user, e.g. allowing a live update of the new experiment during optimization and a pause functionality for further investigations.

Implementation details
The ModelDiscriminationToolkitGUI is developed in the C++ programming language using the Xcode IDE on Mac OS X. For the creation of the graphical user interface, the Qt API (4.7) by Nokia, which is freely available for non-commercial projects, is used [24]. The ModelDiscriminationToolkit requires the interior point optimization package Ipopt [11,12] with the linear solver MA27 [25]. Throughout the presented implementation, Ipopt version 3.9.2 is used. To fully exploit the parallel processing capabilities of modern hardware, the numerical integration is parallelized using POSIX threads. In addition to detailed installation instructions, getting-started tutorials, Xcode project files and a Unix makefile, a template model file is provided to build and test the ModelDiscriminationToolkitGUI on different Unix-based platforms. The application has successfully been compiled and executed on both Mac OS X 10.8 and Ubuntu 12.04.

Results and Discussion
In this section we present the results of two exemplary discrimination problems that demonstrate the importance of the robustification approach as well as the capabilities of the ModelDiscriminationToolkitGUI. The first example deals with two models for enzyme catalyzed reactions. In the second example, the tool is applied to a realistic benchmark problem for model discrimination published earlier by Kremling et al. [6]. All calculations are performed using the current version of our software tool. The complete software package, all discussed models as well as the project files and result logs are offered for download on the project website http://sourceforge.net/projects/mdtgui.

Hysteretic enzyme vs. simple enzyme
The enzyme-catalyzed formation of product molecules from substrate molecules is a ubiquitous process in living organisms. In this discrimination example we identify the appropriate enzyme model for a given set of measurement data. The simplest enzymatic reaction, in which a steady state is reached directly after a short initial phase, follows the reaction equations

E + 3S ⇌ ES → E + P

and can be modeled by the following set of ODEs:

dS/dt = −3 (k_1 E S − k_{−1} ES),   (11)
dE/dt = −k_1 E S + k_{−1} ES + k_2 ES,   (12)
d(ES)/dt = k_1 E S − k_{−1} ES − k_2 ES,   (13)
dP/dt = k_2 ES.   (14)

This model considers four species, namely the substrate S, the enzyme E, the enzyme-substrate complex ES and the product P. The substrate binds to the enzyme with a certain affinity, forming an enzyme-substrate complex ES with rate constant k_1. The rate constant k_{−1} describes the dissociation of the enzyme-substrate complex back into its single components, and the rate constant k_2 describes the decay of the enzyme-substrate complex into free enzyme and product. After a short phase of building up the enzyme-substrate complex, the concentration of the free enzyme approaches zero and the reaction enters a steady state with a constant turnover rate. After the substrate is depleted, the turnover rate decreases and the free enzyme concentration increases again [26].
Contrary to this, reactions catalyzed by so-called hysteretic enzymes are characterized by a distinct lag phase before a steady state is entered [27]. RNAse and wheat germ hexokinase, for instance, are monomeric enzymes showing hysteretic behavior [26,28]. This phenomenon, also known as kinetic cooperativity, can be explained by the slow transition model [26]. In contrast to the cooperativity observed when several subunits of a multimeric enzyme interact upon substrate binding, this model gives a plausible explanation for the sigmoidal behavior of monomeric enzymes based on slow conformational changes. The enzyme exists in two conformational states E and E', termed inactive and active, respectively, as they differ heavily in their catalytic activity. In the absence of substrate, the enzyme primarily exists in the inactive conformation. Binding of substrate molecules to the inactive enzyme initiates the conformational change, and the enzyme is then able to work at a much higher efficiency until the substrate is depleted. The slow transition into the active conformation and the higher turnover rate of the active conformation are the reasons for the sigmoidal shape of the substrate and product concentration curves plotted against time. According to [27], the reactions of these two enzymatic states can be modelled by analogous reaction equations, which again have to be reformulated as a set of ODEs for further numerical analyses.

[Fig. 2: A lack-of-fit test, which is performed on both models with the generated experimental data, ensures that none of these models can be discarded in advance. The depicted experiment is used as the initial design of the discrimination approach. doi:10.1371/journal.pone.0055723.g002]

[Table 1: The correct parameters of the hysteretic enzyme model used for data generation.]
As soon as the rate constant k_6 is significantly larger than k_5 and the transition rate between ES and ES' is sufficiently small, this model shows a clear sigmoidal shape for both the substrate and the product concentration over time. In this example, three substrate molecules are used to form one product molecule, which is the reason for the factor 3 present in Eq. (11) and Eq. (15).
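A sketch of the simple enzyme model under plausible assumptions (mass-action rate proportional to E·S with a stoichiometric factor of 3 on the substrate; the rate constants used here are illustrative, not the estimated values from the tables):

```python
def simple_enzyme_rhs(y, k1=1.0, km1=0.1, k2=0.5):
    """Right-hand side of the simple enzyme model, y = [S, E, ES, P]:
    binding (k1), dissociation (k-1) and catalytic decay (k2) of the
    enzyme-substrate complex, with three substrate molecules per product."""
    s, e, es, p = y
    bind, unbind, cat = k1 * e * s, km1 * es, k2 * es
    return [-3.0 * (bind - unbind),      # dS/dt
            -bind + unbind + cat,        # dE/dt
            bind - unbind - cat,         # dES/dt
            cat]                         # dP/dt

def euler(rhs, y0, t_end, n_steps):
    """Explicit Euler integration, sufficient for this small illustration."""
    h = t_end / n_steps
    y = list(y0)
    for _ in range(n_steps):
        y = [yi + h * di for yi, di in zip(y, rhs(y))]
    return y

# initial substrate 1.0 and enzyme 0.001, simulated over 300 time units
y_end = euler(simple_enzyme_rhs, [1.0, 0.001, 0.0, 0.0], 300.0, 30000)
```

Because the enzyme is only bound and released, the total enzyme amount E + ES is conserved along the trajectory, which is a useful check when implementing such right-hand sides.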
Data generation. Due to the unavailability of real experimental data, artificial measurements are generated using the hysteretic model. The correct parameters listed in Tab. 1 are chosen such that a clear sigmoidal shape is obtained for both the substrate and the product concentration over time. The hysteretic model is simulated over 300 time units and measurements are performed every 10 time units. The initial concentrations of the substrate and the enzyme are set to 1.0 and 0.001, respectively, and the remaining species values are set to zero. No perturbations are performed during the initial experiment. Five independent random data sets are simulated using the hysteretic model with an additive normally distributed error term e_i at measurement time point t_i with standard deviation σ = 0.1 and mean μ = 0.0. Each model-generated value y_i is altered to ỹ_i = y_i + e_i for all n time steps in order to simulate measurement inaccuracies. As the standard deviation is equal for all measurements, the homoscedastic version of the objective function shown in Eq. (1) is used.
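Generating such noisy data sets from a simulated trajectory can be sketched as follows, assuming a fixed random seed per data set for reproducibility (the trajectory here is a placeholder, not a simulation of the hysteretic model):

```python
import random

def noisy_measurements(trajectory, sigma=0.1, mu=0.0, seed=0):
    """Simulate measurement inaccuracies by adding an independent,
    normally distributed error term to each model-generated value."""
    rng = random.Random(seed)
    return [y + rng.gauss(mu, sigma) for y in trajectory]

# five independent data sets derived from the same "true" trajectory
truth = [0.1 * i for i in range(31)]   # placeholder trajectory, 31 points
data_sets = [noisy_measurements(truth, seed=s) for s in range(5)]
```

Fixing the seeds makes the artificial experiments repeatable, which is convenient when comparing discrimination runs.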
For the further discrimination process, the parameters of both models are assumed to be unknown and are estimated simultaneously on the generated measurement data. All parameters are initialized to one and subsequently fitted to the observed data in a least squares sense. Fig. 2 shows the substrate and product time courses of both models with the estimated parameters, which serve as the initialization of the discrimination method. To verify that none of the models can be discarded on the initial experimental data, a lack-of-fit test based on the decomposition of the residual variance of the regression curve is performed [29-31]. The test fails to reject the null hypothesis of an appropriate fit for both models (data not shown), i.e. it is not clear which hypothesis underlies the given set of measurements. Due to the absence of a noticeable trend in the residuals, a similar conclusion can be drawn from the residual vs. run time plot shown in Fig. 3. The estimated parameters for the simple model and the hysteretic model are listed in Tab. 2 and Tab. 3, respectively.
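The principle of the least-squares fit can be sketched on a toy one-parameter model, with a simple grid search standing in for the derivative-based estimation used by the toolkit (model, grid and data are illustrative):

```python
import math

def least_squares_fit(model, t_points, data, k_candidates):
    """Return the candidate parameter minimizing the residual sum of squares."""
    def rss(k):
        return sum((model(t, k) - d) ** 2 for t, d in zip(t_points, data))
    return min(k_candidates, key=rss)

decay = lambda t, k: math.exp(-k * t)          # toy model y(t) = exp(-k t)
ts = [0.5 * i for i in range(20)]
observed = [decay(t, 0.3) for t in ts]         # noise-free data for clarity
k_hat = least_squares_fit(decay, ts, observed, [i / 100 for i in range(1, 101)])
```

With noise-free data the residual sum of squares vanishes at the true parameter, so the grid search recovers k = 0.3 exactly; with noisy data the minimizer scatters around the true value, which is precisely the parameter uncertainty the robustification addresses.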
The goal of the subsequent analysis is to determine an optimal experimental design that allows the correct hypothesis to be identified reliably out of the two candidates. Additionally, it is demanded that the optimal design be robust to parameter variations of the simple model, i.e. even if the parameters of the simple model are re-estimated, a clear distinction between the models should be guaranteed.
Results and discussion. In this discrimination example, the parameters k_1, k_{−1}, k_2 of the simple model are subject to robustification. The box constraints for the parameters listed in Tab. 4 are estimated using the graphical user interface with random measurements generated by the correct model. Parameter values for which one of the curves of the simple model clearly exceeds the generated random measurement data are used as boundaries for the robustification. The parameters of the hysteretic model are fixed during the discrimination and are listed in Tab. 3. The initial substrate concentration is set to 1.0 and constrained to the range [0.5, 3.0]. The initial enzyme concentration is set to 0.001 and restricted to the interval [0.0005, 0.003]. All other participating species are initialized to 0, and the initial concentrations of these variables are not subject to optimization. The multiple shooting bounds are not restricted, i.e. the bounds are set to ±10^19.

[Table 2: The estimated parameters for the simple enzyme reaction model. Based on measurements generated by a hidden hysteretic model with known parameters, the unknown parameters of the simple candidate model were fitted to five independently generated data sets in a least squares sense. doi:10.1371/journal.pone.0055723.t002]

[Table 3: The estimated parameters for the hysteretic enzyme reaction model. Based on measurements generated by a hidden hysteretic model with known parameters, the unknown parameters of the hysteretic candidate model were fitted to five independently generated data sets in a least squares sense. doi:10.1371/journal.pone.0055723.t003]

These jumps clearly vanish the closer the algorithm converges to an optimum. Once the homotopy is started, here at iteration 20, no further jumps occur. This demonstrates the usefulness of the homotopy strategy, which stabilizes the convergence of the algorithm. Forcing the algorithm to reach such a small robustification gap is mostly not needed for practical applications, though. Since the accuracy of the measurement method used is the limiting factor, δ can be chosen relative to the measurement variance. Part B of Fig. 4 shows the values of the objective functions. The solid line represents the objective value of the new optimal design with respect to the parameter sets of the simple model found so far. The dashed line in turn represents the KL-divergence of the new design with re-estimated worst-case parameters. It can be observed that even for designs that lead to huge KL-divergence values, e.g. after the second iteration, new parameters can be found that significantly decrease the model distances. After multiple iterations the objective values approach a constant value and the robustification gap finally drops below the stopping criterion. The optimal design proposed after the first few iterations of the algorithm uses the lowest possible initial species concentrations, and nearly all perturbations are set to the maximum value. This clearly differs from the final result and causes the large jump of the objective values that can be observed after the second iteration.
After the identification of the optimally discriminating design, none of the models is able to perfectly describe the data using the initially identified parameters. The parameters are therefore re-estimated on measurements of both the initial design and the new design. The re-estimated parameters for the two models are listed in Tab. 5 and Tab. 6, respectively. When the models are simulated again with the re-estimated parameters, as shown in Fig. 6, the hysteretic model nicely fits the data of both the initial and the optimized experiment. In contrast, the simple model lies either below or above the measured data points. Since the residuals of the simple model shown in Fig. 7 follow a distinct pattern, the simple model can be assumed to be inappropriate for the observed measurements. Hence, the preconditions of the lack-of-fit test, namely independent normally distributed residuals with equal variances, are not fulfilled for the simple model. The time series of the simple model as well as the corresponding residual plot clearly show its poor fit. The residuals of the hysteretic model are randomly distributed around zero without any noticeable pattern. Also note that the residuals for the substrate curve of the simple model vary in a range of [−0.5, 1], which is by far larger than the corresponding residuals of the hysteretic model. The F-values of the lack-of-fit test for the hysteretic model are 0.8066 for the substrate curve and 0.5494 for the product curve. The critical value is given by F(df_1 = 15, df_2 = 104, α = 0.05) = 1.7636 for five independent measurements, each of which consists of 26 measurement points. As the calculated values clearly lie below the critical F-value, the model cannot be rejected. Of course, the values of the lack-of-fit test heavily depend on the measurement data being used, so the values presented here should only be considered exemplary results of such a test.
Nevertheless, multiple repetitions with different randomly disturbed data sets also failed to reject the hypothesis. Considering the described goodness-of-fit methods, the hysteretic model is appropriate for the given experimental data.
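The randomized worst-case check underlying the histogram analysis that follows can be sketched as below, with a toy criterion J standing in for the KL-divergence and a design whose worst-case value over the sampled parameter box equals 1 (all values illustrative):

```python
import random

def sample_divergences(J, design, sample_parameter, n=1000, seed=0):
    """Evaluate the discrimination criterion for n randomly drawn
    parameter sets of one model, keeping the design fixed."""
    rng = random.Random(seed)
    return [J(design, sample_parameter(rng)) for _ in range(n)]

# toy criterion with box constraint theta in [0, 1]; for the design xi = -1
# the worst case over the box is attained at theta = 0 with J = 1
J = lambda xi, theta: (xi - theta) ** 2
values = sample_divergences(J, -1.0, lambda rng: rng.uniform(0.0, 1.0))
worst_case = 1.0
# for a robust design, no sampled value falls below the worst-case divergence
```

Binning the sampled values into a histogram and marking the worst-case divergence then visualizes whether any parameter configuration can undercut the guaranteed discrimination power.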
To verify that the KL-divergence of 1.6416 found for the optimal design is the smallest distance that can be observed, a histogram plot is used. 1000 random initializations of the parameters of the simple model are performed with respect to the box constraints mentioned above, and the resulting KL-divergence values are summarized in 100 equally spaced histogram bins. Some rarely occurring outliers in the KL-divergence range of [100, 300] are not shown in Fig. 8. All measured distances are greater than or equal to the red bar, which indicates the KL-divergence for the worst-case parameters. Therefore, the calculated design is able to distinguish between the models at least by this given worst-case KL-divergence. Similarly, Fig. 9 shows a histogram plot of the KL-divergences for two experiments that were optimized without robustification of the parameters. Both histograms contain bins with KL-divergences smaller than the proposed KL-divergence of the optimal design indicated by the red bar. Therefore, no conclusions about the actual performance of these calculated experiments can be drawn. This shows the benefit and necessity of the robustification approach: for robust optimal designs, no parameters can cause a lower discrimination power between the two models than the one observed for the proposed design.

[Table 5: The re-estimated parameters for the simple model on the initial measurement data and the measurement data of the identified optimal design. For correct hypotheses it should be possible to determine a parameter set that fits all available measurements. doi:10.1371/journal.pone.0055723.t005]
[Table 6: The re-estimated parameters for the hysteretic model on the initial measurement data and the measurement data of the identified optimal design. For correct hypotheses it should be possible to determine a parameter set that fits all available measurements. doi:10.1371/journal.pone.0055723.t006]

Benchmark for model discrimination

In this section we apply our tool to a benchmark for model discrimination presented in [6]. For convenience, the model is briefly summarized here. The example models a biochemical reaction network of a fictitious organism with several participating metabolites. The experiment takes place in a tank reactor, which is characterized by the inflow q_in and the outflow q_out of a substrate with concentration c_in.
This reaction network is defined by the ODEs in Eq. (21)-(26). In Eq. (21), V is the volume of the tank reactor. As the flow rates q_in and q_out are assumed to be equal throughout the experiment, the volume change of the reactor, dV/dt, is omitted in the following. In Eq. (22) and (23), B represents the biomass concentration and S the substrate concentration, respectively.
Directly after the uptake of substrate into the organism, M1 is synthesized and the enzyme E catalyzes the irreversible conversion of M1 to M2. The corresponding ODEs are given in Eq. (24)-(26). The reaction rates r1, r2 and r3 involve Michaelis-Menten kinetics with the parameters given in Tab. 7. For the reaction rates r_syn and r2 a formal kinetic law is used to model the inhibition of the enzyme activity by the concentration of M2. The parameters for these terms are also listed in Tab. 7. The different involved kinetic laws suggest that the concentration of M2 influences both the synthesis and the activity of the enzyme. Eq. (31) models the partial substrate uptake by the organism, which is converted into biomass [6].
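Since Eq. (21)-(26) are not reproduced here, the following is only a minimal sketch of how a reactor model of this structure can be simulated: a chemostat with Michaelis-Menten substrate uptake, an M2-inhibited conversion of M1 to M2, and M2-inhibited enzyme synthesis. The rate laws, dilution terms, and parameter values are illustrative stand-ins, not the equations or the Tab. 7 values of [6].

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameter values -- stand-ins, not the values of Tab. 7
p = dict(K_S=0.1, Y_XS=0.5, k2=1.0, K_M1=0.2, K_I=0.5,
         k_synmax=0.3, r1max=2.0)

def rhs(t, y, q, c_in, V=1.0):
    """Chemostat-style reactor with Michaelis-Menten uptake and an
    M2-inhibited enzymatic conversion of M1 to M2 (illustrative)."""
    B, S, M1, M2, E = y
    r1 = p['r1max'] * S / (p['K_S'] + S)                  # substrate uptake
    r2 = (p['k2'] * E * M1 / (p['K_M1'] + M1)
          * p['K_I'] / (p['K_I'] + M2))                   # inhibited conversion
    r_syn = p['k_synmax'] * p['K_I'] / (p['K_I'] + M2)    # enzyme synthesis
    mu = p['Y_XS'] * r1                                   # specific growth rate
    D = q / V                                             # dilution rate
    dB = (mu - D) * B                                     # biomass
    dS = D * (c_in - S) - r1 * B                          # substrate
    dM1 = r1 - r2 - mu * M1                               # metabolite M1
    dM2 = r2 - mu * M2                                    # metabolite M2
    dE = r_syn - mu * E                                   # enzyme
    return [dB, dS, dM1, dM2, dE]

# q_in = q_out = 0.25 L/h and c_in = 2.0 g/L as in the initial experiment
sol = solve_ivp(rhs, (0.0, 60.0), [0.1, 2.0, 0.0, 0.0, 0.01],
                args=(0.25, 2.0), max_step=1.0)
```

A design variable such as a flow rate step at 20 hours would be realized by splitting the integration interval and restarting the solver with the new inflow value.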
r2 = k2 · E · (...),  μ = Y_X/S · r1.

The model described above is the correct model and is used for the generation of in silico data as described in the next section. The two hypothetical models are constructed by changing some of the reaction rate equations in different ways. For model A the velocity of the enzyme synthesis is altered to be constant, i.e. r_syn,A = k_syn,max. In model B the last fraction of r2 is omitted, which leads to r_2,B = k2 · E · (...). The conversion of metabolite M1 to M2 in model A is therefore only regulated by a non-competitive inhibition of the enzyme by M2. For model B the enzyme synthesis is controlled depending on the concentration of M2. The non-competitive inhibition of the enzyme activity by M2 does not play a role for model B anymore, as the corresponding term is left out [6]. Note that in this case both of the considered models differ from the model used for data generation. Accordingly, the task of the discrimination approach is to figure out which of these models is best suited to approximate the correct model. In the following discrimination example, the initial and intermediate values of the flow rates q_in, q_out and the substrate concentration c_in, as well as the measurement time points, are subject to design. The calculated design is robustified against the parameter K_IA of model A.
Table 7. The parameters of the correct model as given in [6]. Note that the values of K_IA and K_IB mentioned as correct parameters in [6] are erroneously interchanged. doi:10.1371/journal.pone.0055723.t007

Table 8. Re-estimated parameters for both models on the initial benchmark experiment. doi:10.1371/journal.pone.0055723.t008

Data generation. As described by Kremling et al., measurements are generated using the correct model with additional random noise applied to each simulated concentration value using a relative error model [6]. An independent normally distributed random variable ε with standard deviation σ = 0.1 and mean μ = 0.0 is applied multiplicatively, i.e. each value y predicted by the correct model is altered to ỹ = y · (1 + ε). The parameters of the correct model can be found in Tab. 7. In order to keep the presented results comparable to those of Kremling et al., the model parameters as well as the initial species concentrations are taken from the appendix of [6]. Throughout this section, the values for Mw, K_S, K_M1, K_M2, r_1max and r_3max are used for both considered models and are assumed to be known. The remaining parameters Y_X/S, k2, k_syn,max, K_IA and K_IB are estimated from the simulated measurement data using the method described in [6]. The value of the parameter K_IA given in Tab. 7 is used as an initial guess for the parameter robustification performed in the inner loop of the model discrimination algorithm. The final worst-case value for K_IA is given separately for each of the presented experimental designs. In Tab. 8 the parameters estimated on the initial experiment are summarized [6]. For the simulated experiment by Kremling et al. the initial substrate concentration is set to 2.0 g/L and the initial flow rates are set to q_in = q_out = 0.25 L/h.
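The relative error model can be written in a few lines; the simulated values below are placeholders for the concentrations predicted by the correct model.

```python
import numpy as np

rng = np.random.default_rng(42)

def add_relative_noise(y, sigma=0.1):
    """Relative error model: y_noisy = y * (1 + eps), eps ~ N(0, sigma^2),
    drawn independently for every simulated concentration value."""
    eps = rng.normal(0.0, sigma, size=np.shape(y))
    return y * (1.0 + eps)

y_true = np.array([2.0, 1.5, 1.1, 0.8])   # hypothetical simulated values
y_meas = add_relative_noise(y_true)
```

Because the noise is multiplicative, the absolute measurement error scales with the magnitude of the simulated concentration, which is the usual motivation for a relative error model.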
Measurements are taken every two hours, which results in thirty measurement points for the experimental duration of 60 hours. After 20 hours both flow rates are raised to 0.35 L/h and after 30 hours the substrate concentration is reduced to c_in = 0.5 g/L. Fig. 10 shows the initial time courses of the two models and illustrates that the models react sensitively to such inflow changes [6]. Additionally, Fig. 10 clearly indicates that both models fit the data equally well. Taking only this experiment into account, a reasonable distinction between the two models would not be possible. Hence, a different experimental design, which is able to directly discard one of the hypothetical models, is desirable. The next sections show the resulting model responses achieved by calculating different input and perturbation profiles for the experiments. In the presented results, usually only the plots for species M1 and the enzyme are shown, because the curves of all other species are largely the same for the performed experiments. The design variables are the tank reactor flow rates q_in and q_out, the substrate concentration c_in, the measurement time points and two perturbations.
As described in [6], an F-test is used to statistically verify the rejection of inappropriate hypotheses. For this purpose, the standard deviation of the residuals is calculated for both investigated models. The ratio of the larger value to the smaller value follows an F-distribution and can be compared to the critical value of an F-distribution with corresponding degrees of freedom. Here, the degrees of freedom are given by the number of measurements, i.e. the number of residuals. Consider, for example, the standard deviations of the absolute values of the enzyme residuals for the initial experiment, σ_A = 4.463e-3 and σ_B = 3.989e-3. The critical value for thirty measurements and a significance level of α = 0.05 is F(30, 30; 0.05) ≈ 1.84. In this case the test fails to reject the null hypothesis that the two standard deviations are not significantly different, and none of the models can be discarded for the initial experiment. If the ratio were larger than the critical value, the null hypothesis would be rejected and the residual standard deviation of model B would be significantly smaller. Note that in this case the residuals are also calculated at perturbation positions, which is why they are also plotted in the presented figures.
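This test can be reproduced with scipy. Note that, in standard form, the F statistic is the ratio of the larger to the smaller sample variance, i.e. the squared ratio of the standard deviations quoted in the text; the function name below is ours.

```python
from scipy.stats import f

def residual_f_test(s_a, s_b, n, alpha=0.05):
    """Compare two residual standard deviations: the ratio of the larger
    to the smaller sample variance is referred to an F(n, n) distribution."""
    ratio = max(s_a, s_b) ** 2 / min(s_a, s_b) ** 2
    critical = f.ppf(1.0 - alpha, dfn=n, dfd=n)
    return ratio, critical, ratio > critical

# Enzyme residual standard deviations of the initial experiment (see text)
ratio, crit, reject = residual_f_test(4.463e-3, 3.989e-3, n=30)
# the ratio stays below the critical value F(30, 30; 0.05) of about 1.84,
# so neither model can be discarded on the initial experiment
```

The qualitative outcome is the same whether the standard-deviation ratio or the variance ratio is used here, since both remain below their respective critical values.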
Results and discussion. The calculation of a robust optimal design presented in this section differs from the approaches carried out by Kremling et al. by additionally allowing an optimization of the measurement and perturbation time points. As shown in Fig. 11, the optimal design is found after three iterations of the discrimination algorithm. The homotopy [10] is started after the robustification gap drops below 0.1 in the second iteration, and it is performed once until the stopping criterion of 10^-4 is reached. The final KL-divergence value is 13.16 for the calculated design with K_IA = 0.2586 mmol/gDW. Calculating the robust optimal design, including five parameter fits after each discrimination run, took approximately 7.5 hours on one core of an Intel Xeon X5460 CPU with 3.16 GHz.

Table 9. Re-estimated parameters for both models on the benchmark experiment.
As in the previous example, the initially estimated parameters do not fit multiple measurements of different experimental designs. Therefore, the parameters are fitted simultaneously to the measurements of the initial design and the new design. In Tab. 9 the re-estimated parameters for both models are shown. For model A the parameters are re-estimated 50 times with random initializations to avoid locally optimal parameters. The best parameter set is listed in the table. The necessity for parameter re-estimation was also mentioned by Kremling et al., where new model parameters had to be determined after each calculation of new designs. Obviously, if a model is correct, it should be possible to find a parameter set that fits several distinct measurements produced by the underlying biochemical process. If no parameters can be found to simultaneously fit the observed data, the model might be inappropriate. Fig. 12 shows the time courses of both models for the enzyme with the re-estimated parameters. In both cases, model A fits the measurement data worse. The initial flow rate of the new design is raised to 0.4 L/h and the initial substrate concentration is lowered to 0.5639 g/L. The flow rate change remains at 20 hours, and it is raised from 0.1 L/h to 0.3 L/h. The substrate concentration change is raised from -1.5 g/L to 1.5 g/L and applied after 34 hours. Two measurement points are removed and the measurements are moved into regions where the largest model distances are observed. The stimulation time points of the new experiment are quite similar to the initial time points. This indicates that the initial perturbation points are already a good choice. Consequently, the optimization of the stimulation time points could not be exploited for enlarged model distances in this case. Instead, the discrimination power is raised by rearranging the measurement points and increasing the stimulations.
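The multistart re-estimation (50 random initializations, keeping the best local optimum) can be sketched generically. The model function, bounds, and data below are synthetic placeholders standing in for the actual ODE-based fits to the pooled measurements of both designs.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)

def model(t, p):
    # hypothetical model response; stands in for the ODE solution
    return p[0] * (1.0 - np.exp(-p[1] * t))

# Pooled measurements of the initial design and the new design
t_all = np.concatenate([np.linspace(0, 10, 15), np.linspace(0, 20, 15)])
y_all = model(t_all, [1.0, 0.4]) * (1 + 0.05 * rng.standard_normal(t_all.size))

def residuals(p):
    return model(t_all, p) - y_all

lb, ub = np.array([0.1, 0.01]), np.array([5.0, 2.0])
best = None
for _ in range(50):                        # 50 random initializations
    p0 = rng.uniform(lb, ub)               # draw start within the bounds
    fit = least_squares(residuals, p0, bounds=(lb, ub))
    if best is None or fit.cost < best.cost:
        best = fit                         # keep the best local optimum
```

If even the best of the 50 fits leaves large systematic residuals across the pooled data, this is exactly the hint that the hypothesis may be inappropriate.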
The residual plots of the enzyme time courses for measurements of the new experiment are shown in Fig. 13 and clearly support the inappropriateness of model A.
The standard deviations of the enzyme residuals for measurements of the new experiment are 2.0519e-3 and 9.8074e-4 for model A and model B, respectively. As two measurement time points are omitted in the new design and two are used for perturbations, only 26 degrees of freedom are used for the F-test; the resulting ratio clearly exceeds the critical value F(26, 26; 0.05) of approximately 1.93, so model A is rejected. Fig. 15 shows the result of this re-estimation. It is obviously impossible to draw a reasonable conclusion which model fits the data worse in this case. An issue that frequently occurs during the calculation of robust optimal designs is related to the constraints of the flow rates. Kremling et al. used [0.05, 1.6] as constraints for the flow rates, which does not work well in the presented discrimination attempt. It is also doubtful whether such large flow rates are practically meaningful. In our example, the flow rates are constrained to significantly smaller ranges. Using the graphical user interface, the model behavior upon such value changes can easily be investigated. For example, raising the flow rate from 0.25 L/h to 0.5 L/h results in an M1 concentration increased by a factor of 300 (data not shown). Of course, the corresponding derivatives get even larger and their calculation may cause numerical problems. In trials with larger flow rate intervals, the parameter re-estimation usually still works fine, but during the calculation of a discriminating design the objective function grows enormously. Ipopt converges to a point of local infeasibility and is finally unable to sufficiently reduce the dual infeasibility. This results in an interrupted optimization after a failed restoration phase. To mitigate these problems it helps to reduce the allowed variation of the flow rates and to add the Ipopt settings listed in Tab. 10. In particular, if input profiles contain perturbations that are already set to the maximum value of the feasible range, the optimization starting point is close to the bounds.
In this case Ipopt relaxes the bounds, which can result in an infeasible starting point. The Ipopt settings listed in Tab. 10 avoid such a boundary relaxation and keep the initial point feasible. Additionally, the Ipopt convergence tolerance tol is increased to 10^-5 in order to speed up the calculations and avoid the "Solved to Acceptable Level" warning. A reduction of the integrator tolerances and sensitivities might also be possible, but would in turn raise the computation time.
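Since Tab. 10 itself is not reproduced in the text, options in this spirit, taken from Ipopt's documented interface, might look as follows in an ipopt.opt options file; whether these match Tab. 10 exactly is an assumption.

```
# ipopt.opt -- assumed settings in the spirit of Tab. 10
bound_relax_factor 0        # do not relax variable bounds, so a starting
                            # point on the bounds stays feasible
honor_original_bounds yes   # project the final point back onto the
                            # user-supplied bounds
tol 1e-5                    # looser convergence tolerance (text: 10^-5)
```

Disabling the bound relaxation is the documented way to prevent Ipopt from moving an on-bound starting point slightly outside the feasible region.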

Conclusions
In this contribution, the theoretical and practical concepts used for the implementation and application of a convenient software tool for robust optimal design of experiments for model discrimination are presented. It is shown how a modeling approach based on ordinary differential equations can be combined with numerical optimization to calculate new experimental designs that allow a selection between several rival models. We demonstrate the relevance of the robustification approach as well as the functionality and usefulness of the ModelDiscriminationToolkitGUI by applying the software to two artificial biochemical reaction networks. Our approach successfully identifies inappropriate hypotheses with respect to various experimental design constraints. The goodness of fit of the models is evaluated using residual plots and a lack-of-fit test. The robustness of the given designs against parameter changes is verified by means of a KL-divergence histogram over numerous randomly initialized parameters. The major benefit of the graphical user interface is the ability to interactively investigate the model behavior and optimization strategies. Additionally, the GUI facilitates data handling and keeps the numerous experimental design variables clearly arranged. A general recipe for the calculation of robust optimal experimental designs, however, cannot be given on the basis of the presented examples. It usually makes sense to evaluate different discrimination scenarios, e.g. using the ModelDiscriminationToolkitGUI. Using reasonable ranges for the involved perturbations, to avoid time courses that are unlikely for the observed biochemical phenomenon, combined with a sophisticated arrangement of the measurement points, seems to be a good initial choice, though. The software is licensed under the GNU General Public License and freely available for download at http://sourceforge.net/projects/mdtgui/.
In future releases, the ModelDiscriminationToolkitGUI could be extended by a formula parser, e.g. using the well-known SBML format, to further facilitate the investigation of new models and allow access to the numerous reviewed biochemical models that are available online [32].