A Parameter-Free Model Comparison Test Using Differential Algebra

. We present a method for rejecting competing models from noisy time-course data that does not rely on parameter inference. First we characterize ordinary differential equation models in only measurable variables using differential-algebra elimination. This procedure gives input-output equations, which serve as invariants for time series data. We develop a model comparison test using linear algebra and statistics to reject incorrect models from their invariants. This algorithm exploits the dynamic properties that are encoded in the structure of the model equations without recourse to parameter values, and, in this sense, the approach is parameter-free. We demonstrate this method by discriminating between different models from mathematical biology.


Introduction
Given competing mathematical models to describe a process, we wish to know whether our data are compatible with the candidate models. Often comparing models requires optimization and fitting time-course data to estimate parameter values and then apply an information criterion to select a "best" model [1]. However sometimes it is not feasible to estimate the value of these unknown parameters (e.g., large parameter space, nonlinear objective function, nonidentifiability, etc.). In this paper, we compare candidate models with time-course data while avoiding the parameter estimation problem by considering a "parameter-free" approach.
The parameter problem has motivated the growth of fields that embrace a parameter-free flavor such as chemical reaction network theory and stoichiometric theory [2][3][4]. However many of these approaches are limited to comparing the behavior of models at steady-state [5][6][7]. Inspired by techniques commonly used in applied algebraic geometry [8] and algebraic statistics [9], methods for discriminating between possible models without estimating parameters have been developed for steady-state data [10,11]. These approaches characterize a model in only observable variables-called a steady-state invariant [5]-using techniques from computational algebraic geometry and determine whether the noisy steady-state data are compatible with this steady-state invariant via a statistical test. However, unlike other Bayesian and parameter estimation approaches, it does not select models; it can only rule them out. Notably the method does not require parameter estimation, hence there is the term parameter-free.
Extending the method developed in [10], we present a method for comparing models using time-course data instead of steady-state data. In this approach we compute input-output equations, which we refer to as input-output invariants for time series data. We consider state-space ordinary differential equations (ODE) models of the forṁ x(t) = f(x(t), u(t), p) and y( ) = g(x( ), p) where ( ) are species variables, = 1, . . . , , ( ) is a known input into the system, = 1, . . . , , ( ) is a known output (measurement) from the system, = 1, . . . , , p is the unknown −dimensional parameter vector, and the functions f, g are rational functions of their arguments. The dynamics of the model can be observed in terms of a time series where u( ) is the input at discrete points and y( ) is the output.
In Sections 4 and 7, we showcase our method with examples from linear and nonlinear models. Finally we discuss special cases and other related topics in Section 8, before concluding in Section 9.

Differential Elimination
We now give some background on differential algebra since a crucial step in our algorithm is to perform differential elimination to obtain equations purely in terms of input variables, output variables, and parameters. For this reason, we will only give background on the ideas from differential algebra required to understand the differential elimination process. For a more detailed description of differential algebra and the algorithms listed below, see [12,14,15]. In what follows, we assume the reader is familiar with concepts such as rings and ideals, which are covered in great detail in [8].

Definition 1.
A ring is said to be a differential ring if there is a derivative defined on and is closed under differentiation. A differential ideal is an ideal which is closed under differentiation.
Let our differential ideal be equipped with a ranking, i.e., a total ordering, denoted <, among the variables and their derivatives. Let ( ) and (]) be arbitrary derivatives. Then the ranking should be such that, for arbitrary positive integer : Let be the leader of a polynomial , which is the highest ranking derivative of the variables appearing in that polynomial. A polynomial is said to be of lower rank than if the order of is less than the order of or, whenever = , the highest algebraic degree of any term containing the leader of is less than the highest algebraic degree of any term containing the leader of . A polynomial is reduced with respect to a polynomial if contains neither the leader of with equal or greater algebraic degree, nor its derivatives. If is not reduced with respect to , it can be reduced by using the pseudodivision algorithm in Section 2.1. A set of differential polynomials = { 1 , 2 , . . . , } that are all reduced with respect to each other is called an autoreduced set.
A useful description of a differential ideal is called a differential characteristic set, which is a finite description of Complexity 3 a possibly infinite set of differential polynomials. We give the technical definition from [12].
Definition 2. Let Σ be a set of differential polynomials, not necessarily finite. If ⊂ Σ is an auto-reduced set, such that no lower ranked auto-reduced set can be formed in Σ, then is called a differential characteristic set.
A well-known fact in differential algebra is that differential ideals need not be finitely generated [12,15]. However, a radical differential ideal is finitely generated by the Ritt-Raudenbush basis theorem [16]. This result gives rise to Ritt's pseudodivision algorithm (see below), allowing us to compute the differential characteristic set of a radical differential ideal. We now describe various methods to find a differential characteristic set and other related notions, and we describe why they are relevant to our problem; namely, they can be used to find the input-output equations.
In what follows, we will be considering the differential ring R(p)[u, y, x], where R(p) is the field of rational functions in the parameter vector p. The variables in this differential ring are the states, the inputs, the outputs, and possibly their derivatives.
Consider an ODE system of the formẋ (t) = f(x(t), p, u(t)) and ( ) = (x( ), p) for = 1, . . . , with f and g rational functions of their arguments. Let our differential ideal be generated by the differential polynomials obtained by subtracting the right-hand-side from the ODE system to obtainẋ (t) − f(x(t), p, u(t)) and ( ) − (x( ), p) for = 1, . . . , . In what follows, we use the ranking in [17], which is given by u <u <ü < ⋅ ⋅ ⋅ < y <ẏ <ÿ < ⋅ ⋅ ⋅ Note that the notation reflects the fact that the ordering among the components of u and y is immaterial, since these are known variables, whereas different ordering of the components of x may lead to different characteristic sets [17]. With respect to this ordering, a differential characteristic set is of the form [17]: where are differential polynomials. Note that the resulting system is not necessarily auto-reduced in R(p)[u, y, x], namely, 1 ( , ), . . . , ( , ) may not be auto-reduced. The first terms of the differential characteristic set, 1 (u, y), . . . , (u, y), are those terms independent of the state variables and when set to zero form the input-output equations:  y, . . . with rational coefficients in the parameter vector p. Note that the differential characteristic set is in general non-unique, but the coefficients of the inputoutput equations can be fixed uniquely by normalizing the equations to make them monic.
We now discuss several methods to find the input-output equations. The first method (Ritt's pseudodivision algorithm) can be used to find a differential characteristic set for a radical differential ideal. The second method (RosenfeldGroebner) gives a representation of the radical of the differential ideal as an intersection of regular differential ideals and can also be used to find a differential characteristic set under certain conditions [18,19]. Finally, we discuss Gröbner basis methods to find the input-output equations.

Ritt's Pseudodivision
Algorithm. An algorithm to find a differential characteristic set of a radical (in particular, prime) differential ideal generated by a finite set of differential polynomals is called Ritt's pseudodivision algorithm. We describe the process in detail below, which comes from the description in [17]. Note that our differential ideal as described above is a prime differential ideal [12,20]. Let and be differential polynomials.
(1) If contains the ℎ derivative ( ) of the leader of , is differentiated times so its leader becomes ( ) .
(2) Multiply the polynomial by the coefficient of the highest power of ( ) ; let be the remainder of the division of this new polynomial by ( ) with respect to the variable ( ) . Then is reduced with respect to ( ) . The polynomial is called the pseudoremainder of the pseudodivision.
(3) The polynomial is replaced by the pseudoremainder and the process is iterated using ( −1) in place of ( ) and so on, until the pseudoremainder is reduced with respect to .
This algorithm is applied to a set of differential polynomials, such that each polynomial is reduced with respect to each other, to form an auto-reduced set. The result is a differential characteristic set. Note that the multiplication mentioned in Step (2) above may yield a nonequivalent system if that coefficient happens to belong to the ideal. However, in practice, this does not occur for the ODE systems studied [17].

RosenfeldGroebner.
Using the DifferentialAlgebra package in Maple, one can find a representation of the radical of a differential ideal generated by some equations, as an intersection of radical differential ideals with respect to a given ranking [21]. Specifically, the RosenfeldGroebner command in Maple takes two arguments: sys and R, where sys is a list of set of differential equations or inequations which are all rational in the independent and dependent variables and their derivatives and R is a differential polynomial ring built by the command DifferentialRing specifying the independent and dependent variables and a ranking for them [21]. Then RosenfeldGroebner returns a representation of the radical of the differential ideal generated by sys, as an intersection of radical differential ideals saturated by the multiplicative family generated by the inequations found in sys. This representation consists of a list of regular differential chains with respect to the ranking of R. Note that RosenfeldGroebner returns a differential characteristic set if the differential ideal is prime [18].

Gröbner Basis
Methods. Finally, both algebraic and differential Gröbner bases can be employed to find the inputoutput equations. To use an algebraic Gröbner basis, one can take a sufficient number of derivatives of the model equations and then treat the derivatives of the variables as indeterminates in the polynomial ring in x,ẋ ,ẍ ,..., u, u,ü ,..., y,ẏ ,ÿ ,..., etc. Then a Gröbner basis of the ideal generated by this full system of (differential) equations with an elimination ordering where the state variables and their derivatives are eliminated first can be found. Details of this approach can be found in [22]. Differential Gröbner bases have been developed by Carrà Ferro [23], Ollivier [24], and Mansfield [25], but currently there are no implementations in computer algebra systems [14].

Model Rejection Using Input-Output Invariants
We now discuss how to use the input-output invariants obtained from differential elimination (using Ritt's pseudodivision, differential Gröbner bases, or some other method) for model selection/rejection. We can write our input-output relations in (4), or inputoutput invariants, in the form: The functions (u, y) are differential monomials, i.e., monomials in the input/output variables u,u ,ü , ... u, . . ., y,ẏ ,ÿ , ... y, . . ., etc., and the functions (p) are rational functions in the unknown parameter vector p. In order to uniquely fix the rational coefficients (p) to the differential monomials (u, y), we normalize each input/output equation to make it monic. In other words, we can rewrite our input-output relations as ∑̃(p) (u, y) = (u, y) Here (u, y) is a differential monomial in the input/output variables u,u ,ü , First consider the case of a single input-output equation. If there are unknown coefficients̃(p), we obtain the system: We write this linear system as = , where is an by matrix of the form: For the case of multiple input-output equations, we get the following block diagonal system of equations = : ( where is a = 1 + ⋅ ⋅ ⋅ + by = 1 + ⋅ ⋅ ⋅ + matrix. In the symbolic setting, given a general input function u and generic initial conditions and parameters, this system = should have a unique solution for , due to the persistence of excitation conditions [26]. In other words, we assume that the vectors of differential monomials 1 , . . . , at various time points are linearly independent. This means the coefficients̃(p) of the input-output equations can be uniquely determined in the generic setting [26]. Note that we have assumed that the parameters are all unknown and we have not taken any possible algebraic dependencies among the coefficients into account.
The main idea of this paper is to translate the symbolic setting to the numerical setting and can be described as follows. Assume we have perfect data; i.e., we know values of u,u ,ü , ... u, . . . , y,ẏ ,ÿ , ... y, . . ., etc., at many time instances 1 , . . . , , perfectly. Given a set of candidate models, we find their associated input-output invariants and then substitute in our values of u,u ,ü , ... u, . . . , y,ẏ ,ÿ , ... y, . . ., etc., at time instances 1 , . . . , , thus setting up the linear system = for each model. With perfect data and assuming the persistence of excitation conditions mentioned above, the solution to = should be unique for the correct model, but there should, in theory, be no solution for each of the incorrect models. Thus under ideal circumstances, one should be able to select the correct model since the input/output data corresponding to that model should satisfy Complexity 5 its input-output invariant. Likewise, one should be able to reject the incorrect models since the input/output data should not satisfy their input-output invariants.
However, with imperfect data, there could be no solution to = even for the correct model, and likewise there may or may not be a solution to = for an incorrect model. Thus, with imperfect data, one may be unable to select the correct model. On the other hand, if there is no solution to = for each of the candidate models, then the goal is to determine how "badly" each of the models fails and rejects models accordingly.
A subtle point regarding this approach is that this model rejection technique works best if the models under consideration are in the simplest possible form. This means that, ideally, redundant parameters have been eliminated from the model so that the input-output equations are as reduced as possible; i.e., there are not extra columns in when considering the linear system = . Extra columns generically mean more possible solutions, which can make it harder for our algorithm to reject incorrect models. However, redundant parameters, and the related notion of parameter unidentifiability, do not necessarily yield more coefficients in the input-output equations, as can be seen from the structure of input-output equations for linear compartment models as discussed in Section 8.
A related question to model compatibility is that of structural indistinguishability. Two models are structurally indistinguishable if for any choice of parameters in the first model there is a choice of parameters in the second model that will yield the same dynamics in both models, and vice versa [27]. One way to test for structural indistinguishability of two models is to find the associated input-output equations and then equate their coefficient functions and attempt to solve for one set of parameters in terms of the other set of parameters, and vice versa [27]. A necessary condition for models to be structurally indistinguishable is to have inputoutput equations with the same differential monomial terms. Since our approach only considers the structure of the inputoutput equations and not the specific coefficient functions, it is possible to have several different models, all with the same structure of their input-output equations, to be compatible using our model compatibility test. Thus, if a given model is found to be compatible, then any model that is structurally indistinguishable from that model is also compatible and thus our approach and structural indistinguishability testing can be applied in parallel. For more on structural indistinguishability, see [28][29][30]. The specific form of the coefficients of the input-output equations is considered in Section 8.
We now describe criteria to reject models.

Linear Solvability
Let ∈ R × and consider the linear system = .
Here, we study the solvability of (10) under noisy perturbation of both and . Let̃and̃denote the perturbed versions of and , respectively, and assume that̃− and̃− depend only oñand̃, respectively (see Section 5). Our goal is to infer the unsolvability of the unperturbed system (10) from observation of̃and̃only.
Our method is based on detecting the rank of an augmented matrix, but first let us introduce some notation. The singular values of a matrix ∈ R × will be denoted by (Note that we have trivially extended the number of singular values of from ℓ to .) The rank of is written rank( ).
The range of is denoted R( ). Throughout, ‖ ⋅ ‖ refers to the Euclidean norm.
The basic strategy will be to assume as a null hypothesis that (10) has a solution, i.e., ∈ R( ), and then to derive its consequences in terms of̃and̃. If these consequences are not met, then we conclude by contradiction that (10) is unsolvable. In other words, we will provide sufficient but not necessary conditions for (10) to have no solution; i.e., we can only reject (but not confirm) the null hypothesis. We will refer to this procedure as testing the null hypothesis.

Preliminaries.
We first collect some useful results.
In other words, if (14) does not hold, then (10) has no solution.
Remark 6. This approach can fail to correctly reject the null hypothesis if is (numerically) low-rank.

Remark 7.
In principle, we should test directly the assertion that rank([ , ]) = rank( ). However, we can only establish lower bounds on the matrix rank (we can only tell if a 6 Complexity singular value is "too large"), so this is not feasible in practice. An alternative approach is to consider only numerical ranks obtained by thresholding. How to choose such a threshold, however, is not at all clear and can be a very delicate matter especially if the data have high dynamic range.
trivially. However, this is not a significant disadvantage beyond that described above since if is full-rank, then it must be true that (10) is solvable.

Example: Perfect Data.
As a proof of principle, we first apply Theorem 5 to a simple linear model. We start by taking perfect input and output data and then add a specific amount of noise to the output data and attempt to reject the incorrect model. In the subsequent sections, we will see how to interpret Theorem 5 statistically under a particular "noise" model for the perturbations.
Here, we take data from a linear 3-compartment model, add noise, and try to reject the general form of the linear 2-compartment model with the same input/output compartments. Linear compartment models are defined in Section 8. In practice, one would like to compare models with the same input and output compartments, as the number of compartments involved may not be known, but the injection and measurement compartments would be known. Thus, in this particular example, we are assuming the linear 3compartment model is the "true model" and want to reject a competing model, but as our method concerns model rejection and not model selection, this notion of a "true model" is not a requirement for our method to work.
Here we have an input to the first compartment of the form 1 = 2 −3 + 12 −5 and the first compartment is measured, so that = 1 represents the output. Note that we have chosen a smooth, persistently exciting input function [26] so that derivatives can be taken and the coefficients of the inputoutput equation can be uniquely determined, as required. The solution to this system of ODEs can be easily found of the form: The input-output equation for a 3-compartment model with a single input/output to the first compartment has the form: where 1 , 2 , 3 are the coefficients of the characteristic polynomial of the matrix and 4 , 5 are the coefficients of the characteristic polynomial of the matrix 1 which has the first row and first column of removed [31]. We now substitute values of 1 ,̇1,̈1, ,,, ... at time instances = 0, 0.2, 0.4, 0.6, 0.8, 1 into our input-output equation and solve the resulting linear system of equations for 1 , 2 , 3 , 4 , 5 . We get that 1 = 7, 2 = 14, 3 = 8, 4 = 5, 5 = 5, which agrees with the coefficients of the characteristic polynomials of and 1 .
We now attempt to reject the 2-compartment model using 3-compartment model data. We find the input-output equations for a 2-compartment model with a single input/output to the first compartment, which has the form: where again 1 , 2 are the coefficients of the characteristic polynomial of the matrix and 3 is the coefficient of the characteristic polynomial of the matrix 1 which has the first row and first column of removed.
We substitute values of 1 ,̇1, ,,̈at time instances = 0, 0.2, 0.4, 0.6, 0.8, 1 into our input-output equation and attempt to solve the resulting linear system of equations for The Since the smallest singular value is greater than zero (or order machine precision), it is evident that the 2-compartment model can be rejected. We now add noise to our matrix in the following way. To each entry, and , we add where is a random

Statistical Inference
We now consider the statistical inference of the solvability of (10). First, we need a noise model.

Noise Model.
If the perturbations ‖̃− ‖ and ‖̃− ‖ are bounded, e.g., ‖̃− ‖ ≤ ‖̃‖ and ‖̃− ‖ ≤ ‖̃‖ for some > 0 (representing a relative accuracy of in the "measurements"̃and̃), then Theorem 5 can be used at once. However, it is customary to model such perturbations as normal random variables, which are not bounded. Here, we will assume a noise model of the form wherẽis a (computable) matrix that depends only oñ and similarly with̃, ∘ denotes the Hadamard (entrywise) matrix product ( ∘ ) = , and is a matrix-valued random variable whose entries ∼ N(0, 1) are independent standard normals.
In our application of interest, the entries of̃depend on those of̃as follows. Let = (V) for some input vector V, but suppose that we can only observe the "noisy" vector V = (1 + ) ∘ V. Then the corresponding perturbed matrix entries arẽ By the additivity formula for standard Gaussians, Therefore,̃= so, to first order in , An analogous derivation holds for̃. The basic strategy is now as follows. Let be a test statistic, i.e., +1 ([̃,̃]) in Section 4.2. Then since where we have made explicit the dependence of both sides on the same underlying random mechanism , the (cumulative) distribution function of must dominate that of ‖̃∘ ‖ + ‖̃∘ ‖, i.e., Thus, Using (31a)-(31d), we can associate a -value to any given realization of by referencing upper tail bounds for quantities of the form ‖ ∘ ‖. Recall that = 0 under the null hypothesis. In a classical statistical hypothesis testing framework, we may therefore reject the null hypothesis if (31d) is at most , where is the desired significance level (e.g., = 0.05).

Hadamard Tail Bounds.
We now turn to bounding Pr(‖ ∘ ‖ ≥ V), where we will assume that , ∈ R × . This can be done in several ways.
One easy way is to recognize that where ‖ ⋅ ‖ is the Frobenius norm, so But ‖ ‖ ∼ has a chi distribution with degrees of freedom. Therefore, However, each inequality in (32) can be quite loose; a slightly better approach is to use the inequality [32] ‖ ∘ ‖ ≤ ‖ min (max ,: , max :, ) ‖ ‖ , where ,: and :, denote the th row and th column, respectively, of . The ‖ ‖ term can then be handled using a chi distribution via ‖ ‖ ≤ ‖ ‖ as above or directly using a concentration bound (see below). Variations on this undoubtedly exist.
Here, we will appeal to a result by Tropp [33]. The following is from Section 4.3 in [33].

Test Statistic Tail Bounds. The bound (31d) for Pr( ≥ V)
can then be computed as follows. Let so that Pr( ≥ V) ≤ 1 (V) + 2 (V). Then by Theorem 10, where 2 and 2 are the "variance" parameters in the theorem for̃and̃, respectively. This simplifies to on completing the square. Now set so that the integral becomes The variable substitution = ( − V)/ then gives where is the standard normal distribution function. Thus, A similar analysis yields Equations (44) and (45) together comprise the probability bound on the null hypothesis that we will use hereafter.

Gaussian Processes to Estimate Derivatives
We next present a method for estimating higher order derivatives and the estimation error using Gaussian Process Regression and then apply the input-output invariant method to both linear and nonlinear models in the subsequent sections.
Suppose that there is an underlying deterministic function ( ) that we can only observe with some measurement The conditional distribution of (s) given (t) =̂(t) is also Gaussian: where post = prior (s) + Σ prior (s, t) (Σ prior (t, t) + 2 (t) ) −1 ⋅ (̂(t) − prior (t)) , are the posterior mean and covariance, respectively. This allows us to infer (s) on the basis of observinĝ(t).
The diagonal entries of Σ post are the posterior variances and quantify the uncertainty associated with this inference procedure. In particular, the square roots of these variances give us estimates on the term in the assumed noise model = (1 + ) ∘ in Section 5.  , where ( ) prior ( ) is the prior mean for ( ) ( ) and Σ ( , ) prior ( , ) = Σ prior ( , ). This joint distribution is exactly of the form (47). An analogous application of (48) then yields the posterior estimate of ( ) (s) | ( (t) =̂(t)) for all = 0, 1, . . . , .
Alternatively, if we are interested only in the posterior variances of each ( ) (s), then it suffices to consider each 2 × 2 block independently: ) . (51) The cost of computing (Σ prior (t, t) + 2 (t) ) −1 can clearly be amortized over all .

Formulae for Squared Exponential Covariance Functions.
We now consider the specific case of the squared exponential (SE) covariance function where 2 is the signal variance and ℓ is a length scale. The SE function is one of the most widely used covariance functions in practice. Its derivatives can be expressed in terms of the (probabilists') Hermite polynomials ( ) = (−1) (these are also sometimes denoted ( )). The first few Hermite polynomials are 0 ( ) = 1, 1 ( ) = , and The GP regression requires us to have the values of the hyperparameters 2 , 2 , and ℓ. In practice, however, these are hardly ever known. In the examples below, we deal with this by estimating the hyperparameters from the data by maximizing the likelihood. We do this by using a nonlinear conjugate gradient algorithm, which can be quite sensitive to the initial starting point, so we initialize multiple runs over a small grid in hyperparameter space and return the best estimate found. This increases the quality of the estimated hyperparameters but can still sometimes fail.

Results
We showcase our method on competing models: linear compartment models (2 and 3 species), Lotka-Volterra models (2 and 3 species) and Lorenz. We compute the inputoutput invariants of the Lotka-Volterra and Lorenz using RosenfeldGroebner. The method to compute the linear compartment input-output invariants is presented in the following section. We simulate each of these models to generate time-course data, add varying levels of noise, and estimate the necessary higher order derivatives using GP regression. Using the estimated GP regression data, we test each of the models using the input-output invariant method on other models.
Example 11. The two species Lotka-Volterra model iṡ where 1 and 2 are variables and 1 , 2 , 3 , 4 are parameters. We assume only 1 is observable and perform differential elimination and obtain our input-output invariant in terms of only = 1 ( ): Example 12. By including an additional variable , the three species Lotka-Volterra model iṡ assuming only = 1 is observable. After differential elimination, the input-output invariant is Example 13. Another three species model, the Lorenz model, is described by the system of equations: We assume only = 1 is observable, perform differential elimination, and obtain the following invariant: Example 14. A linear 2-compartment model without input can be written aṡ1 where 1 and 2 are variables and 11 , 12 , 21 , 22 are parameters. We assume only 1 is observable and perform differential elimination and obtain our input-output invariant in terms of only = 1 ( ): where 1 , 2 , 3 are variables and 11 . 12 , 13 , 21 , 22 , 23 , 31 , 32 , 33 are parameters. We assume only 1 is observable and perform differential elimination and obtain our input-output invariant in terms of only = 1 ( ): By assuming = 1 in Examples 6.1-6.5 representing the same observable variable, we apply our method to data simulated from each model and perform model comparison. The models are simulated and 100 time points are obtained for the variable in each model. We add different levels of Gaussian noise to the simulated data and then estimate the higher order derivatives from the data. Accurate estimation of the derivatives was not always possible. For example, during our study we found that for some parameters of the Lotka-Volterra three species model, e.g., 1 , 2 , 3 , 4 , 5 , 6 , 7 = [1.24; 1.68; 3.26; 0.38; 1.50; 0.15; 1.14], the data could not adequately fit with a GP, as indicated by a small likelihood. Furthermore, even when a good fit is achieved, the derivative estimates themselves could be poor as reflected in high posterior variances. This is a notoriously difficult problem and we offer only some pragmatic guidance here. In particular, we err on the side of being overly conservative by keeping only "good" time points defined as follows. Let post ( ) be the posterior standard deviation of the estimate of ( ) ( ). Then a time point is considered good only if for all , where mean(⋅) and std(⋅) give the mean and standard deviations, respectively, of a set. In this way, we adaptively filter out potentially problematic inputs to the ensuing model rejection framework. Note that filtering out data is equivalent to removing constraints so that we can only decrease the discriminatory power, i.e., models that are flagged as incompatible after data filtering would have been incompatible as well without filtering.
Once the data are obtained and derivative data are estimated through the GP regression, each model data set is tested against the other input-output invariants. Results are shown in Figure 1, which gives a probability bound that the data are compatible with a given model (i.e., ∼ 0 means model rejected) at a variety of noise levels. We find that we can reject the three species Lotka-Volterra model and Lorenz model for data simulated from the Lotka-Volterra two species; however, both linear compartment models are compatible.
For data from the three species Lotka-Volterra model, the linear compartment models and two species Lotka-Volterra can be rejected until the noise increases and then the method can no longer reject any models. Finally data generated from the Lorenz model can only reject the two species linear compartment and two species Lotka-Volterra model.

Other Considerations: Known Parameter Values and Algebraic Dependencies
We have demonstrated our model discrimination algorithm on various models. In this section, we consider some other theoretical points regarding input-output invariants.
As mentioned earlier, we have assumed that the parameters are all unknown and we have not taken any possible algebraic dependencies among the coefficients into account. This latter point is another reason our algorithm only concerns model rejection and not model selection. Thus, each unknown coefficient is essentially treated as an independent unknown variable in our linear system of equations. However, there may be instances where we would like to consider incorporating this additional information.
To analyze the effects of incorporating known parameter values and algebraic dependencies, we will examine a particularly nice class of models, linear compartment models, whose input-output equations can be found using linear algebra techniques [31]; i.e., computation of the input-output equations does not rely on more computationally intensive approaches such as RosenfeldGroebner and Gröbner bases. Since we will now be considering the explicit form of the coefficients of the input-output equations, we describe the set-up of linear compartment models below. Let = ( , ) be a directed graph with vertex set and set of directed edges . Each vertex ∈ corresponds to a compartment in our model and each edge → corresponds to a direct flow of material from the th compartment to the th compartment. Let , , ⊆ be three sets of compartments: the set of input compartments, output compartments, and leak compartments, respectively. To each edge → we associate an independent parameter , the rate of flow from compartment to compartment . To each leak node ∈ , we associate an independent parameter 0 , the rate of flow from compartment leaving the system.
To such a graph and set of leaks we associate the matrix in the following way:  In [31], an explicit formula for the input-output equations for linear models was derived. In particular, it was shown that all linear -compartment models corresponding to strongly connected graphs with at least one leak and having the same input and output compartments will have the same differential polynomial form of the input-output equations. For example, a linear 2-compartment model with a single input and output in the same compartment and corresponding to a strongly connected graph with at least one leak has the form: Thus, our model discrimination method would not work for two distinct linear 2-compartment models with the abovementioned form. In order to discriminate between two such models, we need to take other information into account, e.g., known parameter values.
Notice that both of these equations are of the abovementioned form, i.e., both 2-compartment models have a single input and output in the same compartment and correspond to strongly connected graphs with at least one leak. In the first model, there is a leak from the first compartment and an exchange between compartments 1 and 2. In the second model, there is a leak from the second compartment and an exchange between compartments 1 and 2. Assume that the parameter 12 is known. In the first model, this changes our invariant to

71)
In the second model, our invariant is

72)
In this case, the right-hand sides of the two equations are the same, but the first equation has two variables (coefficients) while the second equation has three variables (coefficients). Thus, if we had data from the second model, we could try to reject the first model (much like the 3-compartment versus 2compartment model discrimination in the examples above). In other words, a vector in the span of, , and for 1 , 2 , 3 may not be in the span oḟand only.
We next consider the effect of incorporating coefficient dependency relationships. While we cannot incorporate the polynomial algebraic dependency relationships among the coefficients in our linear algebraic approach to model rejection, we can include certain dependency conditions, such as certain coefficients becoming known constants. We have already seen one way in which this can happen in the previous example (from known nonzero parameter values). We now explore the case where certain coefficients go to zero. From the explicit formula for input-output equations from [31], we get that a linear model without any leaks has a zero term for the coefficient of . Thus a linear 2-compartment model with a single input and output in the same compartment and corresponding to a strongly connected graph without any leaks has the form:̈+ Thus to discriminate between two distinct linear 2compartment models, one with leaks and one without any leaks, we should incorporate this zero coefficient into our invariant.
In the first model, there is a leak from the first compartment and an exchange between compartments 1 and 2. In the second model, there is an exchange between compartments 1 and 2 and no leaks. Thus, our invariants can be written as

76)
Again, the right-hand sides of the two equations are the same, but the first equation has three variables (coefficients) while the second equation has two variables (coefficients). Thus, if we had data from the first model, we could try to reject the second model. In other words, a vector in the span of, , and for 1 , 2 , 3 may not be in the span oḟand only.

Conclusion
After performing this differential algebraic and statistical model rejection, one has already obtained the input-output equations and thus can test structural identifiability [17,26,34]. In a sense, our method extends the current spectrum of potential approaches for comparing models with time-course 14 Complexity data, in that one can first reject incompatible models, then test structural identifiability of compatible models using inputoutput equations obtained from the differential elimination, infer parameter values of the admissible models, and apply an information criterion model selection method to assert the best model. Notably the presented differential algebraic and statistical method does not penalize for model complexity, unlike traditional model selection techniques. Rather, we reject when a model cannot, for any parameter values, be compatible with the given data. We found that simpler models, such as the linear 2-compartment model, could be rejected when data were generated from a more complex model, such as the three species Lotka-Volterra model, which elicits a wider range of behavior. On the other hand, more complex models, such as the Lorenz model, were often not rejected, from data simulated from less complex models. In the future it would be helpful to better understand the relationship between inputoutput invariants and dynamics. It would be useful to develop numerical algorithms in differential algebra (similar to that in numerical algebraic geometry); a natural extension, if such algorithms were available, would be to analyze models with data, although not parameter-free, similar to that done in [35,36]. Another future direction is creating an algorithm that takes a probabilistic or randomized approach for eliminating variables of larger differential-algebra models [37].
We believe there is large scope for additional parameterfree coplanarity model comparison methods. It would be beneficial to explore whether algorithms for differential elimination can handle larger systems and whether this area could be extended.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request. All methods for data generation and analysis are described throughout the paper. Detailed research materials supporting this publication can be accessed by contacting HA Harrington.

Disclosure
Current address of Kenneth L. Ho: TSMC Technology, Inc., San Jose, CA 95134, USA.