Discrete gradient methods for preserving a first integral of an ordinary differential equation

In this paper we consider discrete gradient methods for approximating the solution and preserving a first integral (also called a constant of motion) of autonomous ordinary differential equations. We prove, under mild conditions, for a large class of discrete gradient methods, that the numerical solution exists and is locally unique, and that for arbitrary $p \in \mathbb{N}$ we may construct a method of order $p$. In the proofs of these results we also show that the constants in the time step constraint and the error bounds may be chosen independently of the distance to critical points of the first integral. In the case when the first integral is quadratic we devise, for arbitrary $p \in \mathbb{N}$, a new method that is linearly implicit at each time step and of order $p$. This new method has significant advantages in terms of efficiency. We illustrate our theory with a numerical example.


1. Introduction. Consider the autonomous ordinary differential equation (ODE)

(1) ẋ(t) = f(x(t)),

where x(t) ∈ R^d for some d ∈ N, x(0) = x_0 ∈ R^d is the initial condition and f : R^d → R^d is locally Lipschitz continuous. Then, given a bounded set B ⊂ R^d, there exists a T > 0 such that for any x_0 ∈ B the solution exists and remains bounded for t ∈ [0, T] (see e.g. [5, Thm. I.7.3 on p. 37]). We assume that this ODE has a conserved first integral (also called a constant of motion) I : R^d → R, so that

(2) I(x(t)) = I(x_0) for all t ∈ [0, T].
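Since d/dt I(x(t)) = ∇I(x(t)) · f(x(t)) along solutions, I is a first integral precisely when ∇I(x) · f(x) = 0 for all x. As an illustrative sanity check (our own toy example, not one from this paper's experiments), the harmonic oscillator conserves I(x) = |x|²/2:

```python
import numpy as np

# Harmonic oscillator: f(x) = (x2, -x1) with first integral I(x) = |x|^2 / 2,
# so grad I(x) = x. Then grad I(x) . f(x) = x1*x2 - x2*x1 = 0 identically.
f = lambda z: np.array([z[1], -z[0]])
grad_I = lambda z: z

# Check the orthogonality condition at a handful of random points.
rng = np.random.default_rng(0)
residuals = [abs(np.dot(grad_I(z), f(z))) for z in rng.standard_normal((10, 2))]
max_residual = max(residuals)
```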
To simplify notation define i(x) := ∇I(x) for all x ∈ R^d, and assume that i(x) is locally Lipschitz continuous. Define R_+ := {t ∈ R : t > 0}. According to [9], on {x ∈ R^d : i(x) ≠ 0} we may write (1) as

(3) ẋ = S(x) i(x),

where S(x) ∈ R^{d×d} is skew-symmetric (S^T = −S) and may be given by the so-called default formula

(4) S(x) = (f(x) i(x)^T − i(x) f(x)^T) / |i(x)|^2.

In general, the choice of S(x) satisfying f(x) = S(x) i(x) is not unique. Moreover, Proposition 2.1 in [9] states that if f ∈ C^r(R^d; R^d) for r ≥ 1 and I is a Morse function (i.e. smooth with non-degenerate critical points) then S in (4) is C^r and locally bounded on {x ∈ R^d : i(x) ≠ 0}, and the proof of Proposition 2.1 also shows that |f(x)|/|i(x)| is locally bounded on {x ∈ R^d : i(x) ≠ 0}. In fact, the requirement that f ∈ C^r(R^d; R^d) for r ≥ 1 may be relaxed to f locally Lipschitz continuous, in which case S is also only locally Lipschitz continuous on {x ∈ R^d : i(x) ≠ 0}. Let us also assume that I is a Morse function, so that for a bounded set B ⊂ R^d there exists a constant C_1 = C_1(B) such that

(5) |f(x)| ≤ C_1 |i(x)| for all x ∈ B.

Note that from continuity it follows that if i(x) = 0 for x ∈ B, then f(x) = 0. A useful constant throughout this paper will be C_2 = C_2(B) := C_1 + 1/5. Methods for approximating the solution to this type of ODE that simultaneously preserve the integral are of interest in many applications, for example Hamiltonian systems, Poisson systems, celestial mechanics, the Lotka-Volterra system and the undamped Duffing oscillator (see [4] and references therein). Here we consider discrete gradient methods for approximating the solution to (1) whilst exactly preserving I (see e.g. [9, 13, 15]).
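As a quick illustration of the default formula (4), the following sketch (using the α = 0 rigid body equations that appear later in Section 5; the helper name `default_S` is ours) verifies numerically that S(x) is skew-symmetric and that S(x) i(x) = f(x), the latter relying on the orthogonality i(x) · f(x) = 0:

```python
import numpy as np

# Default formula (4): S(x) = (f(x) i(x)^T - i(x) f(x)^T) / |i(x)|^2.
def default_S(f_x, i_x):
    return (np.outer(f_x, i_x) - np.outer(i_x, f_x)) / np.dot(i_x, i_x)

# Rigid body equations (the alpha = 0 case of Section 5's example),
# which conserve I(x) = |x|^2 / 2, so i(x) = x.
I1, I2, I3 = 2.0, 1.0, 2.0 / 3.0
def f(x):
    return np.array([(1/I3 - 1/I2) * x[1] * x[2],
                     (1/I1 - 1/I3) * x[2] * x[0],
                     (1/I2 - 1/I1) * x[0] * x[1]])

x = np.array([1.0, 2.0, 3.0])
i_x = x                       # grad of I(x) = |x|^2 / 2
S = default_S(f(x), i_x)
```

Because i(x) · f(x) = 0, the rank-two matrix S reproduces f exactly: S i = f − i (f · i)/|i|² = f.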
Let us first define a special type of discretization of the gradient of I, a discrete gradient of I.

Definition 1.1 (Gonzalez [2]). A discrete gradient of I, denoted ī : R^d × R^d → R^d, is continuous and satisfies

ī(x, x′) · (x′ − x) = I(x′) − I(x) and ī(x, x) = i(x) for all x, x′ ∈ R^d.

There are several ways of constructing a discrete gradient. Two notable examples are the one used in the averaged vector field method (called the mean value discrete gradient in [9], see also [14]) and the coordinate increment method [8].
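For instance, the coordinate increment discrete gradient of [8] can be sketched as a sequence of one-dimensional divided differences of I (the function names below are ours; the fallback to the exact partial derivative handles the case x′_k = x_k):

```python
import numpy as np

# Coordinate increment (Itoh-Abe) discrete gradient: the k-th component is a
# divided difference of I along the k-th coordinate; the sum over k telescopes,
# giving the defining property ibar(x, x') . (x' - x) = I(x') - I(x).
def coordinate_increment_gradient(I, grad_I, x, xp):
    d = len(x)
    ibar = np.empty(d)
    for k in range(d):
        hi = np.concatenate([xp[:k + 1], x[k + 1:]])   # first k+1 coords from x'
        lo = np.concatenate([xp[:k], x[k:]])           # first k coords from x'
        if xp[k] != x[k]:
            ibar[k] = (I(hi) - I(lo)) / (xp[k] - x[k])
        else:
            ibar[k] = grad_I(lo)[k]                    # limit case
    return ibar

I = lambda z: 0.5 * np.dot(z, z)          # a quadratic first integral
grad_I = lambda z: z
x, xp = np.array([1.0, 2.0]), np.array([1.5, -0.5])
ibar = coordinate_increment_gradient(I, grad_I, x, xp)
```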
Given a time step h we define a discrete gradient method by the map x ↦ x′, where

(6) x′ = x + h S̄(x, x′, h) ī(x, x′),

ī is a discrete gradient of I and S̄ is any skew-symmetric consistent approximation of S. By consistent we mean that S̄(x, x′, h) is continuous and S̄(x, x, 0) = S(x) on {x ∈ R^d : i(x) ≠ 0}. All discrete gradient methods preserve I because

(7) I(x′) − I(x) = ī(x, x′) · (x′ − x) = h ī(x, x′) · S̄(x, x′, h) ī(x, x′) = 0.

The final equality in (7) holds because S̄ is skew-symmetric. By discretizing the default formula for S(x) given in (4) we obtain an example of a discrete gradient method (there are many different possible discrete gradient methods for (1)). Let ĩ(x, x′, h), î(x, x′, h) and ȋ(x, x′, h) be consistent approximations to i(x), so that they are all continuous and agree with i(x) when x′ = x and h = 0, for all x ∈ R^d, let f̄(x, x′, h) be a consistent approximation of f(x), and let ī(x, x′) be a discrete gradient of I. Then a discrete gradient method applied to (1) is defined by (6) with S̄(x, x′, h) given by

(8) S̄(x, x′, h) = ( f̄(x, x′, h) ĩ(x, x′, h)^T − ĩ(x, x′, h) f̄(x, x′, h)^T ) / ( î(x, x′, h) · ȋ(x, x′, h) ).

A useful way of describing the accuracy of a numerical method for solving (1) is its order of accuracy. For one-step methods this is defined via the truncation error around the point x for a time step h.

Definition 1.2. A one-step method x ↦ x′ with time step h for solving (1) has order of accuracy p ∈ N if there exist positive constants C and H such that

|x(t + h) − x′| ≤ C h^{p+1} for all x ∈ B and h ∈ [0, H),

where x(·) denotes the solution to (1) with x(t) = x for some t ∈ R_+ and B is a compact set in R^d.
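A single step of (6) can be sketched by fixed-point iteration, anticipating the existence argument of Section 2. Here we assume a quadratic toy integral I(x) = |x|²/2 with the midpoint discrete gradient and a constant skew-symmetric S̄; this is a special case chosen for illustration, not the paper's general construction:

```python
import numpy as np

# One step of the discrete gradient map (6), x' = x + h * Sbar * ibar(x, x'),
# solved by fixed-point iteration. Toy problem: harmonic oscillator with
# constant skew-symmetric Sbar and ibar(x, x') = (x + x')/2, a discrete
# gradient of I(x) = |x|^2 / 2.
Sbar = np.array([[0.0, 1.0], [-1.0, 0.0]])     # skew-symmetric
I = lambda z: 0.5 * np.dot(z, z)
ibar = lambda x, xp: 0.5 * (x + xp)

def dg_step(x, h, tol=1e-14, maxit=100):
    xp = x.copy()                               # initial guess
    for _ in range(maxit):
        xnew = x + h * Sbar @ ibar(x, xp)
        if np.linalg.norm(xnew - xp) < tol:
            return xnew
        xp = xnew
    return xp

x0 = np.array([1.0, 0.0])
x1 = dg_step(x0, 0.1)
```

Exactly as in (7), I(x1) − I(x0) = ī · (x1 − x0) = h ī · S̄ ī vanishes by skew-symmetry, up to the fixed-point tolerance.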
Other definitions of order of accuracy in the literature are ambiguous regarding how the hidden constant in O(·) may depend on other parameters, and how small h should be. By using the definition from [1] in our results we can be sure that the constants in the definition of order p do not depend on |i(x)|, which may be small.
Throughout this paper we will also make use of Banach's Fixed Point Theorem (also called the Contraction Principle).

Theorem 1.3 (Banach's Fixed Point Theorem). Let (X, d) be a non-empty complete metric space and let T : X → X be a contraction on X, i.e. suppose there exists a q ∈ (0, 1) such that

d(T(x), T(y)) ≤ q d(x, y) for all x, y ∈ X.

Then there exists a unique fixed point x* ∈ X such that T(x*) = x*. Furthermore, the fixed point can be found by iteration: x_{n+1} = T(x_n) for n = 0, 1, 2, …, with x_0 ∈ X arbitrary.
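The constructive part of the theorem is what makes it useful numerically: iterating any contraction converges to its unique fixed point. A minimal sketch with the classic contraction T(x) = cos(x) on [0, 1] (a textbook toy example, not one from this paper):

```python
import math

# Banach iteration: x_{n+1} = T(x_n) converges to the unique fixed point
# of a contraction T. Here T = cos is a contraction on [0, 1] with
# contraction constant q = sin(1) < 1.
def fixed_point(T, x0, tol=1e-12, maxit=1000):
    x = x0
    for _ in range(maxit):
        xn = T(x)
        if abs(xn - x) < tol:
            return xn
        x = xn
    return x

xstar = fixed_point(math.cos, 0.5)
```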
In Section 2 we prove that discrete gradient methods where S̄ has the form (8) are well-defined, in the sense that provided h is sufficiently small and f̄, ĩ, î, ȋ and ī satisfy certain consistency and local Lipschitz continuity conditions, there exists a locally unique solution to (6). In Section 3 we prove that for arbitrarily chosen p ∈ N, if f̄, ĩ, î, ȋ and ī satisfy two additional conditions (f̄ defines a method of order p and |î · ȋ − ĩ · ī| is bounded in a special way) then we obtain a discrete gradient method of order p.
In Section 4 we consider discrete gradient methods from the perspective of doing computations. Generally, each step of a discrete gradient method requires solving a nonlinear system of equations for x′, and this may add significantly to the computational cost of the method because an iterative scheme, such as Newton's method, must be employed at each step. In the case when I is quadratic we present a new method that is linearly implicit in x′ at each time step, so only a linear system of equations must be solved at each step. We also show that Runge-Kutta methods, under very mild conditions on the coefficients, give an f̄ that satisfies the conditions required in Sections 2 and 3, and therefore we may use Runge-Kutta methods of order p (for some p ∈ N) to construct discrete gradient methods of order p.
Following the theory in these sections we present a numerical example in Section 5 to illustrate our theory, and finally, in Section 6 we discuss the implications of this work and possible avenues for future research.
2. Existence and uniqueness. At each time step of the discrete gradient method we must (in general) solve a nonlinear system of equations (see (6)) for x ′ , but does the solution to this system of equations exist? In this section we present a theorem that ensures for sufficiently small time step h, the map from x → x ′ is well-defined in the sense that there exists a locally unique solution x ′ to the system of equations (6) for the case when S is given by (8).
Usual techniques for achieving this type of result include applying the Implicit Function Theorem (see [12]) or the Newton-Kantorovich Theorem (see [11]). For example, the Newton-Kantorovich Theorem is used in [3] to obtain existence of a numerical solution for a symmetric projection method, which requires solving a nonlinear system of equations at each time step. In our experience these approaches for discrete gradient methods lead to a condition on the time step such as h ≤ C|i(x)|^r for some positive constants C and r. If we are close to a critical point of I (i.e. when |i(x)| is small) then the theory implies that we must also take h small. Our result and its proof below avoid this issue: we show that a solution to the nonlinear system of equations for a discrete gradient method ((6) with S̄ defined by (8)) exists and is locally unique for a sufficiently small time step independent of |i(x)| (and hence independent of the distance to critical points of I).
The local nature of our result (everything will depend on an initially chosen bounded set B) is only due to the local Lipschitz continuity of f and i, and (5), rather than also depending on the distance to critical points of I.
We will require the following definition of a ball around a point x ∈ R^d. Given a constant R > 0 and x ∈ R^d define

B_R(x) := {y ∈ R^d : |y − x| ≤ R}.

The following theorem ensures that, for sufficiently small h and under certain local Lipschitz conditions, the map x ↦ x′ defined by (6) and (8) is locally well-defined, in the sense that there exists a locally unique solution to the nonlinear system of equations defined by (6) and (8).
Theorem 2.1. Let B be a bounded set in R^d and suppose there exist positive constants R, L and H such that for each x ∈ B and all u, v, w ∈ B_R(x) and h ∈ [0, H),

and similarly for î and ȋ. Let C_2 be the constant defined after (5) and define

Then for each x ∈ B and h ∈ [0, H′) there exists a unique x′ ∈ B_{R′}(x) satisfying (6), where S̄ is given by the formula (8).
Proof. Note that if x ∈ B and i(x) = 0 then x′ = x is the unique solution to (6) in B_{R′}(x) = {x}. For the case when i(x) ≠ 0 we will apply Banach's Fixed Point Theorem (Theorem 1.3) to prove our result. Let R′ and H′ be defined as in the theorem and, for fixed x ∈ B such that i(x) ≠ 0, define X := B_{R′}(x). X is a closed subset of R^d, so together with the metric induced by | · | it is a complete metric space. Also fix h ∈ [0, H′), and define T : X → R^d by

T(z) := x + h S̄(x, z, h) ī(x, z),

where S̄ is given by (8). To satisfy the assumptions of Theorem 1.3 we must show that T(z) ∈ X for all z ∈ X and that T is a contraction on (X, | · |). Let z ∈ X. Using (11)

We can derive similar inequalities for î and ȋ, and for ī we can derive

Using (9), (5),

Using (11) and (12)

We get T(z) ∈ X from the following inequality, where we have used (15), (14), (12), (13) and h ≤ H′ to get

To show T is a contraction, let z, z′ ∈ X. Using (15), (11) and (12) for î and ȋ, and writing î(x, z, h) as î(z) etc., we get

Using (9), (10), (11), (12), (13), (14), (15) and (16) we get

Using a similar argument we can also derive

Now using (17), (18) and h < H′ we get

where (36C_2 + 6)Lh < 1. Therefore, T is a contraction on (X, | · |) and by Theorem 1.3 there exists a unique x′ ∈ X such that T(x′) = x′. By the definition of T it follows that x′ satisfies (6) where S̄ is given by (8).
3. Order of accuracy. In this section we give sufficient conditions for a discrete gradient method defined by (6) and (8) to be of order p for arbitrarily chosen p ∈ N. In addition to requiring the same conditions as in Theorem 2.1, we also require two further conditions.
The following two lemmas will be used to prove our main result, Theorem 3.3.
Proof. Fix x ∈ B and h ∈ [0, H′), and define X as above. Hence T : X → X is a contraction and the result follows by applying Theorem 1.3. For the case when i(x) = 0, note that f̄(x, z, h) = 0 for z ∈ X = {x} (use (9) and |i(x)| = 0).

Lemma 3.2.
In addition to the hypotheses of Theorem 2.1, suppose that for each x ∈ B

Proof. If i(x) = 0 then (by (5)) we are at a stationary point and the result is trivial. Suppose i(x) ≠ 0. Existence theory for ODEs (see e.g. [5, Thm. I.7.3 on p. 37]) implies there exists a T > t such that x(s) ∈ B_{5R′}(x) for all s ∈ [t, T]. If T ≥ t + h then we are done, so suppose T < t + h. Existence theory also implies that the solution x(s) exists for s ∈ [T, T′] for some T′ > T, even though it may not remain in B_{5R′}(x). Since this inequality is strict and x(·) exists up to T′ and is continuous, there exists an ε > 0 such that

To complete the proof let us argue by contradiction. Suppose there exists an s ∈ [t, t + h] such that x(s) ∉ B_{5R′}(x) (this includes the case when x(t + h) does not exist). Then, by continuity of x(·), x(t) = x, and since B_{5R′}(x) is closed, there exist an s′ ∈ [t, s) and a δ > 0 such that x(r) ∈ B_{5R′}(x) for all r ∈ [t, s′] and x(r) ∉ B_{5R′}(x) for all r ∈ (s′, s′ + δ). However, by the above argument there exists an ε > 0 such that x(r) ∈ B_{5R′}(x) for all r ∈ [t, s′ + ε), a contradiction. Therefore,

The extra Lipschitz continuity condition on f in Lemma 3.2 follows from our earlier assumption that f is locally Lipschitz. Now let us present the main theorem of this section, where we show that under certain conditions the discrete gradient method defined by (6) and (8) is of order p, for some p ∈ N. For each x ∈ B and h ∈ [0, H′)

Then the discrete gradient method defined by (6) with S̄ given by (8) is also of order p, so that there exist positive constants C_5 and H_5 such that

Using (20) and (13)

Using (14)

And using (9), (13)

Putting (22) together with (23), (24) and (25) we get

Now bound T_4 and T_5 separately. Using (9), (14), (15), (12), (13)

And using (15) (with z = x′), (12) and (26) we get

Putting (27) together with (28) and (29) we get

Our result now follows easily from (30) and (20) since h ≤ min{H_3, 1/(30 C_2 C_4)}:

Therefore,

4. Discrete gradient methods for computation. In this section we consider discrete gradient methods that may be used in computations, and how we might choose f̄, ĩ, ī, î and ȋ so that they satisfy the hypotheses of Theorems 2.1 and 3.3.
We also consider how to make the nonlinear system of equations defined by (6) and (8) as easy as possible to solve at each time step. In the case when I is quadratic we achieve this by constructing a discrete gradient method that is linearly implicit in x′; for general I this is not possible.
Since f(x) and i(x) are locally Lipschitz continuous, for a given bounded set B ⊂ R^d and constant R_0 > 0 there exists a constant L_0 > 0 such that for all x ∈ B and for all u, v ∈ B_{R_0}(x),

(31) |f(u) − f(v)| ≤ L_0 |u − v| and |i(u) − i(v)| ≤ L_0 |u − v|.

A possible choice for f̄ is f̄(x, x′, h) := f(x), so that the method defined by (19) is the forward Euler method. For f̄ in (19) to correspond to an s-stage Runge-Kutta method we must take the stages k_i (implicitly) defined by

(32) k_i = f( x + h Σ_{j=1}^s a_ij k_j ), i = 1, …, s,

and define

(34) f̄(x, h) := Σ_{i=1}^s b_i k_i.

Even though the k_i may be implicitly defined by (32), as for implicit Runge-Kutta methods, we may use the map K : R^d × R_+ → R^{d×s} (defined below in Lemma 4.2) to explicitly represent each k_i in terms of x and h as the i-th column of K(x, h). Since the k_i do not depend on x′ we get an f̄ that does not depend on x′ and we may write f̄ = f̄(x, h). For a given s-stage Runge-Kutta method two constants that will be useful are A_1 := max_{1≤i≤s} Σ_{j=1}^s |a_ij| and B_1 := Σ_{i=1}^s |b_i|. For completeness, the following lemma gives conditions on h so that an s-stage Runge-Kutta method is well-defined in the sense that the map x ↦ {k_i : i = 1, …, s} is locally well-defined (so that the map x ↦ x′ is also locally well-defined). Although it is a bespoke result for this paper, the proof is very similar to that given for [5, Thm. II.7.2].
Let us define the map K : R^d × R_+ → R^{d×s} whose i-th column is the stage vector k_i determined by (32). Proof. We again apply Banach's Fixed Point Theorem (Theorem 1.3).
To apply Theorem 1.3 we must show that T (K) ∈ X for all K ∈ X, and that T : X → X is a contraction.
First note that by the definition of X and using (5) we have

Using this, and since h ≤ 1/(2A_1(C_1 + L_0/R_0)R_0) and u ∈ B_{2R_0}(x), we have (37)

Hence u + h Σ_{j=1}^s a_ij k_j ∈ B_{R_0}(x). Using (31) and (37) we then get

Hence T(K) ∈ X. By (31) we also have

so T : X → X is a contraction and the result then follows by applying Theorem 1.3.
The following lemma is a technical result for the subsequent corollary. Then f̄(x, 0) = f(x) for every x ∈ B, and there exist positive constants R, L and H such that for each x ∈ B, and for all u, v ∈ B_R(x) and h ∈ [0, H),

For h ∈ (0, H) and u, v ∈ B_R(x), we have from Lemma 4.2 that there exist unique K_u = K(u, h) and K_v = K(v, h). Hence ‖K_u − K_v‖ ≤ 2L_0 |u − v|. Using this, (34) and

Finally, using (34),

Corollary 4.4. All consistent Runge-Kutta methods (so that Σ_{i=1}^s b_i = 1) define an f̄ (see (34)) that satisfies condition (9) in Theorem 2.1.
If we define f̄ corresponding to an explicit s-stage Runge-Kutta method (where a_ij = 0 for i ≤ j), then the k_i in (32) are defined explicitly in terms of x and h and f̄(x, h) may be calculated explicitly (instead of using the map K, which may not be computed explicitly). To obtain a Runge-Kutta method of order p there are additional constraints on the a_ij and b_i (see e.g. [4, p. 29]).
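For an explicit tableau the stages can be computed in sequence, so f̄(x, h) is available in closed form. A sketch using the classical RK4 coefficients (the variable names are ours):

```python
import numpy as np

# fbar for an explicit s-stage Runge-Kutta method: the stages k_i are
# computed one after another (a_ij = 0 for j >= i) and
# fbar(x, h) = sum_i b_i k_i, as in (34). Classical RK4 tableau:
A = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0, 0.0],
              [0.0, 0.5, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
b = np.array([1.0, 2.0, 2.0, 1.0]) / 6.0

def fbar_rk(f, x, h):
    s, d = len(b), len(x)
    k = np.zeros((s, d))
    for i in range(s):
        k[i] = f(x + h * (A[i, :i] @ k[:i]))   # only earlier stages enter
    return b @ k

# Toy test problem: harmonic oscillator, exact solution (cos t, -sin t).
f = lambda z: np.array([z[1], -z[0]])
x = np.array([1.0, 0.0])
h = 0.1
x_next = x + h * fbar_rk(f, x, h)
```

Consistency in the sense of Corollary 4.4 is just Σ b_i = 1, which holds for this tableau.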
To check whether or not ī = ī(x, x′) satisfies (10) we only need to ensure that it is locally Lipschitz continuous, since ī(x, x) = i(x) already holds by the definition of a discrete gradient. Moreover, since i(x) is locally Lipschitz, it easily follows that both the coordinate increment discrete gradient (see [8]) and the one used in the averaged vector field method (see e.g. [9, 14]) are also locally Lipschitz. For general I there are no known explicitly defined discrete gradients. However, if I is quadratic then i(x) is linear and we may define ī(x, x′) so that it is linear in x′ by taking

(39) ī(x, x′) := i((x + x′)/2).

There is considerable freedom in how we choose ĩ, î and ȋ in (8), and to apply Theorems 2.1 and 3.3 they only need to satisfy (11) and (21). For example, we may define ĩ to be any of the following (and similarly for î and ȋ):

Except for the final case, when ĩ(x, h) = i(y), it is obvious that all of these choices for ĩ satisfy (11) since i(x) is locally Lipschitz. To confirm that ĩ(x, h) = i(y) satisfies (11) we prove the following two lemmas. (9). Let C_1 be the constant from (5).
Then for each x ∈ B, and for all u, v ∈ B_{R_1}(x) and h ∈ [0, H_1), ĩ satisfies

Proof. Fix x ∈ B, u, v ∈ B_{R_1}(x) and h ∈ [0, H_1). Let y = y(x, h) denote the solution to y = x + h f̄(x, y, h), and similarly for y(u, h) and y(v, h). From Lemma 4.5 we know these solutions exist and that y(x, h) ∈ B_{2R}(x) and y(u, h), y(v, h) ∈ B_R(x). If h = 0 then y = x and ĩ(x, 0) = i(x).
If h ≠ 0, since y(u, h), y(v, h) ∈ B_R(x), we may use (9) and h ≤ 1/(2L) to get

Using this and (31) we get

In Theorem 3.3 we also require that ĩ, ī, î and ȋ satisfy condition (21). By taking î(x, x′, h) := ĩ(x, x′, h) and ȋ(x, x′, h) := ī(x, x′) this is achieved trivially, and the resulting method is equivalent to a projection method (see [10]). Unfortunately, in this case the system of equations to solve at each time step is nonlinear in general.
A method that is almost a projection method is the following. For general ĩ, ī and f̄ satisfying the conditions in Theorems 2.1 and 3.3, define î(x, x′, h) := ĩ(x, x′, h) and ȋ(x, h) := ī(x, y(x, h)), where y = y(x, h) satisfies y = x + h f̄(x, y, h). The following two lemmas ensure that ȋ satisfies (11) and that ĩ, ī, î and ȋ satisfy (21). Lemma 4.7. Let B ⊂ R^d be a bounded set and let C_1 be the constant from (5). Suppose there exist positive constants R, L and H such that for each x ∈ B, and all u, v, w ∈ B_R(x) and h ∈ [0, H), f̄ satisfies

where y satisfies y = u + h f̄(u, y, h), for all x ∈ B, u ∈ B_{R_1}(x) and h ∈ [0, H_1). Then for each x ∈ B, and for all u, v ∈ B_{R_1}(x) and h ∈ [0, H_1), ȋ satisfies

The proof of this result is very similar to the proof of Lemma 4.6 so we omit it.
2. let x′ be the unique solution to (6) with S̄ defined by (8) (which exists by Theorem 2.1), and 3. let x(·) denote the exact solution to (1) satisfying x(t) = x for some t ∈ R_+. Also suppose that 4. f̄ is such that the method defined by (19) is of order p for some p ∈ N, i.e. there exist constants C_3 and H_3 such that

If we define C_4 := (6L_1/5) max{1, C_3}, then ĩ, ī, î and ȋ satisfy (21). Proof. The fact that f̄, ĩ, ī, î, ȋ, R′_1 and H′_1 satisfy the conditions in Theorem 2.1 with R, L, H, R′ and H′ replaced by R_1, L_1, H_1, R′_1 and H′_1, respectively, follows from R_1 ≥ R, L_1 ≥ L and H_1 ≤ H.
Using (12) and (10) (with L replaced by L_1) we get

This lemma leads to the following obvious corollary of Theorem 3.3. Corollary 4.9. Under the same hypotheses as Lemma 4.8, the discrete gradient method defined by (6) and (8) is of order p.
If I is quadratic then, using (39) to define ī and an explicit s-stage Runge-Kutta method to define f̄, we can construct a linearly implicit discrete gradient method. The following corollary is a direct consequence of (39), Lemmas 4.3, 4.5, 4.7 and 4.8 and Theorems 2.1 and 3.3.
Corollary 4.10. Suppose I is quadratic and let f̄ correspond to an explicit s-stage Runge-Kutta method of order p, for some p ∈ N. Then the discrete gradient method defined by (6) and (8) is linearly implicit in x′, locally well-defined (in the sense that for sufficiently small h there exists a locally unique x′ at each time step) and of order p.

5. Numerical examples. In this section we experiment with the new linearly implicit (when I is quadratic) discrete gradient method constructed in Corollary 4.10. To demonstrate the efficiency gain due to only needing to solve a linear system at each time step we compare it with the standard projection method from [4] on a problem with quadratic I.
The new discrete gradient method we suggested in Corollary 4.10 for the case when I is quadratic corresponds to defining ĩ(x, h) := i(y), where y satisfies y = x + h f̄(x, h) and f̄ is defined by an explicit s-stage Runge-Kutta method. With these choices for f̄, ĩ, ī, î and ȋ, the discrete gradient method defined by (6) and (8) is completely specified by the Butcher tableau of the underlying Runge-Kutta method. The entries denoted by ⋆ are not required because we are only considering autonomous ODEs.
Since I is quadratic, i(x) is linear and there exists a matrix M and a vector b such that i(x) = Mx + b for all x ∈ R^d. In the case when i(x) ≠ 0, to obtain x′ at each time step of the method in Corollary 4.10 we must solve the linear system

(40) (Id − (h/2) S̄ M) x′ = x + h S̄ ((1/2) M x + b),

where Id ∈ R^{d×d} is the identity matrix. Note that the cost of computing f̄(x, h) at each time step is essentially the same as the cost of computing the RK4 method, so we already know that computing this new discrete gradient method will cost more than the RK4 method.
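The resulting time step can be sketched directly from this linear system (the helper names are ours; S̄ is assumed to be computed explicitly from x and h, as in Corollary 4.10). For the toy case M = Id, b = 0 the step preserves I exactly up to roundoff:

```python
import numpy as np

# Linearly implicit step when I is quadratic: with i(x) = M x + bvec and
# ibar(x, x') = i((x + x')/2), substituting into x' = x + h*Sbar*ibar(x, x')
# and collecting x' terms gives
#   (Id - (h/2) Sbar M) x' = x + h * Sbar * ((1/2) M x + bvec).
def linearly_implicit_step(x, h, Sbar, M, bvec):
    d = len(x)
    lhs = np.eye(d) - 0.5 * h * Sbar @ M
    rhs = x + h * Sbar @ (0.5 * M @ x + bvec)
    return np.linalg.solve(lhs, rhs)

# Toy example: harmonic oscillator, I(x) = |x|^2 / 2, so M = Id, bvec = 0.
I = lambda z: 0.5 * np.dot(z, z)
M = np.eye(2)
bvec = np.zeros(2)
Sbar = np.array([[0.0, 1.0], [-1.0, 0.0]])     # skew-symmetric
x0 = np.array([1.0, 0.0])
x1 = linearly_implicit_step(x0, 0.1, Sbar, M, bvec)
```

Only one linear solve is needed per step, in contrast to the Newton iterations required by a fully implicit discrete gradient method.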
To compare this new linearly implicit discrete gradient method with another integral-preserving method we also consider the standard projection method (see Algorithm IV.4.2 in [4]) with RK4 as the underlying method. The algorithm is: 1. Compute y = x + h f̄(x, h) (where f̄ corresponds to the RK4 method), 2. Compute x′ by solving x′ = y + λ i(y) such that I(x′) = I(x), for x′ and λ ∈ R.
Actually, step 2 above is what is suggested in equation (4.5) of [4], after Algorithm IV.4.2, as a more convenient nonlinear system to solve (it reduces the number of evaluations of i(·) required). To solve the nonlinear system in step 2, Hairer, Lubich and Wanner use the following simplified Newton iteration (see [4, p. 111] for details):

λ_0 = 0 and λ_{i+1} = λ_i − (I(y + λ_i i(y)) − I(x)) / (i(y) · i(y)).
Once this iterative scheme has converged to λ*, x′ is computed as x′ = y + λ* i(y).
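The simplified Newton iteration above is straightforward to implement; a sketch for a quadratic invariant (the names are ours, and y stands in for the unprojected one-step Runge-Kutta result):

```python
import numpy as np

# Projection step: find lambda so that I(y + lambda * i(y)) = I(x),
# using the simplified Newton iteration
#   lambda <- lambda - (I(y + lambda*i(y)) - I(x)) / (i(y) . i(y)).
def project(I, grad_I, x, y, tol=1e-14, maxit=50):
    iy = grad_I(y)
    denom = np.dot(iy, iy)
    lam = 0.0
    for _ in range(maxit):
        r = I(y + lam * iy) - I(x)
        if abs(r) < tol:
            break
        lam -= r / denom
    return y + lam * iy

I = lambda z: 0.5 * np.dot(z, z)
grad_I = lambda z: z
x = np.array([1.0, 0.0])
y = np.array([0.9, 0.45])      # hypothetical unprojected approximation
xp = project(I, grad_I, x, y)
```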

5.1. Modified rigid body motion. In the following example we compare the performance of our new discrete gradient method with the RK4 method and another integral-preserving method, the standard projection method (all described above). We first demonstrate the benefits of preserving the integral by inspecting phase space plots for the RK4 method and our new discrete gradient method. We will see that the integral-preserving method does a much better job of following the trajectory of the exact solution. To compare the errors we include all three methods. We will see that the errors for all three methods are of similar size, that all three methods are of the same order, and that our new discrete gradient method is more efficient than the standard projection method. The example we use for our computations is a modification of the equations for rigid body motion in three dimensions (see e.g. Example 1.7 in [4, p. 99]). For a parameter α ∈ R, the augmented equations of motion for a body with centre of mass at the origin are given by (41), where I_1, I_2 and I_3 are also parameters. In the case when α = 0 this system reduces to the equations for rigid body motion, where the vector x = (x_1, x_2, x_3)^T is the angular momentum in the body frame and the I_i parameters are the principal moments of inertia. Moreover, when α = 0 there are two quadratic first integrals, but in the general case when α ≠ 0 there is only one first integral. Notice that (41) has the form of (3). In our computations we have taken I_1 = 2, I_2 = 1, I_3 = 2/3, α = 1 and we have used the initial condition x_0 = (cos(1.1), 0, sin(1.1))^T at t = 0. Except for α, these are the same values used in [4].
In Figure 1 we see that phase space, projected onto the x_1x_3-plane, is more accurately represented when we compute the solution using our new discrete gradient method instead of the RK4 method. Here we have used two different time steps (h = 0.5 and h = 100/92) and computed up to a final time of t = 500. In the plots, the solid grey line is the exact solution and the black dots are the approximate solution at each time step using either RK4 or our new discrete gradient method. For the larger step size of h = 100/92 the RK4 method appears to converge to an equilibrium, which is the wrong type of asymptotic behaviour. For our new discrete gradient method, while the errors are clearly quite large for this larger step size, the solution appears to be circulating around a periodic orbit, which is the correct asymptotic behaviour for this example. Another possibility with the RK4 method (not observed in this example) is that the solution will blow up at some critical time (for example with α = 2, h = 10/31 at t ≈ 100). This cannot happen for integral-preserving methods such as our new discrete gradient method.
In Figures 2 and 3 we compare the errors for the three different methods: RK4, the standard projection method, and our new discrete gradient method. In Figure 2 we have plotted the solution error and integral error versus time for the three different methods. We see that the solution error is initially similar for all three methods, grows as time increases, and then remains bounded. The integral error plot clearly shows that the integral-preserving methods preserve the integral up to double machine precision and are vastly superior in terms of preserving the integral to the non-integral-preserving RK4 method. These computations used a fixed time step of h = 0.5 for all three methods and computations were performed up to a final time of t = 500.
In Figure 3 we compare the performance of the same three methods for different step sizes. We are interested in whether or not our new discrete gradient method is of the same order as RK4 (order 4), and in comparing the efficiency of our new discrete gradient method with another integral-preserving method, the standard projection method, where a nonlinear system of equations must be solved at each time step. By plotting the solution error at time t = 100 for different step sizes (h ∈ [10^{-3.5}, 1]) in the left plot of Figure 3, we confirm that our new discrete gradient method is of order 4, the same as RK4, which is the underlying method defining f̄. In the right plot of Figure 3, for the same range of step sizes, we have plotted the solution error at t = 100 against the CPU time required to compute the solution up to t = 100. In this way we can compare the efficiency of these methods. The plot clearly shows that our new discrete gradient method is more efficient than the standard projection method for this problem because it yields smaller errors using less computational effort. The plot also shows that the RK4 method is more efficient still. Since both integral-preserving methods effectively compute the RK4 approximation within their methods, the computational cost required by these methods is more than for RK4. Moreover, in the left plot of Figure 3 we saw that the size of the error for all three methods is similar. For these reasons RK4 is the most efficient method; however, RK4 does not preserve the integral and over longer time intervals it often has the wrong asymptotic behaviour.
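The observed order in such a plot can also be estimated from errors at two step sizes via p ≈ log₂(err(h)/err(h/2)). A self-contained sketch (using an implicit-midpoint-type discrete gradient step of order 2 on a toy oscillator, not the order-4 method of these experiments):

```python
import numpy as np

# Estimate the observed order of a one-step method from global errors at
# step sizes h and h/2: p ~ log2(err(h) / err(h/2)).
S = np.array([[0.0, 1.0], [-1.0, 0.0]])

def step(x, h):
    # Exact solve of x' = x + h*S*(x + x')/2 (linear, so no iteration needed);
    # this midpoint-type discrete gradient step is of order 2.
    d = len(x)
    return np.linalg.solve(np.eye(d) - 0.5 * h * S,
                           (np.eye(d) + 0.5 * h * S) @ x)

def global_error(h, T=1.0):
    x = np.array([1.0, 0.0])
    for _ in range(int(round(T / h))):
        x = step(x, h)
    exact = np.array([np.cos(T), -np.sin(T)])   # exact oscillator solution
    return np.linalg.norm(x - exact)

p_est = np.log2(global_error(0.01) / global_error(0.005))
```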
Also notice in Figure 3 (right) that the difference in efficiency between our new discrete gradient method and the standard projection method is more pronounced for larger time steps. This is probably due to the fact that the initial guess (the RK4 solution) in the Newton iteration for calculating the projection step in the standard projection method is more accurate for smaller time steps, resulting in fewer iterations until the convergence test is satisfied.

5.2. A time step criterion. A key feature of the existence and order of accuracy results in this paper is that we may take x as close as we like to a critical point of I without any additional constraint on the time step. By considering different x values, and computing a single time step to get x′ for a range of time step sizes, we can show that this feature of our results is illustrated by the modified rigid body motion example. As criteria for a valid time step we consider the denominator of S̄ (which should be positive) and the condition number (the ratio between the largest and smallest eigenvalues) of the matrix Id − (h/2) S̄ M (see (40)). The initial points we consider are x = (R cos(1.1), 0, R sin(1.1))^T for R = 1 (this is the initial condition used in our earlier simulations and is far away from a critical point of I), and for R = 0.1 and R = 0.01 (which are near to the critical point (0, 0, 0)^T of I).
In Figure 4 we have plotted the condition number of Id − (h/2) S̄ M, the denominator of S̄ and the error after a single time step versus the time step, for the different starting points (R = 1, R = 0.1 and R = 0.01). We see that as the time step is increased there appear to be critical values where the condition number blows up, the denominator veers down to zero, and the error no longer exhibits the same asymptotic behaviour with respect to the time step. We see that for x close to the critical point the largest allowable time step actually increases. This is consistent with our theory.

6. Conclusion. In this paper we have analysed discrete gradient methods from first principles. We have established the bare essentials in terms of local Lipschitz continuity conditions and other criteria that ensure these methods are locally well-defined and of order p. A key feature of our analysis is that we have removed any dependence of the time step on the distance to critical points of the preserved integral, and all of the constants in our results are independent of |i(x)|.
Although we have been careful to trace the value of constants through our proofs we do not make the claim that our constants are optimal. The reasons for this are that we have assumed that the same constants R and L can be used in all of the inequalities in (9), (10) and (11), and to simplify the presentation we sometimes used inequalities that were not completely sharp. If we had more precise knowledge of the optimal constants for which (5), (9), (10), (11), (20) and (21) hold, then we could repeat the arguments in the proofs of Theorems 2.1 and 3.3 to obtain better constants R ′ and H ′ in Theorem 2.1, and C 5 and H 5 in Theorem 3.3.
As well as considering theoretical conditions for these methods we also developed results that will be useful for users of these methods for solving ODEs. We have shown how Runge-Kutta methods can easily be used inside the framework of discrete gradient methods and we have also developed a new method that is linearly implicit when the integral to be preserved is quadratic, and of order p for arbitrarily chosen p ∈ N. Our numerical experiments confirmed that, in this case, solving a linear system at each step instead of a nonlinear system led to significantly reduced computational cost.
The results in this paper can be easily applied to projection methods, see [10], and further avenues for research include developing similar theory for discrete gradient methods applied to ODEs with Lyapunov functions, and discrete gradient methods applied to stiff ODEs, an issue not addressed here.