Projection methods and discrete gradient methods for preserving first integrals of ODEs

In this paper we study linear projection methods for approximating the solution and simultaneously preserving first integrals of autonomous ordinary differential equations. We show that (linear) projection methods are a subset of discrete gradient methods. In particular, each projection method is equivalent to a class of discrete gradient methods (where the choice of discrete gradient is arbitrary) and earlier results for discrete gradient methods also apply to projection methods. Thus we prove that for the case of preserving one first integral, under certain mild conditions, the numerical solution for a projection method exists and is locally unique, and preserves the order of accuracy of the underlying method. In the case of preserving multiple first integrals the relationship between projection methods and discrete gradient methods persists. Moreover, numerical examples show that similar existence and order results should also hold for the multiple integral case. For completeness we show how existing projection methods from the literature fit into our general framework.


Introduction
First, consider an autonomous ordinary differential equation (ODE) with only one first integral; the case of multiple first integrals is treated later. We consider the same problem as in [14]:

ẋ = f(x), x(0) = x₀, (1)

where x(t) ∈ R^d for some d ∈ N, x₀ ∈ R^d is the initial condition and f : R^d → R^d is locally Lipschitz continuous. Existence theory for ODEs (see e.g. [9, Thm. I.7.3 on p. 37]) implies that given a bounded set B ⊂ R^d, there exists a T > 0 such that for any x₀ ∈ B the solution exists and remains bounded for t ∈ [0, T]. We assume that (1) has a conserved first integral I : R^d → R, so that x(t) ∈ M_{x₀} := {z ∈ R^d : I(z) = I(x₀)} for all t ∈ [0, T].
To simplify the notation define i := ∇I, and let us also assume that I is a Morse function (i.e. smooth with non-degenerate critical points) and that i : R^d → R^d is locally Lipschitz continuous. As in [12], and discussed in detail in [14], if i(x(t)) ≠ 0 for t > 0 then we may write (1) as

ẋ = S(x) i(x), (2)

where S : R^d → R^{d×d} is a skew-symmetric (S^T = −S) matrix-valued function. In general, S is not unique. One choice for S is the so-called default formula,

S(x) = (f(x) i(x)^T − i(x) f(x)^T) / |i(x)|². (3)

Since I is a Morse function, the default S is locally bounded on {x ∈ R^d : i(x) ≠ 0}, and for a bounded set B ⊂ R^d there exists a constant C₁ = C₁(B) such that

|S(x)| ≤ C₁ for all x ∈ B with i(x) ≠ 0. (4)

Also define C₂ = C₂(B) := C₁ + 1/5. In general it is beneficial to approximate the solution of (1) in such a way that I is preserved exactly (in practice up to round-off error or a specified tolerance) by the approximate solution. Both (linear) projection methods (see e.g. [8, §IV.4 and §V.4.1] and references therein) and discrete gradient methods (see e.g. [12, 17, 20]) achieve this. In the special case of Hamiltonian systems one must choose between preserving the Hamiltonian integral and preserving the symplectic structure (only the exact solution, up to time rescaling, preserves both; see e.g. [23]), but there are many examples where preserving the Hamiltonian is advantageous, see e.g. [19, 21].
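As a quick sanity check of the default formula, the following sketch (our own illustration, not from the paper; the harmonic oscillator and all names are our choices) builds S(x) = (f i^T − i f^T)/|i|² for a two-dimensional example and verifies that it is skew-symmetric and reproduces the vector field, S(x) i(x) = f(x), which uses the first-integral property i · f = 0:

```python
# Sketch (assumed example): harmonic oscillator with I = (x^2 + y^2)/2,
# vector field f = (y, -x), gradient i = (x, y).

def f(u):
    x, y = u
    return (y, -x)

def grad_I(u):
    x, y = u
    return (x, y)

def default_S(u):
    # default formula S = (f i^T - i f^T) / |i|^2
    fu, iu = f(u), grad_I(u)
    n2 = iu[0]**2 + iu[1]**2
    return [[(fu[j]*iu[k] - iu[j]*fu[k]) / n2 for k in range(2)]
            for j in range(2)]

u = (0.6, 0.8)
S = default_S(u)

# S is skew-symmetric ...
assert abs(S[0][0]) < 1e-14 and abs(S[1][1]) < 1e-14
assert abs(S[0][1] + S[1][0]) < 1e-14

# ... and reproduces the vector field: S i = f (this uses i . f = 0)
Si = tuple(sum(S[j][k]*grad_I(u)[k] for k in range(2)) for j in range(2))
assert all(abs(Si[j] - f(u)[j]) < 1e-14 for j in range(2))
```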
First, let us define linear projection methods. The basic idea of a projection method is to couple a one-step method with a projection, so that after a full time step the approximate solution to the ODE lies on the manifold M_{x₀}. Let f̃ : R^d × R^d × [0, ∞) → R^d define an arbitrary one-step method applied to (1) with time step h, so that

x′ = x + h f̃(x, x′, h),

where x = x_n and x′ = x_{n+1} at each time step. A projection method then takes the following form: Given x ∈ R^d and h ∈ [0, ∞),

1. compute y ∈ R^d such that y = x + h f̃(x, y, h);

2. project y onto M_x to obtain x′ ∈ M_x.

In this paper we will only concern ourselves with linear projections, so that step 2 of the above algorithm is given by:

2. compute x′ ∈ M_x by solving x′ = y + λ ĩ(x, x′, h) and I(x′) = I(x) for x′ ∈ R^d and λ ∈ R,

where ĩ : R^d × R^d × [0, ∞) → R^d is a vector field that defines the direction of the projection and is typically an approximation of i. We refer to this type of projection as a linear projection because (x′ − y) ∥ ĩ(x, x′, h). Note that for a method defined by y = x + h f̃(x, y, h) in step 1 of the algorithm above, there exists an implicitly defined map Φ_h : R^d → R^d such that y = Φ_h(x). If we define g̃(x, h) := (Φ_h(x) − x)/h, then we may alternatively write step 1 as y = x + h g̃(x, h). Using g̃ instead of f̃ in step 1 allows us to easily eliminate y from the algorithm and express the algorithm in a single line: Given x ∈ R^d and h ∈ [0, ∞), compute x′ ∈ R^d and λ ∈ R such that x′ = x + h g̃(x, h) + λ ĩ(x, x′, h) and I(x′) = I(x).
For more generality in our projection methods, in addition to allowing different choices of ĩ, we will also modify g̃ so that it may depend on x′. Switching back to using f̃ instead of g̃ in the notation, we obtain our general form of a linear projection method for preserving a single first integral: Given x ∈ R^d and h ∈ [0, ∞), compute x′ ∈ R^d and λ ∈ R such that

x′ = x + h f̃(x, x′, h) + λ ĩ(x, x′, h) and I(x′) = I(x). (6)

By choosing f̃ and ĩ differently, we obtain different projection methods. To the best of our knowledge all of the projection methods that have been described in the literature fit into this framework (we are only aware of linear projection methods, but it may be possible to define projection methods in spaces that are not linear spaces), including the (non-symmetric) standard projection method in [8, Algorithm IV.4.2] and the symmetric projection method in [8, §V.4.1] and [7]. This will be discussed in more detail in Section 4. The other type of integral preserving methods we consider are discrete gradient methods. For their definition we must first define a discrete gradient of I, a special type of discretization of the gradient of I.

Definition 1. (Gonzalez [5]) A discrete gradient of I, denoted ī : R^d × R^d → R^d, is continuous and satisfies

ī(x, x′) · (x′ − x) = I(x′) − I(x) and ī(x, x) = i(x) for all x, x′ ∈ R^d.

Formulae for constructing discrete gradients include the one used in the average (or averaged) vector field method (called the mean value discrete gradient in [12], also see [19]) and the coordinate increment method [11].
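To make the one-line form (6) concrete, here is a minimal Python sketch (our own construction, with assumed choices: a Duffing-type oscillator, explicit Euler as the underlying method, and projection direction ĩ(x, x′, h) = i(x′)). The inner fixed-point iteration resolves x′ for a given λ, and a secant iteration enforces the scalar constraint I(x′) = I(x):

```python
# Assumed example: q' = p, p' = -(q + q^3), I = p^2/2 + q^2/2 + q^4/4.
def I(u):
    q, p = u
    return 0.5*(p*p + q*q) + 0.25*q**4

def grad_I(u):
    q, p = u
    return (q + q**3, p)

def f(u):
    return (u[1], -(u[0] + u[0]**3))

def proj_step(x, h):
    fx = f(x)
    y = (x[0] + h*fx[0], x[1] + h*fx[1])        # step 1: explicit Euler

    def xprime(lam):
        # inner fixed-point iteration for x' = y + lam * i(x')
        z = y
        for _ in range(100):
            g = grad_I(z)
            znew = (y[0] + lam*g[0], y[1] + lam*g[1])
            if max(abs(znew[0]-z[0]), abs(znew[1]-z[1])) < 1e-15:
                return znew
            z = znew
        return z

    # step 2: secant iteration on the scalar constraint I(x') = I(x)
    target = I(x)
    lam0, lam1 = 0.0, -h*h
    phi0 = I(xprime(lam0)) - target
    for _ in range(60):
        phi1 = I(xprime(lam1)) - target
        if abs(phi1) < 1e-14 or phi1 == phi0:
            break
        lam0, lam1, phi0 = lam1, lam1 - phi1*(lam1 - lam0)/(phi1 - phi0), phi1
    return xprime(lam1)

x0 = (1.0, 0.0)
x = x0
for _ in range(200):
    x = proj_step(x, 0.05)
assert abs(I(x) - I(x0)) < 1e-10    # I preserved up to the solver tolerance
```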
If we let ī be a discrete gradient of I and S : R^d × R^d × [0, ∞) → R^{d×d} be a skew-symmetric, continuous and consistent approximation of S, then a discrete gradient method for solving (1) is defined by the mapping x → x′, where

x′ = x + h S(x, x′, h) ī(x, x′). (7)

In this paper we only consider the large class of discrete gradient methods where S is defined by the formula (8) for all h, x and x′ satisfying (7), where ī is a discrete gradient of I. Any such method preserves I exactly, since I(x′) − I(x) = ī(x, x′) · (x′ − x) = h ī^T S ī = 0, where the final equality follows from the fact that S is skew-symmetric.
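A hedged sketch of a discrete gradient method of the form (7) (our own illustration with assumed choices: a Duffing-type oscillator, the constant skew-symmetric matrix S = [[0, 1], [−1, 0]], and the Gonzalez midpoint discrete gradient; x′ is found by fixed-point iteration). At the fixed point, ī · (x′ − x) = h ī^T S ī = 0, so I is preserved exactly:

```python
# Assumed example: q' = p, p' = -(q + q^3), I = p^2/2 + q^2/2 + q^4/4.
def I(u):
    q, p = u
    return 0.5*(p*p + q*q) + 0.25*q**4

def grad_I(u):
    q, p = u
    return (q + q**3, p)

def gonzalez(x, xp):
    # Gonzalez midpoint discrete gradient of I
    dq, dp = xp[0] - x[0], xp[1] - x[1]
    d2 = dq*dq + dp*dp
    g = grad_I(((x[0]+xp[0])/2, (x[1]+xp[1])/2))
    if d2 == 0.0:
        return g
    c = (I(xp) - I(x) - g[0]*dq - g[1]*dp) / d2
    return (g[0] + c*dq, g[1] + c*dp)

def dg_step(x, h):
    xp = x
    for _ in range(200):
        gb = gonzalez(x, xp)
        nxt = (x[0] + h*gb[1], x[1] - h*gb[0])   # x + h*S*ibar, S = [[0,1],[-1,0]]
        if max(abs(nxt[0]-xp[0]), abs(nxt[1]-xp[1])) < 1e-15:
            return nxt
        xp = nxt
    return xp

x0 = (1.0, 0.0)
x = x0
for _ in range(400):
    x = dg_step(x, 0.05)
assert abs(I(x) - I(x0)) < 1e-11
```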
In [14] discrete gradient methods of this type were studied and it was shown that under certain local Lipschitz conditions and for sufficiently small time step the numerical solution to (7) (with S defined by (8)) exists and is locally unique, independent of the distance to critical points of I. For arbitrary p ∈ N it was also shown how to construct discrete gradient methods that have order of accuracy p.
In this paper we will show that all linear projection methods of the type (6) are equivalent to discrete gradient methods where the approximate solution is independent of the particular choice of discrete gradient ī. We prove this by showing that each projection method is equivalent to (in general) several discrete gradient methods, in the sense that a projection method and several discrete gradient methods (defined with different choices of ī) all define the same map x → x′ for a given h. A consequence of this result is that projection methods are a subset of discrete gradient methods.
In the case when there is only one first integral to preserve, we can then use the theory in [14] to obtain, as simple corollaries, new existence, uniqueness and order of accuracy results for a large class of linear projection methods (restricted only by certain mild local Lipschitz conditions on f̃ and ĩ).
When there is more than one first integral to preserve, we will prove that the same equivalence between discrete gradient and linear projection methods holds, and as a consequence projection methods are a subset of discrete gradient methods for the multiple integral situation. Since the theory in [14] is only for the single first integral case we do not obtain new results about existence, uniqueness and order of accuracy from discrete gradient method theory for the multiple first integral case. Proving these results for general linear projection methods and discrete gradient methods in the multiple integral case remains an open problem.
The remainder of this paper is organised as follows. In Section 2 we prove our first result about the equivalence of linear projection methods and a class of discrete gradient methods in the case when (1) has a single first integral. Then, in Section 3, we use this result and the theory from [14] to obtain new results about existence, local uniqueness and order of accuracy for linear projection methods. In Section 4 we demonstrate how several projection methods already described in the literature are special cases of our framework, and how our new results improve on existing results by allowing more freedom in the projection direction than previously; moreover, our results are independent of the distance to critical points of I. In Section 5 we consider the case when (1) has more than one first integral and our projection and discrete gradient methods are designed to preserve multiple first integrals. We write down a new expression for linear projection methods in this case involving oblique projection matrices and prove equivalence with discrete gradient methods. In Section 6 we use numerical experiments to illustrate how the order of accuracy results, proven in the single first integral case, also appear to hold in the multiple integral case. Finally, in Section 7 we discuss the implications of this work and possible avenues for future research.

Equivalence in the single first integral case
In this section we explore the relationship between linear projection methods and discrete gradient methods. We will see that each linear projection method is equivalent to possibly several discrete gradient methods, where the choice of discrete gradient is arbitrary. Note, however, that discrete gradient methods are not always projection methods, so projection methods form a subset of discrete gradient methods.
So far we have not proven that the projection method defined by (6) is well-defined in the sense that the implicit system of equations (6) for x′ and λ has a unique solution for sufficiently small time step h. So let us assume that h is sufficiently small, that x′ and λ are uniquely defined by (6), and that ĩ · ī ≠ 0 (definitions of ĩ and ī will be given). In the next section we will prove an existence result that justifies these two assumptions under sufficient conditions on f̃, ĩ, ī and h.
The following theorem shows that linear projection methods may be expressed in several equivalent ways, and the following corollary explains how each linear projection method is equivalent to possibly several discrete gradient methods, where the choice of discrete gradient is arbitrary.
Theorem 2. Let ĩ : R^d × R^d × [0, ∞) → R^d be a consistent approximation of i, let ī be an arbitrary discrete gradient of I, and let f̃ : R^d × R^d × [0, ∞) → R^d be a consistent approximation of f. Assuming that, given x ∈ R^d and h ∈ [0, ∞), each of the methods below has uniquely defined x′ and λ, and that ĩ · ī ≠ 0, then they define the same linear projection method:

x′ = x + h f̃(x, x′, h) + λ ĩ(x, x′, h) and I(x′) = I(x), (9)

x′ = x + h P(x, x′, h) f̃(x, x′, h), (10)

x′ = x + h S(x, x′, h) ī(x, x′), with S defined by (8). (11)
Note that (9) is the same as (6), our general form for a linear projection method. In (10), P = P(x, x′, h) is a projection matrix satisfying P f̃ ⊥ ī and (I − P) f̃ ∥ ĩ, i.e. the range of P is span{ī}⊥ and the null space of P is span{ĩ}.
Using the equivalence of (9) and (11) we get the following corollary.
Corollary 3. Let ĩ be a consistent approximation of i, let f̃ be a consistent approximation of f, and let ī be an arbitrary discrete gradient of I. If we define î ≡ ĩ and ȋ ≡ ī (or vice versa), then the linear projection method defined by (6) is equivalent to the discrete gradient method defined by (7), where S is defined by (8).
Definingî ≡ĩ andȋ ≡ī in the definition of a discrete gradient method is a restriction so linear projection methods are a subset of all possible discrete gradient methods.
Also notice that the methods described by (10) and (11) depend on an arbitrarily chosen discrete gradient ī, whereas linear projection methods are independent of ī. At first glance it would appear that the mapping x → x′ defined by (10) and (11) should depend on the choice of ī, and that these methods would give different approximations to (1) for different choices of ī. It is perhaps surprising that this is not the case: as a consequence of Theorem 2 (since (9) does not depend on ī), they give the same approximation to (1) regardless of how ī is chosen. Thus, each linear projection method defines an equivalence class of discrete gradient methods, and is uniquely defined by choosing f̃ and the direction of projection given by ĩ.
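This independence of the choice of ī can be observed numerically. The sketch below (our own construction; a Duffing-type example, explicit Euler as the underlying method, and the explicit direction ĩ := i(x) are all assumed choices) computes one step of the form (10) with two different discrete gradients, the Gonzalez midpoint discrete gradient and the Itoh-Abe coordinate increment discrete gradient, and checks that the two resulting x′ agree:

```python
# Assumed example: q' = p, p' = -(q + q^3), I = p^2/2 + q^2/2 + q^4/4.
def I(u):
    q, p = u
    return 0.5*(p*p + q*q) + 0.25*q**4

def grad_I(u):
    q, p = u
    return (q + q**3, p)

def f(u):
    return (u[1], -(u[0] + u[0]**3))

def gonzalez(x, xp):
    dq, dp = xp[0] - x[0], xp[1] - x[1]
    d2 = dq*dq + dp*dp
    g = grad_I(((x[0]+xp[0])/2, (x[1]+xp[1])/2))
    if d2 == 0.0:
        return g
    c = (I(xp) - I(x) - g[0]*dq - g[1]*dp) / d2
    return (g[0] + c*dq, g[1] + c*dp)

def itoh_abe(x, xp):
    # coordinate increment discrete gradient
    q, p = x
    qp, pp = xp
    g1 = (I((qp, p)) - I((q, p)))/(qp - q) if qp != q else grad_I(x)[0]
    g2 = (I((qp, pp)) - I((qp, p)))/(pp - p) if pp != p else grad_I((qp, p))[1]
    return (g1, g2)

def proj_step(x, h, dgrad):
    # x' = x + h P fx, with P = I - itilde ibar^T/(ibar . itilde),
    # written as x' = x + h fx + lam * itilde
    fx = f(x)
    it = grad_I(x)                       # projection direction itilde := i(x)
    xp = (x[0] + h*fx[0], x[1] + h*fx[1])
    for _ in range(300):
        ib = dgrad(x, xp)
        lam = -h*(ib[0]*fx[0] + ib[1]*fx[1])/(ib[0]*it[0] + ib[1]*it[1])
        nxt = (x[0] + h*fx[0] + lam*it[0], x[1] + h*fx[1] + lam*it[1])
        if max(abs(nxt[0]-xp[0]), abs(nxt[1]-xp[1])) < 1e-15:
            return nxt
        xp = nxt
    return xp

x, h = (0.9, 0.4), 0.05
a = proj_step(x, h, gonzalez)
b = proj_step(x, h, itoh_abe)
assert max(abs(a[0]-b[0]), abs(a[1]-b[1])) < 1e-10   # same map x -> x'
assert abs(I(a) - I(x)) < 1e-12                      # I preserved
```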

Existence, uniqueness and order of accuracy
In this section we will exploit the equivalence between linear projection methods and discrete gradient methods by using theory developed for discrete gradient methods to prove new results about linear projection methods.
Typically, the projection step of a projection method (step 2 in our original algorithm) requires solving an implicit nonlinear system of equations, and a new system of equations must be solved at each time step. A basic question regarding projection methods is: Does there exist a unique solution to each of these systems of equations? A further question is: Does a projection method retain the same order of accuracy as the underlying method (the underlying method is step 1 in our original algorithm)?
Linear projection methods have already been studied in the literature (see e.g. [8, §IV.4 and §V.4.1] and [7]), and questions of existence and uniqueness, and order of accuracy, have already been answered in some cases. However, these results were only stated for particular special cases of ĩ (see Section 4), and their proofs rely on either a simple geometric argument (which only holds for the standard projection method when ĩ(x, x′, h) := i(x′)), the Implicit Function Theorem, or the Newton-Kantorovich Theorem. Closer examination of these techniques, with the assistance of results in [16] and [15] that give a lower bound on the radius of existence for the Implicit Function Theorem and the Newton-Kantorovich Theorem, reveals that the time step restriction on h (or radius of existence) for existence of the numerical solution is h ≤ C|i(x)|^r for some positive constants C and r. If x is near a critical point of I (so that i(x) ≈ 0) then this type of restriction is undesirable, and in numerical simulations it appears to be unnecessary. Our new results below improve on and extend these earlier results because we avoid this restriction, and we only place mild Lipschitz continuity conditions on the projection direction ĩ, so that the results hold for a much wider class of projection methods.
For the following results we require the following definition of a ball around a point x ∈ R^d. Given x ∈ R^d and a constant R > 0 define

B_R(x) := {z ∈ R^d : |z − x| ≤ R}.

To simplify the presentation that follows, let us define several 'Assumptions'.
Note that since we have assumed that f is locally Lipschitz continuous, for any R > 0 there exists a corresponding L > 0 such that f satisfies Assumption 3. Similarly for i.
The following theorem ensures, for sufficiently small h and under certain local Lipschitz continuity conditions, that linear projection methods (defined by (6)) have a numerical solution that is locally unique. Its proof is omitted because it is a direct consequence of Theorem 2.1 in [14] and Corollary 3, where ī is chosen to be an arbitrary discrete gradient of I satisfying Assumption 2 for R, L and H defined as in the theorem below.
Theorem 4. Let B be a bounded set in R^d, let C₂ be the constant defined by (4), and suppose that R, L and H are positive constants such that f̃ satisfies Assumption 1 and ĩ satisfies Assumption 2 for R, L and H. Then there exist positive constants R′ and H′ such that for each x ∈ B and h ∈ [0, H′] there exist x′ ∈ B_{R′}(x) and λ ∈ R with |λ| ≤ (11/5) C₂ h satisfying (6), and this solution is unique in B_{R′}(x).

This existence result only provides us with local uniqueness, since we are only sure that x′ is the unique solution in the ball B_{R′}(x). Now let us consider the order of accuracy of linear projection methods. We use the following definition of order of accuracy.

Definition 5. A one-step method x → x′ with time step h for solving (1) has order of accuracy p ∈ N if, for problems with sufficiently smooth f, there exist positive constants C and H such that

|x′ − x(t + h)| ≤ C h^{p+1} for all h ∈ [0, H],

where x(·) denotes the exact solution of (1) with x(t) = x. The constants C and H may depend on B but should be independent of x and h.
If we are given an underlying method that is of order p for some p ∈ N, i.e. the method x → y defined by y = x + h f̃(x, y, h) is of order p, then an important question to ask is: what additional conditions on ĩ (recall that ĩ defines the direction of the projection), in addition to Assumption 2, are required to ensure that a linear projection method defined by (6) is also of order p? The following theorem gives the answer: none! Besides Assumption 2, no additional conditions on ĩ are required to ensure that a linear projection method is of order p.
Again, we rely on the theory in [14] to achieve our result. The following theorem is a special case of Theorem 3.3 in [14], using Corollary 3 and an arbitrary discrete gradient ī satisfying Assumption 2.
Theorem 6. For a compact set B ⊂ R^d, let C₂, R, L, H, f̃, ĩ, R′ and H′ be defined as in Theorem 4, and let f satisfy Assumption 3 for 5R′ and L. For each x ∈ B and h ∈ [0, H′]:

1. let x′ ∈ B_{R′}(x) and λ ∈ R be the unique solution to (6) such that |λ| ≤ (11/5) C₂ h (which exists by Theorem 4),

2. let y ∈ B_{6R′}(x) be the unique solution to y = x + h f̃(x, y, h) (which exists by [14, Lem. 3.1]), and

3. let x(·) denote the exact solution to (1) satisfying x(t) = x for some t ≥ 0.
Also suppose that

4. f̃ is such that the method x → y defined by y = x + h f̃(x, y, h) is of order p for some p ∈ N, i.e. when f is sufficiently smooth there exist positive constants C₃ and H₃ < H′ such that

|y − x(t + h)| ≤ C₃ h^{p+1} for all h ∈ [0, H₃] and all x ∈ B.

Then the linear projection method defined by (6) is also of order p, so that when f is sufficiently smooth there exist positive constants C₅ and H₅ such that

|x′ − x(t + h)| ≤ C₅ h^{p+1} for all h ∈ [0, H₅] and all x ∈ B.
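A crude empirical check of this behaviour in the simplest setting (our own example, not from the paper): explicit Euler (order 1) on the harmonic oscillator, with linear projection along i(y) = y, which for I = (q² + p²)/2 reduces to rescaling back onto the circle. Note that for this particular symmetric choice the projected method happens to gain an order (the angle error per step is O(h³)), so the test below only asserts that accuracy is not degraded by the projection:

```python
import math

def step(x, h):
    # explicit Euler for q' = p, p' = -q, then projection along i(y) = y
    y = (x[0] + h*x[1], x[1] - h*x[0])
    r = math.hypot(x[0], x[1]) / math.hypot(y[0], y[1])
    return (y[0]*r, y[1]*r)

def error(h, T=1.0):
    x = (1.0, 0.0)
    for _ in range(round(T/h)):
        x = step(x, h)
    # exact solution: q = cos t, p = -sin t
    return math.hypot(x[0] - math.cos(T), x[1] + math.sin(T))

e1, e2 = error(0.01), error(0.005)
assert e2 < 0.75*e1      # halving h reduces the error at least at first order

x = (1.0, 0.0)
for _ in range(100):
    x = step(x, 0.01)
assert abs(math.hypot(x[0], x[1]) - 1.0) < 1e-12   # I preserved to round-off
```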

Existing linear projection methods
In this section we consider several linear projection methods that have been described and studied previously in the literature. Our purpose is to show how all of these methods are special cases of our general framework, and hence our new theory also applies in these cases. We will need the following version of Banach's Fixed Point Theorem (also known as the Contraction Mapping Principle). This version was also used in [14] and is from [10, Thm. on p. 74].
Theorem 7 (Banach's Fixed Point Theorem). Let (X, d) be a non-empty complete metric space. Let T : X → X be a contraction on X, i.e. there exists a q ∈ (0, 1) such that d(T (x), T (y)) ≤ qd(x, y) for all x, y ∈ X.
Then there exists a unique fixed point x * ∈ X such that T (x * ) = x * . Furthermore, the fixed point can be found by iteration, x n+1 = T (x n ) for n = 0, 1, 2, . . . with x 0 ∈ X arbitrary.
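The iteration in Theorem 7 is easy to demonstrate. A small self-contained sketch (our own example, with the classical contraction T = cos on [0, 1], where |T′| ≤ sin 1 < 1):

```python
import math

def banach_iterate(T, x0, tol=1e-12, max_iter=1000):
    # fixed-point iteration x_{n+1} = T(x_n) from Theorem 7
    x = x0
    for _ in range(max_iter):
        x_new = T(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# cos maps [0, 1] into itself and is a contraction there, so the fixed point
# exists, is unique, and is found by iteration from any starting point in [0, 1].
xstar = banach_iterate(math.cos, 0.5)
assert abs(math.cos(xstar) - xstar) < 1e-10
```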

Example 1: (non-symmetric) standard projection method
In our notation, the (non-symmetric) standard projection method described in [8, Algorithm IV.4.2] for x → x′ is defined by: compute y = Φ_h(x) with one step of the underlying method, then solve x′ = y + λ i(x′) and I(x′) = I(x) for x′ and λ. If we define

f̃(x, x′, h) := (Φ_h(x) − x)/h and ĩ(x, x′, h) := i(x′) (13)

for all x, x′ ∈ R^d and h ∈ [0, ∞), then the standard projection method is a linear projection method of the form (6). However, for computation the authors of [8] suggest using (6) with ĩ defined by

ĩ(x, x′, h) := i(Φ_h(x)) (14)

for all x, x′ ∈ R^d and h ∈ [0, ∞), instead of (13), to make the system of equations easier to solve at each time step. Strictly speaking, this method with ĩ given by (14) instead of (13) is a different projection method because the projection direction is different.
Let us refer to it as version 2 of the (non-symmetric) standard projection method.
To apply our new theory in Theorems 4 and 6 we must determine what conditions g̃ and i must satisfy to ensure that f̃ and ĩ satisfy Assumptions 1 and 2, respectively, for both versions of the standard projection method. First, consider the first version of the standard projection method, where f̃ and ĩ are defined by (13). We must first prove the following lemma about existence, uniqueness and continuity of Φ_h.
Lemma 8. For a bounded set B ⊂ R^d, let C₁ be the constant from (4) and suppose that g̃ satisfies Assumption 1 for some positive constants R_g, L_g and H_g.
If u ∈ B_{2R_g}(x) and h < min{H_g, 1/(6L_g)}, then there exists a unique z =: Φ_h(u) such that z = u + h g̃(u, z, h), and Φ_h satisfies the estimates (15) and (16).

Proof. We will apply Theorem 7 with X := B_{R_g}(x) and T(z) := u + h g̃(u, z, h) for all z ∈ X. To do so we must show that T(z) ∈ X for any z ∈ X and that T is a contraction on X. It is obvious that X with the metric |·| (the usual Euclidean distance) is a non-empty complete metric space. Let z ∈ X. Then, using Assumption 1 for g̃, (4), u ∈ B_{2R_g}(x), z ∈ B_{R_g}(x) and the bound on h, we get T(z) ∈ X. For z, z′ ∈ X, using Assumption 1 for g̃ and h ≤ 1/(6L_g), we also get

|T(z) − T(z′)| = h |g̃(u, z, h) − g̃(u, z′, h)| ≤ h L_g |z − z′| ≤ (1/6)|z − z′|, (17)

so that T is a contraction on X. Therefore, by Theorem 7 there exists a unique fixed point z = Φ_h(u) ∈ X. To get (15) we use a similar argument to (17). Finally, (16) follows from an inequality in which we again use Assumption 1 for g̃ and h < min{H_g, 1/(6L_g)}. Now we can prove that f̃ defined by (13) satisfies Assumption 1 for some choice of R, L and H.
Lemma 9. For a bounded set B ⊂ R^d, let C₁ be the constant from (4) and suppose that g̃ satisfies Assumption 1 for some positive constants R_g, L_g and H_g. Define R := 2R_g. Then there exist positive constants L and H such that f̃ defined by (13) satisfies Assumption 1 for R, L and H.

Proof. Using Assumption 1 for g̃ and Lemma 8 (in particular (16)) we obtain the required bound on f̃. Finally, again using Assumption 1 for g̃ and Lemma 8 (in particular (15)) we obtain the required Lipschitz estimate for f̃. Now let us consider how we should choose R, L and H so that ĩ, defined by (13) or (14), satisfies Assumption 2.
Lemma 10. For a bounded set B ⊂ R^d, let R and L be positive constants such that i satisfies Assumption 3 for R and L, and let H be an arbitrary positive constant. Then ĩ defined by (13) satisfies Assumption 2 for R, L and H.

Proof. Since i is locally Lipschitz, given arbitrary R > 0, such an L exists. The rest of the proof is trivial.
Lemma 11. For a bounded set B ⊂ R^d, let C₁ be the constant from (4), and suppose that

1. g̃ satisfies Assumption 1 for some positive constants R_g, L_g and H_g, and

2. i satisfies Assumption 3 for R_g and L_i (given R_g, such an L_i exists since i is locally Lipschitz).

Then there exist positive constants R, L and H such that ĩ defined by (14) satisfies Assumption 2 for R, L and H.

Example 2: symmetric projection method
It is perhaps surprising that the symmetric projection method from [8, §V.4.1] (originally in [7]) may also be written in the form of (6). In our notation, the symmetric projection method described in [8, §V.4.1] for x → x′ is defined by: Given x ∈ R^d and h ∈ [0, ∞), compute y, z, x′ ∈ R^d and µ ∈ R such that

y = x + µ i(x), z = y + h g̃(y, z, h), x′ = z + µ i(x′), I(x′) = I(x), (18)

where g̃ : R^d × R^d × [0, ∞) → R^d is such that y → z defined by z = y + h g̃(y, z, h) is any symmetric one-step method applied to (1). If we let λ = 2µ and eliminate y and z from (18) then we get: Given x ∈ R^d and h ∈ [0, ∞), compute x′ ∈ R^d and λ ∈ R such that

x′ = x + h g̃(x + (λ/2) i(x), x′ − (λ/2) i(x′), h) + λ (i(x) + i(x′))/2 and I(x′) = I(x). (19)

If we let Ψ be the implicitly defined mapping so that λ = Ψ(x, x′, h), where λ satisfies

λ = −h [ī(x, x′) · g̃(x + (λ/2) i(x), x′ − (λ/2) i(x′), h)] / [ī(x, x′) · (i(x) + i(x′))/2]

and ī is an arbitrarily chosen discrete gradient of I, then we see that the symmetric projection method is equivalent to (6) if we define

f̃(x, x′, h) := g̃(x + (Ψ/2) i(x), x′ − (Ψ/2) i(x′), h) and ĩ(x, x′, h) := (i(x) + i(x′))/2

for all x, x′ ∈ R^d and h ∈ [0, ∞). It turns out that the fact that g̃ satisfies Assumption 1 is sufficient to ensure that f̃ and ĩ satisfy Assumptions 1 and 2, respectively. This will ensure that we are able to apply Theorems 4 and 6 to the symmetric projection method. However, verifying that this is true is quite technical, so we have included it only as an appendix.

Example 3: Methods of Dahlby, Owren and Yaguchi
In [2] Dahlby et al. describe two projection methods. In our notation, given an arbitrary discrete gradient ī of I, the first of their methods (see [2, eq. 2.1]) is defined by

x′ = x + P(x, x′)(Φ_h(x) − x), (20)

where Φ_h : R^d → R^d defines an arbitrary one-step method for solving (1), and P is a projection onto span{ī(x, x′)}⊥. In [2, §2.2] the projection matrix P is defined as

P(x, x′) := I − ī(x, x′) ī(x, x′)^T / |ī(x, x′)|², (21)

which is the orthogonal projection matrix onto span{ī(x, x′)}⊥. If we define

f̃(x, x′, h) := (Φ_h(x) − x)/h and ĩ(x, x′, h) := ī(x, x′) (22)

for all x, x′ ∈ R^d and h ∈ [0, ∞), then it is easy to see that (20) is the same method as (10), so it is a special case of our general linear projection method. The second of the projection methods by Dahlby et al. (see [2, eq. 2.2]) is, in our notation and given an arbitrary discrete gradient ī of I, defined by

x′ = x + h P(x, x′) g̃(x, x′, h), (23)

where P is the same projection matrix as above and g̃ : R^d × R^d × [0, ∞) → R^d is such that the map x → y defined by y = x + h g̃(x, y, h) is an arbitrary one-step method for solving (1). By defining f̃ and ĩ such that

f̃(x, x′, h) := g̃(x, x′, h) and ĩ(x, x′, h) := ī(x, x′) (24)

for all x, x′ ∈ R^d and h ∈ [0, ∞), we see that (23) is the same as (10), so it is another special case of our general linear projection method. Dahlby et al. call their methods 'discrete gradient methods' because the methods are constructed using a discrete gradient. We think it is more appropriate to describe these methods as projection methods. However, our theory (Theorem 2) has established that they may also be expressed in the form of (7), for which we use the term 'discrete gradient method'.
It is a relatively simple task to show that if, for a given bounded set B ⊂ R^d, g̃ satisfies Assumption 1 for some positive constants R_g, L_g and H_g, and ī is a discrete gradient of I satisfying Assumption 2 for some positive constants R_ī and L_ī, then f̃ and ĩ defined by (24) satisfy Assumptions 1 and 2, respectively, for some positive constants R, L and H. For this reason, and for the sake of brevity, we omit the details. In the case when f̃ and ĩ are defined by (22) we must make suitable assumptions about the method defined by Φ_h for a similar result to hold.
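A sketch of the first Dahlby et al. method (our own illustration with assumed choices: explicit Euler for Φ_h, the Gonzalez midpoint discrete gradient, and a Duffing-type integral), iterating x′ = x + P(x, x′)(Φ_h(x) − x) to a fixed point. Since ī · (x′ − x) = 0 at the fixed point, I is preserved exactly:

```python
# Assumed example: q' = p, p' = -(q + q^3), I = p^2/2 + q^2/2 + q^4/4.
def I(u):
    q, p = u
    return 0.5*(p*p + q*q) + 0.25*q**4

def grad_I(u):
    q, p = u
    return (q + q**3, p)

def f(u):
    return (u[1], -(u[0] + u[0]**3))

def gonzalez(x, xp):
    dq, dp = xp[0] - x[0], xp[1] - x[1]
    d2 = dq*dq + dp*dp
    g = grad_I(((x[0]+xp[0])/2, (x[1]+xp[1])/2))
    if d2 == 0.0:
        return g
    c = (I(xp) - I(x) - g[0]*dq - g[1]*dp) / d2
    return (g[0] + c*dq, g[1] + c*dp)

def dahlby_step(x, h):
    fx = f(x)
    d = (h*fx[0], h*fx[1])               # Phi_h(x) - x for explicit Euler
    xp = (x[0] + d[0], x[1] + d[1])
    for _ in range(300):
        ib = gonzalez(x, xp)
        n2 = ib[0]*ib[0] + ib[1]*ib[1]
        c = (ib[0]*d[0] + ib[1]*d[1]) / n2
        # x' = x + (I - ibar ibar^T/|ibar|^2)(Phi_h(x) - x)
        nxt = (x[0] + d[0] - c*ib[0], x[1] + d[1] - c*ib[1])
        if max(abs(nxt[0]-xp[0]), abs(nxt[1]-xp[1])) < 1e-15:
            return nxt
        xp = nxt
    return xp

x0 = (1.0, 0.0)
x = x0
for _ in range(500):
    x = dahlby_step(x, 0.02)
assert abs(I(x) - I(x0)) < 1e-11
```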

Equivalence in the multiple first integrals case
Now let us consider the case when (1) has multiple preserved integrals. Suppose that (1) preserves M first integrals I₁, …, I_M, i.e. there exist I_m : R^d → R for m = 1, …, M such that I_m(x(t)) = I_m(x₀) for all t ≥ 0. For each m we use the notation i_m := ∇I_m. Recall from Section 1 the algorithm for computing one step x → x′ of a projection method: Given x ∈ R^d and h ∈ [0, ∞),

1. compute y ∈ R^d such that y = x + h f̃(x, y, h);

2. project y to obtain x′ such that I_m(x′) = I_m(x) for m = 1, …, M.

For the general linear projection case, when we have multiple integrals to preserve, we first define M directions for the projection, i.e. ĩ_m : R^d × R^d × [0, ∞) → R^d for m = 1, …, M, and then replace step 2 with

2. compute x′ ∈ R^d and λ ∈ R^M by solving x′ = y + Σ_{m=1}^{M} λ_m ĩ_m(x, x′, h) and I_m(x′) = I_m(x) for m = 1, …, M.

We say that this type of projection is a linear projection because x′ − y is a linear combination of the projection directions ĩ_m, i.e. (x′ − y) ∈ span{ĩ₁, …, ĩ_M}. As in Section 1 for the single integral case, we can write the two-step general linear projection method in one line by eliminating y and generalising f̃. The method is: Given x ∈ R^d and h ∈ [0, ∞), compute x′ ∈ R^d and λ ∈ R^M such that

x′ = x + h f̃(x, x′, h) + Σ_{m=1}^{M} λ_m ĩ_m(x, x′, h) and I_m(x′) = I_m(x) for m = 1, …, M. (25)
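The multiple-integral projection step can be sketched as follows (our own illustration, not from the paper; the free rigid body with its Casimir C and energy H as the two integrals, explicit Euler as the underlying method, and directions ĩ_m := i_m(y) are all assumed choices). The two-dimensional nonlinear system for λ is solved by a Newton iteration with a finite-difference Jacobian:

```python
I1, I2, I3 = 1.0, 2.0, 3.0   # assumed principal moments of inertia

def f(m):
    # Euler equations m' = m x grad H(m); preserves C and H below
    return (m[1]*m[2]*(1/I3 - 1/I2),
            m[2]*m[0]*(1/I1 - 1/I3),
            m[0]*m[1]*(1/I2 - 1/I1))

def C(m):   # Casimir
    return 0.5*(m[0]**2 + m[1]**2 + m[2]**2)

def H(m):   # energy
    return 0.5*(m[0]**2/I1 + m[1]**2/I2 + m[2]**2/I3)

def gradC(m): return (m[0], m[1], m[2])
def gradH(m): return (m[0]/I1, m[1]/I2, m[2]/I3)

def proj_step(x, h):
    y = tuple(x[k] + h*f(x)[k] for k in range(3))
    d1, d2 = gradC(y), gradH(y)          # projection directions i_m(y)
    c0, h0 = C(x), H(x)

    def xp(l1, l2):
        return tuple(y[k] + l1*d1[k] + l2*d2[k] for k in range(3))

    def phi(l1, l2):
        z = xp(l1, l2)
        return (C(z) - c0, H(z) - h0)

    l1 = l2 = 0.0
    for _ in range(30):
        p0 = phi(l1, l2)
        if abs(p0[0]) + abs(p0[1]) < 1e-14:
            break
        eps = 1e-7
        pa, pb = phi(l1 + eps, l2), phi(l1, l2 + eps)
        # 2x2 Newton step with a forward-difference Jacobian
        j11, j21 = (pa[0]-p0[0])/eps, (pa[1]-p0[1])/eps
        j12, j22 = (pb[0]-p0[0])/eps, (pb[1]-p0[1])/eps
        det = j11*j22 - j12*j21
        l1 -= ( j22*p0[0] - j12*p0[1]) / det
        l2 -= (-j21*p0[0] + j11*p0[1]) / det
    return xp(l1, l2)

x = (0.6, 0.5, 0.3)
c0, h0 = C(x), H(x)
for _ in range(100):
    x = proj_step(x, 0.05)
assert abs(C(x) - c0) < 1e-9 and abs(H(x) - h0) < 1e-9
```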

Oblique Projections
Before we present our theorem showing the equivalence between linear projection methods and discrete gradient methods for ODEs with multiple integrals, we need to introduce oblique projection matrices. The type of projection described in detail in linear algebra textbooks is usually orthogonal projection, see e.g. [22, Lecture 6] and [13, §5.13]. For a set of linearly independent vectors a₁, …, a_M ∈ R^d (not necessarily orthogonal), the orthogonal projection matrix that maps R^d onto the subspace A := span{a₁, …, a_M} along the subspace A⊥ is given by

A(A^T A)^{-1} A^T, where A := [a₁ ⋯ a_M] ∈ R^{d×M};

the complementary matrix I − A(A^T A)^{-1} A^T maps R^d onto A⊥ along A. We would like to consider the generalisation of these two projections where the range of the projection is decoupled from the direction of the projection. For this we define an oblique projection (see e.g. [13]). Note that the space along which a projection projects is the null space of the projection, and we may define an oblique projection by specifying its range and null space. For a projection R, let a₁, …, a_M be a basis for the range of R, and let b₁, …, b_M be a basis for the orthogonal complement of the null space of R, so that range(R) = A and null(R) = B⊥, where A := span{a₁, …, a_M} and B := span{b₁, …, b_M}. Then the oblique projection matrix R is given by the formula [13, eq. (7.10.39) on p. 634]

R = A(B^T A)^{-1} B^T, where A := [a₁ ⋯ a_M], B := [b₁ ⋯ b_M] ∈ R^{d×M}.

The following proposition will help us decide whether or not A and B⊥ are complementary subspaces.

Proposition 12. The subspaces A and B⊥ are complementary in R^d if and only if B^T A is invertible.
Proof. Assume that A and B⊥ are complementary subspaces. Then there exists a unique decomposition x = x_A + x_{B⊥}, and since a₁, …, a_M is a basis for A there exists a unique v ∈ R^M such that x_A = Av. Therefore B^T x = B^T x_A = B^T A v, and since v is uniquely determined given x, B^T A is invertible.
Conversely, suppose B T A is invertible. Then the matrix R = A(B T A) −1 B T is well defined and for a given x ∈ R d , x A := Rx ∈ A and x B ⊥ := (I − R)x ∈ B ⊥ defines a decomposition x = x A + x B ⊥ . To complete the proof we must show that this decomposition is unique. Suppose x = y A + y B ⊥ where y A ∈ A and y B ⊥ ∈ B ⊥ defines another decomposition of x. There exists a unique w ∈ R M such that y A = Aw. Then B T x = B T y A = B T Aw and hence w = (B T A) −1 B T x. Substituting this into y A = Aw we get y A = Rx = x A and y B ⊥ = x B ⊥ and the decomposition is unique.
An obvious choice for B so that B^T A is invertible is B = A: since A has full rank, A^T A is positive definite and invertible. But this corresponds to orthogonal projection. More generally, if B is sufficiently "close" to A then B^T A remains invertible. If, on the other hand, b_m ∈ A⊥ for some m, then B^T A is not invertible, since its m-th row (with entries b_m · a_j) is zero.
Another projection matrix, R⊥, with range(R⊥) = B⊥ and null(R⊥) = A, may be defined by

R⊥ := I − R = I − A(B^T A)^{-1} B^T.

We will use R⊥ to define an alternative formulation of linear projection methods for ODEs with multiple first integrals.
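A small self-contained sketch (our own; d = 3, M = 2, with arbitrarily chosen basis vectors) of the oblique projection matrix R = A(B^T A)^{-1} B^T and its defining properties:

```python
# Assumed bases: columns of A span range(R); columns of B span null(R)-perp.
a1, a2 = (1.0, 0.0, 0.0), (0.0, 1.0, 1.0)
b1, b2 = (1.0, 0.2, 0.0), (0.0, 1.0, 0.0)

def dot(u, v): return sum(x*y for x, y in zip(u, v))

# 2x2 matrix B^T A and its inverse
m = [[dot(b1, a1), dot(b1, a2)], [dot(b2, a1), dot(b2, a2)]]
det = m[0][0]*m[1][1] - m[0][1]*m[1][0]
minv = [[ m[1][1]/det, -m[0][1]/det], [-m[1][0]/det, m[0][0]/det]]

def R(v):
    # R v = A (B^T A)^{-1} B^T v
    w = (dot(b1, v), dot(b2, v))
    c = (minv[0][0]*w[0] + minv[0][1]*w[1], minv[1][0]*w[0] + minv[1][1]*w[1])
    return tuple(c[0]*a1[k] + c[1]*a2[k] for k in range(3))

# R is a projection: R(Rv) = Rv
v = (0.3, -1.2, 2.5)
Rv = R(v)
assert all(abs(R(Rv)[k] - Rv[k]) < 1e-12 for k in range(3))

# R acts as the identity on its range
assert all(abs(R(a1)[k] - a1[k]) < 1e-12 for k in range(3))

# null(R) = span{b1, b2}-perp: n = b1 x b2 satisfies R n = 0
n = (b1[1]*b2[2]-b1[2]*b2[1], b1[2]*b2[0]-b1[0]*b2[2], b1[0]*b2[1]-b1[1]*b2[0])
assert all(abs(R(n)[k]) < 1e-12 for k in range(3))
```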

Equivalent formulation using an oblique projection matrix
A general method for solving (1),

x′ = x + h f̃(x, x′, h), (26)

is integral preserving if and only if f̃ ∈ span{ī₁, …, ī_M}⊥, where each ī_m is a discrete gradient of I_m. This fact follows from the definition of a discrete gradient: for each m, if ī_m is a discrete gradient of I_m, then

I_m(x′) − I_m(x) = ī_m(x, x′) · (x′ − x) = h ī_m · f̃, (27)

so f̃ ⊥ ī_m if and only if I_m(x′) = I_m(x). However, in general we do not have f̃ ∈ span{ī₁, …, ī_M}⊥. Therefore, a way of constructing an integral preserving method is to modify (26) to

x′ = x + h P(x, x′, h) f̃(x, x′, h), (28)

where P = P(x, x′, h) is chosen to be a projection matrix with range(P) = span{ī₁, …, ī_M}⊥, so that P f̃ ∈ span{ī₁, …, ī_M}⊥. It turns out that constructing an integral preserving method in this way is equivalent to a general linear projection method of the form (25). This equivalence is formalised in the following theorem, which extends our earlier Theorem 2 (in particular the equivalence between (9) and (10)).

Theorem 13. Let f̃ be a consistent approximation of f, and for m = 1, …, M let ĩ_m be a consistent approximation of i_m and ī_m an arbitrary discrete gradient of I_m. Suppose that

1. given x ∈ R^d and h ∈ [0, ∞), each of the methods below has uniquely defined x′ and λ,

2. span{ĩ₁, …, ĩ_M} and span{ī₁, …, ī_M}⊥ are complementary subspaces of R^d, and

3. P = P(x, x′, h) is the oblique projection matrix with range(P) = span{ī₁, …, ī_M}⊥ and null(P) = span{ĩ₁, …, ĩ_M}.

Then the following expressions describe the same linear projection method:

x′ = x + h f̃(x, x′, h) + Σ_{m=1}^{M} λ_m ĩ_m(x, x′, h), I_m(x′) = I_m(x) for m = 1, …, M, (29)

and

x′ = x + h P(x, x′, h) f̃(x, x′, h). (30)

Proof. Conditions 2 and 3 in the theorem ensure that P exists (B^T A is invertible, where A := [ĩ₁ ⋯ ĩ_M] and B := [ī₁ ⋯ ī_M]), with range(P) = span{ī₁, …, ī_M}⊥ and null(P) = span{ĩ₁, …, ĩ_M} (see the discussion of oblique projection matrices in the previous subsection).
For given x ∈ R^d and h sufficiently small, suppose x′ ∈ R^d and λ ∈ R^M satisfy (29). For each m, since ī_m is a discrete gradient, ī_m(x, x′) · (x′ − x) = I_m(x′) − I_m(x) = 0, so x′ − x ∈ span{ī₁, …, ī_M}⊥ = range(P), and hence P(x′ − x) = x′ − x. On the other hand, (29) gives x′ − x − h f̃ ∈ span{ĩ₁, …, ĩ_M} = null(P), so P(x′ − x) = h P f̃. Combining these, x′ = x + h P f̃, which is (30). Conversely, for given x ∈ R^d and h sufficiently small, suppose x′ ∈ R^d satisfies (30). Then x′ − x = h P f̃ ∈ range(P) = span{ī₁, …, ī_M}⊥, and by (27) I_m(x′) = I_m(x) for each m. Moreover, h f̃ − h P f̃ = h(I − P) f̃ ∈ span{ĩ₁, …, ĩ_M}, so there exists λ ∈ R^M such that x′ = x + h f̃ + Σ_{m=1}^{M} λ_m ĩ_m, and (29) holds. Finally, for small h the matrix B is "close" to A, so the property that the relevant subspaces are complementary is satisfied (see the discussion after Proposition 12). Now that we have established an alternative formulation for general linear projection methods, we can explore their relationship to discrete gradient methods.

Equivalence with discrete gradient methods
In this section let us consider discrete gradient methods for preserving more than one integral. Our aim is to construct a general discrete gradient method to approximate the solution to (1) such that M integrals are simultaneously preserved, and then determine which discrete gradient methods are equivalent to linear projection methods.
According to [12, Prop. 2.14] (see also [18] for the two integral case), we may write (1) as

ẋ_l = S_{l m₁ ⋯ m_M} (i₁)_{m₁} ⋯ (i_M)_{m_M}, (31)

using the Einstein summation convention for repeated indices, where i_m := ∇I_m for each m and S is a totally anti-symmetric (M+1)-tensor. See e.g. [3, Chap. 1] or [6] for a definition of the anti-symmetric ∧ product from exterior algebra. Based on the expression for the ODE given in (31), we can write down a general discrete gradient method for solving (1). Let S = S(x, x′, h) be an anti-symmetric, consistent approximation of S, and for each m let ī_m = ī_m(x, x′) be a discrete gradient of I_m. Then the method x → x′ is defined as: Given x ∈ R^d and h ∈ [0, ∞), compute x′ ∈ R^d such that

x′_l = x_l + h S_{l m₁ ⋯ m_M}(x, x′, h) (ī₁)_{m₁} ⋯ (ī_M)_{m_M}. (32)

For particular choices of S, this discrete gradient method is equivalent to a linear projection method. To prove our result we will need the following proposition (see [6, eq. 5.10 on p. 106], where the pair of dual spaces are R^d and itself with the usual Euclidean inner product).

Proposition 14.
For arbitrary M, d ∈ N, let U, V ∈ R d×M be two matrices with columns u 1 , . . . , u M ∈ R d and v 1 , . . . , v M ∈ R d respectively. Then (u 1 ∧ · · · ∧ u M )(v 1 , . . . , v M ) = det(U T V ).

Theorem 15. For a suitable choice of S̄, the discrete gradient method defined by (32) and (33) is equivalent to the linear projection method defined by (29) or (30).
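Assuming Proposition 14 is the standard exterior-algebra pairing identity (u 1 ∧ · · · ∧ u M )(v 1 , . . . , v M ) = det(U T V ), it can be checked numerically: by the Cauchy-Binet formula, det(U T V ) equals the sum over all M-row minors of U and V, which is exactly the coordinate expansion of the wedge pairing. A short sketch:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
d, M = 5, 3
U = rng.standard_normal((d, M))
V = rng.standard_normal((d, M))

# Left-hand side: det(U^T V), the Euclidean pairing of the two wedges.
lhs = np.linalg.det(U.T @ V)

# Right-hand side: Cauchy-Binet expansion over all M-row minors,
# i.e. the coordinate expansion of u1 ^ ... ^ uM against v1 ^ ... ^ vM.
rhs = sum(np.linalg.det(U[list(S), :]) * np.linalg.det(V[list(S), :])
          for S in combinations(range(d), M))

print(np.isclose(lhs, rhs))   # True
```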
Proof. To show that these methods are the same we must show that the two right-hand sides agree when paired with any v ∈ R d . Let v be an arbitrary vector in R d . Using Proposition 14 and expanding the determinant along the first row, we obtain an expansion in terms of the matrices A j = [f̃ ĩ 1 · · · ĩ j−1 ĩ j+1 · · · ĩ M ] ∈ R d×M . Using the fact that the determinant is alternating (each column swap introduces a factor of −1), this expansion can be rewritten in terms of the matrices Â j = [ĩ 1 · · · ĩ j−1 f̃ ĩ j+1 · · · ĩ M ] ∈ R d×M , i.e. the matrix A with ĩ j replaced by f̃. Hence, using the two identities above and Cramer's rule (see e.g. [13, p. 476]), we obtain the result. If we restrict ourselves to the situation where only two integrals are preserved, I and J, with i := ∇I and j := ∇J, then (1) may be written as ẋ l = S lmn i m j n (34) (see [18]), where S lmn is an anti-symmetric tensor. The general discrete gradient methods for (34) that preserve both I and J take the form (x ′ l − x l )/h = S̄ lmn ī m j̄ n , where ī and j̄ are discrete gradients of I and J respectively, and S̄ is a skew-symmetric consistent approximation of S. If we define the projection directions accordingly, where f̃ is a consistent approximation of f, and ĩ and j̃ are consistent approximations of i and j respectively, then this discrete gradient method is a linear projection method. We remark that (29) and (30) do not depend on any discrete gradients of I m . Therefore, we may conclude from Theorem 15 that each projection method (defined by the choice of projection directions) is equivalent to a class of discrete gradient methods, in which the approximate solution values at each time step are independent of the particular choices of discrete gradients used in the discrete gradient methods.

Existence, uniqueness and order of accuracy
For the single preserved integral case we could use theory from discrete gradient methods to prove, under certain local Lipschitz continuity and consistency conditions, the existence of a unique solution x ′ at each time step of a projection method for sufficiently small h. Under the same conditions we were also able to show that a projection method retains the order of accuracy of the underlying method. For the multiple integral case we cannot do this because the corresponding results for discrete gradient methods are not yet available. We do not anticipate that extending these results to the multiple integral case poses any real difficulty, although a proof may be lengthy. In Section 6 we test numerically whether these results appear to hold in the multiple integral case.

Special cases of projection methods
As in Section 4, we can show how our expression for a general linear projection method (25) for preserving multiple first integrals encompasses all existing projection methods that we are aware of. Unlike Section 4, however, we will not go into all of the technicalities regarding existence of a unique solution for sufficiently small time steps. Once again, the key to seeing how other projection methods fit into our framework is to make the right choice for the projection directions ĩ m . Example 1 revisited: (non-symmetric) standard projection method.
The first method we consider is again the (non-symmetric) standard projection method described in [8, Algorithm IV.4.2]. In our notation, their method in the multiple preserved integral case for x → x ′ is defined by solving the following system of equations for x ′ ∈ R d and λ ∈ R M , given x ∈ R d and h ∈ [0, ∞): x ′ = y + Aλ and I m (x ′ ) = I m (x) for each m, where the map x → y defined by y = x + hg(x, y, h) is an arbitrary one-step method applied to (1) and A = [i 1 (x ′ ) · · · i M (x ′ )] ∈ R d×M . If we let Φ h be the implicitly defined map such that y = Φ h (x), then this method has the form of (25) if for each m we define ĩ m (x, x ′ , h) := i m (x ′ ) for all x, x ′ ∈ R d and h ∈ [0, ∞). In [8], the authors suggest instead using ĩ m (x, x ′ , h) := i m (y) to reduce the number of evaluations of i m (·) required when solving the system of equations at each step using a simplified Newton method.

Example 2 revisited: symmetric projection method. The multiple first integral version of the symmetric projection method (see [8, §V.4.1] or [7]), in our notation for x → x ′ , is: given x ∈ R d and h ∈ [0, ∞), compute y, z, x ′ ∈ R d and µ ∈ R M such that y = x + A ′′ µ, z = y + hg(y, z, h), x ′ = z + A ′ µ and I m (x ′ ) = I m (x) for each m, where y → z defined by z = y + hg(y, z, h) is a symmetric one-step method applied to (1), A ′′ = [i 1 (x) · · · i M (x)] and A ′ = [i 1 (x ′ ) · · · i M (x ′ )]. If we let λ = 2µ and A = (A ′′ + A ′ )/2, and eliminate y and z, then we can write the method in one line: given x ∈ R d and h ∈ [0, ∞), compute x ′ ∈ R d and λ ∈ R M satisfying the resulting one-line system. Let Ψ be the implicitly defined map such that Ψ(x, x ′ , h) = λ, where λ satisfies this system. It is now clear that this method may be written in the form (25) if for each m we define ĩ m (x, x ′ , h) := (i m (x) + i m (x ′ ))/2 for all x, x ′ ∈ R d and h ∈ [0, ∞).

Example 3 revisited: Dahlby et al. methods. The first of these methods, in which Φ h defines an arbitrary one-step method for solving (1), has the form of a general linear projection method (25) if for each m we define ĩ m (x, x ′ , h) := ī m (x, x ′ ) for all x, x ′ ∈ R d and h ∈ [0, ∞), where each ī m is an arbitrary discrete gradient of I m .
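To make the standard projection step concrete, the following sketch applies it to a toy problem with a single integral (the harmonic oscillator energy); the explicit Euler predictor and the plain Newton solver are illustrative choices made here, not the ones used in the paper:

```python
import numpy as np

# Toy stand-in for (1): harmonic oscillator, one preserved integral
# I(x) = (x1^2 + x2^2)/2.  All names here are illustrative.
f = lambda x: np.array([x[1], -x[0]])
I = lambda x: 0.5 * (x @ x)
grad_I = lambda x: x

def newton(residual, z0, tol=1e-12, maxit=50):
    """Plain Newton iteration with a forward-difference Jacobian."""
    z = np.asarray(z0, dtype=float).copy()
    for _ in range(maxit):
        F = residual(z)
        if np.linalg.norm(F) < tol:
            break
        J = np.empty((F.size, z.size))
        for k in range(z.size):
            dz = np.zeros_like(z)
            dz[k] = 1e-8
            J[:, k] = (residual(z + dz) - F) / 1e-8
        z = z - np.linalg.solve(J, F)
    return z

def standard_projection_step(x, h):
    """One step of the (non-symmetric) standard projection method:
    an explicit Euler predictor y, then projection along i(x')."""
    y = x + h * f(x)
    def residual(z):
        xp, lam = z[:-1], z[-1]
        return np.append(xp - y - lam * grad_I(xp), I(xp) - I(x))
    return newton(residual, np.append(y, 0.0))[:-1]

x = np.array([1.0, 0.0])
for _ in range(100):
    x = standard_projection_step(x, 0.1)

print(abs(I(x) - 0.5) < 1e-9)   # True: the integral is preserved
```

Without the projection, explicit Euler increases the oscillator energy every step; the constrained solve restores I(x ′ ) = I(x) to solver tolerance.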
The second Dahlby et al. method uses the same choice of ĩ m , but a different f̃ (defined earlier in (24)).

Numerical Examples
In this section we use a numerical example to provide evidence that the results established for preserving a single first integral also hold for methods that preserve multiple first integrals. In particular we will show that: 1. many possible projection directions can be used to define a projection method that preserves the order of accuracy of the underlying method; and 2. the choice of discrete gradient for discrete gradient methods that are equivalent to projection methods does not change the approximate solution in exact arithmetic, although there may be differences in finite precision arithmetic.
The example we use is Kepler's two-body problem in Cartesian coordinates (see e.g. [8, §I.2] and [2]). We will consider the cases where either two or three integrals are preserved (the fourth integral is not functionally independent).
Kepler's two-body problem in the form of (1) is ẋ 1 = x 3 , ẋ 2 = x 4 , ẋ 3 = −x 1 /r 3 , ẋ 4 = −x 2 /r 3 , where r := (x 1 2 + x 2 2 ) 1/2 . This system models two bodies that attract each other, with one body at the origin and the second body at position (x 1 , x 2 ) with velocity (or momentum, if the body has unit mass) (x 3 , x 4 ). The variable r is the distance between the two bodies. The exact solution to Kepler's two-body problem preserves four first integrals: the energy, the angular momentum, and the two components of the Runge-Lenz-Pauli vector. As in [8, p. 12], we use the initial condition x 0 = (1 − e, 0, 0, ((1 + e)/(1 − e)) 1/2 ) T for some e ∈ [0, 1), so that the exact solution has period 2π. The exact solution can be found by integrating equation (2.10) in [8, p. 11], but we will instead use a very accurate solution computed with Matlab's ode45 and very small tolerances as a reference solution in our examples.
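As a hedged sketch, the Kepler vector field and its four first integrals in one common normalisation (the signs chosen for the Runge-Lenz-Pauli components are an assumption; the paper's exact formulas may differ) can be written down and checked against the invariance condition ∇I m (x) · f (x) = 0:

```python
import numpy as np

# Kepler two-body vector field (positions x1,x2; momenta x3,x4).
def f(x):
    r = np.hypot(x[0], x[1])
    return np.array([x[2], x[3], -x[0] / r**3, -x[1] / r**3])

def integrals(x):
    r = np.hypot(x[0], x[1])
    H = 0.5 * (x[2]**2 + x[3]**2) - 1.0 / r   # energy
    L = x[0] * x[3] - x[1] * x[2]             # angular momentum
    A1 = x[3] * L - x[0] / r                  # Runge-Lenz components
    A2 = -x[2] * L - x[1] / r                 # (assumed normalisation)
    return np.array([H, L, A1, A2])

# A first integral is constant along the flow, i.e. grad(I_m) . f = 0.
# Check with a central finite-difference gradient at a sample point
# (the initial condition with eccentricity e = 0.6).
x = np.array([0.4, 0.0, 0.0, 2.0])
eps = 1e-6
for m in range(4):
    g = np.array([(integrals(x + eps * e)[m] - integrals(x - eps * e)[m])
                  / (2 * eps) for e in np.eye(4)])
    print(abs(g @ f(x)) < 1e-6)   # True for each of the four integrals
```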
In Figure 1 we have computed the solution to the Kepler two-body problem with initial condition (36) using e = 0.6 for several projection methods that differ according to which underlying method is used to define f̃ and which projection direction is used to define ĩ. We have used the classical explicit 4 th order and 6 th order Runge-Kutta methods (RK4, see e.g. [8, p. 30], and RK6, see [1, p. 194], respectively), with coefficients defined by their standard Butcher tableaux. We modify (25) in our computations to prevent the value of I m drifting due to finite precision arithmetic: instead of requiring I m (x ′ ) = I m (x) at each time step, we require that I m (x ′ ) = I m (x 0 ) for each m.
In the phase space plot of Figure 1 (left) we see that method b does indeed keep the approximate solution on the ellipse, while the RK4 approximate solution drifts away from it. In this plot we computed the solution up to a final time of t = 50π (25 periods) with h = 2π/50. In the order plot of Figure 1 (right) we see the more important result that methods a-d all appear to preserve the 4 th order convergence of the RK4 method and methods a6-d6 all appear to preserve the 6 th order convergence of the RK6 method. In fact, the different choices of projection direction seem to make very little difference to the error: the errors are essentially the same size for methods a-d and for methods a6-d6 respectively (the lines in the plot overlay each other). For this plot we computed up to a final time of t = 2π (only one period) for a range of step sizes in [10 −3.5 , 10 −1 ]. In Figure 2 we have computed the solution to the Kepler two-body problem using three methods that are equivalent in exact arithmetic according to our theory: a projection method, and two discrete gradient methods defined using different choices of discrete gradient. To avoid making the discrete gradient methods overly complicated, we only consider the case where two integrals, I 1 and I 2 , are preserved. The projection method we use is method b as defined above, except now we only preserve two integrals. We define methods b1 and b2 by (32) and (33) with M = 2 and the same choices of f̃ and ĩ m as for method b; if j̄ m denotes the coordinate increment discrete gradient of I m (see [11] or [12]), then ī m in methods b1 and b2 is defined in terms of j̄ m . In exact arithmetic, according to Theorems 13 and 15, these methods should be the same. However, in finite precision arithmetic we notice some small differences.
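The coordinate increment (Itoh-Abe) discrete gradient named above admits a short implementation. The sketch below uses the Kepler energy as a sample integral; the fallback branch for coinciding coordinates is an implementation detail assumed here, not taken from the text:

```python
import numpy as np

def coordinate_increment_dg(I, x, xp, eps=1e-14):
    """Itoh-Abe (coordinate increment) discrete gradient of I.

    Component k divides the increment of I, as coordinates are switched
    one at a time from x to xp, by the corresponding coordinate change.
    Falls back to a central difference when xp_k is (nearly) equal to x_k.
    """
    d = len(x)
    ibar = np.empty(d)
    z = np.asarray(x, dtype=float).copy()
    for k in range(d):
        dz = xp[k] - x[k]
        z_prev = z.copy()
        z[k] = xp[k]
        if abs(dz) > eps:
            ibar[k] = (I(z) - I(z_prev)) / dz
        else:
            zp, zm = z.copy(), z.copy()
            zp[k] += 1e-7
            zm[k] -= 1e-7
            ibar[k] = (I(zp) - I(zm)) / 2e-7
    return ibar

# Sample integral: the Kepler energy I_1.
I = lambda x: 0.5 * (x[2]**2 + x[3]**2) - 1.0 / np.hypot(x[0], x[1])

x  = np.array([0.4, 0.0, 0.0, 2.0])
xp = np.array([0.39, 0.05, -0.1, 1.98])
ibar = coordinate_increment_dg(I, x, xp)

# Defining discrete gradient property (telescoping sum makes it exact):
print(np.isclose(ibar @ (xp - x), I(xp) - I(x)))   # True
```

Since the projection method does not depend on which discrete gradient is used, swapping this choice for another (as in methods b1 and b2) should only change the computed solution through round-off effects.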
In the left plot of Figure 2 we have compared methods b1 and b2 with method b for increasing time. Since computations are done in finite precision arithmetic and the nonlinear systems at each time step are only solved to a tolerance of 10 −14 , we expect the differences between these methods to grow linearly with time. Perhaps surprisingly, we actually see quadratic growth in the difference between these methods. In the right plot of Figure 2 we have plotted the error of I 1 and I 2 for the approximate solution as time increases for methods b1 and b2 (method b is constructed to keep the integral error below 10 −14 , the tolerance to which we solve the nonlinear systems at each time step). We see linear growth of the integral errors as time increases, as expected. The plots in Figure 2 used the same initial condition as above, a time step of h = 2π/50 and a final time of t = 100π (50 periods).

Conclusions
In this paper we determined the relationship between linear projection methods and discrete gradient methods for ODEs with conserved first integrals. A consequence of our theory is that each linear projection method is equivalent to a class of discrete gradient methods. A further consequence, when there is only one first integral to preserve, is that we can use theory from discrete gradient methods to prove results about projection methods. We have shown that, under only mild conditions on the continuity and consistency of the projection direction, we obtain a projection method whose approximate solution is well defined provided the time step is sufficiently small, and which, for arbitrary p ∈ N, preserves the order of accuracy of an underlying method of order p. Moreover, the condition on the projection direction does not depend on p. For the multiple first integral case we rely on numerical experiments, which confirm that similar results appear to hold in this case.
A Example 2 continued from §4.2

Continuing on from the end of §4.2, we seek to show that g̃ satisfying Assumption 1 is sufficient to ensure that f̃ and ĩ defined by (19) satisfy Assumptions 1 and 2 respectively. We begin by verifying that ĩ defined by (19) satisfies Assumption 2 for some choice of R, L and H.

Lemma 16. For a bounded set B ⊂ R d , suppose that i satisfies Assumption 3 for some positive constants R and L i , and let H be an arbitrary positive constant. Define L := L i /2. Then ĩ : R d × R d × [0, ∞) → R d defined by ĩ(x, x ′ , h) := (i(x) + i(x ′ ))/2 for all x, x ′ ∈ R d and h ∈ [0, ∞) satisfies Assumption 2 for R, L and H.
Verifying that f̃ defined by (19) satisfies Assumption 1 requires a much lengthier argument. We first prove the following lemma describing the properties of Ψ.

Lemma 17. For any x ∈ B such that i(x) = 0, any u, v ∈ B R λ (x) and any h ∈ [0, H λ ), there exists a unique λ = Ψ(u, v, h) ∈ R, with |λ| ≤ 1/R λ , satisfying the equation defining Ψ; moreover, a Lipschitz-type bound on Ψ holds for any w ∈ B R λ (x).

Proof. Note that since i is locally Lipschitz continuous, L i exists for any R g > 0. Fix x ∈ B such that i(x) = 0, u, v ∈ B R λ (x) and h ∈ [0, H λ ). To get the result we will use Theorem 7. Define X := {γ ∈ R : |γ| ≤ 1/R λ } (which, with the Euclidean norm | · |, is a non-empty complete metric space) and define T : X → R, whose fixed point will be the required λ, for each γ ∈ X.
To apply Theorem 7 we must show that T (γ) ∈ X for each γ ∈ X and that T is a contraction on X.
Hence f̃ satisfies Assumption 1 for R λ , L and H λ .