On the discretization of nonholonomic dynamics in $\mathbb{R}^n$

In this paper we explore the nonholonomic Lagrangian setting of mechanical systems in local coordinates on finite-dimensional configuration manifolds. We prove existence and uniqueness of solutions by reducing the basic equations of motion to a set of ordinary differential equations on the underlying distribution manifold $D$. Moreover, we show that any $D-$preserving discretization may be understood as beeing generated by the exact evolution map of a time-periodic non-autonomous perturbation of the original continuous-time nonholonomic system. By means of discretizing the corresponding Lagrange-d'Alembert principle, we construct geometric integrators for the original nonholonomic system. We give precise conditions under which these integrators generate a discrete flow preserving the distribution $D$. Also, we derive corresponding consistency estimates. Finally, we carefully treat the example of the nonholonomic particle, showing how to discretize the equations of motion in a reasonable way, particularly regarding the nonholonomic constraints. The exploration in this paper lays the ground to analyze the dynamics of appropriate discretizations of nonholonomic mechnical systems in the Lagrangian framework and to relate that dynamics to the dynamics of the original nonholonomic systems. We postpone this analysis to a series of forthcoming papers.


Introduction
Nonholonomic systems play an important role in Mechanics. This kind of systems are characterized by so-called nonholonomic constraints, i.e. constraints involving both configuration as well as velocity variables, and which can not be integrated to purely configuration-dependent constraints (in this case the constraints are called holonomic). Geometrically, nonholonomic constraints, when they are linear in the velocities, may be understood as a distribution D on the tangent bundle of a given configuration manifold Q, i.e. D ⊂ T Q. The dynamical behavior of nonholonomic systems, for instance related to perfect rolling motion, substantially differs from that of holonomic ones in various aspects. While the energy is conserved in nonholonomic systems, this is not necessarily true for other quantities which are conserved in the holonomic case. Usually there is no invariant Poisson bracket associated with nonholonomic systems. Moreover, volume forms in the Lagrangian phase space are not invariant. The corresponding equations of motion, termed nonholonomic are derived by the Lagrange-d'Alembert principle. We point out, that different equations, termed vakonomic, result from a genuine variational principle. These do not describe the motion of a nonholonomically constrained mechanical systems correctly (but others, such as underactuated control systems, etc.).
Many authors have recently shown new interest in the nonholonomic theory and also in its relation to recent developments in control theory and robotics. The main innovation is that nonholonomic systems are studied from a geometric perspective; see [1,2,3,4,16,19] and references therein. Due to its practical importance, nonholonomic mechanics has also attracted a lot of attention by people from the numerical integration community.
Within the general framework of geometric integration ( [13]), a primary requirement for a discretized dynamical system is a reasonable correspondence of principal qualitative features of the dynamics, which is partly achieved by respecting geometric structural properties of the original continuous-time system. However, it is in general impossible to achieve a complete qualitative and quantitative correspondence (see [12] for a formal statement in this regard). Thus, any practical integrator is likely to disrespect some of the qualitative features of the original system.
Recent works, such as [5,6,7,8,9,14,15,20] have introduced numerical integrators for nonholonomic systems with very good energy behavior and preservation properties such as the preservation of the discrete nonholonomic momentum map in case of a system with symmetry. The approaches in most of these references are based on the ideas of [21,22], where the continuous Lagrange-d'Alembert principle on T Q is replaced by discrete Lagrange-d'Alembert principles on the discrete phase space Q × Q. Of special interest are the seminal works on nonholonomic integration [5,20], where a discrete version of the Lagrange-d'Alembert principle is proposed by introducing a proper discretization of the nonholonomic distribution, i.e. D d ⊂ Q × Q, which provides integrators with important structural preservation properties. Nevertheless, as just mentioned, this nice behaviour does not imply a complete agreement between continuous and discrete counterparts. Since the principal geometric object describing nonholonomic constraints is the distribution D, in this paper we focus on integrators which exactly respect the original continuous distribution D or a certain perturbation of that.
It turns out that this D−preservation is not obvious from the discretization of the Lagrange-d'Alemebert principle. This raises the question about the inavoidable discrepancies introduced by discretizating the dynamics, even if the latter is performed in some kind of structure-preserving fashion. It is needless to mention that this is a central and fundamental question for all kinds of numerical investigations, especially concerning the long-term evolution of dynamical systems. Regarding this issue, we refer to [10] where a positive answer to the following question is given: Is it possible to embed a numerical scheme approximating the continuous-time flow of a set of autonomous ordinary differential equations (ODE) into the time evolution corresponding to a non-autonomous perturbation of the original autonomous ODE? For convenience, we recall this result in proposition 4.1 of the present paper. It may be phrased as follows: Any p−th order discretization of an autonomous ODE can equivalently be viewed as the time−ǫ period map of a suitable ǫ−periodic nonautonomous perturbation of the original ODE (where ǫ is the fixed discretization lenght).
In the present paper we apply this result to the case of nonholonomic systems in Q = R n . For that purpose, we first rewrite the basic equations of motion as an appropriate set of ODEs in local coordinates. This turns out to be possible under some regularity conditions on the Lagrangian function and the distribution manifold, which we assume throughout the paper. Thus, we are able to apply the result from [10] to the derived ODE system leading to the general result that any D−preserving discretization of an autonomous nonholonomic problem may be understood as being generated by the exact evolution map of a time-periodic nonautonomous continuous-time perturbation of the original system. Moreover, by means of discretizing the corresponding Lagrange-d'Alembert principle, cf. [5,20], we construct geometric integrators for the original nonholonomic system. We give precise conditions under which the resulting discretization schemes preserve the distribution D. In this case, a geomteric integrator will be called velocity nonholonomic integrator. We provide two specific examples of such an integrator, apply these to the case of simple mechanical systems, and study consistency properties with respect to the continuous-time dynamics. Finally, we consider the model of the nonholonomic particle to illustrate the theory.
The paper is structured as follows: in §2 we introduce the notion of a Lagrangian nonholonomic problem in R n with linear velocity constraints and show how it can be undertstood as a set of differential algebraic equations (DAE). For later use, we present a proof for the existence and uniqueness of solutions of a non-autonomous generalization of this system following [24]. Moreover, it is proved in proposition 2.1 that the nonholonomic DAE can be rewritten as an ODE on the constraint manifold D (distribution D) underlying the nonholonomic setting. §3 is devoted to the exploration of the possibility of interpolating two points belonging to D by a curve within D itself. This fact is proved in proposition 3.1 taking advantage of coordinates on D suggested by using an appropriate Ehresmann connection to trivialize the corresponding bundle structure. In §4 we first recall the above mentioned result by Fiedler and Scheurle (proposition 4.1). Then, this result is generalized to the present case, see corollary 4.1. Moreover, we introduce the notion of discretization of a nonholonomic flow and use this together with the results obtained in §3 to explore the perturbation of the original nonholonomic system caused by that kind of discretizations; see remark 4.5. Section §5 is devoted to the construction of velocity nonholonomic integrators. This notion is introduced in definition 5.6. A special type of such integrators is developed by means of the discretization of the Lagrange-d'Alembert principle established in [5,20]. In proposition 5.3 we prove that those inegrators are D−preserving in some sense. Using the results of the previous section, in §6, for two specific examples of such integrators, the order of consistency is carefully explored with respect to the class of simple mechanical systems; see propositions 6.1 and 6.2. Moreover, the case of the nonholonomic particle is considered as an elucidating toy model. In particular, this system suggests the extension of the theory proposed in §4, such that a perturbation of the constraints due to discretization is allowed. Such an extension is finally presented in remark 6.1.
Throughout the paper, we use Einstein's convention for the summation over repeated indices. The proofs of some known results which are important for the understanding of our main results are included for convenience of the reader; it will be properly highlighted.

The Lagrangian nonholonomic problem
Mathematically, the Lagrangian nonholonomic setting can be described as follows. We shall start with a configuration manifold Q, which is assumed to be an n-dimensional differentiable manifold with local coordinates denoted by q i , i = 1, ..., n, and a non-integrable constant-rank distribution D on Q that describes the linear nonholonomic constraints. We can consider this distribution D as a vector subbundle of the tangent bundle T Q (velocity phase space) of the configuration manifold. Locally, the linear constraints are written as follows: where rank (D) = n − m. The annihilator D • is locally given by where the one-forms µ α are supposed to be linearly independent.
In addition to the distribution, we need to specify the dynamical evolution of the system, for example by choosing a Lagrangian function L : T Q → R. In nonholonomic mechanics, the procedure leading from the Newtonian point of view to the Lagrangian one is given by the Lagrange-d'Alembert principle. This principle says that a curve q : I ⊂ R → Q is an admissible motion of the system if δ t1 t0 L (q (t) ,q (t)) dt = 0 with respect to all variations such that δq (t) ∈ D q(t) , t 0 ≤ t ≤ t 1 and the fixed end point condition is satisfied, and if the velocity of the curve itself satisfies the constraints. It is remarkable that the Lagrange-d'Alembert principle is not variational since we are imposing the constraints on the curve after the extremization. Thus, one may consider the intrinsic data defining the Lagrangian nonholonomic problem to be given by the triple (Q, L, D). Using Lagrange-d'Alembert's principle, we arrive at the nonholonomic equations, which in coordinates read where λ α , α = 1, ..., m are "Lagrange multipliers". The right-hand side of equation (2a) represents the reaction forces due to the constraints, and equations (2b) represent the constraints themselves.
2.1. Local existence and uniqueness of solutions. In this work we are mainly concerned with the case Q = R n , so we will assume in the following that we deal with this particular configuration manifold. However, some of the results can be extended to a general manifold Q, a fact that will be pointed out at places in the paper. The equations (2), together with the initial conditions represent a set of n + m differential-algebraic equations in R n × R m (called nonholonomic DAEs subsequently). We are interested in the existence and uniqueness of solutions (q, λ) : (3) and (4). In order to address this issue, we refer to the following general result in [24]. We slightly modify our notation to state theorem 2.1. In particular, we denote the configuration variables by x here, in order to stick to the notation in the book by Rabier and Rheinboldt [24]. Consider the general non-autonomous DAE where λ ∈ R m , x ∈ R n , ∇ẋ = ∂ ∂ẋ and (t, x,ẋ) denotes an arbitrary element of R × R n × R n , while (t, x(t),ẋ(t)) refers to a curve t → x(t) withẋ(t) = (dx/dt)(t). The coefficient functions  (5), the coefficient functions (6) are defined and smooth in some neighborhood of a point (t 0 , x 0 ,ẋ 0 ) ∈ R × R n × R n and that Then there exists a δ > 0 such that (5) has a unique solution (x(t), λ(t)) for |t − t 0 | < δ that satisfies the initial condition Along this solution the full-rank condition holds for |t − t 0 | < δ.
Due to its importance for future purposes and for convenience of the reader we include the proof given in [24] with slight modifications.
Proof. From (7) (iii) it follows that rank ∇ẋϕ(t, x,ẋ) = m in some open neighborhood of (t 0 , x 0 ,ẋ 0 ) in R × R n × R n (indeed, as we will see shortly, in the nonholonomic case ϕ does not depend on t and ∇ẋϕ is full-rank everywhere in R n × R n by construction). This implies that dim ker ∇ẋϕ = n − m; whence the orthogonal projection P (t, x,ẋ) of R n onto ker ∇ẋϕ(t, x,ẋ) is a smooth function in a neighborhood of (t 0 , x 0 ,ẋ 0 ).
Without further assumptions, equations (3) together with the initial conditions (4) do not fit in the general setting established by theorem 2.1. Therefore, in the following we restrict ourselves to Lagrangian functions, the matrix of second partial derivatives of which, i.e.
only depends on the configuration variables, and is regular (regular Lagrangians) as well as positive-definite (normal Lagrangians). In other words, (15) represents a Riemannian metric on Q. Needless to say, these assumptions are completely consistent with most of the Lagrangian functions showing up in mechanics, for instance in the case of simple mechanical systems that we consider in the examples later on. Thus, comparing (5) and (3) we set in order to ensure the local existence and uniqueness of the solutions (q(t), λ(t)) of (3), since rank ∇qi ϕ α = rank(µ α i (q)) = m, which holds by the definition of the nonholonomic problem, and as long as the initial condition (4) respects the constraints (3b). As an important remark, note that theorem 2.1 ensures the solvability of more general, non-autonomous systems than the nonholonomic DAEs which we consider here. This fact will play a role in subsequent sections.

2.2.
Nonholonomic DAEs as ODEs on a manifold. Our purpose in this subsection is to transform the DAE (3) into an ODE on a manifold (see [11,23,25,26,27] for other approaches and similar constructions). Generally speaking, if M is a submanifold of R N , i.e. M ⊂ R N , and y(t) is a differentiable curve contained in it, then (by definition of the tangent space) its time derivative satisfiesẏ(t) ∈ T y(t) M for all t. On the other hand, a vector field on M is a C 1 −mapping f : M → R N such that f (y) ∈ T y M for all y ∈ M. For such a vector fielḋ is called a differential equation on the submanifold M, and a function y : I → M satisfyingẏ(t) = f (y(t)) for all t ∈ I is called integral curve or simply solution of that equation. Now, our purpose is to construct a vector field on the distribution D that represents the nonholonomic DAE (3).
Settingq i = v i , the nonholonomic DAE (3) can be rewritten aṡ As mentioned before, the matrix (15) is considered to be regular and postivedefinite. Denoting (m ij ) = ∂ 2 L ∂v j ∂v i , and its inverse by m ij , i.e. m ik m kj = δ i j , equations (16) may be rewritten as the following DAE on R 2n+m : (analogously we will denote ∇ q i = ∂ ∂q i ) and Remark 2.2. Note that equations (17) define a very particular DAE strongly depending on the nonholonomic structure, a fact which is taken into account in the definition (18) of f (y) and in m −1 ∇ v ϕ α (y) = Next, we shall prove that (17b) determines a submanifold in R 2n . We could omit the following result just by invoking the nonholonomic structure which establishes by construction that D is a smooth submanifold of T Q. Nevertheless, we include it (see [11,27] for the proof) for later use in remark 6.1 where we allow a deformation of the nonholonomic constraints: Theorem 2.3. Let φ : U ⊂ R l → R m , 1 ≤ m < l, be a C r −mapping, r ≥ 1, on an open set U ⊂ R l . Then the regularity set is open in R l , and for 0 ∈ φ (R(φ, U )), the regular solution set is a nonempty (sub-)manifold of R l of class C r and dimension l − m.
Fixing l = 2n and φ = ϕ, by that theorem it is straightforward to see that (17b) determines a submanifold M (ϕ, R 2n ) ⊂ R 2n since for any y ∈ R 2n by the structure of the nonholonomic problem. Of course, in the nonholonomic context this submanifold M (ϕ, R 2n ) is nothing but the distribution D introduced in §2. When constructing an ODE on D from the nonholonomic DAE, the following lemma will be of main importance. It follows from the fact that both (m ij ) and (µ α i ) are full-rank matrices; nevertheless we give a geometrical proof which takes into account the structure of our nonholonomic system and which will be relevant in remark 6.1.
Proof. Since T Q is equipped with the Riemannian metric m (recall that we are considering ∂ 2 L ∂v i ∂v j to be regular and postive-definite), we can perform the de- Taking into account that µ α , Y = 0 for any Y ∈ D, where ·, · represents the canonical pairing between T Q and T * Q, the local expression of Z α is given by In order to prove that C αβ is invertible, we proceed to calculate its kernel, that is ω β ∈ R m s.t. C αβ ω β = 0: Taking into account (21), we can write the previous expression as Since Z β ∈ D ⊥ , µ α , Z β is different from zero by definition; in consequence, the previous expression vanishes if and only if ω β = 0. Therefore ker C αβ is trivial, and the claim is proved.
We point out that the matrix C αβ only depends on q, a fact that follows from the q−dependence of the one-forms µ α and of the matrix m (assumed from the beginning), and therefore of its inverse. Using this lemma, we can finally construct the vector field on D defining an ODE as formulated in the following proposition.
Note that we view y to be a point belonging to R 2n in (17) while we write x in (22) to stress the fact that the equation (22) is defined on the submanifold D ⊂ R 2n and consequently, x ∈ D. We use y in the next proof where we are determining the conditions providing (22).
The equation (17b) determines a submanifold in R 2n , the normal vector of which is defined by ∇ϕ α . On the other hand, the right hand side of (17a) defines a vector field on T R 2n . With these two ingredients, we can establish the following perpendicularity condition: where the one-form ∇ϕ β is defined in coordinates by for a fixed β. The previous equation determines λ α in terms of y ∈ R 2n , due to the invertibility result of lemma 2.4 as we will see shortly, in such a way that the vector field The perpendicularity condition (24) can be written as (25) where m −1 ∇ v ϕ α stands for the vector m ij ∂ϕ α ∂v j . We have also used the shorthand (18). Moreover, the elementwise expression of the matrix which, according to lemma 2.4, is invertible. Therefore, after a straightforward computation, from (25) we arrive at (23). This finishes the proof.
Note again that this development depends strongly on the nonholonomic structure. Namely, the construction of (22) depends on the invertibility of C αβ , which at the same time depends on the full-rank condition of (µ α i (q)) prescribed by our nonholonomic setting.
Let us finaly introduce local coordinates ξ ∈ R 2n−m and the mapping ψ : with initial condition . In general the pseudo inverse matrix (∇ ξ ψ) −1 is not unique. We shall give an explicit matrix represenation for (∇ ξ ψ) −1 below, where we introduce an adapted set of coordinates for the nonholonomic distribution D prescribed by the Ehresmann connection.

Interpolation within the constraint submanifold
Our aim in this section is to investigate the possibility of interpolating a curve connecting two points x 1 , x 2 ∈ D which remains within D. As mentioned in the introduction, this fact will play an important role in order to embed a numerical method evolving on D into the evolution of a non-autonomous perturbation of the nonholonomic ODE. For this purpose, we first establish a useful local form of the nonholonomic constraints by means of an Ehresmann connection.
3.1. Local description of the constraint submanifold in terms of the Ehresmann connection. We briefly review some basics on Ehresmann connections (for more details we refer to [1,17]). Assume that there is a bundle structure with projection π : Q → R for the manifold Q; the manifold R is called the base. We call the kernel of T q π at any point q ∈ Q the vertical space denoted by V q . An Ehresmann connection A is a vertical vector-valued one-form on Q, which satisfies Thus, we can split the tangent space at q such that Consider the nonholonomic distribution D, which, as introduced in §2, locally is given by where µ α are m linearly independent one-forms that form a basis for the annihilator D • ⊂ T * Q. Let us choose an Ehresmann connection A on Q in such a way that H q = D(q). In other words, we assume that the connection is chosen such that the constraints are given by A · v q = 0.
Using the bundle coordinates q = (y a , y α ) ∈ R n−m × R m , a = 1, ..., n − m, α = n − m + 1, ..., n, the coordinate expression of π is just a projection onto the factor y a , and the connection A can be locally expressed by a vector-valued differential one-form µ α as Given a vector v q ∈ T q Q, we denote its vertical part by , and its horizontal part by a v a ). Therefore, the Ehresmann connection allows us to choose an adapted set of coordinates for the constraint distribution D, denotes the coordinate map. This is what we mean by D-adapted coordinates.
Proof. The proof follows directly from the decomposition induced by the Ehresmann connection.
Note that c(0) = x 1 and c(ǫ) = x 2 according to (31). Thus, for any two points belonging to D, we have constructed an interpolating curve c(t) ⊂ D for t ∈ [0, ǫ] connecting them. (A different and interesting procedure to interpolate any number of points on a manifold has been proposed in [18].)

Discretization of the Lagrangian nonholonomic problem
In this section we describe in detail the result from [10] already mentioned in the introduction, which can be phrased as follows: Any p−th order discretization of an ODE can equivalently be viewed as the time−ǫ period map of a suitable ǫ−periodic non-autonomous perturbation of the original ODE. Moreover, we apply this result to the nonholonomic case, in particular to the nonholonomic ODE (26).

4.1.
Discretization of ODEs. Consider the following system of ordinary differential equationsż wheref is a real analytic vector field, and a p−th order discretization of step-size ǫ given by (35) Φ is supposed to be a diffeomorphism with respect to z, real analytic with respect to ǫ and z. Denote the (local) flow of (34) by If Φ is a discretization of order p, then there exists a continuous and increasing holds, for some p ≥ 1 and for all ǫ, z for which the left-hand side is defined.
In the following proposition we prove the embeddability property of such a numerical scheme into the evolution map of a non-autonomous ODE of the forṁ Proposition 4.1. There exists a vector field g = g(ǫ, τ, z), ǫ 0 > 0, and a continous non-increasing function with ρ(0) = ∞, such that the following statements hold i) g(ǫ, τ, z) ∈ R N is defined for all real τ , 0 ≤ ǫ ≤ ǫ 0 , and all real z with |z| < ρ(ǫ); ii) g is C ∞ −smooth in all variables, and g and all its τ −derivatives are ana- We present part of the proof of the interpolation procedure. In particular we only consider the case τ ≥ 0. For more details see [10].
Proof. We focus on the construction of G and the time periodicity of g. For sake of simplicity we omit the arguments (ǫ, z). To define an evolution map, we put G(t, s) := G(t, 0) • G(s, 0) −1 , and extend to t ≥ ǫ successively.
The idea is to interpolate between the identity map and Φ by a curve G(t, 0) (0 ≤ t ≤ ǫ), in the space of diffeomorphisms. Employing the C ∞ cut-off functions χ 0 , χ 1 introduced in §3.2, namely for 0 ≤ t ≤ ǫ and |y| < ρ(ǫ), we set where F is the local flow (36), Φ is the discrete flow (35) and y stands for the initial value of z, i.e. y = z(0). With this definition, it holds that Let [τ ] denote the largest integer not exceeding τ ∈ R, and let Φ k (ǫ, ·) denote the k−th iterate of the discrete flow map, k ≥ 1. Then we extend our definition of G to all t ≥ 0, by putting This definition implies G(t, ǫ, G(kǫ, ǫ, y)) = G(t + kǫ, ǫ, y) for all t ≥ 0, k ∈ N, ǫ, y as is appropriate for the evolution map of an ǫ−periodic, non-autonomous ODE system. Obviously the curve t → G(t, ǫ, y), t ≥ 0, is C ∞ , and thus represents a C ∞ −interpolation of the discrete forward orbit Similarly, one can extend the definition of G to all t ≤ 0. In order to define the perturbation g, we switch to the scaled time variable τ = t/ǫ. Let the second equality holding only for 0 ≤ τ ≤ 1. Then the perturbation g is defined by where y = y(ǫ, τ, z) is given implicitly by In [10] it is proved that this last equality can be solved for y providing y = y(ǫ, τ, z), which is defined in a possibly reduced domain of the form 0 because (40) implies Therefore we concluce from (40),(42),(43),(44) This proves the 1−periodicity for ǫ > 0.
Next, we apply proposition 4.1 to the nonholonomic ODE (22). Obviously, it is sufficient to do this in coordinates.
According to proposition 4.1, any p−th order discretization of equations (26) can be viewed as the time−ǫ map of a suitable ǫ−periodic non-autonomous perturbation, namelyξ Note that in the coordinates (q i , v a ) induced by the Ehresmann connection, the nonholonomic ODE (26) reads as follows: Here f q (x) is defined in (18) and We would like to stress the fact, that any p−th order discretization of the nonholonomic ODE corresponds to a rather special discretization of the original nonholonomic problem, for instance, respresented by the DAE in (17). We clarify the relevant notion of p−th order discretization of the nonholonomic problem by means of the following definition.
Remark 4.3. As mentioned above, the Lagrange multipliers λ determine the reaction forces due to the constraints. This is the reason why we include λ in the definition 4.2. Obviously, if we discretize (22) and define the discretized values λ k of λ by inserting (q k , v q k ) for q and v in formula (23), then, by Lipschitz continuity of the right hand side, the accuracy of the discretization with respect to λ is of the same order p as with respect to (q, v).
Due to the definition of the vector field h in (22), it is clear that discretizations with the properties just stated in definition 4.2 generate p − th order discretizations of the ODEs in (22), (26), and (46), respectively, and vice versa. We will devote the final sections to find a systematic way of constructing this kind of integrators and, moreover, to carefully treat some examples.

4.2.
Perturbation of the nonholonomic DAE. In this section, we relate the previous developments to the nonholonomic DAE in (17). To this end, we multiply equation (45) by ∇ ξ ψ from the left to obtain which, taking into account that x = ψ(ξ), leads to the following ODE on D: where ξ x := ψ where we are using the notation introduced in (27). Thus, the equation (48) becomesẋ where, as before, x = (q, v) T such that µ α i (q) v i = 0. The following lemma will be useful below.
is the inverse matrix of (C αβ ) as provided by lemma 2.4.
Proof. We give the proof by a direct calculation in coordinates. As proved in lemma 2.4, the matrix C αβ is invertible; therefore C αγ µ γ i m ij µ β j = (δ β α ). Multiplying this relation by the matrix (µ α k ) from the left, we obtain Rearranging the terms we get , where the symmetry of m −1 and C −1 has been used. From the last expression it follows that m ji µ γ i C γα µ α k = (δ j k ). Thus µ γ l C γα µ α j = (m lj ) and the claim follows.
Remark 4.5. Obviously, by the derivation above, the ODE in (50) is equivalent to an ǫ−periodic non-autonomously perturbed nonholonomic DAE in R 2n+m of the following formẏ where y = (q i , v i ) T ∈ R 2n , ϕ α (y) = µ α i (q) v i and f (y) is given by (18) (note that we use the y−notation here, since we are considering again an arbitrary point y in R 2n ; we will switch back to the x−notation whenever we are dealing with equations on D).
Moreover, by appropriately modifying the expression for the Lagrange multipliers λ α in proposition 2.1, for any perturbation ǫ pĝ (ǫ, t/ǫ, y) in (51), that DAE induces an ODE as in (50). In particular, this implies the unique local solvability of the standard initial value problem corresponding to (51) (cf. the proof of theorem 2.1). To show this, we first solve the following equation for λ α which is nothing but the perpendicularity condition ensuring that the vector field f (y) (wheref (y) denotes the right hand side of (51a)) is tangent to D and therefore belongs to T D. Omitting, for simplicity, the arguments of the involved functions, we obtain where (C αβ ) is again the inverse matrix of C αβ introduced in lemma 2.4. Moreover, we recall that Introducing this last expression into (51a) we arrive aṫ The term can be simplified taking into account the result in lemma 4.4 as we show next. The term can be written in coordinates as In lemma 4.4 it has been proved that µ α j C αβ µ β k = (m jk ); thus, the last equation yields Introducing this into (54), we arrive aṫ .
Of course, it would be ideal for a further analysis, if the perturbed nonholonomic DAE in (51) admitted a Lagrangian structure of some kind. The question, when this is true and how this is related to specific properties of both, the underlying discretization of the unperturbed problem as well as the interpolation procedure presented in the previous section, is left as a subject of further study. Replacing D by some kind of perturbed distribution will be another issue in future work.

Nonholonomic integrators based on the discretizaton of the Lagrange-d'Alembert principle
We devote this section to study how some discretizations of the Lagranged'Alembert principle lead to discretizations of the nonholonomic DAE (17) and furthermore, under some hypotheses, to discrete schemes approximating the nonholonomic flow and preserving D in the sense of definition 4.2. First, we introduce a notion of discretization of Hamilton's variational principle ( [21,22]), leading to variational integrators. Then we focus on similar techniques applied to nonholonomic systems ( [5,20]), mainly following the approach by McLachlan and Perlmutter [20].

Discrete mechanics and variational integrators.
Variational integrators are a kind of geometric integrators for the Euler-Lagrange equations which retain their variational character and also, as a consequence, some of the main geometric properties of the continuous system, such as symplecticity and momentum conservation (see [13,21]). In the following we will summarize the main features of this type of geometric integrators. A discrete Lagrangian is a map L d : Q × Q → R, which may be considered as an approximation of the action integral defined by a continuous Lagrangian L : T Q → R, that is where q(t) is a solution of the Euler-Lagrange equations for L joining q(t 0 ) = q 0 and q(t 0 + ǫ) = q 1 for small enough ǫ > 0. Define the action sum S d : Q N +1 → R corresponding to the Lagrangian L d by where q k ∈ Q for 0 ≤ k ≤ N , N is the number of discretization steps. The discrete variational principle states that the solutions of the discrete system determined by L d must extremize the action sum given fixed endpoints q 0 and q N . By extremizing S d over q k , 1 ≤ k ≤ N − 1, we obtain the system of difference equations or, in coordinates, ∂L d ∂x i (q k , q k+1 ) + ∂L d ∂y i (q k−1 , q k ) = 0, where 1 ≤ i ≤ n, 1 ≤ k ≤ N − 1, and x and y represent the n first and n last variables of the function L d , respectively.
These equations are usually called the discrete Euler-Lagrange equations. Under some regularity hypotheses (the matrix D 12 L d (q k , q k+1 ) is supposed to be regular), it is possible to define from (58) a (local) discrete flow map F L d : Q × Q → Q × Q, by F L d (q k−1 , q k ) = (q k , q k+1 ). We will refer to the F L d flow, and also (with some abuse of notation) to the equations (58), as a variational integrator.
Define the discrete Legendre transformations associated to L d by

and the discrete Poincaré-Cartan 2-form by Ω
where Ω Q is the canonical symplectic form on T * Q. The discrete algorithm determined by F L d preserves the symplectic form, i.e., F * L d Ω d = Ω d . Moreover, if the discrete Lagrangian is invariant under the diagonal action of a Lie group G, then the discrete momentum map J d : is preserved by the discrete flow. Therefore, these integrators are symplecticmomentum preserving. Here, ξ Q denotes the fundamental vector field determined by ξ ∈ g, where g is the Lie algebra of G. (See [21] for more details.) Everything developed in this section is true for a general configuration manifold Q.

Nonholonomic integrators.
Discretizations of the Lagrange-d'Alembert principle for Lagrangian systems with nonholonomic constraints have been introduced in [5,20], for instance. Under some regularity conditions, these discretizations allow to construct numerical integrators that approximate the continuous flow fairly well, and can be linearly implicit, semi-implicit or implicit.
To define the notion of a discrete nonholonomic system providing a discrete flow on a submanifold of Q × Q and, as well, a corresponding version of discrete Euler-Lagrange equations (58), one needs three ingredients: a discrete Lagrangian, a constraint distribution D ⊂ T Q and a discrete constraint space D d ⊂ Q × Q. In the following, we follow [20]. (1) D d is a submanifold of Q × Q of dimension 2n − m with the additional property that (2) L d : Q × Q → R is the discrete Lagrangian.
We define the discrete Lagrange-d'Alembert principle (DLA) to be the extremization of the action sum in (57) among all sequences of points {q k } with given fixed end points q 0 , q N , where the variations must satisfy δq k ∈ D q k (in other words δq k ∈ ker µ α ) and (q k , q k+1 ) ∈ D d for all k ∈ {0, ..., N − 1}. This leads to the conditions for all variations δq k , where µ α , δq k = 0 along with (q k , q k+1 ) ∈ D d . This leads to the following set of discrete nonholonomic equations For the sake of clarity, the condition (59b) may be rewritten as φ α d (q k , q k+1 ) = 0, where φ α d : Q × Q → R is the set of m functions whose annihilation defines D d (in other words, a suitable discretization of the nonholonomic constraints (1)). Equations (59), where λ α = (λ k ) α is chosen appropriately, define a discrete nonholonomic flow map F nh where q k+1 satisfyes (59) provided that (q k−1 , q k ) ∈ D d , if and only if the following regularity condition is fullfiled for each (q k , q k+1 ) in a neighborhood of the diagonal of Q × Q. To state this regularity condition, we make the definition: Proposition 5.1. Let (q k−1 , q k ) ∈ D d . Let π 1 : Q × Q → Q be the projection on the first factor. Suppose that π 1 | D d : D d → Q. The discrete nonholonomic flow map F nh L d is then guaranteed to exist locally uniquely provided that for each q k+1 ∈ γ −1 (q k−1 ,q k ) (0) ∩ (π 1 | D d ) −1 (q k ), and for each non-zero v q k+1 ∈ T q k+1 D d , holds for all v q k ∈ D q k . When this condition holds for all q k ∈ Q, the discrete nonholonomic equations produce a uniquely defined diffeomorphism F nh L d : D d → D d in the way described above.
We point out, that this regularity condition has been previously introduced in [5] phrased as follows: the discrete nonholonomic flow map F nh L d : D d → D d is uniquely defined as a local diffeomorphism if and only if the matrix is invertible. Roughly speaking, this regularity condition, when it holds, allows us to determine λ α = (λ k ) α in (59) in terms of the other variables and therefore to choose it appropriately as mentioned above. The form of (λ k ) α depends on the discretization constraints φ α d and is quite involved in general. We refer to §6 for particular examples. Again, everything in this section is true for a general configuration manifold Q.
We will refer to the algorithm described by (60) as a nonholonomic variational integrator in analogy to the unconstrained case.
Remark 5.3. The discrete nonholonomic flow map generates points (q k , q k+1 ) belonging to D d , from points (q k−1 , q k ) ∈ D d . As mentioned before, we look for some conditions guaranteing that v q k ∈ D q k for any k. This is not straightforward from the relations in (59), since a definition of the pair (q k , v k ) from (q k−1 , q k ) is required. This issue will be discussed below.
Remark 5.4. The property that v q k ∈ D q k for any k, is automatically achieved if q k+1 and q k are connectable by the time-continuous nonholonomic flow map In [20] the subset of Q × Q consisiting of pairs (q k , q k+1 ) such that there exists a C 2 curve q(t) joining q k and q k+1 in time ǫ, and such that the curve (q(t),q(t)) satisfies the Lagranged'Alembert equations is called exact discrete constraint distribution and denoted by D E d . Moreover, it is proved that there exist smooth coordinates realizing D E d as a submanifold of Q × Q of dimension 2n − m. Here we do not explore this object further, rather we are going to address the issue raised in the previous remark directly. 5.3. Construction of nonholonomic variational integrators by means of finite difference maps. As has been shown, the definition of a discrete nonholonomic system requires the tuple (Q × Q, L d , D d ) to be specified. For some reasons (which will become clear in the section devoted to examples), in many cases it is preferable to specify L d and D d by means of a so-called finite difference map ρ (see [20] for further details).
is a neighborhood of the diagonal I d in Q × Q, and V (Z) denotes a neighborhood of the zero section of T Q, i.e Z : Q → T Q s.t. Z(q) = 0 q ∈ T q Q, which satisfies the following conditions: Next, we give a local definition of ρ, i.e. ρ q : U q →Ũ q × R n : whereq ∈ U q , and g i q , f i q : U q → R, with i = 1, ..., n are smooth functions. In other words, f i q (q) are the local coordinates of v gq (q) ∈ T gq (q) Q (recall that, locally, (g q (q), f q (q)) ∈ V (Z(q))). From the definition of ρ in definition 5.5, these functions must satisfy two conditions, namely g q (q) = q and f q (q) = 0.
For the sake of brevity, we setq := g q (q) ∈Ũ q and vq := f q (q) ∈ R n .
The finite difference map ρ is crucial in our construction since it can be used to define both the discrete Lagrangian function and the discrete constraint distribution D d from the continuous one as stated in the following proposition (recall that D d is defined by the annihilation of the functions φ α d ). Proposition 5.2. Given a diffeomorphism ρ, define the functions φ d : See [20] for the proof. Thus, it is sufficient to consider the continuous oneforms µ α and, in addition, the difference map ρ in order to construct the discrete constrained distribution D d . Notice that we could have chosen any set of m independent functions ψ α : T Q → R in the previous proposition to define D d , namely φ α = ψ α • ρ. However we pick µ α : T Q → R (the one-forms defining D), a choice that has some advantages as we'll see shortly in proposition 5.3. Note as well that in the previous proposition, the discrete functions φ d are not defined everywhere on Q × Q, but in a neighborhood of its diagonal only.
Definition 5.6. Consider the commutative diagram where q k ∈ U q k−1 and q k+1 ∈ U q k . We define the local flowF L d : , which we call a velocity nonholonomic variational integrator.
In the next proposition we establish a sufficient condition forF L d to preserve the nonholonomic distribution D, in other wordsF L d : Dq k → Dq k+1 . Proposition 5.3. Assume that (q k , vq k ) ∈ Dq k and that D d is defined as in proposition 5.2, that is using φ α d = µ α • ρ. Then (q k+1 , vq k+1 ) defined byF L d belongs to Dq k+1 . In other words,F L d : Dq k → Dq k+1 .
Remark 5.7. The previous assertion includes a redefinition of the original nodes, that is {q k } → {q k }, and suggests a particular choice of the discrete constraints φ α d . In general, the discrete nonholonomic variational integrator F nh L d will not define, through ρ, a velocity nonholonomic integratorF L d which preserves the nonholonomic distribution D at the original nodes. Thus, the question arises, whether this is true for a perturbed constraint manifold. We are going to discuss this question in case of the nonholonomic particle example below.
Remark 5.8. Given a complete sequence {q 0 , q 1 , q 2 , ..., q N −1 , q N }, the generation of new nodes by the finite difference map requires a choice for the end points, i.e. eitherq 0 orq N . This can be seen easily in the following way: In general we will choose the new nodes to beq k+1 = g q k (q k+1 ), which defines univocally the sequence {q 1 , ...,q N }; thereforeq 0 remains undefined and we have to choose it independently, a natural choice beingq 0 = q 0 . Similarly, if we define the new nodes to beq k = g q k (q k+1 ), thenq N is the undefined end point, a natural choice being q N = q N . The same holds true for the velocities vq = f q (q). Remark 5.9. Notice that the discretization of the nonholonomic problem established in this section provides a set of discrete multipliers besides of the discrete dynamical variables (q k , v q k ). More concretely, the regularity condition (61) ensures that λ α can be worked out in terms of (q k−1 , q k , q k+1 ) and therefore, through the finite difference map, in terms of (q k , v q k ). The accuracy of these discrete multipliers with respect to the continuous evolution λ(t) = λ(q(t), v(t)) is determined by comparison to (23).
As mentioned before, ρ will also be used to define the discrete Lagrangian L d : Q × Q → R, accomplishing the task of defining the discrete nonholonomic system (Q × Q, L d , D d ), particularly by where L denotes the continuous Lagrangian and ǫ the step size of the discretization.
As before, all the considerations in this section make sense for a general configuration manifold Q. In the next section we shall restrict again to R n by considering two particular examples of ρ parametrized by β, namely corresponding to g q (q) = (1 − β) q + βq and f q (q) =q −q ǫ in (62). These ρ β lead to the following family of discrete Lagrangians As pointed out in (56), discrete Lagrangians are approximations of the action integral over time intervals of length ǫ. Next, using usual arguments based on Taylor expansions, we are able to determine the order of consistency of L β d (q k , q k+1 ) with respect to ǫ 0 L(q(t),q(t))dt (see [21] for more details): ǫ 0 L(q(t),q(t)) dt = ǫ L(q,q) + ǫ 2 2 where q(0) = q andq(0) =q. On the other hand, recalling that q k ≃ q(0) and q k+1 ≃ q(ǫ), we find that Therefore, if we consider β ∈ [0, 1], L β d (q k , q k+1 ) is second-order consistent with respect to the action integral if β = 1/2, otherwise it is only consistent of order one. In the examples we shall consider the cases β = 0 and β = 1/2.

Examples
In this section we focus on the particular class of simple mechanical Lagrangians when Q = R n . We construct nonholonomic variational integrators according to the procedure shown in the previous section and study their consistency properties with respect to the continuous dynamics. Furthermore, we focus on the example of the nonholonomic particle, studying carefully how particular discretizations generate numerical schemes that preserve the nonholonomic distribution D, and to which extent D is perturbed otherwise. 6.1. The class of simple mechanical Lagrangians. Assume that the Lagrangian function L : T Q → R is of the form L = T − V , where T is the kinetic energy associated with the Euclidean metric on R n and V is the potential energy, i.e.
where M is the n × n mass matrix. The equations (17) becomė where According to proposition 2.1, equations (65) induce an ODE on D given byq where according to (23) , and such that µ α (q) v = 0. In this particular case, (C αβ ) is the inverse matrix of C αβ = µ α (q)M −1 µ β (q) . Employing the coordinates (27), the ODE (26) readṡ where we use the Ehresmann connection µ α i (q) = (A α a (q), δ α β ). In addition, which corresponds to the case β = 0 in (64). Consider the discrete Lagrangian L d : Q × Q → R generated by ρ: It has been proved above that this discrete Lagrangian gives rise to a first order consistent approximation of the action integral over a time interval of lenght ǫ; therefore, in the spirit of [21], we expect a first order consistent integrator for the nonholonomic flow. The nonholonomic integrator (59) is given by where the rescaling λ k+1 → −λ k+1 /ǫ has been performed (equivalently this accounts for the choice φ α d = −ǫ µ α • ρ, which keeps us in the case φ α d ∝ µ α • ρ and the conditions established by proposition 5.3). Considering the particular form of ρ, a velocity nonholonomic integrator is defined byq k+1 := q k and vq k+1 := v k+1 := (q k+1 − q k )/ǫ, which leads to the velocity formulationF L d : where the initial condition should satisfy the constraint µ α (q k ) v k = 0. As expected, this integrator respects the nonholonomic constraint described by D, which is ensured by the equation (71c). Using usual arguments based on Taylor expansions, one can check that the previous integrator is first-order with respect to the time-continuous equations (65). The discrete Lagrange multipliers λ k+1 can be determined by inserting (71b) into (71c) and solving the linear system which makes the method linearly explicit (note that λ k+1 depends on v k andq k+1 , which, at the same time, depends onq k and v k ; therefore the right hand side of (71a) and (71b) depends explicitly onq k and v k ). Proof. For the sake of simplicity we set V = 0. The strategy of our proof is to use Taylor expansions to show that |q(t k + ǫ) −q k+1 | ∼ O(ǫ 2 ) and |v(t k + ǫ) − v k+1 | ∼ O(ǫ 2 ), where the continuous functions q(t), v(t) are determined by (66), while the discrete quantitiesq k+1 , v k+1 are determined by (71). Since the assertion is easy to prove for q(t) andq k+1 , we focus on the v part. Taking into account thatq k+1 =q k +ǫ v k and expanding µ β (q k+1 ) in (72) around q k we arrive at Recall that the method (71) respects the nonholonomic constraint (71c), i.e., µ α (q k ) v k = 0. Inserting the previous expression for (λ k+1 ) α into (71b) and expanding aroundq k we obtain On the other hand, v(t k + ǫ) = v(t k ) + ǫv(t k ) + O(ǫ 2 ) or, taking into account that v(t k ) ≃ v k , v(t k +ǫ) = v k +ǫv k +O(ǫ 2 ). Using the equations (66b) and (67), we find out that v(t k +ǫ) is equal to (73) up to O(ǫ)-terms. Thus, |v(t k +ǫ)−v k+1 | ∼ O(ǫ 2 ) and therefore the p = 1 order follows. Now, we show the order of accuracy of the discrete Lagrange multipliers (72) with respect to the continuous expression (67). In particular, (72) leads to while the continuous expression reads λ γ (q, v) is again determined by (67). Finally According to this, the discrete Lagrange multipliers are consistent with the continuous evolution, i.e. s = 0, and the claim holds.
Employing the Ehresmann connection, (71c) becomes v α k+1 + A α a (q k+1 ) v a k+1 = 0, for an initial condition satisfying v α k +A α a (q k ) v a k = 0. Moreover, the equations (71a) and (71b) can be decomposed intõ A second-order method can be constructed using the following finite difference map corresponding to the case β = 1/2 in (64). The corresponding discrete Lagrangian is given by It has been proved above that this discrete Lagrangian implies a second-order consistent aproximation of the action integral; therefore we expect a second-order consistent integrator for the nonholonomic flow. The nonholonomic integrator (59) (the same rescaling as in the above example is performed, i.e. φ α Obviously, this integrator does not preserve the nonholonomic constraint D at the original nodes, neither if we choose v k := (q k+1 − q k )/ǫ nor if we choose v k+1 := (q k+1 − q k )/ǫ. To get the velocity formulationF L d we defineq k+1 := (q k + q k+1 )/2 and vq k+1 := v k+1 := (q k+1 − q k )/ǫ according to the finite difference map ρ. The numerical scheme (q k , v k ) → (q k+1 , v k+1 ) then reads where the initial conditions should satisfy the constraint µ α (q k ) v k = 0. At the new nodes {q k } this integrator preserves the nonholonomic constraint as it is ensured by (76d). The method is fully implicit, but it becomes explicit either if V = 0 or ∇ q V = 0. In addition, in the absence of constraints (λ k+1 = 0) it reduces to the standard trapezoidal rule. We use the same techniques as in proposition 6.1 to show that the numerical scheme (76) is a second-order integrator of (66). We set V = 0 and insert (76b) and (76c) into (76d); then, by Taylor expansion up to order ǫ 2 , we obtain Expanding this expression we get where In addition, here we have taken into account that µ α (q k ) v k = 0. Proof. First, we prove that |q(t k + ǫ) −q k+1 | ∼ O(ǫ 3 ). Taking into account that On the other hand, inserting (76a) into (76b) we getq k+1 =q k + ǫ 2 v k + ǫ 2 v k+1 , which, taking into account (76c) and the order one terms in (77), leads tõ proving the claim. (Note that the order one terms in (77) are represented by which follows by Taylor expansion of all the functions depending onq k+1/2 ). Next, we prove that Now, taking into account that v k+1 = v k + ǫ (λ k+1 ) α M −1 µ α (q k+1/2 ),q k+1/2 = q k + 1 2 ǫ v k and using the expression for R αβ (q k+1/2 ) as stated above, we can perform a Taylor expansion of the previous expression for v k+1 to obtain As mentioned before, (λ k+1 . Therefore, comparing the last expression for v k+1 and the equation in (78) we find out that |v(t k + ǫ) − v k+1 | ∼ O(ǫ 3 ), establishing the p = 2 order of consistency as claimed.
Now, regarding the multipliers we follow the same analysis as in proposition 6.1. From equations (76) we arrive at Taking the Taylor expansion of this expression and comparing to (74), it follows directly that |λ α (t k + ǫ) − (λ k+1 ) α | ≃ O(ǫ). Accordingly, s = 0 and the claim holds.
As in the previous example, using the Ehresmann connection, the equations (76a), (76b) and (76c) can be decomposed intõ which, after eliminating the Lagrange multiplier using (23) lead tov denote the nonholonomic distribution corresponding to this system. Then the equations in (80) define a time-continuous flow F t : , v z (t)) T . We shall consider two examples of the nonholonomic variational integrator (59) with φ α d = −ǫ µ α • ρ.
First, let us set leading to the integrator Let us define the finite difference map ρ(q k , q k+1 ) ∈ T q k+1 Q by setting v k+1 := v q k+1 := (q k+1 − q k )/ǫ (q k = (x k , y k , z k ) Notice that, according to (82d) and the last equation just above, in this case D d = D and therefore this integrator respects the original constraint without the need of redefining the nodes. Employing the equation (77), we obtain the Lagrange multiplier and furthermore the integrator v k+1 which, according to the results in the previous section, is a (1, 0) integrator in the sense of definition 4.2. In the following case we focus on a more involved scheme that shows the procedure of redefinition of the nodes.
Let us now consider The nonholonomic variational integrator (59) reads As the discrete Lagrange-d'Alembert principle ensures, these equations generate a discrete nonholonomic flow map F nh L d : D d → D d , where in this case D d ⊂ R 3 × R 3 . Let us define the finite difference map ρ(q k , q k+1 ) ∈ Tq k+1 Q, by setting q k+1 := (q k+1 + q k )/2 and v k+1 := vq k+1 := (q k+1 − q k )/ǫ (q k = (x k ,ỹ k ,z k ) T and v k = (v k x , v k y , v k z ) T ). Then the integrator given by the scheme (84) is a particular example for the second case studied in the previous subsection, i.e., it represents a (2, 0) consistent discretization (in the sense of definition 4.2) of the nonholonomic flow given by the ODE in (66). In other words, under these assumptions the equations (84) can be rewritten as v k+1 which, by using the formula (77) to determine the Lagrange multiplier providing a discrete flowF L d : On the other hand, it is easy to see that with respect to the original nodes, this integrator does not preserve the nonholonomic constraint D np and, on top of that, the order of consistency deminishes from second to first order. This assertion can be understood in the following way: if we consider v k+1 := (q k+1 − q k )/ǫ to be the velocity associated to q k+1 , then the equations (84) can be rewritten as v k+1 It is clear that the pairs (q k+1 , v k+1 ) do not obey the nonholonomic constraint given by v z − y v x = 0 but a deformed one given by Let D np ǫ denote the deformed nonholonomic submanifold represented by (88) (note that we can not say distribution since this constraint is no longer linear nor affine). At this point we have two choices for the initial data, namely: either they respect D np or D np ǫ . In both cases the order of consistency is deminished from second to first order as compared to the case of the nodesq k . In the first case, the previous numerical scheme defines a discrete flow map F L d : T q k Q → T q k+1 Q, while in the second case it describes a discrete flow map respecting the deformed constraints, Considering the particular expression of the Lagrange multiplier, given by inserting the dynamical variables into the constraints in (87), i.e.
, in the second case we get the numerical scheme v k+1 which is first-order consistent with respect to the time-continuous flow F t , i.e. F ǫ L d (q k , v k ) = F ǫ ((q(0), v(0))) + O(ǫ 2 ). Let us define the finite difference mapρ : (q k , q k+1 ) → (q k+1 , vq k+1 ), and the corresponding mapρ : (q k , q k+1 ) → (q k+1 , v q k+1 ), i.e.,ρ To avoid confussion, we would like to point out that with some abuse of notation bothρ andρ are finite difference maps associated to the discretization (83) corresponding to redefined and non-redefined nodes respectively; while, regarding (81), the finite difference map that we have considered associates the velocity v k+1 to q k+1 without redefining the nodes. Taking this into account, the previous discussion can be summarized in the following commutative diagram and subsequent proposition where, we recall, F nh L d : D d → D d is generated by (84),F L d : D np q k → D np q k+1 by (86) and F ǫ L d : (D np ǫ ) q k → (D np ǫ ) q k+1 by (89). Proposition 6.3. Consider the nonholonomic particle system and the associated nonholonomic ODE (80). The velocity nonholonomic variational integrator defined by the finite difference map in (90a) gives rise to (2, 0) consistent discretization, and moreover it is D np -preserving. Moreover, the velocity nonholonomic variational integrator defined by the finite difference map (90b) gives rise to a (1, 0) consistent discretization that does not preserve the original nonholonomic distribution but it does respect the perturbed constraint given by (88).
The previous exposition and this proposition emphasize again the importance of the redefinition of the nodes exposed above. Namely, the integrator determined bŷ ρ (90b) is first-order, (1, 0) in the nomenclature introduced in definition 4.2. On the other hand, the midpoint quadratureρ (90a) is only second-order as expected, this is (2, 0), after the redefinition of the nodes, which also implies the preservation of the constraints; otherwise the method is first-order in the variables (q, v) and we do not get any improvement as compared toρ or the simpler (1, 0) scheme (81), where we do not redefine the nodes at all and, yet, we obtain a distribution preserving integrator.
Finally, we would like to make a remark concerning the performance of a preliminary implementation of the numerical schemes introduced in this section (regarding the nonholonomic particle). They provided the results expected according to the general theory presented in this paper: In all cases, the original constraint turned out to be preserved up to small bounded oscillatory variations. We will provide more details of the numerics associated to the theory developed in this paper in forthcoming papers.
Remark 6.1. The example above draws our attention to the case of integrators that do not preserve the nonholonomic constraints (which is the usual case) but a deformation of them. At the light of the example of the nonholonomic particle and the perturbed constraints (88), we shall consider the following deformation: where we consider δ > 0 to be a small parameter and g α : T Q → R is a generic smooth function, where Q = R n , not necessarily linear in the velocities (thus, we cannot speak anymore about a distribution but about a submanifold). According to theorem 2.3, we need further assumptions on g α such that (91) defines a regular submanifold D δ ⊂ T R n , namely It is enough to assume that rank µ α i + δ ∂g α ∂v i = m, an assumption that we are making in the following (note that this is locally true for δ small enough). Considering equations (17) and theorem 2.1 we shall define the nonholonomic dynamics on D δ by means of the following set of differential algebraic equations: which are a δ−perturbation of the original nonholonomic DAE (17). Note again that theorem 2.1 allows more freedom on the function g α , more concretely, it might be non-autonomous (although we are keeping the autonomous setting for now). Moreover, we can employ analogous arguments than in lemma 2.4 to prove that the m × m matrix (C αβ δ (q, v)) = (µ α i + δ ∂g α ∂v i ) m ij (µ β j + δ ∂g β ∂v j ) is invertible for small δ. Consequently, equations (92) induce the following ODE evolving on D δẋ where λ δ α (x) = − (C δ (x)) αβ ∇ q (ϕ β (x) + δ g β (x)), f q (x) and the constraints ϕ α (x) + δ g α (x) = 0 must be obeyed. Introducing local coordinates ξ δ ∈ R 2n−m and the mapping ψ δ : R 2n−m → D δ ⊂ R 2n such that for x ∈ D δ ⊂ R 2n , x = ψ δ (ξ δ ), the previous equation induces an ODE on R 2n−m , i.e.
where the vector field h δ (x) is given by the right hand side of (94). Also, we can employ the Ehresmann connection (which is globally well-defined), to choose suitable bundle coordinates such that the result in proposition 3.1 can be applied to define a curve c δ (t) ⊂ D δ . Furthermore, the analysis presented in corollary 4.1 and remark 4.5 can be applied in the case D δ , leading to analogous conclusions.

Conclusions
We have proved that any D−preserving integrator of the nonholonomic ODE may be understood as a non-autonomous perturbation of this ODE itself. This result is based on the construction of the nonholonomic ODE from the original nonholonomic DAE, a construction which strongly depends on the structure of the linear constraints (inherent in our nonholonomic setting) and the regularity conditions which we impose. It is also studied what kind of a perturbation a non-autonomous perturbation of the nonholonomic DAE may cause for the nonholonomic ODE. The matching of both perturbations and their relation to the continuous Lagrange-d'Alembert principle is left as a topic of further analysis in future works.
Moreover, from the discretization of the Lagrange-d'Alembert principle we construct D−preserving variational integrators. We give precise conditions under which this property is ensured. It turns out that this property requires a redefinition of the original discretization nodes. We illustrate our procedure by means of two particular velocity nonholonomic integrators whose consistency properties are carefully studied in the case of simple mechanical systems.
We employ the developed techniques to treat the example of the nonholonomic particle in order to show how the discretizations may induce a perturbation of the original dynamics, particularly of the nonholonomic constraints. Our conclusions regarding this example can be summarized as follows: when we construct the velocity nonholonomic integrator redefining the nodes to preserve D we obtain the maximum order of consistency achievable; when we preserve the original nodes and perturb D to some D ǫ we loose one order of consistency; when we preserve the nodes and do not deform the distribution we loose one order of consistency; when we keep the original nodes and deform the constraints in order to preserve the original distribution in the numerical scheme we loose one order of consistency as well. This is an interesting phenomenon which we are going to consider in more detail and in a more general situation in future works.
Finally, in the light of the possible deformation of the distribution generated by discretizations, we extended the theory developed before to allow even nonlinear deformations of the constraints (consequently we are not dealing with a distribution anymore but with a smooth constraint submanifold).