Lie-Poisson Neural Networks (LPNets): Data-Based Computing of Hamiltonian Systems with Symmetries

An accurate data-based prediction of the long-term evolution of Hamiltonian systems requires a network that preserves the appropriate structure under each time step. Every Hamiltonian system contains two essential ingredients: the Poisson bracket and the Hamiltonian. Hamiltonian systems with symmetries, whose paradigm examples are the Lie-Poisson systems, have been shown to describe a broad category of physical phenomena, from satellite motion to underwater vehicles, fluids, geophysical applications, complex fluids, and plasma physics. The Poisson bracket in these systems comes from the symmetries, while the Hamiltonian comes from the underlying physics. We view the symmetry of the system as primary, hence the Lie-Poisson bracket is known exactly, whereas the Hamiltonian is regarded as coming from physics and is considered not known, or known only approximately. Using this approach, we develop a network based on transformations that exactly preserve the Poisson bracket and the special functions of the Lie-Poisson systems (Casimirs) to machine precision. We present two flavors of such networks: one where the parameters of the transformations are computed from data using a dense neural network (LPNets), and another where the composition of transformations is itself used as the building block (G-LPNets). We also show how to adapt these methods to a larger class of Poisson brackets. We apply the resulting methods to several examples, such as rigid body (satellite) motion, underwater vehicles, a particle in a magnetic field, and others. The methods developed in this paper are important for the construction of accurate data-based methods for simulating the long-term dynamics of physical systems.


Relevant work
Machine Learning (ML) approaches to data assimilation and modeling have been successful in interpreting large amounts of unstructured data. However, for many scientific and industrial purposes, direct applications of big-data ML methods that do not incorporate the underlying engineering and physics remain challenging. To address these problems, Physics Informed Neural Networks (PINNs) have been developed (Raissi et al., 2019). As it is impossible to provide a complete overview of the PINNs literature here, we refer the reader to recent reviews (Cuomo et al., 2022; Karniadakis et al., 2021) for references and a more thorough discussion of the literature. In that approach, one assumes that the motion of the system u(t) is approximated by a law of motion

u̇ = f(u, t) ,   (1)

where u and f can be either finite-dimensional, forming a system of ODEs, or infinite-dimensional, forming a system of PDEs. In the 'vanilla' PINNs approach, the initial conditions, the boundary conditions, and the equations (1) themselves form parts of the loss function optimized during the learning procedure. The last part, computing the difference u̇ − f(u, t), can be achieved for a neural network approximating u, that is, approximating the mapping (u_0, t) → u. Similarly, for PDEs in (1 + 1) dimensions, for example, the solution u(x, t) is a mapping from the position x and time t to the space of solutions. In PINNs, one approximates this mapping by a neural network. For a given set of weights and activation functions in the neural network, one can compute the exact values of temporal and spatial derivatives at a given set of points in space and time (t_i, x_i) using automatic differentiation (Baydin et al., 2018). One then constructs the Mean-Square-Error (MSE) of the PDE residual of the approximate solution and of the boundary conditions, takes it as the cost function, and optimizes the network weights to minimize that cost.
The advantage of PINNs is their computational efficiency: speedups of more than 100,000x for evaluations of solutions of complex systems like weather have been reported (Bi et al., 2023; Pathak et al., 2022), although the quantification of such speedups still needs to be understood (Karniadakis et al., 2021). PINNs are thus extremely useful in practical applications, and have been implemented as open-source packages in Python and Julia, as well as in Nvidia's Modulus.
In spite of the successes, there are still many uncertainties in the applications of PINNs. For example, it is known that PINNs may struggle with systems whose dynamics involve widely different time scales (Karniadakis et al., 2021), or may have trouble finding the true minimum during optimization (Krishnapriyan et al., 2021). Finally, systems with very little friction, like Hamiltonian systems, are difficult to describe by standard PINNs, since evaluation errors created at every time step tend to accumulate with no chance of dissipating quickly enough. This paper develops a method of simulating a type of non-canonical Hamiltonian system, namely Lie-Poisson systems. These systems are extremely important in building models of physical systems in Eulerian or body coordinates. The method is also shown to generalize to some other types of non-canonical Poisson brackets. The remainder of the literature review is dedicated to work on the use of data-based methods in simulating Hamiltonian systems. In order to make the discussion more concrete, we introduce a brief background on Hamiltonian and Poisson systems; more details can be found in Section 2.
There are several approaches applying physics-informed data methods to computations of Hamiltonian systems. Most of the work has been focused on canonical Hamiltonian systems, i.e., systems where the law of motion (1) has the following particular structure: u is 2n-dimensional, u = (q, p), and there is a function H(q, p), the Hamiltonian, such that f in (1) becomes

f(u) = J ∇_u H ,   J = [[0, I_n], [−I_n, 0]] ,   (2)

with I_n the n × n identity matrix, leading to the canonical Hamilton equations for (q, p):

q̇ = ∂H/∂p ,   ṗ = −∂H/∂q .   (3)

The evolution of an arbitrary phase space function F(q, p) along a solution of (3) is then described by the canonical Poisson bracket:

Ḟ = {F, H} = Σ_{i=1}^{n} ( ∂F/∂q_i ∂H/∂p_i − ∂F/∂p_i ∂H/∂q_i ) .   (4)

Great care must be taken in computing numerical solutions for Hamiltonian systems, especially for long-term computations, as regular numerical methods lead to distortion of quantities that should be conserved, such as the total energy and, when appropriate, the momenta. In order to compute the long-term evolution of systems obeying Hamiltonian vector fields, whether exact or approximated by Hamiltonians derived from neural networks, one can use variational integrator methods (Hall and Leok, 2015; Leok and Shingel, 2012; Marsden and West, 2001) that conserve momenta-like quantities to machine precision. However, these integrators may be substantially more computationally intensive than non-structure-preserving methods.
In this paper, we focus on an alternative approach, namely, learning transformations in phase space that satisfy the appropriate properties. Recall that if ϕ(u) is a map in the phase space of equation (3) with u = (q, p), then this map is called symplectic if

(∂ϕ/∂u)^T J (∂ϕ/∂u) = J .

A well-known result of Poincaré states that the flow ϕ_t(u) of the canonical Hamiltonian system (3), sending initial conditions u to the solution at time t, is a symplectic map (Arnol'd, 2013; Marsden and Ratiu, 2013). Several authors have pursued the idea of searching directly for symplectic mappings obtained from the data, instead of finding the actual equations of the canonical systems and then solving them. Perhaps the first work in the area of computing the symplectic transformations directly was done in (Chen et al., 2020), where Symplectic Recurrent Neural Networks (SRNNs) were designed. The SRNNs computed an approximation to the symplectic transformation from data using the appropriate formulas for symplectic numerical methods. An alternative method of computation of canonical Hamilton equations for non-separable Hamiltonians was developed in (Xiong et al., 2020), building approximations to symplectic steps using a symplectic integrator suggested in (Tao, 2016). This technique was named Non-Separable Symplectic Neural Networks (NSSNNs).
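To make the symplecticity condition concrete, the following sketch (our illustration, not code from any of the cited works) verifies numerically that one Störmer-Verlet step for the pendulum Hamiltonian H = p²/2 − cos q satisfies (∂ϕ/∂u)^T J (∂ϕ/∂u) = J; the step size and initial point are arbitrary choices.

```python
import numpy as np

# Canonical symplectic matrix for n = 1 degree of freedom
J = np.array([[0.0, 1.0], [-1.0, 0.0]])

def leapfrog(u, h=0.1):
    """One Stormer-Verlet step for the pendulum H = p**2/2 - cos(q)."""
    q, p = u
    p = p - 0.5 * h * np.sin(q)   # half kick: dH/dq = sin(q)
    q = q + h * p                 # drift
    p = p - 0.5 * h * np.sin(q)   # half kick
    return np.array([q, p])

def jacobian(f, u, eps=1e-6):
    """Central finite-difference Jacobian of the map f at u."""
    n = len(u)
    Df = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = eps
        Df[:, j] = (f(u + e) - f(u - e)) / (2 * eps)
    return Df

u0 = np.array([0.7, -0.3])
Dphi = jacobian(leapfrog, u0)
# Deviation from the symplectic condition: essentially zero,
# up to finite-difference error
print(np.max(np.abs(Dphi.T @ J @ Dphi - J)))
```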
A more direct computation of symplectic mappings was done using three different methods in (Chen and Tao, 2021; Jin et al., 2020). The first approach, derived in (Jin et al., 2020), computes the dynamics through the composition of symplectic maps of a certain type, implemented under the name of SympNets. Another approach (Chen and Tao, 2021) derives the mapping directly using a generating function approach for canonical transformations, implemented as Generating Function Neural Networks (GFNNs). The GFNN approach allows for an explicit estimate of the error in a long-term simulation; in contrast, the error analysis of SympNets focuses on the local approximation error. Finally, Burby et al. (2020) developed HénonNets, neural networks based on Hénon mappings, capable of accurately learning Poincaré maps of Hamiltonian systems while preserving the symplectic structure. SympNets, GFNNs and HénonNets showed the ability to accurately simulate the long-term behavior of simple integrable systems like a pendulum or a single planet orbit, and satisfactory long-term simulation of chaotic systems like the planar three-body problem. Thus, learning symplectic transformations directly from data shows great promise for long-term simulations of Hamiltonian systems.
The method of SympNets was further extended to non-canonical Poisson systems in (Jin et al., 2022) by transforming the non-canonical form to local canonical coordinates using the Lie-Darboux theorem and subsequently applying SympNets, under the assumption that the dynamics occurs within a neighborhood in which the Poisson structure has constant rank. This method was named Poisson Neural Networks (PNNs). While this method can in principle treat any Poisson system by learning the transformation to the canonical variables and its inverse, we note that there are several difficulties associated with this approach: 1. It is known that the Lie-Darboux transformation is, in general, only local: a global transformation of the system to canonical coordinates may not exist, although such a transformation did exist for all examples presented in (Jin et al., 2022). When such non-locality occurs, one would need to define several transformations in overlapping domains and ensure smoothness between them. It is not clear how the network-based Lie-Darboux function would perform in that case.
2. If an absolutely accurate Lie-Darboux transformation coupled with a symplectic integrator was used to transform the coordinates to canonical form in (Jin et al., 2022), it would of course preserve all the Casimirs. However, any numerical errors in determining that transformation will yield corresponding errors in the Casimir evolution.
3. Since the errors in Casimir evolution are determined by the errors in the Lie-Darboux mapping, it is also not clear how these errors will accumulate over long-term evolution of the system.
The preservation of Casimirs is especially important for predicting the probabilistic properties of the long-term evolution of many trajectories in a Poisson system (Dubinkina and Frank, 2007). In particular, even when the errors in each individual component of the solution may accumulate over time, the fact that the solution stays exactly on the Casimir surface plays an essential role in the probability distribution in phase space. The SympNets and PNNs approach was further extended in (Bajārs, 2023), where the volume-preserving neural networks LocSympNets and their symmetric extensions SymLocSympNets were derived, based on the composition of mappings of a certain type. Consistently good accuracy of long-term solutions obtained by LocSympNets and SymLocSympNets was demonstrated for several problems, including a discretized linear advection equation, rigid body dynamics, and a particle in a magnetic field. Although the methods of (Bajārs, 2023) did not explicitly appeal to the Poisson structure of the equations, their efficiency was demonstrated on several problems that are essentially Poisson in nature, such as rigid body motion and the motion of a particle in a magnetic field. However, the extension of the theory to more general problems was hampered by the fact that the completeness of the activation matrices suggested in (Bajārs, 2023) was not yet known.
The limitations of the methods of (Jin et al., 2022) and (Bajārs, 2023) come from the fact that they address a general system of equations, where it is assumed that very little is known about the system apart from the fact that it is Hamiltonian, or Poisson. On the other hand, there is a large class of physical systems where the Poisson bracket is known exactly: in particular, Lie-Poisson systems. In such systems, the bracket does not come as a consequence of the equations of motion, but from general considerations of the Lie group symmetries of the system. In that case, the Poisson bracket has an explicit expression stemming from the expression for the Lie algebra bracket, and is called the Lie-Poisson bracket. The choice of the actual Hamiltonian is a secondary step coming from the physics. For example, the motion of the rigid body comes from the invariance of the Lagrangian/Hamiltonian in the body frame with respect to rigid rotations, i.e., SO(3) symmetry; see the introduction to the theory in Section 2 below. Table 1 provides an incomplete list of physical examples admitting a description through Lie-Poisson or closely related brackets.
In this paper, we develop methods of constructing the activation maps directly from the brackets, predicting dynamics built out of maps computed from particular explicit solutions of Lie-Poisson systems. The method is applicable to all finite-dimensional Lie-Poisson systems, as well as to any Poisson system where explicit integration of the appropriate equations for the appropriate transformations is available. The advantage of utilizing explicit integration in Lie-Poisson equations to drastically speed up calculations on every time step was already noticed in (McLachlan, 1993), although the application was limited to Hamiltonians of a certain form depending on the Poisson bracket. Our method is applicable to arbitrary Hamiltonians; in fact, the Hamiltonian does not need to be known for the construction of the neural network. However, it is assumed that the underlying symmetry and the appropriate Lie-Poisson bracket are known.

Table 1: Physical systems admitting a description through Lie-Poisson or closely related brackets.

Problem | Reference
Rigid body | (Holm et al., 2009), (Marsden and Ratiu, 2013)
Heavy top | (Holm et al., 2009), (Marsden and Ratiu, 2013)
Underwater vehicles | (Leonard, 1997), (Leonard and Marsden, 1997), (Holmes et al., 1998)
Plasmas | (Morrison, 1980), (Marsden and Weinstein, 1982), (Holm et al., 1985), (Holm and Tronci, 2010)
Fluids | (Marsden and Weinstein, 1983), (Marsden et al., 1984), (Holm et al., 1985), (Morrison, 1998), (Morrison et al., 2006)
Geophysical fluid dynamics | (Weinstein, 1983), (Holm, 1986), (Salmon, 2004)
Complex and nematic fluids | (Holm, 2002), (Gay-Balmaz and Ratiu, 2009), (Gay-Balmaz and Tronci, 2010)
Molecular strand dynamics | (Ellis et al., 2010), (Gay-Balmaz et al., 2012)
Fluid-structure interactions | (Gay-Balmaz and Putkaradze, 2019)
Hybrid quantum-classical dynamics | (Gay-Balmaz and Tronci, 2022), (Gay-Balmaz and Tronci, 2023)
This is indeed often the case, as the detailed expression of the Hamiltonian in terms of the variables is often only approximate and is driven by the modeling choices for the particular physical system (Tonti, 2013), in contrast to the Lie-Poisson bracket itself. The novel contributions of the paper are as follows: 1. We show that a large class of Poisson systems, namely Lie-Poisson systems obtained by symmetry reduction, allows an explicit construction of maps for every particular Lie-Poisson bracket.
2. These maps are global, in contrast to Lie-Darboux coordinates, which are only local and may need to be re-computed depending on the position of the solution in the configuration manifold.
3. By construction, these maps preserve the Casimirs of each particular bracket exactly, which is not necessarily the case for PNNs, LocSympNets, SymLocSympNets and any other methods known to us.
2. An introduction to the Lie-Poisson equations

General introduction: from Lagrangian to Hamiltonian description
We start with a brief introduction to the origin of Lie-Poisson systems, to introduce some notation and show why this particular type of approach is essential for many physical problems. We consider only finite-dimensional systems here, so as not to make the discussion too abstract. Suppose a mechanical system is described by the coordinates q and velocities q̇, with q lying on some configuration manifold Q of dimension n. Hamilton's action principle is based on the Lagrangian function L(q, q̇) (possibly depending on time) and on the action

S = ∫_{t_0}^{t_f} L(q, q̇) dt ,   (7)

and imposes the condition that the variations of the action must vanish for variations of q that are fixed on the boundaries. In the Lagrangian approach, one takes the variations of (7) and obtains the Euler-Lagrange equations, which are second-order equations in q. In the Hamiltonian approach, one introduces the momenta p = ∂L/∂q̇ and assumes that this relation can be inverted for each q, giving the velocities as q̇ = q̇(q, p). One then defines the Hamiltonian function H(q, p) = p · q̇(q, p) − L(q, q̇(q, p)), and the Euler-Lagrange equations of motion are equivalent to the canonical Hamilton equations:

q̇ = ∂H/∂p ,   ṗ = −∂H/∂q .   (8)

Any function F(q, p) evolves according to the canonical bracket

Ḟ = {F, H} = Σ_{i=1}^{n} ( ∂F/∂q_i ∂H/∂p_i − ∂F/∂p_i ∂H/∂q_i ) .   (9)

One can see that the bracket (9) satisfies the following properties, valid for any functions F(q, p), G(q, p) and H(q, p): bilinearity; antisymmetry, {F, H} = −{H, F}; the Leibniz rule, {FG, H} = F{G, H} + {F, H}G; and the Jacobi identity, {F, {G, H}} + {G, {H, F}} + {H, {F, G}} = 0. After defining the total phase space variable u = (q, p), the canonical bracket (4) and Hamilton's equations of motion can be written in coordinates as

{F, H} = (∇_u F)^T J ∇_u H ,   u̇ = J ∇_u H ,   J = [[0, I_n], [−I_n, 0]] ,   (10)

where I_n is the n × n unit matrix. From the definition of J above, J^{−1} = J^T = −J.
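The Legendre transform described above can be carried out symbolically. The following sketch (our illustration, using sympy, with a generic potential V(q) as an assumption) recovers the familiar Hamiltonian for a particle of mass m in a potential:

```python
import sympy as sp

q, qdot, p, m = sp.symbols('q qdot p m', positive=True)
V = sp.Function('V')

# A simple mechanical Lagrangian: kinetic minus potential energy
L = sp.Rational(1, 2) * m * qdot**2 - V(q)

p_def = sp.diff(L, qdot)                          # momentum p = dL/dqdot = m*qdot
qdot_of_p = sp.solve(sp.Eq(p, p_def), qdot)[0]    # invert: qdot = p/m

# Legendre transform: H(q, p) = p*qdot - L, with qdot eliminated
H = sp.simplify(p * qdot_of_p - L.subs(qdot, qdot_of_p))

assert sp.simplify(H - (p**2 / (2 * m) + V(q))) == 0
```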

General Poisson brackets
A general Poisson bracket {F, H} satisfies all four properties of the canonical Poisson bracket above, but cannot necessarily be expressed in coordinates as (10). Instead, the bracket and the corresponding equations of motion are described more generally in local coordinates as

{F, H}(u) = (∇_u F)^T B(u) ∇_u H ,   u̇ = B(u) ∇_u H .   (11)

In order for the bracket (11) to be Poisson, the matrix B(u), also known as the Poisson tensor, must be antisymmetric and satisfy a condition (the Jacobi identity) involving both the matrix B(u) and its derivatives. If such a bracket can be found, the system is called (non-canonical) Poisson. Special attention should be paid to the case when the matrix B is degenerate. In that case, there very often are special functions for that particular bracket, called Casimirs, which are conserved for any Hamiltonian. By definition, a Casimir function C satisfies

{C, F} = 0 for every function F, i.e., B(u) ∇_u C(u) = 0 .   (12)

Any evolution must occur in such a way that all Casimirs of the system are conserved. Geometrically, the motion is only possible on the intersection of the level sets of the Casimirs, no matter what the Hamiltonian of the system is.
As we have seen, the motion of mechanical systems without friction is governed by canonical Poisson brackets, so the appearance of a non-canonical bracket (11) is somewhat mysterious. It turns out that in the presence of symmetries, one can drastically reduce the degrees of freedom of the canonical Hamiltonian system by expressing the dynamics in terms of reduced coordinates, such as body or spatial coordinates. In such reduced variables, the dynamics is still governed by a Poisson bracket, but one that is no longer canonical. One of the most important examples is the case of a system whose configuration manifold is a Lie group: Q = G. Let us consider the case of a rigid body. The configuration manifold is the Lie group of rotation matrices in R^3, known as G = SO(3). This group consists of all 3 × 3 orthogonal matrices Λ (i.e., Λ^T Λ = I_3) with determinant equal to 1. The Lagrangian depends on the variables (Λ, Λ̇), and is just the kinetic energy. If one were to write either the Euler-Lagrange equations or the canonical Hamilton equations for Λ obtained by parameterizing SO(3) using a local representation of rotation matrices, such as rotation angles, one would end up with complicated and unwieldy equations. Instead, the equations of motion for the rigid body can be efficiently written in terms of the angular velocity in the body frame Ω, using the tensor of inertia I, as:

I Ω̇ = (I Ω) × Ω .   (13)

As it turns out, equations (13) can be understood from the point of view of symmetry. The rigid body kinetic energy is invariant with respect to left rotations Λ → RΛ, where R ∈ SO(3) is a fixed rotation matrix. We can thus express the kinetic energy in terms of the antisymmetric 3 × 3 matrices Ω̂ = Λ^T Λ̇, which take on the role of angular velocities. One can recover the vector Ω from the matrix Ω̂ via Ω̂_{ij} = ϵ_{ijk} Ω_k, with ϵ_{ijk} being the Levi-Civita symbol.
The equations of motion (13) can be written in terms of the momenta Π = IΩ with the use of the rigid body bracket:

Ḟ = {F, H}(Π) ,   {F, G}(Π) = −Π · (∇_Π F × ∇_Π G) .   (14)

The bracket in (14), as it turns out, satisfies the four properties of a Poisson bracket and is expressed in terms of the Lie bracket on the Lie algebra of SO(3), given by the vector product, stemming from the invariance of the system with respect to SO(3) rotations. These ideas can be generalized as follows; see (Marsden and Ratiu, 2013, Chap. 10). For a general system defined on a Lie group G, one naturally has the associated Lie algebra g with Lie bracket denoted [α, β] for α, β ∈ g. If {e_a}, a = 1, . . . , n, is a basis of g, then the Lie bracket is locally expressed in terms of the structure constants C^d_{ab} such that [e_a, e_b] = C^d_{ab} e_d.
Let us denote by ⟨µ, α⟩ the duality pairing between vectors α in the Lie algebra g and co-vectors (or momenta) µ in the dual space g* of g. The partial derivatives of functions F, H : g* → R with respect to µ thus belong to g, and one can define the Lie-Poisson bracket derived from the Lie bracket as follows:

{F, H}_±(µ) = ± ⟨µ, [∂F/∂µ, ∂H/∂µ]⟩ .   (16)

We refer to Appendix A for the explanation of the ± sign and for the relation between this bracket and the canonical bracket. In terms of coordinates, the bracket (16) is a particular case of (11) with the matrix B(µ) defined as

B_{ab}(µ) = ± C^d_{ab} µ_d ,   (17)

and the Lie-Poisson equations are expressed in coordinates as

µ̇_a = ± C^d_{ab} µ_d ∂H/∂µ_b .   (18)

One can verify that the Lie-Poisson bracket (16) satisfies all the conditions of a general Poisson bracket. While Lie-Poisson brackets appear from fundamental considerations of the symmetries of the physical system, the Hamiltonian is a modelling choice related to the physics. Thus, we assume that if a system possesses a Lie-Poisson bracket, the bracket is known explicitly a priori and does not need to be computed or determined from the data. However, the dependence of the Hamiltonian on its arguments and parameters is not known and must be determined.
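As a concrete check of this construction, the sketch below (our numpy illustration; the choice of so(3), whose structure constants are given by the Levi-Civita symbol, is an assumption for the example) assembles the Poisson tensor B(µ) from the structure constants and verifies that it is antisymmetric and that the gradient of the Casimir C = |µ|²/2 lies in its kernel:

```python
import numpy as np

# Levi-Civita symbol: structure constants of so(3), [e_a, e_b] = eps_{abd} e_d
eps = np.zeros((3, 3, 3))
for a, b, d in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[a, b, d], eps[b, a, d] = 1.0, -1.0

def B(mu):
    """Poisson tensor B_ab(mu) = C^d_ab mu_d for the so(3)* Lie-Poisson bracket."""
    return np.einsum('abd,d->ab', eps, mu)

mu = np.array([0.4, -1.1, 0.7])
assert np.allclose(B(mu), -B(mu).T)       # B is antisymmetric
grad_C = mu                               # gradient of the Casimir C = |mu|^2 / 2
assert np.allclose(B(mu) @ grad_C, 0.0)   # grad C lies in the kernel of B
```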

LPNets as Poisson maps for a general system
We are now ready to consider the theory of structure-preserving neural networks. We follow the ideas originally introduced in (Jin et al., 2020) as SympNets. The idea behind SympNets is to learn in the space of available symplectic transformations for a canonical system. Further, (Jin et al., 2022) introduced a generalization of this method to an arbitrary Poisson system by using the Lie-Darboux theorem, stating that every Poisson system can be locally transformed into canonical form, implementing these ideas as Poisson Neural Networks (PNNs). This theory was further extended in (Patel et al., 2022; Zhang et al., 2022) to thermodynamic systems using the GENERIC metriplectic bracket approach.
Mathematically speaking, previous works in this avenue of thinking (Jin et al., 2020, 2022; Bajārs, 2023) seek to determine a mapping ϕ_h taking y_i to the next computational point y_{i+1} = ϕ_h(y_i), satisfying as many restrictions preserving the structure of the actual flow as possible. The authors developed universal mappings that are symplectic in the canonical coordinates (q, p), and sought the solution as a combination of these mappings. Since the mappings are universal for all finite-dimensional Hamiltonian systems, one then needed to prove the convergence and completeness of the combinations of these mappings, especially when coupled to the Lie-Darboux theorem mapping the Poisson system to canonical form.
We build on the ideas of SympNets and PNNs by directly constructing the transformations related to the Poisson systems (Poisson maps) for the known Lie-Poisson bracket, and using them as the building elements of LPNets.
We first recall the useful concept of Poisson maps.
Definition 3.1. (Marsden and Ratiu, 2013, §10.3) Let (P_i, {·, ·}_i), i = 1, 2, be two Poisson manifolds. A mapping f : P_1 → P_2 is Poisson if it preserves the Poisson bracket, i.e., for all functions F, G : P_2 → R,

{F ∘ f, G ∘ f}_1 = {F, G}_2 ∘ f .   (19)

A critical piece of information for our further progress is contained in the following result. Let ϕ_t(u_0) be the flow of the Poisson system associated with H, see (11), which maps the initial conditions u_0 to the solution u(t) at time t. Then the mapping ϕ_t is Poisson, i.e., it satisfies

{F ∘ ϕ_t, G ∘ ϕ_t} = {F, G} ∘ ϕ_t   (20)

for all functions F, G : P → R.
Following this theorem, we can design Poisson maps for a particular Poisson bracket using a sequence of flows created by simplified Hamiltonians for the Lie-Poisson dynamics. The advantage of this approach is that the resulting transformations will be Poisson and will be able to approximate any flow locally for the particular system considered. The disadvantage of our approach is inherently intertwined with the advantages: the mappings have to be constructed explicitly for every Lie-Poisson bracket. Fortunately, as we show below, this is possible as these mappings are derived as solutions of a linear system of ODEs. The method presented here is reminiscent of the Hamiltonian splitting methods used in numerical analysis (McLachlan, 1993;McLachlan and Quispel, 2002), reformulated for the purpose of data-based computations and the use of neural networks to find appropriate parameters of the Hamiltonians.

A general application to finite-dimensional Lie groups
The key to this paper lies in considering the evolution of the momentum µ from equations (18) for particular expressions of the Hamiltonian, namely Hamiltonians linear in the momenta,

H(µ) = α^a µ_a ,   (21)

where the α^a are constants that are to be found from the learning procedure. The flow generated by the Hamiltonian (21) is given by a linear equation in µ that can be written in two equivalent ways (choosing the + sign in (18)):

µ̇_a = C^d_{ab} α^b µ_d ,   i.e.,   µ̇ = M(α) µ = N(µ) α .   (22)

The number of possible dimensions of α is exactly equal to the dimension of the momentum space. However, the "effective" dimension of this space may be less, and is related to the dimension n of the Lie algebra minus the dimension of the kernel of the operator N(µ) : g → g*, see Remark 4.1. The operators M(α), acting on the space of momenta µ, and N(µ), acting on the space of α, can be described in coordinate-free form as ad*_α □ and ad*_□ µ, respectively; see Appendix A. Let us assume that there are d independent Casimir functions C_j(µ), j = 1, . . . , d. From (12) and (17), such functions satisfy C^d_{ab} (∂C_j/∂µ_b) µ_d = 0 for all µ, i.e., ∂C_j/∂µ must belong to the kernel of the operator N(µ). We shall consider the effective dimension of the space of all possible α to be exactly n − d, where n is the dimension of the momentum space and d is the number of independent Casimirs. Thus, locally, the vectors α in this effective space form exactly the right number of local tangent vectors to the intersection of the Casimir surfaces to reach any point locally using a flow generated by the Poisson map. The next step is to compute that map exactly.
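For so(3), the two equivalent forms of the linear flow can be checked directly: with H = α · µ, the right-hand side is µ × α, which is linear both in µ and in α. A minimal sketch (our notation; the hat map is the standard isomorphism between R³ and 3 × 3 antisymmetric matrices):

```python
import numpy as np

def hat(v):
    """Hat map: hat(v) @ w == np.cross(v, w)."""
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])

alpha = np.array([0.3, -0.5, 1.2])
mu = np.array([1.0, 0.2, -0.7])

M = -hat(alpha)   # mu_dot = M(alpha) mu, linear in mu
N = hat(mu)       # mu_dot = N(mu) alpha, linear in alpha
rhs = np.cross(mu, alpha)

assert np.allclose(M @ mu, rhs)     # both operators give the same
assert np.allclose(N @ alpha, rhs)  # right-hand side mu x alpha
```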
Remark 4.1 (On the effective number of dimensions). In general, the number of null directions of N(µ_0), denoted k(µ_0), may depend on the momentum µ_0 ∈ g* (or, more precisely, on the actual coadjoint orbit the solution is on), whereas the number of independent Casimirs d is fixed. The dimension of the image of the map ξ ∈ g → N(µ_0)ξ = ad*_ξ µ_0 ∈ g* is the dimension of the coadjoint orbit of µ_0, denoted O_{µ_0}. Thus, in general, we have k(µ_0) + dim(O_{µ_0}) = n, and d ≤ k(µ_0) for almost all µ_0, but not necessarily d = k(µ_0). However, the data are exceptionally unlikely to lie on orbits with high codimension k(µ) > d, and we are going to assume that the effective number of dimensions of α is n − d. If the data were obtained from one of the orbits with high codimension, one would amend formula (26) below using the information about that exceptional orbit. Note that our method preserves the general form of coadjoint orbits in all cases, whether the orbit is exceptional or not. This fact could be useful in data-based computations of exceptional orbits; it is an interesting question which we will address in follow-up work.
Equations (22) are linear differential equations in µ, and their solutions can (in principle) be found in explicit form. This solution can be written compactly as

µ(t) = T(t, α) µ_0 ,   T(t, α) = exp(t M(α)) .   (23)

The mappings T(t, α) defined in (23) satisfy all the requirements for the building blocks of the neural networks. Let T_a(t, α_a), a = 1, . . . , n, denote the map obtained by setting the a-th component of α equal to α_a and all other components to zero. We divide each time step into n substeps h_a with Σ_a h_a = h. Notice that the number of substeps is n and not n − d, where d is the number of Casimirs, as we explain later.
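For so(3), the map T(t, α) can be computed with a matrix exponential. The sketch below (our illustration, assuming scipy is available) confirms that the resulting map is a rotation and preserves the Casimir |µ|² to machine precision:

```python
import numpy as np
from scipy.linalg import expm

def hat(v):
    """Hat map: hat(v) @ w == np.cross(v, w)."""
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])

def T(t, alpha):
    """Exact flow of mu_dot = mu x alpha: here M(alpha) = -hat(alpha)."""
    return expm(-t * hat(alpha))

alpha = np.array([0.0, 0.0, 2.0])    # only the third component of alpha active
mu0 = np.array([0.8, -0.3, 0.5])
mu1 = T(0.1, alpha) @ mu0

# The Casimir |mu|^2 is preserved to machine precision:
assert np.isclose(mu1 @ mu1, mu0 @ mu0)
# T is orthogonal (a rotation about alpha), hence a Poisson map in this case:
assert np.allclose(T(0.1, alpha).T @ T(0.1, alpha), np.eye(3))
```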
These mappings satisfy the following conditions: 1. The mappings T_a(h_a, α_a) are Poisson, see (20), for the Lie-Poisson bracket; 2. The mappings conserve all Casimirs of the system; 3. Any two points close enough to each other on the same Casimir surface can be connected using a combination of mappings T_a(h_a, α_a).
The procedure of constructing LPNets is as follows: 1. Find the explicit form of the solutions (23) by solving equations (22).
2. Define ᾱ = (α_1, . . . , α_n) and the composition map

Φ_h(ᾱ) = T_n(h_n, α_n) ∘ · · · ∘ T_1(h_1, α_1) .   (24)

For the set of N data pairs (µ^0_i, µ^f_i), i = 1, . . . , N, set up a minimization procedure to find the set of numbers ᾱ_i for each µ^0_i, minimizing the "individual" square loss

ᾱ_i = argmin_ᾱ |Φ_h(ᾱ) µ^0_i − µ^f_i|^2 .   (25)

In most of the examples we consider here, we can find ᾱ_i analytically; however, in more general situations, one could also utilize a root-finding or minimization procedure to find ᾱ_i for every pair of data points.
3. Since the ᾱ are only defined up to a vector normal to the Casimir surface, in order to make the data consistent we need to project out the appropriate components along the Casimirs. We will need to find coefficients p_j, j = 1, . . . , d, such that the projection of ᾱ onto the gradients of the Casimirs vanishes:

⟨ᾱ − Σ_{j=1}^{d} p_j ∇C_j(µ), ∇C_k(µ)⟩ = 0 ,   k = 1, . . . , d ,   (26)

where C_j, j = 1, . . . , d, are a set of d independent Casimir functions.
4. Create a neural network approximating the mapping µ_i → ᾱ_i. This function will be denoted as ᾱ = NN(µ).
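Step 3 amounts to solving a small d × d Gram system at each data point. A minimal sketch of this projection (the helper name project_out_casimirs is ours and hypothetical), illustrated on so(3) with its single Casimir C = |µ|²/2:

```python
import numpy as np

def project_out_casimirs(alpha_bar, casimir_grads):
    """Remove from alpha_bar the components along the Casimir gradients.

    casimir_grads: (d, n) array whose rows are grad C_j evaluated at mu.
    Solves the Gram system so that <alpha_bar - sum_j p_j grad C_j, grad C_k> = 0.
    """
    G = casimir_grads @ casimir_grads.T   # d x d Gram matrix
    g = casimir_grads @ alpha_bar         # projections of alpha_bar on the gradients
    p = np.linalg.solve(G, g)
    return alpha_bar - casimir_grads.T @ p

# so(3) example: one Casimir C = |mu|^2/2, so grad C = mu
mu = np.array([1.0, -2.0, 0.5])
alpha_bar = np.array([0.3, 0.3, -1.0])
alpha_proj = project_out_casimirs(alpha_bar, mu[None, :])

assert np.isclose(alpha_proj @ mu, 0.0)   # orthogonal to grad C, as required
```

This assumes the Casimir gradients are linearly independent at µ, so that the Gram matrix is invertible.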
Since the operators T_a, in general, do not commute, the composition order should be fixed ahead of time and also be the same for all data pairs. A different choice of the composition order will lead to a different (but of course equivalent) set of parameters α. The choice of the composition order of the T_a has to be preserved in the prediction step as well. Also, we do not consider different time substeps h_a here, taking them all to be equal. It is possible that the choice of h_a can be made to improve accuracy or convergence properties of the scheme.
The solution starting at a given initial condition µ_0 can then be evaluated using the neural network. If µ = µ_j at time step j, then the value of the solution at the next step is computed as

µ_{j+1} = T_n(h_n, α_n) ∘ · · · ∘ T_1(h_1, α_1) µ_j ,   ᾱ = (α_1, . . . , α_n) = NN(µ_j) .   (27)

Note that we never compute the actual Hamiltonian, its gradients, or the equations of motion. Instead, we just compute the composition of Poisson transformations reproducing the dynamics in the phase space of some unknown Poisson system with the known Lie-Poisson bracket.
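The fitting logic of step 2 can be sketched end to end for so(3). In the toy example below (all helper names and the synthetic data-generation choices are ours, not the paper's), ᾱ is recovered for a single data pair by least squares over the composition of three single-axis maps T_a; the data pair is generated from free rigid body dynamics and projected onto the Casimir sphere:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import expm
from scipy.optimize import least_squares

def hat(v):
    """Hat map: hat(v) @ w == np.cross(v, w)."""
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])

h = 0.1            # full time step, split into three equal substeps
E = np.eye(3)      # basis e_1, e_2, e_3 of so(3)

def compose(alpha_bar, mu):
    """Apply T_3 o T_2 o T_1, each a single-axis flow exp(-(h/3) alpha_a hat(e_a))."""
    for a in range(3):
        mu = expm(-(h / 3) * alpha_bar[a] * hat(E[a])) @ mu
    return mu

# Synthetic ground-truth pair from free rigid body dynamics Pi_dot = Pi x I^{-1} Pi:
I_inv = np.diag([1.0, 0.5, 1.0 / 3.0])
mu0 = np.array([0.6, -0.4, 0.8])
mu_f = solve_ivp(lambda t, m: np.cross(m, I_inv @ m), (0.0, h), mu0,
                 rtol=1e-12, atol=1e-12).y[:, -1]
mu_f *= np.linalg.norm(mu0) / np.linalg.norm(mu_f)   # enforce the Casimir in the data

# Step 2: recover alpha_bar for this data pair by least squares:
sol = least_squares(lambda ab: compose(ab, mu0) - mu_f, x0=np.zeros(3))

# The composition reproduces the data point while staying on the Casimir sphere:
assert np.linalg.norm(compose(sol.x, mu0) - mu_f) < 1e-6
assert np.isclose(np.linalg.norm(compose(sol.x, mu0)), np.linalg.norm(mu0))
```

In the full method, the recovered ᾱ_i over many data pairs would train a network ᾱ = NN(µ) used in the prediction step; the sketch stops at a single pair.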
We will now apply this procedure to several particular examples of physical systems possessing a Lie-Poisson bracket. We shall call this method Local LPNets, or simply LPNets, as the exponential representation of the map is only defined locally. The advantage of this method is that, when operating on Lie groups, we are guaranteed to be able to reach any point locally with an exponential map, so completeness is automatic. However, this mathematical simplicity has to be compensated by the necessity of applying a neural network to learn the mappings from data points in the neighborhood of a trajectory. After our description of LPNets, we develop a more complex procedure which derives Lie-Poisson activation modules that directly extend the work of (Bajārs, 2023; Jin et al., 2020, 2022). We call these methods global Lie-Poisson networks, or G-LPNets. Using the example of rigid body motion, we show that G-LPNets provide a promising, simple, accurate and highly computationally effective neural network.
A more general derivation performed in the coordinate-free form and in the general language of modern geometric approach is presented in Appendix A.

Test cases for LPNets
We choose the same test cases as in (Bajārs, 2023; Jin et al., 2022), with an additional test for SE(3) (the underwater vehicle). No detailed comparisons are performed between our results and those in (Bajārs, 2023; Jin et al., 2022). This is because the accuracy of each method depends strongly on the structure of the neural network, the accuracy and distribution of the data used in training, and the specific learning procedures employed, which prevents a fair comparison between methods. Instead, we outline the performance of LPNets on these problems and contrast it with the general results from (Bajārs, 2023; Jin et al., 2022). In all cases, LPNets conserve the Casimir functions to machine precision, unlike the other approaches.
In what follows, we develop the prediction of parameters for LPNets using a dense neural network structure. Naturally, such a dense network requires an appropriate number of data points to avoid overfitting. In Section 6, we show how to reduce the size of that network and achieve accuracy over the whole phase space with a particular network structure which we call G-LPNets.

Rigid body dynamics
The motion of a rigid body can be described as dynamics on the Lie group SO(3). The (Lie-)Poisson bracket, the Hamiltonian, and the corresponding equations of motion for the momentum Π (measured in the body frame of reference) are (Holm et al., 2009)

    {F, G}(Π) = −Π · (∇F × ∇G) ,   H(Π) = ½ Π · I⁻¹Π ,   Π̇ = Π × I⁻¹Π ,

where I is the tensor of inertia. Suppose there is a sequence of pairs of initial and final points of the transformation (Π⁰ᵢ, Πᶠᵢ), i = 1, . . . , N, coming from some information that we call ground truth. If that sequence comes from a single trajectory of length N, it has the form Πᶠᵢ = Π⁰ᵢ₊₁, i.e. the pairs are consecutive snapshots of that trajectory. However, our method does not explicitly assume the existence of a single trajectory for learning.
To find the Poisson map approximating the motion of the system at every time step i, let us consider Hamiltonians Hᵢ linear in the momenta, i.e. having the form Hᵢ(Π) = Aᵢ · Π, where Aᵢ is an unknown constant vector that is different for every pair of points. The Hamiltonian flow generated by such a Hamiltonian is Π̇ = Π × Aᵢ. Note that we could also consider a Hamiltonian of the form H(Π) = f(A · Π). The dynamics induced by this Hamiltonian preserve the quantity A · Π, so that extension corresponds to a simple rescaling of A and does not bring extra insight into the problem. The motion generated by Hᵢ is a rotation of the vector Π about the axis Aᵢ with a constant angular velocity. The flow preserves the Lie-Poisson bracket, since it is a Hamiltonian flow with respect to the same bracket, and it preserves the Casimir C(Π) = |Π|² exactly. After the time t = h, the flow rotates the vector of angular momentum Π by the angle ϕ(Aᵢ) = |Aᵢ| h around the axis n_{Aᵢ} = Aᵢ/|Aᵢ|. In other words, if R(n, ϕ) is the matrix of rotation around the axis n by the angle ϕ, the flow transforms the momentum as Π → R(n_{Aᵢ}, ϕ(Aᵢ)) Π.
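As a minimal illustration, the time-h flow of such a linear Hamiltonian can be implemented with Rodrigues' rotation formula. The sketch below (the function name and stand-alone structure are ours, not taken from the paper's code) demonstrates that the map preserves the Casimir |Π|² to machine precision:

```python
import numpy as np

def rotate_momentum(Pi, A, h):
    """Rotate Pi about the axis n = A/|A| by the angle phi = |A| * h
    (Rodrigues' rotation formula). This is the time-h flow of a Hamiltonian
    linear in the momenta and preserves the Casimir |Pi|^2 exactly."""
    norm_A = np.linalg.norm(A)
    if norm_A == 0.0:
        return np.asarray(Pi, dtype=float).copy()
    phi = norm_A * h                      # rotation angle after time h
    n = A / norm_A                        # unit rotation axis
    return (Pi * np.cos(phi)
            + np.cross(n, Pi) * np.sin(phi)
            + n * np.dot(n, Pi) * (1.0 - np.cos(phi)))
```

Since the map is an exact rotation, applying it with the parameter −A undoes the step, which is a convenient consistency check.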
Everywhere in this paper, the ground truth is obtained by the BDF integrator in Python's SciPy package, with relative and absolute tolerances set to 10^−13 and 10^−14, respectively. Note that this numerical method is not expected to conserve Casimirs, so our method will actually be more precise than the ground truth in that respect. We note that while Lie-Poisson integrators preserving the Lie-Poisson structure do exist (Marsden et al., 1999), a precise implementation needs to be rederived in explicit form for each particular Lie-Poisson system. We found it more appropriate to use a high-accuracy algorithm that is common to all problems, rather than build an algorithm tailored to every particular problem. The accuracy of the BDF algorithm is more than sufficient for our ground truth calculation and for comparisons with previous works. Thus, to be consistent, we used a high-precision non-symplectic integrator for all cases, carefully checking its accuracy in all applications.
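A ground-truth generator of this kind can be sketched with SciPy's solve_ivp and the BDF method at the stated tolerances; the inertia tensor and initial condition below are illustrative placeholders, not the paper's values:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative diagonal inertia tensor (placeholder, not the paper's value).
I_diag = np.array([1.0, 2.0, 3.0])

def rigid_body(t, Pi):
    # Euler equations: dPi/dt = Pi x Omega, with Omega = I^{-1} Pi.
    return np.cross(Pi, Pi / I_diag)

Pi0 = np.array([1.0, 0.5, -0.5])     # illustrative initial condition
h, N = 0.1, 100                      # snapshot spacing and number of pairs
sol = solve_ivp(rigid_body, (0.0, N * h), Pi0, method="BDF",
                rtol=1e-13, atol=1e-14,
                t_eval=np.linspace(0.0, N * h, N + 1))
# Consecutive snapshots form the (Pi_i^0, Pi_i^f) training pairs.
pairs = list(zip(sol.y.T[:-1], sol.y.T[1:]))
```

At these tolerances the Casimir |Π|² drifts only at the level of the integrator's error, which is what makes the machine-precision conservation of LPNets visible in the comparisons.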
Data preparation. We need to determine the three components of the vector Aᵢ for each data pair. Given a sequence of begin and end pairs of momenta (Π⁰ᵢ, Πᶠᵢ), we compute Aᵢ as a function of Π⁰ᵢ as

    Aᵢ = (θᵢ / h) (Π⁰ᵢ × Πᶠᵢ) / |Π⁰ᵢ × Πᶠᵢ| ,

where θᵢ is the angle of rotation from Π⁰ᵢ to Πᶠᵢ along the shortest path on the sphere |Π| = const. The cross product contains the information for the direction normal to both Π⁰ᵢ and Πᶠᵢ, while θᵢ encodes the angle of rotation, as described above. The factor 1/h is introduced to normalize the output data to be of order 1.
We could of course obtain Aᵢ by matching three subsequent rotations, say through Euler angles, and subtracting the corresponding rotation about the axis Π⁰ᵢ or Πᶠᵢ, applying equation (25) directly. However, the vector cross product, available only for SO(3), provides a simple and efficient alternative to this more complex procedure.
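The cross-product construction of the training target can be sketched as follows (a minimal version in our notation, assuming exact data):

```python
import numpy as np

def axis_angle_target(Pi0, Pif, h):
    """Training target A for one data pair (Pi0, Pif): the rotation axis comes
    from the cross product, the angle from the shortest rotation taking Pi0 to
    Pif on the sphere |Pi| = const, and the 1/h factor normalizes the output."""
    n = np.cross(Pi0, Pif)
    n /= np.linalg.norm(n)                       # unit normal to both momenta
    cosang = np.dot(Pi0, Pif) / (np.linalg.norm(Pi0) * np.linalg.norm(Pif))
    theta = np.arccos(np.clip(cosang, -1.0, 1.0))  # shortest rotation angle
    return (theta / h) * n
```

For example, a pair related by a rotation of 0.3 rad about the z-axis with h = 0.1 yields A = (0, 0, 3).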
Neural network. We train a standard dense neural network on the data to learn the mapping Π → A. The neural network has the three components of Π as inputs and the three components of A as outputs. Here and everywhere else, we utilize the package TensorFlow. In the results shown in Figure 1, the learning is done on N = 1000 pairs produced by a single trajectory, with the interval between trajectory points given by h = 0.1. The neural network has three hidden layers of 16 neurons with the sigmoid activation function, for a total of 659 trainable parameters. Out of the 1000 input pairs, 80% are used for training and 20% for evaluation, with the loss measured as the mean square discrepancy between R(n_{Aᵢ}, ϕ(Aᵢ)) Π⁰ᵢ and Πᶠᵢ. The Adam optimization algorithm is used with the learning rate starting at 10^−3. The loss and validation loss reach values of approximately 3.8 · 10^−9 and 4.1 · 10^−9 after 10^5 epochs.
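The architecture described above can be sketched in Keras as follows; this is only the network skeleton, with a plain MSE loss on A standing in for the paper's loss on the rotated momenta (a simplification we make here for brevity):

```python
import tensorflow as tf

# Dense network for the rigid body: three hidden layers of 16 sigmoid neurons
# mapping the momentum Pi (3 inputs) to the vector A (3 outputs, linear).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(16, activation="sigmoid"),
    tf.keras.layers.Dense(16, activation="sigmoid"),
    tf.keras.layers.Dense(16, activation="sigmoid"),
    tf.keras.layers.Dense(3),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="mse")
```

With a linear output layer, this architecture has exactly the 659 trainable parameters quoted in the text (3·16+16 + 2·(16·16+16) + 16·3+3).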
Prediction. The predicted trajectory starts at the endpoint of the learning trajectory, Π₀ ≃ (0.43, 1.33, −0.21). On each step, once Π_{j−1} is known, the neural network predicts the vector Aⱼ, from which we compute the rotation axis n_{Aⱼ} = Aⱼ/|Aⱼ| and the angle ϕⱼ = |Aⱼ| h. This prediction is used to produce the mappings Πⱼ = R(n_{Aⱼ}, ϕⱼ) Π_{j−1}, j = 1, . . . , m. The prediction by LPNets is compared with the ground truth obtained by the high-accuracy ODE solver described above. We perform 10000 time steps to reach t = 1000 and present the results in Figures 1 and 2. In Figure 2, we present the results for the conservation of the Hamiltonian H(Π) and the Casimir C(Π) = |Π|². The Hamiltonian is preserved to about 0.01% relative accuracy. The Casimir in the ground truth solution is preserved to about 10^−9 accuracy; in the LPNets solution, the Casimir is preserved to machine precision, far exceeding the accuracy possible for the ground truth for this variable. On the right side of this panel, we plot the error of the solution as the L² norm of the deviation between the two solutions (ground truth and LPNets) at each t. The deviation grows roughly linearly in time, to values of about 0.5 after the time t = 1000. This may come as a surprise given the excellent agreement on the right-hand side of Figure 1 for all t. This phenomenon has a simple explanation: the high accuracy of the conservation laws presented in Figure 2 forces the solution to stay on the intersection of H = const (an ellipsoid) and C = const (a sphere) to high accuracy, so the LPNets solution remains on the correct orbit and the error manifests itself mostly as a slow drift of the phase along that orbit.
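The prediction loop can be sketched generically as follows; `predict_A` is a hypothetical stand-in for the trained network, and the rotation is the same Rodrigues map discussed above. By construction, every iterate stays on the Casimir sphere:

```python
import numpy as np

def rollout(Pi0, predict_A, h, n_steps):
    """Iterate the learned Poisson map: at each step, predict A(Pi) and rotate
    Pi about A by the angle |A| * h. `predict_A` stands in for the network."""
    traj = [np.asarray(Pi0, dtype=float)]
    for _ in range(n_steps):
        Pi = traj[-1]
        A = predict_A(Pi)
        phi = np.linalg.norm(A) * h
        if phi > 0.0:
            n = A / np.linalg.norm(A)
            Pi = (Pi * np.cos(phi) + np.cross(n, Pi) * np.sin(phi)
                  + n * np.dot(n, Pi) * (1.0 - np.cos(phi)))
        traj.append(Pi)
    return np.array(traj)
```

Even with an arbitrary (e.g. untrained) predictor, the rollout conserves |Π|² to machine precision; training only affects how closely the phase along the orbit tracks the true dynamics.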
Comparison with the previous literature. The case of rigid body motion was considered in (Bajārs, 2023). The appropriate comparison is the learning of a single trajectory of the rigid body dynamics contained in §4.2.3 of that paper. The error of a single trajectory (ground truth vs. the solution obtained by the neural network) is of the same order as our results. The error in the Hamiltonian is somewhat better in our case, with the relative error being about 10^−4 (0.01%) vs. 0.008 in (Bajārs, 2023). However, that number depends on the particular realization of both neural networks and learning data, as we outlined above. The value of the Casimir C(Π) = |Π|² is conserved to machine precision in our case, whereas it follows the general accuracy of the computations in the previous work on the subject.
Learning general dynamics of the rigid body. In the above calculation, we followed the method of (Bajārs, 2023; Jin et al., 2022) and learned the dynamics continuing a single trajectory. However, it is also possible to extend LPNets to learn several trajectories simultaneously, and to predict the dynamics of a trajectory the method has not seen. We take the initial condition Π̄₀ = (1/√2, −1/√2, 1) and compute 20 ground truth trajectories with initial conditions chosen in its neighborhood. Each trajectory generates 200 pairs of ground truth mappings between the momenta at neighboring points, a total of 4000 data points. A neural network is constructed with a structure similar to the one used for learning a single trajectory, having three inner layers with 32 neurons each, with sigmoid activation functions, and a total of 2339 trainable parameters. The network is trained using the Adam algorithm with an initial learning rate of 10^−3 decaying exponentially to 10^−4, over 10^5 epochs. The final training and validation losses are slightly below 10^−6 and 10^−5, respectively.
Trajectory prediction using LPNets. A trajectory is then constructed with the initial condition Π₀ = Π̄₀, iterating over 10000 steps up to the time t = 1000. Even though all the learning data were taken a finite distance from this solution, LPNets faithfully reproduce the ground truth, as shown in Figure 3. While the solutions were computed until t = 1000, the left panel of Figure 3 only illustrates the evolution of momenta until t = 200 for clarity. This SO(3) example shows that our method is capable of learning multiple trajectories and capturing the general Poisson dynamics. Of course, one has to keep in mind that in order to achieve good global accuracy, one needs quite a dense covering of the configuration manifold with data points for learning, a task that may be difficult in many dimensions. A feasible compromise, presented here, is to consider a few trajectories in the neighborhood of the desired trajectory for learning. To illustrate that point, we show the 3D plot of the trajectories and the corresponding learning data on the right panel of Figure 3. Note that all the data were used on the right panel, and there is no visible deviation between the ground truth and the solutions in 3D. We present the accuracy of the results compared to the ground truth, and the preservation of the conserved quantities (energy and Casimir), in Figure 4. The Hamiltonian is conserved with a relative accuracy of about 0.25% (0.001 in absolute terms), and the Casimir is conserved in our case to an absolute accuracy of about 10^−11, several orders of magnitude better than the ground truth solution.

Extended pendulum case
In order to compare our results directly with the Poisson Neural Networks developed in (Jin et al., 2022), we consider the extended pendulum test case from that paper. This example shows that our method extends beyond the Lie-Poisson case, as long as an explicit integration of the Poisson equations for linear Hamiltonians is available. This is the case here, since the Poisson tensor B(y) is affine in y.
Consider a standard pendulum of length 1 and mass 1, with the Hamiltonian H(q, p) = ½ p² − cos q. The paper (Jin et al., 2022) then introduces an extra variable c satisfying ċ = 0, extending the system to a three-dimensional space with the new Hamiltonian H = ½ p² − cos q + pc. The paper (Jin et al., 2022) then makes the transformation of variables (u, v, r) = (p, q, c + p² + q²). In the new variables (u, v, r), the equations of motion become

    ẏ = B(y) ∇K(y) , (34)

with the new Hamiltonian K(u, v, r) = ½ u² − cos v + ur − u³ − uv², where we have denoted y = (u, v, r)ᵀ. The system (34) is Poisson with the bracket defined by the Poisson tensor

    B(y) = (   0    −1   −2y₂
               1     0    2y₁
             2y₂  −2y₁     0  ) . (35)

The matrix B defined in (35) is degenerate, and

    C(y) = y₃ − y₁² − y₂² , (36)

which is just c in the old variables (p, q, c), is a Casimir of the bracket (35). Indeed, one can readily check that B(y) · ∇_y C(y) = 0. Moreover, one also checks that B(y) has only a single zero eigenvalue, so C(y) defined in (36) is the only Casimir. In order to apply the method of LPNets, we take test Hamiltonians linear in y, H(y) = α · y. For test Hamiltonians of that type, the equations of motion become

    ẏ = B(y) α . (37)

There are three test Hamiltonians to consider, H_a(y) = α_a y_a, a = 1, 2, 3 (no sum). Then, the equations of motion are

    H₁ = α₁ y₁ ⇒ ẏ₁ = 0 ,      ẏ₂ = α₁ ,     ẏ₃ = 2α₁ y₂
    H₂ = α₂ y₂ ⇒ ẏ₁ = −α₂ ,    ẏ₂ = 0 ,      ẏ₃ = −2α₂ y₁    (38)
    H₃ = α₃ y₃ ⇒ ẏ₁ = −2α₃ y₂ , ẏ₂ = 2α₃ y₁ , ẏ₃ = 0 .
Equations (38) are easily solved explicitly. Each Hamiltonian H_a = α_a y_a leads to an explicit expression for an affine transformation T_a(t, α_a, y₀) of the initial condition y₀ to the solution after time t, with the transformations T_a given by

    T₁ : (y₁, y₂, y₃) → (y₁, y₂ + α₁ t, y₃ + 2α₁ y₂ t + α₁² t²)
    T₂ : (y₁, y₂, y₃) → (y₁ − α₂ t, y₂, y₃ − 2α₂ y₁ t + α₂² t²)    (40)
    T₃ : (y₁, y₂, y₃) → (y₁ cos 2α₃t − y₂ sin 2α₃t, y₁ sin 2α₃t + y₂ cos 2α₃t, y₃) .

Data preparation, Part 1: finding the transformations. Suppose we have pairs of solutions (yᵢ, yᵢ₊₁) obtained from snapshots of a single trajectory, or several trajectories, of equation (34); the time difference between the snapshots is ∆t = h. We separate the interval h between the snapshots into three equal sub-intervals (although other divisions of the time interval are also possible). Then, for each initial point yᵢ of the pair, we look for the sequence of parameters (α₁, α₂, α₃) defining the transformations as in (40), such that

    (T₃ ∘ T₂ ∘ T₁)(h/3, α, yᵢ) = yᵢ₊₁ . (41)

In other words, αᵢ = (α₁, α₂, α₃)(yᵢ) should be such that the three transformations T₁,₂,₃, performed in the corresponding order, map the initial snapshot yᵢ of the pair to the final snapshot yᵢ₊₁. For data that are not precise, the matching of αᵢ to the data can be accomplished numerically using a gradient descent method, with subsequent removal of the component along the Casimir gradient ∇C. Indeed, each set of parameters αᵢ = (α₁, α₂, α₃)(yᵢ) determined in the previous step is only defined up to a multiple of ∇C. Since ∇C is a zero eigenvector of the matrix B(y), the change

    α → α + k ∇C (42)

for arbitrary k does not change the result of the composition of transformations T₃ ∘ T₂ ∘ T₁. We thus choose k in (42) such that the projection of α on ∇C vanishes, i.e., take

    k = − (α · ∇C) / |∇C|² . (43)

However, in the case where the training data are exact, for example, obtained from a numerical simulation with very high accuracy, we can solve the equations for αᵢ analytically.
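The explicit flows (40) can be written down directly; the following sketch (in our notation) also checks that each transformation preserves the Casimir C(y) = y₃ − y₁² − y₂² exactly, which is the key structural property of the building blocks:

```python
import numpy as np

def T1(t, a1, y):
    # Flow of H1 = a1*y1: y2 moves linearly, y3 follows the quadrature.
    y1, y2, y3 = y
    return np.array([y1, y2 + a1 * t, y3 + 2 * a1 * y2 * t + a1**2 * t**2])

def T2(t, a2, y):
    # Flow of H2 = a2*y2: y1 moves linearly, y3 follows the quadrature.
    y1, y2, y3 = y
    return np.array([y1 - a2 * t, y2, y3 - 2 * a2 * y1 * t + a2**2 * t**2])

def T3(t, a3, y):
    # Flow of H3 = a3*y3: rotation of (y1, y2) by the angle 2*a3*t.
    y1, y2, y3 = y
    c, s = np.cos(2 * a3 * t), np.sin(2 * a3 * t)
    return np.array([c * y1 - s * y2, s * y1 + c * y2, y3])

def casimir(y):
    return y[2] - y[0]**2 - y[1]**2
```

Each map, and hence any composition of them, leaves C(y) invariant up to floating-point round-off, independently of the parameter values.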
For brevity, we denote the components of yᵢ as (y₁, y₂, y₃), dropping the index i, and the components of yᵢ₊₁ by (y₁,f, y₂,f, y₃,f), since yᵢ₊₁ represents the final value of the interval t ∈ (tᵢ, tᵢ₊₁). Notice that the sequential application of T₁ and T₂, each over the time ∆t = h/3, gives the intermediate values

    y₁* = y₁ − α₂ h/3 ,  y₂* = y₂ + α₁ h/3 ,
    y₃* = y₃ + (2h/3)(α₁ y₂ − α₂ y₁) + (h²/9)(α₁² + α₂²) . (44)

On the third sub-step, y₃ does not change, so y₃* = y₃,f. Also, on that sub-step, the application of T₃ just induces a rotation of (y₁*, y₂*) by the angle 2α₃ ∆t, so the compatibility conditions to match the data points precisely are

    y₃* = y₃,f ,  (y₁*)² + (y₂*)² = y₁,f² + y₂,f² . (45)
Using (44), we can rewrite (45) as

    y₃,f − y₁,f² − y₂,f² = y₃ − y₁² − y₂² ,  i.e.  C(yᵢ₊₁) = C(yᵢ) , (46)

so a solution for αᵢ giving an exact match between the data points can be found if and only if the Casimir is exactly the same for all values of the data. If the data contain noise, an optimization procedure must be introduced, searching for an optimal value of αᵢ at every step. In this paper, we assume that the learning data are exact. In that case, we can set α₃ = 0 on every time step and match the values of the components y₁ and y₂ by

    α₁ = 3 (y₂,f − y₂) / h ,  α₂ = 3 (y₁ − y₁,f) / h ,  α₃ = 0 ; (47)

the matching of the component y₃ is automatic due to the conservation of the Casimir.
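With exact data and the convention α₃ = 0, the parameters follow from the endpoint values alone. A minimal sketch of this computation, under those assumptions:

```python
import numpy as np

def fit_alphas(y_i, y_f, h):
    """Exact-data parameters for one snapshot pair (y_i, y_f) with step h,
    using the convention alpha3 = 0: T1 and T2 act for h/3 each and must
    match the first two components; y3 is then matched automatically
    whenever the Casimir C = y3 - y1^2 - y2^2 agrees between snapshots."""
    a1 = 3.0 * (y_f[1] - y_i[1]) / h    # matches y2 over the first sub-step
    a2 = 3.0 * (y_i[0] - y_f[0]) / h    # matches y1 over the second sub-step
    return a1, a2, 0.0
```

These (α₁, α₂) values, together with the input yᵢ, form one training example for the network described next.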
Learning procedure: neural network approximation. The inputs yᵢ and the outputs αᵢ computed from (47) are used to learn the mapping α(y) using a neural network.
In order to match (Jin et al., 2022), we use three trajectories with initial conditions y₀ = (0, 1, 1²), (0, 1.5, 1.5² + 0.1), and (0, 2, 2² + 0.2), with the time step h = 0.1. We use N = 1000 points, with half of these points used for training and half for validation. The ground truth solution is computed using the BDF (Backward Differentiation Formula) integrator in Python's SciPy package, with the relative and absolute tolerances set to 10^−13 and 10^−14, respectively.
Evaluation and dynamics using the neural network. Just as in (Jin et al., 2022), we compute the trajectories starting at the last point y_N of each learning trajectory for 1000 further points. We use a Keras Sequential model with three inner layers of neurons with sigmoid activation functions. Each layer has a width of 16 neurons, with a total of 642 tunable parameters. The network uses the Adam optimizer, with the learning rate of 10^−3 exponentially decaying to 10^−5 and the mean square error loss, computed for 50000 epochs. The resulting MSE is between 10^−7 and 10^−8. The results of the simulations are shown in Figure 5. On the left side of this figure, we show the match between the ground truth solution and the solution obtained by LPNets (notice that the ground truth solution is still obtained using a numerical method of a given accuracy). On the right-hand side of that figure, we present a three-dimensional plot of the phase space (u, v, r) = (y₁, y₂, y₃), comparing the numerical results in blue with the LPNets results in red. In Figure 6, we show the conservation of the Casimir for all three initial conditions. Note that the ground truth numerical solution (blue line) only conserves the Casimir C = r − u² − v² up to the accuracy of the calculation, accumulating an error of the order of 10^−9 after t = 100. In contrast, the Casimir in LPNets (red line) is preserved to machine precision by the very nature of the transformations performed on each time step. On the right panel of that figure, we present the accuracy of the evolution of the energy in LPNets. As one can see, the energy conservation by LPNets is quite satisfactory, yielding a relative error of about 1% or less in all cases. Finally, in Figure 7, we present the results for the discrepancy between the ground truth and LPNets solutions.
Even though the discrepancy grows, it is mostly due to a slight mismatch in time (phase) between the ground truth and the LPNets solution, which explains why the energy is conserved to a much higher precision than the solution itself. The accuracy is still quite good, and the solution obtained by LPNets is virtually indistinguishable from the ground truth on the left side of Figure 5.
Finally, we would like to emphasize that a network of three inner layers with a width of 16 neurons each should be viewed as very compact. Higher accuracy can be achieved with more data points and a correspondingly wider or deeper network, or with more insight into the structure of the network, which we have assumed here to be fully dense. Achieving these efficiencies is an interesting challenge that we will consider in the future.
Comparison with previous results for the extended pendulum system. The visual agreement between the solutions and the ground truth is similar to that presented in (Jin et al., 2022) for this system. The conservation of the Casimir is not presented in (Jin et al., 2022). We interrogated the data produced by the code made available with (Jin et al., 2022) and found that the relative errors for the Casimir in the transformed Lie-Darboux coordinates are quite small, between 10^−7 and 10^−6. The errors in the Casimir in the original coordinates (u, v, r), for the parameters presented in (Jin et al., 2022), although still quite small, are several orders of magnitude larger than in the transformed coordinates. This could be attributed to the inverse of the Lie-Darboux transformation not being computed sufficiently accurately. This accuracy of PNNs can certainly be improved by an appropriate modification of the original PNN network from (Jin et al., 2022), which is beyond the scope of this paper. In any case, our method conserves the Casimir exactly for the original system, with no need to search for a Lie-Darboux transformation to canonical coordinates.

A particle in a magnetic field
To compare with the second test case computed in (Jin et al., 2022), we study a particle of mass m and charge q moving in a magnetic field B(x). We assume that the particle moves in x ∈ R³, and the relevant variables are the particle position x = (x₁, x₂, x₃) and its momentum p = (p₁, p₂, p₃). The equations of motion for a particle in a magnetic field B(x) and an electric potential φ(x) are

    ẋ = p/m ,  ṗ = q ( −∇φ(x) + (p/m) × B(x) ) . (48)

Notice that equations (48) are not of Lie-Poisson form. However, explicit solutions for the transformations on each time step can be found here as well. We thus believe that this problem is a useful case for demonstrating the power and applicability of these methods beyond the Lie-Poisson equations. The Hamiltonian for the simulations is taken as

    H(x, p) = |p|²/(2m) + q φ(x) , (49)

with the Poisson tensor

    B(x, p) = (  0    I₃
               −I₃  −q B̂(x) ) . (50)

Here, I₃ is, as usual, the 3 × 3 identity matrix, and we use the hat map notation for the antisymmetric matrix B̂(x) associated with the vector B(x), so that B̂(x)v = B(x) × v for all x ∈ R³. Similar to (Bajārs, 2023; Jin et al., 2022), we take particular values for the parameters, the electric potential φ(x), and the magnetic field B(x), given in (51). One can readily check that (48) possesses no Casimirs, since the Poisson tensor B is non-degenerate.
Reduction of motion and conserved quantities. In (Jin et al., 2022), the initial conditions were chosen with (x₃ = 0, v₃ = 0), so the particle always moves in the plane. The motion in that case is four-dimensional, and we only need four test Hamiltonians. We thus take the Hamiltonians linear in the velocities (v₁, v₂) and the coordinates (x₁, x₂) and compute the corresponding motion. Note that when the system (48) is restricted so that both x and p lie in the plane, i.e. both x₃ = 0 and p₃ = 0, and for any choice of B = e₃ B₃(r) and φ = φ(r), where r = x₁² + x₂², there are two integrals of motion. One is clearly the Hamiltonian (49). The other can be found by considering the evolution of the angular momentum in the e₃ direction, M₃ = e₃ · (x × p). One can observe that Ṁ₃ is a total time derivative of a function of r, leading to a conservation law that, for the choice (51) and q = 1, reduces to (53). Therefore, the system (48), for the choice of functions (51) and reduced to four-dimensional motion, is essentially a two-dimensional motion because of the conservation of the Hamiltonian (49) and of (53). Thus, one should expect a limited richness of solution behavior. Nevertheless, it is a good test problem, and since it has been used in both recent papers on the subject (Bajārs, 2023; Jin et al., 2022), we study this particular case as well.
Lie-Poisson transformations. Suppose we have a set of N data pairs, with each pair obtained by the phase flow from yᵢ = (xᵢ, pᵢ) to some value yᶠᵢ = (xᶠᵢ, pᶠᵢ). If these pairs are obtained from a single trajectory, then yᶠᵢ = yᵢ₊₁, although this does not have to be the case: our method is capable of learning from several trajectories.
In order to apply our method, we need to compute the results of phase flows for Hamiltonians linear in coordinates x and momenta p.
We present only the answers for these transformations here for brevity; an interested reader may readily check the formulas. In what follows, we use the function Φ of time t, initial conditions X = x(0), and parameters α = (α₁, α₂), defined by the quadrature (54). For the expression of B₃(x) taken in (51), both components of the function Φ can be expressed in terms of a single elementary function (55), coming from taking the quadrature of (54). The availability of an explicit expression for the function (54) is helpful, but it is not essential; for a general B₃(x), one can use the quadrature expressions (54) directly. The transformations for the two test Hamiltonians are:
1. H₁ = α₁ p₁ + α₂ p₂ leads to the motion y = T₁(t, p₀, x₀), given in (56);
2. H₂ = β₁ x₁ + β₂ x₂ leads to the motion y = T₂(t, p₀, x₀), given in (57).
Clearly, T₂ does not alter the coordinates x₁,₂. Thus, the algorithm of computation is very explicit, as α₁,₂ and β₁,₂ can be found without any root-finding procedure in the learning stage.
Data preparation. Divide the interval ∆t = h into two equal steps h/2. On each of those steps, for each pair of data points (yᵢ, yᶠᵢ), where y = (p₁, p₂, x₁, x₂), we compute the parameters (α₁,ᵢ, α₂,ᵢ, β₁,ᵢ, β₂,ᵢ) as follows:
1. Compute α₁,₂ at the i-th data point to match x₁,₂, and the corresponding new intermediate momenta p*₁,₂, by using the transformation T₁ according to (56).
2. Compute β₁,₂ at the i-th data point according to (57).
Learning using a neural network. The network learns the mapping between y and the parameters (α₁, α₂, β₁, β₂), using yᵢ as inputs and the corresponding parameters (αᵢ, βᵢ) as outputs. The network structure consists of an input layer taking four inputs (x, p), three densely connected hidden layers of 32 neurons with the sigmoid activation function, and an output layer producing an estimate of the four outputs (α, β). The number of trainable parameters in the network is 2404. The Adam optimization algorithm is used with the learning rate 10^−3 decaying exponentially to 10^−5, with the number of epochs equal to 2 · 10^5. The loss function is taken to be the mean square error (MSE). Because of the large number of parameters, to avoid overfitting, we take 20000 data points of a single trajectory separated by the time step h = 0.1. For that trajectory, we use the initial conditions v₀ = (1, 0.5) and x₀ = (0.5, 1). Of all the data points, 80% are used for learning and 20% for evaluation. At the end of the learning procedure, both the loss and the validation loss drop below 10^−5.
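The two-substep fit can be sketched as follows. Here `momentum_flow` is a hypothetical stand-in for the paper's explicit momentum update (56) based on the quadrature Φ; the sketch only uses the two structural facts stated above, namely that H₁ moves the coordinates with constant velocity α and that H₂ shifts the momenta at the constant rate −β:

```python
import numpy as np

def fit_step_params(x0, p0, xf, pf, h, momentum_flow):
    """Parameters for one data pair, with the step h split into two halves.
    `momentum_flow(t, alpha, x0, p0)` stands in for the explicit update of
    the momenta under H1 = alpha . p (assumed given). H1 moves x with
    constant velocity alpha, so alpha is fixed by the endpoints; the
    subsequent H2 = beta . x shifts the momenta at a constant rate."""
    alpha = 2.0 * (xf - x0) / h                    # matches coordinates on h/2
    p_star = momentum_flow(h / 2.0, alpha, x0, p0)  # intermediate momenta
    beta = 2.0 * (p_star - pf) / h                  # matches momenta on h/2
    return alpha, beta
```

No root-finding is needed: both parameter pairs follow directly from the data, in agreement with the remark above.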
Solution evaluation. We choose the initial condition to coincide with the end of the learning trajectory, located at x₀ ≃ (−0.627, 0.985)ᵀ and p₀ ≃ (0.119, 1.112)ᵀ. We present the results of the simulations in Figure 8. The ground truth was obtained by solving (48) numerically with the given initial conditions using the BDF solver of SciPy, with a relative tolerance of 10^−13 and an absolute tolerance of 10^−14, up to t = 200, with outputs every h = 0.1. The ground truth is presented with a solid blue line.
The iterative solution, obtained by successive applications of the Poisson maps (56) and (57) with the parameters (α, β) provided by the neural network, approximates the solution at the time points t = ih, i = 1, . . . , 2000. The iteration starts with the same initial conditions as the ground truth solution. The results of the iterations are presented in Figure 8 with solid red lines. As one can see, the LPNets solution approximates the ground truth quite well. To further investigate the results, on the left panel of Figure 9 we show the mean-square deviation between the ground truth and LPNets solutions of (48) presented in Figure 8. On the right panel of that figure, we present the conserved quantities: the Hamiltonian (energy) H given by (49) and the conserved quantity (53). As one can see, the Hamiltonian is preserved with a relative accuracy of 1–2% and the integral (53) to about 3–4%.
Comparison with previous literature. The problem of a particle in a magnetic field was considered in both (Jin et al., 2022) and (Bajārs, 2023). The conservation law (53) has not been identified in these papers, thus we do not provide a direct comparison. The visual agreement of the solutions is just as good as the one demonstrated in (Bajārs, 2023;Jin et al., 2022). The absolute error in solution and the Hamiltonian is also similar to that presented in (Bajārs, 2023). There are no Casimirs in the system, so there is no additional advantage over the methods presented in (Jin et al., 2022) or (Bajārs, 2023). Additionally, although the system (48) is Poisson, it is not in the Lie-Poisson form. The fact that LPNets can solve this problem on par with other methods is a consequence of the particular form of the magnetic field B(x) and the fact that the corresponding integrals (54) can be computed explicitly. Even though the system (48) is not Lie-Poisson, we found it useful to present the results in order to show possible extensions of the methods of LPNets for more general systems.

Kirchhoff's equations for an underwater vehicle
The motion of a neutrally buoyant underwater body is described by the Kirchhoff equations; see (Leonard, 1997; Leonard and Marsden, 1997) for a discussion of the Lie-Poisson structure of these equations and the corresponding stability results. When the centre of gravity coincides with the centre of buoyancy, the system simplifies somewhat but still possesses rich dynamics, with several integrable and chaotic cases that have a long history of study (Holmes et al., 1998).
We treat that particular system with coincident centres of buoyancy and gravity, and show the applicability of the LPNets approach there as well.
The Hamiltonian for Kirchhoff's equations consists of the kinetic energy of the rotational motion with angular momentum Π and the energy of the translational motion with linear momentum p and mass tensor M:

    H(Π, p) = ½ Π · I⁻¹Π + ½ p · M⁻¹p . (60)

The Poisson bracket for the underwater vehicle is the Lie-Poisson bracket for the group of rotations and translations SE(3):

    {F, G}(Π, p) = −Π · (∇_Π F × ∇_Π G) − p · (∇_Π F × ∇_p G − ∇_Π G × ∇_p F) . (61)

This bracket has the specific form coming from the geometry of semidirect product groups; see (Holm et al., 1998, 2009; Leonard, 1997). Kirchhoff's equations of motion for the underwater vehicle are

    Π̇ = Π × Ω + p × v ,  ṗ = p × Ω ,  with Ω = I⁻¹Π ,  v = M⁻¹p . (62)

Equations (62) have two Casimirs: C₁ = |p|² and C₂ = Π · p. In addition, the total energy given by (60) is also conserved.
LPNets for Kirchhoff's equations. Bearing in mind that now we have two momenta (Π, p), we shall take the following Hamiltonians: H 1 = A·Π and H 2 = b·p.
The equations of motion (62) then reduce to

    H₁ = A · Π ⇒ Π̇ = Π × A , ṗ = p × A ;   H₂ = b · p ⇒ Π̇ = p × b , ṗ = 0 . (63)

The first motion is the simultaneous rotation of the vectors Π and p about the same axis A, by the same amount, with a given angular velocity. The second motion leaves p unchanged and translates Π at the constant rate p × b. The algorithm is then as follows.
1. Select the training data couples (y⁰, yᶠ), where y := (Π, p).
2. Find the rotation axis Aⱼ and angle θⱼ that take p⁰ⱼ to pᶠⱼ, and the corresponding rotation matrix R(Aⱼ, θⱼ). This could be accomplished by finding the rotation angles, for example, the Euler or Tait angles, of the rotation mapping p⁰ⱼ into pᶠⱼ. An additional rotation about either p⁰ⱼ or pᶠⱼ does not change the end result and must be discarded.
While it is possible to proceed in this manner for higher-dimensional groups, for three-dimensional groups we can utilize a shortcut based on the cross product of the two vectors, as we have done in the case of the rigid body: we compute Aⱼ from the cross product of p⁰ⱼ and pᶠⱼ, as in (64). The vector Aⱼ contains all the information necessary for the first step of the algorithm, namely the simultaneous rotation. In order to reconstruct the vector Aⱼ on each time step, we only need its components normal to the vector p⁰ⱼ. These components can be found by defining two vectors spanning the plane normal to p⁰ⱼ in the following way. Take a fixed vector, for example, e₁ = (1, 0, 0)ᵀ, and for each p⁰ⱼ define the orthonormal pair (ξ₁, ξ₂) as in (65). Since (ξ₁, ξ₂) are of unit length and orthogonal to each other and also to p⁰ⱼ, the vector Aⱼ defined by (64) can be uniquely reconstructed as Aⱼ = a¹ⱼ ξ₁ + a²ⱼ ξ₂. The variables a¹ⱼ, a²ⱼ form the first part of the output for each pair of data points.
3. Find the vector bⱼ (or, more precisely, its two components normal to pᶠⱼ) such that the remaining mismatch in the angular momentum Π vanishes. This is accomplished by the following calculation. Define two vectors E₁,₂ spanning the plane normal to pᶠⱼ, analogously to (ξ₁, ξ₂), as in (67).
4. Express bⱼ = b¹ⱼ E₁ + b²ⱼ E₂; the variables b¹ⱼ, b²ⱼ form the second part of the output.
5. Compute the axes E₁,₂ according to (67) and update the angular momentum by the translation Π → Π + h pᶠⱼ × bⱼ generated by H₂.
6. Reset the final values (Πᶠ, pᶠ) to the initial conditions and repeat Step 1.
One can see that the algorithm formulated above conserves the Casimirs |p| 2 and Π · p exactly (i.e., to machine precision) on every time step during the trajectory prediction phase.
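The reconstruction of a vector from its two components normal to a given momentum, used in the steps above, can be sketched as follows; the projection-based construction of the basis is our assumption about the form of (65), not a formula quoted from the paper:

```python
import numpy as np

def normal_basis(p, e=np.array([1.0, 0.0, 0.0])):
    """Orthonormal pair (xi1, xi2) spanning the plane normal to p, built
    from a fixed reference vector e (assumed not parallel to p)."""
    n = p / np.linalg.norm(p)
    xi1 = e - np.dot(e, n) * n        # project out the component along p
    xi1 /= np.linalg.norm(xi1)
    xi2 = np.cross(n, xi1)            # completes the right-handed triad
    return xi1, xi2

def reconstruct_A(a1, a2, p):
    """Rebuild the rotation (or translation) vector from its two
    in-plane components relative to the basis normal to p."""
    xi1, xi2 = normal_basis(p)
    return a1 * xi1 + a2 * xi2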
Simulation results.
Data generation. We present the simulation results of LPNets for the Kirchhoff equations. The data were obtained from 500 trajectories of 100 time steps with the step h = 0.1. All trajectories started in the neighborhood of Π_0 = (1, 1, 1)^T and p_0 = (−1, 1, 2)^T. The initial conditions are uniformly distributed in a cube of size 2a = 0.2 in every direction of the phase space. The mass matrix M and the tensor of inertia I in (62) are taken to be M = diag(1, 2, 1) and I = diag(1, 2, 1).
Network parameters. The 50,000 data pairs are used to produce 10,000 points (Π_j, p_j) as inputs and the parameters (a^1, a^2, b^1, b^2) as outputs. A fully connected network with 6 inputs, 4 outputs, and 6 hidden layers of 64 neurons each is initialized to describe the mapping; the total number of trainable parameters is 21,508. All activation functions are taken to be sigmoids. Of the data, 80% are used for training and 20% for validation. The network is trained with the Adam algorithm, using the mean square error as the loss function, with a learning rate of 10^{-3} decaying exponentially to 10^{-5} over 500,000 epochs. After the optimization procedure, the loss and validation loss reach the values of 1.65 · 10^{-4} and 1.35 · 10^{-3}, respectively.
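A minimal sketch of such a network in PyTorch (the framework and the linear output layer are our assumptions; the paper does not specify either) reproduces the quoted parameter count:

```python
import torch
import torch.nn as nn

def make_lpnets_mlp(n_in=6, n_out=4, width=64, depth=6):
    """Fully connected network mapping (Pi, p) in R^6 to the four
    transformation parameters (a^1, a^2, b^1, b^2), with sigmoid
    activations on all hidden layers, as described in the text."""
    layers, d = [], n_in
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.Sigmoid()]
        d = width
    layers.append(nn.Linear(d, n_out))  # linear output layer (assumed)
    return nn.Sequential(*layers)

model = make_lpnets_mlp()
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 21508, matching the count quoted above
```

The count decomposes as 6·64 + 64 for the first layer, five blocks of 64·64 + 64, and 64·4 + 4 for the output, which sums to 21,508.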
Evaluation of a trajectory. A trajectory is reproduced using the LPNets algorithm described above with the trained network, starting with the initial conditions Π_0 = (1, 1, 1)^T and p_0 = (−1, 1, 2)^T until t = 10. As usual, the ground truth solution is produced by a high-accuracy BDF algorithm with a relative tolerance of 10^{-13} and an absolute tolerance of 10^{-14}. The comparison between the ground truth and the trajectories obtained by LPNets is presented in Figure 10.
In Figure 11, left panel, we show the conservation of the Hamiltonian (top), the first Casimir |p|^2 (middle), and the second Casimir Π · p (bottom). Notice that the Casimirs are conserved with higher accuracy than in the ground truth, even though the ground truth already conserves them to about 10^{-10}.
One may wonder whether better results can be obtained using alternative ways of simulating trajectories or more effective implementations of neural networks. We have to keep in mind that the Kirchhoff equations (62) are chaotic (Holmes et al., 1998; Leonard, 1997). We measure the rate of divergence of nearby trajectories with the same values of the Casimirs, also known as the first Lyapunov exponent, to be about λ = 0.250 in units of 1/time, starting with the same initial conditions as the simulated trajectory. The minimum growth of errors expected from any numerical scheme is thus proportional to e^{λt} (Ott, 2002). In the right panel of Figure 11 we present the semilogarithmic plot of the growth of errors and show that it corresponds to the growth expected from the chaotic properties of the system.

Figure 11: Left: conservation of the Hamiltonian (top), the Casimir |p|^2 (middle), and Π · p (bottom), comparing the results of LPNets (red) and ground truth (blue). The Hamiltonian is conserved to a relative accuracy of less than 0.3%. Notice that LPNets conserve the Casimirs exactly (to machine precision) and thus exceed the ground truth in the conservation of Casimirs. Right: the discrepancy between the results of LPNets and the ground truth. Dashed red line: growth of error expected from the Lyapunov exponent, in semilog scale. The growth of error follows the Lyapunov exponent; thus, the neural network performs as well as could be expected for a chaotic system.
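The chaotic sensitivity can be illustrated directly by integrating the Kirchhoff equations for two nearby initial conditions and watching the separation grow. The fixed-step RK4 integrator, step size, and perturbation below are illustrative choices of ours, not the paper's setup:

```python
import math

def kirchhoff_rhs(y):
    # Kirchhoff equations: dPi/dt = Pi x Omega + p x v, dp/dt = p x Omega,
    # with Omega = I^{-1} Pi, v = M^{-1} p and I = M = diag(1, 2, 1).
    Pi, p = y[:3], y[3:]
    inv = (1.0, 0.5, 1.0)
    Om = [Pi[i] * inv[i] for i in range(3)]
    v = [p[i] * inv[i] for i in range(3)]
    cr = lambda u, w: [u[1]*w[2]-u[2]*w[1], u[2]*w[0]-u[0]*w[2], u[0]*w[1]-u[1]*w[0]]
    dPi = [a + b for a, b in zip(cr(Pi, Om), cr(p, v))]
    return dPi + cr(p, Om)

def rk4_step(y, h):
    add = lambda u, w, c: [a + c * b for a, b in zip(u, w)]
    k1 = kirchhoff_rhs(y)
    k2 = kirchhoff_rhs(add(y, k1, h / 2))
    k3 = kirchhoff_rhs(add(y, k2, h / 2))
    k4 = kirchhoff_rhs(add(y, k3, h))
    return [y[i] + h / 6 * (k1[i] + 2*k2[i] + 2*k3[i] + k4[i]) for i in range(6)]

y_a = [1.0, 1.0, 1.0, -1.0, 1.0, 2.0]   # (Pi_0, p_0) from the text
d0 = 1e-8
y_b = y_a[:]; y_b[0] += d0              # nearby initial condition
h, steps = 0.01, 2000                   # integrate to t = 20
for _ in range(steps):
    y_a, y_b = rk4_step(y_a, h), rk4_step(y_b, h)
d_final = math.sqrt(sum((a - b) ** 2 for a, b in zip(y_a, y_b)))
# For lambda ~ 0.25 the separation should grow by roughly e^(0.25 * 20).
print(f"separation growth factor: {d_final / d0:.1f}")
```

The same run also checks that the integrator itself conserves the Casimirs |p|^2 = 6 and Π · p = 2 of this initial condition to high accuracy, so the observed divergence is a property of the dynamics, not of the scheme.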

General theory
Up until now, we have developed a method of computation using transformations that are easily computable, but whose coefficients must be obtained through the action of a neural network. We can extend these results and compute transformation modules with parameters that are learned directly through an optimization procedure, producing modules that preserve the Poisson structure of the bracket. We proceed as follows.
For Lie-Poisson systems, Section 4 showed that the flow generated by a test Hamiltonian that is linear in the momenta, H(µ) = ⟨α, µ⟩, can be computed analytically. Instead of taking the test Hamiltonian to be a linear function of momenta, here we take the Hamiltonian to be H̄ = φ(H), where φ is some nonlinear scalar function of the variable H, which is itself a constant of motion^7. Instead of equations (22), we obtain the same equations of motion multiplied by the factor φ′(H). Since H is constant under the flow generated by the Hamiltonian H̄ = φ(H), we conclude that the transformation generated by the flow of φ(H) is simply obtained by a time scaling of (23), written in terms of the maps T(α, t) defined in (23). The prefactor φ′(H) = φ′(⟨α, µ_0⟩) can be sought as a function of the momenta using a scalar activation function. Thus, in the G-LPNets framework, we look for transformations that are compositions of moduli T_s(a_s, α_s, b_s, t), s = 1, . . . , M, of the functional form (72), where σ is the activation function, which we choose to be the sigmoid, and (a_s, α_s, b_s) are the parameters^8 to be determined. The transformations (72) are generalized moduli preserving the Poisson bracket and the Casimirs to machine precision.
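For the rigid body, one such modulus can be written down concretely: a rotation about a coordinate axis e_k whose angle depends only on the component Π_k, which is itself invariant under that rotation, so the map is well-defined and exactly preserves the Casimir |Π|^2. The sigmoid-scaled angle below is our reading of the functional form (72), not a verbatim transcription:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def module_T(Pi, k, a, alpha, b, t=1.0):
    """G-LPNets-style modulus: rotate Pi about the coordinate axis e_k by the
    angle t * a * sigmoid(alpha * Pi[k] + b). Since Pi[k] is invariant under
    this rotation, the angle is constant along the flow and the map exactly
    preserves the Casimir |Pi|^2."""
    theta = t * a * sigmoid(alpha * Pi[k] + b)
    c, s = math.cos(theta), math.sin(theta)
    i, j = (k + 1) % 3, (k + 2) % 3
    out = list(Pi)
    out[i] = c * Pi[i] - s * Pi[j]
    out[j] = s * Pi[i] + c * Pi[j]
    return out

Pi0 = [0.3, -1.2, 0.8]
Pi1 = module_T(Pi0, 2, a=0.9, alpha=1.3, b=-0.2)   # illustrative parameters
C0 = sum(x * x for x in Pi0)
C1 = sum(x * x for x in Pi1)
print(abs(C1 - C0))   # zero up to roundoff
```

Composing several such moduli about alternating axes gives the G-LPNets architecture described below.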
The method of G-LPNets selects the order of the transformations T_s and minimizes a mean-square loss function. For example, for data comprising N points specifying the beginning µ_i^0 and end µ_i^f values of the momenta on each time interval, and assuming that the composition takes M exactly equal time substeps, the loss function can be taken to be the mean square error (MSE) between the predicted and observed final momenta. The use of the mean square loss is advantageous since the derivatives of L with respect to the parameters a_s, α_s, b_s can be found analytically. Denoting each collection of parameters with a bar for brevity, e.g., ā = (a_1, a_2, . . .), we seek (ā, ᾱ, b̄) = arg min L(ā, ᾱ, b̄).
(Note: any Hamiltonian of the form φ(H) conserves H and any other function of that Hamiltonian; but the flow generated by a general Hamiltonian may not be explicitly solvable.)
There are several points to keep in mind regarding the application of G-LPNets.
1. If there are no Casimirs, it is natural to choose transformations alternating in a given sequence. For example, for an n-dimensional system one could choose a repeated application of T_1, T_2, . . . , T_M followed by T_1 again, etc. It would be natural (although, strictly speaking, not necessary) to take the depth of the network to be M = n · k, where k is an integer.
2. If there are d independent Casimirs C_j(µ), j = 1, . . . , d, the transformations producing motion about an axis parallel to ∇C_j produce no effective evolution of the momenta. One can either choose n − d transformations on every time step by excluding certain α, or simply apply all transformations in a sequence with the understanding that the resulting α is not unique. The latter is the approach we use to describe rigid body rotation below.
3. The advantage of applying all T_s in a row is that there is no accuracy loss if, at some point, the momentum becomes nearly parallel to a given coordinate axis. The disadvantage of applying all transformations in a sequence, without excluding any α, lies in the necessity of post-processing the solutions of (75), as they are only defined up to components along the gradients of the Casimir functions ∇_µ C_j, j = 1, . . . , d. In fact, even at the continuous level, the Hamiltonian is defined only up to the Casimirs, since the same dynamics are obtained if an arbitrary Casimir is added to the Hamiltonian. Similarly, all possible solutions of (75) obtained by G-LPNets are in the same equivalence class, since they generate the same dynamics in phase space.
4. It is also possible that, because the parameters are defined only up to a certain vector or vectors, one could encounter vanishing gradients in some directions and a slowdown of the convergence to the desired solution due to numerical artifacts. We have not encountered such artifacts in our solution of the rigid body equations, but we cannot exclude numerical difficulties when applying G-LPNets to general high-dimensional problems.
5. We expect that G-LPNets will also work for cases beyond the Lie-Poisson framework whenever explicit integration of the trajectories with the test Hamiltonians is possible, such as for the particle in a magnetic field. The transformations T_s are then computed as generalizations of the Poisson transformations with unknown time-scaling coefficients. As long as the completeness of these transformations is achieved, we expect them to provide efficient and accurate data-based computing methods for general Poisson problems.
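Point 2 above can be checked directly for the rigid body, where ∇C = 2Π: a rotation about the axis parallel to Π leaves Π unchanged, so the component of the generator along the Casimir gradient produces no evolution. A minimal check:

```python
import math

def rotate(v, axis, theta):
    """Rodrigues' rotation of v about axis (normalized internally) by theta."""
    n = math.sqrt(sum(a * a for a in axis))
    k = [a / n for a in axis]
    kxv = [k[1]*v[2] - k[2]*v[1], k[2]*v[0] - k[0]*v[2], k[0]*v[1] - k[1]*v[0]]
    kdv = sum(a * b for a, b in zip(k, v))
    c, s = math.cos(theta), math.sin(theta)
    return [v[i]*c + kxv[i]*s + k[i]*kdv*(1 - c) for i in range(3)]

Pi = [0.7, -0.4, 1.1]
Pi_rot = rotate(Pi, Pi, 1.234)   # rotate Pi about its own direction
# The Casimir-gradient direction is a null direction of the motion:
print(max(abs(a - b) for a, b in zip(Pi, Pi_rot)))  # zero up to roundoff
```

This is exactly why the α found by the optimization is unique only up to its component along ∇C.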
The most challenging part of applying G-LPNets, in our opinion, is the lack of a general completeness result. Ideally, the G-LPNets moduli (73) would satisfy a completeness result analogous to that of SympNets (Jin et al., 2020). At present, it seems unlikely that such a result can be proved for all Lie-Poisson brackets; without further information about the particular Lie group, progress in that direction may be limited. However, the silver lining is that for each particular problem the symmetry is bound to be known a priori, whereas the exact form of the Hamiltonian may or may not be known. Thus, if one focuses on the particular problem at hand, a completeness result for the transformations leading to G-LPNets may be feasible to achieve. It is also possible that some of the methods of analysis used in Jin et al. (2020) for proving the completeness of the modules in symplectic spaces are applicable in the more general setting of a particular Lie-Poisson bracket.

Applications to rigid body dynamics
To show the potential power of G-LPNets, we treat the equations of a rigid body, following up on our discussion in Sec. 5.1. Now, instead of learning from trajectories that stay close to a desired area, we aim to learn the whole dynamics, and simply choose 50 initial points uniformly distributed in a cube in momentum space, −2 ≤ Π_a ≤ 2, a = 1, 2, 3. Each trajectory is simulated with a high-precision ODE solver with output every h = 0.1, providing 20 data pairs per trajectory, for a total of 1000 data pairs. All parameters of the system are exactly as in Sec. 5.1.
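The data-generation step can be sketched as follows. This is an illustrative substitute, not the paper's solver: a fixed-step RK4 with fine substeps stands in for the high-precision ODE solver, and the inertia I = diag(1, 2, 1) is an assumption of ours (the actual values of Sec. 5.1 are not repeated here):

```python
import math, random

I_inv = (1.0, 0.5, 1.0)   # inverse inertia, assuming I = diag(1, 2, 1)

def euler_rhs(Pi):
    # Free rigid body: dPi/dt = Pi x Omega, with Omega = I^{-1} Pi.
    Om = [Pi[i] * I_inv[i] for i in range(3)]
    return [Pi[1]*Om[2] - Pi[2]*Om[1],
            Pi[2]*Om[0] - Pi[0]*Om[2],
            Pi[0]*Om[1] - Pi[1]*Om[0]]

def rk4(Pi, h):
    add = lambda u, w, c: [a + c * b for a, b in zip(u, w)]
    k1 = euler_rhs(Pi); k2 = euler_rhs(add(Pi, k1, h / 2))
    k3 = euler_rhs(add(Pi, k2, h / 2)); k4 = euler_rhs(add(Pi, k3, h))
    return [Pi[i] + h / 6 * (k1[i] + 2*k2[i] + 2*k3[i] + k4[i]) for i in range(3)]

def make_pairs(n_traj=50, n_steps=20, h=0.1, substeps=100, seed=0):
    """Initial points uniform in the cube |Pi_a| <= 2; output every h,
    yielding n_steps data pairs (Pi_initial, Pi_final) per trajectory."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_traj):
        Pi = [rng.uniform(-2, 2) for _ in range(3)]
        for _ in range(n_steps):
            Pi_new = Pi
            for _ in range(substeps):      # fine substeps of size h/substeps
                Pi_new = rk4(Pi_new, h / substeps)
            pairs.append((Pi, Pi_new))
            Pi = Pi_new
    return pairs

pairs = make_pairs()
print(len(pairs))   # 1000 data pairs, as in the text
```

Each pair sits on the Casimir sphere of its own initial condition, matching the setup where data are not restricted to a single Casimir surface.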
Our benchmark is the reconstruction of the dynamics of a rigid body, already considered in (Bajārs, 2023). Note, however, that (Bajārs, 2023) takes all data pairs on the same Casimir surface. In our opinion, since the Casimir surface depends on the initial conditions, a physical system such as a satellite could be observed on different Casimir surfaces, because thrusters or other external forces may have moved it from one Casimir surface to another outside the observation time. The Hamiltonian and the physical parameters of the satellite are assumed to remain the same, so it makes sense that the ground truth is generated with the same Hamiltonian but different Casimirs.
After the data points are generated, a G-LPNet with 6 transformations (18 parameters) is constructed. The transformations T_i are rotations about the coordinate axes in the fixed frame e_{1,2,3} (i.e., e_1 = (1, 0, 0)^T, etc.) by the angles ϕ_i, with the rotations cycling through the three axes in sequence. At a given step s, the rotation angle ϕ_s for a given value of the momentum Π_0 is computed from the module parameters (a_s, α_s, b_s). The optimization finds the values of (ā, ᾱ, b̄) minimizing the MSE loss function; the gradients of the MSE loss with respect to the parameters are computed analytically. We tried gradient descent methods and discovered that, while they work satisfactorily, they require quite a substantial number of epochs to converge, as was already observed in (Bajārs, 2023; Jin et al., 2022). To make the computation more efficient, we implemented an optimization procedure based on the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm (Fletcher, 2000), as implemented in SciPy (www.scipy.org). We let the BFGS algorithm run for 3500 iterations, achieving a loss of less than 5 · 10^{-10}, a procedure that is several orders of magnitude more efficient than the standard gradient descent-based algorithms.
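A toy version of this fitting procedure might look as follows. It uses SciPy's BFGS with numerical gradients on synthetic data generated by the modules themselves, whereas the paper uses analytic gradients and trajectory data; the module form is again our reading of (72):

```python
import math, random
import numpy as np
from scipy.optimize import minimize

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def apply_modules(Pi, params):
    """Compose rotations about e_1, e_2, e_3 (cycled); the angle of module s
    is a_s * sigmoid(alpha_s * Pi_k + b_s), with k the rotation axis."""
    Pi = np.array(Pi, dtype=float)
    for s, (a, al, b) in enumerate(params.reshape(-1, 3)):
        k = s % 3
        i, j = (k + 1) % 3, (k + 2) % 3
        th = a * sigmoid(al * Pi[k] + b)
        c, sn = math.cos(th), math.sin(th)
        Pi[i], Pi[j] = c * Pi[i] - sn * Pi[j], sn * Pi[i] + c * Pi[j]
    return Pi

rng = random.Random(1)
true_params = np.array([0.5, 1.0, -0.3, -0.4, 0.8, 0.2, 0.7, -0.6, 0.1])
data = []
for _ in range(20):
    Pi0 = np.array([rng.uniform(-1, 1) for _ in range(3)])
    data.append((Pi0, apply_modules(Pi0, true_params)))

def loss(params):
    return sum(float(np.sum((apply_modules(Pi0, params) - Pif) ** 2))
               for Pi0, Pif in data) / len(data)

# Start near the true parameters; BFGS drives the MSE essentially to zero.
x0 = true_params + 0.05 * np.array([rng.uniform(-1, 1) for _ in range(9)])
res = minimize(loss, x0, method="BFGS")
print(res.fun)
```

Every intermediate state in the composition stays on the Casimir sphere of Π_0, so the structure preservation holds throughout the optimization, not just at the optimum.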
Using the parameter values found by the optimization procedure, 10 long solutions with 10,000 data points each, with time step h = 0.1 (maximum time t_max = 1000), were generated and compared with the ground truth obtained by a high-precision ODE method. These long-term solutions were generated to compare with the results in (Bajārs, 2023). The initial conditions Π_0 for these solutions were drawn uniformly from the cube |Π_a| ≤ 1.25, a = 1, 2, 3. This cube was chosen somewhat smaller than the cube of training initial conditions, so that the solutions are guaranteed to remain within the area where data were available.
In the left panel of Figure 12 we show a part of one of the 10 trajectories, extending only to about t = 500 for clarity. This particular trajectory starts with the initial condition Π_0 = (1.011, 1.178, 0.585); the results for the other trajectories are similar. The values of the momenta are nearly indistinguishable from the ground truth for a very long time. In the right panel of this Figure, we show all 10 trajectories in phase space, plotted for all times 0 ≤ t ≤ 1000 (10,000 iterations). The results of the simulations and the ground truth coincide perfectly. This is due to the fact that both the Casimirs and the Hamiltonian are preserved with very high accuracy, as Figure 13 shows. The Casimir is, again, conserved to machine precision on each step and thus substantially exceeds the accuracy of the calculation of the ground truth solution (still very high, at about 10^{-9}). The relative error in energy is of the order of 0.1% over the whole long-term solution.
Comparison with previous literature. The case of learning the whole dynamics of a rigid body was considered in (Bajārs, 2023). In that paper, the dynamics was considered only on a single Casimir surface |Π| = 1, with 300 data points, and the solutions generated by the neural network were also taken on that Casimir surface. In contrast, we presented the dynamics with initial data taken from a volume of Π (a cube), and the initial conditions for G-LPNets are also taken without any restriction to a Casimir surface. The relative error of our solutions is of the same order as the results presented in (Bajārs, 2023). The conservation of the Casimir and the energy is reported to be of the order of 1.5% in (Bajārs, 2023). In our case, the relative error in energy is somewhat better, with a maximum error of around 0.4%. The Casimir |Π|^2 is conserved to machine precision on every time step, so after 10^5 steps a typical error in the Casimir is expected to be of the order of 10^{-11} ÷ 10^{-10}.
While the relative error of energy in methods presented in (Bajārs, 2023) can no doubt be improved to reach the accuracy achieved here, we are not aware of any method preserving the value of the Casimir to the same precision as ours.

Conclusions
We have derived a novel method of learning the evolution of a general Lie-Poisson system that is capable of predicting the dynamics in phase space with high precision. Our method learns the system by applying exact Poisson maps for certain test Hamiltonians, which are exactly solvable for any Lie-Poisson bracket. The resulting maps preserve the Poisson bracket under the evolution and also preserve all Casimirs to machine precision. These methods are also applicable to systems beyond Lie-Poisson, such as a particle in a magnetic field, as long as the corresponding equations for the test Hamiltonians are exactly solvable.

Figure 13: Left: relative error of the Casimir (C(t) − C(0))/⟨C⟩, comparing the results of G-LPNets (red) and ground truth (blue), for all 10 simulations. As usual, G-LPNets conserve the Casimir exactly (to machine precision) and thus substantially exceed the ground truth in the conservation of Casimirs. Right: the discrepancy between the results of G-LPNets and the ground truth for the solution presented in the left panel of Figure 12. Again, the discrepancy comes mostly from time mismatch, whereas the amplitude of oscillations is preserved with high precision due to the correspondingly high precision in the conservation of energy.
We derive two types of networks. The first is the local Lie-Poisson Neural Networks (LPNets), which construct local Poisson maps using test Hamiltonians that are linear in the momenta. The parameters of the Poisson mappings resulting from these test Hamiltonians are then learned using standard methods of data-based learning with artificial neural networks (ANNs). One advantage of this method is that the local completeness of the mappings is achieved automatically, since they represent exponential maps on a Lie group. An additional advantage is the ability to use all the modern technology of ANNs for discovering the mapping from the momenta to the parameters of the test Hamiltonians.
An alternative method, called global LPNets (G-LPNets), was derived using nonlinear test Hamiltonians. These nonlinear Hamiltonians are arbitrary functions of the linear test Hamiltonians used in the local LPNets approach. The explicit evolution maps on every time step obtained by this method can be viewed as generalizations of the symplectic modules derived in (Jin et al., 2020) to the case of a general Lie-Poisson bracket. We have presented an application of these methods to rigid body motion and showed that they demonstrate excellent accuracy and efficiency for long-term computation, as well as the ability to learn the dynamics in the whole phase space from quite a limited number of observations.
While G-LPNets appear more computationally effective than the local LPNets, we must caution the reader that a completeness result for the mappings of G-LPNets obtained through the nonlinear test Hamiltonians is still missing. We believe it is highly unlikely that such a result exists for a general Lie-Poisson system; the completeness result is likely to depend on the structure of the actual Lie group and the corresponding Lie-Poisson bracket. This is an interesting topic that we intend to address in the future.
Another interesting topic is the extension of this method to more general systems. To achieve that goal, the equations generated by the Poisson bracket for the test Hamiltonians must be exactly solvable, as was the case for the particle in a magnetic field. It will be interesting to compute the conditions on the Poisson bracket for such integrability to occur, which we also plan to undertake in the future. Of particular interest are constrained systems, especially systems with nonholonomic constraints. The Lie-Poisson equations in this case become the Lie-Poisson-d'Alembert equations, where the right-hand side of (A.3) contains extra terms enforcing the vanishing of momenta projections onto certain subspaces defined by the constraints, as in the Suslov problem (Bloch, 2003). Apart from their importance in classical mechanics, these methods were recently found to be important for variational discretizations of fluids (Gawlik and Gay-Balmaz, 2020), possibly including irreversible thermodynamic processes (Gawlik and Gay-Balmaz, 2022). The extension of data-based computing to nonholonomic Lie-Poisson systems may thus play an important role in the application of the methods of this paper to continuum mechanics, including irreversible processes.

The Lie-Poisson flow proceeds by coadjoint action and hence preserves the coadjoint orbits O = {Ad*_g µ_0 | g ∈ G} ⊂ g*. For the case of a linear Hamiltonian h(µ) = ⟨α, µ⟩, where α ∈ g is a fixed element of the Lie algebra, the Lie-Poisson equation (A.1) becomes µ̇ = ad*_α µ, and its flow is found as ϕ_t(µ_0) = Ad*_{g(t)} µ_0 with g(t)^{-1} ġ(t) = α.
This is the coordinate-free version of (23): namely, T(t, α) is the coordinate version of Ad*_{exp(tα)}, with exp the Lie group exponential, consistently with the formulas for T appearing in Secs. 5.1 and 5.4. We can thus derive LPNets whenever we can explicitly exponentiate the elements of the Lie algebra, so that the exponential can be used efficiently in the minimization procedure (25).
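For so(3), for instance, the group exponential is given in closed form by the Rodrigues formula, and the coordinate version of Ad*_{exp(tα)} is an ordinary rotation matrix. A quick check that the resulting matrix is orthogonal and preserves the coadjoint orbit (the momentum sphere |µ| = const):

```python
import math

def exp_so3(alpha, t=1.0):
    """Lie group exponential exp(t * hat(alpha)) in SO(3) via the Rodrigues
    formula R = I + sin(th) K + (1 - cos(th)) K^2, with K = hat(alpha/|alpha|)
    and th = t * |alpha|; returns a 3x3 matrix as a list of rows."""
    th = t * math.sqrt(sum(a * a for a in alpha))
    if th == 0.0:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    k = [t * a / th for a in alpha]
    c, s = math.cos(th), math.sin(th)
    K = [[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]]
    return [[(1.0 if i == j else 0.0) + s * K[i][j]
             + (1 - c) * sum(K[i][m] * K[m][j] for m in range(3))
             for j in range(3)] for i in range(3)]

def matvec(R, v):
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

alpha, mu0 = [0.2, -0.5, 0.9], [1.0, 1.0, 1.0]
R = exp_so3(alpha, t=0.3)
mu = matvec(R, mu0)
# Coadjoint orbits of SO(3) are spheres |mu| = const, preserved by the flow.
print(abs(sum(x * x for x in mu) - sum(x * x for x in mu0)))   # zero up to roundoff
```

This explicit exponentiation is exactly what makes the minimization procedure (25) practical for the rigid body examples of this paper.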