Reconstruction of observed mechanical motions with Artificial Intelligence tools

The goal of this paper is to determine the laws governing observed trajectories, assuming that there is a mechanical system in the background, and to use these laws to continue the observed motion in a plausible way. The laws are represented by neural networks with a limited number of parameters. The training of the networks follows the Extreme Learning Machine idea. We determine laws at different levels of embedding, thus we can represent not only the equation of motion but also symmetries of different kinds. In the recursive numerical evolution of the system, we require the fulfillment of all the observed laws, within the determined numerical precision. In this way, we can successfully reconstruct both integrable and chaotic motions, as we demonstrate on the examples of the gravity pendulum and the double pendulum.


Introduction
The task of intelligent systems is to identify the phenomena or concepts that are the most appropriate to describe the observed data [1,2]. While these concepts are hard to approach in tasks like image recognition or text analysis, in the case of describing observed motions of dynamical systems we have some clues about the general form of these concepts. These include that, in a proper (phase) space, the motion is governed by a first-order differential equation or, in the case of continuum mechanics, by first-order partial differential equations. Sometimes we also have an educated guess about the form of the kernel of the differential equation. But in complicated systems, the kernel must be determined from the observed data (data-driven modeling, cf. [3]).
In these cases, the task of the (artificial) intelligence is to describe the kernel in the most reliable and most sparse way. There are several approaches in the literature to do this. In the neural-network-based approaches, one tries to set up a network that can learn the properties of the observed data. This is used in a wide range of fields, including general PDEs [7,8], fluid dynamics [4,5,6], quantum mechanics [9], molecular dynamics [10], particle dynamics [11], and chaotic systems [12,13].
In the course of reconstructing the underlying laws of an observed mechanical motion, we have to model the mathematics of the dynamical system. One can model the driving force of the system directly [11,13], or focus on the Hamiltonian [14,15] or the Lagrangian [6,16]. The applied machine learning tool determines the function space in which the best fit for the target function is sought. A very important problem in all of these approaches is to find the most robust approximation that respects the symmetries of the dynamics.
Here a new way is presented for the reconstruction of the observed trajectories of simple and chaotic dynamical systems, using the Extreme Learning Machine [17,18]. In this approach, only a part of the weights is trained in a neural network, usually the ones in the last layer. This leads to a tremendous improvement both in required computational resources and training time.
In this paper we observe a motion sampled with ∆t time separation, resulting in a series x_n ∈ V, where V is some vector space. We want to determine a recursion kernel in order to represent the motion as x_{n+1} = F_∆t[x], where [x] denotes the past; in mechanical systems, it is enough to consider the values x_n and x_{n−1}. We want to reconstruct the motion using this recursion and also continue it to future times.
An advantage of the above second-order recursion is that, in principle, it is appropriate to describe non-conservative systems, where the energy is not conserved. A drawback, however, is that a numerically determined force is unlikely to support an exact conservation law even if the system is conservative, because of observational noise and reconstruction inaccuracies. In such systems, iterating the equation of motion soon leads to divergent trajectories: larger energies usually occupy a larger phase space, so a random walk tends to increase the energy of the system.
A solution to this problem follows the theoretical lines laid down in [23], applied to linear systems in [24]. We determine not only the equation of motion but also other "laws" of the system, using different levels of embedding. A first-level embedding leads to laws of the form C_∆t(x_n) = constant; these include the holonomic constraints of the system. Second-level constraints describe anholonomic constraints as well as other conserved quantities, like the energy. Third-level constraints give the equations of motion. In principle, higher-order constraints can be used, too. The constraints are determined numerically, and in the recursion the fulfillment of all of them is required within the precision of the numerical errors. This leads to a stable algorithm with good noise tolerance. This paper has the following structure. In Section 2 we discuss the general ideas for treating mechanical systems with finite time resolution. In Section 3 we discuss the issues that come up in the course of a numerical representation. In Section 4 we turn to the actual studies: the linear oscillator, the gravity pendulum, and the double pendulum. In Section 5 we close the paper with conclusions.

General setup
In our earlier studies [23,24], we established the multi-feature approach of AI and worked out a method for the determination of linear laws. In this paper, we examine how nonlinear laws can be treated.
The goal of this paper is, assuming that we observe the motion of all degrees of freedom of a closed system, to determine the equations of motion from the numerical data and to continue the observed motion in a plausible way. This sentence contains several notions that have to be defined: how a motion is observed, what is understood under "all degrees of freedom", and what a "closed" system is in general.
It is assumed that we observe a system X that has possible states x ∈ X; we represent the state space as X ≡ R^N. We record the state at every time step t = n∆t, thus we have the records x_n.
It is evident that what happens tomorrow must depend on what is present today, since forgotten things, which leave no trace today, cannot influence the fate of the system in the future. This means that we may set up a recursion

x_{n+1} = F_∆t(x_n). (1)

The recursion also needs an initial condition x_0. This form can be true only if all information that is necessary for the future of the system is accounted for; this is what we mean by "all degrees of freedom are accounted for". Closedness is reflected in the fact that F does not depend directly on time (i.e. on n).
If ∆t is small enough, then the numerical values of x_{n+1} and x_n are close to each other. Then it is usual to keep track only of the change and define

v_n = (x_{n+1} − x_n)/∆t ≡ f_∆t(x_n).

We emphasize here that ∆t is a finite quantity throughout the complete discussion. But we may discuss the ∆t → 0 continuum limit. In this case

v_n → ẋ(t),  t = n∆t  (∆t → 0).

Then the above recursion becomes a first-order differential equation,

ẋ = f_0(x),  f_0 = lim_{∆t→0} f_∆t. (4)

Its solution reads

x(t) = G(t − t_0, x_init),  x(t_0) = x_init,

where G depends only on t − t_0 if f_0 does not depend on time. Choosing t = (n + 1)∆t, t_0 = n∆t with a finite ∆t, and x_init = x_n, we obtain x((n + 1)∆t) = G(∆t, x(n∆t)).
This is consistent with (1) with F_∆t(x) = G(∆t, x). This means that if a system is governed by an autonomous first-order differential equation, then the discrete-time evolution is governed by a first-order recursion. Sometimes equation (4) has conserved quantities C(x), which means that

C(x(t)) = C(x(t_0)). (7)

This quantity is conserved for any t − t_0, thus it remains conserved in the discrete case, too.
In practice, it is not easy to decide whether all information needed for the recursion is known, i.e. whether we know the complete state of the system. If the state space is N dimensional (X ∼ R^N), then we need N real numbers to determine the future elements. Observing a single component of the embedded time series, x_0(t), x_0(t + ∆t), …, x_0(t + (N − 1)∆t), can provide these N parameters: we have N equations for the N unknown components of the initial state. Therefore it is also enough to observe only some components of the complete state, but for several time steps; this provides sufficient information to restore the full initial state, and so the complete time dependence.
This means that we may observe a reduced system y ∈ R^M, for time steps t, t − ∆t, …. Then we consider the state of the system as x_n = {y_n, y_{n−1}, …, y_{n−k}}.
This information is enough to provide x_{n+1}, and thus y_{n+1} can be computed. So we have the recursion

y_{n+1} = F̃_∆t(y_n, y_{n−1}, …, y_{n−k}).
So in a reduced system, a (k + 1)th-order recursion is needed to describe the complete time evolution. In the continuum limit, it corresponds to a (k + 1)th-order differential equation. This statement is the difference-equation version of Takens' embedding theorem [19].
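The delay-embedding construction described above can be sketched in a few lines; this is a generic illustration (not the paper's code), stacking each observation with its k delayed copies:

```python
import numpy as np

def delay_embed(y, k):
    """Form the embedded states x_n = (y_n, y_{n-1}, ..., y_{n-k})
    from a scalar time series, for n = k .. T-1."""
    y = np.asarray(y)
    T = len(y)
    # column j holds the series delayed by j steps
    return np.stack([y[k - j : T - j] for j in range(k + 1)], axis=1)

y = np.arange(7.0)        # a toy series y_0 .. y_6
X = delay_embed(y, 2)     # rows are (y_n, y_{n-1}, y_{n-2}) for n = 2 .. 6
```

Each row of `X` is one reconstructed state; feeding consecutive rows to a fitted kernel F̃ realizes the (k + 1)th-order recursion.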

Specialties of mechanical systems
In mechanical systems, in the continuum limit, the states are elements of the phase space. If the configuration of the system is denoted by x ∈ R N , then the phase space consists of the (x(t),ẋ(t)) pairs.
In practice, we can only observe the configuration of the system. Even if all coordinates are observed, they are still needed at two different times. Therefore we should use the recursion

x_{n+1} = F_∆t(x_n, x_{n−1}). (10)

It is worth introducing quantities that characterize the first- and second-order changes,

v_n = (x_n − x_{n−1})/∆t,  a_n = (x_{n+1} − 2x_n + x_{n−1})/∆t², (11)

then we can write

a_n = f_∆t(x_n, v_n). (12)

This is the difference-equation realization of the Newton equation; f_∆t is the discrete force function.
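The discrete change proxies can be computed directly from an observed series by finite differences. A minimal sketch (our own illustration; the backward-difference convention for v_n follows the text):

```python
import numpy as np

def finite_difference_proxies(x, dt):
    """Discrete velocity and acceleration proxies from a sampled trajectory.

    x  : array of shape (T,) or (T, N) -- observed configurations x_0 .. x_{T-1}
    dt : the (finite) observation time step

    Returns, for the interior indices n = 1 .. T-2,
        v_n = (x_n - x_{n-1}) / dt
        a_n = (x_{n+1} - 2 x_n + x_{n-1}) / dt**2
    """
    x = np.asarray(x, dtype=float)
    v = (x[1:-1] - x[:-2]) / dt
    a = (x[2:] - 2.0 * x[1:-1] + x[:-2]) / dt**2
    return v, a

# For a uniformly accelerated motion x(t) = t**2 the acceleration proxy
# is exactly 2, and v_n equals the exact velocity at the midpoint t_n - dt/2.
t = np.arange(6) * 0.1
v, a = finite_difference_proxies(t**2, 0.1)
```

Note that v_n is a one-sided difference, so it approximates ẋ at t_n − ∆t/2 rather than at t_n; this is exactly the finite-∆t effect that makes the discrete force differ from the continuum one.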
In the continuum limit, we have the differential equation

ẍ = f_0(x, ẋ),

and we have to fix the time evolution at two points; alternatively, at a single point we must provide x(t_0) and ẋ(t_0). This fixes the solution uniquely:

x(t) = G_∆t(t − t_0, x(t_0), x(t_1)),  ∆t = t_0 − t_1. (14)

Because of time translation invariance, this form depends only on the time differences. In particular, with t_0 = n∆t, t_1 = (n − 1)∆t and t = (n + 1)∆t, we get back (10) with F_∆t(x, x′) = G_∆t(∆t, x, x′). We remark that even if f_0 does not depend on ẋ, i.e. the force is velocity independent, the discrete force f_∆t can still depend on it.
We have seen in (7) that if there is a conserved quantity in continuous time, then there is one in discrete time as well. Now we use the second-order equation of motion, so the conserved quantities are of the form C(x, ẋ) in the continuum limit. Using (14), we can express the local velocity as a function of the initial conditions:

ẋ(t) = ∂_t G_∆t(t − t_0, x(t_0), x(t_1)).

In particular, with t = t_0 = n∆t and t_1 = (n − 1)∆t,

ẋ(n∆t) = ∂_t G_∆t(0, x_n, x_{n−1}) ≡ g_∆t(x_n, x_{n−1}).

This means that

C_∆t(x_n, x_{n−1}) = C(x_n, g_∆t(x_n, x_{n−1}))

is a conserved quantity for all solutions of the discrete recursion (12).

The numerical problem
The problem we solve is the following: we observe some trajectories with a fixed time resolution ∆t, and we want to reproduce and also continue these trajectories.
We emphasize that we want to reproduce the time series exclusively from observations; we do not use the underlying differential equations. This also means that the local velocity information is not available, since it is not measured (only its discrete variant). The algorithm to solve this problem is the following. We model the general function class to which our f_∆t function is suspected to belong. This function class is approximated by a neural network with parameters w ∈ R^{N_w}; thus

f_∆t(x, v) ≈ f_w(x, v).

The weights w_i are determined to best describe the observed data. In very general terms,

w = argmin_w L_total,  L_total = Σ_{i,n} L(a_n^{(i)}, f_w(x_n^{(i)}, v_n^{(i)})),

where L(a, a′) is the local loss, for example L(a, a′) = |a − a′|²; L_total is the total loss, and a_n^{(i)} are the results of (11) evaluated on the trajectory x_n^{(i)}. In this paper, we use simulated data to train the network. Once we have the approximate form of the discrete force function, we can establish the recursion

x_{n+1} = 2x_n − x_{n−1} + ∆t² f_w(x_n, v_n). (21)

The neural network
To approximate the discrete force we applied the Extreme Learning Machine [17,18] ideas. This corresponds to a one-hidden-layer neural network, shown in Figure 1. According to the logic of the Extreme Learning Machine, the w_0 weights in the first layer are not trained; they are chosen randomly and kept fixed. Training concerns only the w weights. We can also say that from the input data we create random features at the hidden layer, and from these random features we compose the resulting function.

Smoothness
There are several issues we should take into account when we construct the neural network. The first one is smoothness. We expect that the discrete force is a smooth function of its arguments, meaning not just that there are no discontinuities, but also that the derivatives of the function should not exceed certain values. Physically, we expect that nearby positions and velocities result in almost the same force. Thus, we construct a network in which any reasonable choice of the weights results in an output that is a smooth function of the inputs. This requires that the features themselves are smooth for any w_0 values. In this work, we used features localized around the randomly chosen phase-space points (w_01, w_02) with a common scale w_03 (cf. the Training section), which are smooth for any choice of the random weights.
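A random, untrained feature layer of this kind can be sketched as follows. The centers (w_01, w_02) are drawn uniformly over the observed phase-space box and the common width w_03 is fixed, as described in the Training section; the Gaussian bump shape itself is our assumption for this sketch (any smooth, localized activation would do):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_random_features(x_range, v_range, n_feat=100, scale=3.0, rng=rng):
    """Random, untrained hidden layer in the Extreme Learning Machine spirit."""
    w01 = rng.uniform(*x_range, size=n_feat)   # feature centers in x (not trained)
    w02 = rng.uniform(*v_range, size=n_feat)   # feature centers in v (not trained)
    w03 = scale                                # common width, fixed for all features

    def features(x, v):
        # infinitely differentiable in (x, v) for every weight choice
        return np.exp(-(((x[:, None] - w01)**2
                         + (v[:, None] - w02)**2) / w03**2))
    return features

feat = make_random_features((-np.pi, np.pi), (-2.0, 2.0), n_feat=16)
F = feat(np.linspace(-1.0, 1.0, 5), np.zeros(5))   # (5, 16) feature matrix
```

Because every feature is a smooth bump, any linear combination of them (the only thing the training adjusts) is automatically a smooth function of the phase-space point.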

Stability
The second issue is the stability of the recursion. We must ensure that even after a large number of iteration steps the position and velocity values remain in a sensible range.
The core of the problem can be understood already in the continuum formalism. The differential equation (12) can be dissipative, in which case the coordinates all go to constant values after a long time. The motion can also be divergent, in which case the velocities take larger and larger values, and may even diverge in a finite time interval. Motions that remain in a finite range, i.e. that are neither dissipative nor divergent, lie on the borderline between the two extremes. But it is highly improbable that a numerically determined force lies exactly on this borderline. What usually happens is that the force has both dissipative and divergent parts, and, due to the larger phase space occupied by divergent motions, a numerically determined force practically always leads to a divergent motion.
The stability of these motions is usually accompanied by the appearance of conserved quantities (energy, momentum, angular momentum, etc.). In numerical recursions, the constancy of the conserved quantities can ensure the stability of the motion.
In more general terms, we can establish kth-order constraints (laws) in the observed data that affect the kth level of embedding. These are:
• zeroth-order (holonomic) constraints, of the form

C^(0)_∆t(x_n) = C^(0)_∆t(x_0), ∀n.

These mean that the coordinates used are interdependent. We can avoid dealing with zeroth-order constraints if we choose an independent parametrization of the observed motion. For example, a rod implies the constraint x_n² + y_n² = R² = x_0² + y_0² for its endpoint; this can be avoided by working with the deflection angle.
• first-order constraints, of the form

C^(1)_∆t(x_n, v_n) = C^(1)_∆t(x_0, v_0), ∀n.

These include the anholonomic constraints (in this case C^(1) is linear in v) as well as the conserved quantities.
• second-order constraints, of the form

C^(2)_∆t(x_n, v_n, a_n) = C^(2)_∆t(x_0, v_0, a_0), ∀n.
In mechanics, second-order constraints yield complete information about the motion. The usual form is to express the a_n variable and arrive at the Newton-equation form (12) of the equations of motion.
In a consistent system, the higher-order constraints are compatible with the lower-order ones. In particular, in mechanics, the equations of motion should respect the holonomic, anholonomic constraints as well as conserved quantities.
In the case of a numerically determined force, the force usually does not support an exact conserved quantity. The force and the conservation laws are then practically independent, so we must determine the constraints of different orders independently from the data. In the recursion we then require the fulfillment of all the constraints (including the equation of motion), to the precision to which they describe the data.

Training
The training set is a collection of observed trajectories with different initial conditions. We prepare the (x_n, v_n) coordinates from all observed time series using (11); these make up the items of our dataset (our "phase space"). We also compute the acceleration proxies a_n from (11); these are the "labels" of the data. The total number of data N_data is the number of all observed (x_n, v_n, a_n) triplets.
As the Extreme Learning Machine idea suggests, we choose the w_0 weights randomly and do not train them. In the choice of w_01 and w_02, we take care that the generated features cover the observed phase space more or less uniformly. The w_03 variable is kept fixed for all features.
Once we have the w_0 values, we can also compute the features; these are common for all observables. We denote the features by

F_{ni} = F_i(x_n, v_n),  i = 1, …, N_feat,

where N_feat is the number of the features.

Conserved quantities
We choose the w variables separately to obtain the conserved quantities and the observed acceleration values. The conserved quantities are formed as

C_n = Σ_i w_i F_i(x_n, v_n).

This can be written in matrix form as

C = F w,

where F now represents a matrix. Conservation means

C_{n+k} = C_n

with arbitrary k. In the actual calculations, we fix the separation index k. We must take care to compare C values belonging to the same trajectory, so the above condition does not apply to those n where n + k already points to the next trajectory.
To determine the w weights we prepare the matrix

dF_{ni} = F_{n+k,i} − F_{ni},

and require

|dF w|² = minimal,  with |w|² = 1.
This is a minimum problem with a constraint; to handle it we use the Lagrange multiplier method, i.e. we minimize

|dF w|² − λ(|w|² − 1),

where λ is to be determined. This leads to the eigenvalue equation

dF^T dF w = λ w.

The quantity we need to minimize is then

|dF w|² = w^T dF^T dF w = λ,

thus we have to choose the smallest eigenvalue. This method is analogous to the PCA method, but here we need the smallest eigenvalue, not the largest one. We also note that in practice not only the constancy of the conserved quantities is important, but also their ability to distinguish between different motions.
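The smallest-eigenvalue construction can be sketched in a few lines; the toy check below (our own illustration) uses hand-picked quadratic features on a circular orbit, where the conserved combination x² + v² is known in advance:

```python
import numpy as np

def conserved_quantity_weights(F, k=1):
    """Weights w with |w| = 1 minimizing |dF w|^2, where dF_n = F_{n+k} - F_n.

    F : (N_data, N_feat) feature matrix of one trajectory.
    Following the Lagrange-multiplier argument, w is the eigenvector of
    dF^T dF with the smallest eigenvalue (PCA-like, but keeping the
    smallest direction instead of the largest one).
    """
    dF = F[k:] - F[:-k]
    evals, evecs = np.linalg.eigh(dF.T @ dF)   # eigenvalues in ascending order
    return evecs[:, 0], evals[0]

# toy check: features (x^2, v^2, x*v) sampled along the circle x^2 + v^2 = 1
phi = np.linspace(0.0, 6.0, 200)
x, v = np.cos(phi), np.sin(phi)
F = np.stack([x**2, v**2, x * v], axis=1)
w, lam = conserved_quantity_weights(F)
C = F @ w        # nearly constant: proportional to x^2 + v^2
```

The recovered weight vector is ±(1, 1, 0)/√2, i.e. the combination x² + v², and the associated eigenvalue is numerically zero, exactly as the argument in the text predicts.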

Force
The force is also a linear combination of the features, like the conserved quantities. In matrix form it reads

f = F w̃.

To determine w̃ we require that the calculated force reproduces the observed accelerations best:

|F w̃ − a|² = minimal.

This leads to the equation

F^T F w̃ = F^T a.

Although it seems that a matrix inversion could be used here, in fact F^T F is usually an ill-conditioned matrix. We use a pseudoinverse instead: we solve the eigenvalue problem F^T F u_i = λ_i u_i, and then

w̃ = Σ_{λ_i > ε} λ_i^{−1} u_i (u_i^T F^T a).

In practice, we use ε = 10^{−10}.
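The truncated-pseudoinverse fit can be sketched as follows (our own illustration; whether the threshold ε is absolute or relative to the largest eigenvalue is not specified in the text, so we treat it as relative here):

```python
import numpy as np

def fit_force_weights(F, a, eps=1e-10):
    """Least-squares weights for f = F @ w via a truncated pseudoinverse.

    Solves F^T F w = F^T a, but since F^T F is typically ill-conditioned,
    eigendirections with eigenvalues below eps * lambda_max are discarded.
    """
    G = F.T @ F
    b = F.T @ a
    evals, evecs = np.linalg.eigh(G)
    keep = evals > eps * evals.max()
    # invert only on the well-conditioned subspace
    return evecs[:, keep] @ ((evecs[:, keep].T @ b) / evals[keep])

# sanity check on synthetic data: a is an exact linear combination of features
rng = np.random.default_rng(1)
F = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
w_fit = fit_force_weights(F, F @ w_true)
```

On well-conditioned synthetic data the true weights are recovered exactly; on real feature matrices the truncation simply drops the directions that the data cannot constrain.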

Recursion
Once we have trained the network, we can run the recursion (21) to produce the time series. As already discussed, in itself this is not enough: it leads to divergent behavior, since the numerically determined force does not support exact conserved quantities. Therefore, after each EoM step we also perform a projection onto the conserved-value surface. In the code, we single out random directions and make one step towards the correct value of the conserved quantity (decreasing the step size if needed). We repeat this stochastic minimum search until we are closer to the desired value than some (2-3) times the standard deviation of the given conserved quantity.
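The projection step can be sketched along these lines; the exact schedule (trying both signs of each random direction, the halving rule) is our assumption for this sketch, which mirrors the stochastic minimum search described above:

```python
import numpy as np

rng = np.random.default_rng(2)

def project_to_conserved(state, C, C_target, sigma, step=1e-2,
                         band=2.0, max_iter=500, rng=rng):
    """Pull `state` back towards the surface C(state) = C_target.

    Random phase-space directions are tried one by one; a step is accepted
    whenever it decreases |C(state) - C_target|, and the step size is halved
    when neither direction improves.  We stop once the deviation is within
    `band` standard deviations (`sigma`) of the conserved quantity.
    """
    state = np.array(state, dtype=float)
    err = abs(C(state) - C_target)
    for _ in range(max_iter):
        if err <= band * sigma:
            break
        d = rng.normal(size=state.shape)
        d /= np.linalg.norm(d)
        for s in (step, -step):
            trial = state + s * d
            trial_err = abs(C(trial) - C_target)
            if trial_err < err:
                state, err = trial, trial_err
                break
        else:
            step *= 0.5          # no improvement: decrease the step size
    return state

# toy usage: pull a slightly off-shell oscillator point (x, v) back to E = 0.5
E = lambda s: 0.5 * s[1]**2 + 0.5 * s[0]**2
corrected = project_to_conserved([1.02, 0.0], E, 0.5, sigma=1e-3)
```

With several conserved quantities, one such correction loop per quantity (or a combined error) is applied after every equation-of-motion step.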

Chaoticity
A mechanical system with a relatively small number of degrees of freedom can (and usually does) exhibit chaotic behavior. This means that small differences between initial conditions grow exponentially, and consequently no numerical method can yield the solution for long times. This can be checked easily, for example by comparing the results of two methods of a Python differential equation solver: at any precision, the two methods yield similar results only for a restricted period of time.
This means that we cannot expect the solution of the recursion, provided by the neural network, to stay close to the observed data. In this case, the only measures of the quality of the motion are the faithfulness of the force-function representation and the stability of the conserved quantities.

Renormalization
The same motion can be sampled with different ∆t time separations. As discussed above, a recursion with the discrete force, supplemented by numerically determined conserved quantities, is sufficient to reconstruct the data. This means that we can determine f_∆t for each ∆t; this ∆t dependence is called renormalization [20,21,22].
When the force is characterized by numerically determined weights, we obtain the dependence w(∆t). These are known as "running" parameters in the terminology of the renormalization group.
In principle, we can fit a function to the w(∆t) dependence if the weights are smooth enough. This makes it possible to predict the weights for other ∆t values. When this dependence is known, the recursion can be accelerated significantly, since, where possible, we can switch to a larger ∆t.
In this work we do not use renormalization to improve our methods; this is left for future investigations.

Studies
In this section, we describe the different systems that are studied with the method described above.

Linear oscillator
The simplest system is the linear oscillator. In this case, we can follow the algorithm analytically.
The equation of motion, after rescaling, reads

ẍ = −ω² x,

having the solution

x(t) = A cos(ωt + φ). (41)

If we perform observations in ∆t time steps, then we can establish the following relation:

x(t + ∆t) + x(t − ∆t) = 2 cos(ω∆t) x(t).
This corresponds to the recursion

x_{n+1} = 2 cos(ω∆t) x_n − x_{n−1};

expressed through v_n and a_n, this reads

a_n = [2(cos(ω∆t) − 1)/∆t²] x_n.
At finite step sizes, the force remains linear and independent of the velocity:

f_∆t(x) = −Z(ω∆t) ω² x,

where

Z(k) = 2(1 − cos k)/k²

is the scale-dependent "multiplicative mass renormalization factor". For small k it is Z(k) = 1 − k²/12 + …. We can also find a conserved quantity for any scale. In the continuous case the energy E = ẋ²/2 + ω²x²/2 is conserved. On the other hand, with k = ω∆t, the recursion itself conserves the combination x_n² + x_{n−1}² − 2 cos(k) x_n x_{n−1}, as can be checked directly from (41). Using t = n∆t and (11), and dividing by 2∆t², we finally obtain the discrete energy

E_∆t = v_n²/2 + [(1 − cos k)/∆t²] x_n² − [(1 − cos k)/∆t] x_n v_n,

which is constant along the discrete trajectories. In the k → 0 limit, the first term indeed becomes simply ẋ²/2, the second term goes over to ω²x²/2, and the cross term vanishes.
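These finite-∆t relations are easy to verify numerically. The sketch below (our own check) iterates the discrete oscillator recursion, compares it with the exact sampled solution, and monitors the quadratic invariant of the recursion (the discrete-energy form used here is our reconstruction):

```python
import numpy as np

# x_{n+1} = 2 cos(w*dt) x_n - x_{n-1} reproduces x(t) = A cos(w t + phi)
# exactly at the sampling points, and conserves the discrete energy
#   E_dt = v_n^2/2 + (1 - cos k)/dt^2 * x_n^2 - (1 - cos k)/dt * x_n * v_n
# with k = w*dt and v_n = (x_n - x_{n-1})/dt.
w, dt, A, phi = 1.3, 0.05, 0.7, 0.4
k = w * dt
n = np.arange(400)
exact = A * np.cos(w * n * dt + phi)

x = np.empty_like(exact)
x[0], x[1] = exact[0], exact[1]            # two-point initial condition
for i in range(1, len(x) - 1):
    x[i + 1] = 2.0 * np.cos(k) * x[i] - x[i - 1]

v = (x[1:] - x[:-1]) / dt                  # v_n for n = 1 .. len(x)-1
E_dt = (0.5 * v**2 + (1 - np.cos(k)) / dt**2 * x[1:]**2
        - (1 - np.cos(k)) / dt * x[1:] * v)
```

The recursion tracks the exact solution to machine precision, and E_dt stays constant along the whole discrete trajectory, illustrating the statement of the text.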

Gravity pendulum
A slightly more complicated system is the pendulum in the gravitational field. In its simplest version it is governed by the EoM (after an appropriate rescaling of the time variable)

ẍ = −sin x.
The physical meaning of x is the deflection angle. We also trained the force. The value of the force along the trajectory of the v_0 = 1.7 motion, together with the computed value, is shown in Figure 4. As we see, the computed force and the measured acceleration are hardly different; even where they differ, the reason is that the numerical solver sometimes produces discontinuous accelerations. The precision of the reproduction of the force was 96.6%.
After the training, we may run the recursion. We continued the motion for 10 times the training period; the result can be seen in Figure 5. As we can see, the reconstructed and the original trajectories are hardly distinguishable (in the figure they are within line width of each other). Quantitatively, the reconstruction error is 0.83%.

Figure 6: Trajectories of the double pendulum system for initial conditions (x_1, x_2, ẋ_1, ẋ_2) = (π/2, π/2, 0, 0) for two different solvers: the upper is for 'DOP853', the lower is for 'RK45' in the solve_ivp function of Scipy (Python).

Double pendulum
Using the same ideas we can work out the motion of the more complicated double pendulum. With the notation ∆φ = x_1 − x_2 and M = m_1 + m_2, its equations of motion read

ℓ_1 (M − m_2 cos²∆φ) ẍ_1 = −gM sin x_1 + g m_2 cos∆φ sin x_2 − m_2 sin∆φ (ℓ_1 cos∆φ ẋ_1² + ℓ_2 ẋ_2²),

ℓ_2 (M − m_2 cos²∆φ) ẍ_2 = gM cos∆φ sin x_1 − gM sin x_2 + sin∆φ (M ℓ_1 ẋ_1² + m_2 ℓ_2 cos∆φ ẋ_2²).

We first observe the motion of the double pendulum. Technically, we solve the above differential equations with parameters ℓ_1 = ℓ_2 = 1, M = 3, m_2 = 1, g = 1, and for three different initial conditions, the first being (x_1, x_2, ẋ_1, ẋ_2) = (π/2, π/2, 0, 0).
As an example, we show the angles, the velocities, and the accelerations in Figure 6 for the first case. As this figure demonstrates, the solution of the double pendulum system cannot be determined exactly by numerical means. The reason is that this system is chaotic, and any small difference in the state of the system leads to exponentially deviating solutions. In Figure 6 we see the solution obtained with two differential equation solver methods of Python's Scipy package, 'DOP853' and 'RK45'. The solutions start out similarly, but soon they become different. This is, actually, not a bug but a feature of chaotic systems. It also means that we cannot expect our numerical method to provide trajectories that stay close to an exact one.
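The solver comparison described above can be reproduced along these lines. The explicit accelerations below are the standard double-pendulum equations with ∆φ = x_1 − x_2 and M the total mass; the parameters and the initial condition are those quoted in the text:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Double pendulum with l1 = l2 = 1, M = 3, m2 = 1, g = 1.  Two high-accuracy
# Scipy methods agree at early times but deviate completely at late times.
M, m2, g = 3.0, 1.0, 1.0

def deriv(t, s):
    x1, x2, v1, v2 = s
    dphi = x1 - x2
    den = M - m2 * np.cos(dphi)**2
    a1 = (-g * M * np.sin(x1) + g * m2 * np.cos(dphi) * np.sin(x2)
          - m2 * np.sin(dphi) * (np.cos(dphi) * v1**2 + v2**2)) / den
    a2 = (g * M * np.cos(dphi) * np.sin(x1) - g * M * np.sin(x2)
          + np.sin(dphi) * (M * v1**2 + m2 * np.cos(dphi) * v2**2)) / den
    return [v1, v2, a1, a2]

s0 = [np.pi / 2, np.pi / 2, 0.0, 0.0]
t_eval = np.linspace(0.0, 100.0, 2001)
sols = [solve_ivp(deriv, (0.0, 100.0), s0, method=m, t_eval=t_eval,
                  rtol=1e-10, atol=1e-10) for m in ('DOP853', 'RK45')]
diff = np.abs(sols[0].y[0] - sols[1].y[0])   # difference in the angle x1
```

Even at rtol = atol = 1e-10, the two integrators agree only over an initial window; by the end of the run their x_1 angles differ by O(1), which is the behavior Figure 6 illustrates.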
After we have the observed data, we can build up and train our network. We used N f eat = 1000 features in the hidden layer, the w 01 , w 02 position variables were uniformly distributed in the range where the training set data was present, the scale variable was w 03 = 3. For the training, we used ∆t = 0.02 time discretization. We used two conserved quantities here.
In Figure 7 we show the time dependence of the conserved quantities, as well as the time dependence of the acceleration and the reconstructed force. Although the conserved quantities fluctuate in time, their average remains essentially constant, and this persists for later times, too. The accuracy of the force determination is 93%; this is partly due to the differential equation solver, which sometimes produces glitches in the acceleration.
With the trained network we can reconstruct the motion; the result can be seen in Figure 8. The early-time behavior is the same as with both differential equation solving methods. But the deviation starts earlier: this is because the error of the reconstructed force is larger than the numerical errors made by the differential equation solvers. What is important, however, is that the motion remains stable; there are no divergent parts. This property is a consequence of the requirement that the conserved quantities remain conserved up to their standard deviation.

Conclusion
In this paper, we proposed a method that is able to reconstruct and continue observed mechanical motions. The input of the method is a set of trajectories discretized in time. Its core is the module capable of determining the laws of the motion at different levels: the holonomic constraints, the anholonomic constraints, the conserved quantities, and the equation of motion. For a stable reconstruction of the motion, one has to use all of this information, since the EoM alone leads to unstable, diverging solutions.
For the representation of the laws, a shallow neural network was used, and the training followed the Extreme Learning Machine ideas: only the last layer is trained, while the earlier layers provide the features of the problem, which are combined linearly by the last layer. We used a relatively small number of parameters; the hidden layer had N_feat = 100-1000 elements, so the number of weights was at most a few thousand. This leads to fast learning and fast reconstruction of the motion. With this network, the observed force could be reconstructed with better than 90% precision.
As applications, we worked out the linear oscillator case analytically. The numerical method was applied to the pendulum in the gravitational field and to the double pendulum. Whenever the motion is not chaotic, our method reconstructs the observed motion with high precision and continues it in a stable way. In the chaotic case, the exact motion cannot be determined numerically, since all numerical methods contain approximations, and in chaotic systems small deviations grow exponentially in time. This also means that our numerical method could not exactly reproduce the observed data, since the representation of the observed force was not completely accurate. Nevertheless, the generated motion remained stable in time, due to the required constancy of the conserved quantities.