Model reduction techniques for the computation of extended Markov parameterizations for generalized Langevin equations

The generalized Langevin equation is a model for the motion of coarse-grained particles where dissipative forces are represented by a memory term. The numerical realization of such a model requires the implementation of a stochastic delay-differential equation and the estimation of a corresponding memory kernel. Here we develop a new approach for computing a data-driven Markov model for the motion of the particles, given equidistant samples of their velocity autocorrelation function (VACF). Our method bypasses the determination of the underlying memory kernel by representing it via up to about twenty auxiliary variables. The algorithm is based on a sophisticated variant of the Prony method for exponential interpolation and employs the Positive Real Lemma from model reduction theory to extract the associated Markov model. We demonstrate the potential of this approach for the test case of anomalous diffusion, where data are given analytically, and then apply our method to velocity autocorrelation data of molecular dynamics simulations of a colloid in a Lennard-Jones fluid. In both cases, the VACF and the memory kernel can be reproduced very accurately. Moreover, we show that the algorithm can also handle input data with large statistical noise. We anticipate that it will be a very useful tool in future studies that involve dynamic coarse-graining of complex soft matter systems.


Introduction
Generalized Langevin equations (GLEs), i.e., extensions of Langevin equations with memory, have fascinated scientists for many decades, ever since they were first introduced by Mori in the 1960s [40] based on concepts of Zwanzig [52]. Their purpose is to describe in very general terms the irreversible dynamics of collective (coarse-grained) observables in many-particle systems without having to assume complete separation of time scales. Consider a multiparticle system at thermal equilibrium, and assume we are interested in the dynamical evolution of a given set A(t) of coarse-grained variables. For such cases, starting from a microscopic Hamiltonian description and using a projection operator formalism, Mori and Zwanzig derived a dynamical equation of the following form for A(t) [53]:

Ȧ(t) = F_C(A(t)) − ∫_0^t K(s) A(t − s) ds + F_R(t) .   (1)

Here F_C(A(t)) = iΩ A(t) describes the collective oscillations of A(t), K(t) is a memory kernel matrix, and F_R(t) incorporates nonlinear terms and interactions with the remaining microscopic degrees of freedom and is interpreted as a stochastic process. It is connected to the memory kernel via the fluctuation-dissipation relation

⟨F_R(t) F_R(t′)⟩ = K(|t − t′|) ⟨A A⟩ ,   (2)

where F_R(t) F_R(t′) and A A denote tensor products and ⟨·⟩ configurational averages. In the limit where the memory kernel decays almost instantaneously, Eq. (1) can be replaced by a standard Markovian Langevin equation with friction K_0 = ∫_0^∞ K(s) ds and uncorrelated noise F_R(t) satisfying ⟨F_R(t) F_R(t′)⟩ = 2 K_0 δ(t − t′) ⟨A A⟩.
Eq. (1) can be generalized to stationary and even non-stationary nonequilibrium systems [22,39]. Using modified projector methods, it is possible to design GLEs such that F_C includes all (linear and nonlinear) reversible interactions [29], and the memory and noise terms subsume the remaining dissipative contributions to the dynamical equations. In that case, the latter can also be seen as a frequency-dependent thermostat [53]. Independent of the original dynamic coarse-graining context, such GLE thermostats have been utilized to devise enhanced sampling approaches in molecular dynamics simulations [9] and even as a means to mimic the effect of nuclear quantum fluctuations [10]. From a more fundamental point of view, GLEs are popular theoretical frameworks to describe intriguing physical phenomena such as anomalous diffusion [38,26,21] or the glass transition [20].
In numerical simulations of GLEs, one faces two main challenges: First, the efficient evaluation of the history dependent memory term, and second, the generation of suitably correlated random numbers for the stochastic term. If the memory kernel has a finite range, a straightforward direct integration of the GLE is possible [14,46,12,33,27,28]. However, depending on the memory range, the integration can be time consuming, and the necessity to impose a sharp cutoff may cause problems. Therefore, already early on, alternative approaches have been explored where the GLE is approximated by an extended Markovian system, either using autoregressive techniques [41,13,17,37] or by introducing additional, auxiliary degrees of freedom [11,5,47,35]. In both cases, the memory kernel is effectively approximated by a sum of exponentials. Integrating GLEs via extended Markovian schemes has several advantages over direct integration. First, the hard cutoff of the memory kernel is replaced by a less severe exponential cutoff. Second, the integration is often faster [33]. Third, the integration scheme can be implemented in a straightforward manner using established algorithms for Langevin equations.
To simplify the discussion, in the following we will focus on GLEs that describe the Brownian motion of selected tagged particles (e.g., colloids or polymers) in a bath of surrounding particles. The coarse-grained quantity of interest is the velocity V(t) of the center of mass of the tagged particle. Our considerations and methods can easily be generalized to other GLEs as well. The derivation of extended Markov systems for approximating the solution of such GLEs has been treated in several papers. The standard workflow consists of three steps. In a first step, an approximation of the memory kernel is determined. In some cases, this is done directly by using the Mori-Zwanzig formalism analytically [12,36], or by analyzing trajectories in microscopic simulations with concepts from the Mori-Zwanzig theory [8]. More often, one uses known velocity and force correlation functions from all-atom simulations [27,28,32,33,51] as input data and solves a Volterra integral equation of the first or second kind. Care has to be taken in the latter case, because integral equations of the first kind are notoriously ill-conditioned [7,31,45]. In a second step, the computed memory kernel is approximated by an exponential series, known as a Prony series; this can be achieved by using rational approximations of the Laplace transform of the memory kernel, or by some general nonlinear fitting scheme [5,19,32,36]; both methods, however, are only feasible for a few terms of this exponential series. Finally, the coefficients of the extended Langevin model can be extracted from the coefficients of the exponential approximation [11,19,42].
If the starting point is the velocity autocorrelation function (VACF), determining the memory kernel prior to expanding it seems an unnecessary detour. It should be more efficient and less error-prone to extract and optimize the parameters of the integrator directly from the VACF data. Wang et al. have taken this direct route and determined the parameters of a Prony series with up to eight terms by training a deep neural network in two steps with VACF data [49]. Using the example of dilute and dense star polymer solutions, they showed that their GLE integrator is able to reproduce the early and late time characteristics of the VACF over several orders of magnitude in time. This example demonstrates the potential of direct optimization. However, their machine-learning-based method requires a complicated training procedure involving a series of GLE simulations, similar to other iterative memory reconstruction methods [27]. Therefore, it gives little insight into the mathematical structure of the problem, and in particular, the impact of noise in the input data remains unclear.
In the present paper, we propose an analytical method to directly determine the Langevin model from a finite number of samples of the VACF. We avoid taking the detour of computing an intermediate memory kernel and thus circumvent a significant cause of numerical errors. Our approach adapts techniques which have originally been developed for the purpose of model reduction in deterministic dynamical systems [15,18]. Some of these have already been used in the context of stochastic dynamical systems [6,16], but here we follow a different route.
The paper is organized as follows. In Section 2, we describe the background and major ideas behind our model reduction algorithm; more technical details have been shifted into three appendices at the end of this text. Then, in Section 3, we discuss numerical results for three different case studies: one setting uses analytic data ("subdiffusion"), a second example treats molecular dynamics (MD) data taken from our earlier paper [27], and the final case study considers a sequence of MD data sets with different amounts of statistical noise. A final discussion of our results in Section 4 concludes this work.

Methods
In this paper we restrict ourselves to a one-dimensional system, and postpone the nontrivial technical generalization to higher-dimensional problems to future work. We thus consider specifically the GLE

m V̇(t) = −∫_{−∞}^t γ(t − s) V(s) ds + F_R(t)   (3)

for the scalar velocity V of a macroparticle with mass m > 0, where the memory kernel γ is taken to be continuous and integrable. The lower bound of the integral in Eq. (3) is chosen −∞ to indicate that we are considering the stationary limit of a process where the origin of time does not matter. In order to fulfill the fluctuation-dissipation theorem, the random force F_R is assumed to be a stationary Gaussian process with zero mean, satisfying

⟨F_R(t) F_R(t′)⟩ = β^{−1} γ(|t − t′|) ,   (4)

where β is the inverse temperature [44]. The corresponding solution V is a stationary Gaussian process with zero mean and autocorrelation function C_V(t) = ⟨V(t) V(0)⟩. We assume that samples

y_ν = βm C_V(ν τ) ,   ν = 0, …, 2n − 1 ,   (5)

of this autocorrelation function are given on an equidistant grid for some n ≥ 2 and mesh size τ > 0. Due to the particular specification (5) these data are normalized to satisfy y_0 = 1 up to statistical errors, since βm ⟨V²⟩ = 1 by equipartition. In the initialization step of our algorithm we therefore rescale the data to exactly match this physical constraint.
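This rescaling step of the initialization can be sketched as follows; the sample values are made up for illustration.

```python
import numpy as np

def normalize_vacf(samples):
    """Rescale equidistant VACF samples y_nu = beta*m*C_V(nu*tau)
    so that the physical constraint y_0 = 1 holds exactly."""
    y = np.asarray(samples, dtype=float)
    if y[0] == 0.0:
        raise ValueError("y[0] must be nonzero")
    return y / y[0]

# Made-up noisy samples whose first entry deviates slightly from 1.
y = normalize_vacf([0.98, 0.72, 0.41, 0.20, 0.08])
```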
Our goal is the derivation of a Langevin system (7) for an approximate velocity Ṽ, coupled to a set of auxiliary variables Z ∈ R^N and driven by a one-dimensional Brownian motion W; below, e_1 denotes the first canonical basis vector in R^N. We are interested in the stationary solution of (7), and we want to achieve Ṽ ≈ V by choosing the coefficients k ∈ R, g ∈ R^N, and the tridiagonal matrix T_0 ∈ R^{N×N} in such a way that the autocorrelation function C_Ṽ of Ṽ is given by a finite Prony series (8) and interpolates the given data at the 2n grid points t_ν = ντ, i.e.,

βm C_Ṽ(t_ν) ≈ y_ν ,   ν = 0, …, 2n − 1 .
We mention that if the memory kernel is of interest for diagnostic purposes, then it can readily be computed in closed form as a matrix-exponential expression in the coefficients k, g, and T_0 of (7), cf. Eq. (10).
We seek an exponential (Prony) series of the form

f(t) = Σ_j w_j e^{λ_j t}   (11)

which interpolates the given data,

f(ντ) = y_ν ,   ν = 0, …, 2n − 1 .   (12)

For this we use a variant of Prony's method, which we borrow from [1]. This algorithm computes the Lanczos polynomials associated with the moment functional defined by the given data (see Appendix C for more details) and uses the corresponding recursion coefficients to set up a (real) tridiagonal Jacobi matrix J ∈ R^{n×n}. The key observation behind the algorithm is that this matrix solves the moment problem

e_1^T J^ν e_1 = y_ν ,   ν = 0, …, 2n − 1 ,   (13)

where now e_1 = [1, 0, …, 0]^T is the first canonical basis vector in R^n. We should mention that the interpolation problem (11), (12) need not have a solution, and that this recursive algorithm can break down, but we never encountered this problem in our experiments. We also assume in the sequel that J has n distinct eigenvalues µ_j to simplify the presentation. Then we can factorize

J = X D X^{−1} ,

where D is a diagonal matrix with the eigenvalues µ_j on its diagonal; the columns x_j of X are the associated eigenvectors. Given this factorization we then let

A = X Λ X^{−1} ,   (16)

where Λ is the diagonal matrix with the eigenvalues

λ_j = (1/τ) log µ_j ,   j = 1, …, n ,   (17)

of A on its diagonal. It follows that e^{ντA} = J^ν for ν ∈ N_0, and hence,

f(t) = e_1^T e^{tA} e_1   (18)

is the desired solution of the interpolation problem (12). When the memory kernel is integrable and continuous, then the autocorrelation function C_V is differentiable with Ċ_V(0) = 0. We therefore also like to impose the condition ḟ(0) = 0 for the Prony series (18), which gives the constraint

ḟ(0) = e_1^T A e_1 = A_{1,1} = 0 .   (19)

In general, however, the matrix A generated in (16) does not satisfy this constraint. Our remedy for this shortcoming is that we blame this on measurement errors in y_1 = βm C_V(τ), i.e., the first nontrivial data point, and we seek to perturb y_1 in such a way that (19) is satisfied. We apply a Newton iteration for this purpose, which is described in Appendix A. We emphasize that the problem of fitting exponentials to given data is known to be highly ill-conditioned [48].
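The method of [1] builds the Jacobi matrix via Lanczos polynomials; as a minimal, self-contained illustration of the underlying exponential-fitting idea, the sketch below implements the classical Prony method instead (all names are ours, not the paper's). The nodes mu_j play the role of the eigenvalues of J, and lambda_j = log(mu_j)/tau.

```python
import numpy as np

def prony(y, p):
    """Classical Prony fit: find nodes mu_j and weights w_j with
    y[nu] ~= sum_j w_j * mu_j**nu from 2p equidistant samples."""
    y = np.asarray(y, dtype=float)
    # Linear prediction: y[i+p] = -sum_j a[j]*y[i+j] for i = 0..p-1.
    H = np.array([y[i:i + p] for i in range(p)])      # p x p Hankel matrix
    a = np.linalg.solve(H, -y[p:2 * p])
    # Roots of the characteristic polynomial give the nodes mu_j.
    mu = np.roots(np.concatenate(([1.0], a[::-1]))).astype(complex)
    # Weights from a least-squares Vandermonde fit.
    V = np.vander(mu, 2 * p, increasing=True).T       # 2p x p
    w = np.linalg.lstsq(V, y[:2 * p].astype(complex), rcond=None)[0]
    return mu, w

# Two decaying exponentials, sampled exactly: 0.5*0.9**nu + 0.5*0.6**nu.
nu = np.arange(4)
data = 0.5 * 0.9**nu + 0.5 * 0.6**nu
mu, w = prony(data, 2)
```

Note that this naive formulation inherits the ill-conditioning mentioned above; the Lanczos-based variant of [1] is designed to organize the same computation more stably.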
In our context this is reflected by the presence of spurious exponentials in (11) with exponents λ_j in the closed right half-plane, corresponding to eigenvalues µ_j of J in the exterior of the unit disk or on the unit circle. The corresponding exponentials are unphysical because they do not converge to zero. Fortunately, though, if τ has been chosen properly and the data noise is not too big (see Section 3 below), then the respective terms in (11) come with weights that are several orders of magnitude smaller than those of the relevant terms. We therefore delete the spurious terms from the series representation of f defined in (18) by eliminating the corresponding eigenvalues. This modification is described in more detail in Appendix B.
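A minimal filter for this pruning step might look as follows (the numbers are illustrative; as noted above, the spurious modes typically carry weights orders of magnitude below the physical ones):

```python
import numpy as np

def drop_spurious(mu, w):
    """Discard non-decaying modes: eigenvalues mu_j on or outside the
    unit circle correspond to exponents lambda_j with Re lambda_j >= 0."""
    mu = np.asarray(mu)
    w = np.asarray(w)
    keep = np.abs(mu) < 1.0
    return mu[keep], w[keep]

mu = np.array([0.9, 0.6, 1.02])   # 1.02 is an unphysical, non-decaying mode
w = np.array([0.5, 0.5, 1e-12])   # ...carrying a negligible weight
mu_clean, w_clean = drop_spurious(mu, w)
```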
Another issue that is worth mentioning concerns negative eigenvalues µ_j (within the unit disk) of the Jacobi matrix, in particular because this detail is not addressed at all in [1]. The complex logarithm which is employed in (17) is defined in the complex plane except for a cut along the nonpositive real axis. However, the presence of eigenvalues µ_j ∈ (−1, 0) makes sense physically; in view of (13) those correspond to damped oscillations in the data, and hence, to terms of the form

w_j |µ_j|^{t/τ} cos(πt/τ)   (20)

in (11). As the logarithm is multivalued for negative arguments, we need to replace the corresponding formula (17) by adding two complex conjugate eigenvalues to the spectrum of A for a correct representation of this term in (18). Again, we refer to Appendix B to illustrate how this can be achieved. The final matrix A has the form

A =  [ 0   −b^T ]
     [ c    A_0 ]   (22)

with b, c ∈ R^N and A_0 ∈ R^{N×N}. It is no longer n × n, but typically has a smaller dimension (N + 1) × (N + 1), because of the elimination of the spurious eigenvalues.
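The replacement of a negative node by a conjugate pair can be sketched as follows; this is a simplified stand-in for the procedure of Appendix B, with names of our own choosing.

```python
import numpy as np

def continuous_exponents(mu, tau):
    """Map each node mu_j to exponents lambda_j with exp(tau*lambda_j) = mu_j.

    A negative real node mu in (-1, 0) is replaced by the conjugate pair
    log|mu|/tau +- i*pi/tau, which together reproduce the real term
    |mu|**(t/tau) * cos(pi*t/tau)."""
    lam = []
    for m in np.atleast_1d(mu):
        m = complex(m)
        if m.real < 0.0 and abs(m.imag) < 1e-14:
            a = np.log(abs(m)) / tau
            lam += [complex(a, np.pi / tau), complex(a, -np.pi / tau)]
        else:
            lam.append(np.log(m) / tau)
    return np.array(lam)

lam = continuous_exponents([-0.5], tau=1.0)
```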

The Positive Real Lemma
In a second step of our algorithm we determine a vector s ∈ R^N such that the auxiliary Langevin equation

dX(t) = A X(t) dt + q dW(t) ,   q = [0, s^T]^T ∈ R^{N+1} ,   (23)

with the system matrix A determined in (22), has a stationary solution whose autocorrelation function C_Ṽ (of the first component Ṽ of X) is given by (8). Before doing so let us make two remarks.
(i) Due to our construction the matrix A of (22) has only eigenvalues with negative real part. Therefore, the Langevin equation (23) has a unique stationary solution, and the corresponding autocorrelation function C_Ṽ is given by

βm C_Ṽ(t) = e_1^T e^{tA} Σ e_1 ,   t ≥ 0 ,   (24)

where the (normalized) covariance matrix Σ is the solution of the Lyapunov equation

A Σ + Σ A^T + q q^T = 0 ,   (25)

cf., e.g., [42]; here q = [0, s^T]^T denotes the noise vector of (23) with s ∈ R^N. Since we want βm C_Ṽ(t) to match the Prony series f(t) for t ≥ 0, we conclude from (24) and (18) that we need to impose the constraint Σ e_1 = e_1, i.e., the covariance matrix has to satisfy

Σ =  [ 1    0  ]
     [ 0   Σ_0 ]   (26)

for some symmetric, positive semidefinite matrix Σ_0 ∈ R^{N×N}. Inserting (22) and (26) into (25), we see that (25) is equivalent to the system

Σ_0 b = c ,   A_0 Σ_0 + Σ_0 A_0^T = −s s^T ,   (28)

which is known as the (singular) Lur'e equations.
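The relation between the Lyapunov equation and the stationary autocorrelation can be checked numerically; the system matrix and noise vector below are made-up stand-ins for the quantities appearing in (22)-(25).

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

# Illustrative stable system matrix and noise vector (made-up numbers).
A = np.array([[-1.0, 0.5],
              [-0.5, -2.0]])
q = np.array([[0.0],
              [1.0]])

# Stationary covariance of dX = A X dt + q dW solves  A S + S A^T = -q q^T.
Sigma = solve_continuous_lyapunov(A, -q @ q.T)

def autocorr(t):
    """Stationary autocorrelation C(t) = e_1^T exp(t A) Sigma e_1."""
    e1 = np.array([1.0, 0.0])
    return float(e1 @ expm(t * A) @ Sigma @ e1)
```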
(ii) From the Lur'e equations it is evident that a necessary condition for the existence of an approximate Langevin system (23) is that all eigenvalues of A_0 must have nonpositive real part. Another necessary condition stems from the well-known fact that C_Ṽ is per se a function of positive type, i.e., its Fourier transform is nonnegative. By virtue of (24) and (22), this condition is satisfied if and only if

Re κ(iω) ≥ 0   (29)

for all ω ≥ 0. This is a familiar condition in control theory, where the function

κ(z) = b^T (z I − A_0)^{−1} c   (30)

is known as the transfer function associated with the system (23). Because of our previous statement this function is analytic in the open right half-plane, and if it also satisfies (29) then it is called positive real. Unfortunately, it is hard to deduce from (11) whether the transfer function κ is positive real.
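Positive realness can at least be screened numerically by sampling the real part of a transfer function along the imaginary axis; this is a necessary-condition check only (a finite grid proves nothing), and the example transfer functions below are generic, not the paper's κ.

```python
import numpy as np

def positive_real_on_grid(kappa, omegas, tol=1e-12):
    """Check Re kappa(i*omega) >= 0 on a finite frequency grid.
    Passing this screen does not prove positive realness."""
    values = np.array([kappa(1j * w) for w in omegas])
    return bool(np.all(values.real >= -tol))

omegas = np.linspace(0.0, 100.0, 2001)
ok = positive_real_on_grid(lambda s: 1.0 / (s + 1.0), omegas)          # PR
bad = positive_real_on_grid(lambda s: (1.0 - s) / (1.0 + s), omegas)   # not PR
```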
We have seen that a necessary condition for the Lur'e equations (28) to have a solution is that the transfer function κ be positive real. On the other hand, a classical result of Anderson [2] states that the Lur'e equations do indeed have a solution when κ is positive real. Even better, there is also a constructive algorithm for computing Σ_0 and the noise vector; see Appendix C. Therefore the computation of the coefficients of the extended Langevin equation (23) is settled in the positive real case. When this algorithm fails, on the other hand, the transfer function (30) fails to be positive real, and no extended Langevin model (23) exists for this particular set of data. Usually this means that the grid spacing τ or the number 2n of grid points is either too big or too small (see Section 3).

A final Lanczos sweep
The system matrix A of the auxiliary Langevin equation (23) is a full matrix in general, and hence the numerical integration of (23) amounts to O(N²) operations per time step. We therefore perform a coordinate transformation T = U^{−1} A U such that T is a tridiagonal matrix (which reduces the work load of the time integration to O(N) operations per time step). The matrix U and its inverse can be determined with the (nonsymmetric) Lanczos method [30], which generates two vector sequences u_0, …, u_N and v_0, …, v_N.
When choosing u_0 and v_0 to be the first canonical basis vector in R^{N+1}, it follows from the biorthogonality relation (32) that U has the block structure

U =  [ 1    0  ]
     [ 0   U_0 ]

for some U_0 ∈ R^{N×N}, and that the first off-diagonal entries of T equal ±√(b^T c) (the sign depending on the particular implementation of the Lanczos method); hence, we have obtained the desired system (7), with the coefficients k, g, and T_0 read off from T and the transformed noise vector. The fact that b^T c is a positive number can be seen as follows: Eq. (28) implies that b^T c is nonnegative and can only be zero when c = 0; by virtue of (22), however, the latter would imply that the matrix A is singular, in contradiction to all eigenvalues of A having negative real part.
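A bare-bones two-sided Lanczos sweep (without the look-ahead or re-biorthogonalization safeguards discussed below) can be sketched as follows; the test matrix is made up.

```python
import numpy as np

def lanczos_twosided(A, u0, v0, m):
    """Nonsymmetric Lanczos: build U, V with V.T @ U = I such that
    T = V.T @ A @ U is tridiagonal. Minimal sketch; may break down."""
    n = A.shape[0]
    U = np.zeros((n, m)); V = np.zeros((n, m)); T = np.zeros((m, m))
    u = np.asarray(u0, dtype=float).copy()
    v = np.asarray(v0, dtype=float) / (v0 @ u0)   # enforce v^T u = 1
    u_prev = np.zeros(n); v_prev = np.zeros(n)
    beta = gamma = 0.0
    for j in range(m):
        U[:, j], V[:, j] = u, v
        alpha = v @ (A @ u)
        T[j, j] = alpha
        if j == m - 1:
            break
        r = A @ u - alpha * u - gamma * u_prev    # next right vector
        s = A.T @ v - alpha * v - beta * v_prev   # next left vector
        sr = s @ r
        if abs(sr) < 1e-14:
            raise RuntimeError("serious Lanczos breakdown")
        beta = np.sqrt(abs(sr))
        gamma = sr / beta
        T[j + 1, j], T[j, j + 1] = beta, gamma
        u_prev, v_prev = u, v
        u, v = r / beta, s / gamma
    return U, V, T

# Made-up nonsymmetric test matrix (already tridiagonal, so no breakdown).
A = np.array([[2.0, 1.0, 0.0, 0.0],
              [0.5, 1.0, 1.0, 0.0],
              [0.0, 0.5, 3.0, 1.0],
              [0.0, 0.0, 0.5, 4.0]])
e1 = np.array([1.0, 0.0, 0.0, 0.0])
U, V, T = lanczos_twosided(A, e1, e1, 4)
```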
Similar to the first step of our algorithm presented in Section 2.1, the two vector sequences are computed recursively, and the recursion coefficients define the entries of the tridiagonal matrices T and T_0, respectively. We mention that the Lanczos method can encounter a breakdown in exceptional situations [30]; for larger values of N the algorithm can also suffer from a loss of biorthogonality in (32) due to accumulated round-off, and this can lead to new spurious eigenvalues of T in the right half-plane. If either of these happens, one has to employ the auxiliary extended Langevin equation (23) instead of (7).

Subdiffusion
We start with a well-known model problem [34,50,24] known as subdiffusion, where the memory kernel is

γ(t) = t^{−α} / Γ(1 − α) ,   0 < α < 1 ,   (34)

and the VACF is explicitly given in terms of the Mittag-Leffler function E_a(z) = Σ_{k≥0} z^k / Γ(ak + 1) as

C_V(t) = E_{2−α}(−t^{2−α}) .
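For reference, the Mittag-Leffler function can be evaluated for moderate arguments by truncating its power series; this naive evaluator is our own illustration, not the routine used for the figures, and it is unsuitable for large |z|.

```python
from math import gamma as Gamma, exp, cos

def mittag_leffler(a, z, terms=50):
    """Truncated power series E_a(z) = sum_k z**k / Gamma(a*k + 1).
    Accurate only for moderate |z|; terms must stay small enough
    that Gamma(a*terms + 1) does not overflow."""
    return sum(z**k / Gamma(a * k + 1) for k in range(terms))
```

Two sanity checks follow from special cases: E_1(z) = e^z and E_2(−t²) = cos(t).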
This setting uses reduced (dimensionless) units, so that m = 1 and β = 1. The knowledge of exact (clean) data allows for a proof of concept of our method, although the memory kernel (34) is not integrable and exhibits a singularity at t = 0, whereas our extended Langevin model is always associated with a continuous memory kernel, see (10). Given the shape of the VACF displayed in Figure 1, we have run our algorithm with data samples for three different grid spacings, i.e., values of τ; for each choice we have compared two grids, covering time intervals of length about 12 and 18, respectively:
• τ = 0.4 with n = 15 and n = 22,
• τ = 0.6 with n = 10 and n = 15,
• τ = 1.0 with n = 6 and n = 9.
In Figure 1 the target data for τ = 0.6 are highlighted (the additional data for the longer time interval are marked in white); besides the exact VACF, this figure also shows the corresponding approximations (24) for both associated values of n. On this scale the approximations exhibit a perfect fit of the true autocorrelation function. The same is true for the other four aforementioned choices of the two parameters τ and n.
A zoom into the approximate VACFs, however, provides some practical guidelines on how to choose τ and n for a given problem. For example, the reconstructions corresponding to τ = 1.0 exhibit a certain misfit near t = 0 (upper panel in Figure 2), and the one for τ = 1.0 and n = 6 also has difficulties in matching the small wriggle near t = 17 and the subsequent tail of the true autocorrelation function (bottom panel in Figure 2); all curves that are not visible in these plots lie exactly underneath the true VACF. Such misfits show that the grid spacing τ = 1.0 is too large to cope with finer details and the infinite curvature at t = 0 of the true autocorrelation function. As a matter of fact, our Prony series approximation for τ = 1.0 and n = 6 fails to correspond to a positive real transfer function and does not lead to a valid Langevin system. For these parameters a modified system (7) with nonzeros in the (1,1)-entry of the system matrix and the top entry of the noise vector can be constructed by dropping constraint (19) (thereby foregoing the Newton iteration), resulting in ḟ(0) = −0.204. The corresponding approximate VACF is included as a dashed line in Figure 2, but is hardly visible in the upper panel, as it is almost entirely hidden underneath the true VACF; for larger t, though, its fit is not much better than the original one according to the lower panel. Figure 3 presents the reconstructed memory kernels (10) for the various combinations of τ and n; again, the dashed line corresponds to the case τ = 1.0 and n = 6 without the constraint (19), where the memory kernel comes with an additional delta distribution at t = 0. It can be seen that the quality of the approximation increases with reduced grid spacing and with increasing number of grid points.

Table 1. Number N of auxiliary variables Z in (7) for different choices of τ and n.
The italicized entry N = 5 in the first column corresponds to the unconstrained VACF approximation.
It is interesting to note that increasing the number of grid points (for the same grid spacing) beyond a certain threshold leads to the occurrence of spurious eigenvalues. As shown in Table 1, when increasing the number of grid points for τ = 0.6 from 2n = 20 to 30, then the size of the Langevin system (7) increases only by two (and not by five, as one would expect). The situation is even more striking for τ = 0.4: increasing the number of data points from 2n = 30 to 44 leads to only one additional auxiliary variable in (7).

Molecular dynamics data
In earlier work [27] we employed an MD simulation to generate VACF data of a single colloid in a Lennard-Jones (LJ) fluid. In that work we developed a new iterative procedure (which we called the IMRV method) to determine a memory kernel for a generalized Langevin equation of a coarse-grained model. Here we demonstrate that the memory kernel (10) corresponding to the extended Langevin model (7), obtained from our approximation of the same VACF, is consistent with the results from [27].
In preliminary experiments we found that in this setup the interpolation grid should extend up to a final time somewhat beyond t = 1; we have therefore chosen the parameters τ = 0.06 and n = 10 for this example (see Figure 4). With these parameters we encountered neither spurious eigenvalues nor negative real ones, so the extended system is of size 10 × 10 with N = 9 auxiliary variables. From Figure 5 it can be seen that the corresponding memory kernel approximation (10) is very similar to the two kernels computed in [27] with the IMRV method and via the (second order) Volterra integral equation scheme suggested in [45].

The impact of data noise
To assess how well our method performs with noisy data, we generated three data sets with shorter run times, resulting in less smooth VACFs.
The setup of these MD simulations is almost identical to that in the previous section. The system is that of a colloid in a fluid of N = 31627 LJ particles with size σ and mass m. We use truncated and shifted LJ potentials with energy scale ε, which are cut off at r_c = 2^{1/6} σ, resulting in purely repulsive WCA interactions. The cubic simulation box has periodic boundary conditions in all three dimensions and a side length of 35.76σ. The length, energy, and mass scales in the system are defined by the LJ diameter σ, the energy ε, and the mass m, respectively, which defines the LJ time scale t_LJ = σ √(m/ε). The colloid has a mass of M = 80m and is defined as a rigid body with a radius R = 3σ. In these simulations the body of the colloid is constructed so that its surface is smoother than that of the simulations in [27], resulting in full-slip boundary conditions for the LJ fluid. In order to sample at our desired temperature, β = 1, we equilibrate the system with a Langevin thermostat. All simulations are performed using LAMMPS [43]. We have generated three data sets with different levels of noise corresponding to the following production run lengths, where we use a time step of ∆t_MD = 0.001: (i) low noise setting: 5 M steps for the production run; (ii) medium noise setting: 1 M steps; (iii) high noise setting: 0.1 M steps. In Figure 6, it can be seen that the relevant details of the VACF occur in the time interval [0, 3], while data noise seems to dominate beyond t = 5 in each of the three settings. We therefore found it appropriate to sample interpolation data from about the time interval [0, 3], using the three grid spacings τ = 0.06, τ = 0.1, and τ = 0.2. However, not all transfer functions associated with these data have been positive real; when they were not, we reduced the dimension n by one and retried until we eventually succeeded.

Table 2. Grid parameter n and number N of auxiliary variables Z in (7) for noisy data.
This is documented in Table 2: For example, the entry n = 23 for the medium noise setting (ii) and grid spacing τ = 0.06 indicates that for both n = 24 and n = 25 the corresponding transfer functions failed to be positive real. Overall, out of the 3 × 3 = 9 cases for which we have used data samples, six were successful right away, while three failed initially, namely setting (i) with τ = 0.06, setting (ii) with τ = 0.06, and setting (iii) with τ = 0.1. It is difficult, though, to deduce a general rule of thumb for when the algorithm is prone to fail. Our experience indicates that the failure probability increases significantly beyond n = 20, but the algorithm may break down for n = 25 and yet be successful for n = 26. Turning to the VACF plots in Figure 6, it can be seen that in the low and medium noise settings, (i) and (ii), the VACF approximation for the grid spacing τ = 0.1 is best, whereas τ = 0.2 yields the best result in the high noise setting (iii). The latter is surprisingly smooth in spite of the high amount of noise, and looks very similar to the one from setting (ii). In fact, according to Table 2 both approximations are based on only five (= N + 1) non-spurious eigenvalues of A and their associated eigenvectors, and those probably carry much the same information in both settings.
Overall we conclude that τ = 0.06 is too small for this MD setup: In the high noise setting (iii) the corresponding VACF reconstruction picks up too much noise and oscillates a lot; in the other two settings the noise does not manifest itself in the reconstruction but in a series of spurious eigenvalues (N ≪ n − 1), the elimination of which causes a certain loss of the interpolation property and restricts the quality of the approximation. This is most striking for the medium noise setting (ii), which is the example where the dimensions n = 25 and n = 24 failed.
Concerning the impact of noise on the associated memory terms, Figure 7 reveals that with increasing noise more and stronger oscillations appear in the memory kernels. This effect becomes stronger with decreasing grid width τ, but all Langevin models are more robust in that respect than the Volterra equation method, which we have employed for "calibration". Taking the kernel calculated with the Volterra method for the low noise setting (i) as "ground truth", one can see that the Langevin models for τ = 0.2 fail to match the depth of the sharp minimum of the kernel near t = 0.2, whereas the memory kernels for the other two grids provide a reasonable fit of this particular feature, regardless of the noise magnitude. Apparently, the grid spacing τ = 0.2 is too wide for that purpose, although this could not be perceived from the VACF reconstructions. On the other hand, in the high noise setting (iii) the fairly smooth memory kernel for τ = 0.2 is the most convincing one. We conclude that for larger amounts of noise it pays off to use a larger grid spacing, and that the quality of the associated memory kernel correlates well with the quality and the smoothness of the VACF approximation.

Computational efficiency
The accurate determination of an extended Langevin model allows systems to be coarse-grained. However, for such a model to be useful as a coarse-graining tool, it needs to provide a sufficient speedup of simulation runtimes. To evaluate the speedup, we simulate a coarse-grained model of the system discussed in Section 3.3 using the extended Langevin model determined with the method outlined in this paper. We then compare its runtime to that of a fine-grained MD simulation of the same system, using the same time step ∆t_MD = 0.001. All simulations are performed in three dimensions using LAMMPS [43]. Although the described method is derived in one dimension, the components of the velocity in our system are mutually uncorrelated. Therefore, we can apply the determined extended Langevin model to each component of the velocity separately and maintain the system properties. For our coarse-grained simulation, we use an Euler-Maruyama integration scheme.
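An Euler-Maruyama step for a linear system of this type is straightforward; the matrix A, vector b, and all parameters below are generic placeholders rather than the fitted model coefficients.

```python
import numpy as np

def euler_maruyama(A, b, x0, dt, nsteps, rng):
    """Integrate the linear SDE dX = A X dt + b dW with Euler-Maruyama."""
    x = np.array(x0, dtype=float)
    traj = np.empty((nsteps + 1, x.size))
    traj[0] = x
    sqdt = np.sqrt(dt)
    for k in range(nsteps):
        x = x + dt * (A @ x) + b * (sqdt * rng.standard_normal())
        traj[k + 1] = x
    return traj

# Noise-free sanity check: scalar decay dx = -x dt gives x(1) ~ exp(-1).
traj = euler_maruyama(np.array([[-1.0]]), np.array([0.0]), [1.0],
                      dt=0.001, nsteps=1000, rng=np.random.default_rng(0))
```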
As expected, simulations using our extended Langevin model were significantly less computationally expensive than the fine-grained MD simulations. The speedup factor per time step was 160 for an extended Langevin model with 9 auxiliary variables, and 170 for a model with only 5 auxiliary variables. Normalizing these numbers by the degree of coarse-graining (i.e., the number of fine-grained particles per coarse-grained particle), they are comparable to those reported by Wang et al. [49], who used their machine-learning-based auxiliary variable scheme with a comparably high number of auxiliary variables to study dilute star polymer solutions. We note that the actual speedup per time unit is even larger, because the time steps in coarse-grained simulations can be chosen much larger than in microscopic simulations [27,49].

Conclusion
In the present work, we have addressed the problem of mapping GLEs onto equivalent extended Markovian Langevin equations which can be integrated more easily in numerical simulations. We have developed an analytical method to extract the parameters of the extended Markovian system directly from the knowledge of the autocorrelation function of the target quantity, without knowledge of the memory kernel in the GLE. The memory kernel can also be evaluated in retrospect via Eq. (10). Importantly, in contrast to related recent approaches based on auxiliary variables [11,5,47,35,49], the number of independent degrees of freedom in our extended Markovian system is the same as that in the original GLE. Thus we are not extending the configurational space of the system. Compared to previous methods to derive extended Markovian integrators for GLEs, our method has the advantage that the number of auxiliary variables can be chosen much larger, even beyond twenty in some of our test runs. Therefore, the memory kernel can be represented very accurately. Moreover, as we have shown above, the algorithm can handle input data that suffer from large statistical noise.
We have derived the algorithm using the example of a GLE for Brownian dynamics with memory. In doing so, we have assumed that the derivative of the VACF vanishes at time zero, Ċ_V(0) = 0. This is certainly the case if it is derived from an underlying atomistic model with reversible, Hamiltonian dynamics. However, it may not be correct if the underlying fine-grained dynamics already has a dissipative component. From a physical point of view, this would imply that the dynamics of the system is governed by processes on vastly different time scales: intermediate time scales that are captured by the memory kernel, and very short scales that are captured by additional instantaneous friction terms. Our algorithm can be extended to account for such a possibility, as we have demonstrated in Section 3.1.
In its current form, the algorithm can be used to integrate one-dimensional GLEs, or higher-dimensional GLEs whose memory and stochastic parts can be written as a sum of independent one-dimensional GLEs. This is the case in most current applications of non-Markovian modelling, e.g., in GLEs based on self-memory only [49] and in non-Markovian dissipative particle dynamics [33]. In the present form, the method cannot yet be applied to general multidimensional GLEs with pair memory [28]. Extending the algorithm to such cases will be the subject of future work.

Acknowledgments
We thank Viktor Klippenstein, Madhusmita Tripathy, and Nico van der Vegt for fruitful discussions. The research leading to this work has been done within the Collaborative Research Center SFB TRR 146; corresponding financial support was granted by the Deutsche Forschungsgemeinschaft (DFG) via Grant 233530050. GJ also gratefully acknowledges funding by the Austrian Science Fund (FWF): I 2887. Computations were carried out on the Mogon Computing Cluster at ZDV Mainz.

Appendix A. The Newton scheme
In order to satisfy (19), we consider the (1,1)-entry A_{1,1} of A as a function ϕ of y_1, choose a starting value y_1^{(0)}, and perform Newton iterations

y_1^{(k+1)} = y_1^{(k)} − ϕ(y_1^{(k)})/ϕ′(y_1^{(k)}), k = 0, 1, 2, . . . , (A.1)

until an approximate zero ϕ(y_1^{(k)}) ≈ 0 has been found. However, instead of the target function ϕ(y_1) = A_{1,1}, chosen in (19) for ease of presentation, we (need to) use

ϕ(y_1) = Re A_{1,1} (A.2)

in (A.1). The reason is that negative eigenvalues of J result in a nonzero imaginary part of A, and it is only the real part which is relevant for our purpose because of the way we subsequently modify these eigenvalues, see Appendix B.
The difficulty in (A.1) is the evaluation of the derivative ϕ′, because of the nontrivial recursion that is used to set up the tridiagonal matrix J. We resolve this problem in two steps. First, we use algorithmic differentiation [23] to simultaneously determine J and its derivative ∂J with respect to y_1. Second, we employ the chain rule and a useful relation that follows from [25, p. 58]. This yields the expression

ϕ′(y_1) = Re (e_1^T ∂A e_1) = Re ∂A_{1,1},

to be used in (A.1); see Appendix C for further details.
We finally mention that there is no need to determine y_1 to high accuracy, because (i) the subsequent modifications of A (cf. Appendix B) will slightly perturb this matrix entry anyway, and (ii) the singular Lur'e equations are only solved approximately (cf. Appendix C). In our experiments, two to seven Newton steps were always sufficient.
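The Newton loop of (A.1) and (A.2) can be sketched in a few lines of Python. This is a minimal sketch: the callables phi and dphi stand for Re A_{1,1} and Re ∂A_{1,1} (which the paper obtains via Algorithm 2 and algorithmic differentiation); the cubic used at the end is a purely illustrative placeholder, not the paper's target function.

```python
def newton_on_A11(phi, dphi, y1, eps=1e-10, i_max=20):
    """Newton iteration on y1 until |phi(y1)| <= eps, cf. (A.1).

    phi(y1) and dphi(y1) play the roles of Re A_{1,1} and
    Re dA_{1,1}/dy_1; here they are user-supplied callables.
    """
    for _ in range(i_max):
        f = phi(y1)
        if abs(f) <= eps:       # approximate zero found
            return y1
        y1 -= f / dphi(y1)      # Newton update, cf. (A.1)
    raise RuntimeError("Newton's method did not find a zero of y1 -> A_{1,1}")

# placeholder target function with a simple root (purely illustrative)
root = newton_on_A11(lambda y: y ** 3 - 2.0, lambda y: 3.0 * y ** 2, y1=1.0)
```

Consistent with the remark above, a loose tolerance eps suffices in practice, since later modifications of A perturb A_{1,1} anyway.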

Appendix B. Eliminating spurious exponentials
Given the eigenvector matrix (15) of A, we let z = X^{-1} e_1 = [z_1, . . . , z_n]^T. Then we readily conclude from (16) the Prony series representation

f(t) = e_1^T e^{tA} e_1 = Σ_{j=1}^{n} w_j e^{tλ_j} with w_j = x_{1j} z_j , (B.1)

where x_{1j} is the first entry of the jth eigenvector x_j. Let us now make the assumption that λ_n is a spurious eigenvalue of A in the closed right half-plane, which we want to eliminate from (B.1) because it is non-physical and has negligible impact on f on the time interval of interest, since |x_{1n}| and |z_n| are sufficiently small.
We denote by Λ the diagonal matrix with the eigenvalues λ_1, . . . , λ_{n−1} of A. Then we delete the final column of X and select one out of rows 2 to n, which we also delete from X to obtain an invertible matrix X ∈ R^{(n−1)×(n−1)}; in our code we always took the last row. Finally, let z = [z_1, . . . , z_{n−1}]^T and e_1 denote the first canonical basis vector in R^{n−1}. Since z_n is assumed to be negligible, it follows that X z ≈ e_1, i.e.,

z ≈ X^{-1} e_1 , (B.2)

and hence, on the time interval of interest, there holds

e_1^T X e^{tΛ} X^{-1} e_1 ≈ Σ_{j=1}^{n−1} w_j e^{tλ_j}

by virtue of (B.2). We therefore achieve our goal by replacing A by X Λ X^{-1}.
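The truncation step described above can be illustrated numerically. The sketch below builds a hypothetical toy matrix A with two decaying modes and one weakly coupled spurious mode in the right half-plane, removes all eigenvalues with Re λ ≥ 0 (deleting, as in the text, the last row of the eigenvector matrix), and rebuilds the reduced matrix; the toy data are assumptions for illustration only.

```python
import numpy as np

def prony(A, t):
    """Evaluate f(t) = e_1^T exp(tA) e_1 via the Prony series (B.1)."""
    lam, X = np.linalg.eig(A)
    z = np.linalg.solve(X, np.eye(A.shape[0])[:, 0])   # z = X^{-1} e_1
    w = X[0, :] * z                                    # weights w_j = x_{1j} z_j
    return (w * np.exp(np.outer(t, lam))).sum(axis=1).real

def eliminate_spurious(A):
    """Drop eigenvalues in the closed right half-plane: delete the
    corresponding columns of X and (as in the text) the last row,
    then rebuild the reduced matrix from the remaining spectral data."""
    lam, X = np.linalg.eig(A)
    keep = lam.real < 0
    Xr = np.delete(X[:, keep], -1, axis=0)             # invertible (n-1) x (n-1)
    return (Xr @ np.diag(lam[keep]) @ np.linalg.inv(Xr)).real

# hypothetical toy example: two decaying modes plus a weakly coupled
# spurious mode at lambda = 0.5 with negligible Prony weight
W = np.eye(3); W[0, 2] = 1e-6
A = W @ np.diag([-1.0, -3.0, 0.5]) @ np.linalg.inv(W)
t = np.linspace(0.0, 5.0, 11)
A_red = eliminate_spurious(A)
```

Because the spurious weight is negligible, f(t) computed from A and from the reduced A_red agree to high accuracy on the sampled interval.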

Treatment of negative eigenvalues of J
Recall that we have made the assumption that all eigenvalues of J are different. Let us now assume that the eigenvalue µ n of J belongs to the interval (−1, 0); it comes with a real eigenvector x n and a real weight w n in (B.1).
In order to replace the corresponding exponential term by the one in (20), we define the diagonal matrix

Λ = diag(λ_1, . . . , λ_{n−1}, λ_n, λ_{−n})

with λ_{±n} as in (21), and we further let

X = ( x_1 · · · x_{n−1} x_n x_n
      0  · · ·  0       i   −i )

and z = [z_1, . . . , z_{n−1}, z_n/2, z_n/2]^T.
The matrix X is an invertible (n + 1) × (n + 1) matrix, and it satisfies X z = e_1, where e_1 is the first canonical basis vector in R^{n+1}. It therefore follows that

e_1^T X e^{tΛ} X^{-1} e_1 = e_1^T X e^{tΛ} z = Σ_{j=1}^{n−1} w_j e^{tλ_j} + ½ w_n e^{tλ_n} + ½ w_n e^{tλ_{−n}} = Σ_{j=1}^{n−1} w_j e^{tλ_j} + w_n |µ_n|^{t/τ} cos(πt/τ)

by virtue of (21). Since this is the Prony series we have been looking for, we achieve our goal by replacing A by X Λ X^{-1}.

Appendix C. Outline of the full algorithm

We provide a pseudocode formulation of the overall algorithm in Algorithm 1. For the ease of presentation it comes with a small threshold parameter ε and a maximum number i_max of iterative steps to control the Newton iteration.

Algorithm 1
Input: y_ν = C_V(ντ)/C_V(0), ν = 0, . . . , 2n − 1
1: call Algorithm 2 to compute the matrix J and its derivative ∂J with respect to y_1
2: for i = 1, . . . , i_max do
3: compute A_{1,1} and ∂A_{1,1} from J and ∂J
4: set y_1 ← y_1 − Re A_{1,1}/Re ∂A_{1,1}
5: call Algorithm 2 to recompute J and ∂J with the new data vector y
6: end for
7: if |A_{1,1}| > ε then
8: error (Newton's method did not find a zero of the function y_1 → A_{1,1})
9: end if
10: factorize J = XDX^{-1} with a diagonal matrix D; [µ_1, . . . , µ_n] ← diag(D)
11: duplicate eigenvalues µ_j ∈ (−1, 0) and remove eigenvalues µ_j with |µ_j| ≥ 1; adjust X and D accordingly (see Appendix B)
12: set Λ ← (1/τ) log(D); for duplicated eigenvalues µ_j ∈ (−1, 0) employ formula (21)
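The identity underlying this construction, ½ e^{tλ_n} + ½ e^{tλ_{−n}} = |µ_n|^{t/τ} cos(πt/τ), can be checked numerically with λ_{±n} = (ln|µ_n| ± iπ)/τ, which is the branch of (1/τ) log µ_n consistent with the Prony series above. The values of µ_n and τ below are arbitrary illustrative choices.

```python
import numpy as np

# check: |mu|^(t/tau) * cos(pi*t/tau)
#      = 0.5*exp(t*lam_plus) + 0.5*exp(t*lam_minus),
# with lam_{+-} = (ln|mu| +- i*pi)/tau, for a negative eigenvalue mu of J
mu, tau = -0.6, 0.1                     # hypothetical: mu in (-1, 0)
lam_p = (np.log(abs(mu)) + 1j * np.pi) / tau
lam_m = (np.log(abs(mu)) - 1j * np.pi) / tau
t = np.linspace(0.0, 1.0, 101)
lhs = abs(mu) ** (t / tau) * np.cos(np.pi * t / tau)
rhs = (0.5 * np.exp(t * lam_p) + 0.5 * np.exp(t * lam_m)).real
```

The two expressions agree to machine precision, confirming that the duplicated complex-conjugate pair reproduces the oscillatory decaying term in the Prony series.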
In contrast to our presentation in Section 2.2, we have taken the liberty to introduce in line 14 of Algorithm 1 a negative entry A_{1,1} = −δ of small absolute value to circumvent the singular Lur'e equations (28); this is a kind of regularization. The associated Lyapunov equation AΣ + ΣA^T = −βm LL^T with L as in line 16 is equivalent to the regularized variant of (28). One consequence of introducing the regularization parameter δ is that the system (7) changes into a perturbed system in which both T_{1,1} and G_1 are nonzero, but small in absolute value.

Before going on, let us define, for polynomials p(x) = Σ_{ν=0}^{2n−1} a_ν x^ν, a_ν ∈ R, of degree 2n − 1 at most, the moment functional

Φ[p] = Σ_{ν=0}^{2n−1} a_ν y_ν

with y_ν the given data (5). Algorithm 2 determines the corresponding Lanczos polynomials u_i of degree i = 0, . . . , n − 1, respectively, given by Φ[u_i u_j] = ±δ_{ij}, i, j = 0, . . . , n − 1, and returns the Jacobi matrix J with the associated recursion coefficients, and its derivative ∂J with respect to y_1. For the computation of this derivative we also introduce the functional ∂Φ, which maps p as above onto ∂Φ[p] = a_1.

Note that all variables in Algorithm 2 with a prefix ∂ denote the derivative with respect to y_1 of the same variable without the prefix. Further note that u_i, ũ_i, ∂u_i and ∂ũ_i are all polynomials of degree i in the real variable x, and that, for example, x u_i^2 is short-hand notation for the polynomial x → x(u_i(x))^2. We finally mention that in line 18 the variable σ_i ∈ {±1}, and hence its derivative with respect to y_1 is zero.
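The core of Algorithm 2 — building the tridiagonal Jacobi matrix J from the moment functional Φ — can be sketched for the positive definite special case, where all σ_i = +1 and the recursion reduces to the classical Stieltjes/Lanczos procedure. This sketch omits the indefinite case and the propagation of ∂J that the paper's Algorithm 2 handles; the three-point test measure at the end is hypothetical illustration data.

```python
import numpy as np

def jacobi_from_moments(y, n):
    """Stieltjes/Lanczos sketch: build the n x n tridiagonal Jacobi
    matrix J from the moments y_0,...,y_{2n-1}, using the moment
    functional Phi[sum_v a_v x^v] = sum_v a_v y_v as inner product.
    Assumes Phi is positive definite (all sigma_i = +1)."""
    Phi = lambda a: float(np.dot(a, y[:len(a)]))     # moment functional
    alphas, betas = [], []
    u_prev = np.zeros(1)                             # u_{-1} = 0
    u = np.array([1.0]) / np.sqrt(Phi(np.ones(1)))   # u_0 = 1/sqrt(y_0)
    for i in range(n):
        xu = np.concatenate(([0.0], u))              # polynomial x*u_i(x)
        alphas.append(Phi(np.convolve(xu, u)))       # alpha_i = Phi[x u_i^2]
        if i == n - 1:
            break
        r = xu.copy()                                # r = (x - alpha_i) u_i - beta_{i-1} u_{i-1}
        r[:len(u)] -= alphas[-1] * u
        if i > 0:
            r[:len(u_prev)] -= betas[-1] * u_prev
        betas.append(np.sqrt(Phi(np.convolve(r, r))))  # beta_i = ||r||
        u_prev, u = u, r / betas[-1]
    return np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)

# hypothetical test data: moments of a 3-point discrete measure
nodes = np.array([-0.5, 0.2, 0.9])
wts = np.array([0.3, 0.5, 0.2])
y = np.array([(wts * nodes ** v).sum() for v in range(6)])
J = jacobi_from_moments(y, 3)
```

For such a discrete measure, the eigenvalues of J recover the nodes of the measure (the Gauss quadrature property), which provides a convenient correctness check for the recursion.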