Fokker–Planck description for a linear delayed Langevin equation with additive Gaussian noise

We construct an equivalent probability description of linear multi-delay Langevin equations subject to additive Gaussian white noise. By exploiting the time-convolutionless transform and a time variable transformation we are able to write a Fokker–Planck equation (FPE) for the 1-time and for the 2-time probability distributions valid irrespective of the regime of stability of the Langevin equations. We solve exactly the derived FPEs and analyze the aging dynamics by studying analytically the conditional probability distribution. We discuss explicitly why the initially conditioned distribution is not sufficient to describe fully out a non-Markov process as both preparation and observation times have bearing on its dynamics. As our analytic procedure can also be applied to linear Langevin equations with memory kernels, we compare the non-Markov dynamics of a one-delay system with that of a generalized Langevin equation with an exponential as well as a power law memory. Application to a generalization of the Green–Kubo formula is also presented.

use of the time-convolutionless transform, a technique developed in the '70s, to convert a time non-local Langevin equation to a time local description, and thus construct an equivalent FPE [25][26][27][28][29]. While focus of this construction has been directed at Langevin equations where time non-locality is given by a time convolution of the stochastic variable with a memory kernel, the so-called generalized Langevin equations (GLEs), also referred to as the fractional Langevin equation [30][31][32], this approach has not been fully exploited in Gaussian linear delayed Langevin equation (DLE).
A well defined procedure to find FPEs for the 1-and 2-time probability distribution equivalent to the Langevin description has been developed for the case of GLEs [23,25,[33][34][35]. However, in the DLE literature, such FPEs are still lacking. Of note is that more recently an FPE for the one-delay DLE described in terms of two state variables, the delayed and nondelayed one, has been proposed [36]. Although such representation has been shown to be equivalent to a random walk description [37], and generalizations to multiple delays have been put forward [38], it has an important drawback. It links the rate of change of the 1-time probability distribution of the non-delayed variable to the moments of the 2-time joint probability distribution of both the non-delayed and delayed variables (one for each delay). As a result, an exact solution for such an FPE, so far only obtained at steady state and for the case of one delay [39,40], cannot be found without supplementing information about the moments of the distribution from the Langevin equation. This under-determined nature of the FPE [40] calls into question its practical utility and has recently brought statements about finding the exact time dependent probability distribution equivalent to the DLE as a problem beyond reach [41].
Here we present a solution to this problem by building on recent studies of some of the present authors [42,43]. We construct and solve the 2-time FPE including the case where the initial history is non-zero (zero initial history makes a DLE a special case of a GLE). In other words we construct and solve closed FPEs for the 1-and 2-time joint probability distributions associated with a DLE. From that we are able to write the bona fide FPE representation of a DLE in terms of the the conditional probability distribution of the delay process and show how ageing (see e.g. [44]) effects appear evident from the dependence of the dynamics on the initial preparation of a system. Furthermore, our exact analytics allow us to show that DLEs and GLEs do not satisfy the Markov condition and are time-inhomogeneous processes except when they reduce to the Wiener or Ornstein-Uhlenbeck process. As an application we use the formalism of the 2-time probability distribution to highlight interesting differences when computing multi-time averages of the one-delay DLE versus a GLE with exponential memory or with a power law memory. Use of the conditional probability distribution also allows us to use our formalism to construct analytically the generalization of the Green-Kubo relation.
The paper is laid out as follows. We start in section 2 by reviewing the constitutive links between joint and conditional distributions of Gaussian processes and the relations between the Fokker-Planck and Langevin description. In section 3 we construct the 1-and 2-time FPE from a DLE with any number of delays and arbitrary history function, solve them analytically and present the bona fide representation for the conditional probability distribution. In section 4 we show that DLE and GLE can be analyzed together given their formal similarities and that their time non-locality describe non-Markov processes. The role of time homogeneity in these non-Markov systems and their relation to the Chapman-Kolmogorov equation (CKE) is also elucidated. In section 5 we study three specific non-Markov processes, a DLE with a single delay and a GLE with exponential and power law memory. We compare them in computing conditional probabilities when the system preparation time does not coincide with the start of the observation as well in calculating multi-time averages. Analysis of the generalized Green-Kubo formalism is also presented. Finally section 7 forms the concluding discussion.

Interrelation between Langevin and Fokker-Planck description
We review here some preliminary notions and the notations necessary to follow our derivation starting from the definition of the complete description of a stochastic variable x(t) in terms of the so-called n-time probability distribution ( ) x t , ; ; , n n n 1 1 . This distribution represents the joint probability for the random variable to attain the value x 1 at time ( ) t 1 , x 2 at time ( ) t 2 through to x n at time ( ) t n with ( ) where ( ) d z is a Dirac delta function. The n-time probability distribution allows one to compute any m-time average with  m n [45]. While the ( ) n 1 -time probability distribution ( ) x t x t , ; ; , n n n 1 1 1 1 1 is simply the marginal of the n-time distribution, the conditional distribution Q n m of the variables + x m 1 at time ( ) + t m 1 and + x m n at times ( ) + t m n , given the variables x 1 at time ( ) t 1 and x m at time ( ) t m , is given by Bayes theorem [46] as the ratio of the joint m-and ( ) + m n -time distributions (with  n m , 1 ) or alternatively the ratio of the conditional r-and ( ) + n r -time distributions (with  r 1): The general relation between the joint n-time probability ( ) , the initially conditioned probability, is found via marginalisation of Bayes relation (1).
, which indicates the exact equivalence between W n and P n when the system is prepared with a localized initial distribution ( ) . The unique class of stochastic processes called Markov are those for which the conditional probability depends only on the most recent previous times [27,47,48]. This means that they satisfy the relation . As Markov processes are completely determined by Q 1 1 and W 1 , it follows that time-homogeneous Markov processes are completely determined by the initially conditioned probability , 0 1 1 0 . In contrast, non-Markov processes are in general only completely described by the full hierarchy of joint probability distributions. However, Gaussian processes represent an exception to this rule as all n-time averages of a Gaussian distribution can be found from the 1-and 2-time averages [49]. Consequently, Gaussian non-Markov processes, like Markov processes, are completely determined by the 2-time joint probability distribution, In figure 1 we display, as a visual aid, the general interrelations between for linear Gaussian processes. In this article we focus primarily on developing a formalism of what those arrows represent when the Langevin equation is a DLE, but we also show that the formalism can be applied in general to GLEs. We construct and solve the associated FPE, but we do not report on the method of characteristic functionals (dashed lines), which has already been presented in [42,50].

DLE and a bona fide
while the coefficients g i are rates of decay for each of the N-delays, , and a given history function ( ) (3) is captured formally by the Green's functions ( ) l t satisfying and

The deterministic dynamics of equation
The symbol  indicates that the two objects are linked by marginalization. Solving these equations ( ) and applying Bayes theorem () gives the conditional probability, , , 1 1 , which gives, in the limit ¢ = t 0, the initially conditioned probability, can be obtained directly from the solution of the Langevin equation using characteristic functionals ( ) (see e.g. [42]). The double-arrowed line () aims to indicate the relations The dynamics of x(t) can be written explicitly in terms of this Green's function as [51] where the first two terms are present in the solutions of (3) with a dependence on the history function given by The time-convolutionless transform consists of converting the Langevin equation into a time-local form by differentiating the solution (5) with respect to time and substituting x 0 from equation (5) into the resulting equation [25][26][27]34]. The DLE then reduces to the time local form The non-Markoffian characteristics of the equation, which was apparent in the original DLE, is now lumped in the time-dependent terms A(t) and B d (t) and the colored noise obeying

FPE for the 1 and 2-time probability distributions
The procedure to construct a FP equation for the 1-and 2-time probability distributions from equations (7)- (11) consists of expressing the dynamics of ( ) x t x 2 In the Markoffian case one would evaluate the required ensemble average by substituting equation (3) into the above expressions and using properties of the Dirac delta to exchange the derivative of the stochastic variable x(t) for that of the fixed variable x. However the time non-local form of equation (3) implies that this substitution cannot be used to evaluate the required averages [43]. Performing this procedure leads to a non-closed FPE [52], that is a partial differential equation where the first time derivative of ( ) . Although in appendix A we show that it is possible to find a solution of such a FPE by postulating an ansatz informed by the dynamics of the moments of the Langevin equation, here we construct a closed form FPE.
To construct and solve as an initial value problem a bona fide FPE we need to use instead the time-local Langevin equation (7). Its substitution into equations (12) and (13) allows the evaluation of the averages to obtain the drift and diffusion terms of the FPEs. While the drift term is readily found to be equivalent to the deterministic term in the Langevin equation (with a change of sign), finding the diffusion term requires calculating the functional derivative of x (t) with respect to ( ) x t with the help of Novikov's theorem [53] (see also [54,55]). For the particular case of noise obeying fluctuation-dissipation the details of this procedure are given in [29] and the drift and diffusion terms of the 1-time FPE are shown to be equal. For our case of delta-correlated noise these terms are found to be very different. In appendix B we explicitly show their derivation for both the 1-time (also shown in [34]) and the more complicated 2-time distributions. The resulting FPEs are , , When ¢ = t 0 the joint distribution reduces to the 1-time distribution and the two FPEs are identical since ( ) ¢   C t t 0, 0. Compared to the usual FPE equations (14) and (15) display explicit time-dependence in the diffusion constant D(t), in the strength of the parabolic potential A(t) and in the drift terms both along x and ¢ x . Moreover, if the potential and the diffusion constant changes sign, the FPEs may describe diffusive ( ) > D t 0 and anti-diffusive ( ) < D t 0 dynamics within a parabolic potential that may vary from being attractive ( ) > A t 0 to being repulsive ( ) < A t 0. Moreover, in the parameter regime for which the delay Langevin Green's function display oscillatory behavior, the magnitude of the FPE coefficients may actually become unbounded. Hänggi [47,56] pointed out that this potential scenario may occur when representing non-Markov processes. The origin of this divergence is due to the convolutionless transform not being defined whenever ( ) l = t 0. This has the consequence of having the A(t), B d (t), D(t) and ( , terms becoming infinitely large at those specific moments in time. However, the manner in which the sign of the coefficients become unbounded is such that the probability distributions are well behaved, e.g. the coefficient A(t) grows unbounded with the same sign as D(t). To display the balancing effects of the sign change in the FPE coefficients we display in figure 2 the rescaled coefficients To obtain a FPE description with bounded coefficients also in those cases when the dynamics of ( ) l t is oscillatory, we make a variable transformation from t to the rescaled time T(t) and rewrite equations (14) and (15) in terms of ( )  (14) and (15) for the case in which the delay Langevin equation (3) with the single delay τ displays oscillatory dynamics (gt = 1.2). The label As the zeros of ( ) l t give rise to flex points in the function T(t), the change of sign from positive to negative of the function ( ) coincide, whereas they do not when crossing from negative to positive since the maxima of T(t) are not flex points. The former case corresponds to the time values when A(t) and D(t) diverge simultaneously, while the latter case indicates the separate crossings of the zero value for A(t) and D(t).
being respectively the inverse functions of T(t) and ( ) ¢ T t . Equations (21) and (22) supplemented, respectively, by the initial preparation of the system and by the initial condition constitute the sought after bona fide representation of the delay Langevin equation (3) in the rescaled time T and ¢ T . The dynamics in the actual time t is finally obtained by reexpressing the time T as T(t). This time transformation amounts to a rewriting of the Langevin equation where the dynamics is described in time T rather than time t.
To gain further insight we make a variable transformation (see appendix C) and solve exactly equations (14) and (15) and write their analytic solutions in time t and ¢ t as .
0and equals 1 whenever = ¢ t t . For the special case in which all t = 0 i , the Langevin equation (4) describes the Ornstein-Uhlenbeck process and the correlation function is simply the ratio of the function ( ) ( ) (24) and (25) to T and ¢ T would give the corresponding analytic form of W 1 and W 2 .

Conditional probability and aging dynamics
The 1-and 2-time probability distributions (24) and (25) represent the time-dependent equivalent description of the DLE in equation (3) and they allow to derive the conditional given by In the top and bottom panels we plot at different times equation (24) with a Dirac delta initial condition centered at = x x0, that is the dynamics of the initially conditioned probability distribution In equation (27) the dependence of , , 1 1 on the initial preparation of the system appears explicitly in the convolution integral with ( ) W z 0 both in the numerator and denominator. When the initial preparation affects the dynamics between time ¢ t and t, the system is said to undergo aging dynamics, a situation that does not occur in Markov cases. Indeed one can show that for an Ornstein-Uhlenbeck or a Wiener process, that is when . To display aging effects we consider the non-Markov scenario in which the system dynamics is governed by the Langevin equation (3) with two delays t 1 and t 2 and, for simplicity, one single rate g g g = = 1 2 . We use the general analytic expression of the Green's function for any number of delays [42] and write ( ) where ( ) Q z is the Heaviside step function. Equation (28), despite its complicated appearance, is not composed of infinite series since the summations get truncated whenever the argument of the Heaviside function is negative, and is well suited to evaluate ( ) l t at short and intermediate times. Compared to the Green's function for the case with one delay [50,57,58], for which the Green's function is a polynomial of degree k 1 within the interval ( 1 , for higher number of delays the degree of the polynomial of ( ) l t depends on the combination of k j values. Its order may in fact increase or remain unchanged as time increases. To detect clearly the differences due to aging in our delay Langevin equation we compare the dynamics governed by the two delay Green's function (28) in the oscillatory regime, which is generally present in parameter space close to where the system becomes unstable. As the system exhibits an unstable regime for values of gt 1 and gt 2 beyond the parameter space boundary function , it turns out that with the choice of gt = 1 2 1 and gt = 3 4 2 close to the unstable regime we ensure that the system is in the oscillatory regime. As the mean position in equation (27) is largely controlled by ( ) l t , when ( ) l t changes sign the probability distributions reverse the drift direction.
as time evolves the broadening of the probability distribution can be quite different depending on the type of zero crossing of T(t). When approaching the former case (inflection points), the change of width slows down, whereas it changes more rapidly as time approaches those values for which , , 1 1 may decelerate or accelerate the broadening of their width when close to the time at which, respectively, For our 2-delay system this is clearly visible in the top panel of figure 3 where is plotted for equally spaced time intervals (snapshots). Since ( ) l = t 1 for times shorter than the smallest delay, the distribution broadens without shifting in the first three snapshots, while it also starts moving to the left in the 4th and 5th snapshots. The height of the Gaussian gets only mildly reduced in the sequence of snapshots from the 6th to the 10th indicating a slowing down of the broadening which occurs before and after This direction reversal and slowing down/speeding up of the Gaussian width is also observed with ( | ) t Q x t x , , . The distribution drifts to the left from the start and its width increases without evident signs of slowing down except when close to reversing direction. This is nearly opposite to what occurs with ( | ) Q x t x , , 0 1,1 0 when the rate of spread is at its maximum upon reversing direction. The last snapshot on the other hand indicates that the two distributions are becoming similar as we expect since ( ) t  r t , 0 1 for large t.

Non-Markov processes and time inhomogeneity
Armed with exact expressions for the conditional probability, we can find the conditions under which the Langevin equation describe Markov processes by comparing . To avoid constructing ( ) , ; , ; , 3 to derive the most general conditional distribution = Q W W 1 2 3 2 , we calculate the specific one in which one of the conditioned time is the initial time. From (1) we know that with the latter being identical with , ; , 0 exp 2 1 , and are not equivalent (for ¢ > t 0) and thus do not satisfy the Markov property (2). The non-Markoffian character of the process can be evinced by the explicit dependence on x 0 in equation (29), which disappears only if By differentiating with respect to t the latter expression and using the fact that , in addition to its x 0 -dependence. We can thus state that With this understanding we can also ask whether the processes described by equation (27) are time-homogeneous. Inspection of equation (27) for By Laplace transformation one can show that this identity is satisfied once again only if for any   0. In other words, we conclude that conditional probability (27) represents time-inhomogeneous non-Markov processes, which reduce to the time homogeneous Ornstein-Uhlenbeck or Wiener processes when time non-locality in the Langevin equation is eliminated.
With our analytic results we can determine whether the conditional probability satisfies the CKE [59,60] . While it is known that satisfying CKE is a necessary but not a sufficient condition of all Markov processes [48,61], non-Markov processes may or may not obey it [47]. Direct substitution of the expression (27) shows that , , 1 1 satisfies CKE only if the 2-time correlation function is subject to the requirement ( t . This single criterium is valid for any Q 1 1 derived from a 2-time bivariate Gaussian and agrees with the three simultaneous conditions derived by Hanggi for a general Gaussian process to satisfy CKE [56].

GLEs and multi-time average analysis
Although we have derived a FPE by starting from a DLE, the same technical steps need to be followed to construct the bona fide representation of the GLE The local Langevin equation constructed from a GLE would be identical to equation (7) except for the absence of the B d (t) term which carries the information of the history function (24) and (25) have in fact the same formal appearance of the 1-and 2-time probability distribution that one would derive starting from a GLE. The only difference is in the functional dependence of the Green's function ( ) l t , which satisfies or more generally to the multiple delay differential equation (4) In section 3.3 we have shown the qualitative difference in the dynamics of a stochastic linear delay system when the preparation and observation times do not coincide. Here we highlight the time-inhomogeneous nature of the (non-Markov) DLE and GLE, even when preparation and initial observation times are identical. We do so by comparing a proper 2-time average with the case when the time-homogeneous assumption is forcefully made. We show these differences by analyzing three examples: a DLE with a single τ-delay process and a GLE with an exponential memory and one with a power law memory, respectively, of the form The Fourier characteristics and the time dependence of one-time averages of these three non-Markoffian Langevin equations have been recently studied by the present authors in [43]. The respective Green's functions are the one-parameter Mittag-Leffler function of index υ [62,63]. While the Green's function for the DLE process and for the GLE with power law memory have also a parameter region with unstable dynamics where ( ) l t grows without bounds, all three cases possess a monotonic and a (damped) oscillatory regime [43]. and a corresponding expression with s interchanged with t if < t s. Equation (34) is valid both for a DLE with zero history and a GLE, and reduces to the 1-time ⟨ ( )⟩ ( ) n = x t t 2 when t=s. If we were to assume that the process is homogeneous, i.e.
( | ) ( | ) = -Q x t y s P x t s y , , , , the computation of ⟨ ( ) ( )⟩ x t x s for > t s would only require the 1-time probability distribution and would lead to with the corresponding expression with s interchanged with t if < t s. In figure 4 we plot the covariance from equations (34) and (35) for our three non-Markov processes in the top panels, with the former displayed only for values > t s, while the latter only for values < t s (each respective covariance is specular around the diagonal). When t=s the expressions coincide since x y , , 1 1 and the covariance reduces to a 1-time average. When the Green's function decays to zero at long times for  t s or  s t the two expressions also coincide. Moving away from the diagonal is where the greater discrepancies between the correct covariance in (34) and the incorrect time-homogeneous one in (35) appear. We investigate those differences for the delay case and for the exponential memory when ( ) l t is in the oscillatory regime, whereas we study the case of the algebraic memory in the monotonic regime.
We first analyze the delay case by using the alternate expression for one-delay Green's function in equation (33) which is given by [64] where  is the (multivalued) Lambert function that satisfies the relation ( ) . In the stable regime for which ( ) for all k, if we look at values of ( ) f t s , along a line orthogonal to the diagonal, say c =t s , we can determine the expected rate of decay by taking only the complex conjugate pair of  k with the smallest negative real part, which we indicate for convenience with the index k=0. We have in fact The above analysis is useful to determine approximately the exponential decay, but is unsuccessfull to describe the great disparity as s approaches χ, and in particular it does not provide explanations as to why a nearly flat shape appears in equation (35). To explain this feature one relies upon using the top expression in (33) nearly independent of s. In other words when the variance of the process ( ) n t has already reached its saturating value, and  c t s the s-dependence of equation (35) is suppressed explaining the difference between the rapid drop of ( ) f t s , versus the flat-top shape of ( ) * f t s , . The analysis with a complex exponential for ( ) l t can also be done for the case of the exponential memory with the difference that there is only decaying exponential in this case. The decay rate of b 2 in equation (33) corresponds to ( ) , matching the damping of the oscillations in the bottom panel of figure 4(b). This explains the apparent difference in decay between the delay and the exponential memory case where ( ) f t s , reduces to zero on a time scale approximately twice as fast for panel (b) versus panel (a).
For the algebraic memory we have chosen to display the case where differences between (34) and (35) are most evident, which occurs mainly in the monotonic regime. In contrast to the delay and exponential memory case the bottom panel of figure 4(c) shows the interesting feature that ( ) f t s , decays faster than ( ) * f t s , . It is also evident that no decay characteristic time-scale emerges at the bottom of panel (c) given the power law nature of the GLE process with algebraic memory.
Although we have focused only on the 2-time covariance, comparisons for higher multitime averages can also be made directly from ⟨ ( ) ( )⟩ x t x s . For Gaussian distributions it is known in fact that [49] for any even central moment (note that ℓ is a positive integer), where the notation å  indicates a sum over all distinct pairs, ij, of the ℓ 2 terms On the other hand for any odd central moment, that is when computing ℓ 2 1terms, we have . For example the three time average is explicitly obtained in terms of the lower order moments as

Generalization of the Green-Kubo formula
We now make use of the formalism developed in the previous sections and present a generalization of the Green-Kubo formula relating the spatial diffusion coefficient to the velocity autocorrelation function. A recent study [65] has elucidated the relation between the longtime temporal dependence of the velocity autocorrelation function and the anomalous scaling behavior of the MSD in systems possessing scale invariant dynamics. In this section we are interested in presenting exact time-dependent expressions of the MSD in terms of the Green's function ( ) l t when the velocity associated to a random variable is governed by a DLE or GLE, focusing in particular on systems that do not obey the fluctuation-dissipation relation.
To compute the general expression for the MSD we evaluate the velocity autocorrelation can then be expressed in terms of the Green's function as where now s 2 has units of velocity squared over time. In equation (38) we have used the fact that ( ) l = z 0 when < z 0 to change the limit of integration of the second integral from t to t 2 on the right-hand side. Notice that the traditional simplification of taking ( ) W v t , 1 1 1 to be the stationary distribution ( )  +¥ W v t , 1 1 1 [65] has not been employed here. In the delay case it is possible to integrate equation (36) explicitly and show that at long times l n n n 1 1 2 , and that the Laplace transform of equation (36) can be written alternatively as [42] 1 it is then straightforward to write the long-time dependence of the MSD as ( ) s g  t t d 2 2 2 by noticing that˜( ) l g = -0 1 . This temporal dependence is visible early on in the overdamped regime (gt <e 1 ), but appears at increasingly longer time scales as the system approaches the unstable regime (gt p  2) as depicted in figure 5(a). A simple calculation shows that also in the exponential memory case the long time MSD converges to s g t 2 2 . At intermediate times the MSD dynamics for the delayed system follows the intuitive understanding that the system becomes diffusive depending on the decay time of the Green's function, namely the faster ( ) l t decays the earlier the diffusive regime sets in. This is visible in figure 5(a) where we have plotted the MSD as gt decreases. As the magnitude of  0 increases with a decrease in gt one notices that the system goes from the overdamped to the underdamped regime as gt  0 (for any non-zero γ) with the diffusive linear dependence being approached at increasingly earlier times. The lowest lying dashed-dotted curve has t = 0, which corresponds to an Ornstein-Uhlenbeck process with rate γ.
The dynamics of the MSD for the exponential memory follows a qualitative similar trend in the underdamped regime. There oscillations gets progressively reduced as the decay rate of the Green's function, in this case b, increases. With the choice of g b as our dimensionless parameter we see in figure 5(b) that with smaller values of g b the diffusive regime appears at earlier times. The trend in the overdamped regime, i.e. below the dashed line, is however different from the delay case. Although the MSD eventually reaches the common diffusive limit, s g t 2 2 , for any non-zero γ, a reduction in g b pushes the curves further and further from the dashed line. Numerically we have verified that the location where the curvature is maximal, i.e. the gt value where the third derivative of the MSD equals zero, grows unbounded as g b approaches 0.
The MSD for the power law memory in figure 5(c) has, on the other hand, a very different dynamics from the MSD in panels (a) and (b). The reason is the power law nature of the Green's function. In contrast to both the delay and the exponential memory, the qualitative dynamics shown in figure 5(c) depend entirely on the algebraic parameter ν; the scaling coefficient ga n , from equation (33), affects only the scaling of time [43]. In the undamped regime of the algebraic memory, for which ν is between 0 and 1, the qualitative behavior of the MSD at long times is similar to that of both the delay and exponential memories as it displays diffusive dynamics with the MSD proportional to t. The proportionality constant is dependent on the value of ν. An increase in ν from 0 leads to an initial decrease in the value of the diffusion coefficient, however, as ν is increased further the coefficient eventually increases. This non-monotonic dependence is evident in figure 5(c) below the dashed line.  (33). ln all three panels the MSD in the limit g = 0, i.e. s t 3 2 3 , is drawn with a dotted line representing the Wiener dynamics, which is common to all curves at short times. In all three panels the dashed red line separates the top curves where the Green's function is underdamped from the bottom curves where the Green's function is overdamped. In panels (a)-(c) that corresponds, respectively, to the value gt =e 1 , g = b 1 4 and n = 0. In panel (a) the dash-dotted line is the t = 0 case, which represents the MSD for the Ornstein-Uhlenbeck case whose analytic expression can be written as . The same exact expression is drawn as a dashed line in panel (c) (n = 0 case) as it separates the underdamped from the overdamped regime.
The value of ν that minimizes the diffusion constant strongly depends on the scaling coefficient.
By contrast, the dynamics in the over-damped regime is superdiffusive. This is evident through an analytical examination of the well-known long-time approximation of the Mittag-Leffler function [63], i.e., ( ) l t , valid for non-integer values of ν in the range | | n < 1, Here p is an integer and ( ) O ... is the standard order function which indicates the dominant term in the asymptotic limit. The leading term of equation (39) is proportional to n+ t 1 1 . In the over-damped regime, being of an order greater than −1, it dominates the integrals of equation (38) when  ¥ t . Upon the evaluation of equation (38), with an exclusive focus on the long-time behavior, the MSD is given approximately by ( ) µ n t t d 2 1 2 thus it approaches the standard Weiner dynamics, i.e., t 3 , as ν approaches −1. This leading order analytic result qualitatively describes the long-time dependence of the MSD on ν that is depicted in figure 5(c) above the dashed line. The extreme short-time correlations in the overdamped parameter regime of algebraic memory results in a strongly correlated forcing term and, subsequently, strongly correlated motion.

Conclusions
Stochastic delay differential equations have a long tradition. Since their original introduction to study the macrodynamic theory of business cycles [66,67], they have been employed across a multitude of scientific areas. In recent years they have been used in particular to study the lag dynamics and synchronizability of networks [68][69][70][71][72][73] whereby each node may interact with any other node with different coupling strength and with different delay. Although our results are presented for the case of a multi-delayed stochastic system in one dimension, the approach can be employed to study the dynamics of multidimensional Langevin equations with multiple delays given by˙( ) Our FPE may thus represent a useful limiting benchmark to unravel synchronizability issues in certain types of networks.
The specific focus of our analysis on constructing bona fide FPEs for systems obeying a DLE has an important general applicability. It provides the ability to analyze systems governed by time non-local harmonic forces and subject to boundary conditions. Already for the simpler case of time local harmonic forces knowing the exact functional dependence of the 1and 2-time probability distributions in infinite space is, in most cases, not sufficient to construct the corresponding distributions in presence of absorbing, reflecting or periodic boundaries. Techniques such as the method of images cannot be employed in situations where the space is not homogenous [74,75]. Two are the suggested solutions to circumvent this difficulty. By finding the exact dynamics of the Langevin equation that explicitly incorporate the presence of boundaries, it would be possible to construct the associated probability distribution via characteristic functionals as suggested by Budini and Caceres [50]. When such an exact Langevin solution is lacking, the other approach consists of supplementing the appropriate boundary conditions at the probability level and attempting to build an analytic or numerical solution of the FPE. Determining a bona fide FPE thus promise to open up opportunities to tackle a variety of boundary situations in systems governed by DLEs, which are currently accessible only via stochastic simulations.
In this study we have exploited the linearity of the DLE to generate an equivalent memoryless Langevin equation with colored Gaussian noise via the so-called time-convolutionless transform. We have then constructed the appropriate FPE for the 1-and 2-time probability distribution for the DLE. We have contrasted our FPE formalism with the one proposed in the past where the dynamics of the 1-time probability distribution is coupled to the moments of the 2-time probability distribution. Although we have found an appropriate ansatz that allows to solve exactly the proposed non-closed FPE, we have shown that it can only describe one-time averages and does not provide a bona fide representation of the random process.
Beyond the DLE we have shown that the 1-and 2-time FPE construction procedure can be used essentially unaltered for the case of a GLE. In this respect our results extend known procedures to build 1-and 2-time FPE for a GLE in which fluctuation-dissipation is valid to cases in which it is not and noise is white. The derived analytic expressions for W 1 and W 2 probability distribution have allowed us to show that both DLEs and GLEs are non-Markov time-inhomogeneous processes. Time-inhomogeneity is characteristic of ageing systems for which it is necessary to distinguish between preparation and observation times. The timedependent dynamics cannot in fact be evinced only from W 1 , but requires the knowledge of the conditional probability distribution Q 1 1 .
By taking a specific two-delay system we have presented in detail how the dynamics of the probability distribution differ if observation starts at time t=0 versus some later time. But we have also shown that, even when observation and preparation times coincide, inaccuracies still emerge in evaluating multi-time averages if one were to assume that the process is time-homogeneous. To show this aspect explicitly we have also analyzed and compared the effects of assuming time-homogeneity when computing the covariance for a one-delay DLE and two GLEs, respectively, with an exponential memory and a power law memory.
Finally, application to the case in which the Langevin equation describes the stochastic dynamics of the velocity variable rather than position has allowed us to generalize the Green-Kubo formalism to situations in which a system does not obey the fluctuation dissipation relation.
Defining the fixed variable, t x i corresponding to the stochastic variable, ( ) t x t i and using the following two properties, ⟨ ( ( )) ( one is left to evaluate the average on the right-hand side using Novikov's theorem [53], which states Here we have used the delta correlated property of the noise and the functional chain rule to obtain the average in terms of the functional derivative of x with respect to the noise term. For the linear Langevin equation, equation (3), one can calculate these functional derivatives exactly [27,28], A . 4 Substituting this expression into (A.3) we obtain which in turn can be substituted in equation (A.2) to give the desired FP 6) was already derived in [38], but it was never solved exactly. Writing FPEs for higher order n-time probability distributions does not allow one to close the system of equations since it only creates an infinite hierarchy of equations with W n depending on + W n 1 [76]. There is however a way to find the exact solution once the initial distribution, For an equivalent representation of the Langevin equation (3), the initial distribution has to take the form of a Dirac-delta function, ( ) x x 0 0 , and the joint distribution needs to assign the deterministic dynamics given by the history function ( ) and ( ) ( , given the underdetermined nature of equation (A.6), one proposes an ansatz in terms of the moments of the distribution and supplement information about those moments from the Langevin equation. Such a procedure has been exploited to find an exact solution at steady state for the case of a single delay dynamics [39,77]. We now show that a proper ansatz allows one to extend the procedure to the case of multiple delays and for the full time-dependence.
We begin by substituting the marginal relationship, ( ) ( ) , ; , d  Naming, k and k i , as the associated Fourier components of x and t x i , we can write the Fourier transform of this probability distribution aŝ ( ) To obtain an explicit expression forˆ( ) W k t , 1 one needs to solve the two differential equation (A.11) relating the evolution of the mean, variance and covariance of the system. In order to proceed we have to return to the Langevin equation equation (3), and examine the evolution of its mean and variance It is thus evident that the mean and variance (A.12) of the Langevin equation (3) obey the same dynamics of the mean and variance of the ansatz distribution (A.11). Therefore we can use equation (5) and the properties of the Gaussian noise to find the following explicit expressions Note that the expression for the mean is dependent on the initial condition of the system, x 0 . Therefore substituting these averages into equation (A.10), and settting = t k 0, will give a Gaussian that satisfies the FPE and represents the initially conditioned distribution,  The remaining averages can once again be calculated using Novikov's theorem [53] ⟨ ( ) (  (14) and (15).

Appendix C. Solving the FPEs
To solve the FPEs for the 1-and 2-time probability distributions we use the method of transformation of variables [79]. Here we demonstrate the method with a general FPE whose structure represents both equations (14) and (15). Consider the distribution ( ) ¢ R x t t , , obeying with  ¢ < t t 0 and subject to the general initial condition, ( x y Z y t , , , . We define the variable transformation and where the choice of ¢ t for the lower limit of the integrals is taken to satisfy the initial conditions. Substituting ( ) ¢ R x t t , , in equation (C.1) using equation (C.2) gives the diffusion equation, From this expression we can immediately obtain the solution of equation (14) for ( ) W x t , 1 .
d into (C.6) we arrive at equation (24). To obtain the solution for ( ) ¢ ¢ W x t x t , ; , 2 more work is required as equation (C.1) does not correspond directly with equation (15) but does correspond to the Fourier transform of (15) over ¢ x , given bŷ x W x t k t , ; , i , , ; , , C.7