Analytical approximation to the multidimensional Fokker--Planck equation with steady state

The Fokker--Planck equation is a key ingredient of many models in physics and related subjects, and arises in a diverse array of settings. Analytical solutions are limited to special cases, and resorting to numerical simulation is often the only route available; in high dimensions, or for parametric studies, this can become unwieldy. Using asymptotic techniques that draw upon the known Ornstein--Uhlenbeck (OU) case, we consider a mean-reverting system and obtain its representation as a product of terms representing short-term, long-term, and medium-term behaviour. A further reduction yields a simple explicit formula, both intuitive in terms of its physical origin and fast to evaluate. We illustrate a breadth of cases, some of which are `far' from the OU model, such as double-well potentials; even then, perhaps surprisingly, the approximation gives very good results when compared with numerical simulations. Both one- and two-dimensional examples are considered.


Introduction
The Fokker-Planck equation (FPE) arises in a broad range of problems from physics, engineering science, economics and mathematical modelling. Part of this breadth of application can be traced back to its success in modelling generic transport processes provided the dynamics can be represented by a Hamiltonian or Lagrangian with random components [19]. Alternatively, a time series of data can be analysed as a Markov process to create an effective FPE that captures the statistics of the observed process. This has led to modelling based upon extracting FPEs from experimental or observable data, for instance in turbulent cascades [22], fractal-generated turbulence [43], modelling the beat fluctuations in heart-rate [24], electronic noise and kinetics [11], electronic circuits with nonlinear resistance [16], systems with overdamped Langevin dynamics [17], or from nonlinear friction [30,45]. Financial modelling yields a wealth of further applications [23]. The modelling of market behaviour, where deviation from equilibrium is likely to be accompanied by higher volatility and/or slower mean-reversion, means that the invariant distribution is fat-tailed or leptokurtic [34]. Agent-based models [1] for the herd behaviour of interactions between traders, and the influence of rumours, social interactions and external information [7], take ideas from Kirman's stochastic models of information transmission [28] to arrive at FPEs. Rather than study a single specialised case, we consider generic FPEs, the main restriction being that we treat mean-reverting processes.
Another important application for FPEs is in Kalman filtering, which in addition to physics also impacts upon control theory, optimisation, and time series analysis. Examples include: subatomic particle tracks [9]; movements in the ionosphere [41]; chemical reactions [39]; and extensive use in econometrics for making predictions about financial variables in systems dominated by stochastic behaviour, e.g. [15]. The basic operation of Kalman-type filters is long-established (see e.g. [35,26,25]), and a key ingredient is the state transition density. Initial work used linear assumptions, i.e. an underlying Gaussian model. When the drift and covariance are nonlinear functions of the state vector, the extended Kalman filter [18] is often used, but in essence this employs a linearisation so that the Gaussian is used locally. This is usually acceptable for short time periods, but may not be over longer ones. A better approximation to the transition density for non-Gaussian processes, such as the one we provide, is therefore highly desirable.
The normalised form of the FPE, in one dimension, that we investigate is

∂f/∂τ = ∂²f/∂y² − ∂[A(y)f]/∂y, (1)

where f is the probability density function, τ is nondimensional time and y is a spatial variable; the initial condition is a delta-function centred at y_0. There is a general drift function A(y) for which we will take various choices as examples in later sections. The steady-state solution f(∞, y) ∝ exp ∫^y A(z) dz is a normalisable probability density: this can be ensured by the condition lim sup_{|y|→∞} −yA(y) > 1 + ε for some ε > 0. As alluded to earlier, this partial differential equation, the FPE, is connected to a stochastic differential equation (SDE):

dY_t = κA(Y_t) dt + √(2κ) dW_t, (2)

with τ = κt. More generally, an SDE having both spatially varying mean and variance is

dX_t = μ_X(X_t) dt + σ_X(X_t) dW_t; (3)

however, provided σ_X is bounded away from zero, we can make the substitution (also known as the Lamperti transformation) from X to Y defined by dy/dx = √(2κ)/σ_X(x), which places (3) in the normalised form (2); therefore the analysis we present for the normalised form is more general than it first appears. In higher dimensions we take A to be the gradient of a potential, i.e. conservative, and note that under some common assumptions the Lamperti transformation can be appropriately generalised [37].
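To illustrate the stochastic side of this correspondence, a minimal Euler--Maruyama simulation of the SDE (2) is sketched below; the function names and parameter values are illustrative only, and the OU drift A(y) = −θy is used so that the sample moments can be checked against the known Gaussian solution.

```python
import numpy as np

def euler_maruyama(A, y0, kappa=1.0, T=1.0, n_steps=400, n_paths=20000, seed=0):
    """Simulate dY = kappa*A(Y) dt + sqrt(2*kappa) dW (equation (2)) by Euler-Maruyama."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    y = np.full(n_paths, float(y0))
    for _ in range(n_steps):
        y += kappa * A(y) * dt + np.sqrt(2.0 * kappa * dt) * rng.standard_normal(n_paths)
    return y

# OU check: A(y) = -theta*y; in tau = kappa*t the mean is y0*exp(-theta*tau)
# and the variance is (1 - exp(-2*theta*tau))/theta.
theta, y0, tau = 0.8, 1.5, 1.0
y = euler_maruyama(lambda x: -theta * x, y0, T=tau)
m_th = y0 * np.exp(-theta * tau)
v_th = (1.0 - np.exp(-2.0 * theta * tau)) / theta
```

Even in this benign OU case, tens of thousands of paths are needed for two-digit accuracy in the moments, which foreshadows the cost discussion below.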
The best-known example with a steady state is the OU process given by A(y) = −θy (θ is a parameter), for which there is a well-known explicit Gaussian solution [38]. However, one often wants to deviate from this model because, away from equilibrium, the force field A cannot be expected to rise without limit, but instead be bounded; equivalently, the equation describes diffusion in the presence of a potential which cannot be expected to be quadratic in general. In these situations the steady state will no longer be Gaussian. Departing from the OU, whilst gaining closer connection to the physical model under consideration, loses analytical tractability and typically numerical methods are required. Naturally, one would like the best of both worlds: analytical tractability and physical relevance.
Given the connection with SDEs, there is a choice between the deterministic or stochastic approaches in terms of which is more practical to tackle numerically. Choosing the latter naturally leads to Monte-Carlo methods: evaluating the density of a stochastic process requires not only a large number of simulations, but also kernel density estimation (KDE) at each point in time to produce a smooth estimate from the simulated data points; for literature on KDE see, e.g., [40,42,4]. The quality of the estimation depends on the kernel width and the number of simulations, and the optimal choice of width is not straightforward [13,27]. In the work we have done, if f(∞, y) is very fat-tailed, for example Student-t, then simulation of 100,000 paths is required to get an L1 error (integrated absolute error) of 4×10⁻²; in two dimensions, 100,000 paths only yield an L1 error of 2×10⁻¹. Furthermore, for every evaluation point one typically needs to make use of every path simulated, so the computation time of estimating the density is much higher than the simulation time and suffers considerably from the curse of dimensionality; see [10] for background and numerical algorithms that aim to lift the curse. This impediment means that numerical methods for the FPE are preferred, and literature combining Monte-Carlo and KDE methods is rare and specific, e.g. [5]. Thus, when we perform numerical calculations we do so upon the FPE directly; for efficiency and accuracy we use spectral methods and detail these in section 2.1. Even so, the numerical simulations in two dimensions become very time-consuming, particularly if one is not looking for the solution at short time. Furthermore, if one desires to sweep over different parameter values to analyse the effect on the results, the problem gets even worse.
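To make the KDE cost concrete, the sketch below uses an assumed Gaussian kernel with Silverman's rule-of-thumb bandwidth, and standard normal samples standing in for simulated SDE endpoints; all names are our own.

```python
import numpy as np

def gaussian_kde(samples, grid):
    """Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth."""
    n = samples.size
    h = 1.06 * samples.std() * n ** (-0.2)      # normal-reference bandwidth
    # Every grid point uses every sample: O(n_samples * n_grid) work, which is
    # why density evaluation, not path simulation, dominates the cost.
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(1)
samples = rng.standard_normal(5000)             # stand-in for simulated endpoints
grid = np.linspace(-5.0, 5.0, 401)
dens = gaussian_kde(samples, grid)
true = np.exp(-grid**2 / 2) / np.sqrt(2.0 * np.pi)
l1_err = np.trapz(np.abs(dens - true), grid)    # integrated absolute error
```

Even for this well-behaved Gaussian target, several thousand samples leave a visible integrated error; fat-tailed targets are considerably worse.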
All this points to the need for a simple, fast, approximation. Further, it is desirable to have a modus operandi that gives results that are intuitive and offer direct insight into the problem at hand. Ideally, we would like to address the metaphysical question of 'how diffusions think about solving themselves'. In this respect, what we are going to describe, in the first instance (10), does give clear intuition, in that the various terms in the equation make it clear how the solution behaves; also, the result at some basic level looks like an expression for the evolution of a probability density, in a way that an infinite series of eigenfunctions does not.
Most analytical approximations are based on the summation of eigenfunctions [38] that are often orthogonal polynomials and special functions, which is unsurprising given the linearity of the FPE; the OU can also be approached this way using Hermite polynomials [36]. Alternative asymptotic approaches, such as WKB methods [8], are useful for analysing the approach to equilibrium but are limited to studying specific regimes involving small or large diffusivities. A somewhat different approach, upon which the first steps were made in [34], consists in expressing the solution as a product of terms, rather than the more usual sum. Intuitively, with a sum it is difficult to represent the initial delta-function without generating oscillatory artefacts, whereas with a product this is simple: a narrow Gaussian, of width tending to zero as τ → 0, will capture that regardless of what the other terms in the product are; another term can capture the steady state; and a series of correction terms 'patches-up' the mid-term behaviour. Intuitively, we are expanding around an OU model, in the sense of finding the characteristics of a mean-reverting solution as exemplified by the OU case and capturing these characteristics for the general case, while reproducing the OU case exactly. There is a passing similarity with the WKB approximation, mainly in the use of a logarithmic transformation, but WKB expands around a zero-volatility (deterministic) model and is singular in the zero-volatility limit.
There are some important consequences of using products. The logarithm of the density is represented by a sum, and so: (i) positivity is guaranteed, in a way that it is typically not using linear methods; (ii) from the theoretical perspective there is a clear relation to entropy, and calculation of that from an approximated density is virtually impossible if the approximated density is anywhere negative. Based on these points, we therefore choose to develop this approach.
We begin by introducing our approach in the one-dimensional setting (Section 2), giving the key results and the main technical route to them. A range of examples demonstrates the efficiency of the results as we move away from the OU process. We then move on to the higher-dimensional case in Section 3 and show numerical simulations. Section 4 discusses extensions and potential limitations of the method, with concluding remarks drawn together in Section 5.

Theory in one dimension
We introduce the methodology and results in one dimension. A key step is the decision to work with the normalised density g, and the derivative, h, of the log-density, defined as h = −(∂/∂y) log g, with g(τ, y) = f(τ, y)/f(∞, y). Thereby g solves the backward equation

∂g/∂τ = A(y) ∂g/∂y + ∂²g/∂y², (4)

with initial condition a delta-function of strength 1/f(∞, y_0) at y_0, and h solves the nonlinear partial differential equation (PDE)

∂h/∂τ = ∂²h/∂y² + (A(y) − 2h) ∂h/∂y + A′(y)h, (5)

which has a singular initial condition, in the sense that h ∼ (y − y_0)/2τ as τ → 0. This singular behaviour can also be seen by dominant balance in (5). Next we observe that for the OU model A(y) = θ(y_∞ − y), with the constant y_∞ denoting the long-term mean, we have exactly

h(τ, y) = θ√q (y − y_0)/(1 − q) + √q A(y)/(1 + √q), q = e^{−2θτ}, (6)

as is easily verified by substituting it into (5) and ploughing through the algebra. Finally, from its definition, g(∞, y) must always equal unity, and so h(∞, y) = 0. This inspires the ansatz

h(τ, y) = θ√q (y − y_0)/(1 − q) + √q A(y)/(1 + √q) + o(1) (7)

for arbitrary A. When (7) is inserted into the PDE for h, (5), and a Laurent expansion performed around τ = 0, the LHS and RHS agree at O(τ⁻²) and O(τ⁻¹), explaining why we are writing the error term in (7) as o(1) in the short-time limit. This error can then in principle be approximated as a Taylor series around τ = 0, which we will discuss later. Another matter presents itself when this Laurent expansion is done: the development of (7) is

h(τ, y) = (y − y_0)/2τ + A(y)/2 + O(τ),

and we observe that θ is absent from both the first two terms. Accordingly, all θ's give the same leading-order behaviour, and so we cannot say anything about θ simply by looking at the first two terms in the short-time expansion of the solution. In this sense, therefore, θ is now arbitrary, representing an estimate of the mean reversion speed of the force-field A, or, equivalently, the reversion speed of the OU process 'about which' we are expanding the given model A. Given that the leading-order asymptotics do not tell us what θ to use, some other method of inference is necessary, and we return to this later. 
In this context we call (7) the leading-order approximation for h.
To deduce g and f we integrate (7) from y_0 to y, giving (again for arbitrary A)

g(τ, y) = (. . .)_{τ,y_0} f(∞, y)^{−ρ(τ)} exp(−θ√q (y − y_0)²/(2(1 − q))), ρ(τ) = √q/(1 + √q),

where (. . .)_{τ,y_0} generically denotes a function of τ, y_0. By means of the reciprocity condition

g(τ, y | y_0) = g(τ, y_0 | y) (8)

that is obeyed by the exact solution, we can infer the dependence of the prefactor on y_0, to obtain

g(τ, y | y_0) ≈ c(τ) [f(∞, y) f(∞, y_0)]^{−ρ} exp(−θ√q (y − y_0)²/(2(1 − q))).

The OU case requires the prefactor to be c(τ) = (1 − q)^{−1/2} f(∞, y_∞)^{2ρ}, with y_∞ the long-term mean. (Another way of deriving the prefactor is to write f as the product of the exponential factors above and an undetermined function n(τ), substitute into (1), and solve for the function n, which obeys a first-order linear differential equation; we shall refer to this technique later on.) Thence

g(τ, y | y_0) ≈ (1 − q)^{−1/2} f(∞, y_∞)^{2ρ} [f(∞, y) f(∞, y_0)]^{−ρ} exp(−θ√q (y − y_0)²/(2(1 − q))) (9)

and

f(τ, y) ≈ (1 − q)^{−1/2} f(∞, y_∞)^{2ρ} f(∞, y)^{1−ρ} f(∞, y_0)^{−ρ} exp(−θ√q (y − y_0)²/(2(1 − q))), (10)

which are the lowest-order approximations. Eq. (7), and its consequences, have several facets worthy of comment. First, they are exact for any OU model of reversion speed θ regardless of the reversion level, i.e. for A(y) = θ(y_∞ − y). Another way of putting this is to say that the correction terms to (7) will be expressible as functions of (d/dy)(A(y) + θy). Secondly, the approximations (9,10) are necessarily positive, and correct in both the short- and long-time limits, regardless of A(y). Thirdly, the approximations (9,10) obey the reciprocity condition.
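As a numerical sanity check of the leading-order formula (10) as reconstructed here, the sketch below evaluates it for the OU model, for which it should coincide with the exact Gaussian transition density; the function and variable names are our own.

```python
import numpy as np

def leading_order_f(tau, y, y0, log_f_inf, y_inf, theta):
    """Leading-order approximation (10) as reconstructed here:
    f ~ (1-q)^(-1/2) f_inf(y_inf)^(2*rho) f_inf(y)^(1-rho) f_inf(y0)^(-rho)
        * exp(-theta*sqrt(q)*(y-y0)^2/(2*(1-q))), with rho = sqrt(q)/(1+sqrt(q))."""
    q = np.exp(-2.0 * theta * tau)
    p = np.sqrt(q)
    rho = p / (1.0 + p)
    logf = (-0.5 * np.log(1.0 - q) + 2.0 * rho * log_f_inf(y_inf)
            + (1.0 - rho) * log_f_inf(y) - rho * log_f_inf(y0)
            - theta * p * (y - y0) ** 2 / (2.0 * (1.0 - q)))
    return np.exp(logf)

# OU model A(y) = -theta*y, f_inf = N(0, 1/theta): the approximation is exact.
theta, tau, y0 = 0.7, 0.5, 1.3
log_f_inf = lambda x: 0.5 * np.log(theta / (2.0 * np.pi)) - theta * x**2 / 2.0
y = np.linspace(-6.0, 6.0, 1001)
f_app = leading_order_f(tau, y, y0, log_f_inf, 0.0, theta)
m = y0 * np.exp(-theta * tau)
v = (1.0 - np.exp(-2.0 * theta * tau)) / theta
f_exact = np.exp(-(y - m) ** 2 / (2.0 * v)) / np.sqrt(2.0 * np.pi * v)
```

For non-OU drifts the same function can be reused directly, with `log_f_inf` the log of the invariant density and `theta` chosen as discussed below.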
We can proceed to determine higher-order terms in the representation for h by continuing (7) and introducing an expansion for the remainder, so that

h(τ, y) = θ√q (y − y_0)/(1 − q) + √q A(y)/(1 + √q) + √q Σ_{r≥1} b_r(y)(1 − p)^r, (11)

where p = √q = e^{−θτ}, and the remainder terms, (b_r), also depend parametrically on the starting-point y_0. An alternative expansion is in powers of (1 − q), giving a different series, but with similar convergence properties, and we do not pursue this further here. The overall power of √q in (11) ensures h(∞, y) = 0 for any truncation of the series (i.e. sum as far as r = N). By inserting (11) into (5) and comparing coefficients in powers of (1 − p)^{r+1} we find that each b_{r+1} obeys a first-order linear differential equation, (12), whose forcing term F_r is a complicated quadratic expression invoking A, b_1, . . . , b_r and their derivatives. So although (5) is second-order nonlinear, successive terms in the expansion can be recursively obtained by solving a first-order linear differential equation, which can be integrated explicitly. In the special case r = 0, we obtain b_1, which vanishes whenever A(y) + θy is a constant, as it should. As an aside, the term b_1(y) also vanishes in another special case, namely when A is the logarithmic derivative of a parabolic cylinder function. This solution is sporadic in that it also depends on the starting-point; that is, for this choice of A(y) the function b_1 = 0 only if y_0 is chosen correctly. As (7) is a sum, the expressions for f, g will be infinite products, and so we refer to the method as an infinite product expansion. The focus here is not on the extraction of higher-order terms: rather, it is on the leading-order term, which is probably the most applicable. But it is notable that, should it be desired, one can also treat the correction term by spectral methods: that is to say, write f̃(τ, y) for (10), derive the PDE that the ratio f/f̃ satisfies (this is another parabolic PDE), and solve it approximately by means of a Galerkin or collocation expansion [2]. 
As the initial spike and also the long-term behaviour have already been accounted for, the ratio of the true f to the approximation (10) is unity in both limits τ → 0, ∞, and is therefore well approximated by spectral methods, which are ideally suited to smooth problems.
The first steps towards a product expansion were made in [34], which carries out a power series expansion on broadly similar lines and shows that, empirically at least, an expansion of this product form appears to be convergent. The development here confers several advantages, besides greater compactness and elegance, over [34]: (i) the reciprocity condition (8) is enforced, whereas it was not in [34]; (ii) it is more readily adaptable to higher dimensions; (iii) it is more accurate over shorter time-scales; (iv) the leading-order approximation in [34] exhibits instability when |y_0| is large. What is shown here can be obtained from (12) by taking the initial term and b_0^{old}(y) as the new leading-order term, with minor alterations, and modifying the prefactor in the derivation of g so as to enforce (8). Otherwise, however, it is neither more nor less convergent, being in effect a rearrangement of the terms.
The approach of expanding around an OU process leaves a free parameter, θ, and the remainder series implicitly depends upon it. As we said earlier, we cannot infer θ from the O(τ⁻¹) or O(τ⁰) terms in the short-time expansion. We can argue that θ should be chosen to minimise b_1(y) as given earlier, and as that function vanishes when A′(y) + θ is identically zero, it seems reasonable to choose θ so as to minimise A′(y) + θ 'on average'. This motivates the choice

θ̂ = −⟨A′(y)⟩_∞ = ⟨A(y)²⟩_∞, (13)

where ⟨·⟩_∞ means an average over the invariant density f(∞, ·); the equality of the two expressions follows from integration by parts, using A = (∂/∂y) log f(∞, ·). This was proposed in [34], albeit using a different line of reasoning based on a sort of Rayleigh--Ritz argument to identify the least negative eigenvalue of L†, the differential operator of the FPE. Clearly (13) is correct in the OU case, and it guarantees θ̂ > 0, all of which make it a pragmatic choice, but there is another compelling reason based on a connection with entropy and information theory, which runs as follows. Consider, for some p.d.f. ψ, the family of distributions ψ(y − µ) indexed by the parameter µ ∈ R. It is desired to estimate µ (from data), and the standard way of doing this is the maximum likelihood estimator. Writing f(y | µ) = ψ(y − µ), we seek to maximise log f(y | µ) w.r.t. µ. The Fisher information [32, §2.5] is the expectation of the square of the µ-derivative of the log-likelihood, and hence is

∫ ψ′(y)²/ψ(y) dy;

if we set ψ(y) = f(∞, y) then this is exactly the definition of θ̂. In broad terms, the higher the Fisher information, the more certain we are about the estimation of the parameter in question, and indeed the reciprocal of the Fisher information furnishes the Cramér--Rao lower bound for the variance of any unbiased estimator. This has an interpretation in terms of the mean reversion: the higher the average speed of mean reversion, the more certain we are about our estimate of the mean from a given dataset, and vice-versa. Using the Fisher information as an estimator of reversion speed is therefore natural. 
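The estimate (13) is straightforward to evaluate by quadrature; a sketch follows (with illustrative names, using the OU model as a check, for which θ̂ = θ):

```python
import numpy as np

def theta_hat(A, dA, log_f_inf, y):
    """theta-hat = <A^2>_inf = -<A'>_inf, by quadrature on a grid y (cf. (13))."""
    w = np.exp(log_f_inf(y))
    w /= np.trapz(w, y)                      # normalise the invariant density
    return np.trapz(A(y) ** 2 * w, y), -np.trapz(dA(y) * w, y)

# OU check: A(y) = -theta*y, f_inf = N(0, 1/theta), so theta-hat should equal theta.
theta = 1.7
y = np.linspace(-8.0, 8.0, 4001)
fisher, mean_slope = theta_hat(lambda x: -theta * x,
                               lambda x: -theta * np.ones_like(x),
                               lambda x: -theta * x**2 / 2.0, y)
```

Both forms of (13) are returned so that their agreement provides a check on the quadrature.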
As will be seen later, the method extends in a natural way to higher dimensions, with ∂/∂µ replaced by ∇, and then the Fisher information is a positive definite symmetric matrix rather than just a positive number.

Results and discussion for one-dimension
We consider a range of one-dimensional examples deviating from the OU by different degrees and compare the leading-order approximation (10) to numerical solutions from a PDE solver. The numerical solution is computed by means of Fourier spectral collocation in the spatial direction coupled with a fourth-order Runge--Kutta algorithm in time, and uses a narrow Gaussian as initial condition; a sufficiently large spatial domain is taken such that the solution does not interact with the edge, and convergence is checked by mode-doubling. We take advantage of the linear diffusion by using an integrating factor, and also the Fast Fourier Transform, to design a highly efficient solver; such methods are standard in scientific computing [46,6] and the fully-converged numerical simulations act as the 'gold standard' against which we compare the approximations. Even then, the PDE solver is far slower than the leading-order approximations. Our solution is a truncation of an infinite product, and so it is of central importance to demonstrate its efficiency.
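A compact version of such a solver, for the OU case, might look as follows; this is a sketch of the integrating-factor/FFT idea just described (Fourier collocation in space, fourth-order Runge--Kutta in time), not the solver actually used for the figures, and all parameter values are illustrative.

```python
import numpy as np

def fpe_ou_spectral(theta=1.0, L=20.0, n=256, dt=5e-4, n_steps=6000, s0=0.15):
    """Solve f_tau = (theta*y*f)_y + f_yy (the FPE (1) with A = -theta*y) by Fourier
    collocation; diffusion is absorbed exactly by a per-step integrating factor and
    the drift term is advanced with RK4 (evaluated pseudo-spectrally)."""
    y = np.linspace(-L / 2, L / 2, n, endpoint=False)
    k = 2.0 * np.pi * np.fft.rfftfreq(n, d=L / n)
    f0 = np.exp(-y**2 / (2.0 * s0**2)) / (s0 * np.sqrt(2.0 * np.pi))  # narrow Gaussian
    v = np.fft.rfft(f0)
    E = np.exp(-k**2 * dt / 2.0)          # half-step diffusion (integrating) factor
    E2 = E * E
    def N(vhat):                          # drift term (theta*y*f)_y in Fourier space
        return 1j * k * np.fft.rfft(theta * y * np.fft.irfft(vhat, n))
    for _ in range(n_steps):
        a = dt * N(v)
        b = dt * N(E * (v + a / 2))
        c = dt * N(E * v + b / 2)
        d = dt * N(E2 * v + E * c)
        v = E2 * v + (E2 * a + 2.0 * E * (b + c) + d) / 6.0
    return y, np.fft.irfft(v, n)

y, f = fpe_ou_spectral()                  # integrate to tau = 3
steady = np.exp(-y**2 / 2.0) / np.sqrt(2.0 * np.pi)   # invariant density, theta = 1
```

Because the k = 0 mode is untouched by both the drift term and the integrating factor, total probability is conserved to machine precision, a useful built-in diagnostic.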
The following examples explore the domain of validity of the leading-order approximation, in which it successfully replicates the time-evolution of the numerical FPE solutions. We mainly choose cases with fat-tailed invariant distributions, which were our main motivation; the last case exhibits the adaptability of the approximation in places where it was unexpected.

Sech-power model
One way of moving away from the linear force field (quadratic potential well) of the OU model is to make the force field grow less rapidly away from equilibrium. This is an example of the well-known Pearson diffusions [21], and is obtained from the local volatility model in which volatility increases away from equilibrium, using the transformation γX = sinh γY; see [34] for more details and some applications in mathematical finance. The associated potential is also known in mathematical physics as the Pöschl--Teller potential [20] in the special case δ/γ² = 2. The resulting Schrödinger equation is solvable in terms of special functions and its link with the FPE has been used to derive analytical solutions, see [3]. The steady state is a sech-power, with normalising constant expressible via the Beta function B; the explicit solutions mean that this is an attractive model that has been well-studied: for instance, (16) occurs for systems with nonlinear random vibrations, see [33].
In the limit where γ → 0 we recover the OU model, so γ is a measure of the deviation from OU. We have looked at many parameter sets; the comparison of the leading-order solution to full numerical simulations is consistently qualitatively pleasing, and Figure 1 shows a typical example.

Dry-friction
Dry-friction [45] is the limit of the sech-power model obtained when δ = γ → ∞, and we then have a discontinuous A(y):

A(y) = −sgn(y), f(∞, y) = ½ e^{−|y|}.

This case is interesting in terms of the physics it describes, but also because the transition density is additionally available in closed form [45], e.g. by the usual route of Laplace transforming the Fokker--Planck equation:

f(τ, y | y_0) = (4πτ)^{−1/2} e^{(|y_0|−|y|)/2 − τ/4} e^{−(y−y_0)²/4τ} + ¼ e^{−|y|} erfc((|y| + |y_0| − τ)/(2√τ)),

or

g(τ, y | y_0) = (πτ)^{−1/2} e^{−(y−y_0)²/4τ} e^{−τ/4} e^{(|y_0|+|y|)/2} + Φ((τ − |y| − |y_0|)/√(2τ)), (18)

where Φ is the standard normal distribution function, and this exact solution provides a convenient benchmark against which to test our theory.
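The closed form above is easily coded; the sketch below (our own rendering of the f-form of the dry-friction density) can be checked for normalisation and for convergence to the steady state.

```python
import numpy as np
from math import erfc

def dry_friction_f(tau, y, y0):
    """Transition density for A(y) = -sgn(y), as written above:
    f = (4 pi tau)^(-1/2) exp((|y0|-|y|)/2 - tau/4 - (y-y0)^2/(4 tau))
        + (1/4) exp(-|y|) erfc((|y| + |y0| - tau)/(2 sqrt(tau)))."""
    gauss = (np.exp((abs(y0) - np.abs(y)) / 2.0 - tau / 4.0
                    - (y - y0) ** 2 / (4.0 * tau)) / np.sqrt(4.0 * np.pi * tau))
    tail = 0.25 * np.exp(-np.abs(y)) * np.array(
        [erfc((abs(yi) + abs(y0) - tau) / (2.0 * np.sqrt(tau))) for yi in y])
    return gauss + tail

y = np.linspace(-30.0, 30.0, 6001)
f1 = dry_friction_f(1.0, y, -2.0)         # moderate time, started off-centre
f50 = dry_friction_f(50.0, y, -2.0)       # long time: should approach exp(-|y|)/2
```

The Gaussian piece carries the initial spike and decays like e^{−τ/4}, while the erfc piece builds up the steady state, mirroring the short-/long-term decomposition used throughout.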
In this example, f arises as a sum of two pieces, and so h, rather than being a simpler function than g (as it is, for example, with the OU), is more complicated. Nonetheless, comparing with numerical simulation, the accuracy is excellent when starting at the origin (not shown) or fairly near the origin (y_0 = −2, Figure 2a), but worse when the starting-point is much further away (y_0 = −5, Figure 2b). This points to a separate development of the theory that deals with far-field expansions, which we pursue later in section 4: as we will show by consideration of (5), the first term in either of the above expressions corresponds to an approximation in which we start a long way from equilibrium and the drift is small, as here.

Student-t
Another popular deviation from the OU, and as noted in the introduction one important across many fields, is that associated with fat-tailed distributions. We use, as in [34], a drift for which the steady state is conveniently Student-t in the Y coordinates. This model gives rise to fatter tails than the sech-power example, because the force-field decays to zero as |y| → ∞. The distinction is further accentuated by noting that the sech-power case can be obtained, for certain parameter values, from transforming a model in which the steady state is Student-t in X coordinates; we demonstrate the efficacy of the leading-order approximation in Figure 3.

Double-well potentials
A gross deviation from OU is that of double-well potentials, and quite remarkably we find that the leading-order approximation still performs well, capturing both the quantitative features and the qualitative behaviour; see Figures 4 and 5. We take a very general form

f(∞, y) = K e^{−y²/2} (y² + γ²) / [((y − α_1)² + β_1²)((y − α_2)² + β_2²)],

with zeros at y = ±iγ and poles at y = α_j ± iβ_j for j = 1, 2; in [34] only a limited case was treated, with just the quadratic in the numerator but no denominator, which does not allow the flexibility to explore the parameters. These act as follows: γ → 0 makes the two wells disjoint, so that it becomes progressively less easy to transit from one well to the other; α controls their location; β → 0 makes them deeper. The force-field is A(y) = (d/dy) log f(∞, y), and there are explicit forms for ⟨−A′⟩_∞, and the normalising constant K, that can be obtained from the complex error function, or which can simply be evaluated numerically. We have evaluated several parameter sets and the results shown in Figures 4 and 5 are typical. To be specific, the parameters are: poles at ±2 ± i, zeros at ±i/√2, and for this we have ⟨−A′⟩_∞ ≈ 1.557. It is evident that starting from the equilibrium point (Figure 4) produces different results from starting in one of the wells (Figure 5). In the former, the approximation is excellent, whilst in the latter the approximation overestimates the rate at which the process 'finds out about' the other well, with the density being shared between the two wells at too early a time, though of course as τ → ∞ the results must again agree. Nonetheless, even when starting in one of the wells the leading-order approximation gives qualitative insight. It is perhaps surprising that the approximation works at all, and that it does is suggestive that our philosophy of building an approximation based upon 'how diffusions think about themselves', using the slightly counter-intuitive approach of a product, rather than a sum, expansion, is of value.
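Since the overall Gaussian factor in the double-well density is our own reconstruction, the sketch below does not assert the quoted value 1.557; instead it checks the integration-by-parts identity ⟨A²⟩_∞ = −⟨A′⟩_∞ that underpins (13), which holds for any normalisable invariant density.

```python
import numpy as np

# Hypothetical reconstruction of the double-well invariant density used above:
#   f_inf(y) ∝ exp(-y^2/2) (y^2 + g^2) / [((y-a)^2 + b^2)((y+a)^2 + b^2)],
# with poles at ±2 ± i (a = 2, b = 1) and zeros at ±i/sqrt(2) (g = 1/sqrt(2)).
a, b, g = 2.0, 1.0, 1.0 / np.sqrt(2.0)
y = np.linspace(-10.0, 10.0, 20001)
log_f = (-y**2 / 2.0 + np.log(y**2 + g**2)
         - np.log((y - a) ** 2 + b**2) - np.log((y + a) ** 2 + b**2))
f = np.exp(log_f)
f /= np.trapz(f, y)                       # fixes the normalising constant K
A = np.gradient(log_f, y)                 # force field A = (log f_inf)'
theta1 = np.trapz(A**2 * f, y)            # <A^2>_inf (Fisher information)
theta2 = -np.trapz(np.gradient(A, y) * f, y)   # -<A'>_inf
```

Agreement of the two quadratures, for a density this far from Gaussian, is a useful end-to-end check on any implementation of θ̂.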

Multivariate theory
Buoyed by the success of the one-dimensional theory, and the exemplars that demonstrate the viability of the approach across a range of illustrative cases, we move to higher dimensions, where the availability of a fast accurate approximation to partially cure the 'curse of dimensionality' is attractive. In extending to the multivariate case we use two approaches: we can consider the case of the multivariate OU process as a guideline, and we can attempt to glue together components from the general one-dimensional case. There are, however, some preliminaries before we embark upon this. The general form (noting the discussion in [37] regarding the assumptions required for the multivariate Lamperti transformation to reach it) is

dY_t = κA(Y_t) dt + √(2κ) dW_t, (19)

where, in m dimensions, W_t, Y_t ∈ R^m and A : R^m → R^m is the force field, and the corresponding FPE is (with τ = κt as before)

∂f/∂τ = ∇²f − ∇·(Af). (20)

It is natural to take A, the force field, conservative: that is to say it is the gradient of a potential. Indeed, A = ∇ log f_∞, where f_∞ is the steady-state solution, and then g = f/f_∞ obeys the adjoint equation

∂g/∂τ = A·∇g + ∇²g, (21)

so that if H = −g⁻¹∇g, we have the vector equation

∂H/∂τ = ∇²H + ∇(A·H − |H|²). (22)

If A were considered non-conservative then the equations (21,22) are no longer correct.

Symmetric OU
By a symmetric OU process, we mean (in normal form) that

dY_t = −aY_t κ dt + √(2κ) dW_t, (23)

where italic bold letters are square matrices and W_t is an m-dimensional standard Brownian motion, i.e. different coordinates are independent and so E[dW_t dW_t†] = I dt, where † denotes the transpose. The force field is conservative if, and only if, the matrix a, which we call the generator, is symmetric, and this is assumed henceforth. The steady state is a Gaussian distribution of mean 0 and with covariance matrix σ_∞ = a⁻¹. Further, writing q for the matrix exponential q = exp(−2aτ), which is evaluated by diagonalising a (this is eased by noting that a is symmetric), we find the following:

g(τ, y | y_0) = |I − q|^{−1/2} exp(½ y†ay − ½(y − √q y_0)† a(I − q)⁻¹ (y − √q y_0)), (24)

where the notation | · | denotes the determinant of the matrices. A technical, and notational, point is that as q lies in the commutative matrix ring R[a], we can legitimately write rational functions of a, q as if they were scalar indeterminates, providing the denominator is an invertible matrix. Thus a/(I − q), a(I − q)⁻¹, (I − q)⁻¹a are all equivalent. It is notable that the equation for h,

H(τ, y) = a√q(I − q)⁻¹(y − y_0) + (I + √q)⁻¹√q A(y), A(y) = −ay,

is, as in one dimension (6), a simple sum of two terms.
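Evaluating q = exp(−2aτ) by diagonalisation, as suggested above, takes only a few lines; the sketch below also forms the transition covariance a⁻¹(I − q) of the symmetric OU process and its long-time limit σ_∞ = a⁻¹ (the matrix a used is illustrative).

```python
import numpy as np

def exp_sym(a, t):
    """exp(t*a) for a symmetric matrix a, via eigendecomposition."""
    lam, V = np.linalg.eigh(a)
    return (V * np.exp(t * lam)) @ V.T

a = np.array([[2.0, 0.5],
              [0.5, 1.0]])                   # illustrative symmetric generator
tau = 0.7
q = exp_sym(a, -2.0 * tau)                   # q = exp(-2*a*tau)
Sigma = np.linalg.inv(a) @ (np.eye(2) - q)   # OU transition covariance a^{-1}(I - q)
Sigma_inf = np.linalg.inv(a)                 # steady-state covariance sigma_inf = a^{-1}
```

Since q is a function of a, the two commute, which is the practical content of the commutative-ring remark above: expressions such as a(I − q)⁻¹ can be evaluated in either order.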

General theory
With the OU case in hand we now proceed to the non-OU case, and emphasise that although A is not linear, it has to be a conservative field; broadly we follow the approach of the univariate case, but there are technicalities associated with higher dimension. Comparing with the univariate case, the next step is to define a matrix analogue of q and also of θ. Recalling the previous discussion on Fisher information, we use the following for the multivariate analogue:

θ = ⟨A(y)A(y)†⟩_∞ = −⟨∇A(y)⟩_∞, q = exp(−2θτ). (25)

As A is conservative, its matrix ∇A of partial derivatives is symmetric, and θ is also equal to the average of the Hessian of −log f_∞. It is then immediate that θ is positive definite because of the identity

∇∇ψ = ψ[(∇ log ψ)(∇ log ψ)† + ∇∇ log ψ]:

setting ψ = f_∞ and integrating over R^m causes the left-hand side to vanish, while the first term on the right is the integral of a tensor square and the second equals −θ. Analogously to the univariate case, and also (24), we adopt the ansatz

H(τ, y) = θ√q(I − q)⁻¹(y − y_0) + (I + √q)⁻¹√q A(y) + o(1). (26)

The first term may be integrated immediately to give a Gaussian, but the second is not in general a conservative field, except in certain cases: (i) any one-dimensional model or composite of one-dimensional models (by multiplying the marginals); (ii) any spherical model, i.e. one in which f(∞, y) is a function of (y − µ)†(y − µ) for some constant vector µ. This seems to be a difficulty, as no well-defined antiderivative exists in general. Upon further investigation, the 'non-conservative' part of this term is O(τ), because if we write

(I + √q)⁻¹√q = ½I − ¼θτ + O(τ²),

the leading part ½A(y) is conservative, while the remainder is O(τ), which at leading order is ignorable; it also disappears as τ → ∞.
We are then at liberty to define a function Ω(τ, y) as

Ω(τ, y) = ∫_{µ_∞}^{y} √q(I + √q)⁻¹ A(z) · dz, (27)

where µ_∞ is the long-term mean of Y, and the path of integration, which needs to be specified whenever the integrand is non-conservative, is a straight line. As the integral in (27) is taken along a line, its computation does not present greater difficulties as the dimension is raised; for all the examples considered later it can be evaluated in closed form. This is the main technical hurdle, and we can now proceed in much the same way as in the univariate case to find that

g(τ, y) ≈ (. . .)_{τ,y_0} exp(−½(y − y_0)†θ√q(I − q)⁻¹(y − y_0) − Ω(τ, y)) (28)

and, from reciprocity, as g must be symmetric in y, y_0, we must have

g(τ, y | y_0) ≈ c(τ) exp(−½(y − y_0)†θ√q(I − q)⁻¹(y − y_0) − Ω(τ, y) − Ω(τ, y_0)), (29)

so that the prefactor now depends on τ only. As τ → 0 we must have, as the density initially grows as a Gaussian, c(τ) ∼ (4πτ)^{−m/2} f(∞, µ_∞)⁻¹, while as τ → ∞ we have g(τ, y) → 1. In view of the work leading up to (9), this demands the prefactor c(τ) = |I − q|^{−1/2}; the matrix weight √q(I + √q)⁻¹ in (27) plays the role of ρ(τ) = √q/(1 + √q) in (9), with ρ(0) = ½ and ρ(∞) = 0. This gives our final results, the multivariate counterparts of (9) and (10), as (29) and

f(τ, y) = f(∞, y) g(τ, y | y_0), with g given by (29). (30)

We consider the two special cases mentioned above. First, one-dimensional models: (27) reduces to ρ(τ)[log f(∞, y) − log f(∞, µ_∞)], and as A is the logarithmic derivative of the invariant density we recover (9); µ_∞ does not enter the final expression. An extension is to consider a product of independent processes, so that ∇A(y), and hence θ and q, are diagonal. This also leads to simplifications, particularly if we take the reversion speeds (θ_i) to be identical: then θ and q are multiples of the identity matrix, and the multivariate approximation (30) becomes simply the product of the univariate approximations (10). Secondly, the symmetric OU case, which has A(y) = −ay, θ = a and f(∞, µ_∞)² = |θ/2π|, and so we end up with (24), as we should. We now take illustrative examples and evaluate the leading-order approximations, comparing with numerical simulation.

Fat-tailed and Double-well results
Note that, writing θ for −⟨∇A⟩_∞, and since µ_∞ = 0, we have all the ingredients needed to evaluate (30). This model is an extension to two dimensions of the Student-t model shown previously, so it has the same characteristics, including the fat tails. We take a typical example, the model with a_1 = 1, a_2 = 3, so that the density starts off circularly-symmetric and ends up elliptical. Two sub-cases are displayed as contour plots, one with y_0 = (−2, 2) (Figure 6) and the other with y_0 = (3, 1) (Figure 7). It is pleasing to see that the features are accurately captured both quantitatively and qualitatively.
The bivariate double-well model is particularly challenging, and very far from the simple OU model; we take a general model where α_1, α_2 ∈ R², β_1, β_2, γ ∈ R and a is a 2-by-2 symmetric matrix. Then

f(∞, y) = K e^{−y†ay/2} (‖y‖² + γ²) / [(‖y − α_1‖² + β_1²)(‖y − α_2‖² + β_2²)].

We present cases illustrating different features: (a) The first uses a = I, together with parameter values for which we compute θ numerically. Similarly to the 1D case, α controls the position, β the depth, and γ makes the wells more disjoint as it tends towards 0. Here the wells are quite disjoint and located on the y_1-axis; we consider two sub-cases with different starting points.
In the first, y_0 = (0, 0.5) is equidistant from the two wells, and the resulting evolution of the density is illustrated in Figure 8. The diffusion proceeds at the same pace towards both wells and, although the numerical solution converges slightly faster than the approximation, the agreement is reassuring. The second is the extreme case in which the process starts at y_0 = (−1.5, 0), the bottom of a well; its evolution is shown in Figure 9. As might be expected, since the wells are well separated, the approximation struggles to capture the medium-term behaviour.
(b) Finally we consider a = I. In this example, due to the symmetry of α, the wells are located on the line y_1 = y_2, and they are less separated because γ is bigger. The main difference, however, is that β_1 ≠ β_2, which implies that the wells have different depths. Again we present two sub-cases. The first starts equidistant from the wells, with y_0 = (−1, 1), and is illustrated in Figure 10. Once more the rate of convergence is the same towards both wells, and the exact and leading-order solutions behave very similarly, even though the starting point is further from the wells than in case (a).
The second sub-case starts at the bottom of a well, y_0 = (1.3, 1.3), and is shown in Figure 11. As with case (a), the medium-term approximation diverges from the exact solution.

Extensions
We have presented typical examples and demonstrated, via comparison with numerical simulation, that the approximation captures behaviour far from the OU case encouragingly well, despite being based, in some sense, on the OU model. We now highlight three extensions: first, what happens when we begin far from equilibrium; then half-line problems, for which our existing theory is inconvenient; and finally what happens if the force field is non-conservative.

Far-field expansion
The method is less accurate when the starting point is a long way from equilibrium; more concretely this means |A(y)| ≪ θ|y − µ_∞| in one dimension. At one level this represents the deviation from the chosen OU model. However, it can also be thought of as follows: |y − µ_∞|/|A(y)| is, loosely, the time taken to get back to equilibrium, and if this is much larger than 1/θ, the reciprocal of the average speed of attraction, then we are in the far field; hence the above heuristic.
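The heuristic can be made concrete numerically. A small sketch for one of the drifts studied later, A(y) = −a tanh γy, with µ_∞ = 0 and taking θ = aγ (the attraction rate at the origin; an illustrative assumption):

```python
import math

# Far-field indicator for the drift A(y) = -a*tanh(gamma*y), with
# mu_inf = 0 and theta = a*gamma (attraction rate at the origin)
a, gamma = 1.0, 0.5
theta = a * gamma

def return_time(y):
    """Loose time to return to equilibrium: |y - mu_inf| / |A(y)|."""
    return abs(y) / (a * math.tanh(gamma * abs(y)))

# theta * return_time(y) >> 1 signals the far field
for y in (0.5, 2.0, 10.0):
    print(y, theta * return_time(y))
```

The indicator grows with |y| because the tanh drift saturates while the distance keeps increasing, which is exactly the far-field situation described above.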
If we redevelop the theory assuming no mean reversion (and there is little to be gained by staying in one dimension), we have, from (22), where we have replaced b_1(y) by the gradient of a function B_1. Equating the terms at O(τ⁰) in (22) gives In one dimension this gives which is the average of the integrand, and hence may be approximated by the average of its values at the endpoints. If A is slowly varying then, going back to the higher-dimensional case, Also, by the same arguments as in §2, writing g(τ, y) = n(τ)e^{−h} and combining the results gives (again ignoring the ∇·A term)

Now let us reconsider the one-dimensional dry-friction case, which we reproduce for convenience: For |y| + |y_0| > τ (a V-shaped domain in a plot of y against τ) the first term predominates, and is in fact identical to what we have just derived in (32), as |A| = 1. This is understood as the far-field solution. Inside the 'V', i.e. for |y| + |y_0| < τ, the second term takes over; this is the term that governs the long-time limit, and the first term then decays to zero. It is elegant that these two halves combine to give the exact solution.
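The endpoint approximation of the path average used above is easy to check directly. A minimal sketch, with a slowly varying integrand chosen purely for illustration:

```python
import numpy as np

def path_average(F, y0, y, n=100001):
    """Average of F over [y0, y] by the trapezoidal rule."""
    ys = np.linspace(y0, y, n)
    vals = F(ys)
    h = ys[1] - ys[0]
    integral = h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))
    return integral / (y - y0)

# For a slowly varying integrand the path average is well approximated
# by the mean of the two endpoint values, as used in the far-field
# reduction.
F = lambda y: np.tanh(0.1 * y)
exact = path_average(F, 1.0, 2.0)
endpoint = 0.5 * (F(1.0) + F(2.0))
print(exact, endpoint)
```

For a rapidly varying F the two numbers separate, which is why the reduction is restricted to slowly varying A.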
The basic principle is that (32) is a generic way of understanding the far-field behaviour of equations of non-OU type, where the drift is small: for example, when A(y) = −a tanh γy or −y/(1 + γ²y²), as previously studied. However, extending (31), (32) to a complete solution that works in all régimes does not seem to be straightforward: there is no exponential damping factor in (31), so the expression is clearly wrong as τ → ∞. This is a matter for further research.

Square-root process
Another new branch of the theory concerns processes bound to lie in the half-line because the volatility decays to zero at some point, without loss of generality X = 0. (This is as opposed to a reflecting boundary condition, which is another way of constraining a process; we do not consider that case here, though doubtless it could be treated.) A famous example is the so-called square-root process, described in, for example, [12,31]: It is unhelpful to transform this into (2) because the drift A becomes infinite at the origin: 1/σ_X becomes undefined. The problem can be rescaled by setting Y = 2aX/σ² and τ = at, giving where I_ν(z) denotes the modified Bessel function of the first kind. For short time we have, thereby keeping an error term of the same order, O(τ), but ensuring h(∞, y) = 0. This gives which is an analogue of (11) for generalised square-root processes. The y-dependence is different, but there are obvious parallels, notably the factor e^{−τ/2}/(1 − e^{−τ}) in the first term.

The expansion can also be derived directly, if we define the normal form of a process on the half-line as In the expansion of h, the first term must, by dominant balance of the ∂h/∂τ and y(∂h/∂y)² terms, look like The constant c must equal y_0, by consideration of the initial behaviour, and to ensure h(∞, y) = 0 we write as the first term, where θ > 0 is, as before, arbitrary. The next term is obtained, as before, by requiring that the O(τ⁻¹) terms balance in (36), and we find where the last term, −1/2y, is needed to balance the (yh′)′ term in (36). (No such term appears in (7), because the relevant term in (5) is h″, which vanishes at leading order in τ.) This gives the same result as for the square-root process, for which A(y) = ν − y in normal form, provided we set θ = 1/2. Similar considerations apply to doubly-bounded processes.
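As a numerical check on the square-root transition density quoted above via the modified Bessel function I_ν, the following sketch evaluates the standard closed form, assuming the normalisation dY = (ν − Y)dτ + √(2Y) dW for the normal form (our reading of the rescaling above), and verifies that it integrates to one on the half-line:

```python
import math

def bessel_iv(nu, z, terms=60):
    """Modified Bessel function I_nu(z) from its power series."""
    return sum((z / 2.0) ** (2 * k + nu)
               / (math.factorial(k) * math.gamma(k + nu + 1.0))
               for k in range(terms))

def sqrt_process_density(tau, y, y0, nu):
    """Transition density of dY = (nu - Y) dtau + sqrt(2 Y) dW
    (the assumed normalisation of the square-root process)."""
    c = 1.0 / (1.0 - math.exp(-tau))
    u = c * y0 * math.exp(-tau)   # rescaled starting point
    v = c * y                     # rescaled current point
    q = nu - 1.0
    return c * math.exp(-u - v) * (v / u) ** (q / 2.0) \
        * bessel_iv(q, 2.0 * math.sqrt(u * v))

# Riemann sum over the half-line: the density should integrate to one
tau, y0, nu = 0.7, 1.2, 1.5
dy = 0.005
total = sum(sqrt_process_density(tau, k * dy, y0, nu) * dy
            for k in range(1, 8001))
print(total)
```

Note the factor e^{−τ/2} entering through √(u v) in the Bessel argument, the parallel drawn in the text.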
Both of these are examples of processes identified by Wong [47], where the usual method of solution is orthogonal expansion using the Hermite, Laguerre or Jacobi polynomials. There is also, as intimated above, the possibility of studying processes with one or two reflecting barriers.

Non-conservative problems
The theory of diffusion in the presence of a non-conservative force field is considerably more difficult. Even the OU case is not straightforward, but it can be related to the symmetric case as follows. To retain the general setting, let us write the SDE as and convert it to its normal form (23), though a might not be symmetric. The invariant density is still multivariate Gaussian with zero mean, but the covariance matrix σ_∞ is not a⁻¹: instead it is given by the Lyapunov equation, which is most easily solved for σ_∞ by writing it as a set of linear equations in its elements. Also From this, with H = −∇log(f/f_∞) as before, we see by direct calculation that The first term, which is singular at τ = 0, corresponds to the first term in (7), (26), if we accept the substitutions (notionally, √q ↦ e^{−aτ} or e^{−a†τ}), while the second term corresponds similarly, from the substitutions and A(y) ↦ −σ_∞⁻¹y. Therefore, in a somewhat altered way, (7), (26) carry over.
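The remark that the Lyapunov equation is most easily solved as linear equations in the elements of σ_∞ can be implemented with a Kronecker product. A minimal sketch, assuming the right-hand side aσ_∞ + σ_∞a† = 2I appropriate to a unit-diffusion normal form (an assumption here, since the normalisation of (23) is not reproduced):

```python
import numpy as np

def stationary_covariance(a, rhs=None):
    """Solve the Lyapunov equation  a S + S a^T = rhs  for S by
    rewriting it as linear equations in the elements of S, using
    vec(a S + S a^T) = (I (x) a + a (x) I) vec(S) (row-major vec)."""
    n = a.shape[0]
    if rhs is None:
        rhs = 2.0 * np.eye(n)    # assumed unit-diffusion normalisation
    K = np.kron(np.eye(n), a) + np.kron(a, np.eye(n))
    return np.linalg.solve(K, rhs.flatten()).reshape(n, n)

a = np.array([[1.0, 1.0], [0.0, 1.0]])   # non-symmetric generator
S = stationary_covariance(a)
print(S)                                  # differs from inv(a)
print(a @ S + S @ a.T)                    # recovers the right-hand side
```

The example confirms the point in the text: for non-symmetric a the stationary covariance is not a⁻¹.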
This brings us to another matter: our work approximates the FP solution using the short- and long-term behaviour. Is it possible to find two OU models with the same short- and long-term behaviour, even if the medium-term behaviour is different? The answer is yes: indeed, all generators of the form where A is the space of skew-symmetric matrices, give rise to the same behaviour in both limits. Formally, define two generators to be equivalent (notation ∼; clearly an equivalence relation) if they give rise to the same asymptotic covariance matrix. It is easy to see that two equivalent generators have the same trace.
As an example: when a = [1 a; 0 1] we have So the equivalence class of a under ∼ is which contains the following elements, as it must: In summary, every OU process (or generator) is equivalent under ∼ to a unique symmetric one; we obtain σ_∞ from the Lyapunov equation and then invert it to obtain the symmetric generator.
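The closing recipe (σ_∞ from the Lyapunov equation, then inverted to give the symmetric representative) can be checked numerically. A sketch, again assuming the right-hand side 2I for the normal-form Lyapunov equation:

```python
import numpy as np

def lyap(a, rhs):
    """Lyapunov equation a S + S a^T = rhs as linear equations."""
    n = a.shape[0]
    K = np.kron(np.eye(n), a) + np.kron(a, np.eye(n))
    return np.linalg.solve(K, rhs.flatten()).reshape(n, n)

a = np.array([[1.0, 1.0], [0.0, 1.0]])   # non-symmetric OU generator
S = lyap(a, 2.0 * np.eye(2))             # asymptotic covariance
a_sym = np.linalg.inv(S)                 # the symmetric representative

print(a_sym)                             # symmetric
print(np.trace(a), np.trace(a_sym))      # equivalent: traces agree
print(lyap(a_sym, 2.0 * np.eye(2)))      # same covariance as a
```

The symmetric representative reproduces the same asymptotic covariance and the same trace, in line with the equivalence-class discussion.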

Conclusions and final remarks
We have described the solution to the Fokker-Planck equation with steady state in simple, intuitive terms and demonstrated the validity of our approach with numerical examples in a range of cases. The main results are (10) and its multidimensional analogue (30). The most striking, and potentially most useful, conclusion is that even a simple expansion without the intermediate correction terms, the 'leading-order' expansion, produces acceptable results for the great majority of cases, in what may be described as the 'central zone' where the process spends most of its time. Perhaps surprisingly, it continues to work well even if the departure from the OU model is quite gross, as for the double-well potentials that we consider. Given the explicit nature of the formulae we provide, they are very fast to compute by comparison with a PDE solver or Monte Carlo simulation, especially in higher dimensions. Hence we anticipate that our approach can form the core of, say, a Kalman filter, avoiding linearisation or high-dimensional computation.
We have also indicated in Section 4 other developments that deal with extensions and important side-issues. These are: the far-field expansion, where the core result works less effectively; problems constrained to lie in the half-line or channel; and also the difficult case of when the force-field is non-conservative, i.e. not arising from a potential field.
On the more theoretical side there remains the open question of whether (7) converges in a neighbourhood of τ = 0, or whether it is simply an asymptotic expansion for small τ that eventually blows up if too many terms are taken. The empirical evidence of [34] is that it converges when A is analytic, but a formal proof is still required. There is also the possibility, as indicated in [34], of using spectral methods to produce a higher-order expansion, rather than developing a power series in 1 − e^{−2θτ} or 1 − e^{−θτ}. This paper is, therefore, not the last word on the subject, but provides opportunities for further work in an area that we consider still to be fertile.