The instanton method and its numerical implementation in fluid mechanics

A precise characterization of structures occurring in turbulent fluid flows at high Reynolds numbers is one of the last open problems of classical physics. In this review we discuss recent developments related to the application of instanton methods to turbulence. Instantons are saddle point configurations of the underlying path integrals. They are equivalent to minimizers of the related Freidlin-Wentzell action and known to be able to characterize rare events in such systems. While there is an impressive body of work concerning their analytical description, this review focuses on the question on how to compute these minimizers numerically. In a short introduction we present the relevant mathematical and physical background before we discuss the stochastic Burgers equation in detail. We present algorithms to compute instantons numerically by an efficient solution of the corresponding Euler-Lagrange equations. A second focus is the discussion of a recently developed numerical filtering technique that allows to extract instantons from direct numerical simulations. In the following we present modifications of the algorithms to make them efficient when applied to two- or three-dimensional fluid dynamical problems. We illustrate these ideas using the two-dimensional Burgers equation and the three-dimensional Navier-Stokes equations.

Abstract. A precise characterization of structures occurring in turbulent fluid flows at high Reynolds numbers is one of the last open problems of classical physics. In this review we discuss recent developments related to the application of instanton methods to turbulence. Instantons are saddle point configurations of the underlying path integrals. They are equivalent to minimizers of the related Freidlin-Wentzell action and known to be able to characterize rare events in such systems. While there is an impressive body of work concerning their analytical description, this review focuses on the question on how to compute these minimizers numerically. In a short introduction we present the relevant mathematical and physical background before we discuss the stochastic Burgers equation in detail. We present algorithms to compute instantons numerically by an efficient solution of the corresponding Euler-Lagrange equations. A second focus is the discussion of a recently developed numerical filtering technique that allows to extract instantons from direct numerical simulations. In the following we present modifications of the algorithms to make them efficient when applied to two-or three-dimensional fluid dynamical problems. We illustrate these ideas using the two-dimensional Burgers equation and the three-dimensional Navier-Stokes equations.

Introduction
Using saddle point techniques in statistical and quantum physics has a long history since the early work related to disordered systems in solid state physics [1][2][3][4][5]. Especially, the pioneering works of Zittartz and Langer [3][4][5] contain all the ingredients that are relevant for this review. The term "instanton" was introduced in Yang-Mills theory as the classical solution of equations of motion in the Euclidean space with nontrivial topology in [6]. The quantum effects of instantons (determinant of quantum fluctuations) were computed for the first time by 't Hooft in [7]. The main features of the instanton calculus as a non-perturbative method for the calculation of path integrals were highlighted on the example of 0+1 quantum mechanical systems in excellent reviews [8,9]. The instanton calculus consists basically of four steps: (i) Calculation of the instanton as a classical trajectory (minima of the corresponding action S): the instanton provides the exponential decay term exp(−S) in the transition amplitude.
(ii) Calculation of zero modes that leave the action invariant: finding the zero modes is closely related to finding the symmetries of the underlying system. Once the zero modes are determined and if their number p is finite, as is usually the case, their contribution results from a finite dimensional integral and often takes the form ( √ S) p .
(iii) Calculation of the path integral of fluctuations around the instanton which change the action: this is normally done in the Gaussian approximation. (iv) Summation over the instanton gas.
Each of these steps may involve major obstacles but the instanton calculus offers an algorithmic approach for the calculation of transition probabilities in quantum mechanical and statistical systems, including the exponential decay and leading power law scaling. In this report we will focus on step (i): calculation of the instanton. We will shortly discuss the obstacles for the next steps and outline first ideas. However, already the knowledge of the instanton provides a deep insight into the underlying system which e.g. in the context of rare fluctuations is the adequate answer. One should also note that the expressions related to steps (ii)-(iv) are (complicated) functionals of the instantons. Thus, step (i) is naturally the most important step in this analysis.
This review is about instantons in fluid flows. Although the physics of fluid dynamics have little overlap with Yang-Mills theory mentioned above, they are nevertheless a strongly coupled nonlinear system as well. The comparison of quantum tunneling in a double well potential [10] and the Kramer's problem for escape rates in chemical reactions [11,12] already exemplifies the connection between quantum mechanics and stochastic ordinary differential equations (SDEs). Therefore, the calculus of instantons in fluid turbulence described by stochastic partial differential equations corresponds closely to the treatment of instantons in quantum field theory. However, naturally the question arises why a non-perturbative method like the instanton approach is especially promising for tackling the unsolved problem of turbulence.
In order to justify the approach, we first comment on the statement that fluid turbulence is one of the most important unsolved problems in classical physics as stated by R. Feynman. A mathematician, a physicist and an engineer will all have different thoughts on this statement. A mathematician will touch the issue by asking questions like existence of global solutions, bounds on the growth of velocity derivatives, the limit of vanishing viscosity and minimal regularity for energy conservation (see e.g. [13][14][15][16]). A computational engineer is concerned with the huge number of degrees of freedom present in a turbulent flows. Using direct numerical simulations (DNS) only flows with very low Reynolds numbers can be simulated: a Volkswagen traveling at the moderate speed of 30 km/h is presently at the absolute limit of direct numerical simulations. Therefore, constructing and improving large-eddy simulations using as much insight as possible from the underlying Navier-Stokes equations is at the heart of computational modeling [17]. This understanding is strongly related to the questions which arise in physics. For a physicist, the turbulence problem is solved if one has full information on the probability density of velocity fluctuations or equivalently on correlation functions of any order. The usual approach known from kinetic theory is to find some closure approximation for terms involving higher correlations. In principle, this can be done by applying perturbation theory starting from the heat equation and introducing the nonlinearity as perturbation. This approach using renormalized perturbation theory (direct interaction approximation, DIA [18]) and/or renormalization group analysis [19,20] was a major achievement of turbulence theory in the last century. However, it seems to be very difficult and questionable, whether these approaches can describe the probability density functions (PDFs) of velocity fluctuations in the tail that are extremely different from Gaussian fluctuations. The reason for this is the occurrence of nearly singular structures, as depicted in figure 1 for fluid and plasma turbulence, which are correlated over distances much larger than the dissipation scale and which cause strong departure from Gaussian behavior. Thus, for getting access to the tails of the velocity fluctuations PDF non-perturbative approaches are required. The instanton approach is especially promising since the instantons are closely related to nearly singular events like strong vortex tubes in fluid flows, current sheets in plasmas, and shocks in high Mach number flows which form the skeleton of anomalous scaling and departure from Gaussianity, as already noted by Onsager [14,21].
Before we start to summarize important results using the instanton calculus e.g. in fluid flows [22][23][24][25][26][27][28][29][30][31][32][33][34], magnetic field reversals [35], growth phenomena [28], chemical reactions [36,37], European options fluctuations [38], and genetic switching [39,40], we clarify the mathematical background. We start with the path integral formulation of general stochastic systems pioneered by Martin-Siggia-Rose/Janssen/de Dominicis. We then identify instantons as saddle points of the corresponding action and this also naturally leads to a description known from large deviation theory (LDT) in the context of the Freidlin-Wentzell theory. After the general procedure and notations are clarified in section 2, we come back to above mentioned important results in section 3. In 4, we will briefly introduce into a particularly instructive example of turbulence to show how the instanton formalism is applied to obtain verifiable predictions. Sections 5 to 7 present techniques to numerically find the instanton trajectory in fluid dynamical applications by direct computation, filtering the instanton from numerical experiments, and in higher dimensional setups, respectively.

The Martin-Siggia-Rose/Janssen/de Dominicis formalism
The functional integral description of stochastic systems started in the seventies with the Martin-Siggia-Rose/Janssen/de Dominicis (MSRJD) formalism [41][42][43]. The method transforms the stochastic system into a functional integral representation. Although there is little hope to solve the path integral analytically, it is the starting point for systematic perturbation theory in stochastic dynamical systems (see especially [44] for applications in critical dynamics). It is also the basis for the instanton calculus for stochastic dynamical systems. To introduce the formalism, consider a stochastic partial differential equation (SPDE) of the forṁ in d space-dimensions, u having n components: u(x, t) : R d × [−T, 0] → R n and η being a Gaussian forcing with correlation i.e. white in time and some correlation function χ(r) in space.
In fluid dynamics applications, the exact form of the spatial correlation is often irrelevant and can be characterized solely by its amplitude χ(0) and correlation length L, where χ(r) does not change significantly for x < L and decays to zero quickly for x > L. Dimensionally, for example L = χ(0)/χ (0) (assuming the forcing is isotropic and χ depends only on the scalar distance r). The forcing is then completely characterized by χ(0) and χ (0). Furthermore, we will only be considering additive noise. The drift b in general is a nonlinear, non-gradient operator. A (quick) derivation (similar to [45]) of the MSRJD formalism follows by considering the expectation of an observable O[u] and writing this expectation as a path integral taken over all noise realizations (using the fact that η is Gaussian and δ-correlated in time). For a more rigorous discussion of path integrals, in particular in the presented context, we refer the reader to the works [46,47]. Here, we keep in mind that the field u[η] has a functional dependence on the forcing η implicitly given by eqn. (1): where ·, · is an appropriate inner product, e.g. L 2 . At this stage, we can consider the transformation of the noise paths to the paths of the field u given by η → u given by (1), hence η =u−b [u]. This coordinate transform leads to a Jacobian in Dη = J[u]Du with Performing this coordinate change results in the Onsager-Machlup functional [48,49] This formulation is the starting point for directly minimizing the Lagrangian action Since it is often more convenient to work with the original correlation function χ instead of working with its inverse, we delay this coordinate change and introduce an auxiliary field µ and obtain (by virtue of the Fourier transform, completion of the square) from (5): Now again considering the coordinate change η → u we arrive at the path integral with the action function S[u, µ] given by also termed the Martin-Siggia-Rose/Janssen/de Dominicis (MSRJD) response functional. In many cases it can be shown explicitly that the Jacobian J(u) is not relevant to the discussion as it reduces to a constant, the value of which depending on the choice of either Itō or Stratonovich discretization (see e.g. [45]). Two remarks should be added: (i) the change introducing an auxiliary field µ to linearize the action with respect to the forcing is also known as Hubbard-Stratonovich transformation [50,51] and (ii) the MSRJD action S is closely related to classical limit of the Keldysh action [52][53][54].
The main idea of the instanton method is to compute the dominating contribution to this path integral via a saddle point approximation, i.e. by finding extremal configurations of the action functional (9).
Let us be more specific and consider an observable for a scalar functional F [u], which is defined at the end point t = 0 of a path that started at −T < 0 (keeping in mind that we might be interested in the limit T → ∞).
The idea is that we observe an extreme event at t = 0 that has been created by the noise (and we give the noise infinite time to create the extreme event). In turbulent flow, for example, an extreme event of interest could be a large negative gradient of the velocity profile (i.e. F [u] = u x δ(x)), an event of high vorticity, an event of extreme local energy dissipation, etc. Seeking a path integral representation of the probability distribution P (a) for the events that fulfill F [u] = a at t = 0, we find By now computing functional derivatives, we find δS δµ and it is easy to see that the saddle point equations (the instanton equations) are given byu where we have incorporated the Lagrange multiplier λ at the right hand side of the equation for µ as a final condition equivalent tȯ A solution (ũ,μ) of this set of equations is termed the instanton configuration or instanton. It represents an extremal point of the action functional (9).
We want to make a few remarks on the instanton configuration and the structure of the chosen path integral representation: • In the limit of applicability of the saddle point approximation, the instanton configuration corresponds to the most probable trajectory connecting the initial conditions to a final configuration which complies with the additional constraint defined by the observable O[u]. In the language of quantum mechanics, it corresponds to the classical trajectory obtained for the limit → 0. For stochastic differential equations of the form (1), we similarly need a smallness parameter to justify the saddle point approximation. This might either be the limit of vanishing forcing, χ(0) → 0, or the limit of extreme events, |a| → ∞ (which corresponds to the limit |λ| → ∞, see discussion in sections 2.2 and 5.3). • It is instructive to apply a change of variables (u, p) ≡ (u, −iµ). The response functional (9) can then be written in terms of a Hamiltonian H[x, p], The field variable u and the auxiliary field p then play the role of generalized coordinate and momentum for the Hamiltonian system defined by H[u, p]. The instanton equations (11), (12) are the corresponding Hamiltonian equations of motion. We remark in particular that the Hamiltonian H[u, p] is a conserved quantity even if the original dynamical system (1) is dissipative.
• In the above setup, note that the choice p = 0,u = b[u] is a solution of the equations of motion with vanishing action, corresponding to a deterministic motion of the original dynamical system without any perturbation by noise.
Since the action in general is always positive, S[u, p] ≥ 0, this implies that a deterministic trajectory connecting the initial and final point (if it exists) will always be the global minimizer of the action functional.
, which forms an important sub-class of problems. Here, minimum action paths become minimum energy paths.

Freidlin-Wentzell theory of large deviations
An alternative approach is given by Wentzell and Freidlin [55,56]. It has its roots in the application of large deviation theory [57][58][59] to dynamical systems under random perturbations. In general, we say that a family of random processes u (t) defined on t ∈ [−T, 0] fulfills a large deviation principle, if where " " is to be understood as the ratio of the logarithms of both sides tending to unity as → 0. The left hand side describes the probability of the random process to "end up" in some set Ω. On the right hand side, the infimum is taken over all realizations of the path that fulfill the boundary conditions. The functional I T is called the rate function, which plays the same role as the action functional in the previous section. The minimizer ψ * (t) of the rate function represents the maximum likelihood realization of the random process fulfilling the boundary conditions, and corresponds to the instanton configuration of the MSRJD-formalism. In the context of the Freidlin-Wentzell theory, the rate function can be written down explicitly as where is the Lagrangian of the system with Hamiltonian H(x, p), and ·, · is the inner product of the considered space. We are interested in systems of the form (1), with a d-dimensional Brownian increment dW and σ such that χ = σ T σ, where we can write down the Hamiltonian and consequently the Lagrangian The rate function thus looks like In a similar manner as before we consider observables at the end point t = 0 for a scalar functional F [u (t = 0)] and want to compute the probability P (a) of the random process evolving such that at the final time F [u (t = 0)] = a * , with a * ∈ R. As we will see, this corresponds to considering the large deviations of the momentgenerating functions of the random process, exp( λ F [u (0)] , with λ ∈ R. Starting from the large deviation principle (16) we define the set Ω a = {φ|F (φ) = a * } of final conditions that comply with the requirement of our observable. Now, defining where S(a) := − log P (a) = − log P (u (0) ∈ Ω a ). The last step of (23) makes use of Laplace's method to show that S * (λ) is the Legendre transform of S(a), and we conclude that S * (λ) = λa * − S(a * ). On the other hand where the infimum is taken over all realizations that fulfill the boundary conditions, ψ a * is its minimizer, and of course F [ψ a * ] = a * . In short, the rate function corresponds to the Legendre transform of the logarithm of moment-generating functions. We obtain the PDF of our observable from the Freidlin-Wentzell action.
To obtain the minimizer ψ a * , find the solution to the equations of motion (instanton equations) corresponding to (22), where the Lagrange multiplier λ assures that the constraint F [u] = a at t = 0 is satisfied. We immediately see that these equations are equivalent to (11), (12) by setting µ = −ip.

Applications of the Instanton method to fluid mechanics and related statistical systems
The instanton approach to fluid turbulence started about 20 years ago with the paper by Gurarie and Migdal [25] deriving the scaling of the right tail of the velocity increment PDF for the Burgers equation [60]. For the right tail shocks are absent and viscosity can be neglected. Even though the same result was already obtained by other techniques before [61,62], this paper contained all ingredients of the instanton analysis, demonstrating its applicability in the context of fluid dynamics. A remarkable feature, both of the right tail of velocity increments and gradients, is that fluctuations do not play a role and thus that the instanton solution is basically exact. This is not the case for the left tail of the increment and gradient PDF. The instanton solution for the left tail dominated by shocks was introduced in [26]. The instanton prediction was possible due to the conservation property of the Hamiltonian for the instanton equations and the existence of a Cole-Hopf transformation [63,64], the latter of which is in general not available in other fluid models. In the context of this paper, instantons for the Burgers equation will play a prominent role, and we will demonstrate many of the discussed methods and tools in application to Burgers turbulence. The reasons for this are manifold: As outlined above, many analytical estimates of instanton results are available for the Burgers equation because of the mathematical simplicity of the model. Furthermore, not only is the phenomenology of Burgers turbulence well understood [65] and many open problems of Navier-Stokes turbulence are known exactly in the Burgers case, but also the numerical treatment of a 1D model is far less demanding.
Besides the case of Burgers turbulence another problem was studied intensively during that time: the advection of a passive scalar by a turbulent velocity field. In a seminal paper, Shraiman and Siggia [23] investigated a Lagrangian treatment of the passive scalar using a path integral approach in the semi-classical approximation, demonstrating that the passive field as well as its gradient possess exponential tails. The instanton analysis was carried out in [24,66] reproducing the results of [23] but stressing the relevance of zero modes.
In a similar spirit, there is a corpus of literature applying instanton calculus to shell models of turbulence. In [22] the Gledzer-Ohkitani-Yamada (GOY) shell model was studied and an iterative procedure was introduced to determine the instanton. In addition, quadratic fluctuations around the instanton were calculated. One interesting outcome of this study is the tendency towards log-normal statistics of coherent structures. Similarly, in [67] a shell model of passive scalar advection is studied in the context of instanton calculus. A remarkable and underestimated feature of this article is the observation that in this case fluctuations around the instanton have a major impact and dramatically improve the quality of predictions for the scaling of structure functions.
Instantons for vorticity have been calculated in the inverse cascade for twodimensional turbulence [27]. We remark that these instantons are very different from the instantons considered so far: First, the direct instanton equations for vorticity are of no use if one considers forcing obtained by taking the vorticity as observable. The solutions are axial symmetric and the nonlinearity vanishes, which results in purely Gaussian vorticity statistics. Therefore, the authors took into account perturbations of the axial symmetry via the first harmonics and derived an effective action for which the axial symmetric solution provides an exponential tail for vorticity statistics. Second, whereas the instanton for e.g. Burgers turbulence has the property that the action S changes very rapidly at time t = 0 this is not the case for the 2D vorticity instanton. This is a peculiarity of turbulent fluctuations in the inverse cascade, and thus is neither present in Burgers turbulence, nor to be expected for the 3D Navier-Stokes instanton.
Instantons have also been calculated for bi-stability and transition probabilities in geophysical flows [30][31][32]. These investigations are of enormous importance for blocking phenomena and multi-state phenomena in atmospheric and ocean flows [33,34]. These cases differ from the studies mentioned before, since the instanton is found by directly minimizing the Onsager-Machlup functional. A numerical scheme for this minimum action method in the context of shear flows was introduced in [68], where the author studied the transition from Poiseuille flow to a traveling wave. Similar phenomena, like magnetic field reversals in dynamo action [35,[69][70][71], wait for an instanton analysis. The related problem of growing height profiles described by the Kardar-Parisi-Zhang equation has been studied in [28]. Here, the authors applied a numerical minimization to find instanton solutions and could identify instantons as nucleations and propagating steps in the height profiles. Instanton-like or large deviation-like solutions are also intensively studied in the context of freak waves [72][73][74][75][76][77], i.e. events of extreme instantaneous ocean surface elevation.
In a broader context, instantons have been extensively studied in condensed matter and critical phenomena [53,78,79]. Prototype examples are the nonlinear sigma model [79,80], strongly correlated electrons [81], and Mott-conductivity [82]. Especially, in [82] the integrability of the instanton equation allowed also the calculation of quasi-zero modes and fluctuations. This is in strong contrast to the situation in turbulence where any analytic solution of the instanton equations is out of reach. However, this paper motivates to search for simplified turbulence models such that the corresponding saddle-point equations are integrable.
The instanton method in the language of large deviation theory with a focus on a rigorous mathematical treatment has been extensively studied in the context of statistical mechanics (see especially [83,84]). In [84] the relation between the large deviation principle, the saddle point approximation and Laplace's approximation is discussed.
The calculation of chemical reaction rates based on an instanton approximation was introduced by Miller [36]. A recent successful treatment for multi-dimensional systems is presented in [37]. Furthermore, a field having much in common with the calculation of chemical reaction rates is genetic switching [40]. In [39], an instanton calculation of switching scenarios has been presented .
We finish this section by mentioning that experimental verification and detection of optimal paths or instantons are still in its infancy. An exception is the study of a semiconductor laser with optical feedback [103] where escapes from a metastable state could be attributed to instantons. We hope that this review also motivates and triggers studies in turbulence experiments.

The stochastic Burgers equation
As a mathematically less complicated turbulence model we consider the stochastically driven Burgers equation [60] and we will discuss the presented methods and numerical tools in many cases applied to Burgers turbulence. In order to understand both the relevance and the limitations of this model, we will in the following briefly introduce into the phenomenology of Burgers turbulence.
The stochastically forced, viscous Burgers equation is given by with a noise field φ that is δ-correlated in time and has finite correlation in space with correlation length L, more precisely While the precise form of χ is not important, the choice (28) is common in the literature [104,105] and is used in all numerical computations presented here.
There are several motivations to study Burgers equation. From a practical point of view, there is a broad range of applications governed by this model (see e.g. [65] for an overview), for instance in the context of compressible gas dynamics, in particular weak acoustic perturbations (viewed in the reference frame of the sound velocity), as well as cosmology [106,107], interface growth [108] or vehicle traffic flow [109]. A major motivation in the context of fluid dynamics is the fact that Burgers equation has a similar mathematical structure as the Navier-Stokes equations. As a onedimensional equation, however, it is much easier to handle from both an analytical and computational point of view, in particular because it contains no non-local pressure term. Therefore, Burgers equation is often referred to as a testbed or toy model for turbulence.
There are important similarities (and equally important differences) between the phenomenology of turbulence in fluids described by the Navier-Stokes equations and Burgers turbulence (see e.g. [65] for an overview of Burgers turbulence phenomenology). A major similarity between Burgers and the three-dimensional Navier-Stokes equations is the presence of a direct energy cascade: If energy is injected on large scales (or low Fourier modes), this energy is transported via the nonlinearity to small scales (corresponding to high Fourier modes) until the diffusive part of the equation becomes important and the energy is dissipated. Note that the above chosen correlation function (27) models this behavior: In Fourier space we havê χ(ω) = ω 2 e −ω 2 /2 such that we do not have any forcing of the mean (corresponding to ω = 0) and strong decay of the forcing for high frequencies. The length-scale L of the forcing is dimensionally given by L = −χ(0)/χ (0), which in the case (28) evaluates to L = 1.
Another major similarity between Burgers and Navier-Stokes turbulence is the existence of intermittency. This is manifest most strikingly in the existence of anomalous scaling for moments of velocity differences: The average increment of the velocity field on scale h, δu(h) = u(x + h/2) − u(x − h/2) and its moments δu n (h) = (u(x + h/2) − u(x − h/2)) n ("structure functions") exhibit a scaling of δu n (h) ∼ h ζn . In contrast to the dimensional estimate ζ n = n/3, the scaling exponents grow more slowly for n → ∞ for both Burgers and Navier-Stokes turbulence, a phenomenon that is believed to be connected to the intermittent nature of the flow.
In connection to this, a main quantity of interest in fluid systems are velocity gradients. High gradients are usually related to the most dissipative structures in the flow which govern the tails of the underlying probability distributions and structure functions. In the stochastically driven Burgers equation, we can immediately identify a difference in the behavior of positive and negative velocity gradients. For small viscosity ν and moderate gradients, the solution will follow the characteristics of the equation, given by the nonlinearity uu x . This means that positive gradients will be smoothed out, whereas negative gradients will further steepen until they are so steep that the viscous term will start to counteract the advection and shocks are forming. This has important consequences for the tails of the velocity gradient probability distribution. Let us fix a point in space-time (for simplicity take x = 0 and t = 0) and ask for the probability to observe a velocity gradient given by P (a) = P (u x (0, 0) = a). We are interested in extreme values for a, either positive or negative. Consider first the case of a > 0. Then, the noise has to counteract the deterministic dynamics that drive the system back to smaller positive gradients. Intuitively, we will find that it is very difficult for the noise to generate such gradients, and we may expect the probability density to decay faster than a Gaussian. For sufficiently large a, the scaling of the tail of the probability distribution should be characterized by the scaling of the associated minimizer of the Freidlin-Wentzell functional as we are clearly in a regime where we expect a large deviation principle to apply. The left tail of the velocity distribution is expected to have two different regimes: For small viscosity, it should be relatively easy for the system to generate moderate negative velocity gradients, simply following the deterministic dynamics of the nonlinearity that steepens the profile of the solution which would eventually lead to discontinuities in the velocity field if the system did not have any viscosity. Since the viscosity prohibits the occurrence of infinite gradients, once the gradient becomes too steep, it is again difficult for the noise to produce large negative gradients. Then, in the viscous tail of the distribution, again, the large deviation principle should be applicable and the scaling behavior is expected to be captured by the instanton (or minimizer of the Freidlin-Wentzell functional).
Following this intuition, there has been a considerable body of work devoted to the detail of the scaling of the function P (a). Here, we focus on several important studies that had consequences in the context of the present review. Concerning the right tail scaling, Gurarie and Migdal [25] used the MSRJD formalism in order to derive the Euler-Lagrange equations associated with Burgers equation (26) which are given by These equations follow directly from (25): simply set b[u] = −uu x + νu xx and compute (∇ u b) T = u∂ x + ν∂ xx using integration by parts. In their work, focusing on the right tail of the probability distribution p, Gurarie and Migdal introduced a finite-dimensional dynamical system approximating the solution of the Euler-Lagrange equation in order to predict that the distribution should decay much faster than Gaussian, i.e. ln(P (a)) ∼ −(a/ω) 3 , For the viscous left tail, Balkovsky et al. [26] applied the Cole-Hopf transform to the instanton equations and used a variety of careful approximations in order to predict that, in the limit a → −∞, the scaling of p should behave like ln(P (a)) ∼ −(a/(ωRe)) 3/2 , These limiting results can be motivated by a rather simple phenomenological description [26]: Velocity differences δu(h) = u(h/2) − u(−h/2) at the length scale h are increased by the (Gaussian) forcing according to the law δu 2 ∼ φ 2 t. The breaking time of the shock structures can be estimated by t ∼ L/δu. Therefore, from P (φ) ∼ exp(−φ 2 /χ(0)), one obtains P (δu) = exp(−δu 3 /(Lχ(0)). Now, in smooth regions u x ∼ δu/h, while for the shocks u x ∼ −δu 2 /ν. We thus recover the exponents 3 and 3/2 for the right and left tail, respectively. Yet, when comparing these limiting results to measurements in DNS, until recently, the role of the instanton for negative velocity gradients was unclear (and actually actively discussed among researchers). One numerical result obtained by Gotoh [110] via massive direct numerical simulations presented an estimate of the local scaling exponent of 1.15 for the probability distribution of the negative velocity gradients, which is surprisingly far away from the analytical prediction of 3/2. In 2001, Chernykh and Stepanov developed a numerical scheme to solve the instanton equations numerically [104]. This way they were able to show that all the approximations made by Balkovsky et al. leading to a scaling exponent of 3/2 were valid for the solution of these equations. These results rendered the discrepancy between DNS measurements and theoretical prediction even more in need of an explanation: In what sense are instanton configurations actually relevant in Burgers turbulence? In our recent work, we were able to confirm the relevance of instantons and their impact on the tails of the velocity gradient distribution in two ways: First, we showed that the instanton configurations themselves can be identified in realizations of turbulent Burgers flows by using an appropriate filtering technique [111]. Second we were able to analyze Gotoh's simulation in detail and compare them to the local scaling exponent given by the solution of the instanton equations [105]. It turned out that, for the parameters that were chosen in Gotoh's simulation, the scaling exponent of the velocity gradient given by the (numerical) solution of the instanton equations is actually 1.16, hence very close to the measured value. The regime in which these numerical simulations were carried out was simply not yet in the range of validity of the asymptotic analysis that was carried out analytically, but nevertheless already in a regime where the instanton approximation is valid. The resolution of this 'puzzle' is encouraging and gives hope that instantons are relevant in a wide-range of fluid dynamics and can help to answer many open questions in the field. However, the detailed study of the stochastic Burgers equation also clearly showed how important it is to develop efficient numerical methods in order to compute such instantons.
The identical question was discussed for the tail scaling behavior in the inviscid limit, ν → 0. Phenomenologically, the exponential tail decay is only valid for viscosity dominated shocks and the exponent of 3/2 appears due to the interplay of shock steepening by the non-linearity and shock smoothing by the dissipative term. In the inviscid limit, the exponential region of the PDF is pushed to negative infinity, and for finite gradients an algebraic tail P (a) ∼ a −γ prevails. There was a debate about the numerical value of γ, with predictions ranging from γ = 2 [112,113], over γ ≈ 3 [61,114,115] to γ = 7/2 [116]. The issue was settled in [117] by considering the scaling of pre-shocks and confirmed numerically [118]. Notably, field-theoretic methods [61] have resulted in wrong predictions and it is unclear in what manner the instanton formalism can be applied to obtain the correct exponent in this limit.
In the next section we will first present recent developments of efficient numerical techniques for solving the instanton equations. After that, we will discuss how to obtain instantons by filtering from direct numerical simulations and we will show how these two methods can be compared. The remainder of the paper is dedicated to higher-dimensional problems which occur naturally when considering many models of physical fluids.

Efficient numerical solution of the instanton equations
The problem of the numerical solution of the instanton equations, or, equivalently, the numerical minimization of the action functional, lies at the very center of the computation of instanton configurations. In the last decades, a multitude of numerical schemes were proposed, differing both in the setup and approach as well as in practical considerations, such as computational efficiency or parallelizability. In this section, we want to briefly introduce into this spectrum of methods, starting from traditional and general methods such as shooting methods [119] to very recent developments [120][121][122]. Differences in terms of applicability, scope and practicability are highlighted in order to allow a comparison of different schemes for a given problem in terms efficiency and simplicity. Furthermore, we particularly highlight specific techniques for efficiently computing the instanton field configuration for fluid-dynamical problems, giving practical tips. These include implementation details for reoccuring problems such as computing norms in function spaces for degenerate forcing, treatment of numerically unfavorable correlation matrices and increasing computational and memory efficiency. This discussion will be complemented with an example implementation given in the appendix.
Next to the possibility of computing rare and extreme events from the knowledge of the action functional or the corresponding equations of motion, alternatives exist that do not rely on the saddle-point approximation of the underlying path-integral. One notable class of algorithms approximates the infinite-dimensional path-integral (8) numerically, for example by Monte-Carlo integration. Algorithms of this type are regularly used for calculations in quantum chromodynamics ("lattice QCD"), but have recently been applied to fluid-dynamical problems as well [123]. Another class of algorithms devises strategies to increase the number of rare events by considering an ensemble of trajectories, and dynamically cloning and destroying realizations according to an empirical estimator of the rare event [124][125][126][127]. Suited also for the application in fluid dynamics, they can be seen as the counterpart of the instanton formalism in estimating the far tails of probability densities. As both these classes of methods in general do not make use of the instanton approximation, we will not discuss them in more detail here, even though many of them are applicable for the computation of rare and extreme events in fluid dynamics.

Gradient systems, minimum energy paths
In the context of theoretical chemistry and the computation of reaction rates, a common scenario is a simplification of system (1), where the drift-term is a gradient, b[u] = −∇ u V (u). Here, minimum action paths can be translated into minimum energy paths, which in turn allows for a number of simplifications of the numerical method, as a lot more is known about the properties of the minimizer: Between stable fixed points of the deterministic dynamics, which correspond to the minima of the potential landscape V (u), the transition trajectory has to cross through the saddle point u * of the potential with the lowest potential value V (u * ), when moving between neighboring basins of attraction. This simplified structure of the minimizer allows for a direct estimate of the associated transition probability in terms of the height of the potential barrier and the Hessian at the saddle point, ultimately leading to Kramers' law [11], which is a refinement of Arrhenius' law, known from the problem of chemical reaction rates [128]. Even though in the context of fluid dynamics the systems involved are usually not gradient, some of the mentioned methods are used as a basis for more generalized schemes. Notable methods for finding minimizing trajectories in this setup include the nudged elastic band method [129] and the string method [130,131].

Numerical computation of the minimizer of the action functional
Returning to the problem of an arbitrary action functional, more general methods have been developed, most notably the minimum action method (MAM) [96], which can be seen as a generalization of [132] and variants, for example adaptive MAM [68,133], parallel MAM [134], and gentlest ascent dynamics (GAD) [135]. The main idea for an efficient computation is to exploit the structure of the underlying Euler-Lagrange (instanton) equations (25) for efficient computation of the solution. This structure might depend on the particular problem and thus, the computational method often needs to be adapted to the concrete problem. An important step in the development of a method that can be applied in a very general context is the geometric minimum action method (gMAM). The gMAM can be viewed as a generalization of the string method for non-gradient fields. Its starting point is the construction of the quasipotential that exists even if the drift b does not possess a potential (for details see chap. 5 of [56]). If the drift b does not depend explicitly on time, time can be viewed as a parametrization of the instanton equations. This parametrization can, in principle, be changed to one that is more favorable from a computational point of view. Based on this idea, the gMAM [120,121,136] was developed which offers an efficient way to compute minimizers for transitions between an initial and a final state and works even in cases where b does not possess a potential. In particular, one arrives at a modified action functional,Ŝ where the search space is restricted to arclength parametrized trajectories. The metric is given by the correlation matrix χ of the problem via and the norm · χ is induced by this scalar product. The action functional (32) is also termed the geometric action functional, and its minimizers are identical to the infinite time minimizers of the original action, if reparametrized to physical time. An important class of problems that can be solved efficiently by the gMAM are noise-driven transitions between stable fixed points in the context of meta-stability. These problems are difficult to solve computationally in the original parametrization using time, since it can be shown that it takes infinite time for the minimizer to approach the stable fixed point. To illustrate this fact, in the simplest case, consider a one-dimensional Ornstein-Uhlenbeck process du = −γu dt + √ dW for the exit from a stable fix point u = 0, hence the transition from u(T min ) = 0 at T min < 0 to u(0) = a > 0. In this case, the explicit solution of the instanton equations yields such that the action S depends on T min , hence The geometric reparametrization used in both the string method and the gMAM avoids the computational difficulties that arise from the fact that the minimizer requires T min → ∞, which makes them in particular efficient for the problems involving the transition between fix points.

Solution of the instanton equations
As well known from classical mechanics, the alternative to minimizing the action functional is the solution of the corresponding equations of motion, which are equivalent to the instanton equations (25). This dualism transfers into the algorithmic context: Instead of employing global numerical minimization of the action functional, we can choose to solve a pair of coupled deterministic PDEs instead. In general, it is not obvious which choice is advantageous. In the context of fluid problems, though, we will show in the following that many problems are more easily treated by solving the equations of motion.
At the center of this lies the realization that commonly we are interested in observables of the form of (10), measuring a single degree of freedom in the final field configuration. Most importantly, we do not want to specify the full final condition of our field, or have no access to it a priori. For example when finding dissipative structures in Burgers turbulence, we do not know the exact form of the shock structure that will evolve. While the creation of a steep gradient can still be viewed as an exit problem (interpreting the initial state of the system u 1 = u(T min , x) = 0 as the stable fix point for , notably without specifying the final state u 2 . The choice of the observable restricts u 2 to have a gradient equal to a at x = 0, t = 0, but makes no other assumption about its form. In other words, while it is intuitively plausible that the system will develop a diffusion-limited shock-like structure in order to produce a large negative gradient at t = 0, we cannot compute u 2 without actually obtaining the full solution of the instanton equations (25). This is a quite powerful realization: The only input being a restriction on a single degree of freedom of the final condition, the instanton formalism (if applicable) provides us with the most probable final state which fulfills the constraint, the most probable evolution in time from a given initial condition into this state, and the force that was necessary to achieve this evolution.
As a direct consequence, problems of this type are difficult to frame in the context of minimization of the action functional, as the final condition is not fixed, but only subject to a constraint. In contrast to this they are readily solved by considering the instanton equations (25) as the term λ∇ u F [u]δ(t) defines an initial condition p(x, t = 0) = −λ∇ u F [u] for the auxiliary equation, propagated backwards in time. The problem is therefore reduced to solving two mutually dependent deterministic PDEs with known initial conditions. In what follows, we explain in detail how to implement this formalism in the case of Burgers shocks, noting that the scheme is similarly applicable to different problems in fluid dynamics of the same functional form.
In 2001, Chernykh and Stepanov published a seminal paper describing an algorithm suited for such boundary conditions [104]. Their idea was to solve (25) iteratively, demonstrating that the iterative scheme converges to the instanton given by the solution of (25). The original motivation of Chernykh and Stepanov was to confirm analytical predictions regarding the scaling exponent of 3/2 of the left tail of the velocity gradient, when diffusion counteracts the steepening of the shock profile. with F [u(x, t = 0)] = ∂ x u(x = 0, t = 0) and a 0. This choice leads to the boundary condition p(x, t = 0) = −λδ (x). In order to iteratively solve the equations, the basic scheme suggested by Chernykh and Stepanov was to view the instanton equations (25) from a dynamical point of view on the time interval (−∞, 0]. For a specified velocity field, for example the field u (k) (x, t) of the k-th iteration, we can solve the auxiliary equation backwards in time up to a time T min 0. From our previous considerations of the simple Ornstein-Uhlenbeck case, it is clear that one would like to choose T min as small as possible in order to approximate the minimizer (which is obtained in the limit T min → −∞). Once the next iteration p (k+1) (x, t) is found on [T min , 0], this solution is used in the equation for u, to find the next iterate u (k+1) (x, t). Note that the convolution χp represents the forcing of the system, while the rest of the u-equation resembles the deterministic part of the original Burgers equation (26). Therefore, having found the fixed point of our iteration, we can recover the optimal forcing by evaluating χp. To start off the iteration, an initial guess u 0 (x, t) is required, e.g. u 0 (x, t) = 0. The iteration is continued until reaching a fixed point. The following figure illustrates the iteration scheme: Note that one needs to specify the value of λ in this algorithm as it is present in the term p(t = 0, x) = −λδ (x). Usually, it is not known a priori how λ relates to the parameter a in the observable O[u] = δ(F [u(x, t = 0)] − a). One can view λ as the Lagrange multiplier that ensures that at time t = 0 we are observing indeed F [u(x, t = 0)] = a. In the above scheme, however, one can measure a after each iteration by computing F [u (k) (x, t = 0)]. Once the differences in the values of a are below a chosen threshold, the desired approximation of the instanton has been computed. An example implementation is given in the appendix in listing 1 for the case of a non-linear Ornstein-Uhlenbeck process.
Although this scheme appeals by its simplicity and generality, there are several computational challenges associated with it. For the Burgers case, this was already noted by Chernykh and Stepanov. Some of them are so severe that they make a direct application of the scheme almost impossible. In the following, we discuss two particular challenges: (a) the numerical instability of the fixed point and (b) convergence issues related to the fact that one needs to take T min → −∞ in order to compute the minimizer.
Regarding (a), we confirmed in our simulations results from Chernykh and Stepanov: For the Burgers case, for large λ (corresponding to large negative gradients a), taking directly the updated u (k+1) as the new advection field in the equation for p can lead to instabilities. These instabilities can be suppressed by a softer update in the spirit of a Gauss-Seidel iteration: Assume that, in the above scheme, we have computed the new velocity field u on (T min , 0] and let us call this fieldũ (k+1) . Then we can take as an update a weighted average of the newly computed fieldũ (k+1) and the old fieldũ (k) using a weight parameter α ∈ [0, 1) During the iterative procedure, we monitor the numerical value of the action at each iteration step to ensure that the update yields a minimizer. Note that one could have also used a gradient based method to descent into the minimizer. However the choice above enables us to re-use existing implementations of the deterministic dynamics. Regarding (b), the difficulties associated with T min → −∞ require a more extensive modification of the algorithm. One approach is to borrow an idea from geometric methods, in particular the string method and the minimum action method (gMAM), in order to find a suitable reparametrization of the evolution variable [122]. Hamilton's equations, which describe the trajectory of the minimizer of the Freidlin-Wentzell functional, are given bẏ If H does not depend explicitly on time, we can reparametrize these equations using t = t(s) (assuming t (s) > 0) to obtain We know from the geometric minimum action method that a suitable parametrization to circumvent the computational difficulties associated with T min → −∞ is to choose t(s) in such a way that u = C. The norm · is simply the Euclidean length for δ-correlated noise. For a correlation function χ, the induced norm · χ is given by f 2 χ = f, χ −1 f and we choose t(s) such that u χ = C. In analogy, we call this arclength-parametrization of Hamilton's equations for the minimizer. For the Burgers case, working directly with the correlation function given in (27) can lead to instabilities for high frequencies. Therefore, one can truncate the correlation function in Fourier space, e.g. work witĥ where H denotes the Heaviside function. Then, the correlation function χ can be inverted on its support in Fourier domain and we can implement the induced norm · χ via wheref is the Fourier transform of f and F −1 denotes the inverse Fourier transform. Then, for Burgers equation, the arclength-parametrized Hamilton equations (38) are written as Note that it follows from the above Hamilton's equations that we have at the minimizer This equation can be used in order to regularize the term b[u] χ in numerical simulations if necessary. The arclength parametrized equations of motion (41) can similarly be obtained by considering the variation of the geometric action (32). It is noteworthy that Chernykh and Stepanov were aware of the difficulties related to taking T min → −∞. They addressed this difficulty in their paper in two ways. First, they replaced the initial condition for u at T min → −∞ with a modified initial condition found from the linearization of the instanton equations around the fixed point. Second, they used a self-similar transform related to the kernel of the heat equation which resulted in an exponential rescaling of the time. While this approach proved to lead to efficient numerics for the Burgers case, the above chosen geometric reformulation leads to similar results but furthermore generalizes to other equations in a straightforward way.
We remark that for known initial and final conditions of the field variable, the method lined out above is not easily applied. In this case there is no restriction on the auxiliary variable, while the field variable has to hit a particular point u 2 starting from u 1 . To solve the instanton equations, one can then only rely on shooting methods [119], which is nearly hopeless in such high-dimensional systems. Therefore, the direct minimization techniques discussed in section 5.2 are in general preferred.

Instanton filtering in direct numerical simulations
Instantons describe the rare fluctuations of a system, far from its equilibrium state. For complicated systems, like fluids, it is difficult to predict analytically at which point the instantons will start to govern the behavior of the underlying probability distributions. One can even raise the question whether, in any realistic context, they are relevant to describe phenomena as they might become only important so far out in the tails that, while presenting a correct mathematical description of the asymptotic behavior of the tails, they are not of any relevance for understanding or modeling the physical phenomena. As lined out in section 4, precisely this question was raised and answered in the context of the stochastic Burgers equation regarding the left tail of the velocity gradients. From a practical and numerical point of view, however, there is an entirely different way to assess the importance of instantons: If one creates a very large number of direct stochastic realizations of the system, one can, fairly easily, filter the runs (or paths) that lead to a desired value of the observable. The ensemble of these numerically filtered -or conditioned -paths represents an approximation of the path integral under the appropriate boundary conditions and will therefore resemble the instanton whenever the saddle point approximation applies. It is clear from the discussion above that, in order to carry out this scheme without any further optimization, massive numerical simulations are needed. Stochastic Monte-Carlo simulations, however, parallelize in a trivial way and, with the rapid development of high-performance computing over the last decade, direct numerical simulations are now feasible as we will illustrate using the stochastic Burgers equation.

Illustration of numerical filtering
In order to show the simplicity of the numerical filtering, let us consider a simple one-dimensional Ornstein-Uhlenbeck process, hence We consider the exit from a stable fixed point at lim t→−∞ u(t) = 0 such that u(0) = a. A simple analytic solution of the instanton equations (11), (12) shows that the minimizer of the Freidlin-Wentzell action is given by How can we obtain a numerical approximation of this minimizer via direct numerical simulations? One way is to generate an ensemble of realizations and filter the paths that end in a small neighborhood of the target value u(0) = a, for example given by the interval a − < u(0) < a + . Since the probability distribution will have its maximum around the minimizer, and the fluctuations around the minimizer are (at the leading order) Gaussian, we can expect the mean of the filtered sample paths to approximate the minimizer. Figure  From the Freidlin-Wentzell theory, we know that the paths will cluster around the minimizer in the weak-noise limit of σ → 0. An example implementation of this algorithm is given in listings 2 and 3 in the appendix.

Extension to jump processes
In this section we will comment briefly on the extension of the filtering to jumpprocesses, in particular related to stochastic differential equations (SDEs) driven by symmetric alpha-stable processes. In the context of fluid mechanics, one might think of a system that is driven by an already non-Gaussian noise, such as the ocean driven by wind. In general, such SDEs have attracted increasing interest over the last decade, not only in the context of turbulence, but also in biology and particularly in finance [137]. An area of active research regarding fluids is the question whether Levy walks can be used to model pair dispersion in turbulent flows related to the Richardson law [138,139].
In the present work, we restrict ourselves to the simplest basic approach to review how one can use path integrals in the context of Levy-driven stochastic differential equations. Levy processes are also called fractional noise processes as their characteristic function (see below) involves a non-integer exponent α. It is well known that, using the MSRJD approach, for such Levy processes, a generalization of the Wiener-path integral can be formulated [140]. The starting point of our discussion is the stochastic differential equation (19) where the Brownian increment dW is now replaced by the Levy-increment dL, For simplicity, let us discuss the above equation in a one-dimensional context. For symmetric alpha-stable processes, the characteristic function of L t is given by In the following we assume 1 < α < 2. The characteristic function lacks smoothness around the origin, which gives rise to fat tails. Indeed, it can be shown [141] that (set t = 1) we have as |u| → ∞. While the theory of SDEs driven by Levy processes is much more involved compared to SDEs driven by Brownian motion, it is relatively simple to adapt the filtering technique to such equations. Equation (45) can be discretized in the following form: where the random number r n is generated, for example, by a Box-Muller-like algorithm: Draw v n from a uniform distribution on (−π/2, π/2) and draw w n from an exponential distribution with mean 1. Then compute This simulation technique can be extended to multi-dimensional processes in a straight forward way, using a Cholesky decomposition in order to account for a specific correlation function. Even in simple cases, however, the behavior of the solution differs fundamentally from the diffusive case. In order to illustrate these differences, as a simple example, we will discuss a Levy-driven Ornstein-Uhlenbeck process. Note that, for such processes, the corresponding path-integrals can be explicitly evaluated [142]. For our numerical simulations, we have the Ornstein-Uhlenbeck process u follow the SDE given by Here, again, we discuss the exit from a stable fixed point at negative infinity such that u(t = 0) = a. For the Levy case, consider as an example α = 1.8. Applying the filtering technique, we see that the filtered mean u of the process for the Levy case follows a similar form given by u ≈ a e γ(α−1)t .  5]. From here we see that, although the mean u appears to yield a smooth curve, the typical exit path exhibits an entirely different evolution: It deviates only very little from the stable fixed point at u = 0 until a (random) time t r , only to then perform a jump of height a e γtr , such that the deterministic drift b will take it to the point a at the time t = 0. In the diffusive case, as we have seen above, the filtered paths were distributed around the mean paths which coincided with the minimizer of the Freidlin-Wentzell action. An example implementation for the case of Levy noise given in listings 2 and 4 in the appendix.

Filtering for the stochastic Burgers equation
We now describe how to apply filtering in the context of the stochastic Burgers equation. Here, the situation is more expensive than for the Ornstein-Uhlenbeck process since we are dealing with many more dimensions: For each realization we need to solve equation (26) with the appropriate right-hand side. The basic idea of the filtering remains the same: We generate an enormous number of stochastic simulations and filter the realizations that produce the rare event that our observable measures. In the case under consideration here, this observable is the slope of the velocity field and we are in particular interested in the probability of large negative gradients.
There are a variety of numerical methods available to generate the noise term on the right hand side of equation (26) in direct numerical simulations, often the preferred methods are Fourier-based [110,143]. The idea is that the main results, for instance in our case the scaling of the probability distribution of the velocity gradient, should show universal behavior and not depend too much on the particular choice of the noise term as long as the noise is sufficiently weak and does not dominate the system. In the following, however, we discuss the results of a particular numerical experiment aimed at showing the relevance of instantons [111]. Here, we are not only interested in the scaling of the probability distributions but in the detailed structure of the instantons, in particular their time history. For this purpose, it is desirable to generate noise that imitates the fluctuations assumed in the instanton analysis as closely as possible. This can be done as follows: (i) Draw a vector r of appropriately scaled normally distributed random numbers.
The size of the vector corresponds to the discretization in x. (ii) Multiply this vector by a matrix A resulting from the Cholesky decomposition of the (discretized) correlation matrix C. Note that naive discretization of χ is likely to lead to aC that, due to finite machine-precision, is not positive-semidefinite. This is due to the fact that, for a large number of discretization points, the rows ofC are almost linearly dependent. In this case it is possible to use the algorithm introduced in [144] in order to obtain a matrix C that is positive-semidefinite and sufficiently close toC.
When implementing the direct numerical simulations, the available computational hardware needs to be used in an efficient way. In our work from [111], we used a combination of accelerator graphics cards (CUDA) and multiple processors connected via MPI: For the required resolution, the size of one simulation was small enough to fit on a graphics card such that a single realization could be performed directly on the card. In addition, other numerical procedures necessary to extract the instanton (see following section) could be performed as well on the graphics card. MPI was used in order to average over the different CUDA realizations. This made the actual implementation of the algorithm fairly simple but efficient: Since the realizations are stochastically independent, we obtain a linear scaling with the number of graphics cards. This approach, in principle, is applicable to any generic stochastic partial differential equation with one spatial dimension. Already for two dimensions (see section 7), the memory requirements grow significantly, so that in these cases more sophisticated numerical algorithms have to be employed.

Extracting the instanton
In order to obtain the instanton from an ensemble of direct numerical simulations, the following filtering can be applied: Prescribe a small interval around the desired value of the observable and keep the relations for which the value of the observable falls into this interval. This is similar to the procedure previously discussed and illustrated for the Ornstein-Uhlenbeck process. In the concrete case for Burgers equation, we prescribe the interval around a specified velocity gradient at x = 0 at t = 0. We assume that the realizations start at t min from a zero initial condition and are integrated to the final time t = 0. For computational efficiency, our filtering algorithm does not only look at the point x = 0 but makes use of spatial translation invariance. This requires shifting of both the velocity and the forcing field. In the simple case of the Ornstein-Uhlenbeck process discussed above, such shifting was unnecessary since we were working in only one dimension. In the case of a stochastic partial differential equation like the stochastic Burgers equation, however, the rare event (here the large negative gradient) can occur at any place in the profile. The above described procedure for searching the maximum gradient and shifting the fields allows for detecting many more rare events.
The averaging procedure now consists of taking the average of all those shifted fields u shifted (t, x). Note that, from the same simulations, we obtain also a corresponding force field η shifted (t, x) and, for both ensembles, we can compute the ensemble average u shifted (t, x) and η shifted (t, x) in space and time. Due to the δ-correlation of the driving noise (in time), we expect convergence only after many realizations. This filtering approach can be interpreted as the numerical counterpart to the MSRJD formulation of the path integral taking into account the chosen observable O(u) = δ(u x (0, 0) = a) as the conditional mean of the observable (where the conditioning is happening in our case due to the constraint of the velocity gradient at the end point of the evolution). For sufficiently strong gradients, corresponding to sufficiently rare events, we therefore expect that the ensemble averages u shifted (t, x) and η shifted (t, x) will be close to the instanton solution of (11), (12). The corresponding optimal force η shifted (t, x) is related to the instanton solution of (11) via η shifted (t, x) = −iχµ.

Comparison between typical shock events in DNS and instanton prediction
The first important choice in the direct numerical simulation is setting the initial time T min . Clearly, from an analytical point of view, one requires T min → −∞ and, therefore −T min should be as large as possible in order to let the instanton develop. The maximum ∆t for the resolution in time (usually the choice of ∆t is enforced by a stability condition of the numerical scheme and related to the spatial discretization) sets the number N t of discretization points in time. We found that, in fact, the results are very sensitive to the choice of T min if −T min is too small. Therefore, it is recommended to perform several runs with a variety of values for the parameter T min in order to be sure that −T min is sufficiently large. In our simulations, we found that a decent value of T min is obtained by comparing T min to the integral time T L and we chose |T min | > 10T L . The averages were obtained from about 10 7 stochastically independent realizations.
Using this data set of simulations at different forcing strengths (or equivalently, different Reynolds numbers), we can compare the applicability of the instanton approximation for events of extreme negative gradients in turbulent Burgers flows. In figures 4 (left) and 5 (left) this comparison is demonstrated for low and high values of the forcing prefactor, respectively. The instanton prediction for the left velocity gradient PDF tail, exp(−S inst ), agrees over the whole range of gradients in the low forcing limit, where the instanton approximation becomes asymptotically exact. As expected, for a higher value of forcing, the instanton prediction captures only the scaling of the tail for events of extreme negative gradients. Figures 4 (right) and 5 (right) on the other hand demonstrate the agreement of the final configuration u inst (x, t = 0) of the instanton against the filtered field u shifted (t, x) for low and high values of forcing, respectively. While in the limit of low forcing the average shock event resembles the instanton prediction almost exactly over the whole range of gradients, for higher values of the forcing the agreement is only visible in the far left tail of the velocity gradient PDF.
As expected, the filtering approach does not only extract the final configuration at time t = 0 but also the entire time history of the evolution of the instanton and of the optimal force which is compared in Figure 6 to the solution of the instanton equations. The agreement is remarkable.

Higher-dimensional problems
Applications in fluid dynamics rarely are restricted to one spatial dimension: The most interesting phenomena, especially in turbulence, occur only in 2D or even 3D fluid models. Treating higher-dimensional fluid equations numerically in the instanton framework poses a considerable challenge though, and has rarely been attempted (notable exceptions are the computation of transition paths for multistable geophysical flows [30][31][32]). A major difficulty lies in the fact that the numerical minimization of the action functional requires the storage of the field variable and possibly the auxiliary variable for every instance in time, which quickly exceeds limitations of available memory. In more concrete terms, if one is able to integrate in time a fluid equation (e.g. Navier-Stokes) with N x degrees of freedom (e.g. N x = 128 3 ), one needs N t times the resources to store the corresponding instanton trajectory, where N t is the number of time-steps of the simulation. In the following, we will discuss modifications to the algorithm laid out in sections 5 and 6 to efficiently compute the   Figure 6. Comparison of the velocity field between the instanton prediction and the average event in space-time for a moderate gradient, blue denoting positive and red denoting negative values. The time evolution of the typical shock event is indistinguishably reproduced by the instanton prediction. For more details, see [111].
instanton trajectory for higher-dimensional problems and show how to overcome the memory restrictions.

Recursive solution of the mixed initial/final value problem
The instanton equations for the field variable (11) and the auxiliary variable (12) are a mixed initial/final value problem. While the field variable is integrated forward in time, starting from an initial condition u 1 at t = −T , the auxiliary variable is integrated backwards in time from the final condition p 2 at t = 0. Both equations mutually depend on each other. A similar form of a mixed initial/final value problem is encountered in a different area of fluid dynamics, namely specific questions regarding a passively transported scalar in a flow: analyzing where the measured density at a given point originates from. A method for solving the resulting coupled equationsone for the evolving fluid, forward in time, and one for the passive scalar, integrated from its given final density backwards in time -was described in [145]. To elucidate this method, consider a simpler system of equations on the time interval t ∈ [−T, 0] of the form as described in [145] and which is closely related to the situation in control theory [146].
There are two opposing ways of solving this system numerically: (i) solving (52) forward in time starting from u 1 at t = −T and saving the complete evolution of u(t)  (52) is independent of the "auxiliary" field p. In the case of mutual dependence, such as the system of instanton equations, this variant does not apply.
These two building blocks can now be employed to obtain an efficient compromise -also known as checkpointing in control theory [148,149] -between the two scalings and balancing memory efficiency and computational cost. First, split the interval [−T, 0] into k sub-intervals. After integrating (52) once and storing the result at the beginning of each sub-interval, we reduce the original problem to a similar problem on a shorter domain of size N t /k. Recursively, for each of these sub-intervals, we can break down the problem by splitting the domain, until we reach an interval length of only a single integration step. At this point we can carry out the integration. Inspired by similar principles in multi-grid algorithms or the fast Fourier-transform, a natural choice is k = 2. In this case, the memory requirement scales as O(log N t ) and computing time as O(N t log N t ). A schematic depiction of the algorithm for k = 2 and N t = 16 is shown in figure 7. For the initial solution of the field equation, the field is stored at the intermediate timesteps i ∈ {8, 12, 14, 15, 16}, of which the least three can be used immediately for the backwards propagating auxiliary equation. Whenever a timestep is encountered for which the field configuration is not stored, such as i = 13, it is propagated forward from the last known position (in this case i = 12), while storing intermediate values recursively in the same fashion.
The algorithm described above cannot be used to solve the instanton equations (11), (12) without modification, because in contrast to the model problem (52), (53), both equations are mutually dependent and neither the u-nor Comparison of memory costs for 2D simulations with Nx = 256 × 256, Nt = 2048 for the presented optimizations. The total memory saving of the combined algorithm exceeds a factor of 20. For more details, see [147].
the p-equation can be solved without referring to the other field. As a consequence, at least one of the fields has to be stored for each time-step. Note though that in equation (11) the auxiliary field only acts on u through a convolution with the forcing correlation. Since in fluid dynamical applications the forcing is usually restricted to only a few active modes, χp is very compact to store. In other words, the optimal forcing χp is the only field that needs to be kept for every timestep to be able to integrate the complete velocity field and auxiliary field of the instanton configuration. We remark that this approach cannot recover the O(log N t ) scaling of the original algorithm. On the other hand, since the number of active modes of the forcing are constant, the memory cost of the auxiliary field becomes independent of the number of degrees of freedom in space N x . In contrast to this, the expensive storage of the field variable scales logarithmically in time. This interplay of savings allows for the drastic savings in memory necessary to compute higher-dimensional instantons.

Application: Two-dimensional Burgers equation
As example application for the numerical computation of higher-dimensional instantons in fluid applications, we will consider the Burgers equation in 2D [147], for a forcing f (x, t) correlated white in time and with a finite correlation length L in space, active only on large-scale modes (i.e. truncating the correlation function in Fourier space above a mode ω c , as in (39)). In contrast to the 2D incompressible Navier-Stokes equation, for which was shown in [27] that the naive instanton will be trivial, the 2D Burgers equation exhibits a direct cascade. Furthermore, equation (54) preserves irrotationality of the flow under irrotational forcing, which is the scenario we will restrict ourselves to in this section.
As generalization of the observable taken in 1D, i.e. F [u] = ∂ x u(x = 0, t = 0), several choices are possible: Taking F [u] = ∇ · u conditions on events exhibiting extreme velocity divergence at a single point, which corresponds to "explosion" and "implosion" events. Instead of this, we will focus on the case which selects events of high velocity gradient in a single direction. This observable is closely related to the energy dissipation, ν|∇u| 2 , but additionally breaks rotational symmetry. We will identify the instanton solution with shock structures known from 2D Burgers turbulence. The instanton equations corresponding to equation (54) read with u ⊥ = (−u y , u x ) and the convolution (χp) i = j χ ij p j . Figure 8 (left) depicts the solution of these equations with the numerical scheme laid out above. Shown is a surface-plot of the x-component of the velocity field for the instanton around the origin at t = 0. As expected, the size of the shock structure across the shock is determined by the viscosity ν 1, while its length along the shock is given by the forcing correlation length L = 1.
This example demonstrates the drastic savings in memory that are possible by employing the recursive technique: The highest resolution achieved for the 2D Burgers problem is N x = 1024 × 1024, N t = 8192, with a total memory usage of 577MB. This fits completely into the memory of a single graphics card on which the computation was undertaken. In contrast, the extrapolated memory usage of the un-optimized algorithm would be about 300 times higher. Note also that the computational overhead of the optimization, due to logarithmic scaling, is slightly lower than a factor 3 in the total computation time. Figure 8 (right) shows a comparison of the memory cost for a lower resolution setup of N x = 256 × 256, N t = 2048 in order to fit the unoptimized case on the machine: Both the recursive optimization and the projection method amount to a saving of roughly one half, the combination of both leads to memory savings of a factor 20.

Application: Three-dimensional Navier-Stokes equations
As last example of higher-dimensional fluid systems to be treated with the instanton formalism we will consider the three-dimensional incompressible Navier-Stokes equations, This equation of course lies at the very center of turbulence research. In spirit of the program laid out above there is hope to link the dissipative structures of turbulent flows to the configuration obtained by the instanton formalism. This in turn might elucidate not only the form and emergence of such structures, but serve to answer questions about the intermittent nature of turbulent flows. Yet, it is far from obvious that the naive approach of formulating the corresponding MSRJD-action (9) for the Navier-Stokes equations (58) and then computing its minimizing trajectory (which in itself is a considerable amount of work) leads to any meaningful results: It is not clear which observable to consider, a choice which can change the structure of the instanton trajectory completely. One might be tempted to look at events of high gradients (like in Burgers) or high energy dissipation to obtain the dissipative structures of turbulence. Yet, these events might be well into the dissipative range and thus not capture the multi-fractal nature of the turbulent cascade in the inertial range.
On the other hand, one can obviously commence on the path laid out in sections 5 and 6 for the Burgers equation: Comparing the events dominating the extreme tails of the PDF of a chosen observable to the structures obtained by solving the corresponding instanton equations. This way, one gets access to the tail scaling of PDFs of turbulent quantities that are much harder to obtain by classical methods. In order to demonstrate that such an approach is feasible, we show preliminary results of a numerical computation of a 3D Navier-Stokes instanton in figure 9. As observable, was chosen, which amounts to events of extreme vorticity in the origin at t = 0. The resulting structure in quality resembles the well-known vorticity filaments observed in developed 3D Navier-Stokes turbulence. For the filtered fields, the sample size is ≈ 10 3 . Structures at the final time t = 0 are very similar to those first obtained by Novikov [150,151] and analyzed in detail by Wilczek [152]. They consider the conditionally averaged vorticity field i.e. the vorticity field in space and time, conditioned on the fact that it will attain the vorticity ω(0, 0) = a in the course of its evolution. This is of course the same as the "filtering"-approach laid out above for Burgers shock structures. It therefore makes sense to ask whether the instanton resembles the filtered structure for events of extreme  vorticity. Preliminary results of this are shown in figure 10. Due to the symmetry of the observable, the instanton solution observes rotational symmetry around the axis prescribed by the vorticity in the origin. The filtered velocity field is obtained by averaging fields of the same vorticity ω(x, t) = a, after translating and rotating them into the origin. Depicted is a direct comparison of all three components of the vorticity field in cylindrical coordinates (r, θ, z) between the instanton configuration and the filtered vorticity field of a turbulent DNS of the 3D Navier-Stokes equation, both with N x = 128 3 and N t = 128 for the instanton. The instanton field reproduces the conditionally averaged one remarkably well. In particular, features like the spreading of the vortex and the appearance of a swirl for larger radii (also visible in figure 9) is well-reproduced. These results serve as evidence that the techniques laid out above are powerful enough to compute instantons even for the 3D setup.
In how far this Navier-Stokes vorticity instanton can be analytically captured by the approximation derived in [153] remains a future task.

Outlook
Instantons (saddle point configurations) of functional integrals, equivalent to minimizers of the associated Freidlin-Wentzell action, govern the tails of probability distributions of physically relevant observables in stochastically driven systems. In this work, we presented several algorithms to compute such instantons efficiently -either by extracting them from direct numerical simulations via filtering or by iteratively solving the associated Euler-Lagrange equations. For the stochastically forced Burgers equation, agreement of both methods was shown in one and two dimensions. Preliminary results for the three-dimensional Navier-Stokes equations are encouraging, but much higher resolutions are needed to see whether both approaches will also lead to similar results in this case. This, however, is only a first step on a long path: As it is well-known from quantum field theory, fluctuations around the instanton can yield to non-trivial modifications of the results and are, quite often, of major relevance for understanding the underlying physics of the system. We expect this to hold for fluids as well, meaning that an appropriate and efficient computational framework to capture the impact of fluctuations needs to be developed. Moreover, at this stage, the physical role of the found instantons needs to be investigated further: What is their impact (if any) on intermittency and the scaling of structure functions in the inertial range? Can they indeed capture the impact of singular structures in real flows in a variety of settings (not only in Burgers, but also in Navier-Stokes, MHD, (surface) quasi-geostrophic, etc.)? What if the underlying PDE is integrable (or has at least solitary wave solutions) -can these properties be exploited from a computational point of view? We believe that instantons are relevant for a wide range of phenomena in fluid dynamics -and finding them computationally and understanding their nature is still an area of on-going and fruitful research. historical development of the instanton/optimal fluctuation approach. The work of T.G. was partially supported through the grants ISF-7101800401 and Minerva-Coop 7114170101. The work of R.G. benefited from partial support through DFG-FOR1048, project B2. The work of T.S. was partially supported by the NSF grant DMS-1108780.