Functionals in stochastic thermodynamics: how to interpret stochastic integrals

In stochastic thermodynamics standard concepts from macroscopic thermodynamics, such as heat, work, and entropy production, are generalized to small fluctuating systems by defining them on a trajectory-wise level. In Langevin systems with continuous state-space such definitions involve stochastic integrals along system trajectories, whose specific values depend on the discretization rule used to evaluate them (i.e. the"interpretation"of the noise terms in the integral). Via a systematic mathematical investigation of this apparent dilemma, we corroborate the widely used standard interpretation of heat- and work-like functionals as Stratonovich integrals. We furthermore recapitulate the anomalies that are known to occur for entropy production in the presence of temperature gradients.


Introduction
In stochastic thermodynamics standard concepts of macroscopic thermodynamics are generalized to cover the non-equilibrium properties of small systems [1,2]. Instead of being obtained as ensemble averages, thermodynamic quantities such as heat, work, and entropy changes are defined on the level of individual trajectories [3], allowing for a refined formulation of the thermodynamic laws. For instance, in an overdamped, d-dimensional Langevin system [4][5][6] with coordinate x t at time t, the first law for a trajectory x = {x t } t f t=0 evolving from x 0 at time t = 0 to x f at time t = t f reads with the definitions In (1), the deterministic force exerted on the system has a conservative component −∇ x U(x, t) due to an external potential U(x, t), and a non-conservative component f nc (x, t). The dot inẋ t denotes the total time derivative, γ is the friction coefficient, k B Boltzmann's constant, and T the temperature of the thermal bath. The thermal fluctuations ξ t are modeled by Gaussian white noise sources, which are unbiased, mutually independent and delta-correlated in time. The • symbol in (3b) and (3c) denotes the Stratonovich product, which implies mid-point discretization. The second line in (3b) is simply obtained by exploiting the equation of motion (1). A generic example for a physical system described by the Langevin equation (1) is a colloidal Brownian particle suspended in an aqueous solution at room temperature. The definitions in (3a), (3b) and (3c) are physically motivated. The change in internal energy ∆U[x] is given by the difference of the system's potential energy at the beginning and the end of the trajectory x, and thus is dependent only on initial and final state. The heat Q[x] absorbed by the system results from the energy exchange due to the frictional force −γẋ t and fluctuating force √ 2k B T γ ξ t exerted by the heat bath on the system [7], integrated over all displacements dx t along the trajectory x. The work W [x] performed on the system by an "external agent" [7] results from externally controlled changes of the potential energy ∂U (xt,t) ∂t dt and from non-conservative forces f nc (x t , t) • dx t , acting during the motion along x. Therefore, both heat and work are functionals along the trajectory x. As argued in [7], the consistency of these physical interpretations with the energy balance (2) is based on interpreting the products in (3b) and (3c) in the Stratonovich sense (marked by •), because then the usual rules of calculus can be applied when computing differentials, such as dU(x t , t) = ∇ xt U(x t , t) • dx t + ∂U (xt,t) ∂t dt for the infinitesimal change in internal energy.
With similar reasoning functionals along the trajectory x, which are not directly obtained from energy balance arguments, are defined employing standard rules of calculus. The most prominent example is the entropy production in the thermal bath [1,2] Using the definition dQ t = −γẋ t + √ 2k B T γ ξ t •dx t for the infinitesimal heat exchange over a displacement dx t in accordance with (3b), and replacing −γẋ t + √ 2k B T γ ξ t with the balancing external forces from the Langevin equation (1), we find Following this common wisdom we thus end up with a Stratonovich interpretation of the stochastic integral in (4b). Another strong indication for the Stratonovich interpretation in (4b) being adequate comes from the fact that then ∆S env [x] can be interpreted as a measure for the (ir-)reversibility of the path x. It has been shown [2,8,9] that the log-ratio of probabilities for the trajectory x being generated by (1) in forward time versus the same trajectory being traced out backwards under a time-reversed dynamics is exactly given by (4b) as a Stratonovich integral. From a more mathematical viewpoint the above arguments may not be sufficiently convincing to resolve the fundamental ambiguity of how to "correctly" define the integrals in (3b), (3c) along the stochastic trajectory x: Itô interpretation, Stratonovich or even something else? This problem becomes even more apparent if the friction γ or the temperature T are functions of the position x, because then the associated multiplicative noise term in the Langevin equation may have to be interpreted in either Itô or anti-Itô (also called Hänggi-Klimontovich [10,11]) convention, depending on which of the physical quantities, γ or T , actually varies in space [12][13][14][15][16][17] (see also our derivation in Section 4 below). The identification of the force contributions to heat and work is then far less obvious already on the level of the equation of motion [7]. Put differently, it is of fundamental interest to verify, via a systematic approach, whether the definitions (3a), (3b), (3c) and (4b) are indeed the correct ones or whether they need modifications, in particular for more general cases than the simple Langevin equation (1) with additive noise.
We provide such a mathematical analysis with the present contribution, treating dynamics and thermodynamics under a common approach along the lines we have employed, together with collaborators, in earlier works [18][19][20][21][22][23][24]. In addition to deriving new results concerning functionals, we recover and systematize a number of related findings known from the literature, as we point out throughout the paper in connection with the relevant results. From this perspective our present work can also be seen as being of review-like nature, though certainly not exhaustive (e.g. not covering the most general cases).
The ambiguity in interpreting products and integrals involving the stochastic process x t generated by the Langevin equation (1) is related to two asymptotic limits that have implicitly been performed before writing down (1): (i) the limit of vanishing correlation time of the thermal noise (white-noise limit), and (ii) the limit of vanishing equilibration time for the velocity degrees of freedom (small-mass limit [25]). We therefore start our analysis from the so-called generalized Langevin equation (GLE) [26], i.e. a generalized form of the Langevin equation which possesses finite memory times and noise correlation times, and includes velocity degrees of freedom with finite relaxation time (see Sec. 2.1). We then perform the aforementioned asymptotic limits in a systematic way not only for the Langevin equation of motion itself, but also for general functionals along trajectories generated by the GLE.

The generalized Langevin equation (GLE)
The generalized form of the Langevin equation which we take as a starting point of our analysis reads [24] It describes the dynamics of a Brownian particle (with mass m) with position x t and velocity v t =ẋ t (in d = 1, 2 or 3 dimensions) at time t, which starts out from x 0 with velocity v 0 at time t = 0. The particle moves in a force field f (x, t) = −∇ x U(x, t) + f nc (x, t) and interacts with its environment. This interaction leads to frictional and fluctuating forces, which are modeled by the second and third term, respectively, on the right-hand side of the generalized Langevin equation (5). The functions g(x), h(x), and σ(x) are all matrix-valued, i.e. they are functions from R d to R d×d . The viscous friction term (second term on the right-hand side in (5)), involving an integral over the particle's past velocities with the kernel κ(t − s), describes state-dependent dissipation and comprises the back-action effects of the environment up to the current time t. For the memory function κ(t) we here consider the case with τ b being the characteristic time scale of memory effects induced by the thermal bath. The typical magnitude of the viscous friction forces is specified by the factorγ, i.e. g, h and σ are chosen to be dimensionless. The noise term 2k B T (x t , t)γ σ(x t )η t in (5) also accounts for past and present interactions of the particle with the environment, where η t is a mean-zero, stationary colored noise process. Note that even though they depend on position, x t , there is no ambiguity related to the interpretation of these products because x t is the second integral of the noise process and therefore is sufficiently regular §. We choose the η t to be Gaussian with time-exponential correlations where the correlation time scale τ b is identical to the characteristic time scale of the memory kernel (6) of the viscous friction, because both memory and colored noise effects are governed by the intrinsic time scale of the bath. This is equivalent to generating the η t by the stationary Ornstein-Uhlenbeck process, which is the solution to the SDE: where ξ t is a mean-zero Gaussian white noise source with covariance ξ i t ξ j s = δ ij δ(t − s). The typical scale of the colored noise strength is set by the space-and time-dependent parameter T (x, t).
Note that we allow the functions g(x), h(x), and σ(x) to be unrelated to each other in general. Such a general setting does not necessarily fulfill the fluctuation-dissipation relation (in current notation) [42], and thus corresponds to the Brownian particle being in contact with an inherently nonequilibrium environment. We immediately see, however, that the fluctuation-dissipation relation (9) is valid when g = h T = σ. Then, the latter tensors are typically the "square root" γ 1/2 of the (symmetric) hydrodynamic friction tensor γ (here measured in units of γ). Furthermore, the parameter T corresponds to the temperature of the environment.
In the case T = T (x, t), this environment is locally "around position x" and at any given time t an equilibrium heat bath at definite temperature. We finally point out that in introducing the factorsγ and √ 2k B Tγ in (5) we use a somewhat untypical representation of the GLE, compare for instance with [24,26]. However, this form (5) makes the connection to the overdamped Langevin equation (1) more explicit and thus the generalization of the functionals (3b), (3c) to the GLE case § Note that the noise process here happens to be colored so that in effect x t is the third integral of a white noise. more evident, see Sec. 2.2. We also remark that the form (5) of the GLE is more general than the generic GLE which is typically obtained from microscopic Hamiltonian models for a small system interacting with a heat bath [26,43] (see, for instance, Appendix A in [24] for a derivation). In particular, in these systems temperature would be a space-and time-homogenous constant. While space-dependent friction coefficients are known to be related to nonlinear system-bath couplings [26], the space-(and time-) dependent temperature is a generalization we introduce "by hand" under the following assumption: the temperature field T (x, t) varies sufficiently slowly such that the particle at position x and time t is in contact with a locally well-defined thermal bath, which is homogeneous on mesoscopic scales, and that any memory effects from that local bath on time scales τ b die out before the particle changes position into an "adjacent bath" at (slightly) different temperature, or before the bath-temperature changes significantly in time. The latter assumption is justified because we consider τ b to be the (by far) fastest time scale in the system (see discussion in Sec. 3).

The generalized functionals in stochastic thermodynamics
With the physical interpretation of the various terms in (5) as external force, (generalized) viscous friction and thermal fluctuations, we can now define heat and work along the trajectory ( t=0 generated by (5) in complete analogy to our discussion in Sec. 1, Similarly, the total internal energy for the GLE (5) is given as the sum of kinetic and potential energy, We can now verify easily that heat and work together with the change in internal energy Note that we here use heat and work as names for the functionals (10a) and (10b) even in the general case in which the fluctuation-dissipation relation (9) is violated. From a strict physical viewpoint this is not completely precise, in particular for the heat dissipated in the environment, which is a physically reasonable concept only if that environment is an actual "heat bath" at thermal equilibrium, i.e. if the fluctuation-dissipation theorem is valid.
in correspondence to (2). A generalized definition of entropy production on the level of the GLE is far less obvious; we thus discuss entropy production separately in Section 5.
The crucial mathematical difference between the functionals (10a), (10b) along the t=0 generated by (5) and those functionals in (3b), (3c) is that now the stochastic processes (x t , v t ) are sufficiently regular such that the integrals (10a), (10b) are independent of the discretization chosen to evaluate them. In other words, we do not need to prescribe any discretization scheme to render the quantities Q[x, v] or W [x, v] uniquely defined, and there is no ambiguity in their definition related to the interpretation of stochastic integrals. Hence, we take these unique representations of heat and work as our starting point to perform the same asymptotic analysis which turns the GLE (5) into the overdamped Langevin equation (1). As the main result we will obtain the asymptotic limits of the functionals (10a), (10b) and "automatically" get delivered the correct discretization.
In order to streamline the computation of the asymptotic limits and, at the same time, allow for similar functionals to be analyzed in the same way, we subsume the heat and work functionals under the general form where r(x, v, t) and q(x, t) are arbitrary (but "well-behaved") functions of the indicated arguments, the latter vector-valued. The allocations necessary to obtain (10a) and (10b) as special cases of (13) are rather immediate. For the heat

Formulation of the mathematical approach
The main mathematical difficulty in handling the GLE (5) comes from the fact that it is a stochastic integro-differential equation driven by a non-Markovian noise. However, by extending its phase space and introducing, in addition to the velocity degrees of freedom the auxiliary variables we can transform (5) to a system of Markovian SDEs [24]. We here write this Markovian system in the differential form: The first equation (15a) is a transcription of (14a). In the last equation (15d), W t is a ddimensional Wiener process, whose (formal) derivative in time is the white noise ξ t from (8), i.e. (15d) is a representation equivalent to (8) of the dynamical equation generating the Ornstein-Uhlenbeck process η t . The third equation (15c) results from differentiating (14b) with respect to time and using the specific form (6) of κ(t). Note that also the more general case of a memory kernel κ(t) which consists of a sum of different exponentials can be recast into a Markovian system by introducing a corresponding number of auxiliary variables like y t [44], and likewise one would introduce a suitable number of η t -like variables to represent a combination of different Ornstein-Uhlenbeck processes driving the system of interest. Finally, equation (15b) corresponds to the original GLE (5) after all the auxiliary variables have been plugged in; moreover, we use the abbreviation τ v = m/γ for the characteristic equilibration time of the velocity degrees of freedom. These "equations of motion" (15b)-(15d), generating the Markov process (x t , v t , y t , η t ) together with the SDE for the functional (13), constitute the basic set of equations for our asymptotic analysis. In the second line of Eq. (16) we have introduced dx t = v t dt from (15a) in order to turn the dx t differentials into dt differentials as this will simplify the analysis. We should nevertheless keep in mind that the q(x, t) term originated from a spatial differential, so that we might expect to get back, from the asymptotic analysis, a contribution of the form qdx t with some specific interpretation rule of the product. We are interested in two different types of limiting procedures, namely the whitenoise limit and the small-mass limit. In these asymptotic limits the time scales τ b and τ v , appearing in (15a)-(15d), are "much faster" than the characteristic time scale τ x of the diffusive motion in x. Accordingly, we want to calculate the asymptotic equations resulting from (15a)-(15d) and (16) The main idea is to exploit this scale separation by applying a systematic multiscale or homogenization method [29,45] to the Fokker-Planck equation for the full density comprising fast and slow variables, i.e. to the Fokker-Planck equation for the density ρ(t; x, v, y, η, F ) which is equivalent to the Langevin equations (15a)-(15d) and (16). The result is an "effective" Fokker-Planck equation for the homogenized density of the slow variables alone.
It is, however, not obvious how the above two limits relate to each other, i.e. if they are joint limits or if there is a specific order in which they have to be taken. To answer this question we have to consider the specific physical system we intend to model with (15a)-(15d). Having in mind a colloidal particle (the system) suspended in aqueous solution at room temperature (the thermal bath), the time scale τ b of the bath-intrinsic correlations is related to the molecular collision time of the water molecules, which is of order τ b ≈ 10 −13 s. The velocity relaxation time is given by the ratio of inertia and friction effects and amounts to τ v ≈ 0.5 × 10 −8 s for a colloid of 0.1 µm radius, while the colloid's diffusive time scale is of the order τ x ≈ 0.5 × 10 −2 s [6]. In other words, all three time scales are in fact well-separated, τ b ≪ τ v ≪ τ x , and are governed by different physical processes. Hence, we should perform the limits separately in two steps: We summarize the results for the first limit (white-noise or Markovian limit) in Section 4.1. The second limit (small-mass limit) is treated in Section 4.2. For both cases, the technical details are given in Appendix A.
In order to perform the homogenization procedure we need to know how the various terms in (15a)-(15d) and (16) scale with the relevant time-and length scales. We therefore rewrite (15a)-(15d) and (16) by introducing dimensionless quantities which are all of order one, so that the dependencies on scales show up as dimensionless (pre-) factors which we expect to contain τ b , τ v and τ x . Marking dimensionless quantities by a tilde, we define We measure the variables of interest t and x on the slowest scale of the system, namely the diffusive time scale τ x and an associated "large" length-scale L, such that whereT quantifies the typical temperature of the bath according to T (x, t) =T Θ(x, t), with a dimensionless temperature profile Θ(x, t). The velocity degrees of freedom are of the order of the thermal velocity, since they equilibrate on the scale τ v much faster than τ x . The typical magnitude of the auxiliary variable η is obtained from its definition (14b) by rewriting the integral in a form which is independent of all scales involved. The order of magnitude for η follows directly from its correlations (7). We assume the external force f to vary on slow scales only and to correspond to energies comparable to the typical thermal energy of the bath. The tensors g, h, σ are already dimensionless, because of the prefactorγ in (5). Plugging (17a)-(17f) into the GLE (15a)-(15d), we find (omitting arguments): Similarly, we find the dimensionless form of the general functional (16), which includes the dimensionless representations of heat and work from (10a), (10b), to be with the dimensionless counterpartsr =r(xt,ṽt,t),q =q(xt,t) of the functions (16). The Fokker-Planck equation associated to the SDEs (19a)- (20) is: for the density ρ = ρ(t;x,ṽ,ỹ,η,F) with In the following, we apply the two-step limiting procedure as described above to this Fokker-Planck equation. The technical details are given in Appendix A.

The white-noise limit
In the white-noise limit we have τ b /τ x → 0 while τ v /τ x stays finite. Defining the small parameter we can write the Fokker-Planck equation (21a)-(21c) in the form with the operators where we use Einstein's summation convention over repeated indices. Performing the multi-scale procedure outlined in Appendix A we find the "effective" Fokker-Planck equation which governs the dynamics of the reduced density ρ = ρ(t,x,ṽ,F) for the slow variables (x,ṽ) in the asymptotic limit ǫ → 0. It is equivalent to the Langevin equations For g = h T = σ = γ 1/2 these are exactly the standard Langevin-Kramers equations (see e.g. [5]) describing underdamped Brownian motion, together with the same equation for the functional as we had it in the very beginning. Since none of the functions appearing in the Langevin-Kramers equation or in the functional depend on the (auxiliary) variables (y, η), which have been "integrated out" during the homogenization procedure, the Markovian limit τ b /τ x → 0 is basically trivial in the current context. As we will see in the next Section, however, the small-mass limit τ v /τ x → 0 is much more interesting, in particular in our case in which g, h, σ, and Θ depend on the particle position.

The small-mass limit
For the small-mass limit the small parameter is Using this definition, we can bring the Fokker-Planck equation (25) for the underdamped motion into the form (23) with the operators The homogenization procedure from Appendix A then yields the "effective" Fokker-Planck equation where it is understood that a derivative with a function being included in the numerator of the derivative's fraction acts only on that function, but not on anything else that appears further to the right. The symmetric d × d matrix Σ is proportional to the covariance of the Gaussian distribution, which is the stationary distribution of the "fast process" (28c) (at a givenx) and solves the Lyapunov equation In the above, we have implicitly assumed that gh is positive definite (and therefore symmetric and invertible) so that the Lyapunov equation has a unique solution, which can be written as [46] Note that this assumption is satisfied for the physically relevant case in which gh is a friction tensor, since it then must be positive definite. If, in addition, gh commutes with σσ T (i.e., if the fast process governed by (28c) satisfies detailed balance), then we have Σ = (gh) −1 σσ T . It is easy to see that the fluctuation-dissipation relation (g = h T = σ) implies detailed balance but the converse is generally not true. The Fokker-Planck equation (29) describes the time evolution of the density ρ = ρ(t,x,F ) for the slow variablesx andF . The first line represents thex-dynamics. The space-dependent terms involving the strength of thermal fluctuations, i.e., the temperature Θ and Σ appear in Itô position, i.e. to the right of both derivatives in the diffusion term, meaning that with a constant friction the correct interpretation of the multiplicative noise would be the Itô one (start-point rule). The space-dependent (inverse) friction coefficient (gh) −1 appears in between these two derivatives and thus as an anti-Itô or Hänggi-Klimontovich term. Such splitting of friction and temperature into anti-Itô and Itô contributions is well known [12,13,[15][16][17] for a locally equilibrium bath with the fluctuation-dissipation relation being valid at each position x, for which hg = σσ T =γ are identical to the (dimensionless) friction tensorγ, and for which Σ = 1 (see Eq. (30)). The second to fourth line in (29) characterize the dynamics of the functionalF . If (gh) −1 Σ is symmetric, which is guaranteed if the fast process obeys detailed balance, these terms are consistent with a contributionq • dx to the drift of the functionalF, interpreted in the Stratonovich sense, see Appendix C. In fact, the third term in the second line contains both, the spurious drift Θq i Σ jk which appears in dx when writing its thermal noise term in Itô-form, as well as the correction The set of Langevin equations equivalent to (29) therefore reads where we switched back to dimensionful quantities. The term s(x, t) represents a noiseinduced drift corresponding to an Itô-interpreted multiplicative noise in dx t . Its i-th component is given by with Σ being the solution of the Lyapunov equation (30). These equations are the main result of the present contribution. Our systematic and general approach considering the white-noise and small-mass limit for the dynamics and for thermodynamical functionals simultaneously also recovers some of the results obtained in a series of related works, both for the dynamics [12][13][14][15][16][17]19,[27][28][29][30][31], and the stochastic thermodynamics for entropy [18,20,22,32,34,36,37,[39][40][41], heat [33,35], work [38] or general functionals [23]. The most important observation is that a functional of the underdamped dynamics (x, v) with a differential of the form q(x t , t) · dx t = q(x t , t) · v t dt (q being independent of v) in the small-mass limit always reduces to q(x t , t) • dx t with a Stratonovich product. This behavior occurs completely independent of spatial dependencies of the friction tensors or the temperature field, but rests on two important conditions: i) the processes which generate the colored noise to be equilibrium ones, ii) in the small-mass limit the velocity degrees of freedom reach an equilibrium distribution with the local temperature (this is always guaranteed if the system obeys the fluctuation-dissipation relation g = h T = σ) [19,29]. As a specific example of (32a)-(32c), we consider the heat and work functionals (10a) and (10b) for a Langevin equation, which fulfills the fluctuation-dissipation relation, γhg =γσσ T =γγ = γ: where the multiplicative noise in dx t is an Itô product. The Stratonovich product in dQ t and dW t confirms our physically motivated arguments from the Introduction, based on viewing heat, work etc. as transformations of the dynamical variables [7]. The original definition of heat was based on the identification with the energy exchanges due to the frictional and fluctuating force exerted by the heat bath on the system [7]. This interpretation is preserved also in the case of space-dependent friction or temperature with the additional condition that any spurious (noise-induced) drift term in the Langevin equation, which might result from a certain representation of the multiplicative noise term, always has to be interpreted to contribute to heat (and not to work). This is testified by the fact that the limiting expression (33b) is identical to (3b) and there is no correction to f due to space-dependent friction or temperature. In other words, the specific interpretation of the multiplicative noise in the overdamped Langevin equation with space-dependent friction or temperature is irrelevant for the identification of the contributions of the various drift terms to heat and work, a quite comforting finding from a physical perspective.

Entropy production
For the sake of completeness, we here discuss homogenization of entropy production. It has been shown [18] that, in presence of temperature gradients, the homogenized Fokker-Planck equation for entropy production acquires non-trivial terms which describe a socalled "entropic anomaly" of the small-mass limit, see also [20,22,23,[32][33][34][35][39][40][41]. In the following, we briefly summarize the calculation and main results from [18], treating explicitly the case in which the friction tensor is not simply proportional to the identity, see also [22]. In stochastic thermodynamics the entropy production (in the thermal environment) is usually defined [2,8,9] as a measure of irreversibility via the log-ratio of probabilities for observing a specific trajectory (x, v) = {(x t , v t )} t f t=0 in forward time versus observing the same trajectory traced out backwards when advancing time. For the overdamped Langevin equation, the explicit expression (4b) for the resulting functional is obtained from path-integral techniques [2,8,9]. The designation entropy production in the thermal environment for this irreversibility measure originates from the central observation that it is equivalent to the heat dissipated into the bath along the trajectory divided by the bath temperature, see our discussion of (4b).
However, for the GLE (5) a direct calculation using path integrals of path probabilities involving only (x, v) (without explicit knowledge of the auxiliary variables (y, η)), is much more challenging due to the GLE's non-Markovian character, and has not been achieved yet to the best of our knowledge. To analyze entropy production, we therefore start from its well-known expression for the Langevin-Kramers equation, and assume that this expression is the asymptotic limit of a (unknown) "generalized entropy production" for vanishing noise correlation time τ b /τ x → 0.
For analyzing the small-mass limit τ v /τ x → 0, it proves convenient to split off a boundary term, because we then can rewrite the dv t integral in (34) as a sum of dt and dx t integrals [18,22]. To do so, we first introduce the Maxwell-Boltzmann velocity distribution at given position x and time t, By observing that the total differential −d ln w = d( mv 2 2k B T + d 2 ln 2πk B T m ) indeed contains the desired term mvt k B T · dv t , we can eliminate it from the integral in (34) and find where w 0 and w f denote the Maxwell-Boltzmann distribution (35) at initial and final point of the trajectory. The corresponding dimensionless differential of the functional reads (omitting arguments) Our goal is to perform the small mass asymptotic limit ǫ = τ v /τ x → 0 of this functional in conjunction with the Langevin-Kramers equation, exactly as we have done it in Section 4.2 for heat-and work-like functionals. We will here, however, focus on the case when hg = σσ T =γ is positive definite, such that the fluctuation-dissipation relation is fulfilled (otherwise entropy production in the thermal bath would not be well-defined). The operators L † 2 , L † 1 , L † 0 of the Fokker-Planck equation (23) now take the form They are completely analogous to (28a)-(28c), except that now the terms related to the functionalF contain polynomials quadratic and cubic inṽ, which complicate the homogenization procedure considerably. Details of the corresponding calculation are given in Appendix A.3, the resulting Fokker-Planck equation reads where theγ (i) are the eigenvalues ofγ. To write the equivalent Langevin equation we switch back to dimensionful quantities, We recover the same dynamics in x t as before, see (33a), because we considered the exactly identical dynamical process. The dynamics of entropy production, however, looks quite differently from the one for heat (33b). In particular, entropy production consists of two parts, a regular one (first line of dS env t ), and a so-called "anomalous" contribution [18] (second and third line of dS env t ; note that the Wiener processŴ t in (40b) is independent of the Wiener process W t appearing in the equation (40a) for x t ). The regular part is expressed as a differential in dx t , i.e. it is determined by the statistical properties of the process x t . We point out that it is not simply heat divided by temperature f • dx t /T , but that for inhomogeneous temperature it contains an additional term. This expression is consistent with the definition of entropy based on time reversal that one would obtain for the dynamics described by (33a). For a spacedependent temperature there is also an additional "anomalous" entropy. Its essential property is that it cannot be expressed exclusively as a functional over the trajectory x, it rather has "its own" random fluctuations, which are completely independent of the fluctuations in x t (third line of dS t ). As a consequence of the coarse-graining associated with the asymptotic limit τ v /τ x → 0, the trajectory x alone does not contain sufficient information to fully specify entropy production in an inhomogeneous environment. The average "anomalous" entropy production corresponds to the drift term in the second line of dS env t , and is non-negative. Note that for γ = γ 0 1, where 1 is the identity tensor, with a scalar friction coefficient γ 0 , this term reads 1 T d+2 6γ 0 (∇ xt T ) 2 . Its physical origin and properties are discussed in detail in [18,20,22,33,40]. We emphasize that, if temperature is constant in space, the limiting procedure is regular and one simply finds that entropy is given by the usual (minus) heat divided by temperature recovering (4b) with the expected Stratonovich product.

Discussion and Conclusions
The generalized Langevin equation (5) and the associated functionals along its trajectories do not suffer from any ambiguity related to interpretation of stochastic integrals, because the generated position and velocity processes are sufficiently regular. Exploiting this uniqueness of integrals, we here applied a multiscale procedure to the GLE and its functionals to find out which noise interpretations of the overdamped Langevin equation and its thermodynamic functionals are consistent with the systematic white-noise and small-mass limits. The most important finding is that heat-and worklike functionals need to be interpreted in Stratonovich sense, no matter which kind of multiplicative noise is present in the underlying overdamped Langevin equation. This is in agreement with physical arguments based on consistency of the energy balance [7]. We remark that, before performing the limiting procedure on the heat functional (10a) we have split off the boundary term corresponding to the changes in kinetic energy of the particle. Its average contribution is proportional to the change in temperature dk B ∆T /2 (where d is the dimensionality of the system). As pointed out in [33] this contribution plays a relevant role when temperature is changing in time and must be kept into account to properly define adiabatic transformations, which have then been experimentally realized [47,48]. The average contribution of the boundary term has been computed for two cases studies also in [49] and its probability distribution in [50] In addition to heat and work, we also discussed the case of entropy production [18], which, in presence of spatial temperature variations, cannot be expressed as a functional over the overdamped stochastic trajectory only. It rather features an additional "anomalous" contribution, which is related to dissipation due to heat transport by the (hidden) velocity degrees of freedom exchanged with different thermostats [18,20,22,23,32,39,41].
We here presented these calculations having specifically in mind a Brownian particle suspended in an aqueous solution. Generalizations to several (interacting) Brownian particles are straightforward. In such systems, the intrinsic time scale of the thermal bath is the fastest time scale in the system, followed by the relaxation time scale of the velocity degrees of freedom of the Brownian particle. Accordingly, we performed the white-noise limit first, and then the small-mass limit. However, the GLE, which in its most basic form was first introduced by Mori in [51], has subsequently been used to model many systems in statistical and biological physics [52,53], including both normal and anomalous diffusion, and the motion of active matter [54]. In general, the separation of time scales might be different in other physical systems, distinct from a (passive) Brownian particle, such that other orderings of the above limits might become relevant. While it is known that the ordering of limits affects the limiting dynamics [28], we are not aware of any results for functionals. Further very interesting generalizations of our analysis based on the model (5)-(8) include the case of many particles subject to a matrix-valued friction kernel κ with different relaxation times, and likewise for the correlations of the colored noise η. This would allow for analyzing, e.g., the situation in which different particles are in contact with heat baths of different physical character. The heat baths can even have different temperatures by appropriate generalization of T or choice of σ. Other non-equilibrium systems of great interest, in particular in the context of active matter, are Brownian particles driven by colored noise processes η which are themselves out of equilibrium. We expect non-trivial contributions to the dynamics [19] and thermodynamics (i.e. functionals), resulting in implicit discretization rules which are more complicated than just the common rules such as the Itô, Stratonovich and anti-Itô rule [29].
All these ideas are left for future explorations.
Here, R 0 , R 1 are scalar functions, U , V are vector-valued functions and S, G, and D are matrix-valued functions of the indicated arguments; their dimensions are fixed by the number of slow and fast variables, respectively. The small parameter ǫ represents the (square root of the) ratio between the fastest and slowest time scale in the system. We assume that the explicit time dependencies in all the functions R 0 , R 1 , U , V , S, G, and D occur on the slowest time scale only. We can write the Fokker-Planck equation for the density ρ = ρ(t, X, Y , F ), which is equivalent to the Langevin equations (1.1a)-(1.1c), in the form with the operators where we skipped the arguments of the functions and switched to index notation with summation over repeated indices being understood. The "translations" of this general Fokker-Planck equation into the specific forms (24a)-(24c) and (28a)-(28c) used for the white-noise and the small-mass limit, respectively, are provided in table A1.
Appendix A.1. The homogenization procedure The homogenization procedure [23,29] consists in a systematic perturbation expansion in the small parameter ǫ, which characterizes the separation of time scales present in the system ¶. Accordingly, we introduce time variables θ and τ on the fastest and slowest scales, respectively, and an intermediate time scale ϑ, Then, we expand the density ρ in powers of ǫ, ρ = ρ 0 + ǫρ 1 + ǫ 2 ρ 2 + . . . , (1.5) where all ρ i = ρ i (ϑ, τ, X, Y , F ) are a priori assumed to be functions of all variables and the two slower time scales independently. Since we are not interested in the relaxation ¶ For a pedagogical introduction to the multiple time-scale approach in the context of Fokker-Planck equations we recommend Ref. [45]. Bocquet uses an expansion procedure equivalent to ours, with some (minor) technical differences on how the solutions of the higher-order equations (i.e. Eqs. (1.6b) and (1.6c) in our case) are treated. Table A1. "Translation key" to map the general form (1.3a), (1.3b), (1.3c) to the specific Fokker-Planck equations (24a), (24b), (24c) and (28a), (28b), (28c) from the main text.
white-noise limit small-mass limit eqs. (24a), (24b), (24c) eqs. (28a), (28b), (28c) processes on the fastest time scale θ, but rather in the situation in which these fastest scales reached their "equilibrated" stationary state, we assume the ρ i to be independent of θ. Plugging the expansion (1.5) into the Fokker-Planck equation (1.2), and using that the time-derivative turns into ∂/∂t = ǫ −2 ∂/∂θ + ǫ −1 ∂/∂ϑ + ∂/∂τ according to (1.4) (and the assumption of the three time scales to be independent), we collect together all terms with the same power in ǫ to find the hierarchy of equations From (1.3c) we see that the solution to (1.6a) can be written as where the density w(Y |X, τ ) fulfills i.e. it is the stationary density of the fast variables Y conditioned on the slow variable X and slow time τ . Due to the simple form of (1.3c), we can calculate w easily and obtain the Gaussian where n is the dimension of the vector Y . The symmetric covariance matrix Σ is the solution of the Lyapunov equation Note that in general it depends on X and τ via the functional dependencies of G, D, such that also the covariances depend explicitly on the slow scales X and τ (the overline denotes the average over the fast variables Y weighed with their stationary density w), justifying our notation of a density w(Y |X, τ ) conditioned on X and τ . We proceed with the next equation (1.6b) by employing the so-called solvability condition [29], which states that the right-hand side inhomogeneity has to be orthogonal to the nullspace of the operator L 0 adjoint to L † 0 , otherwise there is no non-trivial solution. From (1.3c) we see that the constant functions are contained in the nullspace of L 0 , such that the solvability condition reads dY ∂ρ 0 ∂ϑ + L † 1 ρ 0 = 0. Using the explicit form (1.3b) and the solution (1.7), it is straightforward to show that the solvability condition reduces to ∂ρ 0 ∂ϑ = −R 1 ∂ρ 0 ∂F . Since the equation (1.6b) represents contribution of order ǫ −1 we require them to vanish to guarantee a well-defined ǫ → 0 limit (so-called centering condition). In our case, we therefore have to restrict the function R 1 to be odd in the fast variables Y . Then, we have R 1 = 0, and we find that also ∂ρ 0 ∂ϑ = 0, i.e. that the reduced densityρ 0 is independent on the intermediate time scale ϑ.
Applying the solvability condition in an analogous way to the third equation (1.6c) of our equation hierarchy, we find In the second equality, we have used the solution (1.7) for ρ 0 , the explicit expressions for the operators L † 2 and L † 1 from (1.3a) and (1.3b), and we have introduced the definition of the current-like quantity for (polynomial) functions π(Y ) of the fast variables. It turns out that the result (1.12) is sufficient to derive a Fokker-Planck equation for the reduced densityρ = dY ρ in slow variables only (being the reason that we do not need to go to higher orders in the equation hierarchy (1.6a)-(1.6c)). Our goal is to identify the lowest order contributions to the dynamics ofρ which survive in the limit ǫ → 0. Indeed, from the definition (1.4) of the three time scales and the expansion (1.5) we find ∂ρ ∂t = ∂ρ 0 ∂τ + ∂ρ 1 ∂ϑ + O(ǫ) and ρ =ρ 0 + O(ǫ). Hence, in the asymptotic limit ǫ → 0 we obtain from (1.12) (1.14) As a Gaussian integral over w, the average R 0 is straightforward to calculate if R 0 contains polynomials in Y , as it is typically the case. The remaining task consists in expressing J Y i and J R 1 , which contain the a priori unknown solution ρ 1 of the equation (1.6b), in terms ofρ =ρ 0 + O(ǫ) in lowest order ǫ.

Appendix A.2. Heat-and work-like functionals linear in Y
For explicit results we first focus on the case, which is relevant for the heat and work functionals, i.e.
with functions Q i depending on slow scales only. As demonstrated in Appendix B.1, we find Hence, the final Fokker-Planck equation reads From this general result we can easily read off the two homogenized Fokker-Planck equations (25) and (29) for the white-noise and small-mass limit, respectively, by applying the mapping provided in Table A1. with its variance at X and t being set by the local temperature, Moreover, we restrict ourselves to the specific structure of linear, quadratic and cubic terms in the fast variables as they appear in the functional for the entropy production (37), keeping, in particular, the differences between Y 2 and their variance, In case of the entropy production, the functions R(X, t), Q(X, t) and P (X, t) read Note that all indicated explicit time dependencies are assumed to occur on the slowest scale τ . We have to use these expressions for R 0 and R 1 to evaluate the average R 0 and the integral J R 1 appearing in (1.14). We first notice that (see (1.19b and that we can write The integral J Y i is calculated in Appendix B.1 (see (2.5b)), where M is defined in (2.14). Hence, Plugging all these results into (1.14) we finally obtain the homogenized Fokker-Planck equation (1. 26) while Σ ij = Θδ ij for the entropy production functional, i.e.
The first expression is the same one as given in (1.16a), the second expression agrees with (1.23).
We need the integral J Y j Y j Y i only for the entropy production. In the following, we thus focus on the specification given by eqs. (1.18)-(1.19b). It turns out that the appropriate and construct A ijkl in a way that the combination of third-order polynomials in Y where O is an orthogonal tensor which diagonalizes G, with the eigenvalues G (i) of G. Note that the sum in (2.7) is over m and n and that A ijkl obeys the symmetry A ijkl = A jikl = A ijlk . We then obtain This result corresponds to the left-hand side of (2.2). For its right-hand side we obtain (see (1.3b) and (1.20b)) where we have used the averages ∂ ∂Y n Y j Y k Y l = Θ (δ nj δ kl + δ nk δ jl + δ nl δ jk ) , (2.11a) Y n Y j Y k Y l = Θ 2 (δ nj δ kl + δ nk δ jl + δ nl δ jk ) , (2.11b) and where we have replaced ΘV nρ0 , emerging from the middle term in the operator L † 1 , by ΘG nm J Ym + Θ ∂ ∂Xm ΘS mnρ0 + (Θ 2 Q n − 2Θ 3 P n ) ∂ρ 0 ∂F according to (2.5b). Combining (2.9) and (2.10) we finally find To arrive at this result we have used which can be proven using (2.7) and (2.8a) (note that in all terms we sum over j and k). Moreover, we have defined the tensor M as (2.14) We remark that in the case of an isotropic tensor G = g1 (relevant, e.g., for a spherical Brownian particle) M simplifies to M = (n+2) 3g 1.

Appendix C. Joint generator for dynamics and functionals
We here briefly sketch how to obtain the Fokker-Planck equation describing the evolution of the joint process X t , F t involving the dynamics and the functional. Note that we do not distinguish here between fast and slow variables, the vector X rather collects all dynamical variables. Likewise, we consider a very general form of the Langevin equation for X t and F t , written component-wise: 1b) where B ij = β ik β jk andĥ = h + 1 2 ∂q i ∂X j B ij , and the symbols · and •, respectively, indicate the Itô and Stratonovich product. In going from the first to the second line of (3.1b) we have converted the Stratonovich integral into its statistically equivalent Itô counterpart. Similarly to the derivation of Itô's formula, we express the differential of a generic function f (X t , F t ) as Upon taking the average and retaining only the contributions of order O(dt) we find from which we can read off the expression for the joint generator L. Its adjoint is We can now verify that the stochastic differential equations (32a)-(32c) given in the main text correspond to the Fokker-Planck equation (29). Let us recall their expression here to ease the procedure:

5c)
By plugging the expressions from (3.5a)-(3.5c), which correspond to u i , q i ,ĥ and B ij , into (3.4), dropping the explicit space and time dependence we obtain In dimensionless form, if (gh) −1 Σ is symmetric, which is guaranteed if the fast process obeys detailed balance, this is equivalent to