Stochastic thermodynamics and fluctuation theorems for non-linear systems

We extend stochastic thermodynamics by relaxing the two assumptions that the Markovian dynamics must be linear and that the equilibrium distribution must be a Boltzmann distribution. We show that if we require the second law to hold when those assumptions are relaxed, then it cannot be formulated in terms of Shannon entropy. However, thermodynamic consistency is salvaged if we reformulate the second law in terms of a generalized entropy; our first result is an equation relating the precise form of the non-linear master equation to the precise associated generalized entropy which results in thermodynamic consistency. We then build on this result to extend the usual trajectory-level definitions of thermodynamic quantities so that they remain appropriate even when the two assumptions are relaxed. We end by using these trajectory-level definitions to derive extended versions of the Crooks fluctuation theorem and the Jarzynski equality which apply when the two assumptions are relaxed.


Thermodynamic requirements
We consider a system with a finite number of states, labeled m, n, etc. All quantities are implicitly functions of time t, and for any variable z which varies with t, we use $\dot{z}$ to indicate dz(t)/dt. Before formulating our requirements, we define several terms, all based on an arbitrary time-dependent scalar-valued internal energy function of the states, $\epsilon_m$. At this point, despite the terminology, there is no physical meaning associated with $\epsilon_m$; such meaning will arise through requirements 2 and 3 below.
We write the expected internal energy as

$$U := \sum_m p_m\, \epsilon_m \qquad (1)$$

and we define $\dot{Q} := \sum_m \dot{p}_m\, \epsilon_m$ and $\dot{W} := \sum_m p_m\, \dot{\epsilon}_m$ as the heat flow and the rate of work, respectively [5]. So the 'first law of thermodynamics' holds: $\dot{U} = \dot{Q} + \dot{W}$. Next, we define the (generalized) information entropy as

$$S := f\left(\sum_m g(p_m)\right) \qquad (2)$$

which reduces to ordinary Shannon information entropy $H = -\sum_m p_m \log p_m$ for $f(x) = x$ and $g(x) = -x \log x$. The form of entropy used in equation (2) is known as sum-class [63], a generalization of trace-class [51], which is the special case where $f(x) = x$. Maximizing the generalized entropy subject to a given value of the expected internal energy gives us the equilibrium distribution:

$$\pi_m = (g')^{-1}\!\left(\frac{\alpha + \beta\, \epsilon_m}{C_f(\pi)}\right) \qquad (3)$$

where $C_f(\pi) = f'\left(\sum_m g(\pi_m)\right)$. In order to obtain a unique equilibrium distribution, we also impose the following requirements on the functions f and g:
• f is a strictly increasing function, i.e. $f' > 0$,
• g is a concave function, i.e. $g'' \leq 0$.
Next, we introduce requirements concerning the thermodynamics of a system coupled to one or more infinite heat baths. By an infinite heat bath we mean, as is standard, a bath with infinite heat capacity. Thus, the bath is statistically independent of the system, which leads directly to $S_{\mathrm{TOT}} = S_{\mathrm{Bath}} + S_{\mathrm{Syst}}$. Furthermore, the bath equilibrates very fast, so we can take $\Delta S_{\mathrm{Bath}} = \beta \Delta Q$.
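As a quick consistency check of equation (3), take the Shannon choice $f(x) = x$, $g(x) = -x \log x$: then $C_f(\pi) = 1$ and $g'(x) = -\log x - 1$, so the stationarity condition $g'(\pi_m) = \alpha + \beta \epsilon_m$ gives

$$-\log \pi_m - 1 = \alpha + \beta\, \epsilon_m \quad\Longrightarrow\quad \pi_m \propto e^{-\beta \epsilon_m},$$

i.e. the familiar Boltzmann distribution, with $\alpha$ fixed by normalization.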
The requirements below provide us with the promised physical interpretation of 'internal energy', 'work', and 'heat flow'.

Requirement 1. Markovian evolution.
We assume that the dynamics of our system evolves according to a first-order differential equation, as in conventional stochastic thermodynamics, but allow that equation to be nonlinear:

$$\dot{p}_m = C(p) \sum_n \left[ w_{mn}\, \Omega(p_n) - w_{nm}\, \Omega(p_m) \right] \qquad (4)$$

where $\Omega(p_n)$ is an arbitrary function, $w_{mn}$ is a set of real-valued parameters, and $C(p)$ is an arbitrary non-zero function of the probability distribution. In appendix B, we show how this non-linear master equation can be derived from Markov chains with probability-dependent transition probabilities.
An overview of physical and life-science applications of Markov chains with probability-dependent transition rates can be found, e.g., in the review [64]. Note that the antisymmetric form of the summand in equation (4) ensures probability conservation: regardless of the precise form of $\Omega$, $\sum_n \dot{p}_n = 0$. We refer to each $w_{mn}$ as a (generalized) transition rate. In general $w_{mn}$ will not have the form of a conventional transition rate. (The precise connection between the matrix $w_{mn}$ and conventional transition rates is discussed in appendix B.)

Requirement 2. Local detailed balance.
We assume that the system is coupled to a finite set of infinite heat reservoirs, V, with associated temperatures $T_\nu$, $\nu \in V$. We formalize this assumption in several steps. First, we assume the transition rates are given by a sum over rates for each reservoir, i.e. $w_{mn} = \sum_\nu w^\nu_{mn}$. Next, define the local equilibrium of the system for each such reservoir ν as the distribution that would be the equilibrium of the system if it were only connected to that reservoir, i.e. as the distribution $p_m$ with maximal entropy (as defined in equation (2)), subject to a constraint on the value of the expected energy defined in equation (1), with the associated inverse-temperature Lagrange multiplier being $\beta_\nu = 1/T_\nu$ (up to a reservoir-independent proportionality constant).
In the usual way we can solve for that maximizing distribution, obtaining $\pi^\nu_m$, given by equation (3) with $\beta = \beta_\nu$. The requirement of local detailed balance means that the local probability flow $J^\nu_{mn} = w^\nu_{mn}\, \Omega(p_n) - w^\nu_{nm}\, \Omega(p_m)$ vanishes for the local equilibrium distribution, i.e.

$$w^\nu_{mn}\, \Omega(\pi^\nu_n) = w^\nu_{nm}\, \Omega(\pi^\nu_m). \qquad (5)$$
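As a minimal numerical sketch of requirements 1 and 2, consider the hypothetical choice $\Omega(p) = p^q$ for a three-state system attached to a single bath. We fix the local equilibrium π by requiring $\Omega(\pi_m) \propto e^{-\beta \epsilon_m}$ (the condition implied by theorem 1 below), choose rates obeying equation (5), and integrate equation (4) with $C(p) = 1$; probability is conserved and the dynamics relaxes to π:

```python
import numpy as np

# Sketch only: 3-state system obeying the nonlinear master equation (4)
# with C(p) = 1 and the hypothetical nonlinearity Omega(p) = p**q.
q, beta = 0.8, 1.0
E = np.array([0.0, 1.0, 2.0])                 # energies eps_m

def Omega(p):
    return p ** q

# Local equilibrium pi chosen so that Omega(pi_m) is Boltzmann-weighted,
# i.e. pi_m proportional to exp(-beta * eps_m / q) for this Omega.
pi = np.exp(-beta * E / q)
pi /= pi.sum()

# Rates satisfying local detailed balance (5): w_mn Om(pi_n) = w_nm Om(pi_m).
# Setting w_mn = Omega(pi_m) works, since both sides then equal Om(pi_m) Om(pi_n).
W = np.tile(Omega(pi)[:, None], (1, 3))       # W[m, n] = w_mn
np.fill_diagonal(W, 0.0)

def pdot(p):
    # J_mn = w_mn Om(p_n) - w_nm Om(p_m); antisymmetric, so probability is conserved
    J = W * Omega(p)[None, :] - W.T * Omega(p)[:, None]
    return J.sum(axis=1)

p = np.array([0.90, 0.05, 0.05])
dt = 1e-3
for _ in range(300_000):                      # simple Euler integration
    p += dt * pdot(p)

print(np.allclose(p, pi, atol=1e-4))          # relaxes to local equilibrium
print(np.isclose(p.sum(), 1.0))               # sum_n dp_n/dt = 0 throughout
```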

Requirement 3. Second law of thermodynamics.
We introduce a few more definitions in order to formalize the second law. We define the heat flow from a reservoir ν as

$$\dot{Q}^\nu := \sum_m \epsilon_m\, \dot{p}^\nu_m, \qquad \dot{p}^\nu_m := C(p) \sum_n \left[ w^\nu_{mn}\, \Omega(p_n) - w^\nu_{nm}\, \Omega(p_m) \right],$$

so that the total heat flow is $\dot{Q} = \sum_\nu \dot{Q}^\nu$. Similarly, we define the entropy flow from reservoir ν into the system as $\dot{S}^\nu_e := \dot{Q}^\nu / T_\nu$ and the overall entropy flow as $\dot{S}_e = \sum_\nu \dot{S}^\nu_e = \sum_\nu \dot{Q}^\nu / T_\nu$. We then define the entropy production as

$$\dot{S}_i := \dot{S} - \dot{S}_e \qquad (6)$$

where the entropy has the form given in equation (2). Using these definitions, the generalized second law is the requirement that

$$\dot{S}_i \geq 0. \qquad (7)$$
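For orientation, in the Shannon case $f(x) = x$, $g(x) = -x \log x$, equations (6) and (7) reduce to the standard second law of stochastic thermodynamics,

$$\dot{S}_i = \dot{H} - \sum_\nu \frac{\dot{Q}^\nu}{T_\nu} \geq 0, \qquad H = -\sum_m p_m \log p_m.$$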

Necessary relationship between nonlinear master equation and entropy
Our first main result is a relation between the function Ω specifying the nonlinear Markovian dynamics and the precise generalized entropy, which must hold if the three requirements introduced above all hold:

Theorem 1. If requirements 1–3 all hold, then

$$\Omega(p_m) = \exp\left(-g'(p_m)\right).$$

Theorem 1 means that, in general, a given nonlinear Markovian dynamics will require that we not use Shannon entropy if we wish to maintain thermodynamic consistency. Theorem 1 is also the basis for our other results below.
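As an illustration, take the trace-class Tsallis case $f(x) = x$ and $g(p) = (p - p^q)/(q - 1)$ (discussed further in the discussion section): then

$$g'(p) = \frac{1 - q\, p^{\,q-1}}{q - 1} \quad\Longrightarrow\quad \Omega(p) = \exp\!\left(\frac{q\, p^{\,q-1} - 1}{q - 1}\right),$$

which reduces to $\Omega(p) = e\, p$, i.e. linear dynamics, in the Shannon limit $q \to 1$.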
The complete proof of theorem 1 is in appendix C. Here we outline its main steps. First, we calculate the time derivative of the generalized information entropy. Second, we decompose that derivative into two terms which, as we show, equal the (non-negative) entropy production rate and the entropy flow rate. (The requirement of local detailed balance is invoked to identify the entropy flow rate with the heat rate over the temperature.) From this we deduce the form that the function Ω appearing in equation (4) must take.
A corollary of theorem 1 arises if we plug it into equation (4), to derive the general form that the master equation must have in order to meet the three requirements:

$$\dot{p}_m = C(p) \sum_{n,\nu} \left[ w^\nu_{mn}\, e^{-g'(p_n)} - w^\nu_{nm}\, e^{-g'(p_m)} \right]. \qquad (11)$$

There are multiple examples in the literature of systems that can be described by a master equation (11) or the corresponding Fokker-Planck equation (see appendix D for the derivation of that Fokker-Planck equation). These include chemical reaction networks [68], morphogen gradient formation [69], financial markets and option pricing [70] ($g(p_i) \propto p_i^q$), heavy-ion collisions [71], and turbulence and vortex formation [72-74].

Another corollary of theorem 1 is that if we know the form of Ω, then the entropic functional satisfying requirements 1-3 can be written in the following form:

$$S = f\left(\sum_m g(p_m)\right), \qquad g(p) = -\int^{p} \mathrm{d}x\, \log \Omega(x), \qquad (12)$$

up to an integration constant. Note also, by plugging theorem 1 into the requirement of local detailed balance, that local detailed balance has the same form as in conventional stochastic thermodynamics:

$$\frac{w^\nu_{mn}}{w^\nu_{nm}} = e^{-\beta_\nu (\epsilon_m - \epsilon_n)}. \qquad (13)$$

In addition, we can express the entropy production rate as

$$\dot{S}_i = \frac{C_f(p)\, C(p)}{2} \sum_{m,n,\nu} \left( J^\nu_{mn} - J^\nu_{nm} \right) \log \frac{J^\nu_{mn}}{J^\nu_{nm}} \geq 0, \qquad J^\nu_{mn} := w^\nu_{mn}\, \Omega(p_n), \qquad (14)$$

and the entropy flow rate as

$$\dot{S}_e = -\frac{C_f(p)\, C(p)}{2} \sum_{m,n,\nu} \left( J^\nu_{mn} - J^\nu_{nm} \right) \log \frac{w^\nu_{mn}}{w^\nu_{nm}} = \sum_\nu \beta_\nu\, \dot{Q}^\nu. \qquad (15)$$
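Continuing the numerical sketch above with the same hypothetical $\Omega(p) = p^q$: theorem 1 pairs this dynamics with $g'(p) = -\log \Omega(p) = -q \log p$, i.e. $g(p) = -q\,(p \log p - p)$, a rescaled Shannon entropy. The snippet below integrates the same relaxation and checks that $\dot{S}_i = \dot{S} - \beta \dot{Q}$ remains non-negative, as equation (7) requires:

```python
import numpy as np

# Sketch only: verify dS_i/dt = dS/dt - beta * dQ/dt >= 0 along the relaxation
# of the 3-state example, with Omega(p) = p**q and, via theorem 1,
# g(p) = -q * (p*log(p) - p), so S = sum_m g(p_m) (trace-class, f(x) = x).
q, beta = 0.8, 1.0
E = np.array([0.0, 1.0, 2.0])
Omega = lambda p: p ** q
pi = np.exp(-beta * E / q); pi /= pi.sum()
W = np.tile(Omega(pi)[:, None], (1, 3)); np.fill_diagonal(W, 0.0)

p, dt, viol = np.array([0.90, 0.05, 0.05]), 1e-3, 0
for _ in range(100_000):
    J = W * Omega(p)[None, :] - W.T * Omega(p)[:, None]
    dp = J.sum(axis=1)
    S_dot = np.sum(-q * np.log(p) * dp)   # dS/dt = sum_m g'(p_m) dp_m/dt, g'(p) = -q log p
    Q_dot = np.sum(E * dp)                # single bath: dQ/dt = sum_m eps_m dp_m/dt
    if S_dot - beta * Q_dot < -1e-12:     # entropy production rate, eq. (6)
        viol += 1
    p += dt * dp
print("second-law violations along the relaxation:", viol)   # expect 0
```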

Stochastic thermodynamics for nonlinear master equations
We now introduce the stochastic (trajectory-level) versions of the ensemble thermodynamic quantities investigated above. For the rest of the paper, we assume $f(x) = x$ (i.e. the trace-class entropy), for simplicity. To begin, we define the stochastic entropy of a stochastic trajectory $n(\cdot)$ at time $\tau \in [0, T]$ as

$$s(n(\tau)) := g'\!\left(p_{n(\tau)}(\tau)\right) = -\log \Omega\!\left(p_{n(\tau)}(\tau)\right)$$

and the stochastic energy as $e(n(\tau)) := \epsilon_{n(\tau)}$. The stochastic heat flow and work flow can be obtained from the first law on the trajectory level, $\dot{e}(\tau) = \dot{q}(\tau) + \dot{w}(\tau)$, where $\dot{q}(\tau) = \sum_m \dot{\delta}_{m,n(\tau)}\, \epsilon_m$ and $\dot{w}(\tau) = \sum_m \delta_{m,n(\tau)}\, \dot{\epsilon}_m$, respectively. Thus, the ensemble first law of thermodynamics can be written as $\langle \dot{e} \rangle = \langle \dot{q} \rangle + \langle \dot{w} \rangle$, where $\langle \cdot \rangle$ denotes the ensemble average, i.e. weighting by the trajectory probability $P(n(\tau))$ and summing over all trajectories $n(\tau)$. The average energy can be obtained as the ensemble average of the stochastic energy, i.e. $U(\tau) = \langle e(\tau) \rangle$. However, in general such an equivalence of averages does not hold for entropy once one goes beyond the special case of Shannon information entropy, i.e. in general $S(\tau) \neq \langle s(n(\tau)) \rangle$.

Nevertheless, the second law of thermodynamics is still valid on the trajectory level. In appendix E, we show that $\dot{s}(n(\tau)) = \dot{s}_i(n(\tau)) + \dot{s}_e(n(\tau))$, where $\dot{s}_i(n(\tau))$ is the entropy production rate and $\dot{s}_e(n(\tau))$ is the entropy flow rate. Due to local detailed balance, we obtain that $\dot{s}_e(n(\tau)) = \sum_\nu \beta_\nu\, \dot{q}^\nu(n(\tau))$. The ensemble second law (7) can be obtained by averaging over all possible trajectories.

Now consider an external protocol $\lambda(\tau)$ which drives the energy spectrum, i.e. $\epsilon_m(\tau) \equiv \epsilon_m(\lambda(\tau))$. In the usual way, define a time-reversed trajectory $\tilde{n}(\tau) = n(T - \tau)$ and the associated time-reversed driving protocol $\tilde{\lambda}(\tau) = \lambda(T - \tau)$. Suppose that local detailed balance is fulfilled. Then, as we show in appendix F, the standard form of the detailed fluctuation theorem [76] must hold, just with our modified definition of entropy production:

$$\frac{P(n(\tau))}{\tilde{P}(\tilde{n}(\tau))} = e^{\Delta s_i(n(\tau))}, \qquad (18)$$

where $\tilde{P}$ denotes probability under the time-reversed protocol $\tilde{\lambda}(\tau)$. Next, write the trajectory version of the Massieu function (also called free entropy [77]) as $\psi := s - \beta e$. The entropy production can then be written as

$$\Delta s_i = \beta w + \Delta \psi. \qquad (19)$$

Consider the situation where the initial distribution is the equilibrium one. For an equilibrium state the stochastic Massieu function is independent of the state, since it equals the Lagrange multiplier α, i.e. $\psi(\pi_m) = s(\pi_m) - \beta \epsilon_m = \alpha$ for every m. Combining this with equations (18) and (19) gives (a generalized form of) the Crooks fluctuation theorem,

$$\frac{P(w)}{\tilde{P}(-w)} = e^{\beta w + \Delta \alpha}.$$

As usual, this in turn gives (a generalized form of) the Jarzynski equality,

$$\left\langle e^{-\beta w} \right\rangle = e^{\Delta \alpha}.$$

Jensen's inequality then gives $\beta \langle w \rangle \geq -\Delta \alpha$. These results provide a clear physical interpretation of the Lagrange multiplier α: for a process that starts and ends in equilibrium, $-T\Delta\alpha$ is the minimal amount of work necessary for the transition between the two equilibrium states.
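For completeness, the usual one-line argument takes equation (18) to an integral fluctuation theorem and thence to the trajectory-level second law:

$$\left\langle e^{-\Delta s_i} \right\rangle = \sum_{n(\tau)} P(n(\tau))\, \frac{\tilde{P}(\tilde{n}(\tau))}{P(n(\tau))} = \sum_{\tilde{n}(\tau)} \tilde{P}(\tilde{n}(\tau)) = 1,$$

so that Jensen's inequality yields $\langle \Delta s_i \rangle \geq 0$, recovering the ensemble second law (7).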

Thermodynamic uncertainty relations and speed limits for non-linear systems
We finally mention that the validity of the DFT (18) has further consequences. Consider the situation where the initial and final distributions agree, $p_n(0) = p_n(T)$, and where the external protocol is time-symmetric, i.e. $\lambda(\tau) = \tilde{\lambda}(\tau)$. Let us consider a trajectory quantity that is antisymmetric under time reversal, i.e. $\phi(n(\tau)) = -\phi(\tilde{n}(\tau))$. Then the fluctuation theorem uncertainty relation holds, as shown in [78]:

$$\frac{\mathrm{Var}(\phi)}{\langle \phi \rangle^2} \geq \frac{2}{e^{\langle \Delta s_i \rangle} - 1}.$$

Next, we mention the time-dissipation uncertainty relation [80]. By defining survival probabilities $p_s(t)$ as in [80], we can derive an integral fluctuation theorem for $p_s(t)$ as a consequence of the detailed fluctuation theorem in the main text:

$$\left\langle e^{-\Delta s_i} \right\rangle_s = \frac{\tilde{p}_s(t)}{p_s(t)},$$

where $\langle X \rangle_s = \sum_{n(\tau)} X(n(\tau))\, P(n(\tau))/p_s(t)$ denotes the normalized average of a generic observable X over the set of surviving trajectories. By defining the instantaneous rate $r(t) := -\frac{1}{p_s(t)} \frac{\mathrm{d}p_s(t)}{\mathrm{d}t}$, we recover the main result of [80], a bound relating r(t) to the entropy production rate $\dot{s}_i$, which now applies to the case of non-linear systems.
Finally, we show that the results in [10] can be straightforwardly generalized to the case of systems governed by the non-linear master equation. In order to do so, we introduce the dynamical activity of the system as

$$A(t) := C(p) \sum_{m \neq n,\, \nu} w^\nu_{mn}\, \Omega(p_n),$$

and define the total entropy production of the system as $S_i(\tau) = \int_0^\tau \mathrm{d}t\, \dot{S}_i(t)$, where $\dot{S}_i(t)$ is defined in equation (14). Let us also define $A_\tau = \frac{1}{\tau} \int_0^\tau \mathrm{d}t\, A(t)$ and the variation distance $L(p, p') := \sum_m |p_m - p'_m|$. Then, by a straightforward calculation analogous to the proof in [10], we obtain the speed limit in the following form:

$$\tau \geq \frac{L\big(p(0), p(\tau)\big)^2}{2\, A_\tau\, S_i(\tau)}.$$
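As a quick numerical illustration, we can evaluate both sides of this bound for the relaxing three-state sketch used earlier (hypothetical $\Omega(p) = p^q$, single bath, $C(p) = 1$), with $\dot{S}_i$ computed via equation (14) and the activity summed over off-diagonal fluxes; the bound, in the form stated above, then holds with a wide margin:

```python
import numpy as np

# Sketch: evaluate the speed limit tau >= L^2 / (2 * A_tau * S_i) for the
# relaxing 3-state example (single bath, C(p) = 1, hypothetical Omega(p) = p**q).
q, beta = 0.8, 1.0
E = np.array([0.0, 1.0, 2.0])
Omega = lambda p: p ** q
pi = np.exp(-beta * E / q); pi /= pi.sum()
W = np.tile(Omega(pi)[:, None], (1, 3)); np.fill_diagonal(W, 0.0)
mask = ~np.eye(3, dtype=bool)

dt, steps = 1e-3, 50_000
p0 = np.array([0.90, 0.05, 0.05])
p = p0.copy()
S_i = A_int = 0.0
for _ in range(steps):
    F = W * Omega(p)[None, :]                   # one-way fluxes J_mn = w_mn Omega(p_n)
    Jm, Jn = F[mask], F.T[mask]
    S_i += dt * 0.5 * np.sum((Jm - Jn) * np.log(Jm / Jn))  # eq. (14), C_f = C = 1
    A_int += dt * F[mask].sum()                 # dynamical activity A(t)
    p += dt * (F - F.T).sum(axis=1)

tau = steps * dt
L = np.abs(p - p0).sum()                        # variation distance L(p(0), p(tau))
A_tau = A_int / tau
print(tau, ">=", L**2 / (2 * A_tau * S_i))      # left side comfortably larger
```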

Discussion
In this paper we considered a broad type of nonlinear Markovian dynamics. We also defined a generalized 'local equilibrium distribution' as the distribution that maximizes the 'information entropy', which we allow to be any specified sum-class entropy functional. This allowed us to impose a generalized form of local detailed balance, by requiring that the currents for each reservoir vanish if the system is in its local equilibrium distribution for the specified information entropy. We required that the generalized, nonlinear Markovian dynamics obey this generalized form of local detailed balance. Next, we defined 'entropy flow' in the usual way in stochastic thermodynamics, in terms of heat flow into the reservoirs. Given these definitions, we defined the (generalized) second law by requiring that, under the nonlinear Markovian dynamics, the difference between the time derivative of the specified information entropy and the entropy flow rate is non-negative.

Our first result was to show that for this generalized form of the second law to hold, there must be a certain simple relationship between the choice of which particular sum-class entropy to use to define the information entropy and the particular nonlinear form of the Markovian dynamics. We then used this result to derive trajectory-level definitions of (generalized) thermodynamic quantities. Our second result was that those generalized stochastic thermodynamic quantities obey both the Jarzynski equality and Crooks' theorem.
It is important to emphasize that although we broadened the range of scenarios that stochastic thermodynamics can apply to, we still restricted those scenarios in several ways. For example, equation (4) is not the most general form of a non-linear master equation. Other types of non-linear equations are obtained from mean-field approximations of many-body systems, phase transitions and self-organization, or approximations of quantum master equations; see [67] for an overview. The question at stake is whether the results can be extended to more general types of non-linear Markovian and non-Markovian systems. This might require considering entropic functionals of more general form, or even generalizations of the maximum entropy principle to, e.g., the principle of maximum caliber [81]. Several approaches classify generalized entropies, based on asymptotic scaling [51,53,82], group laws [54,63], or generalized information axiomatics [57,83,84]. From the point of view of statistical inference, the maximum entropy principle is an inference method. Shore and Johnson formulated consistency requirements that the maximum entropy principle should fulfill [55,56]. After a recent debate on the validity of the axioms [85-89], it has been shown in [26] that the class of entropies that fulfills the first three Shore-Johnson axioms can be written in the form $S(P) = f\left(\sum_i g(p_i)\right)$, which is known as the sum class of entropies. As recently shown, the Shore-Johnson axioms are also equivalent to a generalization of the Shannon-Khinchin axioms [57], where the sum class of entropies fulfills the so-called decomposition principle. For this reason, we use the sum class of entropies in our analysis, since it provides us with enough generality yet has a solid physical background.
Furthermore, let us discuss several aspects of generalized entropies relevant to the analysis in the main text.
Non-additivity and non-extensivity: probably the most popular example of a generalized entropy is the entropy introduced by C Tsallis, which can be defined as

$$S_q(P) = \frac{1}{q-1}\left(1 - \sum_m p_m^q\right).$$

The main property of Tsallis entropy is non-additivity. Let us consider a system coupled to a heat bath, and two disjoint subsystems that are statistically independent. In other words, the joint system is the Cartesian product of the marginal systems {A, B}, which also holds for the respective probabilities. The entropy of the joint system can be expressed in terms of the entropies of the subsystems as

$$S_q(A \otimes B) = S_q(A) + S_q(B) + (1-q)\, S_q(A)\, S_q(B).$$

In general, the entropy of the joint system $A \otimes B$ can be expressed as

$$S_q(A \otimes B) = S_q(A) + S_q(B|A) + (1-q)\, S_q(A)\, S_q(B|A),$$

where $S_q(B|A)$ is the conditional entropy. Closely related, but generally different, is the concept of extensivity. Let us consider a system with sample space W(N), where N denotes a characteristic size of the system. An entropy is called extensive if $S(W(N)) \sim N$ in the thermodynamic limit, i.e. $N \to \infty$. While additivity is a property purely depending on the form of the entropy, extensivity combines properties of the entropy and of the system itself. Therefore, for systems with exponentially growing sample space $W(N) \sim 2^N$, Tsallis entropy is non-extensive. On the other hand, for systems with polynomially growing sample space $W(N) \sim N^\rho$, Tsallis entropy is extensive for $\rho = 1/(1-q)$. Note that there exists some invariance in the sense that several entropies can lead to the same Max-Ent distribution [90]. For example, Tsallis entropy and Rényi entropy are both maximized by the q-exponential distribution. While Tsallis entropy is extensive for systems with polynomial sample-space growth, Rényi entropy is extensive for exponential sample-space growth. Thus the two lead to different thermodynamics.

Non-linear systems coupled to an infinite heat reservoir: let us consider the case where a non-linear system is coupled to an infinite heat bath [67]. In this case, the system is driven by a non-linear master equation (e.g. equation (4) from the main text). By considering local detailed balance, we connect the stationary distribution obtained from the infinite relaxation limit to the equilibrium distribution obtained by the maximization of entropy. By considering, e.g., Tsallis entropy, we obtain, similarly to the case of a finite heat bath, the q-exponential equilibrium distribution. However, this distribution arises not due to statistical correlations between the system and the bath but due to the system's non-linear nature. Since the bath is infinite and therefore uncorrelated with the system, the total entropy is simply the sum of the entropy of the system of interest and that of the bath (note that the bath is always in equilibrium and therefore changes only due to absorbed heat). For more details, see [67] and the references therein. On the other hand, if we divide the system into subsystems, the entropy of the whole system may not be a sum of the entropies of the subsystems, due to long-range correlations among the subsystems. This is the case, e.g., for Tsallis entropy.
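To verify the extensivity claim: for W equiprobable states, $S_q = (W^{1-q} - 1)/(1-q)$, so for polynomial sample-space growth $W(N) \sim N^\rho$,

$$S_q\big(W(N)\big) \sim \frac{N^{\rho(1-q)}}{1-q},$$

which grows linearly in N exactly when $\rho(1-q) = 1$, i.e. $\rho = 1/(1-q)$.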
Extension of the second law of thermodynamics: one of the original motivations for introducing generalized entropies was to obtain non-exponential distributions from the maximum entropy principle. However, to obtain a thermodynamically consistent entropy à la Clausius that can be used to calculate thermodynamic quantities, it is desirable to extend the validity of the laws of thermodynamics to generalized entropies as well. The validity of the second law for generalized entropies has been the subject of extensive and heated discussion. A work by Gorban [91] discusses the second law in connection with the H-theorem. Its conclusions are surprisingly similar to the results presented in this paper: for a given dynamics described by a master equation, one needs to find a proper information measure such that the H-theorem is fulfilled. This is also in agreement with [67]. Specifically, for the case of Tsallis entropy, several works show the validity of the second law for particular classes of systems (see, e.g., [92]). In our study, we do not consider a specific system but rather assume the validity of the second law. We believe that this assumption is plausible and fulfilled for a wide class of non-linear systems.
Extensivity issue for systems out of equilibrium: let us finally comment on the extension of the extensivity issue to the case of non-equilibrium systems driven by a master equation. We follow the setup from [93] and consider a system weakly coupled to an infinite heat bath. The system can be decomposed into two disjoint subsystems $X \otimes Y$. Let us consider, for the moment, that the total system evolves according to the linear master equation. It can be expressed as

$$\dot{p}_{xy}(t) = \sum_{x'y'} w_{xy,x'y'}\, p_{x'y'}(t).$$
The evolution of the subsystem X is obtained by coarse-graining over y, i.e.

$$\dot{p}_x(t) = \sum_{x'} W_{xx'}\, p_{x'}(t),$$

where $W_{xx'} := \sum_{y,y'} w_{xy,x'y'}\, p_{y'|x'}$, which is Markovian in the case of so-called 'time-scale separation' (see [93] for details).
The situation becomes even more complicated for the case of the non-linear equation (4) from the main text, since even in the case of time-scale separation it might not be possible to find a unique decomposition $\Omega(p_{xy}) = \Psi(p_{x|y})\, \Omega'(p_x)$ for some functions Ψ and Ω′. This means that while our analysis might be applicable to the whole system, it might not apply to its subsystems. This is perhaps not surprising, since non-linear master/Fokker-Planck equations typically arise from mean-field approximations of many-body systems in the thermodynamic limit. Thus, the rescaling of such systems might not make physical sense, or can change the form of the master equation.

Appendix B. Non-linear Markov chains and non-linear master equation
In this appendix we show how to derive a non-linear master equation from a Markov chain with probability-dependent transition rates. More details about non-linear Markov chains and their expression in terms of stochastic processes with probability-dependent transition rates can be found, e.g., in [64]. Consider a discrete-time stochastic process whose n-point distribution can be written in the form of a non-linear Markov process, i.e.

$$p(x_n, t_n; x_{n-1}, t_{n-1}; \ldots; x_0, t_0) = \left[\prod_{j=1}^{n} W(x_j, t_j | x_{j-1}, t_{j-1})\right] \Omega(p(x_0, t_0)). \qquad (B1)$$

Such a process is called a 'non-linear Markov chain'. Note that the function $W(x_j, t_j | x_{j-1}, t_{j-1})$ cannot be interpreted as a transition probability. This can easily be shown, e.g., for n = 2 by summing (B1) over $x_1$: we get that $\sum_{x_1} W(x_1, t_1 | x_0, t_0) = p(x_0, t_0)/\Omega(p(x_0, t_0))$, which generally differs from one.

Now consider a two-point distribution

$$p(x_1, t_1; x_0, t_0) = W(x_1, t_1 | x_0, t_0)\, \Omega(p(x_0, t_0)).$$

Then, the transition probability T that is defined in the usual way can be expressed as follows:

$$T(x_1, t_1 | x_0, t_0) = \frac{p(x_1, t_1; x_0, t_0)}{p(x_0, t_0)} = W(x_1, t_1 | x_0, t_0)\, \omega(p(x_0, t_0)),$$

where $\omega(z) = \Omega(z)/z$. Thus, the transition probability explicitly depends on the probability distribution $p(x_0, t_0)$, i.e. $T = T(x_1, t_1 | x_0, t_0; p(x_0, t_0))$.
Next consider a three-point distribution

$$p(x_2, t_2; x_1, t_1; x_0, t_0) = W(x_2, t_2 | x_1, t_1)\, W(x_1, t_1 | x_0, t_0)\, \Omega(p(x_0, t_0)).$$

This can be rewritten in terms of transition probabilities as

$$p(x_2, t_2; x_1, t_1; x_0, t_0) = \frac{T(x_2, t_2 | x_1, t_1)\, T(x_1, t_1 | x_0, t_0)}{\omega(p(x_1, t_1))}\, p(x_0, t_0).$$

By summation over $x_1$, we obtain the Chapman-Kolmogorov equation for non-linear Markov chains:

$$T(x_2, t_2 | x_0, t_0) = \sum_{x_1} \frac{T(x_2, t_2 | x_1, t_1)\, T(x_1, t_1 | x_0, t_0)}{\omega(p(x_1, t_1))}.$$

From the self-consistency requirement that this equation also hold at $t_2 = t_1$, we obtain that

$$\lim_{t_1 \to t_0} T(x_1, t_1 | x_0, t_0) = \delta_{x_1, x_0}\, \omega(p(x_0, t_0)).$$

For small times, we can therefore approximate the transition probability by the first-order expansion

$$T(x_1, t_0 + \Delta t | x_0, t_0) = \delta_{x_1, x_0}\, \omega(p(x_0, t_0)) + \Delta t\, \dot{T}(x_1 | x_0; t_0) + O(\Delta t^2).$$

By plugging this into the Chapman-Kolmogorov equation, reordering the terms, and taking the limit $\Delta t \to 0$, we establish an evolution equation for T. As shorthand, write $w(x_2 | x_1; t_1) = \dot{T}(x_2 | x_1, p(x_1, t_1))/\omega(p(x_1, t_1))$, multiply both sides of that equation by $p(x_0, t_0)$, and integrate over $x_0$. Then we get the final equation

$$\dot{p}(x, t) = \int \mathrm{d}y\, \left[ w(x | y; t)\, \Omega(p(y, t)) - w(y | x; t)\, \Omega(p(x, t)) \right],$$

which is the non-linear master equation from the main text. Note that $w(x | y; t)$ cannot be obtained as an infinitesimal transition probability; rather, it is obtained as

$$w(x | y; t) = \lim_{\Delta t \to 0} \frac{T(x, t + \Delta t | y, t) - \delta_{x,y}\, \omega(p(y, t))}{\Delta t\, \omega(p(y, t))}.$$

Appendix C. Proof of theorem 1

Proof. It will be useful to introduce the notation $J^\nu_{mn} = w^\nu_{mn}\, \Omega(p_n)$ and $C_f = f'\left(\sum_m g(p_m)\right)$. The time derivative of the entropy is

$$\dot{S} = C_f \sum_m g'(p_m)\, \dot{p}_m = \frac{C_f\, C(p)}{2} \sum_{m,n,\nu} \left( J^\nu_{mn} - J^\nu_{nm} \right) \left( g'(p_m) - g'(p_n) \right).$$

We now divide this expression into a sum of two terms:

$$\dot{S} = \frac{C_f\, C(p)}{2} \sum_{m,n,\nu} \left( J^\nu_{mn} - J^\nu_{nm} \right)\left( \Phi^\nu_{mn} - \Phi^\nu_{nm} \right) + \frac{C_f\, C(p)}{2} \sum_{m,n,\nu} \left( J^\nu_{mn} - J^\nu_{nm} \right)\left[ \left( g'(p_m) - g'(p_n) \right) - \left( \Phi^\nu_{mn} - \Phi^\nu_{nm} \right) \right],$$

where $\Phi^\nu_{mn}$ is yet undetermined. Next, we focus on the first term, identifying it as the entropy production rate:

$$\dot{S}_i = \frac{C_f\, C(p)}{2} \sum_{m,n,\nu} \left( J^\nu_{mn} - J^\nu_{nm} \right)\left( \Phi^\nu_{mn} - \Phi^\nu_{nm} \right).$$

According to requirement 3, $\dot{S}_i$ is non-negative. Moreover, it is equal to zero only if all currents vanish, i.e. in equilibrium. First, the constant $C_f$ is positive because f is increasing. Second, the non-negativity of the sum can be assured by requiring non-negativity of each term. From this, we have that $\Phi^\nu_{mn} < \Phi^\nu_{nm}$ if $J^\nu_{mn} < J^\nu_{nm}$ and vice versa. Thus, we can conclude that $\Phi^\nu_{mn}$ is a function of $J^\nu_{mn}$, i.e. $\Phi^\nu_{mn} = \phi(J^\nu_{mn})$. Again, because we can relabel the probabilities, the function φ does not explicitly depend on n, m or ν.

Now we focus on the second term. If the first term is the entropy production, then the second term must equal the entropy flow rate. The entropy flow is equal to

$$\dot{S}_e = \sum_\nu \beta_\nu\, \dot{Q}^\nu = \frac{C(p)}{2} \sum_{m,n,\nu} \beta_\nu \left( \epsilon_m - \epsilon_n \right) \left( J^\nu_{mn} - J^\nu_{nm} \right).$$

Write the difference of $\dot{S}_e$ and the second term as

$$\frac{C(p)}{2} \sum_{m,n,\nu} \left( J^\nu_{mn} - J^\nu_{nm} \right) \left\{ \beta_\nu \left( \epsilon_m - \epsilon_n \right) - C_f \left[ \left( g'(p_m) - g'(p_n) \right) - \left( \phi(J^\nu_{mn}) - \phi(J^\nu_{nm}) \right) \right] \right\}.$$

This sum can be positive or negative, depending on $p_m$. Since we require that $\dot{S}_i$ must be non-negative, vanishing only if the probability currents vanish, the difference must be zero regardless of $p_m$. Since $(J^\nu_{mn} - J^\nu_{nm})$ can be positive or negative depending on p, it means that the arguments in the curly brackets must be identically zero. Therefore we get a relation whose left-hand side depends on p, which should not be the case. First, it means that $C(p) = C_f(p)$. Second, it means that the relation in square brackets does not depend on $p_m$, from which we immediately get

$$\phi(J^\nu_{nm}) = j_\nu(w^\nu_{nm}) - g'(p_m), \qquad (C8)$$

or, more suggestively, $J^\nu_{nm} = \psi\big(j_\nu(w^\nu_{nm}) - g'(p_m)\big)$, where $j_\nu$ is an arbitrary function and $\psi = \phi^{-1}$. Since $J^\nu_{nm} = w^\nu_{nm}\, \Omega(p_m)$, we need to solve the following functional equation for the unknown functions ψ and $j_\nu$:

$$\psi\big(j_\nu(w^\nu_{nm}) - g'(p_m)\big) = w^\nu_{nm}\, \Omega(p_m).$$

This is a Cauchy multiplicative functional equation, which determines ψ and $j_\nu$ as $\psi(z) = \exp(z)$ and $j_\nu(z) = \log(z)$ (see [65]). By plugging these into the equation, we immediately obtain that $\Omega(p_m) = \exp(-g'(p_m))$.
By plugging the form of Ω into the local detailed balance, we immediately obtain that $\beta_\nu = 1/T_\nu$. This result holds for any g. This concludes the proof.
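Indeed, one can verify the stated solution directly:

$$\psi\big(j_\nu(w^\nu_{nm}) - g'(p_m)\big) = \exp\big(\log w^\nu_{nm} - g'(p_m)\big) = w^\nu_{nm}\, e^{-g'(p_m)},$$

which matches $J^\nu_{nm} = w^\nu_{nm}\, \Omega(p_m)$ precisely when $\Omega(p) = \exp(-g'(p))$.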

Appendix D. Fokker-Planck thermodynamics for nonlinear master equations
Here we make a connection between the non-linear master equation and the corresponding Fokker-Planck equation on a continuous state space. Consider a general master equation of the form of equation (20) on a continuous state space, rewritten in the following form:

$$\dot{p}(x, t) = \int \mathrm{d}r\, \left[ w(x - r | r)\, \Omega(p(x - r; t)) - w(x | -r)\, \Omega(p(x; t)) \right]. \qquad (D1)$$

Proceeding in the usual way, we expand the first term in powers of r around r = 0. The zeroth-order term cancels with the second term in (D1), and by truncation after the term quadratic in r we obtain

$$\dot{p}(x, t) = -\frac{\partial}{\partial x}\left[ a_1(x)\, \Omega(p(x; t)) \right] + \frac{1}{2}\, \frac{\partial^2}{\partial x^2}\left[ a_2(x)\, \Omega(p(x; t)) \right],$$

where $a_n(x) = \int \mathrm{d}r\, r^n\, w(x | r)$. Thus, we end up with the non-linear Fokker-Planck equation

$$\dot{p}(x, t) = -\frac{\partial}{\partial x} J(p(x, t), x, t), \qquad J = u(x, t)\, \Omega(p(x, t)) + D(x, t)\, \Omega(p(x, t))\, \frac{\partial}{\partial x} g'(p(x, t)), \qquad (D4)$$

with drift u and diffusion coefficient D determined by $a_1$ and $a_2$.
This equation is called the free energy nonlinear Fokker-Planck equation [67] due to its connection with thermodynamic quantities, as shown below. For simplicity, consider the case where f(x) = x, i.e. where the entropy has the trace-class form $S(t) = \int \mathrm{d}x\, g(p(x, t))$.
Evaluating the time derivative of the entropy,

$$\dot{S}(t) = \int \mathrm{d}x\, g'(p(x, t))\, \dot{p}(x, t) = -\int \mathrm{d}x\, g'(p(x, t))\, \frac{\partial}{\partial x} J(p(x, t), x, t) = \int \mathrm{d}x\, J(p(x, t), x, t)\, \frac{\partial}{\partial x} g'(p(x, t)),$$

where the last expression was obtained from integration by parts. By plugging in equation (D4), we get

$$\dot{S}(t) = \int \mathrm{d}x\, \frac{J^2(p(x, t), x, t)}{D(x, t)\, \Omega(p(x, t))} - \int \mathrm{d}x\, \frac{J(p(x, t), x, t)\, u(x, t)}{D(x, t)}.$$

Here we can identify, similarly to the case of Shannon entropy [66], the entropy flow rate as

$$\dot{S}_e(t) = -\int \mathrm{d}x\, \frac{J(p(x, t), x, t)\, u(x, t)}{D(x, t)},$$

and the irreversible entropy production rate as

$$\dot{S}_i(t) = \int \mathrm{d}x\, \frac{J^2(p(x, t), x, t)}{D(x, t)\, \Omega(p(x, t))} \geq 0.$$
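As a closing consistency check, note that theorem 1 gives $\Omega\, \partial_x g'(p) = -\partial_x \Omega(p)$, so the current in (D4) can be written $J = u\, \Omega(p) - D\, \partial_x \Omega(p)$ (for constant D). Setting J = 0 for a potential force $u(x) = -V'(x)$ then yields

$$\Omega\big(p_{\mathrm{st}}(x)\big) \propto e^{-V(x)/D} \quad\Longrightarrow\quad g'\big(p_{\mathrm{st}}(x)\big) = \frac{V(x)}{D} + \mathrm{const},$$

i.e. the stationary state is exactly the generalized equilibrium distribution of equation (3) with inverse temperature $\beta = 1/D$, for any admissible Ω.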