Minimal entropy production rate of interacting systems

Many systems are composed of multiple, interacting subsystems, where the dynamics of each subsystem depends only on the states of a subset of the other subsystems, rather than on all of them. I analyze how such constraints on the dependencies of each subsystem's dynamics affect the thermodynamics of the overall, composite system. Specifically, I derive a strictly nonzero lower bound on the minimal achievable entropy production rate of the overall system in terms of these constraints. The bound is based on constructing counterfactual rate matrices, in which some subsystems are held fixed while the others are allowed to evolve. This bound is related to the 'learning rate' of stationary bipartite systems, and more generally to the 'information flow' in bipartite systems. It can be viewed as a strengthened form of the second law, applicable whenever there are constraints on which subsystems within an overall system can directly affect which other subsystems.


Introduction
Many systems are naturally modeled as composite systems, with two or more interacting subsystems. For example, a biological cell is naturally modeled as a composite system, composed of many separate organelles and biomolecule species. As another example, digital devices are naturally modeled as a set of separate, interacting logical gates. Recent research in stochastic thermodynamics [7, 26, 32, 38] has started to investigate such composite systems [1, 9, 11, 13, 20-22]. Most of this research has considered the special case of bipartite processes [1, 2, 9, 11, 13, 20-22, 27, 28], i.e., systems composed of two co-evolving subsystems, whose states fluctuate according to independent noise processes (e.g., since they are physically separated and so are connected to different parts of any shared thermodynamic reservoirs). However, given that many systems have more than just two interacting subsystems, research is starting to extend to fully multipartite processes [10, 13, 40].
The definition of any composite system specifies which subsystems directly affect the dynamics of which other subsystems. It is now known that just by itself, such a specification of which subsystem affects which other one can cause a strictly positive lower bound on the entropy production rate (EP) of the overall composite system [4, 36, 38]. In contrast, if all subsystems were allowed to interact with all other subsystems, the minimal EP would be zero. Accordingly, this minimal EP due to constraints on which subsystem can affect the dynamics of which other subsystems has sometimes been called 'Landauer loss' [36-38]. Landauer loss can be viewed as a strengthened form of the second law, applicable whenever there are constraints on which subsystem within an overall system can directly affect the dynamics of which other subsystem.
Previous analyses of this strengthened second law focused on scenarios where at every moment, each subsystem evolves in isolation, in a 'modular' fashion, without any direct coupling to the other subsystems. This is a severe limitation of those analyses. As an illustration of a simple scenario not covered by such analyses, consider a composite system with three subsystems A, B and C. B evolves independently of A and C. However, B is continually observed by C as well as A. Moreover, suppose that A is really two subsystems, 1 and 2. Only subsystem 2 directly observes B, whereas subsystem 1 observes subsystem 2, e.g., to record a running average of the values of subsystem 2 (see figure 1).

Figure 1. Four subsystems, {1, 2, 3, 4}, interacting in a multipartite process. The red arrows indicate dependencies in the associated four rate matrices. B evolves autonomously, but is continually observed by A and C. (The implicit assumption that B is not affected by the back-action of the measurement holds for many real systems, such as colloidal particles and macromolecules [24].) So the statistical coupling between A and C could grow with time, even though their rate matrices do not involve one another. The three overlapping sets indicated at the bottom of the figure specify the three units of a unit structure for this process.
Physically, such a scenario arises whenever any of the many stochastic thermodynamics models of one classical system observing another classical system without any back-action [12, 20, 23, 27, 33, 34] are 'chained together'. As an example, [3, 9] consider a tripartite system where receptors in the wall of a cell observe the concentration level of a ligand in a surrounding medium, with no back-action on that concentration level, while a memory observes the state of those receptors, again with no back-action. This is exactly the scenario considered in figure 1, just without subsystem 4; subsystem 3 is the concentration level in the medium, subsystem 2 is the set of receptors in a cell observing that concentration level, and subsystem 1 is the memory within the cell observing the state of the ligand receptors. To extend this scenario to the precise scenario presented in figure 1 we just need to introduce a second cell observing the same medium as the first cell; subsystem 4 is the state of the receptors of that second cell.
To investigate the second law in such composite systems, here I model them as multipartite processes, in which each subsystem evolves according to its own rate matrix [10]. So restrictions on the direct coupling of any subsystem i to the other subsystems are modeled as restrictions on the rate matrix of subsystem i, to only involve a limited set of other subsystems, called the 'unit' of i.
In this paper I derive a lower bound on the EP of composite systems, by deriving an exact equation for that minimal EP rate as a sum of non-negative expressions. One of those expressions is related to quantities that were earlier considered in the literature. It reduces to what has been called the 'learning rate' in the special case of stationary bipartite systems [1,5,9]. That expression is also related to what (in a different context) has been called the 'information flow' between a pair of subsystems [10,11].

Rate matrix units
I write N for a particular set of N subsystems, with finite state spaces {X_i : i = 1, . . ., N}. x and x′ both indicate a vector in X, the joint space of N. For any A ⊂ N, I write −A := N \ A. So for example x_{−A} is the vector of all components of x other than those in A. A distribution over a set of values x at time t is written as p_X(t), with its value for x ∈ X written as p^X_x(t), or just p_x(t) for short. Similarly, p^{X|Y}_{x,y}(t) is the conditional distribution of X given Y at time t, evaluated for the event X = x, Y = y (which I sometimes shorten to p_{x|y}(t)). I write Shannon entropy as S(p_X(t)), S_t(X), or S_X(t), as convenient. I also write the conditional entropy of X given Y at t as S_{X|Y}(t). I write the Kronecker delta as both δ(a, b) and δ^a_b.

Since the joint system evolves as a multipartite process, there is a set of time-varying stochastic rate matrices, {K^{x′}_x(i; t) : i = 1, . . ., N}, where x refers to the current state of the system, while x′ refers to the next state; for all i, K^{x′}_x(i; t) = 0 if x′_{−i} ≠ x_{−i}; and the joint dynamics over X is governed by the master equation

dp_x(t)/dt = Σ_i Σ_{x′} K^x_{x′}(i; t) p_{x′}(t). (1)

Note that each subsystem can be driven by its own external work reservoir, according to a time-varying protocol. For any A ⊆ N I define

K^{x′}_x(A; t) := Σ_{i∈A} K^{x′}_x(i; t). (3)

Each subsystem i's marginal distribution evolves as

dp_{x_i}(t)/dt = Σ_{x_{−i}} Σ_{x′_i} K^{(x_i, x_{−i})}_{(x′_i, x_{−i})}(i; t) p_{(x′_i, x_{−i})}(t) (5)

due to the multipartite nature of the process [15]. Equation (5) shows that in general the marginal distribution p_{x_i} will not evolve according to a continuous-time Markov chain (CTMC) over X_i. For each subsystem i, I write r(i; t) for any set of subsystems at time t that includes i where we can write

K^{x′}_x(i; t) = K^{x′_{r(i;t)}}_{x_{r(i;t)}}(i; t) δ(x′_{−r(i;t)}, x_{−r(i;t)}) (6)

for an appropriate set of functions K^{x′_{r(i;t)}}_{x_{r(i;t)}}(i; t). In general, r(i; t) is not uniquely defined, since I make no requirement that it be minimal. (A minimal such r(i; t) is called a 'neighborhood' in [10].) I refer to the elements of r(i; t) as the leaders of i at time t. Note that the leader relation need not be symmetric.
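These definitions can be made concrete in a few lines of code. The following sketch (the two subsystems, their rates, and all function names are invented for illustration, not taken from the paper) assembles the joint rate matrix K(t) = Σ_i K(i; t) of a multipartite process, enforcing the constraint that K(i; t) vanishes unless only coordinate i changes, and Euler-integrates the master equation:

```python
import itertools
import numpy as np

def build_joint_rate_matrix(sub_rates, sizes):
    """Assemble the joint rate matrix K = sum_i K(i) of a multipartite process.

    sub_rates[i](xp, x) returns subsystem i's rate for the jump x -> xp.
    The multipartite constraint -- K(i) vanishes unless xp and x agree on
    every coordinate other than i -- is enforced explicitly below.
    Columns index the current state x, rows the next state xp."""
    states = list(itertools.product(*[range(n) for n in sizes]))
    idx = {s: k for k, s in enumerate(states)}
    K = np.zeros((len(states), len(states)))
    for i, rate in enumerate(sub_rates):
        for x in states:
            for xp in states:
                only_i_changed = (xp != x) and all(
                    xp[j] == x[j] for j in range(len(x)) if j != i)
                if only_i_changed:
                    K[idx[xp], idx[x]] += rate(xp, x)
    K -= np.diag(K.sum(axis=0))  # columns sum to zero: probability conserved
    return K, states

# Two binary subsystems: subsystem 0 is its own unit (r(0) = {0}), while
# subsystem 1's flip rate depends on its leader, subsystem 0.
rate0 = lambda xp, x: 1.0                        # flips at unit rate
rate1 = lambda xp, x: 2.0 if x[0] == 1 else 0.5  # rate set by the state of 0
K, states = build_joint_rate_matrix([rate0, rate1], [2, 2])

# Euler-integrate the master equation dp/dt = K p.
p = np.full(4, 0.25)
dt = 1e-3
for _ in range(2000):
    p = p + dt * (K @ p)
```

Since each column of K sums to zero, normalization of p is preserved exactly by each Euler step.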
A unit ω at time t is a set of subsystems such that i ∈ ω implies that r(i; t) ⊆ ω.
Any intersection of two units is a unit, as is any union of two units. Intuitively, a unit is any set of subsystems whose evolution is independent of the states of the subsystems outside the unit (although in general, the evolution of those external subsystems may depend on the states of subsystems in the unit). A specific set of units that covers N and is closed under intersections is a unit structure. Unless explicitly stated otherwise, any unit structure being discussed does not have N itself as a member.
As an example of these definitions, [1,8,9] investigate a special type of bipartite system, where the 'internal' subsystem B observes the 'external' subsystem A, but cannot affect the dynamics of that external subsystem. So A is its own unit, evolving independently of B, while B is not its own unit; its dynamics depends on the state of A as well as its own state. Another example of these definitions is illustrated in figure 1.
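The unit conditions are easy to check mechanically. Below is a small sketch using the leader sets as I read them off figure 1 (this reading is an assumption); it confirms that the three sets indicated there are units, and that units are closed under intersection and union:

```python
from itertools import combinations

# Leader sets read off figure 1 (my reading; each r(i) includes i itself):
# 1 observes 2, 2 observes 3, 3 is autonomous, 4 observes 3.
leaders = {1: {1, 2}, 2: {2, 3}, 3: {3}, 4: {3, 4}}

def is_unit(omega):
    """omega is a unit iff i in omega implies r(i) is a subset of omega."""
    omega = set(omega)
    return all(leaders[i] <= omega for i in omega)

# The three units of the unit structure indicated in figure 1:
units = [{3}, {3, 4}, {1, 2, 3}]

# Closure under intersection and union, checked for every pair of units:
closed = all(is_unit(a & b) and is_unit(a | b)
             for a, b in combinations(units, 2))
```

Note that, e.g., {4} alone is not a unit, since its leader set {3, 4} reaches outside it.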
For simplicity, from now on I assume that the set of units does not change with t. Accordingly I shorten r(i; t) to r(i). For any unit ω I write

K^{x′_ω}_{x_ω}(ω; t) := Σ_{i∈ω} K^{x′_ω}_{x_ω}(i; t) (7)

which is well defined, by equations (3) and (6). At any time t, for any unit ω, p_{x_ω}(t) evolves as a CTMC with rate matrix K(ω; t):

dp_{x_ω}(t)/dt = Σ_{x′_ω} K^{x_ω}_{x′_ω}(ω; t) p_{x′_ω}(t). (8)

(See appendix A.) So a unit evolves according to a self-contained CTMC, in contrast to the general case of a single subsystem (cf. equation (5)). I assume that each subsystem is attached to at most one thermal reservoir, and that all such reservoirs have the same temperature (or equivalently, that all subsystems are attached to separate, statistically independent parts of the same thermal reservoir [10]). Accordingly, the expected entropy flow (EF) rate of any unit ω ⊆ N at time t is the sum of the EF rates of the subsystems in ω:

⟨dQ_ω(t)/dt⟩ := Σ_{i∈ω} Σ_{x, x′} K^x_{x′}(i; t) p_{x′}(t) ln [K^x_{x′}(i; t) / K^{x′}_x(i; t)]

which I often shorten to ⟨Q̇_ω(t)⟩ [7, 32]. (Note that this is EF from ω into the environment.) Make the associated definition that the expected EP rate of ω at time t is

⟨σ^{ω;K}(t)⟩ := dS_{X_ω}(t)/dt + ⟨Q̇_ω(t)⟩

which I often shorten to ⟨σ_ω(t)⟩. I refer to ⟨σ_ω(t)⟩ as a local EP rate, and define the global EP rate as ⟨σ(t)⟩ := ⟨σ_N(t)⟩. For any unit ω, ⟨σ_ω(t)⟩ ≥ 0, since ⟨σ_ω(t)⟩ has the usual form of an EP rate of a single system. In addition, that lower bound of 0 is achievable, e.g., if K(ω; t) obeys detailed balance with respect to p_{x_ω}(t). Note also that while the EF of a unit is the sum of the EFs of the subsystems within the unit, in general that is not true for the EP. It is worth comparing the local EP rate to similar quantities that have been investigated in the literature. In contrast to ⟨σ_ω(t)⟩, the quantity 'σ_X' introduced in the analysis of (autonomous) bipartite systems in [28] is the EP of a single trajectory, integrated over time. More importantly, its expectation can be negative, unlike (the time-integration of) ⟨σ_ω(t)⟩. On the other hand, the quantity 'Ṡ_{X_i}' considered in the analysis of bipartite systems in [11] is a proper expected EP rate, and so is non-negative.
However, it (and its extension considered in [10]) is one term in a decomposition of the expected EP rate generated by a single unit; it does not concern the EP rate of an entire unit in a system with multiple units. In addition, the quantity 'σ_Ω' considered in [28] is also non-negative. However, it gives the total EP rate generated by a subset of all possible global state transitions, rather than the EP rate of a unit [16]. Finally, papers on the 'thermodynamics of information' consider scenarios in which two subsystems interact but the second one does not change its state [20, 23, 25]. Most of those papers use a variant of the usual definition of 'entropy production' whose expectation can be strictly negative, in contrast to ⟨σ_ω(t)⟩ [17].
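The local EP rate of a unit has the usual single-system form, so its non-negativity can be illustrated numerically. The sketch below (with an invented 3-state rate matrix) computes the EP rate of a CTMC and confirms it vanishes exactly for a distribution satisfying detailed balance:

```python
import numpy as np

def ep_rate(K, p):
    """Expected EP rate of a CTMC with rate matrix K (columns = current state)
    over distribution p:
        sigma = sum_{x' != x} K[x',x] p[x] ln( K[x',x] p[x] / (K[x,x'] p[x']) ),
    assuming dynamical reversibility (K[x',x] > 0 iff K[x,x'] > 0).
    Pairing each transition with its reverse shows every contribution has the
    form (a - b) ln(a/b) >= 0, hence sigma >= 0."""
    sigma = 0.0
    for x in range(len(p)):
        for xp in range(len(p)):
            if xp != x and K[xp, x] > 0 and p[x] > 0:
                fwd = K[xp, x] * p[x]
                rev = K[x, xp] * p[xp]
                sigma += fwd * np.log(fwd / rev)
    return sigma

# A 3-state rate matrix built to satisfy detailed balance w.r.t. pi:
# K[xp, x] = pi[xp] for xp != x, so K[xp, x] pi[x] = K[x, xp] pi[xp].
pi = np.array([0.5, 0.3, 0.2])
K = np.tile(pi[:, None], 3)
np.fill_diagonal(K, 0.0)
K -= np.diag(K.sum(axis=0))
```

At pi the EP rate is exactly zero; at any other strictly positive distribution it is strictly positive.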

EP bounds from counterfactual rate matrices
To analyze the minimal EP rate in multipartite processes, I need to introduce two more definitions. First, given any function f : Δ_X → ℝ (where Δ_X is the unit simplex over X) and any A ⊂ N (not necessarily a unit), define the A-(windowed) derivative of f(p(t)) under rate matrix K(t) as

d_A f(p(t))/dt := Σ_x [∂f(p)/∂p_x] Σ_{x′} K^x_{x′}(A; t) p_{x′}(t)

(see equation (3)). Intuitively, this is what the derivative of f(p(t)) would be if (counterfactually) only the subsystems in A were allowed to change their states.
In particular, the A-derivative of the conditional entropy of X given X_A is

d_A S_{X|X_A}(p(t))/dt := d_A [S(p_X(t)) − S(p_{X_A}(t))]/dt

which I sometimes write as just d_A S_{X|X_A} p(t)/dt. d_A S_{X|X_A} p(t)/dt measures how quickly the statistical coupling between X_A and X_{−A} changes with time, if rather than evolving under the actual rate matrix, the system evolved under a counterfactual rate matrix, in which x_{−A} is not allowed to change. In the special case of two subsystems, both leading each other, with A one of those subsystems, −d_A S_{X|X_A} p(t)/dt is the same as the 'information flow' analyzed in [11]. (See equation (4) in [10] for a generalization of information flow to multiple subsystems that is similar to the general expression −d_A S_{X|X_A} p(t)/dt considered here.) Since x_{−A} cannot change under the counterfactual rate matrix K(A; t), d_A S_{X|X_A} p(t)/dt is the derivative of the negative mutual information between X_A and X_{−A}, under that counterfactual rate matrix. In appendix C it is shown that in the special case that A is a unit, this expression is non-negative [18].
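The claim that d_A S_{X|X_A} p(t)/dt is non-negative when A is a unit can be checked by finite differences. In the sketch below (an invented two-subsystem example), A = {0} evolves under a counterfactual rate matrix that freezes x_1, and the conditional entropy indeed increases:

```python
import numpy as np

def entropy(p):
    p = p[p > 1e-15]
    return float(-(p * np.log(p)).sum())

def cond_entropy_given_0(p):
    """S(X | X_A) for A = {0}: S(X) - S(X_0), with p[x0, x1] the joint."""
    return entropy(p.ravel()) - entropy(p.sum(axis=1))

# Counterfactual rate matrix K(A; t) with A = {0}: only subsystem 0 jumps
# (at rate 1, independent of everything), so A is trivially a unit and x1
# is frozen.  States are ordered (x0, x1) = 00, 01, 10, 11.
KA = np.zeros((4, 4))
for x0 in range(2):
    for x1 in range(2):
        x, xp = 2 * x0 + x1, 2 * (1 - x0) + x1
        KA[xp, x] = 1.0
KA -= np.diag(KA.sum(axis=0))

# Finite-difference estimate of d_A S(X | X_A)/dt at a correlated p.
p = np.array([[0.4, 0.1],
              [0.2, 0.3]])          # p[x0, x1]
dt = 1e-5
p_next = (p.ravel() + dt * (KA @ p.ravel())).reshape(2, 2)
dA_S = (cond_entropy_given_0(p_next) - cond_entropy_given_0(p)) / dt
```

Mixing x_0 while x_1 is frozen can only destroy the correlation between them, so the conditional entropy grows, in line with the data-processing argument of appendix C.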
The second definition we need is a variant of ⟨σ^{ω;K}(t)⟩, which will be indicated by using subscripts rather than superscripts. For any A ⊆ B ⊆ N where B is a unit (but A need not be),

⟨σ_{K(A;t);B}(t)⟩ := Σ_{x_B, x′_B} K^{x′_B}_{x_B}(A; t) p_{x_B}(t) ln [K^{x′_B}_{x_B}(A; t) p_{x_B}(t) / (K^{x_B}_{x′_B}(A; t) p_{x′_B}(t))] (13)

which I abbreviate as ⟨σ_{K(A;t)}(t)⟩ when B = N. ⟨σ_{K(A;t)}(t)⟩ is a global EP rate, only evaluated under the counterfactual rate matrix K(A; t). Therefore it is non-negative. In contrast, ⟨σ^{ω;K}(t)⟩ is a local EP rate. In the special case that A = ω is a unit, these two EP rates are related by

⟨σ_{K(ω;t)}(t)⟩ = ⟨σ_ω(t)⟩ + d_ω S_{X|X_ω}(p(t))/dt. (14)

In appendix D it is shown that for any pair of units, ω and ω′ ⊂ ω,

⟨σ_ω(t)⟩ = ⟨σ_{ω′}(t)⟩ + ⟨σ_{K(ω\ω′;t);ω}(t)⟩ + d_{ω′} S_{X_ω|X_{ω′}}(p(t))/dt. (15)

(See figure 1 for an illustration of such a pair of units ω, ω′ ⊂ ω.) The first term on the rhs is the EP rate arising from the subsystems within unit ω′, and the second term is the 'left over' EP rate from the subsystems that are in ω but not in ω′. The third term is a time-derivative of the conditional entropy between those two sets of subsystems. All three of these terms are non-negative, so each of them provides a lower bound on the EP rate. Equation (15) is the major result of this paper. It holds no matter what the scale of the full system, so long as that system can be modeled as a multipartite process. In particular, setting ω = N and then consolidating notation by rewriting ω′ as ω, equation (15) shows that for any unit ω,

⟨σ(t)⟩ = ⟨σ_ω(t)⟩ + ⟨σ_{K(N\ω;t)}(t)⟩ + d_ω S_{X|X_ω}(p(t))/dt (16)

and therefore

⟨σ(t)⟩ ≥ d_ω S_{X|X_ω}(p(t))/dt (17)

(where the shorthand notation ⟨σ_{K(N\ω;t)}(t)⟩ := ⟨σ_{K(N\ω;t);N}(t)⟩ has been used). This lower bound on the expected EP rate can be evaluated without knowing any of the detailed physics occurring within unit ω, only knowing how the statistical coupling between ω and the rest of the subsystems evolves with time.
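The lower bound of equation (17) can be checked numerically even far from stationarity. The following sketch (illustrative rates, not from the paper) builds a bipartite process whose subsystem 0 is a unit, computes the global EP rate at an arbitrary distribution, and checks it against the counterfactual bound:

```python
import numpy as np

def entropy(p):
    p = p[p > 1e-15]
    return float(-(p * np.log(p)).sum())

def ep_rate(K, p):
    """sigma = sum_{x' != x} K[x',x] p[x] ln(K[x',x] p[x] / (K[x,x'] p[x']))."""
    sigma = 0.0
    for x in range(len(p)):
        for xp in range(len(p)):
            if xp != x and K[xp, x] > 0 and p[x] > 0:
                sigma += K[xp, x] * p[x] * np.log(
                    K[xp, x] * p[x] / (K[x, xp] * p[xp]))
    return sigma

# Bipartite process, states (x0, x1) = 00, 01, 10, 11.  Subsystem 0 is a
# unit (flip rate 1); subsystem 1 flips 0 -> 1 at rate 2 or 0.5 depending
# on x0, and 1 -> 0 at rate 1.  (Invented rates, for illustration only.)
def flip1_rate(x0, x1):
    if x1 == 0:
        return 2.0 if x0 == 1 else 0.5
    return 1.0

K = np.zeros((4, 4))
for x0 in range(2):
    for x1 in range(2):
        x = 2 * x0 + x1
        K[2 * (1 - x0) + x1, x] += 1.0                  # subsystem 0 jumps
        K[2 * x0 + (1 - x1), x] += flip1_rate(x0, x1)   # subsystem 1 jumps
K -= np.diag(K.sum(axis=0))

# K(omega; t) for the unit omega = {0}: keep only subsystem-0 transitions.
K_omega = np.zeros((4, 4))
for x0 in range(2):
    for x1 in range(2):
        K_omega[2 * (1 - x0) + x1, 2 * x0 + x1] = 1.0
K_omega -= np.diag(K_omega.sum(axis=0))

def cond_S(p):
    """S(X | X_omega) = S(X) - S(X_0)."""
    return entropy(p) - entropy(p.reshape(2, 2).sum(axis=1))

p = np.array([0.4, 0.1, 0.2, 0.3])   # an arbitrary (non-stationary) state
dt = 1e-6
bound = (cond_S(p + dt * (K_omega @ p)) - cond_S(p)) / dt
sigma = ep_rate(K, p)
```

Here `sigma - bound` equals the two non-negative EP terms of equation (16), so the inequality holds with a strict margin.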
As an example of equation (17), consider again the type of bipartite process analyzed in [1, 8, 9]. Suppose we set ω to contain only what in [8] is called the 'external' subsystem. Then if we also make the assumption of those papers that the full system is in a stationary state, dS_{X|X_ω}(t)/dt = 0. Therefore by equation (13),

d_ω S_{X|X_ω}(p(t))/dt = −d_{−ω} S_{X|X_ω}(p(t))/dt = d_{−ω} I(X_ω; X_{−ω})/dt. (19)

(The rhs of equation (19) is called the 'learning rate' of the internal subsystem about the external subsystem; see equation (8) in [5], noting that the rate matrix is normalized.) So in this scenario, equation (17) above reduces to equation 7 of [1], which lower-bounds the global EP rate by the learning rate. However, equation (17) lower-bounds the global EP rate even if the system is not in a stationary state, in contrast to the learning-rate bound, which requires stationarity [19]. More generally, equation (16) applies to arbitrary multipartite processes, not just those with two subsystems, and is an exact equality rather than just a bound.
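The stationary bipartite case can likewise be checked numerically. In the sketch below (invented rates), the 'external' subsystem 0 flips autonomously while the 'internal' subsystem 1 tries to copy it; at stationarity the information gained by 1's jumps balances the information destroyed by 0's jumps, and the global EP rate exceeds the learning rate:

```python
import numpy as np

def entropy(p):
    p = p[p > 1e-15]
    return float(-(p * np.log(p)).sum())

def mutual_info(p):
    """I(X0; X1) for a joint distribution p[x0, x1]."""
    return entropy(p.sum(axis=1)) + entropy(p.sum(axis=0)) - entropy(p.ravel())

def ep_rate(K, p):
    sigma = 0.0
    for x in range(len(p)):
        for xp in range(len(p)):
            if xp != x and K[xp, x] > 0 and p[x] > 0:
                sigma += K[xp, x] * p[x] * np.log(
                    K[xp, x] * p[x] / (K[x, xp] * p[xp]))
    return sigma

# States ordered (x0, x1) = 00, 01, 10, 11.  External subsystem 0 flips at
# rate 1; internal subsystem 1 strongly prefers to copy x0 (invented rates).
def flip1_rate(x0, x1):
    return 3.0 if x1 != x0 else 0.3

K0 = np.zeros((4, 4))   # subsystem-0 transitions only, i.e. K({0}; t)
K1 = np.zeros((4, 4))   # subsystem-1 transitions only, i.e. K({1}; t)
for x0 in range(2):
    for x1 in range(2):
        x = 2 * x0 + x1
        K0[2 * (1 - x0) + x1, x] = 1.0
        K1[2 * x0 + (1 - x1), x] = flip1_rate(x0, x1)
for M in (K0, K1):
    M -= np.diag(M.sum(axis=0))
K = K0 + K1

# Stationary distribution: solve K pi = 0 together with sum(pi) = 1.
A = np.vstack([K, np.ones(4)])
b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, b, rcond=None)[0]

dt = 1e-6
dI = lambda M: (mutual_info((pi + dt * (M @ pi)).reshape(2, 2))
                - mutual_info(pi.reshape(2, 2))) / dt
learning_rate = dI(K1)      # info gained about 0 by 1's jumps
info_decay = dI(K0)         # info destroyed by 0's jumps
sigma = ep_rate(K, pi)
```

At the stationary state the two windowed information derivatives cancel, which is the content of equation (19).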

Extensions
In some situations we can get an even more refined decomposition of the EP rate by substituting equation (15) into equation (16), to expand the first EP rate on the rhs of equation (16). This gives a larger lower bound on ⟨σ(t)⟩ than the one in equation (17). For example, if ω and ω′ ⊂ ω are both units under K(t), then

⟨σ(t)⟩ ≥ d_ω S_{X|X_ω}(p(t))/dt + d_{ω′} S_{X_ω|X_{ω′}}(p(t))/dt. (21)

Both of the terms on the rhs in equation (21) are non-negative. In addition, both can be evaluated without knowing the detailed physics occurring within units ω or ω′, only knowing how the statistical coupling between units evolves with time. This can be illustrated with the scenario depicted in figure 1. Using the units ω and ω′ specified there, equation (21) says that the global EP rate is lower-bounded by the sum of two terms. The first is the derivative of the negative mutual information between subsystem 4 and the first three subsystems, if subsystem 4 were held fixed. The second is the derivative of the negative mutual information between subsystem 3 and the first two subsystems, if those two subsystems were held fixed.
Alternatively, suppose that ω is a unit under K, and that some set of subsystems α is a unit under K(N \ ω; t). Then since the term ⟨σ_{K(N\ω;t)}(t)⟩ in equation (16) is a global EP rate over N under rate matrix K(N \ ω; t), we can again feed equation (15) into equation (16) [this time to expand the second rather than first term on the rhs of equation (16)] to get

⟨σ(t)⟩ = ⟨σ_ω(t)⟩ + d_ω S_{X|X_ω}(p(t))/dt + ⟨σ_{α;K(N\ω;t)}(t)⟩ + ⟨σ_{K((N\ω)\α;t)}(t)⟩ + d_{α;K(N\ω;t)} S_{X|X_α}(p(t))/dt (22)

and therefore

⟨σ(t)⟩ ≥ d_ω S_{X|X_ω}(p(t))/dt + d_{α;K(N\ω;t)} S_{X|X_α}(p(t))/dt. (23)

The rhs of equation (23) also exceeds the bound in equation (17), by the negative α-derivative of the mutual information between X_{N\α} and X_α, under the rate matrix K(N \ ω; t).
Note that depending on the full unit structure, we may be able to combine equations (15) and (22) into an even larger lower bound on the global EP rate than equation (23). An example of this is illustrated below, in section 5. (Indeed, the more subsystems the overall system contains, the more times one might be able to iterate this process, getting progressively larger and larger lower bounds.)

As a final comment, σ is the standard expected EP considered in the literature, i.e., the rate of increase of entropy in the combination of the full system and all baths that are connected to it. As a result, the powerful results of stochastic thermodynamics concerning entropy production (fluctuation theorems, thermodynamic uncertainty relations, speed-limit theorems, etc.) all hold when the EP is set to σ, directly, without any modification. (Those results do not all hold for some of the variants of entropy production discussed at the end of section 2.) As an illustration, write the EP of unit ω during some time interval as Δσ_ω = ∫ dt σ_ω, and define Δσ_{K(N\ω)} similarly. Then plugging equation (17) into the usual integral fluctuation theorem gives

⟨exp(−Δσ_ω − Δσ_{K(N\ω)} − ∫ dt d_ω S_{X|X_ω}(p(t))/dt)⟩ = 1 (25)

(recall that the windowed derivative in the integrand in the exponential is not a proper time-derivative).
Using the non-negativity of EP during the process, equation (25) gives

∫ dt d_ω S_{X|X_ω}(p(t))/dt ≥ −⟨Δσ_ω⟩ − ⟨Δσ_{K(N\ω)}⟩.

This inequality characterizes how the mutual information between ω and the rest of the system can vary during the process, independent of all details of the physical process besides the fact that ω is a unit.

Example
To illustrate some of the results above, return to the physical scenario depicted in figure 1, in which there are two distinct cells in a medium, both observing the concentration level of a ligand in that medium. Recall that subsystem 3 is the concentration level, subsystem 2 is the set of receptors in the first cell observing that concentration level, and subsystem 1 is the memory within that first cell, which observes the state of that cell's ligand receptors.
Set ω := {1, 2, 3} and α := {3, 4}, and note that K(N \ ω; t) = K({4}; t). Then α is a unit under K(N \ ω; t), since in fact, under the rate matrix K^{x′}_x({4}; t), none of subsystems 1, 2 and 3 changes its state. So α is a member of a unit structure of K(N \ ω; t), and we can apply equation (22).
The first term in equation (22), ⟨σ_ω(t)⟩, is the local EP rate that would be jointly generated by the set of three subsystems {1, 2, 3}, if they evolved in isolation from the other subsystem, under the self-contained rate matrix K(ω; t) = K({1, 2, 3}; t). The third term in equation (22) is the local EP rate that would be jointly generated by the two subsystems {3, 4}, if they evolved in isolation from the other two subsystems, but rather than do so under the rate matrix K(α; t) = K({3, 4}; t), they did so under the rate matrix K^{x′}_x(N \ ω; t) = K^{x′}_x({4}; t) given in equation (26). The fourth term in equation (22) is the global EP rate that would be generated by evolving all four subsystems under the rate matrix for the subsystems in (N \ ω) \ α. But there are no subsystems in that set. So this fourth term is zero.
Those first, third and fourth terms in equation (22) are all non-negative. The remaining two terms, the second and the fifth, are also non-negative. However, in contrast to the terms just discussed, these two depend only on derivatives of mutual informations. Specifically, the second term in equation (22) is the negative derivative of the mutual information between the joint random variable X_{1,2,3} and X_4, under the rate matrix K^{x′}_x({1, 2, 3}; t). Next, since N \ α = {1, 2}, the fifth term is the negative derivative of the mutual information between X_{1,2} and X_{3,4}, under the rate matrix given by windowing α onto K(N \ ω; t), i.e., under the rate matrix K^{x′}_x({4}; t). Recalling that ω := {1, 2, 3}, α := {3, 4} and defining γ := {4}, we can combine these results to express the global EP rate of the system illustrated in figure 1 in terms of the rate matrices of the four subsystems:

⟨σ(t)⟩ = ⟨σ_{ω;K(ω;t)}(t)⟩ + d_ω S_{X|X_ω}(p(t))/dt + ⟨σ_{α;K(γ;t)}(t)⟩ + d_γ S_{X|X_α}(p(t))/dt. (28)

All four terms on the rhs of equation (28) are non-negative. Translated to this scenario, previous results concerning learning rates consider the special case of a stationary state p_x(t), and only tell us that the global EP rate is bounded by the fourth term on the rhs of equation (28):

⟨σ(t)⟩ ≥ d_γ S_{X|X_α}(p(t))/dt.

Finally, note that we also have a unit ω′ = {3} which is a proper subset of both ω and α. So, for example, we can plug this ω′ into equation (15) to expand the first term in equation (22), ⟨σ_{ω;K(ω;t)}(t)⟩, replacing it with the sum of three terms. The first of these three new terms, ⟨σ_{ω′;K(ω;t)}(t)⟩, is the local EP rate generated by subsystem {3} evolving in isolation from all the other subsystems. The second of these new terms, ⟨σ_{K(ω\ω′;t);ω}(t)⟩, is the EP rate that would be generated if the set of three subsystems {1, 2, 3} evolved in isolation from the remaining subsystem, 4, but under the rate matrix

K^{x′_ω}_{x_ω}(ω \ ω′; t) = K^{x′_ω}_{x_ω}({1, 2}; t).

The third new term is the negative derivative of the mutual information between X_{1,2} and X_3, under rate matrix K(ω′; t). All three of these new terms are non-negative.

Discussion
There are other decompositions of the global EP rate which are of interest, but do not always provide non-negative lower bounds on the EP rate. One of them, discussed in appendix E, generalizes the results in [38], which relate 'subsystem Landauer loss' to multi-information. Future work involves combining these (and other) decompositions, to get even larger lower bounds.

Appendix A. Proof of equation (8)
Write dp_{x_ω}(t)/dt = Σ_{x_{−ω}} dp_x(t)/dt and expand the rhs using the master equation. If j ∉ ω, then a sum over all x_{−ω} in particular runs over all x_j. Therefore, since each K(j; t) conserves probability over x_j, all terms involving subsystems j ∉ ω vanish, and we get

dp_{x_ω}(t)/dt = Σ_{i∈ω} Σ_{x_{−ω}} Σ_{x′} K^x_{x′}(i; t) p_{x′}(t).

Using the fact that we have a multipartite process and then the fact that ω is a unit (so that r(i) ⊆ ω for all i ∈ ω), we can expand this remaining expression as

dp_{x_ω}(t)/dt = Σ_{i∈ω} Σ_{x′_ω} K^{x_ω}_{x′_ω}(i; t) p_{x′_ω}(t).

To complete the proof plug in the definition of K^{x_ω}_{x′_ω}(ω; t).
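The content of this proof (that a unit's marginal evolves under a self-contained CTMC, equation (8)) can be confirmed exactly in a small example. Below (invented rates, for illustration), three binary subsystems have r(0) = {0}, r(1) = {0, 1}, r(2) = {1, 2}; marginalizing the full master equation over x_2 reproduces the unit ω = {0, 1}'s own master equation to machine precision:

```python
import itertools
import numpy as np

def joint_K(sub_rates, sizes):
    """K = sum_i K(i) for a multipartite process; columns = current state."""
    states = list(itertools.product(*[range(n) for n in sizes]))
    idx = {s: k for k, s in enumerate(states)}
    K = np.zeros((len(states), len(states)))
    for i, rate in enumerate(sub_rates):
        for x in states:
            for xp in states:
                if xp != x and all(xp[j] == x[j]
                                   for j in range(len(x)) if j != i):
                    K[idx[xp], idx[x]] += rate(xp, x)
    K -= np.diag(K.sum(axis=0))
    return K

# Three binary subsystems with r(0) = {0}, r(1) = {0, 1}, r(2) = {1, 2}.
r0 = lambda xp, x: 1.0
r1 = lambda xp, x: 2.0 if x[0] == 1 else 0.5
r2 = lambda xp, x: 3.0 if x[1] == 1 else 0.25
K_full = joint_K([r0, r1, r2], [2, 2, 2])

# The self-contained rate matrix K(omega) over (x0, x1) reuses the same
# subsystem rates, which is legal precisely because r(0), r(1) ⊆ omega.
K_unit = joint_K([lambda xp, x: 1.0,
                  lambda xp, x: 2.0 if x[0] == 1 else 0.5], [2, 2])

rng = np.random.default_rng(1)
p = rng.random(8); p /= p.sum()

def marg(p):
    """Marginalize a joint distribution over x2, giving p(x0, x1)."""
    return p.reshape(2, 2, 2).sum(axis=2).ravel()

# d/dt of the unit's marginal computed two ways: by marginalizing the full
# master equation, and by the unit's own self-contained master equation.
lhs = marg(K_full @ p)
rhs = K_unit @ marg(p)
```

The subsystem-2 terms cancel identically under the marginalization, exactly as in the proof above.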

Appendix B. Expansions of EP rates in multipartite processes
This appendix derives an expansion of EP rates that is used in the main text and the other appendices.

Lemma 1. Suppose we have a multipartite process over a set of systems N defined by a set of rate matrices {K^{x′}_x(i; t)} and a subset A ⊆ N. Then equation (B1) holds, and it can be rewritten as equation (B2). If in addition A is a unit under K, then we can also write the quantity in equation (B1) as in equation (B3).

Proof. Invoking the multipartite nature of the process allows us to write equation (B2). To establish equation (B3), use the hypothesis that A is a unit to expand the rate matrix K(A; t) in terms of the rate matrices of the subsystems in A.

Appendix C. Proof that if A is a unit, then d_A S_{X|X_A}(t)/dt ≥ 0
First simplify notation by using P rather than p to indicate joint distributions that would evolve if K(t) were replaced by the counterfactual rate matrix K(A; t), starting from p_x(t). By definition, However, since by hypothesis A is a unit, Plugging this into equation (C1), summing both sides over x_A(t + δt), and using the normalization of K(A; t) shows that to leading order in δt, Equation (C3) in turn implies that to leading order in δt, This formalizes the statement in the text that under the rate matrix K(A), x_{−A} does not change its state. Next, since A is a unit under K(A; t), we can expand further to get So the full joint distribution is I can use this form of the joint distribution to establish the following two equations Applying the chain rule for entropy to decompose S_P(X_{−A}(t), X_{−A}(t + δt) | X_A(t + δt)) in two different ways, and plugging equations (C8) and (C9), respectively, into those two decompositions, we see that (C10) holds. Next, use equation (C10) to expand d_{A;K} S_{X|X_A}(t)/dt. Add and subtract S_P(X_{−A}(t)) in the numerator on the rhs to get

d_{A;K} S_{X|X_A}(t)/dt = −lim_{δt→0} [I_P(X_{−A}(t); X_A(t + δt)) − I_P(X_{−A}(t); X_A(t))] / δt. (C13)
Since X −A (t) and X A (t + δt) are conditionally independent given X A (t), we have a Markov chain X −A (t) ↔ X A (t) ↔ X A (t + δt). So we can apply the data-processing inequality [6] to establish that the difference of mutual informations in the numerator on the rhs of equation (C13) is non-positive. This completes the proof.
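The data-processing inequality invoked here is straightforward to confirm numerically. The sketch below builds an arbitrary Markov chain X → Y → Z from random stochastic matrices and checks that I(X; Z) ≤ I(X; Y):

```python
import numpy as np

def mutual_info(pxy):
    """I(X;Y) from a joint distribution matrix pxy[x, y]."""
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / np.outer(px, py)[mask])).sum())

rng = np.random.default_rng(0)
px = rng.random(3); px /= px.sum()
T_yx = rng.random((3, 4)); T_yx /= T_yx.sum(axis=1, keepdims=True)  # p(y|x)
T_zy = rng.random((4, 5)); T_zy /= T_zy.sum(axis=1, keepdims=True)  # p(z|y)

pxy = px[:, None] * T_yx            # joint of (X, Y)
pxz = px[:, None] * (T_yx @ T_zy)   # joint of (X, Z); Z sees X only via Y
```

Any choice of the random seed and channel dimensions gives the same conclusion, since the inequality holds for every Markov chain.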

Appendix D. Proof of equation (15)
For simplicity of the exposition, treat ω as though it were all of N, i.e., suppress the ω index in x_ω and x′_ω, suppress the ω argument of K(ω; t), and implicitly restrict sums over subsystems i to elements of ω. Then using the definition of K(ω′; t), we can expand the local EP rate ⟨σ_ω(t)⟩.
Example 2. Equation (E9) can be particularly useful when combined with the fact that for any two units ω, ω′ ⊂ ω, ⟨σ_ω⟩ ≥ ⟨σ_{ω′}⟩ (see equation (15)). To illustrate this, return to the scenario of figure 1. There are three units in N_1 (namely, {1, 2, 3}, {3, 4} and {3}). Note that the derivatives in equation (E20) are conventional, non-windowed derivatives; none of the terms in equation (E20) involve counterfactual rate matrices. All of the local EPs in equation (E19) can equal 0, in a quasi-static process. In addition, ⟨σ(t)⟩ ≥ 0. Therefore dS_{4|1,2,3}(t)/dt − dS_{4|3}(t)/dt ≤ 0, i.e., the conditional mutual information between subsystem 4 and subsystems {1, 2}, given subsystem 3, cannot shrink.

As a final comment, it is worth noting that in contrast to multi-information, in some situations the in-ex information can be negative. (In this it is just like some other extensions of mutual information to more than two variables [14, 31].) As an example, suppose N = 6, and label the subsystems as N = {12, 13, 14, 23, 24, 34}. Then take N_1 to have four elements, {12, 13, 14}, {23, 24, 12}, {34, 13, 23} and {34, 24, 14}. (So the first element consists of all subsystems whose label involves a 1, the second consists of all subsystems whose label involves a 2, etc.) Also suppose that with probability 1, the state of every subsystem is the same. Then if the probability distribution of that identical state is p, the in-ex information is −S(p) + 4S(p) − 6S(p) = −3S(p) ≤ 0.