Strengthened second law for multi-dimensional systems coupled to multiple thermodynamic reservoirs

The second law of thermodynamics can be formulated as a restriction on the evolution of the entropy of any system undergoing Markovian dynamics. Here I show that this form of the second law is strengthened for multi-dimensional, complex systems, coupled to multiple thermodynamic reservoirs, if we have a set of a priori constraints restricting how the dynamics of each coordinate can depend on the other coordinates. As an example, this strengthened second law (SSL) applies to complex systems composed of multiple physically separated, co-evolving subsystems, each identified as a coordinate of the overall system. In this example, the constraints concern how the dynamics of some subsystems are allowed to depend on the states of the other subsystems. Importantly, the SSL applies to such complex systems even if some of its subsystems can change state simultaneously, which is prohibited in a multipartite process. The SSL also strengthens previously derived bounds on how much work can be extracted from a system using feedback control, if the system is multi-dimensional. Importantly, the SSL does not require local detailed balance. So it potentially applies to complex systems ranging from interacting economic agents to co-evolving biological species. This article is part of the theme issue ‘Emergent phenomena in complex physical and socio-technical systems: from cells to societies’.


Introduction
Statistical physics concerns experimental scenarios where we have restricted information concerning the state of a system x ∈ X, which is quantified as a probability distribution over those states, p_x(t). In particular, the recently developed variant of statistical physics called "stochastic thermodynamics" concentrates on systems that evolve according to a continuous time Markov chain (CTMC). For a countable state space, this means that p_x(t) evolves according to a linear differential equation,

\frac{d p_x(t)}{dt} = \sum_{x'} K_{x x'}(t) \, p_{x'}(t).   (1.1)

(Note that the rate matrix K(t) can depend on time t.) Analyzing systems which evolve according to Eq. (1.1) has led to formulations of the second law of thermodynamics which apply even if the system is evolving while arbitrarily far out of thermal equilibrium [35,42]. If we apply one of these formulations of the second law to any system evolving according to Eq. (1.1) while coupled to a single (infinite) heat bath at temperature T, and assume that the rate matrix is related to an underlying Hamiltonian via local detailed balance (LDB), we get

\frac{Q}{T} \le \Delta S,   (1.2)

where Q is the total heat flow into the system from its heat bath during the dynamics, and ΔS is the change in Shannon entropy of the system during the process. If LDB does not hold, Eq. (1.2) will not hold either, if we wish to interpret Q as thermodynamic heat flow. However, for any rate matrix, regardless of whether it obeys LDB,

\int_{t_i}^{t_f} dt \sum_{x \ne x'} K_{x x'}(t) \, p_{x'}(t) \ln \frac{K_{x' x}(t)}{K_{x x'}(t)} \le \Delta S   (1.3)

(for a process lasting from time t_i to t_f). The quantity on the LHS of Eq. (1.3) is called the total expected entropy flow (EF) into the system during the process. The difference between the entropy change of the system (the RHS of Eq. (1.3)) and the EF is called the entropy production (EP), written as σ. So Eq. (1.3) can be re-expressed as

\sigma = \Delta S - \mathrm{EF} \ge 0.   (1.4)

Crucially, the inequality Eq. (1.4) holds for any CTMC, even a CTMC that has no thermodynamic interpretation, i.e., a CTMC which models a process that does not involve energy transduction. So Eq. (1.4) applies to dynamic models of everything from stock markets to the evolution of the joint state of an opinion network, so long as those models are CTMCs.

In many experimental scenarios, while we are restricted in the information we have concerning the system's state, we also have some other information that does not directly concern the system's state, in the form of conditions satisfied by the dynamics of the system. Recently Eq. (1.4) has been strengthened, by adding non-positive terms to the RHS that incorporate this kind of information concerning the dynamics. Examples of these new results include "thermodynamic uncertainty relations" (TURs [9,15,21,23]), "speed limit theorems" (SLTs [11,28,36,43,50]), "thermodynamic first passage bounds" [8,10,26,27,30], etc.

Unlike Eq. (1.4) though, these bounds require measuring variables as they change during the process, in addition to knowing the beginning and ending distributions, p_x(t_i) and p_x(t_f). (For example, TURs rely on measuring accumulated currents, and speed limit theorems rely on measuring integrated activity.) This limits their experimental applicability.
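Since Eq. (1.4) holds for any CTMC, it can be checked numerically. The following sketch (the 4-state system and all rates are my own illustrative choices, not taken from the paper) evolves Eq. (1.1) by Euler integration, with no local detailed balance imposed, and confirms that the accumulated EP, ΔS minus the EF, is non-negative:

```python
import math
import random

# Sanity check of Eq. (1.4): for a randomly chosen rate matrix with no LDB
# imposed, the entropy production sigma = Delta S - EF is non-negative.
random.seed(0)
n = 4
# Off-diagonal entries K[x][xp] are the rates of jumping xp -> x.
K = [[random.uniform(0.1, 1.0) for _ in range(n)] for _ in range(n)]
for x in range(n):
    K[x][x] = 0.0
    K[x][x] = -sum(K[y][x] for y in range(n))  # each column sums to zero

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

def ep_rate(p):
    # sum_{x != x'} K[x][x'] p[x'] ln( K[x][x'] p[x'] / (K[x'][x] p[x]) )
    return sum(K[x][xp] * p[xp] * math.log(K[x][xp] * p[xp] / (K[xp][x] * p[x]))
               for x in range(n) for xp in range(n) if x != xp)

p = [0.7, 0.1, 0.1, 0.1]
S0 = entropy(p)
dt, sigma = 1e-4, 0.0
for _ in range(10000):  # evolve for total time t_f - t_i = 1 by Euler steps
    sigma += ep_rate(p) * dt
    p = [p[x] + dt * sum(K[x][xp] * p[xp] for xp in range(n)) for x in range(n)]

dS = entropy(p) - S0
ef = dS - sigma  # total expected entropy flow into the system, Eq. (1.3)
print(sigma >= 0.0)
```

Note that only the beginning and ending distributions and the rate matrix enter here; no LDB or thermodynamic interpretation of K is used, which is exactly the point of Eq. (1.4).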
In this paper, I derive new strengthened forms of Eq. (1.4) that, like the TURs and speed limit theorems, incorporate information concerning the dynamics of the system. However, unlike the TURs, speed limit theorems, etc., these new strengthened forms of Eq. (1.4) do not require measuring variables as they change during the process.

Figure 1. Dependencies in the rate matrix of the overall system. So for example B evolves autonomously, but is continually observed by A and C. (The implicit assumption that B is not affected by the back-action of the observation holds for many real systems such as colloidal particles and macromolecules [33].) Note that the statistical coupling between A and C could grow with time, even though the rate matrix does not directly couple their dynamics. The three overlapping sets indicated at the bottom of the figure specify the three units of a unit structure for this process, as discussed in the text. As an illustration of some of the definitions below, there is one reservoir coupled to the system that has subsystem 2 as its puppet set, with both subsystems 2, 3 as its leader set.
These strengthened forms of Eq. (1.4) apply whenever we have information about which of the coordinates of the system can have their dynamics directly depend on which of the other coordinates. Formally, such information takes the form of constraints on the rate matrix K(t) of the CTMC governing the dynamics of the system. (See also [20].) I call this kind of restriction on the allowed dynamics a "dependency constraint".
As an example, consider a random walker over a two-dimensional finite lattice, Y_1 × Y_2. For simplicity take |Y_1| = |Y_2| = LN for two positive integers L, N. The lattice is coarse-grained into a set of N^2 non-overlapping squares each of size L × L, and the position of the walker in the lattice is represented three-dimensionally, by a pair of coordinates X_1 = {1, . . ., L}, X_2 = {1, . . ., L} and an integer x_3 ∈ X_3 = {1, . . ., N^2}. (The value x_3 ∈ X_3 specifies the precise coarse-grained square, while (x_1, x_2) ∈ X_1 × X_2 specifies the coordinates within that square.) In addition to position in the lattice, the walker has internal stores of two nutrients, A and B, specified (up to some coarse-graining) by values in the finite sets X_A and X_B, respectively. So the state space of the walker is X = X_A × X_B × ∏_{i=1}^{3} X_i, i.e., those five variables are the five coordinates of the walker. We can suppose that both x_1 and x_2 evolve autonomously, independently of all other variables, according to two associated rate matrices, i.e., the walker engages in two independent random walks, one in each of the two directions across the lattice. Note though that x_3's dynamics will depend on x_1 and x_2 in general, and that there will sometimes be simultaneous transitions of x_3 and some other coordinate. For example, suppose x_1 = L, so the walker is at the extreme value of X_1 within some square, adjacent to the next coarse-grained square. Suppose as well that in the next step, the walker moves into that adjacent square. So x_1 changes to 1 while simultaneously x_3 must also change, since the coarse-grained square changes. However, for other changes in x_1, x_3 remains unchanged.
We can also suppose that the dynamics of x_A ∈ X_A depends only on the walker's current position in X_1 and their current amount of x_A, i.e., it depends only on (x_A, x_1). Similarly, the dynamics of x_B ∈ X_B depends only on (x_B, x_2). (For example, this would be the case if densities of those two nutrients were arranged appropriately across the lattice, and the walker at a given location accumulates those nutrients based on their densities at that location.) Summarizing, the dependency constraints are that x_A depends only on x_1 (in addition to depending on its own state), x_B depends only on x_2 (in addition to its own state), x_1 and x_2 are autonomous, while x_3 can depend on x_1 and / or x_2 (in addition to itself).
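These dependency constraints can be encoded and checked mechanically. The following sketch is my own illustration (the coordinate labels, the `deps` table, and the sample rate function are assumptions, not the paper's code): it tests that a candidate transition-rate function for x_A really is invariant under changes to the coordinates it is forbidden to read.

```python
# Allowed dependencies of each coordinate's rates (besides its own state):
deps = {'A': {1}, 'B': {2}, 1: set(), 2: set(), 3: {1, 2}}

def rate_A(xA_new, x):
    # Uptake rate of nutrient A: by the stated constraint it may read only
    # (x_A, x_1). Here the rate grows with the walker's x_1 position.
    return 0.5 * x[1] if xA_new == x['A'] + 1 else 0.0

def respects(rate, i, x, forbidden):
    # The rate must be invariant when any forbidden coordinate is perturbed.
    for j in forbidden:
        y = dict(x)
        y[j] = x[j] + 1
        if rate(x[i] + 1, y) != rate(x[i] + 1, x):
            return False
    return True

x = {'A': 0, 'B': 0, 1: 1, 2: 1, 3: 0}
forbidden_A = {'B', 2, 3} - deps['A']  # coordinates x_A's dynamics may not read
print(respects(rate_A, 'A', x, forbidden_A))
```

The same check, applied to every coordinate, is a direct operational reading of a "dependency constraint": a rate matrix obeys the constraint iff no forbidden coordinate ever changes any of its entries.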
There are many other dynamic processes over a state space which obey a set of dependency constraints among the coordinates of the state space, but where those coordinates do not specify characteristics of a single agent like a walker traversing a lattice. For this special issue on the topic of 'Emergence', perhaps the most important type of system that evolves subject to dependency constraints is a system that comprises a set of physically separated subsystems, co-evolving with one another, with each subsystem's state being identified as a different coordinate [16,46,47]. In this kind of system, dependency constraints governing the dynamics of each coordinate, specifying which other coordinates can directly affect its dynamics, amount to constraints on the dynamics of each subsystem, specifying which other subsystems can directly affect its dynamics. As a concrete illustration, consider the scenario investigated in [2,13], in which receptors in the wall of a cell sense the concentration of a ligand in the extracellular medium, and those receptors are in turn observed by a "memory" subsystem inside the cell. Modify this scenario by introducing a second cell, which is observing the same external medium as the first cell. Assume that the cells are far enough apart physically so that their dynamics are independent of one another. This gives us the precise scenario in Fig. 1, where subsystem 3 is the concentration in the external medium, subsystem 2 is the state of the receptors of the first cell, subsystem 1 is the memory subsystem of the first cell, and subsystem 4 is the state of the receptors of the second cell.
In this paper I consider how a set of dependency constraints on an evolving system affects its thermodynamics. My main result shows how such a set of dependency constraints can strengthen Eq. (1.4), by adding an expression to its RHS. This expression involves only those dependency constraints and the starting and ending distributions of the system. As a caveat, this new lower bound on EP is not always positive, i.e., it is not always stronger than the conventional second law, Eq. (1.4). However, I show below that for any set of dependency constraints, there is a conditional distribution p(x(t_f) | x(t_i)) that can be implemented by a rate matrix obeying those constraints, together with an initial distribution p(x(t_i)), such that every rate matrix that implements that conditional distribution must result in a non-negative EP when applied to that initial distribution. Indeed, for some sets of dependency constraints, this new EP bound is stronger than the conventional second law no matter what p(x(t_i)) and p(x(t_f) | x(t_i)) are (so long as p(x(t_f) | x(t_i)) is consistent with the dependency constraints). Some of the TURs, SLTs, etc., rely on the dynamics obeying LDB. LDB is not required for the new extension of the second law derived here. This means that (for example) this new extension applies to multipartite systems that have "directed" (sometimes called "non-reciprocal") interactions rather than undirected interactions among the subsystems, i.e., interactions in which there is exactly zero back-action [12,13,22,24,29,31,34,44].
Very often, these systems violate strict LDB, and so their thermodynamic analyses are, at best, approximations. (See the discussion in the appendix of [46] of some conditions that justify this approximation.) In contrast, the result derived below applies exactly to any scenario where there is no back-action, with no approximation. In addition, this result holds even if the dynamics allows multiple coordinates to change simultaneously. In particular, in the special case that each coordinate is a separate subsystem, the result does not require that the dynamics be a multipartite process [16].
Due to these relaxations of the assumptions made in conventional stochastic thermodynamics, the results below are not restricted to thermodynamic systems involving energy transduction. The results hold for any CTMC, even if the rate matrix does not reflect physical coupling between the system and one or more external thermodynamic reservoirs, as it does in conventional applications of stochastic thermodynamics [42].
However, the strengthened second law derived below has special physical significance in the common scenario where the dependency constraints arise because the system's dynamics is governed by coupling with external reservoirs, and there are restrictions on that coupling. For example, a common physical scenario is where the system has multiple subsystems, and each subsystem is coupled to a physically distinct part of a shared reservoir. Due to the physical separation of those parts of the reservoir, each connected to a different subsystem, the usual assumption of time-scale separation between the dynamics of the overall system and that of the reservoirs means that the different subsystems are effectively coupled to reservoirs that are independent of one another.¹ Such systems evolve as a multipartite process (MPP), in which no transitions are allowed where two or more subsystems change their states exactly simultaneously [16]. If the system is an MPP, and the dynamics of each subsystem obeys LDB, then we can use stochastic thermodynamics to identify various attributes of that dynamics with experimentally measurable thermodynamic quantities [1,4,12,16,17]. More generally, there are systems with multiple coordinates that aren't usually viewed as separate "subsystems", but where the global dynamics arises due to the system's coupling with thermodynamic reservoirs, and where each reservoir is only coupled to a single coordinate. These systems can also be modeled as MPPs, and analyzed accordingly.
Generalizing further, there are other kinds of systems that also have multiple coordinates, where the global dynamics arises due to the system's coupling with thermodynamic reservoirs, just like in an MPP. Also like in an MPP, each reservoir in these systems is only coupled to a proper subset of the coordinates, which results in dependency constraints. In contrast to an MPP however, some reservoirs are coupled to more than one coordinate. As an example, as stated in [6]: "Fluctuations in biochemical networks, e.g. in a living cell, have a complex origin that precludes a description of such systems in terms of bipartite or multipartite processes, as is usually done in the framework of stochastic and/or information thermodynamics." The strengthened second law I present below applies to these generalized forms of MPPs as well as to MPPs.
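The defining restriction of an MPP can be checked directly from a rate matrix. Here is a minimal sketch (two bits, with illustrative rates of my own choosing) that tests whether any allowed transition changes two or more coordinates at once:

```python
# Check whether a rate matrix is multipartite: it must assign zero rate to
# every transition in which two or more coordinates change simultaneously.
def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def is_multipartite(K, states):
    return all(K.get((xp, x), 0.0) == 0.0
               for xp in states for x in states
               if hamming(xp, x) >= 2)

states = [(a, b) for a in (0, 1) for b in (0, 1)]
# MPP: only single-bit flips are allowed.
K_mpp = {(xp, x): 1.0 for xp in states for x in states if hamming(xp, x) == 1}
# Non-MPP: add one transition that flips both bits simultaneously.
K_joint = dict(K_mpp)
K_joint[((1, 1), (0, 0))] = 0.5

print(is_multipartite(K_mpp, states), is_multipartite(K_joint, states))
```

The second matrix is exactly the kind of dynamics the strengthened second law below is meant to cover: it has dependency constraints but is not an MPP.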
In the next section I formalize dependency constraints as restrictions on the rate matrix of a CTMC. This is followed by a section in which I use this formalization to derive an expression for the EP of a system that involves the triple of {the rate matrix dependency constraints, the initial distribution over states, the final distribution over states}, together with certain other factors. In the following section I derive a lower bound on that expression for EP which depends only on the triple of {dependency constraints, initial distribution, final distribution}, without those other factors. In particular, this lower bound does not depend on any properties of the rate matrix other than the dependency constraints. This lower bound is my main result. In the following section I use this main result to analyze how the thermodynamics of feedback control [20,29,31] changes when we know that the system being controlled obeys a given set of dependency constraints. In the following section I present a set of examples of my main result. I end with some discussion, in particular of the relation of the new result to other results in the literature.

Rate matrix unit structures
I begin by defining notation. First, I write the state space of the system as X = ∏_{i=1}^{N} X_i, where each finite state space X_i is a coordinate of the system. I write the set of N coordinates as N. As examples, each coordinate could specify the state of a physically separate subsystem of the overall system, or it could specify a position on one axis of a lattice, or it could indicate a degree of freedom in a multi-scale specification of the state of the system.
The distribution p_x(t) over the states of the system is assumed to evolve according to a continuous time Markov chain (CTMC).² For any A ⊂ N, I write −A := N \ A. So for example, x_{−A} is the vector of all components of x other than those in A. For any set L, Δ_L is the associated unit simplex, and |L| is the number of elements in L. In addition, for any function f(p), I write Δf := f(p(t_f)) − f(p(t_i)). The set of bits is B = {0, 1}. I write the Kronecker delta as δ(a, b). For any family of sets, A = {a_1, a_2, . . .}, I define ∪A = a_1 ∪ a_2 ∪ . . .. A distribution over a set of values x at time t is written as p_X(t), with its value for x ∈ X written as either p(x(t)) or p_x(t), as convenient. Similarly, I write p(x(t) | x(t′)) for the conditional distribution of the state at time t given the state at time t′, etc. I write Shannon entropy as S(p_X(t)), S_t(X), or S_X(t), depending on which would result in the cleanest equations, and write mutual information between two random variables F, G as I(F; G).

¹ This time-scale separation concerns the implicit dynamics of the thermodynamic reservoirs that are coupled to the system [35]. It does not concern the timescales of the dynamics of the different coordinates of the system. For analysis of the latter kind of time-scale separation in the special case of a bipartite system, see [5], and for a more general analysis, see [39].

² Note that assuming the state space of the system is a Cartesian product does not limit the applicability of the analysis. Suppose that the set of physically allowed states is a subset of such a Cartesian product, Y ⊂ ∏_i X_i, but that Y is not itself such a Cartesian product. We can model such a scenario using the Cartesian product state space X = ∏_i X_i, simply by restricting the rate matrix of the CTMC so that there is zero probability of going from a state in Y to a state in X \ Y.
The distribution over the overall system evolves according to the global rate matrix K(t), as given by Eq. (1.1). A unit ω ⊆ N at time t is a set of coordinates such that as the full system evolves according to K(t), the marginal distribution p_{x_ω} evolves according to the CTMC

\frac{d p_{x_\omega}(t)}{dt} = \sum_{x'_\omega} K_{x_\omega x'_\omega}(\omega; t) \, p_{x'_\omega}(t)   (2.1)

for all p, for some associated rate matrix K(ω; t). Intuitively, a unit is any set of coordinates whose evolution is independent of the states of the coordinates outside the unit. Since the dynamics of a unit is given by a self-contained CTMC, all the usual theorems of stochastic thermodynamics apply to any unit, e.g., the second law [42], speed limit theorems [36,40], and some of the fluctuation theorems [19]. Any union of units is a unit. In addition, it is proven in App. A that any nonempty intersection of units is a unit. Note that since the dynamics of the full system is a CTMC, Eq. (2.1) applies with ω set to all coordinates in the system. So N is a unit. Note also that in general, the evolution of a coordinate i lying outside of a unit ω may depend on the states of coordinates j lying inside ω, even though the reverse is impossible by definition.
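The unit property can be tested numerically: a set of coordinates is a unit iff its marginal transition rates do not depend on the states outside the set. The following sketch (a two-coordinate system with illustrative rates of my own choosing) checks this for the set {1}:

```python
# {1} is a unit iff, for every x1' != x1, the summed rate
# sum_{x2'} K[(x1',x2'),(x1,x2)] is the same for all x2, i.e. the marginal
# dynamics of coordinate 1 is a self-contained CTMC.
n1, n2 = 2, 2
states = [(a, b) for a in range(n1) for b in range(n2)]

def make_K(feedback):
    # Transitions that change coordinate 1 get rate 2, plus a dependence
    # on x2 when feedback=True; all other transitions get rate 1.
    K = {}
    for xp in states:
        for x in states:
            if xp == x:
                continue
            r = 1.0
            if xp[0] != x[0]:
                r = 2.0 + (float(x[1]) if feedback else 0.0)
            K[(xp, x)] = r
    return K

def is_unit_1(K):
    for x1p in range(n1):
        for x1 in range(n1):
            if x1p == x1:
                continue
            totals = {round(sum(K.get(((x1p, x2p), (x1, x2)), 0.0)
                                for x2p in range(n2)), 12)
                      for x2 in range(n2)}
            if len(totals) > 1:
                return False
    return True

print(is_unit_1(make_K(False)), is_unit_1(make_K(True)))
```

With no feedback from x_2, coordinate 1 is a unit; once the rates of x_1's transitions read x_2, it is not, even though the state space and allowed transitions are identical.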
As an example, in Fig. 1, subsystem 3 is its own unit, evolving independently of subsystems 2 and 1. In contrast, none of the other three subsystems is its own unit. (For example, subsystem 2's dynamics depends on the state of 3.) A set of units defined over a set of coordinates N is called a unit structure if it obeys the following properties [46,47]: (i) the union of the units in the unit structure equals all of N; (ii) the unit structure is closed under intersections of its units. I will generically write any particular unit structure defined over N as N*.
I will sometimes say that N* represents the set of coordinates N. Also, in general for any given rate matrix there are sets of coordinates A ⊂ N which are not unions of units, and so cannot be represented by any unit structure. On the other hand, one can always construct a rate matrix that will implement any hypothesized unit structure over a set of coordinates, i.e., all unit structures can actually exist, for some appropriate rate matrix. (At worst, one can do this by choosing a rate matrix in which each coordinate evolves autonomously, i.e., a rate matrix that is a sum over all coordinates of independent rate matrices for each of those coordinates.) Not all collections of units are unit structures though; one can form a collection of units that contains two units ω, ω′, but not the unit ω ∩ ω′, and so that collection won't be a unit structure.
For simplicity, from now on I assume that the unit structure doesn't change with t. In addition, I define a conditional distribution for the ending joint state given an initial joint state, p(x(t_f) | x(t_i)), to be consistent with a specified unit structure if there is some rate matrix that obeys that unit structure and that implements p(x(t_f) | x(t_i)). The dynamics of any two units ω and α ⊂ ω must be compatible with one another, i.e., since p_{x_ω}(t) = p_{x_α, x_{ω\α}}(t), for all p,

\sum_{x'_\alpha} K_{x_\alpha x'_\alpha}(\alpha; t) \, p_{x'_\alpha}(t) = \sum_{x_{\omega \setminus \alpha}} \sum_{x'_\omega} K_{x_\omega x'_\omega}(\omega; t) \, p_{x'_\omega}(t).   (2.2)

In particular, Eq. (2.2) must hold no matter what the distribution over x_{ω\α} is. So if K(α; t) is given by marginalizing K(ω; t) over x_{ω\α},

K_{x'_\alpha x_\alpha}(\alpha; t) = \sum_{x'_{\omega \setminus \alpha}} K_{x'_\omega x_\omega}(\omega; t) \quad \text{(the same for every value of } x_{\omega \setminus \alpha}\text{)},   (2.5)

then the two rate matrices in Eq. (2.5) are compatible with each other, i.e., Eq. (2.2) holds. As an important special case of Eq. (2.5), if we take ω = N and as shorthand write K(N; t) as just K(t), we see that for any unit α,

K_{x'_\alpha x_\alpha}(\alpha; t) = \sum_{x'_{-\alpha}} K_{x' x}(t)   (2.6)

for any value of x_{−α}. Recall that an MPP is a set of co-evolving subsystems evolving according to a CTMC in which no transitions are allowed where two or more subsystems change their states simultaneously. Formally, in an MPP, K_{x'x}(t) = 0 for all x', x that differ in more than one subsystem [16,20]. Equivalently, for every subsystem i in an MPP, there is an associated rate matrix K_{x'x}(i; t), which is zero if x'_{−i} ≠ x_{−i}, such that the global rate matrix can be written as

K_{x' x}(t) = \sum_i K_{x' x}(i; t).   (2.7)

The units in an MPP are sets of subsystems whose joint evolution is independent of the other subsystems. Eq. (2.2) often holds (and therefore so does Eq. (2.5)) in an MPP. At the other extreme from multipartite processes, Eq. (2.5) also holds for some rate matrices K(t) which only allow state transitions in which all subsystems change, i.e., rate matrices K(t) such that K_{x'x}(t) = 0 for any x, x' for which there are two subsystems j, k such that both x_j ≠ x'_j and x_k = x'_k. This is illustrated in App. B.
It will often be convenient to re-express a unit structure as a directed graph. Define the dependency graph Γ_{N*} = (N*, E) by the rule that there is an edge e ∈ E from node ω ∈ N* to node ω′ ∈ N* iff both: ω′ ⊂ ω, and there is no intervening unit ω″ such that ω′ ⊂ ω″ ⊂ ω. (Note that Γ_{N*} is a directed graph, which allows us to use standard graph theory terminology.) In a unit structure N* where N ∈ N*, the dependency graph has a single root, but if N ∉ N*, then the dependency graph has multiple roots.
I will abuse notation and sometimes treat a unit ω as a set of coordinates, while at other times I treat it as a single node in Γ_{N*}. I write the set of parents of any node ω ∈ Γ_{N*} as pa(ω), and the set of its descendants as desc(ω), with fa(ω) := ω ∪ desc(ω), the family of node ω. The maximal number of nodes in any directed path that starts at ω is the height of ω. So any unit ω which has no sub-units contained in it is a leaf node of Γ_{N*}, with height 1. (The maximal height of all nodes in Γ_{N*} is simply called "the height of N*".) I write Γ^R_{N*} for the set of root nodes in Γ_{N*}. As an example, the dependency graph of Fig. 1 has two root nodes, ω and α, and one leaf node, ω′, which is their common child. The height of the graph is 2.
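The dependency graph is just the Hasse diagram of set inclusion over the units. A minimal sketch (unit contents chosen to mirror the Fig. 1 structure described above: two roots sharing one leaf child) that computes its edges, roots, and heights:

```python
# Units are frozensets of coordinate labels. The example mirrors Fig. 1:
# omega = {1,2,3} (cell 1 plus medium), alpha = {3,4} (medium plus cell 2),
# and their intersection {3}, which must also be a unit.
omega, alpha = frozenset({1, 2, 3}), frozenset({3, 4})
leaf = omega & alpha
units = [omega, alpha, leaf]

def edges(units):
    # Edge w -> wp iff wp is a proper subset of w with no intervening unit.
    E = []
    for w in units:
        for wp in units:
            if wp < w and not any(wp < wpp < w for wpp in units):
                E.append((w, wp))
    return E

E = edges(units)
roots = [u for u in units if not any(child == u for (_, child) in E)]

def height(u):
    kids = [c for (p, c) in E if p == u]
    return 1 + max((height(c) for c in kids), default=0)

print(sorted(map(sorted, roots)), max(height(u) for u in units))
```

As in the text, this dependency graph has two root nodes and one leaf node, and its height is 2.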
There are several additional, technical conditions that I will impose on the unit structure, in order to simplify the algebra in the proofs of the results in Section 4. (These conditions can be ignored if the reader is only interested in understanding the results, not the details of their proofs.) (i) I require that the unit structure is rich enough that if a joint state transition can occur that simultaneously changes the state of all coordinates in a set α, then there is some unit ω ∈ N* that contains α.³ I call such a unit structure flush. (ii) A unit ω is vacuous if every one of its coordinates is in at least one subunit ω′ ⊂ ω. I assume that no unit in any unit structure we are considering is vacuous.⁴ (iii) I say that two units ω and ω′ ⊂ ω are equivalent at time t if K_{x'_ω x_ω}(ω; t) = 0 for all transitions x′ → x in which some coordinate in ω \ ω′ changes its value (excluding states x where p_x(ω; t) = 0). I require that N* does not contain any two equivalent units. This means that for any two units ω, ω′ ⊂ ω in the unit structure, there must be transitions x′ → x that can occur in which some coordinate i ∈ ω \ ω′ changes its value.
Any CTMC can be represented with at least one unit structure meeting these three conditions (e.g., the unit structure that consists just of {N }).
To connect these considerations to the theorems of stochastic thermodynamics, from now on I suppose we can model the system as though there are a total of R thermodynamic reservoirs attached to the system [35,42]. I suppose further that each reservoir v ∈ {1, . . ., R} generates fluctuations of the joint state of an associated set of coordinates P(v) ⊆ N, without any such direct effect on the other coordinates. (For example, v may be able to do this by being directly physically coupled to the coordinates in P(v) and no others, via an implicit interaction Hamiltonian.) As is standard in stochastic thermodynamics, I suppose that if only one particular reservoir v were attached to the system, then the resultant dynamics over P(v) would be a CTMC. P(v) is called the puppet set of reservoir v, with its elements called the puppets of v. The collection of all R puppet sets covers N.
Example 2. Return to the example of an MPP, where we identify each subsystem with a separate coordinate. Each subsystem has its own unique set of reservoirs, which jointly cause the fluctuations in its state. In other words, the puppet set of each reservoir is a singleton, the associated subsystem of that reservoir, and each subsystem is the puppet set of at least one reservoir. On the other hand, in general, the rate matrix of each subsystem i will depend on the states of other subsystems besides i. When that is the case, the leader set of the reservoirs of each subsystem i will not be a singleton.
For simplicity, from now on I assume that neither the number of reservoirs nor the associated maps P(.) and L(.) changes with time t.
I write L(v) ⊇ P(v) for a set of coordinates whose associated value directly affects how the coupling with reservoir v affects the dynamics of x_{P(v)}. I call this the leader set of P(v), or sometimes the leader set of v.⁵ I write L(v; t) for the associated rate matrix over X induced by the coupling of the system to reservoir v. So L(v; t) affects the dynamics of x_{P(v)}, but leaves the other coordinates unchanged. I write this as

L_{x' x}(v; t) = 0 \;\text{ whenever }\; x'_{-P(v)} \ne x_{-P(v)}.   (2.8)

³ Formally, if there is some x′, x such that K_{x'x}(t) ≠ 0 while x′_i ≠ x_i for all i ∈ α, then there is some unit ω ∈ N* where α ⊆ ω.
⁴ For a vacuous unit ω, the dynamics of x_ω is fully specified by the rate matrices of the subunits ω′ ⊂ ω, and so to specify a rate matrix for ω would be redundant.
⁵ Physically, L(v) will often reflect an interaction Hamiltonian coupling the coordinates in L(v) without any back-action of the value of x_{P(v)} onto the values of x_{L(v)\P(v)}. This isn't necessary for the analysis below though.
As in conventional stochastic thermodynamics, the global rate matrix at time t is the sum over all reservoirs of the rate matrices of those reservoirs,

K_{x' x}(t) = \sum_{v=1}^{R} L_{x' x}(v; t).   (2.9)

I refer to any system evolving according to Eq. (2.9) for some associated set of puppet sets, leader sets, and matrices L_{x'x}(v; t) as a composite system. In the rest of this section I introduce some notation that will be helpful in analyzing composite systems. First, as shorthand I will sometimes rewrite Eq. (2.8) as

L_{x' x}(v; t) = \delta(x'_{-P(v)}, x_{-P(v)}) \, L_{x'_{L(v)} x_{P(v)}}(v; t),   (2.10)

where L_{x'_{L(v)} x_{P(v)}}(v; t) is a "rate matrix" in that all of its entries for x'_{P(v)} ≠ x_{P(v)} are non-negative, and

\sum_{x'_{P(v)}} L_{x'_{L(v)} x_{P(v)}}(v; t) = 0.   (2.11)

See Fig. 1 above and Example 4 below.
In general, any given coordinate i may be in more than one reservoir's leader set and in more than one reservoir's puppet set. Accordingly, I extend the definitions above by writing

L(i) := \bigcup_{v : i \in P(v)} L(v), \qquad L(A) := \bigcup_{i \in A} L(i),   (2.12)

where A is any subset of N. So L(i) is the set of all coordinates whose state can directly affect the dynamics of coordinate i, via arguments of a rate matrix, and similarly for L(A). Along the same lines, I define

P(i) := \bigcup_{v : i \in P(v)} P(v),   (2.13)

P(A) := \bigcup_{i \in A} P(i).   (2.14)

So P(A) is the set of all coordinates, inside or outside of A, whose dynamics is governed jointly with that of any coordinate in A. Note that L(v) ⊆ L(P(v)), since there can be coordinates i ∈ P(v) whose dynamics is affected by other reservoirs in addition to v. Note as well that for any set A, A ⊆ P(A) ⊆ L(A). So in particular, if any two different units have nonempty intersection, then since that intersection must also be a unit, the leader sets of all the coordinates in that intersection must lie within that intersection. In addition, the inverses of these set-valued functions are well-defined. In particular, for any set of coordinates A, P^{−1}(A) is the set of all reservoirs v such that i ∈ P(v) for some i ∈ A.
It will be convenient to introduce the shorthand that for any subset A ⊆ N, ν(A) is the set of all reservoirs v such that P(v) ∩ A ≠ ∅. So ν(A) is the set of all reservoirs that affect the dynamics of any of the coordinates in A. In Appendix L it is shown that Eq. (2.9) implies the following intuitive result:

Proposition 2.1. For any unit ω,

K_{x'_\omega x_\omega}(\omega; t) = \sum_{v \in \nu(\omega)} \delta(x'_{\omega \setminus P(v)}, x_{\omega \setminus P(v)}) \, L_{x'_{L(v) \cap \omega} x_{P(v) \cap \omega}}(v, \omega; t),

where L_{x'_{L(v) ∩ ω} x_{P(v) ∩ ω}}(v, ω; t) is a properly normalized rate matrix over x_{P(v) ∩ ω} and is independent of x'_{ω \ P(v)}.
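The bookkeeping behind Eqs. (2.12)-(2.14) and ν(·) is purely set-theoretic and easy to mechanize. In this sketch the puppet and leader sets are my own illustrative choices, arranged to match the Fig. 1 remark that one reservoir has puppet set {2} and leader set {2, 3}:

```python
# Each reservoir v has a puppet set P[v] and a leader set Lset[v] >= P[v].
P = {0: {1}, 1: {2}, 2: {3}, 3: {4}}
Lset = {0: {1, 2}, 1: {2, 3}, 2: {3}, 3: {3, 4}}

def L_of(A):
    # L(A): all coordinates that can directly affect the dynamics of A.
    out = set(A)
    for i in A:
        for v in P:
            if i in P[v]:
                out |= Lset[v]
    return out

def P_of(A):
    # P(A): all coordinates whose dynamics is governed jointly with A.
    out = set(A)
    for v in P:
        if P[v] & set(A):
            out |= P[v]
    return out

def nu(A):
    # nu(A): all reservoirs that affect the dynamics of some coordinate in A.
    return {v for v in P if P[v] & set(A)}

A = {2}
print(P_of(A) <= L_of(A), nu(A))
```

For this configuration A ⊆ P(A) ⊆ L(A) holds, as it must for any set A, and ν({2}) picks out exactly the one reservoir whose puppet set contains coordinate 2.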
Example 3. As a simple illustration of Proposition 2.1, consider any MPP where each subsystem is controlled by one reservoir, which controls no other subsystems. To reduce notation, consider the case where the unit ω is all of N. In this case the sum over v ∈ ν(ω) runs over all subsystems i in unit ω, and each L_{x'_{L(v) ∩ ω} x_{P(v) ∩ ω}}(v, ω; t) is the rate matrix of the subsystem i associated with reservoir v.
So Proposition 2.1 reduces to Eq. (2.7) in Example 1, with each term L_{x'_{L(v) ∩ ω} x_{P(v) ∩ ω}}(v, ω; t) in Proposition 2.1 re-expressed as K_{x'x}(i; t). Proposition 2.1 means that as far as any single unit ω is concerned, we can replace each reservoir v ∈ ν(ω), which has leader set L(v) and puppet set P(v), with a reservoir which has leader set L(v) ∩ ω ⊆ ω and puppet set P(v) ∩ ω ⊆ ω. For simplicity I assume in the analysis below that we have chosen a unit structure where all such replacements have been made. Formally, without loss of generality, I restrict attention to unit structures that only contain units ω with the property that for all reservoirs v ∈ ν(ω), L(v) ⊆ ω. I call such a unit structure tight. (Note that there is always at least one unit structure with this property, namely the unit structure with a single element, the unit N.) We can tighten Proposition 2.1 under our assumption of a tight unit structure. The following result is proven in Appendix M:

Proposition 2.2. For any unit ω in a tight unit structure,

K_{x'_\omega x_\omega}(\omega; t) = \sum_{v \in \nu(\omega)} \delta(x'_{\omega \setminus P(v)}, x_{\omega \setminus P(v)}) \, L_{x'_{L(v)} x_{P(v)}}(v; t).

For any unit ω in a tight unit structure, L(ω) = ω. Since A ⊆ P(A) ⊆ L(A) for all sets A, it then follows that P(ω) = ω for any unit ω in a tight unit structure. So loosely speaking, no reservoir is allowed to "straddle" coordinates lying both within a unit and outside a unit, if we restrict attention to tight unit structures. In addition, P(−ω) = −ω in a tight unit structure, even though −ω = N \ ω is not a unit in general.

Thermodynamics of composite systems
Following conventional stochastic thermodynamics, I identify the (expected) global EF rate at time t as where the second equality is established in App. C. The results below do not require LDB. However, if all reservoirs are purely thermal, with no associated particle exchange, and if LDB applies, then we can interpret the EF rate as (temperature-normalized) heat flow between the system and its reservoirs. Similarly, the (expected) global EP rate at time t is the difference between the time derivative of the global entropy and the expected EF rate. Expanding, we can write that difference as Eq. (3.4). Eqs. (3.2) and (3.4) formally establish that the thermodynamics associated with each reservoir v does not involve any coordinates outside of L(v), just as one would expect.
Example 4. In an MPP with a single reservoir per subsystem, each coordinate i is a "subsystem"; R = N; there is a bijection between the set of reservoirs and the set of subsystems; and for every reservoir/subsystem i, P(i) = {i}. So a unit ω is any set of subsystems such that for all i ∈ ω, L(i) ⊆ ω. In addition, Eq. (3.4) reduces to its familiar multipartite form; see [4,13,16,46–48] and Fig. 1.
Following the same convention as for the global EF rate, I define the (expected) local EF rate of any unit ω ⊆ N at time t as the entropy flow rate into the associated reservoirs. Since no reservoir's puppet set can include both coordinates inside a unit ω and coordinates outside of ω, the local EF rates of any two units ω, ω′ with ω ∩ ω′ = ∅ are additive. So, viewed as a function from the set of all units to the reals, $\langle \dot{Q}_\omega(t)\rangle$ obeys the countable additivity axiom of a signed measure over S(N*), the sigma algebra generated by the units in N*. This allows us to extend the definition of local EF rate to the sigma algebra S(N*), by using the set of values $\{\langle \dot{Q}_\omega(t)\rangle : \omega \in N^*\}$ to generate an entire signed measure. So, for example, for every pair of disjoint units ω, ω′, $\langle \dot{Q}_{\omega\cup\omega'}(t)\rangle = \langle \dot{Q}_\omega(t)\rangle + \langle \dot{Q}_{\omega'}(t)\rangle$. Recall that the dynamics of any unit is given by a self-contained CTMC, independent of the state of any coordinate outside of that unit. Accordingly, the EP rate of a unit is the sum of the derivative of the entropy of the distribution over the joint state of that unit and the EF rate into the reservoirs of that unit. Using Appendix M to evaluate that entropy derivative and Eq. (3.7) to evaluate that EF rate, we see that the (expected) local EP rate of ω at time t is given by Eq. (3.13). Accordingly, I sometimes write the global EP rate given in Eq. (3.4) as $\langle \dot{\sigma}_N(t)\rangle$. For any unit ω, $\langle \dot{\sigma}_\omega(t)\rangle \ge 0$, since $\langle \dot{\sigma}_\omega(t)\rangle$ has the usual form of an EP rate of a single system. (See [20] for a discussion of the relation between local EP rates and similar quantities discussed in [16,17,37].) Write $\sigma_\omega := \int_{t_i}^{t_f} dt\, \langle \dot{\sigma}_\omega(t)\rangle$ for the local EP generated by a unit ω during the process, and similarly write $\sigma_N$ for the global EP. (To minimize notation, I adopt the convention that angle brackets are implicit for time-extended thermodynamic quantities, as opposed to rates.) In App. D, Eq.
(2.5) and the log sum inequality [7] are used to prove that for any two units ω, α ⊂ ω, not necessarily part of a unit structure, $\langle \dot{\sigma}_\omega(t)\rangle \ge \langle \dot{\sigma}_\alpha(t)\rangle$ at all times t. Therefore $\sigma_\omega \ge \sigma_\alpha$. In particular, it is shown in [20,45] that in the special case where there is a set of units $\{\alpha_j\}$ that have no overlap with one another, for any unit $\omega \supset \cup_j \alpha_j$, $\sigma_\omega \ge \sum_j \sigma_{\alpha_j}$. (See also Eq. (6.4) below.) Let N* = {ω_j : j = 1, 2, ..., n} be a unit structure. For simplicity, from now on I assume that N ∉ N*. Suppose we have a set of real numbers, f, indexed by the units in N*. It will be convenient to use the associated shorthand. (Note that the precise assignment of integer indices to the units in N* is irrelevant.) This quantity is called the inclusion-exclusion sum (or just "in-ex sum" for short) of f for the unit structure N*. Next, define the time-t in-ex information as the in-ex sum of the marginal entropies over the (distributions over the coordinates in) the indicated units, minus the joint entropy. As an example, if N* consists of two units, ω1, ω2, with no intersection, then the expected in-ex information at time t is just the mutual information between those units at that time. More generally, if there are an arbitrary number of units in N* but none of them overlap, then the expected in-ex information is what is called the "multi-information", or "total correlation", among those units [20,25,41].
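To make the definition concrete, here is an illustrative sketch (function names are hypothetical) that computes the in-ex information of a joint distribution for a given collection of units, assuming the in-ex information equals the inclusion-exclusion sum of marginal entropies minus the joint entropy, as described above:

```python
import numpy as np
from itertools import combinations

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def marginal_entropy(p_joint, coords):
    """Entropy of the marginal over the given coordinate indices (axes)."""
    if not coords:
        return 0.0  # entropy of the empty marginal
    axes = tuple(i for i in range(p_joint.ndim) if i not in coords)
    return entropy(p_joint.sum(axis=axes).ravel())

def inex_information(p_joint, units):
    """Inclusion-exclusion sum of marginal entropies over the units,
    minus the joint entropy (a sketch of the paper's in-ex information)."""
    total = 0.0
    for k in range(1, len(units) + 1):
        for subset in combinations(units, k):
            inter = set.intersection(*map(set, subset))
            total += (-1) ** (k + 1) * marginal_entropy(p_joint, inter)
    return total - entropy(p_joint.ravel())

# Two disjoint single-coordinate units: the in-ex information reduces to
# the mutual information between the two coordinates.
p = np.array([[0.5, 0.0], [0.0, 0.5]])     # perfectly correlated bits
mi = inex_information(p, [{0}, {1}])        # ln 2 ≈ 0.693
```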
This is the first major result of this paper. Integrating Eq. (3.20) from the start to the end of a process gives Eq. (3.21). As an example of this result, suppose that we have two physically separated subsystems undergoing an MPP, and that subsystem 2 never changes its state, while subsystem 1 executes a map from $p_{x_1}(t_i)$ to $p_{x_1}(t_f)$, independent of the state of $x_2$. Note that if the rate matrix of subsystem 1 depends on the state of subsystem 2, i.e., subsystem 1 observes subsystem 2 as it evolves, then there is only one unit rather than two. Accordingly, Eq. (3.20) tells us that the global EP rate can depend on whether subsystem 1 observes the state of subsystem 2 as subsystem 1 evolves, even though the conditional distribution of subsystem 1's final state given its initial state, $p(x_1(t_f) \mid x_1(t_i))$, is independent of the state of subsystem 2. In general, this effect of the unit structure on the EP will occur whenever the two subsystems are initially statistically coupled. See App. F for a discussion. Eq. (3.21) applies to any unit structure. In addition, for any unit structure M* over a set of coordinates M ⊂ ω, Eq. (3.15) and the fact that the union of a set of units is itself a unit mean that $\sigma_\omega - \sigma_M \ge 0$. Therefore, using Eq. (3.21) to expand $\sigma_M$ gives Eq. (3.22). Eq. (3.22) holds even if M* ⊂ N*, and, at the other extreme, even if no unit in M* is also in N*. Finally, as an aside, I note that if local detailed balance holds for all reservoirs v with puppet set inside a unit, then all the usual fluctuation theorems [35], thermodynamic uncertainty relations [15,18,23], first-passage time bounds [10], bounds on stopping times [27], etc., apply to the thermodynamics of that unit. See [49] for an extensive analysis of the implications of this for the special case of composite systems that are MPPs.

Strengthened second law for composite systems
In general, evaluating the in-ex sum of local EPs on the RHS of Eq. (3.21) requires detailed knowledge of the precise rate matrices during the process. However, following Landauer, the goal in this paper is to derive bounds that are independent of those details, depending only on the starting distribution and the conditional distribution of the final state given the initial state. One might hope to achieve this goal simply by setting all local EPs to 0 in Eq. (3.21). Unfortunately, in general it is impossible for the local EPs of all units in an arbitrary unit structure to equal 0, even if one uses a quasistatically slow process. Indeed, the unit structure itself, independent of any other properties of the rate matrix, may mean that it is impossible to have all local EPs equal 0. This might seem to imply that we cannot lower-bound the EP by $B_{N^*}$. However, recall that in general there are many different unit structures that all apply to the same CTMC. We are free to choose among those unit structures. And as it turns out, no matter what the CTMC is, we can always choose the unit structure in a way that guarantees that Eq. (4.2) does in fact hold.
I prove this result in several steps. First, in App. F and App. G, I derive a set of lower bounds on EP that always apply, no matter what the unit structure. These lower bounds are summarized in Prop. F.1, and are my second main result. These bounds are not in the form of Eq. (4.2), though; while important in their own right, they do not yet achieve our goal.
On the other hand, in general we can represent any CTMC with a unit structure of height 2. (For example, we can do that by combining all coordinates that are not members of a root node of $\Gamma_{N^*}$ into one, overarching unit.) In App. F I derive a corollary of Prop. F.1, telling us that Eq. (4.2) holds for any such unit structure of height 2. This is my third main result. Due to this third result, we can always choose the unit structure N* so that the global EP is bounded by Eq. (4.2). Unfortunately, as illustrated below, there are some unit structures N* of height 2 where the bound on the RHS of Eq. (4.2) is negative for an appropriate initial distribution $p_{t_i}(x)$ and conditional distribution $p(x(t_f) \mid x(t_i))$ consistent with N*. In such cases, Eq. (4.2) does not provide a stronger bound on EP than the conventional second law. This is not as much of a problem as one might fear, though. For every unit structure N*, there are initial distributions $p_{t_i}(x)$ and conditional distributions $p(x(t_f) \mid x(t_i))$ consistent with N* for which the RHS of Eq. (4.2) is non-negative, so that the bound in Eq. (4.2) is at least as strong as the conventional second law. This is my fourth and final main result. (This result is presented in Prop. F.2, and is also proven in App. F, based on results in App. I.)

Thermodynamics of feedback control for composite systems
We can use Eq. (4.2) to extend previous work on the thermodynamics of feedback control [20,29,31] to account for a known set of dependency constraints on the system being controlled. Suppose we have a composite system with some associated unit structure N*, and some desired initial and final joint distributions over the states of the system, $p^\dagger_{t_i}(x)$ and $p^\dagger_{t_f}(x)$, respectively. Suppose we also have a feedback controller, C, whose state space C has values c. Before the system starts to evolve, the controller observes the initial state of the system through a noisy channel, p(c|x). This observation does not affect the initial system state, i.e., there is no back-action. So the initial joint distribution immediately after the observation is $p_{t_i}(x, c) = p^\dagger_{t_i}(x)\, p(c|x)$. As is standard in the literature on the thermodynamics of feedback control, we do not consider the thermodynamics of this measurement process. Note that $p_{t_i}(x) = p^\dagger_{t_i}(x)$. After the measurement, the controller's state, c, does not change. However, the system can observe c as it evolves (or, as it is more usually phrased, the state of c serves to "control" the state of the system). The result is a new final distribution, where we abuse notation and write $p(x_{t_f} \mid x_{t_i}, c)$ for the distribution over final states of the system conditioned on the initial state being $x_{t_i}$ and the feedback-process state being c. For simplicity we parallel the conventional analysis in the literature and require that the marginal final distribution equal the desired one, $p^\dagger_{t_f}(x)$. In order to analyze the thermodynamics of feedback control, one must define a Hamiltonian over the states of the system, so that one can define the work done on or extracted from the system. Following convention, I assume the Hamiltonian is uniform at both $t_i$ and $t_f$, and assume it is related to the global rate matrix via LDB.
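As a sketch of the quantities involved (helper names are hypothetical), the conventional feedback-control bound is stated in terms of the conditional entropy S(X|C), computed from the desired initial distribution and the observation channel p(c|x):

```python
import numpy as np

def cond_entropy_X_given_C(p_x, p_c_given_x):
    """S(X|C) for the post-measurement joint p(x, c) = p_x(x) * p(c|x);
    this is the quantity entering the conventional work bound."""
    joint = p_x[:, None] * p_c_given_x          # p(x, c), no back-action
    p_c = joint.sum(axis=0)
    h = 0.0
    for xi in range(joint.shape[0]):
        for ci in range(joint.shape[1]):
            if joint[xi, ci] > 0:
                h -= joint[xi, ci] * np.log(joint[xi, ci] / p_c[ci])
    return h

# Noiseless observation of a uniform bit: S(X|C) = 0, since c determines x.
p_x = np.array([0.5, 0.5])
h_noiseless = cond_entropy_X_given_C(p_x, np.eye(2))

# A noisy channel leaves residual uncertainty, so S(X|C) > 0.
noisy = np.array([[0.9, 0.1], [0.1, 0.9]])
h_noisy = cond_entropy_X_given_C(p_x, noisy)
```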
Let N* be some unit structure with height less than 3 representing the original system, without the feedback apparatus. Using Eq. (4.2), the EP without the feedback apparatus is lower-bounded by $-\Delta \mathcal{I}_{N^*}$. By coupling that original system to the feedback apparatus, we construct a new system, M, which comprises the original system together with an extra subsystem (the feedback apparatus) and new dependencies of the original coordinates of the system on the state of that new subsystem. There are many possible unit structures, M*, over this new joint system-plus-feedback-apparatus. For simplicity, exploit the fact that C evolves independently of the other coordinates in the system (by not evolving at all) to construct M* directly from N*, by replacing each unit ω ∈ N* with a new unit, ω′(ω) := ω ∪ C. So M* and N* contain the same number of units, with each unit in M* containing the subsystem C. By conservation of energy, the work done on the system during $[t_i, t_f]$ is the change in its internal energy minus the heat flow to all the reservoirs, which is given by the sum of the (temperature-normalized) entropy flows to the reservoirs. (Equivalently, this is the negative of the work extracted from the system.) Since the Hamiltonian is uniform at $t_i$ and $t_f$, the change in internal energy is zero. For simplicity, assume all reservoirs have the same temperature, T, and choose units so that $k_B T = 1$. Then that sum of entropy flows is the total change in the entropy of the system minus the EP.
Combining this with Eq. (4.2), it is shown in App. J that the amount of work that can be extracted from the system under feedback control, if one takes into account the unit structure, is (perhaps loosely) upper-bounded by the expression derived there. In contrast, the conventional analysis in the literature, in which one does not account for the unit structure of the system, results in an upper bound of $\Delta S(X_N \mid C)$ [20,29,31]. The difference between these two bounds is how much the unit structure restricts the amount of work we can extract from a system by observing its state.

Examples of the strengthened second law
In this section I work through some elementary examples illustrating Eq. (4.2). All unit structures in these examples are implicitly assumed to have height less than 3.

(a) Example 1
Consider any process where every coordinate that is in the intersection of two or more distinct units stays constant throughout the process. In such a process, the bound reduces to Eq. (6.1), in which the sum runs only over the root nodes. Moreover, since any unit that never changes its state generates no EP, in this kind of process the in-ex sum of local EPs reduces to Eq. (6.2), by the definition of the in-ex sum. The lowest each $\sigma_\omega$ can be is zero (which occurs when each unit ω evolves quasistatically slowly). Therefore we can combine Eq. (6.2) with Eqs. (3.21) and (6.1) to establish that the lower bound on EP is achieved exactly, i.e., Eq. (6.1) is a tight lower bound on the EP. This lower bound holds no matter what $p_{t_i}(x)$ and $p(x(t_f) \mid x(t_i))$ are, so long as $p(x(t_f) \mid x(t_i))$ is consistent with the unit structure.
As an illustration of this result, suppose that no two units intersect one another, and that every unit contains just a single coordinate. Then the lower bound on EP is the drop in the multi-information among the coordinates, sometimes called the "total correlation". (This lower bound on the EP was previously derived in [20,45], in the special case that each coordinate is a physically separate subsystem.) By repeated application of the data-processing inequality, it is easy to confirm that this lower bound on the EP is non-negative. Note, though, that Eq. (6.1) holds for any process with a height-2 unit structure, so long as the ending entropies of (the joint coordinates in the units corresponding to) the leaf nodes equal the associated starting entropies. In particular, this is true even if the coordinates in the leaf nodes do change state during the process. Since the dependency graph has height 2, Eq. (4.2) tells us that the expression in Eq. (6.1) is a lower bound on the EP of such a process. Furthermore, the same argument using the data-processing inequality establishes that that lower bound is non-negative. However, in general, if the coordinates in the leaf nodes change their states during the process, that lower bound may not be tight.
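This special case is easy to compute directly. The sketch below (illustrative code, not from the paper) implements the multi-information of a joint distribution, whose drop lower-bounds the EP when every unit is a single coordinate:

```python
import numpy as np

def total_correlation(p_joint):
    """Multi-information: sum of single-coordinate marginal entropies
    minus the joint entropy."""
    def H(p):
        p = p[p > 0]
        return float(-(p * np.log(p)).sum())
    joint_H = H(p_joint.ravel())
    marg_H = sum(
        H(p_joint.sum(axis=tuple(j for j in range(p_joint.ndim) if j != i)))
        for i in range(p_joint.ndim)
    )
    return marg_H - joint_H

# Three perfectly correlated bits: multi-information 3 ln 2 - ln 2 = 2 ln 2.
p3 = np.zeros((2, 2, 2))
p3[0, 0, 0] = p3[1, 1, 1] = 0.5
tc_start = total_correlation(p3)                  # 2 ln 2 ≈ 1.386

# If the process ends in a product distribution, the multi-information is 0,
# so the EP of such a decorrelating process is at least 2 ln 2.
tc_end = total_correlation(np.full((2, 2, 2), 1 / 8))
```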
Suppose as well that initially $x_1 = x_3$, with uniform probability over their two possible joint states, and that $x_2$ is independent of both $x_1$ and $x_3$, also with uniform probability over its states. (In this example, the number of units is the same as the number of coordinates, but that need not be the case in general.)
Therefore $S(p(t_i)) = 2 \ln 2$, and so the initial in-ex information equals $\ln 2$ (Eq. (6.7)). Assume that $x_2$ eventually loses all information about its initial state. In addition, as required by the unit structure, have $x_1$ and $x_3$ evolve independently of one another, conditioned on the state $x_2$, and presume that they both eventually lose all information about their own initial states and the initial state of $x_2$. Combining, the final in-ex information equals 0 (Eq. (6.12)). Combining Eqs. (6.7) and (6.12) establishes that the EP is lower-bounded by $\ln 2$. Note that we can derive this lower bound on the EP even though both subsystems 1 and 3 are continually observing subsystem 2 during the process, even if subsystem 2's state is changing as they observe it. In addition, this lower bound holds no matter what the ending distribution $p_{t_f}(x)$ is, so long as it can be written as in Eq. (6.11). (So, in particular, as discussed in the introduction, it applies to a simple extension of the cell-sensing scenario analyzed in [2,13].)

As a second example, suppose that at $t_i$ the full system has a single specific state with probability 1, so $S(t_i) = 0$. Suppose as well that the position in the lattice is uniformly random at $t_f$. (For example, this will occur at large enough $t_f$ if the lattice has periodic boundary conditions and both $X_1$ and $X_2$ evolve by randomly choosing one of their two neighbors.) This means that knowing the values of $x_2, x_3$ at $t_f$ tells us nothing about the most recent values of $x_1$ not already given by the value of $x_1$ at $t_f$, and so in particular tells us nothing about the most likely value of $x_A$ then. The same is true concerning the value of $x_B$ at $t_f$. Combining gives $-\Delta \mathcal{I}_{N^*} = 2 \ln L$. So Eq. (4.2) provides a strictly positive lower bound on the global EP. Note that this lower bound applies no matter what the dynamics of the process is; it can be quasistatically slow, it can involve Hamiltonian quenches, but so long as the unit structure does not change during the process, the EP is lower-bounded by $2 \ln L$.
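The first example above admits a short numeric check. The sketch below assumes (as the text's constraints suggest) that the unit structure consists of the two overlapping units $\{x_1, x_2\}$ and $\{x_2, x_3\}$, with intersection $\{x_2\}$, so the in-ex information is $S(1,2) + S(2,3) - S(2) - S(1,2,3)$:

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def marg(p, coords):
    axes = tuple(i for i in range(p.ndim) if i not in coords)
    return p.sum(axis=axes)

def inex(p):
    """In-ex information for units {x1,x2} and {x2,x3} (overlap {x2})."""
    return H(marg(p, {0, 1})) + H(marg(p, {1, 2})) - H(marg(p, {1})) - H(p)

# Initial distribution: x1 = x3 uniformly, x2 independent and uniform.
p_i = np.zeros((2, 2, 2))
for a in range(2):
    for b in range(2):
        p_i[a, b, a] = 0.25

# Final distribution: any product distribution, e.g. all coordinates uniform.
p_f = np.full((2, 2, 2), 1 / 8)

ep_bound = inex(p_i) - inex(p_f)   # -ΔI = ln 2 ≈ 0.693, the EP lower bound
```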
Furthermore, so long as the Hamiltonian is uniform at both $t_i$ and $t_f$, the total work extracted in the process is the gain in entropy of the full system minus the global EP. Combining establishes that the total work extracted is upper-bounded by Eq. (6.15). Note that increasing N while keeping LN constant means that the precise value of $(x_1, x_2)$ tells us less about the precise lattice position. Eq. (6.15) tells us that increasing the significance of $x_3$ this way increases the upper bound on the total amount of work that can be extracted.

Discussion
In this paper I consider the thermodynamics of multi-dimensional systems evolving according to a continuous-time Markov chain. My main result is a strengthened version of the conventional second law, which applies whenever we have an a priori set of "dependency constraints" that, for each coordinate i, specify which other coordinates can directly affect the dynamics of i via the rate matrix. The result holds for any coordinate system: the coordinates can be conventional phase space coordinates, they can be states of a set of separate interacting subsystems of an overall system, they can be positions in a sequence of more refined coarse-grainings of the state of the system, they can involve amounts of various chemicals in the system, etc.
To derive my result I first translate the dependency constraints into a "unit structure". This gives a sigma algebra that groups the coordinates into overlapping sets, in a way that respects the dependency constraints. In general, any set of dependency constraints can be translated into more than one unit structure. In turn, any unit structure specifies an information-theoretic functional of distributions over the states of the system, called the "in-ex information". To illustrate this, suppose the dependency constraints specify that each coordinate evolves autonomously, independent of the others. (As an example, this would be the case for the spatial coordinates of a particle freely evolving under over-damped Langevin dynamics in a uniform medium with no external forces.) We could then choose a unit structure that assigns each coordinate to its own unique unit. In this case the in-ex information reduces to the total correlation (sometimes called "multi-information") of the system's distribution, with each coordinate viewed as a separate random variable.
The strengthened version of the second law derived in this paper says that the entropy production (EP) of the system is lower-bounded by the difference between the beginning and ending values of the system's in-ex information. This lower bound is independent of all features of the dynamics other than the beginning distribution, the ending distribution, and the dependency constraints restricting how the dynamics could have caused the initial distribution to evolve into the ending distribution. Accordingly, we can use this strengthened second law to upper-bound the amount of work that can be extracted from a system as it evolves from one specified distribution to another [14,29], in a way that accounts for dependency constraints governing the system's dynamics. Similarly, this strengthened second law can be used to refine recent results in the thermodynamics of feedback control [32], to account for dependency constraints in the system being controlled.
In contrast to other, similar, recently derived lower bounds on EP [46,47], the one derived here does not require that the dynamics of the system be a multipartite process. Nor does it require that local detailed balance holds. These two features mean the lower bound applies to any system undergoing continuous-time Markovian dynamics, even if the system has no natural thermodynamic interpretation. As a result, we can apply these results to everything from (Markov models of) evolving opinion networks to replicator dynamics of a population of evolving organisms.
A recent paper [20] used an information-geometric analysis to also derive bounds on minimal entropy production (EP) that arise due to constraints on the rate matrix of a system's dynamics. To use the analysis in [20], one needs to first find an operator ϕ over the set of all joint distributions which both obeys the Pythagorean theorem of information theory and commutes with the time-evolution operators defined by the set Λ of allowed rate matrices. In general there are many such ϕ, but different ones will result in different bounds on EP.
The analog of Λ in this paper is the set of dependency constraints. The analog of finding one (or more) ϕ's for the approach in this paper is choosing a coordinate system and associated unit structure that represents the dependency constraints and is rich enough for the lower bound on EP to be strictly positive. Similarly to the approach in [20], where different ϕ all consistent with the constraints on the set of allowed rate matrices will result in different bounds on EP, in general different unit structures all consistent with those constraints will result in different bounds on EP. Reference [20] provides many examples of how constraints on the allowed rate matrices can be used to derive nonzero lower bounds on EP, including collective flashing ratchets, Szilard boxes where the particle is subject to a gravitational force in addition to driving by a piston, Szilard boxes where there are constraints on the way the piston can be used, evolving Ising spin systems, etc. Many of these examples can be formulated as systems evolving under dependency constraints (e.g., most of the examples in [20] involving "modularity constraints" can be directly formulated this way). Future work includes comparing the EP bounds in this paper to the ones in [20], and more generally trying to synthesize the two approaches.
B. Illustration of Eq. (2.5) Suppose that each coordinate of x has the same dimension, m. Choose any subset of the coordinates, α, any coordinate i ∈ α, and any stochastic rate matrix W over $X_i$. Then the induced matrix K, which applies W to coordinate i while leaving all other coordinates fixed, is a properly normalized rate matrix over all of X, since for all x′, $\sum_x K^{x'}_x = \sum_{x_i} W^{x'_i}_{x_i} = 0$. In addition, choose ω = N, with the rate matrix chosen as above. So α is a unit.
where for convenience I have switched back and forth between the shorthands of Eq. (2.10).

D. Proof of Eq. (3.15)
First, note that by Eq. (2.5), for all conditional distributions $p(x' \mid x)$, the following holds, where the second equality follows from Appendix M. Next, use Eq. (3.13) to expand the EP rate. In addition, using the log sum inequality [7] shows that, for each $v, x_\alpha, x'_\alpha$, the associated value of the first term in the inner sum on the RHS of Eq. (D3) is bounded, where the second line of the bound uses Eq. (D2).
Combining, and using Eq. (D2) again, establishes the desired inequality. Again plugging into Eq. (3.13), this time for the EP rate of unit α, completes the proof.
E. Proof of Eq. ( ) We are given some composite system with unit structure N*. Fix the time t and make it implicit for the rest of this appendix. Define T as the set of all state transitions in X, involving an arbitrary number of the coordinates in N. (So there are |X|(|X| − 1) elements of T, where $|X| = \prod_{i\in N} |X_i|$ is the number of joint states of the multi-dimensional system.) For any unit ω ∈ N*, define $\tau_\omega$ as the set of all state transitions x′ → x ≠ x′ such that both $K^{x'}_x \ne 0$ and $x_{-\omega} = x'_{-\omega}$. (Intuitively, $\tau_\omega$ is the set of all state transitions that do not modify any of the coordinates outside of ω and that are allowed under the CTMC.) Note that $\tau_\omega \subseteq \tau_{\omega'}$ if ω ⊆ ω′. So $T_{N^*} := \{\tau_\omega : \omega \in N^*\} \cup \{T\}$ is a locally finite poset, ordered by the set-inclusion relation.
Due to our assumption that the unit structure is flush, no state transition x′ → x can occur that simultaneously changes the state of all coordinates in an arbitrary set α unless α is a subset of some unit ω ∈ N* (i.e., there is no pair (x′, x) with that property for which both $p_{x'} \ne 0$ and $K^{x'}_x \ne 0$). This means that every element of T allowed by the CTMC is contained in at least one element of $T_{N^*}$. So $T_{N^*}$ is a cover of the set of all state transitions allowed by the CTMC.
Next, due to our assumption that there are no two equivalent units in the unit structure, the set of all transitions in a set $\tau_\omega$ uniquely specifies ω, i.e., there is a bijection between $\Omega := N^* \cup \{N\}$ and $T_{N^*}$. Due to this, without any ambiguity we can define a function g by setting $g(\tau_\omega) = \langle \dot{Q}_\omega \rangle$ for all ω ∈ N*, and similarly setting $g(T) = \langle \dot{Q}_N \rangle$. (Note that if N ∈ N*, then this second definition is redundant.) In addition, unit structures are closed under intersection. As described in the text, this means that N* generates a sigma algebra over coordinates, and we can use the function $\{\langle \dot{Q}_\omega \rangle : \omega \in N^*\}$ to generate a signed measure over that sigma algebra. In the same way, $T_{N^*}$ generates a sigma algebra over state transitions, and the function $g(\tau_\omega)$ generates a signed measure over that sigma algebra. We can apply all the steps in the usual derivation of the inclusion-exclusion principle for signed measures from Rota's theorem [38] to the signed measure generated by g(·). This allows us to write the global EF rate as the in-ex sum of the local EF rates. Plugging in the definition of g(·) completes the proof.
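The inclusion-exclusion step is easy to sanity-check numerically. The toy sketch below (weights are hypothetical, not from the paper) verifies the alternating-sign identity for a signed measure built from per-atom weights:

```python
# Toy check of inclusion-exclusion for a signed measure: any set function
# built by summing (possibly negative) per-atom weights satisfies
# mu(A ∪ B ∪ C) = sum over nonempty subcollections with alternating signs.
from itertools import combinations

weights = {1: 0.3, 2: -0.1, 3: 0.5, 4: 0.2}   # signed weights per atom
mu = lambda S: sum(weights[x] for x in S)

A, B, C = {1, 2}, {2, 3}, {3, 4}
sets = [A, B, C]
inex = 0.0
for k in range(1, 4):
    for sub in combinations(sets, k):
        inex += (-1) ** (k + 1) * mu(set.intersection(*sub))

matches = abs(inex - mu(A | B | C)) < 1e-12   # True
```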
F. Discussion of Eq. (3.21) for the case of parallel bit erasure Suppose our system comprises two physically separated subsystems, where the state space of each subsystem is a single bit. Suppose as well that subsystem 1 gets erased during the process, and that the initial distribution is uniformly random between the joint state where both bits equal 0 and the joint state where both bits equal 1, with no other possibilities. If the dynamics is a single unit, so that the rate matrix of subsystem 1 can depend on the (unchanging) state of subsystem 2, the EP can equal 0. If instead there are two separate units, so that the rate matrix of subsystem 1 cannot depend on the state of subsystem 2, the minimal EP is instead ln 2.
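A numeric sketch of this example (a check written for illustration, not the paper's code): with the two bits as separate units, the bound $-\Delta \mathcal{I}$ reduces to the drop in their mutual information, which is ln 2 for this erasure:

```python
import numpy as np

def mutual_info(p):
    """I(X1; X2) for a joint distribution p[x1, x2]."""
    def H(q):
        q = q[q > 0]
        return float(-(q * np.log(q)).sum())
    return H(p.sum(axis=1)) + H(p.sum(axis=0)) - H(p.ravel())

# Bits start perfectly correlated (both 0 or both 1, each with prob 1/2).
p_i = np.array([[0.5, 0.0], [0.0, 0.5]])
# Erase bit 1 (set it to 0) while bit 2's marginal is untouched.
p_f = np.array([[0.5, 0.5], [0.0, 0.0]])

ep_floor = mutual_info(p_i) - mutual_info(p_f)   # ln 2 ≈ 0.693
```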
To understand this, suppose that there is a (time-varying) Hamiltonian and that the full system's (time-varying) rate matrix always obeys LDB for that Hamiltonian. Then, to have the global EP during the full process equal zero, there would have to be a trajectory of such Hamiltonians such that at all times the system is at equilibrium for the associated Hamiltonian. Furthermore, because we are assuming the system is governed by a CTMC, the distribution $p_{x_1,x_2}(t)$ can never change discontinuously during times t ∈ (0, 1) within the process, and therefore neither can the Hamiltonian.
Given this, suppose that there is no continuous trajectory of such distributions that is always a product distribution and has the desired initial and final forms, $p_{x_1,x_2}(t_i)$, $p_{x_1,x_2}(t_f)$. (For example, this is the case if the two subsystems are statistically coupled at t = 0.) Then, by the always-at-equilibrium condition for zero EP, there must be a time t ∈ (0, 1) at which the Hamiltonian cannot be written as a sum of a function of $x_1$ plus a function of $x_2$, but instead must nonlinearly couple them. By our assumption of LDB, this would then mean that at that time t the full rate matrix of the joint system must couple those two subsystems, which in turn means that the rate matrix of one of the two subsystems must depend on the state of the other subsystem. To apply this to our two-subsystem example, simply note that the beginning distribution is not a product distribution. Therefore the condition for zero EP is violated if neither of the rate matrices of the two subsystems depends on the other subsystem's state.

(i) α = δ_ω for some ω ∈ N* whose height is ≤ 2; for a ω ∈ N*, v ∈ desc(ω). I will refer to any distributions that obey these conditions as type-1, type-2, and type-3 distributions, respectively.
Next, note that units A, B, C, E are all units contained in unit E (i.e., those nodes are contained in the family of node E). Similarly, A, C are both units contained in C. Therefore the units that are in E but not in C are B, E. Accordingly, the (unique) type-3 distribution for the pair of units ω = A, v = C is α = (0, 1/2, 0, 0, 1/2, 0). (II) For any unit ω, write M(ω) ⊂ ω for any set of coordinates that can be represented by a unit structure M(ω)* ⊂ fa(ω). Define $V_2(N^*)$ as the set of all distributions over $B^{|N^*|}$ of the indicated form, for some ω ∈ N* and associated unit M(ω) ⊂ fa(ω). I will refer to any such distribution as a type-4 distribution. Note that any type-4 distribution α uniquely specifies both ω and M(ω). Given this, I will sometimes abuse notation and write α* for the unit structure with the single unit M(ω) specified by a type-4 distribution α.
As shorthand, write $V(N^*) := V_1(N^*) \cup V_2(N^*)$. As a final piece of terminology, let U = {u} be any set of distributions over some shared space. I will say that U is centered if there exists a centering distribution $\pi \in \Delta_U$ such that $\mathbb{E}_\pi u = \sum_{u \in U} u\, \pi_u$ equals the uniform distribution. Note that the set of all centering distributions of any U is a convex polytope.
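As a minimal illustration of the definition (with illustrative names only): for U the set of all delta distributions over three states, the uniform mixture over U is a centering distribution, since it averages to the uniform distribution.

```python
import numpy as np

# U = the three delta distributions over a 3-state space.
U = [np.eye(3)[i] for i in range(3)]

# Candidate centering distribution: weight each element of U equally.
pi = np.full(3, 1 / 3)
mixture = sum(w * u for w, u in zip(pi, U))

# The mixture equals the uniform distribution, so U is centered.
is_centered = np.allclose(mixture, np.full(3, 1 / 3))   # True
```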
The following result involving centering distributions is proven in Appendix H. Note that the values of the in-ex informations $\mathcal{I}_{N^*}$ and $\Delta \mathcal{I}_{\alpha^*}$ at the beginning and end of the process are fully specified by $p_{t_i}$ and $p_{t_f}$. So, given a centering distribution, Proposition G.1 provides a lower bound on global EP defined purely in terms of $p_{t_i}$ and $p_{t_f}$. The precise rate matrix is irrelevant, so long as it has N* as a unit structure and maps $p_{t_i}$ to $p_{t_f}$. Note as well that the sum in Proposition G.1 only extends over distributions in $V_2(N^*)$. The role of the distributions in $V_1(N^*)$ is indirect; in general they are necessary to construct a centering distribution π, and thereby constrain the possible values of $\pi_\alpha$ for the distributions in $V_2(N^*)$.
As an example of Proposition G.1, first recall that even though each unit ω evolves autonomously, in general we cannot choose all the rate matrices so that every $\sigma_\omega = 0$. In particular, if ω has a non-trivial unit structure within it, then Eq. (3.21) will apply for the choice N = ω, potentially providing a strictly positive lower bound on $\sigma_\omega$. As a result, in general we cannot choose all the rate matrices so that $\sum_{\omega \in N^*} \sigma_\omega = 0$, and so cannot in general lower-bound $\sigma_N$ by $-\Delta \mathcal{I}_{N^*}$. On the other hand, suppose that $\Gamma_{N^*}$ has height 2. (So there are no units ω, ω′, ω′′ ∈ N* such that ω′′ ⊂ ω′ ⊂ ω.) Then all delta-function distributions over $B^{|N^*|}$ are type-1 distributions, and so are contained in $V_1(N^*)$. Accordingly, the unit structure is centered by a distribution π that is uniform over all $\alpha \in V_1(N^*)$ and equals 0 for all $\alpha \in V_2(N^*)$. Plugging this into Proposition G.1 establishes that the EP of any process with a height-2 unit structure N* is in fact lower-bounded by $-\Delta \mathcal{I}_{N^*}$. This provides the formal justification for Eq. (4.1). Moreover, in general, we can represent any process by using a unit structure of height 2. (For example, we can do that by combining all coordinates that are members of some unit ω that is not a root node of $\Gamma_{N^*}$ into one, overarching unit.) Accordingly, for any process, we can always find an associated unit structure N* for which $\sigma_N \ge -\Delta \mathcal{I}_{N^*}$. In addition, it is proven in Appendix I that for any unit structure N*, no matter what its height, V(N*) is centered. (The set of all associated centering distributions of V(N*) is the convex polytope discussed in the introduction.) In general, finding the optimal such centering distribution, i.e., the one that maximizes the bound in Proposition G.1 and so provides the strongest lower bound on global EP, only requires solving a linear programming problem.
Unfortunately, there are some unit structures N* where the bound in Proposition G.1 is negative for an appropriate initial distribution p_{t_i}(x) and conditional distribution p(x(t_f) | x(t_i)) consistent with N*, no matter what centering distribution we use. In such cases, Proposition G.1 does not provide a stronger bound on EP than the conventional second law.
On the other hand, as illustrated in the main text, often the bound in Proposition G.1 will be stronger than the conventional second law. Indeed, for every unit structure N*, and every associated centering distribution, there are initial distributions p_{t_i}(x) and conditional distributions p(x(t_f) | x(t_i)) that are consistent with N* where the EP bound in Proposition G.1 is at least as strong as the second law:

Proposition G.2. Let N* be any unit structure that does not have N itself as a member. Then there exists an initial joint distribution p_{t_i}(x) and a conditional distribution p(x(t_f) | x(t_i)) consistent with N* such that −ΔI_{N*} − Σ_{α∈V_2(N*)} π_α ΔI_{α*} > 0 for any associated centering distribution π_α, for every rate matrix that both implements that p(x(t_f) | x(t_i)) and obeys the unit structure.
(See Appendix J for proof.) In addition, even if −ΔI_{N*} < 0, often we can use Eq. 3.21 directly to provide a non-negative lower bound on the global EP, by constructing a lower bound on Σ_{ω∈N*} σ_ω which is larger than −ΔI_{N*}. To construct such a bound, first note that since there is no unit structure within leaf units of the dependency graph, in theory we can implement all such units with zero local EP. Next, for any unit ω that is a parent of a leaf unit, we can often use Eq. 15 in [47] (or its corollary, Eq. 17) to construct strictly positive lower bounds on σ_ω.
Write the polytope of all centering distributions of the unit structure N* as P(N*), and write the strongest lower bound on EP given by Proposition G.1 as the maximum of −ΔI_{N*} − Σ_{α∈V_2(N*)} π_α ΔI_{α*} over all π ∈ P(N*). Combining Propositions G.1 and G.2 establishes that for any unit structure N* there are pairs of an initial distribution p_{t_i}(x) and a conditional distribution p(x(t_f) | x(t_i)) consistent with N* such that this maximal lower bound is strictly positive. In contrast, the conventional second law says only that Q + ΔS ≥ 0, no matter what p_{t_i}, p_{t_f} or N* are. Summarizing, suppose we are given a unit structure N* that applies to the rate matrix at all times, an initial distribution p_{t_i}(x), and a conditional distribution p(x(t_f) | x(t_i)) consistent with N*. Then we know that −ΔI_{N*} − Σ_{α∈V_2(N*)} π_α ΔI_{α*} is a lower bound on the EP, for any π that is a centering distribution for N*. These lower bounds are simple to evaluate, and are often stronger than the second law. We can find the strongest such lower bound on EP due to the unit structure N* by solving a linear programming problem.
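This linear programming problem can be sketched numerically. The instance below is entirely hypothetical (the vectors representing V(N*) and the ΔI values are made up), and a brute-force enumeration of the vertices of the centering polytope stands in for a real LP solver; it maximizes −ΔI_{N*} − Σ_{α∈V_2(N*)} π_α ΔI_{α*} subject to π ≥ 0 and Σ_α π_α α = uniform.

```python
from fractions import Fraction as F
from itertools import combinations

def solve(A, b, cols):
    """Exactly solve the square subsystem A[:, cols] x = b; None if singular."""
    m = len(A)
    M = [[A[r][c] for c in cols] + [b[r]] for r in range(m)]
    for i in range(m):
        piv = next((r for r in range(i, m) if M[r][i] != 0), None)
        if piv is None:
            return None
        M[i], M[piv] = M[piv], M[i]
        M[i] = [v / M[i][i] for v in M[i]]
        for r in range(m):
            if r != i and M[r][i] != 0:
                M[r] = [a - M[r][i] * c for a, c in zip(M[r], M[i])]
    return [row[-1] for row in M]

def best_centering_bound(alphas, dI_N, dI_V2):
    """Maximize -dI_N - sum_a pi_a * dI_V2[a] over the centering polytope
    {pi >= 0 : sum_a pi_a alpha_a = uniform} by enumerating its vertices."""
    n = len(alphas[0])
    A = [[alphas[a][comp] for a in range(len(alphas))] for comp in range(n)]
    b = [F(1, n)] * n
    best = None
    for cols in combinations(range(len(alphas)), n):
        x = solve(A, b, cols)
        if x is None or any(v < 0 for v in x):
            continue  # singular basis or infeasible vertex
        pi = dict(zip(cols, x))
        bound = -dI_N - sum(pi.get(a, F(0)) * dI for a, dI in dI_V2.items())
        best = bound if best is None else max(best, bound)
    return best

# Hypothetical instance with |N*| = 3: three type-1 delta distributions
# plus one distribution in V_2(N*), with made-up Delta-I values.
alphas = [(F(1), F(0), F(0)), (F(0), F(1), F(0)), (F(0), F(0), F(1)),
          (F(0), F(1, 2), F(1, 2))]
best = best_centering_bound(alphas, dI_N=F(-1), dI_V2={3: F(-3, 10)})
```

For this made-up instance the optimum puts weight 2/3 on the V_2 distribution, giving a bound of 6/5; a production version would use an off-the-shelf LP solver instead of vertex enumeration.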
Furthermore, in general, it is possible to represent any given set of constraints with more than one unit structure. Each one of them results in its own strongest lower bound on EP, given by solving the associated linear programming problem. So to find the strongest version of the second law for a given set of constraints, we should solve all the linear programming problems specified by the unit structures that can represent those constraints.
This result extends the previous strengthenings of the second law derived in [3,20,45], which all assume that the units have no overlap, to the case where the units may overlap with one another in arbitrary ways, even if none of the coordinates are fixed in the dynamics. In addition to this result, which maps an arbitrary set of constraints, a p_{t_i} and a p_{t_f} to a lower bound on EP, a second result is that for any set of constraints, there is a p_{t_i} and p_{t_f} that results in a strictly positive lower bound on EP, stronger than the conventional second law.

H. Proof of Proposition G.1
The proof has two parts. First, I construct a function f : N → ℝ such that for all units ω ∈ N*, Σ_{i∈ω} f_i = σ_ω. (Note that in general, any coordinate i will be in more than one unit ω, and so this function f is the solution to a set of coupled equations.) I will then apply the inclusion-exclusion principle with this f in order to replace the first of the in-ex sums over units ω on the RHS of Eq. 3.21 with a conventional sum over coordinates, Σ_i f_i.
In the second part of the proof I use the hypothesized existence of a centering distribution to provide a lower bound on Σ_i f_i, expressed purely in terms of p_{t_i} and p_{t_f}. Plugging in to Eq. 3.21 then completes the proof.
To begin, for each unit ω, define ω̄ as the set of all coordinates in ω that are not in any of the units in desc(ω). (As an example, in the figure in the main text, ω̄ is the pair of coordinates 1 and 2.) Because N* is closed under intersections and covers N, every coordinate is in ω̄ for exactly one unit ω ∈ N*. Moreover, because there are no vacuous units allowed, ω̄ is nonempty for every unit ω. Note that if ω ⊂ ω′ for two units ω, ω′, then the coordinates in ω̄ must evolve independently of the states of any coordinates in ω̄′, but the reverse need not be true. In other words, if there is an edge from ω′ to ω, then there may be coordinates in ω̄′ whose dynamics depends on the state of coordinates in ω̄, but not vice-versa.
For all j ∈ ℕ, let Ω_j be the set of all nodes in Γ_{N*} with height j. So in particular, Ω_1 is the set of all units with no subunits. For every ω ∈ Ω_1, for all coordinates i ∈ ω, set f_i = σ_ω / |ω|. Next, note that for any ω ∈ N*, the set of units in desc(ω) is closed under intersections. This allows us to define f_i for all j ∈ ℕ, all ω ∈ Ω_j, and all i ∈ ω̄. (Note that any such i will be assigned a value f_i exactly once in this procedure.) In general, it could be that f_i is negative. Note though that since no two units in Ω_1 have any overlap, for all ω ∈ Ω_2, Σ_{v∈desc(ω)} σ_v = Σ_{i∈ω\ω̄} f_i. Therefore by Eq. (H3), for all ω ∈ Ω_2, Σ_{i∈ω} f_i = σ_ω. (Note that the inclusion-exclusion principle holds for arbitrary functions f, not just for nowhere-negative functions.) Since σ_v ≥ 0, this means that we are guaranteed that Σ_{i∈v} f_i ≥ 0. Iterate this procedure, going from nodes in Ω_{k−1} to those in Ω_k, until all units have been considered, so that values f_i have been assigned to all coordinates i ∈ N. By induction, at the end of this procedure, for all units ω ∈ N*, Eq. (H2) will hold and Σ_{i∈ω} f_i ≥ 0. In addition, the inclusion-exclusion principle allows us to evaluate Σ_{i∈N} f_i. (Note that since N ∉ N*, Eq. (H12) does not imply that Σ_{i∈N} f_i = σ_N. So we cannot combine Eq. (H14) with the fact that global EP is ≥ 0 to establish that Σ_{ω∈N*} σ_ω is also non-negative.) It will be convenient to define new variables that equal sums of f_i over small sets of coordinates i. For all ω ∈ N*, define g_ω := Σ_{i∈ω̄} f_i, where the second line follows from Eq. (H3). Since each coordinate i is in ω̄ for exactly one unit ω, the g_ω sum over the coordinates without double-counting, where the last line uses Eq. (H12). Using similar reasoning shows that Σ_{ω∈N*} g_ω = Σ_{i∈N} f_i.
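The first part of this construction can be sketched in code. The unit structure and the σ_ω values below are hypothetical, and the leaf-level assignment f_i = σ_ω/|ω| is an assumption consistent with the requirement Σ_{i∈ω} f_i = σ_ω; the sketch processes units from the leaves up and then checks that defining property.

```python
from fractions import Fraction as F

# Hypothetical unit structure (closed under intersections, covers N):
# leaf units {0,1} and {2}, plus a height-2 unit {0,1,2,3}.
# sigma[w] is an assumed local-EP value for each unit w.
units = [frozenset({0, 1}), frozenset({2}), frozenset({0, 1, 2, 3})]
sigma = {units[0]: F(1, 2), units[1]: F(1, 4), units[2]: F(2)}

def height(w):
    sub = [v for v in units if v < w]
    return 1 if not sub else 1 + max(height(v) for v in sub)

f = {}
# Process units from the leaves up.  For each unit w, the coordinates in
# w-bar (those not in any descendant of w) split sigma_w minus what the
# already-assigned coordinates of w carry.
for w in sorted(units, key=height):
    wbar = [i for i in w if i not in f]
    assigned = sum(f[i] for i in w if i in f)
    for i in wbar:
        f[i] = (sigma[w] - assigned) / len(wbar)

# The defining property: the f_i of each unit sum to its local EP.
for w in units:
    assert sum(f[i] for i in w) == sigma[w]
```

Note that the resulting f_i need not be non-negative, exactly as the text warns; only the per-unit sums are constrained.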
Combining this with Eq. (H14) and Eq. (3.21) gives Eq. (H20). This completes the first part of the proof. In the second part I derive a lower bound on the RHS of Eq. (H20). First, to reduce the complexity of the equations, I will translate all distributions α into binary-valued vectors ᾱ: (i) ᾱ = α for any type-1 distribution α; (ii) ᾱ = α Σ_{ω′∈fa(ω)} 1 for any type-2 distribution α; (iii) ᾱ = α Σ_{ω′∈fa(ω)\fa(v)} 1 for any type-3 distribution α; (iv) ᾱ = α Σ_{ω′∈fa(ω)\M*(ω)} 1 for any type-4 distribution α. Note that every component of every vector ᾱ is either a 0 or a 1. I will refer to any vectors that obey condition (i) as type-1 vectors, and similarly for vectors obeying conditions (ii), (iii) and/or (iv). Now make three suppositions. First, suppose that for all vectors ᾱ of types 1, 2 or 3, Eq. (H21) holds. Now, hypothesize that there is a centering vector γ, all of whose components are non-negative, such that Eq. (H23) holds for all ω ∈ N*. Multiplying both sides of Eq. (H23) by g_ω, summing over ω, and then plugging in Eqs. (H21) and (H22), we see that if those three equations hold, the desired bound follows. Plugging this into Eq. (H20) shows that if we can prove that the suppositions Eqs. (H21) and (H22) always hold, then we will have proven that for any centering vector γ, σ_N ≥ −ΔI_{N*} − Σ_{α∈V_2(N*)} γ_α ΔI_{ᾱ*}. To begin, use Eq. (H16) to conclude that g_ω = σ_ω for all ω ∈ Ω_1 (which have no descendants), and so g_ω ≥ 0 for all ω ∈ Ω_1. Next, combine this fact that g_ω = σ_ω for all leaf nodes ω with Eqs. (3.16) and (H19) to also conclude that g_ω ≥ 0 for all ω ∈ Ω_2. Combining these two results means that Eq. (H21) holds for all type-1 vectors ᾱ.
Next, note that Eq. (H19) gives an expression for g_ω for any ω ∈ N*. So by the non-negativity of local EP, Eq. (H21) holds for all type-2 vectors ᾱ. Now consider any pair of nodes ω ∈ N*, v ⊂ ω. Using Eq. (H19) for both ω and v, and then applying Eq. 3.15, establishes that Eq. (H21) also holds for all type-3 vectors ᾱ. Combining establishes our first goal, of showing that Eq. (H21) holds for all vectors ᾱ ∈ V_1(N*), of types 1, 2 or 3. Next, consider any pair of a unit ω and a set of coordinates M(ω) ⊂ ω such that there is a unit structure M(ω)* ⊂ fa(ω). Use Eq. (H12), the inclusion-exclusion principle, Eq. (H15), and then Eq. (H19) to expand the associated sum; Eq. 3.22 then establishes the analogous inequality for type-4 vectors. Finally, normalize each vector ᾱ to recover the distributions α, and define the distribution π(α) by normalizing γ(ᾱ). It follows that the average of the α under π is the uniform distribution, so that π is a centering distribution. In addition, Eq. (H38) gets converted into the bound in Proposition G.1. This completes the proof of Proposition G.1.

I. Proof that any unit structure is centered
To begin, choose V_1(N*) to be the set of all type-1 distributions, i.e., V_1(N*) is the set of all distributions δ_ω for any ω of height ≤ 2.
Next, for each ω of height greater than 2, plug M(ω) = desc(ω) and any arbitrary single one of the possible unit structures M(ω)* into Eq. (H23) to define a distribution and associated unit structure α(ω)*. Choose V_2(N*) to be the set of all such α(ω), one per ω, as one ranges over all ω of height greater than 2. By construction, V(N*) is exactly the set of all vectors δ_ω as one ranges over all ω ∈ N*. Accordingly, the sum of all distributions in V(N*) is the all-1's vector, and the average of those distributions is the uniform distribution, (1/|N*|, 1/|N*|, ...). Therefore the set of those vectors is centered, with the centering distribution π_α = 1/|N*| for all α ∈ V(N*). This completes the proof.

J. Proof of Proposition G.2

First, note that for a uniform distribution over the states of the multi-dimensional system, the entropy of every coordinate i with |X_i| states is ln |X_i|. Furthermore, no matter what the unit structure is, one can create a process consistent with that structure that results in this uniform distribution as the final distribution. (Just choose the rate matrix so that by the end of the process, for each coordinate i, x_i has been uniformly randomized.) Now assign values f_i = ln |X_i| to all coordinates. By construction, for all units ω, Σ_{i∈ω} f_i = S_ω. In addition, by the inclusion-exclusion principle, the in-ex sum over ω ∈ N* of the S_ω equals Σ_{i∈N} f_i. But this just equals S_N, the entropy of the full system, since the coordinates are statistically independent under the final distribution. Therefore I_{N*} = 0 for this ending distribution. Similarly, for this ending distribution, I_{α*} = 0 for every α ∈ V_2(N*).
So to find a situation where the bound in Proposition G.1 is at least as strong as the second law, it suffices to find an initial distribution p such that I_{N*}(p) ≥ 0 while I_{α*}(p) = 0 for all α ∈ V_2(N*). To do that, label the states of each coordinate i by the first |X_i| counting numbers. Define M := min_{i∈N} |X_i|, and define Γ^R_{N*} as the (units corresponding to the) root nodes of the dependency graph Γ_{N*}. Note that since by hypothesis N ∉ N*, there must be at least two distinct root nodes in Γ^R_{N*}. Furthermore, since there are no vacuous units in a composite system, all of (the units corresponding to) those root nodes contain coordinates that are not in any other units. Now assign the value f_j = 0 to all coordinates j ∉ T. For each coordinate j ∈ T, where ω(j) is the unique unit containing j, assign the value given in Eq. (J4). In addition, the entropy of every unit ω ∉ Γ^R_{N*} equals 0. Accordingly, I_{α*}(p) = 0 for all α ∈ V_2(N*).
This completes the proof.
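The construction in this proof can be checked numerically for a toy instance. Everything below is hypothetical: two root units, each owning two exclusive coordinates with M = 3 states each; the coordinates are perfectly coupled, and the sketch verifies that S(p) = ln M while the unit entropies sum to 2 ln M > S(p), so that the in-ex information of the initial distribution is strictly positive.

```python
from itertools import product
from math import log, isclose

# Hypothetical composite system: two root units, each owning two
# exclusive coordinates, every coordinate with M states.
M = 3
root_units = [(0, 1), (2, 3)]

# Perfectly couple all coordinates: p(x) = 1/M if every coordinate takes
# the same value, else 0 (a uniform distribution over M joint states).
def p(x):
    return 1.0 / M if len(set(x)) == 1 else 0.0

states = list(product(range(M), repeat=4))

def entropy(dist):
    return -sum(q * log(q) for q in dist.values() if q > 0)

S_full = entropy({x: p(x) for x in states if p(x) > 0})

# Marginal entropy of each root unit.
S_units = []
for unit in root_units:
    marg = {}
    for x in states:
        key = tuple(x[i] for i in unit)
        marg[key] = marg.get(key, 0.0) + p(x)
    S_units.append(entropy(marg))

# Each root unit's marginal is uniform over M coupled joint states, so
# the unit entropies sum to more than the full entropy.
assert isclose(S_full, log(M))
assert all(isclose(s, log(M)) for s in S_units)
assert sum(S_units) > S_full
```

With at least two root units the gap Σ_ω S(p_ω) − S(p) is at least ln M, mirroring the strict inequality used in the proof.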

K. Proofs of results for thermodynamics of feedback control of composite unit structures
The growth of the entropy of unit ω during the process changes when we expand it into the unit ω′(ω) that includes C. To calculate how much it changes, first note that since C does not change state during the process, the entropy of each unit ω′(ω) grows during the process by the change in conditional entropy,

S(p_{t_f}(X_ω|C)) − S(p_{t_i}(X_ω|C)), (K1)

where p_{t_i}, p_{t_f} are given by Eqs. 5.1 and 5.2, respectively. In contrast, the growth of entropy of the unit ω in the original, no-feedback process just equals S(p_{t_f}(X_ω)) − S(p_{t_i}(X_ω)), due to our assumption that the initial and final marginal distributions over X are the same regardless of whether there is a feedback apparatus. Combining, we see that the change in the growth of entropy of unit ω when we add the feedback apparatus is the drop in the mutual information between X_ω and C. This is true for every unit ω ∈ N*, and for N itself. Plugging this fact into Eq. 4.1 gives Eq. 5.4. Next, reuse the reasoning behind Eq. (K1) to establish the lower bound on EP in the feedback scenario, Eq. (K4). In addition, the maximal work that can be extracted from the system under feedback control, without consideration of the unit structure of the system, is [20,29,31]

−ΔF(X_N) − ΔI(X_N; C) = ΔS(X_N) − ΔI(X_N; C) = ΔS(X_N|C) (K5)

(under the common assumption that the Hamiltonian at both t_i and t_f is uniform, and in units of k_B T = 1). Subtracting the lower EP bound, Eq. (K4), from Eq. (K5) establishes Eq. 5.5.
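The identity used in Eq. (K5) can be verified numerically. The joint distributions below are made-up examples for a binary system X and a binary apparatus C whose marginal never changes; the sketch checks that ΔS(X) − ΔI(X; C) = ΔS(X|C).

```python
from math import log, isclose

# Hypothetical joint distributions p(x, c) at t_i and t_f for a binary
# system X measured by a binary apparatus C (which never changes state).
p_ti = {(0, 0): 0.45, (1, 0): 0.05, (0, 1): 0.05, (1, 1): 0.45}  # correlated
p_tf = {(0, 0): 0.25, (1, 0): 0.25, (0, 1): 0.25, (1, 1): 0.25}  # decoupled

def H(p):  # Shannon entropy (nats) of a distribution given as a dict
    return -sum(v * log(v) for v in p.values() if v > 0)

def marginal(p, axis):  # axis 0 -> X, axis 1 -> C
    out = {}
    for (x, c), v in p.items():
        k = (x, c)[axis]
        out[k] = out.get(k, 0.0) + v
    return out

def mutual_info(p):
    return H(marginal(p, 0)) + H(marginal(p, 1)) - H(p)

def cond_entropy_X_given_C(p):
    return H(p) - H(marginal(p, 1))

dS_cond = cond_entropy_X_given_C(p_tf) - cond_entropy_X_given_C(p_ti)
dS = H(marginal(p_tf, 0)) - H(marginal(p_ti, 0))
dI = mutual_info(p_tf) - mutual_info(p_ti)

# The identity behind Eq. (K5): dS(X) - dI(X;C) = dS(X|C).
assert isclose(dS - dI, dS_cond)
```

Here the extractable work ΔS(X|C) is entirely due to the initial X-C correlation, since the marginal of X is unchanged (ΔS(X) = 0).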

Figure 2. The random walker scenario described in the introduction and investigated in Ex. 3. a) In the left panel the five coordinates are indicated by circles, with the associated rate matrix dependencies indicated by arrows, using the same convention as in Fig. 1. b) The right panel shows a height-2 dependency graph for this rate matrix. Each square is a different unit, with the associated coordinates explicitly written. Note that in dependency graphs arrows indicate the partial

G. Proofs related to Eq. 4.1

To begin, I need to define V(N*), a set of distributions over the Boolean hypercube, B^{|N*|}. After that, I define a "centering distribution" to be any convex combination of the elements of V(N*) which equals (1/|N*|, 1/|N*|, ...), the uniform distribution over the Boolean hypercube. My first main result, presented in Proposition G.1, is a function taking each such centering distribution to a different lower bound on global EP. The set V(N*) is the union of two sets of distributions over B^{|N*|}, which I define in succession: I) For any ω ∈ N*, write δ_ω for the distribution over B^{|N*|} which is all 0's except for a 1 in its ω component. Using this notation, define V_1(N*) as the set of all distributions α over B^{|N*|} which obey at least one of the following three conditions:
(H37)

Plugging this into the definition of type-4 vectors, for ᾱ = M(ω) and the unit structure M(ω)*, confirms that Eq. (H21) holds. Combining establishes that for any centering vector γ, σ_N ≥ −ΔI_{N*} − Σ_{α∈V_2(N*)} γ_α ΔI_{ᾱ*}.

I.e., for all ω ∈ Γ^R_{N*}, ω̄ ≠ ∅. Next, define T := ∪_{ω∈Γ^R_{N*}} ω̄, and fix the state of each coordinate j ∉ T, i.e., set the distribution over the state of that coordinate to a delta function. Set the joint distribution over the remaining coordinates so that the coordinates that only occur in a single root unit are perfectly coupled with one another, with a uniform distribution over the set of M possible joint states they can adopt. The entropy of the full joint distribution defined this way is S(p) = ln M. So to prove that I_{N*}(p) ≥ 0 we need to show that

Σ_ω S(p_ω) ≥ ln M. (J3)

In Eq. (J4), |ω(j)| is the number of elements in ω(j). By construction, for all root units ω ∈ Γ^R_{N*}, S(p_ω) = ln M. Since Γ^R_{N*} contains at least two units, this means that

Σ_ω S(p_ω) > ln M = S(p). (J8)