Thermodynamics of computing with circuits

David H Wolpert; Artemy Kolchinsky

doi:10.1088/1367-2630/ab82b8

1. Introduction

A long-standing focus of research in the physics community has been how the energetic resources required to perform a given computation depend on that computation. This issue is sometimes referred to as the 'thermodynamics of computation' or the 'physics of information' [1–3]. Similarly, a central focus of computer science theory has been how the minimal computational resources needed to perform a given computation depend on that computation [4, 5]. (Indeed, some of the most important open issues in computer science, like whether P = NP, concern the relationship between a computation and its resource requirements.) Reflecting this commonality of interests, there was a burst of early research relating the resource concerns of computer science theory with the resource concerns of thermodynamics [6–10].

Starting a few decades after this early research, there was dramatic progress in our understanding of non-equilibrium statistical physics [2, 11–15], which has resulted in new insights into the thermodynamics of computation [2, 3, 13, 16]. In particular, research has derived the '(generalized) Landauer bound' [17–22], which states that the heat generated by a thermodynamically reversible process that sends an initial distribution p₀(x₀) to an ending distribution p₁(x₁) is kT[S(p₀) − S(p₁)] (where S(p) indicates the entropy of distribution p, T is the temperature of the single bath, and k is Boltzmann's constant).

Almost all of this work on the Landauer bound assumes that the map taking initial states to final states, P(x₁|x₀), is implemented with a monolithic, 'all-at-once' physical process, jointly evolving all of the variables in the system at once. In contrast, for purely practical reasons modern computers are built out of circuits, i.e., they are built out of networks of 'gates', each of which evolves only a small subset of the variables of the full system [4, 5]. An example of a simple circuit that computes the parity of 3 input bits using two XOR gates, and which we will return to throughout this paper, is illustrated in figure 1.

**Figure 1.** A simple circuit that uses two exclusive-OR (XOR) gates to compute the parity of 3 inputs bits. The circuit outputs a 1 if an odd number of input bits are set to 1, and a 0 otherwise.
Download figure:
Standard image High-resolution image

Similarly, in the natural world, biological cellular regulatory networks carry out complicated computations by decomposing them into circuits of far simpler computations [23–25], as do many other kinds of biological systems [26–29].

As elaborated below, there are two major, unavoidable thermodynamic effects of implementing a given computation with a circuit of gates rather than with an all-at-once process:

(I) Suppose we build a circuit out of gates which were manufactured without any specific circuit in mind. Consider such a gate that implements bit erasure, and suppose that it is thermodynamically reversible if p₀ is uniform. So by the Landauer bound, it will generate heat kTS(p₀) = kT ln 2 if run on a uniform distribution.

Now in general, depending on where such a bit-erasing gate appears in a circuit, the actual initial distribution of its states, p'₀, will be non-uniform. This not only changes the Landauer bound for that gate from kT ln 2 to kTS(p'₀); it is now known that since the gate is thermodynamically reversible for p₀ ≠ p'₀, running that gate on p'₀ will not be thermodynamically reversible [30]. So the actual heat generated by running that bit will exceed the associated value of the Landauer bound, kTS(p'₀).

(II) Suppose the circuit is built out of two bit-erasing gates, and that each gate is thermodynamically reversible on a uniform input distribution when run separately from the circuit. If the marginal distributions over the initial states of the gates are both uniform, then the heat generated by running each of them is kT ln 2, and therefore the total generated heat is 2kT ln 2. Suppose though that there is nonzero statistical coupling between their states under their initial joint distribution. Then as elaborated below, even though each of the gates run separately is thermodynamically reversible, running them in parallel is not thermodynamically reversible. So running them generates extra heat beyond the minimum given by applying the Landauer bound to the dynamics of the full joint distribution⁴.

These two effects mean that the thermodynamic cost of running a given computation with a circuit will in general vary greatly depending on the precise circuit we use to implement that computation. In the current paper we analyze this dependence.

We make no restriction on the input–output maps computed by each gate in the circuit. They can be either deterministic (i.e., single-valued) or stochastic, logically reversible (i.e., implementing a deterministic permutation of the system's state space, as in Fredkin gates [6]) or not, etc. However, to ground thinking, the reader may imagine that the circuit being considered is a Boolean circuit, where each gate performs one of the usual single-valued Boolean functions, like logical AND gates, XOR gates, etc.

For simplicity, in this paper we focus on circuits whose topology does not contain loops [5, 31], such as the circuit shown in figure 1.

1.1. Contributions

We have four primary contributions.

(1) We derive exact expressions for how the entropy flow (EF) and entropy production (EP) produced by a fixed dynamical system vary as one changes the initial distribution of states of that system. These expressions capture effect (I) described above. (These expressions extend an earlier analysis [30]).

(2) We introduce 'solitary processes'. These are a type of physical process that can implement any particular gate in a circuit while respecting the constraints on what variables in the rest of the circuit that gate is coupled with. We can use the thermodynamic properties of solitary processes to analyze effect (II) described above.

3) We combine our first two contributions to analyze the thermodynamic costs of implementing circuits in a 'serial-reinitializing' manner. This means two things: the gates in the circuit are run one at a time, so each gate is run as a solitary process; after a gate is run its input wires are reinitialized, allowing for subsequent reuse of the circuit. In particular, we derive expressions relating the minimal EP generated by running an SR circuit to information-theoretic quantities associated with the wiring diagram of the circuit.

4) Our last contribution is an expression for the extra EP that arises in running an SR circuit if the initial state distributions at its gates differ from the ones that result in minimal EP for each of those gates. This expression involves an information-theoretic function that we call 'multi-divergence' which appears to be new to the literature.

1.2. Roadmap

In section 2.1 we introduce general notation, and then provide a minimal summary of the parts of stochastic thermodynamics, information theory and circuit theory that will be used in this paper. We also introduce the definition of the 'islands' of a stochastic matrix in that section, which will play a central role in our analysis. In section 3 we derive an exact expression for how the EF and EP of an arbitrary process depends on its initial state distribution. In section 4 we introduce solitary processes and then analyze their thermodynamics. In section 5 we introduce SR circuits. In section 6 we use the tools developed in the previous sections to analyze the thermodynamic properties of SR circuits. In section 7 we discuss related earlier work. Section 8 concludes and presents some directions for future work. All proofs that are longer than several lines are collected in the appendices.

2. Background

Because the analysis of the thermodynamics of circuits involves tools from multiple fields, we review those tools in this section. We also introduce some new mathematical structures that will be central to our analysis, in particular 'islands'. We begin by introducing notation.

2.1. General notation

We write a Kronecker delta as δ(a, b). We write a random variable with an upper case letter (e.g., X), and the associated set of possible outcomes with the associated calligraphic letter (e.g., $\mathcal{X}$ ). A particular outcome of a random variable is written with a lower case letter (e.g., x). We also use lower case letters like p, q, etc to indicate probability distributions.

We use ${{\Delta}}_{\mathcal{X}}$ to indicate the set of probability distribution over a set of outcomes $\mathcal{X}$ . For any distribution $p\in {{\Delta}}_{\mathcal{X}}$ , we use $\mathrm{s}\mathrm{u}\mathrm{p}\mathrm{p}\;p{:=}\left\{x\in \mathcal{X}:p\left(x\right){ >}0\right\}$ to indicate the support of p. Given a distribution p over $\mathcal{X}$ and any $\mathcal{Z}\subseteq \mathcal{X}$ , we write $p\left(\mathcal{Z}\right)={\sum }_{x\in \mathcal{Z}}p\left(x\right)$ to indicate the probability that the outcome of X is in $\mathcal{Z}$ . Given a function $f:\mathcal{X}\to \mathbb{R}$ , we write ${\mathbb{E}}_{p}\left[f\right]$ to indicate ∑_xp(x)f(x), the expectation of f under distribution p.

Given any conditional distribution P(y|x) of $y\in \mathcal{Y}$ given $x\in \mathcal{X}$ , and some distribution p over $\mathcal{X}$ , we write Pp for the distribution over $\mathcal{Y}$ induced by applying P to p:

$\begin{equation}\left[Pp\right]\left(y\right){:=}\sum _{x\in \mathcal{X}}P\left(y\vert x\right)p\left(x\right).\end{equation} \tag{ 1 }$

We will sometimes use the term 'map' to refer to a conditional distribution.

We say that a conditional distribution P is 'logically reversible' if it is deterministic (the entries of P(y|x) are 0/1-valued for all $x\in \mathcal{X}$ and $y\in \mathcal{Y}$ ) and if there do not exist $x,{x}^{\prime }\in \mathcal{X}$ and $y\in \mathcal{Y}$ such that P(y|x) > 0 and P(y|x') > 0. When $\mathcal{Y}=\mathcal{X}$ , a logically reversible P is simply a permutation matrix. Given any subset of states $\mathcal{Z}\subseteq \mathcal{X}$ , we also say that P is 'logically reversible over $\mathcal{Z}$ ' if the entries P(y|x) are 0/1-valued for all $x\in \mathcal{Z}$ and $y\in \mathcal{Y}$ , and there do not exist $x,{x}^{\prime }\in \mathcal{Z}$ and $y\in \mathcal{Y}$ such that P(y|x) > 0 and P(y|x') > 0.

We write a multivariate random variable with components V = {1, 2, ..., } as X_V = (X₁, X₂, ..., ), with outcomes x_V. We will also use upper case letters (e.g., A, V, ..., ) to indicate sets of variables. For any subset A ⊆ V we use the random variable X_A (and its outcomes x_A) to refer to the components of X_V indexed by A. Similarly, for a distribution p_V over X_V, we write the marginal distribution over X_A as p_A. For a singleton set {a}, we slightly abuse notation and write X_a instead of X_{a}.

2.2. Stochastic thermodynamics

We will consider a circuit to be physical system in contact with one or more thermodynamic reservoirs (heat baths, chemical baths, etc). The system evolves over some time interval (sometimes implicitly taken to be t ∈ [0, 1], where the units of time are arbitrary), possibly while being driven by a work reservoir. We refer to the set of thermodynamic reservoirs and the driving—and, in particular, the stochastic dynamics they induce over the system during t ∈ [0, 1]—as a physical process.

We use $\mathcal{X}$ to indicate the finite state space of the system. Physically, the states $x\in \mathcal{X}$ can either be microstates or they can be coarse-grained macrostates under some additional assumptions (e.g., that all macrostates have the same 'internal entropy' [2, 20, 32]).

While much of our analysis applies more broadly, to make things concrete one may imagine that the system undergoes master equation dynamics, also known as a continuous-time Markov chain (CTMC). This kind of dynamics is the basis of stochastic thermodynamics, which is often used to analyze the thermodynamics of discrete-state physical systems. In this subsection we briefly review stochastic thermodynamics, referring the reader to [33, 34] for more details.

Under a CTMC, the probability distribution over $\mathcal{X}$ at time t, indicated by p^t, evolves according to the master equation

$\begin{equation}\frac{\mathrm{d}}{\mathrm{d}t}{p}^{t}\left({x}^{\prime }\right)=\sum _{x}{p}^{t}\left(x\right){K}_{t}\left(x\to {x}^{\prime }\right),\end{equation} \tag{ 2 }$

where K_t is the rate matrix at time t. For any rate matrix K_t, the off-diagonal entries K_t(x → x') (for x ≠ x') indicate the rate at which probability flows from state x to x', while the diagonal entries are fixed by K_t(x → x) = −∑_x'(≠x)K_t(x → x'), which guarantees conservation of probability. If the system is connected to multiple thermodynamic reservoirs indexed by α, the rate matrix can be further decomposed as ${K}_{t}\left(x\to {x}^{\prime }\right)={\sum }_{\alpha }{K}_{t}^{\alpha }\left(x\to {x}^{\prime }\right)$ , where ${K}_{t}^{\alpha }$ is the rate matrix at time t corresponding to reservoir α.

The term entropy flow (EF) refers to the increase of entropy in all coupled reservoirs. The instantaneous rate of EF out of the system at time t is defined as

$\begin{equation}\dot {\mathcal{Q}}\left({p}^{t}\right)=\sum _{\alpha ,x,{x}^{\prime }}{p}^{t}\left(x\right){K}_{t}^{\alpha }\left(x\to {x}^{\prime }\right)\mathrm{ln}\frac{{K}_{t}^{\alpha }\left(x\to {x}^{\prime }\right)}{{K}_{t}^{\alpha }\left({x}^{\prime }\to x\right)}.\end{equation} \tag{ 3 }$

The overall EF incurred over the course of the entire process is $\mathcal{Q}={\int }_{0}^{1}\dot {\mathcal{Q}}\;\;\mathrm{d}t$ .

The term entropy production (EP) refers to the overall increase of entropy, both in the system and in all coupled reservoirs. The instantaneous rate of EP at time t is defined as

$\begin{equation}\dot {\sigma }\left({p}^{t}\right)=\frac{\mathrm{d}}{\mathrm{d}t}S\left({p}^{t}\right)+\dot {\mathcal{Q}}\left({p}^{t}\right).\end{equation} \tag{ 4 }$

The overall EP incurred over the course of the entire process is $\sigma ={\int }_{0}^{1}\dot {\sigma }\;\;\mathrm{d}t$ .

Note that we use terms like 'EF' and 'EP' to refer to either the associated rate or the associated integral over a non-infinitesimal time interval; the context should always make the precise meaning clear.

Given some initial distribution p, the EF, EP, and the drop in the entropy of the system from the beginning to the end of the process are related according to

$\begin{equation}\mathcal{Q}\left(p\right)=\left[S\left(p\right)-S\left(Pp\right)\right]+\sigma \left(p\right).\end{equation} \tag{ 5 }$

In general, the EF can be written as the expectation $\mathcal{Q}\left(p\right)$ = ∑_xp(x)q(x), where q(x) indicates the expected EF arising from trajectories that begin on state x. Given that the drop in entropy is a nonlinear function of p, while the expectation $\mathcal{Q}\left(p\right)$ is a linear function of p, equation (5) tells us that EP will generally be a nonlinear function of p. Note that if P is logically reversible, then S(p) = S(Pp) and therefore EF and EP will be equal for any p.

While the EF can be positive or negative, the log-sum inequality can be used to prove that EP for master equation dynamics is non-negative [15, 35]:

$\begin{equation}\mathcal{Q}\left(p\right){\geqslant}S\left(p\right)-S\left(Pp\right).\end{equation} \tag{ 6 }$

This can be viewed as a derivation of the second law of thermodynamics, given the assumption that our system is evolving forward in time as a CTMC.

All of these results are purely mathematical and hold for any CTMC dynamics, even in contexts having nothing to do with physical systems. However, these results can be interpreted in thermodynamic terms when each ${K}_{t}^{\alpha }$ obeys local detailed balance (LDB) with regard to thermodynamic reservoir α [3, 15, 33]. Consider a system with Hamiltonian H_t(⋅) at time t, and let α label a heat bath whose inverse temperature is β_α. Then, ${K}_{t}^{\alpha }$ will obey LDB when for all $x,{x}^{\prime }\in \mathcal{X}$ , either ${K}_{t}^{\alpha }\left(x\to {x}^{\prime }\right)={K}_{t}^{\alpha }\left({x}^{\prime }\to x\right)=0$ , or

$\begin{equation}\frac{{K}_{t}^{\alpha }\left(x\to {x}^{\prime }\right)}{{K}_{t}^{\alpha }\left({x}^{\prime }\to x\right)}={\mathrm{e}}^{{\beta }_{\alpha }\left({H}_{t}\left(x\right)-{H}_{t}\left({x}^{\prime }\right)\right)}.\end{equation} \tag{ 7 }$

If LDB holds, then EF can be written as [34]

$\begin{equation}\mathcal{Q}\left(p\right)=\sum _{\alpha }{\beta }_{\alpha }{Q}_{\alpha }\left(p\right),\end{equation} \tag{ 8 }$

where Q_α is the expected amount of heat transferred from the system into bath α during the process.

We end with two caveats concerning the use of stochastic thermodynamics to analyze real-world circuits. First, many of the processes described in this paper require that some transition rates be exactly zero at some moments. In many physical models this implies there are infinite energy barriers at those times. In addition, perfectly carrying out any deterministic map (such as bit erasure) requires the use of infinite energy gaps between some states at some times. Thus, as is conventional (though implicit) in much of the thermodynamics of computation literature, the thermodynamic costs derived in this paper should be understood as limiting values.

Second, there are some conditional distributions that take the system state at time 0 to its state at time 1, P(x₁|x₀), that cannot be implemented by any CTMC [36, 37]. For example, one cannot carry out (or even approximate) a simple bit flip P(x₁|x₀) = 1 − δ(x₁, x₀) with a CTMC. Now we can design a CTMC to implement any given P(x₁|x₀) to arbitrary precision, if the dynamics is expanded to include a set of 'hidden states' in addition to the states in X [21, 22]. However, as we explicitly demonstrate below, SR circuits can be implemented without introducing any such hidden states; this is one of their advantages. (See also example 9 in appendix A).

2.3. Information theory

Given two distributions p and r over random variable X, we use notation like S(p) for Shannon entropy and D(p||r) for Kullback–Leibler (KL) divergence. We write S(Pp) to refer to the entropy of the distribution over Y induced by p(x) and the conditional distribution P, as defined in equation (5), and similarly for other information-theoretic measures. Given two random variables X and Y with joint distribution p, we write S(p(X|Y)) for the conditional entropy of X given Y, and I_p(X; Y) for the mutual information (we drop the subscript p where the distribution is clear from context). All information-theoretic measures are in nats.

Some of our results below are formulated in terms of an extension of mutual information to more than two random variables that is known as 'total correlation' or multi-information [38]. For a random variable X_A = (X₁, X₂, ..., ), the multi-information is defined as

$\begin{equation}\mathcal{I}\left({p}_{A}\right)=\left[\sum _{v\in A}S\left({p}_{v}\right)\right]-S\left({p}_{A}\right).\end{equation} \tag{ 9 }$

Some of the other results below are formulated in terms of the multi-divergence between two probability distributions over the same multi-dimensional space. This is a recently introduced information-theoretic measure which can be viewed as an extension of multi-information to include a reference distribution. Given two distributions p_A and r_A over X_A, the multi-divergence is defined as

$\begin{equation}\mathcal{D}\left({p}_{A}{\Vert}{r}_{A}\right){:=}D\left({p}_{A}{\Vert}{r}_{A}\right)-\sum _{v\in A}D\left({p}_{v}{\Vert}{r}_{v}\right).\end{equation} \tag{ 10 }$

Multi-divergence measures how much of the divergence between p_A and r_A arises from the correlations among the variables X₁, X₂, ..., rather than from the marginal distributions of each variable considered separately. See appendix A of [3] for a discussion of the elementary properties of multi-divergence and its relation to conventional multi-information. Note that multi-divergence is defined with 'the opposite sign' of multi-information, i.e., by subtracting a sum of terms involving marginal variables from a term involving the joint random variable, rather than vice-versa.

2.4. 'Island' decomposition of a conditional distribution

A central part of our analysis will involve the equivalence relation,

$\begin{equation}x\sim {x}^{\prime }\;{\Leftrightarrow}\;\exists y\;:\;P\left(y\vert x\right){ >}0,\quad P\left(y\vert {x}^{\prime }\right){ >}0.\end{equation} \tag{ 11 }$

In words, x ∼ x' if there is a non-zero probability of transitioning to some state y from both x and x' under the conditional distribution P(y|x). We define an island of the conditional distribution P(y|x) as any connected subset of $\mathcal{X}$ given by the transitive closure of this equivalence relation. The set of islands of any P(⋅|⋅) form a partition of $\mathcal{X}$ , which we write as L(P).

We will also use the notion of the islands of the conditional distribution P restricted to some subset of states $\mathcal{Z}\subseteq \mathcal{X}$ . We write ${L}_{\mathcal{Z}}\left(P\right)$ to indicate the partition of $\mathcal{Z}$ generated by the transitive closure of the relation given by equation (11) for $x,{x}^{\prime }\in \mathcal{Z}$ . Note that in this notation, $L\left(P\right)={L}_{\mathcal{X}}\left(P\right)$ .

As an example, if P(y|x) > 0 for all $x\in \mathcal{X}$ and $y\in \mathcal{Y}$ (i.e., any final state y can be reached from any initial state x with non-zero probability), then L(P) contains only a single island. As another example, if P(y|x) implements a deterministic function $f:\mathcal{X}\to \mathcal{Y}$ , then L(P) is the partition of $\mathcal{X}$ given by the pre-images of f, $L\left(P\right)=\left\{{f}^{-1}\left(y\right):y\in \mathcal{Y}\right\}$ . For example, the conditional distribution that implements the logical AND operation of two binary variables,

$\begin{equation}P\left(c\vert a,b\right)=\delta \left(c,a\,b\right)\end{equation} \tag{ 12 }$

has two islands, corresponding to (a, b) ∈ {(0, 0), (0, 1), (1, 0)} and (a, b) ∈ {(1, 1)}, respectively. As a final example, let P be the following conditional distribution:

$\begin{equation}P\left(y\vert x\right)=\left[\begin{bmatrix}{cccc}\hfill 0.5\hfill & \hfill 0.5\hfill & \hfill 0\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 0.5\hfill & \hfill 0.5\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill \end{bmatrix}\right],\end{equation} \tag{ 13 }$

where the rows and columns corresponds to the ordered states $\mathcal{X}=\mathcal{Y}=\left\{\mathsf{00},\mathsf{01},\mathsf{10},\mathsf{11}\right\}$ . The island decomposition for this map is illustrated in figure 2 (left). We also show the island decomposition for this map restricted to subset of states $\mathcal{Z}=\left\{\mathsf{00},\mathsf{01}\right\}$ in figure 2 (right).

**Figure 2.** Left: the island decomposition for the conditional distribution in equation (13). The two islands are indicated by the two rounded green boxes. Right: the island decomposition for this map with $\mathcal{X}$ restricted to the subset of states $\mathcal{Z}=\left\{\mathsf{00},\mathsf{01}\right\}$ . For this subset of states, there is only one island, indicated by the round green box.
Download figure:
Standard image High-resolution image

For any distribution p over $\mathcal{X}$ , any $\mathcal{Z}\subseteq \mathcal{X}$ , and any $c\in {L}_{\mathcal{Z}}\left(P\right)$ , p(c) = ∑_x∈cp(x) is the probability that the state of the system is contained in island c. It will be helpful to use the unusual notation p^c(x) to indicate the conditional probability of x within island c. Formally, p^c(x) = p(x)/p(c) if x ∈ c, and p^c(x) = 0 otherwise.

Intuitively, the islands of a conditional distribution are 'firewalled' subsystems, both computationally and thermodynamically isolated from one another for the duration of the process implementing that conditional distribution. In particular, we will show below that the EP of running P(y|x) on an initial distribution p can be written as a weighted sum of the EPs involved in running P on each separate island c ∈ L(P), where the weight for island c is given by p(c).

2.5. Circuit theory

For the purposes of this paper, a (logical) circuit is a special type of Bayes net [39–41]. Specifically, we define any circuit Φ as a tuple $\left(V,E,F,{\mathcal{X}}_{V}\right)$ . The pair (V, E) specifies the vertices and edges of a directed acyclic graph (DAG). (We sometimes call this DAG the wiring diagram of the circuit.) ${\mathcal{X}}_{V}$ is a Cartesian product ${\prod }_{v}{\mathcal{X}}_{v}$ , where each ${\mathcal{X}}_{v}$ is the set of possible states associated with node v. F is a set of conditional distributions, indicating the logical maps implemented at the non-root nodes of the DAG.

Following the convention in the Bayes nets literature, we orient edges in the direction of information flow. Thus, the inputs to the circuit are the roots of the associated DAG and the outputs are the leaves of the DAG ⁵. Without loss of generality, we assume that each node v has a special 'initialized state', indicated as ∅.

We use the term gate to refer to any non-root node, input node to refer to any root node, and output node or output gate to refer to a leaf node. For simplicity, we assume that all output nodes are gates, i.e., there is no root node which is also a leaf node. We write IN and x_IN to indicate the set of input nodes and their joint state, and similarly write OUT and x_OUT for the output nodes.

We write the set of all gates in a given circuit as G ⊆ V, and use g ∈ G to indicate a particular gate. We indicate the set of all nodes that are parents of gate g as pa(g). We indicate the set of nodes that includes gate g and all parents of g as n(g) := {g} ∪ pa(g).

As mentioned, F is a set of conditional distributions, indicating the logical maps implemented by each gate of the circuit. The element of F corresponding to gate g is written as π_g(x_g|x_pa(g)). In conventional circuit theory, each π_g is required to be deterministic (i.e., 0/1-valued). However, we make no such restriction in this paper. We write the overall conditional distribution of output gates given input nodes implemented by the circuit Φ as

$\begin{equation}{\pi }_{{\Phi}}\left({x}_{\mathrm{O}\mathrm{U}\mathrm{T}}\vert {x}_{\mathrm{I}\mathrm{N}}\right)=\sum _{{x}_{G{\backslash}\mathrm{O}\mathrm{U}\mathrm{T}}}\prod _{g\in G}{\pi }_{g}\left({x}_{g}\vert {x}_{\mathrm{p}\mathrm{a}\left(g\right)}\right).\end{equation} \tag{ 14 }$

We can illustrate this formalism using the parity circuit shown in figure 1. Here, V has 5 nodes, corresponding to the 3 input nodes and the two gates. The circuit operates over bits, so ${\mathcal{X}}_{v}=\left\{0,1\right\}$ for each v ∈ V. Both gates carry out the $\mathsf{X}\mathsf{O}\mathsf{R}$ operation, so both elements of F are given by π_g(x_g|x_pa(g)) = δ(x_g, $\mathsf{X}\mathsf{O}\mathsf{R}$ (x_pa(g))) (where $\mathsf{X}\mathsf{O}\mathsf{R}$ (x_pa(g)) = 1 when the two parents of gate g are in different states, and $\mathsf{X}\mathsf{O}\mathsf{R}$ (x_pa(g)) = 0 otherwise). Finally, E has four elements representing the edges connecting the nodes in V, which are shown as arrows in figure 1.

In the conventional representation of a physical circuit as a (Bayes net) DAG, the wires in the physical circuit are identified with edges in the DAG. However, in order to account for the thermodynamic costs of communication between gates along physical wires, it will be useful to represent the wires themselves as a special kind of gate. This means that the DAG (V, E) we use to represent a particular physical circuit is not the same as the DAG (V', E') that would be used in the conventional computer science representation of that circuit. Rather (V, E) is constructed from (V', E') as follows.

To begin, V = V' and E = E'. Then, for each edge $\left(v\to ~{v}\right)\in {E}^{\prime }$ , we first add a wire gate w to V, and then add two edges to E: an edge from v to w and an edge from w to $~{v}$ . So a wire gate w has a single parent and a single child, and implements the identity map, π_w(x_w|x_pa(w)) = δ(x_w, x_pa(w)). (This is an idealization of the real world, in which wires have nonzero probability of introducing errors.) We sometimes call (V, E) the wired circuit, to distinguish it from the original logical circuit defined as in computer science theory, (V', E'). We use W ⊂ G to indicate the set of wire gates in a wired circuit.

Every edge in a wired circuit either connects a wire gate to a non-wire gate or vice versa. Physically, the edges of the DAG of a wired circuit do not represent interconnects (e.g., copper wires), as they do in a logical circuit. Rather they only indicate physical identity: an edge e ∈ E going into a wire gate w from a non-wire node v indicates that the same physical variable will be written as either X_v or X_pa(w). Similarly, an edge e ∈ E going into a non-wire gate g from a wire gate w indicates that X_w is the same physical variable (and so always has the same state) as the corresponding component of X_pa(g). However, despite this modified meaning of the nodes in a wired circuit, equation (14) still applies to any wired circuit, as well as applying to the corresponding logical circuit. In figure 3, we demonstrate how to represent the 3-bit parity circuit from figure 1 as a wired circuit.

**Figure 3.** The 3-bit parity circuit of figure 1 represented as a wired circuit. Squares represent input nodes, rounded boxes represent non-wire gates, and smaller green circles represent wire gates. The output XOR gate is in blue, while the other (non-output) XOR gate is in red.
Download figure:
Standard image High-resolution image

We use the word 'circuit' to refer to either an abstract wired (or logical) circuit, or to a physical system that implements that abstraction. Note that there are many details of the physical system that are not specified in the associated abstract circuit. When we need to distinguish the abstraction from its physical implementation, we will refer to the latter as a physical circuit, with the former being the corresponding wired circuit. The context will always make clear whether we are using terms like 'gate', 'circuit', etc, to refer to physical systems or to their formal abstractions.

Even if one fully specifies the distinct physical subsystems of a physical circuit that will be used to implement each gate in a wired circuit, we still do not have enough information concerning the physical circuit to analyze the thermodynamic costs of running it. We still need to specify the initial states of those subsystems (before the circuit begins running), the precise sequence of operations of the gates in the circuit, etc. However, before considering these issues, we need to analyze the general form of the thermodynamic costs of running individual gates in a circuit, isolated from the rest of the circuit. We do that in the next section.

3. Decomposition of EF

Suppose we have a fixed physical system whose dynamics over some time interval is specified by a conditional distribution P, and let p be its initial state distribution, which we can vary. We decompose the EF of running that system into a sum of three functions of p. Applied to any specific gate in a circuit (the 'fixed physical system'), this decomposition tells us how the thermodynamic costs of that gate would change if the distribution of inputs to the gate were changed.

First, equation (6) tells us that the minimal possible EF, across all physical processes that transform p into P' := Pp, is given by the drop in system entropy. We refer to this drop as the Landauer cost of computing P on p, and write it as

$\begin{equation}\mathcal{L}\left(p\right){:=}S\left(p\right)-S\left(Pp\right).\end{equation} \tag{ 15 }$

Since EF is just Landauer cost plus EP, our next task is to calculate how the EP incurred by a fixed physical process depends on the initial distribution p of that process. To that end, in the rest of this section we show that EP can be decomposed into a sum of two non-negative functions of p. Roughly speaking, the first of those two functions reflects the deviation of the initial distribution p from an 'optimal' initial distribution, while the second term reflects the remaining EP that would occur even if the process were run on that optimal initial distribution.

To derive this decomposition, we make use of a mathematical result provided by the following theorem. The theorem considers any function of the initial distribution p which can be written in the form $S\left(Pp\right)-S\left(p\right)+{\mathbb{E}}_{p}\left[f\right]$ (i.e., the increase of Shannon entropy plus an expectation of some quantity with respect to p). The EP incurred by a physical process can be written in this form (by equation (5), where ${\mathbb{E}}_{p}\left[f\right]$ refers to the EF). Further below, we will also consider other functions, which are closely related to EP, that can be written in this special form. The theorem shows that any function with this special form can be decomposed into a sum of the two terms described above: the first term reflecting deviation of p from the optimal initial distribution (relative to all distributions with support in some restricted set of states, which we indicate as $\mathcal{Z}$ ), and a remainder term.

Theorem 1. Consider any function ${\Gamma}:{{\Delta}}_{\mathcal{X}}\to \mathbb{R}$ of the form

$\begin{equation*}{\Gamma}\left(p\right){:=}S\left(Pp\right)-S\left(p\right)+{\mathbb{E}}_{p}\left[f\right]\end{equation*}$

where P(y|x) is some conditional distribution of $y\in \mathcal{Y}$ given $x\in \mathcal{X}$ and $f:\mathcal{X}\to \mathbb{R}\cup \left\{\infty \right\}$ is some function. Let $\mathcal{Z}$ be any subset of $\mathcal{X}$ such that f(x) < ∞ for $x\in \mathcal{Z}$ , and let $q\in {{\Delta}}_{\mathcal{Z}}$ be any distribution that obeys

$\begin{equation*}{q}^{c}\in \underset{r:\mathrm{s}\mathrm{u}\mathrm{p}\mathrm{p}\;r\subseteq c}{\mathrm{argmin}}{\Gamma}\left(r\right)\quad \;\text{for}\;\text{all}\;\;c\in {L}_{\mathcal{Z}}\left(P\right).\end{equation*}$

Then, each q^c will be unique, and for any p with $\mathrm{s}\mathrm{u}\mathrm{p}\mathrm{p}\;p\subseteq \mathcal{Z}$ ,

$\begin{equation*}{\Gamma}\left(p\right)=D\left(p{\Vert}q\right)-D\left(Pp{\Vert}Pq\right)+\sum _{c\in {L}_{\mathcal{Z}}\left(P\right)}p\left(c\right){\Gamma}\left({q}^{c}\right).\end{equation*}$

We emphasize that P and f are implicit in the definition of Γ. We remind the reader that the definition of ${L}_{\mathcal{Z}}$ and q^c is provided in section 2.4. The proof is provided in appendix A.

Note that theorem 1 does not suppose that q is unique, only that the conditional distributions within each island, ${\left\{{q}^{c}\right\}}_{c}$ , are. Moreover, as implied by the statement of the theorem, the overall probability weights assigned to the separate islands, {q(c)}_c, has no effect on the value of Γ.

Consider some conditional distribution P(y|x), with $\mathcal{Y}=\mathcal{X}$ , implemented by a physical process. Then if we take $\mathcal{Z}=\mathcal{X}$ and ${\mathbb{E}}_{p}\left[f\right]=\mathcal{Q}$ in theorem 1, the function Γ is just the EP of running the conditional distribution P(y|x). This establishes the following decomposition of EP:

$\begin{equation}\sigma \left(p\right)=D\left(p{\Vert}q\right)-D\left(Pp{\Vert}Pq\right)+\sum _{c\in L\left(P\right)}p\left(c\right)\sigma \left({q}^{c}\right).\end{equation} \tag{ 16 }$

We emphasize that equation (16) holds without any restrictions on the process, e.g., we do not require that the process obey LDB. In fact, equation (16) even holds if the process does not evolve according to a CTMC (as long as EP can be defined via equation (5)).

We refer to the first term in equation (16), the drop in KL divergence between p and q as both evolve under P, as mismatch cost⁶. Mismatch cost is non-negative by the data-processing inequality for KL divergence [42]. It equals zero in the special case that p^c = q^c for each island $c\in {L}_{\mathcal{Z}}\left(P\right)$ . We refer to any such initial distribution p that results in zero mismatch cost as a prior distribution of the physical process that implements the conditional distribution P (the term 'prior' reflects a Bayesian interpretation of q; see [20, 30].) If there is more than one island in ${L}_{\mathcal{Z}}\left(P\right)$ , the prior distribution is not unique.

We call the second term in our decomposition of EP in equation (16), ${\sum }_{c\in {L}_{\mathcal{Z}}\left(P\right)}p\left(c\right)\sigma \left({q}^{c}\right)$ , the residual EP. In contrast to mismatch cost, residual EP does not involve information-theoretic quantities, and depends linearly on p. When ${L}_{\mathcal{Z}}\left(P\right)$ contains a single island, this 'linear' term reduces to an additive constant, independent of the initial distribution. The residual EP terms ${\left\{\sigma \left({q}^{c}\right)\right\}}_{c}$ are all non-negative, since EP is non-negative.

Concretely, the conditional distributions ${\left\{{q}^{c}\right\}}_{c}$ and the corresponding set of real numbers ${\left\{\sigma \left({q}^{c}\right)\right\}}_{c}$ depend on the precise physical details of the process, beyond the fact that the process implements P. Indeed, by appropriate design of the 'nitty gritty' details of the physical process, it is possible to have σ(q^c) = 0 for all $c\in {L}_{\mathcal{Z}}\left(P\right)$ , in which case the residual EP would equal zero for all p. (For example, this will be the case if the process is an appropriate quasi-static transformation; see [21, 43].)

Imagine that the conditional distribution P is logically reversible over some set of states $\mathcal{Z}\subseteq \mathcal{X}$ , and that $\mathrm{s}\mathrm{u}\mathrm{p}\mathrm{p}\;p\subseteq \mathcal{Z}$ . Then both mismatch cost and Landauer cost must equal zero, and EF must equal EP, which in turn must equal residual EP⁷. Conversely, if P is not logically reversible over $\mathcal{Z}$ , then mismatch cost cannot be zero for all initial distributions p with $\mathrm{s}\mathrm{u}\mathrm{p}\mathrm{p}\;p\subseteq \mathcal{Z}$ (for such a P, regardless of what q is, there will be some p with $\mathrm{s}\mathrm{u}\mathrm{p}\mathrm{p}\;p\subseteq \mathcal{Z}$ such that the KL divergence between p and q will shrink under the mapping P). Thus, for any fixed process that implements a logically irreversible map, there will be some initial distributions p that result in unavoidable EP.

To provide some intuition into these results, the following example reformulates the EP of a very commonly considered scenario as a special case of equation (16):

Example 1. Consider a physical system evolving according to an irreducible master equation, while coupled to a single thermodynamic reservoir and without external driving. Because there is no external driving, the master equation is time-homogeneous with some unique equilibrium distribution p_eq. So the system is relaxing toward that equilibrium as it undergoes the conditional distribution P over the interval t ∈ [0, 1].

For this kind of relaxation process, it is well known that the EP can be written as [34, 44, 45]:

$\begin{equation}\sigma \left(p\right)=D\left(p{\Vert}{p}_{\mathrm{e}\mathrm{q}}\right)-D\left(Pp{\Vert}{p}_{\mathrm{e}\mathrm{q}}\right).\end{equation} \tag{ 17 }$

Equation (17) can also be derived from our result, equation (16), since

(a)
Taking $\mathcal{Z}=\mathcal{X}$ , P has a single island (because the master equation is irreducible, and therefore any state is reachable from any other over t ∈ [0, 1]);
(b)
The prior distribution within this single island is q = p_eq (since the EP would be exactly zero if the system were started at this equilibrium, which is a fixed point of P);
(c)
The residual EP is σ(q) = 0 (again using fact that EP is exactly zero for p = p_eq, and that there is a single island);
(d)
Pq = p_eq (since there is no driving, and the equilibrium distribution is a fixed point of P).

Thus, equation (16) can be seen as a generalization of the well-known relation given by equation (17), which is defined for simple relaxation processes, to processes that are driven and possibly connected to multiple reservoirs.

The following example addresses the effect of possible discontinuities in the island decomposition of P on our decomposition of thermodynamic costs:

Example 2. Mismatch cost and residual EP are both defined in terms of the island decomposition of the conditional distributions P over some set of states $\mathcal{Z}$ . That decomposition in turn depends on which (if any) entries in the conditional probability distribution P are exactly 0. This suggests that the decomposition of equation (16) can depend discontinuously on very small variations in P which replace strictly zero entries in P with infinitesimal values, since such variations will change the island decomposition of P.

To address this concern, first note that if P ≃ P', then the EP of the real-world process that implements P' can be approximated as

$\begin{align}\hfill {\sigma }^{\prime }\left(p\right)& =S\left({P}^{\prime }p\right)-S\left(p\right)+{\mathcal{Q}}^{\prime }\left(p\right)\hfill \\ \hfill & \simeq S\left(Pp\right)-S\left(p\right)+{\mathcal{Q}}^{\prime }\left(p\right),\hfill \end{align} \tag{ 18 }$

where ${\mathcal{Q}}^{\prime }\left(p\right)$ is the EF function of the real-world process, with the approximation becoming exact as P → P'⁸. If we now apply theorem 1 to the right-hand side of equation (18), we see that so long as P' is close enough to P, we can approximate σ'(p) as a sum of mismatch cost and residual EP using the islands of the idealized map P, instead of the actual map P'.

4. Solitary processes

Implicit in the definition of a physical circuit is that it is 'modular', in the sense that when a gate in the circuit runs, it is physically coupled to the gates that are its direct inputs, and those that directly get its output, but is not physically coupled to any other gates in the circuit. This restriction on the allowed physical coupling is a constraint on the possible processes that implement each gate in the circuit. It has major thermodynamic consequences, which we analyze in this section.

To begin, suppose we have a system that can be decomposed into two separate subsystems, A and B, so that the system's overall state space $\mathcal{X}$ can be written as $\mathcal{X}={\mathcal{X}}_{A}{\times}{\mathcal{X}}_{B}$ , with states (x_A, x_B). For example, A might contain a particular gate and its inputs, while B might consist of all other nodes in the circuit. We use the term solitary process to refer to a physical process over state space ${\mathcal{X}}_{A}{\times}{\mathcal{X}}_{B}$ that takes place during t ∈ [0, 1] where:

(a)
A evolves independently of B, and B is held fixed:
$\begin{equation}P\left({x}_{A}^{\prime },{x}_{B}^{\prime }\vert {x}_{A},{x}_{B}\right)={P}_{A}\left({x}_{A}^{\prime }\vert {x}_{A}\right)\delta \left({x}_{B}^{\prime },{x}_{B}\right).\end{equation} \tag{ 19 }$
(b)
The EF of the process depends only on the initial distribution over ${\mathcal{X}}_{A}$ , which we indicate with the following notation:
$\begin{equation}\mathcal{Q}\left(p\right)={\mathcal{Q}}_{A}\left({p}_{A}\right).\end{equation} \tag{ 20 }$
(c)
The EF is lower bounded by the change in the marginal entropy of subsystem A,
$\begin{equation}{\mathcal{Q}}_{A}\left({p}_{A}\right){\geqslant}S\left({p}_{A}\right)-S\left({P}_{A}{p}_{A}\right).\end{equation} \tag{ 21 }$

Note that it may be that some subset A' of the variables in subsystem A do not change their state during the solitary process. In that sense such variables would be like the variables in B. However, if the dynamics of those variables in A that do change state depends on the values of the variables in A', then in general the variables in A' cannot be assigned to B; they have to be included in subsystem A in order for condition (b) to be met.

Example 3. A concrete example of a solitary process is a CTMC where at all times, the rate matrix K_t has the decoupled structure

$\begin{equation}{K}_{t}\left({x}_{V}\to {x}_{V}^{\prime }\right)=\delta \left({x}_{B},{x}_{B}^{\prime }\right)\sum _{\alpha }{K}_{t}^{A,\alpha }\left({x}_{A}\to {x}_{A}^{\prime }\right)\end{equation} \tag{ 22 }$

for ${x}_{V}\ne {x}_{V}^{\prime },$ where ${K}_{t}^{A,\alpha }$ indicates the rate matrix for subsystem A and thermodynamic reservoir α at time t⁹.

To verify that this CTMC is a solitary process, first plug the rate matrix in equation (22) into equation (2) and simplify, giving

$\begin{equation*}\frac{\mathrm{d}}{\mathrm{d}t}{p}^{t}\left({x}_{A}^{\prime },{x}_{B}^{\prime }\right)={p}^{t}\left({x}_{B}^{\prime }\right)\sum _{{x}_{A}}{p}^{t}\left({x}_{A}\vert {x}_{B}^{\prime }\right)\sum _{\alpha }{K}_{t}^{A,\alpha }\left({x}_{A}\to {x}_{A}^{\prime }\right).\end{equation*}$

Marginalizing the above equation, we see that the distribution over the states of A evolves independently of the state of x_B, according to

$\begin{equation*}\frac{\mathrm{d}}{\mathrm{d}t}{p}^{t}\left({x}_{A}^{\prime }\right)=\sum _{{x}_{A}}{p}^{t}\left({x}_{A}\right)\sum _{\alpha }{K}_{t}^{A,\alpha }\left({x}_{A}\to {x}_{A}^{\prime }\right).\end{equation*}$

Note also that given the form of equation (22), the state of B does not change. Thus, the conditional distribution carried out by this CTMC over any time interval must have the form of equation (19). (See also appendix B in [3].)

Next, plug equation (22) into equation (3) and simplify to get

$\begin{equation}\dot {\mathcal{Q}}\left({p}^{t}\right)=\sum _{\alpha ,{x}_{A},{x}_{A}^{\prime }}{p}^{t}\left({x}_{A}\right){K}_{t}^{A,\alpha }\left({x}_{A}\to {x}_{A}^{\prime }\right)\mathrm{ln}\frac{{K}_{t}^{A,\alpha }\left({x}_{A}\to {x}_{A}^{\prime }\right)}{{K}_{t}^{A,\alpha }\left({x}_{A}^{\prime }\to {x}_{A}\right)}.\end{equation} \tag{ 23 }$

Thus, the EF incurred by the process evolves exactly as if A were an independent system connected to a set of thermodynamic reservoirs. Therefore, a joint system evolving according to equation (22) will satisfy equations (20) and (21).

We refer to the lower bound on the EF of subsystem A, as given in equation (21), as the subsystem Landauer cost for the solitary process. We make the associated definition that the subsystem EP for the solitary process is

$\begin{equation}{\hat{\sigma }}_{A}\left({p}_{A}\right){:=}{\mathcal{Q}}_{A}\left({p}_{A}\right)-\left[S\left({p}_{A}\right)-S\left({P}_{A}{p}_{A}\right)\right],\end{equation} \tag{ 24 }$

which by equation (21) is non-negative. Note that if P_A is a logically reversible conditional distribution, then subsystem EP is equal to the EF incurred by the solitary process.

In general, S(p_A) − S(P_Ap_A), the subsystem Landauer cost, will not equal S(p_AB) − S(Pp_AB), the Landauer cost of the entire joint system. Loosely speaking, an observer examining the entire system would ascribe a different value to its entropy change during the solitary process than would an observer examining just subsystem A—even though subsystem B does not change its state. We use the term Landauer loss to refer to this difference in Landauer costs,

$\begin{equation}{\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right){:=}\left[S\left({p}_{A}\right)-S\left({P}_{A}{p}_{A}\right)\right]-\left[S\left({p}_{AB}\right)-S\left(P{p}_{AB}\right)\right].\end{equation} \tag{ 25 }$

Assuming that the lower bound in equation (21) can be saturated, since the bound in equation (6) can be saturated, the Landauer loss is the increase in the minimal EF that must be incurred by any process that carries out P if that process is required to be a solitary process.

By using the fact that subsystem B remains fixed throughout a solitary process, the Landauer loss can be rewritten as the drop in the mutual information between A and B, from the beginning to the end of the solitary process,

$\begin{equation}{\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)={I}_{p}\left(A;B\right)-{I}_{Pp}\left(A;B\right).\end{equation} \tag{ 26 }$

Applying the data processing inequality establishes that Landauer loss is non-negative [46]. (See section 7 for a discussion of the relation between solitary processes and other processes that have been considered in the literature.)

If P_A (and thus also P) is logically reversible, then the Landauer loss will always be zero. However, for other conditional distributions, there is always some p that results in strictly positive Landauer loss. Moreover, we can rewrite it as

$\begin{equation}{\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)=\sigma \left({p}_{AB}\right)-{\hat{\sigma }}_{A}\left({p}_{A}\right).\end{equation} \tag{ 27 }$

So in general the subsystem EP will be less than the overall EP of the entire system¹⁰.

Finally, note that ${\mathcal{Q}}_{A}\left({p}_{A}\right)$ is a linear function of the distribution p_A (since EF functions are linear). Combining this fact with theorem 1, while taking $\mathcal{Z}={\mathcal{X}}_{A}$ , allows us to expand the subsystem EP as

$\begin{align}\hfill {\hat{\sigma }}_{A}\left({p}_{A}\right)& =D\left({p}_{A}{\Vert}{q}_{A}\right)-D\left({P}_{A}{p}_{A}{\Vert}{P}_{A}{q}_{A}\right)+\sum _{c\in L\left({P}_{A}\right)}{p}_{A}\left(c\right){\hat{\sigma }}_{A}\left({q}_{A}\right),\hfill \end{align} \tag{ 28 }$

where q_A is a distribution over ${\mathcal{X}}_{A}$ that satisfies ${\hat{\sigma }}_{a}\left({q}_{A}^{c}\right)={\mathrm{min}}_{r:\mathrm{s}\mathrm{u}\mathrm{p}\mathrm{p}\;r\subseteq c}{\hat{\sigma }}_{A}\left(r\right)$ for all c ∈ L(P_A). As before, both the drop in KL divergence and the term linear in p_A(c) are non-negative. We will sometimes refer to that drop in KL divergence as subsystem mismatch cost, with q_A the subsystem prior, and refer to the linear term as subsystem residual EP. Intuitively, subsystem Landauer cost, subsystem EP, subsystem mismatch cost, and subsystem residual EP are simply the values of those quantities that an observer would ascribe to subsystem A if they observed it independently of B.

5. Serial-reinitialized circuits

As mentioned at the end of section 2.5, specifying a wired circuit does not specify the initial distributions of the gates in the physical circuit, the sequence in which the gates in the physical circuit are run, etc. So it does not fully specify the dynamics of a physical system that implements that wired circuit. In this section we introduce one relatively simple way of mapping a wired circuit to such a full specification. In this specification, the gates are run serially, one after the other. Moreover, the gates reinitialize the states of their parent gates after they run, so that the entire circuit can be repeatedly run, incurring the same expected thermodynamic costs each time. We call such physical systems serial reinitialized implementations of a given wired circuit, or just SR circuits for short.

For simplicity, in the main text of this paper we focus on the special case in which all non-output nodes have out-degree 1, i.e., where each non-output node is the parent of exactly one gate. See appendix C for a discussion of how to extend the current analysis to relax this requirement, allowing some nodes to have out-degree larger than 1.

There are several properties that jointly define the SR circuit implementation of a given wired circuit.

First, just before the physical circuit starts to run, all of its nodes have a special initialized value with probability 1, i.e., x_v = ∅ for all v ∈ V at time t = 0. Then the joint state of the input nodes x_IN is set by sampling p_IN(x_IN)¹¹. Typically this setting of the state of the input nodes is done by some offboard system, e.g., the user of the digital device containing the circuit. We do not include the details of this offboard system in our model of the physical circuit. Accordingly, we do not include the thermodynamic costs of setting the joint state of the input nodes in our calculation of the thermodynamic costs of running the circuit¹².

After x_IN is set this way, the SR circuit implementation begins. It works by carrying out a sequence of solitary processes, one for each gate of the circuit, including wire gates. At all times that a gate g is 'running', the combination of that gate and its parents (which we indicate as n(g)) is the subsystem A in the definition of solitary processes. The set of all other nodes of the wired circuit (V\n(g)) constitute the subsystem B of the solitary process. The temporal ordering of the solitary processes must be a topological ordering consistent with the wiring diagram of the circuit: if gate g is an ancestor of gate g', then the solitary process for gate g completes before the solitary process for gate g' begins.

When the solitary process corresponding to any gate g ∈ G begins running, x_g is still set to its initialized state, ∅, while all of the parent nodes of g are either input nodes, or other gates that have completed running and are set to their output values. By the end of the solitary process for gate g, x_g is set to a random sample of the conditional distribution π_g(x_g|x_pa(g)), while its parents are reinitialized to state ∅. More formally, under the solitary process for gate g, nodes n(g) evolve according to

$\begin{equation}{P}_{g}\left({x}_{n\left(g\right)}^{\prime }\vert {x}_{n\left(g\right)}\right){:=}{\pi }_{g}\left({x}_{g}^{\prime }\vert {x}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)\prod _{v\in \mathrm{p}\mathrm{a}\left(g\right)}\delta \left({x}_{v}^{\prime },\varnothing \right)\end{equation} \tag{ 29 }$

while all nodes V\n(g) do not change their states. (Recall notation from section 2.5). Note that this means that the input nodes are reinitialized as soon as their child gates have run.

Example 4. In this example we demonstrate how to implement an XOR gate g in an SR circuit with a CTMC, i.e., how to carry out the following logical map on the state of gate g,

$\begin{equation*}{\pi }_{g}\left({x}_{g}\vert {x}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)=\delta \left({x}_{g},\mathsf{X}\mathsf{O}\mathsf{R}\left({x}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)\right),\end{equation*}$

and then reset the gate's parents. The CTMC involves a sequence of two solitary processes over n(g). The time-dependent rate matrix for both solitary processes has the form

$\begin{equation*}{K}_{t}\left({x}_{V}\to {x}_{V}^{\prime }\right)=\delta \left({x}_{V{\backslash}n\left(g\right)},{x}_{V{\backslash}n\left(g\right)}^{\prime }\right){K}_{t}^{n\left(g\right)}\left({x}_{n\left(g\right)}\to {x}_{n\left(g\right)}^{\prime }\right)\end{equation*}$

for all x_V ≠ x_V' (compare to equation (22), where for simplicity we assume there is a single thermodynamic reservoir). The two solitary processes differ in their associated subsystem rate matrices ${K}_{t}^{n\left(g\right)}$ .

In the first solitary process, the state of the gate's parents is held fixed, while the gate's output is changed from the initialized state to the correct XOR value. For t ∈ [0, 1] (the units of time are arbitrary), the subsystem rate matrix that implements this solitary process is

$\begin{align}\hfill {K}_{t}^{n\left(g\right)}\left({x}_{n\left(g\right)}\to {x}_{n\left(g\right)}^{\prime }\right)& =\delta \left({x}_{\mathrm{p}\mathrm{a}\left(g\right)},{x}_{\mathrm{p}\mathrm{a}\left(g\right)}^{\prime }\right)\eta \left[\left(1-t\right)\delta \left({x}_{g}^{\prime },\varnothing \right)/4+t{\pi }_{g}\left({x}_{g}^{\prime }\vert {x}_{\mathrm{p}\mathrm{a}\left(g\right)}^{\prime }\right)/4\right],\hfill \end{align} \tag{ 30 }$

for x_n(g) ≠ x'_n(g), where η > 0 is the relaxation speed. Note that the term δ(x'_n(g), ∅) inside the square brackets encodes the assumption that the initial state of the gate is ∅ with probability 1, while the factor of 1/4 encodes the assumption that the initial distribution over the four possible states of the gate's parents is uniform.

From the beginning to the end of the first solitary process, the nodes n(g) are updated according to the conditional probability distribution ${P}_{g}^{\left(1\right)}$ , given by the time-ordered exponential of the rate matrix in equation (30) over t ∈ [0, 1]. In the quasi-static limit η → ∞, this conditional distribution becomes

$\begin{equation*}{P}_{g}^{\left(1\right)}\left({x}_{n\left(g\right)}^{\prime }\vert {x}_{n\left(g\right)}\right)=\delta \left({x}_{\mathrm{p}\mathrm{a}\left(g\right)},{x}_{\mathrm{p}\mathrm{a}\left(g\right)}^{\prime }\right){\pi }_{g}\left({x}_{g}^{\prime }\vert {x}_{\mathrm{p}\mathrm{a}\left(g\right)}\right).\end{equation*}$

In the second solitary process, the gate's output is held fixed while the gate's parents are reinitialized. Redefining the time coordinate so that this second process also transpires in t ∈ [0, 1], its subsystem rate matrix is

$\begin{align}\hfill {K}_{t}^{n\left(g\right)}\left({x}_{n\left(g\right)}\to {x}_{n\left(g\right)}^{\prime }\right)& =\delta \left({x}_{g},{x}_{g}^{\prime }\right)\eta \left[\left(1-t\right){\pi }_{g}\left({x}_{g}^{\prime }\vert {x}_{\mathrm{p}\mathrm{a}\left(g\right)}^{\prime }\right)/4+t\prod _{v\in \mathrm{p}\mathrm{a}\left(g\right)}\delta \left({x}_{v}^{\prime },\varnothing \right)/2\right],\hfill \end{align} \tag{ 31 }$

for x_n(g) ≠ x'_n(g), where η is again the relaxation speed. Note that ${\pi }_{g}\left({x}_{g}^{\prime }\vert {x}_{\mathrm{p}\mathrm{a}\left(g\right)}^{\prime }\right)/4$ is what the distribution over nodes n(g) would be at the beginning of the second solitary process, if the distribution at the beginning of the first solitary process was δ(x_g', ∅)/4. From the beginning to the end of the second solitary process, the nodes n(g) are updated according to the conditional probability distribution ${P}_{g}^{\left(2\right)}$ , which is given by the time-ordered exponential of the rate matrix equation (31). In the quasi-static limit η → ∞, this conditional distribution is

$\begin{equation*}{P}_{g}^{\left(2\right)}\left({x}_{n\left(g\right)}^{\prime }\vert {x}_{n\left(g\right)}\right)=\delta \left({x}_{g},{x}_{g}^{\prime }\right)\prod _{v\in \mathrm{p}\mathrm{a}\left(g\right)}\delta \left({x}_{v}^{\prime },\varnothing \right).\end{equation*}$

The sequence of two solitary processes causes the nodes in n(g) to be updated according to the conditional distribution ${P}_{g}={P}_{g}^{\left(1\right)}{P}_{g}^{\left(2\right)}$ . In the quasi-static limit, this is

$\begin{equation}{P}_{g}\left({x}_{n\left(g\right)}^{\prime }\vert {x}_{n\left(g\right)}\right)={\pi }_{g}\left({x}_{g}^{\prime }\vert {x}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)\prod _{v\in \mathrm{p}\mathrm{a}\left(g\right)}\delta \left({x}_{v}^{\prime },\varnothing \right),\end{equation} \tag{ 32 }$

which recovers equation (29), as desired.

We now compute thermodynamic costs for the XOR gate. Let $\mathcal{Q}\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)$ be the total EF incurred by running the sequence of two solitary process, given some initial distribution p_pa(g) over the parents of gate g. Using results from section 4, write this EF as

$\begin{align}\hfill \mathcal{Q}\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)& =S\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)-S\left({\pi }_{g}{p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)+D\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}{\Vert}{q}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)-D\left({\pi }_{g}{p}_{\mathrm{p}\mathrm{a}\left(g\right)}{\Vert}{\pi }_{g}{q}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)+\sum _{c\in L\left({\pi }_{g}\right)}{p}_{\mathrm{p}\mathrm{a}\left(g\right)}\left(c\right){\hat{\sigma }}_{n\left(g\right)}\left({q}_{\mathrm{p}\mathrm{a}\left(g\right)}\right),\hfill \end{align} \tag{ 33 }$

where the three lines correspond to subsystem Landauer cost, subsystem mismatch cost, and subsystem residual EP, respectively. To derive this decomposition, we applied theorem 1, while considering the subset of states $\mathcal{Z}=\left\{{x}_{n\left(g\right)}\in {\mathcal{X}}_{n\left(g\right)}:{x}_{g}=\varnothing \right\}$ (note that for this $\mathcal{Z}$ , ${L}_{\mathcal{Z}}\left(P\right)=L\left({\pi }_{g}\right)$ ).

To compute the second and third of those terms, note that in the quasi-static limit, the prior distribution is uniform:

$\begin{equation}{q}_{\mathrm{p}\mathrm{a}\left(g\right)}\left({x}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)=1/4.\end{equation} \tag{ 34 }$

To see this, suppose that the distribution over n(g) when the sequence of processes begins is given by p_n(g)(x_n(g)) = δ(x_g, ∅)q_pa(g)(x_pa(g)). Then,

(a)
The system will remain in equilibrium during the first solitary process, thereby incurring zero EP. At the end of the first solitary process, it will have distribution
$\begin{equation}\left[{P}_{g}^{\left(1\right)}{p}_{n\left(g\right)}\right]\left({x}_{n\left(g\right)}\right)={q}_{\mathrm{p}\mathrm{a}\left(g\right)}\left({x}_{\mathrm{p}\mathrm{a}\left(g\right)}\right){\pi }_{g}\left({x}_{g}^{\prime }\vert {x}_{\mathrm{p}\mathrm{a}\left(g\right)}\right).\end{equation} \tag{ 35 }$
(b)
Given that the system starts the second solitary process with this distribution ${P}_{g}^{\left(1\right)}{p}_{n\left(g\right)}$ , it will remain in equilibrium throughout the second solitary process, thereby again incurring zero EP.

So that sequence of processes will incur zero EP—the minimum possible—if the initial distribution is q_pa(g) over pa(g) (and x_g = ∅), as claimed. In addition, the fact that the minimal EP that can be generated for any initial distribution is strictly zero means that the subsystem residual EP vanishes. This fully specifies all terms in equation (33) as a function of p_pa(g).

As a concrete example of this analysis, consider the initial distribution which is uniform over states {00, 01, 10}:

$\begin{equation*}{p}_{\mathrm{p}\mathrm{a}\left(g\right)}\left({x}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)=\left[1-\delta \left({x}_{\mathrm{p}\mathrm{a}\left(g\right)},\mathsf{11}\right)\right]/3.\end{equation*}$

For this distribution, the subsystem Landauer cost is

$\begin{align}\hfill S\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)-S\left({\pi }_{g}{p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)& \hfill =\mathrm{ln}\;3+\left[\left(1/3\right)\mathrm{ln}\left(1/3\right)+\left(2/3\right)\mathrm{ln}\left(2/3\right)\right]\approx 0.46.\end{align}$

The subsystem EP is

$\begin{align}\hfill D\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}{\Vert}{q}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)-D\left({\pi }_{g}{p}_{\mathrm{p}\mathrm{a}\left(g\right)}{\Vert}{\pi }_{g}{q}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)& =\left[\mathrm{ln}\;4-\mathrm{ln}\;2\right]-\left[S\left({p}_{\mathrm{pa}\left(g\right)}\right)-S\left({\pi }_{g}{p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)\right]\approx 0.23.\hfill \end{align}$

We end by noting that the XOR gate may also incur some EP which is not accounted for by these calculations, due to loss of correlations between the nodes n(g) and the rest of the circuit as the gate runs. This is quantified by the Landauer loss, which can be evaluated using equations (25) and (26) or equation (27).

Given the requirement that the solitary processes are run accordingly to a topological ordering, equation (29) ensures that once all the gates of the circuit have run, the state of the output gates of the circuit have been set to a random sample of π_Φ(x_OUT|x_IN), while all non-output nodes are back in their initialized states, i.e., x_v = ∅ for v ∈ V\OUT.

Example 5. Consider the 3-bit parity circuit shown in figure 3. An SR implementation of this wired circuit would run its 6 gates in topological order, such that each gate computes its output and then reinitializes its parents. One such sequence of steps is shown in figure 4 (note that some other topological orderings are also possible). Each XOR gate could be implemented by the kind of CTMC described in example 4. Each wired gate could be run by a similar kind CTMC, but which carries out the identity map π_g(x_g'|x_pa(g)) = δ(x_g', x_pa(g)), instead of the XOR map.

**Figure 4.** An SR implementation of the wired circuit shown in figure 3. Each diagram represents one step of the SR implementation, with white shapes indicating nodes set to their initialized value (∅) and maroon shapes indicating nodes that can have non-initialized values. The implementation starts with only the input nodes set to non-initialized values (left-most diagram) and ends with only the output gates set to non-initialized values (right-most diagram).
Download figure:
Standard image High-resolution image

After the SR circuit has run, some 'offboard system' may make a copy of the state of the output gate for subsequent use, e.g., by copying it into some of the input bits of some downstream circuit(s), onto an external disk, etc. Regardless, we assume that after the circuit finishes, but before the circuit is run again, the state of the output nodes have also been reinitialized to ∅. Just as we do not model the physical mechanism by which new inputs are sampled for the next run of the circuit, we also do not model the physical mechanism by which the output of the circuit is reinitialized. Accordingly, in our calculation of the thermodynamic costs of running the circuit, we do not account for any possible cost of reinitializing the output¹³

This kind of cyclic procedure for running the circuit allows the circuit to be re-used an arbitrary number of times, while ensuring that each time it will have the same expected thermodynamic behavior (Landauer cost, mismatch cost, etc), and will carry out the same map π_Φ from input nodes to output gates.

6. Thermodynamic costs of SR circuits

In general, there are multiple decompositions of the EF and EP incurred by running any given SR circuit. They differ in how much of the detailed structure of the circuit they incorporate. In this section we present some of these decompositions. (See appendix B for all proofs of the results in this section).

6.1. General decomposition of EF and EP

Let p refer to the initial distribution over the joint state of all nodes in the circuit. By equation (5), the total EF incurred by implementing some overall map P which takes the initial joint state of all nodes in the full circuit to the final joint state is

$\begin{equation}Q\left(p\right)=\mathcal{L}\left(p\right)+\sigma \left(p\right).\end{equation} \tag{ 36 }$

where

$\begin{equation}\mathcal{L}\left(p\right){:=}S\left(p\right)-S\left(Pp\right)\end{equation} \tag{ 37 }$

is the Landauer cost of computing P for initial distribution p.

The first term in equation (37), $\mathcal{L}$ , is the minimal EF that must be incurred by any process over ${\mathcal{X}}_{\mathrm{I}\mathrm{N}}{\times}{\mathcal{X}}_{\mathrm{O}\mathrm{U}\mathrm{T}}$ that implements P, without any constraints on how the variables in ${\mathcal{X}}_{\mathrm{I}\mathrm{N}}{\times}{\mathcal{X}}_{\mathrm{O}\mathrm{U}\mathrm{T}}$ are coupled, and without any reference to a set of intermediate subsystems (e.g., gates) that may connect the input and output variables¹⁴.

The second term in equation (44) is the EP, which reflects the thermodynamic irreversibility of the SR circuit. Using equation (16), the EP can be further decomposed as

$\begin{equation}\sigma \left(p\right)=\left[D\left(p{\Vert}q\right)-D\left(Pp{\Vert}Pq\right)\right]+\sum _{c\in L\left(P\right)}p\left(c\right)\sigma \left({q}^{c}\right).\end{equation} \tag{ 38 }$

The decrease in KL reflects the mismatch cost, arising from the discrepancy between p(x_V), the actual initial distribution over all nodes of the circuit defined in equation (39), and q(x_V), the optimal prior distribution over the joint state of all the nodes of the circuit which would result in the least EP. The last sum in equation (38) reflects the residual EP, reflecting EP that remains even when the circuit is initialized with the optimal prior distribution.

Suppose we know that the dynamics is actually implemented with an SR circuit, but do not know the precise wiring diagram. Then we know that the initial joint distribution over all the nodes is

$\begin{equation}p\left({x}_{V}\right)={p}_{\mathrm{I}\mathrm{N}}\left({x}_{\mathrm{I}\mathrm{N}}\right)\prod _{v\in V{\backslash}\mathrm{I}\mathrm{N}}\delta \left({x}_{v},\varnothing \right),\end{equation} \tag{ 39 }$

and the ending joint distribution is

$\begin{equation}\left[Pp\right]\left({x}_{V}\right)={p}_{\mathrm{O}\mathrm{U}\mathrm{T}}\left({x}_{\mathrm{O}\mathrm{U}\mathrm{T}}\right)\prod _{v\in V{\backslash}\mathrm{O}\mathrm{U}\mathrm{T}}\delta \left({x}_{v},\varnothing \right),\end{equation} \tag{ 40 }$

So S(p) = S(p_IN) and S(Pp) = S(p_OUT) = S(π_Φp_IN), where π_Φ is the conditional distribution of the final joint state of the output gates given the initial joint state of the input nodes, defined in equation (14). Combining gives

$\begin{equation}\mathcal{L}\left(p\right)=S\left({p}_{\mathrm{I}\mathrm{N}}\right)-S\left({\pi }_{{\Phi}}{p}_{\mathrm{I}\mathrm{N}}\right)\end{equation} \tag{ 41 }$

Similarly, the EP becomes

$\begin{align}\hfill \sigma \left(p\right)& =\left[D\left({p}_{\mathrm{I}\mathrm{N}}{\Vert}{q}_{\mathrm{I}\mathrm{N}}\right)-D\left({\pi }_{{\Phi}}{p}_{\mathrm{I}\mathrm{N}}{\Vert}{\pi }_{{\Phi}}{q}_{\mathrm{I}\mathrm{N}}\right)\right]+\sum _{c\in L\left({\pi }_{{\Phi}}\right)}p\left(c\right)\sigma \left({q}_{\mathrm{I}\mathrm{N}}^{c}\right).\hfill \end{align} \tag{ 42 }$

While the expressions in equations (38) and (42) for EP must be equal, how they decompose that EP among a mismatch cost term and a residual EP differ. The two decompositions differ because they define the 'optimal initial distribution' relative to differ sets of possible distributions, resulting in different prior distributions (which are also defined over different sets of outcomes). Also note that the residual EP terms in equation (42) are defined in terms of a more constrained minimization problem than the residual EP terms in equation (38). Thus, given the same initial distribution p, the residual EP in equation (42) will generally be larger than the residual EP in equation (38), while the mismatch cost in equation (38) will generally be larger than the mismatch cost in equation (42). We also emphasize that the island decompositions appearing in the two expressions are different.

6.2. Circuit-based decompositions of EF and EP

The decompositions of EF and EP given in (equations (36), (38) and (42)) do not involve the wiring diagram of the SR circuit. As an alternative, we can exploit that wiring diagram to formulate a decomposition of EF and EP which separates the contributions from different gates. In general, such circuit-based decompositions allow for a finer-grained analysis of the EP in SR circuits than do the decompositions proposed in the last section. In particular, they allow us to derive some novel connections between nonequilibrium statistical physics, computer science theory, and information theory, as discussed in the next two subsections.

Before discussing these circuit-based decompositions, we introduce some new notation. We write p_pa(g)(x_pa(g)) and p_n(g)(x_n(g)) = p_pa(g)(x_pa(g))δ(x_g, ∅) for the distributions over x_pa(g) and x_n(g), respectively, at the beginning of the solitary process that implements gate g. We write the EF function of the solitary process of gate g as ${\mathcal{Q}}_{g}\left({p}_{n\left(g\right)}\right)$ , and its subsystem EP as

$\begin{equation}{\hat{\sigma }}_{g}\left({p}_{n\left(g\right)}\right){:=}{\mathcal{Q}}_{g}\left({p}_{n\left(g\right)}\right)-\left[S\left({p}_{n\left(g\right)}\right)-S\left({P}_{g}{p}_{n\left(g\right)}\right)\right].\end{equation} \tag{ 43 }$

We also write p^beg(g) and p^end(g) to indicate the joint distribution over all circuit nodes at the beginning and end, respectively, of the solitary process that runs gate g. As an illustration of this notation, p^beg(g)(x_pa(g)) = p_pa(g)(x_pa(g)). On the other hand, p^end(g)(x_g) = (π_gp_pa(g))(x_g) is the distribution over x_g after gate g runs, and p^end(g)(x_pa(g)) is a delta function about the joint state of the parents of g in which they are all initialized, by equation (29). Note that since we are considering solitary processes, p^beg(g)(x_V\n(g)) = p^end(g)(x_V\n(g)).

We now present our first circuit-based decomposition, and then we explain what its terms mean in detail:

Theorem 2. The total EF incurred by running an SR circuit where p is the initial distribution over the joint state of all nodes in the circuit is

$\begin{equation}\mathcal{Q}\left(p\right)=\mathcal{L}\left(p\right)+\underset{\sigma \left(p\right)}{\underbrace{{\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)+\mathcal{M}\left(p\right)+\mathcal{R}\left(p\right)}}.\end{equation} \tag{ 44 }$

(1) The first term in equation (44), $\mathcal{L}\left(p\right)$ , is the Landauer cost of the circuit, as described in equation (37). This Landauer cost can be further decomposed into contributions from the individual gates. Specifically, write ${\mathcal{L}}_{g}\left(p\right)$ for the drop in the entropy of the entire circuit during the time that the solitary process for gate g runs, given that the input distribution over the entire circuit is p:

$\begin{equation}{\mathcal{L}}_{g}\left(p\right){:=}S\left({p}^{\mathrm{b}\mathrm{e}\mathrm{g}\left(g\right)}\right)-S\left({p}^{\mathrm{e}\mathrm{n}\mathrm{d}\left(g\right)}\right).\end{equation} \tag{ 45 }$

Note that the distribution over the states of the entire physical circuit at the end of the running of any gate is the same as the distribution at the beginning of the running of the next gate. So by canceling terms, and using the fact that entropy does not change when a wire gate runs, we can expand $\mathcal{L}$ as

$\begin{equation}\mathcal{L}\left(p\right)=\sum _{g\in G{\backslash}W}{\mathcal{L}}_{g}\left(p\right).\end{equation} \tag{ 46 }$

(Recall from section 2.5 that W is the set of wire gates in the circuit.) This decomposition will be useful below.

(2) The second term in equation (44), ${\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)$ , is the unavoidable additional EF that is incurred by any SR implementation of the SR circuit on initial distribution p, above and beyond $\mathcal{L}$ , the Landauer cost of running the map π_Φ on initial distribution p. We refer to this unavoidable extra EF as the circuit Landauer loss. It equals the sum of the subsystem Landauer losses incurred by each non-wire gate's solitary process,

$\begin{equation}{\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)=\sum _{g\in G{\backslash}W}{\mathcal{L}}_{g}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right),\end{equation} \tag{ 47 }$

where ${\mathcal{L}}_{g}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)=S\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)-S\left({\pi }_{g}{p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)-{\mathcal{L}}_{g}\left(p\right)$ .

Each term ${\mathcal{L}}_{g}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)$ in this sum is non-negative (see end of section 4), and so ${\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right){\geqslant}0$ . Note that we can omit wires from the sum in equation (47) because π_g is logically reversible for any wire gate g, which means that ${\mathcal{L}}_{g}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)=0$ for such gates.

We define circuit Landauer cost to be the minimal EF incurred by running any SR implementation of the circuit, i.e.,

$\begin{equation}{\mathcal{L}}^{\mathrm{c}\mathrm{i}\mathrm{r}\mathrm{c}}\left(p\right)=\mathcal{L}\left(p\right)+{\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)\end{equation} \tag{ 48 }$

$\begin{equation}=\sum _{g\in G{\backslash}W}\left[S\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)-S\left({\pi }_{g}{p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)\right].\end{equation} \tag{ 49 }$

Recall that $\mathcal{L}\left(p\right)$ is the minimal EF that must be generated by any physical process that carries out the map P on initial distribution p. So by equation (48), ${\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)$ is the minimal additional EF that must be generated if we use an SR circuit to carry out P on p, no matter how efficient the gates in the circuit are. In this sense, equation (48) can be viewed as an extension of the generalized Landauer bound, to concern SR circuits.

(3) The third term in equation (44), $\mathcal{M}$ , reflects the EF incurred because the actual initial distribution of each gate g is not the optimal one for that gate (i.e., not one that minimizes subsystem EP within each island of the conditional distribution P_g, defined in equation (29)). We refer to this cost as the circuit mismatch cost, and write it as

$\begin{equation}\;\;\mathcal{M}\left(p\right)\;=\;\sum _{g\in G}\left[D\left(\right.{p}_{n\left(g\right)}{\Vert}{q}_{n\left(g\right)}\left.\right)\;-\;D\left(\right.{P}_{g}{p}_{n\left(g\right)}{\Vert}{P}_{g}{q}_{n\left(g\right)}\left.\right)\right]\end{equation} \tag{ 50 }$

where the prior q_n(g) is a distribution over ${\mathcal{X}}_{n\left(g\right)}$ whose conditional distributions over the islands c ∈ L(P_g) all obey ${\hat{\sigma }}_{g}\left({q}_{n\left(g\right)}^{c}\right)={\mathrm{min}}_{r:\mathrm{s}\mathrm{u}\mathrm{p}\mathrm{p}\;r\subseteq c}{\hat{\sigma }}_{g}\left(r\right)$ . Note that we must include wire gates g in the sum in equation (50) even though π_g for a wire gate is logically reversible. This is because the associated overall map over n(g), equation (29), is not logically reversible over n(g)¹⁵.

$\mathcal{M}$ is non-negative, since each gate's subsystem mismatch cost is non-negative. Moreover, $\mathcal{M}$ achieves its minimum value of 0 when ${p}_{n\left(g\right)}^{{c}_{g}}={q}_{n\left(g\right)}^{{c}_{g}}$ for all g ∈ G and all islands c_g ∈ L(P_g). (Recall that subsystem priors like ${q}_{n\left(g\right)}^{{c}_{g}}$ reflect the specific details of the underlying physical process that implements the gate g, such as how its energy spectrum evolves as it runs.)

Suppose that one wishes to construct a physical system to implement some circuit, and can vary the associated subsystem priors ${q}_{n\left(g\right)}^{{c}_{g}}$ arbitrarily. Then in order to minimize mismatch cost one should choose priors ${q}_{n\left(g\right)}^{{c}_{g}}$ that equal the actual associated initial distributions ${p}_{n\left(g\right)}^{c}$ . Moreover, those actual initial distributions ${p}_{n\left(g\right)}^{{c}_{g}}$ can be calculated from the circuit's wiring diagram, together with the input distribution of the entire circuit, p_IN, by 'propagating' p_IN through the transformations specified by the wiring diagram. As a result, given knowledge of the wiring diagram and the input distribution of the entire circuit, in principle the priors can be set so that mismatch cost vanishes.

(4) The fourth term in equation (44), $\mathcal{R}$ , reflects the remaining EF incurred by running the SR circuit, and so we call it circuit residual EP. Concretely, it equals the subsystem EP that would be incurred even if the initial distribution within each island of each gate were optimal:

$\begin{equation}\mathcal{R}\left(p\right)=\sum _{g\in G}\sum _{c\in L\left({P}_{g}\right)}{p}_{n\left(g\right)}\left(c\right)\;{\hat{\sigma }}_{g}\left({q}_{n\left(g\right)}^{c}\right).\end{equation} \tag{ 51 }$

Circuit residual EP is non-negative, since each ${\hat{\sigma }}_{g}$ is non-negative. Since for every gate g, p_n(g)(c) is a linear function of the initial distribution to the circuit as a whole, circuit residual EP also depends linearly on the initial distribution. Like the priors of the gates, the residual EP terms $\left\{{\hat{\sigma }}_{g}\left({q}_{n\left(g\right)}^{c}\right)\right\}$ reflect the 'nitty-gritty' details of how the gates run.

To summarize, the EF incurred by a circuit can be decomposed into the Landauer cost (the contribution to the EF that would arise even in a thermodynamically reversible process) plus the EP (the contribution to that EF which is thermodynamically irreversible). In turn, there are three contributions to that EP:

(a)
Circuit Landauer loss, which is independent of how the circuit is physically implemented, but does depend on the wiring diagram of the circuit, the conditional distributions implemented by the gates, and the initial distribution over inputs. It is a nonlinear function of the distribution over inputs, p_IN.
(b)
Circuit mismatch cost, which does depend on how the circuit is physical implemented (via the priors), as well as the wiring diagram. It is also a nonlinear function of p_IN.
(c)
Circuit residual EP, which also depends on how the circuit is physical implemented. It is a linear function of p_IN. However, no matter what the wiring diagram of the circuit is, if we implement each of the gates in a circuit with a quasistatic process, then the associated circuit residual EP is identically zero, independent of p_IN¹⁶.

There are other useful decompositions of the EP incurred by an SR circuit that incorporate the wiring diagram. One such alternative decomposition, which is our second main result, leaves the circuit Landauer loss term in equation (44) unchanged, but modifies the circuit mismatch cost and the circuit residual EP terms.

Theorem 3. The total EF incurred by running an SR circuit where p is the initial distribution over the joint state of all nodes in the circuit is

$\begin{equation}\mathcal{Q}\left(p\right)=\mathcal{L}\left(p\right)+\underset{\sigma \left(p\right)}{\underbrace{{\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)+{\mathcal{M}}^{\prime }\left(p\right)+{\mathcal{R}}^{\prime }\left(p\right)}}.\end{equation} \tag{ 52 }$

To present this decomposition, recall from equation (29) that for any gate g, the distribution over ${\mathcal{X}}_{n\left(g\right)}$ has partial support at the beginning of the solitary process that implements P_g, since there is 0 probability that x_g ≠ ∅. We use this fact to apply theorem 1 to equation (18), while taking $\mathcal{Z}=\left\{{x}_{n\left(g\right)}\in {\mathcal{X}}_{n\left(g\right)}:{x}_{g}=\varnothing \right\}$ . This allows us to express the modified circuit mismatch cost by replacing all of the map P_g(x'_n(g)|x_n(g)) in the summand in equation (50) with P_g(x'_g|x_pa(g)) = π_g:

$\begin{equation}{\mathcal{M}}^{\prime }\left(p\right)=\sum _{g\in G{\backslash}W}\left[D\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}{\Vert}{q}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)-D\left({\pi }_{g}{p}_{\mathrm{p}\mathrm{a}\left(g\right)}{\Vert}{\pi }_{g}{q}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)\right]\end{equation} \tag{ 53 }$

where the priors q_pa(g) are defined in terms of the island decompositions of the associated conditional distributions π_g, rather than in terms of the island decompositions of the conditional distributions P_g. Note that can exclude the wire gates from the sum in equation (53) because each wire gate's π_g is logically reversible, and so the associated drop in KL divergence are zero. Then, the modified circuit residual EP is

$\begin{equation}{\mathcal{R}}^{\prime }\left(p\right)=\sum _{g\in G}\sum _{c\in L\left({\pi }_{g}\right)}{p}_{\mathrm{p}\mathrm{a}\left(g\right)}\;{\hat{\sigma }}_{g}\left({q}_{\mathrm{p}\mathrm{a}\left(g\right)}^{c}\right),\end{equation} \tag{ 54 }$

where each term ${\hat{\sigma }}_{g}\left({q}_{\mathrm{p}\mathrm{a}\left(g\right)}^{c}\right)$ is given by appropriately modifying the arguments in equation (43). In deriving equation (54), we used the fact that ${L}_{\mathcal{Z}}\left({P}_{g}\right)=L\left({\pi }_{g}\right)$ (for $\mathcal{Z}$ as defined above for each gate g).

As with the analogous results in the previous section, theorems 2 and 3 differ, because they define 'optimal initial distribution' relative to different sets of possibilities. In particular, the decomposition in theorem 2 will generally have a larger mismatch cost and smaller residual EP term than the decomposition in theorem 3.

For the rest of this section, we will use the term 'circuit mismatch cost' to refer to the expression in equation (53) rather than the expression in equation (50), and similarly will use the term 'circuit residual EP' to refer to the expression in equation (54) rather than the expression in equation (51).

6.3. Information theory and circuit Landauer loss

By combining equations (47) and (26), we can write circuit Landauer loss as

$\begin{equation}{\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)=\sum _{g\in G{\backslash}W}\left[I\left({X}_{\mathrm{p}\mathrm{a}\left(g\right)};{X}_{V{\backslash}\mathrm{p}\mathrm{a}\left(g\right)}\right)-I\left({X}_{g};{X}_{V{\backslash}g}\right)\right]\end{equation} \tag{ 55 }$

Any nodes that belong to V\n(g) and that are in their initialized state when gate g starts to run will not contribute to the drop in mutual information terms in equation (55). Keeping track of such nodes and simplifying establishes the following:

Corollary 4. The circuit Landauer loss is

$\begin{equation*}{\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)=\mathcal{I}\left({p}_{\mathrm{I}\mathrm{N}}\right)-\mathcal{I}\left({\pi }_{{\Phi}}{p}_{\mathrm{I}\mathrm{N}}\right)-\sum _{g\in G{\backslash}W}\mathcal{I}\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right).\end{equation*}$

(We remind the reader that $\mathcal{I}\left({p}_{A}\right)$ refers to the multi-information between the variables indexed by A.) Corollary 4 suggest a set of novel optimization problems for how to design SR circuits: given some desired computation π_Φ and some given initial distribution p_IN, find the circuit wiring diagram that carries out π_Φ while minimizing the circuit Landauer loss. Presuming we have a fixed input distribution p and map π_Φ, the term $\mathcal{I}\left({p}_{\mathrm{I}\mathrm{N}}\right)-\mathcal{I}\left({\pi }_{{\Phi}}{p}_{\mathrm{I}\mathrm{N}}\right)$ in corollary 4 is an additive constant that does not depend on the particular choice of circuit wiring diagram. So the optimization problem can be reduced to finding which wiring diagram results in a minimal value of ${\sum }_{g\in G{\backslash}W}\mathcal{I}\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)$ . In other words, for fixed p and map π_Φ, to minimize the Landauer loss we should choose the wiring diagram for which the parents of each gate are as strongly correlated among themselves as possible. Intuitively, this ensures that the 'loss of correlations', as information is propagated down the circuit, is as small as possible.

In general, the distributions over the outputs of the gates in any particular layer of the circuit will affect the distribution of the inputs of all of the downstream gates, in the subsequent layers of the circuit. This means that the sum of multi-informations in corollary 4 is an inherently global property of the wiring diagram of a circuit; it cannot be reduced to a sum of properties of each gate considered in isolation, independently of the other gates. This makes the optimization problem particularly challenging. We illustrate this optimization problem in the following example.

Example 6. Consider again the case where we want our circuit to compute the 3-bit parity function using 2-input XOR gates, i.e., it want it to implement the map

$\begin{equation*}{\pi }_{{\Phi}}\left({x}_{g}\vert {x}_{1},{x}_{2},{x}_{3}\right)=\delta \left({x}_{g},\delta \left({x}_{1}+{x}_{2}+{x}_{3},1\right)+\delta \left({x}_{1}+{x}_{2}+{x}_{3},3\right)\right).\end{equation*}$

Suppose we happen to know that the input distribution to the circuit will be

$\begin{equation}{p}_{\mathrm{I}\mathrm{N}}\left({x}_{1},{x}_{2},{x}_{3}\right)=\frac{1}{Z}{\mathrm{e}}^{-\left[\phi \left({x}_{1}\right)\phi \left({x}_{2}\right)/4+\phi \left({x}_{1}\right)\phi \left({x}_{3}\right)/2+\phi \left({x}_{2}\right)\phi \left({x}_{3}\right)\right]},\end{equation} \tag{ 56 }$

where Z is a normalization constant and ϕ(x) := 2x − 1 is a function that maps bit values x ∈ {0, 1} to spin values, ϕ(x) ∈ {−1, 1}. The distribution equation (56) is a Boltzmann distribution for a pairwise spin model, where inputs 2 and 3 have the strongest coupling strength (1), inputs 1 and 3 have intermediate-strength coupling strength (1/2), and inputs 1 and 2 have the weakest coupling strength (1/4).

We wish to find the wiring diagram connecting our XOR gates that has minimal circuit Landauer cost for this distribution over its three input bits. It turns out that we can restrict our search to three possible wiring diagrams, which are shown in figure 5. We indicate the circuit Landauer loss for the input distribution of equation (56) for each of those three wiring diagrams. So for this input distribution, the right-most wiring diagram results in minimal circuit Landauer loss. Note that this wiring diagram aligns with the correlational structure of the input distribution (given that inputs 2 and 3 have the strongest statistical correlation).

**Figure 5.** Three different circuit wiring diagrams for computing 3-bit parity using XOR gates. The circuit Landauer loss for each wiring diagram is shown below, given the input distribution of equation (56).
Download figure:
Standard image High-resolution image

An interesting variant of the optimization problem described above arises if we model the residual EP terms for the wire gates. In any SR circuit, wire gates carry out a logically reversible operation on their inputs. Thus, by equation (16), all of the EF generated by any wire gates is residual EP. If we allow the physical lengths of wires to vary, then as a simple model we could presume that the residual EP of any wire is proportional to its length. This would allow us to incorporate into our analysis the thermodynamic effect of the geometry with which a circuit is laid out on a two-dimensional circuit boards, in addition to the thermodynamic effect of the topology of that circuit.

Finally, note that for any set of nodes A, multi-information can be bounded as

$\begin{equation}0{\leqslant}\mathcal{I}\left({p}_{A}\right){\leqslant}{\sum }_{v\in A}^{}S\left({p}_{v}\right){\leqslant}{\sum }_{v\in A}\vert {\mathcal{X}}_{v}\vert .\end{equation}$

Given this, corollary 4 implies

$\begin{equation}{\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right){\leqslant}\mathcal{I}\left({p}_{\mathrm{I}\mathrm{N}}\right){\leqslant}\mathrm{ln}\vert {\mathcal{X}}_{\mathrm{I}\mathrm{N}}\vert .\end{equation} \tag{ 57 }$

This means that for a fixed input state space, the circuit Landauer loss cannot grow without bound as we vary the wiring diagram. Interestingly, this bound on Landauer loss only holds for SR circuits that have out-degree 1. If we consider SR circuits that have out-degree greater than 1, then the circuit Landauer cost can be arbitrarily large. This is formalized as the following proposition, which is proved in appendix C.

Proposition 1. For any π_Φ, non-delta function input distribution p_IN, and κ ⩾ 0, there exist an SR circuit with out-degree greater than 1 that implements π_Φ for which ${\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right){\geqslant}\kappa$ .

6.4. Information theory and circuit mismatch loss

Landauer loss captures the gain in minimal EF due to using an SR circuit, which becomes equal to the gain in actual EF if there is no mismatch cost or residual EP. It is harder to make general statements about the gain in actual EF due to using an SR circuit, i.e., when the mismatch cost and/or residual EP is nonzero. In this subsection we make some preliminary remarks about this issue.

Imagine that we wish to build a physical process that implements some computation π_Φ(x_OUT|x_IN) over a space ${\mathcal{X}}_{\mathrm{I}\mathrm{N}}{\times}{\mathcal{X}}_{\mathrm{O}\mathrm{U}\mathrm{T}}$ . Suppose we want this process to achieve minimal EP when run with inputs generated by q_IN (e.g., if we expect future inputs to the process to be generated by sampling q_IN), and as usual assume the initial value of x_OUT will be ∅ whenever it is run. Using the decomposition of equation (42) and assuming that the residual EP of the process is zero, the EF that such a process would generate if it is actually run with an input distribution p (initialized like SR circuits are, so that it has the form of equation (39)) is given by the sum of the Landauer cost and the mismatch cost, with no Landauer loss term. We write this as

$\begin{equation}{\mathcal{Q}}_{\mathrm{A}\mathrm{O}}\left(p\right)=\mathcal{L}\left(p\right)+D\left({p}_{\mathrm{I}\mathrm{N}}{\Vert}{q}_{\mathrm{I}\mathrm{N}}\right)-D\left({\pi }_{{\Phi}}{p}_{\mathrm{I}\mathrm{N}}{\Vert}{\pi }_{{\Phi}}{q}_{\mathrm{I}\mathrm{N}}\right),\end{equation} \tag{ 58 }$

Note that in order for the EF generated by an actual physical process to be given by equation (58), the prior of that process must be q_IN, and in general this may require that the process couple together arbitrary sets of variables. This means that the EF generated by implementing π_Φ(x_OUT|x_IN) with an SR circuit cannot obey equation (58) in general, due to restrictions on what variables can be coupled in such a circuit. (One can verify, for example, that the prior distribution q_IN of a circuit consisting of two disconnected bit erasing gates must be a product distribution over the two input bits.) To emphasize this distinction, we will refer to a process whose EF is given by equation (58)) as an 'all-at-once' (AO) process (indicated by the subscript 'AO' in equation (58)).

For practical reasons, it may be quite difficult to construct an AO process that implements π_Φ, and we must use a circuit implementation instead. In particular, even though the circuit as a whole cannot have prior q_IN, suppose we can set the priors q_pa(g) at its gates by propagating q_IN through the wiring diagram of the circuit. Assuming again that there is zero EP, the EF that must be incurred by any such SR circuit implementation of π_Φ on input distribution p, assuming some particular wiring topology and gate priors, is given by the decomposition of theorem 3,

$\begin{equation}{\mathcal{Q}}_{\mathrm{c}\mathrm{i}\mathrm{r}\mathrm{c}}\left(p\right)=\mathcal{L}\left(p\right)+{\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)+{\mathcal{M}}^{\prime }\left(p\right).\end{equation} \tag{ 59 }$

We now ask: how much larger is this EF incurred by the SR circuit implementation, compared to that of the original AO process? Subtracting equation (58) from equation (59) gives

$\begin{align}\hfill {\Delta}\mathcal{Q}& ={\mathcal{Q}}_{\mathrm{c}\mathrm{i}\mathrm{r}\mathrm{c}}\left({p}_{\mathrm{I}\mathrm{N}}\right)-{\mathcal{Q}}_{\mathrm{A}\mathrm{O}}\left({p}_{\mathrm{I}\mathrm{N}}\right)\hfill \\ \hfill & ={\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left({p}_{\mathrm{I}\mathrm{N}}\right)+{\mathcal{M}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left({p}_{\mathrm{I}\mathrm{N}}{\Vert}{q}_{\mathrm{I}\mathrm{N}}\right)\hfill \end{align} \tag{ 60 }$

where we have defined ${\mathcal{M}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}$ as the difference between the circuit mismatch cost, ${\mathcal{M}}^{\prime }\left(p\right)$ , and the mismatch cost of the AO process. We refer to that difference in mismatch costs as the circuit mismatch loss, and use equation (53) to express it as

$\begin{align}\hfill {\mathcal{M}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left({p}_{\mathrm{I}\mathrm{N}}{\Vert}{q}_{\mathrm{I}\mathrm{N}}\right)& =\mathcal{D}\left({\pi }_{{\Phi}}{p}_{\mathrm{I}\mathrm{N}}{\Vert}{\pi }_{{\Phi}}{q}_{\mathrm{I}\mathrm{N}}\right)-\mathcal{D}\left({p}_{\mathrm{I}\mathrm{N}}{\Vert}{q}_{\mathrm{I}\mathrm{N}}\right)+\sum _{g\in G{\backslash}W}\mathcal{D}\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}{\Vert}{q}_{\mathrm{p}\mathrm{a}\left(g\right)}\right),\hfill \end{align} \tag{ 61 }$

where $\mathcal{D}$ refers to the multi-divergence, defined in equation (10). Equation (61) can be compared to corollary 4, which expresses the circuit Landauer cost rather than circuit mismatch cost, and involves multi-informations rather than multi-divergences.

Interestingly, while circuit Landauer loss is non-negative, circuit mismatch loss can either be positive or negative. In fact, depending on the wiring diagram, p_IN and q_IN, the sum of circuit mismatch loss and circuit Landauer loss can be negative. This means that when the actual input distribution p_IN is different from the prior distribution of the AO process, the 'closest equivalent circuit' to the AO process may actually incur less EF than the corresponding AO process. This occurs because an SR circuit cannot implement some of the prior distributions that an AO process can implement, so the two implementations end up having different priors. This is illustrated in the following example.

Example 7. Assume the desired computation is the erasure of two bits, π_Φ(x₃, x₄|x₁, x₂) = δ(x₃, 0)δ(x₄, 0), where x₁ and x₂ refers to the input bits, and x₃ and x₄ refers to the output bits. The prior distribution implemented by an AO process is given by q_IN(0, 0) = q(1, 1) = < 1/2, and q_IN(0, 1) = q(1, 0) = 1/2 − . The actual input distribution is given by a delta function distribution, p_IN(x₁, x₂) = δ(x₁, 0)δ(x₂, 0).

We now implement this computation using an SR circuit which consists of two disconnected erasure gates. The closest equivalent SR circuit has gate priors given by the uniform marginal distributions, q(x₁) = 1/2 and q(x₂) = 1/2. Then the difference between the EF of the AO process and the SR circuit is

$\begin{align*}\hfill {\Delta}\mathcal{Q}=& -\left[S\left({p}_{\mathrm{I}\mathrm{N}}\right)\;+\;D\left({p}_{\mathrm{I}\mathrm{N}}{\Vert}{q}_{\mathrm{I}\mathrm{N}}\right)\right]+\left[\sum _{g}S\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)+D\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}{\Vert}{q}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)\right]\hfill \\ \hfill & =\sum _{{x}_{1},{x}_{2}}{p}_{\mathrm{I}\mathrm{N}}\left({x}_{1},{x}_{2}\right)\mathrm{ln}\;{q}_{\mathrm{I}\mathrm{N}}\left({x}_{1},{x}_{2}\right)-\sum _{{x}_{1}}p\left({x}_{1}\right)\mathrm{ln}\;q\left({x}_{1}\right)-\sum _{{x}_{2}}p\left({x}_{2}\right)\mathrm{ln}\;q\left({x}_{2}\right)\hfill \\ \hfill & =\mathrm{ln}\;{\epsilon}+2\mathrm{ln}\;2\hfill \end{align*}$

This can be made arbitrarily negative by taking sufficiently close to zero. Thus, the EF of the AO process may be arbitrarily larger than the EF of the closest equivalent SR circuit.

7. Related work

The issue of how the thermodynamic costs of a circuit depend on the constraints inherent in the topology of the circuit has not previously been addressed using the tools of modern nonequilibrium statistical physics. Indeed, this precise issue has received very little attention in any of the statistical physics literature. A notable exception was a 1996 paper by Gernshenfeld [47], which pointed out that all of the thermodynamic analyses of conventional (irreversible) computing architectures at the time were concerned with properties of individual gates, rather than entire circuits. That paper works through some elementary examples of the thermodynamics of circuits, and analyzes how the global structure of circuits (i.e., their wiring diagram) affects their thermodynamic properties. Gernshenfeld concludes that the 'next step will be to extend the analysis from these simple examples to more complex systems'¹⁷.

There are also several papers that do not address circuits, but focus on tangentially related topics, using modern nonequilibrium statistical physics. Ito and Sagawa [40, 41] considered the thermodynamics of (time-extended) Bayesian networks [48, 49]. They divided the variables in the Bayes net into two sets: the sequence of states of a particular system through time, which they write as X, and all external variables that interact with the system as it evolves, which they write as $\mathcal{C}$ . They then derive and investigate an integral fluctuation theorem [12, 15, 33, 50] that relates the EP generated by X and the EP flowing between X and $\mathcal{C}$ . (See also [51]).

Note that [40] focuses on the EP generated by a proper subset of the nodes in the entire network. In contrast, our results below concern the EP generated by all nodes. In addition, while [40] concentrates on an integral fluctuation theorem involving EP, we give an exact expression for the expected EP.

Otsubo and Sagawa [52] considered the thermodynamics of stochastic Boolean network models of gene regulatory networks. They focused in particular on characterizing the information-theoretic and dissipative properties of 3-node motifs. While their study does concern dynamics over networks, it has little in common with the analysis in the current paper, in particular due to its restriction to 3-node systems.

Solitary processes are similar to 'feedback control' processes, which have attracted much attention in the thermodynamics of information literature [2, 53, 54]. In feedback control processes, there is a subsystem A that evolves while coupled to another subsystem B, which is held fixed. (This joint evolution is often used to represent either A making a measurement of the state of B, or the state of B being used to determine which control protocol to apply to B.) It has been shown for feedback control processes that the total EP incurred by the joint A × B system is the 'subsystem EP' of A, plus the drop in the mutual information between A and B [53]. Formally, this is identical to equation (26).

Crucially however, in feedback control processes there is no assumption that A and B are physically decoupled. (Formally, equation (20) is not assumed.) Therefore the change in mutual information can either be negative or positive in those processes (the latter occurs, for instance, when A performs a measurement of the state of B). In addition, the 'subsystem EP' in these processes can be negative. For this reason, in feedback control processes there is no simple relationship between subsystem EP and the total EP incurred by the joint A × B system. In contrast, in solitary processes A and B are physically decoupled (see equations (20) and (21)). For this reason, in solitary processes subsystem EP is non-negative, as is the drop in mutual information, equation (26), and so each of them is a lower bound on the total EP incurred by the joint A × B system.

Boyd et al [55] considered the thermodynamics of 'modular' systems, which in our terminology are a special type of solitary processes, with extra constraints imposed. In particular, to derive their results, [55] assumes there is exactly one thermodynamic reservoir (in their case, a heat bath). That restricts the applicability of their results. Nonetheless, individual gates in a circuit are run by solitary processes, and one could require that they in fact be run by modular systems, in order to analyze the thermodynamics of (appropriately constrained) circuits. However, instead of focusing on this issues, [55] focuses on the thermodynamics of 'information ratchets' [56], modeling them as a sequence of iterations of a single solitary process, successively processing the symbols on a semi-infinite tape. In contrast, we extend the analysis of single solitary processes operating in isolation to analyze full circuits that comprise multiple interacting solitary processes.

Riechers [57] also contains work related to solitary processes, assuming a single heat bath, like [55]. [57] exploits the decomposition of EP into 'mismatch cost' plus 'residual EP' introduced in [30], in order to analyze thermodynamic attributes of a special kind of circuit. The analysis in that paper is not as complicated as either the analysis in the current paper or the analysis in [55]. That is because [57] does not focus on how the thermodynamic costs of running a system are affected if we impose a constraint on how the system is allowed to operate (e.g., if we require that it use solitary processes). In addition, the system considered in that paper is a very special kind of circuit: a set of N disconnected gates, working in parallel, with the outputs of those gates never combined.

[3] is a survey article relating many papers in the thermodynamics of computation. To clarify some of those relationships, it introduces a type of process related to solitary processes, called 'subsystem processes'. (See also [58].) For the purposes of the current paper though, we need to understand the thermodynamics specifically of solitary processes. In addition, being a summary paper, [3] presents some results from the arXiv preprint version of the current paper, [58]. Specifically, [3, 58] summarize some of the thermodynamics of straight-line circuits subject to the extra restriction (not made in the current paper) that there only be a single output node.

There is a fairly extensive literature on 'logically reversible circuits' and their thermodynamic properties [3, 6, 59–61]. This work is based on the early analysis in [62], and so it is not grounded in modern nonequilibrium statistical physics. Indeed, modern nonequilibrium statistical physics reveals some important subtleties and caveats with the thermodynamic properties of logically reversible circuits [3]. Also see [16] for important clarifications of the relationship between thermodynamic and logical reversibility, not appreciated in some of the research community working on logically reversible circuits.

Finally, another related paper is [63]. This paper starts by taking a distilled version of the decomposition of EP in [30] as given. It then discusses some of the many new problems in computer science theory that this decomposition leads to, both involving circuits and involving many other kinds of computational system.

8. Discussion and future work

It is important to emphasize that SR circuits are somewhat unrealistic models of real digital circuits. For example, many real digital circuits have multiple gates running at the same time, and often do not reinitialize their gates after they are run. In addition, many real digital circuits have characteristics like loops and branching. This makes them challenging to model at all using simple solitary processes. Extending our analysis to these more general models of circuits is an important direction for future work. Nonetheless, it is worth mentioning that all of the thermodynamic costs discussed above—including Landauer loss, mismatch cost, and residual EP—are intrinsic to any physical process, as described in section 3. So versions of them arise in those other kinds of circuits, only in modified form.

An interesting set of issues to investigate in future work is the scaling properties of the thermodynamic costs of SR circuits. In conventional circuit complexity theory [4, 5] one first specifies a 'circuit family' which comprises an infinite set of circuits that have different size input spaces but that are all (by definition) viewed as 'performing the same computation'. For example, one circuit family is given by an infinite set of circuits each of which has a different size input space, and outputs the single bit of whether the number of 1's in its input string is odd or even. Circuit complexity theory is concerned with how various resource costs in making a given circuit (e.g., the number of gates in the circuit) scales with the size of the circuit as one goes through the members of a circuit family. For example, it may analyze how the number of gates in a set of circuits, each of which determines whether its input string contains an odd number of 1's, scales with the size of those input strings. One interesting set of issues for future research is to perform these kinds of scaling analyses when the 'resource costs' are thermodynamic costs of running the circuit rather than conventional costs like the number of gates. In particular, it is interesting to consider classes of circuit families defined in terms of such costs, in analogy to the complexity classes considered in computer science theory, like P∕poly, or P∕log.

Other interesting issues arise if we formulate a cellular automaton (CA) as a circuit with an infinite number of nodes in each layer, and an infinite number of layers, each layer of the circuit corresponding to another timestep of the CA. For example, suppose we are given a particular CA rule (i.e., a particular map taking the state of each layer i to the state of layer i + 1) and a particular distribution over its initial infinite bit pattern. These uniquely specify the 'thermodynamic EP rate', given by the total EP generated by running the CA for n iterations (i.e., for reaching the nth layer in the circuit), divided by n. It would be interesting to see how this EP rate depends on the CA rule and initial distribution over bit patterns.

Finally, another important direction for future work arises if we broaden our scope beyond digital circuits designed by human engineers, to include naturally occurring circuits such as brains and gene regulatory networks. The 'gates' in such circuits are quite noisy—but all of our results hold independent of the noise levels of the gates. On the other hand, like real digital circuits, these naturally occurring circuits have loops, branching, concurrency, etc, and so might best be modeled with some extension of the models introduced in this paper. Again though, the important point is that whatever model is used, the EP generated by running a physical system governed by that model would include Landauer loss, mismatch cost, and residual EP.

Acknowledgments

We would like to thank Josh Grochow for helpful discussion, and thank the Santa Fe Institute for helping to support this research. This paper was made possible through Grant No. CHE-1648973 from the U.S. National Science Foundation, Grant No. FQXi-RFP-1622 from the Foundational Questions Institute, and Grant No. FQXi-RFP-IPW-1912 from the Foundational Questions Institute and Fetzer Franklin Fund, a donor advised fund of Silicon Valley Community Foundation.

Appendix A.: Proof of theorem 1 and related results

A.1. Preliminaries

Consider a conditional distribution P(y|x) that specifies the probability of 'output' $y\in \mathcal{Y}$ given 'input' $x\in \mathcal{X}$ , where $\mathcal{X}$ and $\mathcal{Y}$ are finite.

Given some $\mathcal{Z}\subseteq \mathcal{X}$ , the island decomposition ${L}_{\mathcal{Z}}\left(P\right)$ of P, and any $p\in {{\Delta}}_{\mathcal{X}}$ , let p(c) = ∑_x∈cp(x) indicate the total probability within island c, and

$\begin{equation*}{p}^{c}\left(x\right){:=}\begin{cases}\frac{p\left(x\right)}{p\left(c\right)}\quad \hfill & \mathrm{i}\mathrm{f}\;x\in c\;\mathrm{a}\mathrm{n}\mathrm{d}\;p\left(c\right){ >}0\hfill \\ 0\quad \hfill & \text{otherwise}\hfill \end{cases}\end{equation*}$

indicate the conditional probability of state x within island c.

In our proofs below, we will make use of the notion of relative interior. Given a linear space V, the relative interior of a subset A ⊆ V is defined as [68]

$\begin{equation*}\mathrm{r}\mathrm{e}\mathrm{l}\mathrm{i}\mathrm{n}\mathrm{t}\;A{:=}\left\{x\in A:\forall y\in A,\exists {\epsilon}{ >}0\;\text{s.t.}\;x+{\epsilon}\left(x-y\right)\in A\right\}.\end{equation*}$

Finally, for any g, we use notation

$\begin{equation*}{\partial }_{x}^{+}g{\vert }_{x=a}{:=}{\mathrm{lim}}_{{\epsilon}\to {0}^{+}}\frac{1}{{\epsilon}}\left(g\left(a+{\epsilon}\right)-g\left(a\right)\right)\end{equation*}$

to indicate the right-handed derivative of g(x) at x = a.

A.2. Proofs

Given some conditional distribution P(y|x) and function $f:\mathcal{X}\to \mathbb{R}$ , we consider the function ${\Gamma}:{{\Delta}}_{\mathcal{X}}\to \mathbb{R}$ as

$\begin{equation*}{\Gamma}\left(p\right){:=}S\left(Pp\right)-S\left(p\right)+{\mathbb{E}}_{p}\left[f\right].\end{equation*}$

Note that Γ is continuous on the relative interior of ${{\Delta}}_{\mathcal{X}}$ .

Lemma A1. For any $a,b\in {{\Delta}}_{\mathcal{X}}$ , the directional derivative of Γ at a toward b is given by

$\begin{equation*}{\partial }_{{\epsilon}}^{+}{\Gamma}\left(a+{\epsilon}\left(b-a\right)\right){\vert }_{{\epsilon}=0}=D\left(Pb{\Vert}Pa\right)-D\left(b{\Vert}a\right)+{\Gamma}\left(b\right)-{\Gamma}\left(a\right).\end{equation*}$

Proof. Let a := a + (b − a). Using the definition of Γ, write

$\begin{equation}{\partial }_{{\epsilon}}^{+}{\Gamma}\left({a}^{{\epsilon}}\right)={\partial }_{{\epsilon}}^{+}\left[S\left(P{a}^{{\epsilon}}\right)-S\left({a}^{{\epsilon}}\right)\right]+{\partial }_{{\epsilon}}^{+}{\mathbb{E}}_{{a}^{{\epsilon}}}\left[f\right].\end{equation} \tag{ A1 }$

Then, consider the first term on the right-hand side,

$\begin{align*}\hfill {\partial }_{{\epsilon}}^{+}\left[S\left(P{a}^{{\epsilon}}\right)-S\left({a}^{{\epsilon}}\right)\right]& =-\sum _{y\in \mathcal{Y}}\left[{\partial }_{{\epsilon}}^{+}{a}^{{\epsilon}}\left(y\right)\mathrm{ln}\;{a}^{{\epsilon}}\left(y\right)+{\partial }_{{\epsilon}}^{+}\left[P{a}^{{\epsilon}}\right]\left(y\right)\right]+\sum _{x\in \mathcal{X}}\left[{\partial }_{{\epsilon}}^{+}{a}^{{\epsilon}}\left(x\right)\mathrm{ln}\;{a}^{{\epsilon}}\left(x\right)+{\partial }_{{\epsilon}}^{+}{a}^{{\epsilon}}\left(x\right)\right]\hfill \\ \hfill & =-\sum _{y\in \mathcal{Y}}\left(b\left(y\right)\;-\;a\left(y\right)\right)\mathrm{ln}\;{a}^{{\epsilon}}\left(y\right)+\sum _{x\in \mathcal{X}}\left(b\left(x\right)\;-\;a\left(x\right)\right)\mathrm{ln}\;{a}^{{\epsilon}}\left(x\right)\hfill \end{align*}$

Evaluated at = 0, the last line can be written as

$\begin{align*}\hfill & -\sum _{y\in \mathcal{Y}}\left(b\left(y\right)\;-\;a\left(y\right)\right)\mathrm{ln}\;a\left(y\right)+\sum _{x\in \mathcal{X}}\left(b\left(x\right)-a\left(x\right)\right)\mathrm{ln}\;a\left(x\right)\hfill \\ \hfill & =D\left(Pb{\Vert}Pa\right)+S\left(Pb\right)-S\left(Pa\right)-D\left(b{\Vert}a\right)-S\left(b\right)+S\left(a\right)\hfill \end{align*}$

We next consider the ${\partial }_{{\epsilon}}^{+}{\mathbb{E}}_{{a}^{{\epsilon}}}\left[f\right]$ term,

$\begin{align*}\hfill {\partial }_{{\epsilon}}^{+}{\mathbb{E}}_{{a}^{{\epsilon}}}\left[f\right]& ={\partial }_{{\epsilon}}^{+}\left[\sum _{x\in \mathcal{X}}\left(a\left(x\right)+{\epsilon}\left(b\left(x\right)-a\left(x\right)\right)\right)f\left(x\right)\right]\hfill \\ \hfill & ={\mathbb{E}}_{b}\left[f\right]-{\mathbb{E}}_{a}\left[f\right].\hfill \end{align*}$

Combining the above gives

$\begin{align*}\hfill {\partial }_{{\epsilon}}^{+}{\Gamma}\left({a}^{{\epsilon}}\right){\vert }_{{\epsilon}=0}& =D\left(Pb{\Vert}Pa\right)-D\left(b{\Vert}a\right)+S\left(Pb\right)-S\left(b\right)-\left(S\left(Pa\right)-S\left(a\right)\right)+{\mathbb{E}}_{b}\left[f\right]-{\mathbb{E}}_{a}\left[f\right]\hfill \\ \hfill & =D\left(Pb{\Vert}Pa\right)-D\left(b{\Vert}a\right)+{\Gamma}\left(b\right)-{\Gamma}\left(a\right).\hfill \end{align*}$

□

Theorem A1. Let V be a convex subset of Δ. Then for any q ∈ argmin_{s ∈ V}Γ(s) and any p ∈ V,

$\begin{equation}{\Gamma}\left(p\right)-{\Gamma}\left(q\right){\geqslant}D\left(p{\Vert}q\right)-D\left(Pp{\Vert}Pq\right).\end{equation} \tag{ A2 }$

Equality holds if q is in the relative interior of V.

Proof. Define the convex mixture q := q + (p − q). By lemma A1, the directional derivative of Γ at q in the direction p − q is

$\begin{equation*}{\partial }_{{\epsilon}}^{+}{\Gamma}\left({q}^{{\epsilon}}\right){\vert }_{{\epsilon}=0}=D\left(Pp{\Vert}Pq\right)-D\left(p{\Vert}q\right)+{\Gamma}\left(p\right)-{\Gamma}\left(q\right).\end{equation*}$

At the same time, ${\partial }_{{\epsilon}}^{+}{\Gamma}\left({q}^{{\epsilon}}\right){\vert }_{{\epsilon}=0}{\geqslant}0$ , since q is a minimizer within a convex set. equation (A2) then follows by rearranging.

When q is in the relative interior of V, q − (p − q) ∈ V for sufficiently small > 0. Then,

$\begin{align*}\hfill & 0{\leqslant}\underset{{\epsilon}\to {0}^{+}}{\mathrm{lim}}\frac{1}{{\epsilon}}\left({\Gamma}\left(q-{\epsilon}\left(p-q\right)\right)-{\Gamma}\left(q\right)\right)\hfill \\ \hfill & =-\underset{{\epsilon}\to {0}^{-}}{\mathrm{lim}}\frac{1}{{\epsilon}}\left({\Gamma}\left(q+{\epsilon}\left(p-q\right)\right)-{\Gamma}\left(q\right)\right)\hfill \\ \hfill & =-\underset{{\epsilon}\to {0}^{+}}{\mathrm{lim}}\frac{1}{{\epsilon}}\left({\Gamma}\left(q+{\epsilon}\left(p-q\right)\right)-{\Gamma}\left(q\right)\right)\hfill \\ \hfill & =-{\partial }_{{\epsilon}}^{+}{\Gamma}\left({q}^{{\epsilon}}\right){\vert }_{{\epsilon}=0}.\hfill \end{align*}$

where in the first inequality comes from the fact that q is a minimizer, in the second line we change variables as ↦ −, and the third line we use the continuity of Γ on interior of the simplex. Combining with the above implies

$\begin{equation*}{\partial }_{{\epsilon}}^{+}{\Gamma}\left({q}^{{\epsilon}}\right)=D\left(Pp{\Vert}Pq\right)-D\left(p{\Vert}q\right)+{\Gamma}\left(p\right)-{\Gamma}\left(q\right)=0.\end{equation*}$

□

Lemma A2. For any c ∈ L(P) and q ∈ $\underset{s:\mathrm{supp}s\subseteq c}{\mathrm{argmin}}$ Γ(s),

$\begin{equation*}\mathrm{s}\mathrm{u}\mathrm{p}\mathrm{p}\;q=\left\{x\in c:f\left(x\right){< }\infty \right\}.\end{equation*}$

Proof. We prove the claim by contradiction. Assume that q is a minimizer with supp q ⊂ {x ∈ c: f(x) < ∞}. Note there cannot be any x ∈ supp q and $y\in \mathcal{Y}{\backslash}\mathrm{s}\mathrm{u}\mathrm{p}\mathrm{p}\;Pq$ such that P(y|x) > 0 (if there were such an x, y, then q(y) = ∑_x'P(y|x')q(x') ⩾ P(y|x)q(x) > 0, contradicting the statement that $y\in \mathcal{Y}{\backslash}\mathrm{s}\mathrm{u}\mathrm{p}\mathrm{p}\;Pq$ ). Thus, by definition of islands, there must be an ∈ c\supp q, ∈ supp Pq such that f() < ∞ and P(|) > 0.

Define the delta-function distribution u(x) := δ(x, ) and the convex mixture q(x) = (1 − )q(x) + u(x) for ∈ [0, 1]. We will also use the notation q(y) = ∑_xP(y|x)q(x).

Since q is a minimizer of Γ, ∂Γ(q)|_{= 0} ⩾ 0. Since Γ is convex, the second derivative ${\partial }_{{\epsilon}}^{2}{\Gamma}\left({q}^{{\epsilon}}\right){\geqslant}0$ and therefore ∂Γ(q) ⩾ 0 for all ⩾ 0. Taking a = q and b = u in lemma A1 and rearranging, we then have

$\begin{align}\hfill {\Gamma}\left(u\right)& {\geqslant}D\left(u{\Vert}{q}^{{\epsilon}}\right)-D\left(Pu{\Vert}P{q}^{{\epsilon}}\right)+{\Gamma}\left({q}^{{\epsilon}}\right)\hfill \\ \hfill & {\geqslant}D\left(u{\Vert}{q}^{{\epsilon}}\right)-D\left(Pu{\Vert}P{q}^{{\epsilon}}\right)+{\Gamma}\left(q\right),\hfill \end{align} \tag{ A3 }$

where the second inequality uses that q is a minimizer of Γ. At the same time,

$\begin{align}\hfill D\left(u{\Vert}{q}^{{\epsilon}}\right)-D\left(Pu{\Vert}P{q}^{{\epsilon}}\right)& =\sum _{y}P\left(y\vert \hat{x}\right)\mathrm{ln}\frac{{q}^{{\epsilon}}\left(y\right)}{{q}^{{\epsilon}}\left(\hat{x}\right)P\left(y\vert \hat{x}\right)}\hfill \\ \hfill & =P\left(\hat{y}\vert \hat{x}\right)\mathrm{ln}\frac{{q}^{{\epsilon}}\left(\hat{y}\right)}{{\epsilon}P\left(\hat{y}\vert \hat{x}\right)}+\sum _{y\ne \hat{j}}P\left(y\vert \hat{i}\right)\mathrm{ln}\frac{{q}^{{\epsilon}}\left(y\right)}{{\epsilon}P\left(y\vert \hat{x}\right)}\hfill \\ \hfill & {\geqslant}P\left(\hat{y}\vert \hat{x}\right)\mathrm{ln}\frac{\left(1-{\epsilon}\right)q\left(\hat{y}\right)}{{\epsilon}P\left(\hat{y}\vert \hat{x}\right)}+\sum _{y\ne \hat{j}}P\left(y\vert \hat{i}\right)\mathrm{ln}\frac{{\epsilon}P\left(y\vert \hat{x}\right)}{{\epsilon}P\left(y\vert \hat{x}\right)}\hfill \\ \hfill & =P\left(\hat{y}\vert \hat{x}\right)\mathrm{ln}\frac{\left(1-{\epsilon}\right)}{{\epsilon}}\frac{q\left(\hat{y}\right)}{P\left(\hat{y}\vert \hat{x}\right)},\hfill \end{align} \tag{ A4 }$

where in the secondline we have used that q() = , and in the third that q(y) = (1 − )q(y) + P(y|), so q(y) ⩾ (1 − )q(y) and q(y) ⩾ P(y|).

Note that the right-hand side of equation (A4) goes to ∞ as → 0. Combined with equation (A3) and that Γ(q) is finite implies that Γ(u) = ∞. However, ${\Gamma}\left(u\right)=S\left(P\left(Y\vert \hat{x}\right)\right)+f\left(\hat{x}\right){\leqslant}\vert \mathcal{Y}\vert +f\left(\hat{x}\right)$ , which is finite. We thus have a contradiction, so q cannot be the minimizer. □

Lemma A3. For any island c ∈ L(P), q ∈ $\underset{s:\mathrm{supp}s\subseteq c}{\mathrm{argmin}}$ Γ(p) is unique.

Proof. Consider any two distributions p, q ∈ $\underset{s:\mathrm{supp}s\subseteq c}{\mathrm{argmin}}$ Γ(s), and let P' = Pp, q' = Pq. We will prove that p = q.

First, note that by lemma A2, supp q = supp p = c. By theorem A1,

$\begin{align*}\hfill {\Gamma}\left(p\right)-{\Gamma}\left(q\right)& =D\left(p{\Vert}q\right)-D\left({p}^{\prime }{\Vert}{q}^{\prime }\right)\hfill \\ \hfill & =\sum _{x,y}p\left(x\right)P\left(y\vert x\right)\mathrm{ln}\frac{p\left(x\right){q}^{\prime }\left(y\right)}{q\left(x\right){p}^{\prime }\left(y\right)}\hfill \\ \hfill & =\sum _{x,y}p\left(x\right)P\left(y\vert x\right)\mathrm{ln}\frac{p\left(x\right)P\left(y\vert x\right)}{q\left(x\right){p}^{\prime }\left(y\right)P\left(y\vert x\right)/{q}^{\prime }\left(y\right)}\hfill \\ \hfill & {\geqslant}0\hfill \end{align*}$

where the last line uses the log-sum inequality. If the inequality is strict, then p and q cannot both be minimizers, i.e., the minimizer must be unique, as claimed.

If instead the inequality is not strict, i.e., Γ(p) − Γ(q) = 0, then there is some constant α such that for all x, y with P(y|x) > 0,

$\begin{equation}\frac{p\left(x\right)P\left(y\vert x\right)}{q\left(x\right){p}^{\prime }\left(y\right)P\left(y\vert x\right)/{q}^{\prime }\left(y\right)}=\alpha \end{equation} \tag{ A5 }$

which is the same as

$\begin{equation}\frac{p\left(x\right)}{q\left(x\right)}=\alpha \frac{{p}^{\prime }\left(y\right)}{{q}^{\prime }\left(y\right)}.\end{equation} \tag{ A6 }$

Now consider any two different states x, x' ∈ c such that P(y|x) > 0 and P(y|x') > 0 for some y (such states must exist by the definition of islands). For equation (A6) to hold for both x, x' with that same, shared y, it must be that p(x)/q(x) = p(x')/q(x'). Take another state x'' ∈ c such that P(y'|x'') > 0 and P(y'|x') > 0 for some y'. Since this must be true for all pairs x, x' ∈ c, p(x)/q(x) = const for all x ∈ c, and p = q, as claimed. □

Lemma A4. Γ(p) = ∑_c∈L(P)p(c)Γ(p^c).

Proof. First, for any island c ∈ L(P), define

$\begin{equation*}\phi \left(c\right)=\left\{y\in \mathcal{Y}:\exists x\in c\;\text{s.t.}\;P\left(y\vert x\right){ >}0\right\}.\end{equation*}$

In words, ϕ(c) is the subset of output states in $\mathcal{Y}$ that receive probability from input states in c. By the definition of the island decomposition, for any y ∈ ϕ(c), P(y|x) > 0 only if y ∈ c. Thus, for any p and any y ∈ ϕ(c), we can write

$\begin{equation}\frac{p\left(y\right)}{p\left(c\right)}=\frac{{\sum }_{x}P\left(y\vert x\right)p\left(x\right)}{p\left(c\right)}=\sum _{x\in \mathcal{X}}P\left(y\vert x\right){p}^{c}\left(x\right).\end{equation} \tag{ A7 }$

Using p = ∑_c∈L(P)p(c)p^c and linearity of expectation, write ${\mathbb{E}}_{p}\left[f\right]={\sum }_{c\in L\left(P\right)}p\left(c\right){\mathbb{E}}_{{p}^{c}}\left[f\right]$ . Then,

$\begin{align*}\hfill S\left(Pp\right)-S\left(p\right)& =-\sum _{y}p\left(y\right)\mathrm{ln}\;p\left(y\right)+\sum _{x}p\left(x\right)\mathrm{ln}\;p\left(x\right)\hfill \\ \hfill & =\sum _{c\in L\left(P\right)}p\left(c\right)\left[-\sum _{y\in \phi \left(c\right)}\frac{p\left(y\right)}{p\left(c\right)}\mathrm{ln}\frac{p\left(y\right)}{p\left(c\right)}+\sum _{x\in c}\frac{p\left(x\right)}{p\left(c\right)}\mathrm{ln}\frac{p\left(x\right)}{p\left(c\right)}\right]\hfill \\ \hfill & =\sum _{c\in L\left(P\right)}p\left(c\right)\left[S\left(P{p}^{c}\right)-S\left({p}^{c}\right)\right],\hfill \end{align*}$

where in the last line we have used equation (A7). Combining gives

$\begin{align*}\hfill {\Gamma}\left(p\right)& =\sum _{c\in L\left(P\right)}p\left(c\right)\left[S\left(P{p}^{c}\right)-S\left({p}^{c}\right)+{\mathbb{E}}_{{p}^{c}}\left[f\right]\right]\hfill \\ \hfill & =\sum _{c\in L\left(P\right)}p\left(c\right){\Gamma}\left({p}^{c}\right).\hfill \end{align*}$

□

We are now ready to prove the main result of this appendix.

Theorem 1. Consider any function ${\Gamma}:{{\Delta}}_{\mathcal{X}}\to \mathbb{R}$ of the form

$\begin{equation*}{\Gamma}\left(p\right){:=}S\left(Pp\right)-S\left(p\right)+{\mathbb{E}}_{p}\left[f\right]\end{equation*}$

where $P\left(y\vert x\right)$ is some conditional distribution of $y\in \mathcal{Y}$ given $x\in \mathcal{X}$ and $f:\mathcal{X}\to \mathbb{R}\cup \left\{\infty \right\}$ is some function. Let $\mathcal{Z}$ be any subset of $\mathcal{X}$ such that $f\left(x\right){< }\infty$ for $x\in \mathcal{Z}$ , and let $q\in {{\Delta}}_{\mathcal{Z}}$ be any distribution that obeys

$\begin{equation*}{q}^{c}\in \underset{r:\mathrm{s}\mathrm{u}\mathrm{p}\mathrm{p}r\subseteq c}{argmin}{\Gamma}\left(r\right)\quad \quad \;\text{for}\;\text{all}\;\;c\in {L}_{\mathcal{Z}}\left(P\right).\end{equation*}$

Then, each ${q}^{c}$ will be unique, and for any $p$ with $\mathrm{s}\mathrm{u}\mathrm{p}\mathrm{p}p\subseteq \mathcal{Z}$ ,

$\begin{equation*}{\Gamma}\left(p\right)=D\left(p{\Vert}q\right)-D\left(Pp{\Vert}Pq\right)+\sum _{c\in {L}_{\mathcal{Z}}\left(P\right)}p\left(c\right){\Gamma}\left({q}^{c}\right).\end{equation*}$

Proof. We prove the theorem by considering two cases separately.

Case 1: $\mathcal{Z}=\mathcal{X}$ . This case can be assumed when f(x) < ∞ for all x, so that ${L}_{\mathcal{Z}}\left(P\right)=L\left(P\right)$ . Then, by lemma A4, we have Γ(p) = ∑_c∈L(P)p(c)Γ(p^c). By lemma A2 and theorem A1,

$\begin{equation*}{\Gamma}\left({p}^{c}\right)-{\Gamma}\left({q}^{c}\right)=D\left({p}^{c}{\Vert}{q}^{c}\right)-D\left(P{p}^{c}{\Vert}P{q}^{c}\right),\end{equation*}$

where we have used that if some supp q^c = c, then q^c is in the relative interior of the set $\left\{s\in {{\Delta}}_{\mathcal{X}}:\mathrm{s}\mathrm{u}\mathrm{p}\mathrm{p}\;s\subseteq c\right\}$ . q^c is unique by lemma A3.

At the same time, observe that for any $p,r\in {{\Delta}}_{\mathcal{X}}$ ,

$\begin{align*}\hfill D\left(p{\Vert}r\right)-D\left(Pp{\Vert}Pr\right)& =\sum _{x}p\left(x\right)\mathrm{ln}\frac{p\left(x\right)}{r\left(x\right)}-\sum _{y}p\left(y\right)\mathrm{ln}\frac{p\left(y\right)}{r\left(y\right)}\hfill \\ \hfill & =\sum _{c\in L\left(P\right)}p\left(c\right)\left[\sum _{x\in c}\frac{p\left(x\right)}{p\left(c\right)}\mathrm{ln}\frac{p\left(x\right)/p\left(c\right)}{r\left(x\right)/r\left(c\right)}-\sum _{y\in \phi \left(c\right)}\frac{p\left(y\right)}{p\left(c\right)}\mathrm{ln}\frac{p\left(y\right)/p\left(c\right)}{r\left(y\right)/r\left(c\right)}\right]\hfill \\ \hfill & =\sum _{c\in L\left(P\right)}p\left(c\right)\left[D\left({p}^{c}{\Vert}{r}^{c}\right)-D\left(P{p}^{c}{\Vert}P{r}^{c}\right)\right].\hfill \end{align*}$

The theorem follows by combining.

Case 2: $\mathcal{Z}\subset \mathcal{X}$ . In this case, define a 'restriction' of f and P to domain $\mathcal{Z}$ as follows:

(a)
Define $\tilde {f}:\mathcal{Z}\to \mathbb{R}$ via $\tilde {f}\left(x\right)=f\left(x\right)$ for $x\in \mathcal{Z}$ .
(b)
Define the conditional distribution $\tilde {P}\left(y\vert x\right)$ for $y\in \mathcal{Y},x\in \mathcal{Z}$ via $\tilde {P}\left(y\vert x\right)=P\left(y\vert x\right)$ for all $y\in \mathcal{Y},x\in \mathcal{Z}$ .

In addition, for any distribution $p\in {{\Delta}}_{\mathcal{X}}$ with $\mathrm{s}\mathrm{u}\mathrm{p}\mathrm{p}\;p\subseteq \mathcal{Z}$ , let $\tilde {p}$ be a distribution over $\mathcal{Z}$ defined via $\tilde {p}\left(x\right)=p\left(x\right)$ for $x\in \mathcal{Z}$ . Now, by inspection, it can be verified that for any $p\in {{\Delta}}_{\mathcal{X}}$ with $\mathrm{s}\mathrm{u}\mathrm{p}\mathrm{p}\;p\subseteq \mathcal{Z}$ ,

$\begin{equation}{\Gamma}\left(p\right)=S\left(\tilde {P}\tilde {p}\right)-S\left(\tilde {p}\right)+{\mathbb{E}}_{\tilde {p}}\left[\tilde {f}\right]{:=}\tilde {{\Gamma}}\left(\tilde {p}\right)\end{equation} \tag{ A8 }$

We can now apply case 1 of the theorem to the function $\tilde {{\Gamma}}:{{\Delta}}_{\mathcal{Z}}\to \mathbb{R}$ , as defined in terms of the tuple $\left(\mathcal{Z},\tilde {f},\tilde {P}\right)$ (rather than the function ${\Gamma}:{{\Delta}}_{\mathcal{X}}\to \mathbb{R}$ , as defined in terms of the tuple $\left(\mathcal{X},f,P\right)$ ). This gives

$\begin{equation}\tilde {{\Gamma}}\left(\tilde {p}\right)=D\left(\tilde {p}{\Vert}\tilde {q}\right)-D\left(\tilde {P}\tilde {p}{\Vert}\tilde {P}\tilde {q}\right)+\sum _{c\in L\left(\tilde {P}\right)}\tilde {p}\left(c\right)\tilde {{\Gamma}}\left({\tilde {q}}^{c}\right),\end{equation} \tag{ A9 }$

where, for all $c\in L\left(\tilde {P}\right)$ , ${\tilde {q}}^{c}$ is the unique distribution that satisfies ${\tilde {q}}^{c}\in {\mathrm{argmin}}_{r\in {{\Delta}}_{\mathcal{Z}}:\mathrm{s}\mathrm{u}\mathrm{p}\mathrm{p}\;r\subseteq c}\tilde {{\Gamma}}\left(r\right)$ .

Now, let q be the natural extension of $\tilde {q}$ from ${{\Delta}}_{\mathcal{Z}}$ to ${{\Delta}}_{\mathcal{X}}$ . Clearly, for all $c\in L\left(\tilde {P}\right)$ , ${\Gamma}\left({q}^{c}\right)=\tilde {{\Gamma}}\left({\tilde {q}}^{c}\right)$ by equation (A8). In addition, each q^c is the unique distribution that satisfies ${q}^{c}\in {\mathrm{argmin}}_{r\in {{\Delta}}_{\mathcal{X}}:\text{supp}\;r\subseteq c}{\Gamma}\left(r\right)$ . Finally, it is easy to verify that $D\left(\tilde {p}{\Vert}\tilde {q}\right)=D\left(p{\Vert}q\right)$ , $D\left(\tilde {P}\tilde {p}{\Vert}\tilde {P}\tilde {q}\right)=D\left(Pp{\Vert}Pq\right)$ , $L\left(\tilde {P}\right)={L}_{\mathcal{Z}}\left(P\right)$ (recall the definition of ${L}_{\mathcal{Z}}$ from section 2.4). Combining the above results with equation (A8) gives

$\begin{equation*}{\Gamma}\left(p\right)=\tilde {{\Gamma}}\left(\tilde {p}\right)=D\left(p{\Vert}q\right)-D\left(Pp{\Vert}Pq\right)+\sum _{c\in {L}_{\mathcal{Z}}\left(P\right)}p\left(c\right){\Gamma}\left({q}^{c}\right).\end{equation*}$

□

Example 8. Suppose we are interested in thermodynamic costs associated with functions f whose image contains the value infinity, i.e., $f:\mathcal{X}\to \mathbb{R}\cup \left\{\infty \right\}$ . For such functions, Γ(p) = ∞ for any p which has support over an $x\in \mathcal{X}$ such that f(x) = ∞. In such a case it is not meaningful to consider a prior distribution q (as in theorem 1) which has support over any x with f(x) = ∞. For such functions we also are no longer able to presume that the optimal distribution has full support within each island of c ∈ L(P), because in general the proof of lemma A2 no longer holds when f can take infinite values.

Nonetheless, by equation (A9), for the purposes of analyzing the thermodynamic costs of actual initial distributions p that have finite Γ(p) (and so have zero mass on any x such that f(x) = ∞), we can always carry out our usual analysis if we first reduce the problem to an appropriate 'restriction' of f.

Example 9. Suppose we wish to implement a (discrete-time) dynamics P(x'|x) over $\mathcal{X}$ using a CTMC. Recall from the end of section 2.2 that by appropriately expanding the state space $\mathcal{X}$ to include a set of 'hidden states' $\mathcal{Z}$ in addition to $\mathcal{X}$ , and appropriately designing the rate matrices over that expanded state space $\mathcal{X}\cup \mathcal{Z}$ , we can ensure that the resultant evolution over $\mathcal{X}$ is arbitrarily close to the desired conditional distribution P. Indeed, one can even design those rates matrices over $\mathcal{X}\cup \mathcal{Z}$ so that not only is the dynamics over $\mathcal{X}$ arbitrarily close to the desired P, but in addition the EF generated in running that CTMC over $\mathcal{X}\cup \mathcal{Z}$ is arbitrarily close to the lower bound of equation (21) [21].

However, in any real-world system that implements some P with a CTMC over an expanded space $\mathcal{X}\cup \mathcal{Z}$ , that lower bound will not be achieved, and nonzero EP will be generated. In general, to analyze the EP of such real-world systems one has to consider the mismatch cost and residual EP of the full CTMC over the expanded space $\mathcal{X}\cup \mathcal{Z}$ . Fortunately though, we can design the CTMC over $\mathcal{X}\cup \mathcal{Z}$ so that when it begins the implementation of P, there is zero probability mass on any of the states in $\mathcal{Z}$ [21, 22]. If we do that, then we can apply equation (A9), and so restrict our calculations of mismatch cost and residual EP to only involve the dynamics over $\mathcal{X}$ , without any concern for the dynamics over $\mathcal{Z}$ .

Example 10. Our last example is to derive the alternative decomposition of the EP of an SR circuit which is discussed in section 6.2. Recall that due to equation (29), the initial distribution over any gate in an SR circuit has partial support. This means we can apply equation (29) to decompose the EF, in direct analogy to the use of theorem 1 to derive theorem 2—only with the modification that the spaces X and Y are set to ${\mathcal{X}}_{\mathrm{p}\mathrm{a}\left(g\right)}$ and ${\mathcal{X}}_{g}$ , respectively, rather than both set to ${\mathcal{X}}_{n\left(g\right)}$ , as was done in deriving theorem 2. (Note that the islands also change when we apply equation (29) rather than theorem 1, from the islands of P_g to the islands of π_g). The end result is a decomposition of EF just like that in theorem 2, in which we have the same circuit Landauer cost and circuit Landauer loss expressions as in that theorem, but now have the modified forms of circuit mismatch cost and of circuit residual EP introduced in section 6.2.

Appendix B.: Thermodynamics costs for SR circuits

To begin, we will make use of the fact that there is no overlap in time among the solitary processes in an SR circuit, so the total EF incurred can be written as

$\begin{equation}\mathcal{Q}\left(p\right)=\sum _{g\in G}{\mathcal{Q}}_{g}\left({p}_{n\left(g\right)}\right).\end{equation} \tag{ B1 }$

Moreover, for each gate g, the solitary process that updates the variables in n(g) starts with x_g in its initialized state with probability 1. So we can overload notation and write ${\mathcal{Q}}_{g}\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)$ instead of ${\mathcal{Q}}_{g}\left({p}_{n\left(g\right)}\right)$ for each gate g.

B.1. Derivation of theorem 2 and equation (49)

Apply theorem 1 to equation (18) to give

$\begin{align}\hfill {\hat{\sigma }}_{g}\left({p}_{n\left(g\right)}\right)& =D\left(\right.{p}_{n\left(g\right)}{\Vert}{q}_{n\left(g\right)}\left.\right)-D\left(\right.{P}_{g}{p}_{n\left(g\right)}{\Vert}{P}_{g}{q}_{n\left(g\right)}\left.\right)+\sum _{c\in L\left({P}_{g}\right)}{p}_{n\left(g\right)}\left(c\right){\hat{\sigma }}_{g}\left({q}_{n\left(g\right)}^{c}\right),\hfill \end{align} \tag{ B2 }$

where q_n(g) is a distribution that satisfies

$\begin{equation*}{q}_{n\left(g\right)}^{c}\in \underset{r:\mathrm{s}\mathrm{u}\mathrm{p}\mathrm{p}\;r\subseteq c}{\mathrm{argmin}}{\hat{\sigma }}_{g}\left(r\right)\end{equation*}$

for all islands c ∈ L(P_g). Next, for convenience use equation (B1) to write $\mathcal{Q}\left(p\right)$ as

$\begin{equation}\mathcal{Q}\left(p\right)=\mathcal{L}\left(p\right)+\left(\left[\sum _{g\in G}{\mathcal{Q}}_{g}\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)\right]-\mathcal{L}\left(p\right)\right)\end{equation} \tag{ B3 }$

The basic decomposition of $\mathcal{Q}\left(p\right)$ given in theorem 2 into a sum of $\mathcal{L}$ (defined in equation (37)), ${\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}$ (defined in equation (47)), $\mathcal{M}$ (defined in equation (50)), and $\mathcal{R}$ (defined in equation (51)) comes from combining equations (43), (B2) and (B3) and then grouping and redefining terms.

Next, again use the fact that the solitary processes have no overlap in time to establish that the minimal value of the sum of the EPs of the gates is the sum of the minimal EPs of the gates considered separately of one another. As a result, we can jointly take ${\hat{\sigma }}_{g}\left({p}_{n\left(g\right)}\right)\to 0$ for all gates g in the circuit [21]. We can then use equation (B1) to establish that the minimal EF of the circuit is simply the sum of the minimal EFs of running each of the gates in the circuit, i.e., the sum of the subsystem Landauer costs of running the gates. In other words, the circuit Landauer cost is

$\begin{equation} {\mathcal{L}}^{\mathrm{c}\mathrm{i}\mathrm{r}\mathrm{c}}\left(p\right)=\sum _{g\in G}\left[S\left({p}_{n\left(g\right)}\right)-S\left({P}_{g}{p}_{n\left(g\right)}\right)\right]\end{equation} \tag{ B4 }$

$\begin{equation}=\sum _{g\in G}\left[S\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)-S\left({\pi }_{g}{p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)\right]\end{equation} \tag{ B5 }$

$\begin{equation}=\sum _{g\in G{\backslash}W}\left[S\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)-S\left({\pi }_{g}{p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)\right]\end{equation} \tag{ B6 }$

To derive the second line, we have used the fact that in an SR circuit, each gate is set to its initialized value at the beginning of its solitary process with probability 1, and that its parents are set to their initialized states with probability 1 at the end of the process. Then to derive the third line we have used the fact that wire gates implement the identity map, and so S(p_pa(g)) − S(π_gp_pa(g)) = 0 for all g ∈ W.

This establishes equation (49).

B.2. Derivation of equation (55)

To derive equation (55), we first write Landauer cost as

$\begin{equation*}\mathcal{L}\left(p\right)=S\left(p\right)-S\left(Pp\right)=\sum _{g\in G}\left[S\left({p}_{V}^{\mathrm{b}\mathrm{e}\mathrm{g}\left(g\right)}\right)-S\left({p}_{V}^{\mathrm{e}\mathrm{n}\mathrm{d}\left(g\right)}\right)\right].\end{equation*}$

We then write circuit Landauer loss as

$\begin{equation} {\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)=\sum _{g\in G}\left[S\left({p}_{n\left(g\right)}\right)-S\left({P}_{g}{p}_{n\left(g\right)}\right)\right]-\mathcal{L}\left(p\right)=\sum _{g\in G}\left[\left(S\left({p}_{n\left(g\right)}\right)-S\left({p}_{V}^{\mathrm{b}\mathrm{e}\mathrm{g}\left(g\right)}\right)\right)-\left(S\left({P}_{g}{p}_{n\left(g\right)}\right)-S\left({p}_{V}^{\mathrm{e}\mathrm{n}\mathrm{d}\left(g\right)}\right)\right)\right]=\sum _{g\in G}\left[\left(S\left({p}_{n\left(g\right)}\right)+S\left({p}_{V{\backslash}n\left(g\right)}^{\mathrm{b}\mathrm{e}\mathrm{g}\left(g\right)}\right)-S\left({p}_{V}^{\mathrm{b}\mathrm{e}\mathrm{g}\left(g\right)}\right)\right)-\left(S\left({P}_{g}{p}_{n\left(g\right)}\right)+S\left({p}_{V{\backslash}n\left(g\right)}^{\mathrm{e}\mathrm{n}\mathrm{d}\left(g\right)}\right)-S\left({p}_{V}^{\mathrm{e}\mathrm{n}\mathrm{d}\left(g\right)}\right)\right)\right]\end{equation} \tag{ B7 }$

$\begin{equation}=\sum _{g\in G}\left[{I}_{{p}^{\mathrm{b}\mathrm{e}\mathrm{g}\left(g\right)}}\left({X}_{n\left(g\right)};{X}_{V{\backslash}n\left(g\right)}\right)-{I}_{{p}^{\mathrm{e}\mathrm{n}\mathrm{d}\left(g\right)}}\left({X}_{n\left(g\right)};{X}_{V{\backslash}n\left(g\right)}\right)\right],\end{equation} \tag{ B8 }$

In equation (B7), we used the fact that a solitary process over n(g) leaves the nodes in V\n(g) unmodified, thus $S\left({p}_{V{\backslash}n\left(g\right)}^{\mathrm{b}\mathrm{e}\mathrm{g}\left(g\right)}\right)=S\left({p}_{V{\backslash}n\left(g\right)}^{\mathrm{e}\mathrm{n}\mathrm{d}\left(g\right)}\right)$ .

Given the assumption that x_g = ∅ at the beginning of the solitary process for gate g, we can rewrite

$\begin{equation}{I}_{{p}^{\mathrm{b}\mathrm{e}\mathrm{g}\left(g\right)}}\left({X}_{n\left(g\right)};{X}_{V{\backslash}n\left(g\right)}\right)={I}_{{p}^{\mathrm{b}\mathrm{e}\mathrm{g}\left(g\right)}}\left({X}_{\mathrm{p}\mathrm{a}\left(g\right)};{X}_{V{\backslash}\mathrm{p}\mathrm{a}\left(g\right)}\right).\end{equation} \tag{ B9 }$

Similarly, because X_v = ∅ for all v ∈ pa(g) at the end of the solitary process for gate g, we can rewrite

$\begin{equation}{I}_{{p}^{\mathrm{e}\mathrm{n}\mathrm{d}\left(g\right)}}\left({X}_{n\left(g\right)};{X}_{V{\backslash}n\left(g\right)}\right)={I}_{{p}^{\mathrm{e}\mathrm{n}\mathrm{d}\left(g\right)}}\left({X}_{g};{X}_{V{\backslash}g}\right).\end{equation} \tag{ B10 }$

Finally, for any wire gate g ∈ W, given the assumption that X_g = ∅ at the beginning of the solitary process, we can write

$\begin{equation}{I}_{{p}^{\mathrm{b}\mathrm{e}\mathrm{g}\left(g\right)}}\left({X}_{\mathrm{p}\mathrm{a}\left(g\right)};{X}_{V{\backslash}\mathrm{p}\mathrm{a}\left(g\right)}\right)={I}_{{p}^{\mathrm{e}\mathrm{n}\mathrm{d}\left(g\right)}}\left({X}_{g};{X}_{V{\backslash}g}\right).\end{equation} \tag{ B11 }$

Equation (55) then follows from combining equations (B8)–(B11) and simplifying.

B.3. Derivation of corollary 4 and equation (57)

First, write circuit Landauer loss as

$\begin{equation}{\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)=\sum _{g\in G{\backslash}W}\left[S\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)-S\left({p}_{g}\right)\right]-\mathcal{L}\left(p\right).\end{equation} \tag{ B12 }$

Then, rewrite the sum in equation (B12) as

$\begin{align}\hfill \sum _{g\in G{\backslash}W}\left[S\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)-S\left({p}_{g}\right)\right]& =\sum _{g\in G{\backslash}W}S\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)-\sum _{g\in G{\backslash}W}S\left({p}_{g}\right)\hfill \\ \hfill & =\sum _{g\in G{\backslash}W}S\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)-\sum _{g\in G{\backslash}\left(W\cup \mathrm{O}\mathrm{U}\mathrm{T}\right)}S\left({p}_{g}\right)-\sum _{g\in \mathrm{O}\mathrm{U}\mathrm{T}}S\left({p}_{g}\right)\hfill \\ \hfill & =\sum _{g\in G{\backslash}W}S\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)-\sum _{v\in V{\backslash}\left(W\cup \mathrm{O}\mathrm{U}\mathrm{T}\right)}S\left({p}_{v}\right)+\sum _{v\in \mathrm{I}\mathrm{N}}S\left({p}_{v}\right)-\sum _{g\in \mathrm{O}\mathrm{U}\mathrm{T}}S\left({p}_{g}\right).\hfill \end{align} \tag{ B13 }$

Now, notice that for every v ∈ V\(W ∪ OUT) (i.e., every node which is not a wire and not an output), there is a corresponding wire w which transmits v to its child, and which has S(p_w) = S(p_v). This lets us rewrite equation (B13) as

$\begin{align}\hfill & \sum _{g\in G{\backslash}W}S\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)-\sum _{w\in W}S\left({p}_{w}\right)+\sum _{v\in \mathrm{I}\mathrm{N}}S\left({p}_{v}\right)-\sum _{g\in \mathrm{O}\mathrm{U}\mathrm{T}}S\left({p}_{g}\right)\hfill \\ \hfill & =\sum _{g\in G{\backslash}W}\left[S\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)-\sum _{v\in \mathrm{p}\mathrm{a}\left(g\right)}S\left({p}_{v}\right)\right]+\sum _{v\in \mathrm{I}\mathrm{N}}S\left({p}_{v}\right)-\sum _{g\in \mathrm{O}\mathrm{U}\mathrm{T}}S\left({p}_{g}\right)\hfill \\ \hfill & =-\sum _{g\in G}\mathcal{I}\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right)+\sum _{v\in \mathrm{I}\mathrm{N}}S\left({p}_{v}\right)-\sum _{g\in \mathrm{O}\mathrm{U}\mathrm{T}}S\left({p}_{g}\right)\hfill \end{align} \tag{ B14 }$

where in the second line we have used the fact that every wire belongs to exactly one set pa(g) for a non-wire gate g, and in the last line we used the definition of multi-information. Then, using the definition $\mathcal{L}\left(p\right)=S\left({p}_{\mathrm{I}\mathrm{N}}\right)-S\left({p}_{\mathrm{O}\mathrm{U}\mathrm{T}}\right)$ , the definition of multi-information, and by combining equations (B12)–(B14), we have

$\begin{equation}{\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)=\mathcal{I}\left({p}_{\mathrm{I}\mathrm{N}}\right)-\mathcal{I}\left({\pi }_{{\Phi}}{p}_{\mathrm{I}\mathrm{N}}\right)-\sum _{g\in G}\mathcal{I}\left({p}_{\mathrm{p}\mathrm{a}\left(g\right)}\right).\end{equation} \tag{ B15 }$

To derive equation (57), note that

$\begin{align}\hfill \mathcal{I}\left({p}_{\mathrm{I}\mathrm{N}}\right)& =\left[\sum _{v\in \mathrm{I}\mathrm{N}}S\left({p}_{v}\right)\right]-S\left({p}_{\mathrm{I}\mathrm{N}}\right){\leqslant}\sum _{v\in \mathrm{I}\mathrm{N}}S\left({p}_{v}\right){\leqslant}\sum _{v\in \mathrm{I}\mathrm{N}}\mathrm{ln}\;\left\vert {\mathcal{X}}_{v}\right\vert =\mathrm{ln}\left\vert \;,\prod _{v\in \mathrm{I}\mathrm{N}},{\mathcal{X}}_{v}\right\vert =\mathrm{ln}\;\left\vert {\mathcal{X}}_{\mathrm{I}\mathrm{N}}\right\vert .\hfill \end{align} \tag{ B16 }$

Appendix C.: SR circuits with out-degree greater than 1

In this appendix, we consider a more general version of SR circuits, in which non-output gates can have out-degree greater than 1.

First, we need to modify the definition of an SR circuit in section 5. This is because in SR circuits, the subsystem corresponding to a given gate g reinitializes all of the parents of that gate to their initialized state, ∅. If, however, there is some node v that has out-degree greater than 1—i.e., has more than one child—then we must guarantee that no such v is reinitialized by one its children gates before all of its children gates have run. To do so, we require that each non-output node v in the circuit is reinitialized only by the last of its children gates to run, while the earlier children (if any) apply the identity map to v.

Note that this rule could result in different thermodynamic costs of an overall circuit, depending on the precise topological order we use to determine which of the children of a given v reinitialized v. This would mean that the entropic costs of running a circuit would depend on the (arbitrary) choice we make for the topological order of the gates in the circuit. This issue will not arise in this paper however. To see why, recall that we model the wires in the circuit themselves as gates, which have both in-degree and out-degree equal to 1. As a result, if v has out-degree greater than 1, then v is not a wire gate, and therefore all of its children must be wire gates—and therefore none of those children has multiple parents. So the problem is automatically avoided.

We now prove that for SR circuits with out-degree greater than 1, circuit Landauer loss can be arbitrarily large.

Proposition 1. For any ${\pi }_{{\Phi}}$ , non-delta function input distribution ${p}_{\mathrm{I}\mathrm{N}}$ , and $\kappa {\geqslant}0$ , there exist an SR circuit with out-degree greater than 1 that implements ${\pi }_{{\Phi}}$ for which ${\mathcal{L}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right){\geqslant}\kappa$ .

Proof. Let ${\Phi}=\left(V,E,F,\mathcal{X}\right)$ be such a circuit that implements π_Φ. Given that p is not a delta function, there must be an input node, which we call v, such that S(p_v) > 0. Take g ∈ OUT to be any output gate of Φ, and let π_g ∈ F be its update map.

Construct a new circuit ${{\Phi}}^{\prime }=\left({V}^{\prime },{E}^{\prime },{F}^{\prime },{\mathcal{X}}^{\prime }\right)$ , as follows:

(a)
V' = V ∪ {w', g', w''};
(b)
E' = E ∪ {(v, w'), (w', g'), (g', w''), (w'', g)};
(c)
${F}^{\prime }=\left(F\;{\backslash}{\pi }_{g}\right)\;\cup \;\left\{{\pi }_{{w}^{\prime }},{\pi }_{{w}^{{\prime\prime}}},{\pi }_{{g}^{\prime }},{\pi }_{g}^{\prime }\right\}$ where
$\begin{align*}\hfill {\pi }_{{w}^{\prime }}\left({x}_{{w}^{\prime }}\vert {x}_{v}\right)& =\delta \left({x}_{{w}^{\prime }},{x}_{v}\right)\hfill \\ \hfill {\pi }_{{w}^{{\prime\prime}}}\left({x}_{{w}^{{\prime\prime}}}\vert {x}_{{g}^{\prime }}\right)& =\delta \left({x}_{{w}^{{\prime\prime}}},{x}_{{g}^{\prime }}\right)\hfill \\ \hfill {\pi }_{{g}^{\prime }}\left({x}_{{g}^{\prime }}\vert {x}_{w}\right)& =\delta \left({x}_{{g}^{\prime }},\varnothing \right)\hfill \\ \hfill {\pi }_{g}^{\prime }\left({x}_{g}\vert {x}_{\mathrm{p}\mathrm{a}\left(g\right)},{x}_{{w}^{{\prime\prime}}}\right)& ={\pi }_{g}\left({x}_{g}\vert {x}_{\mathrm{p}\mathrm{a}\left(g\right)}\right).\hfill \end{align*}$
(d)
${\mathcal{X}}_{{w}^{\prime }}={\mathcal{X}}_{{g}^{\prime }}={\mathcal{X}}_{{w}^{{\prime\prime}}}={\mathcal{X}}_{v}$ .

In words, Φ' is the same as Φ except that: (a) we have added an 'erasure gate' g' which takes v as input (through a new wire gate w'), and (b) this erasure gate is provided as an additional input, which is completely ignored, to one of the existing output gates g (through a new wire gate w'').

It is straightforward to see that π_Φ' = π_Φ. At the same time, S(p_pa(g')) − S(Π_g'p_pa(g')) = S(p_v), thus

$\begin{equation}{\mathcal{L}}_{{{\Phi}}^{\prime }}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)={\mathcal{L}}_{{\Phi}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)+S\left({p}_{v}\right),\end{equation} \tag{ C1 }$

where ${\mathcal{L}}_{{{\Phi}}^{\prime }}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}$ and ${\mathcal{L}}_{{\Phi}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}$ indicate the circuit Landauer loss of Φ' and Φ respectively. This procedure can be carried out again to create a new circuit Φ'' from Φ', which also implements π_Φ but which now has Landauer loss ${\mathcal{L}}_{{{\Phi}}^{{\prime\prime}}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)={\mathcal{L}}_{{\Phi}}^{\mathrm{l}\mathrm{o}\mathrm{s}\mathrm{s}}\left(p\right)+2S\left({p}_{v}\right)$ . Iterating, we can construct a circuit with an arbitrarily large Landauer loss which implements π_Φ. □

Thermodynamics of computing with circuits

Article metrics

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Abstract

1. Introduction

1.1. Contributions

1.2. Roadmap

2. Background

2.1. General notation

2.2. Stochastic thermodynamics

2.3. Information theory

2.4. 'Island' decomposition of a conditional distribution

2.5. Circuit theory

3. Decomposition of EF

4. Solitary processes

5. Serial-reinitialized circuits

6. Thermodynamic costs of SR circuits

6.1. General decomposition of EF and EP

6.2. Circuit-based decompositions of EF and EP

6.3. Information theory and circuit Landauer loss

6.4. Information theory and circuit mismatch loss

7. Related work

8. Discussion and future work

Acknowledgments

Appendix A.: Proof of theorem 1 and related results

A.1. Preliminaries

A.2. Proofs

Appendix B.: Thermodynamics costs for SR circuits

B.1. Derivation of theorem 2 and equation (49)

B.2. Derivation of equation (55)

B.3. Derivation of corollary 4 and equation (57)

Appendix C.: SR circuits with out-degree greater than 1

Footnotes

Thermodynamics of computing with circuits

Article metrics

Share this article

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Abstract

1. Introduction

1.1. Contributions

1.2. Roadmap

2. Background

2.1. General notation

2.2. Stochastic thermodynamics

2.3. Information theory

2.4. 'Island' decomposition of a conditional distribution

2.5. Circuit theory

3. Decomposition of EF

4. Solitary processes

5. Serial-reinitialized circuits

6. Thermodynamic costs of SR circuits

6.1. General decomposition of EF and EP

6.2. Circuit-based decompositions of EF and EP

6.3. Information theory and circuit Landauer loss

6.4. Information theory and circuit mismatch loss

7. Related work

8. Discussion and future work

Acknowledgments

Appendix A.: Proof of theorem 1 and related results

A.1. Preliminaries

A.2. Proofs

Appendix B.: Thermodynamics costs for SR circuits

B.1. Derivation of theorem 2 and equation (49)

B.2. Derivation of equation (55)

B.3. Derivation of corollary 4 and equation (57)

Appendix C.: SR circuits with out-degree greater than 1

Footnotes