Number of hidden states needed to physically implement a given conditional distribution

We consider the problem of implementing a given conditional distribution relating the states of a physical system at two separate times using a physical process with (potentially time-inhomogeneous) master equation dynamics. This problem arises implicitly in many nonequilibrium statistical physics scenarios, e.g., when designing processes to implement some desired computations, feedback-control protocols, and Maxwellian demons. However, it is known that many such conditional distributions $P$ over a state space $X$ cannot be implemented using master equation dynamics over just the states in $X$. Here we show that any conditional distribution $P$ can be implemented if the process has access to additional "hidden" states, not in $X$. In particular, we show that any conditional distribution can be implemented in a thermodynamically reversible manner (achieving zero entropy production) if there are enough hidden states available. We investigate how the minimal number of such states needed to implement any $P$ in a thermodynamically reversible manner depends on $P$. We provide exact results in the special case of conditional distributions that reduce to single-valued functions. For the fully general case, we provide an upper bound in terms of the nonnegative rank of $P$. In particular, we show that having access to one extra binary degree of freedom (doubling the number of states) is sufficient to carry out any $P$. Our results provide a novel type of bound on the physical resources needed to perform information processing: the size of a system's state space.

A. Background
Often in science and engineering we want to understand how a physical system can implement a given stochastic matrix P taking its initial, "input" state x_0 ∈ X to its "output" state x_1 at some fixed later time. For example, P may be a conditional distribution that governs the evolution of some naturally occurring system between two particular moments, and we wish to understand what underlying physical process could result in that distribution. In other situations, one wishes to design a physical process which implements a given input-output map, i.e., a physical system whose dynamics induce some specified conditional probability distribution of the state of a system at some final time given its initial state. Prototypical examples of this problem arise in the study of the thermodynamics of computation, as well as in the study of feedback-control protocols [1] and Maxwellian demons [2]. For example, one might be interested in physically implementing a logical operation such as "bit erasure" [3-15], in which both of two possible initial states, 0 and 1, are sent to a single final state (say, 0), or "bit randomization" [3,7-9], in which both initial states are sent to final state 0 or final state 1 with equal probability.
* Massachusetts Institute of Technology; Arizona State University
Fundamental results relate the stochastic matrix implemented by a physical process and the amount of heat generated by that process [7,15-30]. In particular, it is known that any physical process that transforms an initial distribution p(0) over system states to a final distribution p(1), while in contact with a heat bath at temperature T, must generate a minimum amount of heat,
$$Q \ge kT \left[ S(p(0)) - S(p(1)) \right], \qquad (1)$$
where S(·) is Shannon entropy in nats, k is Boltzmann's constant, and the units of time are arbitrary [10,13,14,30-48].
Landauer's well-known bound of kT ln 2 heat generation per bit erasure [3,7,49] is a special case of Eq. (1). Note that the bound given by Eq. (1) on the generated heat can be negative. For example, in bit randomization, Q ≥ −kT ln 2, and indeed such a process can extract kT ln 2 heat from a heat bath. Equality holds in Eq. (1) when the process implementing the transformation is thermodynamically reversible, i.e., one that results in zero entropy production [30,44,46]. Eq. (1) tells us that the (minimal) expected heat generated in a physical process that implements a given map does not depend on whether the map is logically invertible, or maximally non-invertible (so that the initial and final states are statistically independent), or anything in between. Only the entropies of the marginal distributions at the beginning and ending times matter, not the conditional distribution relating them. Furthermore, the examples discussed above (bit erasure and bit randomization) have been the canonical examples investigated in analyses of the thermodynamics of computation. However, they are very special types of computation; both are input-independent stochastic matrices, that is, the distribution over states at the final time does not depend on the initial state. Many of the analyses of these stochastic matrices implicitly exploited this fact, which means those analyses cannot apply to computation in which the output depends on the input. Recognizing this, in [10,45,47,48] Eq. (1) was shown to also apply to stochastic matrices that are input-dependent, as is the case in most real-world computations. This was done by showing how to construct a thermodynamically reversible process that implements any given input-output map of interest, including those whose output depends on their input.
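As a numerical illustration of the two canonical examples, the following sketch evaluates the heat bound for bit erasure and bit randomization. It assumes that Eq. (1) has the generalized Landauer form Q ≥ kT [S(p(0)) − S(p(1))], that erasure starts from a uniform bit, and that randomization starts from a known bit; the function names and the choice kT = 1 are ours, for illustration only.

```python
import math

def shannon_entropy(p):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)

def min_heat(p_initial, p_final, kT=1.0):
    """Assumed form of the bound: kT * [S(p_initial) - S(p_final)]."""
    return kT * (shannon_entropy(p_initial) - shannon_entropy(p_final))

# Bit erasure: uniform input, output always in state 0.
erasure_bound = min_heat([0.5, 0.5], [1.0, 0.0])
# Bit randomization: input known to be 0, output uniform.
randomization_bound = min_heat([1.0, 0.0], [0.5, 0.5])

# Report both bounds in units of kT ln 2.
print(erasure_bound / math.log(2), randomization_bound / math.log(2))
```

Under these assumptions the two bounds have equal magnitude and opposite sign, which matches the statement that the bound on generated heat can be negative.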
Earlier work on thermodynamics of computation considered the cost of implementing a given input-output map in terms of thermodynamic quantities like heat, work, and entropy production. Here we consider a novel type of physical cost, stated in terms of the size of the state space required to implement a given input-output map. Specifically, we consider systems with master equation dynamics, i.e., possibly time-inhomogeneous continuous-time Markov chains (CTMCs) over a discrete state space, which are often employed in statistical physics, especially in the field of stochastic thermodynamics [39,41,50]. It is known that many conditional distributions P cannot be implemented with any CTMC. A simple example of such a map is the bit flip, which sends state 0 to state 1, and state 1 to state 0. Characterizing the set of stochastic matrices that can be implemented using CTMCs is known as the embedding problem in the mathematical literature [51-56]. As we show here, however, any map can in fact be implemented if it is carried out in an enlarged state space; i.e., the system's original state space X is part of a larger space Y = X ∪ Z, and there is a CTMC over Y that induces a conditional distribution relating initial and final states whose restriction to X is the desired map P. We call the additional states in Z the hidden states of the implementation. This use of additional states was precisely how the work in [10,47,48] constructed processes to implement any possible input-output map (whether directly in terms of CTMCs, or on some Markovian coarse-graining of an underlying phase space). However, this earlier work did not investigate the cost of these additional states. In this paper we begin an analysis of this cost by investigating how its minimal value depends on the desired map P, i.e., by investigating the minimal number of hidden states required to carry out any given map with a CTMC.

B. Contribution of this paper
In this paper we analyze this novel cost of information processing, bounding the minimal number of hidden states necessary to carry out any given input-output map using CTMC dynamics. We will show that for single-valued stochastic matrices over a finite state space, one hidden state is necessary to carry out any non-identity invertible map (i.e., a permutation); we then prove by construction that it is also sufficient to do so (to use the previous example, we show that one additional state is necessary and sufficient to implement a bit flip). We use a similar technique to prove that any non-invertible single-valued map (e.g., bit erasure) can be implemented with no hidden states. A natural extension of our results to countably infinite state spaces reveals that any single-valued map over an infinite state space can be implemented using one hidden state. For the case of noisy stochastic matrices over any finite state space, we prove by construction that any noisy map can be implemented using at most r − 1 hidden states, where r is the nonnegative matrix rank of the desired map.
Note that the nonnegative rank of an input-output map is upper bounded by the number of states in the map. Thus, our results mean that one additional binary degree of freedom (which doubles the number of available states) is sufficient to perform any computation over a finite number of states, and that this can be done in a thermodynamically reversible way.
Importantly, the bounds we derive here are universal, applying to all systems governed by continuous-time Markov dynamics, not just the precise ones analyzed in [10,47,48]. In addition, we show that our bounds are achievable not only by continuous-time physical processes, but by thermodynamically reversible processes. In most cases, this requirement does not impose any additional cost.
Some of our results are established with constructive proofs. These constructions are built out of fundamental processes that we call local relaxations. In particular, we show how various types of input-output stochastic matrices can be realized by composing local relaxations. Local relaxations have several useful physical properties, in that they can be implemented by CTMCs that obey local detailed balance (LDB) and are thermodynamically reversible, thus achieving the generalized Landauer's bound for minimal heat generation [30,47].
The paper is structured as follows. The next section provides relevant background and definitions. In section III, we formally define the quasistatic embedding problem, i.e., what it means to implement a given input-output map using a thermodynamically reversible CTMC. We derive our results for finite-state single-valued and noisy stochastic matrices in section IV. Section V extends the single-valued map results to the case of countably infinite state spaces. The last section concludes with a discussion. All proofs not in the text are in the appendix.
Finally, in a companion paper [57] we analyze in more detail the specific case where the CTMC implements a P that is a single-valued map. Motivated by engineering considerations, we consider a natural decomposition of any CTMC that can implement such a map into a sequence of "hidden" timesteps, demarcated by changes in the set of allowed transitions between states. We demonstrate a tradeoff between the number of hidden states and the number of such hidden timesteps. This tradeoff is analogous to space/time tradeoffs in theoretical computer science.

A. Stochastic thermodynamics
Consider a physical system with a finite state space X of size n, in thermal contact with one or more heat baths. Define ∆_n to be the set of all distributions over X. We write the probability distribution of the state at time t as p(t) ∈ ∆_n, where p_i(t) is the probability that the system is in state i at time t. We also suppose that p(t) evolves according to a time-inhomogeneous continuous-time Markov chain (CTMC), i.e.,
$$\dot{p}(t) = M(t)\, p(t), \qquad (2)$$
where the elements in each column of the rate matrix M(t) sum to zero and its off-diagonal entries are nonnegative. We write M_ij(t) to indicate the specific transition rate from state j to state i at time t.
For any CTMC, we can relate the distributions at the initial time t = 0 and some later time t = 1 by
$$p(1) = T(0,1)\, p(0).$$
The stochastic matrix T(0, 1) is known as a transition matrix between times t = 0 and t = 1. It is given by the ordered exponential of M(t), defined for any pair of times t, t′ ∈ [0, 1] with t′ later than t:
$$T(t,t') = \mathcal{T} \exp\left( \int_t^{t'} M(s)\, ds \right),$$
where $\mathcal{T}$ denotes time ordering. Define the rate of (irreversible) entropy production (EP) [39] at time t for a system obeying Eq. (2) as
$$\dot{\Sigma}(t) = \sum_{i \neq j} M_{ij}(t)\, p_j(t) \ln \frac{M_{ij}(t)\, p_j(t)}{M_{ji}(t)\, p_i(t)},$$
and define the total EP over the time interval [0, 1], given that p(0) = p, as
$$\Sigma(p) = \int_0^1 \dot{\Sigma}(t)\, dt.$$
The EP rate (and therefore total EP) is nonnegative. It is zero if and only if p(t) is a stationary state of M(t) for all t ∈ [0, 1], and M(t) obeys detailed balance for all those times. This means in particular that the same CTMC will result in different amounts of total EP depending on the initial distribution p(0). In particular, in general changing p(0) will change whether a given CTMC is thermodynamically reversible or not. (See [58] for more on the dependence of total EP on the initial distribution.) If the process is isothermal, i.e., connected to a single heat bath at temperature T, then total EP can be written as [39]
$$\Sigma(p) = S(p(1)) - S(p(0)) + \frac{\langle Q \rangle}{kT},$$
where Q is the heat transferred to the heat bath, and ⟨·⟩ indicates the average over all trajectories of states from t = 0 to t = 1.
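The claim that EP vanishes only at a detailed-balanced stationary state can be checked numerically. The sketch below assumes the standard stochastic-thermodynamics form of the EP rate, the sum over i ≠ j of M_ij p_j ln(M_ij p_j / (M_ji p_i)); the function name `ep_rate` and the example rate matrix are hypothetical and chosen only for illustration.

```python
import math

def ep_rate(M, p):
    """EP rate for rate matrix M (M[i][j] is the rate from j to i) and distribution p.

    Assumed form: sum over i != j of M_ij p_j ln(M_ij p_j / (M_ji p_i)).
    """
    n = len(p)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            forward = M[i][j] * p[j]   # probability flux j -> i
            backward = M[j][i] * p[i]  # probability flux i -> j
            if forward == 0.0:
                continue               # x ln x -> 0 as x -> 0
            if backward == 0.0:
                return math.inf        # one-way flux: EP rate diverges
            total += forward * math.log(forward / backward)
    return total

# Two-state example; each column sums to zero.
M = [[-1.0, 2.0],
     [1.0, -2.0]]
stationary = [2.0 / 3.0, 1.0 / 3.0]   # satisfies M p = 0 and detailed balance
print(ep_rate(M, stationary), ep_rate(M, [0.5, 0.5]))
```

For this two-state example the EP rate is zero at the stationary distribution and strictly positive at the uniform distribution, where a net probability current flows.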

B. The embedding problem
Perhaps surprisingly, there are stochastic matrices P that cannot be implemented by any CTMC. This is true independent of thermodynamic considerations: using an infinite time (i.e., using units of time that are arbitrarily long), an infinite amount of total EP, etc., will not allow one to implement such matrices. So long as we assume that the dynamics of the underlying physical system obeys a CTMC (as in much of stochastic thermodynamics), we cannot find a way to physically implement such a stochastic matrix.
The problem of specifying the precise set of all matrices P that can be implemented by any CTMC is known as the embedding problem for Markov chains [51-56]. In the time-homogeneous case, i.e., the case where M(t) = M is constant, the embedding problem is unresolved in general for matrices larger than 3 × 3. If one imposes the constraint that the matrix M must obey detailed balance, while still assuming M(t) = M is constant, the problem has been resolved for all (finite) size state spaces [56] (this is known as the reversible embedding problem). Some results are also available if we relax the time-homogeneity constraint, allowing time-varying rate matrices. For example, it is known that stochastic matrices with zero or negative determinant have no time-inhomogeneous embedding, nor do stochastic matrices where the determinant is greater than the product of the diagonal entries [55,59].
Strictly speaking, these results for the time-inhomogeneous case mean that even bit erasure is not embeddable, since it has zero determinant. However, some such stochastic matrices that cannot be exactly embedded can be approximated arbitrarily well with stochastic matrices that do have embeddings. For instance, it is possible to approximate bit erasure arbitrarily well using a sequence of stochastic matrices with smaller and smaller positive determinants. (Intuitively, stochastic matrices with zero-valued determinants can be approximated arbitrarily well by members of the set of stochastic matrices with strictly positive determinants.)
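The two necessary conditions quoted above (strictly positive determinant, and determinant no greater than the product of the diagonal entries) are easy to test for small matrices. The following is a minimal sketch with hypothetical helper names; passing the test does not by itself show that a matrix is embeddable, since the conditions are only necessary.

```python
def det(A):
    """Determinant by cofactor expansion along the first row (adequate for small matrices)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0.0
    for c in range(n):
        minor = [row[:c] + row[c + 1:] for row in A[1:]]
        total += (-1) ** c * A[0][c] * det(minor)
    return total

def fails_necessary_conditions(P):
    """True if det(P) <= 0 or det(P) exceeds the product of the diagonal entries."""
    d = det(P)
    diag_product = 1.0
    for i in range(len(P)):
        diag_product *= P[i][i]
    return d <= 0 or d > diag_product

# Columns index the initial state, rows the final state.
bit_flip = [[0.0, 1.0], [1.0, 0.0]]
bit_erasure = [[1.0, 1.0], [0.0, 0.0]]
noisy_identity = [[0.9, 0.1], [0.1, 0.9]]

for name, P in [("flip", bit_flip), ("erasure", bit_erasure), ("noisy", noisy_identity)]:
    print(name, fails_necessary_conditions(P))
```

Here the bit flip fails because its determinant is negative and bit erasure fails because its determinant is zero, in line with the discussion above.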

A. Preliminary definitions
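Our analysis will concentrate on two types of stochastic matrix.

Definition 1. A stochastic matrix P is limit-embeddable if there is a sequence of CTMCs with transition matrices T^(n)(t, t′) such that P = lim_{n→∞} T^(n)(0, 1).
The precise topology used (e.g., to define limits) is not important, though to ground thinking a topology defined in terms of matrix norms can be used.
We sometimes say that the sequence of CTMCs T^(n)(t, t′) "limit-embeds" P.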
The following definition introduces thermodynamic considerations into the analysis:

Definition 2. A stochastic matrix P is quasistatically embeddable (QE) if for every initial distribution q there is a sequence of CTMCs, which may depend on q, whose transition matrices from t = 0 to t = 1 converge to P and whose total EP for initial distribution q converges to zero.
Note that the definition of QE involves initial state distributions (in fact, it considers the set of all initial distributions), whereas the definition of limit-embeddable does not. As mentioned above, in general any specific CTMC can only result in zero total EP for some specific initial distributions [58]. This is why the definition of QE allows the CTMC to change depending on the initial distribution q. The key idea is that whatever rate matrix we use, as determined by the initial distribution q, the resultant transition matrix from t = 0 to t = 1 is arbitrarily close to P, and has arbitrarily little total EP.
Note that even though we have specified the initial and final times (t = 0 and t = 1, respectively), because time units are arbitrary, our analysis is not restricted to finite-time protocols [60-62]. To talk meaningfully about doing something (say, bit erasure) in a particular finite time interval, the length of that interval must be compared to some other timescale. Since we do not have such a timescale, the rates (entries of M(t)) can be scaled up, at no cost, to rescale time as desired, and the equations do not change.

Example 1.
In the model of bit erasure described in [62] a classical bit is stored in a quantum dot, which can be either empty (state 0) or filled with an electron (state 1). The dot is brought into contact with a metallic lead at temperature T which can transfer an electron to/from the dot. The propensity of the lead to give an electron is set by its chemical potential, indicated by µ(t) at time t.

The energy of an electron in the dot is indicated by E(t).
Let p(t) indicate the two-dimensional vector of probabilities, with p_0(t) and p_1(t) being the probability of an empty and full dot, respectively. These probabilities evolve according to the rate matrix [62]
$$\dot{p}(t) = C \begin{pmatrix} -w(t) & 1 - w(t) \\ w(t) & -(1 - w(t)) \end{pmatrix} p(t), \qquad (4)$$
where C sets the timescale of the exchange of electrons between the dot and the lead and w(t) is the Fermi distribution of the lead,
$$w(t) = \frac{1}{e^{[E(t) - \mu(t)]/kT} + 1}.$$
Using Eq. (4) and conservation of probability (i.e., p_0(t) + p_1(t) = 1), we can write
$$\dot{p}_1(t) = C \left[ w(t) - p_1(t) \right]. \qquad (5)$$
Now suppose the system is controlled between times t = 0 and t = 1 by setting µ(t), so that w(t) = (1 − t) p_1(0) + tδ. In this case, Eq. (5) can be explicitly solved for p_1,
$$p_1(t) = w(t) - \frac{\delta - p_1(0)}{C} \left( 1 - e^{-Ct} \right).$$
Now consider what happens in a limit where C → ∞. First of all, p_1(1) → w(1) = δ, so the transition matrix T(0, 1) relating the initial and final states of the system, p(1) = T(0, 1) p(0), is
$$T(0,1) = \begin{pmatrix} 1 - \delta & 1 - \delta \\ \delta & \delta \end{pmatrix}.$$
Furthermore, the total entropy production over the process,
$$\int_0^1 \dot{p}_1(t) \ln \frac{w(t)\,[1 - p_1(t)]}{[1 - w(t)]\, p_1(t)}\, dt,$$
approaches zero in this limit. (This is because by making C large enough, ṗ_1(t) → ẇ(t) = δ − p_1(0), and the argument of the logarithm approaches 1.) Since δ is arbitrary, this means that we can make T(0, 1) arbitrarily close to bit erasure (δ = 0), while having arbitrarily small total EP (C → ∞). This is a constructive proof that bit erasure is QE.
Note that if we cannot control C directly, we can still achieve the same effect as the limit C → ∞ by running the process with some fixed C for longer and longer times (that is, by changing the endpoint from t = 1 to some t > 1).
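The quasistatic limit in Example 1 can be explored numerically. The sketch below assumes the two-state dynamics dp_1/dt = C [w(t) − p_1(t)] with the linear protocol w(t) = (1 − t) p_1(0) + tδ, and assumes that the total EP is the time integral of the net current multiplied by the logarithm of the ratio of forward to backward fluxes. The function names, the parameter values (p_1(0) = 0.5, δ = 0.01), and the midpoint integration rule are our own choices for illustration.

```python
import math

def p1_exact(t, C, p1_0, delta):
    """Closed-form solution of dp1/dt = C * (w(t) - p1) with p1(0) = p1_0,
    for the linear protocol w(t) = (1 - t) * p1_0 + t * delta."""
    w = (1.0 - t) * p1_0 + t * delta
    return w - (delta - p1_0) / C * (1.0 - math.exp(-C * t))

def total_ep(C, p1_0, delta, steps=20000):
    """Midpoint-rule estimate of the total EP over [0, 1] for the two-state dot."""
    dt = 1.0 / steps
    ep = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        w = (1.0 - t) * p1_0 + t * delta
        p1 = p1_exact(t, C, p1_0, delta)
        current = C * (w - p1)  # net probability current into the filled state
        ep += current * math.log(w * (1.0 - p1) / ((1.0 - w) * p1)) * dt
    return ep

# Final occupation and total EP for increasingly fast exchange with the lead.
for C in (10.0, 100.0, 1000.0):
    print(C, p1_exact(1.0, C, 0.5, 0.01), total_ep(C, 0.5, 0.01))
```

With these assumed parameters, increasing C moves the final occupation toward δ and lowers the estimated total EP, consistent with the limiting argument in the example.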

B. Elementary properties of QE matrices
In this subsection we present some properties of QE matrices that will be used in the following sections. First, note that since every embeddable matrix has positive determinant, by the continuity of the determinant, there is no limit-embeddable matrix with strictly negative determinant. We show in the appendix that the additional requirement that QE matrices be implementable with arbitrarily little total EP fixes the determinant to be nonpositive as well:

Proposition 1. Any QE matrix, except the identity, has determinant zero.
Prop. 1 tells us that while any embeddable matrix is limit-embeddable, and any QE matrix is limit-embeddable, no embeddable matrix other than the identity is QE.
To understand Prop. 1 intuitively, recall that any physical process that exhibits net probability current between any pair of states at any time will incur positive EP then. Avoiding such currents at all times requires that the process remains arbitrarily close to equilibrium throughout the process. (This is the "quasistatic" limit where the equilibrium is varied "infinitely slowly" compared to equilibration timescales internal to the system.) In this limit, all information about the precise initial state is lost, i.e., the associated stochastic matrix must be singular.

Lemma 2. The set of QE matrices is closed under multiplication.
Proof. If T(0, 1) is the transition matrix associated with rate matrix M(t) and S(0, 1) is the transition matrix associated with N(t), then S(1/2, 1) T(0, 1/2) = P(0, 1), where P(0, 1) is associated with the rate matrix that equals M(t) for t in [0, 1/2) and N(t) for t in [1/2, 1]. To complete the proof note that the entropy production over the whole process is the sum of entropy production over both subintervals.
Our goal is to calculate the minimal size of a space Y ⊇ X such that there is a QE matrix over Y whose restriction to X is a given map P. To do this, we will repeatedly make implicit use of Lemma 2, to establish that a particular matrix is QE by writing it as a finite product of QE matrices of a special kind. We define that special kind of QE matrix in the next subsection.

C. Local relaxations
Suppose that the dynamics over X is partitioned under the CTMC, in the sense that there is a set of nonoverlapping subsets of X that cover all of X, {ξ_i}, such that there is zero probability flow between any distinct partition elements ξ_i and ξ_j throughout the time interval [0, 1]. Suppose further that for each partition element ξ_i, if the system starts in any particular x ∈ ξ_i, then by t = 1 it relaxes to thermal equilibrium restricted to the states in ξ_i. So all information about the initial system state is lost into the heat bath by time t = 1 (except which partition element the system started in).
In such a process, potentially after some relabeling of the elements of X (i.e., after a permutation of X), the stochastic matrix taking the initial state to the final state can be represented by a block diagonal matrix whose blocks are each rank one. Each block corresponds to one of the partition elements. We call any such stochastic matrix a local relaxation (LR).
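The block structure that defines a local relaxation can be tested mechanically. The sketch below uses a union-find pass to group states that exchange probability, then checks that all columns within a group coincide (a stochastic block of rank one has identical columns). The function name `is_local_relaxation`, the tolerance, and the example matrices are hypothetical; matrices are taken to be column-stochastic, with columns indexing the initial state.

```python
def is_local_relaxation(P, tol=1e-12):
    """Check whether column-stochastic P is block diagonal, up to relabeling
    of states, with every block of rank one."""
    n = len(P)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # States i and j must lie in the same block whenever j can reach i.
    for j in range(n):
        for i in range(n):
            if P[i][j] > tol:
                parent[find(i)] = find(j)

    # Within a block, all columns must equal the same distribution.
    representative = {}
    for j in range(n):
        root = find(j)
        column = [P[i][j] for i in range(n)]
        if root not in representative:
            representative[root] = column
        elif any(abs(a - b) > tol for a, b in zip(column, representative[root])):
            return False
    return True

bit_erasure = [[1.0, 1.0], [0.0, 0.0]]
bit_flip = [[0.0, 1.0], [1.0, 0.0]]
# Needs a relabeling (swap the second and third states) to look block diagonal.
needs_relabeling = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
print(is_local_relaxation(bit_erasure), is_local_relaxation(bit_flip),
      is_local_relaxation(needs_relabeling))
```

The check does not require the matrix to be presented in block-diagonal order, since the grouping step finds the partition directly.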
We now present several useful facts about local relaxations.

Proposition 3. Any local relaxation is QE.
(See appendix for the proof.) Products of local relaxations need not be local relaxations. On the other hand, by Lemma 2, the set of all QE matrices is closed under multiplication. So the converse of Proposition 3 is false.
We refer to a stochastic matrix that represents a deterministic function (i.e., only has 0/1 entries) as a single-valued map. To analyze such matrices, recall that an idempotent stochastic matrix A is one such that A^2 = A.

Proposition 4. Any single-valued map that has zero determinant is QE.
Proof. A well-known result in semigroup theory [64] says that any non-invertible single-valued map f can be written as a finite product of single-valued idempotents. Moreover, any single-valued idempotent map is a local relaxation. (The partition of X defining the blocks of that local relaxation is the set of subsets {f^{-1}(x) : x ∈ f(X)}.) The result follows by applying first Proposition 3 and then Lemma 2.
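A small worked instance may help make the factorization into idempotents concrete. The sketch below takes a hypothetical non-invertible map on three states and a hand-chosen sequence of three idempotent maps whose composition reproduces it; the maps and helper names are ours and are not taken from [64].

```python
def compose(*maps):
    """Compose single-valued maps given as lists; maps are applied left to right."""
    def apply(x):
        for g in maps:
            x = g[x]
        return x
    return [apply(x) for x in range(len(maps[0]))]

def is_idempotent(g):
    """A map g is idempotent if applying it twice equals applying it once."""
    return all(g[g[x]] == g[x] for x in range(len(g)))

# Target: non-invertible map on {0, 1, 2} with f(0) = 1, f(1) = 0, f(2) = 0.
f = [1, 0, 0]

# Hand-chosen factorization into idempotents (an illustrative assumption).
g1 = [0, 2, 2]  # merge state 1 into state 2; both share the same image under f
g2 = [1, 1, 2]  # move state 0 into the now-vacant state 1
g3 = [0, 1, 0]  # move the merged pair from state 2 to state 0

print(compose(g1, g2, g3) == f, [is_idempotent(g) for g in (g1, g2, g3)])
```

The first step exploits the fact that f is not injective: merging two states with the same image frees a state, which then plays the role that a hidden state would otherwise have to play.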
A close relation between local relaxations and QE matrices is illustrated by the following result:

Proposition 5. Any 2 × 2 QE matrix is a local relaxation.
Proof. Consider 2 × 2 QE matrix P . If it is the identity, it is a local relaxation. If not, then by Prop. 1, its determinant is zero, which implies it has rank one (it cannot be the zero matrix because QE matrices are stochastic), and so is a local relaxation.

IV. EMBEDDING WITH HIDDEN STATES
Many stochastic matrices P over a fixed finite space X are not QE. However, as we will show constructively, any stochastic matrix is the restriction to X of a QE matrix over a space Y ⊇ X. The elements of Y \ X are "hidden states", in contrast to the "visible states" that comprise X and evolve according to P .
We wish to investigate how many hidden states are needed to be able to implement P . To do so we need one more definition:

Definition 3. An n × n stochastic matrix A is quasistatically embeddable with m hidden states if there exists some (n + m) × (n + m) QE matrix B such that B_ij = A_ij for all i, j ∈ {1, ..., n}.
By Lemma 2, if a stochastic matrix P can be factored into a product of matrices each of which is QE with m hidden states, then P itself is QE with m hidden states. More generally, the number of hidden states required to quasistatically embed a product of stochastic matrices is no more than the maximum of the number required for each of those individual matrices.
Note that we need to relabel the elements of Y for the rightmost of the three matrices to be block-diagonal. Specifically, permuting the second and the third elements of Y brings the third matrix into block-diagonal form, which confirms that the rightmost of the three matrices is LR.
One can verify that no such relabeling will change the LHS of Eq. (6). Example 2 illustrates one way that hidden states can facilitate embedding a stochastic matrix: by adding one row and column to any stochastic matrix, we can make a singular matrix, thereby evading one of the obstacles to embeddability.
We can easily generalize Example 2 to establish that any P that transposes two elements of X is quasistatically embeddable with one extra state. Since any permutation can be written as a product of transpositions and the product of QE matrices is QE, the next result follows immediately:

Proposition 6. Any permutation is quasistatically embeddable with one extra state.
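One way to see why a single extra state suffices is to treat it as scratch space for swapping two states: move a into the hidden state, move b into a, then move the hidden state into b. The sketch below builds such a sequence of idempotent maps for an arbitrary permutation by chaining transpositions. It is an illustrative construction under our own naming (`transposition_steps`, `permutation_steps`, `run`), not necessarily the one used in the proof.

```python
def transposition_steps(a, b, z, size):
    """Three idempotent maps on {0, ..., size-1} that swap a and b via hidden state z."""
    step1 = list(range(size))
    step1[a] = z  # a -> z
    step2 = list(range(size))
    step2[b] = a  # b -> a
    step3 = list(range(size))
    step3[z] = b  # z -> b
    return [step1, step2, step3]

def permutation_steps(perm):
    """Idempotent maps on X plus one hidden state whose composition acts as perm on X."""
    n = len(perm)
    z = n                     # index of the single hidden state
    current = list(range(n))  # current[x]: state that input x occupies so far
    steps = []
    for x in range(n):
        here, target = current[x], perm[x]
        if here == target:
            continue
        steps.extend(transposition_steps(here, target, z, n + 1))
        # Record the effect of swapping states `here` and `target`.
        current = [target if s == here else here if s == target else s
                   for s in current]
    return steps

def run(steps, x):
    """Apply the maps in order to the input state x."""
    for g in steps:
        x = g[x]
    return x

perm = [2, 0, 3, 1]
steps = permutation_steps(perm)
print([run(steps, x) for x in range(len(perm))], len(steps))
```

For the bit flip, `permutation_steps([1, 0])` returns three maps on three states, and each map is idempotent and hence a local relaxation.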
Whether or not we demand quasistatic embedding, Thm 6.2 in Goodman [59] shows that non-identity invertible (noise-free) stochastic matrices require hidden states to embed. Moreover, combining Proposition 1 and Proposition 6, we see that any invertible single-valued stochastic matrix requires exactly one hidden state to be quasistatically embeddable. In light of Proposition 4, this establishes that the number of hidden states required to implement a single-valued stochastic matrix with arbitrarily small total EP is the same as the number required to embed it at all.
The nonnegative rank of a stochastic matrix P is the smallest m such that P can be written as P = RS, where R is an n × m stochastic matrix and S is an m × n stochastic matrix [65]. Such a factorization P = RS is similar to how a mixture model of a distribution combines m separate components to construct the full distribution. Our most general result gives an upper bound on the number of hidden states needed for quasistatic embedding in terms of nonnegative rank:

Theorem 7. An n × n stochastic matrix P with nonnegative rank k is quasistatically embeddable with k − 1 hidden states.
The intuition behind this bound is that an n × n stochastic matrix of nonnegative rank k can be written as a product of two matrices of dimensions n × k and k × n. These rectangular matrices can be interpreted as representing transfers of probability between disjoint sets of states: the n original states and k hidden ones. These transfers can be implemented using one additional hidden state, for k + 1 hidden states in total. A further improvement by two states is possible, and gives the theorem. See appendix for the proof.
Since the nonnegative rank of an n × n matrix is at most n, Theorem 7 implies that any stochastic matrix is QE with, in the worst case, n − 1 hidden states. Note that this is fewer hidden states than what is provided by a single extra binary degree of freedom, which doubles the size of a system's state space. So simply by adding a hidden bit to a system, we can implement any stochastic matrix with arbitrarily small total EP.
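The intuition of transferring probability from the n original states to k auxiliary states and back can be written out in matrix form. The sketch below uses a hypothetical 3 × 3 matrix P = RS with k = 2 and two transfer matrices on the enlarged space of n + k states; it only verifies that their product restricts to P on the original states. It does not show that the transfer steps are QE, and it does not reproduce the construction in the appendix that reduces the count to k − 1.

```python
def matmul(A, B):
    """Plain-Python matrix product of nested lists."""
    return [[sum(A[i][m] * B[m][j] for m in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def eye(size):
    """Identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]

# Hypothetical factorization with column-stochastic factors (columns sum to 1).
R = [[0.8, 0.1], [0.1, 0.1], [0.1, 0.8]]  # n x k: output distribution per component
S = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]    # k x n: component chosen by each input
P = matmul(R, S)
n, k = len(R), len(S)

# Step A: move all probability from X into the k auxiliary states according to S.
A = [[0.0] * (n + k) for _ in range(n)] + [S[z] + eye(k)[z] for z in range(k)]
# Step B: move probability from the auxiliary states back into X according to R.
B = [eye(n)[i] + R[i] for i in range(n)] + [[0.0] * (n + k) for _ in range(k)]

# Restrict the product on the enlarged space to the original states.
restriction = [row[:n] for row in matmul(B, A)[:n]]
print(restriction)
print("hidden states by Theorem 7 for this factorization:", k - 1)
```

Because this P admits a factorization with k = 2, its nonnegative rank is at most 2, so Theorem 7 gives a bound of at most one hidden state, smaller than the worst-case n − 1 = 2.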
Note that for the special case of a block diagonal matrix P (or more generally, a stochastic matrix that can be made block diagonal by basis permutation), each block can be implemented independently, one after the other. This means that such a P is quasistatically embeddable with no more hidden states than the block that requires the most hidden states. This suggests an improvement to Theorem 7 for this special case.

V. INFINITE STATE SPACES
We now consider the case where the state space of our system, X, is countably infinite. For simplicity, we will restrict attention to the implementation of single-valued maps f : X → X over such spaces.
Note that local relaxations of single-valued functions are idempotent functions, even when X is countably infinite. So one natural way to extend our analysis to the case of countably infinite X is to consider constructing a sequence of idempotent functions over some Y ⊃ X that, when restricted to X, gives the desired f . To allow the sequence of functions to implement f (x) for all x ∈ X, in general we must consider sequences that are infinite. However the infinite product of local relaxations need not be QE, or even exist. To circumvent this issue we consider a "practical" interpretation of what it means to implement f , by requiring only that any particular input x ∈ X is mapped to f (x) after finitely many local relaxations.
Adopting this interpretation of what it means to implement a function f, we can use a "dovetailing" algorithm to construct a sequence of local relaxations that implements any specified function:

Proposition 8. Let X be a countable set, and take Y := X ∪ {z}. For any function f : X → X there is a sequence {g_i} of idempotent functions g_i : Y → Y such that for all x ∈ X ⊂ Y, there is an m such that for all r ≥ m, g_r ∘ g_{r−1} ∘ ··· ∘ g_1(x) = f(x).
(See appendix for proof.) Proposition 8 is the infinite-X analog of Propositions 4 and 6, which showed that any function over a finite X can be implemented with at most one hidden state.

VI. DISCUSSION
In this paper we identified a novel cost of implementing a given stochastic matrix using a time-inhomogeneous continuous-time Markov chain without any entropy production: the minimal number of hidden states required to implement the matrix. We then presented a preliminary analysis of how this cost depends on the precise characteristics of the matrix. Table I summarizes what we have found about this dependence for the case of a finite state space. We consider both the minimal number of hidden states needed to implement a given stochastic matrix using limit-embeddable matrices (LE), and also the minimal number needed if we impose the extra restrictions that define quasistatically embeddable matrices (QE), i.e., those without entropy production (see Definition 1 and Definition 2).
We emphasize that this table is a work in progress. For example, we used the fact that QE matrices are a special kind of LE matrices, so that the minimal number of hidden states needed for QE matrices is an upper bound on the corresponding number for LE matrices. Nonetheless, it is worth explaining some of its entries.
First, consider single-valued invertible stochastic matrices (other than the identity), such as the bit flip. Such matrices are not embeddable, and we showed above that they are not even limit-embeddable. So one or more hidden states are needed to implement them. At the same time, by Prop. 6, any such map is QE with one hidden state. Thus any single-valued invertible stochastic matrix (except for the identity) requires exactly one hidden state, whether we implement it with LE or QE matrices. On the other hand, Prop. 4 establishes that any single-valued non-invertible matrix is QE with no hidden states (see [64]). Thus no hidden states are needed to implement single-valued non-invertible stochastic matrices with LE or QE matrices.
Now consider noisy matrices. First, we note that there are stochastic matrices with determinant 0 that are QE with no hidden states (e.g., local relaxations). So the lower bound on the number of hidden states for stochastic matrices with determinant 0 is 0, both for implementing them with LE matrices and with QE matrices. The situation is different for stochastic matrices with strictly positive determinant. There are positive-determinant stochastic matrices which are LE with no hidden states (e.g., P = exp(G), where G is a rate matrix). However, by Prop. 1, we know that any QE matrix must have determinant 0. Thus, stochastic matrices with strictly positive determinant cannot be implemented with any QE matrix without using some hidden states. So we have a lower bound of one hidden state for QE. This lower bound is tight (e.g., a logically invertible map with determinant 1 has a QE with one hidden state).
Finally, stochastic matrices with strictly negative determinant, or whose determinant is greater than the product of their diagonal entries, have no LE; thus for both LE and QE, such stochastic matrices can only be implemented with one or more hidden states. This lower bound is tight, since there are single-valued logically invertible stochastic matrices which satisfy these conditions and have QEs with one hidden state.
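The determinant criteria used above can be checked numerically on small examples. The following is a minimal sketch, assuming a column-stochastic convention and a hypothetical 2 × 2 rate matrix `G` chosen only for illustration; the helper `expm_series` is our own truncated-series stand-in for the matrix exponential, not part of the paper.

```python
import numpy as np

def expm_series(G, terms=60):
    """Truncated Taylor series for exp(G); adequate for small, well-scaled matrices."""
    result = np.eye(G.shape[0])
    term = np.eye(G.shape[0])
    for n in range(1, terms):
        term = term @ G / n          # G^n / n!
        result = result + term
    return result

# Bit flip on two states (columns sum to 1).
bit_flip = np.array([[0.0, 1.0],
                     [1.0, 0.0]])

# Hypothetical rate matrix: nonnegative off-diagonal entries, columns sum to 0.
G = np.array([[-0.7,  0.2],
              [ 0.7, -0.2]])
P = expm_series(G)

det_flip = np.linalg.det(bit_flip)   # negative, so the bit flip has no LE without hidden states
det_P = np.linalg.det(P)             # equals exp(trace(G)), hence strictly positive

print("det(bit flip) =", det_flip)
print("det(exp(G))   =", det_P)
print("product of diagonal entries of exp(G) =", P[0, 0] * P[1, 1])
```

Because det exp(G) = exp(tr G) > 0 for any rate matrix G, a matrix of this form can never be QE without hidden states, in line with Prop. 1.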
For the upper bound, Thm. 7 establishes that any noisy map can be quasistatically implemented using k − 1 hidden states, where k is the nonnegative rank of P. It is unknown whether this upper bound is tight (i.e., whether there are P which cannot be implemented with fewer hidden states). We do not report upper bounds for LEs, because we are not aware of any results on this matter. As explained above, however, upper bounds for LEs will be no greater than the QE upper bound, k − 1.
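As a short worked step connecting this bound to the state-space doubling mentioned in the abstract: any n × n stochastic matrix P factors trivially as P = P I, so its nonnegative rank is at most n. Writing rank_+ for the nonnegative rank (our notation), this gives:

```latex
\operatorname{rank}_+(P) = k \le n
\quad\Longrightarrow\quad
k - 1 \;\le\; n - 1 \;<\; n .
```

So the number of hidden states required by the bound never exceeds the number of original states.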
This work has focused on the number of hidden states necessary to implement a given conditional distribution P . It is important to understand that the purpose of the hidden states is to allow a stochastic matrix that cannot be implemented directly using CTMC dynamics, such as the bit flip, to be implemented somehow. This role of hidden states is different from that of the states of the history tape in Bennett's reversible computation construction [6,7,66], or the states of the extra bits in reversible Toffoli gates [67], whose role is to allow logical reversibility. The dynamics over the augmented state space enlarged by the hidden states is not invertible in general, in contrast to, for example, the dynamics over a system built of Toffoli gates.
To further emphasize this distinction, note that the minimal number of hidden states needed to implement a matrix with zero total EP in some ways behaves "opposite" to the typical thermodynamic cost associated with implementing a stochastic matrix. For example, a logically reversible computation needs a hidden state to be implementable by a CTMC, whereas a (noiseless) noninvertible one (which needs hidden states to be made logically reversible) does not.
In this work, we use many constructive proofs which involve implementing a given conditional distribution P via a number of discrete steps, specifically as products of local relaxations. We do not consider the number of such steps required to implement a given P, which can be considered another cost of physically implementing a computation. Studying the number of steps required to implement a given P, and how this depends on the number of hidden states and the details of P, remains for future work. These investigations would be closely linked to the state space size / time step tradeoff we discuss in a companion paper [57] in the case of single-valued stochastic matrices, because local relaxations are stable in the sense defined in that work. We note, as an initial result in this direction, that our construction in the proof of Theorem 7 requires 4kn − 10 local relaxations. Some work on a related question for the special case n = 3 was done in [68].
with the bound in Eq. A1, yields:
where the third line follows by Jensen's inequality. Rearranging, this becomes
Thus, we can make the determinant of the ordered exponential of M(t) as small as desired by choosing δ and sufficiently small. This means that any P I which is quasistatically embeddable must have determinant 0.
FIG. 1. (a) The composition $R_{r\to j}(1)\,R_{i\to r}(\alpha)$ can be used to effect a transfer of probability α from i to j, using a relay state r. (b) A sequence of these operations can be composed to effect any transfer T. (Panels: Step 1, Step 2, Step 3.)
where P is an m × p matrix with positive entries with column sums less than or equal to one, D is a p × p diagonal matrix that makes the whole block matrix stochastic, and I is the m × m identity matrix.
We will make repeated use of the map $R_{i\to j}(\alpha)$, acting on a set with two elements, which fixes j, sends state i to j with probability α, and leaves it as i with probability 1 − α. This map is QE with one hidden state.
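The relay composition described in the caption of Fig. 1(a) can be checked at the level of matrix algebra. The sketch below assumes a column-stochastic convention and a hypothetical labeling of three states (i = 0, j = 1, relay r = 2); the helper name `relaxation` is ours. It only verifies the resulting transition probabilities and does not say anything about how each factor is realized quasistatically.

```python
import numpy as np

def relaxation(n_states, src, dst, alpha):
    """Column-stochastic matrix for R_{src->dst}(alpha): src moves to dst
    with probability alpha and stays put otherwise; all other states are fixed."""
    R = np.eye(n_states)
    R[src, src] = 1.0 - alpha
    R[dst, src] = alpha
    return R

i, j, r = 0, 1, 2      # hypothetical labels: two visible states and one relay state
alpha = 0.3

# Step 1 moves probability alpha from i to the relay r;
# step 2 empties the relay into j.
composed = relaxation(3, r, j, 1.0) @ relaxation(3, i, r, alpha)

# The map R_{i->j}(alpha) acting on the two visible states alone.
direct = relaxation(2, i, j, alpha)

# Restrict the composed map to the visible states i and j.
visible_block = composed[:2, :2]
print(visible_block)
print(direct)
```

Starting from either visible state, no probability remains on the relay after the second step, so the restriction to the visible states is again a stochastic matrix.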
We call such matrices transfers, because they can be viewed as representing transfers of probability between two disjoint sets of states of sizes p and m. One can show that:
Proposition 9. Any transfer T is QE with one hidden state.
Proof of Prop. 9. Without loss of generality 4, let
If P has only one nonzero row, whose entries are $p_i$, $i = 1, \ldots, n$, then
where j is the index of the nonzero row of P in 1. To see this, it is sufficient to note that the $R_{i\to j}(p_i)$ for different i commute and fix each other's images. Now suppose P has two nonzero rows. Write P as the sum of two matrices, $P = P_1 + P_2$, each zero except for one of the rows of P. Let $D_1$ be the diagonal matrix which makes
where $D_2$ is chosen to make the matrix it appears in stochastic (this is always possible since the column sums of P are all less than or equal to 1). Since the product of stochastic matrices is stochastic, the matrix appearing on the left hand side is T, establishing the proposition for P with two nonzero rows. The result for general P follows by induction.

State space cost is less than nonnegative rank
Lemma 10. Any 2 × 2 stochastic matrix is QE with one hidden state.
Proof. For any real number p ∈ [0, 1], define $\bar{p} := 1 - p$. Without loss of generality, write P in matrix notation as
$P = \begin{pmatrix} p & q \\ \bar{p} & \bar{q} \end{pmatrix}$.
We consider two cases. If p ≤q, then if x = p/q and y = q, we have
In both cases, the factors on the right hand side are local relaxations, so the result is established.
Lemma 11. An n × n stochastic matrix P with nonnegative rank 2 is QE with one hidden state.
Proof. Write the nonnegative rank decomposition P = RS, where R and S are stochastic matrices of dimensions n × 2 and 2 × n, respectively. We further decompose R and S into sub-matrices: $R = \begin{pmatrix} R_r \\ R_2 \end{pmatrix}$ and $S = [S_r \;\; S_2]$, where $R_2$ and $S_2$ are 2 × 2. Now note that
where D is the diagonal matrix that makes the second factor stochastic, and I is an identity matrix of dimensions suitable to where it appears. To show that P can be implemented with one hidden state, it suffices to show that each of the factors on the right hand side can be.
The middle two factors represent transfers, in the sense described above, so they can be implemented with one hidden state. The first and last factors represent operations performed on just two states, so by Lemma 10 they can also be implemented with one hidden state. Note that $R_2 D^{-1}$ is stochastic, because the column sums of $R_2$ are exactly the diagonal entries of D (recall D was chosen to make the second factor stochastic).
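The normalization claim in the last sentence can be illustrated numerically. This is a minimal sketch under stated assumptions: a randomly generated column-stochastic R of size n × 2, with the 2 × 2 block R_2 taken to be the last two rows (the block ordering is our assumption), and strictly positive entries so that D is invertible.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Random column-stochastic n x 2 matrix R (illustrative data only).
R = rng.random((n, 2))
R /= R.sum(axis=0)

# Assumed block ordering: R_r holds the first n - 2 rows, R_2 the last two.
R_r, R_2 = R[:-2, :], R[-2:, :]

# D carries whatever column weight R_r leaves over, i.e. the column sums of R_2.
D = np.diag(R_2.sum(axis=0))

# Dividing each column of R_2 by its own sum gives unit column sums.
R2_normalized = R_2 @ np.linalg.inv(D)

print("column sums of R_r plus diag(D):", R_r.sum(axis=0) + np.diag(D))
print("column sums of R_2 D^{-1}:     ", R2_normalized.sum(axis=0))
```

If some column of R_2 summed to zero, D would be singular and this normalization would need separate handling; the sketch does not cover that case.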
We are now ready to prove the main result.
Proof of Theorem 7. Suppose k > 2 (if not, the result follows from Lemma 11). As before, write π = RS, where R and S are stochastic of dimensions n × k and k × n, respectively. But this time, decompose R and S differently: $R = [R_r \;\; P]$ and $S = \begin{pmatrix} S_r \\ Q \end{pmatrix}$, where P is n × 2 and Q is 2 × n. Note that $\pi = R_r S_r + PQ$.
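The block identity π = R_r S_r + PQ is ordinary block matrix multiplication, and can be confirmed on random data. The sketch below assumes column-stochastic factors and takes P and Q to be the last two columns of R and the last two rows of S, respectively (the ordering is our assumption); the sizes n = 6 and k = 4 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 4            # arbitrary illustrative sizes with k > 2

# Random column-stochastic factors R (n x k) and S (k x n).
R = rng.random((n, k))
R /= R.sum(axis=0)
S = rng.random((k, n))
S /= S.sum(axis=0)

pi = R @ S

# Split off the last two columns of R and the last two rows of S.
R_r, P = R[:, :k - 2], R[:, k - 2:]
S_r, Q = S[:k - 2, :], S[k - 2:, :]

reassembled = R_r @ S_r + P @ Q
print("max deviation:", np.abs(pi - reassembled).max())
print("column sums of pi:", pi.sum(axis=0))
```

The column sums of `pi` come out equal to one, as expected for a product of stochastic matrices.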
where D is the diagonal matrix that makes the second factor stochastic, and I is an identity matrix of dimensions suitable to where it appears. To prove the theorem, it suffices to show that each of the (n + k − 2) × (n + k − 2) matrices on the right hand side is QE with one hidden state.
The first and last factors represent transfers from k − 2 hidden states to the original n states (and vice versa), and can be implemented using one hidden state, as described earlier. The middle factor, which represents an operation performed only on the original states, has nonnegative rank 2 (note that P and $Q D^{-1}$ are stochastic), so by Lemma 11 it can also be implemented with one hidden state. This establishes the result.