The classical-quantum divergence of complexity in modelling spin chains

The minimal memory required to model a given stochastic process - known as the statistical complexity - is a widely adopted quantifier of structure in complexity science. Here, we ask if quantum mechanics can fundamentally change the qualitative behaviour of this measure. We study this question in the context of the classical Ising spin chain. In this system, the statistical complexity is known to grow monotonically with temperature. We evaluate the spin chain's quantum mechanical statistical complexity by explicitly constructing its provably simplest quantum model, and demonstrate that this measure exhibits drastically different behaviour: it rises to a maximum at some finite temperature then tends back towards zero for higher temperatures. This demonstrates how complexity, as captured by the amount of memory required to model a process, can exhibit radically different behaviour when quantum processing is allowed.

value directly from observational statistics [4,5,6]. Meanwhile, the corresponding optimal models can be systematically constructed. If a process has a statistical complexity of C, we can systematically replicate the process's statistical behaviour using a model that records only C bits of information about the past. This analytic tractability, combined with a clear operational motivation, has propelled the use of statistical complexity within complexity science as a key measure of structure. Its field of study - computational mechanics - has been applied to analyse structure in diverse settings [7,8,9,10], an early example being the Ising spin chain [11].
Conventional studies only consider building classical models. Yet, recent advances show that even when modelling the same classical process, a quantum model may require less input information [12,13,14,15,16]. This motivates an important question: could statistical complexity exhibit very different qualitative behaviour in the quantum regime? If true, this would imply that many existing studies in statistical complexity could draw very different conclusions when taking quantum information processing into account. Progress in this direction, however, has faced significant hurdles. While there are constructions for quantum models that use less memory than the classical limit, it is not known whether these quantum models are optimal. Thus the quantum statistical complexity has never been explicitly evaluated for any process where there is a quantum advantage, and existing quantum models provide only upper bounds [13].
Here, we present the first provably optimal quantum model for a process with a quantum advantage - a model of sequential measurement outcomes along a classical Ising spin chain. We determine the Ising chain's exact quantum statistical complexity. We compare this to existing studies of the spin chain's classical statistical complexity [11]. Whereas the classical statistical complexity monotonically increases with temperature, the quantum measure attains a maximal value at some finite temperature, then decreases monotonically to zero. This demonstrates that the quantum and classical statistical complexities can diverge significantly in their qualitative behaviour. A system can become increasingly difficult to model classically as certain physical parameters are varied, and yet simpler to model quantum mechanically.
Framework. Consider a stochastic system that at each discrete time t ∈ Z emits an output x t from some configuration space X . The statistical behaviour at each time t is characterised by a random variable X t . At each time t, the resulting string of outputs may be partitioned into a past X t := . . . X t−1 X t comprised of outputs already emitted, and a future X t := X t+1 X t+2 . . . consisting of outputs yet to come. The system's output behaviour can be captured by a stochastic process -a probability distribution over bi-infinite strings P( X t , X t ) that describes how past observations correlate with future behaviour. Here we consider stationary processes, where P ( X t ) = P ( X 0 ) for any t ∈ Z + , and so hereafter drop the subscript t from infinite strings.
Each instance of the process will have some specific past x, with a corresponding conditional future distribution P( X | X = x). For any non-trivial process, P( X | X = x) depends on x - this reflects the idea that prediction is meaningful only when the past is correlated with the future. A predictive model, then, is an algorithmic abstraction of P( X | X) that exhibits statistically identical conditional future behaviour. Each model specifies an encoding function ε that maps each possible past x onto an internal state s ∈ S of some physical system, such that systematic actions on this system at future time-steps will generate a string of outputs obeying P( X | X = x).
The stochastic process can then be modeled by Markovian dynamics on S. At each time-step t, a model in state st generates x according to P(Xt+1 = x | st) = P(Xt+1 = x | xt), and updates its internal state to st+1 = ε( x′), where x′ = . . . xt−1 xt x is the updated past. This implies that if we have two black boxes, one containing an instance of the original process with specific past x, and another containing a machine initialized in state s = ε( x), then there will be no way to discriminate between them based on their observed output behaviour. In particular, P( X | X = x) = P( X | S = s), where S is the random variable governing s = ε( x). The resulting models satisfy unifilarity, such that there is no uncertainty in St+1 when given Xt+1 and St - an important mathematical property that ensures a model's internal state contains no information about future outputs beyond what is available from the past [25]. The amount of information such a model tracks is then given by the Shannon entropy, H(S) = − Σ_{s∈S} ps log ps, where ps is the probability that ε( x) = s.
Clearly many different predictive models exist for each stochastic process. The brute force approach would be to take ε as the identity map, yielding a model that demands the entire past as input. While such a model will no doubt work, it is clearly inefficient. Take, for example, the modelling of a completely random process. Here, as all pasts are equally likely, such a model would demand an unbounded amount of past information. On the other hand, if the process is completely random, the future does not depend on the past, and thus no past information should be required to correctly sample the process's conditional future.
Simplest Classical Models. Computational mechanics provides a framework for constructing the provably simplest predictive models of a process - the ones for which H(S) is minimized [1,3]. This optimality can be achieved by introducing the equivalence relation ∼, where x ∼ x′ if and only if P( X | X = x) = P( X | X = x′). Let S = {s} be the resulting set of equivalence classes on the set of all pasts, and let the encoding function ε be defined such that ε( x) = s iff x ∈ s. The resulting map represents a model that stores not which past was observed, but rather only which equivalence class that past belongs in. Given s = ε( x), it is possible to systematically sample P( X | X = x). Specifically, consider a Markov finite state machine with state space S. At each time-step, the machine's behaviour is dictated by transition elements T(r)ij = P(St+1 = sj, Xt+1 = r | St = si): the probability that a machine initially in state si will transition to sj whilst outputting r ∈ X (e.g. figure 1). This ensures that when initialized in the appropriate causal state ε( x), the machine can sequentially generate an arbitrarily long string xt+1 xt+2 . . . with probability P(Xt+1 Xt+2 . . . | x).
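The sampling procedure just described can be sketched in code. The transition values below are invented for illustration (they are not those of the Ising chain); the machine is unifilar, so each output fixes the next causal state:

```python
import numpy as np

# Hypothetical two-state unifilar machine (values invented for illustration):
# T[r, i, j] = P(S_{t+1} = s_j, X_{t+1} = r | S_t = s_i).
T = np.zeros((2, 2, 2))
T[0, 0, 0], T[1, 0, 1] = 0.9, 0.1   # from s0: emit 0 and stay in s0, emit 1 and go to s1
T[0, 1, 0], T[1, 1, 1] = 0.4, 0.6   # from s1: emit 0 and go to s0, emit 1 and stay in s1

def sample(T, s, length, rng):
    """Sequentially generate `length` outputs, starting from causal state s."""
    out = []
    for _ in range(length):
        p = T[:, s, :].ravel()          # joint distribution over (output r, next state j)
        k = rng.choice(p.size, p=p)
        r, s = divmod(k, T.shape[2])
        out.append(r)
    return out

rng = np.random.default_rng(0)
xs = sample(T, 0, 10000, rng)
# stationary state distribution here is (0.8, 0.2), so P(X = 1) = 0.8*0.1 + 0.2*0.6 = 0.2
```

The machine needs only the current causal state, never the full past, to continue the process.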
In the literature, such machines are referred to as ε-machines. Meanwhile, the states s ∈ S are named causal states, emphasizing that they capture all the causal information contained in P( X, X) [1,3]. The amount of information required to store these causal states thus defines the statistical complexity of a stochastic process - representing the minimal amount of past information one needs about the process to replicate its conditional future statistics.

Spin Chains. We consider the 1D Ising system with nearest-neighbour interactions (as in [11]) of length 2N + 1 with periodic boundary conditions. This system is described by the Hamiltonian [18]

H = − Σ_k (J xk xk+1 + B xk),

where J is the coupling parameter, B is an external magnetic field, and xk is the state of the spin at site k, which takes values +1 for spin up and −1 for spin down. At temperature T, the spin configurations at thermal equilibrium obey the Boltzmann distribution, such that P(X−N:N = x−N:N) ∝ e^{−H/kB T}, where X−N:N = X−N . . . XN−1 XN and Xk is the random variable that governs xk. For this particular process, the ε-machine has the special property that T(r)ij = 0 for all r ≠ j, and thus its dynamics can be entirely captured by the transition probabilities Tij = T(j)ij, the probability that a machine in state si will transition to state sj and output j, which coincides with the probability P(Xt+1 = j | Xt = i). The exact values for Tij depend on B, J and T, and can be found in [17,18]. The statistical complexity of such Ising chains has been studied in the thermodynamic limit N → ∞ [11]. In this framework we consider a device that scans the chain from left to right. Suppose at time t the device measures spin xt. Let Xt govern the statistics of all spins measured so far, to the left of the device, and Xt all spins to the right of the device yet to be measured. The probability distribution P( Xt, Xt) then represents the total statistics of what the device observes.
The resulting statistical complexity then captures the minimum memory requirement of the device which, given access to the state of spins xt left of its current site, can accurately sample from the conditional distribution P( Xt | Xt = xt) to simulate spin statistics it has yet to see. As the spins in the chain are correlated, we expect P( Xt | Xt) ≠ P( Xt), and thus the memory requirement to be strictly nonzero. In the thermodynamic limit, P( Xt, Xt) becomes a stationary stochastic process that is invariant with respect to t. Thus we can drop the index t and adopt the standard tools of computational mechanics.
The associated ε-machine can be systematically constructed. Here we summarize prior work [11]. A key observation is that P( X, X) is Markovian. Furthermore, P( X | x0 = +1) ≠ P( X | x0 = −1) for any finite T. Thus the Ising system has exactly two causal states, s0 and s1, coinciding respectively with the set of pasts where the device last observed +1 or −1. Once initialized in the correct causal state, the resulting ε-machine can replicate the correct conditional future statistics according to figure 1.
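The two-state ε-machine above can be constructed numerically. The following is a sketch, assuming the standard symmetric transfer-matrix split of the field term: `P[i, j]` plays the role of the transition probabilities Tij, and the entropy of the stationary spin distribution gives the classical statistical complexity C µ.

```python
import numpy as np

def ising_transitions(J, B, T):
    """Transition probabilities P(x_{k+1} | x_k) and stationary spin distribution
    for the 1D Ising chain in the thermodynamic limit (transfer-matrix sketch)."""
    beta = 1.0 / T
    spins = np.array([1.0, -1.0])           # index 0 = up, index 1 = down
    # symmetric transfer matrix: V(x, x') = exp(beta*(J*x*x' + B*(x + x')/2))
    V = np.exp(beta * (J * np.outer(spins, spins)
                       + B * (spins[:, None] + spins[None, :]) / 2))
    lam, vecs = np.linalg.eigh(V)
    phi = np.abs(vecs[:, np.argmax(lam)])   # Perron eigenvector
    P = V * phi[None, :] / (lam.max() * phi[:, None])   # P[i, j] = P(x' = s_j | x = s_i)
    pi = phi**2 / np.sum(phi**2)            # stationary distribution of a single spin
    return P, pi

P, pi = ising_transitions(J=1.0, B=0.3, T=2.0)
C_mu = -np.sum(pi * np.log2(pi))            # classical statistical complexity (bits)
```

The parameter values J = 1, B = 0.3, T = 2 are illustrative only.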
Quantum Statistical Complexity. A number of approaches to reducing the entropy of classical models using quantum processing have been previously proposed [12,14], but the question of whether such models are optimal has not yet been formally studied. To evaluate the quantum statistical complexity, we first need to define quantum models in general, and then define optimal quantum models.

Definition 1 (Quantum models). A quantum model is an ordered triple Q = (f, Ω, M), where Ω is a set of quantum states, f is an encoding function that maps each possible past x to a quantum state |sx⟩ ∈ Ω, and M is a quantum procedure (that is, a completely-positive trace-preserving map) acting on a physical system Ξ prepared in state f( x), subject to two conditions. Condition (i) ensures that the quantum model generates statistically identical future predictions to its classical ε-machine counterpart. Meanwhile, (ii) ensures that the model's internal state updates accordingly at each time-step so that it remains 'synchronized': the internal state of the machine at any point in time always correctly encodes the past. In this case, a series of L repeated applications of M acting on a physical system Ξ initially in state f( x) will generate a random variable X1:L | x := X1 . . . XL | x with probability distribution P(X1:L | X = x), for any desired L ∈ Z+. In the limit L → ∞ the quantum model will therefore replicate the process's future behaviour by generating a random variable X with probability distribution P( X | X = x), whenever it is supplied with a system Ξ in state f( x). This definition assumes encoded states to be pure to maintain unifilarity, in line with classical models. However, the results we shall present here continue to hold even if this assumption is dropped (see Lemma 3 in the appendix).
The entropy H(ρ) of the stationary mixture of encoded states ρ represents the amount of past information the model must store to correctly simulate the future. This motivates C(Q) := H(ρ) as the complexity of a quantum model, where ρ = Σi pi |si⟩⟨si| and pi is the probability of the model being in the quantum state |si⟩. An optimal quantum model is then the quantum model with the least complexity:

Definition 2 (Optimality). A quantum model Q of a process P( X, X) is optimal for P( X, X) if and only if for any other quantum model Q′ of P( X, X), C(Q) ≤ C(Q′).
Unlike the classical ε-machine, there may be more than one optimal quantum model. At the very least, for any optimal quantum model Q0 with a set of states Ω0, one can form another quantum model Q′0 with states Ω′0 trivially related to those in Ω0 by a unitary transformation. Since the von Neumann entropy is invariant under unitary transformations, Q′0 will also be an optimal model.
Definition 3 (Quantum causal states). If Q0 = (f, Ω0, M) is an optimal quantum model for a process P( X, X), then Ω0 is a set of quantum causal states for P( X, X).
This furnishes the necessary background for a definition of quantum statistical complexity.

Definition 4 (Quantum Statistical Complexity). Consider a stochastic process P( X, X) with an optimal quantum model Q; the quantum statistical complexity of the process is Cq := C(Q).

Thus the quantum statistical complexity represents the minimal past information an optimal quantum model must record in order to accurately sample from P( X | X = x). Evaluating the quantum statistical complexity is generally non-trivial. While general methods of constructing better-than-classical quantum models are known [12,14,15], it remains an open question if and when such models are optimal. Thus, to date, Cq has never been computed. We do not know in general exactly how much simpler a process can be when modelled quantum mechanically, nor whether the quantum statistical complexity exhibits qualitatively different behaviour.
Here we aim to construct a provably optimal quantum model for the Ising spin chain. To do this, we first introduce two useful lemmas.

Lemma 1 (Causal state correspondence). For any stochastic process P( X, X), there exists an optimal quantum model Q = (f, Ω, M) such that f( x) = f( x′) if and only if ε( x) = ε( x′).

That is, there exists an optimal quantum model that has a one-to-one correspondence between the causal states and quantum causal states. This lemma implies there is never any benefit in differentiating two different pasts with coinciding conditional futures. The formal proof is based on the concavity of the von Neumann entropy, and is supplied in the technical appendix.
The second lemma places a constraint on how similar two quantum causal states can be, if we require the model to remain capable of generating correct future output statistics.

Figure 2: (a) Quantum circuit showing how the quantum ε-machine can sample from the desired probability distribution P( X | X = x) when given a quantum causal state |sj⟩ = εq( x). We introduce two unitaries: V, which maps |0⟩ to |s0⟩, and U, such that U|s0⟩ = |s1⟩. The machine is first supplied with an ancillary qubit that is systematically initialized in state |s0⟩ by application of V. Subsequently, it applies U on the ancillary qubit, controlled on the memory (the system in state |si⟩), where the controlled unitary maps |k⟩|φ⟩ → |k⟩ U^k |φ⟩ and k ∈ {0, 1} indexes the computational basis. The original causal state is then emitted from the machine (denoted by Q0). Repetition of this procedure yields a large entangled chain of qubits Q1, Q2, . . . that contains a superposition of all possible futures. Measurement of Qk with respect to the observable Z = |0⟩⟨0| − |1⟩⟨1| then yields the correct statistics for xk. (b) Alternatively, we can view the above circuit as an iterative procedure: we can measure each qubit as soon as it is emitted, which collapses the corresponding ancillary qubit to the desired quantum causal state for the subsequent time-step.
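One step of the circuit in figure 2 can be checked numerically. The sketch below uses invented transition probabilities (not those of the Ising chain) and one explicit choice of U (the real rotation taking |s0⟩ to |s1⟩, one of many valid unitaries); measuring the emitted qubit reproduces the classical transition statistics while collapsing the ancilla to the next quantum causal state.

```python
import numpy as np

# Illustrative transition probabilities: T[i, j] = P(output j, next state j | state i).
T = np.array([[0.8, 0.2],
              [0.3, 0.7]])

s = np.sqrt(T)                       # s[i] = amplitudes of quantum causal state |s_i>

# One valid U with U|s_0> = |s_1>: map the frame (s_0, s_0_perp) onto (s_1, s_1_perp).
perp = lambda v: np.array([-v[1], v[0]])
U = np.outer(s[1], s[0]) + np.outer(perp(s[1]), perp(s[0]))

# Controlled-U: |k>|phi> -> |k> U^k |phi>, controlled on the memory qubit.
CU = np.kron(np.diag([1.0, 0.0]), np.eye(2)) + np.kron(np.diag([0.0, 1.0]), U)

def step(i):
    """Memory in |s_i>, fresh ancilla in |s_0>; emit and measure the memory qubit."""
    psi = CU @ np.kron(s[i], s[0])
    blocks = psi.reshape(2, 2)       # row k: unnormalized ancilla state given outcome k
    probs = np.sum(blocks**2, axis=1)
    post = blocks / np.sqrt(probs)[:, None]
    return probs, post

probs, post = step(0)
# probs matches T[0]; post[k] matches |s_k>, the causal state for the next time-step
```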
Lemma 2 (Maximum fidelity). Consider a quantum model of a stochastic process P( X, X) with classical causal states {si}i, whose encoding assigns the quantum state |si⟩ to every past in causal state si. Let σi = |si⟩⟨si|. Then the following statement is true:

F(σi, σj) ≤ F[P( X | si), P( X | sj)].¹

The formal proof follows from the monotonicity of the fidelity under valid quantum operations, and is supplied in the technical appendix.
An optimal quantum model of the spin chain. We now apply these general results to the 1D Ising spin chain. To discuss the quantum statistical complexity, we must identify an optimal quantum model of the process. We shall proceed by presenting a model Q0 = (εq, Ω0, M0), based on the construction in [12]. We will later prove that this model is optimal for the Ising spin chain, and hence refer to Q0 as the quantum ε-machine for this process. This model uses two quantum states Ω0 = {|s0⟩, |s1⟩}, where

|si⟩ = √Ti0 |0⟩ + √Ti1 |1⟩,

and Tij is the transition probability for the classical ε-machine of the process in state si to output (−1)j and transition to sj. The associated encoding map is εq : x → |ε( x)⟩, where ε is the corresponding map from x to classical causal states. As shown explicitly in figure 2, there exists a quantum procedure M0 that allows this model to systematically sample from P(X1 . . . XL | X = x) for any L, by repeated applications of M0 acting on a physical system Ξ initialized in state εq( x). Thus the resulting triple Q0 = (εq, Ω0, M0) is indeed an accurate quantum model for the Ising spin chain. This model has internal entropy H(ρ), where ρ = p0 |s0⟩⟨s0| + p1 |s1⟩⟨s1| and p0 (p1) is the probability of any spin being up (down). Since |s0⟩ and |s1⟩ are generally not mutually orthogonal, we see immediately that this machine stores less information than any possible classical model. That is, Cq ≤ H(ρ) < Cµ.

¹ We remind readers that the fidelity between two classical probability distributions is a special case of the fidelity between two quantum states, hence we use the same symbol for both cases.
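The internal entropy H(ρ) can be evaluated directly. A sketch with illustrative transition probabilities (not values derived from the Ising Hamiltonian), comparing the quantum memory cost against the classical cost of storing the two causal states:

```python
import numpy as np

def von_neumann(rho):
    """Von Neumann entropy in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]                # drop numerically-zero eigenvalues
    return -np.sum(w * np.log2(w))

# Illustrative transition probabilities T[i, j] = P(move to s_j | state s_i):
T = np.array([[0.8, 0.2],
              [0.3, 0.7]])
p = np.array([0.6, 0.4])            # stationary distribution of this chain

kets = np.sqrt(T)                   # |s_i> = sqrt(T_i0)|0> + sqrt(T_i1)|1>
rho = sum(pi * np.outer(k, k) for pi, k in zip(p, kets))

H_rho = von_neumann(rho)            # quantum memory cost H(rho)
C_mu = -np.sum(p * np.log2(p))      # classical statistical complexity
```

Because the two kets overlap, H(ρ) falls strictly below the Shannon entropy of the causal-state distribution.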
To establish that H(ρ) is indeed the quantum statistical complexity (i.e. that H(ρ) = Cq), we need to establish that Q0 is an optimal quantum model for the Ising system.

Central Result. The quantum ε-machine Q0 = (εq, Ω0, M0) is an optimal quantum model for the 1D Ising spin chain, and hence its internal entropy H(ρ) corresponds to the quantum statistical complexity Cq of this system.
The proof is rather involved, and we present the full details in the technical appendix. The proof relies on the Markovian nature of the 1D Ising system, and on the fact that it has only two causal states. The basic approach is to first note that it is sufficient to examine quantum models with two states (Lemma 1), and then directly evaluate the fidelity F[P( X | s0), P( X | s1)] for the spin chain and show that it coincides with F(|s0⟩⟨s0|, |s1⟩⟨s1|). Lemma 2 is then invoked, together with the monotonic relationship between the fidelity of two quantum states² and the entropy of their statistical mixture, to establish optimality of Q0.
This result formally establishes equation (7) as an analytical expression for C q , allowing us to compare its qualitative behaviour with C µ .
² This monotonic relationship does not hold for mixtures of more than two states [19]. This presents a significant challenge for generalizations of this proof to processes with more than two causal states.

Qualitative Divergences. The quantum statistical complexity and its classical counterpart exhibit very different qualitative behaviour (figure 3). The classical statistical complexity Cµ increases monotonically with temperature for all finite temperatures T. This reflects the fact that at hotter temperatures, all spin states become more equally likely. Thus the two classical causal states become progressively more equiprobable, resulting in a higher associated memory cost. Note that at infinite temperature, Cµ drops discontinuously to zero. This can be understood through the observation that in this limit, the spins observed on the left become completely uncorrelated with those on the right, and thus no memory is necessary to simulate the spin chain.
In contrast, the quantum statistical complexity Cq peaks at some finite temperature Tmax. Progressive increases in temperature past this point result in a decrease in the amount of quantum memory required - Cq decays smoothly to 0 as T → ∞. We conclude that for T > Tmax, the quantum and classical analogues of statistical complexity diverge in qualitative behaviour. The Ising system takes progressively more memory to simulate classically with increasing temperature, whereas this relation is reversed in quantum models at high temperatures. Indeed, as T → ∞, the efficiency ratio Cµ/Cq becomes unbounded.
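This qualitative divergence can be reproduced numerically from transfer-matrix transition probabilities. The following self-contained sketch (parameter values J = 1, B = 0.3 are illustrative) sweeps the temperature and computes both complexities:

```python
import numpy as np

def complexities(J, B, T):
    """Classical (C_mu) and quantum (C_q) memory costs for the 1D Ising chain
    at temperature T, via the transfer matrix (illustrative sketch)."""
    beta = 1.0 / T
    spins = np.array([1.0, -1.0])
    V = np.exp(beta * (J * np.outer(spins, spins)
                       + B * (spins[:, None] + spins[None, :]) / 2))
    lam, vecs = np.linalg.eigh(V)
    phi = np.abs(vecs[:, np.argmax(lam)])               # Perron eigenvector
    P = V * phi[None, :] / (lam.max() * phi[:, None])   # transition probabilities
    pi = phi**2 / np.sum(phi**2)                        # stationary spin distribution
    C_mu = -np.sum(pi * np.log2(pi))
    kets = np.sqrt(P)                                   # quantum causal states
    rho = sum(p * np.outer(k, k) for p, k in zip(pi, kets))
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    C_q = -np.sum(w * np.log2(w))
    return C_mu, C_q

Ts = np.linspace(0.5, 50.0, 200)
vals = np.array([complexities(1.0, 0.3, t) for t in Ts])   # columns: C_mu, C_q
```

Plotting `vals` against `Ts` shows Cµ rising towards 1 bit while Cq peaks at an intermediate temperature and then decays towards zero.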
At this stage, we note an important caveat. The statistical complexity is an entropic quantity, and thus takes operational meaning in the independent, identically-distributed limit. Specifically, suppose a process has respective classical and quantum statistical complexities Cµ and Cq. This implies that if we wish to replicate the futures of an ensemble of N such processes in the limit of large N, then we would need either N Cµ bits or N Cq qubits. However, if we are simulating only a single instance of a stochastic process, Cq < Cµ does not necessarily imply that the process can be simulated with fewer qubits than bits. Indeed, modelling a single instance of the spin chain would require either 1 bit or 1 qubit. It would therefore be interesting to see if these qualitative divergences can exist using single-shot measures, such as the max entropy of the input (known in the complexity literature as the topological state complexity [1]).
Discussion. The statistical complexity -the amount of historical information we must retain about a system's past in order to simulate its future behaviour -is a popular operational quantifier of structure and complexity. Here we study how this measure can change when quantum processing is allowed. Using a 1D Ising system, we showed that the classical and quantum statistical complexity can diverge drastically in qualitative behaviour. Whereas classical models identify spin chains at very high temperatures as the most memory intensive to simulate and thus the most complex, quantum models identify these as being very simple. Our techniques also establish two necessary properties that any optimal quantum model must satisfy, which contribute towards computing quantum statistical complexity in more general settings.
More foundationally, these results suggest that complexity is an observer-dependent concept. Different observers may construe a system to be highly complex or extremely simple depending on whether they can reason quantum mechanically. This motivates a number of questions: how pervasive are such divergences, and how do they impact our current understanding of what is complex? Certainly, the Ising model is used to model diverse phenomena in the physical and biological sciences [20,21,22,23]. Meanwhile, generalisations of statistical complexity have been used to study the dynamics of self-organization [24] and the structure of input-output processes [26], and it would be interesting to see how such results may change when viewed through the lens of quantum theory.
Note added. During the process of referee review, further work has highlighted that the qualitative difference between Cq and Cµ leads to ambiguity about the simplest parameter regime for a particular system, and discussed the generality of this phenomenon [28]. The unbounded efficiency ratio between Cq and Cµ has also since been shown in the context of progressively longer N-nearest-neighbour Dyson-Ising systems [29] (in the limit of Cq → 0, as in this article), as well as in simulating discretizations of continuous systems [30] and stochastic processes in continuous time [31] (where Cq is bounded, but Cµ grows unboundedly). Meanwhile, the differences between Cq and Cµ have since been extended to cover input-output processes that react differently to different inputs [32].

TECHNICAL APPENDIX
The ε-machine for Ising spin chains. Let P = P( X, X) denote the probability distribution of an Ising spin chain. Crutchfield and Feldman [11] show that P( X | X = x) = P( X | x0); that is, the probability of observing the right-half spin configuration given knowledge of the left-half spin configuration requires only knowledge of x0 [17]. This implies that any two pasts x = . . . x−1 x0 and x′ = . . . x′−1 x′0 belong to the same causal state provided x0 = x′0. Thus, the ε-machine for this process has two causal states, sj = { x : x0 = (−1)j}, with j = 0, 1. The probability of finding the system in either causal state is thus given by the probability of finding any spin in the system to be up or down, denoted by p0 and p1 respectively.
As the causal states of the system are in one-to-one correspondence with x0, T(r)ij is non-zero only when r = (−1)j. Thus, the notation for the ε-machine's dynamics can be simplified by writing Tij := T((−1)j)ij. Furthermore, the exact transition probabilities between causal states align with the probabilities governing the spin at location k + 1 given the state of the spin at location k.
The exact values of these transition probabilities vary as a function of temperature, and are well known [18].

Proof of Lemma 1. We first prove the 'only if' direction by contradiction. Consider two pasts x and x′ where ε( x) ≠ ε( x′). This implies P( X | x) ≠ P( X | x′). Suppose some quantum model encoded these two pasts into the same state, σx = σx′; then there cannot exist a systematic procedure M that outputs different statistics on input of either state, and so no such model can generate the correct conditional statistics for both x and x′. Therefore, for any quantum model that simulates P, σx ≠ σx′ whenever ε( x) ≠ ε( x′).

The 'if' direction will be demonstrated using concavity of entropy. We show that for an arbitrary quantum model Q′, one can always find a quantum model of equal or lower entropy that encodes all pasts within the same causal state into the same quantum state. Consider the set of pasts in the same classical causal state si. Let Pi = P( x ∈ si) > 0 be the total probability in the stationary state that the past is in causal state si. Let ρ′ be the stationary state of quantum model Q′, which we divide into contributions from pasts in si and pasts not in si. Defining ρ¬i as the normalized contribution from pasts outside si, and q( x) = p( x)/Pi, we write ρ′ = (1 − Pi) ρ¬i + Pi Σ_{ x ∈ si} q( x) σx, and can then factor out the summation such that ρ′ = Σ_{ x ∈ si} q( x) [(1 − Pi) ρ¬i + Pi σx]. From the concavity of entropy,

H(ρ′) ≥ Σ_{ x ∈ si} q( x) H[(1 − Pi) ρ¬i + Pi σx].

Let xm ∈ si be the past that minimizes this bound, with associated density operator σ̃ = σ_{xm}. For every quantum model where x ∈ si are assigned to arbitrary σx, there is also a valid quantum model that assigns the state σ̃ for all x ∈ si, since for all pasts in si, repeatedly applying the measurement process M to σ̃ will give the desired future statistics. The stationary state of this model is ρ̃ = (1 − Pi) ρ¬i + Pi σ̃, which by the bound above satisfies H(ρ̃) ≤ H(ρ′). It follows that, by considering every causal state si in turn, one can construct a quantum model with lower or equal entropy that maps all x ∈ si to the same quantum state σi. Hence, for any arbitrary quantum model, there is always another valid quantum model with the same or lower entropy that satisfies σx = σx′ if and only if ε( x) = ε( x′).
As such, we can restrict our search for an optimal model to those that satisfy the criterion of this lemma. Finally, we must establish that there is a valid quantum model at all. This is guaranteed by the existence of a classical ε-machine, which is a special case of a quantum model (with the causal states encoded onto orthogonal quantum states).
Proof of Lemma 2. The proof follows from the monotonicity of F under trace-preserving quantum operations (see, e.g., chapter 9 of [27]). A measurement M that extracts statistics must induce a quantum operation σi → R(σi). Let |ψ⟩ and |φ⟩ be purifications of σi and σj into a joint system AB such that σi = trB(|ψ⟩⟨ψ|), σj = trB(|φ⟩⟨φ|), and F(σi, σj) = |⟨ψ|φ⟩|. Next, we introduce an environment E for the quantum operation R, such that R(σi) = trE[U(σi ⊗ |0⟩⟨0|)U†]. We then note that R(σi) = trBE[U(|ψ⟩⟨ψ| ⊗ |0⟩⟨0|)U†]; this also applies to R(σj). Since U(|ψ⟩ ⊗ |0⟩) is a particular purification of R(σi), by Uhlmann's theorem [27],

F[R(σi), R(σj)] ≥ |⟨ψ|⟨0| U†U |φ⟩|0⟩| = |⟨ψ|φ⟩| = F(σi, σj).

Applying this to the operation that generates the future statistics, the classical distributions P( X | si) and P( X | sj) are obtained from σi and σj by a valid quantum operation. Hence, there is a maximum fidelity Fmax between the quantum states σi and σj set by Fmax = F[P( X | si), P( X | sj)]. If this bound is violated, Q is not a valid quantum model.
Optimality of the Quantum ε-Machine. We prove the following theorem:

Central Result. Let P = P( X, X) denote the stochastic process that governs the Ising system, and s0 and s1 be the causal states of this process, with transition probabilities Tij in the corresponding ε-machine. Then there is an optimal quantum model Q0 = (εq, Ω0, M0), where Ω0 consists of the pure quantum states |s0⟩ and |s1⟩, given by |sj⟩ = √Tj0 |0⟩ + √Tj1 |1⟩.

Proof. From Lemma 1, we know that there is an optimal quantum model with two quantum causal states. Let them be |ψ0⟩ and |ψ1⟩. The complexity of this model monotonically decreases with increasing fidelity F(|ψ0⟩⟨ψ0|, |ψ1⟩⟨ψ1|) between the two causal states. From Lemma 2, we know that F(|ψ0⟩⟨ψ0|, |ψ1⟩⟨ψ1|) ≤ Fmax = F[P( X | s0), P( X | s1)] for any valid quantum model. Therefore, if there is a valid quantum model such that F(|ψ0⟩⟨ψ0|, |ψ1⟩⟨ψ1|) = Fmax, then this model is optimal.
The fidelity between the two distributions P( X | s0) and P( X | s1) is given by

F[P( X | s0), P( X | s1)] = Σ_{ x} √[P( x | s0) P( x | s1)],

where xk represents the process's output at time t = k. The Ising system is a Markovian process in which the causal state after each output coincides with that output, allowing us to express the relevant conditional probabilities entirely through the transition probabilities Tij. Under these circumstances we can invoke the chain rule of probability and re-express

F[P( X | s0), P( X | s1)] = Σ_j √(T0j T1j) = √(T00 T10) + √(T01 T11),

since once the first output is fixed, the two conditional distributions over the remaining future coincide and the residual sum equals one. It can be easily verified that for the quantum states |s0⟩ and |s1⟩ presented above, F(|s0⟩⟨s0|, |s1⟩⟨s1|) = √(T00 T10) + √(T01 T11), and hence these states saturate the maximal possible fidelity for P.
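The chain-rule collapse can be verified numerically: for a unifilar Markov process whose causal state equals the last output, the fidelity between the two conditional future distributions equals √(T00 T10) + √(T01 T11) for every future length L. The sketch below uses illustrative transition values:

```python
import numpy as np
from itertools import product

# Illustrative T[i, j] = P(next symbol = j | causal state s_i):
T = np.array([[0.8, 0.2],
              [0.3, 0.7]])

s0, s1 = np.sqrt(T[0]), np.sqrt(T[1])
F_states = float(s0 @ s1)            # = sqrt(T00*T10) + sqrt(T01*T11)

def future_dist(i, L):
    """P(x_1 ... x_L | initial causal state s_i) for a unifilar Markov chain."""
    dist = {}
    for xs in product((0, 1), repeat=L):
        pr, cur = 1.0, i
        for x in xs:
            pr *= T[cur, x]
            cur = x                  # unifilarity: next state = last output
        dist[xs] = pr
    return dist

L = 8
P0, P1 = future_dist(0, L), future_dist(1, L)
F_dist = sum(np.sqrt(P0[k] * P1[k]) for k in P0)   # classical fidelity over futures
```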
As such, the only remaining step is to demonstrate the existence of an M that extracts the necessary statistics. This is presented constructively by the quantum circuit in figure 2 of the text. Hence, Q 0 is a valid quantum model, and moreover, since it consists of two pure states that saturate the maximum fidelity, it is an optimal quantum model.
Since we have found an optimal quantum model Q0 for the Ising spin chain, this demonstrates that the spin chain's quantum statistical complexity is Cq = C(Q0). We remark that there is in fact a family of optimal models. Fixing the first quantum causal state |σ0⟩ as |0⟩ without loss of generality (by symmetry), we can write the second quantum causal state as |σ1⟩ = cos(θ/2)|0⟩ + e^{iφ} sin(θ/2)|1⟩, where θ ∈ (0, π] is determined by the maximum fidelity, but we are free to choose any φ ∈ [0, 2π) and any |1⟩ such that ⟨0|1⟩ = 0. All such assignments of |σ0⟩ and |σ1⟩ are related by unitary transformations, so if there is a measurement M that extracts the statistics from one such model, a measurement can be found for all such models. Moreover, since the von Neumann entropy is basis-independent, all such choices of model have the same complexity. It thus follows that every assignment of pure states separated by the minimum distance is optimal.
We also establish that the Central Result continues to hold when we consider encoding x into mixed quantum states, via the following lemma:

Lemma 3. If a quantum model Q0 = (εq, Ω0, M0) is such that Ω0 consists of two pure quantum states σ0 and σ1 (in one-to-one correspondence with classical causal states) saturating the maximum fidelity Fmax = F[P( X | s0), P( X | s1)], then this Q0 is optimal.
Proof. It has already been established by the Central Result that such a Q0 is optimal amongst pure-state quantum models. We shall show that there is no alternative model with lower entropy than Q0, even when we allow for encoding onto mixed quantum states.
First, note that the argument of Lemma 1 trivially generalizes to mixed states, so we may restrict our search to models with two internal states.
Let Q′ be a quantum model with Ω′ = {σ̃0, σ̃1}, such that at least one state (say, σ̃1) is mixed. As Q′ is a valid quantum model for P( X, X), there is a measurement process {M, I − M} that yields outcome x1 ∈ {−1, 1} with probability P(X1 = x1 | S0 = si) when acting on σ̃i. The convexity of the set of quantum states allows us to express σ̃1 as a convex combination of pure states, σ̃1 = Σi λi |ψi⟩⟨ψi|. Moreover, the subset of quantum states ρ satisfying the linear constraint Tr(Mρ) = k is also a convex set. Thus, we can decompose σ̃1 into some {λi, |ψi⟩}i such that Tr(M|ψi⟩⟨ψi|) = Tr(M σ̃1) for each i individually.
Letting p0 and p1 be the probabilities of classical causal states s0 and s1 respectively, we write the stationary state as ρ̃ = p0 σ̃0 + p1 σ̃1, and explicitly factorise it as ρ̃ = Σi λi (p0 σ̃0 + p1 |ψi⟩⟨ψi|). Hence, from concavity of entropy,

H(ρ̃) ≥ Σi λi H(p0 σ̃0 + p1 |ψi⟩⟨ψi|).

Let |ψmin⟩ be the choice of state that minimizes the above expression. Any model that encodes onto the pair of states σ̃0 and |ψmin⟩ is guaranteed to have complexity equal to or lower than C(Q′). Moreover, since {M, I − M} acting on σ̃0 and |ψmin⟩ produces output statistics with fidelity Fmax = F[P( X | s0), P( X | s1)], by monotonicity of fidelity under contractive maps we have F(σ̃0, |ψmin⟩⟨ψmin|) ≤ Fmax.
Exactly the same argument can be made to find some substitution |φmin⟩ for σ̃0, generating a new encoding that is guaranteed to satisfy the maximum-fidelity bound and to result in equal or lower complexity. This construction shows that no valid mixed-state encoding can outperform all pure-state encodings. Hence, if we can find a pure-state model whose two internal states saturate the fidelity bound, then it will have entropy equal to or lower than that of any mixed-state model.