Quasi-autonomous quantum thermal machines and quantum to classical energy flow

There are both practical and foundational motivations to consider the thermodynamics of quantum systems at small scales. Here we address the issue of autonomous quantum thermal machines that are tailored to achieve some specific thermodynamic primitive, such as work extraction in the presence of a thermal environment, while having minimal or no control from the macroscopic regime. Beyond experimental implementations, this provides an arena in which to address certain foundational aspects such as the role of coherence in thermodynamics, the use of clock degrees of freedom and the simulation of local time-dependent Hamiltonians in a particular quantum subsystem. For small-scale systems additional issues arise. Firstly, it is not clear to what degree genuine ordered thermodynamic work has been extracted, and secondly non-trivial back-actions on the thermal machine must be accounted for. We find that both these aspects can be resolved through a judicious choice of quantum measurements that magnify thermodynamic properties up the ladder of length-scales, while simultaneously stabilizing the quantum thermal machine. Within this framework we show that thermodynamic reversibility is obtained in a particular Zeno limit, and finally illustrate these concepts with a concrete example involving spin-systems.


I. INTRODUCTION
The issue of work in arbitrary-scale quantum systems turns out to be quite subtle, and a good number of recent studies have analysed varying notions of work for finite-sized quantum systems. More recently, the role that quantum-mechanical properties such as coherence play in work extraction is being addressed using resource-theoretic formulations. For example, it has been pointed out in [23] that free energies do not constitute proper coherence measures, and so to properly quantify the thermodynamic value of quantum coherence it is necessary to develop measures that go beyond free energies. In subsequent analysis [24,25], general upper and lower bounds have been developed to constrain such coherent transformations under very general thermodynamic operations. Moreover, in [26,27] the question of work extraction from an arbitrary qubit state has been analysed in a context which explicitly models the coherence resources that are required to extract work from the qubit state. The analysis recovers the expected result that one can indeed associate the free energy difference ∆F to an arbitrary pure qubit state |ψ⟩⟨ψ|, but only within a particular "classical regime", in which one has access to an infinite system with unbounded coherence resources. Outside of this setting it is provably impossible to extract all of the free energy from the quantum coherence.
The actual implementation of these thermodynamic processes often assumes a complex protocol. A great deal of control is required over the different components in order to magnify the energy acquisition up to scales at which the notion of ordered, robust energy makes more sense. The central aim of this paper is to address scenarios in which such thermodynamic processes are carried out on a quantum system S via a quasi-autonomous thermal machine M that is comparable in scale and itself displays quantum-mechanical properties. This sheds light on the physical characteristics demanded of a quantum thermal machine.
In the case of work extraction, a finite-sized quantum machine will absorb energy which, due to its finite dimension, will diminish its ability to function. We show that a resolution to this is to perform an "energy harvesting measurement" on the machine. This harvesting measurement serves a dual function of siphoning off energy from the machine while also stabilizing its ability to function as a thermal machine at these quantum-mechanical scales.

A. Clocks and quantum coherence
As highlighted in [23-25,27], in order for thermodynamic processes to be sensitive to quantum coherence at arbitrary scales the thermal machine must itself possess coherent properties. In [25] this requirement for quantum coherence was identified with transformations that break time-translation symmetry. Any quantification of the coherence follows from a quantification of this asymmetry.
More explicitly, the breaking of time-translation symmetry demands that the effects of an action depend on whether it is performed at time t_1 or at some later time t_2 > t_1. The way in which one handles time-translation asymmetry has been known for a long time: we introduce a "clock" system, sensitive to the passage of time [28]. In the classical regime we of course have abundant access to time-keeping devices; however, this becomes a non-trivial component for extremely small, autonomous quantum devices, or in environments in which quantum-mechanical aspects dominate.
Such considerations have also resulted in an increased focus on the role of clocks in thermodynamics [12, 27, 29-33]. To implement autonomous quantum thermodynamic protocols, generic thermal machines invariably require a clock degree of freedom, which serves as a non-classical time-keeping device. This not only allows access to quantum coherence, but can also be used to induce effective time-dependent interactions within systems.
arXiv:1508.02720v3 [quant-ph] 16 Nov 2015

B. Energy transfer to macroscopic scales via quantum measurements
The use of the clock in, say, a work-extraction protocol necessitates a non-trivial interaction between the quantum system S and the clock. The unavoidable back-action experienced by the machine is in general accompanied by an energy flow that can either be transferred to other degrees of freedom in the machine M, or more simply maintained in the clock itself. However, if the machine M is comparable in scale to the quantum system S then it is debatable to what extent one has "gained work" if it is confined to quantum-mechanical degrees of freedom in M. A basic requirement is that the acquired energy can be transferred to larger scales in some natural manner.
The passage from the quantum regime to the classical regime has long been a topic of controversy and debate. Where does quantum end and classical begin? Notions that are applicable on the classical side of this divide are inapplicable on the quantum side, and quantum measurements play a central role in linking these two regimes. Depending on which side of the cut one places the measurement device, one can either view a measurement as an abrupt transformation of the quantum state (e.g. via a projective measurement), or one could equally model the measurement process itself as a purely unitary interaction between the quantum system S and some measurement apparatus A. The purpose of the unitary interaction U_SA is to magnify the quantum-mechanical aspects up the ladder of length-scales to degrees of freedom that are deemed classical. The apparatus eventually admits a read-off and an objective measurement outcome is obtained.
One particular set of constraints when considering fully quantum-mechanical measurement approaches is given by conservation laws. The connection between conservation laws and measurements has a long and subtle history. The WAY-theorem states that in the presence of a conservation law there is an effective superselection rule in place on the observables that can be measured [6,34-39]. In [37] it was shown that the measurement device must carry two different resources: a coherence resource to partially lift the superselection rule, and a charge degree of freedom to balance the books. The unitary interaction between the machine M and the system S is constantly entangling the two systems. This unitary correlation process can be viewed as an information acquisition by the machine in an effective local energy basis that varies with time. Note that this process of M acquiring information on S is distinct from the external measurement performed on M that we will later consider.

FIG. 1. Dynamics of the thermal machine. The quantum thermal machine at point A of its orbit absorbs energy from the heat reservoir, which causes a fraction of the total state to deviate from the reference clock orbit. Specifically, a fraction p_1(t) evolves to point E off the reference orbit, while the remaining fraction (1 − p_1(t)) freely evolves to B. The energy-harvesting measurement projects the component at E onto B and allows us to extract this energy as well as stabilizing the machine back to the reference orbit. Each triangular section constitutes a single unit protocol (UP). The total entropy generated corresponds to the integrated "area" under the jagged curve, while the extracted work corresponds to the integrated "length" of the jagged curve. The reference orbit χ(t) is the shaded great circle with respect to which the protocol is defined.
In what follows, we view the operation of the thermal machine M with the system S and reservoir R as transforming an energetic degree of freedom of the device under SM, which can then be read off via a subsequent measurement on M. This disturbing measurement transfers the energy acquired by the quantum thermal machine into the macroscopic regime, where it can be ascribed a less ambiguous status.
In what follows we use thermodynamic work extraction from a qubit system as our focus. The work extraction involves a quasi-autonomous thermal machine based on a globally time-independent Hamiltonian, which performs the protocol on the quantum side of the cut, together with a continual flow of energy across the cut via a classically-controlled measurement process that also serves to stabilize the quantum device.

II. AUTONOMOUS QUANTUM MACHINES: THE BASIC CONSTITUENTS
The traditional Szilard argument begins with the knowledge that the system is in one particular state of a pair of energetically degenerate levels (e.g. a particle is on the left-hand side of a piston) [12,40-44]. Conditioned on this knowledge, the agent applies an adapted protocol that extracts a certain quantity of work. Schematically, the unoccupied energy level is elevated by switching on a time-dependent Hamiltonian (e.g. attaching weights, or tuning magnetic fields). The system is then placed in equilibrium with a thermal reservoir at some inverse temperature β, and the Hamiltonian is quasi-statically switched off in a classically controlled manner.
In the fully quantum-mechanical scenario our starting point is again knowing that the qubit system S is in a definite pure state |ψ⟩⟨ψ|. We next feed this state into a thermal machine M, and the composite evolves under the joint Hamiltonian H_SM, together with system-bath couplings. Crucially, note that we do not assume classical control over the interaction between the quantum systems. To obtain a non-trivial extraction of energy we must therefore induce a particular time-dependent evolution of the system S, and so the machine M must possess a clock degree of freedom that induces an effective time-varying Hamiltonian on the target system. In [31] the authors also consider quantum systems that act as clocks via interactions with their environment, and study their synchronisation as well as the back-action they suffer due to the interaction, although not in a thermodynamic setting. The clock in [31] comprises two components, a clockwork which evolves due to an internal mechanism, and tick registers which briefly interact with the clockwork and extract time information. The notion of a clockwork in [31] is similar to what we call the clock itself, while the system S can loosely be seen as playing the role of the tick registers. The clock in the present study, however, is in continuous contact with the system and evolves with it under a joint unitary.

A. Induced local time-dependent level splittings
The known state |ψ⟩⟨ψ| defines an orthonormal basis {|ψ⟩, |ψ⊥⟩} in the qubit system. Given this preferred decomposition, the generic interaction Hamiltonian needed on the joint Hilbert space ℋ_SM takes the form
$$H_{SM} = \sigma \otimes H_I + \mathbb{1} \otimes H_F, \tag{1}$$
where σ := |ψ⊥⟩⟨ψ⊥| − |ψ⟩⟨ψ| (see appendix B). In particular, the term σ ⊗ H_I is what will generate a level-splitting local to the qubit, with the time-dependence being encoded in the thermal machine's quantum state ρ_M. The second term 1 ⊗ H_F, as we shall see, generates an evolution on the system M which is not sensitive to the state of the qubit, and so can be interpreted as the free Hamiltonian of the machine. Crucially, this joint Hamiltonian is time-independent, fixed for all eternity.
In the absence of bath couplings, the system-machine composite evolves under this Hamiltonian as ρ_SM → e^{−itH_SM} ρ_SM e^{itH_SM}. The function of the interaction with the machine is to induce an effective Hamiltonian that is local to the qubit. We define
$$H_S(t) := \mathrm{tr}_M\!\left[H_{SM}\,\bigl(\mathbb{1} \otimes \rho_M(t)\bigr)\right], \tag{2}$$
which is a time-dependent mean-field Hamiltonian on the qubit. This mean-field approximation turns out to be the physically appropriate choice in the context considered, and has been analysed in more detail in [12,45]. For the above interaction Hamiltonian this takes the form
$$H_S(t) = \mathrm{tr}[\rho_M(t) H_I]\,\sigma + \mathrm{tr}[\rho_M(t) H_F]\,\mathbb{1}. \tag{3}$$
This mean-field Hamiltonian defines a local effective basis, and encodes the non-correlative (and therefore non-entropy-increasing) dynamics local to the system [45-47]. Crucially, the thermal bath is assumed to only see this local Hamiltonian, which encodes the statement that the machine M does not undergo direct thermalization. This assumption is equivalent to demanding a large coupling strength between system and bath relative to the coupling to the machine, such that the thermalisation time-scale of the system is much shorter than that of the machine. The exact degree to which thermalization of S occurs will depend on the particular bath coupling rates (see also appendix F). Since the energy exchanges with the bath only depend on the level-splitting of H_S, it suffices to assume H_S(t) = ⟨H_I⟩ σ, with ⟨H_I⟩ := tr[ρ_M(t) H_I]. Also, note that instead of time being an explicit parameter in the system Hamiltonian tuned by an external agent, the time-dependence is now induced by the dynamics and the particular quantum state of the machine M, giving the machine an inbuilt quantum clock. The time-dependence is explicitly a function of the coherence properties of the state ρ_M with respect to the Hamiltonian H_SM, as emphasized previously^1.
Therefore, the joint Hamiltonian is essentially determined by the (known) quantum state |ψ⟩⟨ψ| and, together with an initial joint state, induces a specific time-dependent level-splitting local to the qubit. However, one must also address the back-action of the dynamics on the machine itself. The joint Hamiltonian can be written in the alternative form
$$H_{SM} = |\psi\rangle\langle\psi| \otimes H_- + |\psi^\perp\rangle\langle\psi^\perp| \otimes H_+,$$
where we have defined H_± := H_F ± H_I. The evolution that is generated by this Hamiltonian splits into two parts
$$U(t) = e^{-itH_{SM}} = |\psi\rangle\langle\psi| \otimes U_-(t) + |\psi^\perp\rangle\langle\psi^\perp| \otimes U_+(t), \tag{5}$$
where U_±(t) = e^{−iH_±t}. This describes a controlled unitary action, in which conditioning on the qubit's |ψ⟩ and |ψ⊥⟩ states evolves the clock along two independent orbits according to U_− and U_+ respectively.
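The controlled-unitary structure described above can be checked numerically. The following is a minimal sketch, assuming the ordering σ = |ψ⊥⟩⟨ψ⊥| − |ψ⟩⟨ψ| (so that |ψ⟩ is paired with H_−) and using random Hermitian matrices as stand-ins for H_F and H_I; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def random_hermitian(dim, rng):
    """Random Hermitian matrix standing in for a machine Hamiltonian."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (a + a.conj().T) / 2

def evolve(hamiltonian, t):
    """exp(-i H t) for Hermitian H, built from its eigendecomposition."""
    w, v = np.linalg.eigh(hamiltonian)
    return (v * np.exp(-1j * w * t)) @ v.conj().T

def controlled_unitary_error(dim=3, t=0.7, seed=0):
    """Largest entry-wise mismatch between exp(-i t H_SM) and the split form."""
    rng = np.random.default_rng(seed)
    h_f, h_i = random_hermitian(dim, rng), random_hermitian(dim, rng)
    # Assumed qubit basis order (|psi>, |psi_perp>), so sigma = diag(-1, +1).
    sigma = np.diag([-1.0, 1.0])
    h_sm = np.kron(sigma, h_i) + np.kron(np.eye(2), h_f)
    proj_psi, proj_perp = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
    # |psi> branch evolves under H_- = H_F - H_I, |psi_perp> under H_+ = H_F + H_I.
    u_split = (np.kron(proj_psi, evolve(h_f - h_i, t))
               + np.kron(proj_perp, evolve(h_f + h_i, t)))
    return np.max(np.abs(evolve(h_sm, t) - u_split))
```

Calling `controlled_unitary_error()` returns a value at the level of floating-point round-off, which reflects the fact that H_SM is block diagonal in the qubit basis {|ψ⟩, |ψ⊥⟩}.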
B. Designing a good machine: core requirements
We now turn to the question of what clock characteristics, besides the validity of eq. (2), are desirable for the functioning of the thermal machine. To make this a well-posed problem, we demand that the clock's Hilbert space dimension is fixed and that the Hamiltonian has its spectrum upper bounded by some fixed energy-scale, ‖H_SM‖ ≤ E, but we are otherwise free in designing the machine's Hamiltonians and its starting state ρ_M(0).
Firstly, the Szilard argument requires an initial degeneracy in the energies of the qubit. Secondly, it is desirable (but not essential) to fix the energy of the state |ψ⟩ to be zero, so that the qubit's induced level-splitting according to eq. (3) is given by ∆(t) := tr[ρ_M(t) H_+]. Finally, and most importantly, we require the right coherence properties of the machine to ensure that it functions well, both as a clock and in its ability to induce level-splittings on the qubit. These three criteria are respectively encoded by the following set of conditions on the operators {ρ_M(0), H_−, H_+}:
Conditions (i) and (ii) follow directly and uniquely from the desire for initial degeneracy and fixing of the ground state. The intuition behind (iii) is less straightforward and deserves some elaboration. We would like the Hamiltonians H_+ and H_− to have a large commutator in an operator norm sense. At the level of the algebraic relations this implies a rapidly changing unitary evolution (as can be seen from expanding the unitary in increasing orders of commutators). But we then require the state to have a strong response to the induced dynamics. This can be achieved by having the initial state "mutually unbiased" with respect to the Hamiltonians. Note that every Hilbert space ℋ admits a triple of mutually unbiased bases. An extreme regime is the case of H_± being built from two of these bases, while the quantum state is one of the basis states in the third basis. This guarantees that the state simultaneously has maximal coherence with respect to both bases and therefore will strongly break time-translation invariance. Note that the imaginary part is taken in (iii) because the trace is imaginary, owing to the anti-Hermitian nature of the commutator. While not being unique, condition (iii) provides a convenient encapsulation of these physical requirements.
From eq. (5) we see that in the absence of any thermal contact, the qubit remains in the state |ψ⟩ for all time as the joint system freely evolves under H_SM. This defines a reference trajectory for the clock, whose dynamics are fully determined by U_−. We therefore define χ(t) := U_−(t) ρ_M(0) U_−†(t) as the clock state at time t on the ideal reference clock-orbit. Clearly, for any given Hamiltonian H_SM, there is a range of states {ρ_M^(m)(0)}_m that satisfy conditions (i) and (ii) and ensure validity of eq. (2). Each of these initial clock states has its own unique clock-orbit χ^(m)(t) associated with it. We choose the specific state ρ_M(0) as the state whose associated orbit χ(t) maximises the level splitting ∆(t) for some t = τ̃ over the orbit, i.e.
$$\chi(\tilde\tau) := \arg\max_{\chi^{(m)}(\tilde\tau)} \mathrm{tr}\!\left[\chi^{(m)}(\tilde\tau)\, H_+\right]. \tag{7}$$
This maximising state can always be taken to be a pure state which we call |d⟩, where d refers to the Hilbert space dimension of the clock. The gap maximization then requires that U_− rotates this pure state |d⟩⟨d| (which obeys conditions (i) and (ii)) into the maximum-eigenvalue eigenstate of H_+. To achieve this, it is sufficient to design the machine's Hamiltonians such that H_+ and H_− are generators of SU(2) on the d-dimensional machine (see appendix A for details). Assuming this choice, eq. (7) is equivalent to optimising for condition (iii), so that the optimal machine starting state is the maximum-eigenvalue eigenstate of the operator C := i[H_−, H_+] (which is also an SU(2) generator). We define the eigenbasis of C with ascending eigenvalues as {|m⟩}_{1≤m≤d}, such that ρ_M(0) = |d⟩⟨d|. Finally, we introduce the complete rotating clock basis {|m(t)⟩ := U_−(t)|m⟩}, which co-rotates with the clock's m = d reference orbit. Choosing H_− to be an SU(2) generator on the system Hilbert space ℋ also has the advantage of ensuring closed, periodic orbits and will allow us to run the engine in a well-defined cycle^2.
We call the period of the clock τ, such that τ is the smallest positive number for which U_−(τ) = 1. The d different orbits swept out by the {|m(t)⟩} clock basis states over one clock period form a zero-energy surface with respect to the qubit's |ψ⟩ state. Another important property is that, for all t = nτ with n ∈ ℕ, the orthogonal states |m(nτ)⟩ have equal energy with respect to any qubit state. This follows since the entire eigenbasis of C satisfies condition (i), and so we may switch between clock orbits without any energy cost at these times.
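To make the SU(2) clock concrete, the sketch below uses one illustrative choice that is an assumption of this example rather than the construction of appendix A: H_− = ω J_y and H_+ = ω J_x on a spin-s machine (d = 2s + 1), so that C = i[H_−, H_+] = ω² J_z. The starting state is taken as the top eigenstate of C, and the induced splitting ∆(t) = tr[χ(t) H_+] is evaluated along the reference orbit. For this choice one expects ∆(t) = ω s sin(ωt), which vanishes at t = 0 and reaches the largest eigenvalue of H_+ a quarter of a rotation later.

```python
import numpy as np

def spin_ops(spin):
    """Spin matrices (Jx, Jy, Jz) in the basis m = spin, spin-1, ..., -spin."""
    m = np.linspace(spin, -spin, int(round(2 * spin)) + 1)
    # <m+1| J+ |m> = sqrt(s(s+1) - m(m+1)) sits on the first superdiagonal.
    j_plus = np.diag(np.sqrt(spin * (spin + 1) - m[1:] * (m[1:] + 1)), k=1)
    j_minus = j_plus.conj().T
    return (j_plus + j_minus) / 2, (j_plus - j_minus) / 2j, np.diag(m)

def evolve(hamiltonian, t):
    """exp(-i H t) for Hermitian H, built from its eigendecomposition."""
    w, v = np.linalg.eigh(hamiltonian)
    return (v * np.exp(-1j * w * t)) @ v.conj().T

def clock_gap(t, omega=1.0, spin=1.0):
    """Level splitting Delta(t) = <chi(t)| H_+ |chi(t)> on the reference orbit."""
    jx, jy, _ = spin_ops(spin)
    h_minus, h_plus = omega * jy, omega * jx  # assumed, illustrative choice
    c_op = 1j * (h_minus @ h_plus - h_plus @ h_minus)  # equals omega^2 * Jz
    d_state = np.linalg.eigh(c_op)[1][:, -1]  # top eigenstate of C
    chi_t = evolve(h_minus, t) @ d_state
    return float(np.real(np.vdot(chi_t, h_plus @ chi_t)))
```

For integer spin the orbit closes after t = 2π/ω, so `clock_gap(2 * np.pi)` returns to zero up to round-off.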

III. THE EXPLICIT PROTOCOL
Having established the basic quantum-mechanical properties required of the engine, we now turn to the details of the actual engine protocol. First there is the initialization phase in which the initial state (4) freely evolves for a time τ̃, at which point it attains a maximal local level-splitting ∆_max := ∆(τ̃). This induces a local raising of the system's unoccupied |ψ⊥⟩ state, as in the original Szilard-type protocol. However, the interaction Hamiltonian H_SM is fundamentally time-independent, and so there is a need for a single timing-switch within the machine that initialises thermal contact between the qubit and the thermal bath at time t = τ̃. It is not entirely clear that such a single time switching is fundamentally necessary, but in approaches such as the present one it appears to be almost impossible to avoid if one wants to keep the model general.
With thermal contact in place, work extraction then takes place for τ̃ < t ≤ τ and can be analysed in steps of duration dt, which can be viewed as constituting unit protocols (UP). Each UP can be further analysed in terms of three sub-components:
(a) Thermalisation of the system S.
(b) Joint unitary evolution of S and M under H_SM for a duration dt.
(c) An energy-harvesting measurement on the machine M.
It is important to note that the division into these units is determined by the macroscopically-controlled timing of measurements, and not at the level of the quantum machine.

A. Autonomy of the Thermal Machine
One might argue that the frequent measurements in a (as shall be seen later) time-dependent basis in step (c) are in fact similar to a non-autonomous machine having a fine-tuned time-dependent Hamiltonian which is externally controlled. Note though that these two scenarios only become comparable in the limit dt → 0, where the measurements occur at a very high rate. However, our model is valid for arbitrary dt, and one can even consider the extreme case in which dt = τ − τ̃ and only a single measurement in a fixed basis is performed at the end of the protocol^3. Despite a reduced work output and higher probability of failure (see below), the thermal machine is still able to extract work from the system, even though the process in this limit becomes comparable to the standard two-point work measurement [48,49] employed in many quantum thermodynamic protocols, but with a fixed Hamiltonian. In the non-autonomous scenario, if the Hamiltonian were fixed no work output would be possible in this case. Since the machine studied here is able to extract work for any number of interventions we call it quasi-autonomous. It abstractly coincides with the fully non-autonomous case only in the limit of continuous intervention dt → 0.

^3 In this case one has to consider more realistic thermalisation protocols such as the ones discussed in appendix F. The choice of a single thermalisation per dt, as employed in the main text, is only necessary for obtaining the analytical expressions, but is not a fundamental property of the model.

B. Dynamics of System & Thermal Machine
The exact thermalization process can be modelled in various ways, including non-trivial interaction with the unitary dynamics generated by H_SM. However, for the sake of analysis we may approximate steps (a) and (b) as firstly a thermalization of the qubit with respect to the local mean-field Hamiltonian H_S(t), followed by unitary dynamics exp[−iH_SM dt]. This approximation is robust over a large range of parameters, and exact in the dt → 0 limit (see appendix F and discussion in [12]).
As such, at the beginning of every UP the joint system is to good approximation in a state
$$\rho(t) = \rho_S(t) \otimes \chi(t),$$
where ρ_S(t) is the thermal state of the qubit with respect to H_S(t). The evolution under the Hamiltonian H_SM takes the joint system from ρ(t) to ρ′(t + dt) := U(dt) ρ(t) U†(dt). However, since the qubit is generically in a mixed state, the clock now deviates from the reference clock orbit^4, i.e. tr_S[ρ′(t + dt)] ≠ χ(t + dt), as schematically illustrated in Fig. 1. Crucially, this deviation of the clock from its reference orbit corresponds to energy being transferred from the qubit system into the machine. Since the state of the clock is distorted by this gain, it acts not only as a time-keeping device but also as a temporary battery.

C. Energy-harvesting and clock stabilization
Since the quantum thermal machine suffers a back-action as it absorbs energy, its ability to induce local level-splittings and to function as a clock is affected. A crucial component of the protocol is that we repeatedly perform energy-harvesting measurements on the machine that serve two functions: firstly to transfer the energy gain from the quantum to the classical regime, and secondly to stabilize the clock/machine system. This in turn allows us to separate the concepts of clock and battery in the quantum-mechanical system.
The target state on the reference clock orbit is given by χ(t + dt), and therefore the measurement we perform is the projective rank-1 measurement in the orthonormal clock basis {|m(t + dt)⟩}. The ability to perform this measurement is assumed to be a free operation that is accessible macroscopically; however, this too could be modelled more explicitly using a larger coherent reference, if one wished. When the measurement is performed, with high probability we project back onto the reference orbit χ(t + dt), thus stabilizing the clock. This probability tends to one as we either decrease the thermal couplings or increase the rate dt^{−1} at which we perform the measurements (see appendices B and D).
The measurement performed during each UP does not commute with H_SM and is thus not energy-conserving. It therefore leads to energy flows between the joint system and the external measurement device. Since energy is globally conserved we can explicitly compute the energy flow into the measurement device. For the outcome corresponding to the projection Π_m(t + dt) := 1 ⊗ |m(t + dt)⟩⟨m(t + dt)| we find this to be
$$dE_m(t, dt) = \mathrm{tr}\!\left[H_{SM}\left(\rho'(t+dt) - \rho_m(t+dt)\right)\right], \tag{8}$$
where ρ_m(t + dt) is the post-measurement state of the system and machine conditioned on outcome m.

D. Exorcising Demons: Landauer accounting in the macroscopic regime
While eq. (8) provides the exchange of energy between the quantum device and measurement apparatus, we cannot identify this as the extracted work. The reason is that the measurement device must be reset and its memory erased in order to avoid any Maxwell's demon type scenarios [50-58]. However, this memory is a classical record in the macroscopic apparatus and therefore the traditional Landauer cost of erasure applies [59-63].
The information gain by the apparatus is described by the distribution of measurement outcomes p(t) := {p_m(t, dt)}, and so the minimal cost of erasure is given by
$$W_{\mathrm{reset}}(t, dt) = \beta^{-1} S(p(t)), \tag{9}$$
where S(·) is the Shannon entropy. We can therefore identify an averaged work gain during the UP as
$$dW(t, dt) = dE(t, dt) - W_{\mathrm{reset}}(t, dt), \tag{10}$$
where dE(t, dt) denotes the energy flow of eq. (8) averaged over the outcomes m. This provides a basic energy accounting over the unit protocol from time t to time t + dt. However, composing multiple UPs over the entire engine cycle requires us to address the matter of what happens when we do not obtain the projection onto the reference clock orbit, corresponding to m = d.
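The accounting over one unit protocol can be sketched in a few lines. This is a minimal illustration, assuming the Shannon entropy is measured in nats so that the Landauer cost is S/β, and taking the outcome probabilities and per-outcome energy flows as given inputs; the function names are hypothetical.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def erasure_cost(probs, beta):
    """Minimal Landauer cost of erasing a record distributed as probs."""
    return shannon_entropy(probs) / beta

def net_work_gain(probs, energy_flows, beta):
    """Average energy flow into the apparatus minus the cost of resetting it."""
    mean_flow = sum(p * e for p, e in zip(probs, energy_flows))
    return mean_flow - erasure_cost(probs, beta)
```

When one outcome occurs with certainty the record carries no information, the erasure cost vanishes, and the whole energy flow counts as work; a uniformly random two-outcome record costs ln 2 / β to erase.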

E. Modes of operation and quantum feedback
We now restrict our focus to two distinct operation modes for the engine, which we refer to as the unselective and selective protocols.
In the unselective case a sequence of measurements is performed in intervals of length dt, and the outcomes recorded. Finally, at the end the total measurement data is erased and we have a net energy gain which is the work output. For this we can define the trajectory 𝐦 := {m_τ̃, m_{τ̃+dt}, ..., m_τ} as the set of measurement outcomes of the N = (τ − τ̃)/dt consecutive UPs of a full cycle. In each UP, the energy flow into the measurement device given measurement outcome m_{t+dt}, conditioned on starting the UP in the state labelled by m_t, is
$$dE(m_{t+dt}|m_t) = \mathrm{tr}\!\left[H_{SM}\,\Delta\rho(m_{t+dt}|m_t)\right], \tag{11}$$
where ∆ρ(m_{t+dt}|m_t) := ρ(m_t) − ρ(m_{t+dt}|m_t) is the difference between the initial state of the UP and the post-measurement state. The notation ρ(m_t) shows explicitly that the clock does not necessarily start on the reference orbit, but on any of the orbits labelled by 1 ≤ m_t ≤ d, and that the qubit is in a thermal state with respect to the local Hamiltonian induced by this orbit. It is important to note that the energy flow only depends on the clock state directly preceding each UP, not the entire trajectory, since the thermalisation step essentially kills any trajectory history, resulting in a Markov process (see appendix B). The reset at the end of the cycle is given by the Landauer expression. If p(𝐦) denotes the probability of a certain complete trajectory 𝐦, then the reset cost is given by β^{−1} S({p(𝐦)}), and the net work output of the unselective mode is
$$W_u = \sum_{\mathbf{m}} p(\mathbf{m})\, E(\mathbf{m}) - \beta^{-1} S(\{p(\mathbf{m})\}), \tag{12}$$
where E(𝐦) = Σ_{n=1}^{N} dE(m_{τ̃+n dt} | m_{τ̃+(n−1) dt}) is the energy flow for the trajectory 𝐦. The unselective protocol constitutes a minimalist approach, in which the quantum components require no feedback control. It is therefore the most autonomous mode of operation.
However, in this unselective regime the thermal machine undergoes non-trivial back-action that degrades the clock. One might therefore wish to allow elementary feedback control on the quantum systems with the aim of maintaining the characteristics of the machine. Feedback control has been extensively studied in the context of work extraction protocols, both in the classical as well as in the quantum case (see e.g. [13,15,64-67]). These feedback protocols generally employ measurements of the target system, followed by operations which are chosen based on the specific result of the measurement. Similarly here, in the selective protocol we operate conditional on the measurement outcomes. If m = d we have a successful projection onto the reference clock orbit, and all is well. The clock is restored back into the state χ(t + dt), successfully stabilising it, and the joint state is ρ_d(t + dt) = ρ_S(t + dt) ⊗ χ(t + dt). For any m ≠ d a non-ideal outcome occurs and the clock jumps into a different orbit |m(t + dt)⟩. Moreover, if one reads off the outcome, then the qubit is collapsed into |ψ⊥⟩, resulting in a joint state ρ_m(t + dt) = |ψ⊥⟩⟨ψ⊥| ⊗ |m(t + dt)⟩⟨m(t + dt)| (see appendix B for details and the specific form of ρ_S(t + dt)). The quantum engine has "misfired", and in this case we abort the current engine cycle, decouple the qubit from the bath, and perform the following feedback process that resets the engine for a new cycle. The feedback on the system S flips |ψ⊥⟩ into |ψ⟩ through a single-qubit unitary (whose energy cost has to be accounted for), followed by the free evolution of the joint system, "running the engine in neutral", for a duration τ − (t + dt) so that the clock ends up in |m(τ)⟩ = |m(0)⟩. Crucially, since for t = nτ all states of the clock basis have equal energy as noted above, we may now restore the clock to the reference orbit χ(0) without any energy cost, ready to begin a new cycle.

IV. ACTUAL PERFORMANCE
The cumulative erasure cost in the selective mode is generally smaller than the single large erasure in the unselective mode. Starting from the work expression in eq. (10) one can compute the average work output of this engine mode similarly to eq. (12). This average, which we shall simply call W, needs to take into account the probabilities of the engine succeeding in each UP, as well as the cost of the feedback protocol in case the engine misfires. The explicit expression for W, eq. (C10), is derived in appendix C. We can also define W_ideal as the maximum single-shot work output of a cycle that completes without a misfire. Although not established explicitly for general clocks, the selective engine employing a feedback protocol has a higher work output, W_ideal ≥ W ≥ W_u, in all examples considered, with equality in the Zeno limit, which we shall now consider.

A. Thermodynamic reversibility and the Zeno Limit
The limiting case of dt → 0 constitutes a Zeno limit (ZL) and takes on a special role, since it allows us to recover the well-established results of equilibrium thermodynamics. Explicitly evaluating all the quantities involved, it is easy to show (see appendix D) that the probability of being projected back into the reference clock orbit is equal to unity up to first order in dt. Thus the selective engine will complete the entire cycle without a single UP failing with probability 1 − O(dt^2), and the unselective engine will follow the reference orbit trajectory {d, d, ..., d} with a probability 1 − O(dt^2). It can also be shown that in both cases the cost of resetting vanishes up to first order, W_reset = O(dt^2), making the process essentially reversible. This implies that in the ZL both engine modes are equivalent. These results further imply that eq. (10) reduces to lim_{dt→0} dW(t, dt) = dE(t, dt), i.e. the entire energy flowing into the measurement device can actually be identified as work.
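The quadratic suppression of misfires in a single unit protocol can be illustrated numerically. The sketch below is not the calculation of appendix D; it assumes an illustrative SU(2) clock with H_− = ω J_y and H_+ = ω J_x, a qubit that is fully thermalised at the start of the step, and a clock that starts on the reference orbit. Under these assumptions the |ψ⟩ branch stays on the reference orbit, so the failure probability is p_1(t) multiplied by one minus the overlap between the two branches after a time dt.

```python
import numpy as np

def spin_ops(spin):
    """Spin matrices (Jx, Jy, Jz) in the basis m = spin, spin-1, ..., -spin."""
    m = np.linspace(spin, -spin, int(round(2 * spin)) + 1)
    j_plus = np.diag(np.sqrt(spin * (spin + 1) - m[1:] * (m[1:] + 1)), k=1)
    j_minus = j_plus.conj().T
    return (j_plus + j_minus) / 2, (j_plus - j_minus) / 2j, np.diag(m)

def evolve(hamiltonian, t):
    """exp(-i H t) for Hermitian H, built from its eigendecomposition."""
    w, v = np.linalg.eigh(hamiltonian)
    return (v * np.exp(-1j * w * t)) @ v.conj().T

def failure_probability(t, dt, beta, omega=1.0, spin=1.0):
    """Probability that one unit protocol ends off the reference orbit."""
    jx, jy, _ = spin_ops(spin)
    h_minus, h_plus = omega * jy, omega * jx  # assumed, illustrative choice
    d_state = np.zeros(jx.shape[0], dtype=complex)
    d_state[0] = 1.0  # |m = +spin>, top eigenstate of C = omega^2 * Jz
    chi_t = evolve(h_minus, t) @ d_state
    gap = np.real(np.vdot(chi_t, h_plus @ chi_t))
    p1 = 1.0 / (1.0 + np.exp(beta * gap))  # thermal occupation of |psi_perp>
    on_orbit = evolve(h_minus, dt) @ chi_t   # branch with qubit in |psi>
    off_orbit = evolve(h_plus, dt) @ chi_t   # branch with qubit in |psi_perp>
    overlap = abs(np.vdot(on_orbit, off_orbit)) ** 2
    return float(p1 * (1.0 - overlap))
```

Halving dt should reduce the returned probability by roughly a factor of four, which is the O(dt^2) behaviour quoted in the text.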
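More specifically, we find for the infinitesimal work flow
\[ dW(t,dt) = -p_1(t)\,\frac{d\Delta(t)}{dt}\,dt, \tag{13} \]
where $p_1(t)$ is the qubit's thermal occupation with respect to the local Hamiltonian induced by $\chi(t)$, and $\Delta(t)$ is the corresponding induced level splitting.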
We can compare this to the change in free energy of the qubit. Its partition function is $Z(t) = 1 + e^{-\beta\Delta(t)}$. Substituting this into the infinitesimal change in free energy $dF(t) = d(-\beta^{-1} \ln Z(t))$ we recover the quasi-static equilibrium result,
\[ dW(t) = -dF(t). \tag{14} \]
Specifically, the quantum Zeno engine with a built-in clock, constantly stabilised via energy-harvesting measurements, is able to extract the entire free energy difference of the system as work. Conversely, if we are not able to perfectly stabilise the clock at all times and only allow the accumulated energy to flow out of the quantum clock and into the classical battery at finite intervals, we are naturally restricted to $\Delta W < -\Delta F$. This is in accord with the second law, where equality can only be achieved under reversible protocols. This also demonstrates a core tradeoff between work output and power. On the assumption that we are experimentally restricted to a minimum $dt$, we can either slow down the system dynamics to get closer to the ZL at the expense of power, or conversely obtain a higher power while moving further from equality in eq. (14), "wasting" free energy. From eq. (13) we can also see that if the qubit's $|\bar\psi\rangle$ state is lowered too fast compared to $dt$, the clock is not able to sample the qubit's thermal distribution $p_1(t)$ quickly enough to utilise the full free energy difference. This adds to the tradeoff between maximising power and maximising work output.
Integrating eq. (14) over $\bar\tau \leq t \leq \tau$, the total work output of the Zeno engine is
\[ W = kT \log 2 - kT \log Z(\bar\tau), \]
where $Z(\bar\tau)$ is the partition function of the qubit at maximum level splitting $\Delta(\bar\tau)$, and we have used $Z(\tau) = 2$ at the end of the cycle, where the qubit is degenerate. This shows a second limitation we suffer if considering a realistic finite-sized machine. Even if we were able to perfectly stabilise the clock, we are further limited by the maximum level splitting that the clock can induce. Only in the limit of an infinitely big (i.e. classical) clock can we reach an infinite level splitting $\Delta(\bar\tau) \to \infty$ (i.e. $Z(\bar\tau) \to 1$) and thus obtain the classical result $W = kT \log 2$.

V. AN EXAMPLE: THE SPIN CLOCK
The preceding discussion has remained abstract, and not tied to a specific physical realisation. It provides a broad framework of quasi-autonomous quantum engines driven by measurement-stabilised clocks. However, we can look more closely at an explicit example where a spin-$l$ particle (dimension $d = 2l + 1$) acts as the clock [12]. The joint Hamiltonian is built from the operators $H_\pm = \frac{1}{\sqrt{2}}(L_y \pm L_z)$, where $L_k$ is the angular momentum operator of the spin-$l$ particle along the $k$-axis. We notice that, as desired, $H_-$ and $H_+$ are also generators of SU(2), and find the third generator $C = i[H_-, H_+] = -L_x$. As shown in the general framework, the optimal initial clock state $\chi(0)$ is the eigenstate of $C$ with maximum eigenvalue, i.e. a spin fully polarised along the negative $x$-direction. The full clock basis comprises the eigenbasis of $-L_x$. This example is particularly nice since the spin can be viewed as the quantum analogue of a clock hand. Under free evolution the state $\chi(t)$ simply rotates around an axis defined by $H_-$, just like a clock hand with period $2\pi$. The other clock orbits co-rotate with the reference orbit, and can essentially be seen as shortened, fuzzy versions of the clock hand, i.e. the spin is not fully polarised in a certain direction but only partially. Note that if there is no polarisation, for example if the clock gets too mixed, the hand disappears, we are unable to tell the time, and so are unable to induce time-dependence in the qubit. The effect of back-action from the qubit on the clock is again a stochastic splitting of the clock hand into two parts, one following the clock orbit, the other rotating out of the clock plane.

FIG. 2. Ideal $W_{\rm ideal}$ (solid) and average $W$ (dashed) work output for a spin-clock-driven engine against clock size $l$ for different stabilisation intervals $dt$. The red curve shows the Zeno limit $dt \to 0$, which quickly approaches the classical result $W = kT \log 2$.
Applying the earlier analysis to this specific model, one can explicitly calculate the work output of a spin-clock-driven quantum engine for varying spin dimensions $d$ and stabilisation intervals $dt$. The results are shown in Figures 2 and 3. We see that in the ZL even small clocks quickly saturate the classical result $W = kT \log 2$. What might be surprising at first is that for finite $dt$ there is an optimal $d$ which is also finite. This comes back to the issue of sampling the qubit. If the clock is too small, we are unable to raise the $|\bar\psi\rangle$ state high enough before starting to thermalise. On the other hand, if the clock is too large, we raise the state very high, but also drop it very fast and hence might not be able to sample the thermal distribution quickly enough. For larger clocks the average quickly falls below the ideal output, but then converges again in the $l \to \infty$ limit, since in this case the optimal scenario starts to dominate the average due to the increasingly small deviation from the reference orbit experienced by larger clocks. Again we see that only in the limit of an infinitely large clock with infinitely fast stabilisation can the classical limit of $W = kT \log 2$ be recovered.

VI. OUTLOOK
We have presented an explicit analysis of the basic requirements of a quasi-autonomous quantum thermal machine whose sole purpose is to extract work from a qubit degree of freedom.
The operation of the thermal machine involves a complex interplay between certain core properties such as quantum coherence, clock degrees of freedom, and the role of quantum measurements. It would be of theoretical interest to provide more detailed accounts of these aspects as they relate to thermodynamics. While we have a good framework to understand the stochastic aspects of thermodynamics [2], we do not have the same level of clarity regarding the role that coherence has on thermodynamic processes. Such a problem can be tackled from a variety of directions, including the present approach.
Beyond this, it would be valuable to make greater connection with current experimental progress related to this line of inquiry. Already there has been work in the context of optomechanical systems [68-71], and our work has direct bearing on these models.
From the theoretical perspective it would also be fruitful to further develop the information feedback components, and to consider quantum memory degrees of freedom at the level of the quantum thermal machine. In particular it would be valuable to explore the role that measurements play in such systems and to provide a fuller account for generic POVM scenarios. Ultimately this provides a useful setting to address the interplay between information and energy flows across the quantum-classical divide.

VII. ACKNOWLEDGMENTS
We acknowledge useful discussions with Kais Abdelkhalek and David Reeb. We also thank Janet Anders for helpful feedback on an earlier draft. This work was partially supported by the COST Action MP1209. MF supported by the EPSRC and the Japan Society for the Promotion of Science, and DJ supported by the Royal Society.
In the main text it was stated that, when $H_-$ and $H_+$ are chosen to be SU(2) generators, finding the clock orbit via eq. (7) becomes equivalent to finding the initial state which optimises the coherence condition (iii). Using the definition $C := i[H_-, H_+]$ we can rewrite condition (iii) as a condition on the expectation value $\mathrm{tr}[\rho_M(0)\, C]$. This condition is optimised by choosing $\rho_M(0) = |d\rangle\langle d|$, where $|d\rangle$ is the maximum-eigenvalue eigenstate of $C$. Now, since $H_-$ and $H_+$ are SU(2) generators, the definition of $C$ implies that it is also a generator of SU(2), and one can think of $H_-$, $H_+$ and $C$ as representing three orthogonal axes. Since the reference clock-orbit evolution is given by rotating a state that lies initially along the axis defined by $C$ around the orthogonal axis defined by $H_-$, we are guaranteed that for some time $t = \bar\tau$ the state $\chi(\bar\tau)$ will coincide with the eigenstate of $H_+$ which optimises eq. (7). Due to the symmetry properties of SU(2) (the generators essentially defining orthogonal axes), condition (ii) is satisfied for any state $|m\rangle$ in the eigenbasis of $C$, not only the optimal state $|d\rangle$, since the expectation values of $H_+$ and $H_-$ with respect to the eigenstates of $C$ are zero. Thus, since the clock orbits are defined with evolution under $U_-$ only, and $U_-$ and $H_-$ commute, condition (ii) is in fact satisfied at all times if the clock is on any of the clock orbits $|m(t)\rangle$. Thus the set of $d$ clock orbits $\{|m(t)\rangle\}_{1 \leq m \leq d}$ defines a zero-energy surface with respect to the qubit's $|\psi\rangle$ state.
Finally, since as noted above $\langle m|H_-|m\rangle = 0 = \langle m|H_+|m\rangle$ for all $m$, and $H_I = \frac{1}{2}(H_+ - H_-)$, condition (i) is also satisfied for all eigenstates of $C$. This condition does however in general only hold at the initial time $t = 0$ and after a complete period $t = \tau$. The fact that for times $0 < t < \tau$ this condition is broken allows us to induce time-dependent level splittings with respect to the clock orbits, while the fact that it holds for any complete period $t = n\tau$ allows us to freely switch between the different $|m\rangle$ states and reset the engine to the reference orbit without any energy cost in case it was projected onto a different orbit during the cycle.
These considerations show that, while not necessary, it is very desirable to design the quantum thermal machine M in such a way that H ± are SU(2) generators.

Appendix B: The protocols in detail
In this section we shall analyse the engine protocol outlined in the main text in more detail. The protocol starts with the joint qubit-machine system in the state $\rho(0) = |\psi\rangle\langle\psi| \otimes \chi(0)$.
The initialisation phase consists of free evolution of the initial state $\rho(0)$ for a time $\bar\tau$, at which point the qubit's induced level splitting attains a maximum $\Delta_{\max} = \Delta(\bar\tau)$. This free evolution leaves the qubit state unchanged and rotates the clock state into $\chi(\bar\tau) = |d(\bar\tau)\rangle\langle d(\bar\tau)|$, the maximum-eigenvalue eigenstate of $H_+$. We now start the actual work extraction process, which takes place for $\bar\tau \leq t \leq \tau$ and is broken down into $N = (\tau - \bar\tau)/dt$ subroutines of duration $dt$, which we call unit protocols (UP). These subroutines are further broken down into three steps, as outlined in (a)-(c).

Thermalisation
The first step of each UP is thermalisation of the qubit with respect to its local Hamiltonian $H_S(t)$ induced by the state of the clock. Assuming that the clock is on the reference orbit at time $t$, the joint state after thermalisation is
\[ \rho(t) = \gamma(t) \otimes \chi(t), \]
where $\gamma(t) = p_0(t)\,|\psi\rangle\langle\psi| + p_1(t)\,|\bar\psi\rangle\langle\bar\psi|$ is the Gibbs state with respect to the reference clock orbit at time $t$, such that $p_0(t) = (1 + e^{-\beta\Delta(t)})^{-1}$, $p_1(t) = 1 - p_0(t)$ and $\Delta(t) = \mathrm{tr}[\chi(t) H_+]$. If, on the other hand, the previous measurement projected the clock onto a different orbit $m$ and we keep the engine running (as is the case in the unselective protocol), the qubit's local mean-field Hamiltonian takes on a different form, giving it a different level splitting and thus thermalising it to a different Gibbs state $\gamma(t|m) = p_0^{(m)}(t)\,|\psi\rangle\langle\psi| + p_1^{(m)}(t)\,|\bar\psi\rangle\langle\bar\psi|$, where the thermal probabilities are defined as above but with respect to the level splitting $\Delta(t|m) = \langle m(t)|H_+|m(t)\rangle$, leading to a post-thermalisation joint state of
\[ \rho(t|m) = \gamma(t|m) \otimes |m(t)\rangle\langle m(t)|. \tag{B3} \]
Note that $\rho(t|d) = \rho(t)$, but we make the distinction to keep notation in the case of the selective engine more concise.

Evolution
Step (b) of the UP consists of free evolution of the joint state for a duration $dt$. Note that for brevity we will in the following assume that all evolution operators act for a time $dt$, such that e.g. $U_- \equiv U_-(dt)$, unless stated otherwise. Under this evolution the state $\rho(t)$ evolves into $\rho'(t+dt) = U \rho(t) U^\dagger$. More explicitly,
\[ \rho'(t+dt) = p_0(t)\,|\psi\rangle\langle\psi| \otimes U_- \chi(t) U_-^\dagger + p_1(t)\,|\bar\psi\rangle\langle\bar\psi| \otimes U_+ \chi(t) U_+^\dagger. \tag{B4} \]
The first term corresponds to the clock simply evolving along the reference clock orbit for a time $dt$, since $U_- \chi(t) U_-^\dagger = \chi(t+dt)$. The second term however corresponds to a deviation of the clock from the reference orbit, and an injection of energy into the clock. The expression for $\rho'(t+dt|m) = U \rho(t|m) U^\dagger$ has a similar form and interpretation, only with the reference clock orbit $\chi(t)$ replaced by the co-rotating orbit $|m(t)\rangle\langle m(t)|$.

Measurement in the selective mode
Finally, in the last step (c) of the UP we perform a measurement to try to stabilise the clock and transfer the energy it acquired during step (b) to the macroscopic measurement apparatus. The measurement that will project the clock back onto one of the clock orbits is described by the projectors $\Pi_m(t+dt) := 1 \otimes |m(t+dt)\rangle\langle m(t+dt)|$. Applying this to the state $\rho'(t+dt)$, the post-measurement state given measurement outcome $m$ is
\[ \rho_m(t+dt) = \frac{p_0(t)\,\delta_{md}\,|\psi\rangle\langle\psi| + p_1(t)\,\Gamma_{md}(t,dt)\,|\bar\psi\rangle\langle\bar\psi|}{p_m(t,dt)} \otimes |m(t+dt)\rangle\langle m(t+dt)| \]
with probability
\[ p_m(t,dt) = p_0(t)\,\delta_{md} + p_1(t)\,\Gamma_{md}(t,dt), \tag{B6} \]
where we have defined
\[ \Gamma_{m'm}(t,dt) := |\langle m'(t+dt)|U_+|m(t)\rangle|^2. \tag{B7} \]
This quantity $\Gamma_{m'm}(t,dt)$ can be understood as the probability of the clock transitioning into the orbit labelled by $m'$ under the deviation-inducing evolution $U_+$ for duration $dt$, given that the clock was in the orbit labelled by $m$ at time $t$. It can easily be verified that the $\Gamma_{m'm}$ form a doubly-stochastic matrix, with summation over either of the two indices giving unity. Also note that if $dt$ is small, this matrix is diagonally dominant, i.e. the system is more likely to remain on any given orbit than to transition to a different orbit. From eq. (B8) we can directly see that if we observe the measurement outcome $m = d$, the clock has been successfully stabilised and projected back onto the reference clock orbit at $\chi(t+dt)$. The qubit is steered to a slightly altered state $\rho_S(t+dt) \propto p_0(t)\,|\psi\rangle\langle\psi| + p_1(t)\,\Gamma_{dd}(t,dt)\,|\bar\psi\rangle\langle\bar\psi|$. On the other hand, if we observe $m \neq d$, the clock transitions to a different orbit (often leading to a back-flow of energy from the measurement apparatus, as we shall show below) and the qubit is instantly collapsed to $|\bar\psi\rangle$.
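As an illustrative numerical check of the doubly-stochastic property, the following sketch builds the transition matrix for a generic clock. It assumes the reading $\Gamma_{m'm}(t,dt) = |\langle m'(t+dt)|U_+(dt)|m(t)\rangle|^2$ suggested by the description above, with clock orbits $|m(t)\rangle = U_-(t)|m\rangle$ generated from the eigenbasis of $C = i[H_-, H_+]$. The random Hermitian generators and the helper names (`random_hermitian`, `unitary`, `gamma_matrix`) are hypothetical choices made purely for illustration.

```python
import numpy as np


def random_hermitian(dim, rng):
    """Random Hermitian matrix (stand-in for a generic clock generator)."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (a + a.conj().T) / 2


def unitary(h, t):
    """exp(-i h t) for Hermitian h, computed via eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * w * t)) @ v.conj().T


def gamma_matrix(h_minus, h_plus, t, dt):
    """Assumed form: Gamma[m', m] = |<m'(t+dt)| U_+(dt) |m(t)>|^2."""
    c = 1j * (h_minus @ h_plus - h_plus @ h_minus)
    c = (c + c.conj().T) / 2                       # remove rounding asymmetry
    _, basis = np.linalg.eigh(c)                   # columns are the |m>
    orbit_t = unitary(h_minus, t) @ basis          # |m(t)>
    orbit_next = unitary(h_minus, t + dt) @ basis  # |m(t+dt)>
    amp = orbit_next.conj().T @ unitary(h_plus, dt) @ orbit_t
    return np.abs(amp) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h_m, h_p = random_hermitian(4, rng), random_hermitian(4, rng)
    gamma = gamma_matrix(h_m, h_p, t=0.3, dt=0.01)
    print("column sums:", gamma.sum(axis=0))
    print("row sums:   ", gamma.sum(axis=1))
    print("diagonal:   ", np.diag(gamma))
```

Because the amplitudes are matrix elements of a unitary between two orthonormal bases, both the row and the column sums equal one for any choice of generators, and for small `dt` the diagonal entries stay close to one.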
In the selective operation mode of the engine, such a "misfire" of the engine, a measurement outcome $m \neq d$, triggers an abortion of the current engine cycle and a feedback procedure that resets the engine for a new cycle. The qubit is immediately decoupled from the bath to avoid further thermalisation. Further, we need to reset the qubit to the $|\psi\rangle$ state to ensure that the clock rotates along the clock orbit and ends up in a state where it can be restored to the reference orbit without any further energy cost. The flip operation is given by the unitary $U_f := (|\psi\rangle\langle\bar\psi| + |\bar\psi\rangle\langle\psi|) \otimes 1$ and takes the state $\rho_m(t+dt)$ to $\rho'_m(t+dt) = |\psi\rangle\langle\psi| \otimes |m(t+dt)\rangle\langle m(t+dt)|$. This flip is not energy conserving and its cost has to be taken into account (see below). Since the qubit is now in the $|\psi\rangle$ state again, we can allow the system to freely evolve for a duration $\tau - (t+dt)$, resulting in a state $|\psi\rangle\langle\psi| \otimes |m(\tau)\rangle\langle m(\tau)|$. Now, due to condition (i), we can restore the clock to the reference orbit at $\chi(0)$ for free, and the joint system is ready to begin a new cycle.

Measurement in the unselective mode
In the case of the unselective operation mode the state after measurement depends both on the measurement outcome $m'$ at time $t+dt$ and on the previous measurement outcome $m$ at time $t$. Explicitly, the post-measurement state after observing $m'$ is given by
\[ \rho_{m'}(t+dt|m) = \frac{p_0^{(m)}(t)\,\delta_{m'm}\,|\psi\rangle\langle\psi| + p_1^{(m)}(t)\,\Gamma_{m'm}(t,dt)\,|\bar\psi\rangle\langle\bar\psi|}{p(m'|m)} \otimes |m'(t+dt)\rangle\langle m'(t+dt)| \]
with probability
\[ p(m'|m) = p_0^{(m)}(t)\,\delta_{m'm} + p_1^{(m)}(t)\,\Gamma_{m'm}(t,dt). \tag{B9} \]
If the engine is run in the unselective mode, it is kept running regardless of the specific outcome $m'$, and the measurement is followed by the next UP, beginning with a new thermalisation to the state $\rho(t+dt|m')$, eq. (B3), which destroys the information of the previous measurement, resulting in a Markov process.
in eq. (C2), leading to a positive energy flow into the apparatus$^6$. However, in the case of a misfire event, a measurement outcome $m \neq d$, we have a very different energy flow, specifically The dash indicates that this is only part of the energy flow associated with this event (see below). Even though the expectation value of $H_+$ with respect to the state $|m(t+dt)\rangle$ is generally less than that with respect to $|d(t)\rangle$, the fact that $0 < p_1(t) \leq \frac{1}{2}$ implies that in many cases an engine misfire implies a back-flow of energy from the apparatus into the quantum system$^7$. Additionally applying the feedback and flipping the qubit via the unitary $U_f$ will lead to an additional energy cost (which we assume is taken from the work stored in the apparatus) (C6) Taking this feedback cost into account, the total energy exchange between quantum system and measurement apparatus given a measurement outcome $m \neq d$ is Also taking into account the cost of resetting the memory, $W_{\rm reset}(t,dt) = \beta^{-1} S(p(t))$, given in eq. (9), we arrive at the average work output of the UP starting at time $t$ in the selective mode of We can split this up and define the work associated with outcome $m$ as such that $dW = \sum_m p_m\, dW_m$. Using this, we find for the total work output of the selective engine averaged over a full engine cycle where the first term corresponds to the work extracted in the case of each of the $N$ unit protocols succeeding, the second term corresponds to the work extracted up to a misfire at the $k$-th UP (which also aborts the cycle), and the final term contains the energy flow of the misfire at the $k$-th UP itself, all weighted by the respective probabilities of these events occurring.
For the unselective engine mode, we find the energy associated with a transition from the clock orbit $m$ at time $t$ to the orbit $m'$ at time $t+dt$ (c.f. eq. (11)) as $\langle m'|H_+|m'\rangle$, (C11) where we have omitted the explicit time dependence for notational brevity.

$^6$ Note that this is not strictly true for every UP in general if the clock evolves along a complicated trajectory (i.e. if the level splitting $\Delta(t)$ is not monotonically decreasing), but is always true on average over the interval $\bar\tau < t < \tau$.
$^7$ Note that again $0 < p_1(t) \leq \frac{1}{2}$ is not necessarily true in general for complex clock evolution, since $\Delta(t)$ can in principle become negative, but it holds on average over the interval $\bar\tau < t < \tau$.
The probability of a complete engine trajectory $\mathbf{m} = \{m_{\bar\tau}, m_{\bar\tau+dt}, \ldots, m_\tau\}$ over all $N$ unit protocols is where the inclusion of $m_{\bar\tau}$ in the trajectory allows for the scenario in which the clock begins the protocol on an orbit other than the reference orbit. The total energy flow over this trajectory is Finally, at the end of the engine cycle we have to reset the memory, which is associated with a work cost $W_{\rm reset} = \beta^{-1} S(\{p(\mathbf{m})\})$, such that the total average work output of the unselective engine is given by where we have averaged over all possible engine trajectories.

Heat
During step (a) of each UP the qubit thermalises with respect to its induced local Hamiltonian by interacting with the thermal reservoir at inverse temperature $\beta$. Before the thermalisation at time $t+dt$ the qubit is in the state
\[ \rho_S(t+dt) = p_0^*\,|\psi\rangle\langle\psi| + p_1^*\,|\bar\psi\rangle\langle\bar\psi|, \qquad p_0^* = \frac{p_0(t)}{p_d(t,dt)}, \quad p_1^* = \frac{p_1(t)\,\Gamma_{dd}(t,dt)}{p_d(t,dt)}, \]
as can be seen from eq. (B8). Since in general $p_0(t+dt) < p_0(t)$ and $p_1(t+dt) > p_1(t)$ (since $\Delta(t+dt) < \Delta(t)$), and $p_0^* > p_0(t)$ and $p_1^* < p_1(t)$, we see that the interaction of the qubit with the machine, and the back-action of the measurement process on the machine, actually drive the qubit even further away from its thermal state at time $t+dt$ than it would have been otherwise, leading to an increased heat flow required to thermalise the qubit. Specifically the heat flow during the thermalisation at time $t+dt$ is given by The heat flow in the classical case is given by the same expression but without the ratio $\Gamma_{dd}(t,dt)/p_d(t,dt) < 1$, showing that the fully quantum-mechanical protocol has a higher heat flow associated with it. However, as we shall show below, this ratio approaches 1 in the Zeno limit, such that the classical result can be recovered even for finite-size quantum machines, as long as they can be stabilised infinitely fast. A similar result can be derived for the unselective mode of operation.

Appendix D: Zeno Limit derivations
In this section we derive the results for the Zeno limit dt → 0, in which the clock is stabilised with infinite fidelity, and energy flows constantly out of the quantum machine into the measurement apparatus.
Let us first consider the quantity $\Gamma_{m'm}$ defined in eq. (B7). We can rewrite it as
\[ \Gamma_{m'm}(t,dt) = |\langle m'(t)|U_-^\dagger U_+|m(t)\rangle|^2. \]
Expanding the exponential functions and ignoring terms of order $O(dt^2)$ or higher we have
\[ \Gamma_{m'm}(t,dt) = \delta_{m'm} + O(dt^2). \]
Thus the probability of the clock staying on a certain orbit $m$, despite the deviation-inducing evolution generated by $H_+$, is equal to unity up to first order in $dt$. Crucially this includes the clock orbit $d$, such that $\Gamma_{dd}(t,dt) = 1 - O(dt^2)$. Having established the limiting value of this central quantity, we can now consider the probability distribution over measurement outcomes $p(t)$. Specifically, we are interested in the probability associated with the clock being projected back onto its reference orbit, assuming it started the UP in this orbit, i.e. the probability $p_d(t,dt)$, eq. (B6), in the selective case, and $p(d|d)$, eq. (B9), in the unselective case. We first note that $p_d(t,dt) = p(d|d)$. We have
\[ p_d(t,dt) = p_0(t) + p_1(t)\,\Gamma_{dd}(t,dt) = 1 - O(dt^2). \]
Thus in the Zeno limit we are guaranteed to project back onto the reference orbit and the selective and unselective protocols become equivalent. From this result it also immediately follows that the cost of resetting the memory vanishes for both operation modes as $W_{\rm reset} = O(dt^2)$. Thus from eq. (10) we have
\[ \lim_{dt \to 0} dW(t,dt) = dE_d(t,dt). \tag{D4} \]
To find an expression for $dE_d(t,dt)$ in the $dt \to 0$ limit we see from eqs. (C2) and (C3) that we need to evaluate the expression $\frac{\Gamma_{dd}(t,dt)}{p_d(t,dt)}\, U_-^\dagger H_+ U_-$. For the ratio we find
\[ \frac{\Gamma_{dd}(t,dt)}{p_d(t,dt)} = 1 - O(dt^2), \]
from which it follows that
\[ \frac{\Gamma_{dd}(t,dt)}{p_d(t,dt)}\, U_-^\dagger H_+ U_- = H_+ + i[H_-, H_+]\,dt + O(dt^2). \]
Finally, substituting this expression into eq. (C2) and the result into eq. (D4) we arrive at the result for $dW(t,dt)$, eq. (13).
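The quadratic scaling of the deviation probability can be checked numerically in the smallest possible case. The sketch below assumes a spin-1/2 clock with $H_\pm = \frac{1}{\sqrt{2}}(L_y \pm L_z)$ and $L_k = \sigma_k/2$, takes the reference state to be the maximum-eigenvalue eigenstate of $C = i[H_-, H_+]$, and evaluates the probability of remaining on the reference orbit as $|\langle d(dt)|U_+(dt)|d(0)\rangle|^2$. The function names are hypothetical and the example is illustrative only.

```python
import numpy as np

SIGMA_Y = np.array([[0, -1j], [1j, 0]])
SIGMA_Z = np.array([[1, 0], [0, -1]], dtype=complex)
# Spin-1/2 clock generators, H_pm = (L_y +/- L_z)/sqrt(2) with L_k = sigma_k/2
H_MINUS = (SIGMA_Y - SIGMA_Z) / (2 * np.sqrt(2))
H_PLUS = (SIGMA_Y + SIGMA_Z) / (2 * np.sqrt(2))


def unitary(h, t):
    """exp(-i h t) for Hermitian h, computed via eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * w * t)) @ v.conj().T


def stay_probability(dt):
    """Probability of remaining on the reference orbit after one step dt."""
    c = 1j * (H_MINUS @ H_PLUS - H_PLUS @ H_MINUS)
    _, vecs = np.linalg.eigh(c)
    d0 = vecs[:, -1]                       # maximum-eigenvalue eigenstate of C
    d_next = unitary(H_MINUS, dt) @ d0     # reference orbit after dt
    amp = np.vdot(d_next, unitary(H_PLUS, dt) @ d0)
    return abs(amp) ** 2


if __name__ == "__main__":
    for dt in (0.08, 0.04, 0.02, 0.01):
        deficit = 1.0 - stay_probability(dt)
        print(f"dt = {dt:5.2f}   1 - Gamma_dd = {deficit:.3e}   ratio to dt^2 = {deficit / dt**2:.4f}")
```

Halving `dt` should reduce the deficit by roughly a factor of four, which is the numerical signature of $\Gamma_{dd} = 1 - O(dt^2)$.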
Let us now focus on the free energy change of the qubit to prove eq. (14), showing that in the Zeno limit the entire free energy difference can be extracted as work. Given the fact that we fixed the qubit's $|\psi\rangle$ state to zero energy via condition (ii), the partition function at time $t$ is given by $Z(t) = 1 + e^{-\beta\Delta(t)}$, where $\Delta(t) = \mathrm{tr}[\chi(t) H_+]$ and where we have assumed that the clock is on the clock orbit $\chi(t)$, as we showed is always the case in the Zeno limit. The change in free energy of the qubit is thus
\[ dF(t) = -\beta^{-1}\, d\ln Z(t) = -\frac{1}{\beta Z(t)} \frac{dZ(t)}{dt}\, dt. \tag{D7} \]
Differentiating the partition function with respect to $t$ we get
\[ \frac{dZ(t)}{dt} = -\beta\, e^{-\beta\Delta(t)}\, \frac{d\Delta(t)}{dt}. \]
The energy splitting of the qubit varies in time as
\[ \frac{d\Delta(t)}{dt} = \mathrm{tr}\big[\chi(t)\, i[H_-, H_+]\big]. \]
Finally, recognising that $e^{-\beta\Delta(t)}/Z(t) = p_1(t)$ and substituting everything into eq. (D7), we see that $dF(t) = -dW(t,dt)$ up to first order in $dt$, which concludes the proof of eq. (14).

Appendix E: Detailed analysis of the spin-clock
As outlined in the main text, for the specific example of a spin-$l$ system acting as quantum machine/clock we choose the Hamiltonian such that $H_\pm = \frac{1}{\sqrt{2}}(L_y \pm L_z)$, where $L_k$ is the angular momentum operator of the spin-$l$ particle along the $k$-axis. The angular momentum operators $L_k$ clearly are SU(2) generators, and the operators $H_\pm$ can also be seen as angular momentum operators defining a new coordinate system. The third SU(2) generator can be found via the commutation relation
\[ C = i[H_-, H_+] = -L_x. \]
Thus we see that this new coordinate system has essentially been flipped along the $x$-axis as well as rotated in the $y$-$z$-plane. The ideal clock state we want to choose is the maximum-eigenvalue eigenstate of $C$, i.e. a spin fully polarised along the negative $x$-direction. We define the eigenbasis $\{|m\rangle\}_{-l \leq m \leq l}$ such that $C|m\rangle = m|m\rangle$. Note that here we use a slightly different convention from the remainder of the text, where $1 \leq m \leq d$. This is more suited to the angular momentum eigenbases. The state on the reference clock orbit at time $t$ is thus given by
\[ |d(t)\rangle = e^{-iH_- t}\,|m = l\rangle. \]
Considering the rotation generated by $U_-$ (or using the more formal Wigner matrix derivation introduced below), it is straightforward to show that the level splitting of the qubit induced by a clock on this orbit is simply
\[ \Delta(t) = l \sin t. \tag{E4} \]
We see that the maximum level splitting that this clock can induce is $\Delta_{\max} = l$ at a time $\bar\tau = \pi/2$. Note also that even though the period of the clock is technically $\tau = 2\pi$, it is preferable to stop the machine earlier at $\tau = \pi$, since the qubit is degenerate again at this time with respect to all clock orbits, and keeping the engine running for the remaining period would at best (namely in the Zeno limit) lead to no additional work gain. Even though this was not explicitly stated in the main text, whenever the qubit is degenerate with respect to all clock orbits for some time $t = \tau' < \tau$ it is preferable to stop the engine there and "run the engine in neutral" for an additional time $\tau - \tau'$ to get back to the original setup.
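The two statements above, $C = -L_x$ and $\Delta(t) = l \sin t$, can be verified numerically for small spins. The sketch below is a minimal check under the stated choice $H_\pm = \frac{1}{\sqrt{2}}(L_y \pm L_z)$, with $\hbar = 1$ and the reference state taken as the maximum-eigenvalue eigenstate of $-L_x$. The helper names are hypothetical.

```python
import numpy as np


def spin_operators(l):
    """Return (Lx, Ly, Lz) for spin l in the Lz eigenbasis, hbar = 1."""
    dim = int(round(2 * l)) + 1
    m = l - np.arange(dim)                     # m = l, l-1, ..., -l
    lz = np.diag(m).astype(complex)
    raising = np.zeros((dim, dim), dtype=complex)
    for k in range(1, dim):
        # <m+1| L_+ |m> = sqrt(l(l+1) - m(m+1))
        raising[k - 1, k] = np.sqrt(l * (l + 1) - m[k] * (m[k] + 1))
    lx = (raising + raising.conj().T) / 2
    ly = (raising - raising.conj().T) / 2j
    return lx, ly, lz


def unitary(h, t):
    """exp(-i h t) for Hermitian h, computed via eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * w * t)) @ v.conj().T


def clock_generators(l):
    """H_minus and H_plus for the spin-l clock."""
    _, ly, lz = spin_operators(l)
    return (ly - lz) / np.sqrt(2), (ly + lz) / np.sqrt(2)


def third_generator_is_minus_lx(l):
    """Check that C = i[H_-, H_+] equals -L_x."""
    lx, _, _ = spin_operators(l)
    h_minus, h_plus = clock_generators(l)
    c = 1j * (h_minus @ h_plus - h_plus @ h_minus)
    return bool(np.allclose(c, -lx))


def level_splitting(l, t):
    """Delta(t) = <d(t)| H_+ |d(t)> on the reference orbit."""
    lx, _, _ = spin_operators(l)
    h_minus, h_plus = clock_generators(l)
    _, vecs = np.linalg.eigh(-lx)
    d_t = unitary(h_minus, t) @ vecs[:, -1]    # rotate the state with C = +l
    return float(np.real(np.vdot(d_t, h_plus @ d_t)))


if __name__ == "__main__":
    for l in (0.5, 1, 2):
        print(l, third_generator_is_minus_lx(l),
              [round(level_splitting(l, t), 6) for t in (0.0, np.pi / 2, np.pi)])
```

For each spin the printed splittings should follow $l \sin t$, vanishing at $t = 0$ and $t = \pi$ and peaking at $t = \pi/2$.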
For the spin-clock we can explicitly evaluate the quantity $\Gamma_{m'm}$ defined in eq. (B7). Starting from the definition we have We have Finally, noting that $|m\rangle = e^{-iH_+\pi/2}|m_-\rangle$ and using the D-matrix result eq. (E6) again, we have $\langle k_-|m\rangle = \langle k_-|e^{-iH_+\pi/2}|m_-\rangle = D_{km}(0, \frac{\pi}{2}, 0) = d_{km}(\frac{\pi}{2})$ and similarly $\langle m'|k_-\rangle = d_{m'k}(-\frac{\pi}{2})$. Hence we arrive at This might not look more illuminating than the original expression, but the d-matrices are well-known functions and can easily be evaluated computationally, allowing us to explicitly calculate results for the spin-clock example. We used this expression to generate the results shown in Figures 2 and 3. All other results follow by simply substituting this result into the relevant expressions. Let us conclude the analysis of the spin-clock example with the work output in the Zeno limit. Starting from $\Delta(t) = l \sin t$ we have
\[ dF(t) = p_1(t)\, l \cos(t)\, dt, \tag{E10} \]
which upon integration over $\frac{\pi}{2} \leq t \leq \pi$ yields
\[ W = kT \log 2 - kT \log\left(1 + e^{-\beta l}\right). \]
This is the maximum work that can be extracted from a pure qubit if one is limited to utilising a spin-$l$ system as a clock/machine and a thermal reservoir at inverse temperature $\beta = 1/kT$. We see that the semi-classical $W = kT \log 2$ can only be recovered for infinitely large clocks $l \to \infty$ (or for zero temperature $\beta \to \infty$).
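As a quick consistency check of the Zeno-limit work for the spin clock, one can integrate eq. (E10) numerically and compare with the closed form $W = kT[\log 2 - \log(1 + e^{-\beta l})]$ that follows from $Z = 1 + e^{-\beta\Delta}$ with $\Delta(t) = l \sin t$. The sketch below uses a simple midpoint rule. The function names and the step count are arbitrary choices for illustration.

```python
import math


def free_energy_rate(t, l, beta):
    """dF/dt = p1(t) * l * cos(t), with Delta(t) = l sin(t)."""
    delta = l * math.sin(t)
    boltzmann = math.exp(-beta * delta)
    p1 = boltzmann / (1.0 + boltzmann)
    return p1 * l * math.cos(t)


def zeno_work_numeric(l, beta, steps=20000):
    """W = -integral of dF over pi/2 <= t <= pi (midpoint rule)."""
    a, b = math.pi / 2, math.pi
    h = (b - a) / steps
    total = sum(free_energy_rate(a + (k + 0.5) * h, l, beta) for k in range(steps))
    return -h * total


def zeno_work_closed(l, beta):
    """Closed form kT [log 2 - log(1 + exp(-beta l))]."""
    return (math.log(2.0) - math.log(1.0 + math.exp(-beta * l))) / beta


if __name__ == "__main__":
    beta = 1.0
    for l in (0.5, 1, 2, 5, 20):
        print(l, zeno_work_numeric(l, beta), zeno_work_closed(l, beta),
              math.log(2.0) / beta)
```

The last printed column is the classical value $kT \log 2$, which the spin-clock result approaches from below as $l$ grows.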

Appendix F: Simulating different thermalisation regimes
As noted in the main text, to obtain the analytic results we have to make the assumption that the thermalisation of the qubit takes place at the beginning of each unit protocol, right after the measurement of the preceding unit protocol. Thus evolution (b) and thermalisation (a) are in a sense non-interacting, separated by the measurement (c). In this section we present simulation results that do not rely on this assumption but instead model non-trivial interactions between thermalisation and evolution, and hence show that the approximation is qualitatively robust in all the regimes considered.
We consider two ways of avoiding the assumption. In the first one, we simply split each unit protocol further into $n_\beta$ sub-units of duration $dt' = dt/n_\beta$, each consisting of thermalisation of the qubit followed by free evolution of the joint system for duration $dt'$. The measurement is still only performed once per UP, at the end. Note that whereas in the main text the qubit always thermalises with respect to the local Hamiltonian induced by the reference clock orbit $\chi(t)$, the thermalisations in between $t$ and $t+dt$ are with respect to the local Hamiltonian induced by the clock's momentary state, which in general deviates from the reference orbit between measurements. The more sub-units $n_\beta$ we consider, the more quasi-static the process becomes, as the qubit transitions more and more smoothly from one thermal state to the next. Due to the immediate back-action onto the clock by the joint evolution, this can be seen as the machine scanning the thermal distribution with a higher fidelity. In the limit $n_\beta \to \infty$ the process becomes a quasi-static equilibrium process and the work output is maximised. The results for the optimal scenario in which each measurement succeeds are shown in Fig. 4 for a spin-$l$ clock with stabilisation interval $dt = 0.05$ for different numbers of sub-units $n_\beta$. Note that the analytic result (which is equivalent to $n_\beta = 1$ in the simulations) is not necessarily more or less realistic than the results for higher $n_\beta > 1$, but can be seen as a non-equilibrium result similar to a finite thermalisation time of the qubit. To consider an even more realistic model of non-equilibrium behaviour, we can further introduce the notion that during each thermalisation stage the qubit is not instantaneously transformed into a Gibbs state, but instead undergoes an equilibration with a bosonic bath at a finite rate, evolving according to a standard master equation [73].
FIG. 5. Ideal work output of a machine with a spin-$l$ clock and stabilisation interval $dt = 0.05$, plotted against clock size $l$ for $n_\beta = 5$ finite-rate equilibrations per unit protocol with different effective equilibration times $\tau_\beta$. The analytic result with $n_\beta = 1$ and $\tau_\beta \to \infty$ is shown for comparison. Faster equilibration (larger $\tau_\beta$) implies an evolution of the qubit closer to equilibrium and thus larger work output. Note that for large $l$ the qubit's upper level drops faster, so that the qubit is further from equilibrium for the same equilibration rate, leading to reduced work output for large $l$.

The mean bosonic occupation number $\bar n$ is given by $\bar n = (e^{\beta\omega} - 1)^{-1}$ for a mode of frequency $\omega$, and we assume that at any time $t$ the qubit only couples to the mode with which it is in resonance, i.e. for which $\omega = \Delta(t)$. We further define the clock's states with respect to the qubit's $|\psi\rangle$ and $|\bar\psi\rangle$ states as respectively. Using this notation it can be shown that equilibration of the qubit with a bosonic bath for an effective duration $\tau_\beta$ takes the joint qubit-machine state $\rho_{SM}$ to
where the $C_{x \to y}$ transition coefficients are given by
\[ C_{\psi \to \psi} = \frac{e^{-(2\bar n + 1)\tau_\beta}\, \bar n + \bar n + 1}{2\bar n + 1}, \tag{F5} \]
\[ C_{\bar\psi \to \psi} = -\frac{e^{-(2\bar n + 1)\tau_\beta}\, (\bar n + 1) - (\bar n + 1)}{2\bar n + 1}, \tag{F6} \]
\[ C_{\psi \to \bar\psi} = -\frac{e^{-(2\bar n + 1)\tau_\beta}\, \bar n - \bar n}{2\bar n + 1}, \tag{F7} \]
\[ C_{\bar\psi \to \bar\psi} = \frac{e^{-(2\bar n + 1)\tau_\beta}\, (\bar n + 1) + \bar n}{2\bar n + 1}. \tag{F8} \]
In the limit $\tau_\beta \to \infty$ this model of equilibration corresponds to the instantaneous transformation to the Gibbs state considered above, but for finite $\tau_\beta$ the reduced state of the qubit will in general "lag behind" the Gibbs state. Using this finite-time equilibration in combination with the previous notion of breaking each UP into multiple sub-units of equilibration and evolution before the final measurement allows us to model very realistic non-equilibrium behaviour. Figure 5 shows the results for the spin-clock model for $n_\beta = 5$ equilibration events and different effective thermalisation times $\tau_\beta$. The analytic result corresponding to $n_\beta = 1$ and $\tau_\beta \to \infty$ is also shown for comparison. We see that for very large $l$ the work output of the machine is drastically reduced and approaches zero in the limit of large $l$. This is due to the fact that in this specific model the level splitting of the qubit is $\Delta(t) = l \sin t$, and thus the larger $l$, the faster the level splitting changes, requiring longer effective equilibration times $\tau_\beta$ to keep the qubit close to its respective Gibbs state. This is in some sense equivalent to the notion encountered in the analytic results in the main text that a larger clock (i.e. faster change in level splitting) requires the machine to sample the qubit's thermal distribution at a higher rate in order to get a good work output. We again clearly see the tradeoff between maximising work output and maximising power. If we want to achieve an optimal work output we have to slow down the system dynamics (which in effect increases the ratio of effective equilibration time $\tau_\beta$ to the change in level splitting $d\Delta/dt$), which in turn reduces the power output of our engine.
Conversely, if we increase the power by increasing the rate of the system dynamics, we end up further away from the Zeno limit, in which we can convert the entire free energy difference of the qubit into work, thus sacrificing potential work output. Even though the exact quantitative result strongly depends on the model parameters, we see from the results of this section that the analytic result based on the assumption of a single instantaneous thermalisation during each UP qualitatively contains all the core features, and even quantitatively captures the results for certain realistic thermalisation regimes of non-equilibrium dynamics. In particular, as we approach the Zeno limit all results converge exactly, provided we assume a very strong coupling between qubit and bath, such that the qubit always remains (approximately) in thermal equilibrium.
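The finite-rate equilibration model of this appendix can be sketched in a few lines. The code below assumes the transition coefficients have the standard two-level form with relaxation rate $2\bar n + 1$, as read from eqs. (F5), (F6) and (F8), with the remaining coefficient fixed by probability conservation, and $\bar n = (e^{\beta\omega} - 1)^{-1}$. The dictionary keys `"psi"` and `"psi_bar"` and the function names are hypothetical labels.

```python
import math


def bose_occupation(omega, beta):
    """Mean bosonic occupation number of a mode at frequency omega."""
    return 1.0 / math.expm1(beta * omega)


def transition_coefficients(n_bar, tau_beta):
    """C[(x, y)]: probability that qubit level x ends in level y after tau_beta."""
    decay = math.exp(-(2 * n_bar + 1) * tau_beta)
    norm = 2 * n_bar + 1
    return {
        ("psi", "psi"): (decay * n_bar + n_bar + 1) / norm,
        ("psi_bar", "psi"): (n_bar + 1) * (1 - decay) / norm,
        ("psi", "psi_bar"): n_bar * (1 - decay) / norm,   # fixed by normalisation
        ("psi_bar", "psi_bar"): (decay * (n_bar + 1) + n_bar) / norm,
    }


def excited_population(p1_initial, n_bar, tau_beta):
    """Population of the upper level after equilibrating for tau_beta."""
    c = transition_coefficients(n_bar, tau_beta)
    return (1 - p1_initial) * c[("psi", "psi_bar")] + p1_initial * c[("psi_bar", "psi_bar")]


if __name__ == "__main__":
    beta, omega = 1.0, 1.5
    n_bar = bose_occupation(omega, beta)
    gibbs_p1 = math.exp(-beta * omega) / (1 + math.exp(-beta * omega))
    for tau_beta in (0.0, 0.5, 2.0, 50.0):
        print(tau_beta, excited_population(0.0, n_bar, tau_beta), gibbs_p1)
```

Starting from the pure lower level, the excited population rises monotonically from zero towards the Gibbs value $p_1$, illustrating how the qubit "lags behind" the Gibbs state at finite $\tau_\beta$.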
Appendix G: Fuelling the engine with mixed states

In this section we consider a slight variation of the selective engine mode discussed in the main text and show how the engine behaves if, instead of pure states, we try to fuel it with (partially) mixed states. Consider the machine as a black-box system which takes some state $\rho_S$ as an input, generates work, and outputs a new state $\rho_S^*$. Given the ideal input $\rho_S = |\psi\rangle\langle\psi|$ for which the engine was designed, we get as the output the state

$$\rho_S^* = q\,\frac{\mathbb{1}}{2} + (1 - q)\,|\psi\rangle\langle\psi|,$$

where $q = \prod_{n=1}^{N} p_d(\tau + n\,dt)$ is the probability that all $N$ unit protocols succeed and we finish the cycle with a maximally mixed qubit in the state $\mathbb{1}/2$, and $(1 - q)$ is the probability that some UP fails and we perform the feedback process which returns the qubit to the original state $|\psi\rangle\langle\psi|$. The work output associated with this state transformation $\rho_S \to \rho_S^*$ is $W$, given by eq. (C10). The machine itself is unchanged and will, by the design of the protocol, always be in the state $\chi(t)$ after outputting $\rho_S^*$, thus effectively acting as a catalyst.
But we can also ask what happens if, instead of inputting the pure state $\rho_S$ into the engine, we try to feed the engine its own output state, the mixed state $\rho_S^*$. We can rewrite the state as

$$\rho_S^* = \left(1 - \frac{q}{2}\right)|\psi\rangle\langle\psi| + \frac{q}{2}\,|\psi^\perp\rangle\langle\psi^\perp|,$$

where $|\psi^\perp\rangle$ is the qubit state orthogonal to $|\psi\rangle$. If we feed this state into the engine, the initial stage for times $0 \le t \le \tilde{\tau}$ is no longer just the trivial induction of a level splitting in the qubit, but also leads to a deviation of the clock from its reference orbit. Thus we need to introduce one additional measurement before the first thermalisation to project the clock back onto its reference orbit (see footnote 8). We denote by $P^{(1)}$ the probability that this measurement fails to project the clock back onto $\chi(t)$. If this event occurs, the engine immediately starts a feedback process which returns the qubit to the state $|\psi\rangle\langle\psi|$ at the end of the cycle. Otherwise, since the clock is back on its reference orbit at time $t = \tilde{\tau}$ and the qubit gets thermalised just as it would have on input of the ideal state $|\psi\rangle\langle\psi|$, the machine proceeds for the remainder of the cycle, $\tilde{\tau} < t \le \tau$, just as in the original protocol. As noted above, this second part, containing the actual work extraction, has a misfire and feedback event with probability

$$P^{(2)} = 1 - \prod_{n=1}^{N} p_d(\tau + n\,dt).$$

Putting both parts together, the chance of the machine entering the feedback procedure at any point, and thus returning the pure state $|\psi\rangle$, is $p(|\psi\rangle) = P^{(1)} + (1 - P^{(1)})P^{(2)}$, whereas the engine completes the full cycle and outputs the maximally mixed state $\mathbb{1}/2$ with probability $p(\mathbb{1}/2) = (1 - P^{(1)})(1 - P^{(2)})$.
This allows us to ask for which $q = q^*$ the machine outputs the same state that it got as its input. This condition is met when $q^* = p(\mathbb{1}/2)$. In general this quantity depends on the specific machine, but we can explicitly evaluate it in the case of the spin clock in the infinite size limit $d \to \infty$. In this case we have $\Gamma_{dd}(0, \tilde{\tau}) \to 0$ and $\prod_{n=1}^{N} p_d(\tau + n\,dt) \to 1$, such that after rearranging we find $q^* = 2/3$. One might intuitively expect that if the engine returns the same state that it got as an input, then at best it has zero net work output. By explicitly calculating the relevant expressions it can be shown that for the stationary state with $q = 2/3$, in the limit of a machine with $d \to \infty$, the energy transferred to the measurement apparatus is $\Delta E = \frac{2}{3} kT \log 2$, whereas the resetting cost of the memory is $W_{\rm reset} \ge kT\left(\log 3 - \frac{2}{3}\log 2\right)$, such that $\Delta E - W_{\rm reset} \le kT\left(\frac{4}{3}\log 2 - \log 3\right) < 0$. Thus the net work output in this scenario is strictly negative. More realistic machines with finite $d$ have even lower work output. This result should not be very surprising, since a state with $q = 2/3$ is closer to being maximally mixed than to being pure.
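The stationary value and the sign of the net work can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the $d \to \infty$ limit, in which the first stabilising measurement is assumed to fail with probability $q/2$ and the work-extraction stage never misfires, and it uses the lower bound on the reset cost. The function names are hypothetical.

```python
import math


def binary_entropy(x):
    """Shannon entropy (natural log) of the distribution {x, 1 - x}."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def p_first_fail(q):
    """Assumed d -> infinity limit of P^(1): half of the mixed weight q."""
    return q / 2.0


def stationary_q(p_second_fail=0.0, iterations=100):
    """Iterate q -> (1 - P1(q)) * (1 - P2) until it settles on its fixed point."""
    q = 0.5
    for _ in range(iterations):
        q = (1.0 - p_first_fail(q)) * (1.0 - p_second_fail)
    return q


def net_work(q):
    """(Delta E - W_reset) / kT in the same limit, with W_reset at its bound."""
    p1 = p_first_fail(q)
    return (1.0 - p1) * math.log(2.0) - binary_entropy(p1)


if __name__ == "__main__":
    q_star = stationary_q()
    print(q_star, net_work(q_star))
```

With these assumptions the iteration settles at $q^* = 2/3$ and the net work is $\left(\frac{4}{3}\log 2 - \log 3\right) kT$, roughly $-0.174\,kT$. Passing a nonzero `p_second_fail` gives a rough feel for how misfires in the extraction stage shift the fixed point.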
Instead of asking which state is stationary under the action of the machine, we can also ask which state leads to zero net work output, such that all states more pure than this state would result in positive work. We know that such a state has to be less mixed than the stationary state, i.e. have $q < q^*$. The exact value will again depend strongly on the specific clock and machine used, but we can once more consider the classical equilibrium limit of an infinitely large clock, $d \to \infty$, with the qubit kept in equilibrium with the bath. Assuming the engine gets that far, the actual work extraction stage in this limit always succeeds, outputting an amount $kT \log 2$ of work. Thus everything comes down to the probability $P^{(1)}$ of the first stabilising measurement at $t = \tilde{\tau}$ failing or succeeding. The measurement itself can be shown to induce zero energy change on average (although each individual measurement result has a different energy flow associated with it). Hence the total average energy transferred by the engine is $\Delta E = (1 - P^{(1)})\,kT \log 2$, which in this specific limit is equivalent to $\Delta E = \left(1 - \frac{q}{2}\right) kT \log 2$. The memory erasure cost during the actual work extraction stage $\tilde{\tau} < t \le \tau$ vanishes in this limit, so the only erasure cost required is that of the initial measurement at $t = \tilde{\tau}$, which is given by $W_{\rm reset} = kT\, S(\{P^{(1)}, 1 - P^{(1)}\}) = kT\, S(\{\frac{q}{2}, 1 - \frac{q}{2}\})$. Hence the state with zero net work output has $q = \tilde{q}$, which satisfies

$$\left(1 - \frac{\tilde{q}}{2}\right)\log 2 = S\!\left(\left\{\frac{\tilde{q}}{2},\, 1 - \frac{\tilde{q}}{2}\right\}\right).$$

This equation can be solved numerically to yield $\tilde{q} \approx 0.454$. For any $q < \tilde{q}$ the machine has a net positive work output, whereas for more mixed states with $q > \tilde{q}$ the work output is negative. For realistic machines away from the limit of infinite $d$ and perfect thermalisation, we require even smaller $q$ (i.e. more pure states) for a positive work output.
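The zero-work condition, $\Delta E = W_{\rm reset}$ with $\Delta E = (1 - q/2)\,kT\log 2$ and $W_{\rm reset} = kT\,S(\{q/2, 1 - q/2\})$, is easy to solve numerically. The sketch below uses plain bisection, which is a choice made here for illustration and not necessarily the method used to obtain the quoted value. The function names are hypothetical, and the entropy is taken with the natural logarithm.

```python
import math


def binary_entropy(x):
    """Shannon entropy (natural log) of the distribution {x, 1 - x}."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def net_work(q):
    """(Delta E - W_reset) / kT in the d -> infinity, perfect-thermalisation limit."""
    return (1.0 - q / 2.0) * math.log(2.0) - binary_entropy(q / 2.0)


def zero_work_q(lo=1e-9, hi=1.0, tol=1e-12):
    """Bisection on net_work, which is positive near q = 0 and negative at q = 1."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_work(mid) > 0.0:
            lo = mid  # still net positive work: the root lies at larger q
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(round(zero_work_q(), 3))
```

Bisection suffices because the net work decreases monotonically in $q$ on $(0, 1)$, so the sign change brackets a single root.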