Feedback controlled heat transport in quantum devices: Theory and solid state experimental proposal

A theory of feedback controlled heat transport in quantum systems is presented. It is based on modelling heat engines as driven multipartite systems subject to projective quantum measurements and measurement-conditioned unitary evolutions. The theory unifies various results presented in the previous literature. Feedback control breaks time-reversal invariance, and as a result the fluctuation relation is no longer obeyed. It is restored by an appropriate accounting of the information gained and used via measurements and feedback. We further illustrate an experimental proposal for the realisation of a Maxwell demon using superconducting circuits and single-photon on-chip calorimetry. A two-level qubit acts as a trapdoor which, conditioned on its state, is coupled to either a hot resistor or a cold one. The feedback mechanism alters the temperatures felt by the qubit and can result in an effective inversion of the temperature gradient, where heat flows from cold to hot thanks to information gain and use.


Introduction
In a famous thought experiment Maxwell envisioned a method for apparently defying the second law of thermodynamics by means of a feedback control mechanism [1]. Maxwell's idea is based on a malicious demon, an intelligent being that is able to observe the microscopic dynamics of a system, and acts on it so as to steer it toward defying the second law. In one of Maxwell's original concepts, the system is a container with two chambers, containing respectively a hot gas and a cold gas. The two chambers are separated by a wall with a trap-door which the demon can open and close at will. The demon observes the erratic motion of the gas particles and, when it sees a particle of the cold chamber approach the trap-door with sufficiently high velocity, it swiftly opens the door so as to let the particle go through and closes it immediately afterwards. In this way, particle after particle, heat flows from the cold chamber to the hot chamber in contradiction with the second law.
Advances in nanotechnology have made it possible to bring Maxwell demons and similar devices from the realm of thought experiments to the realm of real experiments [2,3,4,5]. Both theoretical and experimental studies so far have focused mainly on situations where feedback control is operated as a measurement-conditioned driving on some working substance (classical or quantum) coupled to a bath at a single temperature, so as to withdraw energy from the latter in contradiction with the second law as formulated by Kelvin. Interesting realistic proposals have appeared in Refs. [6,7]. Situations where the heat flow between reservoirs at different temperatures is controlled, however, have not been addressed so far, neither theoretically nor experimentally. The main motivation of the present work is to fill that gap. In the following we shall present the general theory of feedback controlled heat transport in quantum devices, and shall describe a possible experimental realisation thereof.
The theory presented here builds on previous works concerning fluctuation relations in presence of measurements without feedback [8,9] and with feedback [10], combined with an inclusive approach where quantum heat engines are seen as mechanically driven multipartite systems starting in a multi-temperature initial state [11,12,13,14]. Reference [10] reported on the theory of a one-measurement based feedback control on a quantum working substance prepared by contact with a single bath. That formalism is here extended to the case of many heat baths, and also to repeated measurements, to allow for the study of continuous feedback control of heat flow in a multi-reservoir scenario. Previous work concerning repeated measurements appeared in Ref. [15] for classical systems in contact with a single bath. Fluctuation relations need to be modified by a mutual information term, which we shall explicitly provide.
Our experimental proposal is based on the fast developing advancements in experimental solid state low temperature techniques: in particular the calorimetric measurement scheme that has been put forward by one of us and co-workers [16,17]. As proven by some recent theoretical proposals [13,18] the method opens up a new avenue for the practical management of heat and work on a chip by means of superconducting devices, particularly superconducting qubits. Here we illustrate the possible implementation of very simple feedback controlled heat transport where the trapdoor is realised by a superconducting qubit whose coupling with two resistors at different temperatures is controlled based on the outcomes of continuous calorimetric monitoring of the resistors themselves.

Theory
Following [14] we model a generic heat transport/heat engine scenario as a driven multipartite system starting in the factorised state (see Figure 1)
$$\rho_0 = \bigotimes_i \frac{e^{-\beta_i H_i}}{Z_i},$$
where $H_i$ is the Hamiltonian of partition $i$, which includes a heat bath and possibly a portion of the working substance, $\beta_i$ is its inverse temperature, and $Z_i$ is the corresponding partition function [14].

Figure 1: Depending on the outcome $a_j$ of the measurement the demon applies a quantum gate $U_j$ to the bi-partite system with the aim of beating the second law. Each partition is composed of a heat reservoir and possibly one part of a working substance. The whole system evolves with unitaries interrupted by projections.

Let the total Hamiltonian be
$$H(t) = \sum_l H_l + V(t),$$
where $V(t)$ is an interaction term that is switched on for the time interval $t \in [0, \tau]$ over which the system is monitored. We assume that at times $t_1 < t_2 < \ldots < t_K$ some observable $A$ is measured, thus causing the wave function describing the compound to collapse onto the subspace spanned by the eigenvectors belonging to the measured eigenvalue $a_j$. Following [10] we shall assume that there can be a measurement error, where the eigenvalue $a_k$ is recorded instead of the actual eigenvalue $a_j$. This is assumed to happen with probability $\varepsilon[k|j]$. The choice of the interaction $V(t)$ in the interval $(t_i, t_{i+1})$ is dictated by the sequence of recorded eigenvalues, or more simply by the recorded sequence of labels $\mathbf{k}_i = \{k_1, k_2, \ldots, k_i\}$. We denote by $U_{\mathbf{k}_i}$ the corresponding unitary operator describing the evolution in the time span $(t_i, t_{i+1})$, and by $U_0$ the un-conditioned evolution operator from time $t = 0$ to the time of the first measurement $t = t_1$. Note that the sequence of recorded labels $\mathbf{k}_i$ generally differs from the sequence of labels $\mathbf{j}_i = \{j_1, j_2, \ldots, j_i\}$ specifying in which subspace the system state was actually projected at the measurement times $t_1, t_2, \ldots, t_i$. As customary in the context of the fluctuation theorem, we shall assume that, besides the intermediate measurements of $A$, all $H_l$'s are measured at times $t = 0$ and $t = \tau$, giving the eigenvalues $E^l_n$ and $E^l_m$ respectively. The quantity of primary interest is the probability $p(m, \mathbf{k}, \mathbf{j}, n)$ that $n$ is obtained in the first energy measurement, the sequence $\mathbf{j}$ is realised, the sequence $\mathbf{k}$ is recorded and $m$ is obtained in the final energy measurement. Here we have introduced the simplified notations $\mathbf{j} = \mathbf{j}_K$, $\mathbf{k} = \mathbf{k}_K$.
In the explicit expression of $p(m, \mathbf{k}, \mathbf{j}, n)$, $p^0_n = \prod_l e^{-\beta_l E^l_n}/Z_l$ denotes the probability of obtaining the eigenvalue $E_n = \sum_l E^l_n$ in the first measurement; $P_n$ denotes the corresponding projector; $\pi_j$ denotes the projector onto the subspace belonging to the eigenvalue $a_j$ of $A$; and the symbol $\overleftarrow{\prod}_i$ denotes the $i$-ordered product. Let $\Delta E_l = E^l_m - E^l_n$ be the energy change in partition $l$ observed in a single realisation of the feedback driven protocol. Using the cyclic property of the trace and completeness, $\sum_n P_n = 1$, we obtain
$$\left\langle e^{-\sum_l \beta_l \Delta E_l} \right\rangle = \gamma, \qquad \gamma = \sum_{\mathbf{k},\mathbf{j}} \mathrm{Tr}\, A^\dagger_{\mathbf{k},\mathbf{j}}\, \rho_0\, A_{\mathbf{k},\mathbf{j}}, \tag{5}$$
where the $A_{\mathbf{k},\mathbf{j}}$ are the Kraus operators describing the sequence of measurements and feedback. The proof is reported in the appendix. This relation extends the result presented in Ref. [10] to the case of a multipartite system with an initial multi-temperature state, and to repeated measurements. ‡ The quantity $\mathrm{Tr}\, A^\dagger_{\mathbf{k},\mathbf{j}} \rho_0 A_{\mathbf{k},\mathbf{j}}$ represents the probability that the sequences $\mathbf{j}^\dagger = \{j_K, \ldots, j_2, j_1\}$, $\mathbf{k}^\dagger = \{k_K, \ldots, k_2, k_1\}$ are realised under the backward evolution specified by the adjoint Kraus operators $A^\dagger_{\mathbf{k},\mathbf{j}}$. The total probability $\gamma$ does not generally add to one. The reason is that the $i$-th evolution $U^\dagger_{\mathbf{k}_i}$ occurs before the $i$-th eigenvalue $j_i$ is realised in the backward map. The feedback loop is evidently not time-reversal symmetric, and such lack of reversibility breaks the fluctuation theorem $\langle e^{-\sum_l \beta_l \Delta E_l} \rangle = 1$, which in fact is a manifestation of time-reversal symmetry [19]. This is reflected by the fact that the quantum channel specified by the Kraus operators $A_{\mathbf{k},\mathbf{j}}$ is generally not unital. § The adjoint of a non-unital quantum channel is not trace preserving. In the case of feedback control the quantum channel $\rho_0 \to \sum_{\mathbf{j},\mathbf{k}} A_{\mathbf{k},\mathbf{j}} \rho_0 A^\dagger_{\mathbf{k},\mathbf{j}}$ is generally not unital; as a consequence its adjoint is generally not trace preserving, hence we have generally $\gamma \neq 1$. Lack of unitality generally reflects lack of time-reversal symmetry. Examples are thermalisation maps, namely maps that have a thermal state (not the identity) as a fixed point.
Physically these are realised by means of weak contact of a system with a thermal bath, leading to irreversible dynamics. Likewise feedback control breaks the symmetry. This observation reveals some analogy between feedback control and dissipative dynamics.
Before proceeding let us comment briefly on the origin of the lack of unitality in feedback controlled systems, in order to gain insight into the issue. For simplicity let us consider the case of a single measurement, $K = 1$. Let us begin by noticing that the quantum channel specified by the $A_{k,j}$ is trace preserving. We have
$$\mathrm{Tr} \sum_{k,j} A_{k,j}\, \rho\, A^\dagger_{k,j} = \sum_{k,j} \varepsilon[k|j]\, \mathrm{Tr}\, U_k \pi_j \rho \pi_j U^\dagger_k = \sum_{k,j} \varepsilon[k|j]\, \mathrm{Tr}\, \pi_j \rho \pi_j = \sum_j \mathrm{Tr}\, \pi_j \rho \pi_j = \mathrm{Tr}\, \rho,$$
where we have used the cyclic property of the trace, unitarity $U^\dagger_k U_k = 1$, idempotence $\pi_j \pi_j = \pi_j$, normalisation $\sum_k \varepsilon[k|j] = 1$, and completeness $\sum_j \pi_j = 1$. Let us now turn to unitality. We have $\sum_{k,j} A_{k,j} A^\dagger_{k,j} = \sum_{k,j} \varepsilon[k|j]\, U_k \pi_j U^\dagger_k$. If the evolution $U_k$ did not depend on $k$, that is, if $U_k = \bar{U}$ were chosen regardless of the recorded value $k$ (e.g., $\bar{U}$ is pre-specified or is completely random), one could perform the sum over $k$ using $\sum_k \varepsilon[k|j] = 1$ and then use $\sum_j \pi_j = 1$ to conclude that the map is unital. Feedback, implying an explicit dependence of $U_k$ on $k$, breaks unitality. Unitality would also occur in the case when $\varepsilon[k|j]$ does not depend on $j$, meaning that the measurement outcome $k$ is completely random and has no correlation with the actual state $j$. In sum, if the feedback control is off, either because one decides not to use the information gathered in the measurement, or because the measurement gathers no information in the first place, unitality is recovered and the fluctuation theorem is restored. This result is in agreement with the established fact that projective measurements without feedback control do not alter the validity of the fluctuation theorem [8,20,21]. Here we have further learned that noise, i.e. choosing the $U$'s between the measurements completely randomly, also does not affect the integral fluctuation relation.

‡ For simplicity we restricted to the case of cyclic $H(t)$. The extension to the non-cyclic case is straightforward.
§ We recall that a quantum channel specified by Kraus operators $A_i$ is unital if $\sum_i A_i A^\dagger_i = 1$.

Let us now turn to thermodynamics.
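Before doing so, the unitality argument above can be checked numerically on the smallest possible case. The following is a minimal sketch, not part of the original analysis: it assumes a single qubit, a single measurement ($K = 1$) in the computational basis, a symmetric error model with a hypothetical flip parameter `q`, and a feedback rule that applies a Pauli flip when outcome 1 is recorded. It builds the Kraus operators $A_{k,j} = \sqrt{\varepsilon[k|j]}\, U_k \pi_j$ and measures how far $\sum A^\dagger A$ and $\sum A A^\dagger$ are from the identity.

```python
import math

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def kraus_operators(unitaries, projectors, eps):
    """A_{k,j} = sqrt(eps[k][j]) U_k pi_j, with eps[k][j] the probability
    of recording k when the actual outcome is j."""
    ops = []
    for k, u in enumerate(unitaries):
        for j, p in enumerate(projectors):
            weight = math.sqrt(eps[k][j])
            ops.append([[weight * x for x in row] for row in matmul(u, p)])
    return ops

def defect(kraus, unital):
    """Largest entry of sum A A^dagger - 1 (unital=True)
    or of sum A^dagger A - 1 (unital=False, trace preservation)."""
    n = len(kraus[0])
    total = [[0j] * n for _ in range(n)]
    for a in kraus:
        term = matmul(a, dagger(a)) if unital else matmul(dagger(a), a)
        for i in range(n):
            for j in range(n):
                total[i][j] += term[i][j]
    return max(abs(total[i][j] - (1 if i == j else 0))
               for i in range(n) for j in range(n))

IDENTITY = [[1, 0], [0, 1]]
FLIP = [[0, 1], [1, 0]]                          # Pauli X, the assumed feedback gate
PROJECTORS = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]

def error_model(q):
    """Hypothetical symmetric error model: the record is flipped with probability q."""
    return [[1 - q, q], [q, 1 - q]]

feedback = kraus_operators([IDENTITY, FLIP], PROJECTORS, error_model(0.1))
no_feedback = kraus_operators([FLIP, FLIP], PROJECTORS, error_model(0.1))
blind = kraus_operators([IDENTITY, FLIP], PROJECTORS, error_model(0.5))

for name, ops in [("feedback", feedback), ("no feedback", no_feedback),
                  ("uninformative record", blind)]:
    print(name, defect(ops, unital=False), defect(ops, unital=True))
```

In this toy case all three channels are trace preserving, while only the channel with informative measurement and record-dependent gate fails to be unital, in line with the discussion above.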
Using Jensen's inequality, Eq. (5) implies
$$\sum_l \beta_l \langle \Delta E_l \rangle \geq -\ln \gamma. \tag{6}$$
In the case when the map is unital one has $\gamma = 1$, and the second law of thermodynamics is recovered [14]. When $\gamma > 1$ the condition $\sum_l \beta_l \langle \Delta E_l \rangle < 0$ is not forbidden, and the apparent violation of the second law becomes possible. This occurs with a proper "demonic" design of the feedback control. When $\gamma < 1$ instead the second law is more strictly enforced by means of an "angelic" intervention. As shown in Refs. [10,22], in the case of a single measurement (in either classical or quantum systems) the fluctuation relation can be restored if an information theoretic term, in the form of a mutual information, is added to the exponent in the exponential average. Ref. [15] reports the extension to the case of repeated measurements in the classical scenario. All these results are for a single-temperature initial state. In the present set-up we find as well an information theoretic correction term (see the appendix for a proof):
$$\left\langle e^{-\sum_l \beta_l \Delta E_l - J_{\mathbf{k},\mathbf{j}}} \right\rangle = 1, \tag{7}$$
where
$$J_{\mathbf{k},\mathbf{j}} = \ln \frac{p(\mathbf{k}, \mathbf{j})}{p(\mathbf{j} : \mathbf{k})\, p(\mathbf{k})}.$$
The symbol $p(\mathbf{k}, \mathbf{j})$ represents the joint probability that the sequence $\mathbf{j}$ is realised and the sequence $\mathbf{k}$ is recorded, while $p(\mathbf{k})$ is the probability that $\mathbf{k}$ is recorded. The symbol $p(\mathbf{j} : \mathbf{k})$ stands for the probability that the sequence $\mathbf{j}$ is realised, conditioned on $\mathbf{k}$ being the record. Its explicit expression involves operators $B_{\mathbf{k},\mathbf{j}}$, which differ from the operators $A_{\mathbf{k},\mathbf{j}}$ by the term containing the conditional probability $\varepsilon[k_i|j_i]$. Note that the Bayes rule does not apply here, i.e. generally $p(\mathbf{j} : \mathbf{k}) \neq p(\mathbf{k}, \mathbf{j})/p(\mathbf{k})$. The reason is that $\mathbf{j}$ and $\mathbf{k}$ are concatenated with each other. An outcome $j_i$ influences the record $k_i$, which in turn influences the next outcome $j_{i+1}$, and so on. The quantity $J_{\mathbf{k},\mathbf{j}}$ measures the degree of such mutual influence, or correlation, between the two sequences $\mathbf{j}$ and $\mathbf{k}$. In absence of feedback, namely when there is no correlation between the two sequences, $J_{\mathbf{k},\mathbf{j}}$ is null and the standard relation is recovered.
Note that, given a feedback rule, $\langle J_{\mathbf{k},\mathbf{j}} \rangle$ would generally grow with the length $K$ of the sequences, i.e. the number of measurements. It is accordingly expected that $\langle J_{\mathbf{k},\mathbf{j}} \rangle \propto K$ in the large $K$ regime. With Jensen's inequality, Eq. (7) implies
$$\sum_l \beta_l \langle \Delta E_l \rangle \geq -\langle J_{\mathbf{k},\mathbf{j}} \rangle. \tag{14}$$
We thus have found two bounds to $\sum_l \beta_l \langle \Delta E_l \rangle$. By looking directly at $\sum_l \beta_l \langle \Delta E_l \rangle$ as in Ref. [14] we have found a third bound whose interpretation is most direct and straightforward. Let $\rho_\tau$ be the system density matrix at time $\tau$; in simplifying its expression one uses completeness, $\sum_m P_m = 1$, and the fact that the initial state has no coherences in the energy eigenbasis, $\sum_n P_n \rho_0 P_n = \rho_0$. Simple manipulations, similar to those employed in Ref. [14], lead to the following salient result:
$$\sum_l \beta_l \langle \Delta E_l \rangle = \sum_l D[\rho^l_\tau \| \rho^l_0] + I[\rho_\tau] + \Delta H, \tag{16}$$
where
$$D[\rho^l_\tau \| \rho^l_0] = \mathrm{Tr}\, \rho^l_\tau \ln \rho^l_\tau - \mathrm{Tr}\, \rho^l_\tau \ln \rho^l_0, \tag{17}$$
$$I[\rho_\tau] = \sum_l H[\rho^l_\tau] - H[\rho_\tau], \tag{18}$$
$$\Delta H = H[\rho_\tau] - H[\rho_0], \qquad H[\rho] = -\mathrm{Tr}\, \rho \ln \rho, \tag{19}$$
denote the Kullback Leibler divergence between the final state $\rho^l_\tau$ and the initial state $\rho^l_0$ of partition $l$, Eq. (17); the total amount of correlations (mutual information) that builds up among the partitions as a consequence of their interaction during the time span $[0, \tau]$, Eq. (18); and the total change in von Neumann entropy of the whole compound, Eq. (19). Here $\rho^l_t = \mathrm{Tr}_l\, \rho_t$ is the reduced state of partition $l$ at time $t$ ($\mathrm{Tr}_l$ denotes the trace over all partitions but the $l$-th). The mutual information $I$ among the partitions of the system (measuring all correlations, quantal and classical), which develops generally due to their interaction $V(t)$ (and can also occur in absence of measurements and feedback [14]), should not be confused with the classical mutual information $\langle J_{\mathbf{k},\mathbf{j}} \rangle$ between the realisation sequence $\mathbf{j}$ and the record sequence $\mathbf{k}$ caused by the feedback mechanism.

We remark that Eq. (7) is reminiscent of a similar relation reported by Vedral [23], see Eq. (8) there. The two relations fundamentally differ in various respects, notably in the meaning of the mutual information term: in our case it measures the correlation between outcomes and their records, while in the case of Ref. [23] it measures the correlation between the measurements themselves.
Both the Kullback Leibler divergence $D[\rho^i_\tau \| \rho^i_0]$ and the mutual information $I[\rho_t]$ are non-negative quantities. We thus arrive at the central inequality:
$$\sum_l \beta_l \langle \Delta E_l \rangle \geq \Delta H. \tag{20}$$
In the standard no-measurement case, $\rho_\tau$ is linked to $\rho_0$ via a unitary map, hence $\Delta H = 0$ and one recovers the result of Ref. [14], $\sum_l \beta_l \langle \Delta E_l \rangle = \sum_l D[\rho^l_\tau \| \rho^l_0] + I[\rho_\tau] \geq 0$, and the second law in its standard form. Note that when there are measurements, but no feedback, $\rho_\tau$ is linked to $\rho_0$ via a unital map, implying $\gamma = 1$, $J_{\mathbf{k},\mathbf{j}} = 0$, and $\Delta H \geq 0$, hence $\sum_l \beta_l \langle \Delta E_l \rangle \geq \Delta H \geq 0$, meaning that, as is already known [8,20,21], the second law is not altered by the mere application of projective measurements that interrupt an otherwise unitary dynamics. However, Eq. (20) clearly indicates that there is a dissipation term associated with quantum-mechanical measurements, which is not present in the classical case. In sum, through Eq. (20) we see that there is a thermodynamic cost associated with quantum measurements.
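Combining Eqs. (6,14,20), the second law of thermodynamics in presence of feedback control takes the form
$$\sum_l \beta_l \langle \Delta E_l \rangle \geq \max\left\{ -\ln \gamma,\; -\langle J_{\mathbf{k},\mathbf{j}} \rangle,\; \Delta H \right\}. \tag{21}$$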

Illustrative example
To exemplify the theory above we consider a prototypical model of a quantum heat engine whose working substance is made of two qubits [13,14,24]. Their Hamiltonian is expressed in terms of the Pauli operators $\sigma^i_z$. We assume the two qubits have the same level spacing $\omega$ and are initially in a factorised thermal state $\rho_0$ at inverse temperatures $\beta_i$, with $Z_i$ their partition functions. At $t = 0$ the $\sigma^i_z$'s are measured, collapsing the two qubits in the state $|k\rangle = |k'\rangle |k''\rangle$, with $k', k'' = \pm$. [...] $\sigma^i_z$ is irrelevant. At the end of the process each qubit is allowed to relax to thermal equilibrium with its respective thermal bath of inverse temperature $\beta_i$, so as to re-establish the initial state $\rho_0$. Accordingly the average energies $\langle \Delta E_i \rangle$ acquired by each qubit during the process equal the average heats that they release into the baths in the thermal relaxation step. Due to the feedback mechanism energy may be withdrawn from the cold bath and released into the hot one. Note that, because the two qubits have the same level spacing, the SWAP operation does not alter their total energy. Namely, there is no energy injection by the demon: to steer the energy flow it only uses information. The set-up is illustrated in Fig. 2 panel a).
The relevant probability chain is a bit simpler than in the general case because the first energy measurement is itself here also the first feedback measurement.
The probability $p(j : k)$ that the outcome $j$ is realised conditioned on $k$ being recorded is simply the marginal probability $p(j)$ that $j$ is realised, because the record $k$ comes chronologically after the realisation of $j$ and hence cannot have any influence on it. The quantity $J_{k,j}$ then boils down to the logarithm of the ratio $p(j, k)/[p(j)\,p(k)]$ [10], hence its expectation is the non-negative mutual information between $j$ and $k$: $\langle J_{k,j} \rangle = \sum_{j,k} p(j, k) \ln \{ p(j, k)/[p(j)\,p(k)] \}$.
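The single-measurement expression for $\langle J_{k,j} \rangle$ can be evaluated directly from the outcome distribution and the error model. The sketch below is illustrative only: the function names, the two-outcome distribution `[0.7, 0.3]` and the symmetric error model with parameter `q` are assumptions made for the example, not the four-outcome distribution of the two-qubit engine.

```python
import math

def mutual_information(p_j, eps):
    """<J> = sum_{j,k} p(j,k) ln[ p(j,k) / (p(j) p(k)) ],
    with p(j,k) = eps[k][j] * p_j[j] and eps[k][j] the probability
    of recording k when the outcome is j."""
    n_k, n_j = len(eps), len(p_j)
    joint = [[eps[k][j] * p_j[j] for j in range(n_j)] for k in range(n_k)]
    p_k = [sum(row) for row in joint]
    total = 0.0
    for k in range(n_k):
        for j in range(n_j):
            if joint[k][j] > 0.0:          # terms with p(j,k) = 0 contribute nothing
                total += joint[k][j] * math.log(joint[k][j] / (p_j[j] * p_k[k]))
    return total

def symmetric_errors(q):
    """Hypothetical symmetric error model for a two-outcome measurement."""
    return [[1 - q, q], [q, 1 - q]]

for q in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(q, mutual_information([0.7, 0.3], symmetric_errors(q)))
```

The printed values vanish at `q = 0.5` and are unchanged under `q → 1 − q`, since relabelling the records does not change the amount of correlation.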
Panels b,c) of Fig. 2 show $\sum_l \beta_l \langle \Delta E_l \rangle$, $-\ln \gamma$, $-\langle J_{k,j} \rangle$ and $\Delta H$ for two choices of $\beta_2$ and the same $\beta_1$, as a function of the error probability $q$. In accordance with Eq. (21) we see that $\sum_l \beta_l \langle \Delta E_l \rangle$ is bounded from below by $-\ln \gamma$, $-\langle J_{k,j} \rangle$ and $\Delta H$. Independent of all other parameters the refrigerator cannot work in the region $q < 1/2$ where $j$ and $k$ are anti-correlated, while it may only work if $q > 1/2$. This is captured by $-\ln \gamma$ being positive in the region $0 < q < 1/2$ and negative for $1/2 < q < 1$. At $q = 1/2$ outcome and recording are fully uncorrelated, which restores unitality as discussed above and implies $\ln \gamma = 0$. Regarding $\Delta H$, while it tends to be closer to $\sum_l \beta_l \langle \Delta E_l \rangle$ in the operation region ($q > 1/2$), it greatly departs from it in the non-operation region, where it can even take negative values. Notably, in both panels there is a value of $q$ for which the bound is saturated by $\Delta H$. Regarding $-\langle J_{k,j} \rangle$, we note that it is everywhere non-positive, as expected. Furthermore it is symmetric with respect to $q \to 1 - q$. This reflects the fact that the mutual information does not distinguish between correlation and anti-correlation. The maximum $-\langle J_{k,j} \rangle = 0$ is attained at $q = 1/2$, where $j$, $k$ are uncorrelated and the standard fluctuation relation is recovered (i.e., $\gamma = 1$). In both panels we see that $\Delta H > -\langle J_{k,j} \rangle$. Whether this is a generic bound is yet to be understood. We note that while at $q = 1/2$ both $-\ln \gamma$ and $-\langle J_{k,j} \rangle$ are null, $\Delta H$ is non-negative, reflecting the fact that in absence of feedback there is nonetheless an entropic cost associated with measurements, as discussed above. Such cost can be counterbalanced in presence of feedback (note that $\Delta H$ may be negative for $q \neq 1/2$). Comparing now the two panels, we see that the higher the thermal gradient $\beta_2 - \beta_1$, the larger is the value of $q$ at which the engine starts operating, i.e. where $\sum_l \beta_l \langle \Delta E_l \rangle$ turns from positive to negative: as intuition suggests, the larger the gradient, the better the measurement must be.
This feature is captured also by $\Delta H$ but not by $-\ln \gamma$ and $-\langle J_{k,j} \rangle$. Also, the smaller the gradient, the more the shape of the function $\sum_l \beta_l \langle \Delta E_l \rangle$ resembles that of $-\ln \gamma$, with the shift between the two being approximately the value of $\Delta H$ at $q = 1/2$: that is, $\sum_l \beta_l \langle \Delta E_l \rangle \simeq -\ln \gamma + \Delta H|_{q=1/2}$.

Experimental proposal
The general theory developed above allows for a joint information theoretic and thermodynamic analysis of feedback controlled dynamics in the broad scenario where a demon can influence not only the amount of work being provided by the outside, as in previous works [2,3,4], but also the heat flow between the various parts of a compound system, e.g. the heat flow between various heat baths. The progress of solid state technology, on the other hand, makes it possible to realise such feedback controlled heat transport mechanisms in real devices. The example illustrated above can be experimentally realised by introducing a feedback mechanism in the two-superconducting-qubit scheme illustrated in Ref. [13]. Below we illustrate a design that is of more immediate realisation. It is based on a single qubit and it does not involve any qubit operation, but only manipulations of qubit-bath couplings. The proposal that we put forward here is based on two ingredients that enable unique capabilities, allowing for the implementation of a Maxwell demon based on a most simple concept. The two ingredients are a two-level system acting as a quantum trap-door and the calorimetric measurement scheme developed in Refs. [16,17].
The one-qubit set-up is illustrated in Fig. 3. The two-level system (TLS) is embodied by a superconducting qubit of level spacing ω. The two chambers are embodied by two resistors kept at different temperatures. Qubit and resistors can exchange energy (i.e. heat) in the form of photons of energy ω, associated with the TLS absorbing/emitting one photon from/to one of the two baths. Each resistor is embedded into an RLC loop of tunable resonance frequency. This results in a tunable TLS/resistor coupling. When an RLC circuit is far detuned from ω, the qubit is effectively decoupled from the resistor, while maximal coupling occurs when it is in tune with the qubit. The resonance frequency can be tuned by using a SQUID as a non-linear and tunable inductor, its inductance being governed by a controllable threading magnetic flux.
When a photon enters/exits one of the two resistors, its electronic temperature undergoes a positive/negative jump followed by a fast decay. Two calorimeters [16,17] continuously monitor the two resistors, and count how many photons enter/exit them. This allows for a directional full counting statistics of heat. Most remarkably, it also allows one to infer the state of the TLS at each time. If an absorption (in either resistor) is observed, it means the TLS jumped down; hence it was up before the absorption was detected, and is down afterwards. Conversely, if an emission from either resistor is observed, the TLS jumped up. This makes it possible to experimentally access the quantum state trajectory of the TLS.
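The effect of detuning can be illustrated with the filtered noise spectrum quoted in the Modelling section, $S_I(\omega) = S_V(\omega)/\{R^2[1 + Q^2(\omega/\omega_{LC} - \omega_{LC}/\omega)^2]\}$ with $S_V(\omega) = 2R\omega/(1 - e^{-\beta\omega})$. The following sketch evaluates it on and off resonance; the units ($\hbar = 1$), the function names and the parameter values are assumptions chosen only for illustration.

```python
import math

def voltage_noise(omega, R, beta):
    """S_V(omega) = 2 R omega / (1 - exp(-beta omega)), in units with hbar = 1."""
    return 2.0 * R * omega / (1.0 - math.exp(-beta * omega))

def current_noise(omega, R, L, C, beta):
    """S_I(omega) = S_V(omega) / { R^2 [1 + Q^2 (omega/w_LC - w_LC/omega)^2] }."""
    omega_lc = 1.0 / math.sqrt(L * C)            # resonance frequency of the loop
    quality = math.sqrt(L / C) / R               # quality factor Q
    detuning = omega / omega_lc - omega_lc / omega
    return voltage_noise(omega, R, beta) / (R ** 2 * (1.0 + quality ** 2 * detuning ** 2))

# Illustrative (hypothetical) parameters: qubit at omega = 1, loop tuned to it.
omega, R, C, beta = 1.0, 0.1, 1.0, 2.0
on_resonance = current_noise(omega, R, L=1.0, C=C, beta=beta)
detuned = current_noise(omega, R, L=100.0, C=C, beta=beta)   # larger inductance

print("suppression factor:", detuned / on_resonance)
print("S_I(w)/S_I(-w):", on_resonance / current_noise(-omega, R, L=1.0, C=C, beta=beta))
```

With these numbers a hundredfold increase of the inductance suppresses the noise at the qubit frequency by about six orders of magnitude. The ratio $S_I(\omega)/S_I(-\omega)$ equals $e^{\beta\omega}$ because the Lorentzian filter is even in $\omega$.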
The feedback concept is extremely simple: as soon as a jump-down is observed, turn on the interaction with the cold resistor and turn off the interaction with the hot resistor. Vice-versa for the observation of a jump up. This results in a net flow of heat from the cold resistor to the hot one. Based on the above general analysis the apparent violation of the second law is understood in terms of lack of time-reversal symmetry of feedback control, leading to an overall non-unital dynamics of resistors plus TLS. In a practical realisation one is realistically not able to fully turn off the interactions. Furthermore there will be some delay time δ between measurement being performed and feedback being realised, giving rise effectively to possible error ε[k i |j i ] between measured state k i and actual state j i of the qubit.

Modelling
In the following we model the dynamics of the proposed experiment. We model the evolution of the two level system via a standard Lindblad master equation, where $H_S = -E_0(\Delta \sigma_x + q \sigma_z)$ is the two level system Hamiltonian expressed in terms of the Pauli matrices $\sigma_\alpha$, and $L_l$ are Lindblad operators. The associated rates $\Gamma^{\downarrow\uparrow}_l$ involve the current noise spectrum $S_{I,l}(\omega) = S_{V,l}(\omega)\,[R_l^2(1 + Q_l^2[\omega/\omega_{LC,l} - \omega_{LC,l}/\omega]^2)]^{-1}$, expressed in terms of the voltage noise spectrum $S_{V,l}(\omega) = 2R_l\,\omega\,(1 - e^{-\beta_l \omega})^{-1}$; here $Q_l = \sqrt{L_l/C_l}/R_l$ is the quality factor and $\omega_{LC,l} = 1/\sqrt{L_l C_l}$ the resonance frequency of resonator $l$, expressed in terms of its resistance, inductance and capacitance $R_l$, $L_l$, $C_l$. By increasing $L_l$ the rates $\Gamma^{\downarrow\uparrow}_l$ can be quenched, namely the interaction between the TLS and the $l$-th resistor can be turned off. The symbol $M_l$ stands for the mutual inductance between the qubit and the $l$-th resistor and $\Phi_0$ is the flux quantum. Note that the rates obey detailed balance, $\Gamma^{\downarrow}_l/\Gamma^{\uparrow}_l = e^{\beta_l \omega}$. The study of heat and work fluctuations requires the dynamics to be studied at the level of single quantum-jump trajectories [13,25], resulting from the unravelling of the master equation. This is here achieved by means of the Monte Carlo wave function (MCWF) method [26,27]. In the specific case under study, a two level system subject to dissipation terms leading to full wave function collapse in either state $|-\rangle$ or $|+\rangle$, this results in a classical dichotomous Poisson process with rates $\Gamma^{\downarrow\uparrow}_l$ [13]. The basis of our numerical experiment is the generation of such dichotomous Poisson random trajectories. We chose the right reservoir as the cold one and the left as the hot one. The TLS is assumed to be initially in equilibrium with the left bath. We produce a large sample of trajectories and build the normalised histogram $h(N_R)$ of the number $N_R$ of photons entering the right reservoir.
Since the heat $Q_R$ entering the right reservoir is given as $Q_R = N_R\,\omega$, the statistics $h(N_R)$ is the heat statistics. In absence of feedback it satisfies the standard fluctuation relation, Eq. (28). The feedback is introduced as follows. At each moment in time we distinguish between the actual state of the system $j = \pm$ and the knowledge $k = \pm$ we have about it. The latter does not necessarily coincide with the former because we allow for some delay time $\delta$ between a jump occurring in the TLS and our knowledge of the state of the qubit being updated accordingly. The delay time thus effectively introduces an error probability $\varepsilon[\pm|\pm]$ between the actual state and the knowledge about the state, at each time. At each time, conditioned on the knowledge $k$ of the state, we use one of two sets of rates, favouring the interaction with either the cold or the hot bath. More explicitly, let $\Gamma^{\downarrow\uparrow|\pm}_l$ be the rate for a jump down (up) in the $l$-th bath conditioned on the TLS being measured to be in state $\pm$. In accordance with Eq. (26) the conditioned rates are expressed in terms of two coupling strengths $A$, $B$, which are determined by the circuitry parameters and can be tuned via external fluxes $\Phi_i$. With $B < A$, this means that energy exchange with the right (cold) bath is larger when the TLS is believed to be down, so that it becomes more likely that energy flows out of the cold reservoir. Similarly, energy exchange with the left (hot) bath is larger when the TLS is believed to be up, so that it becomes more likely that energy flows into the hot reservoir. Overall this results in an effect that contrasts the natural flow from hot to cold. The largest effect can be achieved when turning off the unwanted interaction completely, namely when $B = 0$. Having in mind a realistic set-up, here we keep the ratio $A/B$ finite, meaning that partial turning-off is considered. Because of the feedback the fluctuation relation (28) is not obeyed.
However, it can be proved (see appendix) that, due to the feedback mechanism, the TLS feels an effective temperature gradient $\Delta\beta_{\rm eff}$, Eq. (33). We thus see that by tuning the ratio $A/B$ the effective temperature gradient can be manipulated and, if the errors associated with the measurement are not too big, it can even be inverted as compared to the original thermal gradient $\Delta\beta$. So the overall effect of the demon is to change the "temperatures felt" by the TLS. Accordingly, the histogram $h(N_R)$ obeys a fluctuation relation governed by the effective gradient, Eq. (34). This immediately allows one to interpret the quantity $J_{\rm exp}$, via Eq. (7), as the mutual information encoded in a trajectory along which a heat $Q_R$ is exchanged with the R bath. Note that when $A = B$ the feedback has no effect and accordingly $J_{\rm exp}$ vanishes. We also plotted the quantity $\ln[h(N_R)/h(-N_R)]$, finding a good agreement with the theoretical prediction $-\Delta\beta_{\rm eff}\,\omega N_R$. The effective conditional probabilities $\varepsilon[k|j]$ were obtained by recording for each trajectory the total time when the state was $j$ and the knowledge was $k$, and averaging their value over the whole ensemble of trajectories. The observed deviation is a consequence of the fact that the error here is not introduced in the form of an outcome being missed (as assumed in deriving Eq. (34)), but rather in the form of an outcome being reported with some delay. With the histogram $h(N_R)$ we computed $\sum_l \beta_l \langle \Delta E_l \rangle = \omega \Delta\beta \langle N_R \rangle = -0.0862$, $-\ln \gamma = -\ln \langle e^{\Delta\beta\, \omega N_R} \rangle = -0.1205$, $-\langle J_{\rm exp} \rangle = -0.2873$, for the chosen parameters. The computed values are in agreement with the prediction of Eq. (21). The proposed experiment does not allow one to measure $\Delta H$, which would require accessing the full system+baths density matrix.
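The numerical experiment described above can be sketched as a jump simulation of the dichotomous process. This is a minimal illustration under stated assumptions, not the simulation used for the reported figures: the delay time is set to zero (knowledge equals actual state), the conditioned rates are taken to be a coupling strength `A` or `B` times the thermal factors $1 + n_l$ (emission) and $n_l$ (absorption), with $n_l = 1/(e^{\beta_l \omega} - 1)$, and all parameter values are hypothetical.

```python
import math
import random

def bose(beta, omega):
    """Thermal occupation n = 1 / (exp(beta omega) - 1)."""
    return 1.0 / math.expm1(beta * omega)

def simulate_net_photons(A, B, beta_hot, beta_cold, omega, tau, rng):
    """One trajectory of duration tau; returns N_R, the net number of photons
    entering the cold (right) resistor. Assumes A > 0 and zero feedback delay."""
    n_hot, n_cold = bose(beta_hot, omega), bose(beta_cold, omega)
    # TLS initially in equilibrium with the hot (left) bath.
    up = rng.random() < 1.0 / (1.0 + math.exp(beta_hot * omega))
    t, n_r = 0.0, 0
    while True:
        if up:
            # Known to be up: strong coupling A to the hot bath, weak B to the cold one.
            rate_hot, rate_cold = A * (1.0 + n_hot), B * (1.0 + n_cold)
        else:
            # Known to be down: strong coupling A to the cold bath, weak B to the hot one.
            rate_hot, rate_cold = B * n_hot, A * n_cold
        total = rate_hot + rate_cold
        t += rng.expovariate(total)              # waiting time to the next jump
        if t > tau:
            return n_r
        if rng.random() < rate_cold / total:     # the jump involves the cold resistor
            n_r += 1 if up else -1               # emission into it / absorption from it
        up = not up

def mean_net_photons(A, B, n_traj=500, seed=1, **params):
    """Sample average of N_R over n_traj independent trajectories."""
    rng = random.Random(seed)
    return sum(simulate_net_photons(A, B, rng=rng, **params)
               for _ in range(n_traj)) / n_traj

PARAMS = dict(beta_hot=0.5, beta_cold=1.0, omega=1.0, tau=20.0)
print("no feedback (A = B):", mean_net_photons(1.0, 1.0, **PARAMS))
print("feedback (A = 10 B):", mean_net_photons(10.0, 1.0, **PARAMS))
```

With `A = B` the average $\langle N_R \rangle$ is positive (heat flows into the cold resistor), while for `A = 10 B` it becomes negative, i.e. heat is extracted from the cold resistor. Collecting the individual `simulate_net_photons` outputs into a histogram gives the quantity $h(N_R)$ discussed in the text.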

Energy spent by the Demon
What is the energy cost incurred by the demon to open/close the trap-door? To roughly estimate it we model the LCR circuit as a classical harmonic oscillator (LC circuit) in contact with a heat bath (the resistor) at temperature $T$. To open/close the door towards one of the two reservoirs, the demon switches the LC frequency from $\omega_i$ to another frequency $\omega_f$ so as to put it in/off resonance with the qubit. If the operation is carried out in a quasi-static manner, the work done is equal to the free energy change: $W = k_B T \ln(\omega_f/\omega_i)$. The operation would in this case be reversible, and the work lost when opening the door will be retrieved when closing it. The overall cost of an open/close cycle would be null in this limiting case. The other limiting case is when the switch is infinitely fast. The overall cost of a single open/close cycle in this case would be non-negative, in accordance with the second law of thermodynamics, and amounts to $W = k_B T (\omega_f/\omega_i - \omega_i/\omega_f)^2/2$. The overall work incurred in a repeated feedback operation is proportional to the number of open/close cycles, which in turn is proportional to the net number of energy quanta being transported, namely the total heat transported. Interestingly, we note that the faster the open/close operation, the more effective the feedback mechanism is, and the more energy needs to be invested.
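The two limiting estimates can be compared numerically. The sketch below simply evaluates the two expressions quoted above; the function names, the units ($k_B = 1$) and the frequency ratio are illustrative assumptions.

```python
import math

def quasistatic_work(T, omega_i, omega_f, k_B=1.0):
    """Work for a slow switch omega_i -> omega_f: W = k_B T ln(omega_f / omega_i)."""
    return k_B * T * math.log(omega_f / omega_i)

def sudden_cycle_work(T, omega_i, omega_f, k_B=1.0):
    """Cost of one infinitely fast open/close cycle:
    W = k_B T (omega_f/omega_i - omega_i/omega_f)^2 / 2."""
    ratio = omega_f / omega_i
    return k_B * T * (ratio - 1.0 / ratio) ** 2 / 2.0

T, w_i, w_f = 1.0, 1.0, 2.0
slow_cycle = quasistatic_work(T, w_i, w_f) + quasistatic_work(T, w_f, w_i)
fast_cycle = sudden_cycle_work(T, w_i, w_f)
print("quasi-static open/close cycle:", slow_cycle)
print("sudden open/close cycle:", fast_cycle)
```

For a frequency ratio of 2 the quasi-static cycle costs nothing, while the sudden cycle costs $1.125\,k_B T$. The cost of the sudden cycle grows quadratically with the mismatch $\omega_f/\omega_i - \omega_i/\omega_f$.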

Conclusions
We have developed a general quantum theory of repeated feedback control in a multiple heat reservoir scenario. The main effect of feedback control is that it induces a generally non-unital dynamics of the full reservoirs+system compound. As a consequence the standard bound set by the second law of thermodynamics on the dissipation quantifier $\sum_l \beta_l \langle \Delta E_l \rangle$ is shifted and may become negative. We have illustrated an experimental proposal where a single superconducting qubit plays the role of a trap-door that is subject to feedback control. The envisaged method for simultaneously measuring the qubit state and the heat exchanged by each reservoir is single photon calorimetry.

Appendix A. Derivation of Eq. (5)
Eq. (3) and $\rho_0 = \sum_n P_n\, e^{-\sum_l \beta_l E^l_n}/Z$ have been used to obtain the second line.

Appendix B. Derivation of Eq. (7)
Eq. (3), $\rho_0 = \sum_n P_n\, e^{-\sum_l \beta_l E^l_n}/\prod_l Z_l$ and Eq. (13) have been used to obtain the second line. Completeness $\sum_n P_n = 1$ and unitarity $U_0 U^\dagger_0 = 1$ led to the third line. The fourth line follows from $\sum_{\mathbf{j}} B_{\mathbf{k},\mathbf{j}} B^\dagger_{\mathbf{k},\mathbf{j}} = 1$, which follows by expanding the $i$-ordered products and applying idempotence $\pi_j \pi_j = \pi_j$, completeness $\sum_j \pi_j = 1$, and unitarity $U_j U^\dagger_j = 1$. The cyclic property of the trace, idempotence $P_m P_m = P_m$ and $\rho_0 = \sum_m P_m\, e^{-\sum_l \beta_l E^l_m}/\prod_l Z_l$ lead to the fifth line. The final result is a consequence of the normalisation of $\rho_0$ and of $p(\mathbf{k})$.

Appendix C. Derivation of Eq. (33)
Under the operation of the demon the TLS experiences effective temperatures of the baths that differ from their actual values. To fix ideas, let us for the moment assume no delay time and no error in the measurement. The qubit is then subject to effective rates $\Gamma^{\downarrow,\mathrm{eff}}$, $\Gamma^{\uparrow,\mathrm{eff}}$ given by the rates conditioned on the actual state. Let us now introduce the errors $\varepsilon[\pm|\pm]$ related to the measurement. The stochastic process describing the dynamics of the TLS is still Poissonian, with one rate occurring in case of a correct measurement and one rate occurring in the other case. The idea is that monitoring is continuous or, better, occurs with a sampling time interval $dt$, which we assume short compared to the time scales set by all rates $\Gamma^{\uparrow,\downarrow|\pm}_{R,L}$. Let us imagine the system is in state $j = +$. There is a probability $\varepsilon[+|+]$ that the observation is $k = +$ and a probability $\varepsilon[-|+]$ that the observation is $k = -$. Thus the probability to undergo a jump down in the reservoir $s$ in the interval $dt$ is $(\varepsilon[+|+]\, \Gamma^{\downarrow|+}_s + \varepsilon[-|+]\, \Gamma^{\downarrow|-}_s)\, dt$. Hence Eq. (33).