Minimising the heat dissipation of quantum information erasure

Quantum state engineering and quantum computation rely on information erasure procedures that, up to some fidelity, prepare a quantum object in a pure state. Such processes occur within Landauer's framework if they rely on an interaction between the object and a thermal reservoir. Landauer's principle dictates that this must dissipate a minimum quantity of heat, proportional to the entropy reduction that is incurred by the object, to the thermal reservoir. However, this lower bound is only reachable for some specific physical situations, and it is not necessarily achievable for any given reservoir. The main task of our work can be stated as the minimisation of heat dissipation given probabilistic information erasure, i.e., minimising the amount of energy transferred to the thermal reservoir as heat if we require that the probability of preparing the object in a specific pure state $|\varphi_1\rangle$ be no smaller than $p_{\varphi_1}^{\max}-\delta$. Here $p_{\varphi_1}^{\max}$ is the maximum probability of information erasure that is permissible by the physical context, and $\delta\geqslant 0$ the error. To determine the achievable minimal heat dissipation of quantum information erasure within a given physical context, we explicitly optimise over all possible unitary operators that act on the composite system of object and reservoir. Specifically, we characterise the equivalence class of such optimal unitary operators, using tools from majorisation theory, when we are restricted to finite-dimensional Hilbert spaces. Furthermore, we discuss how pure state preparation processes could be achieved with a smaller heat cost than Landauer's limit, by operating outside of Landauer's framework.


Information erasure and thermodynamics
In his attempt to exorcise Maxwell's demon [1,2], Leo Szilard conceived of an engine [3] composed of a box that is in thermal contact with a reservoir at temperature T, and contains a single gas particle. By placing a partition in the middle of the box and determining on which side of it the particle is located, the Maxwellian demon can attach to said partition a weight-and-pulley system so that, as the gas expands, the weight is elevated. By ensuring that the partition moves without friction, and continuously adjusting the weight to make the process quasistatic, one may fully convert $k_B T \log 2$ units of heat energy from the gas into work. Here, $k_B$ is Boltzmann's constant and $\log(\cdot)$ is the natural logarithm. In order to save the second law of thermodynamics, the engine must dissipate at least $k_B T \log 2$ units of energy to the thermal reservoir as heat. While it was initially believed that this heat dissipation is due to the measurement act by the Maxwellian demon, following the work of Landauer, Penrose, and Bennett [4][5][6][7] the responsible process was identified as the erasure of information in the demon's memory: the logically irreversible process of assigning a prescribed value to the memory, irrespective of its prior state. That the minimum heat dissipation required to erase one bit of information cannot be any smaller than $k_B T \log 2$ is commonly known as Landauer's principle, and said minimum quantity as Landauer's limit. In general, Landauer's principle may be encapsulated by the Clausius inequality $\Delta Q \geqslant k_B T \Delta S$, where $\Delta Q$ is the heat dissipation to the thermal reservoir and $\Delta S$ is the entropy reduction in the object of information erasure.
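To put a number on Landauer's limit, the following minimal sketch evaluates $k_B T \log 2$ at room temperature (the function name and the choice T = 300 K are ours, for illustration):

```python
import math

k_B = 1.380649e-23  # Boltzmann's constant in J/K (exact, 2019 SI definition)

def landauer_limit(T):
    """Minimum heat (in joules) dissipated when erasing one bit at temperature T."""
    return k_B * T * math.log(2)

# At room temperature the bound is on the order of a few zeptojoules per bit.
print(landauer_limit(300.0))  # ~2.87e-21 J
```

This is many orders of magnitude below the heat dissipated per logic operation in present-day devices, which is why the limit has only recently become experimentally relevant.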

Thermodynamics in the quantum regime
Recent years have been witness to a growing interest in thermodynamics and statistical mechanics in the quantum regime (see [8,9] for a review). This has led to a lively debate regarding the definition of two central concepts in thermodynamics, work and heat, within the framework of quantum theory. In classical physics, the work done during a process is defined as the increase in useful, ordered energy. Conversely, the heat dissipated during a process is the increase in unusable, disordered energy. In Szilard's engine, for example, work is characterised as the (deterministic) elevation of a weight, and hence the increase of its gravitational potential energy. The heat dissipated, on the other hand, would be stored as kinetic energy in the random motion of the atoms that constitute Szilard's engine, as well as the environment. This clear distinction fails in quantum mechanics, which is an inherently probabilistic theory.
Broadly speaking, work may be characterised in two different ways: (i) $\epsilon$-deterministic work [10,11]; and (ii) average work [12,13]. In either case, one may include the work storage device, a quantum analogue of the elevated weight in Szilard's engine, explicitly in the formalism, as in [14,15]. This is not always done, and one may directly examine the energy change in the system under consideration. In the $\epsilon$-deterministic framework, the work of a process is defined as the difference in energy measurement outcomes on the system (or work storage device), observed prior and posterior to the process. The $\epsilon$-deterministic work is then the maximum value of work, thus defined, which occurs with a probability of at least $1-\epsilon$. Meanwhile, average work is given as either the difference in expectation values of energy, or the difference in the free energies, of the system (or work storage device) observed prior and posterior to the process. The difference in average energy can be converted to the difference in free energy by subtracting from the average energy the von Neumann entropy of the system multiplied by the temperature. Definitions of heat can similarly be broadly classified into two categories: (i) where the thermal reservoir is treated extrinsically [12,16]; and (ii) where the thermal reservoir is treated intrinsically [17,18]. If the thermal reservoir is treated extrinsically, whereby it does not explicitly appear in the framework as a quantum system susceptible to change and examination, heat is a property of the system of interest. One may therefore define heat after having determined work; that is to say, given the change in total energy of the system, $\Delta E$, and the work, $\Delta W$, the heat $\Delta Q$ is given by the first law of thermodynamics as $\Delta Q = \Delta E - \Delta W$. Alternatively, Landauer's principle may be invoked to obtain a lower bound on the heat dissipation, given that the system has undergone an entropy change of $\Delta S$.
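The average-energy and free-energy bookkeeping described above can be sketched numerically. In the following, the entropy is in nats and the free energy reads $F = \langle E\rangle - k_B T\, S(\rho)$ (function names and the qubit example are ours):

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -tr[rho log rho], natural logarithm (nats)."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]               # convention: 0 log 0 = 0
    return float(-np.sum(ev * np.log(ev)))

def free_energy(rho, H, kT):
    """F = <E> - kT * S(rho): average energy minus temperature times entropy."""
    avg_E = float(np.real(np.trace(rho @ H)))
    return avg_E - kT * von_neumann_entropy(rho)

# Hypothetical qubit: maximally mixed state, H = diag(0, 1), units with k_B T = 1.
rho = np.eye(2) / 2
H = np.diag([0.0, 1.0])
print(free_energy(rho, H, kT=1.0))    # 0.5 - log 2, roughly -0.193
```

These two helpers are all that is needed to express the average-work definitions discussed in the text.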
If the thermal reservoir is treated intrinsically, on the other hand, heat can be defined as the average energy change of the reservoir itself. In other words, heat is average work pertaining to the thermal reservoir. A thermal reservoir, considered intrinsically, is a system that is initially uncorrelated from every other system considered, and is prepared in a Gibbs state. We note that, from this perspective, treating the thermal reservoir with the Born-Markov approximation would render it extrinsic; this is because the state of the reservoir, in the coarse-grained picture, is assumed to never change. As such, defining heat dissipation during a process as the average energy increase of the reservoir would lead one to conclude that no heat is dissipated at all. Indeed, the physical justification for the Born-Markov approximation is that, at time-scales much shorter than that at which the system changes, the reservoir relaxes to its equilibrium state by interacting with an unseen and, hence, extrinsic environment. If this environment is explicitly accounted for quantum mechanically, then the total system will again evolve unitarily, and the energy increase of this environment must also be accounted for.
In this article, we shall adopt the view that work is the change in average energy of the system. Moreover, whenever a thermal reservoir is mentioned, we will consider it intrinsically and include it as part of the system under investigation. The work storage device, however, is considered extrinsically: by the first law of thermodynamics we take as a priori the notion that the change in average energy of the system-including the reservoir if it is present-must come from an external energy source. This total change in average energy is defined as the work done by the extrinsic work storage device. If the total system is composed of an object and thermal reservoir, each with a well-defined Hamiltonian, then the portion of this work that is taken up by the object is called the work done on the object, and the portion taken by the reservoir is called the heat dissipated to the reservoir. If the total system is thermal, then the entirety of the work done by the extrinsic work storage device is defined as heat.

A quantum mechanical Landauer's principle
The surge of interest in quantum thermodynamics has included attempts to consider Landauer's principle quantum mechanically [18][19][20][21][22][23][24]. Most notable among such efforts is that of Reeb and Wolf [25], who provide a fully quantum statistical mechanical derivation of Landauer's principle by considering the process of reducing the entropy of a quantum object by its joint unitary evolution with a thermal reservoir. Here, they consider heat dissipation as the average energy increase of the reservoir, which is initially in a Gibbs state and is not correlated with the object. For a reservoir with a Hilbert space of finite dimension $d_\mathcal{R}$, they arrive at an equality form of Landauer's principle, $\Delta Q = k_B T\,[\Delta S + I(\mathcal{O}':\mathcal{R}')_{\rho'} + D(\rho'_\mathcal{R}\,\|\,\rho_\mathcal{R})]$ (1.2), where $I(\mathcal{O}':\mathcal{R}')_{\rho'}$ is the mutual information between object and reservoir after the joint evolution, and $D(\rho'_\mathcal{R}\,\|\,\rho_\mathcal{R})$ is the relative entropy between the post-evolution state of the reservoir and its initial state at thermal equilibrium. As the mutual information and relative entropy terms are non-negative, this implies Landauer's principle. While equation (1.2) always yields the exact heat dissipation, it involves terms that are cumbersome to calculate and, perhaps more importantly, it is not a function of $\Delta S$ alone. As such, Reeb and Wolf provide an inequality form of Landauer's principle, $\Delta Q \geqslant k_B T\,[\Delta S + M(\Delta S, d_\mathcal{R})]$ (1.3), where $M(\Delta S, d_\mathcal{R})$ is a non-negative correction term that vanishes in the limit as $d_\mathcal{R}$ tends to infinity.
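Equation (1.2) is an exact identity for any joint unitary, provided the object and Gibbs reservoir start uncorrelated, which makes it easy to check numerically. The following sketch (our own notation; assumed units where $k_B T = 1$) verifies it for a partial swap between a qubit object and a qubit reservoir:

```python
import numpy as np

def vn_entropy(rho):
    """Von Neumann entropy S(rho) = -tr[rho log rho], in nats."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))

def rel_entropy_diag(rho, sigma):
    """D(rho||sigma) for states diagonal in the same basis."""
    p, q = np.real(np.diag(rho)), np.real(np.diag(sigma))
    m = p > 1e-12
    return float(np.sum(p[m] * (np.log(p[m]) - np.log(q[m]))))

def ptrace(rho, dims, keep):
    """Partial trace of a bipartite state; keep = 0 or 1."""
    d0, d1 = dims
    r = rho.reshape(d0, d1, d0, d1)
    return np.trace(r, axis1=1, axis2=3) if keep == 0 else np.trace(r, axis1=0, axis2=2)

beta = 1.0                                   # units where k_B T = 1
H_R = np.diag([0.0, 1.5])                    # hypothetical qubit reservoir Hamiltonian
g = np.exp(-beta * np.diag(H_R))
rho_R = np.diag(g / g.sum())                 # initial Gibbs state
rho_O = np.diag([0.6, 0.4])                  # hypothetical object state
rho = np.kron(rho_O, rho_R)                  # uncorrelated initial joint state

# A partial swap in the |01>,|10> subspace; any joint unitary would do here.
c, s = np.cos(0.9), np.sin(0.9)
U = np.eye(4)
U[1, 1], U[1, 2], U[2, 1], U[2, 2] = c, -s, s, c
rho_p = U @ rho @ U.T

rho_Op = ptrace(rho_p, (2, 2), 0)
rho_Rp = ptrace(rho_p, (2, 2), 1)
dQ = float(np.trace(H_R @ rho_Rp) - np.trace(H_R @ rho_R))
dS = vn_entropy(rho_O) - vn_entropy(rho_Op)
mut_inf = vn_entropy(rho_Op) + vn_entropy(rho_Rp) - vn_entropy(rho_p)
rel = rel_entropy_diag(rho_Rp, rho_R)
print(beta * dQ, dS + mut_inf + rel)         # the two sides of (1.2) agree
```

Because the identity holds for every unitary, the non-negativity of the mutual information and relative entropy terms is what carries the physical content of Landauer's principle.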

The need for a context-dependent Landauer's principle
The study in [25] provides a lower bound on the energy transferred to the thermal reservoir as heat, given that the object's entropy decreases by $\Delta S$ and that the reservoir's Hilbert space dimension is $d_\mathcal{R}$. The crucial point, however, is that this lower bound can be attained in some physical contexts, but not in all of them. By physical context, we mean the tuple $(\mathcal{H}_\mathcal{O}, \rho_\mathcal{O}, \mathcal{H}_\mathcal{R}, H_\mathcal{R}, T)$. Here $\mathcal{H}_\mathcal{O}$ and $\rho_\mathcal{O}$ are respectively the Hilbert space and state of the object, while $\mathcal{H}_\mathcal{R}$, $H_\mathcal{R}$, and T are respectively the Hilbert space, Hamiltonian, and temperature of the reservoir. For example, one way to achieve the lower bound of equation (1.3) is for the object and reservoir to have the same Hilbert space dimension, allowing us to perform a swap map between them; this will take the mutual information term in equation (1.2) to zero. The next step of the optimisation would be to pick a specific $\rho_\mathcal{O}$, $H_\mathcal{R}$, and T so as to minimise the relative entropy term. Conversely, for a given physical context such inequalities may prove less instructive. Indeed, if it is impossible to achieve the lower bound of equation (1.3) in a given experimental setup, in what sense can we consider this the lowest possible heat dissipation due to information erasure? In this study, therefore, we aim to approach the problem of information erasure from the dual perspective: given a physical context, what is the minimum heat that must be dissipated in order to achieve a certain level of information erasure? This context-dependent Landauer's principle will be characterised by the equivalence class of unitary operators that achieve our task. Of course, this first requires a re-examination of what exactly we mean by information erasure.
Information erasure: pure state preparation and entropy reduction
In this article, we take information erasure to be synonymous with pure state preparation; just as in classical mechanics erasure (in the Landauer sense) involves a many-to-one mapping on the information-bearing degrees of freedom, in quantum mechanics this translates naturally into the irreversible process of preparing the object in a pure state. Probabilistic information erasure, then, refers to the case where the probability of preparing the object in the desired pure state is lower than unity. Although erasing the information of an object as presently defined leads to a reduction of its entropy, the two processes are not quantitatively the same. If we wish to maximise the largest eigenvalue in the object's probability spectrum, thereby maximising the probability of preparing it in a given pure state, in general we need not minimise its entropy to do so; the only cases where maximising the probability of information erasure entails minimising the entropy are when the object has a two-dimensional Hilbert space, or when we are able to fully purify the object and thereby take its entropy to zero. In general, then, a given probability of information erasure is compatible with many different values of entropy reduction. By choosing the smallest compatible entropy reduction, one would expect to minimise the consequent heat dissipation, as per equation (1.2). Consequently, our desired task can be stated as the minimisation of heat dissipation given probabilistic information erasure; that is to say, minimising the amount of energy transferred to the thermal reservoir as heat if we require that the probability of preparing the object in a specific pure state $|\varphi_1\rangle$ be no smaller than $p_{\varphi_1}^{\max}-\delta$. Here $p_{\varphi_1}^{\max}$ is the maximum probability of information erasure that is permissible by the physical context, and $\delta \geqslant 0$ the error.
We will refer to the equivalence class of unitary operators that achieve this as $U_{\mathrm{opt}}(\delta)$. If the object also has a non-trivial Hamiltonian, then to further reduce the total work cost of information erasure, conditional on first minimising the heat dissipation, we may further optimise the unitary operators within the equivalence class $U_{\mathrm{opt}}(\delta)$ so that the state of the object is made passive [26,27], with as small an expected energy value as possible. This reduced equivalence class is referred to as $U^{p}_{\mathrm{opt}}(\delta)$.

Information erasure and information processing
Reducing the heat dissipation due to information erasure is important for both classical and quantum information processing devices. As recent studies suggest [28], heat dissipation is a major limiting factor on the continual growth in the computational density of modern CMOS transistors. Meanwhile, for quantum computation in the circuit-based model, error correction requires a constant supply of ancillary qubits, in pure states, for syndrome measurements. Indeed, the authors in [29] show that in the absence of such a constant supply, the number of steps for which the computation can be performed fault tolerantly will be limited. Given a finite supply of ancillary qubits, we must constantly purify them during the execution of the algorithm. If the resulting heat dissipation leads to the intensification of thermal noise beyond the threshold for fault tolerance [30], then the computation will fail. A context-dependent Landauer's principle will thus prove especially important for information processing devices, in both classical and quantum architectures, where the structure of the reservoir Hamiltonian will usually be fixed. Furthermore, our work may be useful for certain high-performance, probabilistic (classical) information processing devices that would operate at or near the quantum regime.
Although the current state of the art in information processing devices dissipates heat orders of magnitude in excess of Landauer's limit, our ever increasing ability to control microscopic devices will mean that achieving such theoretical limits may be possible in the not-too-distant future. Indeed, experiments already exist, both in classical [31] and quantum [32] systems, which have achieved heat dissipation very close to Landauer's limit.
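The earlier point, that a given probability of information erasure is compatible with many different values of entropy reduction, can be illustrated with two hypothetical qutrit spectra sharing the same largest eigenvalue:

```python
import numpy as np

def entropy(spec):
    """Entropy of a probability spectrum, in nats."""
    p = np.asarray(spec, float)
    p = p[p > 0]                       # convention: 0 log 0 = 0
    return float(-np.sum(p * np.log(p)))

# Both spectra give erasure probability 0.5 (the largest eigenvalue), yet
# their entropies differ: fixing the erasure probability does not fix DeltaS.
spec_a = [0.5, 0.5, 0.0]
spec_b = [0.5, 0.25, 0.25]
print(entropy(spec_a), entropy(spec_b))   # log 2 (about 0.693) versus 1.5 log 2 (about 1.040)
```

For the heat-minimisation task it is the spectrum with the smaller entropy reduction, compatible with the target erasure probability, that matters.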

Layout of article
In section 2 we shall characterise the equivalence class of unitary operators acting on the composite system of object and reservoir, as a result of which the object undergoes probabilistic information erasure and, given this, the reservoir gains the minimal quantity of heat. If the object also has a non-trivial Hamiltonian, the unitary operators can be further optimised so as to reduce the energy gained by the object. Here, we operate within Landauer's framework: the object and reservoir are initially uncorrelated and the composite system evolves unitarily. We demonstrate, using a sequential swap algorithm introduced in section 2.5, the tradeoff between probability of information erasure and minimal heat dissipation; an increase in probability of preparing the object in a defined pure state is accompanied by an increase in the minimal heat that must be dissipated to the thermal reservoir. In section 3 we apply the general results to the case of erasing a maximally mixed qubit with the greatest allowed probability of success. Two reservoir classes will be considered: (i) a d-dimensional ladder system, where the energy gap between consecutive eigenstates is uniformly ω; and (ii) a spin chain with nearest-neighbour interactions, that is under a local magnetic field gradient. For both models, we shall also inquire into the effect of energy conserving, pure dephasing channels on the erasure process. In section 3.3, we determine the minimum quantity of heat that must be dissipated given full information erasure of a general qudit prepared in a maximally mixed state, in the limit of utilising an infinite-dimensional ladder system, which is a harmonic oscillator. In section 4 we shall address how information erasure can be achieved at a lower heat cost than Landauer's limit, by operating outside of Landauer's framework, but in such a way that terms like heat and temperature would continue to have referents in the mathematical description.
In appendix A we provide a brief overview of certain key results from majorisation theory that will be used throughout the article. In appendix B we explain what an equivalence class of unitary operators constitutes. Finally, in appendix C we provide proofs for the main results.
The initial state of the composite system of object and reservoir may be written in terms of its spectral decomposition, with eigenvalues given by the probability distribution $\vec{p}$. We note that this state representation is unique if and only if there are no degeneracies in the probability distribution $\vec{p}$. We assume that the total system is thermally isolated, so that the process of information erasure will be characterised by a unitary operator U. The state of the system after the process is complete is therefore $\rho' = U \rho\, U^\dagger$, with marginals $\rho'_\mathcal{O} = \mathrm{tr}_\mathcal{R}[\rho']$ and $\rho'_\mathcal{R} = \mathrm{tr}_\mathcal{O}[\rho']$, where $\mathrm{tr}_A$ represents the partial trace, of a composite system A+B, over the system A.
As the pure state we wish to prepare the object in is arbitrary up to local unitary operations, for simplicity we choose this to be $|\varphi_1\rangle$. In general, we wish to achieve $p_{\varphi_1}(\rho'_\mathcal{O}) \geqslant p_{\varphi_1}^{\max} - \delta$, where $p_{\varphi_1}^{\max}$ is the maximum probability of information erasure permissible by the physical context. Provided the process produces a larger $p_{\varphi_1}(\rho'_\mathcal{O})$ than the initial value, this will lead to a decrease in the von Neumann entropy of the object. The von Neumann entropy of a state ρ is $S(\rho) := -\mathrm{tr}[\rho \log \rho]$. The process is also assumed to be cyclic, meaning that the total Hamiltonian at the start of the process is identical with that at the end. As such, the total average energy consumption of the erasure protocol will be $\Delta E = \Delta W + \Delta Q$. A positive $\Delta E$ implies that the process requires energy from an external work storage device. Conversely, a negative $\Delta E$ implies that the process produces energy that can, in turn, be stored in the work storage device. Here, $\Delta W$ is the energy change in the object, which we call work done on the object, and $\Delta Q$ the energy change in the reservoir, or the heat dissipated to the reservoir. As shown in [25,33], these terms can also be written in the form of equation (1.2), where $D(\rho\,\|\,\sigma) := \mathrm{tr}[\rho(\log\rho - \log\sigma)]$ is the entropy of ρ relative to σ, and $I(A:B)_\rho := S(\rho_A) + S(\rho_B) - S(\rho_{AB})$ is the mutual information of a state ρ of a bipartite system A+B. As we are only interested in cases where $\Delta S$ is positive, we can infer from the non-negativity of the relative entropy and mutual information that $\Delta Q$ is always positive for information erasure, even though $\Delta W$ may be negative.
We wish to make the physical interpretation that $\Delta Q$ is energy that is irreversibly lost during the information erasure process, and is hence qualitatively different in nature from $\Delta W$. For this to be true, it must be impossible to extract work from the reservoir, after the process is complete, by means of a cyclic unitary process involving the reservoir alone. This is guaranteed if $\rho'_\mathcal{R}$ is passive: diagonal in the Hamiltonian eigenbasis, with eigenvalues that are non-increasing with respect to energy. If $\rho'_\mathcal{R}$ is not passive, as shown by [34] it is possible to extract a maximum amount of work given as $W_{\max} = \mathrm{tr}[H_\mathcal{R}\,\rho'_\mathcal{R}] - \mathrm{tr}[H_\mathcal{R}\,\rho^{\mathrm{passive}}_\mathcal{R}]$, where $\rho^{\mathrm{passive}}_\mathcal{R}$ has the same spectrum as $\rho'_\mathcal{R}$, but is passive. As will be shown in the following sections, not only is it possible for $\rho'_\mathcal{R}$ to be passive, but this is always satisfied in the case of minimal heat dissipation. However, if the dimension of $\mathcal{H}_\mathcal{R}$ is at least three, and we have access to N copies of $\rho'_\mathcal{R}$, it may be possible, for a sufficiently large N, for the compound state $\rho'^{\,\otimes N}_\mathcal{R}$ to be non-passive. This is called activation. Consequently, by keeping the reservoir systems after their utility in the erasure protocol, and then acting globally on this collection, we may be able to retrieve some energy. The only passive state which cannot be activated, no matter how many copies we have access to, is the Gibbs state [26]. However, $\rho'_\mathcal{R}$ will not in general be a Gibbs state. To ensure that $\Delta Q$ is truly lost, irrespective of what reservoir is used, we must impose additional structure. The simplest method is to impose the condition that the reservoir system is irrevocably lost after the process is complete. For example, if the reservoir system $\mathcal{R}$ is randomly chosen from an infinite collection of identical systems, but we do not know which particular system was used, then the probability of picking this system again at random, after the erasure protocol, will be vanishingly small.
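The work extractable from a non-passive state, in the sense of [34], depends only on the state's spectrum and the energy levels when the state is diagonal. A minimal sketch (function names and the three-level example are ours):

```python
import numpy as np

def passive_rearrangement(probs, energies):
    """Return the passive spectrum: the largest probability sits on the lowest level."""
    p = np.sort(probs)[::-1]                  # probabilities, non-increasing
    order = np.argsort(energies)              # energy levels, increasing
    out = np.empty_like(p)
    out[order] = p
    return out

def max_extractable_work(probs, energies):
    """tr[H rho] - tr[H rho_passive]: work extractable by a cyclic unitary."""
    e = np.asarray(energies, float)
    return float(np.dot(probs, e) - np.dot(passive_rearrangement(probs, e), e))

# Hypothetical 3-level reservoir left in a population-inverted (non-passive) state.
energies = [0.0, 1.0, 2.0]
probs = [0.2, 0.3, 0.5]
print(max_extractable_work(probs, energies))   # 1.3 - 0.7 = 0.6
```

For a passive input the function returns zero, consistent with the impossibility of cyclic work extraction from passive states.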

Maximising the probability of information erasure
In appendix C.1 we prove that the maximum probability of information erasure is $p^{\max}_{\varphi_1} = \sum_{m=1}^{d_\mathcal{R}} p^{\downarrow}_m$, the sum of the $d_\mathcal{R}$ largest probabilities in the spectrum of ρ, and that the equivalence class of unitary operators that achieve this is characterised by the rule $U: |\psi_m\rangle \mapsto |\varphi_1\rangle \otimes |\xi'_m\rangle$, where $\{|\xi'_m\rangle\}_m$ is an arbitrary orthonormal basis in $\mathcal{H}_\mathcal{R}$. To see what we mean by an equivalence class of unitary operators, refer to appendix B. In other words, to maximise the probability of information erasure the unitary operator must take the $d_\mathcal{R}$ vectors $|\psi_m\rangle$, that have the largest probabilities associated with them in the spectral decomposition of ρ, to the product vectors $|\varphi_1\rangle \otimes |\xi'_m\rangle$. Similar results, leading to the conclusion that $p^{\max}_{\varphi_1}$ in general cannot be brought to unity, have been reported in [25,[35][36][37].
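The maximum erasure probability, as the sum of the $d_\mathcal{R}$ largest eigenvalues of the joint spectrum, can be computed directly (a sketch in our own notation):

```python
import numpy as np

def p_max_erasure(spec_O, spec_R):
    """Sum of the d_R largest eigenvalues of the joint spectrum spec_O (x) spec_R."""
    joint = np.sort(np.outer(spec_O, spec_R).ravel())[::-1]
    return float(np.sum(joint[:len(spec_R)]))

# Hypothetical example: maximally mixed qubit, qubit reservoir with beta*epsilon = 1.
g = np.exp([0.0, -1.0])
spec_R = g / g.sum()
print(p_max_erasure([0.5, 0.5], spec_R))   # about 0.731: above 0.5, but below 1
```

The result exceeds the object's largest initial eigenvalue yet falls short of unity, in line with the conclusion that full erasure is generally impossible for a finite-dimensional reservoir.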
A necessary and sufficient condition for $p^{\max}_{\varphi_1}$ to be greater than the largest eigenvalue of the object's initial state can be found as follows. Recall that the maximum probability of information erasure is given by summing the $d_\mathcal{R}$ largest probabilities in the spectrum of ρ. This implies that for a non-trivial erasure process, whereby the probability of preparing the object in the state $|\varphi_1\rangle$ is increased, we require that these $d_\mathcal{R}$ largest probabilities not all be associated with a single eigenstate of the object. Similar arguments were made in [25], although there the focus was on providing a bound on the smallest eigenvalue of $\rho'_\mathcal{O}$ that could be obtained.

Minimising the heat dissipation
As the initial state of the reservoir is fixed, the heat dissipation is minimised by lowering the expected energy of the reservoir's final state $\rho'_\mathcal{R}$. The equivalence class of unitary operators that achieve this is characterised by the rule $U: |\psi_{n,m}\rangle \mapsto |\chi^{m}_{n}\rangle \otimes |\xi_m\rangle$, with $\{|\chi^{m}_{n}\rangle\}_n$ forming an orthonormal basis in $\mathcal{H}_\mathcal{O}$ for each m. A unitary operator from this equivalence class will ensure that $\rho'_\mathcal{R}$ is passive, and that it majorises any other passive state that could have been prepared. This is done by first maximising the probability of preparing the reservoir in the ground state $|\xi_1\rangle$, by taking the $d_\mathcal{O}$ vectors $|\psi_n\rangle$, that have the largest probabilities associated with them in the spectral decomposition of ρ, to the product vectors $|\chi^{1}_{n}\rangle \otimes |\xi_1\rangle$. After this, the probability of preparing the reservoir in the next energy state $|\xi_2\rangle$ is maximised in a similar fashion, and so on for all other energy eigenstates.
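The greedy structure of this rule, filling reservoir levels from the ground state up with the largest remaining joint probabilities, can be sketched as follows (our own illustration; energies in assumed units with $k_B T$ absorbed):

```python
import numpy as np

def min_energy_reservoir_populations(spec_O, spec_R, energies_R):
    """Greedy rule: sort the joint spectrum, assign the d_O largest probabilities
    to the reservoir ground level, the next d_O to the next level, and so on.
    Returns the final reservoir populations, ordered by increasing energy."""
    d_O, d_R = len(spec_O), len(spec_R)
    joint = np.sort(np.outer(spec_O, spec_R).ravel())[::-1]
    # Row m collects the d_O probabilities assigned to reservoir level m.
    pops = joint.reshape(d_R, d_O).sum(axis=1)
    return pops

# Hypothetical example: biased qubit object, qubit reservoir with beta*epsilon = 1.
g = np.exp([0.0, -1.0])
spec_R = g / g.sum()
pops = min_energy_reservoir_populations([0.9, 0.1], spec_R, [0.0, 1.0])
print(pops)    # non-increasing, i.e. passive: [0.9, 0.1]
```

Note that minimising the reservoir energy alone, as here, in general conflicts with erasing the object; the tradeoff is the subject of the next subsection.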

Minimal heat dissipation conditional on maximising the probability of information erasure
If we compare the rule that maximises the probability of information erasure, given by equation (4.5), and the rule that minimises the heat dissipation, given by equation (2.11), we notice that they are incompatible. As such, no unitary operator exists simultaneously in both equivalence classes: the two tasks are in some sense complementary, and there will be a tradeoff between them. Here, we shall prioritise; a unitary operator will be chosen such that it maximises the probability of information erasure and, given this constraint, minimises the heat dissipation $\Delta Q\{0\}$. The zero in braces indicates that the error in probability of information erasure, δ, is zero. To this end we first divide the vector of probabilities $\vec{p}$, arranged in non-increasing order, into the non-increasing vector of cardinality $d_\mathcal{R}$, denoted $\vec{P}_0$, and the non-increasing vectors of cardinality $d_\mathcal{O}-1$, denoted $\vec{P}_m$ for $m \in \{1, \ldots, d_\mathcal{R}\}$. We refer to the mth element of $\vec{P}_0$ as $\vec{P}_0(m)$, and the lth element of $\vec{P}_m$ as $\vec{P}_m(l)$. In appendix C.3 we prove that the equivalence class of unitary operators that maximise the probability of information erasure and, given this constraint, minimise the heat dissipation, is characterised by rules where, for all m, each member of an orthonormal set of object vectors is paired with the reservoir energy eigenstate $|\xi_m\rangle$. Effectively, the probabilities in $\vec{P}_0$ are carried to the state $|\varphi_1\rangle$, paired with the reservoir energy eigenstates in increasing order of energy, while the probabilities in each $\vec{P}_m$ are carried to the remaining object states paired with $|\xi_m\rangle$. In this manner the object is brought to a passive state, although this state will in general not be thermal [26]. We refer to this as passive information erasure, and the resultant equivalence class of unitary operators as $U^{p}_{\mathrm{opt}}(0)$. Figure 2 shows the matrix representation of the resulting joint state. We make the following remarks:

(b) Since the desired task is the maximisation of $p_{\varphi_1}(\rho'_\mathcal{O})$, we need not in general maximise $\Delta S$, because this would lead to a greater amount of heat dissipation than necessary, as per equation (2.6). Only in the case of $\mathcal{O}$ being a two-level system does maximising $p_{\varphi_1}(\rho'_\mathcal{O})$ necessarily minimise $p_{\varphi_2}(\rho'_\mathcal{O})$, which in turn results in the spectrum of $\rho'_\mathcal{O}$ majorising all possible spectra.
Consequently, $S(\rho'_\mathcal{O})$ will be minimised, and hence $\Delta S$ will be maximised. However, one can always say that maximising the probability of information erasure requires that we minimise the min-entropy, defined as $S_{\min}(\rho) := -\log \lambda_{\max}(\rho)$, where $\lambda_{\max}(\rho)$ is the largest value in the spectrum of ρ. The min-entropy is determined by the largest value in the spectrum alone. To minimise the min-entropy, therefore, we must maximise the largest value in the spectrum; this is the definition of maximising the probability of information erasure.
(c) The only case in which $U^{p}_{\mathrm{opt}}(0)$, the optimal unitary operator for passive, maximally probable information erasure, is a swap operation is when the object and reservoir are both two-dimensional. For larger dimensions, this is no longer the case.
(d) It is evident that $\rho'_\mathcal{R}$ is diagonal with respect to the eigenbasis of $H_\mathcal{R}$, and that the spectrum of $\rho'_\mathcal{R}$ is non-increasing with respect to the eigenvalues of $H_\mathcal{R}$. In other words, $\rho'_\mathcal{R}$ is a passive state. However, its spectrum is majorised by that of the initial Gibbs state $\rho_\mathcal{R}$, so its expected energy is larger. This conforms with Landauer's principle that information erasure must dissipate heat.
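The conditional rule of this section, maximise the erasure probability and then minimise the heat, amounts to a sorting procedure on the joint spectrum. The function below is our own sketch of that bookkeeping, in assumed units where $k_B T$ is absorbed into the reservoir energies:

```python
import numpy as np

def erase_then_min_heat(spec_O, spec_R, energies_R):
    """Maximise the erasure probability, then minimise the heat dissipation.

    spec_O, spec_R: eigenvalues of the object state and the (Gibbs) reservoir
    state; energies_R: reservoir energy levels. Returns (p_max, heat)."""
    d_O, d_R = len(spec_O), len(spec_R)
    joint = np.sort(np.outer(spec_O, spec_R).ravel())[::-1]
    P0, rest = joint[:d_R], joint[d_R:]         # P_0 goes to |phi_1>
    blocks = rest.reshape(d_R, d_O - 1)         # the vectors P_1, ..., P_{d_R}
    energies = np.sort(np.asarray(energies_R, float))
    pops = P0 + blocks.sum(axis=1)              # final reservoir populations, ground state first
    gibbs = np.sort(spec_R)[::-1]               # initial populations, already passive
    heat = float(np.dot(pops - gibbs, energies))
    return float(P0.sum()), heat

# Hypothetical example: maximally mixed qubit, qubit reservoir with beta*epsilon = 1.
g = np.exp([0.0, -1.0])
spec_R = g / g.sum()
p_max, heat = erase_then_min_heat([0.5, 0.5], spec_R, [0.0, 1.0])
print(p_max, heat)    # p_max about 0.731; heat about 0.231 (units of k_B T)
```

In this example the heat exceeds $k_B T\,\Delta S$, consistent with remark (d): the context-dependent minimum sits above Landauer's limit.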

The tradeoff between probability of information erasure and minimal heat dissipation
We would now like to relax the condition of maximising the probability of information erasure, and allow the error δ to take non-zero values. The question we would now like to ask is: how will the minimal achievable heat dissipation $\Delta Q\{\delta\}$ vary with δ? The sequential swap algorithm produces a discrete, increasing sequence of errors $\delta_j$ for which the heat dissipation $\Delta Q\{\delta_j\}$ will be minimal. Furthermore, the marginal state of the object, $\rho'_\mathcal{O}$, will always be passive. Each swap operation acts on a two-dimensional subspace of the composite system. As the state is initially diagonal with respect to the product eigenbasis, and swap operations only permute the probabilities in the state's spectrum, the composite system will always be diagonal with respect to this basis at every stage of the algorithm.
Therefore, $\mathrm{SW}_0 = \mathbb{1}$ and, as $\gamma \to 1$, $\mathrm{SW}_\gamma$ converges to the swap operation. Hence, for any error $\delta \in (\delta_j, \delta_{j+1})$, the optimal unitary operator $U^{p}_{\mathrm{opt}}(\delta)$ would be given by following the algorithm for discrete errors up to $\delta_j$, and then replacing the swap operation which would give the error $\delta_{j+1}$ with the entangling swap operation $\mathrm{SW}_\gamma$ defined above, with an appropriate choice of γ. This ensures a continuous decrease in δ and a continuous increase in $\Delta Q$.
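One possible matrix parameterisation of the entangling swap $\mathrm{SW}_\gamma$ (our own choice; any two-level rotation with the same endpoints would serve) is:

```python
import numpy as np

def entangling_swap(gamma, dim, i, j):
    """A rotation in the two-dimensional subspace spanned by basis vectors i and j.
    At gamma = 0 this is the identity; at gamma = 1 it exchanges the populations
    of i and j (a full swap, up to a phase)."""
    U = np.eye(dim)
    c, s = np.sqrt(1.0 - gamma), np.sqrt(gamma)
    U[i, i] = c; U[j, j] = c
    U[i, j] = -s; U[j, i] = s
    return U

# Populations of a diagonal state interpolate continuously between the
# unswapped (gamma = 0) and fully swapped (gamma = 1) configurations.
rho = np.diag([0.7, 0.2, 0.1])
for gamma in (0.0, 0.5, 1.0):
    U = entangling_swap(gamma, 3, 0, 2)
    print(np.round(np.diag(U @ rho @ U.T), 3))
```

Because the populations depend continuously on γ, so do the error δ and the heat dissipation, which is exactly what interpolating between the discrete errors $\delta_j$ requires.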

Examples: erasing a fully mixed qubit with maximal probability of success
We shall now consider the erasure of a qubit, with a two-dimensional Hilbert space $\mathcal{H}_\mathcal{O}$. We are also interested in examining the scenario where no a priori information about the state of the object is known; the initial probabilities $p^{o}_1$ and $p^{o}_2$ are both one-half. For simplicity, we make the substitution $d \equiv d_\mathcal{R}$ for the dimension of the reservoir's Hilbert space. The action of the optimal unitary operator for passive, maximally probable information erasure would therefore be such that the diagonal elements of ρ', as depicted in figure 2(b), are the probabilities $\vec{p}$, arranged from top to bottom in decreasing order. We will consider two models for the reservoir: (a) a d-dimensional ladder system, where the energy gap between consecutive eigenstates is uniformly ω, so that the largest energy eigenvalue grows with d. In the limit as d tends to infinity, this system will be a harmonic oscillator of frequency ω, with a spectrum bounded from below by zero, and unbounded from above.
(Figure: the sequential swap algorithm; the elements inside a dashed circle (red online) are those which must be swapped to move from one error $\delta_j$ in the increasing sequence to the next, such that at each stage $\Delta Q\{\delta_j\}$ is minimised and $\rho'_\mathcal{O}$ is passive with the smallest average energy possible given this constraint.)

(b) A chain of spin-half systems, with nearest-neighbour interactions, that is under a linear magnetic field gradient.
Here, the reservoir has the Hilbert space of N spin-half systems, and its Hamiltonian contains a local field term for each site k, together with nearest-neighbour interaction terms. For each reservoir, we wish to determine how much heat is dissipated in excess of the improved lower bound of Landauer's inequality determined in [25], which is an explicit function of $\Delta S$ and d. We use the simple form of this lower bound, which is not tight.
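The two reservoir Hamiltonians can be sketched as matrices. The ladder system follows the description in the text; for the spin chain, the precise coupling constants are not reproduced here, so the field-gradient and hopping form below is an assumption for illustration:

```python
import numpy as np

def ladder_hamiltonian(d, omega=1.0):
    """d-dimensional ladder: uniform gap omega between consecutive levels."""
    return omega * np.diag(np.arange(d, dtype=float))

def spin_chain_hamiltonian(N, b=1.0, J=0.1):
    """Sketch of an N-site spin-half chain under a linear field gradient b*k,
    with nearest-neighbour hopping of strength J (assumed couplings)."""
    sz = np.diag([0.5, -0.5])
    sp = np.array([[0.0, 1.0], [0.0, 0.0]])   # raising operator
    sm = sp.T                                  # lowering operator
    def embed(op, k):
        mats = [np.eye(2)] * N
        mats[k] = op
        out = mats[0]
        for m in mats[1:]:
            out = np.kron(out, m)
        return out
    H = sum(b * (k + 1) * embed(sz, k) for k in range(N))
    H = H + sum(J * (embed(sp, k) @ embed(sm, k + 1)
                     + embed(sm, k) @ embed(sp, k + 1)) for k in range(N - 1))
    return H

print(ladder_hamiltonian(4).diagonal())    # [0. 1. 2. 3.]
print(spin_chain_hamiltonian(2).shape)     # (4, 4)
```

A spin chain of length N is dimensionally equivalent to a ladder with $d = 2^N$, which is the comparison made in the figures below.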

Comparison of reservoirs given unitary evolution
In this limit, $\Delta Q$ will refer to the amount by which the average energy of the reservoir at times $t > t_N$ is greater than that at times $t < t_0$, and will have the same meaning as the heat term in equation (2.4). Implicit in this framework is the notion that changing the Hamiltonian acting on the system will take energy from, or put energy into, a work storage device which we do not account for explicitly. If a non-unitary evolution is effected, however, we cannot in general make such an identification. This is because a general completely positive, trace-preserving map can always be conceived, via Stinespring's dilation theorem [39], as resulting from a unitary evolution on the system coupled with an environment. Indeed, the energy consumption in such a case will be determined by the total Hamiltonian of the system plus the environment. If energy is allowed to flow between the system and environment, then the energy increase of the reservoir (plus the energy increase in the object) will not be identical to the energy consumed from the work storage device; $\Delta Q$ may be less or greater than the energy lost.
The only exception to this rule is when the unitary evolution between system and environment conserves the energy of the two individually, whereby no energy is transferred between them. This will cause the system to undergo pure dephasing with respect to the (time-local) Hamiltonian eigenbasis; we refer to such a generalised evolution as energy conserving. The simplest realisation of such a scenario would require us to consider the sequence of Hamiltonians to be accompanied by the time-ordered sequence of super-operators $e^{\tau\mathcal{L}_k}$, $k \in \{1,\ldots,N\}$, where $\mathcal{L}_k$ generates pure dephasing in the eigenbasis of $H_k$; it may be written in Lindblad form using the commutator $[\,\cdot\,,\cdot\,]_-$ and anti-commutator $[\,\cdot\,,\cdot\,]_+$, with a dephasing rate $\Gamma \in [0,\infty)$. In each time period, the system thus evolves by an energy conserving, Markovian dephasing channel that conserves $H_k$. As such channels are unital, they will cause the consequent heat dissipation to increase in proportion to the entropy reduction in the object; energy conserving, Markovian dephasing will be detrimental to the erasure process [25].
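An energy conserving, Markovian dephasing channel of this kind admits a simple closed-form solution in the energy eigenbasis. The sketch below is our own implementation, assuming a non-degenerate Hamiltonian and the standard projector-based Lindblad generator:

```python
import numpy as np

def dephasing_evolution(rho, H, gamma, t):
    """Exact solution of d(rho)/dt = -i[H, rho] + gamma*(sum_n P_n rho P_n - rho),
    with P_n the eigenprojectors of a non-degenerate H: populations in the
    energy eigenbasis are untouched, coherences rotate and decay at rate gamma."""
    evals, V = np.linalg.eigh(H)
    r = (V.conj().T @ rho @ V).astype(complex)   # to the energy eigenbasis
    for a in range(len(evals)):
        for b in range(len(evals)):
            if a != b:
                r[a, b] *= np.exp((-1j * (evals[a] - evals[b]) - gamma) * t)
    return V @ r @ V.conj().T

# Populations (hence energy) are conserved while coherences decay, so the
# channel is unital and can only increase the entropy of the system.
H = np.diag([0.0, 1.0])
rho = np.array([[0.5, 0.4], [0.4, 0.5]])
out = dephasing_evolution(rho, H, gamma=2.0, t=1.0)
print(np.real(np.diag(out)))    # still [0.5, 0.5]
```

Because the diagonal is untouched, no heat flows to or from the environment, while the loss of coherence degrades the erasure achieved by the intended unitary.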
For our two models, we will consider the simplest Hamiltonian cycle, where the sequence of Hamiltonians sandwiched by $H_0$ is a singleton whose generated unitary is chosen to equal the optimal unitary operator $U_{\mathrm{opt}}$, as determined by the sequential swap algorithm given in section 2.5, when $\tau = 1$. Now, we let the system evolve instead as $\rho \mapsto e^{\tau\mathcal{L}_1}(\rho)$. By again evolving the system for a period of $\tau = 1$, we may ascertain how such an environmental interaction affects both the probability of qubit erasure and the heat dissipation. Figure 6(a) shows the effect of dephasing on the erasure process when the reservoir is a spin chain of length $N \in \{2, 3, 4, 5\}$, while figure 6(b) shows the effect of dephasing on the erasure process when the reservoir is a ladder system. We note that when the two reservoirs are dimensionally equivalent, i.e., when the ladder system has dimensions $d_2 \in \{2^2, 2^3, 2^4, 2^5\}$, commensurate with spin chains of length $N \in \{2, 3, 4, 5\}$, they display the same behaviour under energy conserving, Markovian dephasing channels. This is because the generator of their evolution, the Liouville super-operator $\mathcal{L}_1$, is the same in such cases. In both instances, an increase in dimension leads to an increase in $\Delta Q$, while the probability of qubit erasure increases as we move from $d_2 = 2^2$ to $d_2 = 2^3$, decreasing again as we increase further still to $d_2 = 2^4$ and $d_2 = 2^5$.
What is most striking, however, is that the ladder system seems to perform best precisely when it is dimensionally equivalent to a spin chain; consider figures 6(c) and (d).

Full erasure of a qudit with a harmonic oscillator
Here, we expound on the example of using a ladder system as a reservoir, but consider what happens as we take the limit of infinitely large d. In this limit we may call the ladder system a harmonic oscillator. Let us first consider the case where the object is a qudit, with a Hilbert space of dimension $d_{\mathcal{S}}$, prepared in the maximally mixed state: in the limit as ω becomes vanishingly small, whereby the spectrum of the reservoir Hamiltonian is approximately continuous, full erasure is achieved with a heat dissipation of $(d_{\mathcal{S}} - 1)k_B T$. Now let us focus on the case where the object is a qubit, but with an initial bias in its spectrum, $(q, 1-q)$ with $q \in [1/2, 1]$. In appendix C.5.1 we show that, in the limit as ω tends to zero, $\Delta Q$ will be

$\Delta Q = k_B T \left[ 2(1-q) + (1-q)\log\frac{q}{1-q} \right].$

In the limit as q tends to one-half, $\Delta Q$ approaches $k_B T$ as in our previous analysis. The concomitant entropy reduction is, of course, always $\Delta S = -q\log q - (1-q)\log(1-q)$. As shown in figure 7, it is evident that, except for the trivial case of q = 1, commensurate with $\Delta S = \Delta Q = 0$, the heat dissipation will exceed Landauer's limit.
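The gap to Landauer's limit can be verified over the whole range of biases. The snippet below works in units where $k_B T = 1$; the helper names `heat` and `entropy_reduction` are our own, not from the text.

```python
import numpy as np

kT = 1.0  # units of k_B T

def heat(q):
    """Optimal heat for fully erasing a qubit with spectrum (q, 1-q), omega -> 0."""
    return kT * (2 * (1 - q) + (1 - q) * np.log(q / (1 - q)))

def entropy_reduction(q):
    """Delta S = -q log q - (1 - q) log(1 - q): entropy removed by full erasure."""
    return -q * np.log(q) - (1 - q) * np.log(1 - q)

qs = np.linspace(0.5, 1.0 - 1e-6, 1000)
excess = heat(qs) - kT * entropy_reduction(qs)

print(excess.min())  # nonnegative: heat never drops below Landauer's limit
print(heat(0.5))     # equals k_B T at q = 1/2, as in the maximally mixed case
```

Indeed, the excess $\Delta Q - k_B T\,\Delta S = k_B T\,[\,2(1-q) + \log q\,]$ is monotonically decreasing on $[1/2, 1]$ and vanishes only at $q = 1$.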

Information erasure beyond Landauer's framework
In section 2 the setup for information erasure had the compound system of object and thermal reservoir-our system of interest-as a thermally isolated quantum system whose constituent parts are initially uncorrelated. The system then undergoes a cyclic process described by a unitary operator, and the average energy increase of the reservoir is defined as heat. Indeed, these are the basic assumptions under which Landauer's principle holds. To achieve heat dissipation lower than that discussed in section 2 we must operate outside of Landauer's framework by abandoning some of these assumptions. However, dissipating less heat than Landauer's limit will become meaningless if there is no referent of heat or temperature in the mathematical model. As such, if we wish to avoid making category errors, there are restrictions on the ways in which we may change our assumptions. That is to say, the model must continue to involve a system that is initially prepared in a Gibbs state that is uncorrelated from any other system considered. This way, the system has a well-defined temperature, and we may continue to consider its energy increase as heat. In addition, the process must still be cyclic, i.e., the Hamiltonian of the total system-in particular the thermal system-must be the same at the end of the process, as it was at the beginning. If this condition is not satisfied, we may observe any value of heat we desire by appropriately changing the final Hamiltonian.
One option available is to move beyond unitary evolution. This can be achieved by introducing an auxiliary system to the setup of section 2, so that the unitary evolution of the totality makes the object and reservoir evolve non-unitarily; the auxiliary system must also have a trivial Hamiltonian, proportional to the identity, for the resultant decrease in $\Delta Q$ to always translate to a decrease in energy consumption. Although the reservoir must always be uncorrelated from the other subsystems for it to be thermalised relative to them [40], the auxiliary system and object may have initial correlations. Unless these correlations are classical, the resulting dynamics of the object plus reservoir subsystem would cease to be described by completely positive maps [41, 42]. The other option available is to first consider a system that is in a thermal state and, therefore, has a temperature. Subsequently, the system may be (conceptually) partitioned into two correlated subsystems, with one of them taking the role of the object. The energy generation due to information erasure of the object must then be determined over the total system itself, because the subsystems do not have well-defined Hamiltonians. Although there is technically no thermal reservoir to speak of, since the total system was initially thermal, the average energy change thereof may still be called heat in a consistent manner, as before. The first setup is depicted in figure 8. We may (probabilistically) prepare the object in a pure state by conducting a cyclic process on the total system, characterised by a unitary operator, as before. By letting the Hamiltonian of the auxiliary system be proportional to the identity, we may ensure that the total energy consumption due to this process is accounted for by the energy change of the object and thermal reservoir alone. As before, the energy change of the thermal reservoir, $\Delta Q$, is heat.
In the extreme case, we may consider that the unitary operator acts non-trivially only on the object plus auxiliary subsystem; the thermal reservoir will thus not be involved, and no heat will be dissipated. We would like to know what the necessary and sufficient conditions for fully erasing the object would be in this case.
where $R$ is the rank of the final state of the object and auxiliary and, hence, the rank of ρ. The class of states that allow for such a transformation can, without loss of generality, be represented as those whose support is confined to a subspace of dimension at most $d_{\mathcal{A}}$. Therefore, a necessary and sufficient condition for full information erasure by unitary evolution, without using the thermal reservoir, is for the rank of ρ to be less than, or equal to, $d_{\mathcal{A}}$, the dimension of the auxiliary Hilbert space. To see how correlations between the object and the auxiliary bear on this condition, we consider four exemplary states. As before, the reservoir is initially in a thermal state and uncorrelated from the rest of the system. The initial state of the object and auxiliary, however, may or may not be correlated.
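The sufficiency of the rank condition can be illustrated constructively: any state ρ on the object-plus-auxiliary space with rank $R \leqslant d_{\mathcal{A}}$ is erased by a unitary that maps its populated eigenvectors into the $|\varphi_1\rangle$ sector. In the sketch below the dimensions and the random state are arbitrary illustrative choices, and $|\varphi_1\rangle$ is taken as the first object basis state.

```python
import numpy as np

rng = np.random.default_rng(1)
dS, dA, R = 3, 4, 4   # object dim, auxiliary dim, state rank R <= d_A (illustrative)

# Random rank-R density matrix on the object-plus-auxiliary space.
G = rng.normal(size=(dS * dA, R)) + 1j * rng.normal(size=(dS * dA, R))
rho = G @ G.conj().T
rho /= np.trace(rho).real

# Eigenvectors of rho, sorted by decreasing eigenvalue; only the first R are populated.
vals, vecs = np.linalg.eigh(rho)
order = np.argsort(vals)[::-1]
vecs = vecs[:, order]

# Target vectors |phi_1>|k>, k = 0..d_A - 1, completed to an orthonormal basis via QR.
targets = np.zeros((dS * dA, dA), dtype=complex)
for k in range(dA):
    targets[k, k] = 1.0   # index of |phi_1>|k> is 0 * dA + k = k
Q, _ = np.linalg.qr(np.hstack([targets, rng.normal(size=(dS * dA, dS * dA - dA))]))

# Unitary sending the n-th eigenvector of rho to the n-th basis vector of Q.
U = Q @ vecs.conj().T
rho_final = U @ rho @ U.conj().T
rho_S = rho_final.reshape(dS, dA, dS, dA).trace(axis1=1, axis2=3)
print(np.round(rho_S.real, 8))  # all population lands on |phi_1>
```

Since the populated eigenvectors are sent to orthonormal vectors of the form $|\varphi_1\rangle|k\rangle$, the reduced state of the object is exactly $|\varphi_1\rangle\langle\varphi_1|$; with $R > d_{\mathcal{A}}$ the target sector is too small for this to be possible.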
All of these states have a rank of at most 2, and the reduced state of the object is the maximally mixed qubit state which, with an appropriate unitary operator, can be fully erased to $|\varphi_1\rangle\langle\varphi_1|$. Each state, however, falls under a different class of correlations: $\rho_{\mathrm{u.c.}}$ is uncorrelated, $\rho_{\mathrm{c.c.}}$ is classically correlated, $\rho_{\mathrm{q.d.}}$ has quantum discord, and $\rho_{\mathrm{p.e.}}$ is a pure entangled state. The only case where the state of the auxiliary is also left intact, however, is when the two systems are classically correlated. Notwithstanding, this cannot be seen as allowing the auxiliary to act as a catalyst for information erasure: for the auxiliary to be utilised in the information erasure of another object system, with the same unitary operator, the two must first be correlated, and this process will have a thermodynamic cost itself [43]. In the case where the object and auxiliary are in a pure entangled state, the unitary operator which prepares the object in a pure state will also prepare the auxiliary in a pure state. As discussed in [22], this will allow for the thermal reservoir to be cooled by transferring entropy from it to the object and auxiliary, resulting in a negative $\Delta Q$.
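A minimal sketch of the classically correlated case: taking the illustrative form $\rho_{\mathrm{c.c.}} = \frac{1}{2}(|00\rangle\langle 00| + |11\rangle\langle 11|)$ for a qubit object and qubit auxiliary (an assumed example, not the text's explicit state), a CNOT controlled on the auxiliary erases the object while leaving the auxiliary's reduced state untouched.

```python
import numpy as np

# Basis |object, auxiliary>: index = 2 * s + a for |s, a>.
def proj(i, j, dim=4):
    P = np.zeros((dim, dim))
    P[i, j] = 1.0
    return P

# Classically correlated object-auxiliary state (rank 2 <= d_A).
rho_cc = 0.5 * proj(0, 0) + 0.5 * proj(3, 3)

# CNOT controlled on the auxiliary: flips the object when the auxiliary is |1>,
# sending |1,1> -> |0,1>, which fully erases the object.
U = np.eye(4)[:, [0, 3, 2, 1]]

rho_f = U @ rho_cc @ U.T
rho_S = rho_f.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)        # object marginal
rho_A_before = rho_cc.reshape(2, 2, 2, 2).trace(axis1=0, axis2=2)
rho_A_after = rho_f.reshape(2, 2, 2, 2).trace(axis1=0, axis2=2)  # auxiliary marginal

print(rho_S)                                   # object erased to |0><0|
print(np.allclose(rho_A_before, rho_A_after))  # auxiliary left intact: True
```

For the discordant and entangled examples the same circuit would disturb the auxiliary's marginal, which is the sense in which only classical correlations leave it intact.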
In either scenario, the initial state ρ on the composite system of object and auxiliary, which has a rank smaller than, or equal to, $d_{\mathcal{A}}$, can be seen as a thermodynamic resource, because it is a system that is highly out of equilibrium. Recall that the Hamiltonian of this composite system is considered to be trivial, being proportional to the identity. As such, if it were also at thermal equilibrium at inverse temperature β, then it would be in the maximally mixed state $\rho = \mathbb{1}/(d_{\mathcal{S}} d_{\mathcal{A}})$, which is invariant under any unitary operator. Any unitary operator acting on such a system would therefore not be able to increase the largest eigenvalue of the object's reduced state. As such, information erasure would not be possible.
In the case where the rank of ρ is greater than $d_{\mathcal{A}}$, but smaller than $d_{\mathcal{S}} d_{\mathcal{A}}$, the reservoir may be used to facilitate information erasure by similar arguments as in section 2. This will allow for a larger $p_{\varphi_1}^{\max}$, and a smaller consequent $\Delta Q$, than if the auxiliary were not present.

Object as a component of a thermal system
Consider a total system whose Hamiltonian eigenvectors may be entangled between the object and the remainder of the system, so that the union thereof forms an orthonormal basis that spans the composite Hilbert space. As the system was initially thermal, the gain in its average energy is heat, which obeys the identity

$\Delta Q = k_B T\, S(\rho' \,\|\, \rho_{\mathrm{th}}),$

where $\rho_{\mathrm{th}}$ is the initial thermal state, $\rho'$ the final state after the cyclic unitary process, and $S(\cdot\|\cdot)$ the quantum relative entropy. As unitary evolution does not alter the von Neumann entropy, this energy production is a function of the relative entropy alone; $\Delta Q$ is therefore non-negative and independent of the entropy reduction $\Delta S := S(\rho_{\mathcal{S}}) - S(\rho'_{\mathcal{S}})$ in the object. As γ tends to one, thereby resulting in uncorrelated Hamiltonian eigenvectors, both $\Delta Q$ and $\Delta S$ decrease, vanishing in the limit as β tends to infinity. However, for intermediate temperatures, $\Delta Q$ becomes so low that it 'violates' Landauer's limit. This is similar to the possibility of extracting work from the correlations between a quantum system and its environment, which are initially in a thermal state [44].
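The identity relating heat to relative entropy follows from $\log\rho_{\mathrm{th}} = -\beta H - \log Z$ and $S(\rho') = S(\rho_{\mathrm{th}})$, and is easy to verify numerically. In the sketch below the Hamiltonian, the unitary, β and the dimension are all arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
beta, dim = 1.3, 6   # illustrative inverse temperature and total dimension

# Random Hamiltonian of the total system and its Gibbs state.
A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
H = (A + A.conj().T) / 2
E, V = np.linalg.eigh(H)
p = np.exp(-beta * E)
p /= p.sum()
rho_th = V @ np.diag(p) @ V.conj().T

# A random unitary plays the role of the cyclic process on the total system.
U, _ = np.linalg.qr(rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim)))
rho_f = U @ rho_th @ U.conj().T

# Heat: increase in the average energy of the initially thermal system.
dQ = np.trace(H @ (rho_f - rho_th)).real

# Relative entropy S(rho_f || rho_th), using S(rho_f) = S(rho_th) under unitaries.
S_th = -np.sum(p * np.log(p))
log_rho_th = V @ np.diag(np.log(p)) @ V.conj().T
D = (-S_th - np.trace(rho_f @ log_rho_th)).real

print(beta * dQ, D)  # coincide: Delta Q = k_B T * S(rho_f || rho_th) >= 0
```

Non-negativity of the relative entropy immediately gives $\Delta Q \geqslant 0$ for any cyclic unitary process on an initially thermal system.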

Conclusions
In this article, we have developed a context-dependent, dynamical variant of Landauer's principle. We used techniques from majorisation theory to characterise the equivalence class of unitary operators that bring the probability of information erasure to a desired value and minimise the consequent heat dissipation to the thermal reservoir. By constructing a sequential swap algorithm, we demonstrated that there is a tradeoff between the probability of information erasure and the minimal heat dissipation. Furthermore, we showed that except for the cases where the object is a two-level system, or when we are able to fully erase the object's information, we may maximise the probability of information erasure without also minimising the object's entropy; this allows for a more energy-efficient procedure for probabilistic information erasure.
We also investigated methods of reducing the heat dissipation due to information erasure by operating outside of Landauer's framework. However, we wanted this departure to preserve the referent of heat and temperature in our mathematical description; dissipating less heat than Landauer's limit becomes meaningless when there is no temperature or heat to speak of. We therefore arrived at two alterations to Landauer's framework which do not result in a category error with respect to heat and temperature. The first avenue was to introduce an auxiliary system to the object and reservoir, while the second was to consider the object as a subpart of a system in thermal equilibrium. In the first instance, the figure of merit was identified as the rank of the state on the object-plus-auxiliary subsystem; if this rank is less than, or equal to, the dimension of the auxiliary Hilbert space, then full information erasure is possible with at most zero heat dissipation to the reservoir. In the second instance, information erasure can be achieved with possibly less heat than Landauer's limit when the eigenvectors of the system Hamiltonian that have support on the pure state we wish to prepare the object in are product states.
The primary question we have not addressed in this study, and shall leave for future work, is the inclusion of time-dynamics in what we consider the physical context; the optimal unitary operator for information erasure is considered here as a bijection between orthonormal basis sets. In most realistic settings, however, one is restricted in the Hamiltonians one can establish between the object and reservoir. As such, the optimal unitary operator may not always be reachable, resulting in a smaller maximal probability of information erasure, a larger minimal heat dissipation, or both. Furthermore, an interesting question to address is the number of times that we must switch between the Hamiltonians that generate the unitary group in order to obtain the optimal unitary operator, and how this would scale with the reservoir's dimension. This would provide a link between the present work and the third law of thermodynamics [45] from a control-theoretic [46] viewpoint.

Appendix A. Majorisation theory
Here we shall introduce some useful concepts from the theory of majorisation [47].
First of all, a degeneracy in the probability distribution $\boldsymbol{p}$ will mean that the spectral representation of ρ is not unique. Lemma C establishes that, in such a case, one of the elements must be one, and the rest zero. We refer to the mth element of $\Pi_0$ as $\Pi_0(m)$, and the lth element of $\Pi_1^m$ as $\Pi_1^m(l)$. In other words, using the new entangling unitary operator instead of the sequential swap algorithm will result in $p_{2,1}$ decreasing, and $p_{i,m}$ increasing. If $i = 2$ and $m \geqslant 2$, this will result in a larger $\Delta Q$. Conversely, if $m = 1$ and $i \geqslant 3$, this will increase the average energy of the object, and thereby increase $\Delta W$. If both $i \geqslant 3$ and $m \geqslant 2$, then both $\Delta Q$ and $\Delta W$ will be larger. The same line of reasoning applies for entanglement of higher Schmidt rank.

C.5. Full erasure of a maximally mixed qudit with a harmonic oscillator

Here, we expound on the example of using a ladder system as a reservoir, but consider what happens as we take the limit of infinitely large d. In this limit we may call the ladder system a harmonic oscillator. Furthermore, we consider the object as a qudit with a Hilbert space of dimension $d_{\mathcal{S}}$. Given a fixed and finite ω, in the limit as d tends to infinity there will be infinitely many eigenvalues of the reservoir Hamiltonian that become formally infinite, and hence infinitely many probabilities $r_m$ that vanish. As such, full erasure is obtained only in the limit as ω becomes vanishingly small, and hence the optimal case is achieved when we take the double limit of d going to infinity while ω goes to zero.
Of course, the 'rate' at which we take the limit $d \to \infty$ must be greater than that at which ω approaches zero. As shown in figure C2(a), for the case of $d_{\mathcal{S}} = 2$, if we increase d while decreasing ω in such a way as to keep $\|H_{\mathcal{R}}\|$ constant, both the probability of qubit erasure and the heat dissipation decrease. In this regime the reservoir's energy spacing becomes arbitrarily fine, and we may simplify our calculations by replacing sums with Riemann integrals. First, we note that the maximum probability of qudit erasure and the heat dissipation approach their optimal values precisely when $\|H_{\mathcal{R}}\|$ is infinitely large. Therefore, if ω and d decrease and increase, respectively, in such a way that $\|H_{\mathcal{R}}\|$ also increases, then in this limit we achieve the optimal case of full information erasure with the minimal heat dissipation of $(d_{\mathcal{S}} - 1) k_B T$. One way of ensuring this, as shown in figure C2(b), is to define the dimension of the reservoir as $d = 2^n + 1$, where n is a natural number, while defining the frequency as $\omega_n = \omega/n$.
The Hamiltonian norm will then be

$\|H_{\mathcal{R}}\| = 2^n \omega_n, \qquad (\mathrm{C.19})$

which, in the limit as n tends to infinity, becomes infinitely large.
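The behaviour in this double limit can be probed numerically. The sketch below is a simplified stand-in for the optimisation of section 2: it assumes the greedy assignment of the largest joint probabilities to the lowest reservoir energies of the target sector, and the particular values of d and ω are arbitrary choices.

```python
import numpy as np

def erase_maximally_mixed_qubit(beta, omega, d):
    """Greedy optimal erasure of a maximally mixed qubit with a d-level ladder
    reservoir (levels m * omega, Gibbs-populated). Returns (p_max, heat), k_B = 1."""
    energies = omega * np.arange(d)
    r = np.exp(-beta * energies)
    r /= r.sum()
    # Joint spectrum: each reservoir probability appears twice, with weight 1/2.
    joint = np.sort(np.repeat(r / 2, 2))[::-1]
    # The d largest probabilities fill the target qubit sector, larger probabilities
    # at lower energies; the remainder fill the other sector likewise.
    p1, p2 = joint[:d], joint[d:]
    heat = energies @ (p1 + p2) - energies @ r
    return p1.sum(), heat

beta = 1.0
for d, omega in [(200, 0.1), (2000, 0.02), (20000, 0.004)]:
    p_max, dQ = erase_maximally_mixed_qubit(beta, omega, d)
    print(d, round(p_max, 6), round(dQ, 4))
# As d grows and omega shrinks with d * omega increasing, p_max -> 1 while the
# heat approaches (d_S - 1) k_B T = k_B T, which exceeds Landauer's k_B T log(2).
```

Keeping $d\omega$ growing ensures both that the spectrum becomes continuous and that the reservoir's energy range diverges, matching the double-limit prescription above.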
C.5.1. Full erasure of a qubit with an initial bias. We have shown that when the whole harmonic oscillator is used as a reservoir we can fully purify a qubit in a maximally mixed state, where the entropy reduction is $\Delta S = \log 2$, with a heat cost of $\Delta Q = k_B T > k_B T \log 2$. Here we wish to evaluate the optimal $\Delta Q$ for arbitrary initial states of the qubit and, hence, arbitrary entropy changes $\Delta S$. To this end, define the initial state of the object as one with spectrum $(q, 1-q)$, where $q \in [1/2, 1]$. After the joint evolution with an infinite-dimensional reservoir, the sequence $\boldsymbol{p}$ describes the spectrum of the final state, with the first entry associated with eigenvector $x_1$, and so on. In the limit of infinitesimally small ω, the energy spectrum of the reservoir and, hence, the probabilities $\boldsymbol{r}$ can be approximated as a continuum. We may therefore evaluate $\Delta Q$ by replacing sums with integrals, which yields

$\Delta Q = (1-q)\,\Omega + \frac{2(1-q)}{\beta},$

where Ω is the energy 'width' which satisfies $1-q = q\,\mathrm{e}^{-\beta\Omega}$, i.e. $\Omega = \frac{1}{\beta}\log\frac{q}{1-q}$. In the limit as q tends to one-half, $\Delta Q$ approaches $1/\beta$ as in our previous analysis.
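The ω → 0 result for a biased qubit can be cross-checked by a direct finite-dimensional computation. In the sketch below, β, ω and d are arbitrary numerical choices, and the greedy sorted assignment of joint probabilities to ascending reservoir energies stands in for the optimal unitary; its output is compared with the limiting expression $\Delta Q = k_B T\,[\,2(1-q) + (1-q)\log\frac{q}{1-q}\,]$.

```python
import numpy as np

def erase_biased_qubit(q, beta, omega, d):
    """Heat (k_B = 1) of optimally erasing a qubit with spectrum (q, 1 - q) using a
    d-level ladder reservoir with spacing omega at inverse temperature beta; the
    sorted joint probabilities are assigned greedily to ascending energies."""
    energies = omega * np.arange(d)
    r = np.exp(-beta * energies)
    r /= r.sum()
    joint = np.sort(np.concatenate([q * r, (1 - q) * r]))[::-1]
    p1, p2 = joint[:d], joint[d:]
    return energies @ (p1 + p2) - energies @ r

q, beta = 0.7, 1.0
dQ_numeric = erase_biased_qubit(q, beta, omega=0.002, d=40000)
dQ_formula = (2 * (1 - q) + (1 - q) * np.log(q / (1 - q))) / beta
print(dQ_numeric, dQ_formula)  # agree up to discretisation error
```

At $q = 1/2$ the same computation reduces to the maximally mixed case, recovering $\Delta Q \approx 1/\beta$.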