Dynamic Cooling on Contemporary Quantum Computers

We study the problem of dynamic cooling whereby a target qubit is cooled at the expense of heating up $N-1$ further identical qubits, by means of a global unitary operation. A standard back-of-the-envelope high temperature estimate establishes that the target qubit temperature can only be dynamically cooled by at most a factor of $1/\sqrt{N}$. Here, we provide the exact expression for the minimum temperature to which the target qubit can be cooled and reveal that there is a crossover from the high initial temperature regime where the scaling is in fact $1/\sqrt{N}$ to a low initial temperature regime where a much faster scaling of $1/N$ occurs. This slow $1/\sqrt{N}$ scaling, which was relevant for early high-temperature NMR quantum computers, is the reason dynamic cooling was dismissed as ineffectual around 20 years ago; the fact that current low-temperature quantum computers fall in the fast $1/N$ scaling regime, reinstates the appeal of dynamic cooling today. We further show that the associated work cost of cooling is exponentially more advantageous in the low temperature regime. We discuss the implementation of dynamic cooling in terms of quantum circuits and examine the effects of hardware noise. We successfully demonstrate dynamic cooling in a 3-qubit system on a real quantum processor. Since the circuit size grows quickly with $N$, scaling dynamic cooling to larger systems on noisy devices poses a challenge. We therefore propose a suboptimal cooling algorithm, whereby relinquishing a small amount of cooling capability results in a drastically reduced circuit complexity, greatly facilitating the implementation of dynamic cooling on near-future quantum computers.


I. INTRODUCTION
Quantum computers offer massive advantages over classical computers in terms of execution time and memory efficiency for a subset of problems, such as optimization and simulation [1,2]. While various physical implementations of quantum computers are still being explored (e.g., superconducting circuits, ion traps, neutral atoms), all must fulfill a fundamental set of requirements [3]. One of these requirements is the ability to initialize the quantum bits, or qubits, into a pure, fiducial quantum state. Furthermore, pure ancilla qubits will be required for fault-tolerant quantum computers to perform quantum error correction [1,4,5]. The preparation of pure state qubits is therefore a key hurdle in the successful implementation of quantum computers now and in the future.
The problem of initializing a large set of qubits into a pure state was first studied in the context of generating highly polarized qubits in nuclear magnetic resonance (NMR) systems to improve signal-to-noise ratios [6-8]. Since large polarization in qubits can be obtained by cooling them down to very low temperatures, scientists began to explore techniques to cool qubits below temperatures that can be achieved with direct, physical cooling methods (e.g., cooling with lasers or large magnetic fields). Schulman and Vazirani were the first to propose effective cooling of qubits for quantum computation via the application of certain logic gates on the qubits [9], which, following Ref. [10], we refer to as dynamic cooling. Their proposal, based on entropy manipulation in a closed system, cools a subset of qubits (e.g., a single target qubit) at the expense of heating the others by performing unitary operations on the entire set of qubits, see Figure 1.
In the high initial temperature regime, which was relevant for the NMR-based quantum computers available at the time dynamic cooling was proposed, the Shannon bound establishes that the target qubit can be dynamically cooled by a factor of at most $1/\sqrt{N}$, where $N$ is the total number of qubits [11,12]. This slow scaling led to the dismissal of dynamic cooling as an impractical method for cooling qubits, and spurred further research aimed at beating Shannon's bound. Since the bound holds for closed systems, subsequent proposals extended the scenario to open systems by allowing a subset of qubits to interact with the environment (i.e., a heat bath), thereby achieving cooling beyond Shannon's bound [11,12]. Such techniques are usually referred to as heat bath algorithmic cooling (sometimes simply algorithmic cooling) [13-27]. However, quantum computing technology has undergone a dramatic revolution in the last two decades, with high-temperature NMR quantum computers falling out of favor as newer models that operate at very low temperatures (e.g., superconducting circuits, ion traps) have shown great promise [28,29]. In tandem, thanks to the development of quantum thermodynamics, interest has grown within the scientific community in the possible advantage, in terms of energy consumption, of quantum technology in general and quantum computing in particular [30].
Here, we re-examine dynamic cooling in light of the scientific and technological advances that have been achieved since its inception over two decades ago. We consider a set of $N$ identical qubits, each initially in thermal equilibrium at some initial temperature $T$, that undergoes dynamic cooling via a global unitary transformation $U$, schematically represented in Figure 1. After such a transformation, the ground and excited state populations of the target qubit change, thereby effecting a change in its temperature. We analytically solve the problem of finding the minimum final temperature $T'$ that can be achieved as a function of initial temperature $T$, qubit resonant frequency $\omega$, and total number of qubits $N$. This allows us to unveil a crossover from the expected $1/\sqrt{N}$ scaling at high $T$ to a much faster, unexpected, $1/N$ scaling at low $T$.
We also provide an analytical expression for the minimal work cost associated to maximal cooling and show that it scales linearly with $N$ (i.e., it is extensive) and displays distinct behaviours at low and high temperature. While it vanishes like $1/T$ in the high $T$ regime, it vanishes exponentially as $e^{-1/T}$ in the low $T$ regime. These results evidence that dynamic cooling behaves very differently at high and low initial temperatures.
In particular, at low $T$ it is much more effective in terms of system-size scaling and energy cost.
Since current quantum computers operate in the low $T$ regime (unlike early NMR quantum computers), these results reinstate the appeal of dynamic cooling for generating pure state qubits for quantum computation. Given this renewed viability, we discuss the implementation of dynamic cooling in terms of quantum circuits, and examine the effect of noise on cooling on near-term quantum computers. We successfully demonstrate dynamic cooling on a real quantum processor on a system of $N = 3$ qubits. While scaling dynamic cooling up to larger systems on noisy quantum computers is a challenge due to the rapid growth of circuit size with $N$, we demonstrate how this can be overcome by accepting a suboptimal cooling scheme, whereby increased cooling can be achieved at a fixed (low) circuit complexity as the system size is increased. Our re-examination of dynamic cooling suggests that it is a promising technique for preparing pure state qubits on near-future quantum computers.

II. MAXIMAL COOLING
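The initial state of the global system reads
$$\rho = \bigotimes_{i=1}^{N} \frac{e^{-\beta H_i}}{Z(\beta)},$$
where $\beta = 1/(k_B T)$, $k_B$ is Boltzmann's constant and $Z(\beta) = \mathrm{Tr}\, e^{-\beta H_i} = 2\cosh(\beta\hbar\omega/2)$ is the partition function of any of the qubits, whose Hamiltonian $H_i = \hbar\omega\sigma^z_i/2$ is here expressed in terms of the Pauli operator $\sigma^z_i$, the reduced Planck constant $\hbar$ and the resonant frequency $\omega$. All qubits are assumed to have the same resonant frequency.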
To maximally cool the target qubit, the goal is to minimize its final excited state population $p_1'$ over all possible global unitaries $U$. This problem is equivalent to finding the set of unitaries that minimizes the expectation value of the final energy of the target qubit. Thus, we must solve the minimization problem:
$$\min_{U} \mathrm{Tr}\left[H_1\, U \rho\, U^\dagger\right], \tag{2}$$
where $H_1$ is the target qubit Hamiltonian expressed in the Hilbert space of the total system. This problem is formally identical to finding the ergotropy of a driven system (ergotropy is the maximum extractable work from a quantum system) [31].
The only difference is that solving for the ergotropy addresses the total system energy, replacing $H_1$ in Eq. 2 with the total system Hamiltonian. Here, we address the energy of a subsystem (i.e., the target qubit), using its Hamiltonian $H_1$. Since the specific form of the Hamiltonian is irrelevant to the objective minimization problem, we may borrow techniques used to compute the ergotropy and directly apply them to our problem.
To do so, we note the critical fact that for our system, the initial state $\rho$ commutes with the Hamiltonian $H_1$. As discussed in Ref. [31], in such a case the optimization is particularly simple: the minimum is achieved when $U$ is a permutation matrix that maps the eigenstate of $\rho$ with largest eigenvalue to the eigenstate of $H_1$ with smallest eigenvalue, the eigenstate of $\rho$ with second largest eigenvalue to the eigenstate of $H_1$ with second smallest eigenvalue, etc. In other words, if $e_i$ are the eigenenergies of $H_1$, then if we order the eigenvalues $p_i$ of $\rho$ in non-increasing fashion, that is
$$p_i \ge p_j \quad \text{for } i < j,$$
the minimizing unitary $U$ in Eq. 2 is one that performs the permutation $\sigma$ such that
$$e_i < e_j \;\Rightarrow\; p_{\sigma(i)} \ge p_{\sigma(j)},$$
where $p_{\sigma(i)}$ denotes the final population of the eigenstate with energy $e_i$. Note that this holds even in the case of degenerate spectra. Indeed, in our case the spectrum of $H_1$ is highly degenerate, with only two distinct eigenvalues: $-\hbar\omega/2$ on the states $\{|0\,x_2 \dots x_N\rangle\}$ and $+\hbar\omega/2$ on the states $\{|1\,x_2 \dots x_N\rangle\}$, each $2^{N-1}$-fold degenerate.
The maximum amount by which the target qubit can be cooled is determined by calculating its final excited state population $p_1'$. Note that $p_1'$ is simply the sum of the final occupation probabilities of the states $\{|1\,x_2 \dots x_N\rangle\}$, which are exactly the states to which the lowest half of the probabilities are mapped. Therefore, to compute $p_1'$ we simply generate a list of all the occupation probabilities in non-decreasing order and sum the first half of the list.
To do so, note that the occupation probability of a state with $k$ bits set to 0 is given by $(1 - p_1)^k p_1^{N-k}$ (where $p_1$ is the initial excited state population of each qubit, set by the initial temperature), and there will be $\binom{N}{k}$ states with this probability. States with more bits set to 0 (higher $k$) have higher initial occupation probabilities. Therefore, we can generate a list of the probabilities in non-decreasing order by appending the $\binom{N}{k}$ probabilities of value $(1 - p_1)^k p_1^{N-k}$ to the list as we increase $k$ from 0 to $N$. Summing the first half of this list will give $p_1'(p_1, N)$. When $N$ is odd, there are an even number of distinct values of $k$ ranging from $k = 0$ to $k = N$. The number of probabilities for the first half of $k$ values ($k = 0, \dots, \lfloor N/2 \rfloor$) is equal to the number of probabilities for the second half of $k$ values ($k = \lceil N/2 \rceil, \dots, N$). Thus, for odd $N$,
$$p_1'(p_1, N) = \sum_{k=0}^{\lfloor N/2 \rfloor} \binom{N}{k} (1 - p_1)^k p_1^{N-k}. \tag{5}$$
The calculation is slightly more complicated when $N$ is even, since now dividing the list of probabilities in half involves splitting in half the degenerate group of probabilities with $k = N/2$. This means that we must add to the terms with $k < N/2$ half of the $\binom{N}{N/2}$ degenerate probabilities with value $(1 - p_1)^{N/2} p_1^{N/2}$:
$$p_1'(p_1, N) = \sum_{k=0}^{N/2 - 1} \binom{N}{k} (1 - p_1)^k p_1^{N-k} + \frac{1}{2}\binom{N}{N/2} (1 - p_1)^{N/2} p_1^{N/2}.$$
An intriguing observation is that if we start from an odd number of qubits, adding one more qubit will not increase maximal cooling:
$$p_1'(p_1, 2n) = p_1'(p_1, 2n - 1) \tag{7}$$
(proof provided in Appendix B). This generalizes the fact that a total of at least three (identical) qubits is required to obtain some cooling [32]. To see this, note that with a total of one qubit no cooling is possible by means of a unitary manipulation, so Eq. 7 implies that cooling with a total of two identical qubits is likewise impossible; a minimum of three qubits is required for dynamic cooling.
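The closed-form sums above can be checked against the "sort and sum the first half" recipe with a short script. This is a minimal sketch: the function names `p1_after_cooling` and `p1_by_sorting` are illustrative choices and do not come from the paper.

```python
from math import comb


def p1_after_cooling(p1: float, N: int) -> float:
    """Final excited-state population of the target after maximal cooling.

    Implements the binomial sums for odd and even N described above.
    """
    # Terms with k < N/2 (for odd N this is k = 0..floor(N/2)).
    total = sum(comb(N, k) * (1 - p1) ** k * p1 ** (N - k)
                for k in range((N + 1) // 2))
    if N % 2 == 0:
        # Even N: half of the degenerate k = N/2 group also lands in the lower half.
        total += 0.5 * comb(N, N // 2) * (p1 * (1 - p1)) ** (N // 2)
    return total


def p1_by_sorting(p1: float, N: int) -> float:
    """Same quantity, obtained by listing all 2**N probabilities (small N only)."""
    probs = []
    for k in range(N + 1):  # k = number of bits set to 0
        probs += [(1 - p1) ** k * p1 ** (N - k)] * comb(N, k)
    probs.sort()  # non-decreasing order
    return sum(probs[: 2 ** (N - 1)])


if __name__ == "__main__":
    for N in range(1, 8):
        print(N, p1_after_cooling(0.3, N), p1_by_sorting(0.3, N))
```

For $p_1 = 0.3$ the values for $N = 1$ and $N = 2$ both equal $p_1$ (no cooling), and the values for $N = 3$ and $N = 4$ coincide, as Eq. 7 requires.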
Figure 2 shows $p_1'$ as a function of $p_1$ for increasing system sizes $N$. Note that $p_1'(p_1, N)$ is an increasing function of $p_1$, meaning that the larger the initial temperature, the larger the final temperature, which agrees with intuition. Note also that, in the interval $[0, 1/2[$, $p_1'(p_1, N)$ is a decreasing function of $N$, namely, the larger $N$ the higher the cooling, in agreement with what one would expect. We have
$$\lim_{N \to \infty} p_1'(p_1, N) = 0 \quad \text{for } p_1 \in [0, 1/2[,$$
meaning that as long as the initial temperature is finite and non-negative, by increasing $N$ one can cool the target qubit arbitrarily close to zero temperature. Note however the crucial fact that $p_1'(1/2, N) = 1/2$ for any $N$. This is because any unitary evolution leaves the completely mixed state unaltered; no cooling is possible if the initial temperature is infinite, regardless of $N$. This constraint is responsible for the slow $1/\sqrt{N}$ scaling at high temperature, which will be discussed below.
FIG. 2. Final probability of the excited state of the target qubit after maximal cooling, $p_1'$, versus its initial probability $p_1$, for increasing numbers of total system qubits $N = 2^2, 2^3, \dots, 2^{10}$.
Using the relation between the initial excited state population $p_1$ and temperature $T$,
$$\frac{k_B T}{\hbar\omega} = \left[\ln\frac{1 - p_1}{p_1}\right]^{-1}, \tag{9}$$
as well as the analogous relation between the final, minimal excited state population $p_1'$ and temperature $T'$, we can write the final minimal temperature as a function of the initial temperature $T$ as:
$$\frac{k_B T'}{\hbar\omega} = \left[\ln\frac{1 - p_1'(p_1(T), N)}{p_1'(p_1(T), N)}\right]^{-1}.$$
Here the expression final temperature is not being used in a strictly thermodynamic sense, i.e., to denote the temperature of the thermal bath surrounding the qubit, but rather in an "effective" sense, i.e., to denote the temperature that the bath would have if it were in equilibrium with the qubit.
Figure 3 shows a log-log plot of $T'$ versus $T$ for various system sizes $N$. The black dashed line plots $T' = T$ to guide the eye in seeing the amount of cooling that occurs. In both the low-$T$ regime and the high-$T$ regime, there is a linear relationship between $T'$ and $T$ (the slope of the log-log plots is 1), but the coefficient of proportionality (i.e., the vertical shift of the plots) scales differently with $N$ in the two regimes.
In the high-$T$ regime, $p_1$ is close to 1/2, hence we Taylor expand $p_1'(p_1, N)$ to first order around $p_1 = 1/2$, which for large $N$ gives
$$\frac{1}{2} - p_1' \simeq \sqrt{\frac{2N}{\pi}}\left(\frac{1}{2} - p_1\right).$$
Expanding the expression $\beta\hbar\omega = \ln[(1 - p_1)/p_1] \simeq 4(1/2 - p_1)$ to the same order, we obtain $\beta' \simeq \sqrt{2N/\pi}\,\beta$, that is,
$$T' \simeq \sqrt{\frac{\pi}{2}}\,\frac{T}{\sqrt{N}}.$$
Note that $T' > T/\sqrt{N}$ because $\sqrt{\pi/2} > 1$, which means that Shannon's bound is obeyed as expected, but not saturated. Finding that the $1/\sqrt{N}$ scaling is realized in the high-$T$ regime (as opposed to being just a theoretical bounding limit) is per se a non-trivial result. This slow scaling is clearly visible in the top right corner of Figure 3.
FIG. 3. Log-log plot of the minimum final temperature $T'$ versus the initial temperature $T$ (in units of $\hbar\omega/k_B$) for various system sizes $N$. The black dashed line plots $T' = T$ (slope equal to 1) to guide the eye in seeing the amount of cooling that occurs. In the low-$T$ regime (bottom-left), curves for various system sizes are parallel with a slope of 1, implying a linear relationship between $T'$ and $T$. In the high-$T$ regime (top-right), curves for various system sizes are also parallel with a slope of 1, but with a different $N$-dependent vertical shift from the black dashed line.
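In the low-$T$ regime, we have $p_1 \simeq e^{-\beta\hbar\omega} \ll 1$. For small $p_1$, the sum for $p_1'$ is dominated by its lowest-order term in $p_1$, so that
$$p_1' \simeq c_n\, p_1^{\,n}, \qquad c_n = \binom{2n - 1}{n}, \qquad n = \lceil N/2 \rceil.$$
Using the Stirling approximation we obtain $\ln c_n \simeq n \ln 4 + O(\ln n)$ (see Appendix D), hence $\hbar\omega\beta' \simeq n(\hbar\omega\beta - \ln 4)$. At low temperature (i.e., $\hbar\omega\beta \gg 1$) the term $\ln 4$ is negligible, therefore $\beta' \simeq n\beta$, or:
$$T' \simeq \frac{T}{n} \simeq \frac{2T}{N}.$$
This reveals that in the low-$T$ regime, a much faster $1/N$ scaling holds for dynamic cooling. This superior scaling is clearly visible in the bottom left corner of Figure 3.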
A characteristic value of $k_B T/\hbar\omega$ for contemporary quantum computers based on superconducting qubits, ion traps, or neutral atoms is $\simeq 0.2$, which places them within the start of the low-$T$ regime. For example, current superconducting qubit quantum computers typically operate at $\omega = 5$ GHz and $p_1 = 0.01$, which equates to an initial temperature of $T = 8.3$ mK. Given these values, we find $T' = 2.1$ mK for $n = 5$ ($N = 9, 10$), which is slightly above the scaling value of $T/n = 8.3/5$ mK $= 1.66$ mK. However, in accordance with our analysis above, the estimate $T/n$ improves as $N$ increases and/or as $T$ decreases further.
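The numbers quoted above, and the two scaling regimes, can be reproduced with a few lines of Python. This is a sketch under one stated assumption: the quoted 8.3 mK is recovered when $\omega = 5$ GHz is read as an angular frequency of $5 \times 10^9$ rad/s, so that is how it is interpreted here. The helper names are illustrative.

```python
import math
from math import comb

HBAR = 1.054571817e-34  # J s
KB = 1.380649e-23       # J / K


def p1_after_cooling(p1: float, N: int) -> float:
    """Final excited-state population of the target after maximal cooling."""
    total = sum(comb(N, k) * (1 - p1) ** k * p1 ** (N - k)
                for k in range((N + 1) // 2))
    if N % 2 == 0:
        total += 0.5 * comb(N, N // 2) * (p1 * (1 - p1)) ** (N // 2)
    return total


def temperature(p1: float, omega: float) -> float:
    """Effective temperature (K) of a qubit with excited population p1."""
    return HBAR * omega / (KB * math.log((1 - p1) / p1))


def cooling_ratio(p1: float, N: int) -> float:
    """T'/T for maximal cooling; the frequency cancels in the ratio."""
    p1f = p1_after_cooling(p1, N)
    return math.log((1 - p1) / p1) / math.log((1 - p1f) / p1f)


if __name__ == "__main__":
    omega = 5e9  # assumption: rad/s, see the lead-in
    T = temperature(0.01, omega)
    T_final = temperature(p1_after_cooling(0.01, 9), omega)
    # Approximately 8.3 mK, 2.1 mK and 1.66 mK, as quoted in the text.
    print(f"T = {1e3 * T:.2f} mK, T' = {1e3 * T_final:.2f} mK, T/n = {1e3 * T / 5:.2f} mK")
    # High-T regime: T'/T * sqrt(N) should approach sqrt(pi/2) ~ 1.253.
    print(cooling_ratio(0.4999, 101) * math.sqrt(101), math.sqrt(math.pi / 2))
    # Deep low-T regime: T'/T * n should approach 1 (n = 5 for N = 9).
    print(cooling_ratio(1e-30, 9) * 5)
```

The last line uses an extremely small $p_1$ only to make the $\ln 4$ correction negligible; at $p_1 = 0.01$ the correction is still visible, which is why $T'$ sits slightly above $T/n$.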

III. MINIMAL WORK
Due to the large degeneracy of the spectrum of $H_1$ (defined in Eq. 2), there is a great number of distinct permutations that achieve the desired ordering of eigenvectors for maximal cooling. A natural question is then which among all these permutations has the smallest cost in terms of energy injection into the system, i.e., the work performed on the system, given by
$$W = \mathrm{Tr}\left[H\, U \rho\, U^\dagger\right] - \mathrm{Tr}\left[H \rho\right], \tag{16}$$
where $H = \sum_i H_i$ denotes the total system Hamiltonian. We recall that, since the initial state is passive, we have $W \ge 0$. When $U$ realizes a permutation $\sigma$, Eq. 16 boils down to
$$W = \sum_i E_i \left(p_{\sigma(i)} - p_i\right), \tag{17}$$
where $E_i$ are the eigenvalues of $H$, and $p_i$ and $p_{\sigma(i)}$ are the initial and final populations of the corresponding eigenstates. Minimal work cost is thus determined by the following minimization problem:
$$W = \min_{\sigma \in \mathcal{C}} \sum_i E_i \left(p_{\sigma(i)} - p_i\right), \tag{18}$$
where $\mathcal{C}$ denotes the set of permutations that realize maximal cooling. Solving this further minimization problem is straightforward. As described above, in order to achieve maximal cooling it is sufficient to map the half of states with the highest occupation probabilities to the set of states $\{|0\,x_2 \dots x_N\rangle\}$. To simultaneously achieve minimal work cost, within this set of states the highest probability should be assigned to the state with the lowest total system energy, the second highest probability should be assigned to the state with the second lowest total system energy, etc. The probabilities should be mapped in an analogous way for the other half of states in the set $\{|1\,x_2 \dots x_N\rangle\}$. This works because states with lower total system energies have higher initial occupation probabilities by definition. So assigning the highest final probability to the state with the lowest total system energy within each half-list minimizes the differences of the initial and final probabilities $p_{\sigma(i)} - p_i$ in Eq. 17, thereby minimizing work (see Appendix A for more details).
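For small systems the assignment rule just described can be carried out by brute force over all $2^N$ basis states. The following is an illustrative sketch, not the authors' code: energies are measured in units of $\hbar\omega$ relative to the all-ground state, bit value 1 denotes an excited qubit, the first bit is the target, and the function name is hypothetical.

```python
import itertools

import numpy as np


def min_work_bruteforce(p1: float, N: int):
    """Return (work, final target excited population) for maximal cooling.

    Work is in units of hbar*omega. Only practical for small N.
    """
    states = list(itertools.product([0, 1], repeat=N))
    probs = np.array([np.prod([p1 if b else 1 - p1 for b in s]) for s in states])
    energy = np.array([sum(s) for s in states], dtype=float)  # number of excitations

    # Final slots: all target-ground states by increasing total energy,
    # followed by all target-excited states by increasing total energy.
    slot_order = sorted(range(2 ** N), key=lambda i: (states[i][0], energy[i]))
    # Initial populations from largest to smallest.
    prob_order = np.argsort(-probs, kind="stable")

    final = np.empty(2 ** N)
    final[slot_order] = probs[prob_order]

    work = float(energy @ (final - probs))
    p1_final = float(sum(final[i] for i in range(2 ** N) if states[i][0] == 1))
    return work, p1_final


if __name__ == "__main__":
    print(min_work_bruteforce(0.1, 3))
```

For $N = 3$ the result can be checked by hand: the work equals $p_1(1 - p_1)(1 - 2p_1)$ in units of $\hbar\omega$, and the final population equals $3p_1^2 - 2p_1^3$.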
Computing the minimal value of work $W$ that must be invested to obtain maximal cooling is conceptually a simple task, but, in practice, it presents some challenges. Note that due to memory limitations, writing the $2^N$-dimensional arrays that list the energy eigenvalues $E_i$ and the populations $p_i$, $p_{\sigma(i)}$ quickly becomes intractable as $N$ increases (on a desktop computer this already happens around $N \simeq 26$). We overcome this bottleneck by exploiting the sparsity of these arrays, which allows us to encode the relevant information into arrays whose sizes scale linearly with $N$, thereby allowing the evaluation of $W$ for $N$ into the thousands.
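One way to realize such a linear-size encoding is to store each degenerate group once, as a (value, multiplicity) pair, and to walk through the sorted populations and the sorted final slots with running counters. The sketch below is an assumption about how this could be done rather than the authors' implementation; energies are in units of $\hbar\omega$ relative to the all-ground state, and plain floats limit it to system sizes where the products do not overflow or underflow.

```python
from math import comb


def min_work_grouped(p1: float, N: int) -> float:
    """Minimal work (units of hbar*omega) for maximal cooling, using O(N) groups."""
    # Initial groups, from most to least probable: m excitations, multiplicity, probability.
    init = [(m, comb(N, m), (1 - p1) ** (N - m) * p1 ** m) for m in range(N + 1)]
    # Final slots in filling order: target ground with e excitations among the other
    # N-1 qubits, then target excited (total energy 1 + e). Entries: (energy, multiplicity).
    slots = [(e, comb(N - 1, e)) for e in range(N)]
    slots += [(1 + e, comb(N - 1, e)) for e in range(N)]

    work = 0.0
    j, left = 0, slots[0][1]
    for m, count, prob in init:
        while count > 0:
            take = min(count, left)  # states moved from this group into slot group j
            work += take * prob * (slots[j][0] - m)
            count -= take
            left -= take
            if left == 0 and j + 1 < len(slots):
                j += 1
                left = slots[j][1]
    return work


if __name__ == "__main__":
    for N in (3, 10, 100):
        print(N, min_work_grouped(0.1, N))
```

Both lists have $O(N)$ entries, and the multiplicities are exact Python integers, so no $2^N$-sized array is ever built.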
Figure 4 shows the rescaled work $W/N$ as a function of $p_1$ for various $N$. For $p_1 \in [0, 1/2]$, our numerical calculations clearly evidence that as $N$ is increased the solid curves approach the black dashed curve, which plots the analytic scaling function $w(p_1)$ [Eq. (19)]. The minimal work is extensive, $W \simeq N w$, which evidences a trade-off between cooling power and energetic cost: the further one cools a qubit, the more energy one must expend. At low $T$, this trade-off is balanced, as the product $T'W \sim 2Tw$ is of order 1. However, in the high-$T$ limit, the trade-off is disadvantageous because the product $T'W \sim (T\sqrt{\pi/2})\, w \sqrt{N}$ scales like $\sqrt{N}$. Note that $w$ goes to zero, as it should, for $p_1 = 1/2$ (at infinite initial temperature, any $U$ will leave $\rho$ unaltered), and for $p_1 = 0$ (at zero initial temperature, the best you can do is to leave $\rho$ unaltered, i.e., $U = \mathbb{1}$). It is instructive to rewrite the scaling function $w$ in terms of the initial temperature, as plotted in Figure 5. In the high-$T$ regime $w$ vanishes like $1/T$, while in the low-$T$ regime $w$ vanishes exponentially, as $e^{-\hbar\omega/k_B T}$. The inset of Figure 5 shows an enlargement of the low-$T$ behaviour of $w$ and marks state-of-the-art values for qubits on various contemporary quantum computers [33], including superconducting qubits (black circle) [34], neutral atom qubits (white triangle) [35], and trapped ion qubits (black cross) [36].
All of them are in the low-$T$ region of the curve, while early NMR qubits are far beyond the full scale of the plot in the high-$T$ regime.
FIG. 5. Rescaled minimum work $w$ in the thermodynamic limit versus initial temperature $k_B T/\hbar\omega$. The inset shows an enlargement of the exponential vanishing of work at low initial temperatures. Symbols denote state-of-the-art values for qubits on various contemporary quantum computers, including superconducting qubits (black circle), neutral atom qubits (white triangle), and trapped ion qubits (black cross).

IV. IMPLEMENTATION
In order to perform dynamic cooling on quantum computers, the cooling unitary $U$ must be translated into a quantum circuit. As stated above, there is a large family of unitaries that can achieve maximal cooling, and different quantum circuits will result from different choices of $U$. Near-term quantum computers are noisy, with larger circuits accumulating more errors than smaller ones. Therefore, from an implementation perspective, cooling unitaries that can be translated into smaller circuits are more desirable (here and below, the size of a quantum circuit refers to the number of constituent elementary one- and two-qubit gates).
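As a concrete illustration of a cooling unitary, consider the smallest case, $N = 3$. One maximally cooling permutation simply exchanges the basis states $|011\rangle$ and $|100\rangle$ (target qubit first) and leaves all others untouched. The sketch below builds this permutation as a matrix with NumPy and applies it to a thermal product state; it is one valid member of the family of cooling unitaries, not necessarily the one produced by any of the protocols discussed in this section, and the helper names are hypothetical.

```python
import numpy as np


def thermal_populations(p1: float, N: int) -> np.ndarray:
    """Diagonal of the N-qubit thermal product state; the first qubit is the most significant bit."""
    single = np.array([1 - p1, p1])  # index 0 = ground, 1 = excited
    probs = np.array([1.0])
    for _ in range(N):
        probs = np.kron(probs, single)
    return probs


def cooling_unitary_3q() -> np.ndarray:
    """Permutation matrix swapping |011> and |100>, identity elsewhere."""
    U = np.eye(8)
    U[[0b011, 0b100]] = U[[0b100, 0b011]]  # swap the two rows
    return U


def target_excited_population(rho: np.ndarray) -> float:
    """Population of the target (first) qubit in its excited state, for 3 qubits."""
    return float(np.real(np.trace(rho[4:, 4:])))


if __name__ == "__main__":
    p1 = 0.1
    rho = np.diag(thermal_populations(p1, 3))
    U = cooling_unitary_3q()
    rho_final = U @ rho @ U.T
    print(target_excited_population(rho), target_excited_population(rho_final))
```

The swap moves the fourth most probable state, $|100\rangle$, into the target-ground half, so the target's excited population drops from $p_1$ to $3p_1^2 - 2p_1^3$.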
Various protocols exist for generating maximally cooling unitaries. A few protocols of interest include (i) the partner-pairing algorithm (PPA), described in Ref. [13], (ii) a minimum work protocol, which generates a unitary with minimal work cost, and (iii) a protocol we call the mirror protocol. The mirror protocol is convenient as it can quickly and automatically generate a unique maximal cooling unitary for arbitrarily large $N$ (other protocols can be significantly more computationally difficult, more heuristic, or have degenerate solutions), the downside being that it generates the largest circuits (see Appendix A for more details on the various protocols). The design of a protocol for generating a maximal cooling unitary with minimal circuit size remains an open question for future research.
FIG. 6. Final temperatures $T'$ of the target qubit versus noise probability $p$ of the quantum computer for various system sizes $N$ initialized at (a) $T = 15$ mK or (b) $T = 100$ mK. We assume a value of $\omega = 5$ GHz, which is a typical value for superconducting qubits. The black dashed lines denote initial temperatures of the system. Simulation results are from quantum circuits derived from the mirror protocol and executed on a noisy quantum simulator.
It is expected that the size of the cooling circuits will grow exponentially with system size $N$, as the number of states that must be permuted for maximal cooling likewise grows exponentially in $N$ (sub-exponential growth, however, has not been disproved). See Appendix E for an expanded discussion of the circuit sizes for dynamic cooling. This scaling has major implications for the practical implementation of dynamic cooling on noisy quantum computers. Namely, while increasing $N$ increases the theoretically optimal cooling capability, increasing $N$ also increases the depth of the associated circuit, and therefore the accumulated error due to noise.
Figure 6 plots the final temperature of the target qubit versus a noise probability parameter $p$ for various system sizes $N$ for two different initial temperatures. The results are derived from quantum circuits simulated with a noisy quantum simulator (a classical computer used to simulate the performance of a noisy quantum computer) [37], using a noise model based on a depolarizing channel [38], which can be tuned with a single noise parameter $p$ that effectively sets the probability of error. It is implemented by inserting a random Pauli operator after each gate in the circuit with probability $p$. The model is commonly used to emulate the performance of circuits on noisy quantum computers as it approximates well the average noise for large circuits [39-42]. Furthermore, as it is parameterized by only one parameter, the model facilitates studying the scaling of performance versus noise. The quantum circuits were generated using the mirror protocol, which was chosen because (i) it generates a unique cooling unitary for each system size $N$, providing a fair comparison across various system sizes, and (ii) it produces larger circuits than other protocols, meaning that if cooling is possible with the mirror protocol, it will certainly be possible with cooling unitaries better optimized for circuit size.
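To make the effect of such a noise model tangible, the sketch below applies a single-qubit depolarizing channel to the density matrix of a cooled qubit. It assumes one common convention, in which an error occurs with probability $p$ and is then one of $X$, $Y$, $Z$ with equal weight; the simulator used for Figure 6 may parameterize the channel differently, and the function names are illustrative.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def depolarize(rho: np.ndarray, p: float) -> np.ndarray:
    """Single-qubit depolarizing channel: with probability p apply X, Y or Z (each p/3)."""
    noisy = sum(P @ rho @ P.conj().T for P in (X, Y, Z))
    return (1 - p) * rho + (p / 3) * noisy


def excited_population(rho: np.ndarray) -> float:
    """Population of the excited state (index 1)."""
    return float(np.real(rho[1, 1]))


if __name__ == "__main__":
    p1_cooled = 0.028  # e.g. a qubit cooled from p1 = 0.1 with N = 3
    rho = np.diag([1 - p1_cooled, p1_cooled]).astype(complex)
    for p in (0.0, 0.01, 0.03):
        print(p, excited_population(depolarize(rho, p)))
```

In this convention a single noisy step raises the excited population by $(2p/3)(1 - 2p_1)$, so the colder the qubit, the larger the relative damage from a given amount of noise.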
We emphasize that these results should only be understood qualitatively, since the noise model does not describe the precise noise present on any particular quantum processor. Moreover, optimizations in terms of selecting a cooling unitary with minimal circuit size and advanced circuit transpilation techniques, which would result in shorter, less noisy circuits, were not implemented. As a result, quantitative conclusions cannot be drawn from the plots; rather, Figure 6 serves to reveal trends in how performance scales with noise.
The initial temperature in each plot is indicated with a horizontal black dashed line. The colored lines indicate the final temperature of the target qubit versus the noise probability $p$ for a range of different system sizes. Given a system with an odd number of qubits $N$, both plots show that the addition of one more qubit (which theoretically should exhibit identical cooling capability) impairs cooling capability at higher noise probabilities. While adding more than one qubit to the system increases cooling capability at low noise, we see this can actually decrease cooling capability when noise is sufficiently high. Furthermore, for a given system size, the noise probability $p$ at which addition of qubits becomes detrimental as opposed to advantageous is smaller when the system is initialized at a lower temperature. In other words, a system initialized at a lower temperature will be more sensitive to noise.
We conclude that in practice, there will be an optimal (finite) number of qubits to use for dynamic cooling, which depends on the level of noise in the quantum hardware as well as the initial temperature of the qubits.

V. DEMONSTRATION
We demonstrate dynamic cooling with $N = 3$ qubits on the IBM quantum computer. Advanced circuit optimization was performed with BQSKit [43] to reduce the cooling circuit down to only nine 2-qubit elementary gates. The circuits were executed on the ibmq_brisbane quantum processor, which contains 127 qubits. Dynamic cooling was individually performed within twelve different 3-qubit clusters on the chip simultaneously. Figure 7 plots the presumed initial temperature of each target qubit (black) and the final temperature of each target qubit (blue) after dynamic cooling was executed within each cluster. The presumed initial temperature was computed by executing an empty circuit that only measured the target qubit. The measurements allowed us to compute the initial population of the excited state of the target qubit, which was then converted into an initial temperature using Eq. 9. This calibration circuit was run five separate times, each with 1024 shots, with the black dots denoting the average value and error bars denoting one standard deviation. The clusters are indexed from 1 to 12, in increasing order of initial temperature. Dynamic cooling was executed in 36 separate runs, each with 1024 shots, on each of the 3-qubit clusters, with the blue dots denoting the averaged value and error bars denoting one standard deviation. The final temperature is analogously calculated using Eq. 9 with measurements of the final population of the excited state of the target qubit after dynamic cooling. The black and blue curves are drawn to guide the eye in seeing the successful cooling in cluster 12.
The fact that cooling only occurs in the cluster with the highest initial temperature is in line with the trends revealed in Section IV; namely, qubits at lower temperature are more sensitive to noise, making them harder to cool at a given level of noise. These results suggest that dynamic cooling might best be used in a scheme which scans the initial temperatures of the qubits (or some estimate thereof) and only applies dynamic cooling to those qubits above some threshold initial temperature.
As noise levels continue to decrease on quantum computers, larger circuits will become more feasible to execute, allowing dynamic cooling to be scaled up to larger system sizes $N$, thereby cooling the target qubit down to even lower temperatures. While the rapid growth of circuit size with increasing $N$ poses a challenge, the fact that cooling scales much better with system size in the low-$T$ regime may enable sufficient cooling with moderately low $N$. Another path forward to ameliorate large circuit size is a suboptimal cooling scheme, whereby giving up a small amount of cooling power results in a large reduction of circuit complexity, as we next describe in Section VI.

VI. SUBOPTIMAL DYNAMIC COOLING AT FIXED COMPLEXITY
Currently, the biggest hurdle to the success of dynamic cooling on near-term quantum computers is the size of the cooling circuit. While there is a balanced trade-off between the amount of cooling and energy expenditure at low $T$, circuit size appears to grow exponentially with $N$ (see Appendix E). While this may seem to be an insurmountable obstacle, here we shall see that it can be overcome by relaxing the requirement of optimal (i.e., maximal) cooling and agreeing to achieve a suboptimal final temperature. In fact, we find that suboptimal cooling can still (ideally) cool the target qubit down to arbitrarily low temperatures by increasing $N$, but at a fixed circuit complexity and with a lower work expenditure as compared to optimal cooling. The price that needs to be paid is that the scaling of the final temperature with system size will be slower than $1/N$.
To see this, consider the following suboptimal cooling protocol. Take a system with a total of $N = n^2$ qubits, divided into $n$ clusters, each containing $n$ qubits. Cooling can then be executed in two steps, where first, dynamic cooling is performed within each of the $n$ clusters. Assuming we are operating in the low-$T$ regime, this will bring a total of $n$ qubits to $T' \simeq T(2/n)$ (one qubit from each of the $n$ clusters). In the second step, dynamic cooling is performed amongst these $n$ cooled qubits, bringing one of them to $T'' \simeq T(4/n^2) = T(4/N)$. While this is less cooling than the maximal cooling $T' \simeq T(2/n^2) = T(2/N)$, it only requires cooling unitaries acting in a space of dimension $2^{\sqrt{N}}$, drastically reducing the associated circuit complexity. Here and below, the complexity of the circuit refers to the maximum Hilbert space dimension on which any of the associated cooling unitaries acts. If we take, for example, $N = 9$, optimal cooling requires a circuit to be generated from a unitary acting on a Hilbert space of dimension $2^9$, whereas the suboptimal cooling circuit is generated from unitaries acting only on Hilbert spaces of dimension $2^3$. This algorithm can be generalised to $N = n^r$, using $r$ steps to obtain a final suboptimal temperature of $T^{(r)} = T(2^r/N)$. For a fixed dimension of the clusters $n$, and hence for a fixed circuit complexity, this amounts to a cooling that scales as
$$T^{(r)} \simeq T\, N^{\frac{\ln 2}{\ln n} - 1}. \tag{21}$$
For $n > 2$ (which is necessary for cooling to occur in the first place) this implies a negative exponent, ensuring that the qubit can be taken to arbitrarily small final temperature by increasing $N$, effected by increasing the number of cooling steps $r$ [44].
FIG. 8. Optimal cooling (solid curves) versus suboptimal cooling (dashed-dotted curve) as a function of noise probability $p$. We compare optimal cooling with $N = 3$ qubits (solid blue), $N = 5$ qubits (solid green), and $N = 9$ qubits (solid black) to suboptimal cooling with $r = 2$ steps of cooling with clusters of size $n = 3$, for a total of $N = 9$ qubits (dashed-dotted black). An initial temperature of 15 mK is assumed, denoted with the dashed horizontal line.
The key feature here is that cooling of the target qubit can be augmented by increasing $N$ without increasing the complexity of the circuit, which remains fixed. Note that in suboptimal cooling, the total circuit comprises a number of $n$-qubit sub-circuits equal to $\sum_{k=1}^{r} n^{r-k} \le r\, n^{r-1} = rN/n = N \ln N/(n \ln n)$. Therefore, while the circuit size still grows with increasing $N$, it does so only quasi-linearly, as opposed to exponentially with $N$, amounting to a major reduction in circuit size for a given total system size $N$.
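The recursive structure of the suboptimal protocol is easy to express in code: each step applies the $n$-qubit cooling map to the population produced by the previous step. The sketch below, with hypothetical function names and ideal (noise-free) cooling assumed at every step, compares the final excited-state population of the two-step $n = 3$ scheme with one-step cooling on 3 qubits and with optimal cooling on all 9 qubits, and counts the $n$-qubit sub-circuits.

```python
from math import comb


def p1_after_cooling(p1: float, N: int) -> float:
    """Final excited-state population of the target after maximal cooling on N qubits."""
    total = sum(comb(N, k) * (1 - p1) ** k * p1 ** (N - k)
                for k in range((N + 1) // 2))
    if N % 2 == 0:
        total += 0.5 * comb(N, N // 2) * (p1 * (1 - p1)) ** (N // 2)
    return total


def suboptimal_p1(p1: float, n: int, r: int) -> float:
    """Population after r nested rounds of n-qubit cooling (N = n**r qubits in total)."""
    for _ in range(r):
        p1 = p1_after_cooling(p1, n)
    return p1


def num_subcircuits(n: int, r: int) -> int:
    """Number of n-qubit cooling sub-circuits: n**(r-1) in the first round, ..., 1 in the last."""
    return sum(n ** (r - k) for k in range(1, r + 1))


if __name__ == "__main__":
    p1 = 0.0727  # roughly a 15 mK qubit if omega is taken as 5e9 rad/s (assumption)
    print("one step, N = 3 :", p1_after_cooling(p1, 3))
    print("two steps, N = 9:", suboptimal_p1(p1, 3, 2))
    print("optimal, N = 9  :", p1_after_cooling(p1, 9))
    print("sub-circuits    :", num_subcircuits(3, 2))
```

Because the nested scheme is itself a unitary on all $N$ qubits, its final population can never fall below the optimal value, but it is reached with unitaries that act on only $n$ qubits at a time.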
The reduction in circuit complexity of suboptimal cooling significantly increases the feasibility of dynamic cooling on noisy quantum hardware, as evidenced in Figure 8. For the same total number of qubits $N = 9$, while suboptimal cooling with $n = 3$ and $r = 2$ (dashed-dotted black curve) relinquishes a small amount of cooling capability at very low levels of noise, it has a significant performance advantage for moderate to high levels of noise compared to optimal cooling (solid black curve). This is due to the substantially reduced circuit complexity. Notice how optimal cooling at the same circuit complexity ($N = 3$, solid blue curve) achieves significantly less cooling than the suboptimal routine. Remarkably, the suboptimal cooling routine achieves more cooling than optimal cooling with $N = 5$ (solid green curve), even though it has a smaller circuit complexity. The advantage of the smaller circuit complexity can also be seen by comparing the noise level at which adverse effects begin to impair cooling: reduction in cooling capability begins at a noise level that is an order of magnitude larger for the suboptimal cooling with $n = 3$, $r = 2$ versus optimal cooling with $N = 5$, confirming that suboptimal cooling is more resilient to noise.
There is also a reduction in the work cost with suboptimal cooling. The total work cost $W^{(K)}$ is given by the sum of the work costs associated with each step,

$$W^{(K)}(p_1, n) = \sum_{k=1}^{K} n^{K-k}\, W\!\left(f_n^{k-1}(p_1), n\right), \qquad (22)$$

where the symbol $f_n^{k}$ stands for the $k$-fold application of the function $f_n(p) \equiv p'_1(p, n)$, which we introduce for clarity of notation, and $W$ is given in Eq. (18).
The solid curves in Figure 9 plot the rescaled minimal work for suboptimal cooling, $W^{(K)}(p_1, n)$, as a function of $p_1$ for $n = 3$, with either $K = 3$ (black) or $K = 4$ (red). The corresponding rescaled values of the work for optimal cooling, $W(p_1, n^K)$, using the same total number of qubits $N = n^K$, are also plotted for reference as dotted curves in matching colors. Note how, as anticipated, the work associated with suboptimal $K$-step cooling is less than the minimal work associated with optimal cooling for the same total system size $N$. (We remark that a different form of suboptimal cooling was also studied in Ref. [10], where it was also shown to lead to a dramatically reduced work cost.) Note also that the rescaled curves for $W^{(K)}(p_1, n)$ collapse onto a single curve for growing $N$, meaning that for large $N$, $W^{(K)}(p_1, n)$ scales linearly with $N$, namely, $W^{(K)} \simeq N w^{(K)}$. This linear scaling can be understood by noting that the sum in Eq. (22) is dominated by the first term $n^{K-1} W(p_1, n)$, which is upper bounded by $n^{K} w(p_1)$, which is linear in $N = n^K$. Subsequent terms are upper bounded by $n^{K-k} w(f_n^{k-1}(p_1))$. While the factor $n^{K-k} = N^{1-k/K}$ is evidently sub-linear, the overall scaling is much slower than that, because the factor w is evaluated at points $f_n^{k-1}(p_1)$ that quickly vanish as $k$ grows (note that $w(p) \simeq p/2$ for small $p$, Eq. (19)).
We have thus shown the non-trivial fact that one can, in principle, cool a target qubit down to arbitrarily low temperature with fixed circuit complexity and at fixed work cost per qubit. The price that must be paid is a slower-than-linear scaling of cooling with system size, Eq. (21). This price is, however, counterbalanced by a smaller energy cost and a significantly reduced circuit complexity, and therefore a greater resilience to noise.

VII. CONCLUSION AND OUTLOOK
In light of the major developments in quantum technology, which have moved contemporary quantum computers into the low-$T$ regime, we have re-examined dynamic cooling as an effective technique for cooling qubits beyond what is practically achievable with direct, physical cooling methods. We found an analytic expression for the minimum final temperature $T'$ that can be achieved for the target qubit as a function of the initial temperature $T$. We explored the high- and low-$T$ regimes and discovered a crossover from a problematic scaling of $1/\sqrt{N}$ at high $T$ to a much more efficient scaling of $1/N$ at low $T$. We also proposed an analytic expression for the minimal work cost $W$ associated with maximal dynamic cooling, which scales linearly with $N$. In particular, while the work cost vanishes like $1/T$ in the high-$T$ regime, it vanishes exponentially in the low-$T$ regime as $e^{-\hbar\omega/k_B T}$.
We then turned to the implementation of dynamic cooling on noisy quantum computers. We noted that different protocols for generating cooling unitaries give rise to varying quantum circuit sizes, leaving for future work the problem of finding the cooling unitary with minimal quantum circuit size. We acknowledged that circuit sizes grow rapidly with system size $N$, and explored the implications with simulations of dynamic cooling on a noisy quantum simulator. The results indicate that there exists an optimal, finite value of $N$ with which to perform dynamic cooling, dependent on the level of noise on the quantum hardware and the initial temperature of the qubits.
Despite the high levels of noise of current quantum computers, we were nevertheless able to successfully demonstrate dynamic cooling on the IBM quantum processor. Using a system size of $N = 3$, we performed dynamic cooling on twelve separate 3-qubit clusters on the 127-qubit chip, observing cooling in just one of the clusters, which is presumed to have been at a higher initial temperature than all the others. These results suggest that a prudent approach for implementing dynamic cooling on noisy quantum devices may be to scan the initial temperatures (or an estimate thereof) of all the qubits, and only apply dynamic cooling to those qubits above a threshold temperature. As noise levels continue to decrease, we expect dynamic cooling will be capable of cooling qubits initialized at lower temperatures, and of achieving greater cooling using larger system sizes $N$.
Because of the superior scaling of dynamic cooling with system size in the low-$T$ regime, it may be sufficient to perform dynamic cooling with few enough qubits to maintain reasonable circuit sizes in near-future quantum devices. However, to overcome the hurdle posed by the rapid growth of circuit size in the near term, we proposed an algorithm for suboptimal dynamic cooling, whereby instead of reaching the optimal final temperature for a given $N$, we agree to reach a somewhat higher final temperature in exchange for drastically reduced circuit complexity. Surprisingly, cooling a target qubit down to arbitrarily low temperatures is still possible, in principle. While cooling scales more slowly than $1/N$ with this suboptimal routine, the circuit complexity remains fixed with increasing $N$, yielding the ability to increase cooling capability without increasing the complexity cost.
Recent progress in quantum computing technology has exhibited a slow but steady decrease in noise levels, but a relatively fast increase in the total number of available qubits. Large numbers of moderately low-noise qubits render the suboptimal cooling described above a very viable scheme for near-future quantum devices. Furthermore, given the demonstration of cooling with a 3-qubit cluster on a considerably noisy quantum processor, there is hope that suboptimal cooling with clusters of 3 qubits could soon be a realizable path for cooling. It should be noted, however, that such schemes may require the connectivity of qubits to be reconsidered in superconducting qubit implementations, which usually provide lattice-shaped connectivity. Instead, a fractal-like network of clustered qubits could greatly facilitate the suboptimal cooling algorithm. Trapped-ion and neutral-atom quantum computers, which provide all-to-all qubit connectivity, are better suited for both optimal and suboptimal dynamic cooling.
All our results support the conclusion that in the low-temperature regime, dynamic cooling is much more effective in terms of scaling and energy cost than at high initial temperatures, and is capable of achieving cooling when noise is reduced to low enough levels. Given that current quantum computers operate in the low-$T$ regime (unlike early NMR quantum computers), these results reinstate the interest of dynamic cooling for quantum computing applications.
Note Added. After submission of our manuscript, it was brought to our attention that Eqs. (5) and (6) had previously been derived using a majorization technique in the Ph.D. thesis of Rodríguez-Briones [45].

Maximal cooling with 𝑁 = 3 qubits
Consider three identical qubits all initialized at the same temperature $T$. Our goal is to maximally cool the first qubit below this temperature with the application of a unitary $U$ on all three qubits.
If one lists the $2^N = 8$ states of the total system in lexicographic order, shown in column 2 of Table I, one sees that they are listed in order of increasing eigenenergy of the target qubit: the states in the first half of the list, of the form $|0\, i_2 i_3\rangle$, have eigenenergy $-\hbar\omega/2$, while the states in the second half of the list, of the form $|1\, i_2 i_3\rangle$, have eigenenergy $+\hbar\omega/2$. The initial occupation probability of each state is given in column 3, where $p \equiv p_1$ is the initial occupation probability of the excited state of the target qubit. Notice that the $2^N/2 = 4$ largest occupation probabilities are not all in the first half of the list. Specifically, the 4 highest probabilities are $(1-p)^3$ and $(1-p)^2 p$, the latter occurring three times. However, one of these values appears in the lower half of the list. Using the PPA, maximal cooling can be achieved by reordering the probabilities in non-increasing order, which can be accomplished by swapping the two states |011⟩ and |100⟩.
TABLE I. All states of the $N = 3$ qubit system listed in lexicographic order (column 2) with their initial occupation probabilities (column 3). Columns 4 and 6 give various permutations, while columns 5 and 7 give the final occupation probabilities of each state after the respective permutation. (For better readability, the states that are not displaced by the permutation are in grey.)
This permutation can be carried out with a unitary operator $U$, defined in the computational basis as

$$U = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}. \qquad \text{(A1)}$$

To gain a better understanding of how this unitary performs optimal cooling, we examine how the population of the excited state of the target qubit changes under this transformation. Recall that $p_1$ and $p'_1$ denote the occupation probability of the target qubit's excited state before and after the application of $U$, respectively. Thus,

$$p_1 = p_{100} + p_{101} + p_{110} + p_{111}, \qquad p'_1 = p'_{100} + p'_{101} + p'_{110} + p'_{111},$$

where $p_{i_1 i_2 i_3}$ is the occupation probability of the state $|i_1 i_2 i_3\rangle$. Now, $U$ implements a transformation that swaps the state |100⟩ with the state |011⟩. Accordingly, it exchanges the populations of the two states, that is, $p'_{100} = p_{011}$ and $p'_{011} = p_{100}$, while leaving all other populations unaltered. It follows that

$$p'_1 = p_1 - (p_{100} - p_{011}) < p_1, \qquad \text{(A4)}$$

because the probability $p_{011} = e^{-\hbar\omega/2k_B T}/Z^3$, featuring two excitations, is lower than $p_{100} = e^{\hbar\omega/2k_B T}/Z^3$, featuring only one excitation. From Eq. (A4) it follows that

$$p'_1 = p_1 - p_1 (1 - p_1)(1 - 2p_1) = 3p_1^2 - 2p_1^3.$$

Thus, for $0 < p_1 < 1/2$ (i.e., $T > 0$), we have $p'_1 < p_1$; namely, the target qubit is cooled. Due to the degeneracy in the spectra of the initial state and of the Hamiltonian, there exists a family of degenerate unitaries that can achieve maximal cooling. For example, any unitary of the form in Eq. (A1), but with arbitrary phases replacing the 1's, can also achieve maximal cooling. Furthermore, any unitary that swaps states with equal occupation probabilities (in addition to the swap |100⟩ ↔ |011⟩) also achieves maximal cooling. In fact, there even exist unitaries performing maximal cooling that implement permutations with cycle lengths greater than two (a swap being a permutation cycle of length two). An example of such a unitary is given in column 6 of Table I, featuring a single cycle of length 6. In this case, the probabilities are no longer in non-increasing order. However, the 4 largest probabilities reside in the first half of the list, which implies maximal cooling of the target qubit.
In the case of $N = 3$ qubits, the mirror protocol and the minimal work protocol use the same cooling unitary as the PPA. Therefore, we reserve explanation of these two protocols until the next illustrative example with $N = 4$ qubits, where all three protocols can generate different maximally cooling permutations.
Finally, we remark that cooling cannot be achieved with a total of $N = 2$ identical qubits. If the $2^2 = 4$ states are ordered in increasing lexicographic order, |00⟩, |01⟩, |10⟩, |11⟩, the states are automatically listed in order of increasing energy of the target qubit, and we see that the two highest probabilities already occupy the first half of the list. Thus, the target qubit cannot be further cooled.
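The counting argument used in this example can be sketched numerically: any permutation that places the larger half of the joint occupation probabilities on the states with the target qubit in 0 leaves the smaller half on the states with the target qubit in 1. The snippet below is a minimal illustration under that assumption; the function name and the value `p = 0.2` are hypothetical choices, not taken from the paper.

```python
from itertools import product


def max_cooled_excited_prob(p: float, N: int) -> float:
    """Smallest excited-state probability of the target qubit after a permutation.

    Each of the N identical qubits is initially excited with probability p
    (0 < p < 1/2). The 2^(N-1) smallest joint probabilities end up on the
    states whose target qubit is 1, and their sum is the final probability.
    """
    probs = []
    for bits in product((0, 1), repeat=N):
        ones = sum(bits)
        probs.append(p ** ones * (1 - p) ** (N - ones))
    probs.sort()
    return sum(probs[: 2 ** (N - 1)])


if __name__ == "__main__":
    p = 0.2  # illustrative initial excited-state probability
    for N in (2, 3, 4, 5):
        print(N, max_cooled_excited_prob(p, N))
```

With two qubits the result equals the initial probability (no cooling), while three and four qubits both give $3p^2 - 2p^3$.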

Maximal cooling with 𝑁 = 4 qubits
We now consider the case of $N = 4$ qubits. As before, we list the $2^N = 16$ states of the total system in increasing lexicographic order, as shown in column 2 of Table II. Again, this automatically orders the states by increasing eigenenergy of the target qubit: the states in the first half of the list, of the form $|0\, i_2 i_3 i_4\rangle$, have eigenenergy $-\hbar\omega/2$, while the states in the second half of the list, of the form $|1\, i_2 i_3 i_4\rangle$, have eigenenergy $+\hbar\omega/2$. The occupation probabilities of the states are now denoted by symbols in column 4 of Table II to help the eye recognize patterns more quickly, where ■ = $(1-p)^4$; ▲ = $(1-p)^3 p$; | = $(1-p)^2 p^2$; • = $(1-p) p^3$; _ = $p^4$. Again, $p \equiv p_1$ is the initial occupation probability of the excited state of the target qubit. Roughly, the more vertices the symbol has, the higher the probability it represents. The $2^N/2 = 8$ largest occupation probabilities can thus be represented by a set containing one ■, four ▲'s, and three out of the six |'s. The energy of the total system for each state is given in column 3, which is relevant for determining permutations that maximally cool a target qubit with minimal work cost.
There are a number of permutations that will transform the probabilities in the first half of the list into the 8 largest probabilities, three of which are shown in Table II. The first is a permutation generated according to the PPA, which permutes all the probabilities into non-increasing order. Note that there is a degenerate family of permutations that can be generated by the PPA. One such permutation, given in column 5 of Table II, features two cycles of length 2 (i.e., two swaps) given by |0011⟩ ↔ |1000⟩ and |0111⟩ ↔ |1100⟩.
The second permutation, given in column 7 of Table II, is one of a degenerate family of minimal work protocols, which achieve maximal cooling with minimal work cost. In short, after maximal cooling is achieved by moving the highest half of probabilities to the top half of the lexicographically ordered list, the minimal work protocol sorts the probabilities within each half of the list separately. Within each half-list, the highest probability is assigned to the state with the lowest total system energy, the second highest probability is assigned to the state with the second lowest total system energy, etc. It turns out that in the case of $N = 4$ qubits, the PPA also belongs to the family of minimal work protocols, but this is not generally the case.

TABLE II. All states of the $N = 4$ qubit system listed in lexicographic order (column 2) with total state energy (column 3) and their initial occupation probabilities (column 4). Columns 5, 7, and 9 give various permutations, while columns 6, 8, and 10 give the final occupation probabilities of each state after the respective permutation. Occupation probabilities are represented by symbols, where ■ = $(1-p)^4$; ▲ = $(1-p)^3 p$; | = $(1-p)^2 p^2$; • = $(1-p) p^3$; _ = $p^4$. (For better readability, the states that are not displaced by the permutation are in grey.)

The third permutation, given in column 9 of Table II, enacts what we call the mirror protocol. In the mirror protocol, states that have the target qubit set to 0 and have fewer than $N/2$ bits in total set to 0 are swapped with their mirror image (also called the negative image). The idea is that these are the states in the top half of the lexicographically ordered list that have lower probabilities than their mirror-image state in the bottom half of the list. This is because a state with fewer than $N/2$ bits set to zero will necessarily have fewer bits set to zero than its mirror image, and thereby have a lower occupation probability. These mirror-image swaps ensure that all states with the target qubit set to 0 are assigned a probability at least as high as that of their mirror-image state, which necessarily has the target qubit set to 1. In turn, this means the highest half of probabilities will reside in the first half of the lexicographically ordered list. The advantage of the mirror protocol is two-fold: (i) the ease with which one can automatically generate the maximally cooling unitary for any system size $N$, and (ii) the protocol generates a single, unique cooling unitary for each system size $N$, as opposed to the PPA and minimal work protocols, which can generate a family of degenerate cooling unitaries.
In the mirror protocol for the case of $N = 4$, we seek states that start with 0 and have fewer than $N/2 = 2$ total bits set to 0. The only state that meets these criteria is the state |0111⟩, which we swap with its mirror image, |1000⟩. Notice that it is not a minimal work protocol, as the state in the first half of the list with the highest total energy, |0111⟩, is not assigned the lowest probability in the top half of the list. Note also that the permutation on $N = 3$ qubits given in Table I is an instance of the mirror protocol, as well as a minimal work protocol.
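The mirror-protocol rule described here (swap every state that starts with 0 and has fewer than half of its bits set to 0 with its bitwise complement) is simple to automate. The following is a minimal sketch under that reading of the rule; the function names are illustrative and the code works on classical bitstrings only, not on quantum circuits.

```python
def state_prob(s: str, p: float) -> float:
    """Joint probability of bitstring s when each qubit is excited (bit 1) with probability p."""
    ones = s.count("1")
    return p ** ones * (1 - p) ** (len(s) - ones)


def mirror_swaps(N: int) -> list[tuple[str, str]]:
    """Pairs (state, mirror image) that the mirror protocol swaps for N qubits."""
    swaps = []
    for x in range(2 ** (N - 1)):  # bitstrings whose leading (target) bit is 0
        s = format(x, f"0{N}b")
        if s.count("0") < N / 2:
            mirror = "".join("1" if b == "0" else "0" for b in s)
            swaps.append((s, mirror))
    return swaps


def target_excited_prob_after(swaps, p: float, N: int) -> float:
    """Excited-state probability of the target qubit after exchanging the swapped populations."""
    final = {}
    for x in range(2 ** N):
        s = format(x, f"0{N}b")
        final[s] = state_prob(s, p)
    for a, b in swaps:
        final[a], final[b] = final[b], final[a]
    return sum(q for s, q in final.items() if s[0] == "1")


if __name__ == "__main__":
    print(mirror_swaps(3))  # the single swap 011 <-> 100
    print(mirror_swaps(4))  # the single swap 0111 <-> 1000
    print(target_excited_prob_after(mirror_swaps(4), 0.2, 4))
```

For three and four qubits the routine returns exactly the swaps quoted in the text, and the resulting target-qubit probability matches the maximally cooled value $3p^2 - 2p^3$.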
To convince ourselves that all three permutations in Table II perform maximal cooling, we can compute the probability $p'_1$ of the excited state of the target qubit after each transformation, which is the sum of the final occupation probabilities of the eight states of the form $|1\, i_2 i_3 i_4\rangle$. Consulting these probabilities after each permutation in Table II, one finds that in all cases

$$p'_1 = p_1 - p_1 (1 - p_1)(1 - 2p_1) = 3p_1^2 - 2p_1^3.$$

This is exactly the same expression found for the $N = 3$ case. Namely, adding a fourth qubit did not increase the cooling power. This is a special case of a more general result: there is no cooling gain in going from an odd $N$ to $N + 1$. Adding a fourth qubit, however, has the adverse effect of increasing the complexity of the unitary operation needed to implement the cooling (in general, operators acting on larger Hilbert spaces are more complex). In fact, for a given system size $N$, different permutations will carry different complexities in terms of their implementation in quantum circuits. Notice that the mirror-protocol permutation in Table II contains one permutation cycle of length 2 (i.e., a swap) which acts on all the qubits, while the PPA permutation contains two swaps, but each swap only acts on three out of the four qubits. Such characteristics of the permutation will alter the complexity of the final quantum circuit, and should therefore be considered from a practical standpoint when implementing dynamic cooling on quantum computers.

Another crucial point is that distinct permutations that achieve maximal cooling are generally accompanied by distinct energy costs. For example, the work accompanying the mirror-protocol permutation (see Eq. (16) in the main text) is

$$W = 2\hbar\omega\, (p_{1000} - p_{0111}).$$

Similarly, the work accompanying the minimal work permutation is

$$W = \hbar\omega\, (p_{1000} - p_{0111}).$$

Note that the minimal work permutation costs half the work of the mirror protocol. The minimal work permutation is thus more energy efficient than the mirror-protocol permutation while achieving the same cooling power $p'_1(p_1, 4)$. Hence the minimal work permutation may be preferable when it comes to practical applications.
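The factor of two between the two work costs can be checked with a few lines of arithmetic. The sketch below assumes that each qubit contributes $-\hbar\omega/2$ or $+\hbar\omega/2$ to the total energy, that work is the change in average total energy, and that energies are expressed in units of $\hbar\omega$; the helper names and the value `p = 0.2` are illustrative. The two-swap permutation used is the PPA example quoted in the text, which the text states is also a minimal work permutation for four qubits.

```python
def state_prob(s: str, p: float) -> float:
    """Joint probability of bitstring s when each qubit is excited (bit 1) with probability p."""
    ones = s.count("1")
    return p ** ones * (1 - p) ** (len(s) - ones)


def energy(s: str) -> float:
    """Total energy in units of hbar*omega: -1/2 per bit 0, +1/2 per bit 1 (assumed splitting)."""
    return s.count("1") - len(s) / 2


def work_cost(swaps, p: float) -> float:
    """Change in average energy when the populations of each (disjoint) pair are exchanged."""
    total = 0.0
    for a, b in swaps:
        pa, pb = state_prob(a, p), state_prob(b, p)
        # After the swap, state a carries population pb and state b carries pa.
        total += (pb - pa) * energy(a) + (pa - pb) * energy(b)
    return total


if __name__ == "__main__":
    p = 0.2  # illustrative initial excited-state probability
    mirror = [("0111", "1000")]
    ppa = [("0011", "1000"), ("0111", "1100")]
    print(work_cost(mirror, p), work_cost(ppa, p))
```

Under these assumptions the mirror swap costs $2p(1-p)(1-2p)$ and the two-swap permutation costs $p(1-p)(1-2p)$, in units of $\hbar\omega$.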
For ease of notation we set $p \equiv p_1$ and $p' \equiv p'_1$. Note that in this notation, $(1 - p) \equiv p_0$. For  = 2, we have In the second line we used the identity $\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$. In the fourth line we used a change of variable  =  − 1. In the sixth line we used the fact that  −1  We have: Using the identity we obtain: where we used the fact that the binomial coefficient is symmetric with respect to reflection about its point of maximum.
Using the identity where we used the reflection symmetry of the binomial coefficient and kept in mind not to count the middle value twice. From the above equation it follows that: Summing up: where we use Eq. (C4) in the last equality. Therefore: (C11)

FIG. 10. Quantum circuit implementing a swap between states |01111⟩ and |10000⟩ using the Gray code shown in Eq. (E1). Each wire represents a qubit in the system. The circuit comprises $2m - 3$ MCX gates, where $m = 6$ is the length of the Gray code and $N = 5$ is the number of qubits in the system. Open circles with a cross in the MCX gates are the NOT (i.e., Pauli-X) gate; closed circles imply the NOT gate is applied when the corresponding control qubit is in the |1⟩ state, while open circles imply the NOT gate is applied when the corresponding control qubit is in the |0⟩ state.
To construct the sub-circuit for this swap we let $s_1 = 01111$ and $s_2 = 10000$ and define a Gray code from $s_1$ to $s_2$, such as the one given in Eq. (E1). Here, the length of the Gray code is $m = 6$. To construct the circuit for the swap, we insert one MCX gate to transform each bitstring to the subsequent one in the Gray code. After insertion of $m - 1$ MCX gates, the circuit will successfully transform an input state $s_1$ to $s_2$. To implement the reverse transformation (since we wish to swap the two states), and to uncompute any changes made to other input states not involved in the swap, it is necessary to add the first $m - 2$ MCX gates in reverse order. Thus, a quantum circuit implementing a swap between states with a Gray code of length $m$ will contain $2m - 3$ MCX gates. The quantum circuit implementing the swap between $s_1$ and $s_2$ using the Gray code given in Eq. (E1) is depicted in Figure 10.
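The gate-counting argument can be verified classically by treating each MCX gate as a permutation of bitstrings. The sketch below is a minimal illustration: it builds one possible Gray code between the two bitstrings by flipping the differing bits from left to right (an assumption; this need not be the Gray code of Eq. (E1)), places one fully controlled NOT per step, appends all but the last gate in reverse order, and checks the action on every basis state. All function names are hypothetical.

```python
def gray_path(s1: str, s2: str) -> list[str]:
    """A Gray code from s1 to s2: flip the differing bits one at a time, left to right."""
    path, cur = [s1], list(s1)
    for i, (a, b) in enumerate(zip(s1, s2)):
        if a != b:
            cur[i] = b
            path.append("".join(cur))
    return path


def swap_circuit(path: list[str]) -> list[tuple[int, dict[int, str]]]:
    """MCX gates as (target index, required control values on all other qubits)."""
    forward = []
    for a, b in zip(path, path[1:]):
        t = next(i for i in range(len(a)) if a[i] != b[i])
        controls = {i: a[i] for i in range(len(a)) if i != t}
        forward.append((t, controls))
    # Forward pass, then every gate except the last one in reverse order.
    return forward + forward[-2::-1]


def apply_gate(state: str, gate) -> str:
    """Flip the target bit only if every control qubit has its required value."""
    t, controls = gate
    if all(state[i] == v for i, v in controls.items()):
        flipped = "1" if state[t] == "0" else "0"
        return state[:t] + flipped + state[t + 1:]
    return state


def run(state: str, gates) -> str:
    for gate in gates:
        state = apply_gate(state, gate)
    return state


if __name__ == "__main__":
    path = gray_path("01111", "10000")
    gates = swap_circuit(path)
    print(len(path), len(gates))  # Gray-code length and number of MCX gates
    print(run("01111", gates), run("10000", gates))
```

Control values of 1 and 0 correspond to the closed and open control circles of the MCX gates, respectively.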
Once the quantum circuit has been built with MCX gates, it is necessary to decompose these complex gates into the native gates of the quantum computer (generally these include a two-qubit gate, such as the CNOT gate, and some set of single-qubit gates that render the native gate set universal). Ref. [47] describes how such MCX gates can be decomposed into a number of elementary gates (the CNOT gate and arbitrary 1-qubit gates) that scales quadratically with system size $N$. However, if we are not concerned about relative phases between the qubits being conserved, the number of elementary gates scales linearly with $N$. For cooling, we are not concerned about the relative phases between qubits, but only about the populations of each state, and thus this linearly scaling transformation can be used.
While the number of elementary gates needed for each MCX gate only scales linearly with $N$, unfortunately, the total number of MCX gates in the circuit is expected to grow exponentially with $N$ (since the number of permutations in the cooling unitary is expected to grow exponentially with $N$). This clearly poses a problem for near-term quantum computers with high levels of noise. However, since the amount of cooling scales better with system size in the low-$T$ regime, it may be sufficient to cool a target qubit with few enough auxiliary qubits to maintain reasonable circuit sizes on near-future quantum devices with lower levels of noise. It is also possible to cool suboptimally at a fixed complexity (i.e., elementary gate count), as discussed in Section VI.
While it is difficult to derive an analytic expression for the minimal number of elementary gates required for optimal cooling, Figure 11 plots a (loose) upper bound on the number of CNOT gates required for optimal cooling for various system sizes $N$. The black curve plots the number of CNOT gates calculated using the Gray code method with the mirror protocol, as described above. The red curve plots the number of CNOTs in circuits derived from circuit synthesis using quantum transpilers (such as IBM's Qiskit transpiler and the BQSKit transpiler [43]), which take as input a unitary matrix and output a circuit. We emphasize that neither of these gate counts describes the minimal gate count for each system size. There is plenty of room for optimization in terms of selecting a cooling unitary with minimal circuit complexity (indeed, we know the mirror protocol is not optimal for complexity), as well as in terms of circuit transpilation techniques. They do, however, reproduce the expected exponential scaling of CNOT count with system size. We note that while the circuit transpilation gets extremely computationally expensive as $N$ is increased (we could only go up to $N = 11$ in a reasonable amount of compute time), the CNOT count can easily be computed up to any $N$ using the Gray code method. Therefore, while computing the circuit complexity with the Gray code method will not necessarily give the optimal complexity (as evidenced in Figure 11), it can be useful for quickly comparing complexities between different protocols for large $N$.

FIG. 11. Number of CNOT gates required for optimal dynamic cooling versus system size $N$. The black curve plots the gate count derived from the Gray code method described above, while the red curve plots the gate counts derived from using a circuit transpiler to synthesize the circuit from the input cooling unitary.

FIG. 1. Schematic of the dynamic cooling of $N$ identical qubits. Here, the target qubit is cooled at the expense of heating up the auxiliary qubits via the application of a global unitary operator $U$.

FIG. 7. Initial (black) and final (blue) effective temperatures of various 3-qubit clusters on the IBM quantum computer after dynamic cooling.

FIG. 8. Comparison of final temperatures for optimal (solid curves) versus suboptimal cooling (dashed-dotted curve) versus noise probability. We compare optimal cooling with $N = 3$ qubits (solid blue), $N = 5$ qubits (solid green), and $N = 9$ qubits (solid black) to suboptimal cooling with 2 steps of cooling with clusters of size 3, for a total of $N = 9$ qubits (dashed-dotted black). An initial temperature of 15 mK is assumed, denoted with the dashed horizontal line.

FIG. 9. Rescaled work for multi-step suboptimal cooling with cluster size 3 (solid curves) and the associated optimal cooling with equal total system sizes (dotted curves) for total system sizes $N = 27$ (black) and $N = 81$ (red).

𝑁 2 − 1 = 𝑁 − 1 𝑁 2 for
C9) where we used   = 2−1  . Using Eq. (D2) we get: ln   ≃ 2 ln 2 − 2 ln 2 + ln  − ln 2 − ln √  − ln √  +  ln 4 = ln 2 − ln √  + ln

States of the form $|0\, i_2 \ldots i_N\rangle$, which are half of the total $2^N$ states, have an eigenenergy of the target qubit equal to $-\hbar\omega/2$, while states of the form $|1\, i_2 \ldots i_N\rangle$ have an eigenenergy of the target qubit equal to $+\hbar\omega/2$. Maximal cooling can thus be implemented by mapping the half of states with the highest occupation probabilities to the half of states with the lower eigenenergy, $-\hbar\omega/2$. Due to the large degeneracy in the spectrum of the initial state (as well as degeneracy in the spectrum of the Hamiltonian), there will be many distinct permutations, and hence many distinct unitaries $U$, that achieve maximal cooling. Illustrative examples for $N = 3, 4$ are provided in Appendix A.
FIG. 4. Rescaled minimum work versus $p_1$ for various system sizes $N$. The dashed black line is Eq. (19).
all perform maximal cooling, we can compute the probability of the excited state of the target qubit, $p'_1$, after each transformation. In this case, $p'_1$ is the sum of the final occupation probabilities of the states with the target qubit set to 1. By consulting what these constituent probabilities are after each permutation in Table II, one finds that in all cases: