Cooling toolbox for atoms in optical lattices

We propose and analyze several schemes for cooling bosonic and fermionic atoms in an optical lattice potential close to the ground state of the no-tunnelling regime. Some of the protocols rely on the concept of algorithmic cooling, which combines occupation number filtering with ideas from ensemble quantum computation. We also design algorithms that create an ensemble of defect-free quantum registers. We study the efficiency of our protocols for realistic temperatures and in the presence of a harmonic confinement. We also propose an incoherent physical implementation of filtering which can be operated in a continuous way.


I. INTRODUCTION
Ultracold atoms stored in optical lattices can be controlled and manipulated with a very high degree of precision and flexibility. This places them among the most promising candidates for implementing quantum computations [1,2,3,4,5] and quantum simulations of certain classes of quantum many-body systems [6,7,8,9,10,11,12,13,14,15]. Quantum simulation would allow us to understand physical properties of certain materials at low temperatures that so far have eluded a theoretical description or numerical simulation. However, both quantum simulation and quantum computation with this system face a crucial problem: the temperature in current experiments is too high. In this paper we propose and analyze several methods to decrease the temperature of atoms in optical lattices and thus to reach the interesting regimes in quantum simulations, as well as to prepare defect-free registers for quantum computation.
So far, several experimental groups have been able to load bosonic or fermionic atoms in optical lattices and reach the strong interaction regime [16,17,18,19,20,21,22,23,24]. The analysis of experiments in the Tonks gas regime indicates a temperature of the order of the width of the lowest Bloch band [17], and for a Mott Insulator (MI) a temperature of the order of the on-site interaction energy has been reported [18,25]. For fermions one observes temperatures of the order of the Fermi energy [26,27,28]. Those temperatures severely restrict the physical phenomena that can be observed and the quantum information tasks that can be carried out with these lattices.
One may think of several ways of cooling atoms in optical lattices. Since the process of loading atoms into the lattice may lead to additional heating [29] we focus here on methods that operate once the lattice potential has been raised. For example, one may sympathetically cool the atoms in the lattice using a second Bose-Einstein condensate [8,30]. A completely different approach is the filtering scheme of Rabl et al. [31]. It operates in the no-tunnelling regime, transferring atoms between optical lattices so as to create a configuration with one atom per site. Such a loading scheme can, however, originate holes due to imperfections in the original cloud. Our analysis of filtering in the presence of a harmonic trap has shown that these holes, which are preferably located at the borders of the cloud, result in a considerable amount of entropy [32].
Here we propose and investigate several cooling schemes which overcome the limitations of filtering. The first set of schemes uses discrete operations to make atoms in different sites interact, thus concentrating the entropy on some atoms which are then expelled from the lattice. Because of the similarity with quantum information processing, we term this kind of method algorithmic cooling [46]. The second set of cooling methods combines filtering with either particle hopping or evaporative cooling techniques. All our cooling protocols are based on translation-invariant operations (i.e. they do not require single-site addressing) and take into account the residual harmonic confinement present in current experiments. Although we will mostly analyze their effects on bosonic atoms, they can also be trivially generalized to fermions.
This work has several spin-offs. First of all, we have designed algorithms that very efficiently remove all residual defects from a Mott insulator, thus producing an ensemble of perfect registers for quantum computing [5]. Second, we propose how to create pointer atoms at the endpoints of a perfect quantum register and show how these pointers can be used to tailor the register to a specific length. Finally, since filtering is an important ingredient of all our algorithms, we have also designed a new incoherent version of the filtering procedure which is better adapted to our protocols than the adiabatic scheme from Ref. [31].
The paper is organized as follows. We start in Sect. II by describing our system in terms of the Bose-Hubbard model and discussing the properties of current experimental states, such as densities, temperature and entropy. We also summarize the basic tools and figures of merit which underlie our cooling schemes. Sect. III is devoted to a particular tool, filtering. We describe the effect of filtering on the entropy and temperature for realistic initial states and study the optimal choice of experimental parameters. In the last part we propose an implementation of filtering which is similar to a radio-frequency knife. Once we have described all tools, in Sect. IV we introduce several protocols for cooling to the ground state. This includes the algorithmic cooling in Subsect. IV A and two other protocols in the following subsections. Based on similar ideas, Sect. V presents a quantum protocol that produces an ensemble of defect-free quantum registers and shows how to create pointer atoms on these registers. We conclude this work with some remarks concerning possible variants and combinations of our cooling schemes. In Appendix A we present a detailed description of the numerical method that we use to simulate classically correlated states.

II. PHYSICAL SYSTEM
A. Bose-Hubbard model
We consider a gas of ultra-cold bosonic atoms which have been loaded into a three-dimensional (3D) optical lattice. This lattice is created by six laser beams of wave number k = 2π/λ propagating along three orthogonal directions. If the laser light is off-resonant with any atomic transition, the AC Stark effect induces a periodic potential on the atoms of the form:
V(x, y, z) = V_{0x} \sin^2(kx) + V_{0y} \sin^2(ky) + V_{0z} \sin^2(kz),   (1)
with a strength or "lattice depth" V_0 proportional to the dynamic atomic polarizability and the laser intensity. The Gaussian profile of the laser beams creates an additional harmonic confinement which can be compensated by additional magnetic or optical confining elements [17,33].
In the following we will mostly be concerned with one-dimensional (1D) lattices. In other words we will assume that the lattice potential is so strong along two directions, V_{0y}, V_{0z} ≫ V_{0x}, that tunnelling is only allowed along the third one. We will also assume that the confinement along all directions is still much stronger than the atomic interaction strength. Under these conditions, the atoms can be described using a single-band Bose-Hubbard model (BHM) [34], which for a lattice of length L reads
H = \sum_k \left[ -J \left( a_k^\dagger a_{k+1} + \mathrm{h.c.} \right) + \frac{U}{2} n_k (n_k - 1) + b k^2 n_k \right].   (2)
The parameter J denotes the hopping matrix element between two adjacent sites, U is the on-site interaction energy between two atoms and the energy b accounts for the strength of the harmonic confinement. The second-quantization operators a†_k and a_k create and annihilate, respectively, a particle on site k, and n_k = a†_k a_k is the occupation number operator. Since the tunnelling rate decreases exponentially with the trapping strength, V_0, while the on-site interaction U remains almost constant [34], we have adopted this last value as the unit of energy for our work. Even more important, this exponential dependence allows one to reach the strong interaction regime U ≫ J. At U ≈ 11.6 J the ground state changes from a superfluid (SF) to a Mott insulator (MI) [16,34]. In the SF regime particles are delocalized over all lattice sites. In 1D without harmonic trap the SF ground state reads:
|\psi_{SF}\rangle \propto \Big( \sum_k a_k^\dagger \Big)^N |0\rangle,   (3)
where N is the particle number. In the MI regime particles are localized at individual lattice sites. For integer filling factors ν = N/L the ground state is given by:
|\psi_{MI}\rangle \propto \prod_k \big( a_k^\dagger \big)^{\nu} |0\rangle.   (4)
The presence of a harmonic trap typically leads to a wedding cake structure of MI states with different ν. For very shallow traps one obtains only a single MI phase, which has filling ν = 1 and which is centered around the bottom of the trap. In the Fock basis with respect to lattice sites this MI state can be written as:
|\psi_{MI}\rangle = |0 \ldots 0\, 1 \ldots 1\, 0 \ldots 0\rangle.   (5)
This state will be the target ground state for our cooling schemes.

B. Initial states
Throughout this paper we will work with 1D thermal states in the grand canonical ensemble, which are characterized by two parameters: the temperature kT = 1/β and the chemical potential µ. We are particularly interested in the no-tunnelling limit [47], J → 0. In this limit the Hamiltonian (2) becomes diagonal in the Fock basis of independent lattice sites, {|n_{−L/2} … n_0 … n_{L/2−1}⟩}, and the density matrix becomes a product of thermal states for each lattice site,
\rho = \bigotimes_k \rho_k, \qquad \rho_k = \frac{1}{Z_k} \exp\left\{ -\beta \left[ \frac{U}{2} n_k (n_k - 1) + (b k^2 - \mu)\, n_k \right] \right\}.   (6)
This simplifies calculations considerably and, for instance, the von Neumann entropy can be written as the sum over single-site entropies,
S(\rho) = -\mathrm{tr}(\rho \log_2 \rho) = \sum_k S(\rho_k).   (7)
Let us now study thermal states of the form (6) in more detail. In Fig. 1 we have depicted graphically all relevant energy scales in the no-tunnelling limit. The chemical potential determines the size of the cloud via the relation µ = b k_µ^2, where k_µ denotes the site at which the mean occupation is n_{k_µ} = 0.5. For µ ≈ U singly occupied sites at the border of the cloud become energetically degenerate with doubly occupied sites in the center.
The analysis of recent experiments in the Mott regime [17,25] reveals a substantial temperature of the order of the on-site interaction energy U. This result is consistent with our own numerical calculations and translates into an entropy per particle s := S/N ≈ 1. The particle number in a 1D tube of a 3D lattice as in [16] typically ranges between N = 10 and N = 130. A representative density distribution corresponding to such initial conditions (with N = 65) is plotted in Fig. 2a. In this example the inverse temperature is given by βU = 3.1. Since our cooling protocols lead to even lower temperatures, we will from now on focus on the low temperature regime, βU ≫ 1. Moreover, we will only consider states with at most two particles per site, which puts the constraint µ ≲ 2U − 1/β on the chemical potential. Such a situation can be achieved either by choosing the harmonic trap shallow enough or by applying an appropriate filtering operation [31].
Under the assumptions e^{βU} ≫ 1 and µ − U/2 ≫ b + 1/(2β) we have shown in [32] that the density distribution of the initial state (6) can be separated into regions that are completely characterized by the fermionic distribution
f_k(b, \beta, \mu) = \frac{1}{1 + \exp[\beta (b k^2 - \mu)]}.   (8)
To be more precise, at the borders of the atomic cloud, bk^2 ≫ µ − U/2 + 1/(2β), the mean occupation number is given by n_k ≈ n_I(k) with n_I(k) := f_k(b, β, µ), while in the center of the trap, bk^2 ≪ µ − U/2 − 1/(2β), one has n_k ≈ 1 + n_II(k) with n_II(k) := f_k(b, β, µ_II) and effective chemical potential µ_II := µ − U. Note also that the underlying MI phase in the center of the trap is well reproduced by the function n_I(k), which originally has been derived for the border region. As a consequence, the density distribution for the whole lattice can be put in the simple form n_k ≈ n_I(k) + n_II(k), which corresponds to two fermionic phases I and II sitting on top of each other [Fig. 2b]. In other words, the initial state of our system can effectively be described in terms of non-interacting fermions, which can occupy two different energy bands I and II, with dispersion relations ε_I = bk^2 and ε_II = bk^2 + U, respectively [Fig. 3].
So far this fermionic interpretation is just based on the properties of the density profile of thermal states in the MI regime. In order to put this fermionic picture on more general grounds one can fermionize the BHM (2) directly. In [32] we have shown that the dynamics at finite J can effectively be described by a fermionic Hamiltonian (9), in which the fermionic operators c_k and d_k refer to energy bands I and II, respectively. This effective description is self-consistent as long as the probability of finding a particle-hole pair is negligible, i.e. ⟨c_k c†_k d†_k d_k⟩ ≈ 0. We have shown above that this is fulfilled for thermal states at low temperatures and at negligible tunnelling. Later we will see to what extent the validity can be extended to finite tunnelling rates.
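The two-band picture can be checked numerically in the no-tunnelling limit. The following is a minimal sketch, not the method of Ref. [32]: it compares the exact single-site mean occupation (Fock states truncated at two atoms per site, as assumed in the text) with the sum of two Fermi functions with chemical potentials µ and µ − U. The function names, the energy unit U = 1 and the parameter values are illustrative assumptions.

```python
import math

def exact_occupation(k, b, beta, mu, U=1.0, n_max=2):
    """Mean occupation of site k for J -> 0, keeping Fock states n = 0..n_max."""
    # Grand-canonical single-site energies: U/2 n(n-1) + (b k^2 - mu) n
    energies = [0.5 * U * n * (n - 1) + (b * k * k - mu) * n
                for n in range(n_max + 1)]
    e_min = min(energies)  # shift by the minimum so that exp() cannot overflow
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)
    return sum(n * w for n, w in enumerate(weights)) / z

def fermi(k, b, beta, mu):
    """Fermi function 1 / (1 + exp[beta (b k^2 - mu)]), evaluated stably."""
    x = beta * (b * k * k - mu)
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))

def two_band_occupation(k, b, beta, mu, U=1.0):
    """n_I(k) + n_II(k): band I with mu, band II with mu - U."""
    return fermi(k, b, beta, mu) + fermi(k, b, beta, mu - U)

def max_deviation(b, beta, mu, U=1.0, half_width=100):
    """Largest difference between exact and two-band densities over the lattice."""
    return max(abs(exact_occupation(k, b, beta, mu, U)
                   - two_band_occupation(k, b, beta, mu, U))
               for k in range(-half_width, half_width + 1))

# Illustrative parameters (assumed): beta*U = 20, b/U = 0.002, mu = U
deviation = max_deviation(b=0.002, beta=20.0, mu=1.0)
```

For the truncated three-level site the difference between the two expressions is bounded by e^{−βU}, so the approximation degrades visibly only at temperatures such as βU ≈ 3, where e^{−βU} is a few percent.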
In general, the state of the system will however exhibit both classical and quantum correlations. In this case we have to represent the state in terms of a Matrix-Product State (MPS) [35,36]. This concept can also be used to compute 1D thermal states numerically [37]. Based on this method one can, for instance, estimate how the temperature of a 1D tube changes when passing from the MI to the SF regime. Assuming that the process is thermodynamically adiabatic one obtains the new temperature with the following procedure. We tune the parameters of a thermal state in the SF regime until the expectation values for the entropy and particle number match the corresponding values of the initial thermal state in the MI regime. This can be illustrated with the following example. Starting from a representative state in the no-tunnelling regime with s = 1 and N = 65 we have to tune the temperature to kT = 0.46U = 2.9J at J/U = 0.16 (V_0 = 5E_r) in order to leave s and N unchanged [Fig. 2a]. Hence, in the SF regime one faces a substantial temperature of the order of the width of the lowest Bloch band. Note, however, that this is only a lower bound to the true temperature, because our approach does not include any sort of heating processes induced by the adiabatic evolution.
FIG. 3: Effective description of thermal states in the no-tunnelling limit in terms of independent fermions occupying two energy bands. The dispersion relations are ε_I = bk^2 and ε_II = bk^2 + U, where k denotes the lattice site and U is the interaction energy. Increasing the harmonic trap strength from b to b′ increases the chemical potential to µ′ so that the population of the upper band becomes energetically favorable. In the bosonic picture this process corresponds to the formation of doubly occupied sites.

C. Entropy as figure of merit
We will show below that algorithmic protocols are suited both for ground state cooling and for the initialization of quantum registers. The goal in both cases is to create a pure state under the constraint of keeping a large number of particles. Given the fact that our protocols converge to the desired family of states, we can measure the performance of the protocol by computing the mixedness of the state. This mixedness can in turn be quantified using the von Neumann entropy S (7). In some cases, such as for finite hopping J, we will not be able to compute the von Neumann entropy efficiently. We will then refer to the Rényi entropy S_2, which is a lower bound, S_2 ≤ S, and can be evaluated using MPS [Appendix A]. In order to assess the efficiency of a protocol in achieving our objectives we define two figures of merit. The ratio of the entropies per particle after and before invoking the protocol, s_f/s_i, quantifies the amount of cooling. The ratio of the final and initial number of particles, N_f/N_i, quantifies the particle loss induced by the protocol. Note that these figures of merit can sometimes be misleading and should therefore be applied with care. In the case of ground state cooling the entropy is only a good figure of merit if the state of the system after the cooling protocol, ρ_f, is close to thermal equilibrium. If this is not fulfilled, we use an effective thermal state, ρ_f → ρ_eff, with the same number of particles, N, and energy, E, to compute the figures of merit. For instance, the final entropy is given by S(ρ_eff). Provided that the system can thermalize, these will indeed be the properties of our final state. Finally, it is important to point out that other variables, like energy or temperature, are not very well suited as figures of merit, because they depend crucially on external system parameters like the trap strength.
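The relation between the two entropies and the definition of the figures of merit can be illustrated for a classical (diagonal) state, whose eigenvalues are simply the Fock-state probabilities. This is a minimal sketch under two assumptions: entropies are measured in bits (base-2 logarithm), and S_2 denotes the standard order-2 Rényi entropy, −log_2 tr ρ^2. The function names are hypothetical.

```python
import math

def von_neumann_entropy(probs):
    """S = -sum p log2 p for the eigenvalues (probabilities) of a diagonal state."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def renyi2_entropy(probs):
    """Order-2 Renyi entropy S_2 = -log2 sum p^2 (assumed definition)."""
    return -math.log2(sum(p * p for p in probs))

def figures_of_merit(S_i, N_i, S_f, N_f):
    """Return (s_f / s_i, N_f / N_i) with s = S / N the entropy per particle."""
    return (S_f / N_f) / (S_i / N_i), N_f / N_i

# Example distribution (assumed numbers): S_2 never exceeds S
probs = [0.7, 0.2, 0.1]
S = von_neumann_entropy(probs)
S2 = renyi2_entropy(probs)
```

The two entropies coincide for a uniform distribution and for a pure state, which is why S_2 remains a useful indicator of how close a protocol gets to a pure target state.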
Note also that in the case of quantum registers it can be erroneous to assume that a finite value of the final entropy implies the existence of defects. For example, we will propose a protocol below that generates a state which is an incoherent superposition of perfect quantum registers with varying length and position. This state has some residual entropy but it is ideal for quantum information processing.

D. Basic operations
All our cooling protocols rely on a set of translationally invariant quantum operations that can be realized in current experiments with optical lattices. These operations are:
(i) Particle transfer: Depending on the occupation numbers, an integer number of particles is transferred between internal states |a⟩ and |b⟩. This process can be described by the unitary operation
U^{m+x,\,n-x}_{m,\,n}: \; |m, n\rangle \leftrightarrow |m+x, n-x\rangle,   (11)
where x is an integer and |m, n⟩ are Fock states with m and n atoms in internal states |a⟩ and |b⟩, respectively. Note that this unitary operator satisfies U^{m+x,n−x}_{m,n} = U^{m,n}_{m+x,n−x}. Certain operations, like U^{0,2}_{2,0} or U^{0,1}_{1,0}, have already been demonstrated experimentally in an entanglement interferometer [38].
(ii) Lattice shifts: We denote by S_x the operations which shift the |b⟩ lattice x steps to the right. For example, S_{−1} transforms the state ⊗_k |m_k, n_k⟩_k into ⊗_k |m_k, n_{k+1}⟩_k. This operation can be realized in state-dependent lattices by adjusting the intensity and polarization of the laser beams [1,2,3,4,39].
(iii) Merging and splitting of lattice sites: Making use of superlattices [40] one can either merge adjacent lattice wells or split a single site into a pair of two sites.
(iv) Emptying sites: All atoms in internal state |b⟩ are removed from the system. We denote this operation by E_b. It transforms the state ⊗_k |m_k, n_k⟩_k into ⊗_k |m_k, 0⟩_k. Experimentally, this can be achieved either by switching off the lattice potential acting on |b⟩ or by coupling this state resonantly to an untrapped state.
The first proposal for a coherent implementation of the filtering operation F_1 [Fig. 4] appeared in Ref. [31]. More recently, a scheme based on resonant control of interaction-driven spin oscillations has been put forward [41]. In [32] we have proposed an ultra-fast coherent implementation of F_1 relying on optimal laser control. In this paper we will propose an incoherent realization of filtering, which has the special virtue that it can be applied continuously.
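On classical Fock configurations the basic operations reduce to simple list manipulations, which the sketch below illustrates. A configuration is represented as a list of pairs (m_k, n_k) giving the numbers of |a⟩ and |b⟩ atoms on site k. The zero-filling at the lattice edges and the reading of filtering F_M as "reduce every occupation above M to M" are assumptions made for illustration; the latter is inferred from the description of filtering in Sect. III and is not a definition taken from this section.

```python
def transfer(config, m, n, x):
    """U^{m+x,n-x}_{m,n}: exchange |m,n> and |m+x,n-x> on every site."""
    first, second = (m, n), (m + x, n - x)
    return [second if s == first else first if s == second else s
            for s in config]

def shift_b(config, x):
    """S_x: move the |b> occupations x sites to the right (edges filled with 0)."""
    size = len(config)
    shifted = []
    for k in range(size):
        src = k - x
        n = config[src][1] if 0 <= src < size else 0
        shifted.append((config[k][0], n))
    return shifted

def empty_b(config):
    """E_b: remove all atoms in internal state |b>."""
    return [(m, 0) for m, _ in config]

def filter_a(config, M=1):
    """Assumed action of F_M on |a> atoms: occupations above M are cut to M."""
    return [(min(m, M), n) for m, n in config]

# For at most two atoms per site, transfer followed by emptying acts like F_1
config = [(0, 0), (1, 0), (2, 0), (1, 0)]
via_transfer = empty_b(transfer(config, 2, 0, -1))
```

In this toy model `via_transfer` coincides with `filter_a(config, 1)`, mirroring the statement that filter operations can be realized by transferring particles between two internal states and then removing them.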

III. FILTERING
We now study the filtering operation F_1 in Eq. (12) in more detail. This operation is especially relevant for cooling because it produces a state close to the ground state of the MI regime. Thus, it can serve as a benchmark which has to be beaten by alternative cooling schemes. We begin this section by summarizing the analytical results about filtering from Ref. [32] and discuss the conditions to reach optimal cooling efficiency. We close with a possible experimental implementation of filtering which is based on an incoherent coupling of atoms to a continuum of untrapped states.

A. Theory
In Fig. 5 we have depicted the particle and entropy distributions before and after one filtering step, F 1 . One observes that a nearly perfect MI phase with filling factor ν = 1 is created in the center of the trap. Defects in this phase are due to the presence of holes and concentrate at the borders of the trap. This behavior is reminiscent of fermions, for which excitations can only be created within an energy range of order kT around the Fermi level. This numerical observation can easily be understood with our previous analysis of the initial state. Filtering removes phase II, which is due to doubly occupied sites, and leaves the fermionic phase I unaffected [ Fig. 5a].
The fermionic picture allows us to find a simple estimate for the final entropy. From the expression for the density of states, g(E) = 1/√(bE), one immediately obtains that the number of states (lattice sites) within an energy range of 2kT around the chemical potential µ is given by:
\Delta = 2kT\, g(\mu) = \frac{2}{\beta \sqrt{b\mu}}.   (13)
This parameter is characteristic for the tail width of the density distribution [Fig. 5] and will in the following be central for the analysis of our protocols. Since the final entropy must be localized at these sites we expect that S_f ≈ ∆. Indeed, the rigorous derivation in [32] yields S_f = σ_I ∆ = σ_I N_f/(βµ), with σ_I = π^2/(6 ln 2). Here, we have introduced the final number of particles N_f, which can be well approximated by the parameter N:
N = 2\sqrt{\mu / b}.   (14)
Hence, the final entropy per particle can be written as:
s_f = \frac{S_f}{N_f} = \sigma_I \frac{\Delta}{N} = \frac{\sigma_I}{\beta\mu},   (15)
which reflects the size of the region of defects ∆ in units of the system size N. For the special choice µ = U (or equivalently n_0 = 1.5) one finds closed expressions for our figures of merit, Eq. (16) [32], with numerical parameters σ_II ≈ 2.935 and η_II = (1 − √2)√π ζ(1/2) ≈ 1.063. This result shows that filtering becomes more efficient with decreasing temperature.
It is important to note that the state after filtering is not an equilibrium state, because it is energetically favorable that particles tunnel from the borders to the center of the trap, thereby forming doubly occupied sites. However, the system can easily be brought to equilibrium by adapting the trap strength. While tunnelling is still suppressed, one has to decrease the strength of the harmonic confinement to a new value b′ (17). This observation shows that it is misleading to infer the cooling efficiency solely from the ratio T′/T, because it depends crucially on the choice of b′.
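The estimates S_f = σ_I ∆ and N_f ≈ N can be compared with a direct lattice sum over the filtered state. The sketch below assumes that after filtering only phase I remains, so that each site k is singly occupied with probability f_k(b, β, µ) and contributes a binary entropy in bits; the forms ∆ = 2/(β√(bµ)) and N = 2√(µ/b) follow from the density of states g(E) and the relation µ = b k_µ^2 quoted in the text. Function names and parameter values (energies in units of U) are illustrative assumptions.

```python
import math

SIGMA_I = math.pi ** 2 / (6.0 * math.log(2.0))

def fermi(k, b, beta, mu):
    """Fermi function 1 / (1 + exp[beta (b k^2 - mu)]), evaluated stably."""
    x = beta * (b * k * k - mu)
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))

def binary_entropy(p):
    """Entropy in bits of a site that is occupied with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def filtered_state(b, beta, mu, half_width=100):
    """Particle number and entropy of phase I, summed over the lattice sites."""
    occupations = [fermi(k, b, beta, mu)
                   for k in range(-half_width, half_width + 1)]
    return sum(occupations), sum(binary_entropy(p) for p in occupations)

def estimates(b, beta, mu):
    """Tail width, cloud size, and the resulting entropy estimates."""
    delta = 2.0 / (beta * math.sqrt(b * mu))
    cloud_size = 2.0 * math.sqrt(mu / b)
    return delta, cloud_size, SIGMA_I * delta, SIGMA_I / (beta * mu)

# Illustrative parameters (assumed): beta*U = 40, b/U = 5e-4, mu = U/2
N_f, S_f = filtered_state(b=0.0005, beta=40.0, mu=0.5)
delta, cloud_size, S_est, s_est = estimates(b=0.0005, beta=40.0, mu=0.5)
```

The lattice sum and the analytical estimate should agree up to corrections of relative order (kT/µ)^2, which is why the comparison is made at βµ = 20 rather than at the experimental temperature.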

B. Optimal initial conditions
Let us now study how the cooling efficiency of filtering depends on the initial state variables b, β and µ. Since the initial temperature is dictated by the experimental setup, we consider only the trap strength b and the chemical potential µ (which can be varied via the particle number N ) as free parameters. Since our figures of merit are computed per particle we expect a weak dependence on N and therefore focus on the b dependence. This can be studied in terms of the mean central occupation number n 0 . For instance, in the special case n 0 = 1.5, we have obtained the analytical expression (16) for the ratio s f /s i , which exhibits a strong temperature dependence. In contrast, for n 0 = 2 one finds that the cooling efficiency becomes independent of the temperature: s f /s i ≈ 1/ √ 3 for βU ≫ 1. This can be understood from the presence of a ν = 2 MI phase in the center of the trap, which does not contain entropy. In the opposite regime n 0 = 1 the protocol has no cooling effect at all. For general n 0 , we have computed the figures of merit numerically exact [ Fig. 6]. One observes that for initial entropies s i 1 a central filling n 0 ≈ 1.5 is always close to optimal.
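The dependence of the figures of merit on the central filling can be explored numerically in the no-tunnelling limit. The following sketch sums exact single-site thermal states truncated at two atoms per site and assumes that F_1 leaves exactly one atom on every doubly occupied site. The choices µ = 1.5U for a central filling of two and µ = 0.5U for a central filling of one, as well as all names and parameter values, are illustrative assumptions.

```python
import math

def site_probabilities(k, b, beta, mu, U=1.0):
    """Probabilities of finding 0, 1 or 2 atoms on site k for J -> 0."""
    energies = [0.5 * U * n * (n - 1) + (b * k * k - mu) * n for n in range(3)]
    e_min = min(energies)  # shift by the minimum so that exp() cannot overflow
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def entropy(probs):
    """Entropy in bits of a single-site probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def filtering_figures_of_merit(b, beta, mu, U=1.0, half_width=150):
    """Return (s_f / s_i, N_f / N_i) for one filtering step F_1."""
    S_i = N_i = S_f = N_f = 0.0
    for k in range(-half_width, half_width + 1):
        p0, p1, p2 = site_probabilities(k, b, beta, mu, U)
        S_i += entropy([p0, p1, p2])
        N_i += p1 + 2.0 * p2
        # Assumed action of F_1: doubly occupied sites keep exactly one atom
        S_f += entropy([p0, p1 + p2])
        N_f += p1 + p2
    return (S_f / N_f) / (S_i / N_i), N_f / N_i

# Illustrative parameters (assumed): beta*U = 40, b/U = 5e-4
ratio_n2, loss_n2 = filtering_figures_of_merit(b=0.0005, beta=40.0, mu=1.5)
ratio_n1, loss_n1 = filtering_figures_of_merit(b=0.0005, beta=40.0, mu=0.5)
```

Within the two-band picture the ratio for µ > U evaluates to √((µ − U)/µ), which equals 1/√3 at µ = 1.5U, while for µ well below U there are essentially no doubly occupied sites and the protocol leaves the state unchanged.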
A special situation arises when we approach zero temperature. As shown in Fig. 6d, the quantity s f /s i changes periodically when decreasing the trap strength b/U . In this regime an additional MI phase with ν = 2 is present in the center of the trap and the splitting of the entropy between the lower and upper fermionic band depends dramatically on the value of the harmonic confinement. In particular, one can find trap strengths (e.g. Fig. 6d) at which the final entropy approaches zero and only a comparatively small number of particles is lost (N f /N i = 0.9). For shallower traps the ν = 2 MI phase starts to collapse. In this regime it becomes very difficult to find proper thermal states which match the initial conditions in terms of entropy and particle number. Finally, the central occupation approaches one and the cooling protocol leaves the initial state unchanged.
It is also interesting to study the efficiency of filtering when acting on the full 3D lattice. In this case only a small subset of 1D tubes will satisfy the optimal initial conditions. Using the parameters of Sect. II (s_i = 1 and N = 2 × 10^5) we obtain that s_f/s_i = 0.78 and N_f/N_i = 0.62. The relatively large particle loss results from the high densities in the center of the trap. This is also the main reason why the protocol performs worse compared to a 1D tube with s_i = 1 as shown in Fig. 6b.

C. Experimental realization of continuous filtering
Filtering constitutes a fundamental tool in all our cooling protocols. As shown before [31,32,41] filter operations can be realized by coherently transferring particles between two internal states and then removing these particles. What we will show below is that both processes can be combined producing an incoherent evolution that gives rise to the completely positive map F 1 (12). The experimental procedure for achieving the map F M is very simple. We will use two atomic states: one atomic state shall be confined by an optical lattice, deep in a MI phase, while the other one will be in a continuum of untrapped states which are free to escape the lattice. We will couple the trapped and untrapped states with two Raman lasers which have a relative detuning of the order of the interaction energy, δ ∼ M U [Fig. 7]. As long as the coupling is active, the lasers will depopulate lattice sites which have too many atoms, n > M , while leaving other sites untouched [48].
If the untrapped states are sparsely populated and their atoms are quickly expelled from the trap (for instance by a magnetic field gradient), these states will behave for practical purposes like a thermal bath in the vacuum state. If the coherence time of this bath, physically determined by the time free atoms spend close to the trapped ones, is very short compared to the inverse Rabi frequency 1/Ω of the Raman transition, we are allowed to write a master equation for the trapped atoms. The solution of this equation converges at large times to the desired filtered state, e.g. the one with M = 1 for F_1.
Conceptually, this mechanism is equivalent to the frequency knife from evaporative cooling, where atoms containing too much kinetic energy are expelled from the trap in order to lower the temperature. In our case, however, it is the interaction energy we get rid of and, as a side effect, we make the filling of the lattice more uniform.
Compared to the optimal control scheme in [32], the operation of this much simpler method is not very fast. From the solution of the master equation it follows that states with occupation n > 1 decay after a time which is of the order of the inverse Rabi frequency, t_1 ∼ 1/(nΩ). The main source of errors arises from the non-resonant coupling of the n = 1 state with the reservoir. The probability of a defect (empty site) will approximately be given by p_0 ≈ Ω^2/U^2. Hence, for p_0 = 10^{−4} we get an operation time t_1 ∼ 100/U, which is comparable to the time scale of the adiabatic scheme [31]. However, our incoherent scheme has two big advantages. First, it can be applied continuously. Second, it does not put any constraints on the interaction energies of the two species. This holds under the assumption that untrapped atoms are expelled so quickly from the trap that they do not interact significantly with the trapped atoms.
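The trade-off between speed and defect probability quoted above amounts to a one-line estimate, sketched below. It only encodes the two order-of-magnitude relations from the text, p_0 ≈ Ω^2/U^2 and t_1 ∼ 1/(nΩ), with all prefactors set to one; the function names and the unit convention U = 1 are assumptions.

```python
import math

def rabi_frequency_for_defect_probability(p0, U=1.0):
    """Largest Rabi frequency compatible with a defect probability p0 ~ (Omega/U)^2."""
    return U * math.sqrt(p0)

def operation_time(p0, U=1.0, n=1):
    """Order-of-magnitude decay time t_1 ~ 1 / (n * Omega) at that Rabi frequency."""
    return 1.0 / (n * rabi_frequency_for_defect_probability(p0, U))

# Defect probability of 1e-4, time returned in units of 1/U
t_1 = operation_time(1e-4)
```

Because t_1 scales as 1/√p_0, reducing the defect probability by two orders of magnitude lengthens the operation by only one.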

IV. GROUND STATE COOLING
We have seen that the residual entropy after filtering is concentrated at the borders of the density distribution. Particles on these sites are also the only source of energy excitations because all doubly occupied sites have been removed. In the following we will propose several protocols which selectively remove particles at the borders, thereby bringing the system closer to its ground state.

A. Algorithmic Cooling
This is a cooling method based on a set of discrete steps which resemble a quantum information processing protocol. The steps are the following: (i) We begin with a cloud in thermal equilibrium in the no-tunnelling regime, having two or fewer atoms per site, all in internal state |a⟩. This can be ensured with a filtering operation F_2. (ii) We next split the particle distribution into two, with an operation U^{1,1}_{2,0} [Fig. 8a]. (iii) The |a⟩ and |b⟩ atoms are shifted away from each other until both ensembles barely overlap. We then begin moving the clouds against each other, removing in the process atoms from doubly occupied sites. Experimentally, this can be achieved by introducing a third internal level |c⟩ and applying the unitary operation U^{0,0,2}_{1,1,0}, in generalization of (11). This sequence sharpens the density distribution of both clouds and is iterated for a small number of steps, of order ∆ [Fig. 8c]. (iv) The atoms of type |b⟩ are moved again to the other side of the lattice and a process similar to (iii) is repeated [Fig. 8d]. (v) Remaining atoms in state |b⟩ can now be removed.
Notice that the final particle distribution will never be perfectly sharp [Fig. 8d] because this protocol is intrinsically limited by thermal and quantum fluctuations in the initial state. Qualitatively, in order to remove a particle of type |a⟩ from the tail, it has to coincide with a particle of type |b⟩ from the other cloud. Errors arise when particles do not coincide and are thus proportional to the fluctuations of the density. If we want to clean about ∆ sites or remove ∆ particles, the errors will be O(√∆). In the limit βU ≫ 1, we obtain a scaling behavior for the final entropy per particle, Eq. (18), in which N (14) is the number of particles in |a⟩ after step (ii) and N_i is the initial particle number. This simple estimate is already in good agreement with the exact result in [32], which also takes particle losses into account.
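Step (iii) can be visualized with a deterministic toy model acting on a single classical Fock configuration. This is only a sketch under strong assumptions: each cloud is a list of 0/1 occupations, only one side of the |a⟩ cloud is cleaned, atoms shifted beyond the lattice edge are dropped, and the operation U^{0,0,2}_{1,1,0} followed by removal of |c⟩ atoms is modelled as deleting both atoms wherever an |a⟩ and a |b⟩ atom share a site. All function names are hypothetical.

```python
def shift_right(cloud, x):
    """Move a cloud x sites to the right (negative x moves it to the left)."""
    size = len(cloud)
    return [cloud[k - x] if 0 <= k - x < size else 0 for k in range(size)]

def annihilate_pairs(a, b):
    """Remove both atoms on every site that holds one |a> and one |b> atom."""
    new_a = [0 if (x and y) else x for x, y in zip(a, b)]
    new_b = [0 if (x and y) else y for x, y in zip(a, b)]
    return new_a, new_b

def sweep(a, b, offset, steps):
    """Displace the |b> cloud by `offset`, then move it back one site per step."""
    b = shift_right(b, offset)
    for _ in range(steps):
        a, b = annihilate_pairs(a, b)
        b = shift_right(b, -1)
    return a, b

# |a> cloud with a hole at site 4 and a stray tail atom at site 5
a = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
a_clean, b_left = sweep(a, b, offset=6, steps=3)
```

In this example the stray atom at the border is removed after three steps and the |a⟩ cloud becomes a defect-free block, at the price of one |b⟩ atom. With random initial configurations the two clouds would not always coincide, which is the origin of the O(√∆) errors discussed above.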
Our findings show that the algorithmic protocol becomes more efficient with increasing initial particle number. This is in contrast to filtering, for which s_f is independent of N_i. We have verified numerically the scaling (18) [Fig. 9b]. The numerical simulation is by no means trivial because the protocol establishes classical correlations both among lattice sites and among internal states. To reproduce these correlations we have resorted to representing the classical density matrices using MPS [Appendix A]. These numerical simulations show that the final density distribution is very close to a thermal distribution [Fig. 9a]. Indeed, if we apply one or a few iterations of the protocol, the final state will still contain some holes in the tails and be close to thermal equilibrium, as the computation of the Rényi entropy s_{2,f} (10) before and after equilibration shows [Fig. 9a]. On the other hand, for a large number of iterations, the final density matrix will be an incoherent superposition of perfect uniform MI states which differ in their length and position [similar to Fig. 12]. These states are suitable for quantum computation but are far from equilibrium.
It is clear that we could cool the atoms to the ground state with 100% efficiency if the density distributions of |a⟩ and |b⟩ atoms were perfectly correlated. It also seems that we can improve the performance in realistic situations by removing the classical correlations which are established during the process [Fig. 9a]. We also increase the performance if we break the correlations between the atomic species at the end of the protocol while leaving the inter-site correlations untouched. Each iteration of the protocol would then reduce the total entropy by a factor of √2 [32]. From an experimental point of view, this means that the algorithm will work better when using multiple independent clouds to clean each other. These clouds may come from loading the lattice with atoms in different internal states, from splitting the lattice into multiple condensates, or simply from using the clouds of different 1D tubes to clean each other. Many other possibilities can be conceived. The protocol can be further improved by selecting specific particle number subspaces from the density matrix before restoring thermal equilibrium (see Sect. V for details).
Note also that we need not get rid of the atomic cloud in state |b⟩. For appropriate initial trap parameters, it is preferable to move this cloud back to the center and to pump all atoms back into internal state |a⟩. The resulting state will exhibit a MI shell structure with two plateaus at densities n = 1 and n = 2. This is a simple way to engineer ground states of tighter traps. Finally, let us remark that this protocol will only correct inhomogeneities along one direction. In a real experiment the protocol should be repeated along the other two directions.

B. Non-algorithmic cooling schemes
We have seen that algorithmic schemes perform much better than simple filtering. Here, we propose two non-algorithmic ground state cooling schemes which combine filtering with other quantum operations available in optical lattices.

Sequential filtering
The basic idea of this protocol is to first transfer particles from the tails of the cloud to the center of the trap, to form doubly occupied sites, and then to remove these particles with some filtering operation. To be more precise, one has to iterate the following sequence of operations: (i) Allow for some tunnelling while the trap is adjusted in order to reach a central occupation of n 0 ≈ 1.5.  This protocol has been analyzed in detail in [32] and the results can be summarized as follows. If one allows for equilibration then a state asymptotically close to the ground state can be reached after only a few filtering cycles. Without equilibration the final entropy is limited by the initial probability of finding a defect in the center of the trap.
Alternatively, one can think of an implementation of sequential filtering that operates at a fixed but non-zero hopping rate. Such a protocol would clearly profit from the fact that adiabatic changes of the lattice potential, which lead to additional heating, are not required. A key ingredient is the generation of doubly occupied sites in the center. One has to fix the hopping rate at a value at which there exists a coupling between doubly occupied sites at the center and singly occupied sites at the borders. Our analysis of the BHM in terms of an effective two-band fermionic model [32] shows, however, that this typically occurs only deep in the SF regime, for J/U ≳ 0.5.
We have studied sequential filtering at fixed J numerically, computing the cooling efficiency as a function of the hopping rate. The results are shown in Fig. 10. For high temperatures the ratio s_f/s_i is rather independent of J and we achieve some cooling. For very small temperatures, kT ≲ 0.01U, it changes dramatically with J. This is a clear signature of the quantum phase transition, which is expected to occur at J_c ≈ 0.085U in the thermodynamic limit. While for J < J_c the state is well described in terms of independent wells and filtering works very efficiently, for J > J_c particles are delocalized over the lattice, and filtering causes heating rather than cooling. Summing up, since we require a significant value of the hopping, J/U ∼ 0.5, this second variant of sequential filtering can only be used as an initial step, when temperatures are still comparatively high.

Filtering combined with frequency knife
Particles located at the tails of the density distribution can also be removed with a method similar to evaporative cooling or a frequency knife. The idea is to make use of inhomogeneous on-site energies and to choose the detuning δ of a radiation field in such a way that only atoms located at specific lattice sites are resonantly coupled to another internal state. Using a magnetic field gradient it has been demonstrated that individual lattice sites can be resolved within an uncertainty of about five sites [42]. In our case the spatial dependence of the on-site energies is naturally provided by the harmonic trapping potential. However, in order to make use of this inhomogeneity one has to ensure that the atoms are coupled to an internal state which responds differently to the ac-Stark shift induced by the lattice laser beams. Experimentally, this can, for instance, be achieved with a setup similar to the one for the creation of spin-dependent lattices [1,4,39]. There, one atomic species is trapped exclusively by σ− polarized laser light, whereas the other species is trapped predominantly by σ+ polarized light. Hence, an optically untrapped internal state can simply be realized by using only σ+ polarized laser light for creating the optical lattice.
Let us now estimate the requirements on the Rabi frequency Ω of the transition depending on the trap strength b. For simplicity, we consider two internal states, and a configuration for which the excited state exhibits zero on-site energy at each lattice site. We are interested in removing particles which are typically located at a distance k_µ from the center of the trap. Hence, for resonant coupling of these atoms we must choose the detuning to be |δ| = b k_µ² = µ. Particles at sites k_µ + ∆k feel an effective detuning δ_eff(∆k) ≈ bN∆k, where N = 2k_µ denotes the particle number (14). The probability that a particle at site k_µ + ∆k is transferred to the excited state is then given by: p(∆k) ∼ Ω²/[Ω² + δ_eff(∆k)²]. In order to locally address the site k_µ reasonably well one has to demand: Ω ≪ bN. For typical harmonic trapping frequencies in the MI regime, ω_ho = √(8b/(mλ²)) ≈ 2π × 65 Hz [16], and N = 50 this translates into Ω ≪ 1 kHz. Experimentally it should therefore be feasible to resolve individual sites with an uncertainty of a few lattice sites and thus to sharpen the density profile within this uncertainty. Note that this scheme clearly profits from a large number of particles per tube. Another advantage is that it preserves the spherical symmetry of the density distribution in a 3D lattice.
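The order of magnitude of this estimate can be checked with a short numerical sketch. It assumes the textbook two-level result that the peak transfer probability is a Lorentzian in the detuning, and it uses illustrative inputs that are not taken from the text: 87Rb atoms and a lattice wavelength λ = 850 nm. The function names are hypothetical.

```python
import math

H_PLANCK = 6.62607015e-34      # Planck constant in J s
AMU = 1.66053906660e-27        # atomic mass unit in kg

def trap_strength_hz(mass_kg, wavelength_m, omega_ho):
    """Return b/h in Hz, using omega_ho = sqrt(8 b / (m lambda^2))."""
    b_joule = mass_kg * omega_ho**2 * wavelength_m**2 / 8.0
    return b_joule / H_PLANCK

def transfer_probability(rabi, detuning):
    """Peak two-level transfer probability (Lorentzian in the detuning)."""
    return rabi**2 / (rabi**2 + detuning**2)

# Assumed example values (not from the text): 87Rb, lambda = 850 nm.
mass = 86.909 * AMU
b_hz = trap_strength_hz(mass, 850e-9, 2 * math.pi * 65.0)
n_particles = 50
bn_hz = b_hz * n_particles      # detuning step between neighbouring sites, in Hz
bn_rad = 2 * math.pi * bn_hz    # the same scale expressed as an angular frequency

rabi = 0.1 * bn_rad             # hypothetical choice satisfying Omega << b N
leak = [transfer_probability(rabi, bn_rad * dk) for dk in (0, 1, 2, 5)]
print(f"b/h = {b_hz:.2f} Hz, bN/h = {bn_hz:.0f} Hz, bN/hbar = {bn_rad:.0f} 1/s")
print("transfer probability at dk = 0, 1, 2, 5:", [round(p, 4) for p in leak])
```

With these assumed inputs the detuning step bN is of order 10³ s⁻¹, in line with the requirement Ω ≪ 1 kHz, and a Rabi frequency ten times smaller than bN already suppresses the transfer on the neighbouring site to about one percent.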
Finally, let us remark that this method can easily be incorporated in the continuous filtering scheme [Fig. 7] proposed in Sect. III, since there the atoms are also coupled to untrapped states.

V. ALGORITHMIC COOLING OF DEFECTS IN QUANTUM REGISTERS
A perfect one-dimensional quantum register is a connected array of commensurately filled lattice sites. For most purposes the filling factor is ν = 1. This state appears naturally as the unique ground state of the BHM in the no-tunnelling limit and in the presence of harmonic confinement. Hence, any efficient ground state cooling protocol will produce a good quantum register. Moreover, we have seen in the previous section that algorithmic protocols can be used to create an ensemble of quantum registers rather than a unique one (see also [5,32]). In this section we will propose and analyze two alternative algorithms for the creation of a quantum register ensemble. As compared to previous proposals [5,32] these schemes require only a small number of operations, which makes them more appealing for experimental implementation. The first protocol produces registers with filling ν = 1, whereas the second protocol is optimized for filling ν = 2. In addition, we propose how to transform atoms at the endpoints of each register into "pointer" atoms, which then enables addressing individual lattice sites. This can be used, for instance, to create registers of equal length. Even more importantly, it offers the opportunity to perform ensemble quantum computation in this system.
We start from a rather cold cloud, βU ≫ 1, which has been subject to fundamental filtering operations F_M (12). The result is an almost perfect MI in the center of the trap with some residual defects (or holes) which are predominantly localized at the borders [see Fig. 5]. Since we operate solely in the no-tunnelling regime these defects can neither redistribute nor evaporate. Our goal is to remove all these defects and we will achieve it by applying nearest-neighbor quantum gates which simulate inelastic collisions between particles and holes. Simply put, whenever a particle sits next to a hole, the particle will be annihilated. This process is analogous to spin flips in ferromagnets and the formation of domains of equal magnetization. Thus, these algorithmic schemes can also be understood as controlled equilibration and cooling of defects.
For all protocols that will be proposed below, the resulting state is a mixture of perfect (up to defects in the central MI phase) quantum registers, which differ only in their length and lateral position. The entropy of this state will be of the order S_f ∼ log_2 ∆, where the parameter ∆ (13) quantifies the translational uncertainty in the initial density distribution. Note, however, that even though the final entropy is very small, these protocols are typically not suited for ground state cooling. The reason is that the final state is far from thermal equilibrium. Numerically we find that the value of the entropy after equilibration is comparable to the value before invoking the protocol. However, if one makes use of the pointer atoms to select only registers with a specific length then these protocols indeed lead to cooling.

A. Protocol 1
The operation sequence is illustrated in Fig. 11. We first merge oddly aligned pairs of sites using a superlattice [43,44] and empty sites with only one atom. The sites are then split again and the original lattice is restored. This operation is repeated several times, alternating between evenly and oddly aligned pairs of sites, until the total entropy reaches a minimum.
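The merge, empty and split sequence of protocol 1 can be illustrated on a single classical occupation pattern. The following is a minimal sketch, not the MPS simulation used for the results in this paper. It assumes that every site holds zero or one atom after filtering, that a merged pair containing two atoms splits back into one atom per site, and that sites outside the lattice count as empty. The function names and the example pattern are hypothetical.

```python
def protocol1_step(occ, offset):
    """Merge pairs (offset, offset+1), (offset+2, offset+3), ...;
    empty merged sites holding a single atom; split again."""
    n = len(occ)
    out = [0] * n                      # sites without a partner are emptied
    for left in range(offset % 2, n - 1, 2):
        right = left + 1
        if occ[left] == 1 and occ[right] == 1:
            out[left] = out[right] = 1  # a full pair survives the step
    return out

def run_protocol1(occ, steps, first_offset=0):
    """Apply `steps` steps, alternating the pair alignment each time."""
    history = [list(occ)]
    for s in range(steps):
        history.append(protocol1_step(history[-1], first_offset + s))
    return history

def is_defect_free(occ):
    """True if all occupied sites form one connected block (or none)."""
    filled = [k for k, n in enumerate(occ) if n]
    return not filled or filled[-1] - filled[0] + 1 == len(filled)

# Hypothetical initial pattern: a central block with defects in both tails.
initial = [0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0]
history = run_protocol1(initial, steps=3)
for step, occ in enumerate(history):
    print(step, "".join(str(n) for n in occ), is_defect_free(occ))
```

In this example the first step already removes all particles that are disconnected from the central block, and every further step shortens the block by one site at each end, which is why the protocol has to be stopped after a suitable number of iterations.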
We have analyzed the performance of this protocol under realistic conditions using the MPS description to compute the at most classically correlated density matrix [Appendix A]. In Fig. 12 we plot the typical density distribution after different steps of the protocol. A single step changes the density very little but decreases the entropy dramatically (from S_2 = 17.5 to S_2 = 7.9). Indeed, the first steps account for the elimination of most defects. After a few iterations the value of the total entropy starts to saturate because the density matrix has collapsed to a classical ensemble of (almost) defect-free quantum registers [Fig. 12]. As a consequence of the protocol, the number of atoms per register must be even, which leads to steps in the final density profile [Fig. 12].
Clearly the fixed point of this protocol would be a state with no particles at all. Let us now estimate how many iterations M of the protocol are required to reach a state with reasonable particle number and tolerable defect probability. First, we have to point out that there are two sources of defects: (i) holes in the central MI phase and (ii) particles at the borders which are disconnected from the central MI phase and which have not yet been erased by the protocol. The probability for defects of the first kind is negligible in the limit of low temperatures βU ≫ 1 [see Sect. II B]. Defects of kind (ii) can be assessed by the following observation: In order to erase a connected array of M particles, which is separated by at least one empty site from a central MI state, one requires exactly M/2 iterations of the protocol. The probability of finding defects after M iterations is then given by the probability of having states with array size larger than M in the initial density matrix. This probability can easily be computed numerically exactly from the distribution (8). The derivation of closed expressions is however difficult. Nevertheless, one can get a good estimate for the optimal number of iterations based on the following argument. The characteristic tail width of the initial particle distribution (8) is ∆ (13). Hence, the occurrence of arrays of size M = 2∆ is already very unlikely. This implies that after roughly ∆ iterations of protocol 1 we expect to have registers with negligible defect probability. This can be illustrated with an example. For the initial state in Fig. 12 we have ∆ = 10 and the relative difference in the total entropy after the seventh and eighth iteration has reduced to 10^−4, which implies a defect probability of the same order of magnitude. The typical size of the registers after ∆ iterations is roughly N − 2∆, where N (14) is the characteristic size of the initial MI state. This requires N ≫ ∆, otherwise no particles remain in the system. Since ∆ ≈ N/βU, one has to require the low temperature regime βU ≫ 1.

[Fig. 12, caption] Stopping the algorithm at iteration 3 gives a minimum in the entropy per particle: s_2 = 0.095. Upon reaching thermal equilibrium this value has increased to s_2 = 0.265 (s = 0.36), which is comparable to the value after filtering. However, when selecting the subspace containing only registers of length N = 48 then the protocol leads to cooling. The entropy per particle in thermal equilibrium will then be given by s = 0.17. Right: The step-like structure of the density distribution allows one to deduce the states which contribute significantly to the final density matrix. The states can be classified according to their particle number and their lateral position.
As mentioned above, the algorithmic ground state cooling protocol of the previous section can also be used for the creation of quantum registers. However, the ground state cooling algorithm involves O(N ) operations as compared to O(∆) operations of the current protocol.
Note also that the example presented in Fig. 12 indicates that this protocol does not lead to cooling. After restoring thermal equilibrium, the entropy has reached a value comparable to the one after filtering. We have repeated this analysis for various initial conditions and our results confirm this observation. However, if one selects a particle number subspace with N larger than the mean value, then our protocol can indeed be used for cooling. According to the data in Fig. 12 the entropy per particle can be reduced by roughly 50% as compared to filtering.

B. Protocol 2
We start from a state which contains only empty or doubly occupied sites. This can be achieved by applying the filtering operation F_2, followed by U^{0,1}_{1,0} and E_b. The protocol is depicted in Fig. 13. We begin by transferring one particle per site to state |b⟩. Then the |b⟩-lattice is shifted one site to the left and the same operation as before is performed. Afterwards one empties singly occupied sites. This procedure allows one to remove defects in a correlated way. Occupied sites which have an unoccupied site to the right are emptied. Since the probability of finding particles in the center is close to one, the central MI is preserved except for losses at the right border. This procedure is repeated until all particles disconnected from the central MI phase are annihilated and only perfect MI phases in the center remain. One step of the protocol can be summarized in the following sequence of operations: U^{1,1}_{2,0}, S_1, U^{1,1}_{2,0}, E_b, U^{0,1}_{1,0}, E_b. Following the discussion of protocol 1, this sequence has to be applied approximately 2∆ times, where ∆ denotes again the characteristic tail width of the initial particle distribution. The factor two stems from the fact that at each step of the protocol the size of "particle islands" in the tails, as well as the central MI phase, is reduced only by one as compared to two in protocol 1. The final density matrix looks very similar to the one after protocol 1, with the difference that registers with an odd number of atoms also appear.
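One step of protocol 2 can be traced gate by gate on a classical occupation pattern. The sketch below is an illustration under stated assumptions: each site is described by a pair (n_a, n_b), U^{n′,m′}_{n,m} is modelled as a swap of the two listed occupation states, the lattice shift moves the |b⟩ atoms one site to the left as described in the prose, and atoms shifted beyond the lattice edge are dropped. All function names are hypothetical.

```python
def swap(sites, s1, s2):
    """Model U as a swap of the two local occupation states s1 and s2."""
    return [s2 if s == s1 else s1 if s == s2 else s for s in sites]

def shift_b_left(sites):
    """Move every |b> atom one site to the left (edge atoms are dropped)."""
    n = len(sites)
    new_b = [sites[k + 1][1] if k + 1 < n else 0 for k in range(n)]
    return [(sites[k][0], new_b[k]) for k in range(n)]

def empty_b(sites):
    """Model E_b: remove all atoms in level |b>."""
    return [(a, 0) for a, _ in sites]

def protocol2_step(occ_a):
    """One step: U(2,0<->1,1), shift, U(2,0<->1,1), E_b, U(1,0<->0,1), E_b."""
    sites = [(n, 0) for n in occ_a]
    sites = swap(sites, (2, 0), (1, 1))   # one atom per site goes to |b>
    sites = shift_b_left(sites)
    sites = swap(sites, (2, 0), (1, 1))   # recombine where a partner arrived
    sites = empty_b(sites)                # unpaired |b> atoms are lost
    sites = swap(sites, (1, 0), (0, 1))   # singly occupied sites ...
    sites = empty_b(sites)                # ... are emptied
    return [a for a, _ in sites]

# Hypothetical pattern: a block of four doubly occupied sites and one island.
pattern = [0, 2, 2, 2, 2, 0, 2, 0]
after_one = protocol2_step(pattern)
after_two = protocol2_step(after_one)
print(pattern, after_one, after_two)
```

The trace reproduces the rule stated above: an occupied site survives only if its right neighbour is occupied, so islands and the central block both shrink by one site per step at their right edge.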

C. Pointer atoms and register length control
Given an ensemble of quantum registers we will now show how to create pointer atoms. To be more precise, our goal is to selectively transfer the two end atoms of each register to a different internal level. These pointer atoms enable single-site addressing which can be used to engineer registers of specific length.

Creation of pointer atoms
Our scheme relies on the same set of operations that is used in current experiments for entangling atoms located at different lattice sites [4]. We consider quantum registers with one atom per site. Initially all atoms are in internal state |a⟩. A Hadamard transformation puts the atoms in the coherent superposition state (|a⟩ + |b⟩)/√2. One then shifts the |b⟩-lattice one site to the right and waits an appropriate time until the on-site interaction between species |a⟩ and |b⟩ yields a collisional π-phase. This means that on each site the state |a⟩|b⟩ is transformed into −|a⟩|b⟩. After two lattice shifts to the left one waits again until a π-phase has built up. Then the lattice is shifted back to its original position and a second Hadamard operation is performed. The resulting state is again a product state but the end atoms have been promoted to level |b⟩. These atoms can be considered as pointer atoms because they mark the beginning (and the end) of each register and can thus be used to access every site within the register in a deterministic way. In practice, only one pointer atom is needed. The second pointer atom, e.g. the one on the left, can easily be removed by applying the following operation sequence: S_{−1}, U^{2,0}_{1,1}, E_b, U^{1,1}_{2,0}.
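The effect of this sequence can be verified for a short register with a small state-vector calculation. The sketch below treats each atom as a two-level system (0 for |a⟩, 1 for |b⟩) and assumes that a phase of π is acquired exactly when the |b⟩ component of one atom overlaps with the |a⟩ component of a neighbouring atom, once after the shift to the right and once after the shift to the left of the home position. The function name is hypothetical.

```python
import numpy as np

def create_pointers(n_atoms):
    """Return the internal levels of all atoms after the pointer sequence."""
    h1 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    hadamard = np.array([[1.0]])
    for _ in range(n_atoms):
        hadamard = np.kron(hadamard, h1)      # Hadamard on every atom

    dim = 2 ** n_atoms
    psi = np.zeros(dim)
    psi[0] = 1.0                              # all atoms start in |a>
    psi = hadamard @ psi

    for idx in range(dim):
        # bits[k] = 1 means atom k is in |b> for this basis state
        bits = [(idx >> (n_atoms - 1 - k)) & 1 for k in range(n_atoms)]
        n_phases = 0
        for k in range(n_atoms):
            # |b> lattice one site to the right: atom k meets atom k+1
            if k + 1 < n_atoms and bits[k] == 1 and bits[k + 1] == 0:
                n_phases += 1
            # |b> lattice one site to the left of home: atom k meets atom k-1
            if k - 1 >= 0 and bits[k] == 1 and bits[k - 1] == 0:
                n_phases += 1
        psi[idx] *= (-1) ** n_phases          # collisional pi-phases

    psi = hadamard @ psi                      # second Hadamard
    final = int(np.argmax(np.abs(psi)))
    weight = float(abs(psi[final]) ** 2)
    levels = ["b" if (final >> (n_atoms - 1 - k)) & 1 else "a"
              for k in range(n_atoms)]
    return levels, weight

levels, weight = create_pointers(5)
print(levels, weight)
```

The phases between interior neighbours are collected twice and cancel, so only the two end atoms pick up a net sign, which the second Hadamard converts into a transfer to |b⟩.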

Manipulation of register length
The pointer atoms can now be used to create an ensemble of registers of fixed length, a feature which is desired for quantum computation and which can even lead to cooling. We first present a protocol that requires only two internal states of the atoms. Then we show that the algorithm can be simplified considerably when a third internal level is included. In both cases we start from a situation where all registers have one pointer atom in state |b⟩ which is located at the rightmost occupied site of the register.
Protocol based on two internal states: The central idea is to remove atoms selectively from the system by transferring them to the pointer level |b⟩. We first show how to discard all registers which are shorter than a desired length M. We start with the sequence: U^{0,2}_{1,1}, S_{−M}, U^{2,0}_{0,2}, E_b. This ensures that registers of length N ≥ M are protected against further modifications, because their pointer atoms have been removed. Next we promote atoms in doubly occupied sites to the pointer level and shift this (two-atom) pointer one site to the right. If the pointer hits an occupied site then one atom in |a⟩ will be removed. Iteration of this process removes all atoms on the right of the current pointer position.
To be more precise, one has to iterate M − 1 times the sequence: U^{0,2}_{2,0}, S_1, U^{2,0}_{0,2}, U^{2,1}_{1,2}, E_b. In a last step the remaining doubly occupied sites are emptied via U^{0,2}_{2,0}, E_b. After creating new pointer atoms the minimal register length is given by M′ = M − 2. Again we keep only the pointer atom on the right side. Let us now show how to shorten registers of length N > M′ to length M′. We first apply the sequence: S_{−1}, U^{0,2}_{1,1}, S_{−M′}, U^{1,1}_{0,2}, S_{−1}. For the target register of length M′ this merely transfers one atom from the right end of the register to the left end. For larger registers one obtains a two-atom pointer that can now be used to select and discard all atoms located left of the pointer. This can be accomplished by iterating the sequence: U^{2,1}_{1,2}, E_b, U^{0,2}_{2,0}, S_{−1} until all registers with appreciable weight in the density matrix have been shortened to the desired size.
Protocol based on three internal states: This protocol is based on the operation U^{0,1,1}_{1,1,0}, which transfers an atom from |a⟩ to |c⟩, given that a pointer atom is present. This way one can use the pointer as a "marker", which allows one to produce registers of desired length M in a very simple manner. One first marks an array of M atoms and then discards all atoms which have remained in level |a⟩. The algorithm can be summarized as follows: apply the sequence S_{−(M−1)}, U^{2,0}_{1,1}, E_b, U^{1,1}_{2,0}, U^{0,1,1}_{1,1,0}; iterate M − 1 times the sequence S_1, U^{0,1,1}_{1,1,0}; finally apply the operation E_a.
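The three-level sequence can be followed on a classical occupation pattern. This is a sketch under several assumptions that go beyond the text: each site is described by a triple (n_a, n_b, n_c), every U is modelled as a swap of the two listed occupation states, S_d moves the |b⟩ atoms by d sites (negative d to the left), atoms shifted off the lattice are dropped, and the pointer site itself holds only the |b⟩ atom. The helper names are hypothetical.

```python
def shift_b(sites, d):
    """Model S_d: move all |b> atoms by d sites (negative d = to the left)."""
    n = len(sites)
    new_b = [0] * n
    for k, (_, b, _) in enumerate(sites):
        if b and 0 <= k + d < n:
            new_b[k + d] += b
    return [(a, new_b[k], c) for k, (a, _, c) in enumerate(sites)]

def swap(sites, s1, s2):
    """Model U as a swap of the two local occupation states s1 and s2."""
    return [s2 if s == s1 else s1 if s == s2 else s for s in sites]

def empty(sites, level):
    """Model E_a (level 0) or E_b (level 1): empty one internal level."""
    return [tuple(0 if i == level else n for i, n in enumerate(s))
            for s in sites]

def make_register(n_sites, start, length):
    """Register with |a> atoms and one |b> pointer on its rightmost site."""
    sites = [(0, 0, 0)] * n_sites
    for k in range(start, start + length - 1):
        sites[k] = (1, 0, 0)
    sites[start + length - 1] = (0, 1, 0)
    return sites

def select_length(sites, m):
    """Keep the rightmost m sites of registers with at least m atoms."""
    sites = shift_b(sites, -(m - 1))
    sites = swap(sites, (1, 1, 0), (2, 0, 0))   # hide pointers that hit an atom
    sites = empty(sites, 1)                     # pointers on empty sites are lost
    sites = swap(sites, (2, 0, 0), (1, 1, 0))   # restore the surviving pointers
    sites = swap(sites, (1, 1, 0), (0, 1, 1))   # mark the first atom
    for _ in range(m - 1):
        sites = shift_b(sites, 1)
        sites = swap(sites, (1, 1, 0), (0, 1, 1))
    return empty(sites, 0)                      # discard unmarked atoms

def occupied(sites):
    return [k for k, s in enumerate(sites) if sum(s) > 0]

long_register = select_length(make_register(12, 1, 5), 4)
short_register = select_length(make_register(12, 1, 3), 4)
print(occupied(long_register), occupied(short_register))
```

In this trace a register of five atoms is cut down to the target length M = 4 (three marked atoms plus the pointer), while a register of three atoms loses its pointer in the first E_b and is discarded completely.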

VI. CONCLUSION
We have proposed various ground state cooling schemes that allow one to reduce the temperature in current optical lattice setups considerably. In particular, we have presented algorithmic cooling schemes which combine filtering with concepts inspired by ensemble quantum computation. We have shown that these algorithmic protocols can be designed both for ground state cooling and for the creation of defect-free quantum registers. We have tested and analyzed our protocols under realistic experimental conditions, with special emphasis on the presence of a harmonic trap. We found that the residual entropy of all our protocols depends crucially on one parameter which accounts for the translational uncertainty of the initial particle distribution.
A special virtue of our schemes is that they rely on general concepts which can easily be adapted to different experimental situations. For instance, minor modifications ensure that our protocols can be applied both to bosonic and fermionic systems. A second advantage of our protocols is that they are designed for the no-tunnelling regime and hence do not necessarily require equilibration processes induced by particle hopping and elastic collisions. Hence, they allow one to approach the ground state of the no-tunnelling regime in very short times.
In this sense, the collection of cooling schemes presented in this article can be considered as a toolbox which is tailored for cooling atoms in optical lattice setups. The tool (or combination of tools) which is best suited for a given purpose, can be chosen according to the characteristic features of a specific experimental setup. For instance, in the case of large systems at high temperatures, one can think of combining filtering with the frequency knife method which is then followed by the algorithmic ground state cooling protocol. Or ground state cooling can be combined with adiabatic transformation of the Hamiltonian so as to produce ground states of models different from the simple Bose-Hubbard considered here. And one should also keep in mind that a 3D lattice structure offers a large variety of possibilities, which have not been fully explored yet.
The methods introduced here greatly enhance the potential of optical lattice setups for future applications and might pave the way to the experimental realization of quantum simulation and quantum computation in this system. We also hope that the concepts introduced in this work might trigger further research in the direction of ground state cooling in optical lattices.

VII. ACKNOWLEDGEMENTS
This work was supported in part by the EU IST projects QUPRODIS and COVAQIAL, the DFG (SFB 631) and the "Kompetenznetzwerk Quanteninformationsverarbeitung der Bayerischen Staatsregierung".

APPENDIX A
Our algorithmic protocols establish classical correlations between different lattice sites when applied to thermal states in the no-tunnelling regime. Hence, a description in terms of independent wells (as in (6)) is no longer adequate. Therefore we resort to a representation in terms of matrix product states (MPS) [35,36].
To be more precise, for a 1D lattice of length L we want to map a classical density matrix of the form
ρ = Σ_{i_1,…,i_L} ρ_{i_1…i_L} |i_1…i_L⟩⟨i_1…i_L| (A1)
onto a matrix product state
|ψ⟩ = Σ_{i_1,…,i_L} A^{i_1}_1 A^{i_2}_2 … A^{i_L}_L |i_1…i_L⟩. (A2)
Here, A^{i_k}_k denote matrices of dimension D × D and i_k ∈ {0, …, d − 1} is the occupation number of site k.
The matrices at the endpoints, A^{i_1}_1 and A^{i_L}_L, are 1 × D and D × 1 vectors, respectively. The mapping from ρ to |ψ⟩ can easily be accomplished by setting: ρ_{i_1…i_L} = A^{i_1}_1 A^{i_2}_2 … A^{i_L}_L. Expectation values for operators of the form Ô = Ô_1 ⊗ Ô_2 ⊗ … ⊗ Ô_L are calculated according to the relation
⟨Ô⟩ = Π_{k=1}^{L} [Σ_{i_k} ⟨i_k|Ô_k|i_k⟩ A^{i_k}_k].
Local operations on ρ, like unitary operations U^{n′,m′}_{n,m} (11) or filter operations (12), amount to transformations of the local matrices A^{i_k}_k in the MPS picture. For illustration, let us consider a completely positive map C which acts on a local state ρ_k at site k. In Kraus representation one can write
C(ρ_k) = Σ_α E_α ρ_k E_α†,
with the Kraus operators E_α satisfying the completeness relation Σ_α E_α† E_α = 1. The MPS matrices transform according to
Ã^{i_k}_k = Σ_{j_k} Σ_α |⟨i_k|E_α|j_k⟩|² A^{j_k}_k.
Non-local operations that involve more than one lattice site are more complicated to implement. As an example, let us consider the most complicated case which occurs in our protocols: a spin-dependent lattice shift S_{−1} which shifts the |b⟩-lattice one site to the left. For this we have to consider two species of atoms. In generalization of (A2), the MPS matrices A^{i_k,j_k}_k are now labelled with two physical indices, i_k and j_k, referring to states |a⟩ and |b⟩, respectively. It will turn out to be convenient to rewrite these matrices in tensor form: (A^{i_k,j_k}_k)_{α,β} = A_k(α, β, i_k, j_k), with α, β = 1 … D.
Accordingly, we define B_2(α_2, β_2, i_2) := Σ^{1/2}(α_2, γ) W†(γ, β_2, i_2), (A8) and iterate this scheme until the end of the lattice. Note that the dimension D of the MPS matrices can, in principle, increase exponentially with the number of lattice shifts. If D becomes larger than a desired value D_max one has to resort to a truncation method similar to the one proposed for mixed quantum states [37]. However, in practice, we find that D increases only linearly with the number of lattice shifts, given that each shift is followed by a filter operation. Thus, with our method we can fairly easily simulate protocols exactly, i.e. without truncation, which involve up to 100 lattice shifts on lattices with up to L = 500 sites.
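The single-species part of this representation can be illustrated with a few lines of NumPy. The sketch below is not the code used for the simulations in this paper. It builds, as a hypothetical example, a classically correlated distribution (a mixture of two product distributions) as an MPS with D = 2, evaluates expectation values of products of diagonal operators, and applies a local classical map to one site. The spin-dependent lattice shift is not covered. All names and numbers are illustrative.

```python
import numpy as np

def mixture_mps(p, q, w, n_sites):
    """MPS (D = 2) for the mixture w * p x p x ... + (1 - w) * q x q x ...
    Each tensor has shape (d, D_left, D_right)."""
    d = len(p)
    first = np.array([[[w * p[i], (1 - w) * q[i]]] for i in range(d)])
    bulk = np.array([np.diag([p[i], q[i]]) for i in range(d)])
    last = np.array([[[p[i]], [q[i]]] for i in range(d)])
    return [first] + [bulk] * (n_sites - 2) + [last]

def probability(mps, config):
    """rho_{i_1...i_L} as the product of the selected matrices."""
    m = np.eye(1)
    for tensor, i in zip(mps, config):
        m = m @ tensor[i]
    return float(m[0, 0])

def expectation(mps, diag_ops):
    """<O_1 x ... x O_L> for operators given by their diagonal entries."""
    m = np.eye(1)
    for tensor, diag in zip(mps, diag_ops):
        m = m @ np.tensordot(diag, tensor, axes=(0, 0))
    return float(m[0, 0])

def apply_local_map(mps, site, transition):
    """Local classical map: A~^i = sum_j transition[i, j] A^j at one site."""
    new = list(mps)
    new[site] = np.tensordot(transition, mps[site], axes=(1, 0))
    return new

# Hypothetical example: occupations 0 or 1 on four sites.
one = np.ones(2)
number = np.array([0.0, 1.0])
mps = mixture_mps(p=[0.1, 0.9], q=[0.5, 0.5], w=0.5, n_sites=4)

norm = expectation(mps, [one, one, one, one])
density = expectation(mps, [one, one, number, one])
pair = expectation(mps, [one, number, number, one])
print(norm, density, pair, density**2)   # pair differs from density**2

# A map that empties site 2 regardless of its occupation.
emptied = apply_local_map(mps, 2, np.array([[1.0, 1.0], [0.0, 0.0]]))
print(expectation(emptied, [one, one, number, one]))
```

The pair expectation value differs from the square of the single-site density, which is the kind of classical inter-site correlation that a description in terms of independent wells cannot capture.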