Preparation of ordered states in ultra-cold gases using Bayesian optimization

Ultra-cold atomic gases are unique in terms of the degree of controllability, both for internal and external degrees of freedom. This makes it possible to use them for the study of complex quantum many-body phenomena. However in many scenarios, the prerequisite condition of faithfully preparing a desired quantum state despite decoherence and system imperfections is not always adequately met. To path the way to a specific target state, we explore quantum optimal control framework based on Bayesian optimization. The probabilistic modeling and broad exploration aspects of Bayesian optimization is particularly suitable for quantum experiments where data acquisition can be expensive. Using numerical simulations for the superfluid to Mott-insulator transition for bosons in a lattice as well for the formation of Rydberg crystals as explicit examples, we demonstrate that Bayesian optimization is capable of finding better control solutions with regards to finite and noisy data compared to existing methods of optimal control.


Introduction
In this paper, the focus is on creating spatially ordered states in two quite different ultracold systems, atoms in optical lattice and highly excited Rydberg atoms, as testbeds for Bayesian optimization. Optical lattices [1,2] provide a natural and versatile platform to study strongly correlated condensed matter systems [3,4], with implementations in quantum information processing [5] and quantum metrology [6,7,8]. Likewise the extraordinary properties of Rydberg atoms [9] in combination with ultra-cold gases establish a facile way to realize strongly interacting many-body system [10,11,12]. Recently with Rydberg atoms in tweezers has allowed to produce some of the largest nonclassical states [21] and is a serious competitor for the realization of quantum computer [22]. While adiabatic preparation still remains an intuitive and straightforward way to create these interesting states, it requires long timescales especially for large systems with vanishing energy gaps. This makes them prone to accumulation of errors due to prolonged interaction with the environment. Furthermore, there is no guarantee that a particular target state of interest is adiabatically connected to the initial state. Thus in recent years, there has been a growing interest in applying optimal control to ultra-cold systems [23,24,25,26,27].
At the core of any quantum optimal control framework, lies optimization algorithms that aim at maximizing the figure of merit of the experiment with respect to input control fields. A figure of merit (FoM) measures the success of the experiment which could be the preparation of a highly desirable quantum state or gate or simply achieving a certain phase transition. In a typical quantum experiment, input control fields are tunable variables such as external electric or magnetic fields, amplitude and phase of lasers or microwave pulses. The explicit dependence of the FoM on its input control fields defines the control (optimization) landscape. To obtain an optimal solution, the optimization scheme generally requires to probe several points in the control landscape. Undoubtedly the more points the optimization algorithm can probe the better is the optimal solution. However, if the FoM is evaluated on a quantum experiment, probing several points in the control landscape would imply repeating the experiment several times. Depending on the complexity of the quantum task at hand and the optimization routine, the experiment may have to re-run for unusually high number of times, which is highly inefficient. This constraint applies equally well to numerical calculations where the Hilbert space grows exponentially with system size. Most conventional methods of optimization find fast protocols but not necessarily efficiently with regards to the number of data points needed from the control landscape. Furthermore, there is always some uncertainty in evaluating the FoM due to experimental noise and imperfections which makes the optimization process more challenging.
In this regard, Bayesian Optimization (BO) [30,31,32] is an attractive option and has been successfully used in robotics [33,34], machine-learning [31] and has recently made its way to quantum optimal control problems [25,35,36,37]. Section 2 discusses how BO fits into the general framework of quantum optimal control followed by a discussion outlining the essential steps of BO in Section 3. In Section 4, BO is applied to optimize the quantum dynamics which were solved numerically for a small number of atoms trapped in a lattice. In the first example, BO's efficiency with regards to the minimum number of evaluations needed to drive a superfluid (SF) to Mott insulator (MI) state transition is established when compared with several other optimization routines. In the second example, BO is applied to a strongly interacting gas of trapped Rydberg atoms. Although initial studies of Rydberg gases revealed the possibility of creating ordered structures via a phase transition [81], the presence of a quantum critical point makes the realization of crystalline ordering with large number of Rydberg excitations still challenging. Section 5 summarizes these results and provides an outlook for future work.  Figure 1: Quantum optimal control scheme: F (θ m ) is the FoM to maximize and θ m are input parameters describing the control fields for a quantum system. An optimization task is initiated by evaluating the FoM either numerically or experimentally for a set of random control parameters θ 0 . At every stage, the Bayesian optimizer suggests new set of parameters for the quantum system based on previous evaluations of the FoM. This process continues over several iterations.
Within the framework of optimal control, the entire optimization task involves maximizing the FoM, F (θ) with respect to its control fields θ until an optimum solution, θ opt is reached, as expressed below, (1) θ = [θ 1 , θ 2 , . . . , θ N ] is a vector of size N that represents the parametrization of the input control fields. For time-dependent control, this parametrization is done using special functions [56], Fourier series [55] as well as piece-wise constants [57]. As illustrated in Fig. 1, the FoM is evaluated for some random set of initial parameters θ 0 . This is passed onto the optimizer which in turn suggests the next set of parameters, for which FoM is again evaluated thus repeating the optimization task iteratively. For every m th iteration, F (θ m ) is obtained either from a real experiment or through numerical calculations. The hope is to obtain convergence to θ opt well before exhausting the experimental/numerical resources. Although experimental constraints are not explicitly included in Eq. 1, it can be incorporated either in the FoM F (θ) directly or in the choice of the parametrization. A variety of optimization algorithms are available and are broadly classified based on the ability to evaluate the gradient of F (θ) with respect to θ. Gradient based algorithms such as Krotov method [58,59] and GRAPE [57] have been successfully applied to numerical simulations only. Other gradient based methods such as finitedifferences [60], simultaneous perturbation stochastic approximation (SPSA) [61,48] and hybrid quantum-classical algorithms [47,48] can be applied to experiments. However, these gradient based methods are considered as local optimization methods whose convergence is fast as long as the optimization landscape is well-behaved. These methods get compromised by the presence of any local minima and plateaus in the optimization landscape [62,63,64]. Alternatively, gradient-free algorithms are able to explore the optimization landscape more globally than gradient based methods making them less vulnerable to local minimas. Among them, Nelder-Mead has been extensively used in the context of CRAB [55,65,66] as well as independently [45,44,67]. It has the advantage of simplicity but can still get trapped in local minima and its convergence is limited by the presence of noise in the observations [61]. Other popular non-gradient methods include evolutionary algorithms [40,62,68] and the recently introduced reinforcement learning techniques [63,69,70]. These methods usually require large number of iterations to find the optimal solution.
Bayesian Optimization (BO) is a non-gradient based method offering an appealing alternative as it cleverly selects the next set of parameters to evaluate at each step of the optimization. This can lead to significant reduction in the number of iterations needed for convergence while performing global optimization. BO has the additional advantage of integrating probabilistic elements of data acquisition which can be exploited to increase its efficiency [71]. These characteristics make it well suited to perform optimal control on quantum experiments but also when numerical simulations of the system are time-consuming.

Bayesian Optimization
The essential steps of BO used for quantum optimal control are shown in Fig. 1 and are discussed in this section: BO relies on an approximate model of the optimization landscape, which is updated at each iteration, and is leveraged to choose the next set of parameters for which the FoM is evaluated.
As highlighted in the previous section, the FoM can be obtained either experimentally or numerically and is passed to the optimization routine. Each of these evaluations can be time consuming, and in experimental setups, subject to noise. For these reasons, one cannot expect to resolve perfectly the true optimization landscape and probabilistic modeling becomes convenient which is denoted by f . A suitable probabilistic model for f is specified in terms of a distribution, p(f ) taken to be a Gaussian process [72], whose details are given in Appendix A.1. In essence, it favours smooth and regular functions to describe the unknown control landscape F . This distribution does not yet incorporate evaluations of the FoM and is therefore referred as the prior distribution. The next step in BO is to update the probability distribution p(f ) based on the new values of the FoM. Consider an arbitrary iteration step, where D evaluations of the FoM have been obtained and is denoted by vector, y = [y 1 (θ 1 ), ..., y D (θ D )]. The posterior distribution is the probability distribution p(f |y) defined as the distribution for f conditioned on all the evaluations y obtained so far and is specified using Bayes rules as follows, where p(y) is the probability distribution for the set of observations y and p(y|f ) is the likelihood for the set of observations y to occur based on a given model f . Specific details regarding the evaluation of p(f |y) are described in Appendix A.2. In order to decide which control pulse to use in the next step of the iterative optimization, one would naively choose it where the model f takes its maximal value. However, this model is only approximative and thus likely to miss some interesting features of the optimization landscape. Therefore it is also of interest to evaluate the FoM where uncertainty in the model is high. Combining these two aspects, sometimes referred respectively as exploitation and exploration, are captured by an acquisition function (see Appendix A. 3 To illustrate the points discussed above, a simple example of one dimensional optimization is considered in Fig. 2. The maximum value of an arbitrary function of a single parameter θ ∈ [0, 4] is searched for using BO, implemented using the numerical package [73]. This function represents the true FoM which is shown in dashed red and the noisy evaluations of the function serve as elements of y are shown as red dots. Ten such evaluations have been obtained and the posterior distribution p(f |y) is plotted in the top panel of Fig. 2(a). The posterior distribution is defined by its mean (shown in solid blue line) and its variance which are used to compute a 95% confidence interval (depicted as shaded blue across the mean). This width of the interval results from the finite number of evaluations and the noisy evaluations. On comparing the model with the true FoM, one finds that in Fig. 2(a) does not adequately reproduce the underlying control landscape due to the small number of evaluations. This lack of knowledge of the true underlying landscape is captured by the variance of the distribution resulting in a large confidence interval. The acquisition function given in Eq, (A.7)) is plotted in the lower panel of Fig. 2(a). In this case, the maximum of the acquisition function is at θ ≈ 2.7 (highlighted as dashed vertical line), where the model has both, a high mean value (not necessarily maximal) and also shows high uncertainty. Based on the acquisition function, we evaluate for next parameter giving the Fig. 2(b) which corresponds to 11 iterations and the new data point (green square) has been used to update the model. This cycle of updating the model and suggesting the next parameter to evaluate is repeated another nine times giving Fig. 2(c) where the model is in close agreement with the true landscape and has successfully identified a parameter θ close to the true global maximum.

Preparation of ordered states in ultra-cold systems beyond adiabatic methods
Controlled preparation of many-body states with spatial correlations resembling solidstate matter are of great interest to the condensed matter community. For example, phenomena such as MI-SF transition realized with bosonic atoms trapped in optical lattices [74,75] simulate the transition of a conducting state to an insulator state in solid state physics. Similarly creating highly correlated ordered states with long range interactions in a cold gas of Rydberg atoms [76,77,78] is akin to crystals in solids with long range Coulomb interactions between the electrons. In the Bose-Hubbard model, the only form of interactions were short-range on-site interactions while in the Rydberg system, the interactions can be relevant over a range longer than a typical lattice spacing [79,80]. Both of these examples are well studied in the context of ultra-cold physics and would serve as perfect testbeds for establishing the merits of BO techniques.
In Sec. 4.1, using the Bose-Hubbard model as a toy example, BO's ability to obtain an optimal protocol that drives the SF-MI transition is tested within the context of quantum speed limit [59]. Furthermore, BO's performance is benchmarked with other optimization methods in terms of its efficiency towards convergence. Section 4.2 is where BO is applied to the optimization of laser pulse dynamics to create Rydberg crystals with large fraction of Rydberg excitations in different lattice geometries; one dimensional (1D), two dimensional (2D) as well as three dimensional (3D).

Superfluid to Mott insulator transition in Bose-Hubbard model
The Bose-Hubbard model is widely studied in solid state physics and is simulated with bosonic atoms in an optical lattice. Although conceptually simple, this many-body system cannot be mapped onto a single particle problem and contains the interesting phenomena of transitioning from a superfluid state to a Mott insulating state [74]. Experimental realization was achieved [75] by slowly varying the depth of the optical lattice potential.

Setup:
For the Bose-Hubbard model, a homogeneous system of N bosonic atoms with repulsive interactions in a lattice with L sites is considered. The Hamiltonian for this model is given as The first term in the Hamiltonian describes the tunnelling of bosons between neighboring potential sites whose strength is given by J(t). The annihilation (creation) of a boson at site i is defined by operatorsb i (b † i ) which follow the usual commutation relation, This tunneling term tends to delocalize each atom over the lattice. The repulsive interaction between two bosonic atoms is quantified by U (t). Owing to the short range of the interactions compared to the lattice spacing, they are referred as on-site interactions wheren i =b † ib i is the number operator for a given site i. Fig. 3, by varying the potential depth of the optical lattice in time, both terms J(t) and U (t) are affected. J(t) changes depending on the tunnelling barrier between neighboring lattice sites, while the change in the on-site interaction is due to the variation in atom wave function confinement. Introducing a single dimensionless quantity,

As shown schematically in
Hamiltonian is re-written as: The Bose-Hubbard Hamiltonian has two distinct ground states depending on the strength of the interactions U relative to the tunneling J. In the limit where Γ(t) = 0, the interaction term vanishes and the ground state of each atom is delocalized over the entire lattice. The atom number per site is uncertain and thus the many-body state can be described as a superposition of different atom number states, Here |0 is the many-body vacuum state in the Fock basis representation. The SF state is characterized by a non-zero variance of the number operator and a constant global phase across the lattice.
In the opposite limit with Γ(t) = 1, the hopping between adjacent sites is suppressed and the ground state of the system consists of localized atomic wave functions that minimize the interaction energy. For a homogeneous system, one can have commensurate filling of n atoms per lattice site and thus the many-body state is then a product of local Fock states in the atom number for each lattice site. For the ground state where n = N/L, the Mott insulator is given as Global phase coherence of the matter wave field is lost at the expense of perfect atom number correlations between lattice sites. In the next section we proceed to analyze the energy spectrum of the Bose-Hubbard model. Interestingly, in the limits of Γ(t) = 0 or Γ(t) = 1, the Hamiltonian in Eq. (4]) is completely integrable. However, for the inter-mediate values of Γ, the system is non-integrable [82].

4.1.2.
Many-body energy spectrum: For a chain of L = 5 sites with periodic boundary conditions and unit filling (L = N = 5) of bosons, the many-body energy spectrum as a function of Γ is provided in Fig. 3(c). This is solved using exact diagonalization in the Fock basis representation and is implemented with QuSpin [83]. The energies E n of the many-body eigenstates |n are plotted where n = 0, 1, 2 . . . numerates the state in ascending order of energy at every value of Γ. At Γ = 0 and Γ = 1, there is considerable degeneracy in the states although the energies can be expressed analytically [82]. The ground state is highlighted in red ranging from superfluid ground state (Γ = 0) to Mott insulating ground state (Γ = 1) and takes specific values, E 0 (Γ = 0) = −2N and E 0 (Γ = 1) = 0 respectively. There are multiple avoided crossings for all the other values of Γ.

Results and discussions:
The time-dependent control Γ(t) is parametrized with 10 parameters representing the control values taken at equidistant time for t = 0 − T QSL with boundary conditions, Γ(0) = 0 and Γ(T QSL ) = 1. The duration of the protocol is taken to be the quantum speed limit as defined in [24,59], T QSL = π/∆ where ∆ is the minimum energy gap between the ground and first excited state within the bounds of Γ (see Fig. 3(c)). This limit theoretically bounds the minimum time to reach perfect fidelity by the system. The values of the control for inter-mediate times are obtained by fitting a cubic spline. The dynamics is solved numerically by varying Γ(t) and performing exact diagonalization at every instant giving the state, |ψ(t) that solves for H(t) from Eq. (4). At any given time, |ψ(t) will be in superposition of different |n(t) states, where n = 0, 1, 2 . . . labels the energy in increasing order. Thus, |n(0) = 0 and |n(T QSL ) = 0 are the ground state SF and MI states respectively. In Fig.4(a) shows the drive with a simple linear increase of Γ(t) and is compared to optimal driving found with BO shown in Fig.4(c). The plots in Fig.4 (b),(d) are the squared overlap of the state |ψ(t) with |n(t) defined as P n (t) = | ψ(t)|n(t) | 2 . As Γ(t) is varied from 0 to 1, we retrieve the ground state fidelity at the end of the protocol with almost 90% in the case of the optimized protocol while the linear ramp has a fidelity of less than 30%. For the linear protocol to reach the same fidelity as the optimized one, it would take about an order of magnitude longer in time thus establishing the non-adiabatic character of the optimum protocol.
To benchmark BO's overall performance for the SF-MI task, results obtained using BO are compared with other optimization routines (some of which were mentioned in Sec. 2) in Fig.5. These include both gradient as well as non-gradient methods. For comparison with gradient based method, spontaneous perturbation stochastic approximation (SPSA) algorithm was chosen [84] as it was shown to be competitive in the number of iterations for convergence and robust to noise [61]. For comparison with non-gradient based methods, the differential evolution(DE) and Nelder-Mead(NM) optimizers were implemented using Scipy [85]. Finally random search (RANDOM) was also included in the list of optimizers which relies on randomly sampling a new set of parameters at each iteration. Fig.5 have been divided into three columns corresponding to optimization results for different protocol durations, T = 0.5T QSL , T QSL , 1.2T QSL . Motivated from [62,63], these three regimes can have very different influence on the complexity of the optimization task based on the quantum speed limit criterion introduced earlier in the section. Fig.5 plot the infidelity as a function of the number of iterations. In the first row of Fig.5, all the optimizers have access to the exact evaluation of this fidelity and the FoM is effectively given as F MI = | ψ M I |ψ(T ) | 2 and thus the corresponding infidelity is defined as (1 − F MI ). However for Fig.5(d-f), a more experimentally realistic FoM is used, denoted by F exp . This particular choice of FoM is motivated by experimental constraints such that the impossibility to project onto the target state and finite size sampling effects. The infidelity in this case is F exp = V i i , where V i is the variance in the occupation number for given lattice site i. F exp is minimized (ideally vanishes) when a MI phase has been reached and thus provides an appropriate guidance for the optimization. Rather than the true expected value of the variance, an estimation of this figure based on a finite number of repetitions (here 1000) is used.
As shown in Fig.5, over all these different configurations, BO (orange curve) exhibits faster convergence towards low infidelity. Both SPSA and DE also converge to good parameters but at the expense of more iterations compared to BO. NM in most of the cases get quickly stuck in local minima with poor fidelities. When the time allowed for evolution is too short, Fig. 5(first column), the best final infidelity reached is of the order of only 50%. All optimizers are quite competitive in this case. However, when the time allowed for the transition to occur is more or equal to T QSL (second and third columns in Fig.5), an order of magnitude less iterations are required by BO to converge to high fidelity control. Using the experimental motivated FoM F exp , the final fidelities dropped by 5% for all the optimizers indicating additional difficulty to perform optimization for the noisy scenario. Interestingly SPSA performs better with the noisy FoM Fig.5(d) while getting stuck when it has access to the true fidelity (c). This highlights the ability of SPSA to leverage noise to escape local minima.
Based on these results, BO seems to provide a decisive advantage when it comes to number of iterations.

Creation of Rydberg crystalline states
After the first observations of interacting Rydberg gases [13,14,15,16,17,18], more control over the nature, form and structure of the quantum many-body state has been achieved by shaping the excitation sequence [19] and the arrangement of the atoms in specific patterns [20]. Early proposals [76,77] for creating ordered phases in Rydberg gases were adiabatic in nature, in the sense that the overall dynamics was slower than the minimum energy gap of the many-body system. Unfortunately, the energy gap decreases exponentially with the number of Rydberg excitations preventing the preparation of crystals with large number of Rydberg excitations.
Other proposals rely on admixing the ground state atoms with small amount of Rydberg state to create super-solid droplet crystals [86,87] and was experimentally done [78], but only for a very small system where the energy gap was rather large compared to the coherence time. Moreover Rydberg admixing of ground state atoms requires considerable fine tuning [88] especially to avoid spontaneous scattering and Figure 6: Schematic diagram of the setup: Ground state atoms denoted by |g are trapped in a deep optical lattice. The ground state and Rydberg state (|r ) of each atom is optically coupled with an effective Rabi frequency Ω and detuning ∆. Two Rydberg atoms located at postions r i and r j interact via the van der Waals interaction. molecular resonances. More recently optimal control has been applied to optimize the preparation of Rydberg crystals [89] for specific geometries (quasi one dimensional) and very low Rydberg fraction. Having proved BO's efficiency in the last example, there is a strong motivation to apply BO for preparing Rydberg ordered states and find better protocols with regards to enhanced number of Rydberg excitations and in the presence of experimentally related parameter noise and possible lattice filling imperfections. Fig. 6) is an illustrative figure of the setup under consideration. Specifically, we have eight to nine Rb atoms in their ground state |g = 5S coupled to a particular Rydberg state |e = 50S in the presence of a laser field, thus describing each atom effectively as a two-level system. The atoms are either trapped in a deep optical lattice [90,19] or in an array of optical dipole traps [91] with uniform unit filling and lattice spacing l = 1.5 µm. The advantage of trapping atoms at suitable spacings is that unwanted molecular resonances and scattering can be avoided. Moreover, if needed, simultaneous trapping of ground and Rydberg state atoms is possible in a magic wavelength lattices [92,93,94]. Two Rydberg atoms with positions r i and r j interact via the van der Waals interaction given by V ij = C 6 (n)/|r i − r j | 6 , where C 6 > 0 is the van der Waals coefficient which for our chosen state takes the specific value, C 6 = 1.56 × 10 −26 Hz m 6 [95]. The Hamiltonian describing the full setup in the rotating wave approximation is given aŝ

Setup:
where |e i e i | is the projection operator onto the excited state for the i th atom and |g i e i | is the transition operator. The first two terms in the Hamiltonian represent the atom-laser interaction with the laser detuning ∆ and two photon effective Rabi frequency Ω of the system. For the small system size of less than ten atoms, uniform excitation laser profile is assumed such that all atoms experience the same Rabi coupling. [GHz] [GHz] Figure 7: Panels (a-c) correspond to zero laser intensity, many-body energy spectrum with Rydberg state, n = 50 (for which C 6 = 1.56 × 10 −26 Hz m 6 ) shown for different lattice geometries (with lattice spacing l = 1.5 µm). The many-body ground and excited states are shown explicitly as green and red lines respectively. Panels (d-f) depict the many-body energy spectrum for the same parameters as (a-c) but now with a coupling of Ω = 100 MHz.
There are different energy scales at play here, one can increase interaction strength C 6 by selecting higher principal quantum number n, which provides more bandwidth for Ω and can be tuned with respect to overall lifetime of the system. Any motional dynamics is neglected as their typical timescales are much longer than the fast excitation dynamics that is considered here. Spontaneous decay of the Rydberg state including black-body radiation for the chosen state, 50S is about 65 µs [22].

Many-body energy spectrum:
A gas of N atoms with n e Rydberg excitations can be represented by the many-body eigenstate |n e , k where k labels the different combinatorial possibilities to distribute n e excitations over N atoms. The many-body energy spectrum with Ω = 0 for different geometries is shown in Fig. 7(a-c). In the zero intensity field scenario, the energy of a many-body state is given as where ne are the interaction energies. The many-body ground state (n e = 0) has zero energy and is highlighted in green in Fig. 7(a-c). States with the same number of Rydberg excitations form a manifold with the same slope with respect to detuning. Thus the state that has the highest slope (shown in red) corresponds to all atoms in the lattice excited, n e = N . For non-zero laser intensity, anti-crossings occur between the many-body states as shown in Fig. 7(d-f). Since repulsive interactions are assumed for Rydberg atoms, the overall energy of the system is minimized by maximizing the spacing between Rydberg atoms and hence forming an ordered phase of Rydberg excitations. For a 1D system, sweeping the detuning from large negative values to specific positive values, the state with all atoms in the ground state gets connected to states with one, two or more Rydberg excitations depending on the final value of the detuning. One way to ensure that the final state is a Rydberg crystalline state, is to have the entire pulse dynamics performed slower than the inverse of the relevant energy gap. Unlike in the Bose-Hubbard model, it is not trivial to calculate the energy gap as it depends not only on the Rabi frequency but also on the interactions [76]. As the number excitations increase, the energy gap decreases exponentially preventing the formation of Rydberg crystals with large Rydberg excitation fraction, ν = n e /N 1. Moreover for 2D and 3D lattice geometries, the many-body energy spectrum is far more complicated as there are more degeneracies and requires further careful optimization over the (Ω, ∆) control landscape. For this work, all parameters are chosen such that there is no Rydberg blockade in the system [22], which implies that in principle all atoms are allowed to be excited to their Rydberg state.

Results and discussion:
The dynamics for Eq. (7) is numerically solved using a linear multi-step integration method implemented using the QuTiP package [96]. The total duration of the dynamics is fixed to T = 1 µs which is well below the overall lifetime of the many-body Rydberg system and of any adiabatic scheme. The control field is parametrized using six parameters, [Ω(t 1 ), Ω(t 2 ), Ω(t 3 ), ∆(t 1 ), ∆(t 2 ), ∆(t 3 )] which correspond to Rabi frequencies and detunings at three different times. For intermediate times, the values of (Ω(t), ∆(t)) are fitted using quadratic interpolation. Additionally, boundary conditions are imposed where the intensity of the laser vanishes at t = 0 and t = T . This is done by multiplying Ω(t) with a Tukey window function w(t) [97].
For this system, the relevant figures of merit are target state fidelity, spatial correlation function and number of Rydberg excitations, all of which can be potentially measured in a typical Rydberg experiment with spatially resolving detection methods [98]. The control protocol is optimized for a specific number of Rydberg excitations n e which is represented by the many-body eigenstate |n e , k . By denoting the fidelity as F ne (t) = k | Ψ(t)|n e , k | 2 , the FoM provided at each iteration to BO is given by F ne (T ). The sum is over all the configurations containing n e Rydberg excitations.
The total number of atoms for 1D and 2D lattices is nine and the protocol is optimized for five Rydberg excitations. The 3D lattice consists of eight atoms with the protocol optimized for four Rydberg excitations. The fidelities F ne (t) over time for the optimal control protocols found by BO are depicted in Fig. 8 (a-c)   optimization within 10 iterations. In order to reproduce real experiments and test BO's resilience, parameter noise are included by introducing 5% relative noise in Ω(t) and ∆(t) respectively. The standard deviation for Ω(t) and ∆(t) is shown in shaded blue and red colors respectively across their mean values (which are depicted with solid lines). It is most perceptible for Ω in Fig. 8 (e). These noises are usually negligible but could arise due to experimental drifts or Doppler broadening. Inspecting the protocols optimized in Fig. 8, it seems like BO naturally selects protocols with the general trend where the detuning starts from large negative values and then transitions to positive values. However, the optimized Rabi frequency is nontrivial especially for the 1D and 2D lattice models. Fig. 8 (g-i) show the probability of Rydberg excitation P (T ) for every site at the end of the optimum protocol. It verifies the formation of Rydberg crystalline states as the atomic excitations are maximally separated for any given lattice model. For 1D and 2D lattices, there is only one unique spatial configuration that corresponds to five excitation Rydberg crystal. However in 3D lattice, there are two configurations that equally contribute to the four excitation Rydberg crystal. In Fig. 8 (i), one of the two configurations is rotated onto the other. Next, we include errors originating from inaccurate readout and imperfect lattice fillings which are prevalent in current experiments [90,21]. Both these imperfections ultimately result in detection errors and are included in the simulations by associating a probability for Rydberg detection for each atom to be 0.9. In one case, the optimum protocol is obtained for a perfect lattice and is labeled as OP1 while the other protocol is obtained for imperfect lattice filling which is labeled as OP2. The chosen FoM is the number of Rydberg excitations which was optimized for in Fig. 8. Fig. 9 (a-c) show the FoM calculated by applying both control dynamics, OP1 (shown in solid orange) and OP2 (shown in dashed orange) on an imperfect lattice filling averaged over 50 realizations. The profile of the drivngs that account for lattice imperfections are vastly different from the ones that represent for the idealized case. This in turn, results in better performances as can be seen in Fig. 9 (a-c) where OP2 exhibits higher final fidelities compared to OP1 across all lattice geometries. This shows the importance of characterizing the noise sources and incorporating them in the numerical simulations. Even better, performing optimizations directly onto the experiments rather than relying on theoretical protocols based on idealized scenarios. Also the number of iterations required when noise is incorporated is five times higher which reiterates the need for optimization routines to be efficient for them to be of practical use.
BO created Rydberg crystals in all three different lattice geometries with a Rydberg fraction of ν = n e /N = 0.55 in T = 1 µs. To create Rydberg crystals with such large Rydberg fraction would take much longer in an adiabatic scheme. For example, the adiabatic preparation of Rydberg crystals in one dimension were done for ν = 0.016, T = 1 µs in [76] and ν = 0.125, T = 4 µs in [77]. The experiment [19] was done for ν = 0.13, T = 4 µs while the optimal control using Nelder-Mead [89] did it for quasi-1D for ν = 0.17, T = 4 µs. In summary, BO is able to efficiently obtain high fidelity protocols despite the parameter noise and lattice filling imperfections that were included in the simulations. This is reassuringly promising especially considering the complex many-body energy spectrum of this model.

Discussion and Outlook
Quantum optimal control is about designing fast high fidelity schemes to prepare desirable target states in a quantum system. An optimization routine that delivers when there is limited availability of data due to high costs associated with repetitions and noisy evaluations, will be advantageous, especially for experiments but also for heavy numerical simulations. In this context, BO seems to be a promising optimization tool and to make this case, it was applied to create spatially ordered phases of ultra-cold gases trapped in lattices.
BO's performance in comparison to other optimization routines in the SF-MI transition was found to be most efficient with regards to the number of iterations needed for convergence. It is primarily due to its ability to cleverly select the next set of parameters to probe. In the second example, notwithstanding the complexity in the many-body energy spectrum and the vanishing energy gap for large Rydberg systems, BO obtained high fidelity dynamics that created Rydberg crystalline states in lattice models of arbitrary dimension with much larger Rydberg fraction than previous methods. More interestingly, BO successfully navigated the optimization landscape even with the inclusion of noisy parameters and imperfect lattice fillings which is evidence of its probabilistic feature in the modeling.
No initial guess function was assumed for the control in all the optimizations performed in this work. Such a guess could be incorporated by means of the choice of a different parametrization [65] and may result in optimized control with higher fidelity. Similarly the number of parameters used for the time-varying control field could be increased to allow more freedom in its dynamics and, at least in principle, better control. However in practice optimizing over higher dimensional parameter space may be challenging and, for a given number of iterations, a trade-off between flexibility in the control field and realistic number of iterations has to be found.
As an outlook, we would like to remark that the field of probabilistic machine learning [99] is advancing at fast pace and has a lot to offer for the characterization and optimization of quantum systems. In particular several alternatives for the model used here for BO have been suggested [100,101]. They could be easily integrated into the framework and decrease the computational burden associated to Gaussian processes when the number of iterations becomes really large. Another interesting direction of research for even more efficient optimizations could originate from the field of transfer-learning [102] where one is able to transfer what has been learnt in optimizing a given task to a new but related one. For example, it could allow to leverage results of optimizations performed on an (approximative) numerical simulator to speed up optimizations performed on the real experiments.

Acknowledgments
The authors acknowledge H. Weimer for insightful and productive discussions. Both Imperial and Stuttgart have received funding from the QuantERA ERANET Cofund in Quantum Technologies implemented within the European Unions Horizon 2020 Programme under the project Theory-Blind Quantum Control TheBlinQC and from EP-SRC under the grant EP/R044082/1. This work was supported through a studentship in the Quantum Systems Engineering Skills and Training Hub at Imperial College London funded by the EPSRC(EP/P510257/1). with N large enough such that each sample with finite size N effectively looks like a function.
In general, it is not possible to have a precise idea of the values of these hyperparameters beforehand but they can be fitted to the observations at any stage of the optimization. This fitting is often done by minimizing the log marginal-likelihood [72] and is implemented in any GP library such as [103].
In summary the prior distribution is entirely defined by the choice of a GP, a mean function m, a covariance function k and the hyper-parameters σ 2 0 and l. In particular, under the choice of a mean and a kernel function presented here, a priori each value f (θ m ) taken by the model has a distribution: .., f (θ M ), f (θ * )] and using the conditioning properties of Gaussian distributions [72], the predictive distribution is given by where the column vector k has entries k m = k(θ m , θ * ), and elements of the covariance matrix K are given by K m,n = k(θ m , θ n ). The symbols µ f (θ * ) and σ f (θ * ) denote the mean and standard deviation of this predictive distribution. They both are function of the parameter θ * as each element of k depends on it. Compared to the prior distribution p(f (θ * )) in Eq. A.3, the mean of the predictive distribution has been shifted from 0 to k T K −1 f and its variance has decreased from σ 2 0 by a positive quantity k T K −1 k, resulting from the incorporation of the observations. The most computational demanding part of evaluating this mean and variance comes from the inversion of the matrix K which has complexity O(M 3 ).
In an experimental scenario, the set of measurements y does not directly reveal the values taken by the model, still model and observations can be related by positing a noise model. This noise, originating from both experimental imperfections and also quantum fluctuations, can be approximated as an additive constant Gaussian noise such that y(θ j ) = f (θ j )+ε j , where the noise terms ε j are assumed to be independently drawn from the same distribution p(ε j ) = N (0, σ 2 N ). The extra parameter σ N , capturing the amount of noise, can be fixed or could also be fitted to the data in the same way as the hyper-parameters σ 2 0 and l. Under the independence assumption for the noise terms, the likelihood of recording the full data set of observations y for a given set of values f taken by the model can be written as p(y|f ) = N (f , σ 2 N I). With this simple phenomenological model of the noise the sought-after predictive distribution can be derived: p(f (θ * )|y) = df p(f (θ * )|f )p(f |y) = df p(f (θ * )|f )p(y|f )p(f )/p(y) where the first line shows how it can be decomposed, using Bayes rule and marginalization over the vector of values f , as a product of the prior distribution p(f ) appearing in Eq. A.1, the conditional distribution in Eq. A.4, the likelihood p(y|f ) previously defined and the probability of obtaining a given set of observations p(y). The next line provides the analytical result of this integral [72]. This final result is quite similar to Eq. A.4 except that the inverse of the covariance matrix K −1 has now been replaced by (K + σ 2 N I) −1 thus incorporating noise in the observations. A practical example of evaluating this predictive distribution for a one-dimensional parameter θ ∈ [0, 4] based on a set of M = 10 observations is provided in Fig. A1(b). The mean function µ f and a confidence interval with bounds µ f ± 1.96σ f have been represented with, in addition, the explicit densities p(f (θ)|y) for two specific values of the parameter. Far away from the observations, for example for θ ≈ 0.75, the confidence interval is wider indicating uncertainty in the predictive distribution, while closer to observations, for example for θ ≈ 0.5, it shrinks but does not vanish as the model identified noise in the observations (more precisely the value of the hyper-parameter σ N representing the strength of noise after fitting is non 0).

Appendix A.3. Decision rule: Acquisition function
The last step in the BO framework is the choice of the next parameter θ M +1 to observe based on the surrogate model. This new parameter could be picked such that it maximizes the mean of the predictive distribution given in Eq. (A.5): θ M +1 = arg max θ µ f (θ). However, with only a limited amount of observations, it is likely that the model does not capture the full range of variations in the figure of merit and this approach is likely to end up in a local minima. This can be seen for example in Fig.2(a) where the model misses one of the peak. Alternatively an explorative strategy would consist on taking measurements where the uncertainty in the model is maximal. This can be done by choosing θ M +1 = arg max θ σ f (θ), where the standard deviation σ f (θ) quantifies the uncertainty in the model. Balancing these two objectives, which are finding the location of the maximum and also exploring region of the parameter space where not enough observations have been made, is often referred as the explorationexploitation trade-off. In BO this trade-off is dealt with by introducing an acquisition function α(θ) aiming at capturing these two aspects; and the choice of the next parameter is taken such that: where the positive scalar k balances the bias toward exploration (for high value of k) or exploitation (small values of k). This acquisition function was used to produce the graphs in the bottom panels of Fig.2. In Fig.2(a-b) a value of k = 4 was used, while in 2(c) in the last iteration of the optimization it was taken to be k = 0 in order to show the optimal parameter found according to the model. Another acquisition function often used is the Expected Improvement (EI) function, where the improvement, defined with regards to the best observation of the figure of merit obtained so far y max , is averaged under the predictive distribution: α EI (θ) = max(0, f (θ) − y max )p(f (θ)|y)df (θ), (A. 8) which has analytic solution [31]. The choice of the acquisition function can have in some cases significant impact on the final results and both the UCB and EI functions were considered in this study.
In practice finding the maximum of these acquisition functions is often performed by gradient ascent (possibly repeated over several random starting parameters). For both the UCB and EI, the gradients of the acquisition function with regards to the parameters have analytical expressions and can be efficiently evaluated. These two acquisition functions and the routines for finding their maxima are implemented in most of the BO libraries such as [73].