Characteristics of a magneto-optical trap of molecules

We present the properties of a magneto-optical trap (MOT) of CaF molecules. We study the process of loading the MOT from a decelerated buffer-gas-cooled beam, and how best to slow this molecular beam in order to capture the most molecules. We determine how the number of molecules, the photon scattering rate, the oscillation frequency, damping constant, temperature, cloud size and lifetime depend on the key parameters of the MOT, especially the intensity and detuning of the main cooling laser. We compare our results to analytical and numerical models, to the properties of standard atomic MOTs, and to MOTs of SrF molecules. We load up to $2 \times 10^4$ molecules, and measure a maximum scattering rate of $2.5 \times 10^6$ s$^{-1}$ per molecule, a maximum oscillation frequency of 100 Hz, a maximum damping constant of 500 s$^{-1}$, and a minimum MOT rms radius of 1.5 mm. A minimum temperature of 730 $\mu$K is obtained by ramping down the laser intensity to low values. The lifetime, typically about 100 ms, is consistent with a leak out of the cooling cycle with a branching ratio of about $6 \times 10^{-6}$. The MOT has a capture velocity of about 11 m/s.


Introduction
The magneto-optical trap (MOT) [1] is at the heart of a vast range of scientific and technological applications that use ultracold atoms. In a MOT, pairs of counterpropagating laser beams cross at the zero of a magnetic quadrupole field, subjecting atoms to a velocity-dependent force, which cools them, and a position-dependent force, which traps them. Recently, there has been a great effort to extend the method to molecules, motivated by many new applications [2] that include quantum simulation and information processing, the study of collisions and ultracold chemistry, and tests of fundamental physics. Laser cooling has been applied to a few species of diatomic molecules [3,4,5], and recently even to a triatomic molecule [6]. Two-dimensional magneto-optical compression of a YO beam was demonstrated [4], followed by the first three-dimensional molecule MOT [7], which used SrF.
Laser cooling and magneto-optical trapping are more difficult for molecules than for atoms. Several vibrational branches have to be addressed, each requiring a separate laser. For molecules with electronic and nuclear spin, spin-rotation and hyperfine interactions further increase the number of levels involved. To avoid decay to multiple rotational states it is necessary to drive transitions with $F \ge F'$ [8], where $F$ and $F'$ are the angular momenta of the ground and excited states. Consequently, the number of ground states typically exceeds the number of excited states, and this reduces the scattering rate and increases the saturation intensity [9]. Furthermore, for such transitions, there are dark states present amongst the ground-state sub-levels. This can be especially problematic in a MOT because optical pumping into dark states diminishes the trapping force [10]. In the rf MOT [11,12] this problem is avoided by synchronously modulating the laser polarization and magnetic field direction at frequencies similar to the scattering rate, typically around 1 MHz. In a dc MOT, where there is no such modulation, one can either rely on the motion of the molecules through the spatially varying polarization of the light field to de-stabilize the dark states, or make use of the dual-frequency mechanism [13]. Here, one (or more) of the MOT transitions is driven by two frequency components with opposite circular polarization, one red-detuned and the other blue-detuned. Both types of MOT have been investigated for SrF. For this molecule, the dc MOT exhibited a short lifetime and sub-millikelvin temperatures could not be reached [7,14]. In the rf MOT the lifetime was longer and it was found that the temperature could be reduced to 250 µK by ramping down the intensity of the MOT light [11,12].
Recently, we demonstrated a dc MOT of CaF molecules, and we showed how to cool the molecules to 50 µK by first ramping down the intensity and then transferring to an optical molasses where sub-Doppler cooling processes are effective [15]. An rf MOT of CaF has also now been demonstrated [16].
Since molecular MOTs display some important differences to atomic MOTs, and since they are still new, a thorough characterisation of their properties is called for. In this paper, we study our dc MOT of CaF in detail. The MOT is loaded from a beam of molecules decelerated to low velocity by frequency-chirped counter-propagating laser light. We investigate how the parameters of this deceleration step influence the number of molecules loaded into the MOT, and how to maximize this number. Then, we study how the molecule number, scattering rate, trap oscillation frequency, damping constant, temperature, cloud size and lifetime each depend on the intensity and detuning of the MOT light. We also measure how the number of molecules and the cloud size depend on the applied magnetic field gradient, and we measure the capture velocity of the MOT. We compare our results to simple analytical models and find that most of the dependencies we observe are described adequately by these models. Thus, despite their greater complexity, many properties of molecular MOTs can be understood using similar models as for atomic MOTs, with only minor modifications. We also compare our measurements to the results of numerical models based on multi-level rate equations [10] and find good agreement for most, but not all, of the MOT properties.

Methods
The setup used for this work is the same as described in [15]. Here we give a more detailed description of the setup and our methods. The MOT is loaded from a buffer gas source of CaF molecules that are decelerated to low velocity using frequency-chirped counter-propagating light. The setup uses five lasers, which are detailed in table 1. The transitions driven by these lasers are illustrated in figure 1(a), which also specifies the molecular notation we use. The ground electronic state of CaF is $X\,^2\Sigma^+$, the first electronically excited state is $A\,^2\Pi$ whose decay rate is $2\pi \times 8.3$ MHz, and the second excited state is $B\,^2\Sigma^+$ whose decay rate is $2\pi \times 6.3$ MHz. The main slowing laser drives the $B\,^2\Sigma^+(v = 0, N = 0) \leftarrow X\,^2\Sigma^+(v = 0, N = 1)$ transition and is denoted $L^{s}_{00}$. A repump laser, $L^{s}_{10}$, drives the $A\,^2\Pi_{1/2}(v = 0, J = 1/2, p = +) \leftarrow X\,^2\Sigma^+(v = 1, N = 1)$ transition, to recover the population that leaks into the $v = 1$ state. The MOT uses four lasers which drive the $A\,^2\Pi_{1/2}(v = j) \leftarrow X\,^2\Sigma^+(v = i)$ transitions and are denoted $L_{ij}$. Although the vibrational branching ratios from the B state are more favourable than for the A state, the $B\,^2\Sigma^+(v = 0, N = 0)$ state has a hyperfine splitting of about 20 MHz which might be problematic for a MOT, being neither large nor small compared to the linewidth [13]. The hyperfine splitting in the $A\,^2\Pi_{1/2}(v = 0, J = 1/2, p = +)$ state is unresolved, which is why we preferred to use this state for the MOT. Each of the lower levels, $X\,^2\Sigma^+(v, N = 1)$, is split into two components by the spin-rotation interaction, and these are each split in two again by the hyperfine interaction, giving four components‡ with total angular momentum $F = 1, 0, 1, 2$. Figure 1(b) shows the hyperfine intervals for the $X\,^2\Sigma^+(v = 0, N = 1)$ state. The hyperfine intervals for the other vibrational states are similar. Figure 1(b) also shows the sideband structure applied to each of the MOT lasers to ensure that all hyperfine components of each transition are driven (see later).
Table 1. Lasers used in the experiment, the transitions they drive, and their frequencies and powers. Columns: Laser, Transition, Role. Table notes: accurate to 600 MHz; (c) full power in a single beam; (d) this light is derived from the same laser as $L^{s}_{10}$.
The source of CaF molecules is a cryogenic buffer gas source, described briefly here and in detail in reference [17]. A calcium target inside a 4 K copper cell is ablated at $t = 0$ by a pulse of light from a Nd:YAG laser (5 mJ energy, 4 ns duration, 1064 nm wavelength, 2 Hz repetition rate). The liberated Ca enters a stream of 4 K helium gas flowing through the cell with a flow rate of 0.5 sccm. Sulphur hexafluoride (SF$_6$) enters the cell from a room temperature capillary at a flow rate of 0.01 sccm. The Ca and SF$_6$ react to create CaF molecules which are cooled by collisions with the He and leave the cell through a 3.5 mm aperture. The resulting pulse contains around $1.9 \times 10^{11}$ CaF molecules per steradian in the $X\,^2\Sigma^+(v = 0, N = 1)$ state, with an average forward velocity of 150 m/s and a duration of 280 µs measured 2.5 cm downstream of the exit aperture. The beam then passes through an 8 mm diameter skimmer, 15 cm downstream, which separates the source chamber from the slowing chamber where the molecules are decelerated to low speed. Finally, they enter the MOT chamber via a 20 mm diameter, 200 mm long differential pumping tube, and are captured in the MOT 120 cm from the exit of the source. The pressures in the source, slowing and MOT chambers are $2 \times 10^{-7}$, $6 \times 10^{-8}$ and $2 \times 10^{-9}$ mbar respectively. The MOT and slowing chambers are connected via a bellows for vibration isolation.
‡ For simplicity, we refer to these as the hyperfine components.
Figure 2. Schematic showing how sidebands are applied to the laser light to address the hyperfine structure, and how the beams are combined.
The molecules are decelerated using the frequency-chirped slowing technique described in reference [18]. The slowing light counter-propagates to the molecular beam and consists of $L^{s}_{00}$ and $L^{s}_{10}$ combined into a single beam with a Gaussian intensity profile whose $1/e^2$ radius is 9 mm at the MOT converging to 1.5 mm at the source. It is linearly polarized at 45° to the direction of a uniform 0.5 mT magnetic field applied throughout the slowing region that de-stabilizes dark Zeeman sub-levels. Figure 2 shows the setup used to apply rf sidebands to the laser light and then combine the beams. The spectrum of $L^{s}_{00}$ is shown in figure 1 and is generated by driving an electro-optic modulator (EOM) at 24 MHz with a phase modulation index of approximately 3.1. The initial detuning of $L^{s}_{00}$ is set to −270 MHz so that molecules travelling at 145 m/s are Doppler shifted into resonance. $L^{s}_{00}$ is turned on at $t = t_{\rm on}$ with the frequency held constant until $t = t_{\rm chirp}$. Then, the frequency is linearly chirped at rate $\alpha$ from $t = t_{\rm chirp}$ to $t = t_{\rm off}$, when the light is turned off. This chirp, which is applied directly to the laser, ensures that the light remains resonant with the molecules as they slow down. The repump light, $L^{s}_{10}$, is derived from the same laser as $L_{10}$, which has zero detuning. $L^{s}_{10}$ passes twice through a 110 MHz acousto-optic modulator (AOM) which is used to control its detuning, nominally −220 MHz. The frequency-shifted light then passes through three successive EOMs driven at 72, 24 and 8 MHz, giving a near-continuous spectrum of light with a width of about 360 MHz.§ The frequency shift and broadening ensure that all molecules interact strongly with the light irrespective of velocity or hyperfine state.
$L^{s}_{10}$ is turned on at $t_{\rm on}$ and off at $t_{\rm off}$, and its centre frequency is constant throughout.
§ 90% of the power lies within this bandwidth.
Figure 1(b) shows the frequency components of the main MOT light, $L_{00}$, and the handedness of each component. The transition from $F = 2$ is driven by two frequency components of opposite handedness, one red-detuned and the other blue-detuned. This implements the dual-frequency scheme described in [13], which avoids optical pumping into dark states and produces strong confinement. Because the separation between the $F = 2$ and upper $F = 1$ hyperfine levels is only 24 MHz, the laser component detuned to the red of the $F = 1$ level also acts as the blue-detuned component for the $F = 2$ level. The optical system used to generate the required light is shown in figure 2. We first split $L_{00}$ into two parts in the ratio 3 : 1. The first part passes through a 74.5 MHz EOM, so that the carrier addresses $F = 0$ while the sidebands address the $F = 2$ and lower $F = 1$ levels. The second passes through a 48 MHz AOM to generate the sideband that addresses the upper $F = 1$ level. The detuning and power of $L_{00}$ are both varied throughout the experiments presented in this paper. They are controlled by a 110 MHz AOM set up in a double-pass configuration. Each of the three MOT repumps passes through a 24 MHz EOM to generate the sideband structure shown in figure 1(b). All three are set to zero detuning. The two $L_{00}$ beams along with the three MOT repumps are combined using a fibre cluster into a single fibre that delivers the light to the MOT. The output of the fibre is linearly polarized and the intensity profile has a $1/e^2$ radius of 8.1 mm.
The resonant frequency of each laser is given in table 1. We find these by using each laser, with sidebands, as a transverse probe, and measuring the laser-induced fluorescence as a function of frequency. The largest fluorescence signal occurs when all hyperfine levels are addressed simultaneously.
For all lasers other than $L_{00}$ we refer to this as zero detuning. For $L_{00}$ we find that there is a critical frequency where a MOT is produced in half of all shots, and we denote this as the zero of detuning for $L_{00}$, $\Delta_{00} = 0$. No MOT is formed when $\Delta_{00} > 0$, and a stable MOT is formed for $\Delta_{00} < 0$. This is a more sensitive and reproducible way of fixing the frequency than finding the maximum fluorescence, which occurs at $\Delta_{00} = 2\pi \times 2(4)$ MHz.
Figure 3 shows the MOT chamber and illustrates how the light is folded and retroreflected to produce the six counter-propagating MOT beams. The optical layout follows reference [7]: at each input window a 617 nm quarter-wave plate produces the required circular polarization, and at each exit window another quarter-wave plate returns the polarization to linear, so that the light is linearly polarized at every mirror. The vertical beams have opposite handedness to the horizontal ones. All windows and waveplates have a broadband anti-reflection coating. The beam is slightly converging to compensate for the residual losses, so that each pass has approximately the same intensity. To ensure optimal overlap of the retro-reflected light the beam is recoupled back through the fibre.
The magnetic quadrupole field of the MOT is produced by a pair of anti-Helmholtz coils, with an inner diameter of 30 mm, placed inside the MOT chamber as illustrated in figure 3(a). Each coil consists of two laser-cut copper spirals, with eight turns per spiral, mounted on either side of an AlN plate. The plates are connected to a copper block which is mounted to the chamber and acts as a heat sink. Three Helmholtz coil pairs mounted around the chamber, one for each axis, are used to cancel background fields.
To take pictures of the MOT, we image its fluorescence. The imaging system is composed of two lenses: a 50.8 mm diameter, 60 mm focal length lens inside the chamber which collimates the fluorescence, and a 40 mm diameter, 28.6 mm focal length lens outside the chamber, 100 mm from the first, which images the light onto either a photomultiplier tube (PMT) or a CCD camera. The imaging system has a measured magnification factor of 0.5. From numerical ray tracing, transmission measurements and the specified quantum efficiency of the camera, we calculate a detection efficiency of 1.5(2)%.
We were careful to reduce background scatter from the MOT beams. The MOT coils, the AlN support plates, and an octagonal enclosure inside the chamber are all painted black.¶ The copper heat sink is covered with light absorbing foil,+ which provides a dark background against which the MOT fluorescence is imaged. A bandpass filter placed between the two lenses transmits the dominant 606 nm fluorescence while blocking the background scatter from the 628 nm lasers. With this setup, and the MOT beams at full power, the background is $8.5 \times 10^4$ photons/s/mm$^2$. This may be compared to the fluorescence collected from a single molecule, which is $3.6 \times 10^4$ photons/s. Figure 3(b) shows an image of a MOT.
From the number of photons collected, our measurement of the scattering rate (see section 4.2) and the calculated detection efficiency, we deduce that there are $4.6 \times 10^3$ molecules in this MOT. When all parameters are optimized, this number is about $2 \times 10^4$. We typically sum together 50 images, giving a standard deviation in the number of molecules detected of 3%.
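The conversion from collected photons to molecule number can be sketched as follows. This is a minimal illustration rather than the analysis code used for the experiment: the function name, the example photon count and the example exposure time are hypothetical, and only the detection efficiency and the per-molecule collection rate are taken from the text.

```python
# Values quoted in the text.
DETECTION_EFFICIENCY = 0.015          # calculated detection efficiency, 1.5(2)%
PHOTONS_PER_MOLECULE_PER_S = 3.6e4    # photons/s collected from one molecule at full power

# Per-molecule scattering rate implied by those two numbers.
implied_scattering_rate = PHOTONS_PER_MOLECULE_PER_S / DETECTION_EFFICIENCY


def molecule_number(collected_photons, exposure_s,
                    scattering_rate=implied_scattering_rate,
                    efficiency=DETECTION_EFFICIENCY):
    """Number of molecules = collected photons / (rate * efficiency * exposure)."""
    return collected_photons / (scattering_rate * efficiency * exposure_s)


# Hypothetical example: photon count and exposure chosen only for illustration.
example_photons = 1.656e6
example_exposure_s = 0.01
n_molecules = molecule_number(example_photons, example_exposure_s)
print(f"implied scattering rate: {implied_scattering_rate:.3g} per second")
print(f"molecules in the example image: {n_molecules:.0f}")
```

The implied scattering rate is close to the maximum rate quoted in the abstract, which is a useful consistency check on the two quoted numbers.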

Loading the MOT
¶ Alion MH2200.
+ Acktar Spectral Black.
We will see in section 4.7 that the capture velocity of the MOT is about 11 m/s. To load the most molecules, they should reach this capture velocity just as they arrive in the MOT capture volume. If they reach low velocity prematurely, they diverge too much and are less likely to be captured. We turn on the slowing light at $t_{\rm on} = 2.5$ ms and apply a linear frequency chirp of $\alpha = 24.5$ MHz/ms between $t_{\rm chirp} = 3.4$ ms and $t = t_{\rm off}$, when the slowing light is turned off. The MOT does not load until the slowing light is turned off because this light pushes trappable molecules out of the trapping region. Figure 4(a) shows the fluorescence of molecules in the MOT region as a function of time, recorded by a PMT. For these data, $t_{\rm off} = 15$ ms, shown by the vertical dotted line. The orange curve shows the arrival time distribution when there is no slowing applied. The most probable arrival time is 8 ms, corresponding to a speed of 150 m/s. The blue curve shows the distribution when the slowing is applied but no molecules are captured because only one MOT beam is used and the magnetic field is off. At early times, up until 7.3 ms, the slowed and unslowed curves are similar because the fastest molecules (> 165 m/s) do not interact with the slowing light. Then there is a dip in the slowed signal, followed by a broad bump at later times, corresponding to molecules that have been slowed. The dashed purple curve is from a simulation of the slowing that uses the experimental slowing parameters and free flight velocity distribution as inputs. No scaling is applied, and the result matches the experiment closely, showing that we have a good understanding of the slowing. These simulations have also been validated previously [18]. The red curve shows the distribution when the MOT magnetic field is on and all six beams are present. This curve roughly follows the blue curve until $t = t_{\rm off}$.
Then there is a rise in signal as molecules are loaded into the MOT, followed by a decline as they are gradually lost. Figure 4(b) shows results for MOT loading. The red curve is the same experimental data as shown in (a). The black curve shows the simulated arrival-time distribution of molecules that arrive at the MOT within a 1 cm diameter disk, with forward speeds below 10 m/s, and arrival times greater than $t_{\rm off}$, which we call $N_{\rm trappable}(t)$. Without modelling any of the details of the capture process itself, we can estimate the number of trapped molecules to be $\left[\int_{t_{\rm off}}^{t} N_{\rm trappable}(t')\,dt'\right] e^{-(t - t_{\rm off})/\tau}$, where $\tau = 95$ ms is the measured lifetime (see section 4.6). This result is shown by the purple curve, whose height has been scaled to give a good match with the experiment. We see that this simulated MOT loading curve is in excellent agreement with the measured one. This analysis gives a clear picture of which molecules in the beam are captured by the MOT, which is especially important in designing strategies to increase the number of molecules loaded.
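The loading estimate above is simple to evaluate numerically. The sketch below is illustrative only: the Gaussian arrival-time distribution, its centre and width, and all function names are hypothetical stand-ins for the simulated $N_{\rm trappable}(t)$, which is not reproduced here. Only $t_{\rm off} = 15$ ms and $\tau = 95$ ms are taken from the text.

```python
import math


def loading_curve(times_ms, n_trappable, t_off_ms, tau_ms):
    """Evaluate [integral from t_off to t of N_trappable(t') dt'] * exp(-(t - t_off)/tau).

    The integral is accumulated with the trapezoidal rule on the supplied grid.
    Times before t_off return zero, because the MOT does not load while the
    slowing light is on.
    """
    curve = []
    integral = 0.0
    previous = None  # (time, rate) of the last grid point at or after t_off
    for t in times_ms:
        if t < t_off_ms:
            curve.append(0.0)
            continue
        rate = n_trappable(t)
        if previous is not None:
            integral += 0.5 * (rate + previous[1]) * (t - previous[0])
        previous = (t, rate)
        curve.append(integral * math.exp(-(t - t_off_ms) / tau_ms))
    return curve


def hypothetical_arrivals(t_ms):
    """Hypothetical arrival-rate profile (arbitrary units): Gaussian at 20 ms, 3 ms wide."""
    return math.exp(-0.5 * ((t_ms - 20.0) / 3.0) ** 2)


times = [0.1 * i for i in range(2001)]  # 0 to 200 ms in 0.1 ms steps
curve = loading_curve(times, hypothetical_arrivals, t_off_ms=15.0, tau_ms=95.0)
peak_index = max(range(len(curve)), key=curve.__getitem__)
print(f"loading curve peaks at {times[peak_index]:.1f} ms")
```

As in the text, the overall height is arbitrary and would be scaled to the data; the shape shows a rise while trappable molecules arrive and a slow decay set by the lifetime.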
Next we investigate how the number of molecules captured in the MOT depends on the chirp amplitude of the slowing laser, which is a key parameter of the slowing process. The red curve in figure 5(a) shows the measured effect of $t_{\rm off}$, which we also express in terms of the total frequency change $\delta f_{\rm slowing} = \alpha(t_{\rm off} - t_{\rm chirp})$. Increasing $t_{\rm off}$ reduces the final velocity of the slowest molecules, so the number of trapped molecules initially increases with $t_{\rm off}$. However, if $t_{\rm off}$ is too large many of the molecules reach zero velocity before they arrive at the MOT and so the trapped number falls again. The number in the MOT is largest when $t_{\rm off} = 14.2$ ms, corresponding to $\delta f_{\rm slowing} = 265$ MHz. The coloured band indicates the fluctuations in the 50-image averages (the shot-to-shot fluctuations are 7 times larger). These fluctuations are large for $t_{\rm off} = 14.2$ ms, so we prefer to use $t_{\rm off} = 15$ ms [$\delta f_{\rm slowing} = 285$ MHz] where the fluctuations are far smaller while the average number is only a little less. This is the value of $t_{\rm off}$ we use for all subsequent data. The blue points in figure 5(a) show the expected number of molecules in the MOT for various $t_{\rm off}$ predicted by the slowing simulations discussed above in the context of figure 4. As before, we sum up all molecules that arrive at the MOT within a 1 cm diameter disk, with forward speeds below 10 m/s, and arrival times greater than $t_{\rm off}$. These simulations show the same overall trends as in the experiment, but predict a large peak when $t_{\rm off} = 13.25$ ms which we do not see in the experiment. To understand this peak, it is helpful to study the velocity distributions found from the simulations. Figure 5(b) shows these velocity distributions for three values of $t_{\rm off}$. When $t_{\rm off} = 12$ ms, the velocity distribution has a narrow peak at 24 m/s, formed by molecules that have faithfully followed the chirp, and then a broader, faster distribution formed by those that have fallen behind the chirp.
For this value of $t_{\rm off}$, there are hardly any molecules slow enough to be captured. Increasing $t_{\rm off}$ pushes the narrow peak in the velocity distribution to lower velocities. The peak also gets smaller because of the increased divergence of the slowest molecules. When $t_{\rm off} = 13.25$ ms, the narrow peak is pushed below 10 m/s and can be captured, producing the predicted peak in figure 5(a) for this $t_{\rm off}$. For $t_{\rm off} = 15$ ms, molecules in the slow peak are brought to rest or even turned around before reaching the MOT. However, some of the molecules which have fallen behind the chirp are now slow enough to be captured, so there is no sharp cut-off as $t_{\rm off}$ increases. At present, we do not know why we fail to observe the strong response predicted at $t_{\rm off} = 13.25$ ms. The slowing simulations appear to be reliable [18], and they predict the observed arrival time distributions accurately [see figure 4]. We note that the larger fluctuations near the expected optimum $t_{\rm off}$ show that a larger number is sometimes obtained, and suggest strong sensitivity to some parameter that is inadequately controlled.
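The chirp amplitudes quoted in this section follow directly from the chirp rate and timings, and the corresponding Doppler velocities give a rough feel for where the laser is resonant at the end of the chirp. The sketch below is indicative only. It assumes a wavelength of about 531 nm for the slowing transition, which is not stated in the text, and it ignores the rf sidebands and hyperfine structure, so it does not reproduce the simulated velocity peaks. The function names are hypothetical.

```python
ALPHA_MHZ_PER_MS = 24.5        # chirp rate from the text
T_CHIRP_MS = 3.4               # start of the chirp from the text
INITIAL_DETUNING_MHZ = -270.0  # initial detuning of the main slowing laser
WAVELENGTH_M = 531e-9          # assumption: slowing transition wavelength, not given in the text


def chirp_amplitude_mhz(t_off_ms):
    """Total frequency change: alpha * (t_off - t_chirp)."""
    return ALPHA_MHZ_PER_MS * (t_off_ms - T_CHIRP_MS)


def resonant_velocity(detuning_mhz):
    """Speed (m/s) at which a molecule moving towards the laser is Doppler-shifted
    into resonance with light at the given detuning: v = -detuning * wavelength."""
    return -detuning_mhz * 1e6 * WAVELENGTH_M


for t_off in (12.0, 13.25, 14.2, 15.0):
    df = chirp_amplitude_mhz(t_off)
    v_end = resonant_velocity(INITIAL_DETUNING_MHZ + df)
    print(f"t_off = {t_off:5.2f} ms: chirp = {df:6.1f} MHz, "
          f"carrier resonant at {v_end:6.1f} m/s when the light turns off")
```

A negative final velocity means the carrier ends up resonant with molecules moving back towards the source, in line with the statement that some molecules are brought to rest or turned around when $t_{\rm off} = 15$ ms.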

Properties of the MOT
In this section we show how the properties of the MOT vary with the key parameters of the setup, mainly the total intensity at the MOT ($I_{00}$) of $L_{00}$, the detuning ($\Delta_{00}$) of $L_{00}$, and the axial magnetic field gradient ($dB/dz$). When not being varied these are set to be $I^{0}_{00} = 400$ mW/cm$^2$, $\Delta^{0}_{00} = -0.75\,\Gamma$ and $dB/dz = 30.6$ G/cm. These standard parameters are also used for fluorescence imaging, unless otherwise stated.

Number of molecules
We first investigate how to maximize the number of molecules in the MOT. The filled points in figure 6(a) show how this number depends on the power of $L^{s}_{00}$. For powers below 20 mW we observe no molecules in the MOT. Above this threshold the number increases roughly linearly until it saturates at a power near 120 mW. The open points are the results of simulations identical to those discussed above but done for various $L^{s}_{00}$ powers. The simulations predict the same trends as we observe, with two notable exceptions. First, they predict that the trapped number will fall above a certain power, because at high power the molecules reach low speed too early. We do not see this effect in the experiment. Second, the simulations exhibit high sensitivity to the exact power (the sharp structures are not noise). The experimental data do not have the resolution to see this. We have also investigated how the number of molecules depends on the power of $L^{s}_{10}$. The nominal power is 130 mW, and we see no difference if we halve or double this value.
Figure 6(b) shows the relative number of molecules as a function of $I_{00}$. The number increases with intensity until 200 mW/cm$^2$, above which it remains constant within 10%. We note that we still load 10% of this maximum number when $I_{00}$ is only 2% of $I^{0}_{00}$. Figure 6(c) shows the relative number of molecules versus $\Delta_{00}$. This shows a parabolic dependence with a maximum at $\Delta_{00} = -0.75\,\Gamma = -2\pi \times 6.2$ MHz. No MOT is formed when the light is blue-detuned or when it is red-detuned by more than $1.8\,\Gamma$. Figure 6(d) shows the relative number of molecules versus $dB/dz$. We observe a MOT once $dB/dz > 10$ G/cm, and obtain the most molecules when $dB/dz = 30$ G/cm. As the field gradient is increased beyond this, there is a slow decline in the number of trapped molecules, suggesting that the trap capture volume starts to decrease at these higher gradients.

Scattering Rate
A simple rate model [9] can be used to predict the photon scattering rate of the molecules, and hence many of the properties of the MOT, as was done previously [11]. In this model, $n_g$ ground states are coupled to $n_e$ excited states, and the steady-state scattering rate is found to be
$$R_{\rm sc} = \Gamma\,\frac{n_e}{(n_g + n_e) + 2\sum_{j=1}^{n_g}\left(1 + 4\Delta_j^2/\Gamma^2\right) I_{s,j}/I_j}. \qquad (1)$$
Here, $I_j$ is the intensity of the light driving transition $j$, $\Delta_j$ is its detuning, and $I_{s,j} = \pi h c\,\Gamma/(3\lambda_j^3)$ is the saturation intensity for a two-level system with transition wavelength $\lambda_j$. For our MOT there are $n_g = 24$ Zeeman sub-levels of the $v = 0$ and $v = 1$ ground states, all coupled to the same $n_e = 4$ levels of the excited state. It is safe to neglect the $v = 2$ and $v = 3$ ground states since their populations are always small. In the experiment, the intensity of $L_{10}$ is always higher than that of $L_{00}$, and $L_{10}$ is on resonance whereas $L_{00}$ is detuned. It follows that the transitions driven by $L_{00}$ dominate in the sum and we can neglect those driven by $L_{10}$. The transitions driven by $L_{00}$ have common values for $\Delta$ and $I_s$, and the total intensity, $I_{00}$, is divided roughly equally between them so that we can write $I_j = I_{00}/12$. With these approximations, the scattering rate becomes
$$R_{\rm sc} = \frac{\Gamma_{\rm eff}}{2}\,\frac{s_{\rm eff}}{1 + s_{\rm eff} + 4\Delta^2/\Gamma^2}, \qquad (2)$$
where
$$\Gamma_{\rm eff} = \frac{2 n_e}{n_g + n_e}\,\Gamma = \frac{2}{7}\,\Gamma, \qquad (3)$$
and
$$s_{\rm eff} = \frac{I_{00}}{I_{s,{\rm eff}}} = \frac{2(n_g + n_e)}{n_g^2}\,\frac{I_{00}}{I_s}. \qquad (4)$$
Equation (3) gives $\Gamma_{\rm eff} = 14.9 \times 10^6$ s$^{-1}$, and equation (4) gives $I_{s,{\rm eff}} = 50$ mW/cm$^2$.
We measure the scattering rate in the MOT by turning off $L_{21}$ and detecting the decay of fluorescence as molecules are optically pumped into $v = 2$. The scattering rate is simply $R_{\rm sc} = 1/(b_2 \tau_2)$, where $b_2$ is the branching ratio for the excited state to decay to $v = 2$, and $\tau_2$ is the measured $1/e$ decay constant of the fluorescence. We determine $b_2$ by comparing the MOT fluorescence intensity on the $A(0) \rightarrow X(0)$ ($\lambda_{00} = 606$ nm) and $A(0) \rightarrow X(2)$ ($\lambda_{20} = 652$ nm) transitions, where we are using the same notation as in table 1. We isolate these two contributions to the fluorescence using bandpass filters placed between the two lenses of the imaging system, where the light from the MOT is collimated. We switch back and forth frequently between the two filters. Each time the filter is switched we take an image with no molecules present, and then subtract this from the MOT image so that only the fluorescence from the molecules remains. The pass band of each filter has a full width at half maximum of 20 nm, and the transmission exceeds 93%. Importantly, the filter that transmits at $\lambda_{20}$ has a transmission of less than $10^{-6}$ at $\lambda_{00}$, which is small enough to neglect. We can express $b_2$ in terms of known quantities as follows:
$$b_2 = b_0\,\frac{I_2}{I_0}\,\frac{T^{\lambda_{00}}_{\rm lens1}\,T^{\lambda_{00}}_{\rm lens2}\,T^{\lambda_{00}}_{\rm window}\,T^{\lambda_{00}}_{\rm filter0}\,\epsilon^{\lambda_{00}}_{\rm camera}}{T^{\lambda_{20}}_{\rm lens1}\,T^{\lambda_{20}}_{\rm lens2}\,T^{\lambda_{20}}_{\rm window}\,T^{\lambda_{20}}_{\rm filter2}\,\epsilon^{\lambda_{20}}_{\rm camera}}. \qquad (5)$$
Here, $I_2/I_0$ is the measured fluorescence ratio, $b_0 = 0.987^{+0.013}_{-0.019}$ is the branching ratio to $v = 0$ [19], $T^{\lambda}_{\rm lens1(2)}$ is the transmission of lens 1(2), $T^{\lambda}_{\rm window}$ is the transmission of the vacuum viewport, $T^{\lambda}_{\rm filter0(2)}$ is the transmission of the filter that isolates the fluorescence to $v = 0(2)$, and $\epsilon^{\lambda}_{\rm camera}$ is the quantum efficiency of the camera, all at wavelength $\lambda$. All transmissions and quantum efficiencies are taken from data supplied by the manufacturers. The result is $b_2 = 8.4(5) \times 10^{-4}$. This is 30% smaller than the theoretical value of $1.2 \times 10^{-3}$ given in reference [20].
Figure 7 shows an example of a scattering rate measurement where the time constant is $\tau_2 = 572(7)$ µs, giving a scattering rate of $2.08(13) \times 10^6$ s$^{-1}$. Figures 8(a) and (c) show the measured and simulated scattering rate versus $I_{00}$, along with fits to equation (2) and the associated fit parameters. Both the experimental and simulated results fit well to this model. The values of $I_{s,{\rm eff}}$ are in agreement with each other and are close to that predicted by the simple model outlined above. The simulation gives $\Gamma_{\rm eff}$ close to the predicted value of $0.29\,\Gamma$, but the experimental value is a factor of 2 smaller than simulated. This difference could be caused by optical pumping into coherent states that, for short periods, are dark to the MOT light, an effect which cannot be captured by the rate model. There are no stable dark states for molecules that move quickly enough through the non-uniform magnetic field and light polarization, but there may be states that are decoupled from the light for long enough to limit the scattering rate. Figures 8(b) and (d) show the measured and simulated scattering rate versus $\Delta_{00}$, along with fits to equation (2). The fit to the measurements is good, and the fit parameters are consistent with those found in (a), showing that equation (2) is sufficient to represent the dependence of the scattering rate on both intensity and detuning. The simulated data fit well to the model but with parameters somewhat different to those found in (c).
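The numbers quoted in this section can be checked with a few lines of arithmetic. The sketch below is not the authors' analysis code. It assumes that equation (2) has the usual saturated form $R_{\rm sc} = (\Gamma_{\rm eff}/2)\, s_{\rm eff}/(1 + s_{\rm eff} + 4\Delta^2/\Gamma^2)$, and it propagates the uncertainties on $b_2$ and $\tau_2$ in quadrature, which is also an assumption. Function and variable names are hypothetical.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 299792458.0      # speed of light, m/s

GAMMA = 2 * math.pi * 8.3e6   # decay rate of the A state, 1/s
WAVELENGTH = 606e-9           # main cooling transition wavelength, m
N_G, N_E = 24, 4              # ground and excited sub-levels in the rate model

# Equation (3): effective maximum scattering rate parameter.
gamma_eff = 2 * N_E / (N_G + N_E) * GAMMA

# Two-level saturation intensity, converted from W/m^2 to mW/cm^2 (factor 0.1).
i_sat = math.pi * H * C * GAMMA / (3 * WAVELENGTH**3) * 0.1

# Equation (4): effective saturation intensity in mW/cm^2.
i_sat_eff = N_G**2 / (2 * (N_G + N_E)) * i_sat


def scattering_rate(intensity_mw_cm2, detuning_over_gamma):
    """Assumed form of equation (2); detuning is given in units of Gamma."""
    s_eff = intensity_mw_cm2 / i_sat_eff
    return 0.5 * gamma_eff * s_eff / (1 + s_eff + 4 * detuning_over_gamma**2)


# Measured scattering rate from the decay into v = 2: R = 1 / (b2 * tau2).
b2, b2_err = 8.4e-4, 0.5e-4
tau2, tau2_err = 572e-6, 7e-6
rate_measured = 1 / (b2 * tau2)
rate_err = rate_measured * math.hypot(b2_err / b2, tau2_err / tau2)

print(f"Gamma_eff = {gamma_eff:.3g} per second")
print(f"I_s,eff   = {i_sat_eff:.3g} mW/cm^2")
print(f"model rate at 400 mW/cm^2, -0.75 Gamma: {scattering_rate(400, -0.75):.3g} per second")
print(f"measured rate: {rate_measured:.3g} +/- {rate_err:.2g} per second")
```

Evaluated at the standard MOT parameters, the simple model gives a rate roughly twice the measured maximum, in line with the factor of 2 discussed above.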

Oscillation frequency and damping constant
We consider the motion of the molecules in the direction of the slowing laser, since it is straightforward to give the molecules a push in this direction, and because the camera views this axis. Consider a molecule with displacement $x$ and velocity $v$ along this axis, interacting with the six MOT beams. At low intensity, $s_{\rm eff} \ll 1$, the total scattering rate is the sum of the scattering rates from each beam individually, so it is easy to identify the force exerted by a single beam. This is not the case at higher intensities. Nevertheless, the force due to one of the MOT beams in the horizontal plane may be written as
$$F_\pm = \pm\frac{\hbar k}{\sqrt{2}}\,\frac{\Gamma_{\rm eff}}{2}\,\frac{s_{\rm eff}/6}{1 + s_{\rm eff} + 4\left(\Delta \mp kv/\sqrt{2} \mp g_{\rm eff}\mu_B A x/(\sqrt{2}\hbar)\right)^2/\Gamma^2}, \qquad (6)$$
where $k = 2\pi/\lambda$ is the wavevector, $\Delta$ is the laser detuning, $g_{\rm eff}$ is an effective magnetic g-factor for the transition, $A$ is the magnetic field gradient in the horizontal plane, $s_{\rm eff}$ is the saturation parameter for all six beams, and the factors of $\sqrt{2}$ account for the 45° angle between the MOT beams and the x-axis. Here, in the numerator we have used the intensity of the single beam applying the force, while in the denominator we have used the intensity of all six beams to account for the saturation of the scattering rate by the full intensity. This approximation is known to work well for the modest saturation parameters used in this work [21]. The total force in the x-direction is
$$m\ddot{x} = 2\left(F_+ + F_-\right), \qquad (7)$$
where $m$ is the mass and the factor of 2 accounts for there being two pairs of horizontal MOT beams. Using a Taylor expansion about $x = \dot{x} = 0$, we obtain
$$\ddot{x} = -\omega^2 x - \beta\dot{x}, \qquad (8)$$
where $\omega$, the trap oscillation angular frequency, is given by
$$\omega^2 = -\frac{4 k\, g_{\rm eff}\mu_B A\,\Gamma_{\rm eff}\, s_{\rm eff}\,\Delta}{3 m \Gamma^2\left(1 + s_{\rm eff} + 4\Delta^2/\Gamma^2\right)^2}, \qquad (9)$$
and $\beta$, the damping constant, is
$$\beta = -\frac{4 \hbar k^2\,\Gamma_{\rm eff}\, s_{\rm eff}\,\Delta}{3 m \Gamma^2\left(1 + s_{\rm eff} + 4\Delta^2/\Gamma^2\right)^2}. \qquad (10)$$
To measure the radial trap oscillation frequency, $f = \omega/(2\pi)$, and the damping constant, $\beta$, we pulse on $L^{s}_{00}$ for 0.5 ms to push the cloud in the radial direction. We then image the cloud after various delay times using an exposure time of 0.5 ms, which is short compared to the oscillation period. Figure 9(a) is a sequence of such images showing the damped oscillation of the cloud.
We integrate each image over the axial coordinate to give radial distributions, and then fit a Gaussian model to each distribution to obtain the central position of the cloud at each time. Figure 9(b) shows the mean radial displacement versus time. A suitable solution of equation (8) for the displacement from the equilibrium position $x_0$ is
$$x(t) - x_0 = a\,e^{-\beta t/2}\cos\left(\sqrt{\omega^2 - \beta^2/4}\;t + \phi\right), \qquad (11)$$
where $a$ is the amplitude of oscillation and $\phi$ is a phase. All the oscillation data presented here fit well to this model. Figure 9(b) shows an example of this fit for the case where $\Delta_{00} = -0.75\,\Gamma$ and $I_{00} = 200$ mW/cm$^2$. For these parameters, we find $f = 93(5)$ Hz and $\beta = 363(40)$ s$^{-1}$. Here, the uncertainties are the standard deviations of repeated measurements.
Figures 10(a,c) show the measured and simulated oscillation frequency as a function of $I_{00}$. The measured frequency increases with $I_{00}$ until it reaches a maximum at 400 mW/cm$^2$. The measured and simulated frequencies are in excellent agreement, differing by less than 20% across the whole range of intensities. We fit these data to the frequency given by equation (9). In this equation, all the parameters are known apart from $g_{\rm eff}$, $\Gamma_{\rm eff}$ and $I_{s,{\rm eff}}$. The first two appear only as a product, so the free parameters of the fit are $g_{\rm eff}\Gamma_{\rm eff}$, which is a measure of the strength of confinement, and $I_{s,{\rm eff}}$, which sets the intensity required to reach the strongest confinement. The measurements and simulations fit well to this simple model, with the best fit parameters given in the figure. They show that the saturation intensity appropriate to the oscillation frequency is considerably higher than that found for the scattering rate: it takes more power to trap the molecules than might be expected from the scattering rate data. It is peculiar that the experimental and simulated oscillation frequencies are in such good agreement even though the scattering rates differ by a factor of 2.
Figures 10(b,d) show the measured and simulated oscillation frequency as a function of $\Delta_{00}$, along with fits to equation (9). Both experiments and simulations suggest that the maximum $f$ occurs very close to resonance, requiring a large $I_{\rm s,eff}$ in equation (9) for a good fit. As with the scattering rate data, the best fit parameters obtained from the intensity dependence and detuning dependence are consistent with one another. It is interesting that, over the range of $\Delta$ explored here, the oscillation frequency shows a weak and simple dependence on $\Delta$ despite the small interval ($3\Gamma$) between the upper two hyperfine states.
Figures 11(a,c) show the measured and simulated damping constant as a function of $I_{00}$. We see that the measured values of $\beta$ are far smaller than the simulated ones across the whole intensity range. At low intensities, $I_{00} < 25$ mW/cm$^2$, the measured $\beta$ is 2-3 times smaller than simulated, while at high intensities the discrepancy is a factor of 5-10. In the experiment, $\beta$ has a maximum near 100 mW/cm$^2$, while in the simulations the maximum is beyond 750 mW/cm$^2$. Damping constants much smaller than the simulated ones are also found in both the dc and rf MOTs of SrF [14,11]. We tentatively attribute this to the effect of the polarization gradient force. This force has the opposite sign to the Doppler cooling force, and dominates at low velocities, especially when the intensity is high [22]. This could result in a reduced damping constant, with a higher reduction factor at higher intensities. We are currently investigating the role of these polarization gradient forces in the molecular MOT. Despite the discrepancy between experiment and simulation, both datasets follow equation (10), as can be seen from the fits in figures 11(a,c). The much weaker damping in the experiment is reflected by a much smaller $\Gamma_{\rm eff}$ in the fit, and the shift of the maximum to lower intensity in the experiment is reflected by a smaller $I_{\rm s,eff}$.
Figures 11(b,d) show the measured and simulated damping constant as a function of $\Delta_{00}$. We see that $\beta$ gradually decreases as $|\Delta_{00}|$ approaches zero, which is the opposite behaviour to $f$. Thus, the choice of detuning is a trade-off between maximizing $f$ and $\beta$. Unlike the dependence on $I_{00}$, the experimental dependence on $\Delta_{00}$ does not seem to follow equation (10). The simulation results, however, do roughly follow this equation.

Temperature
The expected temperature can be expressed in terms of the damping constant $\beta$ and a velocity-independent momentum diffusion coefficient $D$, following a standard treatment [23] extended to three dimensions. For a force that is linear in the momentum, $F = -\beta p$, the cooling power is $P_{\rm cool} = \beta p^2/m = 2\beta E$, where $E$ is the kinetic energy. The heating power is $P_{\rm heat} = \frac{d}{dt}\frac{\langle p^2\rangle}{2m} = D/m$. Here, we have used the definition $2D = \frac{d}{dt}\left(\langle p^2\rangle - \langle p\rangle^2\right)$, and the fact that $\langle p\rangle = 0$. Equating the heating and cooling powers we find an equilibrium energy $E = \frac{3}{2}k_B T = \frac{D}{2m\beta}$. If the light field has no intensity gradients, so that there is no heating due to fluctuations of the dipole force, the momentum diffusion is due only to the two randomly-directed recoils per absorption-spontaneous emission cycle:
$$D = \hbar^2 k^2 R_{\rm sc}. \qquad (12)$$
This gives us the temperature
$$T = \frac{\hbar^2 k^2 R_{\rm sc}}{3 m k_B \beta}. \qquad (13)$$
Using equations (2) and (10) for $R_{\rm sc}$ and $\beta$ we obtain an expression for the Doppler temperature:
$$T_D = -\frac{\hbar\Gamma^2}{8 k_B \Delta}\left(1 + s_{\rm eff} + \frac{4\Delta^2}{\Gamma^2}\right). \qquad (14)$$
This is identical to the Doppler temperature for a two-level atom. We note that the expression for $D$ is modified slightly at intermediate intensities, and that fluctuations of the dipole force can alter $D$ considerably when there are intensity gradients [23]. Nevertheless, equation (14) has been verified for a three-dimensional MOT in conditions where sub-Doppler processes are ineffective [24].
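The Doppler temperature can be evaluated with a few lines of code. The sketch below assumes the two-level form $T_D = -\hbar\Gamma^2(1 + s_{\rm eff} + 4\Delta^2/\Gamma^2)/(8k_B\Delta)$ and a CaF linewidth of $2\pi\times 8.3$ MHz, which is an assumed value not given in this section. The saturation parameter in the second call is hypothetical.

```python
import math

HBAR = 1.054571817e-34          # J s
K_B = 1.380649e-23              # J/K
GAMMA = 2 * math.pi * 8.3e6     # rad/s, assumed CaF linewidth


def doppler_temperature(s_eff, delta_over_gamma):
    """Two-level Doppler temperature in kelvin for red detuning."""
    if delta_over_gamma >= 0:
        raise ValueError("Doppler cooling requires red detuning")
    broadening = 1 + s_eff + 4 * delta_over_gamma**2
    return -HBAR * GAMMA * broadening / (8 * K_B * delta_over_gamma)


# Low-intensity limit at a detuning of -Gamma/2: the usual hbar*Gamma/(2 k_B).
t_min = doppler_temperature(0.0, -0.5)
print(f"Doppler limit: {t_min * 1e6:.0f} uK")

# Hypothetical weak saturation, s_eff = 0.08, at the same detuning.
t_low = doppler_temperature(0.08, -0.5)
print(f"T_D at s_eff = 0.08: {t_low * 1e6:.0f} uK")
```

The first call reproduces the familiar limit $\hbar\Gamma/(2k_B)$, which is a quick check that the prefactor is right.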
To measure the temperature of the MOT we record fluorescence images after various free expansion times. We first load the MOT with the maximum number of molecules by using the intensity $I^{0}_{00}$ and detuning $\Delta^{0}_{00}$. At $t = 50$ ms we either ramp the intensity to a new value over a period of 20 ms or we jump the detuning to a new value. At $t = 75$ ms we turn off the magnetic field and $L_{00}$ so that the cloud is free to expand. After a free expansion time $\Delta t$, $L_{00}$ is turned back on with the standard parameters $I^{0}_{00}$ and $\Delta^{0}_{00}$, and the fluorescence is imaged for 1 ms. Figure 12(a) shows the expansion of the cloud in a sequence of images for several $\Delta t$. Each image is the sum of 20 repeats of the measurement. For each image, we sum over the axial (radial) coordinate to obtain the radial (axial) distribution. To each distribution we fit the Gaussian model $n(x) = A e^{-(x-x_0)^2/(2\sigma^2)}$. Figures 12(b,c) show $\sigma^2$ versus $(\Delta t)^2$ for the axial and radial directions.
For free expansion, the rms width $\sigma$ follows $\sigma^2 = \sigma_0^2 + k_B T (\Delta t)^2/m$, where $\sigma_0$ is the initial rms width and $T$ is the temperature. Molecules in the wings of the cloud scatter at a slightly lower rate than those at the centre due to the change of laser intensity across the cloud. This slightly reduces the apparent size of the cloud, and since the effect is stronger for larger clouds the relation between $\sigma^2$ and $(\Delta t)^2$ becomes slightly non-linear. We account for this by fitting the data to $\sigma^2 = \sigma_0^2 + k_B T (\Delta t)^2/m + a(\Delta t)^4$. A model of fluorescence imaging that takes this effect into account verifies that this approach gives reliable temperatures [15]. For our data, inclusion of the $(\Delta t)^4$ term in the fit typically gives a temperature about 10% higher than otherwise. Other potential systematic errors in these temperature measurements were considered in [15] and found to be negligible.
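The expansion fit can be sketched as an ordinary least-squares fit of $\sigma^2$ to a quadratic in $u = (\Delta t)^2$. The example below is a minimal illustration on synthetic, noise-free data, not the analysis code used for the measurements. The CaF mass, the sample times and the size of the $(\Delta t)^4$ coefficient are assumptions. Working in mm and ms keeps the normal equations well conditioned, and 1 mm$^2$/ms$^2$ equals 1 m$^2$/s$^2$.

```python
K_B = 1.380649e-23            # J/K
AMU = 1.66053906660e-27       # kg
MASS = 59.08 * AMU            # kg, assumed CaF mass


def solve_linear(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


def fit_expansion(times_ms, sigma_sq_mm2):
    """Fit sigma^2 = c0 + c1*u + c2*u^2 with u = (dt)^2.

    Returns (sigma_0^2 in mm^2, temperature in K, quartic coefficient in mm^2/ms^4).
    """
    u = [t**2 for t in times_ms]
    powers = [[ui**p for ui in u] for p in range(5)]       # u^0 ... u^4
    normal = [[sum(powers[i + j]) for j in range(3)] for i in range(3)]
    rhs = [sum(p * y for p, y in zip(powers[i], sigma_sq_mm2)) for i in range(3)]
    c0, c1, c2 = solve_linear(normal, rhs)
    return c0, c1 * MASS / K_B, c2


# Synthetic data: T = 2 mK, sigma_0 = 2 mm, small negative quartic term.
true_temperature = 2.0e-3
slope = K_B * true_temperature / MASS                      # m^2/s^2
times = [0.5 * i for i in range(9)]                        # ms
data = [4.0 + slope * t**2 - 0.005 * t**4 for t in times]  # mm^2

sigma0_sq, fitted_temperature, quartic = fit_expansion(times, data)
print(f"T = {fitted_temperature * 1e3:.3f} mK")
```

Dropping the quadratic term in $u$ from the model would bias the fitted slope, which is the effect the $(\Delta t)^4$ term in the text is meant to absorb.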
Figures 12(b,c) show examples of the fit where the axial and radial temperatures are found to be $T_z = 2.1$ mK and $T_\rho = 1.9$ mK. The temperatures for the two directions are always close, so we take the weighted geometric mean $T = T_\rho^{2/3} T_z^{1/3}$. Figure 13(a) shows the temperature versus $I_{00}$ together with the Doppler temperature given by equation (14). At full intensity ($I^{0}_{00}$), the temperature is 13 mK, which is 17 times higher than expected from equation (14). The temperature decreases as the intensity decreases, reaching a minimum at 9 mW/cm$^2$ where it is 960 $\mu$K, 4 times the Doppler temperature. When the intensity is reduced below 9 mW/cm$^2$, the temperature increases again.
According to equation (13) the temperature is related in a simple way to the scattering rate and the damping constant. Since we have measured $T$, $R_{\rm sc}$ and $\beta$ across a wide range of intensities, we can test whether this relation is accurate for our MOT. Using linear interpolations over the measured values of $R_{\rm sc}$ [figure 8(a)] and $\beta$ [figure 11(a)] at various intensities, and equation (13), we obtain the expected temperature shown by the dashed line in figure 13(a). This shows the same intensity dependence as we measure. The temperature found from equation (13) is higher than the Doppler temperature because the damping constant is smaller than predicted. The measured temperature is higher again, by a factor of 2 for intensities below 10 mW/cm$^2$ and by a factor of 3 at higher intensities. This shows that the diffusion constant is higher than that given by equation (12). This might be due to dipole force fluctuations, which are not included in equation (12). We note that excess heating at high intensity is also seen in atomic MOTs [25,26] and various explanations have been given, such as the effect of coherences between excited state sub-levels [27] and transverse intensity fluctuations of the MOT beams [28].
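The two temperature relations used here are simple enough to write down directly. The sketch below assumes the weighting $T = T_\rho^{2/3}T_z^{1/3}$ and the relation $T = \hbar^2k^2R_{\rm sc}/(3mk_B\beta)$. The CaF mass and wavelength, and the scattering rate passed to `expected_temperature`, are assumptions chosen only to show the scaling.

```python
import math

HBAR = 1.054571817e-34         # J s
K_B = 1.380649e-23             # J/K
AMU = 1.66053906660e-27        # kg
MASS = 59.08 * AMU             # kg, assumed CaF mass
K_WAVE = 2 * math.pi / 606e-9  # 1/m, assumed wavelength


def mean_temperature(t_rho, t_z):
    """Weighted geometric mean: two radial degrees of freedom, one axial."""
    return t_rho ** (2 / 3) * t_z ** (1 / 3)


def expected_temperature(r_sc, beta):
    """Temperature implied by a scattering rate (1/s) and damping constant (1/s)."""
    return HBAR**2 * K_WAVE**2 * r_sc / (3 * MASS * K_B * beta)


t_mean = mean_temperature(1.9, 2.1)   # mK, from the quoted T_rho and T_z
print(f"mean temperature = {t_mean:.2f} mK")

# Hypothetical scattering rate with the measured beta = 363 1/s.
t_expected = expected_temperature(2.0e6, 363.0)
print(f"expected temperature = {t_expected * 1e3:.2f} mK")
```

Because $T \propto R_{\rm sc}/\beta$, a damping constant several times smaller than predicted raises the expected temperature by the same factor.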
Figure 13(b) shows how the temperature depends on $\Delta_{00}$ at both high intensity (400 mW/cm$^2$) and low intensity (4 mW/cm$^2$). At high intensity the temperature is highest at $\Delta_{00} = -0.75\Gamma$ and decreases at both larger and smaller detunings. At low intensity the temperature decreases as $|\Delta_{00}|$ decreases, until $\Delta_{00} = -0.5\Gamma$ where it reaches a minimum of 730 $\mu$K, 3.5 times the Doppler temperature. Our previous work shows how to reduce the temperature below the Doppler limit using a blue-detuned optical molasses [15].

Cloud size
The filled points in figure 14(a) show the rms radial size of the cloud, $\sigma_0$, as a function of $I_{00}$. We can interpret these data with the help of the equipartition theorem, which relates $\sigma_0$ to $\omega$ and $T$:
$$\sigma_0 = \sqrt{\frac{k_B T}{m\omega^2}}. \qquad (15)$$
As $I_{00}$ is reduced from 500 to 50 mW/cm$^2$ the cloud size decreases. This is because $T$ falls by a factor of 7 over this intensity range [see figure 13], whereas $\omega$ only falls by 50% [see figure 10]. As $I_{00}$ decreases further the cloud size increases because $\omega$ falls while $T$ stops falling and then starts increasing. The open points in figure 14(a) are the predictions of equation (15) for those values of $I_{00}$ where we have measured both $\omega$ and $T$. These predictions agree well with the measurements between 5 and 500 mW/cm$^2$. At lower intensity, the measured size is smaller than predicted by equation (15). We do not know the reason. Figure 14(b) shows the size of the cloud versus $\Delta_{00}$. The cloud is smallest when $\Delta_{00} = -0.75\Gamma$ and grows rapidly as $|\Delta_{00}|$ increases, reflecting the decrease in $\omega$. Figure 14(c) shows that the size of the cloud decreases as the magnetic field gradient increases.
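A quick consistency check of the equipartition relation, assuming the form $\frac{1}{2}m\omega^2\sigma_0^2 = \frac{1}{2}k_BT$ and a CaF mass of 59.08 u (an assumed value): the sketch below converts between cloud size, temperature and trap frequency. The inputs in the example, a 2.0 mm rms radius at 12 mK, are the full-intensity values quoted later for the capture-velocity model.

```python
import math

K_B = 1.380649e-23           # J/K
AMU = 1.66053906660e-27      # kg
MASS = 59.08 * AMU           # kg, assumed CaF mass


def rms_radius(temperature, omega):
    """Equilibrium rms radius (m) of a thermal cloud in a harmonic trap."""
    return math.sqrt(K_B * temperature / MASS) / omega


def implied_frequency(temperature, sigma0):
    """Trap frequency (Hz) implied by a temperature (K) and rms radius (m)."""
    omega = math.sqrt(K_B * temperature / MASS) / sigma0
    return omega / (2 * math.pi)


f_implied = implied_frequency(12e-3, 2.0e-3)
print(f"implied trap frequency = {f_implied:.0f} Hz")
```

Under these assumptions the implied frequency is close to 100 Hz, in line with the oscillation frequencies measured at high intensity.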

Loss rate
For times $t > 50$ ms, the fluorescence decays exponentially as molecules are lost from the MOT. Figure 15 shows the loss rate versus the scattering rate, which we control via the intensity $I_{00}$. The relation between $R_{\rm sc}$ and $I_{00}$ is obtained from the fit shown in figure 8(a). The loss rate increases approximately linearly with $R_{\rm sc}$, as we would expect if the loss is due to a leak out of the cooling cycle. The linear fit shown in figure 15 gives a branching ratio for this leak of $6.3(5) \times 10^{-6}$. The loss could be to a higher-lying vibrational state, or it could be due to magnetic dipole or electric quadrupole transitions that connect the excited state to rotational states $N = 0$ and $N = 2$. We note that the linear fit to the loss rate data gives a statistically significant negative intercept of $-3.8(1.2)$ s$^{-1}$, which is not physical. This hints at a more complicated dependence on the scattering rate.
Figure 15. Loss rate versus scattering rate. Points: measurements (error bars are roughly the size of the points). Line: linear fit.
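The linear fit can be turned into a lifetime estimate with a short calculation. The sketch below uses the quoted slope and intercept; the scattering rate of $2.5 \times 10^6$ s$^{-1}$ is the maximum value given in the abstract, and the function name is illustrative.

```python
BRANCHING_RATIO = 6.3e-6   # slope of the linear fit (loss per scattered photon)
INTERCEPT = -3.8           # 1/s, fitted intercept (unphysical, see text)


def loss_rate(r_sc):
    """Loss rate (1/s) predicted by the linear fit at scattering rate r_sc (1/s)."""
    return BRANCHING_RATIO * r_sc + INTERCEPT


rate = loss_rate(2.5e6)
lifetime_ms = 1e3 / rate
photons_before_loss = 1 / BRANCHING_RATIO

print(f"loss rate = {rate:.1f} 1/s, lifetime = {lifetime_ms:.0f} ms")
print(f"mean photons scattered before loss = {photons_before_loss:.2e}")
```

A leak with this branching ratio means a molecule scatters on the order of $10^5$ photons before it is lost, which sets the lifetime at any given scattering rate.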
When the temperature becomes comparable to the trap depth, the high-energy molecules will spill out of the trap and increase the loss rate. This loss mechanism was considered in the context of the first molecular MOT [7], where the rate for the process was approximated by equation (16). This result assumes that the oscillation is lightly damped and that the force is linear in the displacement out to the trap radius, $r_{\rm trap}$. These are poor approximations, but nevertheless we can expect the order of magnitude of the loss rate to be given by this equation. Our simulations of the MOT show that the restoring force has a turning point at a radial distance of about 8 mm. Choosing this value for $r_{\rm trap}$, and using our measured values of $\omega_\rho$ and $T$, equation (16) gives $R_{\rm loss} = 1$ s$^{-1}$ at the highest intensity, where the scattering rate is $2.6 \times 10^6$ s$^{-1}$. This contribution to the loss rate falls very rapidly as the intensity is reduced because $T$ falls faster than $\omega_\rho$. For example, lowering the scattering rate by 15% reduces $R_{\rm loss}$ by a factor of 200. Thus, this loss mechanism could only be significant at the very highest intensity explored. Interestingly, the data point at the highest scattering rate in figure 15 does indeed lie significantly above the linear fit to the data.

Capture velocity
It is difficult to measure the capture velocity of the MOT directly. Instead, we measure the escape velocity by pushing the MOT and measuring the fraction of molecules that are lost as a function of their speed. We then infer the capture velocity from these results, with the help of a simple model. To apply an impulsive push, we turn off $L_{00}$ and pulse on the slowing light, $L^{s}_{00}$, for a short time $t_{\rm push}$. The molecules are at the zero of the MOT magnetic field, where states dark to the polarization of $L^{s}_{00}$ are not destabilized effectively, so we found it necessary to modulate the polarization of $L^{s}_{00}$ to reach a sufficient scattering rate. We also reduced the size of the slowing beam to a $1/e^2$ radius of 3 mm, in order to increase the applied force. For various push parameters, we first determine the initial displacement and velocity of the cloud, $x_i = x(t_{\rm push})$ and $v_i = v(t_{\rm push})$, by imaging the cloud at various times after $t_{\rm push}$ with the MOT magnetic field turned off. Figure 16(a) shows the set of $\{x_i, v_i\}$ pairs used. Then, for each push, we turn the MOT back on at $t = t_{\rm push}$ and measure the fraction of molecules recaptured by imaging the MOT at $t = t_{\rm push} + 20$ ms. Figure 16(b) shows this fraction versus $v_i$, where the $v_i$ should be understood as $\{x_i, v_i\}$ pairs.
Were the initial displacements negligible, it would be simple to determine the capture velocity from these measurements. Since they are not, we use a model to interpret the results. In this model, the force in the direction ($x$) of the slowing light is given by equation (7). To account for the intensity distribution of the MOT beams we make the replacement $s_{\rm eff} \to s_{\rm eff}(x/\sqrt{2})$ in equation (6), where $s_{\rm eff}(r) = I_{00}(r)/I_{\rm s,eff}$, and $I_{00}(r)$ has a Gaussian intensity distribution with a $1/e^2$ radius of 8.1 mm, truncated at a radius of $r_{\rm trunc} = 15$ mm.
Using this force, we solve the equation of motion for a distribution of initial coordinates in phase space and for a time long enough that the distribution of final coordinates separates into two components, one with $x$ close to zero and the other with $x \gg r_{\rm trunc}$. This gives a contour in phase space, $v_{\rm sep}(x)$, which separates molecules that will be recaptured from those that escape. We then calculate the fraction of molecules from the initial phase-space distribution that lie inside this contour. We use a Gaussian spatial distribution centred at $x_i$ with rms radius equal to the measured one, $\sigma_0 = 2.0$ mm, and a Gaussian velocity distribution centred at $v_i$ and characterized by the measured temperature of 12 mK. This gives the simulated recapture fraction for each $\{x_i, v_i\}$ pair used for the measurement. In applying this procedure, it is not clear what values of $\Gamma_{\rm eff}$, $I_{\rm s,eff}$ and $g_{\rm eff}$ to use, so we keep them as free parameters. We compare the simulated results to the measured ones for a wide range of these free parameters and choose the parameter set that gives the smallest value of $\chi^2$. The open circles in figure 16(b) show the results that fit best. They are found for $\Gamma_{\rm eff} = 0.15\Gamma$, $I_{\rm s,eff} = 50$ mW/cm$^2$, and $g_{\rm eff} = 0.25$, all reasonable values. This comparison between model and measurements gives us a best estimate for $v_{\rm sep}(x)$.
The capture velocity, $v_c$, is the largest velocity a molecule that starts at the edge of the MOT can have if it is to be recaptured, $v_c = v_{\rm sep}(-r_{\rm trunc})$. The result is $v_c = 11.2^{+1.2}_{-2.0}$ m/s. The MOT simulations described in [13] predicted a capture velocity of 20 m/s, but this used larger beams, higher power and a slightly different polarization configuration. Repeating these simulations for the exact parameters used in the experiment we find a capture velocity of 14 m/s, close to our measured value.
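The recapture model can be sketched as a one-dimensional trajectory calculation. The code below is an illustration under stated assumptions, not the model code used for figure 16: it assumes a two-level-like single-beam force with a Gaussian, truncated saturation profile, takes the best-fit $\Gamma_{\rm eff}$, $I_{\rm s,eff}$ and $g_{\rm eff}$ from the text, and uses hypothetical values for the peak intensity, detuning and field gradient, as well as assumed CaF constants. It follows a single molecule rather than a phase-space distribution, and finds the separating velocity by bisection, which assumes that recapture is monotonic in the initial speed.

```python
import math

HBAR = 1.054571817e-34
MU_B = 9.2740100783e-24
AMU = 1.66053906660e-27
MASS = 59.08 * AMU               # kg, assumed CaF mass
K_WAVE = 2 * math.pi / 606e-9    # 1/m, assumed wavelength
GAMMA = 2 * math.pi * 8.3e6      # rad/s, assumed linewidth

# Best-fit model parameters and beam geometry quoted in the text.
GAMMA_EFF = 0.15 * GAMMA
I_SAT_EFF = 50.0                 # mW/cm^2
G_EFF = 0.25
WAIST = 8.1e-3                   # m, 1/e^2 radius
R_TRUNC = 15e-3                  # m

# Hypothetical operating point (not taken from the text).
I_PEAK = 400.0                   # mW/cm^2
DELTA = -0.75 * GAMMA            # rad/s
GRADIENT = 0.15                  # T/m


def saturation(x):
    """Saturation parameter seen at position x on the axis at 45 degrees to the beams."""
    r = abs(x) / math.sqrt(2)
    if r > R_TRUNC:
        return 0.0
    return (I_PEAK / I_SAT_EFF) * math.exp(-2 * r**2 / WAIST**2)


def acceleration(x, v):
    """Acceleration from two pairs of horizontal beams."""
    s = saturation(x)
    shift = (K_WAVE * v + G_EFF * MU_B * GRADIENT * x / HBAR) / math.sqrt(2)
    total = 0.0
    for sign in (+1, -1):
        detuning = DELTA - sign * shift
        total += sign * (s / 6) / (1 + s + 4 * detuning**2 / GAMMA**2)
    return 2 * HBAR * K_WAVE / math.sqrt(2) * GAMMA_EFF * total / MASS


def is_recaptured(x, v, dt=2e-5, t_end=0.1):
    """Integrate with fourth-order Runge-Kutta; True if the molecule ends near the centre."""
    for _ in range(int(t_end / dt)):
        if abs(x) > math.sqrt(2) * R_TRUNC:
            return False                      # left the beams
        k1x, k1v = v, acceleration(x, v)
        k2x, k2v = v + 0.5 * dt * k1v, acceleration(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
        k3x, k3v = v + 0.5 * dt * k2v, acceleration(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
        k4x, k4v = v + dt * k3v, acceleration(x + dt * k3x, v + dt * k3v)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return abs(x) < 5e-3


def capture_velocity(x_start=-R_TRUNC, v_lo=1.0, v_hi=40.0, tol=0.05):
    """Bisect for the largest initial speed that is still recaptured."""
    while v_hi - v_lo > tol:
        v_mid = 0.5 * (v_lo + v_hi)
        if is_recaptured(x_start, v_mid):
            v_lo = v_mid
        else:
            v_hi = v_mid
    return 0.5 * (v_lo + v_hi)


print(f"model capture velocity = {capture_velocity():.1f} m/s")
```

The number printed depends on the hypothetical operating point, so it should not be compared directly with the measured value; the sketch is meant only to show how a separating contour $v_{\rm sep}(x)$ can be located.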

Conclusions
Despite the complexity of the level structure and the need to avoid optical pumping into dark states, the CaF MOT behaves much like a normal atomic MOT in many respects. The intensity dependence of the scattering rate, trap frequency and damping constant all conform to the analytical results based on an effective two-level model [equations (2), (9) and (10)], and do so over a very wide range of intensities, although somewhat different values for the free parameters of these equations are needed in each case. The trap frequency is in excellent quantitative agreement with that predicted from rate equation simulations, whereas the scattering rate is a factor of 2 smaller than expected and the damping constant is typically a factor of 5-10 smaller. We tentatively attribute the reduction of the damping constant to polarization gradient forces [22], though that remains to be verified. For our parameters, we measure a capture velocity of $11.2^{+1.2}_{-2.0}$ m/s, consistent with our simulation.
The temperature of the MOT is considerably higher than the Doppler temperature, especially at high intensity. It is also a factor of 2-3 higher than predicted by equation (13) when we use our measured values of the scattering rate and damping constant. This shows that the elevated temperature is a consequence of two factors: a reduced damping and some excess heating above that given by the diffusion constant in equation (12). The reasons for the reduced damping and the enhanced diffusion would make an interesting topic for future study. We find that the usual relation [equation (15)] between MOT size, temperature and trap frequency holds for our MOT, except at very low intensity where there seems to be a discrepancy. As expected, the size of the cloud scales inversely with the square root of the magnetic field gradient.
Molecules are lost from the MOT with a typical rate of about 10 s$^{-1}$, and this loss rate scales linearly with scattering rate. The data are consistent with a leak out of the cooling cycle with a branching ratio of about $6 \times 10^{-6}$. The lifetime, $\sim 100$ ms, is short by the standards of atomic MOTs, but is adequate for most purposes. Indeed, the whole process of capturing the molecules and cooling to sub-Doppler temperatures [15] takes less than 50 ms. Once the molecules are loaded into a conservative trap, we can expect the lifetime to increase.
Rapid deceleration of the molecular beam with good velocity control is crucial to loading the largest number of molecules. It has also been one of the most difficult steps to perfect and understand. We use the frequency-chirp method to slow down the molecular beam. Our simulations and experiments [18] suggest that this is a better slowing method than the alternative of broadening the frequency of the slowing laser [29].
We previously estimated a flux at the MOT region of $7 \times 10^5$ molecules per cm$^2$ per shot with speeds below 15 m/s [18]. We have recently re-evaluated the flux from our cryogenic source [17] and find that it is a factor of 4 smaller than our estimate in [18]. This reduces the flux of molecules with speeds below 15 m/s to $1.8 \times 10^5$ molecules per cm$^2$ per shot. Scaling to our measured capture velocity, assuming the flux scales roughly as the square of the velocity, we expect about half this number. We use MOT beams with a $1/e^2$ radius of 8.1 mm, giving a capture area of at least 0.5 cm$^2$. Thus, there are at least $4 \times 10^4$ trappable molecules, which is a factor of 2 larger than we observe in the MOT. The measured capture velocity is appropriate to molecules entering on axis and must be smaller away from the axis, and this may account for the discrepancy. While our simulations match well with our measured arrival-time distributions and velocity distributions, they predict a large peak in the number loaded into the MOT when the chirp parameters are tuned correctly, which we do not observe. Thus, while much progress has been made in understanding the slowing and MOT loading processes, some mysteries remain.
We believe there is great scope for increasing the number of molecules loaded. We plan to add a region of transverse cooling before the slowing begins to compress the transverse velocity distribution, which should increase the number delivered to the MOT. Our linear frequency chirp is probably not the best chirp function, and optimization of its functional form may give more molecules. Other slowing methods, such as Zeeman-Sisyphus deceleration [30], promise to deliver further increases in flux.
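The estimate of the number of trappable molecules is a short chain of arithmetic, reproduced below as a sketch using only numbers quoted in the text and the abstract. The variable names are illustrative, and the quadratic scaling of flux with velocity is the assumption stated above.

```python
flux_below_15 = 1.8e5        # molecules per cm^2 per shot with speeds below 15 m/s
capture_velocity = 11.2      # m/s, measured
reference_velocity = 15.0    # m/s
capture_area = 0.5           # cm^2, lower bound quoted in the text
observed_number = 2e4        # molecules, maximum loaded (abstract)

# Flux assumed to scale as the square of the velocity cut-off.
scale = (capture_velocity / reference_velocity) ** 2
trappable = flux_below_15 * scale * capture_area

print(f"velocity scaling factor = {scale:.2f}")
print(f"trappable molecules = {trappable:.1e}")
print(f"ratio to observed = {trappable / observed_number:.1f}")
```

The scaling factor comes out slightly above one half, so the result sits a little above the rounded figure of $4 \times 10^4$ given in the text, and the ratio to the observed number is correspondingly a little above 2.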
At present, the only other 3D molecular MOTs reported are the dc and rf MOTs of SrF [7,14,11,12] and the recent rf MOT of CaF [16]. For SrF, the rf MOT seems superior to the dc MOT. In particular, the rf MOT loads more molecules, has a long lifetime, and can be cooled towards the Doppler limit by lowering the laser intensity. By contrast, molecules in the dc MOT could not be cooled by lowering the intensity because its lifetime was found to decrease drastically at lower laser powers. Our dc MOT of CaF does not suffer from this problem. In fact, the lifetime is longer at lower intensities. The molecules cool as the intensity is lowered, just as in the SrF rf MOT.
We suggest that the differences observed between the dc MOTs of CaF and SrF might be due to the difference in the ground-state hyperfine intervals. Most of the confinement in our CaF MOT comes from the dual-frequency effect [13] acting on the $F = 2$ state. In CaF, the splitting between the upper two hyperfine components is about $3\Gamma$. When the detuning is $-\Gamma$, the upper $F = 2$ level is addressed by a $\sigma^-$ component detuned by $-\Gamma$ and a $\sigma^+$ component detuned by about $2\Gamma$ (see figure 1(b)). It is known that this dual-frequency arrangement produces a strong confinement (see figure 2 of reference [13]). The same polarization configuration has been used for the dc MOT of SrF [14], and the dual-frequency effect provides the confinement in that case too. In SrF, however, the equivalent hyperfine splitting is $6\Gamma$, resulting in weaker confinement. It may be that at low laser intensity the confining forces in SrF are too weak, resulting in the short lifetimes observed. If this is the cause, then it could be solved by adding an extra frequency component to approach more closely the ideal dual-frequency scheme.
The molecule MOT presents many opportunities for new research. One is to study ultracold collisions between atoms and molecules and to cool molecules to even lower temperatures by sympathetic cooling with ultracold atoms [31]. Another is to load single molecules into optical tweezer traps in order to assemble small arrays [32] for quantum simulation [33]. The molecules could also be loaded into chip-scale electric traps where they could be coupled to a superconducting microwave resonator, realizing a molecular quantum processor [34]. Precise measurements of the vibrational frequency of CaF can test whether the fundamental constants are changing in time [35]. The extension of the cooling and trapping methods to other amenable molecules [36,37] can improve the measurements of the electric dipole moments of the electron and proton [9,38] and advance the measurement of nuclear anapole moments [39].