Correlative methods for dual-species quantum tests of the weak equivalence principle

Matter-wave interferometers utilizing different isotopes or chemical elements intrinsically have different sensitivities, and the analysis tools available until now are insufficient for accurately estimating the atomic phase difference under many experimental conditions. In this work, we describe and demonstrate two new methods for extracting the differential phase between dual-species atom interferometers for precise tests of the weak equivalence principle. The first method is a generalized Bayesian analysis, which uses knowledge of the system noise to estimate the differential phase based on a statistical model. The second method utilizes a mechanical accelerometer to reconstruct single-sensor interference fringes based on measurements of the vibration-induced phase. An improved ellipse-fitting algorithm is also implemented as a third method for comparison. These analysis tools are investigated using both numerical simulations and experimental data from simultaneous $^{87}$Rb and $^{39}$K interferometers, and both new techniques are shown to produce bias-free estimates of the differential phase. We also report observations of phase correlations between atom interferometers composed of different chemical species. This correlation enables us to reject common-mode vibration noise by a factor of 730, and to make preliminary tests of the weak equivalence principle with a sensitivity of $1.6 \times 10^{-6}$ per measurement with an interrogation time of $T = 10$ ms. We study the level of vibration rejection by varying the temporal overlap between interferometers in a symmetric timing sequence. Finally, we discuss the limitations of the new analysis methods for future applications of differential atom interferometry.


Introduction
Einstein's equivalence principle (EEP) is a fundamental concept in physics that describes the exact correspondence between the gravitational and inertial mass of any object. It is a central assumption of the theory of General Relativity-which interprets gravity as a geometrical feature of space-time, and predicts identical accelerations for different objects in the same gravitational field. Precise tests of the EEP are of great interest in various fields of physics. For instance, some theories that attempt to unify gravity with the other fundamental forces predict a violation of this principle [1,2]. The detection of such a violation could aid our understanding of dark energy in cosmology, and advance the search for physics beyond the Standard Model. In contrast, null results are also pivotal for putting bounds on model parameters contained in various extensions to General Relativity [3,4]. The equivalence principle is generally divided into three subprinciples that each must be satisfied for the EEP to hold [5,6]: the local Lorentz invariance, the local position invariance and the weak equivalence principle (WEP). In this article, we will focus on the latter.
The WEP-otherwise known as the universality of free fall-states that a chargefree body will undergo an acceleration in a gravitational field that is independent of its internal structure or composition. Tests of the WEP generally involve measuring the relative acceleration between two different test bodies that are in free fall with the same gravitational field. The WEP is characterized by the Eötvös parameter, η, given by where a 1 and a 2 are the accelerations of the two bodies, ∆a = a 1 − a 2 is the relative acceleration, and a = (a 1 + a 2 )/2 is the average acceleration. The WEP is satisfied if and only if ∆a = 0-implying that η = 0. The most precise tests of the WEP have been carried out with lunar laser ranging techniques [7], or using a rotating torsion balance [8,9], which have both measured η at the level of a few parts in 10 13 . Various Space missions to test the WEP at improved levels (10 −15 or better) using other classical devices are presently in progress [10,11,12]. On a separate frontier, a number of groups have carried out tests between cold atoms [13,14,15,16,17] in an effort to probe the WEP at the quantum level. The majority of these tests have been conducted using matter-wave interferometers which, over the past few decades, have been extensively studied both theoretically and experimentally [18,19,20,21]. Atom interferometers have been utilized as ultra-precise inertial sensors to measure, for example, the gravitational acceleration g [22,23,24,25], the gravitational constant G [26,27,28], gravity gradients [29,30,31,24,32], gravitational field curvature [33], and rotations [34,35,36,37]. A WEP test based on atom interferometry involves measuring the differential phase shift resulting from a relative acceleration between two species with different masses that are in free fall within the same gravitational field. This measurement is based on the same principle as gravity gradiometry, where the quantity of interest is the differential phase between test atoms of the same type but in different spatial locations. The gradient of the gravitational field can be extracted from the differential phase between two sources, while higher derivatives of the field can be accessed if more than two sources are used. This technique was recently demonstrated to measure the curvature of the gravitational field, and has been proposed to detect gravitational waves and to study geophysical effects [38,39,40]. Presently, the state-ofthe-art for WEP tests using matter-wave interferometry corresponds to an uncertainty of 10 −8 [17]. A comparison between the gravitational acceleration measured by atoms and a macroscopic object (i.e. a falling corner-cube) have also been carried out, and yield agreement at the level of δη 6.5 × 10 −9 [41]. A handful of ground-based [42,43,44,45] and micro-gravity-based [46,47,48,49,40] cold-atom experiments are currently underway that aim to greatly improve this precision. In addition, there have been a number of proposals for Space-based quantum tests of the WEP [50,51,52,53,6] that target accuracies at the level of 10 −15 .
So far, most tests with cold atoms have used two isotopes of the same atomic element, e.g. 85 Rb and 87 Rb [13,14,54,17], or 87 Sr and 88 Sr [15]. Although this class of test bodies has demonstrated a good level of common-mode noise rejection when performing differential phase measurements [14], it is intrinsically less sensitive to possible violations of the equivalence principle because the two atoms are relatively similar in mass and composition. Thus, it is interesting to perform these tests with two entirely different atomic elements. In this article, we will focus on the case of 87 Rb and 39 K. These atoms exhibit a large difference in their number of nuclei-facilitating a mass ratio of M Rb /M K ∼ 2.2. Additionally, they have identical hyperfine spin structure, and similar excitation wavelengths (around 780 nm and 767 nm, respectively), which enables the use of the same laser technology and optics for cooling and interferometry. Dual-species interferometers of this type have the added advantage of being highly independent-that is, atomic sample properties such as the size and temperature, or interferometer parameters such as the interrogation time, Raman phase, and detuning, can be controlled independently. In contrast to dual-isotope setups where many of these parameters are coupled, this feature is ideal for studying a variety of systematic effects that will be important for future precision measurements [54]. For a more complete comparison of alkali atoms as candidates for WEP tests, see for example ref. [55].
One complication that arises with non-common elemental species is a difference in the scale factors, S j k eff j T 2 j , between the interferometers. When the interrogation times T j are the same, this difference originates from the effective wave vectors k eff j of the interferometer beams used for atoms j = 1 and 2. Assuming that the WEP is true, the phase shift of the two interferometers due to a common acceleration a is Φ j = S j a. Thus, a difference in the scale factors produces a relative phase shift between interferometers for the same acceleration: δφ sys d = (S 1 − S 2 )a. For the case of 85 Rb and 87 Rb, the scale factors can be made the same by a suitable choice of Raman laser detuning that guarantees k eff 1 = k eff 2 [51]. However, this is not generally possible for different elemental species, such as the alkali metals, and this problem must be addressed in other ways.
Another issue related to having different scale factors regards the rejection of common-mode vibration noise between interferometers. From an analysis of the interferometer transfer functions (see Appendix C or refs. [56,57,51], for instance), one can show that perfect common-mode rejection requires four conditions to be satisfied: (i) the interferometers occur simultaneously with T 1 = T 2 , such that they experience the same vibration noise, (ii) they have identical wavevectors, k eff 1 = k eff 2 , they exhibit (iii) identical effective Rabi frequencies, Ω eff 1 = Ω eff 2 , and (iv) identical pulse durations, τ 1 = τ 2 . These conditions imply that if S 1 = S 2 , the interferometers do not respond to common-mode noise with the same phase shift.
The scale factors can be made the same by adjusting the interrogation times of the interferometers such that T 1 = rT 2 , where r = k eff 2 /k eff 1 [47]. This technique eliminates the systematic phase shift δφ sys d resulting from a constant acceleration, and improves the rejection of common vibration noise at frequencies 1/T 1 , but it degrades the rejection efficiency at frequencies above ∼ 1/T 1 (see Appendix C). However, if the ratio r is very close to unity, as it is for some choices of atoms (r 1.009 for 39 K and 87 Rb), this option represents a good compromise between efficient noise rejection and reducing systematic effects.
In this article, we describe and demonstrate three analysis methods for atominterferometric WEP tests-including two new techniques that eliminate both aforementioned problems of systematic phase shifts and diminished common-mode rejection between coupled interferometers of different atomic species. The first of these two new methods is a generalized Bayesian analysis of the Lissajous curves formed by plotting the coupled sensor measurements parametrically. The second technique involves restoring the interferometer fringes by correlating with an auxiliary mechanical accelerometer. In this case, the phase shift for each species can be measured directly from the reconstructed fringes regardless of their scale factors or the degree of temporal overlap between the interferometers. Both of these new methods intrinsically accounts for different scale factors, and return unbiased estimates of the differential phase. Finally, to give a complete picture, we compare these techniques with an improved ellipse-fitting method recently developed by Szpak et al [58]. This numerical procedure yields an estimate of the differential phase shift with reduced bias compared to more commonly implemented algorithms in the presence of significant amounts of uncorrelated noise between sensors.
In this work, we also report correlated phase measurements between simultaneous interferometers of differential elemental species ( 39 K and 87 Rb). When operated in an environment with significant levels of vibration noise, we demonstrate a common-mode vibration rejection factor of γ 730. These results represent a major step toward precise tests of the WEP with elements exhibiting vastly different masses. We also investigate the accuracy of the three analysis aforementioned methods on experimental data obtained from the K-Rb interferometer.
The article is organized as follows. Section 2 reviews some theoretical background concerning a WEP test with a dual-species interferometer. In sec. 3, we briefly describe the three methods of extracting the differential phase. We give a brief description of the experimental setup for the K-Rb interferometer in sec. 4. We present our experimental results in sec. 5, and we give a discussion of the advantages and limitations of the new methods in sec. 6. Finally, we conclude in sec. 7. A detailed description of the three analysis methods, including extensive numerical tests of the generalized Bayesian estimator, can be found in the Appendices.

Testing the WEP with two atomic species
An atom-interferometric test of the WEP involves measuring the relative acceleration between two atoms of different mass. This can be done in one of two ways: (i) the absolute acceleration of each atom, a 1 and a 2 , can be individually measured and subtracted, or (ii) ∆a can be measured directly from the differential phase, φ d . In the ideal case, acceleration measurements are performed simultaneously in order to take advantage of correlated noise between sensors-reducing the total uncertainty in ∆a. Since method (ii) involves a direct measurement of φ d , it intrinsically requires both simultaneity and phase correlation between atomic sensors to reject common-mode noise. Henceforth, two or more atom interferometers that satisfy these conditions are referred to as "coupled sensors". Method (i) can be carried out regardless of these two constraints. In this section, we outline some theoretical background related to a WEP test with method (ii).
Generally, the output from two coupled atomic sensors is described by the following sinusoids where A j and B j are, respectively, the amplitude and offset of the interferometer fringes associated with sensor j (j = 1, 2). In principle, these two parameters can be measured and eqs. (2) can be recast in the normalized form n j = (y j − B j )/A j : Here, a is an acceleration common to both atoms, S j is the scale factor for interferometer j, and φ j is a phase shift. The scale factors can be computed exactly from the integral of the response function, f j (t), given by eq. (C.5): where k eff j is the effective wave-vector for the counter-propagating interferometer beams, T j is the interrogation time, and τ j is the π/2 Raman pulse duration. A detailed explanation of the response function and its role in WEP tests is outlined in Appendix C. For large interrogation times, T j τ j , the scale factors reduce to the well-known relation S j k eff j T 2 j . The phases φ j can be related to the Eötvös parameter by realizing that, in the absence of any additional phase shifts, φ 1 = S 1 (a 1 − a) and φ 2 = S 2 (a 2 − a), such that The sensitivity in this type of WEP test increases as the square of the interrogation time, T 1 , due to the scale factor, S 1 , that appears in the denominator of eq. (5). The general form of eqs. (3) describes a Lissajous curve. For the purposes of this analysis, it is useful to redefine the phases in eqs. (3) to reduce the number of free parameters. Choosing sensor 2 as a reference to rescale the phase of sensor 1, we define a common phase φ c that satisfies where the two new parameters, the scale factor ratio κ and the differential phase φ d , are given by The sensor outputs are now recast according to Comparing eqs. (5) and (7), it follows that the Eötvös parameter is directly proportional to the differential phase: η = φ d /S 1 a.

Correlative methods of differential phase extraction
In this section, we review three different methods to measure the differential phase from experimental data: ellipse fitting, Bayesian analysis and fringe reconstruction from mirror acceleration measurements.

Improved ellipse fitting
The ellipse fitting technique was first applied to atom interferometry in ref. [59] for situations in which the phase common to two coupled atomic sensors is sufficiently scrambled to impede individual fringe observation. In this case, when the measurements from each sensor are plotted parametrically, one obtains an ellipse that is free from common phase noise. Using a least-squares ellipse fitting algorithm, the differential phase φ d can be extracted. Multiple groups have demonstrated the utility of ellipse fitting for measurements of gravity gradients [31,32] and the gravitational constant G [26,60,27]. However, this technique suffers from a number of drawbacks. First, it is valid only for coupled sensors with the same scale factor (κ = 1). Second, in the presence of moderate amounts of noise in the fringe offsets or amplitudes [the parameters A j and B j in eqs. (2)], or in the differential phase, the ellipse fit returns a biased estimate of φ d ‡.
Recently, Szpak et al [58] developed an algorithm based on the optimization of the approximate maximum likelihood distance which seeks a balance between costly geometric methods and stable algebraic techniques. This algorithm-termed the "fast guaranteed ellipse fitting" (FGEF) method-exhibits a smaller bias in the differential phase estimate over a relatively large phase range (centered on π/2) compared to the more commonly used "direct ellipse fit" (DEF) technique [61]. Additionally, ref. [62] includes error estimations for the geometrically meaningful ellipse parameters (center coordinates, axes and orientation). We have extended their work to include an estimate of the statistical uncertainty in the differential phase, δφ d . We provide a more detailed comparison between DEF and FGEF methods of ellipse fitting in Appendix A.

Generalized Bayesian analysis
Heuristic approaches to estimating the differential phase, such as ellipse-fitting methods, do not have knowledge of the noise present in experimental data, nor of how various types of noise can affect the outcome of measurements. Bayesian analysis offers an efficient alternative to the problem by constraining the estimate based on a statistical model that describes the distribution of data that results from different noise sources [63]. Bayesian phase estimation was studied in the context of atom interferometry in ref. [64] for two sensors containing the same scale factor (κ = 1). In that work, a detailed study of each possible noise source (amplitude, offset and differential phase) is presented. Reference [55] also used Bayesian analysis to estimate the differential phase from a hypothetical system with κ < 1. There, however, only noise in the differential phase is considered, and the range of common phase was constrained to φ c ∈ [0, π]. To the best of our knowledge, no complete Bayesian estimator exists that (i) is valid for any scale factor ratio, (ii) accounts for noise in all relevant system parameters, and (iii) allows φ c to vary over a broad range. Furthermore, this type of analysis has not yet been demonstrated on experimental data from dual-species interferometers.
In this work, we have developed a generalized Bayesian estimator for φ d -based on the approach of ref. [64]-that satisfies all three of the requirements mentioned above. We demonstrate this technique by measuring φ d from both simulated data (see Appendix B) and experimental data from our K-Rb interferometer (see sec. 5). The advantage of using this estimation technique is that the uncertainty in φ d converges much faster than other methods (i.e. it scales as ∼ 1/ √ N , where N is the number of measurements), so ‡ Rosi et al [33] demonstrated that the bias in the estimate of φ d can be eliminated under certain conditions when fitting an ellipse in three dimensions from the output of three simultaneous interferometers. fewer data are required to reach a given level of sensitivity. Furthermore, since κ is built directly into the Bayesian estimate of φ d , it is free from the aforementioned systematic phase shift δφ sys d arising between interferometers with different scale factors. However, some of the drawbacks of the Bayesian analysis are that it requires a priori knowledge of the noise in the system, and it is computationally costly due to the large number of integrals that must be evaluated. Figure 1 illustrates the basic Bayesian estimation procedure. Here, we simulate data that follow the Lissajous equations (8) with added Gaussian noise in the sensor offsets. After each successive measurement from the system, the width of the new "prior" probability distribution decreases and additional peaks are suppressed-facilitating an improvement in the estimate of φ d . This is how the Bayesian method builds in information from previous measurements. It is clear from figure 1(c) that after only a few iterations, both the statistical and systematic error in φ d have decreased dramatically. A detailed description of the generalized Bayesian analysis can be found in Appendix B.

Fringe reconstruction by accelerometer correlation -The differential FRAC method
Differential atom interferometry is often utilized under conditions where each sensor is overwhelmed by external phase noise that is common to both sensors. Typically, one is concerned with only the differential phase and not the common phase φ c , which is treated as an arbitrary parameter. Both the ellipse-fitting and Bayesian estimation methods for extracting φ d take this approach. An alternative technique involves measuring the common phase and correcting for it. For the case of parasitic mirror vibrations, singlesensor interference fringes that are otherwise smeared by phase noise can be restored based on measurements from seismometers [65,66,67] or mechanical accelerometers [68,40,69]. Henceforth, we refer to this as the fringe reconstruction by accelerometer correlation (FRAC) method. In this work, we demonstrate how the FRAC method can be applied to two quasi-simultaneous interferometers of different atomic species to measure the relative phase shift between them. This technique to extract φ d is referred to as the differential FRAC method throughout the article to differentiate between the (standard) FRAC method, which is generally employed to measure the absolute phase shift of a single atom interferometer. Figure 2 illustrates the basic schematic of the FRAC method for a single interferometer. A mechanical accelerometer is secured to the back of the reference mirror used to retro-reflect interferometry light, and the time-dependent mirror acceleration, a vib (t), is recorded during the interferometer sequence. These acceleration measurements are first weighted by the response function of the j th interferometer, f j (t), and are then integrated to find the vibration-induced phase given by For each repetition of the experiment, this random phase is computed and correlated with the interferometer signal. This process allows one to reconstruct the interference fringes point-by-point. Depending on the level of vibrations and the interferometer sensitivity, the range of vibration-induced phases can span multiple fringes-enabling the single-sensor phase shift φ j to be measured using, for instance, a sinusoidal leastsquares fit to the data. It is straightforward to extend this algorithm for two or more interferometers, which do not need to be overlapped in time. In this case, the only additional requirement is that the time-series of mirror acceleration measurements span the interrogation times for all interferometers. For two coupled sensors, the differential phase is easily computed from the individual sensor phase shifts via φ d = φ 1 − κφ 2 . The statistical error in this quantity is governed by where the δφ j represent the statistical uncertainties in the φ j obtained from fits to the two fringes, and φ 1 ,φ 2 is the correlation coefficient for the measurements of φ 1 and φ 2 .
In the limit of perfect correlation ( φ 1 ,φ 2 = 1), the uncertainty in the differential phase reduces to δφ d = |δφ 1 − κδφ 2 |. Figure 3(a) illustrates how the coupled-interferometer correlation is utilized by the FRAC method. Since the fringes for each interferometer are recovered using measurements from the same classical device, the phase noise present on each fringe is highly correlated. This induces a correlation between the measurements of φ 1 and φ 2 extracted from the fits, as characterized by φ 1 ,φ 2 . The key to the differential FRAC method is maximizing this correlation to reduce the uncertainty in φ d . The correlation coefficient for a given set of reconstructed fringes can be estimated numerically from a large sample of simulated data. We find that it is sensitive to experimental parameters such as the level of uncorrelated noise on each sensor, the scale factor ratio and the differential phase. For instance, figure 3(b) shows the dependance of φ 1 ,φ 2 on φ d for synthetic fringes that contain non-common phase noise with a standard deviation of 0.1 rad. The correlation coefficient yields a maximum when the interferometers are perfectly in phase and or π radians out-of-phase. This is an ideal feature for WEP tests, since the maximum sensitivity occurs exactly at the expected signal of φ d = 0. This implies that, unlike ellipse-fitting methods where the sensitivity is optimized at φ d = π/2, one does not need to engineer an additional phase shift between the atoms to optimize the sensitivity and reduce systematic bias. Furthermore, a recent study of a gradiometer configuration (i.e. κ = 1) has shown that the differential FRAC method can reach sensitivities close to the quantum-projection-noise limit when modest levels of uncorrelated phase noise are present [70]. A number of ideal features make this technique interesting for both absolute and differential atom interferometry experiments.
1) The differential FRAC estimate of φ d is precise and unbiased over the full phase range φ d ∈ [0, π], since it relies on least-squares fits to individual fringes.
2) It is simple, fast, and computationally low in cost-allowing the interferometer phase to be corrected in real-time [69], or by post-processing the data [65,66,68].
3) Unlike the Bayesian analysis, the FRAC method does not require any a priori information about the interferometer offsets, contrasts, and noise parameterswhich can be challenging to measure accurately in situ.
4) Systematic phase shifts in φ d due to non-identical pulse durations τ j and Rabi frequencies Ω eff j [14] are accounted for in the estimates of φ vib j for each interferometer. Such systematics will be important to consider in future long-baseline differential interferometry experiments [50,43,44,51,52,53,6,45].

5) The relative timing between coupled interferometers can be freely chosen-they
need not be overlapped. This is a unique feature to dual-species interferometers that do not share the same Raman beams. Unlike the ellipse-fitting and Bayesian techniques, the FRAC method allows one to extract absolute phase information from each sensor. Varying the temporal overlap between interferometers can be useful for studying a variety of effects, such as the level of correlation between sensors, or systematics related to the interaction between atoms [54].
Although the standard FRAC method is conceptually simple to implement, the drawback is that it is sensitive to errors in the measurements of vibrations. Such errors include the quality of coupling between the mirror and the mechanical device, noise in the signal acquisition, the level of self-noise of the device, drifts in the offset or sensitivity factor, and non-linearities in both the amplitude and frequency response. On the other hand, measurements of φ d using the differential FRAC method are much less sensitive to many of these effects, since they are common to two simultaneous interferometers. We discuss the limitations of the method in more detail in sec. 7.

Description of the ICE experiment
ICE (Interférométrie Cohérente pour l'Espace) is an experiment that aims to measure η using a dual-species interferometer of 87 Rb and 39 K. It is designed to be transportable and to operate in the micro-gravity environment provided by the Novespace Zero-g plane [71,47,68,40]. In this section, we give a brief description of the experimental setup. A detailed description of the telecom-frequency fiber-based laser system used on ICE can be found in refs. [72,40]. For each atomic species, we utilize a master-slave architecture, where the master laser diode is locked to either a saturated absorption peak (in the case of rubidium), or to a frequency comb (in the case of potassium). The slave lasers are frequency-locked to their corresponding master through an optical beatnote in the 1550 nm telecom band. After second harmonic generation to 780 nm for 87 Rb and 767 nm for 39 K, the frequency of each slave laser can be precisely adjusted over ∼ 1.3 GHz within ∼ 2 ms of settling time. Approximately 1.5 W of total light is available in each slave beam before entering a free-space optical bench. This module is composed of a series of shutters and acousto-optic modulators (AOMs) that are used to split, pulse and frequency shift the light appropriately for cooling, state preparation, interferometry and detection. Finally, the 780 and 767 nm light is coupled into a series of single-mode, polarization-maintaining fibers and sent to the vacuum chamber. The two frequencies required for cooling and repumping, as well as driving Raman transitions in 87 Rb, are generated via a broadband fiber-based electro-optic modulator operating near 6.8 GHz. Similarly, an AOM operating in dual-pass configuration at ∼ 230 MHz is used to generate these frequencies for 39 K.
The sensor head is composed of a non-magnetic titanium vacuum chamber surrounded by a µ-metal shield. The chamber resides within three nested Helmholtz coils used compensate residual magnetic fields and to generate a bias along the vertical axis. A custom 2-to-6 way fiber splitter is used to combine the 780 and 767 nm light intended for laser cooling without significant power loss via a polarizing cube and a dichroic wave plate. The splitter subsequently divides the light equally into six beams that are recoupled into independent fibers used for the dual-species vapor-loaded magneto-optical trap (MOT). In a similar way, light for both detection and interferometry is overlapped in a free-space 2-to-1 way fiber combiner for 780 and 767 nm. The ∼ 2 cm diameter beams output from the combiner have the same linear polarization, and are aligned along the vertical direction through the atoms. A quarter-wave plate (fabricated for the intermediate wavelength of 773 nm) rotates the polarization of the Raman beams by 90 • such that the counter-propagating fields have lin⊥lin polarization.
A typical experimental sequence for the K-Rb interferometer is shown in figure 4 and is carried out as follows. The MOT beams load approximately 2×10 8 (7×10 7 ) atoms  in 0.5 s, which is followed by a 7 ms (5 ms) molasses cooling stage for the 87 Rb ( 39 K) sample. In addition to cooling, the rubidium molasses stage also pumps the atoms into the |F = 2 ground state. This is followed by a microwave π-pulse that transfers atoms into |F = 1, m F = 0 , and the remaining atoms are removed with a push beam resonant with the F = 2 to F = 3 transition. During the potassium molasses, the frequency and intensity of the cooling and repump beams are modified in a similar manner to refs. [73,74]. At the end of the molasses, the atoms are in a superposition of both hyperfine ground states, which is a critical part of the cooling mechanism for potassium [73]. We detune our 767 nm push beam to the red of the F = 2 to F = 3 transition by ∼ 17 MHz (2.9 Γ) to optically pump the atoms into the F = 1 level with a 3 µs pulse. Following this depumping stage, the 39 K atoms are distributed roughly equally amongst the magnetic sub-levels of the lower hyperfine ground state. With this system, we achieve temperatures of ∼ 3 µK for 87 Rb and ∼ 20 µK for 39 K, as confirmed by both time-of-flight imaging and velocity-sensitive Raman spectroscopy. After preparing the internal atomic states, we typically wait ∼ 12 ms for the atoms to fall such that the Doppler resonance of both sets of counter-propagating Raman beams becomes nondegenerate. Additionally, we apply an external magnetic bias field between 1 − 2 Gauss to shift the |F = 1, m F = ±1 states of potassium away from the central m F = 0 state on which we perform interferometry. The frequency of the Raman beams for both species is detuned by −1.2 GHz (−200 Γ) relative to the F = 2 to F = 3 transition. We then apply the interferometry pulses in a symmetric fashion, such that the central π-pulse for both interferometers occurs at the same time, as shown in figure 4. The delay between the π/2 pulses for either atom, ∆T Rb,K , can be adjusted within the interrogation time of the rubidium interferometer, T Rb , in order to study correlations and effects related to the scale factor ratio, κ. Finally, we measure the atomic state populations for each atom via fluorescence detection on an avalanche photodiode (50 MHz bandwidth) within 100 µs of one another.

Experimental results
We now describe some experimental results obtained from the K-Rb interferometer. All of the data presented in this work were recorded in a laboratory environment, with the interferometer beams aligned along the vertical direction, and with no anti-vibration platform. To compensate for the Doppler shift due to gravity, the frequency difference between Raman beams for interferometers j = 1 ≡ K and j = 2 ≡ Rb is chirped at a rate of α j k eff j g to account for the gravity-induced Doppler shift of the falling atoms. This modifies the total phase shift of the interferometers from eqs. (2) The last expression represents the case when both interferometers experience the same acceleration, a j = a = g, and the scale factors can be approximated as S j k eff j T 2 j . Determining the location of the central fringe, for which α j = k eff j g is fixed for all T j , yields a precise measurement of g. Since the sensitivity of the interferometer scales as T 2 j , the gravitational acceleration can be estimated with high precision [23,75,76,67]. As discussed in the introduction, we are interested in measuring the differential acceleration ∆a between 39 K and 87 Rb. One way of achieving this is to measure the gravitationally-induced accelerations g K and g Rb from each interferometer independently by scanning the chirp rates, α j , in a low-noise environment. This is the approach recently employed for WEP tests with 39 K and 87 Rb by Schlippert et al [16]. However, at high levels of sensitivity (i.e. large T j ), or in "noisy" environments, mirror vibrations can corrupt the fringes-making individual phase measurements more challenging. We now demonstrate the utility of the FRAC technique for measuring g from a single interferometer under these conditions.
There are typically two approaches in which the FRAC method can be applied to restore the interference fringes of a single interferometer. The first approach is to let the interferometer phase be "scanned" randomly by vibrations while the laser-induced phase is held fixed. The reconstructed fringes in this case are purely a function of φ vib j , as shown in figure 2. This mode of operation can be used to precisely calibrate the mechanical accelerometer by rescaling the voltage-to-acceleration sensitivity factor of the device such that the fringe period is 2π §. The second approach is to scan the interferometer phase in a controlled manner, for example by varying the phase difference between Raman lasers, and to correct each phase using φ vib obtained during the same measurement interval. This procedure is illustrated in figure 5, where the § One advantage of performing this procedure is that the device can be precisely calibrated for the vibration spectrum on site. Depending on the bandwidth and spectral response of the device, the sensitivity can vary significantly with the vibration spectrum. The solid curve is a least-squares fit to the corrected data, resulting in a signal-tonoise ratio of ∼ 30 and a relative statistical uncertainty of 10 −7 in the determination of g Rb -corresponding to almost an order of magnitude improvement compared to the raw data.
fringes of a T = 25 ms 87 Rb interferometer are shown before and after applying the FRAC correction. Here, the interferometer is operated without any vibration isolation in the presence of a root-mean-squared (rms) vibration noise of a vib rms 6 × 10 −5 m/s 2 (integrated over the frequency response of the interferometer)-corresponding to an rms phase noise of φ vib rms 0.6 rad. Acceleration measurements were performed with a forcebalance three-axis accelerometer (Nanometrics Titan, DC to 430 Hz bandwidth, 5 V/g sensitivity). By applying the FRAC correction to these data, we improve the signalto-noise ratio (SNR) and hence the uncertainty in the central fringe measurement by almost an order of magnitude. We estimate an individual phase correction uncertainty of δφ vib = 1/SNR 33 mrad based on the improved SNR of ∼ 30. With this method, we emphasize that the interferometer sensitivity is directly linked to the intrinsic noise of the accelerometer + signal acquisition system, and the quality of the coupling between the device and the Raman mirror. Therefore, modest improvements to any of these system components can result in a dramatic increase in the fringe SNR.

K-Rb Interferometer Correlation
Typically, when mirror motion is the dominant source of phase noise it is advantageous to use differential atom interferometry techniques to measure ∆a through the differential phase φ d . This requires a high level of correlation between interferometers in order to  (2) rad. (d) Interferometer fringes reconstructed from measurements of mirror motion using the FRAC method. The red and blue curves correspond to least-squares fits to Rb and K data, respectively. The differential phase estimated from the fits is φ FRAC d = 1.17(1) rad. Other interferometer parameters: pulse separations: T Rb = 3.018 ms, T K = 3 ms; π/2-pulse durations: τ Rb = 4 µs, τ K = 6 µs; delay between interferometers: ∆T K,Rb = 10 µs; one-photon Raman detunings: reject the common-mode phase noise. We now compare three methods of extracting φ d from experimental data recorded in an environment with high vibrational noise, as in the case of onboard applications [68,57]. These studies are also applicable to future high-sensitivity differential interferometers operated in low-noise environments [43,44,45]. Figure 6 shows data produced by quasi-simultaneous K-Rb interferometers at a total interrogation time of 2T = 6 ms. Here, we held the chirp rate fixed at α j k eff j g for each species, and we applied strong vibrations to the system (a vib rms 0.05 m/s 2 ) such that the random vibration-induced phase φ vib j spanned multiple fringes (φ vib rms 7.3 rad). Figure 6(a) shows a histogram of 87 Rb |F = 2 population measurements, y Rb , which clearly indicates the characteristic bimodal probability distribution of a sinusoid. These distributions can be used to estimate the contrast, offset and SNR of the interferometer fringes as described in ref. [68]. We note that the bimodal distribution is less pronounced for 39 K in figure 6(b) owing to a smaller fringe contrast, and thus a lower SNR, compared to 87 Rb. Despite this fact, the two sensors exhibit strong correlations, as confirmed by the ellipse in figure 6(c).
For these experimental parameters the scale factor ratio is κ = S K /S Rb = 1.008, and the Lissajous curve formed by parametrically plotting the atomic state populations, y Rb and y K , is indistinguishable from an ellipse at the present level of offset noise. We measure a differential phase of φ ellipse d = 1.13(2) rad from a least-squares fit to an ellipse using the FGEF method [58]. We also estimate φ Bayes d = 1.18(2) rad using the Bayesian analysis described in sec. 3.2 and Appendix B. Here, it is worth mentioning that that this non-zero differential phase does not originate from a WEP violation, but from systematic phase shifts in the experiment-primarily due the quadratic Zeeman effect from an external magnetic field. Figure 6(d) shows the output of each interferometer as a function of the vibrationinduced phase, φ vib j . Here, the single-sensor fringes were reconstructed using the FRAC method using mirror vibration measurements from a broadband micro-electromechanical accelerometer (Colibrys SF3600, DC to 1 kHz bandwidth, 1.2 V/g sensitivity). From these data the differential phase shift between interferometers is clearly visible. Sinusoidal least-squares fits to each fringe yield φ FRAC d = φ K − κφ Rb = 1.17(1) rad. Here, the statistical uncertainty δφ FRAC d was computed from the quadrature sum of each interferometer phase error. The value of φ d estimated from the Bayesian analysis and the FRAC method are in good agreement. On the other hand, the differential phase from the ellipse fit is underestimated by ∼ 40 mrad, i.e. 2σ below φ Bayes d and φ FRAC d . We attribute this discrepancy to the inherent bias of ellipse-fitting techniques (see Appendix A), which increases with the level of offset noise or differential phase noise in either interferometer.
We emphasize that a crucial input parameter for the Bayesian analysis is the common phase range. We use the accelerometer data to estimate this range once the experiment is complete: φ c ∈ [min(φ vib Rb ), max(φ vib Rb )]. However, if an accelerometer is not available, it is also possible to estimate this range using the raw data from a single interferometer. For example, one can reduce the interrogation time until the sensitivity to vibrations reaches a point where interference fringes are clearly visible. By measuring the rms scatter of the phase about a reference sinusoid, one can estimate the level of vibration noise via the relation a vib rms = φ vib rms /S. Once a vib rms is known, this relation can be inverted to determine the range of phase scanned by the same level of vibrations at larger sensitivities/interrogation times.
The data shown in figure 6(d) also indicate that the combined differential-atomicsensor + mechanical-accelerometer system is capable of efficiently rejecting common vibrational noise. We estimate a rejection factor of γ = k eff a vib rms T 2 /δφ FRAC d 730 for these data. This represents a small improvement compared to γ = 550 reported in ref. [14] with 85 Rb and 87 Rb-where a high rejection factor is expected since the wave vectors are effectively the same. Figure 7 displays the results of a correlation study between rubidium and potassium interferometers operating at a total interrogation time of 2T = 20 ms. Similar to figure 6, the interferometer phases are scanned by externally applied vibrations (a vib rms 1.6×10 −3 m/s 2 , φ vib rms 2.6 rad at T Rb = T K = 10 ms). Here, we vary the interrogation time of potassium, T K , in a symmetric way with respect to rubidium such that the centers of the π-pulses coincide. This optimizes the degree to which the vibration-induced phase The symmetric, quasi-simultaneous K-Rb interferometer was operated with T Rb = 10 ms and the interrogation time for potassium was varied between T K = 6 − 10 ms. The interferometer phase was scanned by externally applied vibrations, and individual fringes were restored using the FRAC method. Parametric plots of the atomic populations, along with the expected Lissajous curve (solid green line). These curves result from plotting the fit functions to each reconstructed fringe parametrically. There is a clear disagreement between the predicted Lissajous curves and the data for T Rb − T K 2 ms. Other interferometer parameters: τ Rb = τ K = 3 µs; noise remains common-mode, while modifying the degree of temporal overlap between interferometers. It also allows us to control the scale factor ratio since κ scales at (T K /T Rb ) 2 .
From figure 7, three features are clearly visible as T K is decreased. First, the potassium fringes undergo a phase shift that modifies the differential phase relative to the rubidium fringes. This feature, along with the fact that the scale factor ratio is varied, causes the shape of the Lissajous figures to change, as shown by the solid green curves. Second, the phase range scanned by the potassium interferometer reduces, since it scales as T 2 K . Finally, the level of correlation between the interferometers degrades as the temporal overlap decreases. This is evident from the lack of agreement between the data and the predicted Lissajous curves, particularly for T K 8 ms.
Regardless of this degradation of correlation and temporal overlap between interferometers, the differential FRAC method is able to restore the interference fringes with a good SNR (∼ 30 for 87 Rb, ∼ 10 for 39 K, limited by uncorrelated offset noise). This permits accurate, unbiased estimates of φ d with a statistical uncertainty at the level of δφ d ∼ 20 mrad with 300 points. The robustness of the differential FRAC technique makes it an ideal candidate for future WEP tests [52,53,6], or other differential atom interferometry applications [39,33].
In contrast, for Bayesian estimation, an increase in uncorrelated phase noise is problematic. When the "common phase" becomes largely uncorrelated, the Bayesian method can converge on multiple possible φ d , or may not converge at all. For these data, we find that by T K = 9 ms the Bayesian estimate of φ d is not consistent with the FRAC estimate, and for T K 8 ms the analysis is not able to converge on a unique value. We note that these particular results are strongly dependent on the level of phase noise, the degree of temporal overlap, the value of φ d and the scale factor of each interferometer. In the following section, we study some of these dependencies more quantitatively.

Comparison of Bayesian and FRAC methods as a function of κ and φ d
We have tested the functionality and accuracy of both the Bayesian and FRAC methods for extracting φ d from experimental data acquired under various conditions. Specifically, we are interested in the accuracy of these techniques over (i) the full range of differential phase φ d ∈ [0, π], and (ii) a broad range of interferometer scale factor ratios κ = S K /S Rb . To investigate these two aspects, we recorded data using the symmetric interferometer configuration shown in figure 4 with different interrogation times, T Rb and T K . Since κ is proportional to (T K /T Rb ) 2 , each configuration of T j corresponds to a different scale factor ratio. Additionally, the differential phase is modified with each T K due to a systematic phase shift of the potassium interferometer from an external magnetic field. Therefore, we are able to study both effects with a single data set. Figure 8 shows a comparison between Bayesian and FRAC estimates of φ d , using the FRAC estimate as a reference. We varied T Rb from 1 to 5 ms, and T K independently in the vicinity of T Rb such that the scale factor ratio was modified over a relatively broad range (κ 0.45 to 1.01). The phase noise due to the externally applied vibrations was kept quasi-common-mode between sensors by ensuring that T K was within a few 100 µs of T Rb . Over this range of T Rb and T K , we found that the differential phase ranged from roughly φ d = 0 to 2.8 rad as a result of a systematic shift of the potassium interferometer. It is clear from figure 8 that there is a high degree of correlation between the Bayesian and FRAC estimates, which is consistent with our expectations based on the simulations discussed in sec. 3.2. The error bars in this figure were computed from the combined statistical uncertainties of both methods, which both typically yield δφ d ∼ 30 mrad at the present level of noise.
To summarize, we find that the difference between the two estimates is consistent with zero within a typical total uncertainty of ∼ 40 mrad. These data confirm that the . A value of σ φ d = 0.05 rad was used for the differential phase noise of all data sets. The range of common phase noise was estimated from accelerometer measurements. two analysis techniques produce unbiased estimates of φ d for dual-species interferometers with vastly different scale factors. We discuss further the advantages and limitations of these two techniques in the following section.

Advantages and limitations of the methods
As discussed in sec. 3.2, Bayes' method is optimally efficient and yields a statistical error that scales as 1/ √ N , compared to more heuristic fitting techniques which converge more slowly. This improved efficiency is a clear advantage of the Bayesian estimator compared to the FRAC analysis. However, the disadvantage is that it requires a priori information about the system, such as noise levels and interferometer contrasts, and it requires significant computational resources to evaluate. Furthermore, it is only a viable solution for simultaneous interferometer configurations that exhibit a high-degree of phase correlation.
In contrast to the Bayesian estimator, the FRAC method requires only the interferometer timing parameters and a sensitive accelerometer that is well-coupled to the reference mirror in order to function accurately. It does not assume any particular interferometer configuration or require any additional system information. The FRAC method also has applications in absolute interferometry, as has been previously demonstrated in refs. [65,66,68,25,67]. Additionally, it is fast enough to be used for real-time feedback, which has been shown to improve single-sensor sensitivity [69].  Table 1. Comparison between phase noise for a single interferometer, and two coupled interferometers with effective wavevectors k eff Rb and k eff K . The rms phase noise (in radians) due to vibrations (φ vib rms ) and the self-noise of the mechanical accelerometer (φ self rms ) are shown for different frequency bands and interrogation times, T Rb . The noise from each band is summed in quadrature to obtain the total noise. Contributions less than 1 mrad are not shown. For the simultaneous differential sensor, it is assumed that k eff Rb T 2 Rb = k eff K T 2 K . The rms phase noise was computed from eq. (C.10) for the single sensor and eq. (C.11) for the differential sensor using a model for the power spectral density of ground accelerations, S a (ω), in a "quiet" location [65,66] with integrated rms noise 1.4 × 10 −4 g. The self-noise of the accelerometer was assumed to be white noise with an rms value of a self rms 3.2 × 10 −8 g/ √ Hz. Quantities in the last row indicated by "*" correspond to low-noise conditions that can be achieved with passive vibration isolation (integrated rms noise 1.4 × 10 −6 g), and an accelerometer with 10 times smaller self-noise.
vibration noise (φ vib rms ) and the self-noise of the accelerometer (φ self rms ) are shown for various frequency bands and interrogation times. For a single sensor analyzed with the FRAC method, the vibration-induced noise φ vib rms represents the spread of phase on the uncorrected fringes, while the quantity φ self rms indicates the residual phase noise present on the corrected fringes. Since this term is directly linked to the intrinsic noise of the mechanical accelerometer, it represents a fundamental limitation of the method. To give some numbers, based in the self-noise of the Titan accelerometer used in our experiments (3.2 × 10 −8 g/ √ Hz), the corresponding phase noise reaches ∼ 90 mrad for an interrogation time of 100 ms, and ∼ 3 rad by T Rb = 1 s. With this level of selfnoise, fringes cannot be reconstructed accurately. However, for a state-of-the-art device with an order of magnitude smaller self-noise (3.2 × 10 −9 g/ √ Hz), the phase noise decreases by a further factor of 10. We also point out that the noise contributions from both vibrations and self-noise are smallest at high frequencies-a result of the natural low-pass filtering of atom interferometers. Thus, high-bandwidth accelerometers are generally not required to implement the FRAC method with a single sensor.
On the other hand, when employing the differential FRAC method with two simultaneous interferometers, one can measure the differential phase significantly more accurately than the self-noise limit φ self rms . This is because the noise introduced by the accelerometer is correlated between the two interferometers-reducing the uncertainty in the determination of φ d , as discussed in sec. 3.3. A recent study [70] has shown that uncertainties close to the quantum projection noise limit can be obtained with this method when the interferometers are in phase (φ d = 0), and the accelerometer exhibits a conservative level of self-noise. For these reasons, we emphasize that state-of-the-art mechanical accelerometers are not required to make sensitive measurements of φ d with long-baseline differential interferometers, and we anticipate that competitive levels of accuracy can be achieved with readily available devices.
For two coupled interferometers exhibiting different wavevectors, the vibration phase noise is not identical and thus cannot be perfectly rejected at all frequencies.
The values of φ vib rms listed in the last four columns of Table 1 contribute directly to φ drepresenting the level of uncorrelated differential phase noise in the system. We estimate that by T Rb = 1 s the differential phase noise reaches a level of ∼ 3 rad. However, we note that the differential transfer function (eq. (C.11)) rejects most efficiently at frequencies below ∼ 1/T , and this estimate is directly linked to the vibration spectrum used. In a quieter environment, such as that achieved with a vibration isolation platform [65,66] or in a satellite [51], the phase noise can be reduced by an order of magnitude or more. At this point, the Bayesian method can be employed-which easily handles differential phase noise. Since the sensitivity scales as φ vib rms / √ N , the analysis simply requires more measurements for larger φ vib rms to reach a given level of precision.

Conclusion
We have described and demonstrated experimentally two new analysis techniques for extracting the differential phase from coupled atom interferometers with different scale factors, S j . A non-unity ratio κ = S 1 /S 2 can result from using atoms with different k eff j , or from interferometers with different interrogation times, T j . We have also carried out correlated phase measurements between simultaneous interferometers of two elements exhibiting different scale factors, and we have demonstrated a vibration rejection factor of γ 730. This system was used to validate the Bayesian and FRAC analysis methods, as well as a new ellipse fitting procedure [58], for extracting φ d .
Both the generalized Bayesian and differential FRAC methods yield unbiased estimates of φ d for any scale factor ratio, κ, and are robust against experimental parameters such as the common phase range scanned by the two interferometers, or the level of uncorrelated offset noise present in the system. These features make both methods ideal for applications of dual-species interferometry where, until now, the available analysis tools could accommodate only systems that exhibit low common phase noise or κ = 1. These new methods are also appealing for gradiometer configurations using the same atoms and the same T j [70], which have previously been utilized for precisely measuring G [26,30,60,27,28].
The freedom to vary the scale factor, the interrogation time or phase of either interferometer independently can be advantageous for studying systematic effects, interactions between atomic species [54], or for shifting the differential phase toward a region of higher sensitivity. Examples of such regions include φ d = π/2 in the case of ellipse-fitting methods, and φ d = 0 or π for the FRAC technique [70]. Both the FRAC and Bayesian methods also eliminate the systematic shift introduced on the measurement of ∆a when using dual-species interferometers with κ = 1-making them well-suited for upcoming WEP tests on ground [43,44,45], in microgravity [46,47,48,49,40], and in Space [50,51,52,53,6]. A precise determination of η with our apparatus is beyond the scope of this work, but will be the subject of a future publication.

Appendix A. Ellipse fitting methods
In this appendix, we give some background regarding ellipse-fitting techniques and illustrate the problem of parameter bias for two different fitting algorithms.
The general form of an ellipse in a cartesian plane is described by the algebraic equation for a conic F (λ, y) = λ · y = Ay 2 1 + By 1 y 2 + Cy 2 2 + Dy 1 + Ey 2 + F = 0, (A.1) provided that B 2 < 4AC. Here, λ = {A, B, C, D, E, F} and y = {y 2 1 , y 1 y 2 , y 2 2 , y 1 , y 2 , 1}. The center, orientation, major and minor axes of the ellipse are determined by the elements of λ, and the differential phase can be shown to be Generally, two types of ellipse-fitting algorithms exist: those that seek to minimize (i) an algebraic distance or (ii) a geometric / orthogonal distance between the ellipse and the data points. While algebraic methods tend to be simple, efficient and can guarantee an ellipse solution to the conic equation (A.1) (i.e. parabolic and hyperbolic solutions  Figure A1. (Color online) (a) Synthetic data following an ellipse with added offset noise. The solid green curve represents the actual ellipse, and fits to the data using the DEF method (red curve with big dashes) and FGEF method (blue curve with small dashes). The simulated ellipse contains 500 points with Gaussian-distributed noise on the offset parameters B j with standard deviations {σ B1 , σ B2 } = {0.01, 0.03} (corresponding to SNR ∼ {20, 6}). Ellipse parameters: Measured bias in differential phase estimates, φ est d , from the DEF (red triangles) and FGEF (blue points) methods relative to the actual value, φ act d . The black squares show the estimates from the differential FRAC method for comparison. On all plots, the error bars correspond to the statistical distribution of fits to 100 synthetic data sets.
can be eliminated), they tend to suffer highly from bias in the ellipse parametersresulting in a poor fit under certain circumstances. Geometric methods are usually much more accurate than algebraic algorithms, but at the cost of more complexity, more computation and less stability. Since minimizing the orthogonal distance between a point and an ellipse has no closed-form solution, these routines resort to iterative techniques that are not guaranteed to converge on an ellipse.
A commonly used algebraic method is the simple and robust "direct ellipse fitting" (DEF) method developed by Fitzgibbon et al [61] that minimizes the sum of squared algebraic distances between the points and the ellipse, N i=1 F (λ, y i ) 2 , subject to the constraint B 2 = 4AC − 1. Recently, Szpak et al [58,62] developed an algorithm based on the optimization of the approximate maximum likelihood distance which seeks a balance between the costly geometric methods and stable algebraic techniques. This algorithm-termed the "fast guaranteed ellipse fitting" (FGEF) method-also includes error estimation for the geometrically meaningful ellipse parameters (center coordinates, axes and orientation) which we have extended to include an estimate of the differential phase error, δφ d . Figure A1 illustrates the bias introduced on the differential phase estimated by the DEF and FGEF methods. For moderate amounts of noise in the offset, the DEF method tends to produce fits that are characteristically compressed along the major axis and stretched along the minor axis of the ellipse, shown as the red curve in figure A1(a).
This effect results in a biased estimate of φ d that increases monotonically away from π/2, as shown in figure A1(b). In contrast to the DEF method, the FGEF algorithm predicts an ellipse (shown in blue) that is much more representative of the actual ellipse (shown in green), and also results in less bias in φ d in the central region around π/2. Outside of this region, the bias behaves nonlinearly in a manner that depends on the ellipse parameters and the level of noise. Here, we point out that these bias estimates are dependent on the type of noise (offset, amplitude, or differential phase) and the amount of noise present in the data, but typically the bias is smallest in the vicinity of φ d = π/2, and decreases with the noise level. In general, ellipse-fitting techniques always generate a non-zero systematic on the differential phase estimate, and depending on the level of sensitivity, this bias must be carefully accounted for when performing precise measurements with φ d [30,60,27,33].

Appendix B. Bayesian analysis of Lissajous curves
In this appendix, we describe in detail our generalized Bayesian analysis technique to estimate the differential phase from Lissajous curves. We also demonstrate the effectiveness of this method using numerically simulated data with Gaussian noise in the offset parameters {B 1 , B 2 } and the differential phase, φ d . Noise in the amplitude parameters {A 1 , A 2 } of the coupled-sensor model (2) can also be included via a trivial modification of the noise model. In what follows, we first provide some relevant theoretical background of the Bayesian estimation technique. In a generalized system, where M represents a measurement of the system quantities and V represents a variable we are interested in measuring, Bayes' rule can be summarized by the following equation Here, P (V |M ) is called the "posterior" probability distribution and represents our state of knowledge after a measurement, M . p(V ) is the "prior" probability before the measurement, and L(M |V ) is called the "likelihood" to obtain a certain result for M given V . The key to the entire estimation process is the likelihood distribution, which is computed based on a specific model of the noise present in the system. The quantity N (M ) = V L(M |V )p(V ) is the probability of measuring M integrated over all possible values of V , and is just a normalizing factor for the posterior distribution. Mathematically, L(M |V ) can be thought of as a function of V with M fixed, and vice versa for P (V |M ). The essence of Bayes' rule is that knowledge of the variable V can be updated on a measurement-by-measurement basis-with each successive measurement contributing additional information that narrows the width of the probability distribution associated with V . A well-known example of this type of recursive analysis is a Kalman filter [77], which is used extensively in the fields of guidance, navigation and trajectory optimization.
For the specific case of two coupled atom interferometers, the variable of interest is φ d and the i th system measurement is given by the pair of (normalized) atomic state populations M i = {n 1 , n 2 } i . Thus, for a single measurement eq. (B.1) becomes where P (φ d |{n 1 , n 2 } i ) i is referred to as the conditional distribution based on the i th measurement. The basic algorithm for Bayes' estimation can be summarized as follows: 1) Choose a suitable initial prior distribution, p(φ d ) i=1 . In our case, we take this to be a uniform distribution within the range φ d ∈ [0, π], and zero elsewhere.
2) Record a new measurement {n 1 , n 2 } i , and calculate the likelihood distribution L({n 1 , n 2 } i |φ d ) i from the noise model.

5) Repeat steps 2) through 4) until the width of the conditional distribution reduces
to the desired level.

The likelihood distribution
The main challenge in Bayesian analysis is to compute the likelihood distribution L {n 1 , n 2 }|φ d given a specific model for n 1 and n 2 . For the specific case of coupled interferometers, there are three possible sources of noise: amplitude, offset and differential phase. To illustrate each source, we modify the definitions of the n j in eq. (8) to explicitly include these noise terms The parameters δA j , δB j , and δφ d represent uncorrelated noise in the amplitude, offset and differential phase, respectively, each of which is assumed to follow a Gaussian probability distribution with zero mean and non-zero standard deviation. Using this model, the likelihood distribution can be shown to be [64] L {n 1 , n 2 }|φ d = Here, P (n 1 |s 1 ) and P (n 2 |{s 2, ; s 1 , φ d }) are the single-sensor conditional probability distributions for n 1 and n 2 , which we discuss in more detail below. The quantities s 1 ≡ cos(κφ c + φ d ) and s 2, ≡ cos(φ c ) are the principle variables on which the coupled measurements n 1 and n 2 depend in the model (B.3). Due to the periodic nature of the Lissajous equations (8), for each value of n 1 there are multiple possible solutions for n 2 (as shown in figure B1). We assign an integer to each of these solutions. More specifically, s 2, is the th root of n 2 given n 1 = s 1 . The sum over appearing in eq. (B.4) accounts for all possible solutions. In the distribution functions P (n 1 |s 1 ) and P (n 2 |{s 2, ; s 1 , φ d }), we denote the implicit dependence on variables s 1 and φ d by a semi-colon. This notation emphasizes that the quantity s 2, is coupled to s 1 through the common phase φ c . Finally, we point out that the coupled variables s 1 and s 2, both depend on φ d , but we do not write this dependence explicitly. At this point, we need to know the possible values n 2 = s 2, (given a measurement of n 1 = s 1 ) which enter into the likelihood distribution. We devote the remainder of this section to a detailed description of computing the roots of the Lissajous equations (8). As mentioned above, due to the non-linear nature of Lissajous curves, there are multiple possible solutions for n 2 given a single value of n 1 within a predefined phase range. We denote these solutions s 2, for integer . When κ = 1, the Lissajous curve collapses to an ellipse, and only two values of n 2 exist for each n 1 over any 2π range of φ c . In this case, it is straightforward to compute the two solutions as s 2,±1 = cos[cos −1 (s 1 ) ± φ d ]. However, when κ = 1, the problem is much more complex. If the scale factor ratio can be written in the form κ = p/q, where p and q are prime numbers, then the period of the Lissajous curve is 2πq-requiring q revolutions to form a closed loop. Within each 2π interval, there can be either 0, 1 or 2 solutions of n 2 for each n 1 , as illustrated in figure B1.
To calculate these solutions for a given n 1 = s 1 and φ d , it is necessary to know the approximate range of common phase spanned by the data: φ c ∈ [φ min c , φ max c ] ¶. With this information, we compute the range of phase spanned by sensor 1, θ ∈ κ[φ min c , φ max c ] + φ d , and we subdivide this range into intervals of π such that the th interval is defined as the range θ ∈ [ , + 1)π, where = θ/π . Here, the brackets · · · indicate the floor function. Beginning with the left-most interval, we check for solutions sequentially at each π phase bin until the entire range is spanned. Empirically, we find that if a solution exists within the th interval given a value n 1 , then it is unique and can be written explicitly as Since it is assumed that φ c is random and unknown, the probability distributions of s 1 and s 2, are equivalent to that of a sinusoid: P (s 2, |φ c ) = (1 − s 2 2, ) −1/2 . ¶ In practice, this range estimate does need not to be very precise-we find that estimating the correct range to within ±π still results in a precise estimate for φ d . However, overestimating the phase range may result in a slower convergence rate for the estimate. See sec. 5.1 for a description of how the phase range can be estimated experimentally. where the integers m 1, and m 2, are defined as With these solutions in hand, it is possible to compute the likelihood (B.4) given specific noise models for the single-sensor probability distributions P (n 1 |s 1 ) and P (n 2 |{s 2, ; s 1 , φ d }). We now investigate the specific cases of offset and differential phase noise on the extraction of φ d from simulated data sets. This analysis can also be extended to include noise in the fringe amplitudes through the parameters δA j [64], but we do not consider this case here.

Offset noise
When the system exhibits noise only in the offset of the atomic state measurements, the parameters δB j are randomly distributed for each repetition of the experiment, and δA j = 0 and δφ d = 0 in the model (B.3). Under realistic conditions, these noise parameters follow a Gaussian distribution with zero mean and standard deviations given by σ B j , and the single-sensor conditional probabilities can be written as (B.7b) Figure B2 shows some examples of simulated data in the presence of offset noise, where the differential phase has been extracted using the Bayesian estimation algorithm described above. These simulations show that φ d can be precisely estimated over the full range of 0 − π, and for a wide variety of scale factor ratios. Here, we demonstrate the technique for the limited range κ ∈ [0.6, 1.4], but we have also verified that the extraction method works well outside this range. In contrast to ellipse-fitting techniques, no systematic bias in the phase estimates is observed, and fewer points are required to converge to competitive error levels.

Differential phase noise
Since the noise parameter associated with the differential phase, δφ d , adds directly to the quantity of interest, φ d , we can account for this type of noise by adding an extra convolution with our noise model at the end of any likelihood calculation. We choose to examine the case of Gaussian noise for the differential phase, such that the conditional probability distribution is Here, φ d represents a measured value of the differential phase in the presence of Gaussian noise centered on the most likely value, φ d , and σ φ d is the standard deviation of the noise distribution. The modified likelihood function is described by the convolution In a similar fashion to the offset, in the absence of any other noise sources it is necessary to estimate multiple candidate solutions for φ d over a given range of φ c in order to compute the likelihood function. Before convolving with the conditional probability distribution in eq. (B.9), the likelihood function can be written as where δ(x) is the Dirac delta function, and the sum over k accounts for all candidate solutions φ d,k that exist in the common phase range φ c ∈ [φ min c , φ max c ]. These solutions can be computed by, again, dividing the phase range into intervals of π, and labeling each of them by an integer k = φ c /π . We find that two possible solutions exist for φ d within each interval, which we denote as φ (±) d,k for φ c ∈ [±kπ, ±(k + 1)π). Explicitly, these phases can be computed from where m k = (−1) k (|k| + 1)/2 . We transform these phases into the range of 0 − π using φ Two subtleties exist with this analysis, however, that warrant discussion. First, when the common phase range exceeds φ c ∈ [−π, π], the Bayesian analysis may predict multiple equally probable values for φ d . This is obviously a problem if we are interested in a precise, unique estimate of the differential phase, and we have no pre-existing knowledge of its value. Therefore, we restrict our consideration of the problem to a range of common phase within −π to π. Second, the noise parameter δφ d can theoretically take any value, i.e. δφ d ∈ (−∞, ∞), although in practice it is limited to a finite range defined by σ φ d . So far, we have considered φ d only in the range of 0 to π, but for situations where σ φ d π/4, the likelihood distribution can have significant contributions from the wings of the adjacent π phase intervals. This effect can be taken into account by using the fact that P (φ d ) = P (−φ d ) = P (2π − φ d ), and adding mirrored versions of the likelihood to the convolution in eq. (B.9). This "tiling" technique can be extended to account for large noise levels, where more than one π phase bin is spanned [64]. Figure B3 shows some examples of simulated data in the presence of differential phase noise. As for the case of offset noise, estimates of φ d exhibit no significant bias over the full range of 0 − π, and for a large range of scale factor ratios. Additionally, only a small number of points are required to converge to a level of uncertainty less than that of the noise defined by σ φ d . The convergence of this uncertainty as a function of the number of measurements is the subject of the next section.

Scaling with measurement number
To test the scaling of the statistical and systematic error of the Bayesian estimator as a function of the number of measurements, we performed the following study. We randomly generated M = 50 samples of "measurements", each containing 100 points following the model (B.3) with noise added to either the differential phase or the offset. As a function of the measurement number, N , within each sample, we computed the Bayesian estimate φ est d (N ) and the standard deviation of the associated probability distribution δφ est d (N ). The statistical error for each measurement is taken as the average of δφ est d (N ) over all M samples, which we denote as stat φ d (N ) = δφ est d (N ) M . Similarly, the systematic error is defined as sys The results are shown in figure B4.
For the specific case of noise that contributes directly to the variable of interest (e.g. differential phase noise) the statistical uncertainty of the Bayesian estimator is given by stat As we show in figure B4(a), the measured statistical error closely follows this dependence. Similarly, on average the systematic error drops to a level much less than stat φ d after only a few measurements. This level is primarily determined by the grid resolution used when computing the likelihood distribution for φ d . During the estimation procedure, we initially set the phase grid resolution to ∼ π/100, and we refine this grid size on a measurement-by-measurement basis. As the likelihood distribution narrows, grid points are redistributed toward the maximum likelihood value. We find that this grid optimization procedure can improve the resolution by up to an order of magnitude (depending on the level of noise in the system), while keeping the number of integral evaluations per measurement fixed. For the more general case of noise present in a parameter that is indirectly related to the quantity of interest through some function, the uncertainty is constrained by the Cramer-Rao lower bound This relationship can be used to compute the convergence of φ est d in the presence of offset or amplitude noise, for example, where the noise affects φ d indirectly through the quantities {n 1 , n 2 }. The Cramer-Rao lower bound includes the Fisher information, I(φ d ), of an individual measurement, which can be computed from the likelihood distribution L({n 1 , n 2 }|φ d ) as follows Here, the brackets · · · {n 1 ,n 2 } denote an average over the random variables {n 1 , n 2 }. The Fisher information is a measure of the amount of information that a random variable (or a set of random variables) carries about an unknown parameter. In this case, the unknown parameter of interest is φ d and the set of random variables is the set of measurements {n 1 , n 2 }, which are governed by the likelihood distribution L({n 1 , n 2 }|φ d )-hence its appearance in eq. (B.14). This quantity has no closed-form expression for the case of offset or amplitude noise in our system, and must be evaluated numerically. For the parameters used in figure B4( gives a convergence rate of σ est This rate agrees reasonably well with the measured statistical uncertainties shown in the figure. We note that the Fisher information empirically scales as I ∼ e −βσ B , where β is a large factor that depends on the differential phase and the scale factor ratio used (e.g. β ∼ 35 for κ = 0.8 and φ d = 1 rad). Thus, with only a moderate reduction to the level of offset noise in the system, one can dramatically improve the convergence rate of the Bayesian estimate.

Appendix C. Response of a dual-species interferometer to mirror vibrations
Here, we summarize the essential theoretical tools required to evaluate the response of both single-and dual-species interferometers to vibrational noise of the retro-reflection mirror.
First, we provide a review of the sensitivity function for a single atom interferometer, g(t). This function characterizes how the interferometer transition probability behaves in the presence of fluctuations in the Raman laser phase difference, ϕ L (t). Developed previously for use with atomic clocks [78], the sensitivity function is a useful tool that can be applied, for example, to evaluate the response of the interferometer to laser phase noise [56], or to correct for spurious vibrations in the Raman beam optics [68,57,40]. We are primarily interested in the latter.
The sensitivity function is a unitless quantity that is defined as follows where δϕ is a phase jump occurring at time t during the interferometer that modifies the total interferometer phase, Φ, by an amount δΦ, and the transition probability P (Φ) = (1 − cos Φ)/2 by a corresponding amount δP . Thus, the interferometer phase due to an arbitrary phase noise function, ϕ(t), can be computed as The quantum mechanical nature of the atom plays a crucial role on the sensitivity function-in particular, the evolution of the internal atomic states during each Raman pulse. Using the procedure outlined in refs. [56,40], the sensitivity function, g j (t), of an interferometer with timing parameters labeled with subscript "j" can be shown to be Here, T j is the interrogation time, τ j is a pulse duration, Ω eff j is the effective Rabi frequency associated with the two-photon Raman transitions, and ∆T j is a delay with  Figure C1. (Colour online) Weight functions, w j (t), described by the response function (C.5). These weights determine the phase shift associated with mirror vibrations in eq. (C.4). The pulse durations, τ j , satisfy Ω eff j τ j = π/2. The differential weight function, i.e. the difference between the red and blue curves, is shown in black.
respect to t = 0 that facilitates a difference in the start time between interferometers. It is assumed that Ω eff j τ j = π/2, such that the first and third interferometer pulses have pulse areas of π/2 with duration τ j , and the second is a π-pulse of duration 2τ j .
To evaluate the response of an interferometer to Raman mirror motion, the phase noise function is first expressed as ϕ j (t) = k eff j z(t), with z(t) representing the timedependent position of the mirror along the axis of the beams. Then, the phase shift of interferometer j due to movement of the Raman mirror is where a vib (t) =z(t) is the time-dependent acceleration of the mirror due to vibrations, w j (t) = k eff j f j (t) is a time-dependent weight function for the mirror accelerations, and f j (t) is called the response function associated with the j th interferometer. This function is given by the integral of the sensitivity function: f j (t) = − t 0 g j (t )dt , and can be evaluated as otherwise. (C.5) At its heart, eq. (C.4) is a generalization of the well-known interferometer phase shift due to a constant acceleration, a φ j = S j a = k eff j (T j + 2τ j ) T j + 4τ j π a k eff j T 2 j a. (C. 6) In this relation, the quantity S j k eff j T 2 j is equivalent to the integral of the weight function, w j (t), which determines how strongly the mirror vibration at time t contributes to the interferometer phase shift. This function is triangle-shaped, as shown in figure C1, which indicate that the phase contributions are smallest near t = ∆T j and ∆T j + 2T j + 4τ j , where the wavepacket separation is a minimum. Similarly, the weights are largest near the mid-point, t = ∆T j + T j + 2τ j , where the separation between the interfering states is a maximum.
For the case of two coupled interferometers, the differential phase shift resulting from mirror vibrations can be expressed as where the differential weight function, w d (t), is given by the difference between the single-sensor weight functions This function has an intuitive understanding. For the extreme case when k eff 1 = k eff 2 and the two interferometers are perfectly overlapped (i.e. ∆T 1 = ∆T 2 , T 1 = T 2 , τ 1 = τ 2 ), w d (t) is zero everywhere. This implies that the differential phase shift due to mirror motion is φ vib d = 0-corresponding to perfect common-mode phase noise rejection. In the opposite extreme, when either k eff 1 = k eff 2 or the interferometers are not well-overlapped, vibration noise induces a differential phase shift φ vib d between the two sensors given by eq. (C.7). This non-zero phase shift is directly responsible for uncorrelated contributions to φ d in the case of non-overlapped interferometers, and it explains the loss of commonmode rejection in the case of coupled interferometer with different scale factors. For the case of a constant acceleration, eq. (C.7) can also be used to derive the systematic shift δφ sys d = (S 1 − S 2 )a resulting from interferometers exhibiting S 1 = S 2 . One can characterize how mirror vibrations with a given frequency spectrum affect each interferometer by computing the mean-squared phase noise Here, S a (ω) is the power spectral density of acceleration noise on the mirror, and H j (ω) is the transfer function associated with interferometer j given by the Fourier transform of w j (t). The transfer function describes how acceleration noise at a given frequency affects the phase over the duration of the interferometer. For frequencies ω Ω eff j and pulse separations T j τ j , this function is well-approximated by H j (ω) = −ie −iω(∆T j +T j +2τ j ) k eff j T 2 j sinc 2 ωT j 2 (C.10)  Figure C2. (Colour online) Normalized transfer functions, |H j (ω)|/k eff j T 2 j , described by eq. (C.10) for coupled K-Rb interferometers. These functions determine the response of the single-species (red curve) and dual-species (black curve) interferometers to acceleration noise at different frequencies, ω. Here, T 1 = 1 s and T 2 = (1 − )T 1 , with = 1 − k eff 1 /k eff 2 0.0087, and τ 1 = τ 2 = 10 µs.
For the dual-species interferometer, one uses the differential transfer function in the same fashion: H d (ω) = H 1 (ω) − H 2 (ω). (C.11) These functions are shown in figure C2 for realistic experimental parameters associated with a K-Rb interferometer. Here, there is a clear difference between the transfer functions associated with single-species and dual-species interferometers. For the individual sensors, the transfer function is well-approximated by the square of a sinc function-which exhibits regular zeroes at the fundamental frequency 1/T j and an envelope that decreases as (2/ωT j ) 2 . This dependence implies that the interferometer naturally filters the high-frequency components of the vibration spectrum, with a −3 dB cut-off frequency of ω cut j /2π = √ 2/πT j 1/2T j . The differential transfer function, on the other hand, has a much more complicated frequency dependence. We will focus on the most interesting case for WEP tests, i.e. when the wavevectors satisfy k eff 1 = (1 − ) 2 k eff 2 where 1, and the two interferometers are symmetrically overlapped in time as shown in figure C1. Under these conditions, we find that H d can be approximated by − 2 k eff 1 T 2 2 sinc 2 ωT 2 2 .
(C. 12) It follows that there is a competition between the two terms in this expression. For the extreme case when = 0 (i.e. k eff 1 = k eff 2 ), the differential transfer function is dominated by the first term, which is identically zero for all frequencies only if T 1 = T 2 .
This represents the ideal case for gravity gradiometry applications. On the other hand, when > 0 and T 1 = T 2 , the second term in eq. (C.12) dominates. Since the two interferometers are assumed to have different wavevectors, it is not possible to make the transfer function zero at all frequencies. However, it is straightforward to show that H d = 0 in DC provided that k eff 1 T 2 1 = k eff 2 T 2 2 . This criteria optimizes the rejection of common-mode vibration noise at frequencies below the cut-off for a single-sensor, ω cut j , and can be achieved by adjusting the interrogation times such that T 2 = k eff 1 /k eff 2 T 1 = (1 − )T 1 . Figure C2 shows a comparison between single-sensor and differential transfer functions for T ∼ 1 s interferometers. When operated differentially, the sensitivity to vibrations at frequencies less than ω cut j is typically more than 3 orders of magnitude below that of the single interferometer, despite the fact that k eff 1 = k eff 2 .