Experimental violation of the Leggett-Garg inequality in a 3-level system

The Leggett-Garg (LG) test of macroscopic realism involves a series of dichotomic non-invasive measurements that are used to calculate a function which has a fixed upper bound for a macrorealistic system and a larger upper bound for a quantum system. The quantum upper bound depends on both the details of the measurement and the dimension of the system. Here we present an LG experiment on a three-level quantum system, which produces a larger theoretical quantum upper bound than that of a two-level quantum system. The experiment is carried out in nuclear magnetic resonance (NMR) and consists of the LG test as well as a test of the ideal assumptions associated with the experiment, such as measurement non-invasiveness. The non-invasive measurements are performed via the modified ideal negative result measurement scheme on a three-level system. Once these assumptions are tested, the violation becomes small, despite the fact that the LG value itself is large. Our results showcase the advantages of using the modified measurement scheme that can reach the higher LG values, as they give more room for hypothetical malicious errors in a real experiment.

Introduction.-The predictions of quantum mechanics regarding microscopic systems do not appear to carry over to macroscopic objects. Unlike photons and electrons, cats and tables do not seem to exist in a superposition of two classically observable states such as dead and alive or here and there. There is, however, no known theoretical limit on the size of objects that can be observed in an arbitrary superposition of two states, and it is conceivable that we will one day be able to isolate large objects from the environment, such that they can be in what we may call a macroscopic superposition state. The Leggett-Garg (LG) experiment [1] and some extensions [2,3] allow us to test the assumption that a given system confined to a discrete set of classically observable states is never in a superposition of these states. The experiment leads to an inequality that, under some reasonable assumptions, cannot be violated when the system is in a definite classically observable state at all times, but can be violated when it is in a superposition of these states.
Unlike Bell's inequality, the assumptions regarding the Leggett-Garg inequality (LGI) depend on the physical system and the experimental setup. There are three fundamental assumptions: (A1) macroscopic realism (MR): the system cannot be in a superposition of the classically observable states; (A2) non-invasive measurability (NIM): it is possible to measure the macroscopic system without disturbing it; and (A3) induction: the future cannot influence the past. Of these, only the last is independent of the experimental setup. The LGI is therefore a test of MR under a set of reasonable assumptions about the system, in particular a version of NIM. The violation of the inequality leads to the conclusion that either MR or one of the other assumptions is incorrect [4]. The aim of a well-designed experiment is therefore to convince a skeptic that the incorrect assumption is probably MR, i.e. the system is in a superposition of classically observable states at some time during its evolution.
In the standard LG experiment a parameter $K_3$ is classically constrained to take values between $-3$ and $1$, whereas quantum mechanics predicts values of up to $1.5$, giving a narrow margin for experimental errors. It has recently been noted [3] that in order to convince a skeptic that the reasonable assumptions are indeed reasonable, they need to be tested and the inequality must be adjusted accordingly. Consequently the margin for error is reduced even further. Until recently it was believed that the maximal violation of $K_3$ is independent of the number of possible macroscopically distinct states of the system, due to the fact that the measurements are dichotomic. However, Budroni and Emary [18] showed that this is only true if the measurements follow the naive Lüders update rule. In a more general setting, it is possible to observe larger violations by going to higher dimensional systems, up to the algebraic maximum of $3$. While such measurements give a bigger margin for errors, it was not clear how to construct them in a reasonable way that does not require a seemingly artificial dephasing step between measurements, which is in conflict with NIM.
In this work we demonstrate the first violation of the LGI with an experiment that has a theoretical bound beyond $K_3 = 1.5$. We present results of a set of experiments performed on an ensemble of 3-level systems in liquid-state nuclear magnetic resonance (NMR) and provide a natural method for performing the required measurement without an artificial dephasing step. The inequality is corrected for a number of nontrivial assumptions about the state of the systems and the measurement device; in particular, the LGI is corrected to account for non-ideal measurements.
The Leggett-Garg test.-Consider a system which is evolving under a certain Hamiltonian. We decide to perform dichotomic measurements of an observable $Q$, represented at some time $t_i$ as $Q_i$, that can perfectly distinguish between two states of a system. The outcomes of these measurements are denoted as $q_i^1 = +1$ and $q_i^2 = -1$. In a macrorealistic system, the outcomes $q_i^l$ ($l = 1, 2$) represent the real state of the system, i.e. if the result was $q_i^l$ we can infer that the system was in the state corresponding to $q_i^l$ at time $t_i$. A test of macrorealism is a test of this hypothesis. For the LG test, one chooses three distinct times at which to perform a measurement and three independent experiments. In each of the three experiments we start with the same state, and then perform measurements at two of the three chosen times, as shown in Fig. 1. These three independent experiments are performed many times to estimate the probabilities of being in the different possible states. Using these probabilities one can calculate the two-time correlations of the measurements,
$$C_{ij} = \sum_{l,m} q_i^l \, q_j^m \, P(q_i^l, q_j^m), \tag{1}$$
where $q_i^l$ ($l = 1, 2$) means the $l$th outcome of the measurement performed at $t_i$ and $P(q_i^l, q_j^m)$ is the corresponding joint probability.
The 3-measurement LG string is
$$K_3 = C_{12} + C_{23} - C_{13}. \tag{2}$$
If we assume that the measurements do not disturb the system (NIM assumption) and the system is classical (i.e. macrorealistic), the value of $K_3$ is bounded by $-3 \le K_3 \le 1$. On the other hand, if the system is quantum it is possible to choose the evolution times between measurements in such a way that $K_3$ goes beyond $1$, violating the LGI $K_3 \le 1$. The quantum bound for $K_3$ is $1.5$ for a 2-level system [2]. More general systems have the same bound if the measurements follow the Lüders update rule, which is natural for these types of experiments. According to the Lüders rule the dichotomic measurement projects the state of the system into one of two orthogonal subspaces corresponding to the $\pm 1$ measurement results. While this projection is invasive when the system is quantum, it is theoretically non-invasive if we assume MR. In performing the LG test, we must however consider a skeptic who may object to our assumption that the measurement indeed follows the Lüders rule. To counter such an argument, LG suggested that the experiment be carried out using ideal negative result measurements (INRMs). INRMs are implemented by measuring whether a system is in a given state (say the state that corresponds to $q_i^1 = +1$) and post-selecting on negative outcomes that allow us to infer the state of the system, e.g. by finding that the system is not in a $q_i^1 = +1$ state we infer that it must be in a $q_i^2 = -1$ state. The original LG test considered only 2-level systems. Recently Budroni and Emary [18] showed that if one relaxes the assumption that the measurement follows the Lüders update rule, and instead allows a more general update rule which also destroys some of the phase information within the $\pm 1$ subspaces, then the quantum bound on $K_3$ can be extended to a value that depends on the dimension of the system and goes asymptotically to the algebraic maximum of $3$.
For a 3-level system, such measurements can lead to the value $K_3 = 1.7566$ when the observable is $Q = -|0\rangle\langle 0| + |1\rangle\langle 1| + |2\rangle\langle 2|$ and the measurement acts like a complete dephasing channel. However, the channel seems to be more invasive than necessary and can raise questions about the validity of NIM. In such a case, it is hard to justify the violation of the LGI as a violation of MR. However, as we show below, the channel can be implemented using INRMs.
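The macrorealistic bound $-3 \le K_3 \le 1$ quoted above can be checked by brute force. The following is a minimal sketch, assuming only that a macrorealistic system carries a definite value $q_i = \pm 1$ at each of the three times; the function name `k3` is illustrative and not part of the original work. Any probabilistic mixture of such histories is a convex combination of these values and therefore obeys the same bounds.

```python
from itertools import product

def k3(q1, q2, q3):
    """LG string K_3 = C_12 + C_23 - C_13 for one definite history."""
    return q1 * q2 + q2 * q3 - q1 * q3

# Every deterministic macrorealistic history assigns +1 or -1 at each time.
values = [k3(*q) for q in product((+1, -1), repeat=3)]
k3_min, k3_max = min(values), max(values)
print(k3_min, k3_max)
```

The algebraic maximum of 3 would require $q_1 q_2 = q_2 q_3 = +1$ together with $q_1 q_3 = -1$, which no definite assignment can satisfy.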
Measuring the Probabilities using INRMs.-The scheme for performing the modified LGI measurement is based on using three INRMs, one for each of the possible states. The measurement is registered on an ancillary qubit initially in the state $|0\rangle$. When performing the INRM of the system state $|j\rangle$, the ancilla remains in the state $|0\rangle$ if the system is in $|j\rangle$ and rotates to $|1\rangle$ otherwise. The three controlled gates $CG_0$, $CG_1$ and $CG_2$ correspond to the three types of measurement.
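The action of the controlled gates described above can be sketched numerically. The gate form used here, $CG_j = |j\rangle\langle j| \otimes I + (I - |j\rangle\langle j|) \otimes \sigma_x$, is an assumption that merely matches the verbal description (ancilla untouched if the system is in $|j\rangle$, flipped otherwise); the test state and the helper names `controlled_gate` and `run_inrm` are hypothetical.

```python
import numpy as np

d = 3                                   # system levels |0>, |1>, |2>
X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def controlled_gate(j):
    """Assumed form of CG_j: ancilla untouched if system is |j>, flipped otherwise."""
    Pj = np.zeros((d, d), dtype=complex)
    Pj[j, j] = 1.0
    return np.kron(Pj, I2) + np.kron(np.eye(d) - Pj, X)

# Hypothetical test state: populations 0.5, 0.3, 0.2 with nonzero coherences.
psi = np.array([np.sqrt(0.5), np.sqrt(0.3), np.sqrt(0.2)], dtype=complex)
rho = np.outer(psi, psi.conj())
anc0 = np.array([[1, 0], [0, 0]], dtype=complex)

def run_inrm(j):
    """Return ancilla populations and the unnormalised post-selected system state."""
    U = controlled_gate(j)
    joint = U @ np.kron(rho, anc0) @ U.conj().T
    joint = joint.reshape(d, 2, d, 2)          # indices: sys, anc, sys', anc'
    ancilla = np.einsum('iaib->ab', joint)     # trace out the system
    kept = joint[:, 0, :, 0]                   # post-select ancilla in |0>
    return np.real(np.diag(ancilla)), kept

pops0, _ = run_inrm(0)                         # ancilla populations after CG_0
effective = sum(run_inrm(j)[1] for j in range(d))
```

Under this assumed gate form, `pops0` holds the ancilla populations $(P_0, P_1 + P_2)$, and `effective`, the sum of the three post-selected branches, keeps the populations of `rho` while removing its coherences, i.e. it acts as a complete dephasing channel.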
Consider, for example, the application of $CG_0$ to a general system state with populations $P_0$, $P_1$ and $P_2$ on the diagonal of its density matrix and off-diagonal elements $a$, $b$ and $c$, with the ancilla in the state $|0\rangle$. The diagonal elements of the ancilla after tracing out the system are $P_0$ and $P_1 + P_2$. Thus we can measure $P_0$ non-invasively. Similarly, after the same procedure with $CG_1$ and $CG_2$, the diagonal elements of the ancilla are $P_1$, $P_0 + P_2$ and $P_2$, $P_0 + P_1$ respectively, which provides a way of measuring $P_1$ and $P_2$ non-invasively. The measurement at the end of the experiment is not required to be non-invasive since we are not concerned with the future dynamics of the system. After the second evolution of the system, we measure the diagonal elements of the combined ancilla and system state. The elements corresponding to the states $|00\rangle_{SA}$, $|10\rangle_{SA}$ and $|20\rangle_{SA}$ are post-selected. These elements correspond to the probabilities $P(i, 0)$, $P(i, 1)$ and $P(i, 2)$ respectively when the $CG_i$ gate is applied, where $i = 0, 1, 2$ labels the three states of the system. This scheme is illustrated in Fig. 2. Each single measurement described above follows the Lüders update rule. However, since we are post-selecting, we end up with only part of the quantum channel (i.e. a subchannel) that corresponds to the negative result. Adding the three subchannels that we post-select on effectively creates a measurement that does not follow the Lüders update rule. Instead, the effective trace-preserving channel that describes the evolution during the measurement is represented by three Kraus operators, the projectors $|0\rangle\langle 0|$, $|1\rangle\langle 1|$ and $|2\rangle\langle 2|$, i.e. a complete dephasing channel.

FIG. 2. General scheme for a single run of the LG test with an INRM. We start with the system in some state $\rho$ and the ancilla in $|0\rangle\langle 0|$. The two evolution times $t_i$ and $t_j$ depend on which of the three experiments is performed (see Fig. 1). The controlled gate is the first measurement performed (one of three possible INRMs), and it is non-invasive if nothing happens, i.e. the state of the ancilla is unchanged. The last measurement is not necessarily non-invasive since we are not concerned about the future dynamics of the system. The results are post-selected to include only the instances when the INRM was successful, i.e. the situations where the ancilla is in the state $|0\rangle\langle 0|$. For each measurement setting in Fig. 1, we perform three runs, one for each state of the system.

Experimental implementation and results.-The experiments are carried out at ambient temperature on a Bruker DRX 700 MHz NMR spectrometer. As described earlier, a spin-1 system and a spin-1/2 ancilla are needed to perform the non-invasive measurements. In the experiments, we use two spin-1/2 nuclei to simulate the dynamics of the spin-1 system via the Clebsch-Gordan approach [19], which transforms a space consisting of two spin-1/2 particles to another space consisting of one spin-1 and one spin-0 particle. The transformation defines the spin-1 states in terms of two spin-1/2 particles as $|0\rangle_s = |00\rangle$, $|1\rangle_s = (|01\rangle + |10\rangle)/\sqrt{2}$ and $|2\rangle_s = |11\rangle$, together with the spin-0 (singlet) state $|s\rangle = (|01\rangle - |10\rangle)/\sqrt{2}$. For convenience, we employ this spin-1/singlet notation to describe the system state unless otherwise specified.
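The two-spin encoding of the spin-1 system can be illustrated with a short numerical check. This is a sketch under the assumption that the relevant generator is the collective operator $S_x = (\sigma_x \otimes I + I \otimes \sigma_x)/2$; the matrix `W` and the variable names are illustrative. It shows that, in the spin-1/singlet basis listed above, this operator reduces to the usual spin-1 $S_x$ matrix on the triplet and does not couple to the singlet.

```python
import numpy as np

s = 1 / np.sqrt(2)
# Columns: |0>_s, |1>_s, |2>_s, |s> written in the two-qubit basis |00>,|01>,|10>,|11>.
W = np.array([[1, 0, 0,  0],
              [0, s, 0,  s],
              [0, s, 0, -s],
              [0, 0, 1,  0]])

sx = np.array([[0, 1], [1, 0]])
# Collective x operator of the two spin-1/2 particles (assumed generator).
Sx_total = 0.5 * (np.kron(sx, np.eye(2)) + np.kron(np.eye(2), sx))

# Same operator expressed in the spin-1/singlet basis.
Sx_coupled = W.T @ Sx_total @ W
Sx_spin1 = Sx_coupled[:3, :3]          # triplet (spin-1) block
singlet_row = Sx_coupled[3, :]         # couplings between the singlet and everything else
```

Because the singlet row and column vanish, an ideal evolution generated by this operator keeps the system inside the spin-1 subspace; population found in $|s\rangle$ in a real experiment therefore signals an imperfection.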
Therefore, we need three qubits to implement the experiment. The sample is chosen as $^{13}$C-labeled trans-crotonic acid dissolved in acetone-d6. The molecular structure, Hamiltonian parameters and the relaxation times ($T_1$ and $T_2$) are shown in Fig. 3, where C$_2$ and C$_3$ are used to simulate the dynamics of the spin-1 system and C$_4$ is used as the ancilla. The spatial averaging method [20] is adopted to initialize the 3-qubit NMR system into the pseudo-pure state (PPS)
$$\rho_{000} = \frac{1 - \epsilon}{8} I + \epsilon \, |000\rangle\langle 000|, \tag{6}$$
where $I$ is the identity and $\epsilon \approx 10^{-5}$ is the polarization. The NMR circuit of the PPS preparation is shown in Fig. 4(a). The Hamiltonian of the spin-1 system during the free evolutions in Fig. 2 is chosen as $H_{\mathrm{sys}} = -\Omega \sigma_x^{s1}/2$, where $\Omega$ is set as 1 kHz and $\sigma_x^{s1}$ is the Pauli operator in the spin-1 representation. The propagator at time $t_i$ is thus
$$U(t_i) = e^{-i H_{\mathrm{sys}} t_i} = e^{i \Omega \sigma_x^{s1} t_i / 2}. \tag{7}$$
In the experiment, the three different times are chosen as $t_1 = 0.5$ ms, $t_2 = \tau + t_1$ and $t_3 = \tau + t_2$ respectively, and the experiments are conducted for a few values of $\tau$, as shown in Fig. 5. The observable to be measured is chosen as $Q = -|0\rangle\langle 0|_s + |1\rangle\langle 1|_s + |2\rangle\langle 2|_s$, which is equivalent to measuring the diagonal elements of the density matrix. Ideally, the maximal value of $K_3$ should be obtained at $\tau = 0.208$ ms, and the following tests of non-invasiveness are performed at this optimal point. The controlled gates in Fig. 2 are decomposed into single-qubit rotations and delays, and the pulse sequence of the entire experiment is illustrated in Fig. 4. All pulses are realized by the gradient ascent pulse engineering (GRAPE) technique [21][22][23], and are robust against $B_1$ inhomogeneity, with fidelity over 0.997. The $\pi/2$ and $\pi$ pulses are of length 1 ms. The observable $Q$ is measured by performing diagonal tomography in the spin-1 subspace without considering the spin-0 component [24].
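The ideal value of $K_3$ at a given $\tau$ can be estimated with a small simulation. This is a sketch under two assumptions that are not spelled out in the text: that $\Omega = 2\pi \times 1$ kHz, and that $\sigma_x^{s1} = 2 S_x$ with $S_x$ the standard spin-1 operator, so that the free evolution is the rotation $e^{i \Omega t S_x}$. The intermediate measurement is modelled as the complete dephasing channel discussed earlier; relaxation and pulse errors are ignored, and all function names are illustrative.

```python
import numpy as np

# Assumptions (not stated explicitly in the text): Omega = 2*pi * 1 kHz, and
# sigma_x^{s1} = 2*S_x, so the free evolution is a rotation exp(i*Omega*t*S_x).
OMEGA = 2 * np.pi * 1e3            # rad/s
T1 = 0.5e-3                        # s, time of the first measurement
SX = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]]) / np.sqrt(2)
Q = np.diag([-1.0, 1.0, 1.0])      # Q = -|0><0| + |1><1| + |2><2|
EVALS, EVECS = np.linalg.eigh(SX)

def propagator(t):
    """U(t) = exp(i*Omega*t*S_x), built from the eigendecomposition of S_x."""
    return (EVECS * np.exp(1j * OMEGA * t * EVALS)) @ EVECS.conj().T

def evolve(rho, t):
    U = propagator(t)
    return U @ rho @ U.conj().T

def correlation(rho, wait):
    """<Q_i Q_j>: dephasing measurement on rho, evolve for `wait`, then measure Q."""
    total = 0.0
    for k in range(3):
        p_k = np.real(rho[k, k])            # probability of finding level k
        proj = np.zeros((3, 3), dtype=complex)
        proj[k, k] = 1.0                    # state after that outcome
        q_later = np.real(np.trace(Q @ evolve(proj, wait)))
        total += p_k * Q[k, k] * q_later
    return total

def k3(tau):
    rho0 = np.zeros((3, 3), dtype=complex)
    rho0[0, 0] = 1.0                        # system starts in |0>_s
    rho_t1 = evolve(rho0, T1)
    rho_t2 = evolve(rho_t1, tau)            # run without a measurement at t1
    c12 = correlation(rho_t1, tau)
    c23 = correlation(rho_t2, tau)
    c13 = correlation(rho_t1, 2 * tau)
    return c12 + c23 - c13

print(round(k3(0.208e-3), 4))
```

Under these assumptions the state at $t_1$ is a basis state, `k3(0.0)` returns the classical value 1, and `k3(0.208e-3)` should land close to the ideal value of about 1.757 quoted for this $\tau$.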
The values of $K_3$ for different $\tau$ are shown in Fig. 5, where the blue curve is the theoretical prediction, the green circles are the simulated results with $T_1$, $T_2$ and pulse imperfections incorporated, and the red crosses are the experimental results. At the point of maximum violation, $\tau = 0.208$ ms, the experimental values of the correlations are $\langle Q_1 Q_2 \rangle = 0.542 \pm 0.021$, $\langle Q_2 Q_3 \rangle = 0.294 \pm 0.016$ and $\langle Q_1 Q_3 \rangle = -0.676 \pm 0.003$, respectively. This leads to the experimental value $K_3 = 1.511 \pm 0.027$, consistent with the simulated result of 1.495. In contrast, the ideal value of the maximum violation is 1.757. The discrepancy ($\approx 0.246$) between the experimental and ideal values is dominated by $T_1$ and $T_2$ relaxation, as the pulse imperfections contribute a loss of only around 0.01 relative to the ideal value.
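The quoted value of $K_3$ and its uncertainty can be reproduced from the three correlations. The sketch below assumes that the three uncertainties are independent and therefore add in quadrature; that is an assumption about the error analysis, not something stated in the text.

```python
import math

# Experimental correlations and uncertainties quoted at tau = 0.208 ms.
c12, s12 = 0.542, 0.021
c23, s23 = 0.294, 0.016
c13, s13 = -0.676, 0.003

k3_exp = c12 + c23 - c13
# Assumes the three uncertainties are independent, so they add in quadrature.
k3_err = math.sqrt(s12**2 + s23**2 + s13**2)

print(f"K_3 = {k3_exp:.3f} +/- {k3_err:.3f}")
```

The rounded correlations sum to 1.512, which agrees with the quoted 1.511 to within rounding of the inputs, and the quadrature sum reproduces the quoted uncertainty of 0.027.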
Experimental test of assumptions.-In obtaining the theoretical bound on $K_3$ we have made a number of implicit assumptions about our experimental system, the most notable being that the measurements are ideal negative result measurements. Since it is possible to modify the LGI by taking into account any deviations from these assumptions, our experiment is supplemented by another set of experiments to test (i) the invasiveness of the intermediate measurements and loss, (ii) preparation errors, and (iii) malicious losses. We also discuss the possibility of dark counts. An additional assumption about the pseudo-pure state is discussed in the appendix.
First we quantify how much the system is disturbed due to the imperfect controlled gates. Ideally these controlled gates should not disturb the system when it is in a fixed state $|0\rangle_s$, $|1\rangle_s$, $|2\rangle_s$ or $|s\rangle$, but in practice they do disturb the system due to the long application time and pulse imperfections. Moreover, the three controlled gates are distinct and are expected to have different back actions on the system even after the negative results are post-selected in the INRMs. Explicitly, $CG_0$ is a direct J-coupling gate, $CG_2$ involves two SWAP gates and $CG_1$ is a combination of the two. The experimental lengths of $CG_0$, $CG_1$ and $CG_2$ are about 40 ms, 116 ms and 76 ms, respectively. In an attempt to quantify how much the system is disturbed by the INRMs, we perform the following two types of experiments: (a) start with either $|0\rangle_s$, $|1\rangle_s$ or $|2\rangle_s$, evolve the system for a fixed time and measure the probabilities; (b) start with either $|0\rangle_s$, $|1\rangle_s$ or $|2\rangle_s$, apply the controlled gate and measure the probabilities. Ideally, the results from the two experiments should match perfectly, but in the presence of errors they differ. Table II shows the experimental results; their contribution to the inequality is discussed in what follows.

In Fig. 5, $\tau$ is the tunable time between measurements, i.e. $\tau = t_2 - t_1 = t_3 - t_2$. A violation of the LGI means that the value of $K_3$ goes beyond 1, which is the classical limit. The maximum violation in a 3-level system is $K_3 \approx 1.757$, obtained when choosing $\tau = 0.208$ ms. In the experiment, decoherence limits the maximum value to around $K_3 = 1.511 \pm 0.027$.
In testing non-invasiveness we can calculate the correlation $C$ when the starting state is $|p\rangle_s$ ($p = 0, 1, 2$) using eq. (1). The difference $\Delta C$ between the value of $C$ when we apply the gate and when we apply no gate is the disturbance induced by our measurements. These $\Delta C$ values enter the calculation of the LGI (eq. (2)) three times. Since they contribute twice positively and once negatively, we define a corresponding modification of the original inequality, in which P.E. is the preparation error, i.e. how much the initial state deviates from the expected one.
The probabilities that were not used in eq. (8) are considered as losses. We consider losses that can act maliciously during the experiment, i.e. we assume that the losses are somehow maliciously designed to maximize $K_3$. The lowest value when we apply no gate is considered non-malicious since it is independent of the gate and/or initial state. The difference between the highest and lowest values gives the range for the possible malicious errors. This is the value for one evolution. In the LG experiment there are 5 such evolutions (two for each of the experiments giving $\langle Q_1 Q_3 \rangle$ and $\langle Q_2 Q_3 \rangle$ and one for the experiment giving $\langle Q_1 Q_2 \rangle$), hence the total malicious loss is 5 times the difference. With these modifications, we modify the original inequality on $K_3$ to
$$K_3 \le 1 + KM_1 + \mathrm{Mal} = 1 + 0.1936 + 0.2095 = 1.4031. \tag{10}$$
We note that while this value takes the imperfections in preparation into account in the worst possible way, it is extremely unlikely that these preparation errors decrease the discrepancy between the ideal measurements and the actual measurements. A slightly more liberal version of the inequality would treat the preparation errors less pessimistically. Finally, we must account for the sources of errors that lead to dark counts, i.e. an artificial increase in the probabilities that are post-selected. There are two possible sources for this kind of error. First, the measurements are not perfect and there are situations where the ancilla does not rotate to $|1\rangle$ when it should, leading to a false reading of $|0\rangle$. Second, there are situations where a system in the singlet state goes back into one of the triplet states. The margin for the violation leaves us with an upper bound on the tolerance of the violation for possible dark counts, assuming these behave in the most malicious way possible. This tolerance ranges between 0.1081 and 0.2105, depending on how we account for preparation errors in the test of non-invasiveness.
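The arithmetic behind the corrected bound and the remaining margin can be laid out explicitly. The sketch below uses only the numbers quoted in the text; the variable names are illustrative, and comparing the margin with the statistical uncertainty is an added illustration rather than part of the original analysis.

```python
k3_exp, k3_err = 1.511, 0.027   # experimental value and uncertainty quoted earlier
km1, mal = 0.1936, 0.2095       # measurement-disturbance and malicious-loss terms

bound = 1 + km1 + mal           # corrected macrorealistic bound, eq. (10)
margin = k3_exp - bound         # room left for dark counts
print(f"bound = {bound:.4f}, margin = {margin:.4f}, margin/err = {margin / k3_err:.1f}")
```

With the rounded value $K_3 = 1.511$ the margin comes out as 0.1079, which differs from the quoted lower tolerance of 0.1081 only in the last digit, presumably because the quoted figure was computed from unrounded data.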
Discussion.-The motivation behind an LG experiment is to test macroscopic realism, i.e. to try to refute MR for a macroscopic system, or at least to convince a skeptic that the MR assumption is implausible. While the NMR sample that we use can be considered macroscopic, the individual molecules are still in the microscopic domain. Moreover, there is little doubt that the individual nuclear spins can be in a superposition state. In that respect it is not too surprising that the LGI is violated, and indeed its violation tells us nothing new about macroscopic realism. We do, however, learn that we can control the systems well enough to violate the inequality and that the qutrit used can pass some quantum tests under reasonable assumptions. The violation of an LGI does not rule out the existence of a hidden variable model, and indeed a skeptic could simply argue that our system behaves strangely due to the existence of hidden variables that are influenced by our choice of measurements. For liquid-state NMR we already know that such a model exists [25]. Moreover, we purposely discarded some of the experimental data as part of the experiment, i.e. the spectrum generated at the end of each experiment could be used to generate more than the six probabilities we discussed (the off-diagonal elements in the density matrix).
Since we are not, strictly speaking, testing MR, our main result is not the violation per se but rather the methods used to achieve the violation, the discussion of possible errors in the experiment and the demonstration of their experimental relevance. Such a discussion has been missing from much of the experimental literature to date (see [3,16] for two exceptions). The LG test cannot be performed without some assumptions about the physical systems involved and, in particular, the inner workings of the measurements that we assume are non-invasive. These assumptions must be tested, as they can lead to artificial violations of the inequality. In our experiment we tested particular malicious scenarios that, although unlikely, must be taken into account and discussed before they are rejected (experimentally or theoretically). We note that both our simulated predictions and experimental results (see Fig. 5) show that such artificial violations are unlikely in our system. Consequently, we believe that although many previous experiments did not include a careful analysis of possible errors, the violations of the LGI in those experiments would probably hold even if imperfections were taken into account.
Conclusion.-We demonstrated a violation of the LGI in a 3-level system. Non-invasive measurements, an essential requirement when performing an LG test, were carried out using ideal negative result measurements. We verified the non-invasiveness of such measurements experimentally and quantified how much this measurement disturbs the system. We also took into account errors that can occur in the experiment and used them to modify the original inequality. These modifications increased the classical bound and made our violation significantly smaller (but still beyond the error margins). We emphasize that the margin between the quantum and classical upper bounds is greater when a 3-level system is tested (compared to a 2-level system). In practice the actual margin is quite low when the various errors are taken into account, and the use of the modified (non-Lüders) measurement scheme allowed us to observe the violation despite many imperfections. The difference between the experimental and theoretical values is due to $T_1$ and $T_2$ decay. These errors can be avoided in different systems; for example, if the couplings are strong, the gate lengths will be short. It would be a challenge to the quantum control community to observe a violation larger than 0.5 above the classical bound (modified for imperfections). However, the real challenge remains to demonstrate such violations in macroscopic systems.

... Fig. 1). The row index denotes the two measurement outcomes and the three values (Theory, Sim, Exp) correspond to the probabilities for these outcomes in theory, simulation and experiment respectively. (For example, the row 01 represents the probability that the result was 0 in the first measurement and 1 in the second.) Since the results are post-selected, the probabilities in the simulation and experiment do not add up to 1.

Each of the three tables shows the result when starting with the state mentioned on the top. The rows correspond to the probabilities of the state denoted in the first column. The first and second indices in the first column correspond to the state of the system and the ancilla, respectively. $CG_i$ stands for the gate applied and NG for the case when no gate is applied. Ideally the column NG should contain positive values only for the states $|00\rangle$, $|10\rangle$ and $|20\rangle$ (in blue); all other values are treated as losses since they are lost in post-selection. Moreover, for an INRM the columns NG and $CG_i$ should match; the discrepancies between these columns at the post-selected values (blue) are used to give an upper bound on the possible deviation from $K_3$ due to the measurement procedure.

... LG as explained in text. Loss is calculated using the discarded values in an experiment where we do not expect to discard any values in post-selection (the red columns in Table II are discarded in post-selection, and the loss is the sum of these values). $Q$ is calculated using eq. (8), and $\Delta Q$ is the difference of the $Q$ values when the gate is applied and when it is not. For an INRM and the setup used, $\Delta Q = 0$, and any deviation from 0 could theoretically boost $K_3$ even in an MR system. P.E. stands for preparation error, i.e. the probability that the prepared starting state is not the desired starting state. $KM_1$ is the maximal boost to $K_3$ due to measurement error, as defined in eq. (9). The losses are broken into two types. Non-malicious (Non. Mal.) losses are those that appear irrespective of the specific experiment. Malicious (Mal) losses are those that may depend on the choice of experiment. We assume the malicious losses are chosen in such a way that they boost the calculated value of $K_3$ by as much as possible.
The pseudo-pure state dynamics.
In an NMR experiment we have access only to pseudo-pure states (Eq. (6) of the main text). To verify that this does not affect the credibility of the result, we perform the Leggett-Garg experiment starting from an identity state instead of $|0\rangle$ for the system. If, starting from an identity state, the end state remains the identity, it makes no contribution to the Leggett-Garg inequality. The spectrum for the Leggett-Garg test on the identity was compared with a reference spectrum of an initial thermal state (see Fig. 6) to ensure that the contribution of the signal is below the level of precision used in the experiment.
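The reason an identity component cannot contribute to the signal in the ideal case can be illustrated numerically. This is a minimal sketch, assuming a purely unitary pulse sequence (here a hypothetical random unitary) and a traceless observable chosen only for illustration; real sequences with relaxation and gradients are not unital in general, which is why the experimental check described above is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                       # three qubits
eps = 1e-5                                  # polarization, as quoted in the main text

pure = np.zeros((n, n), dtype=complex)
pure[0, 0] = 1.0                            # |000><000|
rho_pps = (1 - eps) / n * np.eye(n) + eps * pure

# Hypothetical random unitary standing in for the whole pulse sequence.
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
U, _ = np.linalg.qr(A)

# Hypothetical traceless observable (NMR signals come from traceless spin operators).
obs = np.diag([1.0, 1, 1, 1, -1, -1, -1, -1])

def expect(rho):
    """Expectation value of `obs` after applying U to rho."""
    return np.real(np.trace(obs @ U @ rho @ U.conj().T))

signal_identity = expect(np.eye(n) / n)     # contribution of the identity part
signal_pps = expect(rho_pps)                # signal from the pseudo-pure state
signal_pure = expect(pure)                  # signal a true pure state would give
```

In this idealised setting `signal_identity` vanishes and `signal_pps` equals `eps * signal_pure`, so the pseudo-pure state reproduces the pure-state signal up to the overall factor $\epsilon$.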

FIG. 6. The spectrum for the LG experiment with the identity as the initial state (vertical axis: signal in arbitrary units). The blue spectrum is the signal for a run of the Leggett-Garg experiment with the identity as the initial state. The red spectrum is that of the initial thermal state, which is given as a reference. Note that while an identity will give a flat spectrum at 0, a flat spectrum does not guarantee that the state is the identity. To verify that this is the identity we rotated the state before the final measurement and obtained the same flat spectrum for different observables.