Experimentally exploring compressed sensing quantum tomography

In the light of the progress in quantum technologies, the task of verifying the correct functioning of processes and obtaining accurate tomographic information about quantum states becomes increasingly important. Compressed sensing, a machinery derived from the theory of signal processing, has emerged as a feasible tool to perform robust and significantly more resource-economical quantum state tomography for intermediate-sized quantum systems. In this work, we provide a comprehensive analysis of compressed sensing tomography in the regime in which tomographically complete data is available with reliable statistics from experimental observations of a multi-mode photonic architecture. Due to the fact that the data is known with high statistical significance, we are in a position to systematically explore the quality of reconstruction depending on the number of employed measurement settings, randomly selected from the complete set of data, and on different model assumptions. We present and test a complete prescription to perform efficient compressed sensing and are able to reliably use notions of model selection and cross-validation to account for experimental imperfections and finite counting statistics. Thus, we establish compressed sensing as an effective tool for quantum state tomography, specifically suited for photonic systems.


Introduction
Quantum technologies have seen an enormous progress in recent years.Photonic architectures have matured from basic proof-of-principle schemes to intermediate scale quantum devices [1], while the robustness offered by integrated optical devices is poised to push these systems yet further [2,3].Similarly, systems of two-digit trapped ions [4] and other condensed-matter type systems such as superconducting devices are catching up at a remarkable pace [5].Building upon this technological development, important primitives of quantum information science are being experimentally realised [6][7][8][9][10].In light of these systems, it has become increasingly important to establish a toolbox for tomographic reconstruction that can keep up with this rapid development: The ironic situation that is emerging is that by now, the state of large quantum systems can be manipulated with a high degree of control, but not easily reconstructed.Clearly, these technologies and the community require further advancement of their tools for state reconstruction.In this work, we discuss an explicit method to achieve such a reconstruction, thus contributing to this long-term goal.Specifically, we demonstrate a comprehensive exploration of the performance of state reconstruction in the photonic setting as one varies both the number of measurements and the noise model.
The framework of compressed sensing, a set of techniques originating from the context of classical signal processing [11,12], has emerged as a key protagonist in closing the gap between technology and diagnostics [13][14][15].The idea behind its functioning is rooted in the fact that a substantial amount of data encountered in realistic situations are structured and can be characterised by significantly fewer parameters than with ad hoc schemes.Approximately low-rank matrices are at the center of the paradigm of matrix completion in compressed sensing and correspond precisely to approximately pure quantum states.Since pure quantum states are widely regarded as the key resource for quantum information processing, such methods for reconstructing low-rank states are especially relevant.For even larger systems, tomographic tools based on basic variational sets are conceivable, with matrix product states [16,17], their continuous analogues [18], and permutationally invariant states [19] providing prominent examples.The theory of such novel tools of reconstruction is progressing quickly.This applies, e.g., to new insights to the assignment of fair and rigorous confidence regions [20][21][22][23] as well as economical ways of performing instances of quantum process tomography [14,24,25].
Exciting steps towards using compressed sensing in experimental settings have been made [19,24,26,27] in the regime in which one assumes knowledge about the basis in which sparsity is expected [24], assumes additional structure [19], or is in the highly informationally incomplete regime [27].In this work, we complement the picture for experimental tomography for medium-sized quantum systems.In its simplest formulation, compressed sensing tomography is based on a few random expectation values of suitable observables, from which approximately low-rank states can be accurately reconstructed [13].This is suited for the situation in which expectation values can be obtained with good statistical significance, although acquiring many of them may be expensive.Still a missing piece in this picture, however, is the exploration of model selection techniques that have to be considered in the realm of experimental imperfections and finite counting statistics in order to make compressed sensing tomography a practical tool.Model selection allows to prevent over-and underfitting by controlling the dimensionality of the model of the system-in our case, the rank of the density matrix.
Here, we present a comprehensive analysis of experimental data from a multi-photon, multi-mode GHZ state source using tools of compressed sensing.Instead of working with expectation values of observables-as it is commonly done in this context, but may amount to information loss-our experi-mental setup allows us to obtain information on the individual projector level from the respective outcomes of each measurement setting.In contrast to complementing recent work [27], we are not tied to the regime of tomographically incomplete knowledge.This allows us to study the behaviour of the reconstruction for the entire range of measurement settings.We quantitatively explore model selection via cross validation and compare it to the model suggested by the anticipated noise statistics.With these tools, we provide a more systematic way to choose the appropriate parameters for compressed sensing quantum tomography.The results then provide the reader with the toolkit and understanding to effectively implement these methods for future quantum state tomography (QST) in general, and specifically for photonic systems.
This work is structured as follows: We start by reviewing concepts of quantum state tomography and discuss the specifics of compressed sensing in QST.We subsequently present our experimental setup consisting of a four-qubit photonic system, which is used as a test bed for our tomographical approach.We continue by discussing concepts of model selection in the context of QST and determine the appropriate model from the experimental data.With this, we perform compressed QST and study the performance of the reconstruction depending on the amount of collected data as well as the robustness of our method with respect to model mismatches.

Elements of quantum state tomography
Quantum state tomography is the most common method to diagnose quantum information processing tasks.It is used to estimate the unknown quantum state of a system from data produced by measuring an ensemble of identically prepared systems.By fixing a basis, a general finite-dimensional quantum state can be identified with a positive semi-definite, unittrace matrix, the density matrix Here, H d ⊂ C d×d denotes the set of Hermitian matrices, and χ 0 stands for a positive semi-definite matrix.
In order to determine the density matrix of a quantum system, we need to prepare sufficiently many copies of the state from identical preparations, perform a measurement on each copy using one out of m different measurement settingscorresponding to different observables, i.e.Hermitian matrices A (j) , j = 1, . . ., m-and count the respective number of measurement outcomes.Ideal measurements are associated with unit rank projectors Π k is the k-th normalised eigenvector of A (j) .For each measurement setting j the specific outcome k = 1, . . ., d occurs with probability p j,k := tr(Π Completeness, i.e. the property that the projectors sum up to unity, ensures normalisation for each measurement setting j, so that d k=1 p j,k = 1.For each measurement setting j, the outcome k corresponds to a random variable Y j,k .Repeated measurements are independent from each other, and are performed on N j copies of the state for each measurement setting j, yielding the respective integer-valued realisation y j,k as observed frequency with d k=1 y j,k = N j .Hence, for each measurement setting j, the probability of the random variables (Y j,1 , . . ., Y j,d ) to take the configuration of measurement outcomes (y j,1 , . . ., y j,d ) is given by following a multinomial distribution M(N j , (p j,1 , . . ., p j,d )).
Accordingly, we will obtain the k-th outcome N j p j,k times in expectation.We formalise the measurement process by introducing the linear operator which maps density matrices in S d to matrices in R m×d + , corresponding to measurement outcomes k = 1, . . ., d for different measurement settings j = 1, . . ., m.We emphasise that A( ) is not an experimental data matrix itself; according to the law of large numbers, the frequencies in each measurement realisation Y := (y j,k ) ∈ N m×d from the experiment will converge to A( ) with growing number of measurements N j , i.e. the expectation value E(Y j,k ) of the random variable Y j,k is given by for each j, k.Apart from additional systematic sources of error, e.g.due to experimental imperfections, the difference between Y and A( ) is due to finite counting statistics, and in many settings, this is the largest contribution to the error.The most straightforward approach to determine from Y would be to attempt to invert the linear system of equations In general, however, noise on the data Y would render the reconstructed density matrix ˆ unphysical (ˆ 0).A generic (full rank) density matrix in S d is determined by d 2 − 1 independent real parameters.Hence, in general, one requires at least d 2 − 1 linearly independent equations in order to solve Eq. ( 7).This is also called tomographic completeness.When dealing with significantly less information, specialised reconstruction techniques are important with compressed sensing being a natural choice, which we will discuss in the next section.
In our system, we will be concerned with local Pauli measurements on each subsystem of a multi-partite state.We measure an n-qubit system (d = 2 n ) using m different measurement settings, each of which corresponds to an n-qubit Pauli operator j = 1, . . ., m, with σ (j) i ∈ {σ x , σ y , σ z }, where σ x , σ y , σ z are the Pauli matrices.This is often referred to as Pauli basis measurement.The projectors of the two-qubit operator A (1) := σ z ⊗ σ z , for example, are Π = |1, 1 1, 1|.For n qubits, there exist m max := 3 n different Pauli words in total (excluding the identity matrix for each qubit), each with 2 n eigenvectors, which corresponds to a maximum of 3 n • 2 n equations in Eq. (7).Each set of Pauli projectors {Π for fixed setting j contains a subset of elements that is linearly independent from the projectors for all other settings.Hence, any number of smaller than m max measurement settings will lead to the loss of tomographic completeness.When performing QST on large systems, however, it is of practical necessity to employ as few measurement settings as possible (and often also only few repetitions per measurement setting).The key question arising in this context, therefore, is whether it is feasible to reconstruct an unknown state with not only m < m max measurement settings, but a significantly smaller subset.The need for minimising the number of measurement settings is particularly pressing in architectures such as linear optical ones, since high repetition rates and good statistics are available, while it can be tedious or costly to alter the measurement setting.This is indeed the case in many practically relevant situations using compressed sensing schemes, which will be discussed in the next section.

Compressed sensing for quantum state tomography
By parameter counting, a state with rank r < d can be completely characterized by fewer than d 2 parameters, that is ∼ rd.However, it is far from obvious how to acquire these parameters using fewer measurement settings and how to do so in a robust fashion-this is the starting point for compressed sensing [11,28].Originally conceived for reconstructing sparse vectors, the concept was extended to the recovery of low-rank matrices [29,30] and adapted to the problem of QST [13,31].Here, one again considers structured problems in which one can exploit the fact that in many useful settings approximately low rank states are of interest.This is a reasonable assumption, since most quantum information experiments aim at preparing pure states.
In order to obtain a general complex-valued low-rank matrix from measurements A, naïvely, one would search within the set of low-rank matrices for the one that matches the measurement constraint, solving The key idea for compressed sensing in matrix recovery is to relax this NP-hard problem [32] into the closest convex optimisation problem [33] min We denote the nuclear norm (better known as the trace norm in the context of reconstructions in quantum mechanics) of a matrix χ by χ * := tr( χ † χ).Such problems are well known to be efficiently solvable [34].
The crucial question in compressed sensing is how many measurements are required to satisfactorily reconstruct the sought-after matrix.Many proofs rely on randomized measurements schemes: In Ref. [35], it has been shown that for a general map A : R d×d → R M with Gaussian entries, M 3r(2d − r) copies of are provably sufficient for the recovery of .Building on this and closer to our situation is the recovery guarantee presented in Ref. [36], in which M ≥ c rd copies are needed with some constant c > 0, for A : S d → R M , → (tr(Π (j) )) j=1,...,M , mapping density matrices from S d to vectors in R M , with Π (j) = v (j) v (j) † , and v (j) a Gaussian vector for each j.In practice, numerical computations outperform these theoretical bounds.However, there is a fundamental lower bound for the number of copies, M = 4r(d − r) − 1, using a theoretically optimal POVM with M elements [37].Note that-in the mindset of measurement settings and outcomes-the number of outcomes k per measurement setting j scales with the dimension of the Hilbert space d.Since M corresponds to m d, the number of measurement settings scales just with the rank, i.e. m = c r.
It is in general harder to prove comparable results for deterministic measurements-in our setting with v (j) being eigenvectors of Pauli operators.To bridge this gap, notions of partial derandomisation have been introduced, where v (j) are not Gaussian, but drawn from spherical designs-certain finite subsets of the d-dimensional complex sphere-leading to similar statements [36].Spherical designs, in turn, can be related to eigenvectors of n-qubit Pauli operators [38].Apart from results on the level of expectation values [39], less has been proven for products of single-qubit eigenvectors, the setting at hand-strikingly in contrast to the great success of the procedure in practice.These results remain stable when taking noise into account.
The measured data can be written as with N and η j,k representing the noise due to finite counting statistics.For positive semi-definite matrices such as quantum states, the nuclear norm of a matrix reduces to the trace of the matrix.Consequently, relaxing the equality constraint in Eq. ( 10) and including the positivity constraint, we arrive at the semi-definite program (SDP) [32] min for some yet-to-be-determined ε > 0 and • 2 representing the entrywise two-norm.This is exactly the problem we aim to solve in order to achieve efficient QST.SDPs, being convex programs, feature a rich theory, and numerical implementation is easily achievable [40,41].Note that the procedure minimizes the trace, which at first sight might seem contradictory to the requirement for density matrices to have unit trace.However, the unit trace requirement is implicitly included in the data constraint since the probabilities in the map A are normalised.Perfect data would lead to an optimizer with trace exactly equal to one.In turn, a relaxation of this constraint leads to a relaxation of the unit trace requirement.As a result, generically for not too small ε, the optimal χ, denoted by χ, will be subnormalised, due to its location on the part of the boundary of the ε-ball with the lowest trace.In order to obtain a physically meaningful reconstruction ˆ ∈ S d , we find in our simulations that renormalising via produces the highest fidelity results.To carry out the optimisation procedure, we employ the convex optimisation solver SDPT3 4.0 [42] together with CVX [43].For higher Hilbert space dimensions, methods like singular value thresholding [44] come into play, which typically are faster, but less accurate.

Experimental setup
The experiment is designed to prepare the four-qubit GHZ state associated with the state vector with the qubits encoded in the polarisation degree of freedom of four photons.Here, |H and |V represent horizontally and vertically polarized photons, respectively, hence effectively spanning a two-dimensional Hilbert space.The experimental setup, building upon the one outlined in Ref. [45], is shown in Fig. 1 and consists of two Bell pair sources which undergo a parity check or postselected fusion [8,[46][47][48][49][50][51] to probabilistically generate the GHZ state.Both the photon pairs, generated by spontaneous four-wave-mixing in microstructured fibers, and the fusion operation are successful only probabilistically, but in a heralded fashion, i.e. a classical signal is available signifying success of the preparation.Successful generation of the state is determined by post-selecting only four-photon coincident events which occur at a rate of approximately 1-2 Hz.The post-selected data is effectively free from dark counts-noise generated by single photon detectors firing erroneously in the absence of a photon.This is due to the fact that the rate at which dark counts in n modes occur in the coincidence window decreases exponentially with n, i.e. four simultaneous dark counts are negligibly rare.Due to additional experimental imperfections, however, the prepared state is non-ideal.The main cause of deviation between the actually prepared state and the target state arises from the distinguishability of photons partaking in the fusion operation and inherent mixedness from the parasitic effects in the pair generation [52].These tend to cause the generated state to resemble a partially dephased GHZ state [8].Measurements on the state then proceed using single qubit rotations (waveplates) and projections (polarising beam splitters and single-photon detection with avalanche photo-diodes) using well-characterised bulk-optical elements allowing high-fidelity measurements to be performed.Experimental setup for generating the four-photon polarisation entangled states |ψGHZ , consisting of photonic crystal fiber (PCF) sources, half-wave plates (HWPs), quarter-wave plates (QWPs), a Soleil-Babinet (SB), polarising beam-splitters (PBSs), and dichroic mirrors (DMs).The 80 MHz Ti-Saph laser is split onto two PCF sources in twisted Sagnac-loop interferometer configurations generating polarisation entangled Bell pairs.The signal and idler photons from each source are separated by DMs and the signal photons interfere on a PBS with relative time between paths ∆τ ≈ 0, which on postselecting a single photon in each output port performs a fusion operation.The SB is set to match the phase between the |H, H, H, H and |V, V, V, V components to zero.Each mode is measured by single-qubit rotations consisting of a HWP and QWP, and is projected in the {|H ,|V }-basis by PBSs and avalanche photodiode detectors.
As stated above, in order to achieve a tomographically complete basis for n qubits, one requires m max = 3 n measurement settings.In our system of four qubits, n = 4, we have measured a tomographically complete set of 81 local Pauli operators.For each measurement setting, around 650 fourcoincident events are accumulated within an integration time of six minutes.Evidently, given the exponential scaling of the tomographically complete set of measurement settings, achieving such reliable statistics for larger states (n > 4) is increasingly demanding on resources and quickly becomes infeasible.

Model selection
The starting point for carrying out compressed sensing quantum tomography is the question of determining an appropriate value for ε in the optimisation procedure Eq. ( 12).Essentially, larger values of ε result in greater relaxation of the data fitting constraint, leading to lower-rank estimates ˆ ; while smaller ε values will yield ˆ matrices with larger rank, which better fit the particular data set.Depending on the underlying state and the particular instance of noise in the data, the choice of ε might result in under-fitting with too coarse a model, or in overfitting-i.e.including parts of the noise into the model of the state.Both extremes in general lead to states that fail to correctly predict future data.In the most severe cases, it could happen that using the same measurement prescription A and the same data Y, the optimisation procedure in Eq. ( 12) yields a full rank or a rank-one matrix, depending on the choice of ε.Worse still, too small a value of ε can make the optimisation procedure unfeasible, whereby there is no feasible state that would result in data sufficiently close to that measured.The task of determining the appropriate model-in our case, the value of ε-that is statistically faithful to the data via an appropriate choice of the respective external parameters is called model selection (see e.g.Ref. [53]).Several ideas of model selection have a rigorous mathematical underpinning: Particularly well-known is the Akaike information criterion (AIC) [54], providing a measure of the relative quality of statistical models for a given set of data.For a collection of models compatible with a given data set, this criterion gives an estimate for the relative quality of each model.Similarly frequently employed is the Bayesian information criterion (BIC) [55].Direct application of AIC and BIC to quantum tomography-an approach followed in Ref. [15]-is problematic for larger systems since it requires rank-restricted maximum-likelihood estimation, leading to non-convex optimisation, which scales unfavorably with the system size.This is due to the fact that these techniques are discrete in the sense that they explicitly restrict the rank of the density matrix.In the compressed sensing mindset, the parameter that controls the rank in a continuous fashion is ε.As we mentioned above, this is at the centre of our discussion.
For sufficiently small noise, a promising ansatz for identifying a suitable ε is to use the data to compute the estimate ε(Y) according to the expectation value of Assuming the noise is solely due to finite counting statistics, i.e. the deviations from measurement outcomes from the expected variance of the multinomial distribution, we obtain with variance V.The second step follows from E(η j,k ) = 0 for each j and k.In order to compute ε from the data, we need to approximate p j,k as y j,k /N j , which is reasonable for sufficiently large N j according to the law of large numbers.By Eq. ( 16), we obtain the estimate This choice of ε = ε(Y) scales linearly with m, the number of measurements in the dataset Y.Note that ε depends on the noise model, which in several cases may not be sufficiently established.In our case, however, the noise model is known to a high degree, which allows us to study and compare different methods for estimating the parameter ε.
Complementarily, we employ a straightforward, wellestablished model selection technique based on cross validation (see e.g.Ref. [56]), which is more scalable than the use of AIC or BIC in our case.For this, the data is partitioned into independent training and testing sets.Different models, i.e. different values for ε, are built from the training data and used to predict the testing data.The sought-after parameters-in our case ε-then result from the model corresponding to the smallest error with respect to the testing data.
Specifically, we randomly draw m = 10, 15, 20, 40, 60, 80 out of the m max = 81 measurement settings without replacement, corresponding to different levels of limited experimental knowledge.The respective data sets Y(m) ∈ R m×d + are then partitioned into five subsets Z (1) (m), . . ., Z (5) (m) ∈ R m/5×d + .The optimisation in Eq. ( 12) is performed with respect to every possible union of four subsets 5 i=1,i =q Z (i) (m), q = 1, . . ., 5, and different ε parameters.Each reconstruction yields an estimate ˆ (m, q, ε) and the remaining subset Z (q) (m) is used as a testing set.The state estimate ˆ (m, q, ε) is used to compute the predicted measurement data A m,q (ˆ (m, q, ε)) and compare these with the corresponding subset of the experimental measurement data Z (q) (m) (A m,q : S d → R m/5×d + being the reduction of the operator A to the subsets of measurement settings corresponding to m and q).The resulting distance A m,q (ˆ (m, q, ε)) − Z (q) (m) 2 , between the predicted and measured data, also known as the prediction error or predicted risk, is averaged over q (fivefold cross-validation), yielding an estimate for the averaged prediction error (testing set error) If the corresponding optimisation problem is infeasible for a certain combination of ε, m, and q (i.e. the set of density matrices that satisfy the constraint in Eq. ( 12) is empty), the prediction error is set to Z (q) (m) 2 .For averaging, each point (m, ε) is sampled 50 times.
The mean values and standard deviations of the prediction error depending on the model parameter are depicted in Fig. 2. We see that for values of ε around ε the error is smallest, which is consistent with our ansatz and allows us to gain confidence in the assumption that the measurement data can be effectively modelled by a multinomial distribution.The more measurement settings are considered, the clearer the choice of the optimal ε becomes, with both the prediction error and its variance attaining their minima close to ε = ε.For those values of ε close to ε and sufficiently many measurement settings, the prediction error E(m, ε) is only slightly bigger than the error estimate for the data ε.Here, the error arises primarily from raw multinomial noise, ε, present in the testing set itself and cannot be overcome with improved reconstruction methods.Where fewer measurement settings are considered, less information about the state is available, resulting in large q=1 Am,q(ˆ (m, q, ε)) − Z (q) (m) 2 in units of ε depending on the model parameter ε and on the number of measurement settings m.The standard deviation is bigger for fewer measurement settings and for smaller ε.The latter is due to the increasing chance of the optimisation to be infeasible for smaller ε.For ε close to ε and sufficient many measurement settings, the error is only slightly bigger than the deviation due to the multinomial distribution of the measurement outcomes.
testing set errors as well as greater variance of the state estimates, although the smallest prediction errors are still seen for ε close to ε.As ε decreases below ε, the chance of the optimisation being infeasible increases, causing the prediction errors to effectively increase with a greater spread attributed to different optimisation runs.As ε increases above ε, the data fitting constraint is weakened, resulting in too coarse model fits and a gradually increasing prediction error.
Using Eq. ( 17) instead of cross validation has the advantage of much less computational effort and is useful in a scenario with good statistics for each measurement setting.Moreover, cross validation relies on partially discarding data, which could aggravate the issues of having too little data, yielding poorer estimates for ε.However, Eq. ( 16) relies on the assumption of a well identified error model-in our case, multinomial noise, as verified by cross validation.In cases in which the error model is not known, cross validation can provide a more robust estimate of ε.

Compressed sensing tomography of the GHZ state
Having verified that the optimal value for ε is close to that computed from Eq. ( 17), we use it as input for the compressed sensing tomography of the experimental state and compute the optimal estimate ˆ CS := ˆ (m max , ε) of the a priori unknown experimentally prepared state .The good statistics available in our experiment allow us to estimate with comparably high accuracy.In general, due to experimental imperfections, (and hence ˆ ) will deviate from the target state GHZ := |ψ GHZ ψ GHZ |, see Fig. 3 for a pictorial representa-tion.There, we show a comparison between the density matrices of the target state and the optimal compressed sensing estimate using bar plots.
Target state Estimated state FIG. 3. Bar plot of the density matrix of the target (GHZ) state ρGHZ and its optimal compressed sensing estimate ρCS.The basis is fixed to the tensor products of one-particle vectors in the order |H, H, H, H , |H, H, H, V , . . ., |V, V, V, V .The height of each bar corresponds to the size of the absolute value of the respective density matrix entry j,k = | j,k | e iϕ j,k and the colour to its complex phase ϕ j,k ∈ (−π, π].The colourmap is chosen to account for the periodicity of the phase.The fidelity of the estimate with respect to the GHZ state is 0.855±0.006and its purity tr(ˆ 2 CS ) = 0.60±0.01,representing an expected mixedness due to experimental imperfections.
The standard figure of merit to determine the performance of tomography is the quantum fidelity F of two states χ and σ, which is defined as . We find that the fidelity between the GHZ state GHZ and the estimated state ˆ CS is The uncertainty of the fidelity is determined by using the op-timal compressed sensing estimate, ˆ , as input for the generation of simulated data-parametric bootstrapping [56]-and taking the empirical standard deviation of the fidelity values.This uncertainty determines the robustness of the method.Obtaining a closed expression for proper error bounds from the data with respect to positivity constraints is hard [23,57], while bootstrapping and taking the empirical standard deviation gives a good estimate of uncertainty [56].
To build confidence, we also computed the maximum likelihood estimate [58], ˆ MLE , using the same data to obtain a fidelity with respect to the target state of F ( GHZ , ˆ MLE ) = 0.843 ± 0.004, which shows that the estimators yield similar results; as will other estimators such as least squares with positivity constraint.Additionally, since we have measured a tomographically complete set of observables and the statistical properties of the measured data are sufficiently understood, we are able to provide an estimate of the fidelity with respect to the target state directly from the measured data without the need of performing tomography and an estimate of the corresponding error bound, see Appendix A for details.With this, we obtain a fidelity of 0.845 ± 0.005, which again is in good agreement with the results computed from the compressed sensing estimate.We note that the standard technique for estimating the fidelity of a state with respect to a specific target state requires estimating only the expectation values of a set of operators that form a decomposition of the target state.For a four-qubit GHZ state, this requires a minimum of nine specific Pauli basis measurements, as explained in Appendix A. In contrast, using compressed sensing tomography, even a random set of measurement settings produces fidelity estimates with respect to the GHZ state, which quickly approach the maximum at around 25 measurement settings.Furthermore, these measurement settings suffice to compute the fidelities with respect to arbitrary states, since they allow for the estimation of the entire state.
Compressed sensing is about employing provably fewer measurement settings than with standard methods, while still producing satisfactory reconstructions, i.e. to effectively sense in a compressive way.Along these lines, we explore how varying the number of measurement settings m affects the fidelity.This is shown in Fig. 4. In order to make the results independent from specific measurement settings, we randomly draw without replacement m out of m max different settings 200 times and average over the resulting fidelities, thus providing a value for a typically expected fidelity for each m.As one would expect intuitively, we can see that the value of the fidelity increases monotonically with the number of measurement settings and converges to the fidelity of the estimate from tomographically complete data.The shaded region represents the uncertainty (± standard deviation) in the fidelity computed via bootstrapping and displays the decreasing uncertainty with increasing numbers of measurement settings.The fidelity already falls within the errorbars of its final value for comparably small m.

Deviations from the optimal parameter
In this section, we study the effect that misestimating ε has in the performance of the reconstruction of the state.We carry out this task by numerical simulation: Using the compressed sensing state estimate ˆ CS , we simulate measurement data, which we subsequently input to our compressed sensing reconstruction procedure, varying both ε, m and randomly drawing measurement settings without replacement.If the corresponding optimisation problem is infeasible and yields no estimate, the fidelity F is set to zero.The fidelities F (ˆ CS , ˆ (m, ε)) are averaged over data and measurement settings (500 different data sets and different measurement settings per m and ε).
The results for varying m and ε in units of ε are shown in Fig. 5.We compare the reconstructed states to ˆ CS , which we used to generate the simulated data.We see that as m increases, the fidelity converges to unity at ε = ε (where ˆ CS is defined).We are interested in how quickly our reconstructed state approaches the optimal ˆ CS with fewer measurement settings, particularly if ε is misestimated.For instance, we see that we can obtain average fidelities of more than 0.8 for only 6 measurement settings.Fig. 5 (top) again illustrates that ε = ε is the best choice as the fidelities around this region (and away from pathologically small numbers of measurement settings, m > 3) are the highest.Moreover, we also see that with increasing m, the standard deviation ∆F of the fidelity becomes smaller for ε ≥ ε.For ε < ε, infeasibilities of the optimisation Eq. ( 12) that appear for certain choices of measurement settings lead to large standard deviations, which can be seen by the ridge in the area left of ε = ε in Fig. 5 (bottom).The ridge as well as the region of infeasibility gets close to ε = ε for large m, which is reasonable since more information (i.e. more constraints) puts greater restrictions on the optimisation problems.If fewer measurement settings are considered, as in the highly tomographically incomplete regime, overestimation of ε is less detrimental and state estimates still perform well, i.e. the fidelity is relatively constant for ε ε 3 ε.However, as m increases, the reconstruction becomes more strongly dependant on the choice of ε.Generally, we see that the higher the fidelity, the lower the standard deviation.

Discussion
In this work, we have experimentally explored the compressed sensing paradigm for quantum state tomography as applied to the photonic setting.We have explicitly laid out a method for applying these techniques and reconstructed the state of a four-photon system with tomographically complete data available, observing a high fidelity of the reconstructed state with respect to the target state.The presence of noise in the data requires that one carefully chooses appropriate constraints on the optimisation.In current applications, these parameters are usually obtained in an ad hoc way.We have provided a prescription to establish the parameters in a more systematic way by modelling the noise and performing cross validation, which is a general method for model selection.The quality of the data, being afflicted with noise predominantly attributed to finite counting statistics, allows us to model the noise via a multinomial distribution.This is a situation commonly expected for photonic experiments with postselected data.In fact, we observe a great agreement between estimating the model parameter from theoretical noise modelling and cross validation.
Having established the appropriate model, we have been able to perform state reconstruction with tomographically incomplete data, which rapidly converges to the highest fidelity estimate as the number of measurement settings increases.As a validity check, we have also run different estimators on the full data and obtained similar results, showing that our compressed sensing procedure yields reasonable estimates.As is predicted by the mathematical theory of compressed sensing, we have found that the number of measurement settings needed for a satisfactory estimate of the underlying state is much smaller than the number of measurements necessary for tomographic completeness.We have also carried out a comprehensive bootstrapping analysis to build confidence in the robustness of our method.In fact, we have observed that the uncertainty in the fidelity quickly decreases with increasing number of measurement settings.Furthermore, we have studied the robustness of our method with respect to improper model selection and the effects on the reconstruction.We have found that for several choices of models and different numbers of measurement settings, the performance of the reconstruction can vary dramatically.For small numbers of measurement settings, our method depends less strongly on the model.In contrast, for large numbers of measurement settings, it is imperative to determine the appropriate model for optimal performance.These results confirm that compressed sensing in conjunction with suitable model selection gives rise to reliable procedures for state reconstruction leading to effective tomography with tomographically incomplete data.These techniques can be applied to a wide range of experimental settings and provide a means to identify and verify appropriate models thereby paving the way for the future of practical quantum state tomography.With this, we contribute to establishing compressed sensing as a practical tool for quantum state tomography in the low-information regime.
FIG. 1.Experimental setup for generating the four-photon polarisation entangled states |ψGHZ , consisting of photonic crystal fiber (PCF) sources, half-wave plates (HWPs), quarter-wave plates (QWPs), a Soleil-Babinet (SB), polarising beam-splitters (PBSs), and dichroic mirrors (DMs).The 80 MHz Ti-Saph laser is split onto two PCF sources in twisted Sagnac-loop interferometer configurations generating polarisation entangled Bell pairs.The signal and idler photons from each source are separated by DMs and the signal photons interfere on a PBS with relative time between paths ∆τ ≈ 0, which on postselecting a single photon in each output port performs a fusion operation.The SB is set to match the phase between the |H, H, H, H and |V, V, V, V components to zero.Each mode is measured by single-qubit rotations consisting of a HWP and QWP, and is projected in the {|H ,|V }-basis by PBSs and avalanche photodiode detectors.

FIG. 2 .
FIG. 2. Cross validation results.Prediction errors E(m, ε) = 1/5 5q=1 Am,q(ˆ (m, q, ε)) − Z (q) (m) 2 in units of ε depending on the model parameter ε and on the number of measurement settings m.The standard deviation is bigger for fewer measurement settings and for smaller ε.The latter is due to the increasing chance of the optimisation to be infeasible for smaller ε.For ε close to ε and sufficient many measurement settings, the error is only slightly bigger than the deviation due to the multinomial distribution of the measurement outcomes.

FIG. 4 .
FIG. 4. Fidelity F ( GHZ, ˆ (m, ε)) as a function of the number of measurement settings m with uncertainty (shading) from bootstrapping for ε = ε.For large m, F approaches the fidelity of GHZ and ˆ , F ( GHZ, ˆ CS) = 0.855, getting very close already for comparibly few measurement settings, and the standard deviation becomes smaller.

FIG. 5 .
FIG. 5. Fidelity F (ˆ CS, ˆ (m, ε)) depending on the number of measurement settings m and the model parameter ε (top) and corresponding standard deviation ∆F (bottom) obtained via bootstrapping.Since in compressed sensing we are more interested in the regime of few measurement settings and the fidelities do not change significantly for larger m, we restrict ourselves to the region with m ≤ 20.The data are generated randomly from ˆ CS and the measurement settings per m are drawn randomly as well.The fidelities are averaged over different data realisations and measurement settings.The highest fidelities are achieved for ε ≈ ε with rapid decrease for ε < ε where the fraction of infeasible optimisations increases.Note that the higher the fidelity, the lower the standard deviation.