Experimental quantum homodyne tomography via machine learning

,

Introduction.Exploiting the full potential of quantum technologies involves the challenge of 'quantum volume': keeping a high degree of control over a complex many-body quantum system in spite of its growing size [1].This important challenge concerns, in particular, methods for complete characterization of quantum states and processes.Quantum state tomography (QST), the reconstruction of quantum states from measurement statistics in multiple bases [2,3], is routinely performed in quantum physics experiments of various nature.Nevertheless, because the number of parameters describing a state of a quantum system grows exponentially with its size, tomography becomes increasingly demanding in application to large-scale quantum systems that are now engineered in experiments with ultracold atoms [4][5][6][7], ions [8], superconducting devices [9], and quantum light [10].This problem manifests itself in two aspects.First, full quantum tomography of multi-dimensional quantum systems requires large portions of data, which are typically difficult to acquire experimentally.Second, even if such data are available, they are quite difficult to process with reasonable computational resources.Fortunately, it often happens that the physical setting being studied imposes certain a priori restrictions on the quantum states that can be prepared in it.As a result, the states can be described using a set of parameters that grows polynomially, rather than exponentially, with the size of the system.This observation gave rise to alternative approaches such as permutationally invariant tomography [11], quantum compressed sensing [12], and tensor networks [13][14][15].Each of these approaches makes particular assumptions about the physical restrictions imposed upon the state in question.
In the absence of knowledge about the physics of the system, one can use a universal approach based on generative artificial neural networks.Generally, neural net-works are known to be capable of finding the best fit to arbitrarily complex data patterns with a limited number of parameters available [16].In the context of quantum physics, this capability has been exploited in the context of neural networks known as the restricted Boltzmann machine (RBM).RBMs are capable to encode the information about exponentially many terms of a quantum state in a polynomial number of units [17].This feature makes RBMs attractive for a variety of quantum variational optimization problems [18], which require finding a quantum state that best satisfies a certain criterion.Examples of such problems, in addition to quantum tomography [19], include searching ground states of Hamiltonians in quantum chemistry tasks [20], investigating tensor network states [21] and topological states [22], and simulating open quantum many-body systems [23][24][25][26][27].
In the original theoretical proposal [19], RBM-based QST has been applied to simulated pure states of interacting many-qubit systems.A subsequent work [28] has generalized this approach to mixed states and applied it to perform QST of a two-qubit system associated with a polarization-entangled photon pair.Very recently, the method was used in application to an experimental Rydberg-atom simulator with eight and nine atoms, using a pure-state, constant-phase approximation and measurements in a single basis [29].Neural network techniques in the context of QST were also employed, albeit in a very different setting, to pre-process the data, thereby reducing the effect of state preparation and measurement errors [30].
Here we apply the neural network QST approach to homodyne tomography of optical states, in which measurements of electromagnetic field quadratures at various phases are performed to reconstruct the state of light in a given mode [2].We verify our method on experimental data for the cases of optical Schrödinger's cat states and arbitrary Fock-state superpositions up to the two-photon level, where we obtain high quality of quantum state reconstruction.The approach generally outperforms standard maximum-likelihood based methods [31], which, as we demonstrate, is deeply linked with reduced overfitting.To our knowledge, this is the first application of neural networks in continuous-variable quantum setting.
Neural network tomography.An RBM is a neural net containing two layers, visible and hidden, with all-to-all connections between the neurons in different layers and none inside each layer [hence the term 'restricted', see Fig. 1(a)].The neurons can take on binary values {0, 1}.Any set of neuron values, defined by binary vectors v and h, is associated with the Boltzmann probability where where E is the energy function, Z is the partition function, and Ŵ , a, b are the network parameters: weights and biases, respectively.The conventional RBM is trained to find the parameter set that maximizes the product of marginal distributions, over the training set {v}, i.e. {v} p(v).The RBM trained in this way will produce similarly low energy values for test inputs that are similar to elements of the training set, which is useful for pattern recognition [32].Furthermore, by sampling high-probability visible layer vectors, one can use the RBM as a generative neural network [33].
In the classical case, the data (such as the pattern to be recognized) are fed to the RBM through the visible layer.Doing so for quantum tomography would be unimaginable because there are infinitely many quantum states and even more possible measurement data sets.On the other hand, we can take advantage of our a priori knowledge of the connection between quantum states and the measurement probabilities associated with different bases.
These important differences dictate a different way that RBMs can be applied for quantum optimization problems.Here we utilize the RBMs to define an Ansatz expression for the quantum state |Ψ , which we wish to reconstruct.The neural network parameters are then used as the variational parameters of that Ansatz.We calculate the likelihood function (probability of having acquired the present experimental data set given |Ψ ) using the knowledge of quantum mechanics, and optimize the parameters, and therefore |Ψ , to maximize that likelihood.The visible layer no longer plays the role of the container for the data, but only serves to index the basis  of the Hilbert space: each possible configuration v of the visible layer is associated with one and only one basis element |v .The Carleo and Troyer Ansatz [17], which we utilize here, uses two RBMs of identical architectures [Fig.1(b)], with the parameter sets λ = { Ŵ λ , a λ , b λ } and µ = { Ŵ µ , a µ , b µ } to express, respectively, the amplitudes and phases of the state's decomposition into this basis: where and E λ,µ are defined by Eq. ( 2) for the two corresponding RBMs.We note that the partition function Z is present only in the expression for the amplitudes, but not phases, because the phases have no normalization requirement.The logarithm is included in the phase for mathematical convenience.
In optical homodyne tomography, the basis traditionally used for state reconstruction is the Fock basis, bounded from above by some cut-off photon number N ph .Because an RBM with the visible layer of size m can represent a Hilbert space of dimension 2 m , the natural choice is to construct the reconstruction basis from photon number states {|0 , . . ., |N ph = 2 m − 1 }.The basis is then encoded in the visible layer in a straightforward fashion, for example, for m = 2 : The tomography experiment consists in measuring the continuous electromagnetic field quadrature samples X on multiple copies of the state |Ψ at various phases θ.
The log-likelihood functional is then as follows: where ρ = |Ψ Ψ| is the density matrix, j enumerates measurement outcomes.This is a differentiable function of the RBM parameters, defined through Eqs. ( 2), (4), and (5).These parameters can be therefore optimized using gradient descent to maximize the log-likelihood.
A general quantum tomography method must be able to work with not only pure states, but also with mixed ones.The method above is readily generalized to mixed states by means of purification: introducing an ancillary "environment" Hilbert space, whose dimension is equal to that of the Hilbert space of interest.The mixed state that needs to be reconstructed can then be written as a partial trace where the pure state |Ψ SE is a vector of the tensor product Hilbert space comprising the system and the environment and can be reconstructed from the experimental data as described above (see Methods for details).We note that, although the dimension of the tensor product space is the square of the dimension of the system, the number of visible units needed to represent that space is only twice as large as that for the system alone.
We emphasize again the difference between the RBM approach to state reconstruction and the conventional quantum expectation-maximization (MaxLik) technique [31,34].In both cases, we optimize the parameters of the state to maximize the likelihood functional (6).However, in the standard approach, all elements of the density matrix are being optimized, which corresponds to the number of parameters equal to the dimension of the Hilbert space squared.Within the RBM Ansatz, on the other hand, the number of parameters is on the scale of the product of the number of visible and hidden units, i.e. scales logarithmically with the Hilbert space dimension.As discussed previously, this is of great advantage when this dimension is large.Although reducing the number of parameters does restrict the set of states that can be expressed by the RBM Ansatz, we found it to be sufficient to adequately represent the states observed in homodyne tomography experiments.
We test our approach on two sets of experimental data.The first set corresponds to an optical analog of Schrödinger's cat, i.e. the superposition of two oppositeamplitude coherent states.The data have been taken from the experiment [35] and correspond to the cat state of amplitude α = 1.85 squeezed by 3 dB along the quadrature axis.The second data set was obtained in an experiment on engineering arbitrary superpositions of Fock states a 0 |0 +a 1 |1 +a 2 |2 with the amplitude ratio a 0 : a 1 : a 2 ∼ −0.76 : 0.49 : 0.42 [36].We compare our reconstruction results with standard iterative MaxLik algorithm with efficiency correction.For both methods, we obtain Wigner functions and density matrices of the reconstructed states (Fig. 2).
For the reconstruction of the cat state, we used the cut-off photon number of N ph = 7 (i.e.m = 3), which corresponds to the amplitude and phase RBMs containing 2m = 6 visible units each.Additionally, each RBM contained 8 hidden units.The reconstruction featured correction for 62% detection efficiency (see Methods).For the Fock state superposition, each RBM had 4 visible units, 4 hidden units, N ph = 3 (m = 2) and the efficiency correction 55%.As we see in Fig. 2, both methods resulted in similar reconstructed states, with the relative fidelity about 0.998 in both cases.
Effects of overfitting.Our next goal is to compare the performance of the RBM approach to MaxLik.Using bona fide experimental data is suboptimal for this purpose because it is not known what "true" state they correspond to, and hence we cannot tell which method gives better reconstruction.
Therefore we generate a simulated quadrature data set corresponding to the Schrödinger's cat states |α − |−α with α = 4, reconstruct the state from this set and compare it to the original.The RBM reconstruction was performed without assuming the state to be pure, using an RBM with 10 visible units (m = 5) and 3 hidden units.The cut-off point was at 31 photons both for RBM and MaxLik.The motivation for choosing this relatively large Hilbert space is to explore the case in which the number parameters optimized by the RBM is much less than MaxLik.
Figure 3(a) shows the photon statistics of the state reconstructed using the two methods.Theoretically, we expect this state to show Poisson statistics for odd photon numbers, but zero probability for even photon numbers.We see that the state reconstructed using RBMs largely follows this rule whereas the MaxLik reconstructed state has significant nonzero statistics for even photon numbers.In Fig. 3(b) we plot the fidelity of the reconstructed state with the original one as a function of the data set size and observe that RBM performs significantly better.
The improved performance of the RBM approach for a smaller amount of experimental data is likely associated with lower overfitting [17].Indeed, the number of parameters in MaxLik is, as discussed 32 2 − 1 = 1023, whereas for RBM it is 2×(10×3+10+3) = 86.In order to demonstrate that overfitting is indeed the cause for poorer performance of MaxLik, we implement the following crossvalidation test.We generate multiple quadrature data sets of the the same size and reconstruct the state from one of them.Then we calculate the log-likelihood (6) for the data from each set with respect to the reconstructed state.If overfitting plays a significant role in the reconstruction, the likelihood of the "native" data set (from which the state was reconstructed) is expected to be significantly higher than for other sets.We plot the mean difference of the log-likelihoods for the "native" and "non-native" data sets in Fig. 3(c) and observe this difference to be much higher for MaxLik than for RBM.This confirms our hypothesis.
Outlook.Our results demonstrate that the neural network QST approach is a promising way of characterizing the states observed in optical experiments.We found this method to be capable of reliable state reconstruction and much less prone to overfitting compared to standard MaxLik approach.However, the full capability of our method is expected to be unveiled for very large Hilbert spaces, such that traditional methods become inapplicable.Therefore the natural next step would be to implement a complex multimode entangled state and apply RBM for its reconstruction.Promising sources of such states are multimode parametric oscillators that have seen rapid development in recent years [37,38].
To proceed in this direction, we will also need to change the strategy of RBM training.Presently, our method relies on exhaustive search of all possible configurations of the neural network units (see Methods).However, such a search will be impossible in large Hilbert spaces.
Instead, we will have to rely on approximate methods of RBM training such as contrastive divergence [41] or Gibbs sampling [42].Additionally, there exists a class of quantum states with physical interest that carry no efficient RBM description [43].Alternative neural network architectures should therefore be explored.In particular, it would be interesting to look for ways to utilize forward propagating neural networks, rather than RBMs, for QST.Such neural networks are more common in modern machine learning because their training is much more straightforward.
Our approach can be generalized to broader classes of physical problems.First, in addition to light, it is applicable to any physical system that can be mapped to a harmonic oscillator, such as atomic ensembles [39] and nanomechanics [40].Second, we reiterate that neuralnetwork based QST studied here belongs to a larger class of problems in which one looks for a quantum state that best satisfies a certain criterion.A particularly promising field of research, in our opinion, is complex phenomena in condensed matter systems, such as many-body localization, and describing exotic phase transitions.Approaches based on machine learning constitute a new and promising way of tackling them.

METHODS
Pure states.Here we present the details of training our RBM.The neural network parametrization for the wavefunction is defined by Eqs. ( 4) and (5).We introduce additional notation.First, because there is one-to-one correspondence between visible layer configurations |v and Fock states |n , as discussed in the main text, we will use the symbol n to denote both these objects.Second, we denote the unnormalized Boltzmann probability with Z λ ≡ n P λ n , for the amplitude RBM, and the analogous quantity for the phase RBM.We remind the reader that the letters λ and µ denote the respective parameter sets of these RBMs.By plugging the expression (4) into the log-likelihood function (6) and using that the fact that overlap between the number and quadrature eigenstates θ, X |n corresponding to the phase θ is just the n's harmonic oscillator eigenfunction (Hermite-Gaussian polynomial) H n (X) with a phase factor, θ, X |n = H n (X) exp[−inθ], (10) we obtain the following expression: where , the summation with index j is over all quadrature measurements and N m is the number of measurements.
For the network training, we evaluate the gradients of the above log-likelihood Ξ(λ, µ) over the neural net parameters λ and µ as follows: where we defined with D n µ ≡ ∇ µ log P µ n and D n λ ≡ ∇ λ log P λ n .Ascending by these gradients, we can maximize the log-likelihood (11).Both RBMs are trained simultaneously.
We note that the above gradients contain exhaustive summation over possible configutations {n, h} of the visible and hidden layers of both RBMs.In the present work, we are able to compute this sum directly since the number of RBM units is relatively small.However, in the case of high Hilbert space dimension, Boltzmann sampling using an annealing device or algorithm will be required.
Mixed states.As discussed in the main text, see Eq. ( 7), we treat the mixed state ρ to be reconstructed as a partial state of a pure state |Ψ SE in a tensor product Hilbert space with the dimension (N ph + 1) × (N ph + 1).We decompose this state in the Fock basis |Ψ SE ≡ N ph n=0 N ph m=0 C nm |n, m and apply the same parametrization as in the previous subsection: with The partial trace of this state over the environment is as follows: The log-likelihood ( 6) is then given by Ξ(λ, µ) = − j log nm θ j , x j |n ρ λ,µ nm m|θ j , x j where the summation indices n, m, k run over the truncated Fock basis, j over all quadrature measurements, and We note that the expression ( 17) is very similar to the pure state case (11), but requires two additional summations over the truncated Fock basis.The log-likelihood (17) gradients over µ and λ read similarly to those for the pure state ( 12), but with the parameters (13) redefined as follows: The remainder of the treatment replicates that for pure states.
Efficiency correction.To correct for an imperfect homodyne detector efficiency η < 1 in our neural net approach, we model it as a perfect detector preceded by beam splitter of transmission η [31]

Figure 1 .
Figure 1.Architecture of restricted Boltzmann machines for classical pattern recognition tasks (a) and quantum tomography (b).

Figure 2 .
Figure 2. Experimentally reconstructed Wigner functions and density matrices for optical Schrödinger's cats (a-b) and engineered Fock superpositions up to the two-photon level (c-d) using neural network quantum tomography (a,c) and MaxLik (b,d).The relative fidelity of the two reconstructed states is about 0.998 in both cases after efficiency correction.

Figure 3 .
photon statistics , which changes the quantum state ρ by means of generalized Bernoulli transformation to a new state ρη according tom| ρη |n = ∞ k=0 B m+k,m (η)B n+k,n (η) m + k| ρ |n + k ,(20)where B n+k,n = n+k n η n (1 − η) k .Now we can repeat the above procedure for the mixed state (purification) Ansatz, with the only difference that we use ρη instead of ρ to calculate the log-likelihood (6).