Machine learning assisted quantum state estimation

We build a general quantum state tomography framework that uses machine learning techniques to reconstruct quantum states from a given set of coincidence measurements. For a wide range of pure and mixed input states, we demonstrate via simulations that our method produces reconstructed states that are functionally equivalent to those of traditional methods, with the added benefit that the expensive computations are front-loaded. Further, by training our system with measurement results that include simulated noise sources, we demonstrate a significantly enhanced average fidelity compared to typical reconstruction methods. These enhancements in average fidelity persist when we consider state reconstruction from partial tomography data in which several measurements are missing. We anticipate that the present results, which combine the fields of machine intelligence and quantum state estimation, will greatly improve and speed up tomography-based quantum experiments.


Introduction
Quantum information science (QIS) is a rapidly developing field that aims to exploit quantum properties, such as quantum interference and quantum entanglement 1, to perform functions related to computing 2, communication 3, and simulation 4. Interest in QIS has grown rapidly since it was discovered that QIS systems can perform many tasks more quickly than their classical counterparts, and some tasks that are completely unavailable to them. In general, all QIS tasks require the support of classical computation and communication in order to coordinate, control, and interpret experimental outcomes. While the classical overhead needed to effectively operate and understand quantum systems is often negligible in current experimental settings, the exponential growth with qubit number of the parameters describing a quantum system will quickly put substantial demands on available computing resources.
Figure 1. Schematic of the robust tomography scheme with machine learning. The noisy tomography measurements are fed to the convolutional neural network, which makes predictions of intermediate τ-matrices as the outputs. At the end, the predicted matrices are inverted to reconstruct the pure density matrices for the given noisy measurements.
arXiv:2003.03441v1 [quant-ph] 6 Mar 2020
Using machine learning (ML) to reduce the burden of classical information processing for QIS tasks has recently become an area of intense interest. Examples of where this intersection is being investigated include the representation and classification of many-body quantum states 5, the verification of quantum devices 6, quantum error correction 7, quantum control 8, and quantum state tomography (QST) 9,10. Here we focus on QST, where a large number of joint measurements on an ensemble of identical, but completely unknown, quantum systems are combined to estimate the unknown state. For a quantum state of dimension $d$ there are $d^2 - 1$ real parameters in the density matrix describing that state, and hence the resources required to measure and process the data required for QST grow quickly for systems with large dimension, such as those needed to demonstrate quantum supremacy 11. Current methods for full state reconstruction from tomographic measurements scale as $O(d^4)$ even when making the simplifying assumption that all noise is Gaussian [12][13][14]. As an example of how demanding this scaling is in modern experiments, the reconstruction of an 8-qubit state in 15 took weeks of computation time, in fact more time than was required for data collection itself 16. Recently, various deep learning approaches have been proposed for efficient state reconstruction [17][18][19][20][21][22], with some techniques indicating a scaling of $O(d^3)$ 23.
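As a rough illustration of the counting above, the following sketch tabulates the dimension, the number of free real parameters, and the $d^4$ factor for a few qubit numbers. It assumes a register of n qubits so that d = 2^n; the function name is hypothetical, and the $d^4$ figure is only a scaling factor, not a run time.

```python
def tomography_scaling(n_qubits: int) -> dict:
    """Return the Hilbert-space dimension d = 2**n, the number of free real
    parameters in the density matrix (d**2 - 1), and the d**4 scaling factor."""
    d = 2 ** n_qubits
    return {
        "dimension": d,
        "real_parameters": d * d - 1,
        "d4_factor": d ** 4,
    }


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, tomography_scaling(n))
```

For the two-qubit states considered in this work the density matrix has 15 free real parameters, while an 8-qubit state already has 65,535.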
In this paper we implement a convolutional neural network (CNN) to reduce the computational overhead required to perform full QST. Our system is shown via simulated measurements to construct equivalent density matrices to traditional methods of state estimation to a high degree of accuracy. Our QST system has the distinct benefit that all significant computations can be performed ahead of time on a standalone computer with the final result deployed on more modest hardware. Further, in the setting where tomographic measurements are noisy or incomplete, we are able to demonstrate a significant enhancement in average fidelity over typical reconstruction methods by training our QST system with simulated noise ahead of time. These results constitute a significant step toward the implementation of high-speed QST systems for applications requiring high-dimensional quantum systems.
The design of our QST setup is shown schematically in Fig. 1. A series of noisy and potentially incomplete measurements performed on a given density matrix are simulated and then fed to the input layer of a CNN. Examples of the noisy tomography are shown as the tomography measurements in Fig. 1 (left-side). The CNN then predicts the τ-matrices (which are discussed in the following section) as its output. Finally, the output is inverted, resulting in a valid density matrix. Examples of the reconstructed density matrices are shown in Fig. 1 (right-side). This process is repeated many times for various sizes of random measurements, strengths of noise, and numbers of missing measurements. The average fidelity (F) of the setup is calculated and compared to the fidelity obtained when a non-machine-learning method is used.

Results
The general setup of our CNN, depicted in Fig. 1, consists of feature mappings, max pooling, and dropout layers 24. More specifically, the two-dimensional convolutional layer has a kernel of size 2 × 2, a stride length of 1, 25 feature mappings, zero padding, and a rectified linear unit (ReLU) activation function. The two-dimensional max-pooling layer has a kernel of size 2 × 2 with a stride length of 2, which halves the dimension of the inputs, and is followed by a second convolutional layer with the same parameters as the first. Next, we attach a fully connected layer (FCL) with 720 neurons and ReLU activation, followed by a dropout layer with a rate of 50%, another FCL with 450 neurons and ReLU activation, and a second dropout layer with a rate of 50%, which is finally connected to an output layer with 16 neurons. Note that the hyperparameters of the CNN are manually optimized as discussed in 25. Furthermore, the network is designed such that the output (the firing of the 16 neurons) comprises the elements of the τ-matrix (see "Method"), which can be listed as $[\tau_0, \tau_1, \tau_2, \tau_3, \ldots, \tau_{15}]$. Next, the list of 16 elements is rearranged to form a lower triangular matrix as given in equation 1, which is finally compared with the target ($\tau_{\text{target}}$) for the given measurements (see "Method") in order to find the mean square loss. We minimize the loss using the Adagrad optimizer (learning rate of 0.008) of TensorFlow 26. Additionally, at the end of each epoch (one cycle through the entire training set), the network makes the τ-matrix prediction for the unknown (test) noisy measurements, which is later inverted to give the tomography and fidelity of the setup as given by equation 2, where $\rho_{\text{pred}}$ and $\rho_{\text{targ}}$ represent the predicted and target density matrices, respectively. The form of equation 2 guarantees that the network always makes predictions which are physically valid 27.
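The post-processing chain described above (16 outputs, lower triangular τ-matrix, density matrix, fidelity) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions rather than the network's built-in implementation: the ordering in which the 16 numbers fill the triangle is hypothetical, the density matrix is assumed to take the standard form τ†τ / Tr(τ†τ), and the fidelity is assumed to be the usual (Tr √(√ρ_pred ρ_targ √ρ_pred))². All function names are illustrative.

```python
import numpy as np


def tau_from_vector(t):
    """Arrange 16 real numbers into a 4x4 lower-triangular complex matrix.

    Assumed ordering (illustrative only): the first four entries are the real
    diagonal, the remaining twelve are (real, imaginary) pairs for the six
    strictly-lower elements in row-major order.
    """
    t = np.asarray(t, dtype=float)
    tau = np.zeros((4, 4), dtype=complex)
    tau[np.diag_indices(4)] = t[:4]
    rows, cols = np.tril_indices(4, k=-1)
    tau[rows, cols] = t[4::2] + 1j * t[5::2]
    return tau


def density_from_tau(tau):
    """Build a unit-trace, positive semidefinite matrix from tau."""
    rho = tau.conj().T @ tau
    return rho / np.trace(rho).real


def psd_sqrt(a):
    """Matrix square root of a Hermitian positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.conj().T


def fidelity(rho_pred, rho_targ):
    """Fidelity (Tr sqrt(sqrt(rho_pred) rho_targ sqrt(rho_pred)))**2."""
    s = psd_sqrt(rho_pred)
    inner = s @ rho_targ @ s
    inner = (inner + inner.conj().T) / 2  # enforce Hermiticity numerically
    vals = np.clip(np.linalg.eigvalsh(inner), 0.0, None)
    return float(np.sum(np.sqrt(vals)) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rho = density_from_tau(tau_from_vector(rng.normal(size=16)))
    print(np.trace(rho).real, fidelity(rho, rho))
```

Because the density matrix is built as τ†τ and then normalized, it is Hermitian, positive semidefinite, and of unit trace for any nonzero τ, which is the sense in which the prediction is always physically valid.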
Note that the conversion of τ-matrices to their corresponding density matrices and the evaluation of the fidelity are built into the network architecture, so there is no separate post-processing unit. First we evaluate the average fidelity with respect to the number of sets of density matrices used in the network for both pure and mixed states. In order to generate training and test sets, we randomly create 200 density matrices and their corresponding τ-matrices (see "Method"), again for both pure and mixed states. After this we randomly simulate 200 noisy (σ = π/6) tomography measurements (each measurement contains 36 projections as described in "Method") for each of the τ-matrices, for a total of 40,000 sets (see "Method"). We then split each set of 200 noisy measurement results per τ-matrix into training and test sets (unknown to the network) with sizes of 195 and 5, respectively. For example, if we are working with 80 random density matrices (τ-matrices), then 195 out of the 200 noisy tomography measurement data sets per density matrix, i.e., a total of 15,600 (80 × 195), are used to train the network, and a total of 400 (80 × 5) are used to test the network. Note that in order to train the networks efficiently, we implement the batch optimization technique with a batch size of 4 for all the calculations discussed in the paper. With these training sets and hyperparameters the CNN is then pre-trained up to 800 epochs.
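The bookkeeping of the training and test sets can be checked with a few lines of arithmetic. The sketch below only restates the numbers given above (200 noisy measurement sets per density matrix, 5 held out for testing, batch size 4); the function name is hypothetical.

```python
def split_sizes(n_states, n_noisy_per_state=200, n_test_per_state=5, batch_size=4):
    """Return (training sets, test sets, training batches per epoch)."""
    n_train_per_state = n_noisy_per_state - n_test_per_state
    n_train = n_states * n_train_per_state
    n_test = n_states * n_test_per_state
    n_batches = -(-n_train // batch_size)  # ceiling division
    return n_train, n_test, n_batches


if __name__ == "__main__":
    print(split_sizes(80))   # the 80-state example from the text
    print(split_sizes(200))  # the largest case considered
```

For 80 density matrices this gives 15,600 training sets and 400 test sets, and for 200 density matrices 39,000 and 1,000, which together make up the 40,000 simulated sets.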
For comparison with standard techniques, we also implement the Stokes reconstruction method 27 (see "Method"). As shown in Fig. 2 (a), the average fidelity is significantly enhanced when the CNN is used (solid line) compared with the Stokes technique (dotted line) for every number of sets of density matrices considered. Note that we run the same training and testing process 10 times with different (random) initial points in order to gather statistics (shown by the error bars). In the case of 20 sets of density matrices, we find a remarkable improvement in average fidelity from 0.749 to 0.998 with a standard deviation of $2.9 \times 10^{-4}$, and from 0.877 to 0.999 with a standard deviation of $1.21 \times 10^{-4}$, for the pure states (blue curves) and mixed states (red curves), respectively. Similarly, even for the larger sets of 200 density matrices we find an enhancement from 0.745 to 0.969 with a standard deviation of $1.03 \times 10^{-3}$, and from 0.874 to 0.996 with a standard deviation of $2.07 \times 10^{-4}$, for the pure states and mixed states, respectively. These results not only demonstrate an improved fidelity when compared to Stokes reconstruction but also approach the theoretical maximum value of unity. Additionally, the improvement in the average fidelity of the generated density matrices for unknown noisy tomography measurements at each training epoch is shown in the inset of Fig. 2 (a). The average fidelity is found to saturate after 500 epochs.
We have also investigated how the number of noisy training sets per random density matrix impacts the effectiveness of our system. To do this we fix the number of sets of density matrices at 100 and vary the number of noisy measurements per set (in the previous paragraph, and Fig. 2 (a), this was fixed at 195). For testing purposes we use the same 5 noisy measurement sets per random density matrix which were used to create Fig. 2 (a). As expected, the average fidelity improves noticeably as the number of noisy measurement training sets per random density matrix is increased, as shown in Fig. 2 (b). Specifically, the average fidelity improves from 0.751 to 0.982 with a standard deviation of $1.04 \times 10^{-3}$, and from 0.88 to 0.996 with a standard deviation of $2.1 \times 10^{-4}$, for the pure states and mixed states, respectively. Additionally, even when we only train on simulated noise 40 times per random density matrix, the average fidelity still increases from 0.751 to 0.923, and from 0.88 to 0.982, with a standard deviation of $4.5 \times 10^{-3}$ and $1.6 \times 10^{-3}$, respectively, for the pure and mixed states.
In order to investigate the robustness of our system, we now vary the strength (σ) of the noise used to both train and test our CNN. Specifically, we vary the noise strength from strong, σ = π, to weak, σ = π/21. For each σ value, we fix the number of sets of density matrices at 100 and randomly generate 200 noisy tomography measurements per set of density matrices, resulting in a total of 20,000. As previously discussed, 195 (total of 19,500) and 5 (total of 500) out of the 200 per set of density matrices for the given noise are used as the training and test sets, respectively. Note that we separately train the CNN for each different value of the noise. With the CNN pre-trained up to 500 epochs, a significant improvement in the average fidelity of the generated density matrices with the CNN (red dots) over the conventional method (green dots) at various strengths of noise is shown in Fig. 3(a). We find a significant enhancement in average fidelity from 0.669 to 0.972 with a standard deviation of $7.8 \times 10^{-4}$, and from 0.985 to 0.999 with a standard deviation of $4.96 \times 10^{-5}$, for the strong noise strength of σ = π and the weak noise strength of σ = π/21, respectively. Similarly, for weaker strengths of noise, the average fidelity of the quantum states generated with the CNN begins to converge with that of the conventional method, as shown in the inset of Fig. 3 (a-i). We find that the average fidelities from the CNN and from the conventional method converge to unity for the noise strengths of π/800, π/1200, and π/1600. This can be considered qualitative evidence that our CNN approach to quantum state reconstruction is effectively equivalent to Stokes reconstruction in the absence of measurement noise. In order to further illustrate the efficacy of the CNN, we simulate 60,000 random tomography data sets without measurement noise. Note that the simulated 60,000 tomography measurements are random and unique. As before, the total set is divided into a training set with 55,000 measurements and a testing set with 5,000 measurements. The tomography measurements in the testing set are completely unknown to the network. The average fidelity of the generated quantum states via the CNN per epoch for the unknown measurement data is shown in the inset of Fig. 3 (a-ii). We find the generated quantum states from the CNN (NN: right-column) for the blind test data are functionally equivalent to Stokes reconstruction (SR: left-column), as shown in the inset of Fig. 3 (a-ii).
Lastly, we investigate how our CNN can handle the experimental scenario where some fraction of the 36 total tomography measurements is missing. Since the remaining basis measurements are not guaranteed to span the total 2-qubit Hilbert space, there is a priori reason to assume our CNN should have an advantage over Stokes reconstruction for this problem. For this analysis we use data with 100 sets of density matrices, a noise strength of σ = π/6, and the same training and testing data structure as previously discussed. However, in order to simulate missing measurement points we reduce the number of features in the input data. For example, in the extreme case of only using four projective measurements, the input consists of only 4 floating-point features over the 6 × 6 available space. The remaining 32 places are filled with 0 (zero padding). Similarly, for 8 projectors, 28 places are filled with 0; for 12 projectors, 24 places are filled with 0, and so on. For the sake of comparison we also perform zero padding on the matrices for use with Stokes reconstruction. With training up to 500 epochs, we find an improvement in the average fidelity of the generated density matrices with the CNN (red dots) over the conventional Stokes technique (green dots) for every available size of the tomography measurements (projectors), as shown in Fig. 3(b). Note that the error bars represent one standard deviation away from the mean value. We find a significant enhancement in the average fidelity from 0.61 to 0.9827 with a standard deviation of $1.08 \times 10^{-3}$, from 0.532 to 0.95 with a standard deviation of $1.5 \times 10^{-3}$, and from 0.352 to 0.658 with a standard deviation of $2.3 \times 10^{-3}$, for the measurement sizes of 28, 16, and 4, respectively. In addition, we find an enhancement in the average fidelity with the CNN even without zero padding in the input data, as shown by the blue dots in Fig. 3 (b).
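The zero padding used for missing measurements can be sketched as follows. The text does not specify which projectors are retained, so this example simply takes a caller-supplied list of flat indices into the 6 × 6 array; that choice, and the function name, are assumptions made for illustration.

```python
import numpy as np


def zero_pad_missing(measurements, kept_indices):
    """Keep only the listed entries (flat, row-major indices) of a 6x6
    measurement array and fill every other place with zero."""
    m = np.asarray(measurements, dtype=float)
    flat = m.ravel()
    padded = np.zeros_like(flat)
    idx = np.asarray(kept_indices, dtype=int)
    padded[idx] = flat[idx]
    return padded.reshape(m.shape)


if __name__ == "__main__":
    full = np.arange(1, 37, dtype=float).reshape(6, 6)  # placeholder data
    partial = zero_pad_missing(full, kept_indices=[0, 1, 2, 3])
    print(partial)
    print("zero-filled places:", int(np.sum(partial == 0)))
```

With four retained projectors the remaining 32 places are zero, and with eight retained projectors 28 places are zero, matching the counts quoted above. The padded array keeps the 6 × 6 input shape, so the same network input layer can be used for every measurement size.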

Discussion
We demonstrate quantum state reconstruction directly from projective measurement data via machine learning techniques. Our technique is qualitatively shown to reproduce the results of standard reconstruction methods when ideal projective measurement results are assumed. Further, by specifically training our network to deal with a common source of error in projective measurement data, that of measurement basis indeterminacy, we show a significant improvement in average fidelity over that of standard techniques. Lastly, we also consider the common situation where some number of the projective measurements are unsuccessfully performed, requiring the reconstruction of a density matrix from partial projective data. This situation is particularly troublesome as the final set of projectors used to collect data are unlikely to span the full Hilbert space. For this scenario we find a dramatic improvement in the average reconstruction fidelity even when only 4 of the total 36 measurements are considered. These results clearly demonstrate the advantages of using neural networks to create robust and portable QST systems.

Generating pure states
We define the horizontal and vertical polarization states as |H⟩ and |V⟩, respectively, which are given by equation 3. In order to generate the pure states, we use the Haar measure to simulate 4 × 4 random unitary matrices u. We then use the first column of each simulated random unitary matrix as the coefficients of the pure state, as in equation 4, where $u_{ij}$ represents the element in the i-th row and j-th column of the random unitary matrix (u), and |HH⟩, |HV⟩, |VH⟩, and |VV⟩ are the tensor products |H⟩ ⊗ |H⟩, |H⟩ ⊗ |V⟩, |V⟩ ⊗ |H⟩, and |V⟩ ⊗ |V⟩, respectively. Note that we add a tiny perturbation term ε ($1 \times 10^{-7}$) to the simulated pure states, as given in equation 5, to avoid possible convergence issues in the Cholesky decomposition of the pure-state density matrix ($\rho_p$) 28.

Generating mixed states
First we simulate a random matrix from the Ginibre ensemble 29, as given in equation 6, where $\mathcal{N}(0, 1, [4, 4])$ represents the random normal distribution of size 4 × 4 with zero mean and unit variance. Finally, the random density matrix ($\rho_m$) using the Hilbert-Schmidt metric 30 is given by equation 7, where Tr represents the trace of a matrix.
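The two state generators described above can be sketched in NumPy as follows. Several details are assumptions made for illustration and are not taken from the equations of this paper: the Haar-random unitary is drawn with the common QR-based recipe (QR of a complex Gaussian matrix with a phase correction), the perturbation is assumed to mix in the identity as (1 − ε)|ψ⟩⟨ψ| + (ε/4)I, and the Ginibre matrix is assumed to have independent standard normal real and imaginary parts. All function names are hypothetical.

```python
import numpy as np


def haar_unitary(dim, rng):
    """Draw a Haar-distributed unitary via QR of a complex Gaussian matrix."""
    z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # rescale each column by the phase of r's diagonal


def random_pure_state(rng, eps=1e-7):
    """Pure two-qubit state from the first column of a Haar unitary,
    slightly mixed with the identity (assumed form of the perturbation)."""
    psi = haar_unitary(4, rng)[:, 0]
    rho = np.outer(psi, psi.conj())
    return (1 - eps) * rho + eps * np.eye(4) / 4


def random_mixed_state(rng):
    """Mixed two-qubit state G G^dagger / Tr(G G^dagger) from a Ginibre matrix G."""
    g = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for name, rho in (("pure", random_pure_state(rng)), ("mixed", random_mixed_state(rng))):
        purity = np.trace(rho @ rho).real
        print(name, "trace:", np.trace(rho).real, "purity:", purity)
```

The small identity admixture makes the pure-state density matrix strictly positive definite, which is the property a Cholesky-type factorization needs.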

Simulating tomography measurements
Here we simulate the exact sequence of the tomography measurements used by the Nucrypt entangled photon system 31 .
In addition to |H⟩ and |V⟩, we now define the diagonal (|D⟩), anti-diagonal (|A⟩), right-circular (|R⟩), and left-circular (|L⟩) polarization states, which are given in equation 8.

Furthermore, in order to simulate the experimental scenarios, we introduce the 36 projectors as given by equation 9, in the exact order of the Nucrypt system's coincidence measurements, where h = |H⟩⟨H|, v = |V⟩⟨V|, d = |D⟩⟨D|, a = |A⟩⟨A|, r = |R⟩⟨R|, and l = |L⟩⟨L|. Therefore, the perfect tomography measurements M (without any noise or rotations) for any given density matrix ρ are calculated using equation 10, $M[i, j] = \mathrm{Tr}(\rho\, P[i, j])$ for $i, j = 0, 1, 2, 3, 4, 5$.
Next, we discuss adding noise to the measurements, M. In order to do this, we introduce arbitrary rotations to the operators defined in equation 9 by making use of the unitary rotation operator (U) as given in equation 11. Note that we randomly sample ϑ, ϕ, and ξ from the normal distribution with zero mean and variance $\sigma^2$. Finally, we simulate the tomography measurements under the noisy environment as given by equation 12.
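A minimal sketch of the measurement simulation, ideal and noisy, is given below. It rests on several assumptions that are not fixed by the text: the circular states are taken as |R⟩ = (|H⟩ + i|V⟩)/√2 and |L⟩ = (|H⟩ − i|V⟩)/√2, the 36 projectors are ordered as the plain product (h, v, d, a, r, l) × (h, v, d, a, r, l) rather than in the Nucrypt order, the rotation operator uses one common three-angle parameterization of a single-qubit unitary, and a fresh pair of independent rotations is drawn for every projector. The names are hypothetical.

```python
import numpy as np

H = np.array([1, 0], dtype=complex)
V = np.array([0, 1], dtype=complex)
KETS = [H, V,
        (H + V) / np.sqrt(2), (H - V) / np.sqrt(2),            # D, A
        (H + 1j * V) / np.sqrt(2), (H - 1j * V) / np.sqrt(2)]  # R, L (assumed)
SINGLE = [np.outer(k, k.conj()) for k in KETS]  # h, v, d, a, r, l


def rotation(theta, phi, xi):
    """Single-qubit unitary in an assumed three-angle parameterization;
    it reduces to the identity when all three angles are zero."""
    return np.array([
        [np.exp(1j * phi / 2) * np.cos(theta), -1j * np.exp(1j * xi) * np.sin(theta)],
        [-1j * np.exp(-1j * xi) * np.sin(theta), np.exp(-1j * phi / 2) * np.cos(theta)],
    ])


def simulate_measurements(rho, sigma=0.0, rng=None):
    """Return the 6x6 array Tr(rho P[i, j]); if sigma > 0, each projector is
    first rotated by random unitaries with angles drawn from N(0, sigma**2)."""
    rng = np.random.default_rng() if rng is None else rng
    m = np.empty((6, 6))
    for i, pa in enumerate(SINGLE):
        for j, pb in enumerate(SINGLE):
            qa, qb = pa, pb
            if sigma > 0:
                ua = rotation(*rng.normal(0.0, sigma, size=3))
                ub = rotation(*rng.normal(0.0, sigma, size=3))
                qa = ua @ pa @ ua.conj().T
                qb = ub @ pb @ ub.conj().T
            m[i, j] = np.trace(rho @ np.kron(qa, qb)).real
    return m


if __name__ == "__main__":
    rho_hh = np.zeros((4, 4), dtype=complex)
    rho_hh[0, 0] = 1.0  # the state |HH>
    print(simulate_measurements(rho_hh))
    print(simulate_measurements(rho_hh, sigma=np.pi / 6, rng=np.random.default_rng(0)))
```

Since every rotated operator is still a rank-one projector, each simulated value stays between 0 and 1 regardless of the noise strength.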

Stokes reconstruction
To compare our system to a non-machine-learning and non-adaptive technique, we use the Stokes reconstruction method for the given set of tomography measurements $M_{6 \times 6}$ (pure/noisy). We express the Stokes reconstruction of the density matrix as

$$
\begin{aligned}
\rho_{\text{recons}} = \frac{1}{4}\big(\, & s_{00}\, I \otimes I + s_{01}\, I \otimes \sigma_x + s_{02}\, I \otimes \sigma_y + s_{03}\, I \otimes \sigma_z \\
& + s_{10}\, \sigma_x \otimes I + s_{20}\, \sigma_y \otimes I + s_{30}\, \sigma_z \otimes I \\
& + s_{11}\, \sigma_x \otimes \sigma_x + s_{12}\, \sigma_x \otimes \sigma_y + s_{13}\, \sigma_x \otimes \sigma_z \\
& + s_{21}\, \sigma_y \otimes \sigma_x + s_{22}\, \sigma_y \otimes \sigma_y + s_{23}\, \sigma_y \otimes \sigma_z \\
& + s_{31}\, \sigma_z \otimes \sigma_x + s_{32}\, \sigma_z \otimes \sigma_y + s_{33}\, \sigma_z \otimes \sigma_z \,\big),
\end{aligned}
$$

where $\sigma_i$ for $i \in \{x, y, z\}$ are the Pauli matrices and the parameters $s_{lk}$ for $l, k \in \{0, 1, 2, 3\}$ for the given 36 tomography measurements are given by equation 14.

Generating the τ-matrix
In order to evaluate the τ-matrix for the given set of density matrices (ρ), we use the matrix decomposition method discussed in 27, where $m^{(1)}_{ij}$ for $i, j \in \{0, 1, 2, 3\}$, and $m^{(2)}_{pq,rs}$ ($p \neq r$ and $q \neq s$) for $p, q, r, s \in \{0, 1, 2, 3\}$, are the first and second minors of ρ, respectively.
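The two Methods steps above, Stokes reconstruction and the τ-matrix decomposition, can be sketched together in NumPy. The sketch makes assumptions that should be read as illustrative: the projectors are ordered (h, v, d, a, r, l) on each qubit with |R⟩ = (|H⟩ + i|V⟩)/√2, the Stokes parameters are formed from the signed sums I = h + v, σ_x = d − a, σ_y = r − l, σ_z = h − v, and the τ-matrix is obtained numerically from a Cholesky factorization of the row- and column-reversed density matrix instead of the closed-form minors of 27. Names are hypothetical.

```python
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

_H = np.array([1, 0], dtype=complex)
_V = np.array([0, 1], dtype=complex)
_KETS = [_H, _V, (_H + _V) / np.sqrt(2), (_H - _V) / np.sqrt(2),
         (_H + 1j * _V) / np.sqrt(2), (_H - 1j * _V) / np.sqrt(2)]
PROJECTORS = [np.outer(k, k.conj()) for k in _KETS]  # h, v, d, a, r, l

# Each row writes I, sigma_x, sigma_y, sigma_z as a signed sum of projectors.
W = np.array([
    [1, 1, 0, 0, 0, 0],    # I       = h + v
    [0, 0, 1, -1, 0, 0],   # sigma_x = d - a
    [0, 0, 0, 0, 1, -1],   # sigma_y = r - l
    [1, -1, 0, 0, 0, 0],   # sigma_z = h - v
], dtype=float)


def ideal_measurements(rho):
    """6x6 array of Tr(rho P_i (x) P_j) in the assumed projector order."""
    return np.array([[np.trace(rho @ np.kron(pa, pb)).real for pb in PROJECTORS]
                     for pa in PROJECTORS])


def stokes_reconstruct(m):
    """Linear-inversion estimate (1/4) sum_lk s_lk sigma_l (x) sigma_k."""
    s = W @ np.asarray(m, dtype=float) @ W.T  # 4x4 array of Stokes parameters
    rho = np.zeros((4, 4), dtype=complex)
    for l in range(4):
        for k in range(4):
            rho += s[l, k] * np.kron(PAULIS[l], PAULIS[k])
    return rho / 4


def tau_from_density(rho):
    """Lower-triangular tau with tau^dagger tau = rho, for positive definite rho."""
    chol = np.linalg.cholesky(rho[::-1, ::-1])  # reversed rho = chol chol^dagger
    return chol.conj().T[::-1, ::-1]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    g = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    rho = g @ g.conj().T
    rho /= np.trace(rho).real
    rec = stokes_reconstruct(ideal_measurements(rho))
    tau = tau_from_density(rho)
    print(np.allclose(rec, rho), np.allclose(tau.conj().T @ tau, rho))
```

For noise-free measurements the linear inversion returns the input state. With noisy or zero-padded measurements the same formula can return a matrix that is not positive semidefinite, which the τ-matrix parameterization rules out by construction.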