Detecting quantum entanglement with unsupervised learning

Quantum properties, such as entanglement and coherence, are indispensable resources in various quantum information processing tasks. However, an efficient and scalable way to detect these useful features is still lacking, especially for high-dimensional and multipartite quantum systems. In this work, we exploit the convexity of the set of samples without the desired quantum features and design an unsupervised machine learning method to detect the presence of such features as anomalies. In particular, in the context of entanglement detection, we propose a complex-valued neural network composed of a pseudo-siamese network and a generative adversarial net, and then train it with only separable states to construct non-linear witnesses for entanglement. Numerical examples, ranging from two-qubit to ten-qubit systems, show that our network achieves a high detection accuracy, above 97.5% on average. Moreover, it is capable of revealing rich structures of entanglement, such as partial entanglement among subsystems. Our results are readily applicable to the detection of other quantum resources such as Bell nonlocality and steerability, and thus our work could provide a powerful tool to extract quantum features hidden in multipartite quantum data.


Introduction
Peculiar quantum features, signalled by quantum entanglement [1] and coherence [2], enable us to accomplish tasks impossible for classical systems [3], such as ensuring the security of communications and speeding up certain hard computational tasks [4,5]. Hence, an important question naturally arises: How can the presence of these features be efficiently detected for any given quantum system? Indeed, this is a challenging task for high-dimensional and multipartite systems because quantum features usually imply correlated patterns hidden within subsystems. Taking entanglement as an example, except for low-dimensional systems, e.g., 2 ⊗ 2 and 2 ⊗ 3, for which entanglement can be detected faithfully via the Positive Partial Transpose (PPT) criterion [6], detection is generically an NP-hard problem [7]. Besides, even though at least one linear entanglement witness can be found for any entangled state [8,9,1,10], as displayed in Fig. 1, a universal and scalable way to construct such an appropriate witness for an arbitrary state in practice is still lacking.
In this work, we turn to machine learning, which is powerful in extracting features or patterns hidden in large multipartite datasets, to tackle the quantum detection problem. Recently, much progress has been achieved in the interdisciplinary field of quantum machine learning [11]. On the one hand, many quantum or quantum-inspired algorithms have been developed to speed up well-known machine learning algorithms [12,13,14]. On the other hand, machine learning is also a natural candidate to extract correlated features of high-dimensional quantum systems, and has found wide applications in quantum control [15], state tomography [16], measurement [17,18], and many-body problems [19,20,21]. In particular, the task of quantum entanglement detection can be formulated as a binary classification problem. As a consequence, various classical neural nets, trained with both entangled and separable samples, have been constructed to solve this problem via supervised learning [22,23,24]. However, the supervised training method requires a large pre-labelled dataset. In practice, it is time-consuming or even impossible to faithfully label a large number of entangled states in a high-dimensional space [7], which leaves these supervised methods in a dilemma.
Here, we instead build an unsupervised model that accomplishes the task of entanglement detection while avoiding the above issues. Because separable states form a convex set, entanglement detection becomes an anomaly detection problem in which all separable samples are labelled as normal and entangled ones as abnormal. In particular, as shown in Fig. 3, a class of complex-valued neural networks composed of a pseudo-siamese network and a Generative Adversarial Net (GAN) is constructed and then trained with very few normal samples to detect entanglement for multipartite systems, ranging from 2-qubit to 10-qubit states. It is noted that our model is much more feasible than the anomaly detection methods proposed in [25,26], which require quantum hardware.
It is further illustrated in Fig. construct the boundary between separable and entangled samples. Numerical results show that it achieves a very high accuracy of entanglement detection, above 97.5% on average, and is even capable of detecting partial entanglement within subsystems, e.g., bi-separable states in a 3-qubit system with accuracy above 97.7%.

Our work is organised as follows. In section 2, we give a brief introduction to the task of entanglement detection and to the unsupervised learning method. Then we propose an unsupervised learning neural network targeted at the detection of generic quantum features. In section 3, multipartite entanglement detection is taken as an example to illustrate the performance of our model, with only separable samples used for training. Finally, we conclude this work with a summary in section 4.

The task of detecting entanglement
Entanglement is not only of significant importance for understanding quantum theory at the fundamental level [1], but has also found applications in information protocols, such as quantum teleportation [27]. For a given n-partite quantum system, entanglement associated with the state is defined in a passive way, in which a state ρ is entangled if and only if it cannot be described in the fully-separable form [28]

$$\rho_{\mathrm{sep}} = \sum_{i=1}^{m} \lambda_i \, \rho_i^{1} \otimes \rho_i^{2} \otimes \cdots \otimes \rho_i^{n}, \tag{1}$$

with non-negative coefficients satisfying $\sum_{i=1}^{m} \lambda_i = 1$. Here $\rho_i^{j}$ denotes the density matrix of the state of the j-th subsystem. Obviously, all of the separable states as per Equation (1) form a convex set, in the sense that any convex combination of states in this set also belongs to the same set. It is noted that the above definition of entanglement does not fully capture the entangled structure in the state, e.g., the partial entanglement [29], which will be discussed later.
In practice, whether a given state ρ is entangled or not can be determined in an experimentally friendly way via an entanglement witness [1,10]. Indeed, as shown in Figure 1(a), an entanglement witness essentially defines a hyperplane which separates the entangled state from the convex set of separable states. Furthermore, it has been shown in [1] that it is impossible for one linear witness to detect all entangled states, implying that a large set of linear witnesses as illustrated in Figure 1(b) (which could be impractical) or a certain nonlinear witness as shown in Figure 1(d) may be required. Besides, it becomes extremely inefficient and impractical to construct a proper witness for an arbitrary state, especially in multipartite systems. Entanglement witnesses realized as neural networks are experimentally accessible, as has been demonstrated in [24]. In fact, since neural networks learn the linear and nonlinear correlations in the quantum states to form a classifier, a properly parameterized neural network layer is equivalent to a set of generalized Bell's inequalities for the experimental detection of entanglement. In the following, we propose a complex-valued neural network trained in an unsupervised manner to search for the desired nonlinear entanglement witnesses.
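As a concrete illustration of the hyperplane picture, the following minimal sketch evaluates one textbook linear witness for two qubits, W = I/2 − |Φ+⟩⟨Φ+|. This particular witness, the helper names and the random product-state sampler are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

# Bell state |Phi+> = (|00> + |11>)/sqrt(2) and its projector
phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
proj = np.outer(phi_plus, phi_plus.conj())

# Textbook linear witness for states close to |Phi+>: W = I/2 - |Phi+><Phi+|
W = 0.5 * np.eye(4) - proj


def witness_value(rho):
    """Return tr(W rho); a negative value certifies entanglement."""
    return float(np.real(np.trace(W @ rho)))


def random_product_state(rng):
    """Random two-qubit pure product state, returned as a density matrix."""
    kets = []
    for _ in range(2):
        v = rng.normal(size=2) + 1j * rng.normal(size=2)
        kets.append(v / np.linalg.norm(v))
    psi = np.kron(kets[0], kets[1])
    return np.outer(psi, psi.conj())


rng = np.random.default_rng(0)
sep_values = [witness_value(random_product_state(rng)) for _ in range(1000)]
bell_value = witness_value(proj)
```

Every product state has overlap at most 1/2 with |Φ+⟩, so `sep_values` stays non-negative, while `bell_value` is negative. A single hyperplane of this kind misses many other entangled states, which is the motivation for nonlinear witnesses.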

Unsupervised learning
Unsupervised learning refers to the process of learning a probability distribution over data that have not been classified or categorized. In this situation, automated methods or algorithms must explore the underlying features of the available data and group samples with similar characteristics. Specifically, the unsupervised model receives only a training set S of inputs, without supervised target outputs {y_1, y_2, y_3, ...}. In contrast to supervised learning, where tagging data requires a large amount of time, unsupervised learning exhibits high efficiency and self-organization in capturing patterns from untagged data. The autoencoder [30] is a widely used unsupervised learning method that aims to learn efficient representations for a set of data. Typically, an autoencoder consists of two modules, namely an encoder E and a decoder D, where the former learns the latent representation (encoding) of the input data, and the latter is trained to generate, from the latent representation, an output as close as possible to the original input. Another well-known unsupervised learning method is the GAN [31,32]. Specifically, two neural networks, namely the generator G and the discriminator D, contest with each other in a zero-sum game, where the gain of one module is the loss of the other. This technique learns to generate new data with the same statistics as the training set.

The siamese network [33,34], as shown in Figure 2, contains a pair of neural networks built with the same parameters; it receives two inputs and detects their difference by comparing the output vectors of the two networks. The siamese network is capable of learning generic features for making predictions about an unknown distribution even when few examples from the distribution are available, which provides a competitive approach to pattern recognition without domain-specific knowledge. In particular, the siamese network can be trained in an unsupervised manner, as the labels of the input data are not needed. For these reasons, the method proposed in this paper is built upon the siamese network, which is suitable for one-class unsupervised learning. The basic idea is similar to the one-class support vector machine for anomaly detection [35]. That is, given a set of training samples, we aim to model the underlying distribution of the data and detect the soft boundary of this set, in order to classify new inputs as belonging to this set or not. In this case, the model takes only a training dataset without class labels as input, which makes it a type of unsupervised learning method.
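The shared-weights idea of the siamese network can be sketched in a few lines. The one-layer `encoder`, the tanh activation and the Euclidean similarity measure below are hypothetical stand-ins chosen for brevity, not the architecture used in this work.

```python
import numpy as np


def encoder(x, weights):
    """Toy one-layer encoder: a linear map followed by tanh."""
    return np.tanh(weights @ x)


def siamese_distance(x1, x2, weights):
    """Encode both inputs with the SAME weights and compare the latent vectors."""
    v1 = encoder(x1, weights)
    v2 = encoder(x2, weights)
    return float(np.linalg.norm(v1 - v2))


rng = np.random.default_rng(1)
weights = 0.1 * rng.normal(size=(4, 16))  # shared by both branches
x = rng.normal(size=16)

d_same = siamese_distance(x, x, weights)                        # identical inputs
d_diff = siamese_distance(x, x + rng.normal(size=16), weights)  # perturbed input
```

Identical inputs map to identical latent vectors, so `d_same` is zero, whereas a perturbed input yields a positive distance that can serve as a similarity score.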

Constructing the complex-valued neural networks
As shown in Figure 3, our networks can be decomposed into two parts: One is the pseudo-siamese neural network (in the red dashed box) and the other is the GAN (in the purple dashed box). The complex-valued neural network receives the density matrix of the state as the input. The building modules for these networks are detailed in Appendix A.
The pseudo-siamese neural network consists of two encoders sharing the same network structure, labelled as E_r and E_g, respectively. In contrast to the original siamese network [34], which requires quadratic pairs as input, the pseudo-siamese network only requires a single input ρ_real to be fed to the first encoder E_r. The second input ρ_gen to the second encoder E_g is automatically generated by the decoder G, whose aim is to reconstruct ρ_real. Therefore, the pseudo-siamese network trains much faster than the original siamese network while inheriting its few-shot learning ability. In principle, these two encoders compete with each other to produce a pair of indistinguishable feature vectors v_1 and v_2. The performance is evaluated by the cost function

$$L_1 = \| v_1 - v_2 \|, \tag{3}$$

where the norm $\|x\|$ could be the $L_p$-norm of any complex vector $x$, with

$$\|x\|_p = \Big( \sum_k |x_k|^p \Big)^{1/p}.$$

Here the 2-norm is chosen for Equation (3). As the two inputs to the encoders E_r and E_g are slightly different, the two encoders would not share the same weight parameters after training [36].
Combining the encoder E_r with G yields an encoder-decoder structure which aims to produce fake samples that are close to real ones. Thus we introduce the loss function

$$L_2 = \| \rho_{\mathrm{real}} - \rho_{\mathrm{gen}} \|_1 \tag{4}$$

to quantify its performance. In analogy to classical autoencoders [37], it is found that the L_1-norm achieves better performance than p = 2 for this loss term. An optional discriminator network could be introduced for additional adversarial training. The discriminator D and generator G form the GAN (Fig. 3), which could enhance the ability of G to produce more realistic quantum samples. Indeed, D is a binary classifier trained to discriminate fake samples from real ones. The two cost functions for this adversarial net, L_adv1 and L_adv2 (Equations (5) and (6)), are alternately minimized via the gradient descent method. Specifically, the gradients are clipped between −1 and 1, turning the network into a Wasserstein GAN which is easy to train [32]. In each round, the parameters of D are updated by minimizing L_adv1, while the parameters of G and E_r are updated by minimizing L_adv2. Finally, by combining (3)-(6), the complex-valued neural network is trained by alternately minimizing L_adv1 and

$$L_3 = w_1 L_1 + w_2 L_2 + w_a L_{\mathrm{adv2}}, \tag{7}$$

with the weight parameters w_1, w_2, and w_a being chosen adaptively.
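The way the individual cost terms combine can be sketched numerically. The functional forms below (a 2-norm on the feature vectors, an entrywise L_1 distance on the matrices, and a weighted sum with weights w_1, w_2, w_a) are assumptions read off from the description in the text. The function names and the weight values are hypothetical, and the adversarial term is passed in as a plain number because its explicit form is not reproduced here.

```python
import numpy as np


def latent_loss(v1, v2):
    """2-norm distance between the two feature vectors (assumed form)."""
    return float(np.linalg.norm(v1 - v2))


def reconstruction_loss(rho_real, rho_gen):
    """Entrywise L1 distance between real and generated matrices (assumed form)."""
    return float(np.sum(np.abs(rho_real - rho_gen)))


def total_loss(l_latent, l_recon, l_adv2, w1=1.0, w2=1.0, wa=0.1):
    """Weighted sum of the three terms; the default weights are placeholders."""
    return w1 * l_latent + w2 * l_recon + wa * l_adv2


# Toy inputs: complex feature vectors and 2x2 density matrices
v1 = np.array([3.0 + 0j, 4.0j])
v2 = np.zeros(2, dtype=complex)
rho_real = np.eye(2, dtype=complex) / 2
rho_gen = np.diag([1.0 + 0j, 0.0])

l_latent = latent_loss(v1, v2)
l_recon = reconstruction_loss(rho_real, rho_gen)
l_total = total_loss(l_latent, l_recon, l_adv2=2.0)
```

A small adversarial weight such as the placeholder `wa=0.1` mirrors the later observation that the best 2-qubit results are attained at a small w_a, but the actual values are chosen adaptively in the paper.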

Training the networks via unsupervised learning
If the complex-valued network is trained with separable states only, an entangled state would result in a feature vector v_ent distinct from that of the generated one in the latent space. The entire training and prediction process can be divided into three steps as follows.
1) Preparing separable states as training samples. Following Equation (1), each $\rho_i^{j}$ is generated via $HH^{\dagger}/\mathrm{tr}(HH^{\dagger})$, where H is a complex-valued matrix whose entries have real and imaginary parts sampled from independent Gaussian distributions. It is noted that this sampling method could cover the whole space of separable states [38].
2) Training the neural network on the generated set of separable states by alternately minimizing L_adv1 as per Equation (5) and L_3 as per Equation (7) via the gradient descent method.
3) Determining the decision threshold value b on the test set after training. We choose b to satisfy

$$\frac{FN}{TP + FN} = \frac{FP}{FP + TN}, \tag{8}$$

where TP, FP, TN, and FN refer to the numbers of true positive, false positive, true negative, and false negative samples. Here, being positive or negative stands for a separable or an entangled sample, respectively. Choosing b to satisfy Equation (8) implies that the probabilities of misclassifying entangled and separable states are the same on the test set. Hence, if the score of a quantum state is larger than this b, it will be detected as entangled.
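Step 1) above can be sketched as follows. The construction HH†/tr(HH†) follows the text. The Dirichlet sampler for the mixing weights λ_i and the function names are assumptions made for illustration, since the paper does not state how the weights are drawn.

```python
import numpy as np


def random_density_matrix(dim, rng):
    """Sample rho = H H^dagger / tr(H H^dagger) with a complex Gaussian H."""
    h = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = h @ h.conj().T
    return rho / np.trace(rho).real


def random_separable_state(n_qubits, m, rng):
    """Convex mixture of m product states, following the form of Equation (1)."""
    # Assumed sampler: non-negative weights that sum to one
    lam = rng.dirichlet(np.ones(m))
    dim = 2 ** n_qubits
    rho = np.zeros((dim, dim), dtype=complex)
    for i in range(m):
        term = np.array([[1.0 + 0j]])
        for _ in range(n_qubits):
            term = np.kron(term, random_density_matrix(2, rng))
        rho += lam[i] * term
    return rho


rng = np.random.default_rng(2)
rho = random_separable_state(n_qubits=2, m=5, rng=rng)
```

The result is Hermitian, positive semidefinite and has unit trace by construction, so it can be fed directly to a network that takes density matrices as input.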
For each ρ in the test set, its score for entanglement detection is defined by Equation (9). It can be further expressed in the witness-like form of Equation (10), where $W_{E f(G)}$ denotes the weight tensor which generates the corresponding linear and nonlinear network transformations. For this reason, the neural network model can be regarded as trying to determine the nonlinear witness W which approximately characterizes the boundary between separable and entangled states, without relying on samples of entangled states during training. Alternatively, there is another way to implement the model for prediction without the test dataset, making both training and prediction independent of any information about entangled states. This is achieved by determining b as in Equation (11). Obviously, this approach leads to a higher detection accuracy than using Equation (8). Since neither the training nor the implementation relies on entangled samples, this approach is computationally efficient. More importantly, the major advantage of our unsupervised learning framework lies in its scalability, as generating sufficient entangled states for training becomes impractical for high-dimensional quantum systems.

Evaluation metrics
We use two evaluation metrics of binary classification in our experiments. The first metric is the Area Under Curve (AUC) of the Receiver Operating Characteristic (ROC) curve, which is created by plotting the True Positive Rate (TPR = TP/(TP + FN)) against the False Positive Rate (FPR = FP/(FP + TN)) using the similarity score defined in (9) for various values of b [39]. The second metric is the Equal Error Rate (EER), which is defined as FN/(TP + FN) when Equation (8) holds [40].
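Both metrics can be computed directly from two arrays of detection scores. The sketch below follows the convention of the text (separable is positive, and a score above b is flagged as entangled). The function names, the trapezoid integration and the toy scores are illustrative assumptions.

```python
import numpy as np


def roc_points(sep_scores, ent_scores):
    """TPR and FPR for every candidate threshold b (separable = score <= b)."""
    thresholds = np.sort(np.concatenate([sep_scores, ent_scores]))
    tpr = np.array([np.mean(sep_scores <= b) for b in thresholds])
    fpr = np.array([np.mean(ent_scores <= b) for b in thresholds])
    return thresholds, tpr, fpr


def auc_and_eer(sep_scores, ent_scores):
    thresholds, tpr, fpr = roc_points(sep_scores, ent_scores)
    # Prepend the (0, 0) corner and integrate TPR over FPR (trapezoid rule)
    f = np.concatenate([[0.0], fpr])
    t = np.concatenate([[0.0], tpr])
    auc = float(np.sum(np.diff(f) * (t[1:] + t[:-1]) / 2))
    # EER: threshold where FN/(TP+FN) = 1 - TPR is closest to FP/(FP+TN) = FPR
    k = int(np.argmin(np.abs((1 - tpr) - fpr)))
    eer = float(((1 - tpr[k]) + fpr[k]) / 2)
    return auc, eer, float(thresholds[k])


# Toy, perfectly separated scores (synthetic values for illustration only)
sep_scores = np.array([0.0, 0.1, 0.2])
ent_scores = np.array([1.0, 1.1, 1.2])
auc, eer, b = auc_and_eer(sep_scores, ent_scores)
```

For these perfectly separated toy scores the AUC is 1 and the EER is 0, with the threshold landing on the largest separable score. Overlapping score distributions lower the AUC and raise the EER.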

Detecting 2-qubit entangled states
The number of training samples for the 2-qubit case is 160000, all of which are separable states. The number of testing samples is 80000, including 40000 separable states and 40000 entangled states. The 2-qubit separable states are generated by

$$\rho_{\mathrm{sep}} = \sum_{i=1}^{m} \lambda_i \, \rho_i^{A} \otimes \rho_i^{B},$$

where $\sum_{i=1}^{m} \lambda_i = 1$ and $0 \le \lambda_i \le 1$, with m iterating from 1 to 20. Entangled states are selected from randomly generated states of the entire system using the PPT criterion.
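The PPT selection of entangled 2-qubit test states can be sketched as below. For 2 ⊗ 2 systems a negative eigenvalue of the partial transpose is both necessary and sufficient for entanglement. The function names and the numerical tolerance are assumptions.

```python
import numpy as np


def partial_transpose_b(rho):
    """Partial transpose of a 4x4 two-qubit density matrix on the second qubit."""
    r = rho.reshape(2, 2, 2, 2)                   # indices (a, b, a', b')
    return r.transpose(0, 3, 2, 1).reshape(4, 4)  # swap b <-> b'


def is_entangled_ppt(rho, tol=1e-10):
    """True if the partial transpose has a negative eigenvalue."""
    eigenvalues = np.linalg.eigvalsh(partial_transpose_b(rho))
    return bool(np.min(eigenvalues) < -tol)


# Bell state (entangled) versus the maximally mixed state (separable)
phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
bell = np.outer(phi_plus, phi_plus.conj())
mixed = np.eye(4, dtype=complex) / 4

bell_flag = is_entangled_ppt(bell)
mixed_flag = is_entangled_ppt(mixed)
```

The Bell state is flagged because its partial transpose has an eigenvalue of −1/2, while the maximally mixed state is not. Applying `is_entangled_ppt` to randomly generated 4 × 4 density matrices gives one way to collect labelled entangled test samples.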
The structure of the 4-layer encoder is illustrated in Figure 4(a). The last two layers of the encoder are Fully-Connected (FC) layers, with 64 and 10 output channels, respectively. The first two layers can be convolutional, with different kernels and different numbers of output channels, or fully connected, as tested in Figure 4(b). The best performance of the model has been achieved with the convolutional kernel sizes of the first two layers being 2 × 2 and 2 × 2. The best AUC is 0.99 and the best EER is 2.99%, attained at a small w_a, which is the weight of the adversarial cost for training. As shown in Figure 4(c), the convolutional layer performs much better than the FC layer, with the AUC being consistently higher than 0.975 and the EER lower than 5%. Figure 4(e-f) shows the performance of the convolutional neural networks when the number of output channels varies, indicating that a small number of output channels is enough to extract the features of entanglement for 2-qubit states.

Detecting 3-qubit entangled states
An entangled 3-qubit state can be classified into several types, e.g., bi-separable states and bound entangled states [24]. The 3-qubit state is fully-separable if

$$\rho = \sum_{i=1}^{m} \lambda_i \, \rho_i^{A} \otimes \rho_i^{B} \otimes \rho_i^{C}.$$

The distribution of 3-qubit states is illustrated in Figure 5(a). In this case, successful supervised learning requires that one can generate enough balanced samples for all types of entanglement, which cannot be guaranteed by the current random sampling techniques. In contrast, a universal entanglement detector can be built using only the fully-separable samples if the unsupervised learning method is employed. The numerical results in Figure 5(b) are based on a dataset consisting of 160000 training samples and 200000 test samples. The training samples are fully-separable states, and the test samples include 40000 fully-separable states, 40000 bound entangled states and 120000 bi-separable states (40000 for each subtype). To accommodate the 8 × 8 density matrix input, a third convolutional layer is added. The numbers of output channels for the three convolutional layers are 10, 30 and 50, respectively. Since the unsupervised model focuses on detecting the feature of separability instead of the features of different types of entanglement, it achieves similar detection accuracy on the four types of entangled samples.
The proposed unsupervised learning method is applicable to the detection of partial entanglement and genuine entanglement. Here we take the detection of bi-separable states of a 3-qubit system as an example [41]. Suppose the task is to discriminate the bi-separable states ρ_{A|BC} (B and C are entangled) from the other states. By generating the entangled states for subsystem BC using the PPT criterion, the samples of bi-separable states are given by

$$\rho_{A|BC} = \sum_{i=1}^{m} \lambda_i \, \rho_i^{A} \otimes \rho_i^{BC}. \tag{14}$$

A classifier for A|BC separability can be obtained by training on these samples in an unsupervised manner. In particular, if we replace $\rho_i^{BC}$ in (14) by a generic 2-qubit state, the anomalies detected would be the quantum states that are entangled between A and BC. Furthermore, if we generate the samples as convex mixtures of states that are bi-separable with respect to the bipartitions A|BC, B|AC and C|AB, the abnormal samples detected by the unsupervised model would be quantum states which are not bi-separable. In other words, the genuine entanglement of the 3-qubit state can be detected as an anomaly.

Scalability up to 10-qubit states
The unsupervised learning method is applied to 4- to 10-qubit states to study its scalability. We have found that the generation of separable states for training is very efficient even for tens of qubits, because separable pure states are obtained simply by generating single-qubit states and calculating their Kronecker products. Consequently, mixed (fully and partially) separable states can be constructed as convex combinations of pure states, which does not take much time. In this work, it takes less than 10 minutes to generate enough pure separable samples for 10-qubit states on a desktop computer, and mixing the samples takes less than 3 minutes. Moreover, we have observed a linear increase of the generation time with the dimension. Here we used pure 4- to 10-qubit states for training because test samples of mixed entangled states are hard to label for high-dimensional systems, while we have developed an efficient algorithm [42] that can tell whether a randomly generated 10-qubit pure state is entangled or not within 5 seconds. However, test samples are only used to measure the accuracy of the model. The model is trained using the separable samples only, which can be generated efficiently. The trained model can be implemented without using test samples, as shown in (11). Therefore, the model can also be trained and implemented with mixed state samples for high-dimensional cases. Note that the geometrical measure is only used to label the entangled states for the test dataset.

The separable pure states are generated by

$$|\psi_{\mathrm{sep}}\rangle = |\psi^{1}\rangle \otimes |\psi^{2}\rangle \otimes \cdots \otimes |\psi^{n}\rangle,$$

where $|\psi^{j}\rangle$ is a randomly generated pure state vector of the j-th qubit. The real and imaginary parts of the complex-valued vector are sampled from independent Gaussian distributions. The density matrix $\rho_{\mathrm{sep}} = |\psi_{\mathrm{sep}}\rangle\langle\psi_{\mathrm{sep}}|$ is used as the input to the neural network. Figure 6(a) depicts the network structure of the encoder for entanglement detection in 5-qubit states, where a max pooling layer has been added to handle the increased dimension of the input. For 10-qubit states, we adopt three convolutional layers and increase the max pooling size to 4 × 4. The training dataset is composed of 160000 separable states, and the test dataset is composed of 40000 separable and 40000 entangled states. The entangled states are found by randomly generating 4- to 10-qubit pure states and computing their entanglement measures using the numerical method from [42]. See Appendix B for the details of the algorithm.
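The Kronecker-product construction of a pure separable sample can be sketched as follows. The function names and the fixed random seed are assumptions, and normalising each single-qubit vector is one natural reading of "randomly generated pure state vector".

```python
import numpy as np


def random_qubit_ket(rng):
    """Normalised single-qubit state with complex Gaussian amplitudes."""
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    return v / np.linalg.norm(v)


def random_product_ket(n_qubits, rng):
    """Kronecker product of n independent single-qubit states."""
    psi = np.array([1.0 + 0j])
    for _ in range(n_qubits):
        psi = np.kron(psi, random_qubit_ket(rng))
    return psi


rng = np.random.default_rng(3)
psi_sep = random_product_ket(10, rng)            # 2**10 = 1024 amplitudes
rho_sep = np.outer(psi_sep, psi_sep.conj())      # 1024 x 1024 network input
purity = float(np.sum(np.abs(rho_sep) ** 2))     # tr(rho^2) for Hermitian rho
```

Each sample costs only ten small Kronecker products plus one outer product, which is consistent with the short generation times reported above. The purity equals one because the state is pure.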
As shown in Figure 6(b), the unsupervised model achieves an AUC of 0.9952 and an EER of 2.02% for entanglement detection in 10-qubit states. The EER is 0.54% for entanglement detection in 5-qubit states, which means that only 54 in 10000 states are misclassified. The short inference time is another advantage of the neural network model. The inference time of the neural network model on a GPU ranges from tens to hundreds of microseconds for up to 10 qubits (Figure 6(c)), which is significantly faster than the up-to-date numerical method, which takes the state vector instead of the density matrix as the input for computing the geometrical entanglement measure. The time needed for generating the training dataset is greatly reduced compared to supervised learning methods, since there is no need to label the entangled states. For example, suppose the 10-qubit training dataset of the supervised method consists of 100000 samples, which must be labelled by numerically computing the geometrical entanglement measure. The total time needed for generating the dataset is then about 139 hours (labelling each sample takes 5 seconds on average). In contrast, generating separable training samples of the 10-qubit system is much simpler and takes only several minutes.
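The two back-of-the-envelope figures quoted above can be checked with a few lines of arithmetic. The variable names are hypothetical, and the inputs are the numbers stated in the text.

```python
# Labelling cost of a supervised 10-qubit dataset
n_samples = 100_000          # dataset size assumed in the text
seconds_per_label = 5        # average labelling time per sample
total_hours = n_samples * seconds_per_label / 3600

# Misclassification count implied by an EER of 0.54%
eer_percent = 0.54
misclassified_per_10000 = eer_percent / 100 * 10_000
```

The labelling estimate is 500000 seconds, or roughly 138.9 hours, compared with the several minutes needed to generate separable samples. An EER of 0.54% corresponds to 54 misclassified states per 10000.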
The upper half of Figure 6(d) shows the evolution of the feature vectors of 1000 separable and 1000 entangled states during the training process for 5-qubit states. We visualize the evolution with the t-SNE method [43], which maps the feature vectors to a two-dimensional space. In the first 10 epochs, the entangled and separable states are mixed up in the latent space and are difficult to distinguish. After 20 epochs, the feature vectors start to split into two sets. In the last 20 epochs, the feature vectors of separable states are separated completely from the feature vectors of entangled states, with very few exceptions. A similar evolution can be seen in the distribution of the detection scores of the input states. After training, the detection scores of separable states are closer to zero, while the scores of entangled states are concentrated around 0.001.

Conclusions and Discussions
We have proposed an efficient and scalable method based on unsupervised learning to detect quantum entanglement. Specifically, we build a class of complex-valued pseudo-siamese neural networks which is easy to implement, as it is trained without entangled samples. Moreover, it scales to the detection of entanglement in multipartite systems, where sufficient labelled entangled samples become difficult to obtain, and our numerical analysis finds that we can still obtain a rather high accuracy, above 97.5% on average, for multipartite systems from 2 to 10 qubits. For this reason, we believe that our work provides a promising tool to detect quantum features of high-dimensional quantum data.
Finally, it is noted that we exploit the convexity of the set of separable samples and thus reformulate entanglement detection as an anomaly detection problem, for which unsupervised neural networks are suitable. Since other useful quantum features, such as Bell nonlocality and Einstein-Podolsky-Rosen steerability, share the property that the feature is defined by a sample being distinguishable from a convex set, our work can be readily generalized to solve similar detection problems.

$$\mathrm{BN}(\tilde{z}) = \begin{pmatrix} \gamma_{rr} & \gamma_{ri} \\ \gamma_{ri} & \gamma_{ii} \end{pmatrix} \tilde{z} + \beta. \tag{A.6}$$

The parameters $\gamma_{r(i)r(i)}$ and $\beta$ are trainable. Each convolutional layer is composed of a convolutional operation, a CReLU and a BN layer. The first fully-connected layer is composed of a fully-connected operation and a CReLU. The last fully-connected layer generates the final output directly via a fully-connected operation. The operations defined above are differentiable, which means the neural network can be trained efficiently with back-propagation. The gradient of the real-valued cost function L is calculated with respect to the real and imaginary parts of each complex-valued parameter t = t_r + i t_i of the neural network, and back-propagation updates t accordingly, which could be implemented using PyTorch [45].
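A minimal sketch of the two operations discussed here is given below, under stated assumptions. The affine map applies a real 2 × 2 matrix to the stacked real and imaginary parts, as in (A.6), with the preceding standardisation step omitted. The update rule t ← t − η(∂L/∂t_r + i ∂L/∂t_i) is the common convention for complex-valued networks and is assumed rather than taken from the paper. The function names, the learning rate and the quadratic toy cost are hypothetical.

```python
import numpy as np


def complex_affine(z, gamma, beta):
    """Apply a real 2x2 matrix gamma to (Re z, Im z) and add a complex shift."""
    stacked = np.stack([z.real, z.imag])   # shape (2, N)
    out = gamma @ stacked
    return out[0] + 1j * out[1] + beta


def complex_gradient_step(t, grad_r, grad_i, lr):
    """One descent step on t = t_r + i t_i using both partial derivatives."""
    return t - lr * (grad_r + 1j * grad_i)


# With gamma = identity and beta = 0 the affine map leaves z unchanged
z = np.array([1.0 + 2.0j, -0.5 + 0.25j])
z_out = complex_affine(z, np.eye(2), 0.0)

# Toy real-valued cost L(t) = |t - c|^2, minimised at t = c
c = 1.0 - 2.0j
t = 0.0 + 0.0j
for _ in range(200):
    grad_r = 2 * (t.real - c.real)   # dL/dt_r
    grad_i = 2 * (t.imag - c.imag)   # dL/dt_i
    t = complex_gradient_step(t, grad_r, grad_i, lr=0.1)
```

Treating the real and imaginary parts as two real parameters keeps the cost real-valued and makes the update equivalent to ordinary gradient descent in twice as many real dimensions.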

Appendix B. Computing the GME of quantum pure states
We employ the algorithm proposed in [42] to compute the GME for an arbitrary quantum pure state. The algorithm is based on a tensor version of the Gauss-Seidel method for computing unitary eigenpairs (U-eigenpairs) of a non-symmetric complex tensor A which corresponds to the given quantum pure state.

Figure 2. Pipeline of the siamese network. Two inputs are encoded into latent vectors whose difference is detected by a similarity measure.

Figure 3. Structure of the complex-valued neural network. The complex-valued network is composed of two parts: One is the pseudo-siamese network in the red dashed box and the other is a GAN in the purple dashed box. The pseudo-siamese neural network consists of two encoders E_r and E_g that share the same network structure. ρ_gen is generated by G. The discriminator network D is a binary classifier which outputs either 0 or 1. The generator G and discriminator D form the GAN, which aims to produce a ρ_gen that is as close as possible to ρ_real.

3) Determining the decision threshold value b on the test set after training. We choose b to satisfy

$$\frac{FN}{TP + FN} = \frac{FP}{FP + TN}, \tag{8}$$

Figure 4. Detection of 2-qubit states. (a) 4-layer neural network structure of the encoder, including two convolutional layers followed by two FC layers. The structures of the encoder and decoder are symmetric. The structure of the discriminator is the same as the encoder, with an additional normalization layer to produce a scalar output. (b) The performance of the neural networks with the first two layers being convolutional or FC. The convolutional kernel size combinations that have been tested are 1 × 1 and 3 × 3, 2 × 2 and 2 × 2, 3 × 3 and 1 × 1, with the number of output channels being 10 and 30, respectively. If the first two layers are FC, the number of output channels is set to 32 and 128, respectively. (e-f) AUCs and EERs with different numbers of output channels for the convolutional layers. The convolutional kernel sizes are 2 × 2 and 2 × 2.

Figure 5. Detection of 3-qubit states. (a) Distribution of 3-qubit states. ρ_{A|BC} is a bi-separable state with qubits B and C entangled. (b) EERs and AUCs with different combinations of convolutional kernels, where i − j − k stands for the kernel sizes of the three convolutional layers.

Figure 6. Scalability of unsupervised learning. (a) The neural network structure of the encoder for 5-qubit pure states. The convolutional block is composed of two convolutional layers and a 2 × 2 max pooling layer. (b) AUCs and EERs achieved by the unsupervised learning method as the number of qubits increases. (c) The comparison of inference time between the up-to-date numerical method for computing the Geometrical Entanglement Measure (GME) and the neural network method. (d) The evolution of feature vectors and detection scores for 1000 separable and 1000 entangled samples of 5-qubit states during training.