Quantum autoencoders with enhanced data encoding

We present the enhanced feature quantum autoencoder, or EF-QAE, a variational quantum algorithm capable of compressing quantum states of different models with higher fidelity. The key idea of the algorithm is to define a parameterized quantum circuit that depends upon adjustable parameters and a feature vector that characterizes such a model. We assess the validity of the method in simulations by compressing ground states of the Ising model and classical handwritten digits. The results show that EF-QAE improves the performance compared to the standard quantum autoencoder using the same amount of quantum resources, but at the expense of additional classical optimization. Therefore, EF-QAE makes the task of compressing quantum information better suited to be implemented in near-term quantum devices.


I. INTRODUCTION
Large-scale fault-tolerant quantum computation is still a distant goal, typically estimated to be a few decades away. A reasonable question then is whether we can do something useful with the existing noisy intermediate-scale quantum (NISQ) [1,2] computers. The main proposal is to use them as part of a hybrid classical-quantum device. Variational quantum algorithms (VQAs) [3] are a class of algorithms that run on such hybrid devices and reduce the quantum computational resources required at the expense of classical computation.
The general rationale of a VQA is to define a parametrized quantum circuit whose architecture is dictated by the type and size of the quantum computer that is available. This quantum circuit, in turn, depends on a set of classical parameters that can be adjusted in a quantum-classical optimization loop by minimizing a cost function. In this manner, we look for a quantum circuit that allows us to perform a particular task, given the available quantum resources. Let us remark here that several VQAs have already been proposed in the context of making NISQ computers practically useful for real applications [4-19].
Recently, much attention has been paid to data encoding in VQAs [20,21], since it was proven that data encoded into the model alters the expressive power of parameterized quantum circuits [22,23]. Specifically, this idea has been implemented for classification of data [24,25], and to study energy profiles of quantum Hamiltonians [26].
In this paper, we explore how data encoding influences a quantum autoencoder (QAE) [9]. The QAE is a VQA designed to compress the input quantum information through a smaller latent space. In this scheme, we look for a parameterized quantum circuit $U(\theta)$ that encodes an initial input state into an intermediate latent space, after which the action of the decoder, $U^\dagger(\theta)$, attempts to reconstruct the input. A graphical depiction of a QAE is shown in Fig. 1. For readers interested in experimental applications, a QAE implementation in a photonic device can be found in Ref. [27].
FIG. 1. Circuit implementation of a quantum autoencoder with a 2-qubit latent space. The unitary $U(\theta)$ encodes a 6-qubit input state $\rho_{\text{in}}$ into a 2-qubit intermediate state, after which the decoder $U^\dagger(\theta)$ attempts to reconstruct the input, resulting in the output state $\rho_{\text{out}}$.
Note that the motivation for a quantum autoencoder is to be able to recognize patterns beyond the capabilities of a classical autoencoder, given the different properties of quantum mechanics. Moreover, recall that for NISQ devices, any tool that can reduce the amount of quantum resources can be considered valuable. For instance, quantum autoencoders could be used as a state preparation engine in the context of other VQAs. That is, we could combine, say, a Variational Quantum Eigensolver [4] with a pretrained QAE, where now the only active parameters are associated with the latent space.
FIG. 2. Schematic representation of the EF-QAE. The input to EF-QAE is a set of initial states $\rho_{\text{in}}$, a feature vector $x$ that characterizes the initial states, and a shallow sequence of quantum gates $U$. The feature vector $x$ is encoded together with the variational parameters $\theta$, where the latter are adjusted in a quantum-classical optimization loop until the local cost $C(\theta)$ converges to a value close to 0. When this loop terminates and the optimal parameters $\theta_{\text{opt}}$ are found, the resulting circuit $U(\theta_{\text{opt}}, x)$ prepares compressed states $|\phi\rangle$ of a particular model. Moreover, we may apply $U^\dagger(\theta_{\text{opt}}, x)$ to $|0\ldots0\rangle \otimes |\phi\rangle$ to recover $\rho_{\text{out}} \approx \rho_{\text{in}}$.
This paper is organized as follows. In Sec. II we introduce the enhanced feature quantum autoencoder (EF-QAE). As we will see, its key ingredient is to include in the variational quantum circuit a feature vector that characterizes the model we aim to compress. Next, in Sec. III and Sec. IV we compare and assess the performance of the EF-QAE and the standard QAE in simulations, by compressing ground states of the 1D Ising model and classical handwritten digits, respectively. Finally, in Sec. V, we present the conclusions of this work.

II. ENHANCED FEATURE QUANTUM AUTOENCODER
A. Overview
Here, we present the enhanced feature quantum autoencoder (EF-QAE). A schematic diagram of the EF-QAE can be seen in Fig. 2. The algorithm is initialized with a set of initial states $\rho^{\text{in}}_i$, a feature vector $x$, and a shallow sequence of quantum gates $U$. In this scheme, we define a unitary $U(\theta, x)$ acting on the initial state $\rho^{\text{in}}_i$, where $x$ is a feature vector that characterizes the set of input states. For instance, as we will see in Sec. III, $x$ may be the transverse field $\lambda$ of the 1D Ising spin chain. Once the trial state is prepared, measurements are performed to evaluate the cost function $C(\theta)$. This result is then fed into the classical optimizer, where the parameters $\theta$ are adjusted. This quantum-classical loop is repeated until the cost function converges to a value close to 0. When the loop terminates, $U(\theta_{\text{opt}}, x)$ prepares compressed states $|\phi\rangle$ of a particular model.
A summary comparing EF-QAE and QAE proposed in Ref. [9] can be seen in Appendix A. Note that the main difference between EF-QAE and QAE is the presence of a feature vector x in the sequence of gates U . This will allow us to study and explore how data encoding influences the behavior of a quantum autoencoder.

B. Cost function
The goal of a quantum autoencoder is to store the quantum information of the input state through the smaller latent space. Therefore, it is important to quantify how well the information is preserved. This in general is quantified by a cost function that one has to minimize. In Ref. [9], this cost function evaluates the fidelity of the input and output states, and it is constructed from global operators. However, it is known that global cost functions lead to trainability issues even for shallow depth quantum circuits [28,29].
To address this issue, we use a cost function designed from local operators, proposed in Ref. [29]. As mentioned therein, there is a close connection between data compression and decoupling. That is, if the discarded qubits, from now on referred to as trash qubits, can be perfectly decoupled from the rest, the autoencoder reaches lossless compression. For instance, if the output of the trash subsystem is a fixed pure state, say $|0\ldots0\rangle$, then it is decoupled and consequently, the input state has been successfully compressed.
A figure of merit to quantify the degree of decoupling, or data compression, during training is simply the total number of non-zero measurement outcomes on the $n_t$ trash qubits, which is to be minimized. To make the cost function local, different outcomes may be penalized by their Hamming distance to the $|0\rangle^{\otimes n_t}$ state, which is just the number of symbols that differ in the binary representation. Thus, the local cost function $C$ to be minimized is defined in Eq. (1), where $d_{H_j}$ denotes the Hamming distance and $M_{k,j}$ are the results of the $j$-th measurement on the $k$-th trash qubit in the computational basis. Equivalently, it can also be defined in terms of local Pauli $Z$ operators. Finally, notice that this cost function delivers direct information on how well the trash qubits are compressed, and it vanishes if and only if the compression is complete.
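As an illustration of the figure of merit described above, the following minimal sketch estimates a Hamming-distance cost from computational-basis outcomes of the trash qubits. The function name `local_cost` and the normalization (average Hamming weight per shot) are assumptions made for this example and may differ from the prefactor used in Eq. (1). Since the probability of finding trash qubit $k$ in $|1\rangle$ is $(1 - \langle Z_k \rangle)/2$, the same quantity can be obtained from local $Z$ expectation values.

```python
from typing import Iterable


def local_cost(trash_shots: Iterable[str]) -> float:
    """Average Hamming distance of trash-qubit outcomes to the all-zero string.

    Each shot is a bitstring such as "01" holding the computational-basis
    outcomes of the trash qubits only. The normalization (per shot, not per
    qubit) is an assumption of this sketch.
    """
    shots = list(trash_shots)
    if not shots:
        raise ValueError("at least one measurement shot is required")
    # The Hamming distance to "00...0" is the number of '1' symbols.
    total_weight = sum(shot.count("1") for shot in shots)
    return total_weight / len(shots)


# Perfectly decoupled trash qubits: every shot returns "00".
print(local_cost(["00"] * 4))
# Imperfect compression: some shots have non-zero trash outcomes.
print(local_cost(["00", "01", "11", "00"]))
```

With this normalization the cost is zero only when every trash outcome is the all-zero string, which matches the condition for complete compression stated above.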

C. Ansatz
To implement the EF-QAE model on a quantum computer, we must define the form of the parametrized unitary U (θ, x), decomposing it into a quantum circuit suitable for optimization. Recall that a quantum autoencoder may be thought of as a disentangling unitary. The complexity of the circuit thus limits this property. Given the limited available quantum resources in practice, due to the coherence times and gate errors, we will look for a circuit structure that maximally exploits entanglement while maintaining a shallow depth.
A primitive strategy to construct a variational circuit in a more general case may consist of building a circuit of arbitrary 2- and 1-qubit gates characterized by some parameters. However, this is a naive approach. Ideally, the action of the EF-QAE on an original state $|\psi\rangle$ is
$$U(\theta, x)\,|\psi\rangle = |0\ldots0\rangle \otimes |\phi\rangle,$$
where $|0\ldots0\rangle$ is the state of the trash qubits and $|\phi\rangle$ is the compressed state. Thus, it is clear that the entangling gates should mostly act between the trash qubits themselves, and between the trash qubits and the qubits containing the final compressed state. Consequently, we may avoid using entangling gates between the qubits that are not trash while maximizing the entangling gates on the ones of interest. This can be done using a structure similar to that depicted in Fig. 3. Notice that most of the entangling gates can be applied in parallel in the same step, and that the number of quantum gates is linear in the number of qubits and layers.
In this work, we follow an encoding strategy similar to that in Ref. [25]. That is, we encode the feature vector $x$ into the angle of each single-qubit $R_y$ rotation through a linear function of the components $x^{(j)}$, whose coefficients are components $\theta^{(i)}$ of the parameter vector adjusted in the optimization loop.
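A minimal sketch of such a linear encoding is given below. It assumes that each rotation angle has the form of a weighted sum of the feature components plus an offset, which is consistent with the count of dim(x) + 1 trainable parameters per rotation gate; the names `encoded_angle` and `apply_ry` are hypothetical and the exact index convention of the original expression is not reproduced here.

```python
import math
from typing import Sequence, Tuple


def encoded_angle(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Rotation angle as a linear function of the feature vector x (assumed form)."""
    if len(weights) != len(x):
        raise ValueError("need one weight per feature component")
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def apply_ry(angle: float, state: Tuple[float, float]) -> Tuple[float, float]:
    """Apply R_y(angle) = [[cos(a/2), -sin(a/2)], [sin(a/2), cos(a/2)]] to a real qubit state."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    a0, a1 = state
    return (c * a0 - s * a1, s * a0 + c * a1)


# Scalar feature, e.g. a transverse field value, with one weight and one bias.
angle = encoded_angle(weights=[2.0], bias=0.5, x=[0.75])
print(angle)
print(apply_ry(angle, (1.0, 0.0)))
```

Setting all weights to zero removes the dependence on $x$, so a standard QAE rotation with a single trainable angle is recovered as a special case of this form.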
The rationale behind choosing this kind of encoding is that it has been shown to provide universality with a single qubit, provided enough layers are used [25]. Here, although we use multiple qubits and entanglement is allowed, we expect a similar behavior as the number of layers increases. Note as well that this encoding is analogous to that used in classical neural networks. That is, $\theta$ plays the role of the weights and biases, while the rotation gate plays the role of the non-linear activation function. On the other hand, the role of the feature vector $x$ is inspired by classical feed-forward neural networks. Specifically, in this type of classical network, data is reintroduced and processed by many layers of neurons, similar to what our quantum circuit does. From a quantum mechanical perspective, we can say that the quantum data compression is tailored to a particular input, informed by the feature vector $x$. That is, EF-QAE applies different unitary operations $U(\theta, x)$ to different input states, depending on the extra information delivered by the feature vector $x$, and by doing so improves the compression performance.
Lastly, let us remark that other encoding strategies of the feature vector can be considered, for instance, using a non-linear encoding [26].

III. 1D ISING SPIN CHAIN
The EF-QAE can be verified in simulations. We used the open-source Python API Qibo [30,31] to simulate the quantum circuits. Here, we benchmark both the EF-QAE and the standard QAE on a paradigmatic quantum spin chain with 6 qubits, the transverse field Ising model. The 1D Ising model is described by a Hamiltonian in which $\lambda$ denotes the transverse field. In the thermodynamic limit, the system has a quantum phase transition exactly at $\lambda = 1$.
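To make the setup concrete, the sketch below builds a 6-qubit transverse field Ising Hamiltonian with NumPy and obtains a ground state by exact diagonalization. The sign convention, the periodic boundary conditions, and the helper names `embed`, `ising_hamiltonian`, and `ground_state` are assumptions of this example, and it uses plain linear algebra rather than the Qibo simulator mentioned above.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])


def embed(ops: dict, n: int) -> np.ndarray:
    """Tensor product over n qubits with ops[q] on qubit q and identity elsewhere."""
    out = np.array([[1.0]])
    for q in range(n):
        out = np.kron(out, ops.get(q, I2))
    return out


def ising_hamiltonian(n: int, lam: float) -> np.ndarray:
    """Assumed convention: H = sum_j Z_j Z_{j+1} + lam * sum_j X_j, periodic chain."""
    dim = 2 ** n
    h = np.zeros((dim, dim))
    for j in range(n):
        h += embed({j: Z, (j + 1) % n: Z}, n)
        h += lam * embed({j: X}, n)
    return h


def ground_state(n: int, lam: float):
    """Lowest eigenvalue and eigenvector from exact diagonalization."""
    energies, vectors = np.linalg.eigh(ising_hamiltonian(n, lam))
    return energies[0], vectors[:, 0]


# A training set in the spirit of the text: 20 field values between 0.5 and 1.0
# (including both endpoints is an assumption).
lambdas = np.linspace(0.5, 1.0, 20)
training_states = [ground_state(6, lam)[1] for lam in lambdas]
print(len(training_states), training_states[0].shape)
```

Each returned vector has 64 amplitudes and can serve as the input state of the autoencoder circuit, with the corresponding value of $\lambda$ supplied as the feature.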
The EF-QAE and QAE are optimized over a training set of ground states of the Ising model. Specifically, we have considered $N = 20$ equispaced ground states between $\lambda = 0.5$ and $\lambda = 1.0$, with random initial parameters. For the cost function, we computed Eq. 1 for each training state and then averaged the results as
$$\bar{C} = \frac{1}{N} \sum_{i=1}^{N} C_i,$$
where $C_i$ is the value of Eq. 1 for the $i$-th training state. Nonetheless, notice that for other models, more sophisticated cost functions could be more convenient to implement. We have considered the variational quantum circuit in Fig. 3 with 3 layers and 2 trash qubits, so that the resulting compressed state contains 4 qubits. Here, the feature vector $x$ for the EF-QAE is a scalar that takes the value of the transverse field $\lambda$.
The classical technique employed in the optimization loop is the BFGS method, which is gradient-based and involves estimation of the inverse Hessian matrix [32]. Let us also briefly comment here on the training required for both QAE and EF-QAE. Indeed, although the depth of the circuit is equivalent, the number of trainable parameters is not. In this sense, QAE has 1 trainable parameter on each rotation gate, whereas EF-QAE has dim(x) + 1 trainable parameters. For this example, dim(x) = 1, since x is just a scalar value, and therefore, the number of trainable parameters per rotation gate is 2. For gradient-based optimizers, this may imply the computation of extra gradients, and therefore, extra cost function evaluations. Recall, however, that this possible classical overhead is only present during the training procedure, and hence, we will not face any overhead when using a pretrained EF-QAE in combination with other machine learning tasks.
In Fig. 4, we show the cost function value as a function of the number of evaluations. The EF-QAE* is the EF-QAE initialized with the optimal parameters of QAE. This way, the EF-QAE* will always improve the QAE performance. As can be seen, the EF-QAE achieves almost twice the compression of the QAE. Nevertheless, notice that for the EF-QAE, the number of function evaluations required to achieve higher compression is larger. Recall that this is simply a trade-off between classical and quantum resources. That is, using the same quantum resources we improve the compression performance at the expense of additional classical optimization.
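One way to realize the EF-QAE* initialization, sketched below under the assumption of a linear encoding with a weight and a bias per rotation gate, is to copy each optimal QAE angle into the bias and set the feature weights to zero. The text does not state how the QAE parameters are mapped, so the helper names `warm_start` and `gate_angle` and this particular mapping are hypothetical.

```python
from typing import List, Sequence, Tuple


def warm_start(qae_angles: Sequence[float], dim_x: int = 1) -> List[Tuple[List[float], float]]:
    """Return (weights, bias) per rotation gate: zero weights, bias equal to the QAE angle."""
    return [([0.0] * dim_x, angle) for angle in qae_angles]


def gate_angle(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Linear encoding of the feature vector into a rotation angle (assumed form)."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# Hypothetical optimal QAE angles for three rotation gates.
qae_opt = [0.1, -0.7, 1.3]
params = warm_start(qae_opt)

# With zero weights, the circuit is independent of x and reproduces the QAE.
for x in ([0.5], [1.0]):
    print([gate_angle(w, b, x) for w, b in params])
```

Starting from this point, the EF-QAE has the same cost as the trained QAE, so an optimizer that only accepts cost-decreasing steps cannot end with a worse value.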
To quantify these expectations, we assess both EF-QAE and QAE with the optimal parameters against two test ground states of the Ising model, specifically, with $\lambda = 0.60$ and $\lambda = 0.75$. The results are shown in Fig. 5. Here, we show a density matrix visualization of the trash space. The EF-QAE achieves better compression to the $|00\rangle$ trash state, and therefore, higher fidelity on the output state. As we change the values of the transverse field, we note however that compression differs. In Appendix B we discuss and provide the output fidelities of the training and 60 test ground states.

IV. HANDWRITTEN DIGITS
In this section, we benchmark the EF-QAE and QAE models on the compression of 8 × 8 handwritten digits with 6 qubits using 4 layers. Each digit is given as a matrix with values from 0 to 16 corresponding to a grayscale map. Each value of this matrix is encoded in an amplitude of a 6-qubit state, subject to normalization.
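The amplitude encoding described above can be sketched as follows. The row-major pixel ordering, the function name `amplitude_encode`, and the synthetic image used for the demonstration are assumptions of this example; the actual digits come from a dataset and are not reproduced here.

```python
import math
from typing import List, Sequence


def amplitude_encode(image: Sequence[Sequence[float]]) -> List[float]:
    """Flatten an image row by row and rescale it to a unit-norm amplitude vector."""
    pixels = [float(v) for row in image for v in row]
    n_qubits = math.log2(len(pixels))
    if not n_qubits.is_integer():
        raise ValueError("number of pixels must be a power of two")
    norm = math.sqrt(sum(p * p for p in pixels))
    if norm == 0.0:
        raise ValueError("an all-zero image cannot be normalized")
    return [p / norm for p in pixels]


# Synthetic 8 x 8 image with gray values between 0 and 16 (not a real digit).
image = [[(r * c) % 17 for c in range(8)] for r in range(8)]
amplitudes = amplitude_encode(image)
print(len(amplitudes))                   # 64 amplitudes, i.e. a 6-qubit state
print(sum(a * a for a in amplitudes))    # squared amplitudes sum to 1 up to rounding
```

Because of the normalization, the overall brightness of the image is lost and only the relative gray values are stored in the state.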
The EF-QAE and QAE are optimized over a training set of handwritten digits obtained from the Python package Scikit Learn [33]. Specifically, we have considered N = 20 handwritten digits, 10 each of the digits 0 and 1. The simulation details are equivalent to those in Sec. III. Here, the feature vector for the EF-QAE takes the values x = 1, 2. That is, we simply input a value of x = 1 (x = 2) if the handwritten digit corresponds to 0 (1). The reason for choosing x = 1, 2 is that no obvious feature distinguishes the two digits. Nonetheless, more convenient strategies could be used in future work. For instance, one may allow the feature vector x to be a free variational parameter.
In Fig. 6, we show the cost function value as a function of the number of evaluations. Recall that EF-QAE* is simply the EF-QAE initialized with the optimal parameters of QAE. We note that EF-QAE achieves three times the compression of QAE using the same quantum resources. However, in contrast to the previous Ising model case, EF-QAE requires even fewer function evaluations to improve over the standard QAE. This is because, although the parameter search space is larger, including the feature vector changes the parameter landscape in such a way that it becomes well-behaved, and therefore, the optimization procedure converges faster. Once again, to gain insight into the compression process, we assess both EF-QAE and QAE with the optimal parameters against two handwritten test digits corresponding to 0 and 1. The results are shown in Fig. 7. Here, we plot the output digit of the EF-QAE and QAE. Once more, since EF-QAE achieves better compression to the $|00\rangle$ trash state, we obtain higher fidelity on the output state. Remarkably, in both cases, the performance of the EF-QAE is improved with respect to the QAE. In Appendix B we discuss and provide the output fidelities of the training and 60 test handwritten digits.

V. CONCLUSION
We have presented a variational quantum algorithm called EF-QAE capable of compressing quantum data of a parameterized model. In contrast to standard QAE, EF-QAE achieves this compression with higher fidelity. Its key idea is to define a parameterized quantum circuit that depends upon adjustable parameters and a feature vector that characterizes such a model. In this way, the data compression can be tailored to the particular input, informed by the feature vector, and the compression performance is enhanced.
FIG. 7. Images of 0 and 1 handwritten test digits encoded into a 6-qubit state (8 × 8 pixels). Images shown correspond to the input state, and the output states of the EF-QAE and QAE models. As can be seen, the fidelity of the EF-QAE output state is improved compared to QAE.
We have validated the EF-QAE in simulations by compressing ground states of the 1D Ising spin chain, and classical handwritten digits encoded into quantum states. We compared the results with the standard QAE. The results show that EF-QAE achieves better compression of the initial state, and therefore, the final output state is recovered with higher fidelity. Moreover, the learning task of EF-QAE can be initialized with the optimal QAE parameters. In this manner, EF-QAE will always improve the QAE performance. Nonetheless, the encoding strategy of the feature vector could be improved, for instance, by allowing the feature vector to be a free variational parameter or by using a non-linear encoding. We leave the study of encoding strategies for future work.
The EF-QAE may need additional classical optimization compared to QAE. In return, we increase the compression performance using the same amount of limited quantum resources. In this sense, EF-QAE is a step toward what could be done on NISQ computers, shortening the distance between current quantum devices and practical applications.
Ising model: In Fig. 10 we show the output fidelities of 20 training and 60 test Ising ground states. As can be seen, the output fidelities of the EF-QAE are higher, except for a few outlier values around λ = 0.7. This could be improved, for instance, by simply increasing the number of training states, or by populating values around λ = 0.7 with non-equispaced training ground states.

Appendix C: Resilience to noise
It has been shown recently that specific VQAs can exhibit noise resilience [35]. That is, the optimal parameters are unaffected by certain noise models. Here we prove that the local cost function $C$ is resilient to global depolarizing noise. Let us rewrite $C$ from Eq. 1 in terms of the expectation values $\zeta^{(k)} = \langle 0|U^\dagger (Z_k \otimes \mathbb{1}_{\bar{k}}) U|0\rangle$. From now on, we refer to $\tilde{C}$ and $\tilde{\zeta}$ as the noisy versions of these quantities. Recall that global depolarizing noise transforms the state according to $\rho \to q\rho + (1-q)\mathbb{1}/d$. If we consider a circuit that has depth $D$, then the final state is $q^D \rho + (1 - q^D)\mathbb{1}/d$. Notice as well that $\tilde{\zeta}^{(k)}$ is estimated simply by executing the circuit in Fig. 3 and measuring in the computational basis. The maximally mixed state has zero expectation value, since we measure Pauli $Z$ operators. Therefore, we obtain that $\tilde{\zeta}^{(k)} = q^D \zeta^{(k)}$, where $D$ is the depth of the circuit used to estimate $\zeta^{(k)}$. This implies