
Inspired by the success of classical neural networks, there has been tremendous effort to bring effective classical neural network ideas into the quantum setting. In this paper, a novel hybrid quantum–classical neural network with deep residual learning (Res-HQCNN) is proposed. We first analyse how to connect a residual block structure with a quantum neural network, and give the corresponding training algorithm. At the same time, the advantages and disadvantages of transferring deep residual learning into the quantum setting are discussed. As a result, the model can be trained in an end-to-end fashion, analogous to backpropagation in classical neural networks. To explore the effectiveness of Res-HQCNN, we perform extensive experiments on quantum data, with and without noise, on a classical computer. The experimental results show that Res-HQCNN performs better at learning an unknown unitary transformation and has stronger robustness to noisy data when compared with the state of the art. Moreover, possible methods of combining residual learning with quantum neural networks are also discussed. © 2021 The Author(s)


Introduction
Artificial neural networks (ANNs) stand for one of the most prosperous computational paradigms (Nielsen, 2015). Early neural networks can be traced back to the McCulloch–Pitts neurons in 1943 (McCulloch & Pitts, 1943). Over the past few decades, taking advantage of a number of technical factors, such as new and scalable software platforms (Bergstra et al., 2010; Jia et al., 2014; Maclaurin et al., 2015; Paszke et al., 2017, 2019) and powerful special-purpose computational hardware (Chetlur et al., 2014; Jouppi et al., 2017), machine learning techniques based on neural networks have made steady progress and breakthroughs. Currently, neural networks achieve significant success and find wide application in various machine learning fields such as pattern recognition, video analysis, medical diagnosis, and robot control (Amato et al., 2013; Bishop et al., 1995; Mitchell & Thrun, 1993; Nishani & Çiço, 2017). One notable reason is the increasing passion for exploring new neural architectures, both manually by expert knowledge and automatically by automated machine learning (He et al., 2016a; Peng et al., 2020a; Pham et al., 2018). In particular, in 2016, deep residual networks (ResNets) were proposed, with extremely deep architectures showing excellent accuracy and nice convergence behavior (He et al., 2016a, 2016b). Their result won 1st place in the ImageNet Large Scale Visual Recognition Challenge 2015 classification task, and ResNet (and its variants) has achieved revolutionary success in many research and industry applications.
Among these works, Beer et al. (2020) caught our attention. On the one hand, it provides readers with accessible code. On the other hand, the paper presents truly quantum neurons forming a quantum feedforward neural network, with a remarkable ability to learn an unknown unitary and striking robustness to noisy training data. This work is important for noisy intermediate-scale quantum devices due to the reduction in the number of coherent qubits. The cost function in this work is chosen as the fidelity between a pure quantum state and an arbitrary quantum state. Nevertheless, as the number of network layers deepens, we find that the rate of convergence of the cost function decreases, and the value of convergence cannot even reach the maximum for clean data, see Fig. 1(a). As for noisy data, Fig. 1(b) shows that the robustness to noisy data weakens as the network becomes deeper and deeper. So we wonder whether the performance of the cost function for both clean and noisy data can be improved. Here we use a one-dimensional list of natural numbers to represent the number of perceptrons in the corresponding layers.
Inspired by the efficiency of deep residual learning, we design a novel hybrid quantum–classical neural network with deep residual learning (Res-HQCNN) to achieve our goal. This idea is novel and, as far as we know, no such work has been attempted up to now. We wish to explore how to put the residual scheme into the QNNs of Beer et al. (2020) efficiently, and this is not trivial. First, the QNNs in Beer et al. (2020) form a closed quantum system, whose dynamics are described by a unitary transform. The input and output matrices in the QNNs of Beer et al. have unit trace, which is an important constraint for a density matrix. The residual scheme may increase the trace of the input and output matrices, which makes implementation on a quantum computer impossible. Second, our goal is to improve the performance of the cost function for both clean and noisy quantum data; we hope the experiment can not only be carried out on a quantum computer but also show improved performance. Third, connecting the residual scheme into a feedforward QNN changes the training algorithm, especially the process of updating the unitary perceptrons. The number of residual block structures, the number of network layers, and the choice of skipping layers or not all influence the procedure of updating parameters. Since the updating-parameters matrix is calculated from the definition of the derivative function, different network structures have different updating-parameters matrices: the more diverse the network structure, the more complex the corresponding matrix. So this exploration is challenging but interesting, and we hope our paper can be a useful reference in this research area. The contributions stemming from this paper include:
• Design a new residual learning structure based on the QNNs in Beer et al. (2020).
• Give the model of Res-HQCNN and derive its new training algorithm. Based on the training algorithm, we present an analysis from the perspective of propagating information feedforward and backward.
• Explore different ways of connecting the new residual scheme with QNNs, such as an identity shortcut connection skipping one layer.
• Present the improved performance of Res-HQCNN on both clean and noisy quantum data over the former QNNs, at the cost of implementing it only on a classical computer.
• Discuss another method of designing the residual block structure into quantum neural networks so that the implementation can be carried out on a quantum computer.
The remainder of this paper is organized as follows. Section 2 reviews the related contributions on quantum neural networks and the residual scheme. In Section 3, we briefly introduce the basic concepts of quantum qubits and the operators mainly used in this paper, as well as the mechanism of deep residual learning. Section 4 gives the model of the quantum neural network with deep residual learning, including its architecture and training algorithm on a classical computer. To validate the improved performance, Section 5 provides the experimental simulations and corresponding analysis. Finally, the conclusion and discussion are given in Section 6.

Quantum neural networks
Quantum neural networks have strong potential to be superior to classical neural networks by combining neural computing with the mechanics of quantum computing. Quantum data take the form of quantum states. Just as a classical bit has a state 0 or 1, a qubit also has a state; two possible states for a qubit are |0⟩ and |1⟩. Examples are the two different polarizations of a photon and two states of an electron orbiting a single atom. Quantum neural networks can also process real-world data (Yan et al., 2017; Yan et al., 2016). In this paper, we mainly focus on quantum neural networks with quantum data. Specifically, the research achievements mainly include the following aspects: solving central tasks in quantum learning (Beer et al., 2020; Bisio et al., 2010; Sasaki & Carlini, 2002); enhancing problems of machine learning (Alvarez-Rodriguez et al., 2017; Dunjko et al., 2016; Purushothaman & Karayiannis, 1997); and efficient classification of quantum data (Li & Wang, 2020; Sentís et al., 2019; Zhao et al., 2019). Among them, the papers on quantum learning of an unknown unitary transformation impress us most. In detail, Bisio et al. (2010) addressed this task and found an optimal strategy to store and retrieve an unknown unitary transformation in a quantum memory. Soon after, Sedlák et al. (2019) designed an optimal protocol for unitary channels, which generalizes the results in Bisio et al. (2010). Moreover, Beer et al. (2020) proposed a quantum neural network with remarkable generalization behavior for the task of learning an unknown unitary quantum transformation. We hope the ability to learn an unknown unitary transformation can be improved by the new idea we propose.

Residual scheme in neural networks
Proposed in 2012, AlexNet (Krizhevsky et al., 2017) became one of the most famous neural network architectures of the deep learning era. It is regarded as the first time that a deep neural network was more successful than traditional, hand-crafted feature learning on ImageNet (Deng et al., 2009). Since then, researchers have devoted much effort to making networks deeper, as deeper architectures can potentially extract more important semantic information. But deeper networks are more difficult to train, due to the notorious vanishing/exploding gradient problem. The residual scheme in ResNet (He et al., 2016a) is one of the most successful strategies for improving current neural networks. ResNet makes it possible to train networks with hundreds or even thousands of layers and still achieve compelling performance. Basically, this scheme reformulates the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. Based on this, a residual neural network builds on constructs known from pyramidal cells in the cerebral cortex, utilizing skip connections, or shortcuts, to jump over some layers. This simple but efficient strategy largely improves the performance of current neural architectures in many fields. For instance, in image classification tasks, many variants of ResNets (Chen et al., 2018; Dong et al., 2020; He et al., 2016a; Korpi et al., 2020; Xie et al., 2017) have been proposed and achieve state-of-the-art performance. The scheme has even been introduced into graph convolutional networks: works from Peng et al. (2020a, 2020b) and Yan et al. (2018) endow graph convolutional networks with residual connections, capturing better representations for skeleton graphs. Nevertheless, as far as we know, there is no work introducing the residual scheme into the field of quantum neural networks. In this paper, we make the first attempt to do this and present an efficient Res-HQCNN which can be trained in an end-to-end fashion.

Qubits and quantum operators
Analogous to the bit being the smallest unit of classical computing, the qubit is the smallest unit of quantum computing. The notation |·⟩ is called a ket and indicates that the object is a column vector; the complex conjugate transpose of |ψ⟩ is written ⟨ψ| (Nielsen & Chuang, 2002). A two-level quantum system in a two-dimensional Hilbert space C^2 with basis {|0⟩, |1⟩} is a single qubit, which by the superposition principle can be written as

|ψ⟩ = α|0⟩ + β|1⟩,  α, β ∈ C,  |α|^2 + |β|^2 = 1.

The quantum operators used in this paper mainly include the tensor product, the reduced density operator, and the partial trace. The tensor product is a way to extend the dimension of vector spaces by putting vector spaces together (Nielsen & Chuang, 2002). The symbol for the tensor product is ⊗. If A is an m × n matrix and B is a p × q matrix, then A ⊗ B is an mp × nq matrix. Another important operator used in this paper is the reduced density operator (Nielsen & Chuang, 2002). It is often used to obtain the desired subsystems of a composite quantum system. Assume A and B are two physical systems and the state of A ⊗ B is described by the density matrix ρ_AB. The reduced density operator for A is given by

ρ_A = tr_B(ρ_AB),

where tr_B is a map of operators called the partial trace over B. The partial trace is defined by

tr_B(|a_1⟩⟨a_2| ⊗ |b_1⟩⟨b_2|) = |a_1⟩⟨a_2| tr(|b_1⟩⟨b_2|) = |a_1⟩⟨a_2| ⟨b_2|b_1⟩,

where |a_1⟩ and |a_2⟩ are any vectors in A, and |b_1⟩ and |b_2⟩ are any vectors in B; for example, A and B may both be the two-dimensional complex vector space C^2.
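As a numerical sketch, the tensor product and the partial trace above can be realized with numpy; the helper names below are our own, not from the paper:

```python
import numpy as np

def tensor(a, b):
    # Tensor (Kronecker) product: an (m x n) and a (p x q) matrix
    # combine into an (mp x nq) matrix.
    return np.kron(a, b)

def partial_trace_B(rho_AB, dim_A, dim_B):
    # Reduced density operator rho_A = tr_B(rho_AB) on the system A (x) B:
    # reshape into a 4-index tensor and contract the two B indices.
    r = rho_AB.reshape(dim_A, dim_B, dim_A, dim_B)
    return np.einsum('ijkj->ik', r)

# Example: rho_AB = |0><0| (x) |+><+|; tracing out B recovers |0><0|.
ket0 = np.array([1.0, 0.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho_AB = tensor(np.outer(ket0, ket0), np.outer(plus, plus))
rho_A = partial_trace_B(rho_AB, 2, 2)   # equals |0><0|
```

Reshaping the density matrix into a four-index tensor and summing over the repeated B index is the standard way to take a partial trace of a dense matrix.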

Deep residual learning
Training a deep neural network can be computationally costly. He et al. (2016a) proposed neural networks with a deep residual learning framework, which are easier to optimize and gain accuracy from increased depth. In a residual block structure, there is a shortcut pathway connecting the input and output of the block. Specifically, residual learning chooses to fit the residual mapping F(x) := H(x) − x, rather than approximating the desired underlying mapping H(x) directly. The final mapping of a residual block structure is F(x) + x, which is equivalent to H(x), see Fig. 2.
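As a minimal classical illustration of the residual reformulation (the function names here are ours, not the paper's):

```python
import numpy as np

def residual_block(x, F):
    # Learn the residual F(x) := H(x) - x; the block outputs F(x) + x,
    # which is equivalent to the desired underlying mapping H(x).
    return F(x) + x

# When the desired mapping is the identity, it suffices to drive F
# toward zero, which is easier than fitting H(x) = x directly.
x = np.array([1.0, 2.0, 3.0])
y = residual_block(x, lambda v: np.zeros_like(v))   # exact identity mapping
```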
The degradation problem shows that deeper networks can have higher training and test error. But compared with its shallower counterpart, the performance of a deeper network should not be worse. This suggests that the network may have difficulty approximating the identity mapping. So optimizing the residual mapping F(x) towards zero is relatively easier than pushing H(x) towards the identity mapping.
For ANNs, deep residual learning has the advantage of mitigating the vanishing/exploding gradient problem and the degradation problem. This paper is mainly based on the QNNs in Beer et al. (2020), whose cost function landscape is free of barren plateaus. So we expect that QNNs with residual learning can achieve improved performance of the cost function for both clean and noisy data, and we hope that the deeper the Res-HQCNN, the more effective it will be.

The model of Res-HQCNN
In this section, we define the architecture of Res-HQCNN based on the QNNs in Beer et al. (2020). According to the mechanism of the defined Res-HQCNN, we explain its training algorithm for different cases.

The architecture of Res-HQCNN
We first define a residual block structure in Res-HQCNN. Then the architecture of Res-HQCNN with multiple layers is presented. For a better understanding of the mechanism, we show an example of Res-HQCNN with one hidden layer. Finally, we give an analysis of the differences between the former QNNs and Res-HQCNN.
Based on the residual block structure in Fig. 2, a new residual block structure in Res-HQCNN is defined as follows. For convenience, we put forward a few assumptions and notations at the beginning.
Res-HQCNN has L hidden layers. The perceptron nodes of each layer represent single qubits. Denote by m_l the number of nodes in layer l, and assume m_{l−1} ≤ m_l for l = 1, 2, …, L. Here l = 0 represents the input layer and l = L + 1 corresponds to the output layer. ρ^l_in and ρ^l_out represent the input and output states of layer l in Res-HQCNN. The residual block structure can then be designed as in Fig. 3.
Mathematically, we set the input mapping and final mapping of the residual block structure in Res-HQCNN as ρ^l_in and ρ^{l+1}_in, respectively. The residual mapping is chosen as ρ^l_out. It is important to note that the output of the former layer is not the exact input of the next layer. As shown in Fig. 3, the new input of layer l+1 is the addition of the output state and the input state of layer l; the additive operation corresponds to elementwise matrix addition. Since m_{l−1} ≤ m_l, we apply the tensor product of ρ^l_in with |0…0⟩⟨0…0| on the Δm_l = m_l − m_{l−1} extra qubits so that the dimensions match before addition.

Next, we go on to investigate the method of merging the residual block structure and the quantum neural network together. Define a quantum perceptron in layer l of Res-HQCNN to be an arbitrary unitary operator with m_{l−1} input qubits and one output qubit. For example, as presented in Fig. 4, the quantum perceptron U^L_j is an (m_{L−1} + 1)-qubit unitary operator for j = 1, 2, …, m_L. The Res-HQCNN is made up of quantum perceptrons with L hidden layers. It acts on an input state ρ^1_in of the input qubits and obtains a mixed state ρ^{L+1}_out for the output qubits based on the layer unitary U^l, which takes the form of a matrix product of quantum perceptrons:

U^l = U^l_{m_l} U^l_{m_l−1} ⋯ U^l_1.

Here U^l_j acts on the qubits in layers l−1 and l for j = 1, 2, …, m_l. The unitary operators are arbitrary, and they do not always commute, so the order within the layer unitary is important. While processing information from ρ^1_in to ρ^{L+1}_out, the residual block structure produces the new input state for layer l+1 by adding the input state to the output state of layer l, for l = 1, 2, …, L. To facilitate understanding, we give an example of the mechanism of Res-HQCNN with one hidden layer, see Fig. 5.
Define the layer unitary between the input layer and the hidden layer as U^1, which takes the form of a matrix product of quantum perceptrons. Analogously, we define the layer unitary U^2 between the hidden layer and the output layer. In the first step, we apply the quantum perceptrons layerwise from top to bottom, so the output state ρ^1_out of the hidden layer is

ρ^1_out = tr_in( U^1 (ρ^1_in ⊗ |0…0⟩_hid⟨0…0|) U^{1†} ).

Next, we apply the residual block structure to ρ^1_in and ρ^1_out in order to get a new input state for the output layer:

ρ^2_in = ρ^1_out + ρ^1_in ⊗ (|0⟩⟨0|)^{⊗Δm_1}.

In the third step, we get the final output state for the Res-HQCNN in Fig. 5:

ρ^2_out = tr_hid( U^2 (ρ^2_in ⊗ |0⟩_out⟨0|) U^{2†} ).

Comparing the former QNNs in Beer et al. (2020) with Res-HQCNN, we find that the trace value of the input state ρ^{l+1}_in changes for some l due to the addition operation in the residual block structure: if tr(ρ^1_in) = 1, then the trace values of ρ^2_in and ρ^3_in are 2 and 4, respectively. In theory, ρ^2_in and ρ^3_in are then no longer density matrices, and we cannot apply the training algorithm on a quantum computer. However, every coin has two sides: the residual block structure improves the performance of the cost function, especially for deeper networks, as demonstrated in the experimental part.
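The three steps above can be simulated directly on a classical computer. The sketch below lumps each layer's perceptrons into a single random layer unitary (an assumption of the sketch, not the paper's per-perceptron parametrization) for a 1-2-1 network, and reproduces the trace-doubling effect of the residual addition:

```python
import numpy as np

def ptrace_first(rho, d_first, d_rest):
    # Trace out the first subsystem (dimension d_first).
    r = rho.reshape(d_first, d_rest, d_first, d_rest)
    return np.einsum('ijik->jk', r)

def rand_unitary(d, rng):
    # Random unitary via QR decomposition of a complex Gaussian matrix.
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

rng = np.random.default_rng(7)
proj0 = np.array([[1.0, 0.0], [0.0, 0.0]])        # |0><0|

rho1_in = proj0                                   # one input qubit in |0><0|
U1 = rand_unitary(8, rng)                         # layer unitary on 1+2 qubits
U2 = rand_unitary(8, rng)                         # layer unitary on 2+1 qubits

# Step 1: feedforward to the hidden layer, then trace out the input qubit.
s = np.kron(rho1_in, np.kron(proj0, proj0))
rho1_out = ptrace_first(U1 @ s @ U1.conj().T, 2, 4)

# Step 2: residual block -- pad rho1_in with one |0><0| qubit and add.
rho2_in = rho1_out + np.kron(rho1_in, proj0)      # trace doubles to 2

# Step 3: output layer, then trace out the two hidden qubits.
s = np.kron(rho2_in, proj0)
rho2_out = ptrace_first(U2 @ s @ U2.conj().T, 4, 2)
```

Running this, tr(ρ^2_in) = 2, illustrating numerically why the residual input is no longer a valid density matrix.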
One may also notice that we apply the residual block structure to all the hidden layers, but not to the last output layer. Since we have assumed m_{l−1} ≤ m_l for l = 1, 2, …, L and m_0 = m_{L+1}, the qubits in the output layer L+1 are in general no more than the qubits in layer L. If we applied the residual block structure to the output layer L+1, the final output of the network would be ρ_out = ρ^{L+1}_in + ρ^{L+1}_out. The dimension of ρ^{L+1}_in is no less than the dimension of ρ^{L+1}_out, so we would have to apply a partial trace to ρ^{L+1}_in in order to keep the rule of matrix addition. However, we would lose some information of ρ^{L+1}_in in this way, which contradicts the goal of using the residual scheme: as mentioned before, the network would then again have difficulty approximating the identity mapping. We also provide an experimental result showing the inefficiency of applying the residual block structure to the last output layer.

The training algorithm of Res-HQCNN
We randomly generate N pairs of training data, which are possibly unknown quantum states of the form (|φ^in_x⟩, |φ^out_x⟩) with x = 1, 2, …, N. It is also allowed to use enough copies of the training pair (|φ^in_x⟩, |φ^out_x⟩) for a specific x so that we can overcome quantum projection noise when computing the derivative of the cost function. Here, for simplicity, we do not allow input states to interact with an environment to produce output states (e.g., thermalization).
We choose the desired outputs as |φ^out_x⟩ = V|φ^in_x⟩, with V an unknown unitary operation.
The cost function we choose is based on the fidelity between the output of Res-HQCNN and the desired output, averaged over all training data. Due to the definition of the residual block structure and the linearity of the fidelity, the cost function of Res-HQCNN is divided by 2^t, where t is the number of residual block structures in Res-HQCNN:

C = (1 / (2^t N)) Σ_{x=1}^{N} ⟨φ^out_x| ρ^{L+1}_out,x |φ^out_x⟩.

We want to know how close the network output state and the desired output state are: the closer they are, the larger the fidelity. If the cost function reaches 1, we judge that the Res-HQCNN performs best; 0 is the worst. So our goal is to maximize the cost function during training.
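A numerical sketch of this cost (function name and signature are our own):

```python
import numpy as np

def cost(rho_outs, phi_outs, t):
    # Fidelity with the desired pure outputs, averaged over the training
    # pairs and divided by 2**t, where t is the number of residual blocks.
    N = len(phi_outs)
    total = sum(np.real(phi.conj() @ rho @ phi)      # <phi_out| rho |phi_out>
                for rho, phi in zip(rho_outs, phi_outs))
    return total / (2 ** t * N)

# A perfect network with no residual blocks (t = 0) reaches cost 1.
phi = np.array([1.0, 0.0])
c = cost([np.outer(phi, phi.conj())], [phi], t=0)
```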
For each layer l of Res-HQCNN, denote by ρ^l_in,x(s) the input state and by ρ^l_out,x(s) the output state of layer l at step s, with l = 1, 2, …, L and x = 1, 2, …, N. We first consider the case in which every layer l carries a residual block structure with no skipped layers, so that t = L. The training algorithm for this kind of Res-HQCNN is given by the following steps:

I. Initialize:
I1. Set step s = 0.
I2. Choose all unitaries U^l_j(0) randomly, j = 1, 2, …, m_l, where m_l is the number of nodes in layer l.

II. For each layer l and each training pair (|φ^in_x⟩, |φ^out_x⟩):
II1. Feedforward:
II1a. Tensor the input state with the qubits of layer l initialized in |0…0⟩⟨0…0|,
II1b. Apply the layer unitary between layers l−1 and l,
II1c. Trace out layer l−1 and obtain the output state

ρ^l_out,x(s) = tr_{l−1}( U^l(s) (ρ^l_in,x(s) ⊗ |0…0⟩_l⟨0…0|) U^l(s)^† ).

II2. Residual learning:
II2a. Apply the residual block structure in Fig. 3 to ρ^l_in,x(s) and ρ^l_out,x(s) to obtain the new input state of layer l + 1,

ρ^{l+1}_in,x(s) = ρ^l_out,x(s) + ρ^l_in,x(s) ⊗ (|0⟩⟨0|)^{⊗Δm_l}.

III. Update parameters:
III1. Compute the cost function C(s).
III2. Update the unitary of each perceptron via

U^l_j(s + ϵ) = e^{iϵK^l_j(s)} U^l_j(s),  (1)

where K^l_j(s) is the parameters matrix.
III2a. If Res-HQCNN has three layers with one hidden layer:
When l = 1, we can obtain the analytical expression for K^l_j(s) after calculation; the trace is taken over all qubits of Res-HQCNN that are not affected by U^l_j, η is the learning rate, and N is the number of training pairs. Moreover, M^l_j is made up of a commutator of two parts (Eq. (3)). When l = 2, K^l_j(s) is given by Eq. (4), where M^l_j is as in Eq. (3) and an additional term appears from the residual connection (Eq. (5)).
III2b. If Res-HQCNN has four layers with two hidden layers:
When l = 1, l = 2 and l = 3, the parameters matrix K^l_j(s) contains the same commutator M^l_j as in Eq. (3), together with additional terms introduced by the residual block structures (Eqs. (7), (9), (11) and (12)).
III2c. If Res-HQCNN has five or more layers, we can calculate the corresponding parameters matrices K^l_j(s) with the method in the Appendix. There is no fixed formula for K^l_j(s); it changes with the depth of the network and the parameter l.
IV. Repeat steps II and III until reaching the maximum of the cost function.
As for the other cases of Res-HQCNN, we have t ≤ L due to the skipping connections. The calculation method is the same as the one in the Appendix, and the complexity of the calculation decreases compared with the case t = L, so we do not give the training algorithm for the other cases in detail. Here we can automatically update the unitary of each perceptron using Eq. (1) until converging to the optimal value of the cost function. Therefore, we conclude that Res-HQCNN can be trained in an end-to-end fashion.
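The perceptron update of step III2 can be sketched as follows; computing the parameters matrix K^l_j(s) itself requires the layer-dependent expressions discussed above, so here K is simply taken as a given Hermitian matrix:

```python
import numpy as np

def update_perceptron(U, K, eps):
    # U(s + eps) = exp(i * eps * K(s)) @ U(s). For Hermitian K the matrix
    # exponential is computed via the eigendecomposition K = V diag(lam) V†,
    # and exp(i*eps*K) is unitary, so the perceptron stays exactly unitary.
    lam, V = np.linalg.eigh(K)
    expK = (V * np.exp(1j * eps * lam)) @ V.conj().T
    return expK @ U

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
K = (A + A.conj().T) / 2                 # Hermitian parameters matrix
U = np.eye(4, dtype=complex)             # initial perceptron unitary
U_next = update_perceptron(U, K, eps=0.1)
```

Because the update multiplies by a unitary factor rather than adding a gradient, no re-orthogonalization step is needed between training rounds.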
Compared with the training algorithm on a classical computer for the QNNs in Beer et al. (2020), we find that the K^l_j(s) of the QNNs is only a part of the one in Res-HQCNN. The residual block structure brings new terms into K^l_j(s), such as N^l_j(s), P^l_j(s), Q^l_j(s), S^l_j(s) and T^l_j(s) in Eqs. (5), (7), (9), (11) and (12). Observing the analytical expressions of Eqs. (5), (7), (9), (11) and (12) in detail, we notice that the input state can be used by any deeper layer, both from the perspective of propagating information feedforward (e.g., N^l_j(s), Q^l_j(s), S^l_j(s), T^l_j(s)) and backward (e.g., P^l_j(s)). For a better understanding of the training algorithm, we also provide a simple flowchart for the readers (see Fig. 6).

Experiments
In this section, we conduct comprehensive experiments to evaluate the performance of Res-HQCNN. The experiments were run on a ThinLinc server with 2 × Intel Xeon CPU E5-2650 v3 (2.30 GHz) on the CentOS (Linux RHEL clone) operating system. First, we start from elementary tests to prove the effectiveness; here we also show the experimental results when the residual block structure is applied to the last output layer. Then we further explore the performance of Res-HQCNN with more layers. For four-layer Res-HQCNN, we compare the performance of different variants, such as skipping one layer when connecting the residual block structure. Finally, we generalize the experiments to noisy training data to test the robustness of Res-HQCNN.
The training data for the following elementary tests and big networks are possibly unknown quantum states of the form (|φ^in_x⟩, |φ^out_x⟩). In order to show the power of the residual block structure in Res-HQCNN, we compare with results obtained using the training algorithm of the QNNs in Beer et al. (2020). For convenience, we still apply a one-dimensional list of natural numbers to refer to the number of perceptrons in the corresponding layers, as in Fig. 1. In particular, if a residual block structure as in Fig. 3 acts on the hidden layers, we put a tilde on top of the corresponding natural numbers. For example, a 1-2-1 quantum neural network in Beer et al. (2020) is denoted [1, 2, 1], and a 1-2-1 quantum neural network with our residual block structure is written [1, 2̃, 1]. If there is a skipping connection, we put a hat on top of the number representing the skipped layer; for example, in [2, 3, 3, 2] the hat marks the hidden layer that the shortcut skips.

Elementary tests
We consider Res-HQCNN [1, 2, 1] and [2, 3, 2] with η = 1/1.8 and ϵ = 0.1 for the elementary tests, see Fig. 7(a). We find that the solid line is higher than the dashed one of the same color, and both lines converge to 1 as the training rounds increase to 250. Comparing the solid lines with the dashed lines of the same color, we find that the solid lines have a higher rate of convergence than the dashed ones before reaching 1.
We then test the performance of [1, 2, 1] and [2, 3, 2] with η = 1/2 and ϵ = 0.1 for the case of applying the residual block structure to the last output layer, see Fig. 7(b). Since the residual block structure is used twice in [1, 2, 1] and [2, 3, 2], the update parameters matrix K^l_j in Eq. (4) gains one more term after computation; the method of computation is in the Appendix. So we set η = 1/2 to decrease the learning rate a bit. In Fig. 7(b), the solid lines in blue and red cannot reach the maximum of the cost function. Comparing the solid and dashed lines of the same color, the rate of convergence of the solid lines is not always higher than that of the dashed lines. These results agree with the theoretical analysis in the last paragraph of Section 4.1. Therefore, we do not apply the residual block structure to the last output layer of Res-HQCNN in the following.

Big networks
In this subsection, we consider Res-HQCNN with deeper layers to test the advantages of deep residual learning. We first select Res-HQCNN [2, 3, 3, 2] and [2, 3, 4, 2] with 10 training pairs. The corresponding simulation results are presented in Figs. 8 and 9.
In order to test the power of the skipping connection, we plot the performance of [2, 3, 3, 2] with and without the skipping connection at a suitable learning rate η = 1/3 in Fig. 8(b). One can find that the variant with the skipping connection achieves a higher cost. Therefore, the skipping connection here is helpful in improving the performance of the cost function.
We then change the QNN [2, 3, 4, 2] into Res-HQCNN in all possible ways, see Fig. 9(a). Analogous to the results in Fig. 8(a), the blue line is above the other lines. The red and green lines are still unstable due to a smaller learning rate. We also test the skipping connection for [2, 3, 4, 2], comparing the variants with and without it, to see the difference. The green line is higher than the black one, which is consistent with the previous results in Fig. 8(b).
Taking Figs. 8 and 9 together, one can read off that for a QNN with L hidden layers, the Res-HQCNN with L residual block structures brings the best improvement of the cost function among all the cases, including the case with a skipping connection. So in the following we consider this best case of Res-HQCNN with L residual block structures.
To test the effectiveness of the residual block structure in the quantum setting further, we go on to examine the deeper Res-HQCNN [2, 3, 3, 3, 2] and [2, 3, 4, 5, 2]. When the number of training pairs is set to 5 with 1000 training rounds in Fig. 10(a), we find that the solid line increases with decreasing slope and finally converges to 1 as the training rounds increase to 600. The slope of the bottom dashed line is increasing, but the dashed line does not converge at all within 1000 training rounds. This result is quite impressive. Next, in Fig. 10(b), the cost function of [2, 3, 4, 5, 2] converges more slowly than that of [2, 3, 3, 3, 2] due to the increase in quantum perceptrons. But the value of the cost function for the residual version of [2, 3, 4, 5, 2] is always larger than that of the plain one, and the solid line has a larger convergence speed.
Comparing Fig. 10 with the earlier Figs. 7, 8 and 9, one can see that deeper Res-HQCNN brings more significant improvement, such as in the difference of convergence rate between the solid and dashed lines of the same color. As noted, a Res-HQCNN with L hidden layers uses L residual block structures to perform well. The larger L is, the bigger the parameters matrix K^l_j is, which results in a smaller suitable learning rate for Res-HQCNN. So when comparing the performance of QNNs and Res-HQCNN at a small learning rate, we find that deeper Res-HQCNN learns faster and better than the former QNNs.

Generalization: the robustness to noisy data
In this subsection, we examine the robustness of Res-HQCNN to noisy quantum data. For convenient comparison, we employ the same rule as in Beer et al. (2020) for generating noisy training data. (Fig. 11 shows the robustness of the [2, 3, 4, 2]-type networks to noisy training data with η = 1/9 and ϵ = 0.1, the same setting as in Fig. 9(a); the step-sizes between two adjacent dots are 3 and 5, respectively, and the variance of cost between the green dots and red dots is plotted on the right. For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) We find that if the number of noisy pairs is small, such as fewer than 35, both Res-HQCNN and the QNNs in Beer et al. (2020) have strong robustness to noisy quantum data. As the number of noisy pairs continues to increase, the cost values for the green dots and red dots both begin to decrease.
When the number of noisy pairs exceeds 60, the variation of cost increases, reaching its maximum value when the number of noisy pairs is 70. There are three unstable points, (55, −0.0115), (90, −0.0161) and (100, −0.0012), at which the variation is negative. As we can see, there are 21 pairs of green dots and red dots in Fig. 11. For the deeper networks [2, 3, 3, 3, 2] and [2, 3, 4, 5, 2], we are pleased to find that all variances of the cost function are positive, without unstable points. The maximum value of the variance is more than 0.35, while the one in Fig. 11(b) is no more than 0.12.
Up to now, we have gone through the experiments for Res-HQCNN with and without noise, and obtained improved performance of the cost function compared with the QNNs in Beer et al. (2020). Although we do not show results for Res-HQCNN with four or more hidden layers, we conjecture that deeper Res-HQCNN would bring better improvement of the cost function due to the mechanism of its training algorithm.

Conclusion and discussion
In this paper, we have developed a hybrid quantum–classical neural network with deep residual learning to improve the performance of the cost function for deeper networks. A new residual block structure in the quantum setting has been designed based on the QNNs in Beer et al. (2020). We have presented how to connect the residual block structure with QNNs, and the corresponding training algorithm of Res-HQCNN has been given for different cases.
From the perspective of propagating information, the residual block structure allows information to propagate from the input layer to any deeper layer, which is similar to the mechanism of ANNs with deep residual learning. The simulations have illustrated the power of Res-HQCNN, at the cost of running only on a classical computer.
There is another method to design a residual block structure in the quantum setting: a convex combination of ρ^l_in and ρ^l_out,

ρ^{l+1}_in = p (ρ^l_in ⊗ (|0⟩⟨0|)^{⊗Δm_l}) + (1 − p) ρ^l_out,  0 ≤ p ≤ 1,

where Δm_l = m_l − m_{l−1}. Such an operation is possible to implement on a quantum computer: a quantum device would choose the one state with probability p and the other with probability 1 − p. We choose the QNNs [1, 2, 1] and [2, 3, 2] for a test. When p = 0, it is just the case of the former QNNs. We randomly choose p = 0.3, p = 0.6, p = 0.9 and p = 1. From Fig. 13, we find that only the result of p = 1, i.e. ρ^{l+1}_in = ρ^l_in ⊗ (|0⟩⟨0|)^{⊗Δm_l} for l = 1, 2, …, L, brings improvement of the cost function. So we give up this method and do not try to satisfy the requirements for running on a quantum computer; we think improvement of the performance of the cost function is more meaningful for us.
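A sketch of this trace-preserving alternative (the names are ours; rho_in is assumed already padded with |0⟩⟨0| on the Δm_l extra qubits):

```python
import numpy as np

def convex_residual(rho_in_padded, rho_out, p):
    # rho^{l+1}_in = p * rho^l_in + (1 - p) * rho^l_out.
    # A convex combination of density matrices has unit trace, so the
    # result remains a valid quantum state, unlike the plain addition.
    return p * rho_in_padded + (1 - p) * rho_out

rho_in = np.diag([1.0, 0.0])            # |0><0|
rho_out = np.diag([0.5, 0.5])           # maximally mixed qubit
rho_next = convex_residual(rho_in, rho_out, p=0.3)
```

The unit trace is exactly what makes this variant implementable on quantum hardware, at the price of the weaker cost-function improvement reported in Fig. 13.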
This paper mainly focuses on how to combine deep residual learning with QNNs to improve the ability of QNNs to learn an unknown unitary transform, with or without noise. Readers can learn the advantages and disadvantages from our analysis of the Res-HQCNN model. This is just the beginning of combining quantum neural networks with residual block structures; to date, there is no related work. We hope our paper can be a useful reference in this area, which is exactly the point of this paper.
One future investigation in this field could process classical data, such as images and sounds, for which a suitable encoding rule between quantum states and real-world data should be designed. At present, scholars have proposed many methods for quantum image representation (Yan et al., 2016, 2017; Yao et al., 2017). Given the combination of quantum computing and deep residual learning, we believe the exploration of Res-HQCNN with real-world data will be interesting for some computer vision tasks. Another possible direction is to connect and compare the current results with quantum process tomography (Mohseni et al., 2008; O'Brien et al., 2004), whose main goal is to reconstruct quantum processes. Finding the connections and differences between Res-HQCNN and quantum process tomography may help in the characterization of quantum dynamical systems.
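One common encoding rule of the kind mentioned above is amplitude encoding, which maps a classical feature vector to the amplitudes of a normalized quantum state. The sketch below is an illustrative example, not a scheme proposed in this paper:

```python
import numpy as np

def amplitude_encode(x):
    """Illustrative amplitude encoding: pad a real feature vector to the
    next power-of-two dimension and normalize it into a state vector."""
    x = np.asarray(x, dtype=float)
    dim = 1 << int(np.ceil(np.log2(len(x))))  # next power of two
    padded = np.zeros(dim)
    padded[:len(x)] = x
    norm = np.linalg.norm(padded)
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    return padded / norm

# A 2-dimensional feature becomes a one-qubit state (0.6, 0.8)
state = amplitude_encode([3.0, 4.0])
print(np.allclose(np.linalg.norm(state), 1.0))  # True
```

Other rules, such as the quantum image representations cited above, trade qubit count against circuit depth differently; which one suits Res-HQCNN best is an open question.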

IV. Repeat Step II and Step III until the cost function reaches its maximum.
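The iteration of Steps II–IV can be sketched as a simple loop. The helpers `update_step` and `cost` below are hypothetical stand-ins for the paper's feedforward/update rule and fidelity cost, not the authors' implementation:

```python
def train(unitaries, pairs, update_step, cost, epsilon=0.1, rounds=600):
    """Skeleton of Steps I-IV: repeatedly apply the update rule (Steps
    II-III) and track the cost until the round budget is exhausted."""
    history = []
    for s in range(rounds):                 # Step IV: repeat II and III
        unitaries = update_step(unitaries, pairs, epsilon)
        history.append(cost(unitaries, pairs))
    return unitaries, history

# Toy check with dummy placeholder rules
dummy_update = lambda u, pairs, eps: u + eps   # placeholder update
dummy_cost = lambda u, pairs: float(u)         # placeholder cost
final, hist = train(0.0, [], dummy_update, dummy_cost, rounds=5)
print(len(hist))  # 5
```

In practice the stopping rule "until reaching the maximum" is approximated by a fixed round budget (600 rounds in the paper's experiments) or by monitoring when the cost plateaus.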
In the following, we derive a formula for K^l_j(s) to update the perceptron unitaries U^l_j(s) with l = 1, 2. We assume each unitary always acts on its current layer; for example, U^1_2(s) is actually U^1
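For reference, the update has the following shape, reconstructed here to be consistent with the p-ResQNN update formula given later in this text and with the scheme of Beer et al. (2020); the precise form of the matrix $M^l_j(s)$ is derived in the original paper:

```latex
U_j^l(s+\epsilon) = e^{i\epsilon K_j^l(s)}\, U_j^l(s),
\qquad
K_j^l(s) = \frac{\eta\, 2^{m_{l-1}}}{N} \sum_{x=1}^{N}
\operatorname{tr}_{\text{rest}}\!\left( M_j^l(s) \right),
```

where $\eta$ is the learning rate, $m_{l-1}$ is the number of nodes in layer $l-1$, and $\operatorname{tr}_{\text{rest}}$ traces out all systems except the one the perceptron unitary acts on.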

Fig. 4. The architecture of Res-HQCNN with L hidden layers. ''Res'' represents the residual block structure of Res-HQCNN in Fig. 3. Not only can the ''Res'' be connected layer by layer continuously, but it can also be connected by skipping one or more layers. The architecture of Res-HQCNN propagates information from input to output, gradually passing through a quantum feedforward neural network.

Denote ρ^{l,in}_x as the input state of layer l and ρ^{l,out}_x as the output state of layer l, with l = 1, 2, . . . , L and x = 1, 2, . . . , N. We first consider the case in which each layer l is added with a residual block structure with no skipping layer, so that t = L. The training algorithm for this kind of Res-HQCNN is given by the following steps:
I. Initialize:
I1. Set step s = 0.
I2. Choose all unitaries U^l_j(0) randomly, j = 1, 2, . . . , m_l, where m_l is the number of nodes in layer l.
II. For each layer l and each training pair (|φ^in_x⟩, |φ^out_x⟩),
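Step I2 requires randomly chosen perceptron unitaries. One standard way to realize this (an assumption on our part, not necessarily the authors' sampler) is to draw Haar-random unitaries via the QR decomposition of a complex Gaussian matrix:

```python
import numpy as np

def random_unitary(dim, rng=np.random.default_rng()):
    """Sample a Haar-random unitary via QR decomposition of a complex
    Gaussian matrix (Mezzadri's method), usable for Step I2."""
    z = (rng.standard_normal((dim, dim))
         + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    # Fix the column phases so the distribution is Haar-uniform
    d = np.diag(r)
    q = q * (d / np.abs(d))
    return q

U = random_unitary(4)  # e.g. a perceptron unitary on two qubits
print(np.allclose(U.conj().T @ U, np.eye(4)))  # True
```

The phase correction with `d / |d|` matters: without it, plain QR output is unitary but not uniformly distributed over the unitary group.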

Fig. 8. Numerical results of [2, 3, 3, 2] with residual block structure in different cases, for 10 training pairs and 600 training rounds. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
In Fig. 11(a), the cost values for Res-HQCNN [2, 3, 2] and QNN [2, 3, 2] both decrease as the number of noisy pairs increases, and the variation of the cost value is always positive. This shows the superiority of Res-HQCNN [2, 3, 2] on noisy training data with small numbers of training rounds and training pairs. When the numbers of training rounds and training pairs are large, such as 200 training rounds and 100 training pairs in Fig. 11(b),

Fig. 13. Behaviors of [1, 2, 1] and [2, 3, 2] with different p. Here λ = 1 and ϵ = 0.1. In Fig. 12(a), we find that the improvement is obvious: the variations of the cost function are always positive and, roughly speaking, the variation decreases as the number of noisy pairs increases. When the numbers of training rounds and training pairs are large, such as 600 training rounds and 100 training pairs in Fig. 12(b), it is pleasing to find that all variations of the cost function are positive, without unstable points. The maximum variation is more than 0.35, while that in Fig. 11(b) is no more than 0.12.

For convenience, we call quantum neural networks with this kind of deep residual learning p-ResQNNs. The training algorithm of a p-ResQNN also changes. For example, if a p-ResQNN has one hidden layer, then for l = 1 the update matrix is K^l_j(s) = (η 2^{m_{l−1}}/N) ∑_{x=1}^N tr_rest((1 − p) M^l_j); for l = 2, the update is unchanged, which is the same as in the definition of the cost function in Beer et al. (2020). The simulation results of the p-ResQNN with one hidden layer are shown in Fig. 13.
The good training pairs are (|φ^in_x⟩, |φ^out_x⟩), x = 1, 2, . . . , N, with an unknown unitary matrix V. The elements of |φ^in_x⟩ are randomly picked from a distribution before normalization. The noisy pairs are (|φ^in_x⟩, |θ^out_x⟩), where the desired output |θ^out_x⟩ has no direct transform relation to |φ^in_x⟩. Both good and noisy training data in the experiments are randomly generated quantum states, and the training data are generated 21 times. So we think the randomness of the training data causes some unstable points, which can be seen as comparable results. As a whole, Res-HQCNN [2, 3, 2] shows stronger robustness to noisy data than QNN [2, 3, 2]. We also test the deeper network [2, 3, 4, 2] on noisy data; see Fig. 12. When the numbers of training rounds and training pairs are small, such as 150 training rounds and 30 training pairs in
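The robustness comparisons above all rest on a fidelity-style cost averaged over training pairs. A minimal sketch, in the spirit of the cost C = (1/N) ∑_x ⟨φ^out_x| ρ^out_x |φ^out_x⟩ from Beer et al. (2020), is:

```python
import numpy as np

def cost(out_states, target_states):
    """Average fidelity between network outputs (density matrices) and
    target pure states; higher is better, with a maximum of 1."""
    fids = [np.real(np.vdot(phi, rho @ phi))
            for rho, phi in zip(out_states, target_states)]
    return float(np.mean(fids))

# Toy check: a perfect output gives cost 1, a maximally mixed one 0.5
phi = np.array([1.0, 0.0], dtype=complex)
rho_good = np.outer(phi, phi.conj())       # exactly matches the target
rho_noise = np.eye(2, dtype=complex) / 2   # maximally mixed state
print(cost([rho_good], [phi]), cost([rho_noise], [phi]))  # 1.0 0.5
```

Under this cost, noisy pairs (whose targets |θ^out_x⟩ are unrelated to the inputs) pull the achievable maximum below 1, which is why a positive variation between Res-HQCNN and plain QNN indicates robustness.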