Enhancing the expressivity of quantum neural networks with residual connections

In the current noisy intermediate-scale quantum era, research on the combination of artificial intelligence and quantum computing has developed greatly. Inspired by neural networks, developing quantum neural networks with specific structures is one of the most promising directions for improving network performance. In this work, we propose a quantum circuit-based algorithm to implement quantum residual neural networks (QResNets), where the residual connection channels are constructed by introducing auxiliary qubits to the data-encoding and trainable blocks of the quantum neural networks. Importantly, we prove that when this architecture is applied to an $l$-layer data-encoding, the number of frequency generation forms can be extended from one, namely the difference of the sums of generator eigenvalues, to $\mathcal{O}(l^2)$. The flexibility in adjusting the corresponding Fourier coefficients can also be improved, owing to the diversity of spectrum construction methods and the additional optimization degrees of freedom in the generalized residual operators. These results indicate that the residual encoding scheme can achieve better spectral richness and enhance the expressivity of various parameterized quantum circuits. Extensive numerical demonstrations in regression tasks of fitting various functions, as well as applications in image classification with the MNIST dataset, are offered to present the expressivity enhancement. Our work lays the foundation for a complete quantum implementation of classical residual neural networks and explores a new strategy for quantum feature maps in quantum machine learning.


I. INTRODUCTION
Quantum computing is a new computing paradigm based on quantum mechanics that utilizes qubits instead of classical bits to store and process information [1]. Since the theoretical concepts were proposed [2-4], quantum computers have developed at an astonishing speed, gradually moving from proof-of-principle demonstrations such as quantum supremacy in the laboratory [5-7] to the stage of application exploration [8-10]. Among its many applications, quantum machine learning is an emerging field that leverages the power of quantum computers to overcome the bottleneck of high computing power requirements in machine learning [11-14]. On the current noisy intermediate-scale quantum devices [15], one popular strategy for constructing quantum machine learning algorithms is using classical-quantum hybrid optimization loops to train parameterized quantum circuits for various learning tasks, such as pattern recognition [16,17] and classification [18-20].
Similar to the classical neural networks that consist of input layers, hidden layers and output layers, the fundamental structures of variational quantum neural networks include data-encoding or quantum feature map circuits U(x), which map the classical data x ∈ χ to a quantum state in the Hilbert space H, a variational ansatz W(θ) containing trainable parameters θ, and output layers realized by quantum measurement [21,22]. To be specific, the data-encoding processes serve as one of the main sources of non-linearity for the networks, and there exist numerous encoding strategies such as amplitude encoding and angle encoding [23]. Moreover, different choices of architecture for the variational ansatz lead to various quantum neural networks [24-32], and they greatly affect network performance such as generalization [33] and trainability [34]. For example, general deep parameterized quantum circuits suffer from the barren plateau phenomenon, leading to vanishing gradients [34-37]. This can be avoided by networks with a hierarchical structure, proposed as a realization of the quantum convolutional neural networks (QCNN) [20,25,38], for which the absence of barren plateaus has been proved [39]. Finally, the output of an n-qubit quantum neural network is the mean value of a measurable observable O,

$$f(x, \theta) = \langle\psi_0|\,U_\theta^\dagger(x)\, O\, U_\theta(x)\,|\psi_0\rangle,$$

where the initial state is |ψ0⟩ = |0⟩^{⊗n} and U_θ(x) is the parameterized quantum circuit consisting of repeatable data-encoding and trainable blocks. Interestingly, the expressivity and universality of such variational quantum models are guaranteed by the fact that one can naturally write the outputs as partial Fourier series in the network inputs [40-43]; the accessible frequencies are determined by the eigenvalues of the generator Hamiltonians in the data-encoding gates, while the coefficients are controlled by the design of the entire circuit [43].
A great deal of research has subsequently been devoted to advancing quantum neural networks, one intuitive approach being the quantization of classical networks [29-32]. In particular, inspired by the classical residual neural networks, which were proposed to alleviate the vanishing gradient problem during the training of deep neural networks [44], their quantum counterpart is promising for mitigating barren plateaus [32]. The key idea is to introduce residual connections into traditional neural networks, as shown in the figure 1. Mathematically, the residual connections provide an additional cross-layer propagation channel for the input features, leading to a basic residual unit of the form H(x) = F(x) + x, where the non-linear parameterized function F(x) represents the traditional neural network. Although there exist some works on the quantum realization of residual neural networks, the residual channels are usually implemented using classical or hybrid methods [32,45]. Research on fully quantum implementations of residual connections and their effects on expressivity is still lacking.
In this work, we address these issues by proposing a quantum algorithm for the digital simulation of quantum residual neural networks (QResNets). The residual connection channel is constructed through one ancillary qubit, and the target evolution process is embedded in a subspace. Such structures are compatible with both the data-encoding and trainable blocks in variational quantum neural networks. We further parameterize the encoding gates on the auxiliary qubit and obtain generalized residual operators. Furthermore, we find that the Fourier spectrum of the output of parameterized quantum circuits can be enriched when the residual connections are used in the data-encoding blocks. The number of frequency combination forms can be extended from one, namely the difference of the sums of generator eigenvalues, to $\mathcal{O}(l^2)$ for the $l$-layer residual encoding. Moreover, the diverse construction methods for frequencies in the residual loss functions and the extra trainable parameters in the generalized residual operators can expand the Fourier coefficient space. These results suggest that the expressivity of quantum models can be enhanced by residual connections. We offer extensive numerical demonstrations of the quantum algorithm in regression tasks of fitting Fourier series, and also present the performance of binary classification on the standard MNIST dataset of handwritten digit images, achieving an accuracy improvement of over 7% with residual encoding.
The remainder of this paper is organized as follows. We introduce the theory in Sec. II, including the realization of quantum residual connections, the proof of frequency spectrum enhancement, and the measurement scheme. Secs. III and IV give the numerical results of the proposed quantum algorithms in fitting functions and classifying handwritten character images. Finally, Sec. V concludes.

II. THEORY

A. Realization of Quantum Residual Connection
In the QResNets, there are multiple layers of repeatable data-encoding blocks U(x) and trainable parameterized ansatz W(θ), and residual connections can be adopted in some of the blocks, as shown in the figure 1. The data-encoding block consists of quantum rotation gates of the form $U(x) = e^{iHx}$, where H is a generator Hamiltonian, while the trainable circuits are composed of single- and two-qubit parameterized quantum gates W(θ) with optimization parameters θ. Some gates in the data-encoding and ansatz blocks can be sampled to add residual connections, forming quantum residual operators R(x/θ), which correspond to the residual evolution process. For an n-qubit quantum system with initial state |ϕ0⟩, the evolution under the residual operator can be expressed as

$$R(x/\theta)\,|\phi_0\rangle = \tfrac{1}{2}\left[\sigma_0 + L(x/\theta)\right]|\phi_0\rangle,$$

where σ0 is the identity matrix and L(x/θ) is a unified expression for the gates in the data-encoding and trainable blocks; that is, L(x/θ) = U(x) in the quantum feature map block and L(x/θ) = W(θ) in the optimization ansatz. Such an evolution operator can be realized within the framework of a linear combination of unitaries with one ancillary qubit, and the target quantum states are obtained by post-processing [46,47]. Specifically, we first apply a Hadamard gate to the ancillary system, followed by a controlled-L(x/θ) operator. After adding another Hadamard gate, we can measure the ancillary qubit with results m_a = 0/1 corresponding to quantum states |0⟩/|1⟩. The evolution under the residual operator is then obtained in the |0⟩⟨0| subspace. The introduction of an auxiliary qubit provides an additional channel that allows the unevolved quantum state to pass along and add to the evolved quantum state. More generally, the weight of the summation can also be adjusted by replacing the first Hadamard gate on the ancillary qubit with an R_y(2α) rotation with trainable angle α. The corresponding residual operator is then generalized to a single-optimization-angle residual operator

$$R_1(x/\theta) = \tfrac{1}{\sqrt{2}}\left[\cos\alpha\,\sigma_0 + (-1)^{m_a}\sin\alpha\,L(x/\theta)\right].$$

Such a construction does not
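The ancilla-based construction described above can be sketched numerically. The following is a minimal NumPy illustration (not the paper's code) of the linear-combination-of-unitaries circuit for a single system qubit: a Hadamard, a controlled-L, and a second Hadamard realize R = (σ0 + L)/2 in the m_a = 0 subspace of the ancilla.

```python
import numpy as np

# Minimal sketch (assumed single-system-qubit setting) of the LCU circuit:
# H on ancilla, controlled-L, H on ancilla; the m_a = 0 subspace then carries
# the un-normalized state R|phi0> = (I + L)/2 |phi0>.

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)      # Hadamard on the ancilla

def controlled(L):
    """Controlled-L with the ancilla (first tensor factor) as control."""
    top = np.hstack([np.eye(2), np.zeros((2, 2))])
    bot = np.hstack([np.zeros((2, 2)), L])
    return np.vstack([top, bot]).astype(complex)

def residual_apply(L, phi0):
    """Run H, controlled-L, H on |0>_a |phi0> and keep the m_a = 0 subspace."""
    circuit = np.kron(H, np.eye(2)) @ controlled(L) @ np.kron(H, np.eye(2))
    out = circuit @ np.kron(np.array([1.0, 0.0]), phi0)
    return out[:2]                                # amplitudes with ancilla in |0>

# Example: residual data encoding with L = U(x) = Ry(x)
x = 0.7
U = np.array([[np.cos(x/2), -np.sin(x/2)], [np.sin(x/2), np.cos(x/2)]])
phi0 = np.array([1.0, 0.0], dtype=complex)
print(np.allclose(residual_apply(U, phi0), (np.eye(2) + U) / 2 @ phi0))  # True
```

The projection onto the ancilla's |0⟩ component is exactly the post-processing step mentioned in the text.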
require a post-selection process, but rather reconstructs the target operator from the measurement results. It can be reduced to R(x/θ) with α = π/4 and m_a = 0. Similarly, a two-optimization-angle residual operator R_2(x/θ) can be constructed by replacing both Hadamard gates with parameterized rotation gates; the details are shown in appendix A. In principle, the introduction of more trainable parameters in these two generalized residual operators provides additional degrees of freedom for optimization, which can further increase the expressivity of the parameterized quantum circuits. We can therefore conclude that a general residual connection in quantum neural networks can be realized in a fully quantum circuit framework. It is also worth noting that in some special network structures such as the QCNN [25], by reusing discarded qubits, we can simulate the residual connections without additional qubits. Moreover, because the expressivity of quantum models is fundamentally limited by the data-encoding strategy, we will prove below that residual connections applied to the data-encoding block, no matter what ansatz is used, lead to better spectral richness in the Fourier series of the quantum model output, resulting in an expressivity enhancement.

B. Frequency Spectra Enhancement
It has been pointed out that the output of a parameterized quantum circuit can be expressed as a finite-term Fourier series in the input features [43],

$$f(x, \theta) = \sum_{\omega \in \Omega} c_\omega(\theta, O)\, e^{i\omega x},$$

where the frequencies ω of the spectrum Ω = {w_k − w_j | j, k ∈ [d]} depend on the d-dimensional generator of the one-layer data-encoding gate $U(x) = e^{iHx}$ with eigenequation $H|h_j\rangle = w_j|h_j\rangle$. This means that the accessible frequencies of the quantum model are constructed from the differences of the generator eigenvalues. For example, a frequently used generator is the Pauli matrix H = σ/2 with two eigenvalues w_{1,2} = ±1/2, where σ ∈ {σx, σy, σz}; such a one-layer data-encoding block then produces the frequency spectrum Ω = {0, ±1}. Moreover, the expansion coefficients c_ω(θ, O) are associated with the entire structure of the quantum circuit, including the trainable parameters θ and the observable O.
However, for a data-encoding block with a residual connection, more frequency components can be involved, realizing an improvement in the circuit's approximation ability. Assuming that the initial quantum state |ϕ0⟩ of the residual encoding block is related to the optimization parameters θ, the residual loss function can be expressed as

$$f_R(x, \theta) = \langle\phi_0|R^\dagger(x)\,O\,R(x)|\phi_0\rangle = \tfrac{1}{4}\Big[\langle\phi_0|U^\dagger(x)\,O\,U(x)|\phi_0\rangle + \langle\phi_0|O|\phi_0\rangle + 2\,\mathrm{Re}\,\langle\phi_0|O\,U(x)|\phi_0\rangle\Big].$$

It is clear that the first term produces the same frequency components as the traditional encoding scheme, whereas the second term corresponds to the zero-frequency component, independent of the input feature x. So the key lies in the third term. Because the eigenstates |h_j⟩ of the generator Hamiltonian form a complete basis, we can expand the initial quantum state and the observable as $|\phi_0\rangle = \sum_j \phi_j |h_j\rangle$ and $O = \sum_{j,k} o_{jk}\,|h_j\rangle\langle h_k|$, so that

$$\mathrm{Re}\,\langle\phi_0|O\,U(x)|\phi_0\rangle = \mathrm{Re}\sum_{j,k}\phi_j^*\, o_{jk}\,\phi_k\, e^{i w_k x}.$$

It can be seen that this part produces new frequency components for the quantum model, namely the eigenfrequencies of the generator themselves, ±w_k for k ∈ [d], rather than the differences between them. Therefore, the new spectrum of the one-layer data-encoding block with a residual connection is

$$\Omega_R = \{\, w_k - w_j,\ \pm w_k \mid j, k \in [d] \,\},$$

which indicates that the frequency generation forms of quantum neural networks with residual encoding are more diverse, and the resulting Fourier spectrum in general can be more abundant. In this case, the toy model exemplified above produces the new spectrum {0, ±1/2, ±1}, which includes more frequency components and leads to an enhanced approximation ability for the parameterized quantum circuits.
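The two spectra for the toy generator above can be enumerated in a few lines; this is a small illustrative sketch, not the paper's code.

```python
# Sketch of the spectra discussed above for the toy generator H = sigma/2
# (eigenvalues ±1/2): the traditional spectrum collects differences of
# eigenvalues, while residual encoding also contributes the eigenfrequencies
# ±w_k themselves.

w = [0.5, -0.5]                                   # eigenvalues of H = sigma/2
traditional = {wk - wj for wk in w for wj in w}
residual = traditional | {s * wk for wk in w for s in (+1, -1)}
print(sorted(traditional))   # [-1.0, 0.0, 1.0]
print(sorted(residual))      # [-1.0, -0.5, 0.0, 0.5, 1.0]
```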
A natural issue that needs to be addressed is when the residual encoding strategy behaves better than the traditional method. For the one-layer data-encoding block in quantum neural networks, the condition is that there exists a frequency component

$$\pm w_k \notin \{\, w_k - w_j \mid j, k \in [d] \,\},$$

i.e., some eigenfrequency of the generator is not already contained in the difference spectrum. Such a constraint can be satisfied in many practical cases because we usually use Pauli operators as the generator Hamiltonian.
Furthermore, for the data-encoding strategy repeated l times, either in sequence or in parallel, the traditional scheme leads to a frequency spectrum

$$\Omega_l = \Big\{\sum_{m=1}^{l} w_{j_m} - \sum_{m=1}^{l} w_{k_m} \ \Big|\ j_m, k_m \in [d]\Big\},$$

which has only one frequency combination form, namely the difference between the sums of two sets of l eigenvalues [43]. However, for the residual encoding, there are more ways to construct the spectrum, and the combination forms of the frequencies become more complex and diverse. Specifically, the frequency spectrum of a two-layer residual encoding is

$$\Omega_R^{l=2} = \Big\{\, w_{j_1} + w_{j_2} - w_{k_1} - w_{k_2},\ \pm(w_{j_1} + w_{j_2} - w_{k_1}),\ \pm(w_{j_1} + w_{j_2}),\ w_{j_1} - w_{k_1} \,\Big\},$$

which contains four kinds of frequency combination forms. More frequency generation forms in general result in a larger upper limit for the spectrum size. We can summarize by induction that for an l-layer residual encoding scheme, the number of frequency combination forms is

$$N(\Omega_R^l) = \Big(\big\lceil \tfrac{l}{2}\big\rceil + 1\Big)\Big(\big\lfloor \tfrac{l}{2}\big\rfloor + 1\Big) \in \mathcal{O}(l^2),$$

where ⌈·⌉ and ⌊·⌋ represent the round-up and round-down functions. This is a squared improvement over the traditional scheme; the details are shown in the appendix B.
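The counting can be checked by brute force. The sketch below starts from the one-layer forms DS(1,1) and DS(1,0) (difference of sums of one eigenvalue, and a bare sum), combines one more layer at a time under the rule used in appendix B, and counts the distinct DS(a, b) forms. The closed form it is compared against is our reconstruction, consistent with the counts quoted in the text (2, 4, and 6 forms for l = 1, 2, 3).

```python
import math
from itertools import product

# Brute-force sketch: each extra residual layer contributes a DS(1,1) or
# DS(1,0) factor, and the new layer's eigenvalue sum can join either side of
# the difference. We count distinct DS(a, b) forms (stored with a >= b).

def combination_forms(l):
    """Distinct DS(a, b) forms reachable with l residual encoding layers."""
    forms = {(1, 1), (1, 0)}
    for _ in range(l - 1):
        nxt = set()
        for (a, b), (c, d) in product(forms, [(1, 1), (1, 0)]):
            nxt.add(tuple(sorted((a + c, b + d), reverse=True)))
            nxt.add(tuple(sorted((a + d, b + c), reverse=True)))
        forms = nxt
    return forms

for l in range(1, 7):
    closed = (math.ceil(l / 2) + 1) * (math.floor(l / 2) + 1)
    assert len(combination_forms(l)) == closed
    print(l, closed)      # N grows as 2, 4, 6, 9, 12, 16 — i.e. O(l^2)
```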
In addition to enlarging the accessible frequency spectrum, residual encoding can also improve the flexibility of the corresponding Fourier coefficients; both properties determine the expressivity of a quantum model. The enhancement comes from two aspects: one is the introduction of additional optimization degrees of freedom in the generalized residual operators R_{1,2}(x/θ), and the other is the more diverse construction methods for frequencies and the corresponding recombination of Fourier coefficients, which means that a single frequency component can be generated by recombining different terms in the residual loss function. The latter is the reason why the residual operator R(x) can outperform the traditional encoding strategy in expanding the Fourier coefficient space without introducing additional optimization parameters. We show the expressivity improvement in detail in the numerical simulation section.

C. Measurement Scheme
To obtain the expectation value of an observable O for the quantum state R(x)|ϕ0⟩, which is embedded in the |0⟩⟨0| subspace of the ancillary qubit, we can introduce the observable $\bar{O} = |0\rangle\langle 0| \otimes O$ on the whole system. The output observation value can then be expressed as

$$f_R(x, \theta) = \langle\phi_f|\bar{O}|\phi_f\rangle = \langle\phi_0|R^\dagger(x)\,O\,R(x)|\phi_0\rangle,$$

where $|\phi_f\rangle = |0\rangle R(x)|\phi_0\rangle + |\!\perp\rangle$ is the output quantum state of the whole system and the second term |⊥⟩ is orthogonal to the first. Furthermore, because we can expand the measurement operator as $\bar{O} = (\sigma_0 + \sigma_z)/2 \otimes O$, we also have

$$f_R(x, \theta) = \tfrac{1}{2}\left[\langle\phi_f|\sigma_0 \otimes O|\phi_f\rangle + \langle\phi_f|\sigma_z \otimes O|\phi_f\rangle\right].$$

This indicates that we can obtain the residual loss function f_R(x, θ) by measuring the average expectation of the system output state |ϕf⟩ with the two observables {σ0 ⊗ O, σz ⊗ O}, which is experimentally feasible and introduces little resource overhead. For an l-layer residual encoding, we need at most l ancillary qubits, and the corresponding observables are {(σ0 + σz)^{⊗l} ⊗ O}, whose size grows exponentially with the number of residual encoding layers. In practice, we do not need to use residual feature maps in every block; inserting residual connections into some sampled data-encoding blocks can already give the networks better expressivity. In addition, the measurement scheme is compatible with existing methods for calculating the gradient of the expectation value of the quantum circuit with respect to the optimization parameters [48-50]. Using the parameter-shift rule [48], the gradient of the residual loss function with respect to a parameter θ_j can be calculated as

$$\frac{\partial f_R(x, \theta)}{\partial \theta_j} = \frac{1}{2}\left[f_R(x, \theta_j + \tfrac{\pi}{2}) - f_R(x, \theta_j - \tfrac{\pi}{2})\right],$$

where f_R(x, θ_j ± π/2) are the expectation values when the target parameter θ_j is shifted by ±π/2, respectively. Furthermore, it should be mentioned that the approximation improvement can be understood from the universal approximation property with polynomial basis functions [51], which states that a linear combination of different observations can approximate any continuous function. Based on the above analysis of the quantum models with the specific residual
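The operator identity behind this measurement scheme is elementary and can be verified directly; the observable below is an arbitrary Hermitian matrix chosen for illustration.

```python
import numpy as np

# Quick check of the identity |0><0| (x) O = (sigma_0 (x) O + sigma_z (x) O)/2,
# which lets the residual loss be read out as the average of two standard
# expectation values.

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
O = A + A.conj().T                      # a random Hermitian observable
P0 = np.diag([1.0, 0.0])                # |0><0| on the ancillary qubit
sigma_z = np.diag([1.0, -1.0])

lhs = np.kron(P0, O)
rhs = (np.kron(np.eye(2), O) + np.kron(sigma_z, O)) / 2
print(np.allclose(lhs, rhs))            # True
```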
encoding structures, we can see that such a combination of measurement results leads to an improvement in the frequency richness of the Fourier series, which enhances the expressivity of quantum neural networks. Therefore, our work can serve as a specific case bridging polynomial approximation [51] and Fourier series approximation [43], two perspectives for understanding the universal approximation property of quantum machine learning models.

III. NUMERICAL DEMONSTRATION
To demonstrate the improvement of the Fourier frequency spectrum by residual connections, we present a proof-of-principle numerical simulation with Pennylane [52], which solves regression tasks of fitting quantum models to target Fourier series. We adopt the traditional qubit encoding strategy to map the classical data x into a quantum state with a single-qubit Pauli rotation $U(x) = R_y(x) = e^{-ix\sigma_y/2}$, where the generator Hamiltonian G = −σy/2 has two eigenvalues e_{1,2} = ±1/2. The optimization ansatz has two arbitrary single-qubit rotation gates U(θ_i), i = 1, 2, placed before and after the data-encoding block, resulting in a quantum model U_θ(x) = U(θ_2)U(x)U(θ_1). The observable is σz, and the loss function is f(x, θ) = ⟨0|U†_θ(x) σz U_θ(x)|0⟩. The quantum models are trained in a supervised learning frame to search for the optimal parameters θ*, which minimize the mean squared error (MSE)

$$\mathrm{MSE}(\theta) = \frac{1}{D}\sum_{i=1}^{D}\left[f(x_i, \theta) - y(x_i)\right]^2,$$

where D is the size of the data set and y(·) is the target function. We use the Adam optimizer with at most 200 steps and set the learning rate to 0.3 with batch size 0.7D in the simulation. A termination condition for optimization convergence, namely that the variance of ten consecutive loss function values is less than 10^{-8}, is also used.
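This one-qubit model can be reproduced with plain NumPy (the paper uses Pennylane). The sketch below builds the model output for one choice of parameters and checks numerically, via a discrete Fourier transform, that its spectrum is confined to {0, ±1}; the particular Rz/Ry decomposition of the two rotations is our own choice for illustration.

```python
import numpy as np

# Sketch (not the paper's Pennylane code) of the one-qubit model
# f(x, theta) = <0| U_theta(x)^dag sigma_z U_theta(x) |0>, with a single
# Ry(x) encoding gate between two general rotations, plus a check that the
# output is a Fourier series supported on frequencies {0, ±1}.

def ry(a):
    return np.array([[np.cos(a/2), -np.sin(a/2)],
                     [np.sin(a/2),  np.cos(a/2)]], dtype=complex)

def rz(a):
    return np.diag([np.exp(-1j*a/2), np.exp(1j*a/2)])

def model(x, th):
    U1 = rz(th[1]) @ ry(th[0])          # rotation before the encoding gate
    U2 = ry(th[2]) @ rz(th[3])          # rotation after the encoding gate
    psi = U2 @ ry(x) @ U1 @ np.array([1.0, 0.0], dtype=complex)
    return float(np.real(psi.conj() @ np.diag([1.0, -1.0]) @ psi))

xs = np.linspace(0, 2*np.pi, 64, endpoint=False)
ys = np.array([model(x, (0.3, 0.9, 1.1, 0.4)) for x in xs])
c = np.fft.fft(ys) / len(xs)            # discrete Fourier coefficients
support = {k for k in range(64) if abs(c[k]) > 1e-10}
print(support)                          # subset of {0, 1, 63}: frequencies {0, ±1}
```

This is the frequency limitation that the residual encoding of the next paragraphs is designed to overcome.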
As shown in the figure 2, this quantum model can learn functions of the form $y_1(x) = \sum_{\omega_i \in \Omega_1}(a\, e^{i\omega_i x} + a^* e^{-i\omega_i x})$ with an MSE value ∆ = 6.0 × 10^{-5}, where a is an amplitude parameter and the frequency spectrum is Ω1 = {ω0 = 0, ω1 = 2|e_{1,2}| = 1}, consistent with the results in [43]. However, a multi-frequency function with spectrum Ω2 = {ω0 = 0, ω1 = 1, ω2 = 0.5} cannot be well fitted, with an error ∆ = 5.1 × 10^{-2}, due to the missing frequencies of the parameterized quantum circuit caused by the data-encoding strategy. The frequency mismatch can be mitigated by inserting residual connections into the data-encoding block, giving an output MSE value ∆ = 5.1 × 10^{-5}, because the resulting residual operator R(x) brings richer frequency components that enhance the circuit expressivity. It is worth noting that the residual data-encoding scheme still works well for the spectrum Ω1 besides Ω2, and the optimization process converges quickly.
Furthermore, we turn to a more general case of fitting the function $y_2(x) = \sum_{\omega_i \in \Omega_2}(a_{\omega_i} e^{i\omega_i x} + a_{\omega_i}^* e^{-i\omega_i x})$, where the amplitudes can differ for each frequency component. Additional degrees of freedom can be obtained from the multiple combination methods of single-frequency components in the residual loss functions and from the parameterized gates on the auxiliary qubit in the generalized residual operators R_{1,2}(x/θ). We can conclude from the numerical results in the figure 3 that the traditional encoding scheme still cannot fit the target function, with an MSE value ∆ = 0.09, while the residual feature map with the R(x) operator works better, with an error ∆ = 2.1 × 10^{-3}. When we use the generalized residual operators, the fitting results can be further improved, converging to smaller MSE values of ∆ = 1.1 × 10^{-4} for R_1(x) and ∆ = 1.7 × 10^{-4} for R_2(x) in fewer optimization steps (77 steps for R_1(x) and 55 steps for R_2(x)). Moreover, the extra combination forms and trainable parameterized quantum gates bring more flexibility for fitting, expanding the Fourier coefficient space. As shown in the figure 4, we sample the quantum models 1000 times with different feature maps producing Fourier series, and then obtain the distribution of Fourier coefficients. We can see that under the same ansatz, the residual feature map with the R_2(x) operator has the widest Fourier coefficient distribution, and all three residual encodings are better than the traditional encoding scheme.

FIG. 4: (a-c) The real and imaginary parts of the Fourier coefficients sampled from 1000 random quantum models. (d) Quantum models with one-layer data-encoding structure. The quantum models share the same ansatz but vary the data-encoding strategies by traditional encoding (gray) and residual feature maps with the R(x) (red), R1(x) (green) and R2(x) (blue) operators. The distribution of coefficients widens from gray to red to green to blue.
In addition, this enhancement can be quantitatively measured by a commonly used expressibility metric [53]. We first generate many pairs of parameters Θ1 and Θ2 randomly and calculate the distribution P_F of state fidelities F = |⟨0|U†_{Θ1}(x)U_{Θ2}(x)|0⟩|², which measures the overlap of the quantum states generated by the quantum models. Then the Kullback-Leibler (KL) divergence [54] is used to quantify the circuit expressivity by comparing the sampled fidelity distribution with that of the Haar-distributed state ensemble P_Haar as

$$D_{KL}(P_F \,\|\, P_{\mathrm{Haar}}) = \sum_{j} P_F(F_j)\, \ln\frac{P_F(F_j)}{P_{\mathrm{Haar}}(F_j)},$$

where the analytical form of the fidelity distribution for the ensemble of Haar random states is $P_{\mathrm{Haar}}(F) = (N-1)(1-F)^{N-2}$ and N is the dimension of the Hilbert space [55]. A smaller KL divergence value corresponds to a more favorable expressibility. We sample each quantum model in the figure 4 and compute the corresponding KL divergence values, the lowest being 0.0429. We can see that the generalized residual operators can indeed increase the circuit expressivity relative to the traditional encoding scheme. Moreover, it is worth mentioning that the reasons for the expressivity enhancement are different for the R(x) and R_{1,2}(x) operators: the former is due to the diverse construction methods of frequencies in the residual loss function, while the latter is also due to the additional optimization parameters. It is known that constructing frequencies only from the difference between the sums of the generator's eigenvalues limits access to higher-order components, resulting in a reduction in coefficient variance [43]. Therefore, the residual encoding method, which offers more ways to construct frequencies, can broaden the distribution of Fourier coefficients, suggesting an enhanced expressivity of quantum models through residual connections.
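The expressibility metric above can be sketched end to end: sample random parameter pairs, histogram the fidelities, and compare with the Haar prediction. The toy single-qubit circuit below is an assumption for illustration only, not one of the paper's models.

```python
import numpy as np

# Sketch of the expressibility metric [53]: fidelity histogram of random
# parameter pairs vs. the Haar distribution P_Haar(F) = (N-1)(1-F)^(N-2),
# compared through the KL divergence. The Ry-only circuit is a toy choice.

rng = np.random.default_rng(1)

def ry(a):
    return np.array([[np.cos(a/2), -np.sin(a/2)], [np.sin(a/2), np.cos(a/2)]])

def state(th):                          # |psi(th)> = Ry(th2) Ry(th1) |0>
    return ry(th[1]) @ ry(th[0]) @ np.array([1.0, 0.0])

fids = [abs(state(rng.uniform(0, 2*np.pi, 2)) @
            state(rng.uniform(0, 2*np.pi, 2)))**2 for _ in range(5000)]

bins = np.linspace(0, 1, 51)
p_f = np.histogram(fids, bins=bins)[0].astype(float)
p_f /= p_f.sum()
mids = (bins[:-1] + bins[1:]) / 2
N = 2                                   # Hilbert-space dimension (one qubit)
p_haar = (N - 1) * (1 - mids)**(N - 2)
p_haar /= p_haar.sum()
kl = float(np.sum(p_f * np.log(np.maximum(p_f, 1e-12) / p_haar)))
print(kl >= 0.0)                        # True; smaller KL = closer to Haar
```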
Moreover, similar to the traditional encoding, we can extend the accessible frequency spectrum by repeating the residual encoding block multiple times in sequence or in parallel. To investigate the frequency extension by sequential and parallel repetitions of data-encoding, we fit the aforementioned target function y2(x) with a more complex spectrum Ω3 = {ω0 = 0, ω1 = 1, ω2 = 0.5, ω3 = 1.5, ω4 = 2} and amplitudes a_0 = 0.1 and a_{1.5,2} = 5a_{1,0.5} = 0.15 + 0.15i. Two-layer repeating structures for the traditional encoding in sequence and for the residual encoding with R_2(x) operators in sequence and in parallel are used, as shown in the figure 5. The single-qubit observable is O = σz for all cases. All the quantum models were trained for at most 200 steps using the Adam optimizer with batch size 16. We can see that both the sequential and parallel repetitions of residual encoding can extend the Fourier spectrum and fit the target function well. The MSE values and optimization steps are ∆ = 3.3 × 10^{-4} and 159 steps for the sequential repetition, and ∆ = 4.2 × 10^{-4} and 115 steps for the parallel repetition. It should be clarified that the mixed use of residual and traditional encoding also brings enhanced expressivity. Therefore, replacing some, but not all, of the encoding blocks in complex quantum models with residual blocks can enrich the expressivity of the whole network.

IV. APPLICATION IN IMAGE CLASSIFICATION
In this part, we turn to discuss the performance of the QCNN algorithm with residual encoding for image classification using the real-world MNIST dataset. MNIST includes 60000 (10000) images in the train (test) dataset with 10 classes of handwritten digits, and each image has 28 × 28 pixels. Here we focus on binary classification with the selected classes 0 and 1, for which the train and test datasets contain 12665 and 2115 images. Constrained by current quantum hardware, high-dimensional data usually require classical pre-processing techniques for dimensionality reduction, and we adopt principal component analysis (PCA) to match the input data with the four-qubit data-encoding layer [56]. For comparison, we use qubit encoding and consider the case where no residual connection is added and the cases where the residual operator R_2(x) is applied to the i-th qubit, denoted as the traditional and residual-Q_i schemes, respectively. The ansatz for the QCNN algorithm is composed of a series of alternating convolutional and pooling layers [25], as shown in the figure 6. Each convolutional layer includes several single- and two-qubit parameterized quantum gates, keeping a translationally invariant structure. We use Ising interactions between adjacent qubits with one parameter, $ZZ(\phi) = e^{-i\sigma_z \otimes \sigma_z \phi/2}$, and single-qubit U3 gates with three parameters,

$$U_3(\theta, \phi, \delta) = \begin{pmatrix} \cos(\theta/2) & -e^{i\delta}\sin(\theta/2) \\ e^{i\phi}\sin(\theta/2) & e^{i(\phi+\delta)}\cos(\theta/2) \end{pmatrix}. \quad (16)$$

The pooling layer is implemented by a parameterized controlled-U3 gate, after which one qubit is traced out, reducing the quantum state from two qubits to a single qubit. We measure the expectation value ⟨σz⟩_i on the output qubit for the i-th input data and assign the label c_i = 0/1 according to the sign of ⟨σz⟩_i when |⟨σz⟩_i| ≥ ϵ, while other values are marked as unclassifiable optimization results. A smaller value of ϵ represents higher optimization accuracy and higher classification standards. The optimization results for the cost function and accuracy are shown in the figure 7 and
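The two gate families used in the convolutional layers can be written down explicitly; the sketch below constructs both matrices and verifies unitarity for one choice of angles.

```python
import numpy as np

# Sketch of the QCNN building blocks described above: the Ising-type
# ZZ(phi) = exp(-i sigma_z (x) sigma_z phi/2) gate and the general
# single-qubit U3 gate of equation (16), with a unitarity check.

def zz(phi):
    signs = np.array([1.0, -1.0, -1.0, 1.0])   # diagonal of sigma_z (x) sigma_z
    return np.diag(np.exp(-1j * phi / 2 * signs))

def u3(theta, phi, delta):
    return np.array([
        [np.cos(theta/2),                  -np.exp(1j*delta) * np.sin(theta/2)],
        [np.exp(1j*phi) * np.sin(theta/2),  np.exp(1j*(phi+delta)) * np.cos(theta/2)],
    ])

for g in (zz(0.8), u3(0.4, 1.2, -0.7)):
    print(np.allclose(g.conj().T @ g, np.eye(len(g))))   # True, True
```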
table I. We set ϵ = 0.1 in the simulation, and there are 20 free parameters involved in the ansatz. We can conclude that the residual encoding schemes obtain smaller convergence values of the loss than the traditional encoding method, which means that the models have better approximation ability. Such an enhancement can lead to better expressivity and higher accuracy for quantum models in complex learning tasks. In addition, the residual encoding produces a high classification accuracy, reaching 92.85% and 92.47% on average for the train and test datasets respectively, about 7.74% and 7.57% higher than with the traditional encoding strategy.

V. CONCLUSION
In summary, we have proposed a complete quantum circuit-based architecture for the digital implementation of quantum residual neural networks, dubbed QResNets.
The classical residual connection channel is quantized by adding an auxiliary qubit to the data-encoding and trainable blocks, and is then generalized with additional parameterized gates. We further prove mathematically that the Fourier spectrum of the quantum model output can be enriched when the residual connections are applied to the data-encoding blocks. There is a squared improvement in the number of frequency generation forms of residual encoding over the traditional schemes: the $l$-layer residual encoding strategy can produce $\mathcal{O}(l^2)$ frequency combination methods, rather than only the difference of the sums of generator eigenvalues as in the traditional methods. Moreover, the diverse spectrum construction methods in the residual loss functions and the additional optimization degrees of freedom in the generalized residual operators make the Fourier coefficients more flexible, favoring access to higher-order components. This indicates that residual encoding can enrich the spectrum and broaden the Fourier coefficient distribution, that is, enhance the expressivity of various parameterized quantum circuits. Various numerical simulations of fitting Fourier-series functions, and a demonstration of binary classification of handwritten digit images from the MNIST dataset, are conducted to show the algorithm performance. Compared with the traditional encoding, the accuracy of residual encoding is improved by about seven percent. Our work advances the design of quantum neural networks with specific structures, enables for the first time a fully quantum realization of classical residual connections, and provides a new quantum feature map strategy.

Appendix A: Generalized Residual Operators

When the generalized residual operator R_{1,2}(x) is used in the data-encoding block, the residual loss function is

$$f_{R_{1,2}}(x, \theta) = A_1\,\langle\phi_0|U^\dagger(x)\,O\,U(x)|\phi_0\rangle + A_2\,\langle\phi_0|O|\phi_0\rangle + A_3\,\mathrm{Re}\,\langle\phi_0|O\,U(x)|\phi_0\rangle,$$

where the trainable coefficients for the R_1(x) operator are $A_1^{R_1}(\alpha) = \sin^2\alpha/2$, $A_2^{R_1}(\alpha) = \cos^2\alpha/2$ and $A_3^{R_1}(\alpha) = (-1)^{m_a}\sin 2\alpha/2$, while for the R_2(x) operator they are $A_1^{R_2}(\alpha, \eta) = (\sin\alpha\sin\eta)^2$, $A_2^{R_2}(\alpha, \eta) = (\cos\alpha\cos\eta)^2$ and $A_3^{R_2}(\alpha, \eta) = (\sin 2\alpha \sin 2\eta)/2$. Such an extension offers additional degrees of freedom for the optimization process and relaxes the range of the Fourier coefficient of the new frequency component $w_k$ in equation 6 to $A_3^{R_{1,2}} \sum_j \phi_j^* o_{jk} \phi_k$, and a similar effect holds for the other frequency components. In fact, the generalized residual loss function $f_{R_{1,2}}(x, \theta)$ can be seen as a weighted version of the residual loss function $f_R(x, \theta)$, where the weight of each term is trainable.
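The coefficient structure of the generalized residual loss can be checked numerically. The sketch below assumes (consistently with the main text, for m_a = 0) that the R_1 circuit implements R_1 = (cos α · σ0 + sin α · L)/√2 in the kept subspace, and verifies that the direct expectation value matches the A_1, A_2, A_3 coefficient form.

```python
import numpy as np

# Assumed construction for m_a = 0: R1 = (cos(a) I + sin(a) L)/sqrt(2).
# Check that <phi0| R1^dag O R1 |phi0> equals
# A1 <L^dag O L> + A2 <O> + A3 Re<O L>, with
# A1 = sin^2(a)/2, A2 = cos^2(a)/2, A3 = sin(2a)/2.

alpha, x = 0.6, 0.9
L = np.array([[np.cos(x/2), -np.sin(x/2)], [np.sin(x/2), np.cos(x/2)]])  # Ry(x)
phi0 = np.array([1.0, 0.0])
O = np.diag([1.0, -1.0])                                                 # sigma_z

R1 = (np.cos(alpha) * np.eye(2) + np.sin(alpha) * L) / np.sqrt(2)
f_direct = (R1 @ phi0) @ O @ (R1 @ phi0)

A1, A2, A3 = np.sin(alpha)**2 / 2, np.cos(alpha)**2 / 2, np.sin(2*alpha) / 2
f_coeff = (A1 * (L @ phi0) @ O @ (L @ phi0)
           + A2 * phi0 @ O @ phi0
           + A3 * np.real(phi0 @ O @ L @ phi0))
print(np.isclose(f_direct, f_coeff))   # True
```

At α = π/4 the coefficients reduce to (1/4, 1/4, 1/2), recovering the plain residual loss f_R(x, θ).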

Appendix B: Proof of Frequency Combination Forms
As mentioned above, there are four kinds of combination forms for frequency generation with a two-layer residual encoding. When another residual encoding layer is added, the spectrum $\Omega_R^{l=1} = \{w_k - w_j, \pm w_k \mid j, k \in [d]\}$ is combined with the spectrum $\Omega_R^{l=2}$, and the component given by the difference of sums of generator eigenvalues brings new frequency components for the three-layer residual spectrum. We can combine the above cases for frequency generation and simply mark the combination form $\pm\big(\sum_{m=1}^{l_1} w_{j_m} - \sum_{n=1}^{l_2} w_{k_n}\big)$ with $l_1, l_2 \ge 1$ as DS(l_1, l_2), which denotes the difference between the sums of two sets with l_1 and l_2 eigenvalues. Note that we mark the combination form $\pm\sum_{m=1}^{l} w_{j_m}$ as DS(l, 0). Then we find that there are six kinds of frequency combination forms for the three-layer residual encoding, namely {DS(3, 3), DS(3, 2), DS(3, 1), DS(3, 0), DS(2, 2), DS(2, 1)}. Further, for the l-layer residual encoding, the spectrum with its various frequency generation forms can be formally expressed as

$$\Omega_R^l = \left\{\, DS(l_1, l_2) \ \middle|\ l_2 \le l_1 \le l,\ \ l_1 + l_2 \ge l \,\right\}.$$

It can be concluded that, compared with the traditional encoding method, which generates frequencies only with DS(l, l) [43], there is a squared improvement in the number of frequency generation methods for the residual encoding scheme, with $N(\Omega_R^l) \propto \mathcal{O}(l^2)$. While different combinations may produce some of the same frequency components, in general more frequency generation methods mean that the possible upper bound on the size of the Fourier spectrum of the quantum model output can be larger, allowing for more complex learning tasks. Moreover, the diverse construction methods for frequencies can also improve the flexibility of the Fourier coefficients, favoring access to higher-order components and further improving the expressivity of quantum models.

FIG. 1: (a) A schematic of the quantum neural networks with residual connections. The quantum feature map circuit U(x) and the trainable variational circuit W(θ) are implemented repeatedly to form the multilayer structure. The R(x/θ) blocks labeled in red represent the data-encoding gates U(x) and parameterized gates W(θ) with residual connections. (b) The classical residual unit and its quantum counterpart. The residual connection channels are shown with blue arrows, and the output of the residual block is H(x) = F(x) + x, where the non-linear function F(x) represents the classical neural network. The quantum residual operator R(x/θ) applied to the initial state |ϕ0⟩ can be realized in the subspace of an ancillary qubit with measurement results ma = 0/1. (c) The residual feature map introduces more frequency components (blue) into the original spectrum of the quantum neural network (gray), and also makes the Fourier expansion coefficients more flexible.

FIG. 2: The fitting results of quantum models for the target function y1(x) with frequency spectra Ω1 = {0, 1} (a, b) and Ω2 = {0, 1, 0.5} (c, d). The top panels show the theoretical function values (black dashed lines) and the quantum model outputs with the traditional (gray) and residual (red) encoding strategies, respectively. The bottom panels show the MSE values during the training processes.
FIG. 3: (a) The fitting results of quantum models for the target function y2(x) with the traditional encoding scheme (gray) and the residual feature map with the R(x) (red), R1(x) (green) and R2(x) (blue) operators, respectively. (b) The MSE values during the training processes.

FIG. 5: (a) The fitting results of quantum models with two-layer data-encoding for the target function y2(x) with frequency spectrum Ω3. (b) The MSE values during the training processes. (c) Quantum models with the two-layer data-encoding structure. The residual operator R2(x) is repeated in sequence and in parallel, and the output is the measurement value ⟨σz⟩ on a qubit.
by 1000 times and use 45 histogram bins to estimate the fidelity distribution, which is then compared with the sampled fidelity ensemble of Haar random states. The computed results of the KL divergence are $D^{\mathrm{trad}}_{\mathrm{KL}} = 0.0634$, D
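The expressibility comparison described here, sampling circuit-state fidelities, binning them into 45 histogram bins, and computing the KL divergence against the analytic Haar fidelity distribution $P_{\mathrm{Haar}}(F) = (N-1)(1-F)^{N-2}$ with $N = 2^n$, can be sketched as follows. The function names and the inverse-CDF sampling used for the sanity check are illustrative assumptions, not the authors' code.

```python
import numpy as np

def expressibility_kl(fidelities, n_qubits, bins=45):
    """KL divergence between a sampled fidelity histogram and the analytic
    Haar distribution P_Haar(F) = (N - 1) * (1 - F)**(N - 2), N = 2**n."""
    N = 2 ** n_qubits
    hist, edges = np.histogram(fidelities, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()  # empirical bin probabilities
    # Haar probability mass per bin, from the exact CDF 1 - (1 - F)**(N - 1)
    q = (1 - edges[:-1]) ** (N - 1) - (1 - edges[1:]) ** (N - 1)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Sanity check: fidelities drawn from the Haar distribution itself should
# give a KL divergence close to zero (inverse-CDF sampling of P_Haar).
rng = np.random.default_rng(0)
N = 2 ** 4
haar_f = 1 - rng.random(100_000) ** (1 / (N - 1))
print(expressibility_kl(haar_f, n_qubits=4))
```

A smaller KL divergence indicates an ensemble closer to Haar-random states, i.e. higher expressibility, which is the sense in which the values quoted above are compared.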

FIG. 6: A schematic of the QCNN algorithm with residual encoding for image classification. The handwritten digits are encoded as quantum states via the quantum feature map, where the green blocks represent the qubit encoding scheme and the red blocks are the residual encoding with R2(xi) operators on the i-th qubit. The multiple convolutional (C) and pooling (P) layers use quantum gates with trainable parameters θ, and their detailed structures are shown below. The measurement outcome ⟨σz⟩ of the quantum circuit is used to calculate the cost function C(θ) and characterize the binary classification results c0/1. The classical computer updates the optimization parameters of the QCNN algorithm based on gradients until the cost function converges.

FIG. 7: The performance of the QCNN algorithm with different data-encoding strategies for image classification. Simulations with the traditional scheme and with residual encoding on qubits Q0 and Q2 on the training and test datasets are shown. Panel (a) shows the evolution of the cost function with optimization steps, and panel (b) shows the corresponding accuracy results.

TABLE I: The average accuracy obtained from twenty repetitions of training for image binary classification on the MNIST dataset using different data-encoding strategies.