Leveraging Data Locality in Quantum Convolutional Classifiers

Quantum computing (QC) has opened the door to advancements in machine learning (ML) tasks that are currently implemented in the classical domain. Convolutional neural networks (CNNs) are classical ML architectures that exploit data locality and possess a simpler structure than fully connected multi-layer perceptrons (MLPs) without compromising the accuracy of classification. However, the concept of preserving data locality is usually overlooked in the existing quantum counterparts of CNNs, particularly for extracting multiple features from multidimensional data. In this paper, we present a multidimensional quantum convolutional classifier (MQCC) that performs multidimensional and multifeature quantum convolution with average and Euclidean pooling, thus adapting the CNN structure to a variational quantum algorithm (VQA). The experimental work was conducted using multidimensional data to validate the correctness and demonstrate the scalability of the proposed method utilizing both noisy and noise-free quantum simulations. We evaluated the MQCC model with reference to reported work on state-of-the-art quantum simulators from IBM Quantum and Xanadu using a variety of standard ML datasets. The experimental results show the favorable characteristics of our proposed techniques compared with existing work with respect to a number of quantitative metrics, such as the number of training parameters, cross-entropy loss, classification accuracy, circuit depth, and quantum gate count.


Introduction
The choice of an appropriate machine learning model for a specific application requires consideration of the size of the model, since size is linked to performance [1]. Considering this factor, convolutional neural networks (CNNs) are preferable to multi-layer perceptrons (MLPs) because of their smaller size and reduced training time while maintaining high accuracy [2,3]. Preserving the spatiotemporal locality of data allows CNNs to eliminate unnecessary data connections and therefore reduces their memory requirements [2,3]. This also reduces the number of required training parameters and thus incurs less training time [2,3].
In the context of quantum computing, great emphasis has been given to quantum-based machine learning, and, in recent years, various techniques have been devised to develop this field [4]. Contemporary quantum machine learning (QML) techniques can be considered hybrid quantum-classical variational algorithms [5][6][7][8][9]. Generally, variational quantum algorithms (VQAs) utilize parameterized rotation gates in a fixed quantum circuit structure, usually called an ansatz, which is optimized using classical techniques such as gradient descent [5][6][7][8][9]. However, as with MLPs, preserving data locality is challenging for QML algorithms. For instance, the multidimensionality of input datasets is ignored in contemporary QML algorithms, with the data flattened into one-dimensional arrays [5][6][7][8][9]. Furthermore, the absence of a generalizable technique for quantum convolution limits the ability of QML algorithms to directly adapt CNN structures.
In this work, we present a multidimensional quantum convolutional classifier (MQCC) to address the shortcomings of existing quantum counterparts of CNNs in preserving the locality of multidimensional input data. The proposed VQA technique leverages quantum computing to reduce the number of training parameters and the time complexity compared with classical CNN models. Similar to CNN structures, MQCC contains a sequence of convolution and pooling layers for multifeature extraction from multidimensional input data and a fully connected layer for classification.
The subsequent sections of this paper are organized as follows. Section 2 discusses fundamental background information regarding basic and complex quantum operations. Section 3 highlights existing works related to the proposed techniques. The proposed methodology is introduced in Section 4, with details given for its constituent parts. The experimental results and an explanation of the verification metrics are presented in Section 5. Further discussion of the obtained results is provided in Section 6. Finally, Section 7 concludes this work with potential future directions.

Background
In this section, we present background information pertaining to quantum computing and quantum machine learning. In particular, we present the quantum gates and operations that are utilized in the proposed multidimensional quantum convolutional classifier (MQCC). In addition, interested readers may find fundamental details related to quantum information and computing in Appendix A.

Quantum Measurement and Reset
The quantum measurement operation on a qubit is usually, and informally, referred to as a measurement "gate". The measurement gate is a nonunitary operation that projects the quantum state of a qubit |ψ⟩ onto the |0⟩ or |1⟩ basis state [10]. The likelihood of measuring any basis state is the squared magnitude of the corresponding basis-state coefficient. For an n-qubit register |ψ⟩ with 2^n possible basis states, the probability of measuring the register in a particular basis state |j⟩, where 0 ≤ j < 2^n, is given by |c_j|^2 [11]. The classical output of n-qubit amplitude-encoded [12] data can be decoded as ψ_decoded. This classical output vector can be reconstructed from the square root of the probability distribution P(|ψ⟩), as shown in (1), (2), and Figure 1. When amplitude encoding [12] is used to encode positive real classical data, the coefficients of the corresponding quantum pure state [10] |ψ⟩ are also positive real, i.e., c_j ∈ R^+, where 0 ≤ j < 2^n. Thus, the amplitudes of |ψ⟩ are numerically equal to the coefficients of ψ_decoded, i.e., |ψ⟩ = ψ_decoded. Therefore, the quantum state |ψ⟩ can be completely determined from the measurement probability distribution, i.e., |ψ⟩ = √P(|ψ⟩), only when the amplitudes of the quantum state are all positive real values. Moreover, the probability distribution P(|ψ⟩) can be reconstructed by repeatedly measuring (sampling) the quantum state |ψ⟩. In general, on the order of 2^n measurements are required to accurately reconstruct the probability distribution. To reduce the effects of quantum statistical noise, it is recommended to gather as many circuit samples (shots) [13] as possible.
ψ_decoded = √P(|ψ⟩) = [√p_0, √p_1, …, √p_{N−1}]^T, where p_j = |c_j|^2 and 0 ≤ j < N (1)

The reset operation sets the state of qubits to |0⟩. This operation consists of a mid-circuit measurement gate followed by a conditional X gate [14,15] such that the bit-flip operation is applied when the measured qubit is in state |1⟩. The reset gate and its equivalent circuit are both shown in Figure 2 [15].
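To make the sampling argument concrete, the following numpy sketch (our own illustration, not from the paper) amplitude-encodes a positive real vector, simulates shot-based measurement, and recovers the state as the square root of the estimated probability distribution:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# A positive real vector, amplitude-encoded into a 3-qubit state (N = 2^3 = 8).
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
psi = data / np.linalg.norm(data)        # amplitude encoding: c_j = x_j / ||x||

# Sampling (shots): measurement outcomes follow P(|psi>) with p_j = |c_j|^2.
shots = 1_000_000
outcomes = rng.choice(len(psi), size=shots, p=psi**2)
p_est = np.bincount(outcomes, minlength=len(psi)) / shots

# For positive real amplitudes, |psi> = sqrt(P(|psi>)).
psi_est = np.sqrt(p_est)
print(np.max(np.abs(psi_est - psi)))     # statistical error shrinks with more shots
```

Increasing `shots` reduces the statistical noise in `psi_est`, mirroring the recommendation to gather as many circuit samples as possible.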

Classical-to-Quantum (C2Q)
There are a number of quantum data encoding techniques [12,16], each of which uses a different method to initialize quantum circuits from the ground state. Among the many methods, this work leverages the classical-to-quantum (C2Q) arbitrary state synthesis [12,16] operation to perform amplitude encoding and initialize an n-qubit state |ψ_0⟩; see Figure 3. The C2Q operation employs a pyramidal structure of multiplexed R_y and R_z gates. It should be noted that the R_z gates are not required for positive real data. Thus, for positive real data, the circuit depth is 2 · 2^n − n − 2, while for complex data, the circuit depth is 3 · 2^n − n − 4 [12].
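As an illustration of the R_y pyramid for positive real data, the following sketch (our own, with hypothetical helper names `ry_angles` and `synthesize`) computes the multiplexed-R_y rotation angles recursively and verifies that they reproduce the encoded amplitudes; the pyramid uses 2^n − 1 rotations for an n-qubit state:

```python
import numpy as np

def ry_angles(a, angles=None):
    """Preorder list of multiplexed-Ry angles of the C2Q pyramid
    for a positive real, normalized amplitude vector a (length 2^n)."""
    if angles is None:
        angles = []
    if len(a) == 1:
        return angles
    half = len(a) // 2
    l, r = np.linalg.norm(a[:half]), np.linalg.norm(a[half:])
    angles.append(2 * np.arctan2(r, l))   # Ry(theta)|0> = [cos t/2, sin t/2]
    for part, nrm in ((a[:half], l), (a[half:], r)):
        ry_angles(part / nrm if nrm > 0 else part, angles)
    return angles

def synthesize(angles_iter, n):
    """Rebuild the state the Ry pyramid prepares (positive real data only)."""
    if n == 0:
        return np.array([1.0])
    theta = next(angles_iter)
    left = np.cos(theta / 2) * synthesize(angles_iter, n - 1)
    right = np.sin(theta / 2) * synthesize(angles_iter, n - 1)
    return np.concatenate([left, right])

data = np.array([1.0, 2.0, 3.0, 4.0])
psi = data / np.linalg.norm(data)
angles = ry_angles(psi)                  # 2^2 - 1 = 3 angles
rebuilt = synthesize(iter(angles), 2)
print(np.allclose(rebuilt, psi))         # True
```

This only covers the positive-real case; complex data would additionally require the R_z pyramid to impose phases.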

Convolutional Neural Networks (CNNs)
CNNs are among the most widely used types of deep neural networks for image classification [17]. A CNN consists of convolutional, pooling, and fully connected layers. The convolutional layer applies multiple filters to the input to create feature maps. The pooling layer reduces the dimensionality of each feature map while retaining the most important information. The most common pooling techniques include max pooling, average pooling, and sum pooling. Fully connected layers are the last layers of the CNN, and their inputs correspond to the flattened one-dimensional vector generated by the last pooling layer. Activation functions, which are essential for handling complex patterns, are used throughout the network. Finally, a softmax prediction layer is used to generate probability values for each of the possible output labels. The label with the highest probability is selected as the final prediction.
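The layer sequence described above can be sketched as a minimal numpy forward pass (a toy illustration with arbitrary random weights, not any specific network from the paper):

```python
import numpy as np

def conv2d_valid(x, k):
    """Direct 2-D 'valid' convolution (cross-correlation, as in CNNs)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def avg_pool2(x):
    """2x2 average pooling (assumes even dimensions)."""
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.random((8, 8))                 # toy input
kernel = rng.random((3, 3))                # one convolution filter
fmap = np.maximum(conv2d_valid(image, kernel), 0.0)   # conv + ReLU -> 6x6
pooled = avg_pool2(fmap)                   # average pooling -> 3x3
logits = rng.random((4, 9)) @ pooled.ravel()          # fully connected layer
probs = softmax(logits)                    # prediction over 4 classes
```

The `probs` vector sums to one, and in a trained network the argmax would be the predicted label.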

Quantum Machine Learning with Variational Algorithms
Variational quantum algorithms are a class of hybrid quantum-classical techniques that facilitate the implementation of machine learning on noisy intermediate-scale quantum (NISQ) machines [5,7]. Current quantum devices are not able to maintain coherent states for sufficient periods, preventing current algorithms from performing meaningful optimization of the machine learning model entirely on the device. Thus, VQAs combine classical optimization algorithms with parameterized quantum circuits, or ansatz. Here, the ansatz takes on the role of the model [5]. One specific type of VQA is the variational quantum classifier (VQC), which is used for classification problems. Existing VQCs [6,8,9] have been shown to classify datasets with high accuracy and few training parameters, both in simulation and on current quantum processors.

Related Work
In this section, we discuss the existing related works with an emphasis on quantum machine learning. Our discussion focuses on commonly used data encoding techniques, existing implementations of classical and quantum convolution, and related quantum machine learning (QML) algorithms. Moreover, we discuss existing quantum convolutional classification algorithms that leverage data locality.

Data Encoding
For encoding classical image data into the quantum domain, the commonly used methods are the Flexible Representation of Quantum Images (FRQI) [18] and the Novel Enhanced Quantum Representation (NEQR) [19]. In FRQI, positional and color information are encoded using amplitude encoding and angle encoding, respectively. In NEQR, pixel positions are encoded using amplitude encoding, while color information is encoded using basis encoding, where q represents the number of bits/qubits allocated for color data. For N = 2^n data points, FRQI incurs a circuit width of n + 1 and a circuit depth of O(4^n), while NEQR incurs a width of n + q and a depth of O(qn · 2^n) [20]. Although these techniques are employed in existing quantum convolution techniques, their disadvantages are discussed below.

Convolution
We now discuss existing classical and quantum implementations of convolution and their associated time complexities. In addition, we consider the shortcomings of the existing quantum convolution methods.

Classical Convolution
Convolution is usually implemented classically either directly, through general matrix multiplication (GEMM), or through the fast Fourier transform (FFT). For data of size N, the direct implementation on CPUs has complexity O(N^2) [21], while the complexity of an FFT-based implementation is O(N log N) [21]. On GPUs, FFT-based convolution incurs a similar O(N log N) complexity [22], while the direct approach requires O(N_K N) FLOPs [23,24], where N_K is the filter size.
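A small numpy check (our own illustration, using circular convolution for simplicity) confirms that the O(N·N_K) direct method and the O(N log N) FFT-based method agree:

```python
import numpy as np

def direct_convolve(x, k):
    """O(N * N_K) direct circular convolution."""
    N, NK = len(x), len(k)
    return np.array([sum(k[j] * x[(i - j) % N] for j in range(NK))
                     for i in range(N)])

def fft_convolve(x, k):
    """O(N log N) circular convolution via the convolution theorem."""
    kp = np.zeros(len(x))
    kp[:len(k)] = k                       # zero-pad the kernel to length N
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(kp)))

x = np.arange(16, dtype=float)
k = np.array([0.25, 0.5, 0.25])
print(np.allclose(direct_convolve(x, k), fft_convolve(x, k)))  # True
```

For large N the FFT route dominates, which is why FFT-based convolution is the standard CPU/GPU baseline quoted above.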

Quantum Convolution
The existing quantum convolution techniques [25][26][27][28][29] rely on fixed filter sizes and support only specific filters at a time, e.g., edge detection. They do not provide methods for implementing a general filter. Additionally, these techniques have a quadratic circuit depth, i.e., O(n^2), where n = ⌈log2 N⌉ is the number of qubits and N is the size of the input data. While these methods appear to show quantum advantage, the results do not include the overhead incurred by data encoding. The related methods employ the FRQI and NEQR data encoding methods, leading to inferior performance compared with classical methods once the additional overhead is factored in.
The authors in [30] propose an edge-detection technique based on the quantum wavelet transform (QWT) and amplitude encoding, named quantum Hadamard edge detection (QHED), which is not generalizable to multiple convolution kernels or multidimensional data. Thus, their algorithm loses parallelism, increases circuit depth, and is difficult to generalize beyond capturing 1-D features. In [20], the authors developed a quantum convolution algorithm that supports a single feature/kernel and multidimensional data. In this work, we leverage the convolution method from [20] and generalize it to support multiple features/kernels in our proposed MQCC framework.

Quantum Machine Learning
There exist two primary techniques for quantum convolutional classification that may leverage data locality through convolution: quantum convolutional neural networks (QCNNs) [31] and quanvolutional neural networks [32]. QCNNs are inspired by classical convolutional neural networks, employing quantum circuits to perform convolutions, while quanvolutional neural networks replace classical convolutional layers with quantum convolutional (or "quanvolutional") layers.

Quantum Convolutional Neural Networks
The QCNN [31] and our proposed multidimensional quantum convolutional classifier (MQCC) are both VQA models with structures inspired by CNNs. However, QCNNs borrow the superficial structure of CNNs without considering its underlying purpose. Specifically, the QCNN ansatz is designed so that its "convolution" and "pooling" operations exploit the locality of qubits in the circuit rather than the locality of the data. Unlike data locality, qubit locality serves no practical purpose for machine learning in terms of isolating relevant input features. Moreover, by treating input data as 1-D, QCNNs do not leverage the dimensionality of datasets, which constitutes a primary advantage of CNNs. MQCC, on the other hand, faithfully implements CNN operations in quantum circuits, offering improvements in circuit execution time (based on circuit depth) and classification accuracy over contemporary implementations of QCNNs on classical computers.

Quanvolutional Neural Networks
Quanvolutional neural networks [32] are a hybrid quantum-classical algorithm named eponymously after the quanvolutional layer added to a conventional CNN. These quanvolutional layers decimate a 2D image, which is then sequentially fed into a quantum device. In this manner, the quanvolutional layer effectively exploits data locality. Yet, the model's dependence on classical operations, specifically the decimation of input data and the repeated serial data I/O transfers, vastly increases compute time. In contrast, the required convolution operation is fully incorporated into our proposed MQCC, reducing classical-quantum data transfer. Moreover, MQCC takes advantage of the parallelism inherent to quantum computers, while quanvolutional neural networks do not; this allows MQCC to apply convolutional filters to data windows in parallel.

Materials and Methods
In this section, we describe the materials and methods associated with the proposed multidimensional quantum convolutional classifier (MQCC). The proposed method mainly uses generalized quantum convolution, quantum pooling based on the quantum Haar transform (QHT) and partial measurement [11], and a quantum fully connected layer, each of which is illustrated in this section. To the best of our knowledge, this work is the first to carry out the following:
• Develop a generalizable quantum convolution algorithm for a quantum-convolution-based classifier that supports multiple features/kernels.
• Design a scalable MQCC that uses multidimensional quantum convolution and pooling based on the QHT. This technique reduces training parameters and time complexity compared with other classical and quantum implementations.
• Evaluate the MQCC model in a state-of-the-art QML simulator from Xanadu using a variety of datasets.

Quantum Fully Connected Layer
A fully connected classical neural network constitutes a collection of layers, each of which performs a linear transformation on N_in input features x ∈ R^{N_in} to generate an N_out-feature output y ∈ R^{N_out} [2,3]. Each layer can be represented in terms of a multiply-and-accumulate (MAC) operation and an addition, i.e., y = Wx + b, as shown in (3), where W ∈ R^{N_out×N_in} and b ∈ R^{N_out} represent the trainable weight and bias parameters, respectively. Here, we use bold symbols to represent classical quantities, e.g., vectors, and Dirac notation to represent their quantum counterparts.
The particular weights and bias that generate the j-th feature of the output, y_j, can be isolated by taking the j-th row of W, w_j, and the j-th term of b, b_j, as shown in (4), which can be directly implemented using quantum circuits. Section 4.1.1 discusses the quantum circuits for a single-feature output, and Section 4.1.2 generalizes the proposed technique to an arbitrary number of output features.
For a single-feature output neural network, the weight parameters can be represented as a vector w ∈ R N in .Here, w can be expressed as a quantum state |w⟩, as shown in (5), similar to the process of C2Q data encoding; see Section 2.2.
Similarly, for a single-feature output, Dirac notation of the MAC operation follows from (4), as shown in (6), where |ψ⟩ corresponds to the input data.
However, performing the MAC operation on quantum hardware requires a unitary operator. A parameterized unitary linear transformation based on the weight vector |w⟩ can be applied to the input data |ψ⟩ using an inverse C2Q operation, as shown in Figure 4 and described by (7).
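The idea behind (7) can be sketched classically: complete the normalized weight vector to an orthonormal basis so that ⟨w| forms the first row of a unitary; the first output amplitude then holds the MAC result ⟨w|ψ⟩. The QR-based completion below is our own stand-in for the inverse-C2Q circuit, not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(1)
N_in = 8

# Normalized input state |psi> and normalized weight vector |w> (positive real).
psi = rng.random(N_in); psi /= np.linalg.norm(psi)
w = rng.random(N_in);   w /= np.linalg.norm(w)

# Complete |w> to an orthonormal basis via QR; U_MAC then has <w| as its
# first row, playing the role of the inverse-C2Q (state synthesis) operation.
M = np.column_stack([w, rng.random((N_in, N_in - 1))])
Q, _ = np.linalg.qr(M)
if Q[:, 0] @ w < 0:               # fix QR's sign ambiguity on the first column
    Q[:, 0] = -Q[:, 0]
U_mac = Q.T

out = U_mac @ psi
print(out[0], w @ psi)            # first amplitude holds the MAC result <w|psi>
```

On real hardware this amplitude would be read out via sampling, as described in Section 2.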

Multifeature Output
A multifeature output can be implemented naively by extending the single-feature operation (7) to an N_out-feature output, where N_out ≤ N_in, obtained by encoding each weight vector w_j, 0 ≤ j < N_out, as a normalized row of U_MAC. However, the result may be a nonunitary operator, since the weight vectors can be arbitrary; U_MAC is unitary only when each row is orthogonal to all other rows, i.e., ⟨w_i|w_j⟩ = δ_ij for all i, j ∈ [0, N_out). As described in Appendix A.4, independently defined weights can be supported for each output feature by multiplexing U_MAC. The generic fully connected operation, U_FC, can then be generated as shown in (8), where n_out = ⌈log2 N_out⌉.

Replication:
To replicate the initial state |ψ_0⟩, n_out auxiliary qubits are added, which extends the state vector to a total size of 2^{n_in+n_out}; see (9) and Figure 5. By applying an n_out-qubit Hadamard operation (see Appendix A.3) to these qubits, the replicas are obtained through superposition, each scaled by a factor of 1/√(2^{n_out}); see (10) and Figure 5.

Applying the U_FC Filter: U_FC can perform the MAC operation for the entire N_out-feature output in parallel over the set of replicas of |ψ_0⟩; see (11) and Figure 5.
Data Rearrangement: The data rearrangement operation can be performed by applying perfect-shuffle gates; see Appendix A.6. It simplifies output-feature extraction by gathering the features into N_out data points at the top of the state vector; see (12) and Figure 5.
It is worth mentioning that, instead of placing the auxiliary qubits at the most significant position, as shown in the decomposed and simplified fully connected circuit in Figure 6, the auxiliary qubits can be placed at the least significant position to avoid perfect-shuffle permutations.
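Putting the steps together, the following numpy sketch (our own matrix-level simulation, not a circuit) replicates the input via a Hadamard on one auxiliary qubit, applies a block-diagonal (multiplexed) U_MAC, and reads off the output features:

```python
import numpy as np

rng = np.random.default_rng(2)
N_in, N_out = 4, 2            # n_in = 2 data qubits, n_out = 1 auxiliary qubit

psi = rng.random(N_in); psi /= np.linalg.norm(psi)
W = rng.random((N_out, N_in))                    # one weight vector per feature
W /= np.linalg.norm(W, axis=1, keepdims=True)

def u_mac(w):
    """Orthonormal completion with <w| as the first row (QR-based stand-in)."""
    M = np.column_stack([w, rng.random((len(w), len(w) - 1))])
    Q, _ = np.linalg.qr(M)
    if Q[:, 0] @ w < 0:
        Q[:, 0] = -Q[:, 0]
    return Q.T

# Replication: a Hadamard on the (most significant) auxiliary qubit creates
# N_out scaled copies of |psi> in superposition.
state = np.concatenate([psi, psi]) / np.sqrt(N_out)

# Multiplexed U_MAC: block-diagonal, one U_MAC block per output feature.
U_fc = np.zeros((N_in * N_out, N_in * N_out))
for j in range(N_out):
    U_fc[j*N_in:(j+1)*N_in, j*N_in:(j+1)*N_in] = u_mac(W[j])
state = U_fc @ state

# Each block's first amplitude now holds <w_j|psi> (scaled by 1/sqrt(N_out));
# a perfect-shuffle permutation would gather them at the top of the state vector.
features = state[::N_in] * np.sqrt(N_out)
print(features, W @ psi)      # the two vectors match
```

The stride `state[::N_in]` mimics what the perfect-shuffle rearrangement achieves in the circuit: collecting the N_out feature amplitudes into one contiguous block.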

Circuit Depth of the Quantum Fully Connected Layer
As discussed in Section 2.2, the U_MAC operation is implemented by applying the C2Q/arbitrary state synthesis operation with a depth of (3 · 2^{n_in} − n_in − 4) fundamental single-qubit and CNOT gates. The depth increases by a factor of 2^{n_out} when multiplexing U_MAC to implement an N_out-feature output [33]; see (13).

Generalized Quantum Convolution
The most significant part of the MQCC framework is the generalized quantum convolution operation, which supports arbitrary, parameterized filters. Compared with classical convolution, the convolution operation in the quantum domain achieves an exponential improvement in time complexity due to its innate parallelism. The convolution operation consists of stride, data rearrangement, and multiply-and-accumulate (MAC) operations.

Stride:
The first step of quantum convolution is generating shifted replicas of the input data. Quantum decrementers controlled by additional qubits, called "filter" qubits, are used for this purpose. The U_shift^{−1} operator shown in Figure A8b shifts a replica by a single stride. The stride operation, U_stride, is composed of controlled quantum decrementers, where each U_shift^{−1} operation has a quadratic depth complexity; see (A15). Thus, the depth of the controlled quantum decrementer can be derived according to (14), where n corresponds to the number of qubits the decrementer is applied to and c reflects the number of control qubits.
Multiply-and-Accumulate (MAC): Kernels are applied to the strided replicas of the input data in parallel using the MAC operation; see Figure 4. In the MAC operations, kernels are applied to contiguous sets of data with the help of the inverse arbitrary state synthesis operation. One benefit of using this MAC operation is the superposition of multiple kernels, which can be helpful for the classification of multiple features.

Data Rearrangement:
Data rearrangement is required to coalesce the output pieces of the MAC steps into one contiguous piece of output. This step is performed using the perfect shuffle permutation (PSP) operations described in Appendix A.6.
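The three steps above can be mirrored classically (a simplified sketch of our own, with the circular boundary handling that the quantum decrementer naturally provides):

```python
import numpy as np

x = np.arange(8, dtype=float)         # input data, N = 8
k = np.array([0.5, 0.25, 0.25])       # convolution kernel, N_K = 3

# Stride: shifted replicas of the input (the role of the controlled decrementers).
replicas = np.stack([np.roll(x, -s) for s in range(len(k))])

# MAC: the kernel is applied across the replicas; in the quantum version every
# output point is computed in parallel, since the replicas live in superposition.
y = k @ replicas

# Data rearrangement (perfect shuffle) would then gather y into one contiguous
# block of the state vector. Compare against a circular classical convolution:
ref = np.array([sum(k[j] * x[(i + j) % len(x)] for j in range(len(k)))
                for i in range(len(x))])
print(np.allclose(y, ref))            # True
```

In the quantum circuit, the kernel weights live in the amplitudes of |K⟩ rather than in a classical dot product, but the stride-then-accumulate structure is the same.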

One-Dimensional Multifeature Quantum Convolution
The one-dimensional quantum convolution operation with a kernel of N_K terms requires generating N_K replicas of the input data, covering the range of possible strides 0 ≤ k < N_K. Therefore, a total of N_K · N terms need to be encoded into the quantum circuit, requiring n_k = ⌈log2 N_K⌉ additional auxiliary qubits, denoted "kernel" qubits, which are allocated as the most significant qubits to maintain data contiguity.
The necessary N_K replicas of the input vector are created using Hadamard gates; see Figure 7. Convolution kernels can be implemented using multiply-and-accumulate (MAC) operations; as such, it is possible to leverage U_MAC, as defined in Section 4.1, to implement quantum convolution kernels. Given a kernel K ∈ R^{N_K}, the corresponding kernel operation U_K can be constructed from the normalized kernel |K⟩, as shown in (15).
When applied to the n_k lower qubits of the state vector, U_K applies the kernel K to all data windows in parallel. However, in CNNs, convolution layers typically must support multiple convolution kernels/features. Fortunately, one major advantage of the proposed quantum convolution technique is that multiple features can be supported by multiplexing only the MAC operations; the stride and data rearrangement operations do not need to be multiplexed; see Figure 7. Accordingly, for N_F features, n_f = ⌈log2 N_F⌉ qubits must be added to the circuit and placed in superposition using Hadamard gates, similar to the process in (9). Thus, the depth complexity of U_stride can be expressed in terms of ∆_{cU_shift^{−1}}(n − j, c), as described by (16), where c = 1 for all 0 ≤ j < n_k; see Figure 7. Similarly, the depth complexity of U_K can be expressed by (17). Finally, the depth of the proposed multifeature 1D quantum convolution can be obtained as (18).

Multidimensional Multifeature Quantum Convolution
Multidimensional quantum convolution can be implemented by stacking multiple one-dimensional quantum convolution circuits, as shown in Figure 8. A d-dimensional quantum convolution circuit can be constructed from a stack of 1-dimensional convolution circuits only when the multidimensional kernel is an outer product of d instances of 1-dimensional kernels. The depth of d-D quantum convolution can be obtained as shown in (19).
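The separability condition can be checked numerically: a 2-D kernel that is the outer product of two 1-D kernels produces the same result as two stacked 1-D convolutions, one per dimension (our own numpy illustration, circular boundaries assumed):

```python
import numpy as np

def conv1d_circ(x, k, axis):
    """Circular 1-D convolution of a 2-D array along one axis."""
    out = np.zeros_like(x)
    for j, kj in enumerate(k):
        out += kj * np.roll(x, -j, axis=axis)
    return out

rng = np.random.default_rng(3)
img = rng.random((8, 8))
k_row = np.array([0.5, 0.5])          # 1-D kernels, one per dimension
k_col = np.array([0.25, 0.75])
K2 = np.outer(k_col, k_row)           # separable 2-D kernel

# Direct 2-D convolution with the outer-product kernel ...
direct = np.zeros_like(img)
for a in range(K2.shape[0]):
    for b in range(K2.shape[1]):
        direct += K2[a, b] * np.roll(np.roll(img, -a, axis=0), -b, axis=1)

# ... equals two stacked 1-D convolutions, one per dimension.
stacked = conv1d_circ(conv1d_circ(img, k_col, axis=0), k_row, axis=1)
print(np.allclose(direct, stacked))   # True
```

Kernels that are not outer products (e.g., a general 3×3 edge filter) cannot be decomposed this way, which is the constraint stated above.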

Quantum Pooling
A critical part of CNNs is the pooling operation or downsampling of the feature maps.One widely used method is average pooling, where only the average of the adjacent pixels in the feature map is preserved, creating a smoothing effect [34].

Quantum Average Pooling using Quantum Haar Transform
The average pooling operation can be implemented using the quantum wavelet transform (QWT) [11], which has the advantage of preserving data locality through wavelet decomposition. It is a commonly used technique for dimension reduction in image processing [11]. In this work, we utilize the first and simplest wavelet transform, the quantum Haar transform (QHT) [11], to implement the quantum pooling operation. This operation is executed in two steps: the Haar wavelet operation and data rearrangement.

Haar Wavelet Operation:
To separate the high- and low-frequency components of the input data, H gates are applied in parallel. The number of H gates applied in the QHT is equal to the number of decomposition levels.

Data Rearrangement:
After separating the high- and low-frequency components, quantum rotate-right (RoR) operations are applied to group them accordingly. As mentioned before, the proposed operation is highly parallelizable regardless of the dimensionality of the data, as the QHT can be applied to multiple dimensions of the data in parallel.
As shown in Figure 9a, for a single level of decomposition, H gates are applied to one qubit (the least significant qubit) per dimension; for an ℓ-level decomposition, shown in Figure 9b, H gates are applied to ℓ qubits per dimension.
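A single decomposition level can be sketched classically: the Hadamard on the least significant qubit maps each even/odd amplitude pair to a scaled sum and difference, and the retained low-pass half is proportional to the 2-point averages (our own numpy illustration):

```python
import numpy as np

x = np.array([1.0, 3.0, 2.0, 6.0, 4.0, 4.0, 5.0, 7.0])
psi = x / np.linalg.norm(x)           # amplitude-encoded input

# One level of the Haar wavelet: H on the least significant qubit mixes each
# even/odd amplitude pair into a low-pass (sum) and high-pass (difference) term.
pairs = psi.reshape(-1, 2)
low = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
high = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)

# Data rearrangement groups the low-pass terms together; keeping them halves the
# data size, and they are proportional to the classical 2-point averages:
avg = x.reshape(-1, 2).mean(axis=1)
print(low / np.linalg.norm(low))      # identical to avg / ||avg||
```

Repeating this ℓ times on the remaining low-pass half yields the ℓ-level decomposition described above.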

Quantum Euclidean Pooling using Partial Measurement
In machine learning applications, average and maximum (max) pooling [34] are the most commonly used pooling schemes for dimension reduction. The two schemes differ in the sharpness of the resulting data features. On one hand, max pooling yields a sharper definition of input features, which makes it preferable for edge detection and certain classification applications [34]. On the other hand, average pooling offers a smoother dimension reduction that may be preferred in other workloads [34]. Thus, to accompany our implementation of quantum average pooling using the QHT (see Section 4.3.1), it would be beneficial to have an implementation of quantum max pooling. However, such an operation would be nonunitary, which makes implementing quantum max pooling difficult [35]. Therefore, instead of max pooling, we utilize an alternative technique that we denote quantum Euclidean pooling.
Mathematically, average and Euclidean pooling are special cases of the p-norm [36]: for a vector x ∈ C^N of N elements, the p-norm or ℓ_p norm is given by (20) for p ∈ Z [36]. Average pooling corresponds to the 1-norm (p = 1), and Euclidean pooling corresponds to the 2-norm (p = 2). A notable benefit of the Euclidean pooling technique is its zero-depth circuit implementation, which leverages partial measurement [35].
This work leverages the multilevel, d-dimensional quantum Euclidean pooling circuit presented in [35]; see Figure 10. Here, ℓ_i is the number of decomposition levels for dimension i, where 0 ≤ i < d [35].
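The p-norm view of pooling can be illustrated as follows (a sketch of our own; the hypothetical helper rescales the p-norm by the window size so that p = 1 recovers the arithmetic average and p = 2 the root mean square):

```python
import numpy as np

def p_norm_pool(x, window, p):
    """Pool adjacent windows via the p-norm: (sum |x|^p / window)^(1/p)."""
    blocks = np.abs(x.reshape(-1, window))
    return (np.sum(blocks**p, axis=1) / window) ** (1.0 / p)

x = np.array([1.0, 3.0, 2.0, 6.0, 4.0, 4.0, 5.0, 7.0])

avg_pool = p_norm_pool(x, 2, p=1)     # p = 1: average pooling (positive data)
euc_pool = p_norm_pool(x, 2, p=2)     # p = 2: Euclidean (RMS) pooling
print(avg_pool)                        # [2. 4. 4. 6.]
print(euc_pool)
```

For positive data the two outputs are close but Euclidean pooling weights larger values more heavily, sitting between average and max pooling in sharpness.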

Multidimensional Quantum Convolutional Classifier
The proposed multidimensional quantum convolutional classifier framework, see Figure 11, resembles the structure of a CNN [2]. After a sequence of convolution-pooling (CP) pairs, the model terminates in a fully connected layer; see Figures 6 and 11. The total number of layers in the proposed model can be expressed in terms of CP pairs as 2λ + 1, where λ is the number of CP pairs. It is worth mentioning that, because of implementation constraints, there is no advantage in changing the number of features among the convolution/pooling layers of the MQCC. Therefore, the total number of kernel features can be set globally instead of layer-by-layer.
The circuit width of the MQCC (21) can be derived from the number of convolution layers, pooling layers, and the fully connected layer. Input data are encoded using n qubits, and each convolution layer adds n_f = ⌈log2 N_F⌉ qubits for N_F features and n_k = ⌈log2 N_K⌉ qubits for kernels. In addition, the fully connected operation contributes n_c = ⌈log2 N_C⌉ qubits to encode N_C output features/classes. On the other hand, each Euclidean pooling layer frees ℓ qubits, which can then be reused by other layers.
The MQCC can be further parallelized in terms of circuit depth between the interlayer convolution/fully connected layers. This parallelism is achieved by performing the (multiplexed) MAC operations of the quantum convolution and fully connected layers in parallel with the stride operations of the preceding quantum convolution layer(s). The circuit depth of MQCC can be derived as shown in (22).

Optimized MQCC
Figure 12 presents a width-optimized implementation of MQCC, which we refer to as MQCC Optimized. To reduce the required number of qubits, the convolution and pooling operations are swapped, which allows kernel qubits to be trimmed from each convolution layer; see Section 4.2. To achieve higher processing efficiency, trimmed qubits are reassigned to later layers of dimension reduction and run in parallel. Furthermore, only Euclidean pooling with partial measurement is used because of its inherent circuit-depth efficiency. The circuit width of MQCC Optimized is shown in (23), where n is the number of qubits corresponding to the data size, n_f is the number of qubits corresponding to the features, and n_c is the number of qubits corresponding to the classes. If necessary, additional pooling operations can be applied to keep the circuit width at or below the absolute minimum number of qubits n by excluding qubits dedicated to features and classes. It should be noted that reordering the convolution and pooling operations reduces the maximum number of convolution operations by 1.
Accordingly, the depth of MQCC Optimized can be expressed as shown in (24).
To further reduce the depth of MQCC Optimized, we investigated replacing the inverse-C2Q operations used for the MAC operations with different parameterized ansatz structures. More specifically, a common ansatz in QML, namely the NLocal operation in Qiskit [37] or BasicEntanglerLayers in Pennylane [38], was utilized; see Figure 13. The depth of this ansatz is linear with respect to the number of data qubits (see (25)), a significant improvement over arbitrary state synthesis, which has a circuit depth of O(2^n) for an n-qubit state. Although this ansatz can reduce circuit depth, its structure lacks theoretical motivation or guarantees of high fidelity when modeling convolution kernels.
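The linear-depth pattern of such an ansatz, one rotation per wire followed by a ring of CNOTs, analogous to Pennylane's BasicEntanglerLayers, can be simulated directly with matrices (our own illustration; a real Pennylane circuit would differ in detail):

```python
import numpy as np

I = np.eye(2)

def rx(theta):
    """Single-qubit RX rotation."""
    return np.array([[np.cos(theta/2), -1j*np.sin(theta/2)],
                     [-1j*np.sin(theta/2), np.cos(theta/2)]])

def op_on(gate, qubit, n):
    """Embed a single-qubit gate on one wire of an n-qubit register."""
    out = np.array([[1.0]])
    for q in range(n):
        out = np.kron(out, gate if q == qubit else I)
    return out

def cnot(control, target, n):
    """CNOT as a permutation matrix (qubit 0 is most significant)."""
    dim = 2**n
    U = np.zeros((dim, dim))
    for b in range(dim):
        bits = [(b >> (n - 1 - q)) & 1 for q in range(n)]
        if bits[control]:
            bits[target] ^= 1
        U[sum(bit << (n - 1 - q) for q, bit in enumerate(bits)), b] = 1
    return U

def entangler_layer(thetas):
    """One layer: a rotation on every wire, then a ring of CNOTs."""
    n = len(thetas)
    U = np.eye(2**n, dtype=complex)
    for q, t in enumerate(thetas):            # depth-1 rotation layer
        U = op_on(rx(t), q, n) @ U
    for q in range(n):                        # ring of CNOTs: linear depth in n
        U = cnot(q, (q + 1) % n, n) @ U
    return U

U = entangler_layer(np.array([0.3, 1.2, 0.7]))
print(np.allclose(U @ U.conj().T, np.eye(8)))  # the layer is unitary -> True
```

One such layer uses only n parameters and O(n) depth, versus the O(2^n) depth of arbitrary state synthesis, which is exactly the trade-off (expressivity versus depth) discussed above.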

Experimental Work
In this section, we first detail our experimental setup, followed by the results for the proposed MQCC technique. Experiments were conducted using real-world, multidimensional image data to test both the individual and composite components of our techniques.

Experimental Setup
The MQCC methodology was validated by first evaluating its most important component, namely convolution, using the metric of fidelity, and then evaluating the complete techniques, i.e., MQCC and MQCC Optimized, by conducting classification experiments using 1D, 2D, and 3D datasets.
For the convolution experiments on 1D data, we used audio material published by the European Broadcasting Union for sound quality assessment [39]. In a preprocessing step, the data were converted into a single channel, with the data size varying from 2^8 to 2^20 data points, sampled at a rate of 44.1 kHz.
For the 2D convolution experiments, we used black-and-white and color Jayhawk images [40], as shown in Figure 14. These images range in size from (8 × 8) pixels to (512 × 512 × 3) pixels. For the 3D image experiments, we used hyperspectral images from the Kennedy Space Center (KSC) dataset [41]. The images were preprocessed and resized, with sizes ranging from (8 × 8 × 8) pixels to (128 × 128 × 128) pixels. Simulations of the quantum convolution operation were run over the given data using the Qiskit SDK (v0.45.0) from IBM Quantum [13]. To demonstrate the effect of statistical noise on fidelity (26), both noise-free and noisy (with 1,000,000 circuit samples/shots) simulation environments were evaluated. To evaluate the performance of the complete MQCC and MQCC Optimized techniques, they were tested against CNNs, QCNNs, and quanvolutional neural networks on binary classification of real-world datasets such as MNIST [42], FashionMNIST [43], and CIFAR10 [44]. The classical components in these trials were run using PyTorch (v2.1.0) [45], while the quantum circuits used Pennylane (v0.32.0), a QML-focused framework from Xanadu [46].
The experiments were performed on a cluster node at the University of Kansas [47]. The node consisted of a 48-core Intel Xeon Gold 6342 CPU, three NVIDIA A100 80 GB GPUs (CUDA version 11.7) with PCIe 4.0 connectivity, and 256 GB of 3200 MHz DDR4 RAM. To account for initial parameter variance in ML or noise in noisy simulations, experiments were repeated for 10 trials, with the median reported in the graphs.

Configuration of ML Models
Since the different techniques are all fundamentally ML models, they could share some parameters and metrics during testing. For example, log loss and the Adam optimizer [48] were used for all techniques, and the feature count was shared between the CNN and MQCC, both of which use 4 features per convolution layer. Parameters unique to each model are discussed next.
Convolutional Neural Networks: In Figure 15, we show the classification accuracy of the CNN model on the (16 × 16) and (28 × 28) FashionMNIST datasets using average pooling, max pooling, and Euclidean pooling. The plots show the accuracy obtained with and without ReLU [49], an optional layer that can be appended to each pooling layer in a CNN. Based on the results, which show max pooling without ReLU to be the most accurate configuration, we chose it as the baseline CNN configuration in our tests.

Quanvolutional Neural Networks: While quanvolutional neural networks were initially introduced with a nontrainable random quantum circuit in the quanvolutional layer, later work has suggested implementing parameterized, trainable quanvolutional layers. We therefore test both the trainable and nontrainable variants. Figure 16 shows that the trainable variant outperforms the nontrainable one on the (16 × 16) FashionMNIST dataset, although the differences are negligible on the (28 × 28) dataset. Based on this evidence, we use the trainable variant of the quanvolutional neural network as the baseline for comparison with the other models.
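The three pooling modes compared in Figure 15 can be sketched classically as follows; whether the paper's Euclidean pooling takes the root mean square or the raw L2 norm of each window is an assumption here, and the helper name is illustrative.

```python
import numpy as np

def pool2x2(x: np.ndarray, mode: str = "euclidean") -> np.ndarray:
    """Downsample an (H, W) feature map over 2x2 windows with average, max, or Euclidean pooling."""
    h, w = x.shape
    # Gather each non-overlapping 2x2 window into the last axis.
    windows = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(h // 2, w // 2, 4)
    if mode == "average":
        return windows.mean(axis=-1)
    if mode == "max":
        return windows.max(axis=-1)
    # Euclidean pooling (assumed here as root mean square of each window).
    return np.sqrt((windows ** 2).mean(axis=-1))
```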

Quantum Convolutional Neural Networks:
We based our implementation of the QCNN on [50]; however, some modifications were made to the technique to work around limitations in its data encoding method. When encoding data that are not of size 2^n in each dimension, the original method flattens (vectorizes) the input before padding with zeros, as opposed to padding each dimension. This sacrifices the advantage of multidimensional encoding, where each dimension is mapped to a region of qubits. To ensure a level playing field between QCNN and MQCC, the (16 × 16) and (28 × 28) FashionMNIST datasets were tested with both the original (1D) and a corrected (2D) data encoding configuration of the QCNN, the results of which are shown in Figure 17. As expected, we see a clear improvement on the (28 × 28) dataset, and based on this, we chose the corrected (2D) data encoding method as our baseline QCNN for comparison against the other ML models.
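The difference between the two encodings can be sketched as follows: the corrected (2D) method pads each dimension up to the next power of two, preserving the spatial layout, while flattening first loses it. The helper name is hypothetical.

```python
import numpy as np

def pad_per_dimension(x: np.ndarray) -> np.ndarray:
    """Zero-pad each dimension to the next power of two, preserving spatial structure."""
    pads = []
    for s in x.shape:
        target = 1 << (s - 1).bit_length()   # smallest power of two >= s
        pads.append((0, target - s))
    return np.pad(x, pads)
```

For a (28 × 28) image, this yields a (32 × 32) array mapped to 5 + 5 qubits, whereas flattening first pads the 784-element vector to 1024, discarding the row/column structure.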

Results and Analysis
We first present the results of the quantum convolution operations on data with varying dimensionalities. Then, we compare the fidelity results of the quantum convolution under a noisy simulation environment with reference to a classical convolution implementation. Finally, we present the results for MQCC.

Quantum Convolution Results
The fidelity of the quantum convolution technique was tested in both noise-free and noisy environments against a classical implementation using common (3 × 3) and (5 × 5) filter kernels. These kernels, as described in (27)-(33), include the averaging F_avg, Gaussian blur F_blur, Sobel edge-detection F_Sx/F_Sy, and Laplacian of Gaussian (Laplacian) F_L filters. To enable a quantum implementation of these kernels, a classical preprocessing step zero-padded each kernel until its dimensions were an integer power of two. As classical convolution may produce negative values, the magnitudes of the output values were cast into the single-byte range [0, 255] in a classical postprocessing step.
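The postprocessing cast can be sketched as follows; the exact normalization used in the experiments (scaling by the maximum magnitude versus plain clipping) is an assumption, and the function name is illustrative.

```python
import numpy as np

def to_byte_range(output: np.ndarray) -> np.ndarray:
    """Cast convolution-output magnitudes into the single-byte range [0, 255]."""
    mags = np.abs(output)                 # negative values contribute by magnitude
    scale = mags.max() or 1.0             # avoid division by zero on all-zero output
    return np.clip(255 * mags / scale, 0, 255).astype(np.uint8)
```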
The reconstructions from convolution operations in classical, noise-free, and noisy environments on (128 × 128)- and (128 × 128 × 3)-pixel input images can be seen in Tables 2 and 3, respectively.
Finally, 3D averaging kernels of sizes (3 × 3 × 3) and (5 × 5 × 5) were applied to hyperspectral images from the KSC dataset [41]. The images were preprocessed and resized to powers of two, ranging from (8 × 8 × 8) pixels to (128 × 128 × 128) pixels in size. Table 4 shows the reconstructed output images from convolution operations in classical, noise-free, and noisy environments. Compared with the expected, classically generated results, the noise-free quantum results tested at 100% fidelity across all trials. Therefore, in a noise-free environment, given the same inputs, the proposed convolution techniques exhibit no degradation compared with classical convolution. The fidelities of noisy simulations using 1D audio, 2D B/W, 2D RGB, and 3D hyperspectral data are presented in Figures 18, 19, 20, and 21, respectively. The fidelity degradation in these figures is due to statistical noise: the constant shot count (number of circuit samples) becomes less and less sufficient to describe the increasing data size. This degradation could be mitigated by increasing the number of shots, but our experiments were limited to 1,000,000 shots, the maximum number allowed by the simulator.
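As a reference for how such a fidelity might be computed between classical and quantum outputs, the following sketch uses a normalized-overlap (squared cosine similarity) definition; the exact form of (26) in the paper may differ, and the function name is hypothetical.

```python
import numpy as np

def fidelity(expected: np.ndarray, observed: np.ndarray) -> float:
    """Normalized squared overlap between two output arrays; 1.0 means identical up to scale."""
    a = expected.ravel().astype(float)
    b = observed.ravel().astype(float)
    # Squared inner product normalized by both vectors' squared norms.
    return float(np.dot(a, b) ** 2 / (np.dot(a, a) * np.dot(b, b)))
```

Under this definition, two outputs that differ only by a global scale still score 1.0, while orthogonal outputs score 0.0, matching the intuition that shot noise gradually perturbs the measured distribution away from the ideal one.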

Discussion
In this section, we discuss the results of our experiments with MQCC and compare them against the other models in terms of the number of required training parameters, model accuracy, and the circuit depth of the implemented model. The number of qubits required by MQCC can be easily calculated using (23). The number of qubits required for data encoding and filter implementation can be obtained from the dimensions of the data and filter, respectively, i.e., n = ⌈log2(128)⌉ + ⌈log2(128)⌉ + ⌈log2(3)⌉ = 16 qubits for (128 × 128 × 3) data and n_f = ⌈log2(4)⌉ = 2 qubits for four features. Similarly, the number of qubits required for the feature classes can be calculated as, for example, n_c = ⌈log2(2)⌉ = 1 for 2 classes. Altogether, for input data encoded into n qubits, the optimized MQCC requires n + n_f + n_c = 19 qubits.
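The qubit-count calculation above can be expressed directly; the function name is illustrative, but the arithmetic follows the formula in the text.

```python
from math import ceil, log2

def mqcc_qubits(data_shape, n_features, n_classes):
    """Total qubits for the optimized MQCC: data-encoding + feature + class qubits."""
    n = sum(ceil(log2(d)) for d in data_shape)   # one qubit group per data dimension
    n_f = ceil(log2(n_features))                 # feature qubits
    n_c = ceil(log2(n_classes))                  # class qubits
    return n + n_f + n_c
```

For (128 × 128 × 3) data with four features and two classes, this gives (7 + 7 + 2) + 2 + 1 = 19 qubits, matching the example above.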

Number of Parameters
Among the ML models evaluated, MQCC had the fewest trainable parameters; see Figure 22. This implies potential advantages such as reduced memory usage and faster performance when using classical gradient descent. Although the relative parameter reduction diminishes from MLP to CNN and further from CNN to MQCC, there is still a significant 83.62% decrease in parameter count.

Loss History and Accuracy
ML-based classifiers aim to maximize the accuracy of their classifications, measured by a loss function during training that estimates the accuracy the model may exhibit when deployed. Hence, Figures 23 and 24 depict the performance of the ML models across the experimental datasets by plotting log-loss history and classification accuracy. The MNIST [42] dataset is not complex enough to effectively distinguish the models; however, differences begin to emerge on the FashionMNIST [43] and CIFAR10 [44] datasets. The MLP consistently achieves the highest accuracy across trials due to its larger parameter count, which allows greater flexibility in adapting to nuances in the input. The CNN demonstrates the second-highest accuracy, showcasing its ability to select relevant input features using convolution and data locality. Among the tested models, the QCNN generally performs the poorest, displaying its inability to properly leverage data locality. However, a comparison between the accuracies of MQCC and quanvolutional neural networks is inconclusive: quanvolutional neural networks performed better on FashionMNIST, whereas MQCC performed better on CIFAR10.

Gate Count and Circuit Depth
Although comparing MQCC, MQCC Optimized, and QCNN using quantum metrics such as gate count and circuit depth is viable (see Figure 25), it is challenging to include quanvolutional neural networks in the comparison due to their significant differences from the other models. These differences stem from the quantum component of quanvolutional neural networks constituting only a small fraction of the entire algorithm, making it closer to a classical algorithm than to a quantum one. Meanwhile, comparing MQCC and QCNN in Figure 25 highlights the rationale behind developing MQCC Optimized. Initially, MQCC performed worse than QCNN in gate count and circuit depth. After optimization, however, MQCC matched the performance of QCNN and even outperformed it in the best-case scenarios. While the QCNN architecture appears more amenable to shallow quantum circuits than MQCC, this is because the high parallelization of each QCNN layer halves the number of active qubits.
Although QCNN uses half as many active qubits per layer as MQCC, MQCC utilizes the extra qubits for weights and features, with each pooling layer reducing the qubit count by a constant amount, n_k. However, as QCNN structures are motivated by the classical convolution operation, they usually need more complex and deeper "convolution" and "pooling" ansatzes to attain higher accuracy.

Complexity Comparison with Classical Models
The proposed method can also be compared with classical models in terms of temporal complexity. The temporal/depth complexity of MQCC Optimized can be easily derived using Figures 11 and 12, as well as (24), as shown in (35). For a MAC-based fully connected layer, we can consider U_FC = U_MAC to obtain the complexity using (13) and (35), as shown in (36). Similarly, for an ansatz-based fully connected layer, we can consider U_FC = U_ansatz to obtain the complexity using (25) and (35), as shown in (37).
The general expression for calculating the temporal complexity of classical CNNs is shown in (38). It can be broadly divided into three parts: the complexity of the convolutional layers, the complexity of the pooling layers, and the complexity of the fully connected layers. The complexity of the combined convolution and pooling layers is primarily determined by the number of filters, the size of the filters, and the dimensions of the input feature maps, while the complexity of the fully connected layers depends on the number of neurons in the layers. It is worth mentioning that in classical CNNs, the convolutional layers dominate both the pooling and fully connected layers in overall execution time, as expressed by (38).
A comparison of the depth/temporal complexity of MQCC Optimized against the classical methods is shown in Table 5. It details the depth complexities of C2Q data encoding (I/O overhead), two variants of MQCC Optimized, i.e., a MAC-based fully connected layer and an ansatz-based fully connected layer, and three variants of classical CNN implementations.
Among the classical CNN variants, i.e., Direct (CPU), FFT (CPU/GPU), and GEMM (GPU), we consider the Direct (CPU) case the worst case and GEMM (GPU) the best case. The MQCC Optimized algorithm with a MAC-based fully connected layer matches the best case. Considering the depth of the I/O circuit, which in this case is the depth of the C2Q method, the overall complexity of MQCC is still better than the worst case among the classical methods. The MQCC Optimized algorithm with an ansatz-based fully connected layer has the lowest complexity among all the compared models. Although the I/O overhead represents the worst-case scenario for our proposed technique (see Table 5), the complexity of our proposed MQCC technique, including the I/O overhead, matches the best case among the classical methods. It is worth emphasizing that the I/O overhead is not intrinsic to our proposed technique; rather, it is a general consideration for any data-intensive quantum application, such as quantum machine learning (QML) classification. Moreover, our proposed MQCC method provides two additional advantages over classical CNNs, being highly parallelizable and requiring fewer training parameters (see Figure 22), which ultimately leads to lower resource requirements than classical CNNs.

Conclusions
In this paper, we presented a multidimensional quantum convolutional classifier (MQCC) that consists of quantum convolution, quantum pooling, and a quantum fully connected layer. We leveraged existing convolution techniques to support multiple features/kernels and utilized them in our proposed method. Furthermore, we proposed a novel width-optimized quantum circuit that reuses freed-up qubits from the pooling layer in the subsequent convolutional layer. The proposed MQCC additionally preserves data locality in the input data, which has been shown to improve data classification in convolutional classifiers. The MQCC methodology is generalizable to any arbitrary multidimensional filter operation and is pertinent for multifeature extraction. The proposed method can also support data of arbitrary dimensionality since the underlying quantum pooling and convolution operations are generalizable across data dimensions. We experimentally evaluated the proposed MQCC on various real-world multidimensional images, utilizing several filters through simulations on state-of-the-art quantum simulators from IBM Quantum and Xanadu. In our experiments, MQCC achieved higher classification accuracy than contemporary QML methods while having a reduced circuit depth and gate count. In future work, we plan to expand MQCC with additional convolution capabilities, such as arbitrary striding and dilation, and to further optimize it for deployment on real-world quantum processors. In addition, we will investigate using our proposed quantum techniques in real-life applications such as medical imaging and classification.

Appendix A
All gates, single- or multiqubit, can be extended to have control qubit(s), as shown for a general U in (A9) and Figure A4 [33]. When a general multiqubit gate is extended to have more control qubits, it becomes an n-qubit multiple-controlled U (MCU) gate. In the context of CNOT gates, the operation becomes a multiple-controlled-NOT (MCX) gate [52]. For an n-qubit MCX gate with n − 1 control qubits and 1 target qubit (see (A10)), the depth of the MCX gate with the addition of a single extra qubit can be found in (A11) [52].

The most general controlled gate is the multiplexer (see (A12)), which defines a quantum operation to be applied on n_target qubits for each state permutation of some control qubit(s), n_control [33]. In quantum circuits, a square box notation is used to denote a multiplexer operation, as shown in Figure A5 [33]. Here, U_i is a matrix of size 2^n_target × 2^n_target defining the unitary operations/gates applied on each data qubit for the corresponding value i, where 0 ≤ i ≤ 2^n_control − 1. Each quantum shift operation U_shift^±1 acting on an n-qubit state can be decomposed into a pyramidal structure of n MCX gates in a series pattern [20], as shown in Figure A8a,b. In terms of fundamental single-qubit and CNOT gates, the depth of a quantum shift operation with ±1 is quadratic [52]; see (A15). Generalized quantum shift operations U_shift^±k can be derived using k U_shift^±1 operations, from the expression in (A14), and the corresponding circuit depth can be derived as shown in (A16).

Figure 5. Quantum fully connected layer with an N_out-feature output.

Figure 6. Decomposed and simplified quantum fully connected layer.

Figure 9. In Figure 9b, l H gates are applied per dimension; each level of decomposition reduces the size of the corresponding dimension by a factor of 2.

Figure 11. High-level overview of the MQCC architecture.

Figure 19. Fidelity of 2D convolution filters with unity stride on 2D B/W data (sampled with 1,000,000 shots).

Figure 22. Number of training parameters for ML models for tested data sizes.