Quantum computing model of an artificial neuron with continuously valued input data

Artificial neural networks have been proposed as potential algorithms that could benefit from being implemented and run on quantum computers. In particular, they hold promise to greatly enhance Artificial Intelligence tasks, such as image processing or pattern recognition. The elementary building block of a neural network is an artificial neuron, i.e. a computational unit performing simple mathematical operations on a set of data in the form of an input vector. Here we show how the design for the implementation of a previously introduced quantum artificial neuron [npj Quant. Inf. $\textbf{5}$, 26], which fully exploits the use of superposition states to encode binary valued input data, can be further generalized to accept continuous- instead of discrete-valued input vectors, without increasing the number of qubits. This further step is crucial to allow for a direct application of automatic differentiation learning procedures, which would not be compatible with binary-valued data encoding.


I. INTRODUCTION
Quantum computers hold the promise to greatly enhance the computational power of not-so-distant future computing machines [1,2]. In particular, improving machine learning techniques by means of quantum computers is the essence of the rising field of Quantum Machine Learning [3][4][5]. Several models for the quantum computing version of artificial neurons have been proposed [6][7][8][9][10][11], together with novel quantum machine learning techniques implementing classification tasks [12][13][14], quantum autoencoders [15,16], and quantum convolutional networks [17,18], to give a non-exhaustive list.
In this context, quantum signal processing leverages the capabilities of quantum computers to represent and process exponentially large arrays of numbers, and it could be used for enhanced pattern recognition tasks, i.e. going beyond the capabilities of classical computing machines [19]. In this regard, the development of Neural Networks dedicated to quantum computers [20] is of fundamental importance, due to the preponderance of this type of classical algorithm in image processing [21,22].
In the commonly accepted terminology of graph theory, neural networks are directed acyclic graphs (DAG), i.e., collections of nodes in which information flows only in one direction, without any loop. Each node is generally called an artificial neuron, i.e., a simplified mathematical model of natural neurons. In practice, it consists of a function that takes some input data, processes them using some internal parameters (called weights), and eventually gives an output value. In their simplest form, the so-called McCulloch-Pitts neurons [23] only deal with binary values, while in the most common and most useful form, named perceptron [24], they accept real, continuously valued inputs and weights. Continuous inputs cannot be represented directly in conventional, digital computers, and are usually rendered by using bit strings: a grayscale image pixel, for instance, is usually rendered as a natural number on a scale from 0 to 255 using 8-bit binary strings. Some approaches propose to use a similar representation in quantum computers by assigning several qubits per value [25][26][27]. However, these approaches are particularly wasteful, especially in light of the fact that quantum mechanical wavefunctions can inherently be represented as continuously valued vectors.

* stefano.mangini01@universitadipavia.it
† Present address: IBM Research GmbH, Zurich Research Laboratory, Säumerstrasse 4, CH-8803 Rüschlikon, Switzerland
A previous work [9] introduced a model for a quantum circuit mimicking a McCulloch-Pitts neuron. Here we generalize that model to the case of a quantum circuit accepting also continuously valued input vectors. We thus present a model for a continuous quantum neuron which, as we will see, can be used for pattern recognition in greyscale images without the need to increase the number of qubits to be employed. This represents a further memory advantage with respect to classical computation, where an increase in the number of encoding bits is required to deal with continuous numbers. We employ a phase-based encoding, and show that it is particularly resilient to noise.
Differently from classical perceptron models, artificial quantum neurons as described, e.g., in Ref. [9] can be used to classify linearly non-separable sets. In the continuously valued case, we thus harness the behaviour of our quantum perceptron model to show its ability to correctly classify several notable cases of linearly non-separable sets. Furthermore, we test this quantum artificial neuron for digit recognition on the MNIST dataset [28], with remarkably good results. We further stress that the present generalization of the binary-valued artificial neuron model is a crucial step in view of fully exploiting the great potential offered by automatic differentiation techniques such as gradient descent. These techniques are commonly employed, e.g., in supervised and unsupervised learning procedures, and could not be applied to the oversimplified McCulloch-Pitts neuron model.

FIG. 1: The artificial neuron evaluates a weighted sum between the input vector, i, and the weight vector, w, followed by an activation function, which determines the actual output of the neuron.

II. CONTINUOUSLY VALUED QUANTUM NEURON MODEL
A. The algorithm

Let us consider a perceptron model with real valued input and weight vectors, which are respectively indicated as i and w, such that i_j, w_j ∈ ℝ. A schematic representation of the classical perceptron is reported in Fig. 1.
Similarly, we define a model of a quantum neuron capable of accepting continuously valued input and weight vectors, by extending a previous proposal for the quantum computing model of an artificial neuron only accepting binary valued input data [9]. In order to encode data on a quantum state, we make use of a phase encoding. Given an input θ = (θ_0, . . . , θ_{N−1}) with θ_i ∈ [0, π], which consists of the classical data to be analyzed, we consider the vector

i = (e^{iθ_0}, e^{iθ_1}, . . . , e^{iθ_{N−1}}),    (1)

which we will be referring to as the input vector in the following. With this input vector we define the input quantum state of n = log_2 N qubits

|ψ_i⟩ = (1/2^{n/2}) Σ_{k=0}^{2^n−1} e^{iθ_k} |k⟩,    (2)

where the states |k⟩ denote the computational basis states of n qubits ordered by increasing binary representation, {|00...0⟩, |00...1⟩, · · · , |11...1⟩}. Since we are dealing with an artificial neuron, we have to properly encode another vector, which represents the weights in the form φ = (φ_0, . . . , φ_{N−1}) with φ_i ∈ [0, π], i.e. the corresponding vector

w = (e^{iφ_0}, e^{iφ_1}, . . . , e^{iφ_{N−1}}),    (3)

which in turn defines the weight quantum state

|ψ_w⟩ = (1/2^{n/2}) Σ_{k=0}^{2^n−1} e^{iφ_k} |k⟩.    (4)

Notice that (2) and (4) have the same structure, i.e. they consist of an equally weighted superposition of all the computational basis states, although with varying phases. By means of such an encoding scheme, we can fully exploit the exponentially large dimension of the n-qubit Hilbert space, i.e., by only using n qubits it is evidently possible to encode and analyze data of dimension N = 2^n. Due to global phase invariance, the number of actually independent phases is 2^n − 1, which does not spoil the overall efficiency of the algorithm, as will be shown. We also notice that states of the form (1/2^{n/2}) Σ_i e^{iα_i} |i⟩, such as (2) and (4), are known as locally maximally entanglable (LME) states, as introduced in Ref. [29].
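As a concrete illustration, the phase encoding of Eq. (2) amounts to an equal-weight amplitude vector with one phase factor per entry; a minimal sketch in plain Python (the helper name `lme_state` is ours, not part of any library):

```python
import cmath

def lme_state(phases):
    """Amplitudes of the LME state (1/2^(n/2)) * sum_k e^{i theta_k} |k>."""
    N = len(phases)                # N = 2^n classical input values
    norm = 1 / N ** 0.5            # equal-weight superposition coefficient
    return [norm * cmath.exp(1j * t) for t in phases]

psi = lme_state([0.3, 1.1, 0.0, 0.7])            # n = 2 qubits, N = 4
print(abs(sum(abs(a) ** 2 for a in psi) - 1) < 1e-9)   # -> True (normalized)
```

Note that only the relative phases carry information, mirroring the global phase invariance discussed above.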
Having defined the input and weight quantum states, their similarity is estimated by considering the inner product

⟨ψ_w|ψ_i⟩ = (1/2^n) [e^{i(θ_0−φ_0)} + · · · + e^{i(θ_{2^n−1}−φ_{2^n−1})}],    (5)

which corresponds to evaluating the scalar product between the input vector in Eq. (1) and the conjugate of the weight vector in Eq. (3), w*, similarly to the classical perceptron algorithm. Since probabilities in quantum mechanics are represented by the squared modulus of wavefunction amplitudes, we consider |⟨ψ_w|ψ_i⟩|^2, which is explicitly given as (see App. A)

|⟨ψ_w|ψ_i⟩|^2 = 1/2^n + (2/2^{2n}) Σ_{j<k} cos[(θ_j − φ_j) − (θ_k − φ_k)].    (6)

It is easily checked that |⟨ψ_w|ψ_i⟩|^2 = 1 for θ_i = φ_i ∀i, since the two states would coincide in such a case.
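Numerically, Eq. (6) is just the squared modulus of the mean of the phase-difference factors; a quick sketch (the `activation` helper is our own naming):

```python
import cmath

def activation(theta, phi):
    """|<psi_w|psi_i>|^2 for phase-encoded states, i.e. Eq. (6)."""
    N = len(theta)
    inner = sum(cmath.exp(1j * (t - p)) for t, p in zip(theta, phi)) / N
    return abs(inner) ** 2

theta = [0.2, 0.9, 1.3, 0.5]
print(round(activation(theta, theta), 10))   # -> 1.0 (perfect activation when theta == phi)
```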
Equation (6) represents the activation function implemented by the proposed quantum neuron. Even if it does not resemble any of the activation functions conventionally used in classical machine learning, such as the Sigmoid or ReLU functions [30], its nonlinearity suffices to accomplish classification tasks, as we will discuss in the following sections.
Color invariance and noise resilience

From Eq. (6), we define the activation function of the quantum artificial neuron as

f(θ, φ) = |⟨ψ_w|ψ_i⟩|^2.    (7)

Keeping φ fixed, suppose two different input vectors are passed to the quantum neuron: θ and θ′ = θ + ∆, with ∆ = (∆, . . . , ∆). Whatever the value of ∆, it is easy to infer that both input vectors will result in the same activation function. Hence, two input vectors only differing by a constant, albeit real valued, quantity will be equally classified by such a model of quantum perceptron. In the context of image classification, we can thus state that the present algorithm has a built-in color translational invariance. This should not come as a surprise, since the activation function actually depends on the differences between phases. In fact, the artificial neuron tends to recognize as similar any dataset that displays the same overall differences, instead of only perfectly coincident datasets.
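The color translational invariance can be verified directly: shifting every input phase by the same constant leaves the activation untouched, since the constant only contributes a global phase to the inner product (sketch using our own `activation` helper):

```python
import cmath

def activation(theta, phi):
    # |<psi_w|psi_i>|^2, Eq. (6)
    N = len(theta)
    return abs(sum(cmath.exp(1j * (t - p)) for t, p in zip(theta, phi)) / N) ** 2

theta = [0.2, 0.9, 1.3, 0.5]
phi   = [1.4, 0.0, 0.0, 1.4]
delta = 0.37                              # arbitrary constant "color" shift
shifted = [t + delta for t in theta]
print(abs(activation(theta, phi) - activation(shifted, phi)) < 1e-12)  # -> True
```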
Next, we assume that the input and weight vectors do coincide, but only up to some noise corrupting the input vector, such that θ_i = φ_i + ∆_i, where ∆ = (∆_0, . . . , ∆_{2^n−1}) represents the small variations, now assumed to be different on each pixel. Substituting the above values in Eq. (7), we obtain

f = 1/2^n + (2/2^{2n}) Σ_{j<k} cos(∆_j − ∆_k).    (8)

Assuming then the noise factors, ∆_i, to be distributed according to a uniform distribution in the interval [−a/2, a/2], the activation function averaged over the probability distribution of the ∆_i can be calculated as (see App. B)

⟨f⟩ = 1/2^n + [(2^n − 1)/2^n] (4/a^2) sin^2(a/2).    (9)

Since all the possible input data lie in the interval [0, π/2], a reasonable noise would be of the order of some fraction of π/2, which implies a < 1. Hence, in the case of small noise, Eq. (9) can be recast as

⟨f⟩ ≈ 1 − [(2^n − 1)/2^n] (a^2/12).    (10)

Thus, the classification of the quantum neuron is only slightly perturbed by the presence of noise corrupting an input vector otherwise having a perfect activation. By similar calculations, it can be shown that this property also holds for any kind of input vector, not only those with perfect activation (see App. B).
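The uniform-noise average can be checked by a quick Monte Carlo experiment. The closed form used below, ⟨f⟩ = 1/2^n + [(2^n − 1)/2^n](2 sin(a/2)/a)^2, follows from ⟨cos(∆_j − ∆_k)⟩ = ⟨cos ∆⟩^2 for independent uniform noise; all helper names are illustrative:

```python
import cmath, math, random

def activation(theta, phi):
    # |<psi_w|psi_i>|^2, Eq. (6)
    N = len(theta)
    return abs(sum(cmath.exp(1j * (t - p)) for t, p in zip(theta, phi)) / N) ** 2

random.seed(0)
n, a = 2, 0.5                          # n qubits, uniform noise width a
N = 2 ** n
phi = [random.uniform(0, math.pi / 2) for _ in range(N)]
samples = [activation([p + random.uniform(-a / 2, a / 2) for p in phi], phi)
           for _ in range(20000)]
mc = sum(samples) / len(samples)       # Monte Carlo estimate of <f>
analytic = 1 / N + (N - 1) / N * (2 * math.sin(a / 2) / a) ** 2
print(abs(mc - analytic) < 1e-2)       # -> True
```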
After having outlined the main steps defining the quantum perceptron model for continuously valued input vectors, we now proceed to build a quantum circuit that allows implementing it on a qubit-based quantum computing hardware.

B. Quantum circuit model of a continuously valued perceptron
A quantum circuit implementing the quantum neuron described above is schematically represented in Fig. 2. The first section of the circuit, denoted as U i , transforms the quantum register, initialized into the reference state |0 ⊗n , to the input quantum state defined in Eq. (2); the following operation U w performs the inner product between the input and weight quantum state; finally, a multi-controlled CNOT targeting an ancillary qubit is used to extract the final result of the computation. We now explain in detail how these transformations can be achieved.

FIG. 2: Quantum circuit model of a perceptron with continuously valued input and weight vectors.
Starting from the n-qubit state |00...0⟩ = |0⟩^{⊗n}, the U_i operation creates the quantum input state U_i |0⟩^{⊗n} = |ψ_i⟩ (2). Such a unitary can be built by means of a brute force approach. First of all, we apply a layer of Hadamard gates, H^{⊗n}, which creates the balanced superposition state H^{⊗n} |0⟩^{⊗n} = |+⟩^{⊗n}, with |+⟩ = (|0⟩ + |1⟩)/√2. The quantum state |+⟩^{⊗n} consists of an equally weighted superposition of all the states in the n-qubit computational basis, hence we can target each of them and add the appropriate phase to it, in order to obtain the desired result. This action corresponds to the diagonal (in the computational basis) unitary operation U(θ) whose action is to phase shift each state of the computational basis, |i⟩, to e^{iθ_i} |i⟩, with phases θ_i ∈ ℝ that are (in general) independent from each other. We decompose

U(θ) = Π_{i=0}^{2^n−1} U(θ_i),    (11)

where U(θ_i) is the unitary whose action is U(θ_i)|i⟩ = e^{iθ_i}|i⟩, while leaving all the other states in the computational basis unchanged. These unitaries are equivalent to a combination of X gates and a multi-controlled phase shift gate, C^{n−1}R(θ), where the phase shift gate is the unitary operation defined as R(θ) = [[1, 0], [0, e^{iθ}]] [31]. For example, suppose having n = 3 qubits, and consider the state |101⟩ to be phase shifted to e^{iθ_5}|101⟩. This transformation is achieved by sandwiching a C^2 R(θ_5) gate between two X gates applied to the middle qubit: the X gates map |101⟩ to |111⟩ and back, so that only this component acquires the phase, while all other states of the computational basis are left unchanged. Iterating a similar gate sequence for each state of the computational basis eventually yields the overall unitary operation (11). So far, we have built the quantum circuit allowing to encode an arbitrary input vector: given the input i = (e^{iθ_0}, e^{iθ_1}, · · · , e^{iθ_{2^n−1}}) as in Eq. (1), we create the state |ψ_i⟩ (2) by means of the operation U_i |0⟩^{⊗n} = U(θ) H^{⊗n} |0⟩^{⊗n} = |ψ_i⟩, whose parameters depend on the input entries.
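The brute-force construction above can be emulated with small explicit matrices. The sketch below (all helper names are ours) builds each U(θ_k) as an X-conjugated multi-controlled phase shift, multiplies them together, applies the result to H^{⊗2}|00⟩, and checks that the state of Eq. (2) comes out:

```python
import cmath
from functools import reduce

def kron(A, B):  # Kronecker product of two matrices given as lists of rows
    return [[A[i][j] * B[k][l]
             for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, v):
    return [sum(row[k] * v[k] for k in range(len(v))) for row in A]

I2 = [[1, 0], [0, 1]]
X  = [[0, 1], [1, 0]]

def select_phase(k, theta, n):
    """U(theta_k): X gates map |k> to |1...1>, C^{n-1}R(theta) phases it, X gates map back."""
    N = 2 ** n
    Xk = reduce(kron, [I2 if (k >> (n - 1 - q)) & 1 else X for q in range(n)])
    CR = [[cmath.exp(1j * theta) if i == j == N - 1 else (i == j) * 1
           for j in range(N)] for i in range(N)]
    return matmul(Xk, matmul(CR, Xk))

n = 2
thetas = [0.1, 0.4, 0.8, 1.2]
U = reduce(matmul, [select_phase(k, t, n) for k, t in enumerate(thetas)])
plus = [0.5] * 4                     # H^{(x)2}|00>
psi = matvec(U, plus)                # should equal the input state of Eq. (2)
print(all(abs(psi[k] - 0.5 * cmath.exp(1j * thetas[k])) < 1e-12 for k in range(4)))  # -> True
```

Since every U(θ_k) is diagonal, the order of the product in Eq. (11) is irrelevant, which the matrix emulation makes explicit.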
The unitary U_w can then be constructed in a similar fashion. First of all, notice that U_i is unitary, thus reversible. Let w = (e^{iφ_0}, e^{iφ_1}, · · · , e^{iφ_{2^n−1}}) be the weight vector; then the desired inner product ⟨ψ_w|ψ_i⟩ (6) resides in the overlap of the quantum state

|φ_{i,w}⟩ = (U(φ) H^{⊗n})^† |ψ_i⟩    (12)

with the ground state |0...0⟩. In fact, since U(φ) H^{⊗n} |0⟩^{⊗n} = |ψ_w⟩ (4), the scalar product is clearly given as ⟨0...0|φ_{i,w}⟩ = ⟨ψ_w|ψ_i⟩. In order to extract the result, a final layer of X^{⊗n} gates is applied to all encoding qubits, such that the desired coefficient now multiplies the component |1⟩^{⊗n} in the superposition:

X^{⊗n} |φ_{i,w}⟩ = Σ_{j=0}^{2^n−1} c_j |j⟩,    (13)

with c_{2^n−1} = ⟨ψ_w|ψ_i⟩. Thus, the U_w transformation in Fig. 2 actually consists of the quantum operations U_w = X^{⊗n} H^{⊗n} U(φ)^†. By means of a multi-controlled C^n NOT, we load the result on an ancillary qubit:

C^n NOT (X^{⊗n}|φ_{i,w}⟩ ⊗ |0⟩_a) = Σ_{j=0}^{2^n−2} c_j |j⟩ ⊗ |0⟩_a + c_{2^n−1} |1⟩^{⊗n} ⊗ |1⟩_a.    (14)

Eventually, a final measurement of the ancilla qubit will yield result 1, which is interpreted as a firing neuron, with probability |c_{2^n−1}|^2 = |⟨ψ_w|ψ_i⟩|^2 = |i · w*|^2 / 2^{2n}, which consists in the neuron activation function, Eq. (6).
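The end-to-end behaviour — the ancilla firing with probability |⟨ψ_w|ψ_i⟩|² — can be emulated classically with a shot-based sampler standing in for a real backend (hypothetical helper names; this is not the Qiskit API):

```python
import cmath, random

def fire_probability(theta, phi):
    """|c_{2^n-1}|^2 = |<psi_w|psi_i>|^2, the probability of measuring the ancilla in 1."""
    N = len(theta)
    overlap = sum(cmath.exp(1j * (t - p)) for t, p in zip(theta, phi)) / N
    return abs(overlap) ** 2

def run_neuron(theta, phi, shots=4096, rng=random.Random(1)):
    """Emulate repeated ancilla measurements, as a stand-in for a shot-based backend."""
    p = fire_probability(theta, phi)
    return sum(rng.random() < p for _ in range(shots)) / shots

theta = [0.2, 0.9, 1.3, 0.5]
phi   = [0.2, 0.8, 1.4, 0.5]
print(run_neuron(theta, theta))                    # -> 1.0 (identical vectors always fire)
print(round(fire_probability(theta, phi), 3))      # -> 0.995 (slightly mismatched weights)
```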
We notice that an input vector containing N = 2^n elements only requires n + 1 qubits to implement the quantum circuit above, one of them being the ancilla qubit. To avoid introducing an ancilla qubit, an alternative strategy would be to perform a joint measurement on all n qubits in the state |φ_{i,w}⟩ given in Eq. (12), with the probability of obtaining |0...0⟩ being proportional to the inner product. However, with the idea of implementing the quantum computing version of a feedforward neural network, it is essential to have a model in which information is easily transferred from each neuron to the following layer. This can be accomplished by using an ancilla qubit per artificial neuron, where the quantity of interest can be loaded [32]. The time complexity of this quantum circuit depends linearly on the dimension N of the input vectors. Indeed, the quantum circuit introduced above requires O(N) operations to implement all the phase shifts necessary to build the LME states, like Eq. (2). Depending on the relation between the input data, θ_i, other preparation schemes involving fewer operations could be devised [29]. Finally, it is worth noticing that thanks to global phase invariance, the activation function (6) can be recast as

f(θ, φ) = |(1/2^n) [1 + Σ_{k=1}^{2^n−1} e^{i[(θ_k−θ_0)−(φ_k−φ_0)]}]|^2.    (15)

By exploiting this redefinition of the parameters, it is possible to implement the same transformation employing fewer gates, since it is equivalent to leaving the state |0⟩^{⊗n} unchanged during the whole computation. Depending on the actual quantum hardware and data, further simplifications to the circuit could be obtained at compile time. In Fig. 3, the scheme of a quantum circuit implementing the artificial neuron model is shown for the specific case involving n = 2 qubits.

III. RESULTS: IMAGE RECOGNITION AND LEARNING
The quantum neuron model introduced above is an ideal candidate to perform classification tasks involving continuously valued data, such as grayscale images.

FIG. 3: Scheme of the quantum circuit for the n = 2 qubit case. The parameters are redefined as θ̃_i = θ_i − θ_0 and φ̃_i = φ_i − φ_0.

Since we make use of a phase encoding, all inputs (and weights) to the artificial neuron have to be normalized in the interval [0, 2π]. In this work we further restrict this domain for two reasons. First, values in [0, π] and [π, 2π] are fully equivalent, due to the periodicity in phase and the squared modulus in Eq. (6); second, for the same reason, states with zero or π phase yield the same activation function, which in turn means that images with inverted colors (i.e., with white and black exchanged) would be recognized as equivalent by this perceptron model. Hence, to distinguish a given image from its negative, we further restrict the input and weight elements to lie in the range [0, π/2]. Thus, an image such as the one reported in Fig. 4 is subject to the normalization (255, 170, 85, 0) → (π/2)(255, 170, 85, 0)/255 before being used as an input vector to be encoded into the quantum neuron model. We implemented and tested the quantum circuit both on simulators and on real quantum hardware, using the IBM Quantum Experience and Qiskit [33]. The results are reported in the following.

A. Numerical Results
To better appreciate the potential of the continuously valued quantum neuron, we analyse its performance in recognizing similar images. We fix the weight vector to φ = (π/2, 0, 0, π/2), which corresponds to the checkerboard pattern represented in the image of Fig. 5, and then generate a few random images to be used as inputs to the quantum neuron. For each input, the circuit is executed multiple times, thus building a statistics of the outcomes. With m = 30 randomly generated images, the results of the classification are depicted in Fig. 5, which includes the analytic results, the results of numerical simulations run on the Qiskit QASM Simulator backend, and finally the results obtained by executing the quantum circuit on the ibmqx2-yorktown (accessed in March 2020) real device. Due to errors in the actual quantum processing device, the statistics of the outcomes differ from both the simulated ones and the analytic results. Nevertheless, the same overall behaviour can easily be recognised, thus showing that the quantum neuron circuit can also be successfully implemented in an actual quantum processor, giving reliable results for such recognition tasks. The images producing the largest activation are the ones corresponding to input vectors similar to the checkerboard-like weight vector, which confirms the desired behaviour of the quantum neuron in recognizing similar images. On the contrary, the images with the lowest activation are similar to the negative of the target weight vector, as desired.

IV. LEARNING
The process of finding the appropriate value for the weights to implement a given classification is called learning, and it is generally based upon an optimization procedure in which a cost function is minimized by some gradient descent technique. Ideally, the minimum of the cost function corresponds to the targeted solution.
A simple learning task for our quantum neuron is to recognize a single given input. Starting from an input vector, θ, we aim at finding a weight vector, φ, producing a high activation. Since the activation function for our quantum neuron is given in Eq. (6), we know that perfect activation can only be obtained when the input and weight vectors are exactly coincident, θ = φ. This case can easily be checked numerically, by letting the neuron learn the right weights through a classical optimization technique.
A naive yet efficient choice for the cost function driving the learning process is L(φ) = (1 − f(θ, φ))^2, in which f(θ, φ) is the activation function of the artificial neuron with input θ and weight vector φ, as in Eq. (7). The minimum of the cost function, zero, is reached when the quantum neuron has full activation, i.e. f(θ, φ) = 1. The minimization process is driven by the Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm [34], which is built for optimization processes characterized by the presence of noise and is thus particularly effective in the presence of probabilistic measurement outputs. An actual implementation on the QASM simulator leads to the following results. The task is to recognize the input vector θ = (π/5, 0, π/3, 0.1). Using the SPSA optimizer, the cost function is minimized by varying the weight vector, as reported in Fig. 6a, where it is evident that the cost function rapidly converges to values close to zero after a few iteration steps. The solution to the problem, that is the final optimized weight vector, is φ_f = (1.03, 0.19, 1.47, 0.61), whose grayscale representation is plotted in Fig. 6b. Even if the input and weight vectors are not numerically equivalent, we can see that the final weight image actually looks very much like the target one, as expected. In fact, the two images retain almost the same shades of gray, with the optimized one being slightly shifted towards the brightest end of the spectrum, and, as we previously noticed, the neuron is blind to overall color shifts.
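A minimal SPSA loop for this single-input learning task can be sketched as follows: each iteration estimates the gradient from just two loss evaluations along a random ±1 perturbation direction. The gains a, c and the seed are illustrative choices, not the ones used in the text:

```python
import cmath, math, random

def activation(theta, phi):
    # |<psi_w|psi_i>|^2, Eq. (6)
    N = len(theta)
    return abs(sum(cmath.exp(1j * (t - p)) for t, p in zip(theta, phi)) / N) ** 2

def loss(phi, theta):
    return (1 - activation(theta, phi)) ** 2       # L(phi) = (1 - f)^2

def spsa(theta, phi0, iters=200, a=0.2, c=0.15, seed=7):
    """Minimal SPSA: two loss evaluations per step, random +-1 perturbations."""
    rng = random.Random(seed)
    phi = list(phi0)
    best_phi, best_loss = phi, loss(phi, theta)
    for k in range(iters):
        ak, ck = a / (k + 1) ** 0.602, c / (k + 1) ** 0.101   # decaying gains
        delta = [rng.choice((-1, 1)) for _ in phi]
        lp = loss([p + ck * d for p, d in zip(phi, delta)], theta)
        lm = loss([p - ck * d for p, d in zip(phi, delta)], theta)
        phi = [p - ak * (lp - lm) / (2 * ck * d) for p, d in zip(phi, delta)]
        cur = loss(phi, theta)
        if cur < best_loss:
            best_phi, best_loss = phi, cur
    return best_phi, best_loss

theta = [math.pi / 5, 0.0, math.pi / 3, 0.1]       # target input from the text
phi, final = spsa(theta, [0.0, 0.0, 0.0, 0.0])
print(final <= loss([0.0] * 4, theta))             # -> True (best-seen loss never worsens)
```

In a real run, `loss` would be evaluated from measurement statistics on the simulator rather than analytically, which is precisely the noisy setting SPSA is designed for.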
In general, when dealing with a classification task, there is more than one input vector to be classified. Let us restrict ourselves to the case of a supervised binary classification, where each input θ is associated to a binary label, y, such that y ∈ {0, 1}. Thus, the learning procedure consists in finding the right parameters (i.e. a weight vector w) for which the artificial neuron reproduces the correct association of a given input vector with its corresponding label. In order to implement this dichotomy in the perceptron model, it is common practice to introduce a threshold value, t: given an input and a weight vector, if the activation of the artificial neuron is above the value set by t, then the assigned label is 1; otherwise it is 0. A common choice for the cost function is the distance of the correct label assignment from the one implemented by the artificial neuron, which is expressed as

L(φ) = (1/M) Σ_{i=1}^{M} (y_i − ỹ_i)^2,    (16)

where M is the number of input entries, y_i is the correct label associated to the input value θ_i, and ỹ_i is the label assigned by the neuron, which is calculated as

ỹ_i = 1 if f(θ_i, φ) > t, and ỹ_i = 0 otherwise.    (17)

The learning process then consists in minimizing a cost function such as the one in Eq. (16). Generally speaking, in a supervised learning procedure the inputs are divided into two distinct sets: the training set, which contains the input values that are used to drive the learning procedure, and the test set, which contains input vectors used to test the actual classification power of the quantum neuron on data never analysed before. Now that we have introduced the general learning framework, we can apply it to a few specific cases.
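The thresholded label assignment and the cost over a small batch can be sketched as follows (our helper names; the threshold t = 0.5 is illustrative):

```python
import cmath

def activation(theta, phi):
    # |<psi_w|psi_i>|^2, Eq. (6)
    N = len(theta)
    return abs(sum(cmath.exp(1j * (t - p)) for t, p in zip(theta, phi)) / N) ** 2

def predict(theta, phi, t=0.5):
    """Label assignment: 1 if the activation exceeds the threshold t, else 0."""
    return 1 if activation(theta, phi) > t else 0

def cost(data, labels, phi, t=0.5):
    """Mean squared label mismatch over M inputs."""
    M = len(data)
    return sum((y - predict(x, phi, t)) ** 2 for x, y in zip(data, labels)) / M

phi = [0.2, 1.1, 0.3, 0.9]
data = [phi, [1.3, 0.1, 1.2, 0.0]]       # one matching input, one very different input
labels = [1, 0]
print(cost(data, labels, phi))           # -> 0.0 (both inputs classified correctly)
```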
A. Learning of two dimensional data

As a first example, we consider a classification problem of the form {x_i, y_i}_{i=1,...,M}, in which x_i = (x_{i,1}, x_{i,2}) are two dimensional input data, and y_i their labels, such as the ones represented in Fig. 7a. The color indicates the label associated to the input value, i.e., red for zero and blue for one. Since the data are two dimensional, we only need a single qubit to encode the information in the quantum state. The cost function (16) is minimized using the SPSA optimizer, and its behaviour is reported in Fig. 7c. The learning procedure converges towards a minimum of the cost function, and its value on the test set displayed in Fig. 7b amounts to L_test = 0. This can be seen in Fig. 7b, where we plot the decision boundary of the neuron along with the input values of the test set. All the calculations were performed on the QASM simulator.

B. Non-separable points using a bias
We have just shown that a single neuron is sufficient to classify some kinds of two dimensional data. The procedure might fail on datasets with more complex structure, though. For example, if one needs to classify data as in Fig. 8a, a single-qubit encoding of the quantum perceptron model is not enough. However, a quantum neuron implemented with two qubits captures more degrees of freedom, thus helping to successfully tackle the problem. In fact, with n = 2 qubits it is possible to encode 2^2 = 4 parameters, or input data. Two of these are employed to encode the actual data of interest, one can be kept fixed to zero, and the last free parameter can be interpreted as a bias. Thus, a convenient encoding scheme is to consider input vectors of the form θ = (θ_0, θ_1, θ_2, θ_3) = (0, x_1, x_2, 0), and weight vectors φ = (φ_0, φ_1, φ_2, φ_3) = (0, φ_1, φ_2, b), where b denotes the bias. After the learning procedure, reported in Fig. 8c, the test set is classified as in Fig. 8b.
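The bias-slot encoding can be illustrated by checking that changing b alone moves the activation for a fixed data point, i.e. the bias shifts the decision surface (sketch, with our own helper names):

```python
import cmath

def activation(theta, phi):
    # |<psi_w|psi_i>|^2, Eq. (6)
    N = len(theta)
    return abs(sum(cmath.exp(1j * (t - p)) for t, p in zip(theta, phi)) / N) ** 2

def encode(x1, x2):
    """Two-qubit encoding of a 2D point: data in slots 1 and 2, slots 0 and 3 fixed."""
    return [0.0, x1, x2, 0.0]

theta = encode(0.4, 1.1)
no_bias   = activation(theta, [0.0, 0.3, 1.0, 0.0])
with_bias = activation(theta, [0.0, 0.3, 1.0, 0.8])   # bias b = 0.8 in the last slot
print(no_bias != with_bias)          # -> True: the bias alone changes the activation
```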

C. MNIST dataset
As a concluding example, it is interesting to show the application of the proposed quantum neuron model to the classification of the MNIST dataset, composed of 70000 grayscale images of digits ranging from zero to nine. A selection of sample images extracted from the dataset is reported in Fig. 9. We limit ourselves to the binary problem of correctly classifying the images of zeros and ones. Since each image in the MNIST dataset is composed of 28 × 28 pixels, which is clearly not of the form 2^{n/2} × 2^{n/2} required to be encoded on the quantum state of an artificial neuron with N = 2^n input data, we modify the images by adding a number of redundant white pixels, such that the processed images have 32 × 32 pixels. A quantum artificial neuron with n = 10 qubits can thus be used to encode the input images. Here we limit our analysis to checking whether the activation function introduced in Eq. (6) is sufficient to discriminate between the encoded images of zeros and ones. With this goal in mind, we fix the weight vector of the artificial neuron to a sample "one" selected from the MNIST dataset, and then proceed to the classification of the remaining input images. Using a threshold of t = 0.85 in Eq. (17), the cost function evaluated on the input images amounts to L ∼ 0.02, which in turn means an accuracy of ∼ 98%. Fig. 10 shows the confusion matrix of some zeros and ones from the MNIST dataset evaluated with the activation function of the quantum neuron. According to the artificial neuron, the "ones" are more similar to one another than the "zeros" are. Even if classical machine learning techniques can yield a classification accuracy above 99%, the present results show a remarkable degree of precision, especially considering that in this particular example no learning or optimization procedure has been used, and just a single quantum neuron has performed the classification.
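The padding-and-rescaling preprocessing can be sketched as follows (centering the 28 × 28 image inside the 32 × 32 frame is our assumption; `preprocess` is an illustrative helper, and the dummy image stands in for a real MNIST digit):

```python
import math

def preprocess(img, target=32):
    """Pad a 28x28 grayscale image to target x target and map pixels to phases in [0, pi/2].

    The padding value 0 stands for the background; with 32x32 = 2^10 pixels,
    n = 10 qubits suffice for the phase encoding."""
    pad = (target - len(img)) // 2           # assumption: image centered in the frame
    padded = [[0] * target for _ in range(target)]
    for r, row in enumerate(img):
        for c, v in enumerate(row):
            padded[r + pad][c + pad] = v
    flat = [v for row in padded for v in row]
    return [math.pi / 2 * v / 255 for v in flat]   # normalization used in Sec. III

img = [[128] * 28 for _ in range(28)]        # dummy stand-in for an MNIST digit
phases = preprocess(img)
print(len(phases), max(phases) <= math.pi / 2)     # -> 1024 True
```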
In addition to this strategy, we also tried a pooling procedure, in which each image in the MNIST dataset is first reduced to a 4 × 4 image by means of a mean pooling filter, and then classified by the neuron. After the learning, the neuron reaches a best accuracy around 80%. Nonetheless, these preliminary results show the potential of the activation function implemented by the quantum neuron to be used for the recognition of complex patterns, such as numerical digits. Our quantum neuron model performs well when compared with other proposed quantum algorithms for the classification of the MNIST dataset. In fact, alternative algorithms have been proposed for this task, some of them using a hybrid classical-quantum approach, e.g. leveraging well established classical pre/post processing of data through classical machine learning techniques [35,36]. These hybrid approaches may yield higher (although comparable) classification accuracy when compared to our quantum neuron model. However, we emphasize that in our case the artificial neuron model is fully quantum in nature. When compared to other works using only quantum resources, our model seems to yield better results [28,37].

FIG. 10: Confusion matrix related to some of the sample "one" and "zero" images taken from the MNIST dataset, evaluated with the activation function in Eq. (6) as implemented by our quantum neuron model.

V. CONCLUSIONS
We have reported on a novel quantum algorithm implementing a generalized perceptron model on a qubit-based quantum register. This quantum artificial neuron accepts and analyzes continuously valued input data. The proposed algorithm is translated into a quantum circuit model that can readily be run on existing quantum hardware. It takes full advantage of the exponentially large Hilbert space available to encode input data in the phases of large superposition states, known as locally maximally entanglable (LME) states. These LME states can be constructed with a bottom-up approach, by imprinting each single phase separately. However, it should be stressed that alternative and possibly more efficient strategies could directly yield such states as ground states of suitable Hamiltonians, or as stationary states of dissipative processes [29]. The proposed continuously valued quantum neuron proves to be a good candidate for classification tasks on linearly non-separable two dimensional data, mostly related to pattern recognition tasks involving grayscale images. In this regard, thanks to the phase encoding, the neuron can leverage a built-in "color translational" invariance, as well as significant noise resilience. In particular, the activation function implemented by the quantum neuron yields very high accuracy, on the order of 98%, when used to discriminate between images of zeros and ones from the MNIST dataset, thus indicating the ability to distinguish also complex patterns. A further step would be to consider multiple layers of connected quantum neurons to build a continuous quantum feed-forward neural network. In addition, it would be interesting to study the application of phase encoding to other quantum machine learning techniques, such as quantum autoencoders.
An important future direction would also be to design approximate methods to perform the weight unitary transformation in a way which scales more favorably with the number of encoding qubits: this could be achieved, for example, by training suitable variational or adaptive quantum circuits.
Appendix A: Squared modulus of a sum of complex numbers

The squared modulus of the sum of a collection of complex numbers {z_i = r_i e^{iγ_i} ∈ ℂ | i = 1, . . . , N} is given as

|Σ_{i=1}^{N} z_i|^2 = Σ_{i,j} r_i r_j e^{i(γ_i − γ_j)}    (A1)
= Σ_i r_i^2 + 2 Σ_{i<j} r_i r_j cos(γ_i − γ_j),    (A2)

where in the last line the relation e^{ix} + e^{−ix} = 2 cos(x) has been applied.

Setting r_i = 1/N and γ_i = θ_i − φ_i, respectively, we finally get

|⟨ψ_w|ψ_i⟩|^2 = 1/N + (2/N^2) Σ_{i<j} cos[(θ_i − φ_i) − (θ_j − φ_j)],    (A3)

which correctly reduces to Eq. (6) in the main text, upon substituting N = 2^n and shifting the summation indices to start from zero.