Quantum Lernmatrix

We introduce a quantum Lernmatrix based on the Monte Carlo Lernmatrix in which n units are stored in the quantum superposition of log2(n) units representing O(n²/log(n)²) binary sparse coded patterns. During the retrieval phase, quantum counting of ones based on Euler's formula is used for the pattern recovery as proposed by Trugenberger. We demonstrate the quantum Lernmatrix by experiments using qiskit. We indicate why the assumption proposed by Trugenberger, namely that the lower the parameter temperature t, the better the identification of the correct answers, is not correct. Instead, we introduce a tree-like structure that increases the measured value of correct answers. We show that the cost of loading L sparse patterns into quantum states of a quantum Lernmatrix is much lower than storing the patterns individually in superposition. During the active phase, the quantum Lernmatrices are queried and the results are estimated efficiently. The required time is much lower compared with the conventional approach or with that of Grover's algorithm.


Introduction
There are two popular models of quantum associative memories, the quantum associative memory as proposed by Ventura and Martinez [1-3] and the quantum associative memory as proposed by Trugenberger [4,5]. Both models store binary patterns represented by linearly independent vectors by basis encoding. They prepare the linearly independent states by a procedure that is based on dividing the present superposition into processing and memory terms flagged by an ancilla bit. New input patterns are successively loaded into the processing branch, which is divided by a parametrized controlled-U operation on an ancilla, and then the pattern is merged, resulting in a superposition of linearly independent states. The method is linear in the number of stored patterns and their dimension [6].
In the quantum associative memory as proposed by Ventura and Martinez, a modified version of Grover's search algorithm [7-10] is applied to determine the answer vector to a query vector [1-3]. In Trugenberger's model, the retrieval mechanism is based on Euler's formula to determine whether the input pattern is similar to the set of stored patterns. In an additional step, the most similar pattern can be estimated by the introduced temperature parameter or, alternatively, by Grover's search algorithm. Both models suffer from the problem of input destruction (ID problem) [11-13]:
• The input (reading) problem: The amplitude distribution of a quantum state is initialized by reading n data points. Although the existing quantum algorithm requires only O(√n) steps or fewer and is faster than the classical algorithms, n data points must be read. Hence, the complexity of the algorithm does not improve.
• The destruction problem: A quantum associative memory [1-5] for n data points of dimension m requires only m · log(n) or fewer units (quantum bits). An operator, which acts as an oracle [3], indicates the solution. However, this memory can be queried only once because of the collapse during measurement (destruction); hence, quantum associative memory does not have any advantages over classical memory.
Most quantum machine learning algorithms suffer from the input destruction problem [13]. Trugenberger tries to overcome the destruction problem by probabilistic cloning of the quantum associative memory [4,14]. This approach was criticized in [15]. The efficient preparation of data is possible in part for sparse data [16]. However, the input destruction problem has not been solved to date, and usually theoretical speed-ups are analyzed [17] while ignoring the input problem, which is the main bottleneck for data encoding.
In our approach, the preparation costs, in which data points must be read, and the query time are represented by two phases that are analyzed independently. As in the approach of Harrow [16], our data are sparse. The sparse data are stored by one of the best possible distributed compression methods [18,19], a Lernmatrix [20,21], also called Willshaw's associative memory [22]. Our quantum Lernmatrix model is based on the Lernmatrix.
We prepare a set of quantum Lernmatrices in superposition. This preparation requires a great deal of time, and we name it the sleep phase. On the other hand, in the active phase, the query operation is extremely fast. The cost of the sleep phase and the active phase is the same as that of a conventional Lernmatrix. We assume that in the sleep phase we have enough time to prepare several quantum Lernmatrices in superposition.
The quantum Lernmatrices are kept in superposition until they are queried in the active phase. Each of the copies of the quantum Lernmatrix can be queried only once due to the destruction problem. We argue that the advantage over conventional associative memories is present in the active phase, where the fast determination of information is essential. The naming of the phases is in analogy to a living organism that prepares itself during sleep for an active day.
The quantum Lernmatrix does not store independent vectors, but units that represent the compressed binary patterns. The units are described by binary weight vectors that can be correlated, so we cannot use the approach as proposed by Ventura, Martinez and Trugenberger. Instead, we prepare the superposition of the weight vectors of the units by the entanglement of index qubits in superposition with the weight vectors. The retrieval phase is based on Euler's formula as suggested by Trugenberger [4,14]. However, we do not determine the Hamming distance to the query vector, but the number of ones of the query vector that are present in the weight vector. We indicate the quantum Lernmatrix qiskit implementation step by step. Qiskit is an open-source software development kit (SDK) for working with quantum computers at the level of circuits and algorithms from IBM [23]. The paper is organized as follows:
• We introduce the Lernmatrix model described by units that model neurons.
• We indicate that the Lernmatrix has a tremendous storage capacity, much higher than most other associative memories. This is valid for sparse, equally distributed ones in the vectors representing the information.
• Quantum counting of ones based on Euler's formula is described.
• Based on the Lernmatrix model, a quantum Lernmatrix is introduced in which units are represented in superposition and the query operation is based on quantum counting of ones. The measured result is the position of a one or zero in the answer vector.
• We analyze the Trugenberger amplification.
• Since a one in the answer vector represents information, we assume that we can reconstruct the answer vector by measuring several ones, taking for granted that the rest of the vector is zero. In a sparse code with k ones, k measurements of different ones reconstruct the binary answer vector. We can increase the probability of measuring a one by the introduced tree-like structure.
• The Lernmatrix can store many more patterns than the number of units. Because of this, the cost of loading L patterns into quantum states is much lower than storing the patterns individually.

Lernmatrix
Associative memory models human memory [24-28]. The associative memory and distributed representation incorporate the following abilities in a natural way [18,28-30]:
• The ability to correct faults if false information is given.
• The ability to complete information if some parts are missing.
• The ability to interpolate information. In other words, if a sub-symbol is not currently stored, the most similar stored sub-symbol is determined.
Different associative memory models have been proposed over the years [19,28,31-33]. The Hopfield model represents a recurrent model of the associative memory [29,31,34]; it is a dynamical system that evolves until it has converged to a stable state. The Lernmatrix, or Willshaw's associative memory, also simply called "associative memory" (if no confusion with other models is possible [32,33]), was developed by Steinbuch in 1958 as a biologically inspired model from the effort to explain the psychological phenomena of conditioning [20,21]. The goal was to produce a network that could use a binary version of Hebbian learning to form associations between pairs of binary vectors. Later, this model was studied under biological and mathematical aspects mainly by Willshaw [22] and Palm [18,24], and it was shown that this simple model has a tremendous storage capacity.
The Lernmatrix is composed of a cluster of units. Each unit represents a simple model of a real biological neuron. Each unit is composed of binary weights, which correspond to the synapses and dendrites in a real neuron (see Figure 1). The presence of a feature is indicated by a "one" component of the vector, its absence by a "zero" component of the vector. A pair of these vectors is associated, and this process of association is called learning. The first of the two vectors is called the query vector and the second, the answer vector. After learning, the query vector is presented to the associative memory and the answer vector is determined by the retrieval rule.

Learning and Retrieval
Initially, no information is stored in the associative memory. Because the information is represented in the weights, all unit weights are initially set to zero. In the learning phase, pairs of binary vectors are associated. Let x be the query vector and y the answer vector; the learning rule is

w_ij := max(w_ij, y_i · x_j). (1)

This rule is called the binary Hebbian rule [18]. Every time a pair of binary vectors is stored, this rule is used.
In the one-step retrieval phase of the associative memory, a fault tolerant answering mechanism recalls the appropriate answer vector for a query vector x.
The retrieval rule for the determination of the answer vector y is

y_i = 1 if Σ_j w_ij · x_j ≥ T, otherwise y_i = 0, (2)

where T is the threshold of the unit. The threshold T is set to the number of "one" components in the query vector x, T := |x|. If the output of the unit is 1, we say that the unit fires, and for the output 0 the unit does not fire. The cost of the one-step retrieval is O(n · m). The retrieval is called:
• Hetero-association if both vectors are different, x ≠ y;
• Association if x = y; the answer vector represents the reconstruction of the disturbed query vector.
For simplicity, we assume that the dimension of the query vector and the answer vector are the same, n = m.
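The learning and retrieval rules above can be sketched in plain Python (a minimal illustration; the function names are our own, and the three stored pairs are the running example used later in the paper):

```python
def learn(W, x, y):
    """Binary Hebbian rule: w_ij becomes 1 wherever y_i and x_j are both 1."""
    for i, yi in enumerate(y):
        for j, xj in enumerate(x):
            W[i][j] = max(W[i][j], yi * xj)

def retrieve(W, x):
    """One-step retrieval: a unit fires if its net input reaches T = |x|."""
    T = sum(x)
    return [1 if sum(w * q for w, q in zip(row, x)) >= T else 0 for row in W]

n = 4
W = [[0] * n for _ in range(n)]
# Store the three hetero-associative pairs of the running example.
for x, y in [([1, 0, 0, 1], [1, 0, 0, 1]),
             ([1, 0, 0, 0], [0, 1, 0, 0]),
             ([0, 0, 1, 0], [0, 0, 1, 0])]:
    learn(W, x, y)

print(retrieve(W, [1, 0, 0, 1]))  # → [1, 0, 0, 1]
```

Note that the rows of W are exactly the four units discussed in the quantum Lernmatrix section below.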

Storage Capacity
We analyze the optimal storage costs of the Lernmatrix. For an estimation of the asymptotic number L of vector pairs (x, y) that can be stored in an associative memory before it begins to make mistakes in the retrieval phase, it is assumed that both vectors have the same dimension n. It is also assumed that both vectors are composed of k ones, which are equally likely to be in any coordinate of the vector. In this case, it was shown [18,19,38] that the optimum value for k is approximately

k = log_2(n/4). (3)

For example, for a vector of the dimension n = 1,000,000, only k = 18 ones should be used to code a pattern according to Equation (3). For an optimal value for k according to Equation (3) with ones equally distributed over the coordinates of the vectors, approximately L vector pairs can be stored in the associative memory [18,19]. L is approximately

L = ln 2 · (n²/k²). (4)

This value is much greater than n. The estimate of L is very rough because Equation (3) is only valid for very large networks; however, the capacity increase is still considerable. The upper bound for large n is

I = n² · ln 2 = n² · 0.693; (5)

the asymptotic capacity is 69.31% per bit, which is much higher than that of most associative memories. This capacity is only valid for sparse, equally distributed ones [18]. The promise of Willshaw's associative memory is that it can store many more patterns than the number of units: the cost of loading L = (ln 2)(n²/k²) patterns into n units with k = log_2(n/4) is O(n²). This is much lower than the cost O(n · L) of storing the L patterns in a list of L units, because L > n.
The description of how to efficiently generate binary sparse codes of visual patterns or other data structures is given in [39-41]. For example, real vector patterns have to be binarized.
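The optimal k and the resulting capacity from Equations (3) and (4) can be checked numerically (a small sketch; the chosen n is the example from the text):

```python
from math import log2, log

def optimal_k(n):
    """Optimal number of ones per pattern, Equation (3): k = log2(n/4)."""
    return round(log2(n / 4))

def capacity_L(n, k):
    """Approximate number of storable pairs, Equation (4): L = ln 2 * n^2 / k^2."""
    return log(2) * n ** 2 / k ** 2

n = 1_000_000
k = optimal_k(n)
L = capacity_L(n, k)
print(k)       # → 18, as in the text
print(L > n)   # → True: far more patterns than units
```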

Large Matrices
The diagram of the weight matrix illustrates the weight distribution, which results from the distribution of the stored patterns [42,43]. Useful associative properties result from equally distributed weights over the whole weight matrix and are only present in large matrices. A high load percentage indicates an overload and the loss of the associative properties. Figure 5 represents a diagram of a highly loaded matrix with equally distributed weights. Figure 5. The weight matrix after learning 20,000 test patterns, in which ten ones were randomly set in a 2000-dimensional vector, represents a highly loaded matrix with equally distributed weights. This example shows that the weight matrix diagram often contains nearly no information. Information about the weight matrix can be extracted from the structure of the weight matrix. (White color represents weights.)

Monte Carlo Lernmatrix
The suggested probabilistic retrieval rule for the determination of the answer vector y for the query vector x replaces the hard threshold by a firing probability. For a unit i with the net value net_i = Σ_j w_ij · x_j and the threshold T = |x|, the probability of firing or not firing of one unit is

p(unit i fires) = sin²((π/2) · net_i/T), p(unit i does not fire) = cos²((π/2) · net_i/T).

During the query operation, one unit is randomly sampled and either it fires or not according to this probability distribution.
To determine the answer vector, we have to sample the Monte Carlo Lernmatrix several times. For the reconstructed vector, three states will be present: 1 for fired units, 0 for not fired units, and unknown for silent units. The Monte Carlo Lernmatrix is a close description of the quantum Lernmatrix. In the quantum Lernmatrix, units are represented by quantum states, with sampling corresponding to the measurement.
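A minimal Monte Carlo sketch of this sampling procedure (our own illustration; the firing probability sin²((π/2) · net/T) is the trigonometric rule used by the quantum counting below):

```python
import random
from math import sin, pi

def sample_unit(W, x, rng):
    """Sample one unit at random; it fires with probability sin^2((pi/2) * net/T)."""
    T = sum(x)
    i = rng.randrange(len(W))
    net = sum(w * q for w, q in zip(W[i], x))
    p_fire = sin(pi * net / (2 * T)) ** 2
    return i, 1 if rng.random() < p_fire else 0

# Weight matrix of the running example; query x_q = (1, 0, 0, 1).
W = [[1, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 1]]
x = [1, 0, 0, 1]
rng = random.Random(0)

# Repeated sampling: units 0 and 3 (net = T) always fire, unit 2 (net = 0) never.
answer = {}
for _ in range(200):
    i, fired = sample_unit(W, x, rng)
    answer[i] = fired
print(answer[0], answer[2], answer[3])  # → 1 0 1
```

Unit 1 (net = 1, T = 2) fires with probability 1/2, so repeated sampling leaves it in the "unknown" state described above.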

Qiskit Experiments
Qiskit is an open-source software development kit (SDK) for working with quantum computers at the level of circuits and algorithms [23] (IBM Quantum, https://quantumcomputing.ibm.com/, accessed on 25 May 2023, Qiskit Version 0.43.0). Qiskit provides tools for creating and manipulating quantum programs and running them on prototype quantum devices on the IBM Quantum Experience or on simulators on a local computer. It follows the quantum circuit model for universal quantum computation and can be used for any quantum hardware that follows this model. Qiskit provides different backend simulator functions.
In our experiments, we use the statevector simulator. It performs an ideal execution of qiskit circuits and returns the final state vector of the simulator after the application of the circuit (for all qubits). The state vector of the circuit represents the probability amplitudes that correspond to the multiplication of the initial state vector by the unitary matrix that represents the circuit. We use the statevector simulator to check the value of all qubits.
If we want to simulate an actual device of today, which is prone to noise resulting from decoherence, we can use the qasm simulator. It returns counts, which are a sampling of the measured qubits that have to be defined in the circuit. One can easily port the simulation by replacing simulator = Aer.get_backend('statevector_simulator') with simulator = Aer.get_backend('qasm_simulator'); the command qc.measure(qubits, c) indicates that we measure the qubits (the counting begins with zero, not one) and store the result of the measurement in the conventional bits c.
Our description involves simple quantum circuits using basic quantum gates that can be easily ported to other quantum software development kits.

Quantum Counting Ones
In a binary string of the length N, we can represent the fraction of the k ones by the simple formula k/N and of the zeros as (N − k)/N, resulting in a linear relation. We can interpret these numbers as probability values. We can map these linear relations into sigmoid-like probability functions using Euler's formula [4] in relation to trigonometry: for the presence of ones

p(one) = sin²(π · k/(2 · N))

and for zeros

p(zero) = cos²(π · k/(2 · N)),

together with sin² + cos² = 1. In Figure 6, the sigmoid-like probability functions for N = 8 are indicated. This operation can be implemented by quantum counting of ones. In our example, the state |101⟩ is represented by N = 3 qubits, of which two (k = 2) are one.
To count the number of ones, we introduce a control qubit in the superposition 1/√2 · (|0⟩ + |1⟩). For the superposition part represented by the control qubit |0⟩, the phase e^(i·π/(2·3)) is applied for each one. For the superposition part represented by the control qubit |1⟩, the phase e^(−i·π/(2·3)) is applied for each one.
If we apply a Hadamard gate to the control qubit [4], we obtain the probability of measuring the control qubit |0⟩ as

p(|0⟩) = p(|0101⟩) = cos²(π · 2/(2 · 3)) = 0.25

and the probability of measuring the control qubit |1⟩ as

p(|1⟩) = p(|1101⟩) = sin²(π · 2/(2 · 3)) = 0.75,

indicating the presence of two ones. The representation of the circuit in qiskit begins with

from qiskit import QuantumCircuit, Aer, execute
from qiskit.visualization import plot_histogram
from math import pi
qc = QuantumCircuit(4)

The resulting quantum circuit is represented in Figure 7 and the resulting histogram of the measured qubits is represented in Figure 8.
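The two probabilities can be checked directly from the trigonometric rule (a small numeric sketch of the counting rule, not a full circuit simulation):

```python
from math import sin, cos, pi

def counting_probabilities(k, N):
    """Probabilities of measuring the control qubit as |1> or |0> for k ones in N qubits."""
    p_one = sin(pi * k / (2 * N)) ** 2   # control |1>: evidence for the ones
    p_zero = cos(pi * k / (2 * N)) ** 2  # control |0>
    return p_one, p_zero

# The example state |101> : N = 3 qubits, of which k = 2 are one.
p1, p0 = counting_probabilities(2, 3)
print(round(p0, 2), round(p1, 2))  # → 0.25 0.75
```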

Quantum Lernmatrix
Useful associative properties result from equally distributed weights over the whole weight matrix and are only present in large matrices. In our examples, we examine toy examples as a proof of concept for future quantum associative memories.
The superposition of the weight vectors of the units is based on the entanglement of the index qubits that are in the superposition with the weight vectors. The count is represented by a unary string of qubits that controls the phase operation. It represents the net value of the Lernmatrix. The phase information is the basis of the quantum counting of ones that increases the probability of measuring the correct units representing ones in the answer vector. We will represent n units in superposition by entanglement with the index qubits.
To represent four units, we need two index qubits in superposition. Each index state of the qubits is entangled with a pattern by the Toffoli gate, also called the ccX gate (CCNOT gate, controlled-controlled-not gate), by setting a corresponding qubit to one. In our example, we store three patterns x1 = (1, 0, 0, 1), y1 = (1, 0, 0, 1); x2 = (1, 0, 0, 0), y2 = (0, 1, 0, 0) and x3 = (0, 0, 1, 0), y3 = (0, 0, 1, 0), resulting in the weight matrix represented by four units (see Figure 9). After the entanglement of the index qubits |index_j⟩ in superposition with the weight vectors, the following state is present, with the states count_j and unit_j represented by four qubits each for the four binary weights and |unit_j⟩ = |(w1 w2 w3 w4)_j⟩ (see Figure 10). Figure 10. The quantum circuit that produces the sleep phase. The qubits 0 to 3 represent the query vector, the qubits 4 to 7 the associative memory, the qubits 8 to 11 represent the count, and the qubits 12 and 13 are the index qubits, while the qubit 14 is the control qubit.
The value |count_j⟩ is the unary representation of the Lernmatrix value net_j. We include the query vector x_q = (1, 0, 0, 1); the resulting histogram of the measured qubits is represented in Figure 11. The qubits 0 to 3 represent the query vector x_q = (1, 0, 0, 1), the qubits 4 to 7 the associative memory, the qubits 8 to 11 represent the count, the qubits 12 and 13 are the index qubits, and the control qubit 14 is zero. Note that the units are indexed in reverse order by the index qubits: 11 for the first unit, 10 for the third unit, 01 for the second unit and 00 for the fourth unit.
In the next step, we describe the active phase (see Figure 12). For simplicity, we will ignore the index qubits, since they are not important in the active phase. We perform quantum counting using the control qubit that is set in superposition. We apply the controlled phase operation with N = 2, since two ones are present in the query vector and count_j ≤ 2, and, applying the Hadamard gate to the control qubit, we obtain the final state. The architecture is described by fifteen qubits, see Appendix A. With the query vector x_q = (1, 0, 0, 1), the units represented by the states have the following values:
• The first unit has the value count_1 = 2 and the two corresponding states are: for the control qubit = 1, the amplitude factor is 1 = sin(π/2) with the measured probability (sin(π/2) · 1/2)² = 0.25, and for the control qubit = 0, the amplitude factor is 0 = cos(π/2) with the measured probability 0.
• The second unit has the value count_2 = 1 and the two corresponding states are: for the control qubit = 1, the amplitude factor is 1/√2 = sin(π/4) with the measured probability (sin(π/4) · 1/2)² = 0.125, and for the control qubit = 0, the amplitude factor is 1/√2 = cos(π/4) with the measured probability (cos(π/4) · 1/2)² = 0.125.

• The third unit has the value count_3 = 0 and the two corresponding states are: for the control qubit = 1, the amplitude factor is 0 = sin(0) with the measured probability 0, and for the control qubit = 0, the amplitude factor is 1 = cos(0) with the measured probability (cos(0) · 1/2)² = 0.25.
• The fourth unit has the value count_4 = 2 and the two corresponding states are: for the control qubit = 1, the amplitude factor is 1 = sin(π/2) with the measured probability (sin(π/2) · 1/2)² = 0.25, and for the control qubit = 0, the amplitude factor is 0 = cos(π/2) with the measured probability 0.
There are five states with probabilities not equal to zero, see Figure 13. The measured probability (control qubit = 1) indicating a firing of the units is 0.625.
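The probabilities of the four units can be reproduced numerically (a sketch of the amplitude bookkeeping; the uniform amplitude 1/2 comes from the four units in superposition):

```python
from math import sin, cos, pi

counts = [2, 1, 0, 2]  # net values of the four units for x_q = (1, 0, 0, 1)
N = 2                  # number of ones in the query vector
amp = 1 / 2            # uniform amplitude of each of the four units

p_fire = [(sin(pi * c / (2 * N)) * amp) ** 2 for c in counts]
p_rest = [(cos(pi * c / (2 * N)) * amp) ** 2 for c in counts]

print([round(p, 3) for p in p_fire])  # → [0.25, 0.125, 0.0, 0.25]
print(round(sum(p_fire), 3))          # → 0.625, probability of control qubit = 1
```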

Generalization
We can generalize the description for n units. After the entanglement of the index qubits in superposition with the weight vectors, the following state is present, with the states count_j and unit_j represented as in [4,5], at the cost O(n²). We apply the control qubit (ignoring the index qubits) and the controlled phase operation with N equal to the number of present ones in the query vector, acting on each |count_j⟩|unit_j⟩ ⊗ |query⟩ term; applying the Hadamard gate to the control qubit, we obtain the final result. The cost of one query is O(n) and for k = log_2(n/4) queries O(log(n) · n).

Applying Trugenberger Amplification Several Times
According to Trugenberger [5], applying the control qubit sequentially b times results in a state with b control qubits, with |v⟩ being the binary representation of the decimal value v. The idea is then to measure the b control qubits, until the desired state is obtained. Trugenberger identifies the inverse parameter b as a temperature t = 1/b and concludes that the accuracy of pattern recall can be tuned by adjusting a parameter playing the role of an effective temperature [5].
In Figure 18, the control qubit was applied two times to the quantum circuit of Figure 10. Figure 19 represents the resulting histogram of the measured qubits. Figure 18. Circuit representing the application of the control qubit two times for the quantum circuit of Figure 10.

Relation to b
Trugenberger [5] identifies t = 1/b as a temperature and concludes: the lower t, the better one can identify the desired states. Assume we have eight states indicated by the index qubits 2, 3 and 4; one marked state 010 has the count two, and the other seven states have the count of one, see Figure 20. Figure 21 represents the resulting histogram of the measured qubits (b = 1) and Figure 22 represents the resulting histogram after applying the control qubit two times (b = 2). Now we can take the idea further and generalize it. For n states, one state is marked with the count of 2, and all remaining states have the count of 1. Since there are n states, the marked state has the probability value 1/n and the 2 · (n − 1) remaining states have the probability value p(x). After measuring the control qubit at step b, the probability of the marked state grows (see Figure 23), but the probability of measuring the control qubit at step b shrinks. With the assumption of independence, measuring the control qubits in the sequence b = 1, b = 2, b = 3, · · · , b_B gives

p(control_1, control_2, · · · , control_B) = ∏_{j=1}^{B} p(control_j), (28)

which results in a low probability (see Figure 23). The assumption that "if t is lower (b is higher), then the determination of the desired states is better" is not correct. As a consequence, we can measure the sequential control qubits two times (b = 2) before the task becomes intractable. Figure 23. With the assumption of independence, measuring the control qubits in the sequence b = 1, b = 2, b = 3, · · · , b_B results in a low probability indicated by the circles. The x-axis indicates the number of measurements b of the control qubits. As a consequence, we can measure the sequential control qubits two times before the task becomes intractable.
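The trade-off can be made concrete under the stated assumptions (one marked state with count 2, n − 1 states with count 1, N = 2). The closed forms below are our own derivation from the sin²/cos² counting rule, under which each post-selected control measurement multiplies the weight of the unmarked states by sin²(π/4) = 1/2:

```python
def p_marked_given_success(n, b):
    """Probability of the marked state after post-selecting b control qubits = 1."""
    return 1 / (1 + (n - 1) / 2 ** b)

def p_success(n, b):
    """Probability of actually measuring b control qubits = 1 in sequence."""
    return (1 + (n - 1) / 2 ** b) / n

n = 8
for b in [1, 2, 3, 6]:
    print(b, round(p_marked_given_success(n, b), 3), round(p_success(n, b), 3))

# The conditional probability of the marked state grows with b, but the
# probability of obtaining the required sequence of ones shrinks; their
# product is the constant 1/n, so larger b brings no net gain.
```

This is exactly the point of the section: the joint probability of identifying the marked state never exceeds what b = 1 or b = 2 already provides.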

Tree-like Structures
We want to increase the probability of measuring the correct units representing the ones in the answer vector and decrease the probability of measuring the zeros. For example, in a sparse code with k ones, k measurements of different ones reconstruct the binary answer vector, and we cannot use the idea of applying the Trugenberger amplification several times, as indicated before. Instead, we can increase the probability of measuring a one by the introduced tree-like structure [44]. The tree-like hierarchical associative memory approach is based on the aggregation of neighboring units [44]. The aggregation is a Boolean OR-based transform of two or three neighboring weights of units, resulting in a more dense memory, see Figure 24. Figure 24. (a) In our example, we store three patterns, x1 = (1, 0, 0, 1), y1 = (1, 0, 0, 1); x2 = (1, 0, 0, 0), y2 = (0, 1, 0, 0) and x3 = (0, 0, 1, 0), y3 = (0, 0, 1, 0), and the query vector is x_q = (1, 0, 0, 1). (b) The aggregation is a Boolean OR-based transform of two neighboring weights of units, resulting in a more dense memory with x_q = (1, 0, 0, 1, 1, 0, 0, 1). It was shown by computer experiments that an aggregation value between two and three is an optimal one [45]. The more dense memory is copied on top of the original memory. Depending on the number of units, we can repeat the process in which we aggregate groups of two to three neighboring groups of equal units. We can continue the process until we arrive at two different groups of different units; the number of possible different aggregated memories is logarithmic, with log(n − 1). Since in our example only four units are present, we aggregate two units, resulting in a memory of four units described by two identical units each.
The query vector is composed of log(n − 1) concatenated copies of the original query vector; in our example, x_q = (1, 0, 0, 1, 1, 0, 0, 1). We apply the controlled phase operation with N = 4 and count_j ≤ 4, see Figure 24 and Appendix B. The measured probability (control qubit = 1) indicating a firing of the units is 0.838, and there are six states with probabilities not equal to zero, see Figure 25 and compare with Figure 11.
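The increase from 0.625 to 0.838 can be reproduced with a small sketch (our own reconstruction of the OR-aggregation: each unit is extended by the OR of its group of two neighboring units, and the query vector is duplicated):

```python
from math import sin, pi

# Units (rows of the weight matrix) of the running example.
units = [[1, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 1]]
query = [1, 0, 0, 1]

def boolean_or(a, b):
    return [x | y for x, y in zip(a, b)]

# Aggregate neighboring pairs: units 0,1 form one group, units 2,3 the other.
agg = [boolean_or(units[0], units[1]), boolean_or(units[2], units[3])]

# Each unit is extended by the aggregated weights of its group;
# the query vector is concatenated with itself, so N = 4 ones.
ext_units = [u + agg[i // 2] for i, u in enumerate(units)]
ext_query = query + query
N = sum(ext_query)

amp = 1 / 2  # uniform amplitude of the four units
p_fire = []
for u in ext_units:
    count = sum(w * q for w, q in zip(u, ext_query))
    p_fire.append((sin(pi * count / (2 * N)) * amp) ** 2)

print(round(sum(p_fire), 3))  # → 0.838, compared with 0.625 without aggregation
```

The aggregation raises the counts of all units that share a group with a firing unit, which pushes the firing probabilities closer to one.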

Costs
We cannot clone an arbitrary quantum state; however, it was proposed that a quantum state can be probabilistically cloned up to a mirror modular transformation [14]. In an alternative approach, we prepare a set of quantum Lernmatrices in superposition. This preparation requires a great deal of time, and we name it the sleep phase. The cost of storing L = (ln 2)(n²/k²) patterns in n units with k = log_2(n/4) in a Lernmatrix, and consequently in the quantum Lernmatrix, is O(n²) [18,19]. On the other hand, in the active phase, the query operation is extremely fast.

Query Cost of Quantum Lernmatrix
During the active phase, the quantum Lernmatrices are sampled with minimal costs in time. In Figure 26a, we compare the query cost of k queries of the quantum Lernmatrix representing the weight matrix of the size n × n to the cost of a classical Lernmatrix of the size n × n, which are O(log(n) · n) < O(n 2 ).
In Figure 26b, we compare the query cost of k queries of the quantum Lernmatrix representing the weight matrix of the size n × n to Grover's amplification algorithm on a list of L vectors of dimension n, which is O(n · √L) = O(n²/log(n)).
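The three costs can be compared numerically (a sketch; the constant factors are dropped, as in the O-notation):

```python
from math import log2, log, sqrt

def costs(n):
    """Compare query costs for one weight matrix of size n x n (constants dropped)."""
    k = log2(n / 4)             # Equation (3)
    L = log(2) * n ** 2 / k ** 2  # Equation (4)
    quantum = k * n             # k queries of the quantum Lernmatrix: O(log(n) * n)
    classical = n ** 2          # one-step retrieval of the classical Lernmatrix
    grover = n * sqrt(L)        # Grover's amplification on a list of L vectors
    return quantum, classical, grover

q, c, g = costs(1_000_000)
print(q < g < c)  # → True: the quantum Lernmatrix query is the cheapest
```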

Conclusions
We introduced a quantum Lernmatrix based on the Monte Carlo Lernmatrix and performed experiments using qiskit as a proof of concept for future quantum associative memories. We proposed a tree-like structure that increases the measured value of the control qubit indicating a firing of the units. Our approach does not solve the input destruction problem but gives a hint of how to deal with it. We represent the preparation costs and the query time by two phases.
The cost of the sleep phase and the active phase is the same as that of a conventional associative memory, O(n²). We assume that in the sleep phase we have enough time to prepare several quantum Lernmatrices in superposition. The quantum Lernmatrices are kept in superposition until they are queried in the active phase. Each of the copies of the quantum Lernmatrix can be queried only once. We argue that the advantage over conventional associative memories is present in the active phase, where the fast determination of information O(log(n) · n) is essential, by the use of quantum Lernmatrices in superposition compared to the cost of the classical Lernmatrix O(n²).

Funding: This work was supported by national funds through FCT, Fundação para a Ciência e a Tecnologia, under project UIDB/50021/2020. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The authors declare no conflicts of interest. This article does not contain any studies with human participants or animals performed by any of the authors.