Design of Fully Analogue Artificial Neural Network with Learning Based on Backpropagation

Abstract

A fully analogue implementation of training algorithms would speed up the training of artificial neural networks. A common choice for training feedforward networks is backpropagation with stochastic gradient descent. However, a circuit design that would enable its analogue implementation is still an open problem. This paper proposes a fully analogue training circuit block concept based on backpropagation for neural networks without clock control. Capacitors are used as memory elements in the presented example. The XOR problem is used as an example for concept-level system validation.


Introduction
Edge computation architectures minimise data movement by performing computations close to the sensors. The significant computing power in this domain motivates the design of hardware for near-sensor data analysis that is not affected by the von Neumann bottleneck [1][2][3][4]. This field has also been influenced by the recent advancement of Internet-of-Things (IoT) devices [5][6][7].
While hardware-based ANNs currently exist, their training has to be performed with specialised software and FPGA-based units. An analogue implementation of backpropagation would enable the creation of fully hardware-based ANNs, which could be trained on-chip [2], [15]. Even though this can diminish structural flexibility, it also offers a significant speed-up in the training process of artificial neural networks [7], [10], [11].
For that, an electronic component is needed that can serve as analogue memory and can be updated. State-of-the-art articles use memristors or memristive crossbar arrays for this purpose [2-4], [6], [7], [10], [16]. Unlike CMOS technology, memristors cannot yet work precisely in large-scale multi-level simulations [7], [10]. That is why capacitors are used as memory elements in the proposed block concept. Capacitors occupy a lot of space in VLSI, but fast near-sensor applications do not always require large neural network structures. The capacitors can be replaced by other technologies during transistor-level design.
Many designs of on-chip learning processes operate in discrete steps [7], [10], [11]. This allows a more flexible structure, but it slows down the learning process and prevents the use of analogue sensor inputs without sampling. That is why this design does not use any clock control: it avoids synchronisation signals, which makes this concept fully analogue and fully parallel [2], [15].
For the above-mentioned reasons, this article presents and verifies a novel concept of a fully analogue circuit implementation of the training process of an artificial neural network, inspired by backpropagation and based on gradient descent.

Background
The speedup of the training process of a neural network in this article is based on several principles. The training process is designed to be fully parallel [2], [7], [15]. Further, any clock control that holds back the propagating signal is avoided. The training process is realised by analogue electric circuit feedback; thus, it is fully analogue [15]. The design consists of two parts: forward propagation and backward propagation. Inputs, outputs and weights of a classical neural network [12] are represented as follows: input $x_i$ as $V_{\text{in}\,i}$, output $\hat{y}_i$ as $V_{\text{out}\,i}$, target $y_i$ as $V_{\text{target}\,i}$ and weight $w_i$ as $V_{\text{w}\,i}$. For the transistor-level implementation, current can be used instead of voltage for $x_i$, $\hat{y}_i$ and $y_i$.
The forward propagation of the designed neural network is defined at the level of a single neuron by

$$V_{\text{out}} = S\!\left(\sum_{i=0}^{N} V_{\text{in}\,i}\, V_{\text{w}\,i}\right), \tag{1}$$

where $V_{\text{out}}$ is the output signal (here, the voltage of this neuron), $S(\cdot)$ is the activation function of this neuron, $V_{\text{in}\,i}$ is one of the input signals of this neuron and $V_{\text{w}\,i}$ is the weight for input $i$, where for $i = 0$ the $V_{\text{in}\,0}$ is a bias input.
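For illustration, a minimal Python sketch of (1) follows, assuming a unit-gain multiplier and a normalised sigmoid; all numerical values are illustrative and not taken from the article.

```python
import numpy as np

def neuron_forward(v_in, v_w, activation):
    """Forward pass of one analogue neuron, eq. (1).

    v_in: input voltages, where index 0 is the bias input V_in0;
    v_w:  weight voltages stored on the neuron's capacitors.
    """
    net = np.dot(v_in, v_w)   # weighted sum, realised as the current I_net
    return activation(net)    # activation block produces V_out

sigmoid = lambda net: 1.0 / (1.0 + np.exp(-net))   # normalised sigmoid
v_in = np.array([1.0, 0.3, 0.7])    # bias plus two signal inputs [V]
v_w  = np.array([0.1, 0.5, -0.2])   # capacitor voltages [V]
print(neuron_forward(v_in, v_w, sigmoid))
```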
There are two parts to the backpropagation between layers. The first part solves the calculation of the error at the output layer of a neural network (Type 1). It is possible to define the error of the NN's output layer as the Mean Square Error (MSE) [12], [17] by

$$E = \frac{1}{N}\sum_{i=1}^{N}\left(V_{\text{target}\,i} - V_{\text{out}\,i}\right)^2, \tag{2}$$

where $E\,[\mathrm{V}^2]$ is the error, $N$ is the number of outputs, $V_{\text{target}\,i}$ is the ideal output and $V_{\text{out}\,i}$ is the obtained output signal. The second part solves the backpropagation of the error between the hidden layers of a neural network (Type 2).
The backpropagation circuit implementation is based on the well-known backpropagation algorithm with the stochastic gradient descent method, where the update of the weight [12], [13] is defined by

$$w_{k+1} = w_k - \eta\,\frac{\partial E}{\partial w_k}, \tag{3}$$

where $w_k$ is a weight at step $k \in \mathbb{N}$, $E$ is the error and $\eta$ is a learning rate.
The fully Analogue Artificial Neural Network (AANN) does not operate by separate and distinct steps. The proposed design for the update of the weight uses

$$V_{\text{w}}(t) = V_{\text{w}}(0) - \eta \int_{0}^{t} \frac{\partial E(\tau)}{\partial V_{\text{w}}(\tau)}\, \mathrm{d}\tau, \tag{4}$$

where $V_{\text{w}}(t)$ is a function of the weight at continuous time $t \in \mathbb{R}$.
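To make the relation between (3) and (4) concrete, the following sketch contrasts the discrete SGD step with an Euler integration of the continuous-time update on a toy one-weight problem; the error function and all constants are assumptions chosen only for illustration.

```python
import numpy as np

# Toy one-weight problem: E(V_w) = (V_target - V_w * V_in)^2
v_in, v_target = 0.8, 0.5
grad = lambda v_w: -2.0 * v_in * (v_target - v_w * v_in)   # dE/dV_w

# Discrete SGD, eq. (3): w_{k+1} = w_k - eta * dE/dw_k
eta, w = 0.1, 0.0
for _ in range(200):
    w -= eta * grad(w)

# Continuous update, eq. (4): dV_w/dt = -eta * dE/dV_w, Euler-integrated
v_w, dt = 0.0, 1e-3
for _ in range(100000):
    v_w -= eta * grad(v_w) * dt

print(w, v_w, v_target / v_in)   # both approach the optimum 0.625
```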

Circuit Design
The proposed AANN is based on a structure that allows the construction of most types of neural networks, such as RNNs, CNNs, etc. Each of these structures consists of the same cells, called neurons. The proposed hardware concept of a fully analogue neuron with training is illustrated in Fig. 1. It consists of voltage multipliers, capacitors representing the weights of a neuron, and blocks representing the activation function and its derivative. The signals can be realised by voltages or currents depending on the technology used. For example, a multiplier can be implemented in both voltage and current mode. However, the capacitors have to be charged by a current, and the capacitor voltages are used as the memory quantity, see Sec. 3.2. The green boundary indicates the forward-propagation part of the circuit. The blue boundary indicates the backpropagation part of the circuit.
The multiplier block has the schematic symbol shown in Fig. 2. It multiplies two voltages [19], [21] and its output is a current calculated by

$$I_{\text{out}} = K_{\text{M}}\, V_1 V_2, \tag{5}$$

where $K_{\text{M}}\,[\mathrm{A/V^2}]$ is the ratio of the output to the product of the inputs.

Figure 3 shows the schematic symbol of the block representing an activation function. A sigmoid is the most commonly used activation function, which is described by

$$S(I_{\text{net}}) = \frac{V_{\text{amp}}}{1 + \mathrm{e}^{-I_{\text{net}}/I_{\text{ref}}}}, \tag{6}$$

where $V_{\text{amp}}$ is a constant determining the maximum possible voltage and $I_{\text{ref}}$ is a referential current.

Figure 4 shows the schematic symbol of the block representing the derivative of an activation function. For the sigmoid activation function, the block is described by

$$S'(V) = K_{\text{m}}\,\frac{V}{V_{\text{ref}}}\left(1 - \frac{V}{V_{\text{ref}}}\right), \tag{7}$$

where $K_{\text{m}}$ is a proportionality voltage constant and $V_{\text{ref}}$ is a referential voltage.
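A behavioural Python model of the three blocks may look as follows; the parameter values are placeholders, and the form of (7) follows the reconstruction above, so both should be treated as assumptions rather than the article's exact definitions.

```python
import numpy as np

# Illustrative block parameters (assumed values, not from the article):
K_M   = 1e-4   # multiplier gain [A/V^2]
V_amp = 1.0    # maximum sigmoid output voltage [V]
I_ref = 1e-4   # referential current [A]
K_m   = 1.0    # proportionality voltage constant [V]
V_ref = 1.0    # referential voltage [V]

def multiplier(v1, v2):
    """Multiplier block, eq. (5): output current proportional to v1*v2."""
    return K_M * v1 * v2

def activation(i_net):
    """Activation block, eq. (6): sigmoid of the summed current I_net."""
    return V_amp / (1.0 + np.exp(-i_net / I_ref))

def activation_derivative(v):
    """Derivative block, eq. (7), evaluated from the sigmoid output voltage."""
    return K_m * (v / V_ref) * (1.0 - v / V_ref)
```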

Implementation of Forward Propagation
There have been many solutions to the problem of hardware implementation of the forward propagation of neural networks [1], [7-10], [15], [18], [19]. The whole behaviour is described by (1), and the analogue design is shown in the green area of Fig. 1. The weights correspond to the voltages of the capacitors [15]. Each input is connected to a multiplier, whose output is a current given by (5). The activation function is applied to the sum of these currents, $I_{\text{net}}$ [18].
The crucial element in this type of implementation is the capacitor. Each neuron has $N + 1$ capacitors as weights, where $N$ is the number of inputs. For example, the neural network in Fig. 7 has 17 capacitors: three hidden neurons with two inputs each ($3 \times 3 = 9$) and two output neurons with three inputs each ($2 \times 4 = 8$). However, larger structures require a bigger chip area and a higher current consumption. The exact implementation depends on the technology, but there could be tens of thousands of capacitors on 1 mm² of a chip. That is sufficient for a large number of fast near-sensor applications.
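The capacitor budget of a given topology follows directly from the $N + 1$ rule; a small helper illustrating the count (the layer-size argument is an assumed representation):

```python
def capacitor_count(layer_sizes):
    """Weight capacitors in a feedforward AANN.

    A layer of m neurons fed by n signals needs m * (n + 1)
    capacitors (n inputs plus one bias per neuron).
    """
    return sum(m * (n + 1) for n, m in zip(layer_sizes, layer_sizes[1:]))

print(capacitor_count([2, 3, 2]))   # network of Fig. 7 -> 17
```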

Implementation of Backpropagation
The training process is described by (4). The weight update is implemented by charging a capacitor, which is described by

$$V_{\text{w}}(t) = \frac{1}{C} \int_{0}^{t} I_{\text{charged}}(\tau)\, \mathrm{d}\tau + V_{\text{w}}(0), \tag{8}$$

where $t$ is the continuous time and $C$ is the capacitance. For this implementation, the current is denoted as $I_{\text{charged}}$ and defined by

$$I_{\text{charged}} = -K_{\eta}\, \frac{\partial E}{\partial V_{\text{w}}}, \tag{9}$$

where $K_{\eta}\,[\mathrm{S}]$ is the conductance coefficient. Substitution of (9) into (8) gives

$$V_{\text{w}}(t) = -\frac{K_{\eta}}{C} \int_{0}^{t} \frac{\partial E(\tau)}{\partial V_{\text{w}}(\tau)}\, \mathrm{d}\tau + V_{\text{w}}(0). \tag{10}$$

Comparison of (10) and (3) leads to the definition of the learning rate as $K_{\eta}/C \equiv \eta$. The process of a weight update is illustrated in Fig. 5.
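The following Euler-integration sketch mimics (8)-(10) for a single weight. The capacitance matches the simulation value from Sec. 4.4, while $K_{\eta}$, the toy error function and the time step are assumptions for illustration.

```python
C     = 0.1e-12    # capacitance [F], value used in the simulations (Sec. 4.4)
K_eta = 1e-6       # conductance coefficient [S] (assumed value)
eta   = K_eta / C  # equivalent learning rate [1/s], since K_eta/C == eta

# Toy error E = (V_target - V_w)^2, so dE/dV_w = -2 (V_target - V_w)
v_target = 0.5
dE_dVw = lambda v_w: -2.0 * (v_target - v_w)

v_w, dt = 0.0, 1e-9            # initial capacitor voltage [V], time step [s]
for _ in range(1000):
    i_charged = -K_eta * dE_dVw(v_w)   # eq. (9)
    v_w += i_charged * dt / C          # eq. (8) in differential form
print(v_w)                             # converges towards v_target
```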
The partial derivative of the error with respect to a weight $V_{\text{w}\,i}$ is calculated by using the chain rule twice:

$$\frac{\partial E}{\partial V_{\text{w}\,i}} = \frac{\partial E}{\partial V_{\text{out}}} \cdot \frac{\partial V_{\text{out}}}{\partial net} \cdot \frac{\partial net}{\partial V_{\text{w}\,i}}, \tag{11}$$

where

$$net = \sum_{i=0}^{N} V_{\text{in}\,i}\, V_{\text{w}\,i}. \tag{12}$$

The first partial derivative from (11) is denoted in Fig. 1 as the voltage $V_{\text{E}}$. The last partial derivative from (11) is solved as

$$\frac{\partial net}{\partial V_{\text{w}\,i}} = V_{\text{in}\,i}. \tag{13}$$

The remaining partial derivative cannot be solved generally. However, it is the derivative of the activation function. If the activation function is known, it is possible to calculate and implement this partial derivative as a separate sub-circuit.
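A numerical counterpart of the chain rule (11)-(13) for a single output neuron, assuming the normalised sigmoid and unit-gain multipliers used in the earlier sketches:

```python
import numpy as np

def weight_gradients(v_in, v_w, v_target):
    """dE/dV_w_i for one output neuron via eqs. (11)-(13)."""
    net   = np.dot(v_in, v_w)               # eq. (12)
    v_out = 1.0 / (1.0 + np.exp(-net))      # activation block
    dE_dVout   = 2.0 * (v_out - v_target)   # first factor (eq. (14), N = 1)
    dVout_dnet = v_out * (1.0 - v_out)      # derivative block, sigmoid case
    dnet_dVw   = v_in                       # last factor, eq. (13)
    return dE_dVout * dVout_dnet * dnet_dVw

v_in = np.array([1.0, 0.3, 0.7])   # bias plus two inputs [V]
v_w  = np.array([0.1, 0.5, -0.2])
print(weight_gradients(v_in, v_w, v_target=1.0))
```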
The voltage $V_{\text{teach}}$ determines whether the error is propagated and the weights are updated in the neuron. If this voltage is set to 0 V, only the forward-propagation part of the circuit is active. If $V_{\text{teach}} > 0$, the weights are being updated and the network is learning. The magnitude of $V_{\text{teach}}$ determines the magnitude of the analogue learning rate. In the simulations in Sec. 4, $V_{\text{teach}}$ is set to 0 V for the forward-propagation mode and to 1 V for the learning mode.

Implementation of Backpropagation Between Layers
For simplicity, the whole circuit of the analogue neuron shown in Fig. 1 will be represented by the symbol shown in Fig. 6.
An example of an AANN composed of these neurons is presented in Fig. 7, which demonstrates a neural network with three layers. It contains two input signals represented as voltage sources $V_{\text{in}\,1}$ and $V_{\text{in}\,2}$. The hidden layer in the figure is created by three neurons $N^2_1$, $N^2_2$ and $N^2_3$. The output layer is created by neurons $N^3_1$ and $N^3_2$. The blue area on the right side (denoted as Type 1) represents the error propagation for the output layer. It comes from the first partial derivative of (11) for the output layer and can be calculated as

$$\frac{\partial E}{\partial V_{\text{out}\,i}} = \frac{2}{N}\left(V_{\text{out}\,i} - V_{\text{target}\,i}\right). \tag{14}$$

The backpropagation of each neuron in the hidden layers depends on the errors of all neurons that connect it, directly or indirectly, to the output. So, for the hidden layers, the first partial derivative from (11) cannot be calculated directly as in the output layers. This partial derivative is then calculated as

$$\frac{\partial E}{\partial V_{\text{out}}} = \sum_{n} \frac{\partial E}{\partial V_{\text{out}\,n}} \cdot \frac{\partial V_{\text{out}\,n}}{\partial net_n} \cdot \frac{\partial net_n}{\partial V_{\text{out}}}, \tag{15}$$

where $n$ runs over the neurons connected to the output of this neuron. The sum is again implemented as a summation of the currents $I_{\text{E}\,n}$ shown in Fig. 6. It is then converted to the voltage $V_{\text{E}}$ as shown in the Type 2 area in Fig. 7.
The implementation of the first two partial derivatives in the sum in (15) is shown in Fig. 1 in the top part of the backpropagation circuit. The last one is solved as

$$\frac{\partial net_n}{\partial V_{\text{out}}} = V_{\text{w}\,n}, \tag{16}$$

which is implemented in Fig. 1 in the Type 2 area.
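A sketch of the hidden-layer error (15)-(16), with the current summation of Fig. 6 modelled as a plain sum; the argument shapes and values are illustrative assumptions:

```python
import numpy as np

def hidden_error(dE_dVout_n, v_out_n, v_w_n):
    """Backpropagated error dE/dV_out of a hidden neuron, eqs. (15)-(16).

    dE_dVout_n: errors of the n neurons in the next layer;
    v_out_n:    their sigmoid outputs (for the derivative factor);
    v_w_n:      weights V_w_n connecting this neuron to them.
    """
    dVout_dnet = v_out_n * (1.0 - v_out_n)        # derivative blocks
    # The sum over n is realised in hardware by summing the currents I_E_n:
    return np.sum(dE_dVout_n * dVout_dnet * v_w_n)

print(hidden_error(np.array([0.2, -0.1]),         # next-layer errors
                   np.array([0.7, 0.4]),          # next-layer outputs
                   np.array([0.5, -0.3])))        # connecting weights
```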

Circuit Simulation
The aim of the simulation is to verify that the learning process of the designed concept of the neural network works. For this purpose, it is not necessary to implement and simulate the circuit at the transistor level. That is why the blocks are defined by mathematical expressions.
The whole concept is designed for analogue near-sensor applications. Therefore, it is not essential to compare the accuracy results with non-analogue neural networks. At the same time, it is not necessary to use complicated datasets such as MNIST for this verification. Spice OPUS is used as the engine to analyse the behaviour and parameters of the resulting AANN structure.

Learning Process
The first simulation demonstrates the learning process of the designed neural network by transient analysis. The simulated AANN is made up of three parts: one input, two neurons in a hidden layer and one output. The dataset contains two rows. For illustration, $V_{\text{teach}}$ is toggled after each epoch. The result of this simulation is shown in Fig. 8. It shows how the voltage $V_{\text{out}}$ converges to the voltage $V_{\text{target}}$ as the weights are updated. The speed of the training process is discussed in Sec. 4.4.

Fig. 8. Transient simulation of the training process with a dataset with two rows (traces: $V_{\text{target}}$, $V_{\text{out}}$, $V_{\text{teach}}$).

XOR Simulation
One part of the verification is training the AANN on the XOR problem, which is often used with analogue backpropagation [7], [15]. The simulations show how the training process of the designed neural network works, illustrated with learning curves. As an example, an AANN with two inputs, eight neurons in a hidden layer and one output was simulated. The results of the analysis follow.
Each simulation was run ten times with different initial weights. At the end of each training epoch, the MSE was calculated. The results are shown in Figs. 9, 10 and 11, where each figure corresponds to a different choice of the learning rate $\eta$ and each line of a different colour represents a different set of initial network weights. The learning rate can be varied by changing the value of $V_{\text{teach}}$, the time given for training on one row of the dataset, or other internal circuit parameters.
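For readers who want to reproduce the behaviour outside Spice OPUS, a behavioural Python model of the 2-8-1 XOR experiment is sketched below. It replaces the continuous-time capacitor charging with discrete SGD steps of equivalent learning rate, so it mirrors the concept rather than the circuit; the initialisation and the value of eta are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.normal(0.0, 0.5, (3, 8))   # hidden weights, bias row included
W2 = rng.normal(0.0, 0.5, (9, 1))   # output weights, bias row included
sig = lambda x: 1.0 / (1.0 + np.exp(-x))
eta = 0.5                            # stands in for K_eta/C, eq. (10)

for epoch in range(5000):
    for x, y in zip(X, Y):                    # one dataset row at a time
        h = sig(np.append(1.0, x) @ W1)       # hidden layer, eq. (1)
        o = sig(np.append(1.0, h) @ W2)       # output layer
        d_o  = 2.0 * (o - y) * o * (1.0 - o)            # eqs. (11), (14)
        dE_h = (W2[1:] @ d_o) * h * (1.0 - h)           # eqs. (15), (16)
        W2 -= eta * np.outer(np.append(1.0, h), d_o)    # update, eq. (3)
        W1 -= eta * np.outer(np.append(1.0, x), dE_h)

H = sig(np.hstack([np.ones((4, 1)), X]) @ W1)
print(sig(np.hstack([np.ones((4, 1)), H]) @ W2).ravel())  # typically ~ [0, 1, 1, 0]
```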
The simulations show that the proposed concept of the AANN converges. The speed and stability depend on the set of hyperparameters. Classical neural networks behave in a similar way [7], [15].

Dependence on Parasitic Properties
The whole concept relies on the fact that the neural network can learn despite its parasitic properties. However, this can work only to a certain extent. This simulation examines the tolerance of the concept to some parasitic properties.
The block that most affects the scheme is the multiplier (Fig. 12). Inaccuracies are introduced into it according to

$$I_{\text{out}} = K_{\text{I}} \tanh\!\left(\frac{K_{\text{M}}\, V_1 V_2}{K_{\text{I}}}\right), \tag{17}$$

where $K_{\text{I}}$ is a constant determining the maximum current.
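A behavioural model of the imperfect multiplier under the saturation form reconstructed in (17); the values of $K_{\text{I}}$ and $K_{\text{M}}$ are assumptions:

```python
import numpy as np

K_M = 1e-4   # ideal multiplier gain [A/V^2]
K_I = 5e-5   # maximum output current [A] (assumed value)

def multiplier_ideal(v1, v2):
    return K_M * v1 * v2                      # eq. (5)

def multiplier_saturating(v1, v2):
    """Multiplier with output-current saturation, eq. (17)."""
    return K_I * np.tanh(K_M * v1 * v2 / K_I)

# The two agree for small products and diverge near the current limit:
print(multiplier_ideal(0.1, 0.1), multiplier_saturating(0.1, 0.1))
print(multiplier_ideal(3.0, 3.0), multiplier_saturating(3.0, 3.0))
```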
The result of this simulation is shown in Fig. 13. The comparison with Fig. 10 shows that the parasitic properties affect the proposed concept. However, they do not fundamentally affect its functionality.
Similar simulations, such as noise resistance or the influence of the non-linearity of CMOS capacitance, can be demonstrated only partly at the block design level; they depend on the implementation used.

Speed Comparison
One of the main reasons to use a fully analogue ANN instead of a classical one is the speed of training. This simulation shows a comparison between the proposed AANN and a neural network implemented with TensorFlow, an end-to-end open-source platform for machine learning [12]. The computations are run on Central Processing Unit (CPU), Graphics Processing Unit (GPU) and Tensor Processing Unit (TPU) hardware available in Google Colab [20]. The NNs are trained and compared on the same datasets. The configurations are chosen to be as similar as possible to the workflow of the AANN described in this article. Unfortunately, the results of other analogue implementations are not available for comparison [7], [17].

Four different neural network structures were created, as shown in Tab. 2. All these structures were created both in the analogue way and in the classical way. The results are shown in Tab. 1. The designed circuit still contains blocks defined at the formula level, which means that a real implementation is expected to be slower. Charging the capacitors is the most time-consuming operation in this case. The capacitor values and charging currents depend on the final hardware implementation. For example, in the simulations, all capacitors have the value 0.1 pF and the maximal charging current is 100 µA per capacitor. These values will be adjusted for a specific implementation so that the parasitic properties do not affect the charging process. The block most prone to parasitic properties, and therefore most speed-limiting in this concept, is the four-quadrant multiplier, which can operate at up to 40 GHz [21].
In the AANN, the training time depends only on the size of the dataset and the training time for one row of the dataset. For a classical neural network, the training time depends mainly on the structure of the network, not only on the size of the dataset.
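A back-of-the-envelope check of the charging speed, using the capacitor value and maximal charging current stated above; the full-scale voltage swing and the time per dataset row are assumptions for illustration:

```python
C     = 0.1e-12   # capacitor value from the simulations [F]
I_max = 100e-6    # maximal charging current per capacitor [A]

slew_rate = I_max / C          # dV/dt = I/C = 1e9 V/s
dv        = 1.0                # assumed full-scale weight change [V]
t_update  = dv / slew_rate     # 1 ns for a 1 V swing

rows, t_row = 4, 10e-9         # XOR dataset rows, assumed time per row
print(slew_rate, t_update, rows * t_row)   # training time scales with rows
```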
The results show that the potential real-world speed is several orders of magnitude higher than that of the known implementations.

Conclusion
This article introduces the concept of a novel fully analogue artificial neural network. It presents a block scheme of the fully analogue backpropagation algorithm with stochastic gradient descent, which can be used for the supervised training of feedforward neural networks. The AANN is designed for near-sensor applications, which require it to be much faster and smaller than classical neural networks and to process analogue signals directly from sensors. The design is built on an idealised structure subjected to simulations. The simulations have shown that this block structure works and is several orders of magnitude faster than classical neural networks. This suggests that a real structure will be significantly faster too.
Future work will focus on the preparation of a transistor-level design. Furthermore, it will be possible to extend this design of the AANN to other structures, such as the Autoencoder, Recurrent Neural Network (RNN), Convolutional Neural Network (CNN), Deep Neural Network (DNN), Long Short-Term Memory (LSTM), etc. Of these structures, CNNs seem to be the most useful option because they are used in applications at the limits of today's computing power.