Back-Propagation Operation for Analog Neural Network Hardware with Synapse Components Having Hysteresis Characteristics

To realize analog artificial neural network hardware, the circuit element for the synapse function is important because the number of synapse elements is much larger than that of neuron elements. One candidate for this synapse element is a ferroelectric memristor. This device functions as a voltage-controllable variable resistor, which can be applied as a synapse weight. However, its conductance shows hysteresis and dispersion with respect to the input voltage. Therefore, the conductance values vary according to the history of the height and width of the applied pulse voltage. Because accurate control of the conductance is difficult, it is not easy to apply the back-propagation learning algorithm to neural network hardware with memristor synapses. To solve this problem, we proposed and simulated the following learning operation procedure. Employing a weight perturbation technique, we derived the error change. When the error decreased, the next pulse voltage was updated according to the back-propagation learning algorithm. If the error increased, the amplitude of the next voltage pulse was set so as to produce a similar memristor conductance but in the opposite voltage scanning direction. With this operation, we eliminated the effect of the hysteresis and confirmed that the simulated learning operation converged. We also incorporated conductance dispersion numerically into the simulation and examined the probability that the error decreased to a designated value within a predetermined number of loops. A ferroelectric has the property that the magnitude of its polarization does not decrease when voltages of the same polarity are applied. This property greatly improved the probability, even when the learning rate was small, provided that the magnitude of the dispersion was adequate. Because dispersion in analog circuit elements is inevitable, this learning operation procedure is useful for analog neural network hardware.


Introduction
The artificial neural network (ANN) is receiving research interest, for example, because deep learning approaches are improving recognition rates in benchmark classification problems [1,2]. There have been studies on large-scale digital processing built on conventional CPUs and GPUs [3]. However, ANN hardware built only with digital circuits [4] requires a large volume of memory. Algorithmic improvements can reduce the memory size, but they cannot solve the fundamental problem; a hardware-level solution is needed.
One solution is to introduce a neuromorphic device. To realize ANN hardware, the circuit element for the synapse function is important because the number of synapse elements is much larger than that of neuron elements. One candidate for this synapse element is the memristor [5,6]. Because the conductance of a memristor depends on the history of the applied voltage, it can realize the synapse function [7,8]. Memristor-based memories can achieve a very high integration density of 100 Gbit/cm², a few times higher than flash memory technologies [9]. These unique properties make the memristor a promising device for massively parallel, large-scale neuromorphic systems [7,10]. Hu et al. have also reported the potential of a memristor crossbar array that functions as an associative memory [11].
We have also examined the synapse function using a ferroelectric memristor (FeMEM) [12,13]. Because the FeMEM could be operated at a 60 nm channel length [14], high-density integration of FeMEM synapse devices can be expected. We demonstrated conductance changes according to the biologically inspired learning method of spike-timing-dependent synaptic plasticity (STDP) [15,16]. As the FeMEM has three terminals, concurrent learning can be realized. We constructed an analog circuit with FeMEM synapses for a Hopfield neural network and, using this STDP learning method, demonstrated the learning and recall of patterns [17,18].
To realize generic ANN hardware, we need to adopt the back-propagation (BP) algorithm [19] as the learning method. Ishii et al. reported hardware BP learning for neuron metal-oxide-semiconductor (MOS) neural networks [20]. However, the neuron MOS did not have non-volatile memory to store the learned synapse weights.
By using memristors as multivalued memories, many researchers have reported ANN hardware with memristor synapses [6-8,11,17,18,21]. However, a memristor shows hysteresis between its input voltage and conductance. The conductance values vary according to the history of the applied voltage and its pulse width. These characteristics make it difficult to set the conductance to a desired value. Therefore, it is not easy to apply the BP learning algorithm to ANN hardware with memristor synapses. The purpose of this paper is to develop a simple procedure for the BP learning operation that can be applied to analog ANN hardware with synapse devices having hysteresis and variability.
Analog Neural Network Hardware with FeMEMs

Feed-forward neural network
Figure 1 shows the analyzed feed-forward neural network structure. This structure has two inputs in the input layer, three neurons in the hidden layer, and one neuron in the output layer. Each neuron has multiple synapses. The fundamental calculation for neural networks is the product-sum operation described by

$$M(i) = f\left(\sum_{j=1}^{3} w_m(i,j)\, S_{in}(j)\right), \quad i = 1, 2, 3 \qquad (1)$$

$$S_{out} = f\left(\sum_{i=1}^{4} w_o(i)\, M(i)\right) \qquad (2)$$

where M(i) is the output from the hidden-layer neurons, S_out is the output from the output-layer neuron, S_in(1) and S_in(2) are the two inputs, and w_m(i, j) and w_o(i) are the synapse weights of the hidden-layer neurons and the output-layer neuron, respectively. The function f(·) is a threshold function; a sigmoidal function is frequently used. In Figure 1, both S_in(3) and M(4) are bias inputs, and their values are unity.
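The product-sum operations of this 2-3-1 network can be sketched in a few lines of Python. This is a minimal illustration, assuming tanh as the sigmoidal threshold function f(·); the names `forward`, `w_m`, and `w_o` are hypothetical, and the bias inputs S_in(3) and M(4) are fixed to unity as in Figure 1.

```python
import math


def f(x: float) -> float:
    # Assumed threshold function: tanh, a sigmoid-shaped curve with outputs in (-1, 1)
    return math.tanh(x)


def forward(s_in, w_m, w_o):
    """Product-sum pass of the 2-3-1 network.

    s_in: (S_in(1), S_in(2)); w_m: 3x3 hidden weights w_m(i, j);
    w_o: 4 output weights w_o(i). Returns (S_out, [M(1), M(2), M(3)]).
    """
    inputs = [s_in[0], s_in[1], 1.0]  # S_in(3) = 1 is the bias input
    hidden = [f(sum(w_m[i][j] * inputs[j] for j in range(3))) for i in range(3)]
    hidden_with_bias = hidden + [1.0]  # M(4) = 1 is the bias input
    s_out = f(sum(w_o[i] * hidden_with_bias[i] for i in range(4)))
    return s_out, hidden


# Usage: with all hidden weights zero, only the output bias weight matters
zero_m = [[0.0] * 3 for _ in range(3)]
s_out, hidden = forward((1.0, -1.0), zero_m, [0.0, 0.0, 0.0, 0.5])
```

In hardware, each weight in this sketch corresponds to a conductance ratio rather than a stored number, which is what the following sections address.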

ANN hardware
To realize an analog neuron device, we examined a circuit based on an operational amplifier (op-amp) adder circuit. Using FeMEMs and an op-amp, a neuron circuit was constructed as shown in Figure 2(a). R_F is a fixed resistance whose conductance is G_R. To achieve a synapse function using FeMEMs, we devised synaptic circuit modules consisting of inhibitory/excitatory synapse pairs. As the op-amp adder circuit is an inverting amplifier circuit, the inhibitory synapses receive the raw input voltage directly, and the excitatory ones receive an inverted copy of the raw input voltage via a unity-gain inverting amplifier. Although this synapse circuit requires two FeMEMs, a highly functional neuron circuit can be realized because the synapse weight is easier to modulate when the two FeMEMs are controlled individually. Here, we denote the channel conductance of an FeMEM as G_F, and we denote G_F for the excitatory and inhibitory synapses as G_E(i) and G_I(i), respectively. The sum of the amplified voltages, or the inner potential (u), is calculated as

$$u = \sum_{i=1}^{N_{in}} \frac{G_E(i) - G_I(i)}{G_R}\, V_{in}(i) \qquad (3)$$

where N_in is the total number of inputs and V_in(i) is the i-th input voltage. The non-linear output voltage (V_out) of the op-amp is

$$V_{out} = f(u) \qquad (4)$$

where f(·) represents the saturating transfer characteristic of the op-amp. Because the adder is an inverting amplifier, the sign of each summed input is reversed at the output; this is why the input voltage for the excitatory FeMEM is reversed beforehand by an inversion circuit. Using G_F, the synapse weight (w) can be calculated as G_F/G_R. For convenience in constructing the circuit, we use the output of the op-amp as the output of the neuron circuit, although in general neural networks the output of a neuron is calculated using a threshold function such as a sigmoidal function.

Preparation of ANN Hardware and Proposal of Learning Operation

Structure and procedure for preparation of the FeMEM
We fabricated an FeMEM structure based on insights gained in previous studies [12-14]. The structure is shown in Figure 3, and its G_F-V_G characteristics are shown in Figure 3(c). The G_F-V_G characteristics were measured using a semiconductor parametric analyzer (Agilent 4155C) with a long integration time. G_F was calculated from the drain current measured at a drain voltage of 0.1 V. The drain voltage was set low so as not to change the polarization of the ferroelectric. The figure shows counterclockwise hysteresis loops corresponding to the switching of the ferroelectric polarization. The conductance at V_G = 0 V changed according to the history of the applied V_G and could thus take multiple values. No notable degradation of conductance was observed over 10^5 s [12]. These characteristics allowed the construction of an analog ANN circuit with FeMEM synapse elements [15-18,21].

Electrical characteristics of the synapse circuit
We examined the performance of the basic neuron circuit. The experimental setup used to evaluate the relation between the pulse voltage (V_P) and the conductance of the FeMEM is shown in Figure 4(a). The devices used in this experiment had been tested previously and were found to exhibit good non-volatility [12]. The pulse width of V_P was set to 1 ms. To enhance the repeatability of the conductance, the conductance was measured after applying a reset pulse (V_R): V_R = -2 V when V_P > 0, and V_R = 3 V when V_P < 0. V_P was first increased from 0 to 3 V in 0.2 V steps and then reduced from 0 to -2 V in -0.2 V steps. As in the G_F-V_G measurement, the drain current was measured at a drain voltage of 0.1 V so as not to change the polarization of the ferroelectric.
This scanning operation was performed 300 times. G_F was calculated from the measured V_out according to

$$G_F = -\frac{G_R\, V_{out}}{V_D} \qquad (5)$$

where V_D = 0.1 V is the drain (read) voltage. Because V_R and V_P are pulse voltages, they are applied only during the weight update. This reduces the number of voltage sources required, as the pulse voltages can be applied by switching a single voltage source. Moreover, no power is consumed to maintain the synapse weight.
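The conversion from V_out to G_F can be sketched from the inverting-amplifier relation V_out = -(G_F/G_R)V_D. This sketch assumes that relation, a read (drain) voltage V_D = 0.1 V, and a hypothetical feedback conductance G_R = 10 µS; the function names are illustrative only.

```python
def conductance_from_vout(v_out: float, g_r: float, v_read: float = 0.1) -> float:
    """Invert the assumed relation V_out = -(G_F / G_R) * V_read to recover G_F (S)."""
    return -g_r * v_out / v_read


def synapse_weight(g_f: float, g_r: float) -> float:
    """Synapse weight w = G_F / G_R, as defined for the neuron circuit."""
    return g_f / g_r


# Usage with a hypothetical G_R = 10 uS: V_out = -0.2 V corresponds to G_F = 20 uS
g_f = conductance_from_vout(-0.2, g_r=10e-6)
w = synapse_weight(g_f, g_r=10e-6)
```

Because the read voltage is small and fixed, V_out is directly proportional to G_F, so each of the 300 scans yields one conductance value per V_P step.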
The average and standard deviation of the calculated conductance are shown in Figure 4(b). Smooth counterclockwise characteristics were observed. The conductance varied in the range of 0.5 × 10^-6 to 40 × 10^-6 S over the investigated V_P range.
To analyze the learning operation numerically, we prepared a model of the FeMEM conductance by fitting the experimental data. We fitted the average conductance with sigmoidal functions, which are commonly used in modeling ferroelectric behavior [22]. The two curves for increasing and decreasing voltage were fitted manually, yielding

$$G_F = G_{min} + \frac{G_{max} - G_{min}}{1 + \exp\left[-a\,(V_P - h)\right]} \qquad (6)$$

where a = 3 V^-1, G_max = 45 × 10^-6 S, G_min = 0.5 × 10^-6 S, h = h_1 = 2.3 V for increasing V_P, and h = h_2 = -0.6 V for decreasing V_P. The fitting curves are shown in Figure 4(b) as broken lines. The characteristics were approximated reasonably well despite the small number of parameters.
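The two-branch conductance model can be evaluated directly. This is a sketch assuming the logistic form G_F = G_min + (G_max - G_min)/(1 + exp[-a(V_P - h)]) with the fitted parameters quoted in the text; the `increasing` flag selecting the branch is an illustrative device, not part of the original model.

```python
import math

# Fitted parameters quoted in the text
A = 3.0          # a, V^-1
G_MAX = 45e-6    # S
G_MIN = 0.5e-6   # S
H1 = 2.3         # V, branch traced while V_P increases
H2 = -0.6        # V, branch traced while V_P decreases


def conductance(v_p: float, increasing: bool) -> float:
    """Logistic G_F(V_P) model; the branch is selected by the scan direction."""
    h = H1 if increasing else H2
    return G_MIN + (G_MAX - G_MIN) / (1.0 + math.exp(-A * (v_p - h)))


# Usage: at V_P = 1 V the two branches give very different conductances (hysteresis)
g_up = conductance(1.0, increasing=True)     # roughly 1.4 uS
g_down = conductance(1.0, increasing=False)  # roughly 44.6 uS
```

The horizontal offset h_1 - h_2 = 2.9 V between the branches is the hysteresis that the proposed operation has to work around.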

Proposal of BP operation of hysteresis synapse devices
BP is the most widely applied learning method for training an ANN. In weight perturbation [23], a synapse weight (w) is changed slightly, and the resulting change in the outputs of the ANN is used to estimate the error gradient. The synapse weight w is updated according to

$$w_{new} = w_{old} - \gamma \frac{\Delta E}{\Delta w} \qquad (7)$$

where γ is the learning rate, w_new and w_old are the synapse weights after and before the update, respectively, and Δw is the small weight perturbation. The square error (E) is calculated according to

$$E = (T_{out} - S_{out})^2 \qquad (8)$$

where T_out is a target output. Because the ANN in this study has only one output, E can be calculated simply according to (8). ΔE is the difference between the square errors after and before the update and is calculated according to

$$\Delta E = \sum_{n} \left(E_2^{\,n} - E_1^{\,n}\right) \qquad (9)$$

where the subscripts 1 and 2 indicate before and after the update, respectively, and the superscript n indicates the input pattern number defined in Table 1.
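Weight perturbation replaces the analytical BP gradient with a finite difference: perturb w, measure the error change, and step against ΔE/Δw. The sketch below uses a hypothetical `toy_error` function standing in for a hardware error measurement; it illustrates the technique and is not the authors' code.

```python
def perturbation_step(w_old, error_fn, delta_w=1e-3, gamma=0.1):
    """One weight-perturbation update: w_new = w_old - gamma * dE / dw."""
    e1 = error_fn(w_old)            # error before the perturbation
    e2 = error_fn(w_old + delta_w)  # error after a small weight change
    return w_old - gamma * (e2 - e1) / delta_w


# Usage: minimize a toy quadratic error whose minimum is at w = 2
def toy_error(w):
    return (w - 2.0) ** 2


w = 0.0
for _ in range(200):
    w = perturbation_step(w, toy_error)
```

Only two error evaluations per weight are needed and no derivative of the circuit transfer function is required, which is why the technique suits analog hardware. The one-sided difference biases the estimate slightly, so the toy run settles near w = 1.9995 rather than exactly 2.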
As the synapse weight (w) of the analog ANN hardware in this paper is calculated as w = G_F/G_R, w is proportional to G_F. Because G_F is a function of V_P, as shown in (6), we simply updated V_P according to

$$V_{P(new)} = V_{P(old)} - \gamma \frac{\Delta E}{\Delta V} \qquad (10)$$

where V_P(new) and V_P(old) are the values of V_P after and before the update, respectively, and ΔV is the small V_P change used to obtain the error difference. Moreover, to eliminate the effect of the hysteresis of the conductance, we applied the following procedure, whose detailed flowchart is shown in Figure 5. In this flowchart, V_PE and V_PI are the values of V_P for the excitatory and inhibitory synapses, respectively, which are stored in an external memory. H is the designated threshold value for exiting the learning procedure. E_sum is the sum of the square errors over all target output values and is calculated according to

$$E_{sum} = \sum_{n} E^{n} \qquad (11)$$

The learning procedure is as follows:
A. Select a synapse to update. In this paper, we started from the output layer.
B. First, the FeMEMs for excitatory synapses are updated; V_P(old) = V_PE.
C. Error E_1 is calculated from the outputs of the ANN hardware and the target output values.
D. A V_P slightly larger in amplitude than the stored previous V_P(old) is applied to the FeMEM constituting the target synapse. That is, with ΔV > 0, V_P(old) + ΔV is applied in the case of increasing V_P, and V_P(old) - ΔV is applied in the case of decreasing V_P.
E. Error E_2 is calculated in the same manner as in step C.
F. If E_2 ≤ E_1, V_P(new) is updated according to (10) and stored in the external memory. In this case, V_R is not applied. We term this operation the "non-inverting operation".
G. If E_2 > E_1, V_R is applied according to the V_P polarity, as explained for Figure 4(b). Subsequently, the V_P that gives the same conductance on the other curve in Figure 4(b) is stored as the update; i.e., V_P(new) = V_P - (h_1 - h_2) if V_P is increasing, and V_P(new) = V_P + (h_1 - h_2) if V_P is decreasing. [...]
J. If E_sum is larger than H, the process returns to step A; otherwise, the learning procedure is finished.
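Steps C to G for a single FeMEM can be sketched as follows. Assumptions: the logistic conductance model with the fitted parameters quoted in the text, a signed perturbation (+ΔV on the increasing branch, -ΔV on the decreasing branch) used as the denominator of ΔE/ΔV, and a hypothetical `make_error` that maps a conductance to an error in place of the full network error. The reset pulse is not modeled explicitly; the branch flag simply switches.

```python
import math

A, G_MAX, G_MIN = 3.0, 45e-6, 0.5e-6  # fitted a (V^-1), G_max (S), G_min (S)
H1, H2 = 2.3, -0.6                     # h_1 (increasing V_P), h_2 (decreasing V_P), in V


def conductance(v_p, increasing):
    """Logistic G_F(V_P) on the branch selected by the scan direction."""
    h = H1 if increasing else H2
    return G_MIN + (G_MAX - G_MIN) / (1.0 + math.exp(-A * (v_p - h)))


def learning_step(v_p, increasing, error_of_g, dv=0.01, gamma=0.01):
    """Steps C-G for one FeMEM; returns the stored (V_P, increasing) pair."""
    e1 = error_of_g(conductance(v_p, increasing))      # step C
    d = dv if increasing else -dv                       # step D: perturb along the scan direction
    e2 = error_of_g(conductance(v_p + d, increasing))  # step E
    if e2 <= e1:
        # Step F (non-inverting): stay on the branch and take a gradient step on V_P
        return v_p - gamma * (e2 - e1) / d, increasing
    # Step G: after the reset pulse, jump to the V_P giving the same G_F on the other branch
    shift = H1 - H2
    return (v_p - shift if increasing else v_p + shift), not increasing


def make_error(target):
    """Hypothetical error: squared conductance mismatch in uS^2."""
    return lambda g: ((g - target) * 1e6) ** 2


v_up, dir_up = learning_step(1.0, True, make_error(20e-6))       # G_F too small: keep going
v_jump, dir_jump = learning_step(2.0, True, make_error(0.5e-6))  # G_F too large: invert
```

The jump in step G leaves G_F unchanged; what changes is the direction in which the next perturbation moves the conductance, so the hysteresis offset h_1 - h_2 never enters the gradient estimate.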
At step G, V_P is switched and jumps from one curve to the other in Figure 4(b). The reset pulse V_R and this V_P jump eliminate the hysteresis width h_1 - h_2.

Results and Discussions

Learning of the Boolean logic of exclusive OR
To evaluate the proposed learning operation, we numerically analyzed the learning of the Boolean exclusive OR (XOR) function using (10). The high and low signals were set to 1 and -1 V, respectively. In this analysis, ΔV = 10 mV and γ = 0.01 V^-2. The simulated learning process is shown in Figure 6. One loop involves the update of all synapses in the output and hidden layers.

Figure 5. The write pulses (V_P) for the excitatory and inhibitory synapses are defined as V_PE and V_PI, respectively. V_P is slightly changed (ΔV) according to the V_P direction, and the error change is calculated. When the error increases (E_2 > E_1), V_R is applied at step G, and V_P is then switched and jumps from one curve to the other in Figure 4(b).
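The XOR task and its convergence criterion can be written down compactly. This sketch assumes E = (T_out - S_out)^2 per pattern and E_sum as the sum over the four patterns; the pattern order in `XOR_PATTERNS` is hypothetical, since Table 1 is not reproduced here.

```python
# XOR patterns with high = 1 V and low = -1 V: ((S_in(1), S_in(2)), T_out)
XOR_PATTERNS = [
    ((-1.0, -1.0), -1.0),
    ((-1.0, 1.0), 1.0),
    ((1.0, -1.0), 1.0),
    ((1.0, 1.0), -1.0),
]


def square_error(s_out, t_out):
    """Square error of the single output neuron, in V^2."""
    return (t_out - s_out) ** 2


def e_sum(outputs):
    """Sum of square errors over all four input patterns (outputs in pattern order)."""
    return sum(square_error(s, t) for s, (_, t) in zip(outputs, XOR_PATTERNS))


def reached(outputs, threshold=0.1):
    """True when E_sum <= threshold (0.1 V^2 is the criterion used in the text)."""
    return e_sum(outputs) <= threshold


# Usage: outputs within 0.1 V of every target give E_sum = 0.04 V^2
ok = reached([-0.9, 0.9, 0.9, -0.9])
```

With ±1 V targets, an output stuck at 0 V contributes 1 V² per pattern, so the 0.1 V² criterion requires every pattern to be close to its target.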
The S_out values fluctuated until about 200 loops; however, they gradually became correct afterward (Figure 6(a)). E_sum gradually decreased from 200 loops, and the learning operation successfully converged (Figure 6(b)). By randomly changing the initial values of G_F, we simulated 100 learning processes and examined the number of loops required to reach E_sum ≤ 0.1 V². As shown in Figure 6(c), the most frequent loop number for reaching E_sum ≤ 0.1 V² fell in the 200-400 bin. However, in some cases E_sum was still larger than 0.1 V² after 2000 loops. Here, we denote the probability of reaching E_sum ≤ 0.1 V² within 2000 loops as P_C. In this case, P_C was about 60%.
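P_C is simply the fraction of simulated runs that meet the criterion in time. The sketch below assumes each run is summarized by the first loop at which E_sum ≤ 0.1 V² held (or `None` if it never did); the loop counts in the usage line are made up for illustration.

```python
def convergence_probability(loops_needed, max_loops=2000):
    """Fraction of runs that reached E_sum <= 0.1 V^2 within max_loops.

    loops_needed holds, per run, the first loop at which the criterion was met,
    or None if it was never met during the simulation.
    """
    hits = sum(1 for n in loops_needed if n is not None and n <= max_loops)
    return hits / len(loops_needed)


# Usage with made-up loop counts for five runs (illustrative only)
p_c = convergence_probability([250, 310, None, 2500, 1999])
```

Runs that converge after the 2000-loop limit count as failures, which is why small learning rates can lower P_C even when learning would eventually succeed.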

Adoption of conductance dispersion
In Section 4.1, the simulation used the dispersion-free conductance characteristics of the FeMEM given by (6). However, as seen from Figure 4(b), the measured conductance showed dispersion. In this section, we incorporate this dispersion numerically to simulate a more realistic condition. From the results in Figure 4(b), the coefficient of variation (CV), calculated by dividing the standard deviation by the mean, was plotted as shown in Figure 7.
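The CV per pulse voltage can be computed from the repeated scans as below. This sketch assumes the sample standard deviation (n - 1 in the denominator), which the text does not specify; the readings in the usage line are made up.

```python
import statistics


def coefficient_of_variation(samples):
    """CV = standard deviation / mean of repeated conductance readings."""
    return statistics.stdev(samples) / statistics.mean(samples)


def cv_per_pulse(readings_by_vp):
    """Map each V_P to the CV of the conductances measured over repeated scans."""
    return {v_p: coefficient_of_variation(g) for v_p, g in readings_by_vp.items()}


# Usage with made-up readings (in uS) at two pulse voltages
cvs = cv_per_pulse({1.0: [9.0, 10.0, 11.0], 2.0: [19.0, 20.0, 21.0]})
```

Because the CV is normalized by the mean, a single bound (here CV ≤ 0.1) can describe the dispersion across conductances spanning almost two orders of magnitude.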
The results show that the CV is less than 0.1 for all values of conductance. Therefore, when calculating G_F for an applied V_P, we introduced random dispersion such that CV = 0.1. The conductance G_F', which includes the dispersion, is calculated according to

$$G_F' = G_F\,(1 + \mathrm{CV}\cdot\varphi) \qquad (12)$$

where φ is a standard Gaussian random number generated by the Box-Muller method.
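The dispersion model can be sketched with a hand-written Box-Muller transform. This assumes the multiplicative form G_F' = G_F(1 + CV·φ) with a standard-normal φ; `box_muller` and `dispersed_conductance` are illustrative names, and only the cosine output of the transform is used.

```python
import math
import random


def box_muller(rng: random.Random) -> float:
    """One standard-normal sample from two uniform samples (Box-Muller transform).

    Only the cosine output is used; the sine output would give a second sample.
    """
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is finite
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def dispersed_conductance(g_f: float, cv: float, rng: random.Random) -> float:
    """G_F' = G_F * (1 + CV * phi), with phi drawn by Box-Muller."""
    phi = box_muller(rng)
    return g_f * (1.0 + cv * phi)


# Usage: sample 20000 dispersed values around G_F = 10 uS with CV = 0.1
rng = random.Random(0)
samples = [dispersed_conductance(10e-6, 0.1, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((g - mean) ** 2 for g in samples) / (len(samples) - 1))
```

The sampled CV (std / mean) comes out close to the requested 0.1, which is the property the simulation relies on.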
Here, we incorporate the properties of the ferroelectric material into φ. When voltage pulses have the same polarity and sufficient widths, the polarization of the ferroelectric changes only when the applied pulse has the largest amplitude in its history. In this experiment, the pulse width of V_P was set to 1 ms, which is sufficiently wide because the switching time of this device is less than 1 ms [13].
In the inverting operation, because the polarity changes, G_F' changes according to (12). In the non-inverting operation, however, G_F' changes only when the maximum-amplitude voltage is applied, owing to this property of the ferroelectric. As a result, G_F' never decreases when V_P > 0 and never increases when V_P < 0. Hereafter, we term this the "restriction effect". In the simulation, this effect was realized by setting φ = 0 when (φ < 0 and V_P > 0) or (φ > 0 and V_P < 0).
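The restriction rule translates into a small clamp on the random term. This sketch assumes the multiplicative form G_F' = G_F(1 + CV·φ) with a standard-normal φ; `restricted_phi` and `updated_conductance` are hypothetical names, and the clamp is applied only in the non-inverting operation, as the text describes.

```python
def restricted_phi(phi: float, v_p: float) -> float:
    """Restriction effect: drop dispersion that would move G_F' against the pulse polarity."""
    if (phi < 0 and v_p > 0) or (phi > 0 and v_p < 0):
        return 0.0
    return phi


def updated_conductance(g_f: float, phi: float, cv: float, v_p: float, inverting: bool) -> float:
    """G_F' = G_F * (1 + CV * phi); the restriction applies only without inversion."""
    if not inverting:
        phi = restricted_phi(phi, v_p)
    return g_f * (1.0 + cv * phi)


# Usage: a negative phi under a positive pulse is suppressed unless the operation inverts
kept = updated_conductance(10e-6, -1.0, 0.1, 1.5, inverting=False)     # stays at 10 uS
dropped = updated_conductance(10e-6, -1.0, 0.1, 1.5, inverting=True)  # falls to 9 uS
```

Under a positive pulse the clamp lets dispersion only raise G_F', and under a negative pulse only lower it, so random deviations in the non-inverting operation always point the same way as the intended update.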
When the error is decreasing (E_2 < E_1), the non-inverting operation is chosen; therefore, if φ is not too large, the dispersion in G_F' hardly increases the error. Of course, if φ is too large, G_F' overshoots the adequate conductance value and the error increases. Note that the restriction effect favors changes of G_F' in the correct direction.
Taking this restriction effect into consideration, we simulated the learning process under the condition CV > 0. The simulation results are shown in Figure 8. Although both S_out and E_sum fluctuated strongly, E_sum rapidly decreased from 400 loops and fell below the designated value at about 450 loops; this point is indicated as the "Reaching point" in Figure 8(b). After that, however, E_sum rapidly increased again. These results are very different from those for CV = 0 in Figure 6. When CV = 0, E_sum decreased gradually from 200 loops and continued decreasing thereafter. When CV = 0.1, E_sum fluctuated on a large scale; however, it did not stay large but fell again below 10^-3 V². The results did not appear to diverge.
Finally, to clarify the effect of the conductance dispersion, we analyzed the relation between P_C and γ for various CV values. The results are shown in Figure 9.
When γ ≳ 0.01 V^-2, regardless of the CV value, V_P (and hence G_F) changed too much according to (10), so G_F overshot the adequate value. As a result, P_C became small. When γ ≲ 0.01 V^-2 and CV = 0, V_P (and G_F) changed so little that E_sum hardly changed. Consequently, P_C became small, because P_C is defined as the probability of reaching E_sum ≤ 0.1 V² within 2000 loops. In this case, the maximum P_C of about 60% was obtained around γ = 0.01 V^-2. On the other hand, when CV > 0, the learning operation progressed even though V_P hardly changed, because G_F' changed. Moreover, in almost all cases, E_sum was expected to decrease owing to the restriction effect in the non-inverting operation. Although E_sum could increase in the inverting operation, high P_C was realized over a large region of γ ≤ 0.01 V^-2, especially for CV < 0.1. When the CV is too large (CV ≥ 0.3), G_F' overshoots the adequate value, as explained above, so the error increased and P_C became small regardless of γ. The CV is a difficult parameter to control. When the absolute value of the reset voltage (|V_R|) is higher, the repeatability of the conductance improves because the ferroelectric polarization follows the major loop; when |V_R| is lower, the polarization follows a minor loop and the dispersion of the conductance increases. Thus, the CV can be roughly controlled by changing the reset voltage. Because P_C is high over a comparatively wide range of CV, high P_C can be achieved by controlling the reset voltage.
In neural networks, it is commonly known that noise helps the learning process escape from local minima [24]. For analog ANN hardware, by contrast, noise is generally harmful to learning because the voltages cannot be controlled precisely. For the FeMEM, however, we found that an appropriate degree of dispersion yields a large P_C.
The proposed operation procedure is simple and easy to implement in hardware, yet it eliminates the effect of the hysteresis and is robust against the dispersion of conductance. Other types of memristors also display hysteresis [7,8]. By expressing the relation between the conductance and the applied voltage as equations and analyzing the CV, our approach may also be applicable to such memristors if they exhibit the restriction effect.

Conclusions
A BP learning operation was studied for analog artificial neural network (ANN) hardware with ferroelectric memristor (FeMEM) synapses. The synapse weight was expressed by the channel conductance (G_F) of the FeMEM. When the height of the pulse voltage (V_P) was changed after a reset pulse was applied, smooth counterclockwise G_F-V_P characteristics were observed.
To eliminate the effect of the hysteresis of the conductance, we proposed a learning operation in which G_F always travels along one of the two G_F-V_P curves. Because G_F effectively follows a continuous function under this operation, we confirmed that the simulated learning operation converged.
The measured G_F had a coefficient of variation of up to 0.1. We therefore incorporated the conductance dispersion numerically into the simulation. The dispersion introduced large fluctuations into the convergence process; however, the probability (P_C) of reaching E_sum ≤ 0.1 V² within 2000 loops remained reasonably high. Moreover, when the learning rate was smaller than 0.01 V^-2, P_C greatly improved, to 85%. These results stem from the properties of the ferroelectric: when V_P was not inverted, the dispersion acted only in the direction of decreasing error. The dispersion is not a freely controllable parameter but a characteristic of the FeMEM; however, it can be roughly adjusted through the reset voltage.
The proposed operation procedure is simple and easy to implement in hardware. Considering that dispersion is inevitable in analog circuit elements, this operating procedure is useful for analog ANN hardware. As the scale of ANN processing continues to grow, analog ANN hardware is a promising candidate for efficient computation and will play an important role in saving energy in large-scale ANNs.

Figure 9. Relation between P_C and γ for coefficients of variation of 0-0.3. P_C is the probability of reaching a sum of square errors (E_sum) ≤ 0.1 V² within 2000 loops, and γ is the learning rate. When the coefficient of variation (CV) is appropriate, the pulse voltage changes adequately even when γ = 0 V^-2. These results show that high P_C is realized over wide ranges of the CV. When γ ≳ 0.01 V^-2, the pulse voltage (V_P) changed so much according to (10) that the conductance of the FeMEM (G_F) also changed greatly, and P_C became small regardless of the CV. When γ ≲ 0.01 V^-2, V_P hardly changed for CV = 0. However, for CV = 0.1, the learning operation progressed because G_F' changes appreciably according to (12). Moreover, in almost all cases, E_sum was expected to decrease owing to the restriction effect in the non-inverting operation. doi:10.1371/journal.pone.0112659.g009