Effect of conductance linearity of Ag-chalcogenide CBRAM synaptic devices on the pattern recognition accuracy of an analog neural training accelerator

Pattern recognition using deep neural networks (DNN) has been implemented using resistive RAM (RRAM) devices. To achieve high classification accuracy in pattern recognition with DNN systems, a linear, symmetric weight update as well as multi-level conductance (MLC) behavior of the analog synapse is required. Ag-chalcogenide-based conductive bridge RAM (CBRAM) devices have demonstrated multiple resistive states, making them potential candidates for use as analog synapses in neuromorphic hardware. In this work, we analyze the conductance linearity response of these devices to different pulsing schemes. With non-identical pulse application, we demonstrate an improved linear response of the devices, with the non-linearity factor changing from 6.65 to 1 for potentiation and from −2.25 to −0.95 for depression. The effect of improved linearity was quantified by simulating the devices in an artificial neural network. The classification accuracy of a two-layer neural network improved from 85% to 92% for the small digit MNIST dataset.


Introduction
In recent years, the advancement of semiconductor technologies coupled with the critical need for 'big' data processing has prompted renewed interest in artificial intelligence and neuromorphic computing [1]. Neuromorphic computers have shown significantly improved efficacy over traditional computing architectures in large-scale visual, auditory and classification tasks [2]. A deep neural network (DNN) is one type of neuromorphic computing system garnering substantial interest today, particularly for applications requiring data-intensive pattern recognition and machine learning. To achieve continuous improvements in accuracy, the depth and size of the deep neural network need to increase significantly. However, the need for larger neural networks poses significant challenges in hardware implementation, particularly with respect to the traditional metrics of power, performance and density [3].
Many DNN algorithms, such as backpropagation and sparse coding, execute iterative vector-matrix multiplications (VMMs). Various research groups have tried to implement these matrix operations with conventional analog/digital CMOS circuitry [4,5]. However, many CMOS implementations are not well suited to meet targeted specifications for power consumption, latency, or functional area [6]. To overcome such limitations, hybrid DNN architectures that combine CMOS with resistive RAM (RRAM) devices have been proposed. RRAM devices have shown better scalability to sub-lithographic limits, multilevel conductance characteristics, low power operation, good data retention and endurance. RRAM devices can be fabricated in a crossbar architecture, which helps in the implementation of high-density VMM functions. For example, the computing power efficacy of a crossbar RRAM array has been estimated to be 31 000 times that of a state-of-the-art microprocessor [3,7].
RRAM arrays can be used as synaptic weight elements in DNN VMM. In the configuration considered here, the output of one column in the vector-matrix operation is an output current (I_j), which is the sum of the products of the row voltages (V_i) and the synaptic weight matrix elements (G_ij). Mathematically, this is expressed as
$$I_j = \sum_i V_i G_{ij} \qquad (1)$$
This VMM operation is directly applicable to the inference carried out on a neural accelerator. The in situ training operation on a neural accelerator involves more complex and energy-intensive operations: VMM, matrix-vector multiply (MVM) and outer product update [8].
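The column-current relation described above can be illustrated with a short numerical sketch. It assumes an ideal crossbar (no wire resistance, sneak paths or read noise), and the function name `crossbar_vmm` as well as the voltage and conductance values are hypothetical, chosen only for illustration.

```python
import numpy as np

def crossbar_vmm(voltages, conductances):
    """Return column currents I_j = sum_i V_i * G_ij for an ideal crossbar."""
    v = np.asarray(voltages, dtype=float)       # one voltage per row, shape (rows,)
    g = np.asarray(conductances, dtype=float)   # conductance matrix, shape (rows, cols)
    if v.shape[0] != g.shape[0]:
        raise ValueError("one voltage per crossbar row is required")
    return v @ g                                # all column currents in one step

# Illustrative values only (not measured data): 30 mV applied to rows 0 and 2.
V = np.array([0.03, 0.00, 0.03])                # volts
G = np.array([[1e-4, 2e-4],
              [3e-4, 1e-4],
              [2e-4, 4e-4]])                    # siemens
I = crossbar_vmm(V, G)
print(I)                                        # column currents in amperes
```

Each column current is obtained from a single matrix product, which mirrors the parallel multiply-and-sum that the crossbar performs physically.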
The synaptic weight corresponds to the conductance across a two-terminal RRAM device. Conductance can be modified with an applied voltage pulse. An increase in conductance is called potentiation and a decrease in conductance is called depression. For training, the RRAM devices should have many analog conductance states for precise weight tuning, small variability in read and write operations, and high endurance so that the system can be trained efficiently and accurately [9]. In an efficient in situ learning neural network, a synapse can be potentiated or depressed by a pulse, independent of its initial resistance state. Therefore, it is desired that the change in RRAM conductance be a highly linear and symmetric function of the number of voltage pulses applied, e.g., the linear analog response depicted in figure 1 [10]. Earlier works show that the dependence of conductance change on the input signal is directly related to the learning accuracy of a neural system [11,12].
Conductive bridge random access memory (CBRAM) is one type of RRAM device technology [13][14][15][16]. The memory states of the CBRAM are defined by the formation and modulation of a conductive metal filament as depicted in figure 2 [17]. The operation of these devices is determined by metal ion transport processes through an ion-conducting electrolyte and chemical redox (reduction/oxidation) reactions. Upon the application of a positive bias, oxidation takes place at the CBRAM anode, where metal ions are formed. These ions are transported across the electrolyte layer and are reduced at the cathode. As the redox reaction continues, a metallic conductive filament is formed across the electrolyte layer. Once the filament bridges the anode and the cathode, the devices can be modulated over a range of 'low resistance states' (LRSs) by increasing or decreasing the average cross-sectional area (width) of the filament, the conductance being to first order proportional to filament width. Unfortunately, the incremental change in filament width is typically dependent on the conductance state [18], which introduces inherent non-linearities in the CBRAM response to pulse number. However, it is possible to control these non-linear mechanisms by exploiting the dependence of filament growth and dissolution on the amplitude and pulse width of the programming voltage [18]. Ag-Ge30Se70 CBRAM devices have empirically demonstrated monotonic conductance switching with controllable linearity through the voltage pulse profile [18][19][20]. In this paper, we present a quantitative analysis of conductance change as a function of two different pulsing schemes: constant amplitude pulses and increasing amplitude pulses. Beyond this analysis, we use CrossSim, an open-source software package developed by Sandia National Laboratories, to model how various degrees of non-linearity affect the classification accuracy of a neural accelerator that utilizes these RRAM synapses [21].

Experimental details
The CBRAM crossbar devices examined in this work were fabricated on Si wafers coated with a 170 nm layer of low-pressure chemical vapor deposited (LPCVD) Si3N4. This layer served as a passivation layer between the Si substrate and the device structure. Next, a nickel cathode was created by evaporating 65 nm of Ni using a Lesker PVD electron-beam (e-beam) evaporator. The nickel electrode was patterned using positive photoresist and etched using the nickel etchant TBF. To provide isolation between devices in the crossbar array, 100 nm of SiO2 was deposited using plasma-enhanced chemical vapor deposition (PECVD). The SiO2 layer was patterned using a double resist technique. To create device vias contacting the Ni electrode layer, the insulating layer was etched using reactive ion etching. Next, a double layer lift-off resist was spun onto the wafer to define the Ag-Ge30Se70 switching layer region. A 65 nm Ge30Se70 layer was deposited using a thermal evaporator, followed by a 30 nm Ag deposition. The wafer was then removed, and Ag photo-doping was carried out by placing the wafer under a 3.5 mW UV source for 30 min [22,23]. The wafer was placed back into the thermal evaporator, where an additional 35 nm of Ag was deposited to create an additional supply of the active metal. The wafer was then placed in acetone to remove the excess material, creating regions of Ag-Ge30Se70 centered at each via. To form the top crossbar contact, double layer lift-off resist was deposited and patterned, followed by 350 nm of Al deposition using a physical vapor deposition (PVD) process. After the deposition, the wafer was dipped in acetone to remove the excess Al metal. The final device structure is depicted in figure 3.

Constant pulse programming
The current-voltage (I-V) characteristic of one fabricated CBRAM synapse is shown in figure 4. The figure shows the characteristic hysteretic DC switching behavior over three voltage sweep cycles. It is important to note that Ag-Ge30Se70 CBRAM devices are forming-free devices, since Ag is introduced into the switching layer by the process of photo-dissolution [17]. The DC sweep was performed using an Agilent 4156C analyser. Bipolar switching behavior was observed reproducibly over 50 cycles. It should be noted that these large-scale DC characterizations, which toggle the CBRAM cell between HRS and LRS, are performed to assess general operational integrity, not the incremental LRS switching required for use as a synaptic element. Cumulative distributions for all HRS and LRS measurements are shown in figure 5. The distributions show a minimum HRS to LRS ratio above 20, well above the target minimum of 10 [2].
To work as an analog synapse, CBRAM devices must demonstrate the capacity to switch incrementally into multiple LRSs. Previous works [19,20] on Ag-Ge30Se70 CBRAM devices have demonstrated multilevel conductance characteristics as a function of compliance current.
Multilevel conductance switching is demonstrated by applying voltage pulses to the top electrode (Ag anode) while the bottom electrode (Ni cathode) is grounded. The measurements were performed using the Keithley 4200 SCS parameter analyser and the in-built 4225 PMU module. To achieve gradual switching behavior, we applied 100 consecutive SET pulses (0.35 V for 100 ns) and 50 consecutive RESET pulses (−0.25 V for 100 ns). Each write and erase pulse was followed by a read voltage of 30 mV to extract the conductance. The change in conductance of the device is plotted as a function of pulse number in figure 6 for 20 cycles of SET/RESET operations. For these input pulse parameters, the results indicate a highly non-linear response in device conductance, where the incremental change in conductance is greatest for the early SET/RESET pulses in each sequence, saturating quickly to a maximum/minimum level. We observe a 10× change in conductance for the 100 ns pulse width.
The effect of pulse width on the switching response is shown in figure 7. For these data, the SET/RESET pulses have a longer pulse width of 100 μs. An increase in both the maximum and minimum conductance is observed for the longer pulse width, which can be attributed to the greater widening of the conductive filament during the longer voltage stress time. Figure 8 plots the response for both pulse widths over one cycle and shows that for shorter pulses, i.e., 100 ns, we observe a larger G_MAX/G_MIN ratio and lower conductance levels. When considering the use of these devices for parallel VMM programming, lower conductance levels are preferred since they reduce the energy required for both VMM programming and inference [6].
The conductance update response for both pulse widths was compared by normalizing the conductance range as depicted in figure 9. The conductance update for potentiation is nearly identical for both pulse widths. The depression behavior appears to be better for the 100 ns pulse width. A detailed study of the linearity response is provided in the following subsections.
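The normalization used for this comparison is not spelled out in the text. A minimal sketch, assuming a simple min-max rescaling of each trace to the range 0 to 1, is shown below; the function name and the sample values are hypothetical and are not measured data.

```python
import numpy as np

def normalize_conductance(g):
    """Rescale a conductance trace to [0, 1] (assumed min-max normalization)."""
    g = np.asarray(g, dtype=float)
    g_min, g_max = g.min(), g.max()
    if g_max == g_min:
        raise ValueError("trace has zero conductance range")
    return (g - g_min) / (g_max - g_min)

# Two made-up traces with different absolute ranges become directly comparable.
short_pulse = normalize_conductance([2e-5, 1.2e-4, 2e-4])
long_pulse = normalize_conductance([5e-4, 8e-4, 1.1e-3])
print(short_pulse, long_pulse)
```

After rescaling, traces recorded at different pulse widths share a common axis, so differences in curve shape (linearity) are not masked by differences in absolute conductance.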

Variable pulse programming
Pulses of increasing amplitude were applied to the anode of the device. The voltage pulses were stepped from 0.2 V to 1.5 V (pulse width = 100 μs, 50 mV step) for potentiation and from −0.2 V to −1.2 V (pulse width = 100 μs, 50 mV step) for depression. Each pulse was accompanied by a read pulse of 50 mV with a pulse width of 100 μs. Figure 10 shows the potentiation and depression characteristics of the device. Figure 11 depicts the response of the devices to multiple pulse cycles. The devices exhibit a good linear response over a suitable conductance range. This linear response is obtained by varying the thickness of the metallic filament. One should note, however, that using varying amplitude pulses eliminates the advantage of parallelism, since it necessitates addressing each analog resistive element in a crossbar individually.
This problem can be resolved by using a combination of resistors in series with the RRAM device, as suggested in previous works [24,25]. Moon et al suggest using a voltage divider arrangement in which a fixed resistor is connected in series with the CBRAM device, as shown in figure 12 [24]. However, for potentiation, i.e., increasing the CBRAM conductance (reducing the resistance), it is not possible to use a fixed voltage divider to realize an increase in the pulse voltage across R_CBRAM. To increase the V_CBRAM pulse while R_CBRAM switches to a lower resistance, a more sophisticated design is required.
One proposed design is shown in figure 13. In this design, the constant voltage pulse input, V_IN, toggles a custom 4-bit mixed-signal counter, CNTR1, which selects eight different bias currents for the CBRAM device. When the 4th bit of CNTR1 is set low, a second 4-bit counter, CNTR2, is enabled. The first three output bits of CNTR2 combine with the CNTR1 currents to add another four monotonically increasing currents to the sequence (the last three 3-bit outputs of CNTR2 can be neglected, as the CBRAM device will have reached a level close to its minimum value by that step). The 4th bit of CNTR2 is fed back to the enable switch of CNTR1. When the 4th bit of CNTR2 is toggled, CNTR1 is disabled and the pulse sequence is terminated. The currents are set by biasing a select combination of p-channel transistors in saturation mode. The counter outputs are toggled between VDD (off-state) and VDD − V_ref (saturation), where V_ref is a tunable voltage slightly larger than the p-channel threshold voltage, which ensures that the p-channel transistors are biased in saturation. When the input voltage is low, an n-channel transistor in parallel with R_CBRAM is activated to shunt current to ground, which debiases R_CBRAM during that half cycle of the input pulse.
The increasing current magnitude sequence selected by the counter is determined as follows. To achieve a V_CBRAM that increases approximately linearly as R_CBRAM is reduced, the following condition must be satisfied:
$$V_{CBRAM,i} = I_i R_{CBRAM,i} = M R_{CBRAM,i} + B \qquad (2)$$
In equation (2), I_i is the selected current at pulse number i. M is a negative value and B is a positive value, set to match the targeted (V_CBRAM, R_CBRAM) ordered pairs, e.g., (0.2 V, 2 kΩ) and (1.36 V, 730 Ω). By rearranging equation (2), the targeted I_i is therefore
$$I_i = M + \frac{B}{R_{CBRAM,i}} \qquad (3)$$
Figure 14 shows the simulated response of R_CBRAM and V_CBRAM with the input voltage pulse sequence for V_IN. It should be noted that, using switch programming, the hardware of the potentiation circuit (figure 13) can be reconfigured for use in R_CBRAM depression.
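The current selection rule can be checked numerically with a minimal sketch. It assumes that the target is a straight line V_CBRAM = M·R_CBRAM + B through the two example pairs quoted above, and that the device voltage follows Ohm's law, V_CBRAM = I_i·R_CBRAM. The function names and the intermediate resistance values are hypothetical.

```python
def fit_line(point_a, point_b):
    """Slope M and intercept B of V = M*R + B through two (R, V) points."""
    (r1, v1), (r2, v2) = point_a, point_b
    m = (v2 - v1) / (r2 - r1)
    b = v1 - m * r1
    return m, b

def target_current(r_cbram, m, b):
    """Bias current that places V = M*R + B across a device of resistance R."""
    return m + b / r_cbram

# Example pairs from the text, written as (R in ohms, V in volts).
M, B = fit_line((2000.0, 0.2), (730.0, 1.36))

# Assumed intermediate resistance steps, for illustration only.
for r in (2000.0, 1500.0, 1000.0, 730.0):
    i_bias = target_current(r, M, B)
    print(f"R = {r:6.0f} ohm  I = {i_bias * 1e3:.3f} mA  V = {i_bias * r:.3f} V")
```

With these two pairs M comes out negative and B positive, as the text requires, and the required bias current grows as the resistance falls, which is why a fixed series resistor cannot produce the sequence.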
To mathematically extract the non-linearity factor, the potentiation and depression responses of the device to the two pulse schemes are fit to the equations given in [26], where G, P, and A are the conductance value, the pulse number, and the parameter describing the nonlinear behavior of the weight update, respectively. G_max and P_max are the maximum conductance and the maximum pulse number, respectively, obtained from the experimental data. The nonlinearity factors for potentiation and depression are obtained by fitting the curves in MATLAB with the potentiation and depression equations, respectively [27]. The non-linearity factors for potentiation and depression are found to be 6.65 and −2.5, respectively, for the constant amplitude pulses (P.W. = 100 ns), as shown in figure 15. Similarly, for the increasing amplitude pulses, the non-linearity factors for potentiation and depression were found to be 1 and −0.96, respectively, as depicted in figure 16.
The impact of linearity and weight update behavior on system response to different pulse schemes is studied by modeling these devices in an analog neural training accelerator. An artificial neural network was simulated using the device properties of Ag-Ge30Se70 CBRAM. The CrossSim simulator was used to perform the supervised learning [28,29]. A three-layer neural network depicted in figure 14 was trained using the backpropagation algorithm. This is a computationally intensive algorithm that uses two important kernels: vector-matrix multiply (VMM) and outer product update. For this work, we have used two datasets: a small image version (8 × 8 pixels) of handwritten digits from the 'optical recognition handwritten digits' dataset [30] and a large image version (28 × 28 pixels) of handwritten digits from the MNIST dataset [31]. A two-layer neural network consisting of 64 input, 36 hidden and 10 output nodes was used for the small image dataset. Similarly, the MNIST dataset was trained on a network of dimensions (784 × 300 × 10).
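The kernels named above can be made concrete with a small sketch of one backpropagation step on a 64 × 36 × 10 network. This is not the CrossSim implementation. It assumes ideal floating-point weights, sigmoid activations, a squared-error loss, no bias terms and a learning rate of 0.1, and it uses a random input vector in place of a real 8 × 8 digit image.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(w1, w2, x, target, lr=0.1):
    """One backpropagation step built from VMM, MVM and outer product updates."""
    # Forward pass: two vector-matrix multiplies (VMM).
    h = sigmoid(x @ w1)
    y = sigmoid(h @ w2)
    # Backward pass: output error, then a matrix-vector multiply (MVM) through w2.
    delta2 = (y - target) * y * (1.0 - y)
    delta1 = (w2 @ delta2) * h * (1.0 - h)
    # Weight changes are outer products of layer inputs and back-propagated errors.
    w2_new = w2 - lr * np.outer(h, delta2)
    w1_new = w1 - lr * np.outer(x, delta1)
    loss = 0.5 * np.sum((y - target) ** 2)
    return w1_new, w2_new, loss

w1 = rng.normal(0.0, 0.1, size=(64, 36))   # input-to-hidden weights
w2 = rng.normal(0.0, 0.1, size=(36, 10))   # hidden-to-output weights
x = rng.random(64)                          # stand-in for a flattened 8 x 8 image
target = np.zeros(10)
target[3] = 1.0                             # arbitrary one-hot label

losses = []
for _ in range(20):
    w1, w2, loss = train_step(w1, w2, x, target)
    losses.append(loss)

print("synapses:", w1.size + w2.size)
print("first loss:", losses[0], "last loss:", losses[-1])
```

In an analog accelerator each entry of the two weight matrices is stored as a device conductance, so the outer product update is where device non-linearity and asymmetry enter the training loop.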
A crossbar array is simulated using the analog synaptic properties. The crossbar, a part of the neural core, performs the required kernel operations. To perform the vector-matrix multiplications, the conductance states of the synaptic devices are programmed, input voltages or input pulse lengths are applied to each row, and the corresponding output vector is read in the form of current. The parallel read, multiplication and summation operations are performed in this single step. This helps in reducing the total energy of the vector-matrix multiply (VMM) and highlights the main advantage of using analog resistive memories for these operations (figure 17).
To simulate the response of the Ag-Ge30Se70 CBRAM devices in the neural network, a look-up table is created containing the ΔG/G response of the devices to multiple potentiation and depression cycles at a given pulse width [32]. For the small image dataset, we observe a training accuracy of 85% for constant amplitude pulsing, as depicted in figure 18(a), while the training accuracy for the same dataset with increasing pulse amplitude was found to be 92%, as seen in figure 18(b). For the large digits, the accuracy improved from 79% to 87% with increasing pulse amplitude, as seen in figures 19(a) and (b). The ideal training accuracy using a double-precision CPU or GPU is 98%.
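The idea of a look-up table built from measured pulse responses can be sketched as follows. This is a simplified illustration and not the CrossSim procedure: it stores only the mean ΔG per conductance bin, and it uses a synthetic saturating trace in place of measured data. All names are hypothetical.

```python
import numpy as np

def build_lookup(g_trace, n_bins=10):
    """Mean conductance change per pulse, binned by the starting conductance."""
    g = np.asarray(g_trace, dtype=float)
    g_before, dg = g[:-1], np.diff(g)
    edges = np.linspace(g.min(), g.max(), n_bins + 1)
    idx = np.clip(np.digitize(g_before, edges) - 1, 0, n_bins - 1)
    table = np.zeros(n_bins)
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            table[b] = dg[mask].mean()
    return edges, table

def apply_pulse(g, edges, table):
    """Apply one programming pulse by looking up delta-G for the current state."""
    b = int(np.clip(np.digitize(g, edges) - 1, 0, len(table) - 1))
    return float(np.clip(g + table[b], edges[0], edges[-1]))

# Synthetic non-linear potentiation trace (normalized conductance, 101 points).
pulses = np.arange(101)
trace = 1.0 - np.exp(-pulses / 20.0)

edges, table = build_lookup(trace)
g_next = apply_pulse(0.0, edges, table)
print(table)
print(g_next)
```

For a saturating trace, the table entries shrink as the conductance rises, so the same pulse produces a smaller weight change near the top of the range. A network trained through such a table sees this state dependence directly.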

Conclusion
In conclusion, we have demonstrated the response of Ag-Ge30Se70 CBRAM to two different pulse schemes: constant amplitude and increasing amplitude. The effect of varying the pulse width was also studied. The device shows a 10× increase in conductance for the shorter pulse width, while for the longer pulse width we observe an increase in conductance range attributed to widening of the filament. A more linear and symmetric synaptic response of the device was observed for the increasing amplitude pulse scheme. Further, an artificial neural network was simulated based on the measured device properties to demonstrate supervised learning abilities. The training accuracy was found to improve from 85% to 92% for the small image dataset and from 79% to 87% for the large image dataset using the increasing pulse amplitude scheme.