Analyses of a 1-layer neuromorphic network using memristive devices with noncontinuous resistance levels

The emerging nonvolatile memory technology of redox-based resistive switching (RS) devices is not only a promising candidate for future high density memories but also for computational and neuromorphic applications. In neuromorphic as well as in memory applications, RS devices are configured in nanocrossbar arrays, which are controlled by CMOS circuits. With those hybrid systems, brain-inspired artificial neural networks can be built up and trained by using a learning algorithm. First works on hardware implementation using relatively large and high current level RS devices are already published. In this work, the influence of small and low current level devices showing noncontinuous resistance levels on neuromorphic networks is studied. To this end, a well-established physical-based Verilog A model is modified to offer continuous and discrete conduction. With this model, a simple one-layer neuromorphic network is simulated to get a first insight and understanding of this problem using a backpropagation algorithm based on the steepest


INTRODUCTION
Current computing systems are designed for computation purposes. However, there is an increasing need for more cognitive tasks such as pattern recognition. As humans are very good at solving such tasks, brain-inspired artificial neural networks (ANNs) have been developed to handle them. This culminated in the complex, multilayer structured Deep Learning (DL) systems of today that even achieve better-than-human performance. 1 Current DL ANN systems, however, are mainly software constructions that still run on classical von Neumann computers. While their computational performance has tremendously increased over the last decades, thanks to the advancement and scaling of CMOS technology, for implementing more cognitive tasks they are much less efficient than the human brain in terms of both system size and energy dissipation. In addition, strong performance improvement is no longer expected due to the ending of Dennard scaling and of Moore's law. Hence, especially for edge computation applications, there is a need for more efficient hardware to realize these ANN systems.
ARTICLE scitation.org/journal/apm
The new emerging resistive switching (RS) devices 2 (also called "memristors" 3 or, better, "memristive devices" 4 ) offer interesting possibilities for an efficient hardware implementation of these ANNs: (i) RS devices can emulate synapse functionality (adaptable, nonvolatile weight) in a single device that is small and scalable compared to the complex CMOS circuits that would be needed to emulate the same functionality directly; (ii) RS devices can be configured in 2D (and even 3D) crossbar arrays, resulting in very dense network connectivity; and (iii) such a crossbar configuration is very efficient for analog vector dot product computations. 5 The latter feature is of major importance as DL ANNs use a backpropagation (BP) learning algorithm to determine the synapse weight values during the training phase. 6 The basic kernel of this BP algorithm is matrix multiplication, which involves a large number of multiply-and-accumulate (MAC) operations. These operations are computationally heavy and form the computation bottleneck. Using RS crossbar arrays, the results follow directly from a parallel current readout, and so the arrays constitute efficient vector-dot-product engines. 5,7,8 The BP algorithm needs 6-bit resolution of the synaptic weights during the learning phase, requiring RS devices with analog programming behavior and many programmable levels. 9,10 Furthermore, because of energy considerations, the RS devices should operate at low currents (and low voltage). The major RS devices investigated for these applications are Phase-Change Memory (PCM) and redox-based resistive switching RAM (ReRAM) devices. 11 These devices were initially developed for applications in binary switching memories but also have potential for multilevel operation. In both devices, the switching from a high ohmic resistance state (HRS) to a low ohmic resistance state (LRS) is called SET, and switching from LRS to HRS is called RESET.
If a device is set and reset with the same voltage polarity, the switching is called unipolar; otherwise, it is a bipolar switching device. PCM devices have the drawback of unipolar switching, and analog programming behavior is available only during SET. In contrast, ReRAM devices can switch in a bipolar fashion. While standard memory type ReRAM devices show abrupt SET switching, analog programming in both SET and RESET can be obtained by tuning the material stack. 12,13 This makes ReRAM devices of high interest for this application.
Filamentary switching ReRAM devices operate by the modulation of a conductive filament, which is induced by the motion of metal cations such as Ag or Cu in electrochemical metallization (ECM) memory cells or by the motion of charged oxygen vacancies in devices based on the valence change mechanism (VCM). 14,15 The conductivity of this filament can be controlled either by the RESET voltage or by the maximum current level attained during SET operation. Depending on the chosen current level, however, quantization of different conduction states with large "gaps" in attainable values can be observed. 16-23 The aim of this work is to investigate how this quantization of possible conduction states in the ReRAM device influences the learning operation of an ANN. For a first insight, the small system presented in Prezioso's work 7 is analyzed by means of simulation using a deterministic ECM device circuit model featuring discrete conduction steps. Here, a BP learning algorithm based on the steepest descent method is used for a one-layer neuromorphic network. It is shown that the existence of discrete conduction states separated by conductivity gaps can have a strong influence on both the training cost and the system accuracy. 24
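The parallel current readout of a crossbar mentioned in the Introduction can be illustrated with a minimal sketch. It assumes ideal devices (no wire resistance, no sneak paths), and the function name `crossbar_readout` and all numerical values are hypothetical and chosen only for illustration.

```python
def crossbar_readout(voltages, conductances):
    """Column currents I_j = sum_i V_i * G_ij (Ohm's law plus Kirchhoff's current law)."""
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    if len(voltages) != n_rows:
        raise ValueError("one voltage per crossbar row is required")
    currents = [0.0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            # Each device contributes V_i * G_ij to the current of its column.
            currents[j] += voltages[i] * conductances[i][j]
    return currents


# Illustrative values only: 3 rows (inputs), 2 columns (outputs).
V = [0.1, 0.0, 0.1]          # read voltages in volts
G = [[10e-6, 5e-6],
     [20e-6, 1e-6],
     [30e-6, 2e-6]]          # device conductances in siemens
I = crossbar_readout(V, G)   # column currents in amperes
```

All multiply-and-accumulate operations of one matrix-vector product happen in a single read step here, which is why the crossbar is attractive as a vector-dot-product engine.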

SIMULATION MODEL
To create a model that features the discrete conduction state mechanism, 18 two established models were combined. The first one is a compact model for ECM cells, 25,26 which has also been fitted to different ECM devices. The second one is a kinetic Monte Carlo (kMC) ECM model, which was derived from the compact model in order to include switching variability and quantization. 18,27 By combining the quantization part of the kMC model, which was compared to experimental findings, with the compact model, a deterministic compact model featuring the quantized conduction mode is obtained. Panels (a) and (b) of Fig. 1 together depict the equivalent circuitry of the continuous conduction model, whereas panels (a) and (c) show the discrete conduction model. In both cases, a filament grows from an inert electrode toward an active electrode, which consists of Ag or Cu. In the continuous conduction model, the tunneling process is modeled by a mean tunneling gap (x_mean), and tunneling occurs over the complete filament area. In the discrete conduction model, the filament is built up by an atom-by-atom and layer-by-layer growth mechanism, which results in two parallel tunneling gaps. One tunneling gap (x_res) describes the tunneling process to the residual layer, which is the last complete layer in the filament, and the other (x_in) describes the tunneling process to the incomplete layer. The complete model description is found in the supplementary material. In Ref. 28, we showed that the switching kinetics simulated with the 1D kMC model agree with those simulated with a more complex 2D kMC model, 27 which allows for the deposition of atoms at arbitrary positions. Thus, the approximation of the growth mode in our simple 1D model gives reasonable results while reducing the computation time by orders of magnitude. Figures 1(d) and 1(e) show the conductance of the two models as a function of their respective state variables. In the continuous ECM model, the conductance increases exponentially as the tunneling gap decreases, representing an analog device. The discrete model, however, shows regimes of quasianalog conductance separated by distinct conductance jumps. Such a jump occurs whenever a new layer appears. The first atom on the new layer gives the biggest conductance jump. This behavior has been reported for the Cu2S-based ECM cell. 29,30 With an increasing number of atoms in the incomplete layer, the conductance jumps get smaller and eventually become quasianalog. This transition can be illustrated by dividing the logarithm of the conductance into equidistant bins and counting the number of states within each bin, which is done as an example for the last two layers in Fig. 1(e). As shown in Fig. 1(f), the number of states increases until a complete layer forms. Between two layers, a conductance gap appears. The width of this conductance gap depends on the filament radius.
The largest gap is always the one between the full previous layer (G_stub) and the first atom of the galvanic contact layer (G_contact), which closes the gap between the filament and the electrode. It is larger for smaller filament radii and vice versa [cf. Figs. 1(f) and 1(g)]. For very large filament radii, the conductance vs state variable plot converges to the analog exponential behavior of the continuous model.
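The state-density analysis described above (equidistant bins on the logarithm of the conductance, then counting states per bin) can be sketched as follows. The list of conductance states is hypothetical and is not taken from the model; it only mimics a quasianalog cluster followed by a gap and an isolated state.

```python
import math


def count_states_per_log_bin(conductances, n_bins):
    """Histogram of conductance states over equidistant bins of log10(G)."""
    logs = [math.log10(g) for g in conductances]
    lo, hi = min(logs), max(logs)
    counts = [0] * n_bins
    if hi == lo:
        # All states identical: everything falls into the first bin.
        counts[0] = len(logs)
        return counts
    width = (hi - lo) / n_bins
    for value in logs:
        # Clamp so that the largest value lands in the last bin.
        idx = min(int((value - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical states in siemens: dense cluster, gap, one isolated state.
states = [1e-6, 1.5e-6, 2e-6, 3e-6, 4e-6, 5e-6, 1e-4]
counts = count_states_per_log_bin(states, n_bins=4)
```

An empty bin between populated ones corresponds to a conductance gap in the sense used in the text.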
Both models are deterministic. Thus, the weight update in response to an excitation is determined exactly by the current state of the cell. In reality, however, the weight update might change due to the intrinsic switching variability. Nevertheless, the conductance would show quantized levels. Using our simulation model, we are able to discriminate between the effects of switching variability and quantized conduction effects. In this study, we focus on the impact of noncontinuous resistance levels.

SYSTEM AND SETUP
Prezioso et al. 7 proposed an ANN that can distinguish between the three classes "z," "v," and "n" using a 3 × 3 pixel picture, resulting in a 9-input, 3-output, one-layer ANN, and an algorithm based on the steepest descent. Here, the ReRAM devices are used to weight the different inputs for each output and can be changed by applying a write pulse of height ±V_write and pulse length t_cycle. This system is adapted and built up for simulation using the continuous and the discrete model. It is chosen as a simple way to understand the differences in behavior between the two setups. A complete and extended description of the system and the learning algorithm is found in the supplementary material.
The simulations were performed using a combination of MATLAB and Cadence software. MATLAB acted as a control unit during the simulation, whereas the main circuit simulations were performed using the Spectre Circuit Simulator from Cadence. The training procedure starts with a read step of all possible input pictures, which is used to evaluate the network response and cost values. Afterwards, the weight updates are calculated using MATLAB. Weight update, read, and evaluation steps define one training epoch, which is repeated in a loop until the specified number of training epochs is reached. In one training epoch, the whole dataset of input pictures is used.
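The epoch structure described above (read all pictures, evaluate cost and accuracy, then update the weights) can be sketched in software. This is not the MATLAB/Spectre flow of this work: the pixel patterns, the tanh activation, the squared-error cost, the target value, the signed effective weights (standing in for differential device pairs), and the fixed conductance step per pulse are all assumptions made only for this illustration. Real devices respond to a write pulse in a state- and voltage-dependent way.

```python
import math
import random

# Hypothetical 3x3 pictures (1 = dark pixel), flattened row by row.
PICTURES = {
    "z": [1, 1, 1, 0, 1, 0, 1, 1, 1],
    "v": [1, 0, 1, 1, 0, 1, 0, 1, 0],
    "n": [1, 1, 1, 1, 0, 1, 1, 0, 1],
}
CLASSES = ["z", "v", "n"]
V_READ = 0.1      # read voltage in volts (value from the text)
BETA = 2e5        # normalization factor (standard value from the text)
DELTA_G = 0.5e-6  # assumed fixed conductance step per write pulse, in siemens
TARGET = 0.85     # assumed magnitude of the desired output


def read_network(weights, pixels):
    """Read step: f_j = tanh(BETA * I_j) with I_j = sum_i V_i * W_ij."""
    volts = [V_READ if p else -V_READ for p in pixels] + [V_READ]  # last = bias
    currents = [sum(v * row[j] for v, row in zip(volts, weights)) for j in range(3)]
    return volts, [math.tanh(BETA * i) for i in currents]


def run_epoch(weights):
    """One epoch: read every picture, accumulate gradients, then update once."""
    grad = [[0.0] * 3 for _ in range(10)]
    cost, correct = 0.0, 0
    for label, pixels in PICTURES.items():
        volts, outputs = read_network(weights, pixels)
        correct += CLASSES[outputs.index(max(outputs))] == label
        for j, cls in enumerate(CLASSES):
            err = outputs[j] - (TARGET if cls == label else -TARGET)
            cost += 0.5 * err * err
            slope = BETA * (1.0 - outputs[j] ** 2)  # derivative of tanh(BETA * I)
            for i in range(10):
                grad[i][j] += err * slope * volts[i]
    # Fixed-size step against the gradient sign (stand-in for a +/-V_write pulse).
    for i in range(10):
        for j in range(3):
            if grad[i][j] > 0:
                weights[i][j] -= DELTA_G
            elif grad[i][j] < 0:
                weights[i][j] += DELTA_G
    return cost, correct / len(PICTURES)


rng = random.Random(0)
W = [[rng.gauss(0.0, 1e-6) for _ in range(3)] for _ in range(10)]
history = [run_epoch(W) for _ in range(30)]  # list of (cost, accuracy) per epoch
```

Updating only after all pictures have been read mirrors the statement that the whole dataset is used in one training epoch.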

CONTINUUM MODEL SIMULATIONS
As a first test, the dependence of the conductance state on the applied write voltage pulses is studied. For this, the device is first read with a read voltage of V_read = 0.1 V, and then a write pulse of height V_write and length t_cycle = 10 μs is applied. Afterwards, a second read pulse is applied. In Fig. 2(a), the difference between the two simulated read conductance values is plotted against the conductance of the first read. The conductance change depends not only on the applied voltage, as expected from the nonlinear switching kinetics of the device, but also on the state. In our model, this state dependence is based on the filamentary growth mechanism in an ECM cell 31,32 and the exponential dependence of the current on the tunneling gap. 33,34 The state dependence has been shown for different ReRAM devices and is the basis for analog weight adaptation. 35-43 The behavior in the experiment of Fig. 2(a) is crucial for neuromorphic applications, since it gives a first hint of how to pick the write voltage for specific initialization states of the system. 44 If not stated otherwise, the standard parameters of Table I are used for simulation. The first step of all system simulations is the initialization of the devices with conductance states, which are drawn from a truncated normal distribution with minimum and maximum conductance values of G_min = 0 μS and G_max = 600 μS, respectively. To verify the correct behavior, the system was set up with the continuous model for two different initializations of the conductance state of the 60 devices. In case 1, the standard parameters were used. In case 2, the mean conductance value and the standard deviation were chosen to be μ = 1 μS and σ = 0.1 μS, respectively. To achieve a comparable learning speed, two different write voltages are used for these experiments. In case 1, the standard write voltage was chosen, whereas the write voltage of case 2 was set to 1.3 V.
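The initialization step can be sketched with simple rejection sampling. The sampling method and the helper name `truncated_normal` are assumptions, since the text does not state how the truncated normal distribution was drawn; the bounds, means, and standard deviations are the values quoted in this paper.

```python
import random


def truncated_normal(rng, mu, sigma, low, high):
    """Rejection sampling: redraw until the sample lies inside [low, high]."""
    while True:
        x = rng.gauss(mu, sigma)
        if low <= x <= high:
            return x


G_MIN, G_MAX = 0.0, 600e-6   # conductance bounds in siemens
N_DEVICES = 60

rng = random.Random(42)       # fixed seed so the sketch is reproducible
# Case 1: standard parameters, mu = 300 uS and sigma = 20 uS.
case1 = [truncated_normal(rng, 300e-6, 20e-6, G_MIN, G_MAX) for _ in range(N_DEVICES)]
# Case 2: low conductance regime, mu = 1 uS and sigma = 0.1 uS.
case2 = [truncated_normal(rng, 1e-6, 0.1e-6, G_MIN, G_MAX) for _ in range(N_DEVICES)]
```

For both parameter sets the bounds lie many standard deviations from the mean, so the truncation rarely rejects a sample.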
The cost and training accuracy of the system are shown in Figs. 2(b) and 2(c), respectively. As expected, the training cost is reduced and the accuracy increases with the number of training epochs, showing the correct behavior of the system. Even though the accuracy reached 100% after a few epochs, the cost function could be further minimized. The accuracy only accounts for the correct classification of the training set, but it does not capture the uncertainty of the decision. The cost function illustrates how well the decision matches the ideal output and thus indicates the confidence of the decision.
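The difference between accuracy and cost can be made concrete with a small sketch. A squared-error cost and the output values below are assumptions for illustration; the cost function actually used is defined in the supplementary material.

```python
def accuracy(outputs, labels):
    """Fraction of samples whose largest output is the labeled class."""
    hits = sum(1 for out, lab in zip(outputs, labels) if out.index(max(out)) == lab)
    return hits / len(labels)


def cost(outputs, targets):
    """Assumed squared-error cost summed over all samples and outputs."""
    return 0.5 * sum(
        (o - t) ** 2
        for out, tgt in zip(outputs, targets)
        for o, t in zip(out, tgt)
    )


labels = [0, 1, 2]
targets = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
# Both networks classify every sample correctly, but with different margins.
hesitant = [[0.1, 0.0, -0.1], [0.0, 0.1, -0.1], [-0.1, 0.0, 0.1]]
confident = [[0.9, -0.9, -0.9], [-0.9, 0.9, -0.9], [-0.9, -0.9, 0.9]]

acc_hesitant, acc_confident = accuracy(hesitant, labels), accuracy(confident, labels)
cost_hesitant, cost_confident = cost(hesitant, targets), cost(confident, targets)
```

Both sets of outputs give the same accuracy, yet the cost separates the marginal decisions from the confident ones, which is why the cost can keep decreasing after the accuracy has saturated.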
In Fig. 2(d), the energy consumption of training the ReRAM array is depicted. Even though the write voltage of case 2 is higher, the energy consumption is lower, since the array has an overall higher impedance than in case 1. Note that this energy is only the energy needed for the ReRAM array. The overall power consumption will be higher since the CMOS periphery and the control unit are not taken into account. Overall, the system seems to be robust against the change in the conductance initialization if the write voltage is chosen properly with respect to the initialization.
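Why a higher write voltage can still cost less energy follows from a rough per-pulse estimate, E ≈ V² · G · t, which assumes that the conductance stays constant during the pulse. The 1.1 V used for case 1 below is only a placeholder, because the standard write voltage of Table I is not quoted in the text; the conductances are the mean initialization values of the two cases.

```python
T_CYCLE = 10e-6  # write pulse length in seconds (value from the text)


def pulse_energy(v_write, conductance, t_pulse=T_CYCLE):
    """Rough energy of one write pulse, assuming a constant conductance."""
    return v_write ** 2 * conductance * t_pulse


# Case 1: placeholder write voltage of 1.1 V at the mean conductance of 300 uS.
e_case1 = pulse_energy(1.1, 300e-6)
# Case 2: 1.3 V (from the text) at the mean conductance of 1 uS.
e_case2 = pulse_energy(1.3, 1e-6)
ratio = e_case1 / e_case2
```

The quadratic voltage term grows by less than a factor of two, while the conductance drops by a factor of 300, so the lower-conductance array dominates the comparison in this estimate.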
As a next step, the convergence behavior of the system is investigated further. This time, the system was initialized five times with the same distribution of conductance. Thus, the impact of the update voltage can be evaluated independently of the specific initialization. For each initialization, the write voltage was varied between 1 V and 1.6 V to find the optimum voltage for a stable, fast convergence.

for which the system mostly converges. The update voltages 1 V and 1.1 V show a fast and stable convergence. For 1.2 V, the system converges even faster, but it is not really stable and does not reach the high accuracy and low cost function values obtained for the other two voltages. With a higher update voltage, the conductance change in each step is larger, which is beneficial for the rough weight adjustment in the beginning but hinders the fine adjustment. Here, a voltage of 1.1 V is the best compromise. For this voltage, the system shows the best performance, as it converges faster than for 1 V and is more reliable than for 1.2 V. If the voltage is increased further, the problems occurring for 1.2 V become even more pronounced, as shown in Figs. 3(c) and 3(d). As shown before, if the system is initialized in a different conductance regime, the optimum voltage can be different, since the conductance change depends on the voltage and on the starting conductance [cf. Fig. 2(a)].

DISCRETE SIMULATION MODEL
If a discrete conduction mechanism is assumed, the system needs to be able to compensate for the nonexisting conduction states. To test the robustness of the system with regard to this effect, the discrete conduction model is used. Since the conduction gaps depend on the filament radius, the filament radii of all devices are set to 1 nm, 3 nm, 5 nm, 7 nm, or 10 nm. For the initialization, two extreme cases can be considered: initialization in a high state density region and in a low state density region. The standard initialization parameters of Table I (μ = 300 μS and σ = 20 μS) result in an initialization in the galvanic layer (low state density region). Because the conductance state density is very low there, the standard deviation was increased to σ = 60 μS. Figures 3(e) and 3(f) depict the cost function and training accuracy of these simulations. Abrupt spikes are visible for all filament radii used. Thus, convergence does not seem feasible for the system. As a second test, the devices were initialized with μ = 1 μS and σ = 0.1 μS. Here, the targeted conductances are below G_stub and thus in a higher state density region for filaments with r_fil > 1 nm. For filament radii r_fil > 3 nm, and thus a nearby high state density, the cost function reduces quickly [Fig. 3(g)]. For the filament radius of 1 nm, however, the targeted conductance is inside the gap between G_stub and G_contact. Accordingly, all cells are initialized to G_stub. As before, the write voltage is set to 1.3 V to achieve comparable learning rates. The results of these simulations are depicted in Figs. 3(g) and 3(h). Here, convergence is not achieved for filaments with 1 nm or 3 nm radii. The 1 nm case does not show any convergence from the beginning. In contrast, the 3 nm case converges until the 42nd epoch but then starts to oscillate.
To investigate the nature of those fluctuations, the transients of the conductances of randomly selected devices of the r_fil = 5 nm, μ = 300 μS, and σ = 60 μS case are shown in Fig. 4(a). The gray horizontal lines depict feasible conductance states. All transients start in the states of the galvanic layer and, as expected, the transition between the states is abrupt. Thus, the change in the conductance is abrupt, too. If one device state starts to oscillate between G_stub and G_contact (pink and light green lines), fluctuations in the cost function result. Similar to using larger write voltages, the weight jumps are too big to be handled properly by the algorithm. The weights cannot be adjusted finely enough due to the missing state density in the gap region. The algorithm works better for the (μ = 1 μS, σ = 0.1 μS) cases, as the devices were initialized in a region of high state density.
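The oscillation across the gap can be reproduced with a toy model of a single weight. The level values and the rule that one pulse moves the device by exactly one allowed state are assumptions for illustration only; in the device model, the update depends on the state and on the write voltage.

```python
def train_single_weight(levels, start_index, target, n_pulses):
    """Move one allowed level per pulse toward the target; return visited states."""
    idx = start_index
    trace = [levels[idx]]
    for _ in range(n_pulses):
        if levels[idx] < target and idx < len(levels) - 1:
            idx += 1   # SET pulse: next higher allowed state
        elif levels[idx] > target and idx > 0:
            idx -= 1   # RESET pulse: next lower allowed state
        trace.append(levels[idx])
    return trace


# Hypothetical levels in uS: quasianalog region up to G_stub, then a gap.
G_STUB, G_CONTACT = 20.0, 80.0
LEVELS = [float(g) for g in range(2, 21, 2)] + [G_CONTACT, 90.0, 100.0]

# Target inside the dense region: the weight settles within one small step.
in_dense = train_single_weight(LEVELS, start_index=0, target=11.0, n_pulses=20)
# Target inside the gap: the weight toggles between G_stub and G_contact.
in_gap = train_single_weight(LEVELS, start_index=0, target=50.0, n_pulses=20)
```

In the dense region the residual error stays within one small step, whereas a target inside the gap forces the weight to jump back and forth across the full gap width.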
Another possibility to avoid transitions over the gap, and thus to keep the conductance changes small enough to cause fewer convergence issues, is to change the normalization factor β [cf. supplementary material (20)], since it defines the target weights. Until now, the standard value of 2 × 10^5 has been kept unchanged. With the standard normalization factor, the targeted weights are in the range of 5 μS to 15 μS. Hence, the weights of the μ = 300 μS cases need to be decreased, whereas the weights of the μ = 1 μS cases need to be increased. To achieve lower targeted weights, β is set to 2 × 10^6 for the r_fil = 3 nm, μ = 1 μS, and σ = 0.1 μS case. The comparison of the overall maximum and minimum conductance states of the system with both β values is depicted in Fig. 4(b). Using the standard β value, the maximum conductance value jumps to a point above G_contact (turquoise). This jump corresponds to the start of the cost function fluctuation. In contrast, the maximum conductance value in the simulation with the increased β value stays below G_contact during the course of the training.
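How β shifts the targeted weights can be illustrated under the assumption that the neuron output is f = tanh(β · I), as in the original system of Ref. 7; the exact definition used here is given in the supplementary material. The target output of 0.85 and the function name `required_current` are hypothetical.

```python
import math


def required_current(target_output, beta):
    """Current I for which tanh(beta * I) equals the target output."""
    return math.atanh(target_output) / beta


TARGET_OUTPUT = 0.85                              # assumed desired output level
i_std = required_current(TARGET_OUTPUT, 2e5)      # standard normalization factor
i_new = required_current(TARGET_OUTPUT, 2e6)      # increased normalization factor
scale = i_std / i_new
```

Under this assumption, a tenfold larger β lowers the required current, and with it the targeted conductances at a fixed read voltage, by a factor of ten, which keeps the weights below the gap.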

DISCRETE SIMULATION MODEL WITH VARIABLE FILAMENT
In a second study, the filament radii were no longer assumed to be equal for all cells. They were drawn from a truncated normal distribution with r_fil,min = 1 nm and r_fil,max = 20 nm denoting the minimum and maximum filament radii, respectively. These choices of filament radii are consistent with experimental work. 45-47 The upper limit was chosen in order not to exceed the cell dimensions (feature size F = 40 nm). The mean and standard deviation were chosen to be μ_fil = 7 nm and σ_fil = 2 nm, respectively. Again, simulations were performed using both previously discussed initialization regimes: the standard initialization regime and the lower conductance regime (μ = 1 μS and σ = 0.1 μS). The resulting training performance indicators are visualized in Fig. 5

CONCLUSION
In summary, a Verilog A model of an ECM ReRAM was derived by combining two well-established ECM models. It incorporates the discrete conduction steps that are observed experimentally at low current levels. It is used to investigate the influence of discrete conduction steps on the training accuracy of a simple one-layer ECM-based ANN using a backpropagation learning algorithm based on the steepest descent method. The simulation results show that noncontinuous resistance levels deteriorate the network performance in comparison to devices with completely analog behavior. The convergence of the system is either difficult or not achieved at all. Here, we studied a one-layer network. The problem of convergence is expected to become more severe for networks with multiple layers. The problem of noncontinuous resistance levels will occur when scaling down the size of the device to the 10× nanometer scale. In addition, it will appear if the current levels, and thus the filament dimensions, become very small. Thus, it is necessary to consider noncontinuous resistance levels when designing a neural network in ultrascaled technology nodes. The simulation results also showed how to cope with this problem. First, all the memristive devices need to be initialized in a region of high state density. To this end, the cells need to be characterized before initialization. In addition, some algorithm parameters can be adapted to improve the convergence. A last option would be to increase the number of synapses in order to achieve quasicontinuous resistance levels, but this comes at the expense of higher area and power consumption of the network. Besides the noncontinuous resistance levels, fluctuations can occur due to the intrinsic switching variability, leading to a bigger or smaller conductance change during a weight update.
Nevertheless, the resistance change will be due to individual jumps of atomic defects, leading to noncontinuous resistance levels for very small filament dimensions. Thus, the results should still be valid when variability is considered. The switching variability will lead to additional problems for the learning process. The impact of the variability will be addressed in a future study. Finally, additional experimental results on ultrascaled ECM devices operated in this narrow filament regime will be needed to confirm the conclusions of this paper.

SUPPLEMENTARY MATERIAL
See supplementary material for a detailed description of the used ECM models and background information on the 1-layer neural network and the learning algorithm.