Stochasticity invariance control in Pr_1−xCa_xMnO₃ RRAM to enable large-scale stochastic recurrent neural networks

Vivek Saraswat; Udayan Ganguly

doi:10.1088/2634-4386/ac408a

1. Introduction

Emerging non-volatile memories like RRAMs, MRAMs and FeRAMs have generated immense interest and excitement in the computing community over the past couple of decades. In particular, the versatility of their applications, ranging from conventional dense memory arrays to neuro-synaptic characteristics for neuromorphic hardware and algorithms, has been highlighted [1–5]. Memristors have been studied most extensively, owing to their simple capacitor-like structure. Filamentary RRAMs such as HfO_x RRAMs have been adopted widely due to their CMOS compatibility and fast switching [6–8]. Non-filamentary RRAMs, on the other hand, such as those based on the Pr_1−xCa_xMnO₃ (PCMO) material system, are popular for low device-to-device variability, scalable currents, analog switching states, and both volatile and non-volatile switching modes [9–13]. As a result of these properties, they have been demonstrated for a wide range of applications, from analog synapses [14] and integrating neurons [10] to voltage-controlled oscillators.

Another property common to both filamentary and non-filamentary RRAMs is inherent stochasticity, or cycle-to-cycle variability, in switching. These devices are natural noise generators and have found numerous applications, from security to stochastic neural networks for optimization [15, 16]. Random number generation is a task that noise-free, deterministic digital computers are not good at [17]. As a result, optimization networks such as Boltzmann machines, which run for millions of iterations of stochastic dot products for a problem of reasonable size, are not well suited to conventional computing hardware [18]. In this scenario, memristors enable promising solutions. First, dedicated architectures with memristor crossbar arrays have been proposed to speed up vector-matrix multiplication [18], and gradual state control in memristors enables scaled synapses with analog memory [13, 19, 20]. Second, memristor-based stochastic neurons have been explored extensively, with switching stochasticity or read stochasticity proposed for binary, probabilistically switching neurons [15, 16, 21–23] (figure 1(a)). Stochasticity is important because it helps escape local minima when performing gradient descent on an optimization network. As the problem size grows, more and more iterations of stochastic gradient descent are required to explore the wide solution space convincingly (figure 1(b)). Naturally, long-term control over the parameters of the stochastic switching distribution is crucial to solving different types of optimization problems.

The requirements of a stochastic neuron are summarized in figure 1(c). Although conceptual demonstrations of controllable stochasticity have been achieved in different material systems [16, 24, 25], the ability to generate the same stochastic distribution repeatedly over many iterations is still lacking. For practical problems of large size, controllable stochasticity over a short duration is not sufficient; a guarantee of a particular stochastic distribution, independent of how long the circuit has been running, is crucial. Further, a large optimization problem requires a large number of stochastic neurons and hence many RRAM devices. Device-to-device variation in the stochastic distributions is therefore another aspect that affects practical implementations of Boltzmann machines.

In this paper, we demonstrate control of stochastic distribution over an extended duration through a confluence of both internal state (i.e., resistance) and external input (i.e., voltage pulse). First, the set time stochasticity of PCMO RRAM is measured as a function of the initial high resistance state (HRS) of the device to establish causal dependence. Second, the stochastic distribution parameters, defined by internal state and external input, are compared to fixed 'electrical inputs-only' characterization to show reduced distribution drift. Third, the gradual, analog, deterministic reset process enables excellent internal state control. Finally, the reduced distribution drift significantly increases the size of optimization problems that can be solved reliably—without the added complexity of repeated distribution re-calibrations. In addition, the internal state monitoring significantly tunes out the negative impact of the device-to-device variations by aligning the stochastic distributions across devices.

Although the dependence of the stochastic distribution parameters on the internal state of the device guarantees consistent stochastic distributions, it presents a new challenge: a 'stochastic' device is now required to start from a 'deterministic' state.

From a system's perspective, deterministic analog synapses are highly attractive for compact multiply-and-accumulate operations in crossbar arrays [1]. Further, we hypothesize that drift-free, controllable stochasticity in set requires the initial resistance state to be controlled by a deterministic reset. The options for choosing memristors for such a system are as follows. First, a deterministic, gradual RRAM with good internal state control works well as an analog synapse but provides no harvestable stochasticity (figure 2(a)). Second, filamentary RRAMs typically show substantial variability and stochasticity in all regions of operation, which yields binary and hence bulky synapses [26, 27] (figure 2(b)). Moreover, their stochastic switching distribution is not controllable because the initial and final states are not well controlled. Thus, a combination of a deterministic reset, to control the initial resistance, followed by a stochastic set is essential for a drift-free, controllable stochastic distribution. We propose using PCMO RRAM, which serendipitously combines these contradictory properties: it performs a gradual, analog, deterministic reset in the positive polarity and an abrupt, stochastic set in the negative polarity. Thus, all the ingredients of a stochastic neuron are enabled through polarity control, suitable for practical realizations of Boltzmann machine networks (figure 2(c)). The gradual deterministic reset also enables compact analog memory for synaptic crossbar arrays. PCMO RRAM is therefore used to investigate the realization of our proposal experimentally.

**Figure 2.** Proposed benefits of PCMO RRAM as a stochastic neuron: PCMO RRAM combines the features of (a) deterministic RRAM and (b) stochastic RRAM in different polarities to demonstrate (c) stochastic set and analog gradual reset. (d) Set and (e) reset mechanisms for PCMO RRAM. The set process starts as a stochastic ionic drift followed by thermal run-away (positive feedback); the reset starts as a deterministic ionic drift followed by slow cool-down (negative feedback).

The switching mechanisms for the set and reset processes in PCMO RRAM are shown in figures 2(d) and (e) and have been discussed extensively before [28]. The current in PCMO RRAMs is governed by space charge limited conduction (SCLC) modulated by bulk traps related to oxygen vacancies. Switching is attributed to the reaction-drift of oxygen vacancies between the W/PCMO interface and the PCMO bulk [28]. The set process starts with a cold device in a HRS with an abundance of bulk traps and a thin WO_x layer. Stochastic ionic (vacancy) motion is triggered by the applied field. The vacancies reach the reactive electrode in a substitutive manner and recombine with the thin oxide layer, reducing the number of bulk traps. The carrier current responds quickly to this trap reduction, accompanied by self-heating (owing to the highly thermally insulating nature of PCMO [29]). Temperature now aids the field in removing bulk traps through the continued reaction-drift process. This creates positive feedback between carrier current, self-heating and further ionic motion, resulting in a sharp set to compliance. The reset process, on the other hand, starts in a low-resistance state with a minimal number of bulk traps. The initially high carrier current and self-heating promote the generation of vacancies at the interface and their subsequent drift toward the PCMO bulk. High temperature and electric field lead to deterministic ionic (vacancy) motion, which reduces the carrier current. This results in negative feedback between temperature cool-down and further ionic motion. The switching mechanism in PCMO RRAM has been verified quantitatively using detailed physics models and numerical solvers and published earlier [28]. This work discusses in detail the effect of this set-reset asymmetry on the amount of stochasticity in set vs reset switching.

2. Experimental section: characterization methods

2.1. Set stochasticity

The PCMO RRAM device has a simple metal-oxide-metal structure: a thin (60 nm) PCMO layer sandwiched between a top W electrode (10 μm × 10 μm) and a bottom Pt electrode on a Si/SiO₂/Ti substrate [30] (figure 3(a)). Details of the structure, fabrication and characterization tools are given in supplementary section S1 (https://stacks.iop.org/NCE/2/014001/mmedia). The PCMO device resets into a HRS in the positive polarity and sets into a low resistance state (LRS) in the negative polarity. The current in PCMO RRAMs is governed by bulk trap-modulated SCLC, and switching is attributed to the reaction-drift of oxygen vacancies between the W/PCMO interface and the PCMO bulk [28]. The mean behavior of short (∼ns) and long (∼s) time-range current transients has been modeled and published using the reaction-drift and thermodynamic models [28, 29].

Figure 3(b) shows the experimental write current transients for the set pulse. When the set voltage is applied, the current rises and settles to an initial level. On this timescale, vacancy drift is too slow to produce any further change, so the current stays almost flat. Later, when the vacancy drift timescale matches the measurement timescale, the current rises sharply to compliance, indicating an abrupt set on the logarithmic time axis [28]. The time at which the set current crosses a threshold I_Set defines the set time, t_Set. The abrupt shoot-up and the well-defined set time are explained by positive feedback between vacancy motion, current and self-heating-induced temperature, which creates a run-away process limited by compliance. This set process is repeated many times using the following scheme: a fixed set voltage pulse followed by a reset pulse of fixed voltage and pulse width (inset of figure 3(b)). These reset-set cycles give rise to the family of set transients shown in figure 3(b). These transients denote the stochastic set in a PCMO RRAM, and the distribution of set times can be used to build a stochastic neuron for Boltzmann machines [16].
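As a minimal sketch of the t_Set definition above, the function below finds the first time at which the magnitude of a sampled set transient reaches I_Set. It assumes the transient is available as arrays of times (s) and currents (A), with I_Set given as a positive magnitude; the function name, the example transient and the interpolation in log10(time) are illustrative choices, not the authors' extraction code.

```python
import math
from typing import Optional, Sequence


def extract_t_set(times: Sequence[float], currents: Sequence[float],
                  i_set: float) -> Optional[float]:
    """Return the first time at which |I| reaches i_set, or None if it never does.

    Currents are compared by magnitude because set occurs in negative polarity.
    The crossing is interpolated linearly in log10(time), matching the
    logarithmic time axis on which the abrupt set is observed.
    """
    if not currents:
        return None
    if abs(currents[0]) >= i_set:
        return times[0]
    for k in range(1, len(times)):
        i0, i1 = abs(currents[k - 1]), abs(currents[k])
        if i0 < i_set <= i1:
            frac = (i_set - i0) / (i1 - i0)  # position of the crossing between samples
            log_t0, log_t1 = math.log10(times[k - 1]), math.log10(times[k])
            return 10 ** (log_t0 + frac * (log_t1 - log_t0))
    return None


# Hypothetical transient: flat settle current, then an abrupt jump to compliance.
t = [1e-8, 1e-7, 1e-6, 1e-5]
i = [-1e-4, -1e-4, -1e-4, -1e-3]
t_set = extract_t_set(t, i, i_set=5.5e-4)  # about 10**-5.5 s
```

Repeating this over the family of transients gives the set-time samples whose distribution is analyzed in the following sections; I_Settle could be read off the same arrays at t_Settle.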

A closer inspection of the stochastic set transients reveals a distribution in the initial current, I_Settle, measured at a pre-defined time t_Settle after the input voltage rise is complete. This indicates that the observed distribution in t_Set may not be entirely stochastic or determined only by the applied set voltage. The dependence of t_Set on I_Settle is shown in figure 3(c). Clearly, the total t_Set scatter has a deterministic component related to I_Settle and a stochastic component at any given I_Settle. Moreover, I_Settle should itself be a function of the initial HRS of the device. We test this next.

2.2. HRS dependence

In order to observe the dependence of t_Set on the initial HRS, we repeat the reset, set measurements as in the previous section but add a read pulse before the set. We perform this experiment for a fixed $\left\vert {V}_{\text{Set}}\right\vert =1.7\enspace \mathrm{V}$ . The reset voltages are incremented or decremented per cycle in the range of (1.5–2.5) V so as to be able to sweep a large range of initial HRS. The initial HRS is measured using the read current, I_Read,HRS using a fixed small voltage of −0.5 V. The t_Set and I_Settle are extracted from the set transients as mentioned before.

First, we observe that the set transients reveal a dependence of t_Set on I_Settle (figure 3(d)). The dependence has two slopes in the log-linear domain: t_Set decreases exponentially as I_Settle increases and then saturates for even higher I_Settle. This limit indicates the fastest switching time (∼100 ns) for the given set voltage. With the help of the read pulse, we can now show that I_Settle is indeed linearly related to I_Read,HRS (figure 3(e)). Hence, the observed dependence of t_Set can ultimately be mapped to the initial HRS of the device (figure 3(f)). Again, as the read current rises, t_Set falls exponentially and then saturates for even higher I_Read,HRS. Earlier studies have shown a similar behavior of t_Set as a function of the applied set voltage. This is the first demonstration of the dependence of t_Set on the internal state, i.e., the initial HRS before set, of the PCMO RRAM device.
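The two-slope dependence can be summarized by a hinge model in the log-linear domain, log₁₀ t_Set = c + s·min(I − I_b, 0), where s < 0 sets the exponential decrease below a breakpoint I_b and c is the saturated (fastest) set time. This is a sketch under our own assumption of a sharp breakpoint; the paper does not specify a fitting form. For each candidate I_b the model is linear in c and s, so ordinary least squares applies, and the breakpoint with the lowest residual is kept.

```python
from typing import Sequence, Tuple


def fit_two_slope(i_settle: Sequence[float], log_t_set: Sequence[float]
                  ) -> Tuple[float, float, float]:
    """Fit log10(t_Set) = c + s * min(I - I_b, 0); return (c, s, I_b).

    The breakpoint I_b is searched over the measured I values; for each
    candidate, c and s follow from simple linear regression.
    """
    n = len(i_settle)
    y_mean = sum(log_t_set) / n
    best = None
    for i_b in sorted(set(i_settle)):
        z = [min(i - i_b, 0.0) for i in i_settle]
        z_mean = sum(z) / n
        szz = sum((zi - z_mean) ** 2 for zi in z)
        if szz == 0.0:
            continue  # breakpoint at the lowest I leaves no sloped segment
        s = sum((zi - z_mean) * (yi - y_mean)
                for zi, yi in zip(z, log_t_set)) / szz
        c = y_mean - s * z_mean
        sse = sum((yi - (c + s * zi)) ** 2 for zi, yi in zip(z, log_t_set))
        if best is None or sse < best[0]:
            best = (sse, c, s, i_b)
    if best is None:
        raise ValueError("need at least two distinct I_Settle values")
    return best[1], best[2], best[3]


# Hypothetical noise-free data (I in uA): decay below 5 uA, saturation above.
i_data = [1, 2, 3, 4, 5, 6, 7, 8]
y_data = [-7.0 - 0.5 * min(i - 5, 0) for i in i_data]
c, s, i_b = fit_two_slope(i_data, y_data)  # recovers c = -7, s = -0.5, I_b = 5
```

The same fit applies with I_Read,HRS on the x-axis, since the two currents are linearly related.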

2.3. Controllable stochasticity

It can be shown that the stochastic cycle-to-cycle distribution of t_Set in PCMO RRAM devices is lognormal at any given HRS (details in supplementary section S2). The dependence of the mean t_Set on the set voltage |V_Set| is well known and has been shown earlier [28, 31]. We extend the t_Set vs HRS characterization of the previous sections to a wide range of |V_Set| values (1.6–2.2 V), where HRS = V_Read/I_Read,HRS and V_Read = −0.5 V. The experimental t_Set is plotted in figure 4(a). The mean t_Set increases as the HRS increases or as $\left\vert {V}_{\text{Set}}\right\vert$ decreases. We fit a surface with quadratic dependencies on |V_Set| and HRS through the experimental log t_Set data. The surface fits well, with an R-squared of ∼0.9; it is plotted in figure 4(b) and expressed as the mean of log t_Set. Next, we calculate the deviation of the experimental data from this mean and obtain the standard deviation of log t_Set, plotted in figure 4(c). This completes the characterization of the t_Set distribution through the lognormal parameters μ and σ as functions of |V_Set| and HRS.
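A minimal sketch of this two-step characterization in plain Python: a least-squares surface for μ(log₁₀ t_Set) over (|V_Set|, HRS), followed by a second surface fitted to the squared residuals to give σ. We read "quadratic dependencies on |V_Set| and HRS" as a full second-order polynomial (1, V, H, V², H², VH); the exact basis and σ estimator used in the paper may differ, and the variance-surface step and the example data are our assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Surface = Callable[[float, float], float]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_quadratic_surface(v: Sequence[float], h: Sequence[float],
                          y: Sequence[float]) -> Surface:
    """Least-squares fit of y to a full quadratic in (v, h) via normal equations."""
    def centre_scale(xs: Sequence[float]) -> Tuple[float, float]:
        mean = sum(xs) / len(xs)
        std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
        return mean, std or 1.0

    (vm, vs), (hm, hs) = centre_scale(v), centre_scale(h)

    def features(vi: float, hi: float) -> List[float]:
        a, b = (vi - vm) / vs, (hi - hm) / hs  # normalized to keep the system well conditioned
        return [1.0, a, b, a * a, b * b, a * b]

    rows = [features(vi, hi) for vi, hi in zip(v, h)]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = _solve(ata, aty)
    return lambda vi, hi: sum(c * f for c, f in zip(beta, features(vi, hi)))


def fit_mu_sigma(v: Sequence[float], h: Sequence[float],
                 log_t_set: Sequence[float]) -> Tuple[Surface, Surface]:
    """mu surface from the data; sigma from a surface fitted to squared residuals."""
    mu = fit_quadratic_surface(v, h, log_t_set)
    sq_res = [(yi - mu(vi, hi)) ** 2 for vi, hi, yi in zip(v, h, log_t_set)]
    var = fit_quadratic_surface(v, h, sq_res)
    return mu, lambda vi, hi: math.sqrt(max(var(vi, hi), 0.0))


# Hypothetical data: each (|V_Set|, HRS) point measured twice, offset by +/-0.3 decades.
def true_mu(v: float, h: float) -> float:
    return -5.0 - 4.0 * (v - 1.9) + 0.01 * (h - 80.0) + 2.0 * (v - 1.9) ** 2


data = [(v, h, true_mu(v, h) + noise)
        for v in (1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2)
        for h in (20.0, 50.0, 80.0, 110.0, 140.0)  # HRS in kOhm
        for noise in (0.3, -0.3)]
v_data, h_data, y_data = (list(col) for col in zip(*data))
mu, sigma = fit_mu_sigma(v_data, h_data, y_data)
```

With real measurements the σ surface is noisier than μ; taking per-bin standard deviations over (|V_Set|, HRS) bins is an equally reasonable alternative.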

**Figure 4.** Controllable stochasticity: (a) experimental stochastic t_Set measured for a range of HRS for different |V_Set|, bi-quadratic surface fit (R-squared ∼ 0.9) is used for (b) extracted μ of log₁₀ t_Set, (c) extracted σ of log₁₀ t_Set assuming a lognormal distribution of t_Set at each (|V_Set|, HRS), (d) P_Switch extracted from the previous methodology of fixed electrical input compared with (e) P_Switch extracted from the proposed methodology of fixed electrical input and internal state of the device, (f) P_Switch vs |V_Set| demonstrating control of bias point and steepness of switching probability as a function of internal state (HRS) and t_pw (for $\left\vert {V}_{\text{Set}}\right\vert$ > 1.95 V, the P_Switch ∼ 1 indicating a deterministic set operation).

Now, we can characterize the stochastic switching neuron. A neuron is said to switch if the t_Set (i.e., time to reach I_Set) is lower than the applied pulse width t_pw. Since the t_Set is a lognormal random variable dependent on |V_Set| and HRS, the switching is stochastic and results in a stochastic neuron. The switching probability can be obtained as:

$\begin{equation*}{P}_{\text{Switch}}=P({t}_{\text{Set}}\leqslant {t}_{\text{pw}}),\quad \text{where}\enspace {t}_{\text{Set}}\enspace \text{is lognormal}\end{equation*}$

Given the external inputs |V_Set| and t_pw and the internal state HRS, the lognormal parameters of t_Set, μ(HRS, |V_Set|) and σ(HRS, |V_Set|), are obtained from the fitted surfaces in figures 4(b) and (c), and P_Switch follows by evaluating the cumulative distribution function (CDF) at t_pw. Thus P_Switch is a function of |V_Set|, HRS and t_pw. The switching probability of the device was previously studied as a function of |V_Set| and t_pw alone, without monitoring the HRS [16]; the results are plotted in figure 4(d). In this work, we calculate the switching probability as a function of |V_Set| and t_pw at a particular HRS (40 kΩ), as shown in figure 4(e). Clearly, capturing the HRS dependence improves the behavior of P_Switch: it is now well defined and predictable with respect to the electrical inputs.
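Written out, the CDF evaluation reduces to a standard normal CDF in the log domain: P_Switch = Φ((log₁₀ t_pw − μ)/σ), with μ and σ the mean and standard deviation of log₁₀ t_Set. A minimal sketch using only the standard library (the function name and the example operating point are ours):

```python
import math


def p_switch(mu_log10: float, sigma_log10: float, t_pw: float) -> float:
    """P(t_Set <= t_pw) when log10(t_Set) ~ Normal(mu_log10, sigma_log10**2)."""
    z = (math.log10(t_pw) - mu_log10) / sigma_log10
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical operating point: mu = -5 (median t_Set of 10 us), sigma = 0.3 decades.
p_median = p_switch(-5.0, 0.3, 1e-5)        # t_pw at the median
p_one_sd = p_switch(-5.0, 0.3, 10 ** -4.7)  # t_pw one sigma above the median
```

Because μ and σ depend on (|V_Set|, HRS), the same function gives P_Switch over the full input space once those surfaces are known.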

A stochastic neuron is typically operated with a fixed t_pw, while $\left\vert {V}_{\text{Set}}\right\vert$ serves as the input that controls P_Switch. Plotting P_Switch vs |V_Set| for a fixed HRS and t_pw gives sigmoid-like characteristics (figure 4(f)). To center the sigmoid at $\left\vert {V}_{\text{Set}}\right\vert$ = 1.8 V, we choose t_pw such that log t_pw equals the mean μ of log t_Set for that |V_Set| and HRS. This ensures that P_Switch(1.8 V) = 0.5, since the CDF of a normal distribution equals 0.5 at its mean. The center of the sigmoid can be shifted to any voltage of choice in the same way (black dashed lines in figure 4(f)). Next, we vary the HRS, each time setting log t_pw to μ of log t_Set at 1.8 V and the chosen HRS so that P_Switch(1.8 V) = 0.5 is maintained, and obtain sigmoids of varying steepness. Thus, a stochastic neuron with a controllable center and steepness of its sigmoidal switching probability is demonstrated.
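The centering rule can be sketched as follows: fix log₁₀ t_pw = μ(1.8 V) at the chosen HRS, then sweep |V_Set|. The linear μ(|V_Set|) and constant σ below are hypothetical placeholders for the fitted surfaces of figures 4(b) and (c), chosen only to show that P_Switch(1.8 V) = 0.5 by construction and that a larger σ gives a shallower sigmoid.

```python
import math
from typing import Callable


def make_sigmoid(mu_of_v: Callable[[float], float],
                 sigma_of_v: Callable[[float], float],
                 v_center: float = 1.8) -> Callable[[float], float]:
    """Return P_Switch(|V_Set|) at a fixed HRS, centered at v_center.

    The pulse width is fixed by log10(t_pw) = mu(v_center), so the normal CDF
    in the log domain evaluates to exactly 0.5 at v_center.
    """
    log_t_pw = mu_of_v(v_center)

    def p_switch(v: float) -> float:
        z = (log_t_pw - mu_of_v(v)) / sigma_of_v(v)
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    return p_switch


# Hypothetical placeholders for one HRS (not paper data): mu of log10(t_Set)
# falls 6 decades per volt, and sigma is a constant 0.3 decades.
def mu(v: float) -> float:
    return -5.0 - 6.0 * (v - 1.8)


def sigma(v: float) -> float:
    return 0.3


p = make_sigmoid(mu, sigma)
```

In this toy model P_Switch(V) = Φ(6(V − 1.8)/σ), so the steepness is set by the ratio of dμ/d|V_Set| to σ; changing the HRS changes both, which is how the measured sigmoids acquire different steepness.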

2.4. Drift-free stochasticity

Drift-free stochasticity is the ability to generate samples from exactly the same stochastic distribution over time and across many iterations. In other words, it is the ability to control all the inputs and parameters, external and internal to the device, that determine the stochastic distribution parameters (mean and standard deviation). We have identified the set voltage and the HRS as the inputs that affect t_Set and its stochasticity. In this experiment, we observe the distribution of t_Set over multiple cycles while measuring the HRS before each set (figure 5(a)). Every cycle consists of a reset, a read and a set pulse. The reset voltage is decreased in small steps (∼0.05 V) from cycle to cycle, which leads to progressively lower HRS points. Once a sufficiently low HRS is reached, the direction of change of the reset voltage is reversed, which leads to progressively higher HRS points. This back-and-forth sweep of the HRS range and the resulting t_Set points are measured for 1000 cycles (figure 5(a)). The dependence of t_Set on HRS is fairly static over cycles (evident from the HRS color bands parallel to the x-axis), indicating complete control over the t_Set distribution using the internal state HRS and the external input V_Set. The variation of t_Set as a function of the mean t_Set is plotted in figure 5(b) to show that t_Set, although controllable, is indeed stochastic.
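The back-and-forth HRS sweep can be driven by a simple triangular schedule of reset voltages. This sketch assumes the 1.5–2.5 V range and ∼0.05 V step mentioned for the HRS-dependence experiment and reverses at fixed voltage bounds; in the experiment the reversal is triggered once the HRS is low (or high) enough, so the bounds here are a stand-in.

```python
from itertools import islice
from typing import Iterator


def reset_voltage_sweep(v_max: float = 2.5, v_min: float = 1.5,
                        step: float = 0.05) -> Iterator[float]:
    """Yield reset voltages that step down to v_min, back up to v_max, and repeat.

    Integer step indices avoid accumulating floating-point error over many cycles.
    """
    n_steps = round((v_max - v_min) / step)
    k, direction = 0, 1  # k = number of steps below v_max
    while True:
        yield round(v_max - k * step, 10)
        if not 0 <= k + direction <= n_steps:
            direction = -direction
        k += direction


# One reset voltage per reset-read-set cycle, e.g. for 1000 cycles.
schedule = list(islice(reset_voltage_sweep(), 1000))
```

Each entry would be followed by a read at −0.5 V and a set at the fixed |V_Set|, so that every t_Set sample is paired with the HRS measured just before it.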

Figure 5(c) compares the control over the t_Set distribution with and without monitoring the internal state. If reset-set cycles of fixed magnitude and pulse duration are performed without actively monitoring the internal state (as in figure 3(b)), the t_Set distribution can drift substantially (cross markers in figure 5(c)): by 1 decade in the mean and 0.34 decades in the standard deviation of the lognormal t_Set over 100 cycles. If, on the other hand, the HRS is kept within a 10% tolerance of a target over multiple cycles, the control over t_Set improves drastically (circle markers in figure 5(c)). We show this for two bounding HRS values. The drift is now reduced to 0.01 decades in the mean and 0.003 decades in the standard deviation. This residual drift is statistically insignificant and points to a simpler model: with HRS and V_Set fixed, the t_Set distribution is fully determined across iterations and time (unless something fundamentally ages in the device, which is a reliability concern; switching endurance, however, has been well demonstrated earlier [11]). This result is highly significant for using these devices to generate random numbers reliably and repeatedly from a fixed distribution. In a Boltzmann machine, the number of iterations before the network converges can easily run into millions, depending on the network size. Every stochastic neuron in the network therefore undergoes a large number of samplings during the network evolution (exactly one neuron is sampled every iteration). We analyze the impact of a drifting stochastic distribution in the following sections.
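One way to quantify drift figures like these is to split the cycle sequence into consecutive windows, compute the mean and standard deviation of log₁₀ t_Set in each, and report the spread of those statistics. The paper does not spell out its estimator, so the windowing and the max-minus-min spread here are assumptions; the optional HRS filter mimics keeping only cycles within ±10% of a target HRS, and the example samples are invented.

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def windowed_stats(log_t_set: Sequence[float], window: int
                   ) -> List[Tuple[float, float]]:
    """(mean, stdev) of log10(t_Set) over consecutive non-overlapping windows (window >= 2)."""
    return [(statistics.fmean(log_t_set[s:s + window]),
             statistics.stdev(log_t_set[s:s + window]))
            for s in range(0, len(log_t_set) - window + 1, window)]


def drift(log_t_set: Sequence[float], window: int,
          hrs: Optional[Sequence[float]] = None,
          hrs_target: Optional[float] = None,
          tol: float = 0.10) -> Tuple[float, float]:
    """Spread (max - min, in decades) of the windowed mean and stdev.

    If hrs and hrs_target are given, only cycles whose HRS lies within
    +/- tol (fractional) of the target are kept.
    """
    data = list(log_t_set)
    if hrs is not None and hrs_target is not None:
        data = [y for y, r in zip(data, hrs)
                if abs(r - hrs_target) <= tol * hrs_target]
    stats = windowed_stats(data, window)
    if not stats:
        raise ValueError("fewer selected cycles than one window")
    means = [m for m, _ in stats]
    sds = [s for _, s in stats]
    return max(means) - min(means), max(sds) - min(sds)


# Hypothetical log10(t_Set) samples whose mean shifts by one decade mid-run.
early = [-5.0, -5.2, -4.8, -5.0] * 5
late = [-4.0, -4.2, -3.8, -4.0] * 5
samples = early + late
hrs_log = [40.0] * 20 + [60.0] * 20  # hypothetical HRS (kOhm) read before each set
```

Here the unfiltered run shows a one-decade drift in the mean, while restricting to cycles near a 40 kΩ target removes it, which is the qualitative contrast between the cross and circle markers in figure 5(c).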

2.5. Resistance state controllability in PCMO RRAM

As evident from the previous section, the ability to produce identical stochasticity consistently over many reset-set cycles depends on the ability to control the internal state HRS precisely. This is where PCMO RRAM devices are particularly attractive. Not only do they show controllable set-based stochasticity, but they are also excellent analog memories, with gradual, controllable resistance using the opposite-polarity reset [11, 20, 29, 31]. This difference arises from the fundamentally different feedback mechanisms in the set (positive feedback, abrupt) and reset (negative feedback, gradual) operations, discussed extensively in the reaction-drift model proposed for the switching in these memories [28]. We perform reset resistance transient measurements in which, starting from a fixed LRS, a reset pulse of fixed magnitude (1.8 V) and pulse width is applied (figure 5(d)). A fixed LRS is ensured by choosing a high |V_Set| = 3 V and a compliance current I_comp = 10 mA. This is termed a strong set and is deterministic compared to a lower-voltage stochastic set (figure 4(f)). As the reset pulse width increases, the read resistance after the reset pulse, starting from a fixed LRS, increases gradually (figure 5(d)). The pulse width sweep is performed back and forth three times to show precise control over resistance using reset over a wide range of resistances. All resistance measurements show sub-10% variation, and only a very small fraction exceed even 5% variation (figure 5(e)). To demonstrate the asymmetry in stochasticity, the variation in t_Reset is plotted for three different HRS values (±20% tolerance) (figure 5(f)). Compared to the lognormal distributions of t_Set, with σ(log₁₀ t_Set) ranging from 0.1 to 0.7 decades (figure 4(c)), the distribution of t_Reset is linear with sub-5% variation. This demonstrates precise HRS control using reset pulses in an otherwise set-stochastic PCMO RRAM.
This behavior is very powerful: resistance controllability makes these devices the memory of choice for the crossbar array weights of a Boltzmann machine, while stochastic neurons at the edge of the crossbar exploit the set stochasticity. A Boltzmann machine chip based on a single material system is thus realizable using PCMO RRAM devices.

3. Network simulation section: performance of Boltzmann machines

3.1. Impact of cycle-to-cycle drift

The utility of RRAM devices for efficient implementation of Boltzmann machines has been explored extensively [18]. Analog multiply-and-accumulate in the crossbar array and the generation of an analog, approximately sigmoidal switching probability from inherent switching stochasticity provide significant hardware implementation benefits [16]. However, Boltzmann machines run for a large number of iterations of stochastic gradient descent before converging. Hence, demonstrating controllable stochasticity is necessary but not sufficient. PCMO RRAMs are known to have good endurance, with each device capable of switching for tens of thousands of cycles [11]. However, it is essential to reproduce the switching probability of a neuron over this large number of reset-set cycles. As seen in the previous section (figure 5(c)), if the internal state of the device drifts over these cycles, the stochastic distributions drift as well. The resulting change in the P_Switch characteristics can degrade the performance of Boltzmann machines. In this section, we analyze the impact of HRS drift over cycles on the maximum size of the max-cut optimization problem that a Boltzmann machine built from these devices can solve.

The max-cut problem requires finding a bi-partition of a graph with N nodes and E edges such that the sum of the weights of the edges crossing between the two partitions is maximized. The input to this problem is the adjacency matrix of weights w_ij connecting any two nodes i and j. Let the current configuration of the network be represented by the vector x, where x_i is 0 if node i belongs to the first partition and 1 if it belongs to the second partition. The quantity to be maximized can then be expressed as [16]:

$\begin{equation*}M(x)=\sum_{i=1}^{N}\sum_{j=i+1}^{N}{w}_{ij}\left[(1-{x}_{i})\,{x}_{j}+(1-{x}_{j})\,{x}_{i}\right]\end{equation*}$

The energy to be minimized is then $E(x)=-M(x)$, which can be written as:

$\begin{equation*}E\left(x\right)={\boldsymbol{b}}^{T}\boldsymbol{x}-\frac{1}{2}{\boldsymbol{x}}^{T}{\boldsymbol{W}}_{\boldsymbol{B}}\boldsymbol{x}\end{equation*}$

$\begin{equation*}\text{Where}\enspace {b}_{i}=-{{\Sigma}}_{j=1}^{N}{w}_{ij}\quad \text{and}\quad {W}_{B,ij}=-2{w}_{ij}\end{equation*}$
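The equivalence E(x) = −M(x) can be checked numerically on a small graph. This sketch assumes, as is standard for an undirected graph without self-loops, that w is symmetric with a zero diagonal; under that assumption b_i = −Σ_j w_ij and W_B = −2w reproduce the cut value exactly. The 4-node weight matrix is a made-up example with weights in {−1, 0, 1}.

```python
from itertools import product
from typing import List, Sequence

Matrix = List[List[float]]


def cut_value(w: Matrix, x: Sequence[int]) -> float:
    """M(x): total weight of edges whose endpoints lie in different partitions."""
    n = len(x)
    return sum(w[i][j] * ((1 - x[i]) * x[j] + (1 - x[j]) * x[i])
               for i in range(n) for j in range(i + 1, n))


def energy(w: Matrix, x: Sequence[int]) -> float:
    """E(x) = b^T x - 1/2 x^T W_B x with b_i = -sum_j w_ij and W_B = -2 w."""
    n = len(x)
    b = [-sum(w[i]) for i in range(n)]
    w_b = [[-2.0 * w[i][j] for j in range(n)] for i in range(n)]
    linear = sum(b[i] * x[i] for i in range(n))
    quad = sum(x[i] * w_b[i][j] * x[j] for i in range(n) for j in range(n))
    return linear - 0.5 * quad


# Small symmetric, zero-diagonal example; check E = -M for every configuration.
w = [[0, 1, -1, 0],
     [1, 0, 1, 1],
     [-1, 1, 0, 0],
     [0, 1, 0, 0]]
for x in product((0, 1), repeat=4):
    assert abs(energy(w, x) + cut_value(w, x)) < 1e-12
```

Expanding both sides term by term shows why: b^T x contributes −Σ_{i<j} w_ij (x_i + x_j) and the quadratic term contributes +2Σ_{i<j} w_ij x_i x_j, which together give −M(x).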

The energy is expressed in the standard form for the Boltzmann machine. The machine performs stochastic gradient descent if at every iteration exactly one neuron switches with the sigmoidal probability as a function of u_i = Σ_j W_B,ij x_j − b_i.
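A software sketch of one iteration: pick one neuron, compute u_i, and set x_i = 1 with sigmoidal probability. Since turning x_i from 0 to 1 changes the energy by ΔE = b_i − Σ_j W_B,ij x_j = −u_i (for symmetric W_B with zero diagonal), a logistic sigmoid of u_i is the usual Gibbs update. The logistic function, the temperature parameter and the example graph stand in for the RRAM's measured P_Switch curve and voltage scaling, and are assumptions of this sketch.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def maxcut_to_boltzmann(w: Matrix) -> Tuple[List[float], Matrix]:
    """b_i = -sum_j w_ij and W_B = -2 w, as in the energy expression above."""
    n = len(w)
    b = [-float(sum(w[i])) for i in range(n)]
    w_b = [[-2.0 * w[i][j] for j in range(n)] for i in range(n)]
    return b, w_b


def local_field(w_b: Matrix, b: Sequence[float], x: Sequence[int], i: int) -> float:
    """u_i = sum_j W_B,ij x_j - b_i (zero diagonal assumed)."""
    return sum(w_b[i][j] * x[j] for j in range(len(x))) - b[i]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gibbs_step(w_b: Matrix, b: Sequence[float], x: List[int],
               rng: random.Random, temperature: float = 1.0) -> int:
    """Sample one randomly chosen neuron in place; return its index."""
    i = rng.randrange(len(x))
    p_on = sigmoid(local_field(w_b, b, x, i) / temperature)
    x[i] = 1 if rng.random() < p_on else 0
    return i


# Usage on a hypothetical 4-node graph (symmetric, zero diagonal).
w = [[0, 1, -1, 0], [1, 0, 1, 1], [-1, 1, 0, 0], [0, 1, 0, 0]]
b, w_b = maxcut_to_boltzmann(w)
rng = random.Random(0)
x = [0, 0, 0, 0]
for _ in range(1000):
    gibbs_step(w_b, b, x, rng, temperature=0.5)
```

In hardware, the call to `sigmoid` would be replaced by the RRAM set event at a voltage obtained by scaling and shifting u_i.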

The max-cut instances are chosen from standard benchmark libraries [32, 33]. These are random or planar graphs of 100–3000 nodes with weights chosen from {−1, 0, 1}. The sigmoidal switching probability is obtained from the RRAM switching probability after scaling and shifting the input u_i to the range of input voltages to which the RRAM responds (figure 4(f)). Figure 6(a) shows the ideal Boltzmann machine solution of a 125-node problem using a fixed switching-probability characteristic. The energy decreases stochastically and reaches the best-known solution in ∼10⁴ iterations. In each iteration, exactly one RRAM device undergoes stochastic switching. The corresponding separation of the desired bi-partition as the energy decreases is shown on the left in figure 6(a). Multiple stochastic runs are performed to show the variation in the stochastic performance of the Boltzmann machine.

**Figure 6.** Ideal solution to the max-cut problem and effect of cycle-to-cycle HRS drift: (a) the objective is to find a bi-partition of the input graph that maximizes the weights in the cut. Hollow and filled circles denote the desired bi-partition; as the energy evolves over iterations, the partition separates into the desired configuration. (b) Switching probability characteristics of a single neuron that has undergone different numbers of reset cycles, assuming a constant HRS drift per reset. (c) Energy vs iterations for ideal stochastic gradient descent (m_HRS = 0) and in the presence of HRS drift (m_HRS = 0.01 kΩ/cycle). (d) Iterations taken to converge within 10% of the best-known solution for different problem sizes (multiple runs on multiple problem instances, ∼25 runs per problem size). (e) Maximum meaningful iterations of stochastic gradient descent for a given extent of HRS drift.

For an N-node problem, we require N stochastic neurons, i.e., N RRAM devices. In every iteration, exactly one RRAM device undergoes a reset-set cycle to evolve the configuration, and every reset-set cycle causes the HRS of that device to drift. The effect of HRS drift on the switching characteristics after different numbers of reset-set cycles is shown in figure 6(b). The drift is modeled as a linear shift with slope m_HRS (in kΩ/cycle). The sigmoid curves shift significantly over cycles. The effect of HRS drift on the energy transient of stochastic gradient descent is shown in figure 6(c). Whereas the ideal case (m_HRS = 0) converges (i.e., reaches within 10% of the best-known solution of the problem) after a number of iterations, the drifting case (m_HRS = 0.01 kΩ/cycle) performs meaningful stochastic gradient descent only up to a maximum number of iterations, after which the energy no longer moves toward the optimum. We calculate the typical number of iterations required to solve the max-cut problem with ideal, drift-free neurons for different problem sizes (figure 6(d)), as well as the maximum number of meaningful iterations of stochastic gradient descent that a network can perform with neurons of a given HRS drift (figure 6(e)). For any given m_HRS, we can thus find the maximum size of problem that can be solved. The improved consistency in stochasticity obtained with the fixed-HRS method in PCMO RRAM results in a 20× improvement (from sub-100 to above-1000 nodes) in the size of problem that can be solved. Hence, drift-free stochasticity is essential for efficient hardware implementations of Boltzmann machines that solve practical problems of large size.
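The last step, finding the largest solvable problem for a given m_HRS, amounts to intersecting the two curves. The sketch below takes the required-iterations curve (the role of figure 6(d)) as a function and the iteration budget (the role of figure 6(e)) as a number; the linear curve and budget in the usage line are made-up placeholders, not the paper's data.

```python
from typing import Callable, Optional, Sequence


def max_solvable_size(sizes: Sequence[int],
                      iters_to_converge: Callable[[int], float],
                      max_meaningful_iters: float) -> Optional[int]:
    """Largest N whose drift-free convergence needs no more than the budget.

    iters_to_converge(N): iterations an ideal, drift-free network needs.
    max_meaningful_iters: iterations of useful descent for a given m_HRS.
    """
    feasible = [n for n in sizes if iters_to_converge(n) <= max_meaningful_iters]
    return max(feasible) if feasible else None


# Placeholder curve and budget, for illustration only.
sizes = [100, 200, 400, 800, 1600, 3000]
largest = max_solvable_size(sizes, lambda n: 100.0 * n, 5e4)
```

A smaller m_HRS raises the budget, which moves the intersection, and hence the largest solvable N, to the right.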

3.2. Impact of device-to-device variations

Large problems also use a large number of stochastic neurons and hence many stochastic RRAM devices. These devices differ from each other in their stochastic t_Set distributions. Figure 7(a) shows the t_Set vs HRS scatter for six different devices at the same |V_Set| = 1.9 V. If the t_Set distributions of the six devices are plotted at a given HRS of 140 kΩ (with ±20% variation in HRS), they are widely separated, differing primarily in μ(log₁₀ t_Set) with about 20% device-to-device variation (figure 7(b)). This can degrade the quality of the solution obtained by the Boltzmann machine. Once again, HRS controllability in PCMO RRAM devices helps tune out this variability, once, before the network evolution begins. The HRS of every device is set (with a realistic precision of 20% variation, as shown in figure 7(b)) to the value that corresponds to a fixed $\mu \left({\mathrm{log}}_{10}\enspace {t}_{\text{Set}}\right)=-5$. The new t_Set distributions using the HRS-controlled scheme are plotted in figure 7(c). The device-to-device variation in the stochastic distributions is reduced to just 2%, compared with 20% when all devices share the same HRS.
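The per-device calibration amounts to inverting each device's μ(HRS) at the operating |V_Set|. This sketch assumes a local linear fit μ_d(HRS) = a_d + s_d·HRS around the operating point (the paper's surface is quadratic), and the two device fits are invented for illustration.

```python
from typing import Dict, Tuple


def calibrate_hrs(device_fits: Dict[str, Tuple[float, float]],
                  target_mu: float = -5.0) -> Dict[str, float]:
    """Per-device HRS (kOhm) giving mu(log10 t_Set) = target_mu at a fixed |V_Set|.

    Each device is described by a hypothetical local linear fit
    mu_d(HRS) = a_d + s_d * HRS, with s_d > 0 since t_Set grows with HRS.
    """
    targets = {}
    for name, (a, s) in device_fits.items():
        if s <= 0:
            raise ValueError(f"{name}: expected mu to increase with HRS")
        targets[name] = (target_mu - a) / s
    return targets


# Two hypothetical devices with different offsets in mu(log10 t_Set).
fits = {"D1": (-6.0, 0.01), "D2": (-6.5, 0.01)}
targets = calibrate_hrs(fits)
```

Each device is then reset toward its own target HRS, within the realistic precision noted above, so that all neurons share approximately the same μ(log₁₀ t_Set).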

Device-to-device variation in t_Set translates into variation in the switching probability curves of different devices (figure 7(d)). Different P_Switch curves for each stochastic neuron affect the minimum settling energy of the Boltzmann machine (figure 7(e)). Larger variations result in a higher settling energy and hence a larger percentage error relative to the optimal energy. This error in settling energy with respect to the ideal settling energy is plotted for different amounts of device-to-device variability in figure 7(f). Compared with a 50% error from ideal for 20% device-to-device variation, HRS controllability, by reducing the variation to 2%, improves the error 10× to just 5%. Thus, knowing the parameters that affect stochasticity, namely HRS and V_Set, and being able to control HRS in the same device helps tackle the practical problems of cycle-to-cycle drift and device-to-device variation and enables large-scale stochastic neural networks.

4. Conclusion

In this work, PCMO RRAM is proposed as an enabler for large-scale stochastic recurrent neural networks such as Boltzmann machines. The parameters affecting set-time stochasticity are identified: with HRS and V_Set fixed, the t_Set distribution is fully determined across cycles and time. The asymmetric nature of stochasticity between set and reset is highlighted. Deterministic and gradual state control in the reset operation allows HRS controllability, enabling drift-free set stochasticity over many iterations. The reduced drift enables the solution of max-cut graph optimization problems larger than 1000 nodes using Boltzmann machines, 20× larger than with the electrical-input-only method of stochasticity generation. Further, HRS controllability tunes out the effects of device-to-device variability, improving solution quality 10× compared with a system with realistic variations. Its properties as a stochastic neuron with a controllable internal state make PCMO RRAM the device of choice for implementing both stochasticity and weights in large-scale Boltzmann machines.

Acknowledgments

The authors would like to thank Sandip Lashkare (previously graduate student with Dept. of Electrical Engineering, IIT Bombay) for discussions on experimental characterization. This work is supported in parts by the Semiconductor Research Corporation (SRC), DST Nano Mission, Ministry of Electronics and IT (MeitY) and Department of Electronics through the Nanoelectronics Network for Research and Applications (NNETRA) project. Vivek Saraswat is supported by Prime Minister's Research Fellowship, Govt. of India.

Data availability statement

All data that support the findings of this study are included within the article (and any supplementary files).
