Unsupervised learning in probabilistic neural networks with multi-state metal-oxide memristive synapses

In an increasingly data-rich world, the need for computing systems that can not only process but ideally also interpret big data is becoming ever more pressing. Brain-inspired concepts have shown great promise towards addressing this need. Here we demonstrate unsupervised learning in a probabilistic neural network that utilizes metal-oxide memristive devices as multi-state synapses. Our approach can be exploited for processing unlabelled data and, by supporting reversible unsupervised learning, can adapt to time-varying clusters that underlie incoming data. The potential of this work is showcased through the demonstration of successful learning in the presence of corrupted input data and probabilistic neurons, thus paving the way towards robust big-data processors.



Supplementary note 1
Device characterisation and behaviour. The capability of TiO2-based memristors to encode conditional probabilities largely relies on their ability to support gradual switching. Supplementary Fig. 1 shows the biasing parameter optimiser test routine [2] being applied to a single device. During this routine the device under test (DUT) is subjected to a series of pulse trains of alternating polarity. Each pulse train consists of a succession of progressively higher voltage pulses, all of fixed duration (Supplementary Fig. 1(a)). The effect of each voltage amplitude on the DUT resistive state is assessed by measuring the resistive state between pulses. The test shows how the choice of bias voltage determines the speed of switching (Supplementary Fig. 1(c)). We find that in our devices an appropriate choice of pulsing voltage can lead to gradual switching, corresponding to very small δR in response to input stimulation. Applying successive barrages of identical, pulsed stimuli (LTP only or LTD only) as described in Fig. 1 confirms the capability of gradual switching and uncovers the dependence of the magnitude of switching on the value of the running conductance. Supplementary Fig. 5 shows results from the experimental procedure of Fig. 1 carried out on all devices used for this work. We note that all devices are well-behaved, with LTP and LTD fitting well to the exponential model used in Fig. 1.
Moreover, Supplementary Fig. 5(e) shows a typical case of cycle-to-cycle variation in memristive devices. The final resistive state at the end of the second LTD event block is slightly different from that at the end of the first LTD block. Whilst this may be at least partially explained by incomplete convergence to an equilibrium point, our experience with the TiO2-based devices suggests that cycle-to-cycle variation is likely to play a role in this phenomenon.
Another important aspect of device behaviour is the voltage-time dilemma, that is, the trade-off between pulse duration and voltage. We tested our samples with the biasing parameter optimiser routine at different pulse widths and recorded the pulse voltages at which the resistive state of the DUT had changed by 2% versus its state at the start of the test. The obtained values provide rough, but comparably obtained, estimates of the DUT voltage threshold. Supplementary Fig. 13 shows threshold voltages versus pulse duration, extracted from a typical device in the same family as used in this work, whilst Supplementary Table 2 summarises the 100 µs pulse thresholds extracted for the devices used in this work. The exponential relation between pulse duration and pulse voltage suggests that switching can be achieved at significantly lower power cost if shorter, but stronger, pulses are used as stimulation.
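The threshold criterion described above can be sketched in a few lines. This is a minimal illustration only: the function name `threshold_voltage` and the voltage ramp and resistance readings are hypothetical, and the only element taken from the text is the 2% change criterion relative to the state at the start of the test.

```python
def threshold_voltage(voltages, resistances, r_start, rel_change=0.02):
    """Return the first pulse voltage at which the resistive state has moved
    by at least rel_change relative to r_start, or None if it never does."""
    for v, r in zip(voltages, resistances):
        if abs(r - r_start) / r_start >= rel_change:
            return v
    return None


# Hypothetical ramp: resistive state (ohms) read after each progressively
# higher pulse (volts). These numbers are illustrative, not measured data.
voltages = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
resistances = [10000.0, 9990.0, 9950.0, 9850.0, 9700.0, 9300.0]

v_th = threshold_voltage(voltages, resistances, r_start=10000.0)
print(v_th)
```

Repeating the same extraction at several pulse widths would give one threshold estimate per width, which is the kind of data plotted against pulse duration.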
The thresholded nature of switching in our devices, as shown in Supplementary Fig. 1, provides them with good read-disturb immunity. Fig. 1(d) shows that the DUT read-out operation did not lead to appreciable changes in DUT resistive state when the DUT is at its minimum operational conductance. We ran experiments to confirm that this is still the case when the DUT is at its maximum operational conductance. Results are shown in Supplementary Fig. 2, confirming the immunity of our devices to read-disturb at both extremes of their operating resistive state range. In addition, we quantified these results by fitting conductance evolution data from the neutral regions (pre-type stimulation only) of Supplementary Figs 5(a,c,e,g) and 2 to exponentials via least-squares optimisation and then computing the fitted change in conductance between the start and the end of the region. The results, summarised in Supplementary Table 1, indicate that the effect is small (less than 10% of the DUT resistive state range as defined in Supplementary Table 3).
Finally, basic endurance and retention data are shown in Supplementary Figs 11 and 12. The endurance run was conducted by repeatedly applying stimulus units (trains of 10 identical pulses lasting 100 µs at +1 V or −1 V amplitude) of alternating polarities, each followed by a resistive state assessment (1 assessment = average of 5 reads). Results indicate reliable and repeatable switching of our TiO2 devices for 500 cycles (that is, 1000 stimulus units) with a small but clear window between High Resistive State (HRS) and Low Resistive State (LRS), approximately 3% of the LRS level. The retention run was carried out by driving a test device to its operational resistive state ceiling, measuring the resistive state for 2.5 hours at 30-minute intervals, and then driving the device to its operating resistive state floor and taking another set of half-hourly resistive state measurements. We notice that the low resistive state is very stable (max.

Supplementary note 2
Functional form of plasticity. The estimated functional form for software plasticity is shown in Supplementary Fig. 3. This relies on the two exponential fits for $f_{\mathrm{LTP}}$ and $f_{\mathrm{LTD}}$ from Fig. 1(e).

Supplementary note 3
Experimental protocols. In the experiment testing the capability of memristors to encode conditional probabilities, four test runs were carried out. Two of them used test blocks visiting the LTP probability points in scrambled order, to confirm that the results obtained from the other two runs were not a consequence of visiting the LTP probability points in monotonically decreasing order. The precise sequence in which the LTP probability points were visited is shown in Supplementary Table 4.
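As a rough illustration of this protocol, the sketch below builds a monotonically decreasing and a scrambled visiting order and draws one block of random LTP/LTD events at a given LTP probability. The probability points, the seed and the function names are hypothetical (the actual sequences are those of Supplementary Table 4); the block length of 10^4 events follows the data-block size mentioned in Supplementary note 5.

```python
import random


def visiting_order(points, scrambled, rng):
    """Return the LTP probability points in monotonically decreasing order,
    or in a random permutation of that order if scrambled is True."""
    order = sorted(points, reverse=True)
    if scrambled:
        rng.shuffle(order)
    return order


def make_block(p_ltp, n_events, rng):
    """Draw n_events plasticity events, each an LTP event with probability
    p_ltp and an LTD event otherwise."""
    return ["LTP" if rng.random() < p_ltp else "LTD" for _ in range(n_events)]


rng = random.Random(1)                 # fixed seed so the sketch is repeatable
points = [0.9, 0.7, 0.5, 0.3, 0.1]     # hypothetical LTP probability points

monotonic = visiting_order(points, scrambled=False, rng=rng)
scrambled = visiting_order(points, scrambled=True, rng=rng)

block = make_block(0.3, 10_000, rng)   # one 10^4-event block at p(LTP) = 0.3
ltp_fraction = block.count("LTP") / len(block)
print(monotonic, scrambled, round(ltp_fraction, 3))
```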
In all WTA network experiments both the biasing parameters used to implement plasticity and the mappings between memristor conductance and synaptic weight were kept constant. The numbers used are summarised in Supplementary Table 3. The initial and final software and hardware weights for each WTA network run are summarised in Supplementary Table 5.
The effects of homoeostasis can be observed by examining the computed membrane potential response of the hardware-synapse neuron for the two prototype patterns and noting how significant the effect of the homoeostatic term is. This is shown in Supplementary Fig. 14 for the ANN run corresponding to Fig. 3, where the homoeostatic term fluctuated between +0.419 and −0.225 abstract weight units. However, the homoeostatic term can take much larger values, reaching a maximum magnitude of 1.333 abstract weight units (at −1.333) during the learning reversibility check ANN run, which indicates a potentially powerful effect on overall membrane potential.

Supplementary note 4
Fitting converged conductance versus LTP/LTD composition. The linear fitting used for Fig. 2(a) followed the formula:
$$S(p) = a\,p + b,$$
where $S(p)$ is the final, converged conductance as a function of LTP/LTD composition $p$, $a = 3.87 \cdot 10^{-7}$ and $b = 3.73 \cdot 10^{-6}$.
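A linear fit of this kind can be sketched with ordinary least squares. The snippet assumes the linear form S(p) = a·p + b with a as the slope and b as the intercept; the composition points are hypothetical and the conductance values are synthetic, generated from the quoted a and b rather than taken from the measurements.

```python
def linear_fit(p, s):
    """Ordinary least-squares fit of s ~ a*p + b. Returns (a, b)."""
    n = len(p)
    mean_p = sum(p) / n
    mean_s = sum(s) / n
    spp = sum((x - mean_p) ** 2 for x in p)
    sps = sum((x - mean_p) * (y - mean_s) for x, y in zip(p, s))
    a = sps / spp
    b = mean_s - a * mean_p
    return a, b


# Constants quoted in the text; their roles (slope, intercept) are assumed.
A, B = 3.87e-7, 3.73e-6

# Hypothetical composition points and synthetic converged conductances.
p_points = [0.0, 0.25, 0.5, 0.75, 1.0]
s_points = [A * p + B for p in p_points]

a_fit, b_fit = linear_fit(p_points, s_points)
print(a_fit, b_fit)
```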

Supplementary note 5
Quantifying quality of convergence. The quality of convergence achieved during the experimental runs shown in Fig. 2(a) is very hard to assess reliably given the difficulty in extrapolating how memristors might behave after the end of each $10^4$-point data block. However, as a simple check, the memristor resistive state evolution during each data block (conductance $g(k)$) was fitted to an exponential as per eq. 15. The constant offset term $c$ then denotes the expected resistive state saturation level for each data block.
Extrapolated convergence values are plotted in Supplementary Fig. 4 along with two data block runs and their corresponding exponential fits. We note that the exponential fits in both cases tend to qualitatively underestimate the degree to which the resistive state continues to drop/increase towards the end of each data block. Further study is required in order to understand precisely why this occurs and to determine a more suitable fitting model. Moreover, we notice that on most occasions (38/40), despite the possible unsuitability of the exponential model as a fitting function, the extrapolated resistive state convergence points are within 400 nS of their counterparts as extracted from the experimental data. In the remaining two cases the conductance versus input event number plots do not exhibit a sufficiently strong saturating trajectory, which causes the extrapolation to fail. We therefore conclude that: i) incomplete convergence cannot be ruled out as a reason behind the qualitatively worse convergence observed for run no. 2, ii) preliminary checks attempting to fit data to exponentials do not lend support to this hypothesis but do not disprove it either, and iii) exponential fits may be poor predictors of future memristor behaviour.
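The extrapolation check can be illustrated with a small least-squares sketch. It assumes the exponential has the form g(k) = a·exp(b·k) + c with b < 0 (eq. 15 itself is not reproduced in this section), and it uses a simple scan over candidate rates b rather than whatever optimiser was actually used. The data block is synthetic and every numerical value is hypothetical.

```python
import math
import random


def fit_offset_exponential(k, g, rates):
    """Least-squares fit of g(k) ~ a*exp(b*k) + c.

    For each candidate rate b, a and c follow from ordinary linear
    regression of g on x = exp(b*k); the candidate with the smallest
    squared error is kept. Returns (a, b, c).
    """
    n = len(k)
    mean_g = sum(g) / n
    best = None
    for b in rates:
        x = [math.exp(b * ki) for ki in k]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue  # degenerate candidate: x is constant
        sxg = sum((xi - mean_x) * (gi - mean_g) for xi, gi in zip(x, g))
        a = sxg / sxx
        c = mean_g - a * mean_x
        sse = sum((a * xi + c - gi) ** 2 for xi, gi in zip(x, g))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1], best[2], best[3]


# Synthetic stand-in for one data block (conductance in siemens).
rng = random.Random(0)
true_a, true_b, true_c = -1.0e-6, -1.0 / 400.0, 5.0e-6
k = list(range(2000))
g = [true_a * math.exp(true_b * ki) + true_c + rng.gauss(0.0, 1.0e-8) for ki in k]

# Candidate rates: time constants from 10 to 10,000 events, log-spaced.
rates = [-1.0 / (10.0 * 1000.0 ** (i / 99)) for i in range(100)]
a_fit, b_fit, c_fit = fit_offset_exponential(k, g, rates)

# Compare the extrapolated saturation level c with the last measured point.
gap = abs(c_fit - g[-1])
print(f"c = {c_fit:.3e} S, gap to final point = {gap:.1e} S")
```

A block whose trajectory does not saturate clearly gives a poorly constrained c in such a fit, which is consistent with the two failed extrapolations mentioned above.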

Supplementary note 6
Repeatability of learning. In order to demonstrate that the memristive synapses can repeatably perform learning as shown in Fig. 3, the experiment was performed three consecutive times. In each experimental run all devices were initialised through the memristor control instrument to values corresponding to an abstract weight of 0 (within the limits of the measurement noise). The software synapses were likewise initialised to weights of 0 (on top of which measurement noise was added during operation). The three experimental runs are shown in Supplementary Fig. 6, where the last run is the one shown in Fig. 3. In all cases the data clearly show that both neurons start from a situation where neither displays specialisation on either pattern and their membrane potentials show no inherent preference for either pattern. At the end of each run, the prototype patterns have been successfully segregated.

Supplementary note 7
Quantifying the weight evolution noise during WTA network runs. In order to quantify the noise present in the evolution of the memristive synapse weights throughout the WTA network trial shown in Fig. 3, we first fitted the weight data to first-order exponentials of the form:
$$w(k) = a\,e^{b k} + c,$$
where $w(k)$ is the memristor synapse weight at input event $k$ and $a$, $b$, $c$ are fitting parameters. Results are shown in Supplementary Fig. 7. The residuals were then extracted and their standard deviations computed. These results are summarised in Supplementary Table 6, along with the overall weight change throughout the WTA run, $\Delta w_{\mathrm{total}}$, as estimated by the fittings.
It is important to note that the standard deviations of the residuals will include contributions from at least three main components: first, the stochastic nature of the input signal; second, in the case of the hardware synapses, random measurement error; third, extra error introduced by the mismatch between the choice of fitting function and the underlying synaptic weight evolution dynamics. The random measurement error can be quantified by examining the standard deviation in the resistive state of the hardware synapses as computed from the neutral region seen in the left half of Supplementary Fig. 5 (using the residual versus an exponential fit to mitigate spontaneous drift effects). If we then combine the standard deviation in the resistive state with the mapping between resistive state and weight from Supplementary Table 3, we can compute the contribution of the measurement error to overall noise levels in units of abstract weight. These values are shown for the hardware synapses in Supplementary Table 6. Notably, software and hardware synapses show similar levels of overall noise. Note: throughout this analysis we have assumed that the distribution of residuals is normal. Whilst this may not necessarily be true, the overall values of standard deviation are still indicative of noise levels in the system.
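The two computations described above, the standard deviation of fit residuals and the conversion of conductance noise into abstract weight units, can be sketched as follows. The sketch assumes a linear mapping between conductance and weight; the actual mapping is the one defined in Supplementary Table 3, and every number below is a hypothetical placeholder.

```python
import statistics


def residual_std(data, fitted):
    """Sample standard deviation of the residuals data - fitted."""
    return statistics.stdev(d - f for d, f in zip(data, fitted))


def conductance_noise_to_weight(sigma_g, g_min, g_max, w_min, w_max):
    """Convert a conductance standard deviation into abstract weight units,
    assuming a linear map taking [g_min, g_max] onto [w_min, w_max]."""
    slope = (w_max - w_min) / (g_max - g_min)
    return abs(slope) * sigma_g


# Hypothetical example: 20 nS of read noise on a device mapped linearly
# from the 4 uS to 6 uS conductance range onto weights in [-1, 1].
sigma_w = conductance_noise_to_weight(20e-9, 4e-6, 6e-6, -1.0, 1.0)
print(sigma_w)
```

Because the mapping is assumed linear, an additive offset in conductance does not affect the result; only the slope scales the noise.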

Supplementary note 8
Comparison case: what if software synapses are immune to noise? For the purposes of comparison we also carried out a WTA learning experiment in which the software synapses were implemented without added noise. Results are shown in Supplementary Figs 8 and 9. The difference is very clear, especially in the progress of learning of the neuron using software synapses compared with the neuron using hardware synapses (Supplementary Fig. 8(b)), but also in the evolution of synaptic weight.

Supplementary note 9
Learning reversibility timescale check. The learning experiments shown in Fig. 4 did not fully elucidate whether the system is truly capable of developing a new, stable weight configuration during reversal learning, since the memristor synapses still exhibited notable changes by the end of the 1200-trial WTA run. For that reason, immediately after the conclusion of the experimental runs from Fig. 4, an additional reversibility run lasting 3000 trials was carried out. Results are shown in Supplementary Fig. 10, where we notice that after 1200 trials the system has not yet fully settled at a stable weight configuration. After 3000 trials, however, the reversal is very clear, as indicated by the computed membrane potentials of the hardware neuron. Thus the system is truly capable of not just learning but, if necessary, also complete relearning.

resistance values far below the initial, pristine level. The stochastic nature of the conductive filaments (CFs) explains the variability in the threshold voltages, that is, the voltage levels at which the devices begin to switch towards lower (higher) resistive states. Notably, the precise magnitude of the voltage stimulus pulses affects the values of HRS and LRS between which the device can toggle: higher applied voltages enhance the HRS/LRS contrast, but at the expense of endurance and switching graduality (at higher voltages, most of the resistive state change tends to occur upon the first pulse [7]). When long trains of constant-voltage pulses are applied, the vacancies/ions susceptible to drift under the accumulated energy gradually migrate, resulting in a progressive shift in resistance until a plateau (convergence) is reached, where no more vacancies/ions can drift unless the pulse amplitude and/or width are increased.
It is important to specify that, especially when operating at near-threshold levels, many pulses are needed to migrate all the vacancies/ions sensitive to the applied voltage. This is the basic explanation of the results depicted in Fig. 2. The more LTP (LTD) events are applied to the device, the higher (lower) the conductance becomes. At a probability of 0.05% of LTP events for example, the number of positive pulses overcomes the negative ones, resulting in more vacancies drifting and thus building the CF. The nature of the experiment in runs 2 and 4, which consists of applying LTP and LTD events to the device and slowly and regularly increasing (decreasing) the number of LTP (LTD) events at each event block, causes the final (and ideally converged) conductance to increase smoothly. However, larger variability in converged conductances was observed for runs 1 and 3, where the probabilities of LTP (LTD) events were applied in random order. These more abrupt changes in pulsing regime render the overall vacancy/ion drift more aggressive throughout each run and are thus a possible cause of the increased end-result variability.
It is worth mentioning that the filamentary nature of the switching of our devices makes the ON state very stable, possibly because at that state the filament bridge is completely formed; determining whether this is indeed the case requires further study. In the OFF state, however, the CF is disrupted, and at the end of each pulse train the OFF resistive state drifts slightly, particularly immediately after stimulation is interrupted. This is observed in Supplementary Fig. 12, where the test device drifts from 8.4 kΩ to 8.9 kΩ within the first 30 minutes after stimulus interruption. The drift then continued with smaller changes, from 8.9 kΩ to 9 kΩ over the following 2 hours. We attribute this to the active component of resistive switching devices, known as the nanobattery effect [8]. Indeed, Valov et al. have studied this phenomenon and demonstrated that an inherent electromotive force (emf) exists within the device that causes the resistance value to change even when no external voltage is applied. This emf or diffusion is generated by the inhomogeneous charge distribution and charge motion resulting from the electroforming or set/reset processes. It arises in the HRS, where vacancy/ion drift occurs slowly; when a CF is completely formed, in the LRS, the phenomenon does not occur. The nanobattery effect is partially masked in this proof-of-principle study, as learning occurs under a constant barrage of input data, which allows the vacancies/ions to drift and repeatedly reach relatively stable conductance values.
However, the influence of this phenomenon should be studied carefully in any further exploitation of this work. Interesting open questions for further research are whether the presence of this emf materially affects the balance between potentiation and depression during network operation, and precisely to what extent drift in resistive state after stimulation interruption is tolerable (even though results from Supplementary Figs 5 and 2 suggest the overall effect is relatively small).