Habituation based synaptic plasticity and organismic learning in a quantum perovskite

A central characteristic of living beings is the ability to learn from and respond to their environment leading to habit formation and decision making. This behavior, known as habituation, is universal among all forms of life with a central nervous system, and is also observed in single-cell organisms that do not possess a brain. Here, we report the discovery of habituation-based plasticity utilizing a perovskite quantum system by dynamical modulation of electron localization. Microscopic mechanisms and pathways that enable this organismic collective charge-lattice interaction are elucidated by first-principles theory, synchrotron investigations, ab initio molecular dynamics simulations, and in situ environmental breathing studies. We implement a learning algorithm inspired by the conductance relaxation behavior of perovskites that naturally incorporates habituation, and demonstrate learning to forget: a key feature of animal and human brains. Incorporating this elementary skill in learning boosts the capability of neural computing in a sequential, dynamic environment.


Supplementary Figure 1 | Habituation response in the perovskite and learning mechanism in neural/non-neural organisms.
Habituation is a ubiquitous behavior present across the phyla of living beings that helps organisms learn and adapt to different aspects of the environment. It has been demonstrated to cause short-term and long-term potentiation of synaptic connections (or synaptic plasticity), which is key to memory formation in neural organisms. In non-neural organisms such as slime molds, habituation is seen as a change in shape in response to different environments. The perovskite's nonlinear response to the environment (H2 and Air in this study) with varying conductance mimics simple adaptation behavior and motivates the Adaptive Synaptic Plasticity (ASP) learning (see Supplementary Fig. 10). In ASP, we incorporate habituation by weight leaking coupled with traditional spike timing correlation to demonstrate learning to forget for robust and stable learning of artificial neural systems. In organisms, if the stimulus is withheld for an extended period of time, the original state can be recovered. Subsequent exposures to the environment will again result in habituation. Strikingly identical to this behavior, our nickelate devices can be made to forget previous exposures to hydrogen by resting in air. a, A set of experiments on a device that was recovered by an air anneal and then tested for 15 cycles. The continuously diminishing response demonstrates habituation behavior. b, After seven cycles of H2/Air treatments, the SNO was left in air for 12 h. The device started to recover and approached its original state. When the same H2/Air exposure was applied again, the habituation behavior could be reproduced. This forgetting and habit-forming process was repeated by resting the SNO device in air for another 12 h and re-exposing it to H2/Air. c, Non-habituation after fully recovering the SNO.
It is worth noting that if the H2 treatment was followed by an extended exposure of the nickelate device to air for 48 h, no habituation phenomenon would be present, again similar to what is observed in experiments conducted on organisms. The dotted conducting line in (a) and (b) indicates the trend of diminished response. These experiments were conducted in a manner identical to numerous experiments conducted on organisms in the biology literature, all of which are cited in the manuscript and Supplemental Information files.

Figure 4 | A set of representative in situ synchrotron X-ray diffraction patterns monitoring structural evolution upon breathing in H2 and air sequentially. a, X-ray diffraction patterns in H2 environment. The SmNiO3 (SNO) thin film and the LaAlO3 (LAO) substrate are indexed in the pseudocubic lattice system. The SNO (002) peak (labeled 1) appears close to the LAO (002) peak. The peak at qz = 2.98 Å⁻¹ (peak 2) is related to HSmNiO3 (H-SNO) [1]. It is clearly seen that when treated with H2, peak 1 drops with exposure time (t1→t2→t3), while peak 2 increases with longer exposure, indicating the emergence of the H-SNO phase. The proton concentration can be estimated to be of the order of 0.03-0.05 dopants per unit cell by comparing the intensity to that of a fully doped sample. b, X-ray diffraction patterns in air environment. The gas environment was switched to air and the structural evolution was monitored. It is evident that no new peaks are present. An opposite trend is observed, i.e. the SNO phase is restored and the H-SNO phase diminishes when breathing in air. The areas of peak 2 for all measured data during H2/air breathing are calculated via integration of the region shown in the dotted-line box and plotted in Fig. 2c.
To obtain the RMXS intensity, we integrated the CCD signal along vertical slices (with 50 pixels lateral width) through the peak center in order to average out the intensity fluctuations (speckle pattern) arising from domain interference (b). When the beam is on the Pt bar, no signal can be measured, due to the opacity of the heavy metal layer to soft X-rays. The fluorescence background increases in the H-SNO region, where, however, no magnetic scattering can be detected. The magnetic reflection only appears in the undoped SNO region. All linecuts are fitted using a Gaussian lineshape with a uniform background. The position-dependent magnetic peak intensity can then be extracted as shown in Supplementary [...]

[...] consists of an input layer followed by excitatory and inhibitory layers. The input layer contains 28x28 pixel image data (with one neuron per image pixel) from the MNIST dataset [3]. Each input pattern or image is converted to a Poisson spike train based on the pixel intensities of the images in the dataset. The input layer is fully connected to the excitatory neurons, which are connected to the corresponding inhibitory neurons in a one-to-one manner. Each inhibitory neuron inhibits all excitatory-layer neurons except the one from which it receives its forward connection. This connectivity structure provides lateral inhibition that limits the simultaneous firing of multiple excitatory neurons in an unsupervised learning environment and promotes competitive learning, causing the neurons to learn different input patterns from each other. Besides lateral inhibition, we employ an adaptive membrane threshold mechanism called homeostasis [4] that regulates the firing threshold to prevent a neuron from becoming hyperactive. It equalizes the firing rate of all neurons, preventing single neurons from dominating the response.
During learning, the excitatory synaptic weights from the input layer to each excitatory neuron are modulated by the learning rule to learn a particular input digit. Towards the end of the learning phase, the weights (or excitatory connections), which are randomly initialized, eventually learn to encode a generic representation of the digit patterns. Specifically, the weights fanning out of the higher-intensity (or white) pixel regions are potentiated, while the weights from the low-intensity regions of the input image are depressed during the learning phase. Correspondingly, the color-map figures shown in Fig. 1d, Fig. 3 and Supplementary Fig. 10b represent the weight values learnt by each excitatory neuron when learning stops.

Supplementary Figure 10 | Adaptive Synaptic Plasticity (ASP) learning for weight modulation in a Spiking Neural Network (SNN). a, When spiking activity is observed at the post/pre neuronal synaptic terminal, the recovery phase begins. This phase involves an exponential increase (potentiation) or decrease (depression) based on the temporal difference between the spiking activities of the pre- and post-neurons. The decay or forgetting phase of a synaptic weight ensues when no spiking activity (or input stimulus) is observed at either the post- or the pre-neuron. The weights are dynamic during the training phase when input patterns are presented. The leak dynamics determine which post-neuronal connections (those that have learnt old or insignificant data) should be forgotten in order to learn the new data. The recovery phase potentiation/depression, in contrast, is geared towards making synaptic weight updates that learn a generic representation rather than specific training patterns. For instance, an excitatory layer post-neuron learning the digit 2 should spike for different instances of 2, so that its weights learn a more generic representation rather than just mimicking a specific instance. Thus, synaptic depression (based on spike timing correlation) and leak (based on habituation) have different roles in ASP learning. ASP incorporates the significance of the inputs to modulate the weights (see Supplementary Note 3 for details on implementation). That is, the weight updates during the recovery phase are more prominent for a frequently spiking input neuron. Also, the leak rate during the decay phase is varied by taking into account the postsynaptic (or excitatory) neuron's spiking activity and membrane threshold Vthresh + θ, such that recent input patterns do not overwrite old but significant data. The leak rate decreases as the weights become more and more prominent (in either the potentiation or the depression window), which is, in essence, habituation.
This behavior helps to retain significant information (corresponding to weights with higher negative/positive values) while forgetting (or leaking) the weights corresponding to insignificant information. b, To show the effectiveness of the proposed learning model for larger problems, we trained an SNN of 200 excitatory neurons with Spike Timing Dependent Plasticity (STDP) and with ASP in a dynamic environment in which all digits 0 through 9 are presented sequentially. To ensure that the earlier digits are not completely forgotten, the number of training instances of each digit category was arranged in decreasing order, i.e. digit 0 had more training instances than digit 1 and so on. The network will therefore try to retain more significant data while learning recent patterns. It is clearly seen that the SNN trained with our proposed ASP encodes a better representation of the input patterns than the standard STDP-trained network. In fact, the network is able to represent all digits. Without habituation, in the STDP-trained SNN, most of the representations are illegible due to substantial overlap. As noted in Supplementary Note 4, ASP learning can also be naturally integrated with filamentary switch or spin-based devices. The color intensity of the patterns is representative of the value of the synaptic weights, with the lowest intensity (white) corresponding to a weight value of -0.5 and the highest intensity (black) corresponding to 0.5.

Supplementary Note 1: Related work for habituation based learning
There have been prior efforts on integrating habituation for implementing autonomous mobile robots [5][6][7]. Essentially, this non-associative learning rule, which represents a decreased response after exposure to repeated stimuli, has been demonstrated to be crucial for the attention phenomenon. Consequently, in Ref. 5, the authors demonstrated that inserting such a rule at the synaptic level increased the adaptation capability of a robot controlled by a Spiking Neural Network (SNN) by enabling the robot to ignore broader contextual irrelevant information. In Ref. 6, the authors investigated operant conditioning in a neurorobotics context in multiple learning scenarios. Specifically, they used habituation-enhanced synaptic plasticity to model simple operant conditioning learning in mobile robots that can perform complex behaviors with simpler neural components. On the other hand, from a neuroscience perspective, habituation mechanisms (not related to synaptic plasticity) have been used to model visual cortical dynamics to better understand how the brain perceives slanted and curved surfaces and 2D images as 3D objects [8,9]. The model shows how chemical transmitters that habituate in an activity-dependent manner trigger attention that eventually affects 3D/2D perception. Our work, in contrast to the above studies, integrates habituation with synaptic plasticity to emulate the forgetting capability of the brain in order to build a stable-plastic, self-adaptive SNN for dynamic environments without catastrophic forgetting. Adaptive Synaptic Plasticity (ASP) learning facilitates the gradual degradation or forgetting of already learnt weights to accommodate new and recent information while preserving some memory of old significant data. While the current work focuses on a visual recognition application, the ASP learning rule can serve as a general unsupervised learning model across neuromorphic applications that addresses the catastrophic forgetting issue.

Supplementary Note 2: First-Principles calculation of SNO band structure
The undoped (pristine) calculations are carried out on monoclinic SNO with a Jahn-Teller distortion, obtained by relaxing a Pbnm structure and freezing in a monoclinic distortion with β ≠ 90°. This structure (space group P2_1/n) is a √2 × √2 × 2 supercell with two inequivalent Ni sites. Considering 10 atom/cell magnetic orderings for the monoclinic structure, we found the energies of all these relaxed structures to be within 0.1 eV/Ni of each other, with band gaps all smaller than 1 eV (see Supplementary Table 1 for further details). Consequently, the specific choice of the underlying magnetic structure in the spin-polarized DFT+U is not expected to appreciably influence the evolution of the band structure with doping. Previous computational work on rare earth nickelates, employing both dynamical mean field theory (DMFT) and DFT+U, has shown that the insulating phase exhibits a disproportionated structure, which is often exaggerated by DFT+U [10]; furthermore, room temperature experiments show only a small [11] or no [12] disproportionation. For each added electron, its localization can be observed through the magnetic moment of the Ni, the oxygen octahedral size or the PDOS of the Ni. A localized electron on a Ni site results in a high-spin Ni2+, where Hund's rules are favored over a Jahn-Teller distortion. This can be clearly seen for 1/4 and 1/2 e⁻/Ni in Supplementary Table 2. The octahedral distortions observed for 3/4 and 1 e⁻/Ni are also constrained by the volume of the cell, which is not allowed to change in the ionic relaxation in these calculations.

Magnetic moment of each of the four Ni and properties of the encompassing oxygen octahedron in the doped SNO structures with G-type magnetic ordering. We elected to keep the overall volume of the calculations fixed, as in reality the electron doping will not occur with the same regularity present in the calculation. While the overall octahedral tilt pattern is not affected by electron doping, the tilt angles become more acute. When the lattice parameters are allowed to relax, the overall volume increases; however, the octahedral volume increases such that the tilt angles become even more acute.

Supplementary Note 3: Adaptive Synaptic Plasticity Learning
In the Spiking Neural Network (SNN) simulations for digit recognition, we use the Leaky-Integrate-and-Fire (LIF) model [13,14] to simulate the membrane potential V of a neuron as

$$\tau \frac{dV}{dt} = (E_{rest} - V) + g_e (E_{exc} - V) + g_i (E_{inh} - V)$$

where Erest is the resting membrane potential (-65 mV), Eexc (0 mV) and Einh (-100 mV) are the equilibrium potentials of excitatory and inhibitory synapses, τ is the time constant (100 ms) and ge and gi are the conductances of excitatory and inhibitory synapses respectively. The LIF model causes V to increase when pre-synaptic spikes are received and to otherwise decay exponentially. The post-neuron fires when V crosses the membrane threshold Vthresh (-52 mV), and its membrane potential is then reset to Vrst (-65 mV). After each firing event, a refractory period (5 ms) ensues during which the post-neuron is inhibited from firing even if additional input spikes arrive.
Synapses are modeled by conductance changes [13,14], wherein the conductance increases by the synaptic weight, w, only upon the arrival of a pre-neuronal spike. Otherwise, the conductance continues to decay exponentially. The dynamics of both inhibitory and excitatory conductances are simulated as

$$\tau_e \frac{dg_e}{dt} = -g_e, \qquad \tau_i \frac{dg_i}{dt} = -g_i$$

where τe (1 ms) and τi (2 ms) are the time constants for the excitatory and inhibitory post-synaptic potentials.
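The neuron and synapse dynamics described above can be illustrated with a minimal forward-Euler sketch of a single LIF neuron driven by excitatory spikes. The numerical parameters are those quoted in the text; the function name `simulate_lif`, the step size `dt`, the per-spike `weight` value, and the choice to hold the membrane at the reset potential during the refractory period are assumptions made for illustration, not the simulation setup actually used.

```python
import numpy as np

# Parameters quoted in the text (potentials in mV, times in ms).
E_REST, E_EXC, E_INH = -65.0, 0.0, -100.0
V_THRESH, V_RESET = -52.0, -65.0
TAU_V, TAU_E, TAU_I = 100.0, 1.0, 2.0
T_REFRAC = 5.0


def simulate_lif(exc_spikes, weight, dt=0.1):
    """Forward-Euler integration of one LIF neuron (illustrative sketch).

    exc_spikes: boolean array, True at steps where a pre-synaptic spike arrives.
    weight:     conductance increment per pre-synaptic spike (hypothetical value).
    Returns the membrane potential trace and the post-synaptic spike times (ms).
    """
    v, g_e, g_i = E_REST, 0.0, 0.0
    refrac_left = 0.0
    trace, spike_times = [], []
    for step, pre in enumerate(exc_spikes):
        if pre:
            g_e += weight                 # conductance jumps by w on a pre-spike
        g_e -= dt * g_e / TAU_E           # otherwise it decays exponentially
        g_i -= dt * g_i / TAU_I           # no inhibitory input in this sketch
        if refrac_left > 0.0:
            refrac_left -= dt
            v = V_RESET                   # assumption: hold at reset while refractory
        else:
            dv = (E_REST - v) + g_e * (E_EXC - v) + g_i * (E_INH - v)
            v += dt * dv / TAU_V
            if v > V_THRESH:
                spike_times.append(step * dt)
                v = V_RESET
                refrac_left = T_REFRAC
        trace.append(v)
    return np.array(trace), spike_times
```

Driving the neuron with a dense input, for example `simulate_lif(np.ones(2000, dtype=bool), weight=1.0)`, produces repeated firing separated by at least the refractory period, while an all-False input leaves the membrane at the resting potential.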
As discussed in Supplementary Fig. 9, homeostasis is used to prevent a single neuron from dominating the spiking pattern. Specifically, each excitatory neuron's membrane threshold is determined not by Vthresh alone but by Vthresh + θ, where θ is increased each time the neuron fires and then decays exponentially at an extremely slow rate. We use θ = 0.1 and a very high decay time constant of 10^8 ms in our simulations.
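A minimal sketch of this adaptive threshold is given below. It assumes that the quoted value θ = 0.1 is the increment added on every spike and that it is expressed in mV; the text does not state either explicitly. The class name `AdaptiveThreshold` and its methods are hypothetical.

```python
import math

V_THRESH = -52.0     # mV, from the text
THETA_PLUS = 0.1     # assumed to be the per-spike increment of theta (mV)
TAU_THETA = 1e8      # ms, decay time constant from the text


class AdaptiveThreshold:
    """Homeostatic threshold Vthresh + theta for one excitatory neuron."""

    def __init__(self):
        self.theta = 0.0

    def decay(self, dt):
        # Very slow exponential decay of theta between spikes.
        self.theta *= math.exp(-dt / TAU_THETA)

    def on_spike(self):
        # Each firing event raises the threshold, making the next spike harder.
        self.theta += THETA_PLUS

    @property
    def value(self):
        return V_THRESH + self.theta
```

Because the decay time constant is so long, θ is effectively a running count of how often the neuron has fired, which is why the text later treats the threshold as a measure of how significant a learnt pattern is.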
Each input image is presented for 350 ms. There is a resting period of 150 ms before presenting a new input to allow all neuronal parameters to decay to their reset values (except for the adaptive membrane threshold, Vthresh + θ). We use the same parameters for the neuron and synapse models, input encoding and input image presentation time as Diehl & Cook [14] for a fair comparison of our ASP learning with standard STDP learning (Fig. 3b, Supplementary Fig. 10). The standard STDP learning model is implemented using the power-law weight-dependent rule [14,15].
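The input presentation schedule (Poisson-encoded image for 350 ms, then 150 ms of rest) can be sketched as follows. The 350 ms and 150 ms durations come from the text; the maximum firing rate `max_rate_hz`, the 1 ms time step, the 8-bit pixel scaling, and the function names are assumptions chosen only to make the example concrete.

```python
import numpy as np


def poisson_spike_train(image, duration_ms=350.0, dt=1.0,
                        max_rate_hz=60.0, rng=None):
    """Convert pixel intensities to Poisson spike trains (illustrative sketch).

    image:       array of 8-bit pixel intensities (0..255), any shape.
    max_rate_hz: firing rate of a fully white pixel (hypothetical value).
    Returns a boolean array of shape (n_steps, n_pixels).
    """
    rng = np.random.default_rng() if rng is None else rng
    pixels = np.asarray(image, dtype=float).ravel() / 255.0
    # Probability that a pixel's input neuron spikes within one time step.
    p_spike = pixels * max_rate_hz * dt / 1000.0
    n_steps = int(round(duration_ms / dt))
    return rng.random((n_steps, pixels.size)) < p_spike


def present_image(image, rest_ms=150.0, dt=1.0, **kwargs):
    """One presentation: active spike train followed by a silent rest period."""
    active = poisson_spike_train(image, dt=dt, **kwargs)
    rest = np.zeros((int(round(rest_ms / dt)), active.shape[1]), dtype=bool)
    return np.vstack([active, rest])
```

For a 28x28 image this yields a (500, 784) array with the default 1 ms step: brighter pixels spike more often during the first 350 rows, and the last 150 rows are silent so that the neuronal variables can relax before the next input.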

ASP: Learning rules
We examine the mathematical formulations for ASP to understand how the temporal dynamics dictate the plasticity that eventually enables the SNN to learn to forget as well as adapt to new patterns.

Recovery Phase
To improve simulation speed, the weight dynamics are computed using synaptic traces [16]. In ASP learning, the synapses keep track of three different kinds of traces corresponding to the pre- and post-synaptic neurons' spiking activity: a) a recent presynaptic trace (Pre_rec) that does not accumulate over time (it only accounts for the most recent spike), b) an accumulative presynaptic trace (Pre_acc) that adds over time (it accounts for the entire spike history of the presynaptic neuron for a given time period or epoch during which a particular pattern is presented to the SNN), and c) a postsynaptic trace (Post) that accumulates over time based on the postsynaptic neuron's spiking activity. Each trace is increased when spiking activity is observed and otherwise decays exponentially (Eqn. 3). The time constant for decay of the accumulative pre-trace (τ_acc) has to be larger than that of the recent pre-trace (τ_rec) so that the spike history can be appropriately added. In our simulations, τ_acc = 10 τ_rec and τ_post = 2 τ_acc. We adopt a modified version of the power-law weight-dependent STDP model [13,14] to obtain the weight changes during the recovery phase (i.e. in the presence of input stimulus) of ASP. When a postsynaptic spike arrives at the synapse, the weight change Δw is calculated from the presynaptic traces (Pre_rec, Pre_acc) according to Eqn. 4, where η(t) is a time-dependent learning rate that is inversely proportional to the post-synaptic trace value (Post(t) from Eqn. 3) at a given time instant. As the post-synaptic neuron in the excitatory layer starts spiking for a given input, the learning rate will decrease. This ensures that a particular neuron retains and stably learns a particular input pattern. It also prevents the neuron from quickly adapting to a new pattern (or catastrophic forgetting).
The offset ensures that the presynaptic neurons that rarely lead to firing of the postsynaptic neuron will become more and more disconnected (or the synaptic weight values will depress).
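The three synaptic traces used in the recovery phase can be sketched as below. Only the ratios between the time constants (accumulative = 10 × recent, post = 2 × accumulative) come from the text; the absolute value of the recent-trace time constant, the reset value of 1 for the recent trace, and the unit increments of the accumulative and postsynaptic traces are assumptions made for illustration. The class name `SynapticTraces` is hypothetical.

```python
import math
from dataclasses import dataclass

TAU_REC = 20.0             # ms, hypothetical: the text gives only the ratios
TAU_ACC = 10.0 * TAU_REC   # accumulative pre-trace decays 10x more slowly
TAU_POST = 2.0 * TAU_ACC   # post-trace decays 2x more slowly still


@dataclass
class SynapticTraces:
    pre_rec: float = 0.0   # recent presynaptic trace (most recent spike only)
    pre_acc: float = 0.0   # accumulative presynaptic trace (spike history)
    post: float = 0.0      # accumulative postsynaptic trace

    def step(self, dt, pre_spike=False, post_spike=False):
        # All three traces decay exponentially in the absence of spikes.
        self.pre_rec *= math.exp(-dt / TAU_REC)
        self.pre_acc *= math.exp(-dt / TAU_ACC)
        self.post *= math.exp(-dt / TAU_POST)
        if pre_spike:
            self.pre_rec = 1.0    # assumed reset value: does not accumulate
            self.pre_acc += 1.0   # assumed increment: adds up over the epoch
        if post_spike:
            self.post += 1.0      # assumed increment
```

With this bookkeeping, a frequently spiking input neuron builds up a large accumulative trace while its recent trace never exceeds the reset value, which is the distinction the significance term of the weight update relies on.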
In the case of digit inputs, the black (or off) pixel region for a particular digit will become disconnected, lowering the synaptic weight values corresponding to the pre-neurons in the lower pixel intensity region. In Eqn. 4, the first part represents the weight change (potentiation or depression) based on the most recent pre-synaptic spike (as with STDP). However, as seen earlier, erasure of memory traces is prominent with STDP, since in its simplest form any pre/post spike pair will modify the synapse. Besides the precise spike timings that identify the correlation between input patterns, the learning rule should incorporate the significance of the inputs to modulate the weights. As the inputs are continually changed, an SNN (with fixed resources or size) should gradually forget obsolete data while retaining important information. Thus, input-based, significance-driven learning would enable the SNN to learn in a stable-plastic manner in a dynamic environment.
The second part of Eqn. 4 quantifies the dependence of the weight change on the significance of the input pre-neuron. We define an input neuron to be significant if it spikes more frequently. In that case, the Pre_acc value will be high, which makes the second term in Eqn. 4 less dominant in determining the final weight update. Thus, for more frequent input spikes at the pre-neuron, the weight update will be more prominent. Hence, the learning rule combines the significance of the inputs with standard synaptic plasticity. It can be deduced that the prominent weights will essentially encode the features that are common to different input classes, as the pre-neurons across those common feature regions in the input image will have higher firing activity. This eventually helps the SNN to learn more common features across different input patterns and obtain a more generic representation of the data.

Decay Phase
The decay phase in ASP learning is activated in the absence of input stimulus, i.e. when no spiking activity is observed at the synaptic terminals connecting the pre- and post-neuron. It involves the forgetting of the weights for insignificant information to enable the SNN to learn new data without catastrophic forgetting or overlap of representations. As discussed earlier, the weights undergo an exponential decay towards a baseline value, governed by a decay constant α and a decay time constant τ_leak. τ_leak is a time-dependent quantity that is proportional to the post-synaptic trace value (Post(t) from Eqn. 3) and the membrane threshold value (Vthresh + θ, obtained from the homeostasis behavior) at a given time instant. It is desirable that the weights that have learnt a pattern should leak less in order to retain the learnt information. A neuron that has learnt a particular pattern will have a higher spiking activity (or higher post-trace value Post(t)), which will increase the time constant of decay, τ_leak. A higher τ_leak causes the weight to forget or leak less. Post(t) will be higher for an excitatory neuron that has learnt an input pattern that was presented most recently to the network. The overall leak rate can be defined as α/τ_leak, which decreases with increasing τ_leak.
While Post(t) is indicative of how recent the input pattern is, it does not account for the significance of the input pattern. We define significance in terms of the number of times a particular pattern has been presented to an SNN. The membrane threshold (obtained from homeostasis) of a post-neuron is representative of the significance of the input pattern. A neuron that has learnt a given pattern will spike more when that pattern is presented several times to the network. An excitatory layer neuron's membrane threshold will be high only when it is firing more. A higher membrane threshold implies that the corresponding excitatory layer neuron has learnt a significant pattern. Hence, using ASP, the SNN learns to forget insignificant older information while trying to retain more recent data as well as old but significant data.
A key aspect to note here is that the weight leak in the decay phase is based only on the post-neuron's spiking activity (and membrane threshold). All weights connected to a post-neuron in the excitatory layer will have the same decay time constant, τ_leak, and hence show uniform leak dynamics during the decay phase. On the contrary, during the recovery phase, the weight dynamics of each synapse will be different, as they are determined by both the post- and pre-neuronal spiking activity.
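The decay phase can be illustrated with a short sketch in which every post-neuron has its own leak time constant shared by all of its incoming weights. The text only states that the time constant is proportional to Post(t) and to the membrane threshold; the specific form used here (a floor plus a gain times Post times the adaptive threshold offset θ), the parameter values, and the function name `leak_weights` are assumptions chosen for illustration and are not the rule used in the simulations.

```python
import numpy as np


def leak_weights(weights, post_trace, theta, dt,
                 alpha=1.0, k_leak=100.0, tau_floor=1.0, w_base=0.0):
    """Exponential weight leak towards a baseline (illustrative sketch).

    weights:    array of shape (n_pre, n_post).
    post_trace: Post(t) for each post-neuron, shape (n_post,).
    theta:      adaptive threshold offset of each post-neuron, shape (n_post,);
                used here as a positive stand-in for the membrane threshold.
    alpha, k_leak, tau_floor, w_base: hypothetical constants.
    """
    # Assumed form: tau_leak grows with both recency (Post) and significance (theta).
    tau_leak = tau_floor + k_leak * post_trace * theta
    leak_rate = alpha / tau_leak              # overall leak rate from the text
    # One time constant per column: all synapses of a post-neuron leak uniformly.
    return w_base + (weights - w_base) * np.exp(-leak_rate * dt)
```

For example, with `post_trace = [0, 5]` and `theta = [0.1, 2.0]`, the weights of the second neuron (recent and significant) are almost unchanged after `dt = 10`, whereas those of the first neuron decay nearly to the baseline.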

Supplementary Note 4: ASP-based learning to forget is compatible with other proposed neuromorphic device technologies
As discussed earlier, the dopant interaction with the perovskite lattice seen in experiment and studied by ab initio dynamical simulations enables habituation-based plasticity. This is key to the perovskite's forgetting capability that motivates the ASP learning. In recent years, non-volatile and/or filamentary switch device elements, including spin-based devices and memristors, have been proposed to emulate the behavior of neural systems [17][18][19][20][21][22][23][24][25]. While our correlated perovskite can emulate forgetting similar to the animal world, other nonvolatile devices can be made to forget or leak their conductance by applying electrical pulses. Our proposed ASP can therefore be synergistic with those devices as well. Thus, ASP can be incorporated with a broad range of programmable devices to construct robust self-adaptive artificial neural systems for dynamic environments.

Supplementary Note 5: AIMD Movie
Ab initio molecular dynamics simulations showing the migration of a proton in an H-doped monoclinic SNO crystal at 300 K. The proton hops from one O atom to a neighboring O atom within the NiO6 octahedron in a facile manner (see Fig. 2e for details on the activation barriers). The Ni, O, Sm and H atoms are depicted as green, red, yellow, and blue spheres respectively. For the sake of clarity, only the hydrogen and the Ni/O atoms belonging to the two NiO6 octahedra closest to the hydrogen are shown as large spheres; the atoms far away from the hopping events are depicted as small translucent spheres. Our AIMD simulations at 300 K at various H doping levels show that the SNO lattice expands monotonically with the addition of hydrogen, approaching a lattice expansion of ~5% for 1 H per unit cell of SNO.