Introduction

Recent trends in computation highlight a shift from conventional computing towards memory-centric computing. In conventional computing the processors are central, and the data subject to processing are transferred to the processors from a separate memory unit. Memory-centric computing avoids this data transfer through the memory hierarchy by placing processing power near the memory domain1,2. Examples of memory-centric computing include near-data-processing and in-memory processing (also known as processing-in-memory or computing-in-memory). A significant motivator for this shift is deep learning which requires immense memory capacity but simple data processing. Conventional computing exhibits significant shortcomings for this particular workload due to the enormous amount of data transferred between the separated memory and processors. These shortcomings include the memory wall, arising from the difference in performance between the processor and memory (processor > memory) and the consequent bottleneck in performance caused by the memory latency, and the immense power consumption over the buses between the processor and memory3. Specifically, the majority of the workload for deep learning results from the elementary operation for vector-matrix multiplication, which is a multiply-accumulate operation (one multiplication and one accumulation operation). Despite the simplicity of each operation, repetition over a massive matrix creates an enormous workload for the hardware because of its complexity O(n2). Notably, the trend in deep learning computing in recent years indicates an exponential increase in operation number; AlphaGo Zero in 2018 needed approximately 300,000 times the number of operations that were required for AlexNet in 2012. This trend is expected to continue. Therefore, employing memory-centric computing as a complementary approach or, more radically, an alternative approach to conventional computing is unavoidable if we wish to maintain sustainable progress in deep learning technologies.

Inference in deep learning only needs to read the weights in the memory, unlike training that needs to read and write the weights. Most of the workload for the hardware arises from inference rather than training as fully trained neural networks are only minimally re-trained and repeatedly applied to the given input data. Therefore, memory-centric computing for deep learning acceleration requires appropriate memories that have (i) large memory capacity, (ii) low latency in-memory read-out, (iii) low power consumption on memory read-out, (iv) non-volatility, and (v) complementary metal-oxide-semiconductor (CMOS) compatibility. Fast writing at low power is also desirable as a second priority. A common measure of hardware performance for inference is tera-operations per second per watt; therefore, requirements (ii) and (iii) directly improve the hardware performance. Requirement (i) is necessary because the state-of-the-art deep neural networks (DNNs) that can recognize real-world data are substantial in depth and unit number per layer. For instance, convolutional neural network (CNN)-based DNNs, such as, AlexNet4, VGGNet (specifically, VGG-19)5, GoogLeNet6, ResNet (specifically, ResNet-152)7, include approximately 60 M, 138 M, 4 M, and 60 M weights, respectively. When using a full precision float 32-bit format, the memory for the weights in a single model reaches 1.9 Gb, 4.4 Gb, 128 Mb, and 1.9 Gb, respectively. The memory should be sufficiently large to host these weights on-site to accelerate significantly the inference task. Requirement (iv) avoids loading the memory with massive amounts of weight data whenever it is rebooted. Compatibility with standard CMOS technologies (requirement (v)) is a critical criterion because the memory should be cointegrated with CMOS-based processing units.

Considering these requirements, there are several non-volatile memories which are regarded as potential storage-class memories combining the advantages of main memories (random-access memory (RAM)) and data storage. They include ferroelectric RAM (FRAM)8, spin-torque-transfer RAM (STT-RAM)9, phase-change RAM (PcRAM)10, and resistive RAM (RRAM)11. Thus far, these have not been commercialized as standalone storage-class memories due to a few shortcomings. For instance, FRAM and STT-RAM have an unavoidable limit to their memory capacity due to the use of transistors as bit-cell selectors and difficulty in fabrication. PcRAM may achieve high memory capacity using passive bit-cell selectors such as diodes and ovonic threshold switches; however, its high programming power and difficulty in multilevel programming preclude it from commercialization as storage-class memory. Similar to PcRAM, RRAM achieves high memory capacity and offers multilevel programming; however, its programming endurance is much lower than dynamic RAM and static RAM. Nevertheless, among these non-volatile memories, RRAM is the most likely candidate for the memory for memory-centric architecture for deep learning acceleration regarding requirements (i)–(iv) as the top priority. As a second priority, the advantages presented by RRAM in satisfying these requirements far outweigh its drawback of limited programming endurance. As a result, in-memory processors based on non-volatile memory frequently employ RRAM loaded with weight matrices12,13,14,15,16.

RRAM offers feasible solutions to high memory capacity (requirement (i)) due to its multilevel operation and scalability down to the 4F2 design rule. Each memory bit-cell in RRAM is capable of multibit (n-bit; n > 1) representation using its 2n resistance levels17,18. This significantly increases its memory capacity. Moreover, RRAM can be realized in passive crossbar arrays (CAs) where each bit-cell is formed at an intersection of a row- and column-line, meeting the ultimate 4F2 design rule19,20,21. Furthermore, the sneak current through unchosen cells, that leads to read-out errors should be considered22,23. Staking a passive selector on the memory cell at a cross-point avoids read-out errors only if it is possible for the selector to address just the chosen bits with negligible interference, similar to transistors in active CAs. However, because a single terminal is used to turn on the selector, read, and program the memory cell, the independent operation of each of the two series elements may be challenging unless the selector is specifically tailored to the memory cell or vice versa. An alternative is to use self-rectifying memory cells (SRMCs) which are single memory cells, whose highly nonlinear and asymmetric current–voltage (IV) behavior alone enables the current sensing amplifier to distinguish between chosen and unchosen cells24,25,26,27,28,29,30. SRMCs have attracted significant attention because of their simplicity in bit-cell structure and thus potential compatibility with three-dimensional memory structure, enriching candidates for SRMCs, for example, NbOx/TiOx/NbOx26, TiO2/HfO229, Ta2O5/HfO2-x24, and Al-doped HfO227. Albeit excellent in most aspects, each has shortcomings that hinder it from being a promising candidate for an SRMC.

In this paper, we propose an Hf0.8Si0.2O2/Al2O3/Hf0.5Si0.5O2 trilayer-based SRMC that accurately meets the requirements for the main memory in memory-centric computing. Our proposed device has high selectivity (ca. 104) and reliable 2-bit representation, which were verified in single cells in support of requirement (i), along with extremely low power consumption on a single read-out operation with 4 and 0.8 nW for low resistance state (LRS) and high resistance state (HRS), respectively, and latency of <10 μs in a single read-out operation in support of requirements (ii) and (iii). Moreover we also demonstrate the excellent non-volatility (data retention >104 s at 85 °C) in support of requirement (iv), and a programming pulse amplitude below 5 V, which is compatible with the CMOS voltage driving circuits in support of requirement (v). We summarize these features and compare with findings from previous studies in Table 1. The selectivity value of each reference was extracted from the current ratio between at the maximum programming voltage and its negative one-third voltage.

Table 1 Performance comparison between our SRMC and previous results.

Results

Resistive switching operation of unit SRMCs

The proposed SRMC is based on an Hf0.8Si0.2O2/Al2O3/Hf0.5Si0.5O2 trilayer between a Ru top electrode (TE) and TiN bottom electrode (BE). The device fabrication procedure is detailed in the Methods Section. Figure 1a shows 90 IV hysteresis loops at 85 °C for 30 different SRMCs of 2 × 2 μm2 area (three loops each). We chose a memory operation temperature of 85 °C which is the upper bound of the industrial temperature range (−40–85 °C). Notably, no electroforming was needed to activate the switching behavior. From the results, negligible cell-to-cell variability in IV behavior even at the elevated temperature was first identified. With the measured bipolar switching (BS) characteristics, both set (from HRS to LRS) and reset (from LRS to HRS) switching events are gradual under positive and negative voltage, respectively. The IV loops in Fig. 1a highlight large asymmetry in IV behavior between positive and negative voltage and large nonlinearity in IV on both sides and are eligible to be used as SRMCs. A voltage range that allows extremely low current (barely sufficient to distinguish between different resistance states) is referred to as an inhibit region; the large inhibit region (−0.8–0.6 V) of our SRMC is favorable to inhibit the sneak current through unselected cells in a passive CA. Additionally, given the gradual set switching behavior, which is a self-compliance characteristic, no external current compliance is needed to protect the cell from a hard breakdown. This self-compliance characteristic is particularly desirable for passive CAs because a lack of transistors would otherwise limit current flow through memory cells.

Fig. 1: Electrical characterization of the unit self-rectifying resistive memory cell (SRMC).
figure 1

a DC IV characteristics of 30 SRMCs. Arrows indicate switching direction. Readable current margin verified at 2 V is 0.4–2 nA. b Resistance states programmed by varying amplitude of programming voltage pulse for three pulse widths (50 μs, 100 μs, and 1 ms). c Read-out current in response to read-out pulse (2 V and 5 μs in amplitude and width, respectively). Current evaluated from voltage across 1 MΩ internal resistor of oscilloscope. d Memory retention of characteristic of 20 SRMCs in HRS and LRS as programmed and after baking (at 85 °C for 2 h). e Programming endurance of SRMC using 4.2 V/100 μs set and −4.3 V/100 μs reset pulses. f Read disturb characteristic of SRMC using repetitive reading pulse of 2 V/10 μs. (gray and red circle for LRS and HRS, respectively).

We subsequently examined the BS of our SRMC in response to programming voltage pulses of different amplitudes (0–4.3 V) and widths (50 μs, 100 μs, and 1 ms). The measurement results in Fig. 1b indicate the onset of set switching at a positive voltage and a gradual reset behavior with the increase in reset voltage amplitude. The onset implies a threshold voltage for set switching, which enables non-destructive reading with a read-out voltage below this threshold voltage. Accordingly, we chose a read-out voltage of 2 V. The voltage pulses of 50 μs duration were insufficient to set the SRMC to a fully LRS, unlike the 100 μs and 1 ms duration cases (Fig. 1b), so we set the standard programming pulse width to 100 μs thereafter.

The high resistance, even in the LRS, causes a long RC time constant. This delimits the read-out speed significantly. We examined the read-out speed of the SRMC by applying a read-out pulse (2 V in amplitude and 5 μs in width) to five LRS SRMCs in parallel (Fig. 1c). It should be noted that we used five parallel SRMCs because the current level from a single SRMC is so small that it is barely measurable by an oscilloscope. The current plateau was reached after ~3 μs and the same delay was shown during discharging. Therefore, the read-out latency is below 10 μs.

The key to non-volatility is data retention at real memory-operating temperatures. Hence, we tested the stability of the LRS for 20 SRMCs maintained at the elevated temperature of 85 °C for 2 h. The 20 SRMCs were first programmed to the HRS using identical reset voltage pulses and their currents were read at 2 V. They were subsequently programmed to the LRS using identical set voltage pulses and the currents were read at 2 V. The 20 SRMCs in the LRS were heated up to 85 °C for 2 h, followed by current read-out at 2 V. The results in Fig. 1d indicate the excellent data retention even at the elevated temperature and almost negligible cell-to-cell variability in BS operation. Additionally, retention measurement at a higher temperature (125 °C for 2 h) on a single cell was also performed to confirm the stable data non-volatility as shown in Supplementary Fig. 1. We also identified the programming endurance of the SRMC for up to 106 cycles, each with +4.2 V set and −4.3 V reset pulses (Fig. 1e). As elaborated in the Introduction section, because the number of read operation is much larger than that of writing operation in in-memory computing application, we examined read disturb characteristics of LRS and HRS by applying repetitive read pulses of 2 V. (10 μs width) (Fig. 1f) Up to 1010 of reading operation, our SRMC showed stable non-volatility in each resistance states, which largely exceeds the 106 of endurance characteristic. (Fig. 1e)

Structural analyses of SRMC

Our SRMC is a vertical stack of Ru/ Hf0.8Si0.2O2/Al2O3/Hf0.5Si0.5O2 /TiN as confirmed by a cross-sectional high-resolution transmission electron microscope (HR-TEM) image (Fig. 2a). The upper Hf0.8Si0.2O2 and lower Hf0.5Si0.5O2 differ in chemical composition and are referred to as HSO1 and HSO2, respectively. HSO1 and HSO2 are separated by a 1-nm-thick Al2O3 layer. These three layers are sandwiched between a Ru TE and TiN BE. Auger electron spectroscopy (AES) analyses on the SRMC consistently identify the stack structure as shown in Fig. 2b. Further analysis of the AES data indicates that HSO1 and HSO2 differ in chemical composition such that the cation-to-anion ratio is larger in HSO1 than in HSO2 (Fig. 2c). Additionally, we performed X-ray photoelectron spectroscopy analysis on our SRMC stack to compare the HSO1 and HSO2 layers (Fig. 2d–f). The two layers largely differ in the peak energy of an O1s spectrum; the spectrum for HSO1 peaks at approximately 530.4 eV while that for HSO2 peaks at ~530.0 eV. The higher peak energy in HSO1 indicates a higher concentration of non-lattice oxygen than in HSO2,31,32. Rutherford Backscattering Spectroscopy (RBS) measurement results shown in Supplementary Fig. 2 indicate that the chemical composition of HSO1 and HSO2 is Hf0.8Si0.2O2 and Hf0.5Si0.5O2, respectively.

Fig. 2: Microstructural and chemical analyses.
figure 2

a Cross-sectional high-resolution transmission electron microscope image of our SRMC. b Depth profile of the elements, which were measured by Auger electron microscopy. c Atomic ratio (Hf+Si)/O along depth of SRMC. X-ray photoelectron spectra of d Hf4f, e Si2p, and f O1s emission for HSO1 and HSO2.

Resistive switching mechanism and current behavior of SRMC

Regarding current transport in our SRMC, the current in both the LRS and HRS scales with a device area in the wide range 0.0484–100 µm2 is plotted in Supplementary Fig. 3. This indicates interface-type switching as opposed to localized switching33; the whole device area is responsible for the switching by modulating the interfacial electronic energy barrier in a non-volatile manner34,35. This is consistent with the fact that our SRMC did not require an electroforming process, which is known to introduce conducting filaments11,36. In this regard, the largely asymmetric IV behavior may be due to the use of asymmetric metal electrodes and thus asymmetric interfacial barrier heights.

The asymmetry in the interfacial barrier height was acquired by fitting the Schottky emission equation37,38 to the experimental IV data at different temperatures (45–85 °C). Here, the assumption was that the interfacial barrier at the cathode dictates the overall current transport through the SRMC such that the barrier limits the injection current level. The measured data on the ln(I/T2) and reciprocal kBT plane indicate good linearity, where T and kB are absolute temperature and Boltzmann’s constant, respectively (Fig. 3a). This analysis yields a barrier height pair, at the top and bottom interfaces, for the HRS and LRS. For both states, electron injection from the TE (i.e., under negative voltage), encounters a higher Schottky barrier than the injection from the BE (i.e., under positive voltage), implying asymmetry in the barrier height due to asymmetry in the electrode presently in use (Fig. 3b).

Fig. 3: Current behavior in temperature domain emission equation and data retention for SRMC devices.
figure 3

a Fitting Schottky emission equation to current measured at various temperatures (45–85 °C) and ±2 V for HRS and LRS. b Estimated barrier heights (ϕ1 and ϕ2) indicated in inset for HRS and LRS. c Data retention for the proposed SRMC (Dev 3) at 85 °C compared with Dev 1 (Ru/HSO2/TiN) and Dev 2 (Ru/HSO1/HSO2/TiN). The as-programmed current level and current level at 7200 s are denoted by I(0) and I(7200 s), respectively.

The change in barrier height upon switching may be attributed to oxygen vacancy redistribution by the applied programming voltage39,40. Oxygen vacancies are redistributed in response to the direction of a programming field by electronic drift, resulting in the polarization of space charge. However, one should consider the high depolarization field built up during the programming period, which takes effect immediately after the removal of the programming field40. This presents a significant challenge for data retention and thus non-volatility. The outstanding data retention in our SRMC may be achieved by the use of a separate oxygen reservoir (HSO1) by an oxygen-blocking layer (Al2O3). Figure 2c shows that it is conceivable that the HSO1 layer may have a higher oxygen vacancy concentration than the HSO2 layer, serving as an oxygen reservoir, which creates excellent data retention. Data that support this hypothesis are presented in Fig. 3c; unlike the Ru/HSO1/Al2O3/HSO2/TiN stack, single-layer-based Ru/HSO2/TiN stack creates a severe retention issue, identifying a current decrease by 70% at 85 °C. However, the key to high data retention in our SRMC lies not only in the oxygen reservoir (HSO1) but also in the 1-nm-thick Al2O3 layer between HSO1 and HSO2. A comparison between the Ru/HSO1/Al2O3/HSO2/TiN and Ru/HSO1/HSO2/TiN SRMCs shows further improvement in data retention by inserting the Al2O3 layer between the HSO1 and HSO2 layers (Fig. 3c). Therefore, it is believed that the thin Al2O3 layer hinders the depolarization of space charge (oxygen vacancy)41,42.

Modeling of switching in SRMCs

To identify the role of each layer in our SRMC, we modeled our SRMC from scratch regarding oxygen vacancy dynamics in response to the applied voltage. A schematic of the one-dimensional SRMC configuration is shown in Fig. 4a. We considered the trilayer as a mixed ionic-electronic conductor with free electrons and oxygen vacancies as mobile electronic and ionic defects. The oxygen vacancy redistribution in time and space domains was fully addressed using the Fick’s second law. We used the quasi-static approximation to consider electron distribution in the SRMC given the large difference in diffusion coefficient between oxygen vacancy and electron. The defining features of the model are as follows.

  • The electron transport is thermally activated such that the interfacial electron transport conforms to the Schottky emission and the bulk electron transport to the band conduction rather than localized conduction.

  • HSO1 is given a lower reference state chemical potential \({\mu }_{i}^{0}\) for oxygen vacancy than HSO2 to allow HSO1 to hold a larger oxygen vacancy density than HSO2 to be consistent with the experimental data in Fig. 2.

  • The Al2O3 layer is given a lower oxygen vacancy diffusion coefficient than HSO1 and HSO2 by one order of magnitude.

  • The Ru TE works as an oxygen vacancy source.

  • An interfacial layer (IL) is placed at each interface, which may work as the Helmholtz layer39,40.

  • The breakdown of the first-order approximation (FOA) of the drift-diffusion equation is considered, which is likely the case when the internal electric field exceeds a few 10s MV/cm as for our SRMC.

  • The experimentally observed asymmetry in IV is realized by using asymmetric electrodes with different work functions (WRu > WTiN).

Fig. 4: Resistive switching simulation.
figure 4

a One-dimensional configuration of the SRMC for simulation. b Simulated IV loop (quasi-static behavior) in comparison with experimental data. c Simulated switching behaviors in response to voltage pulses of different widths and amplitudes. d Simulated LRS retention for the HSO2-only cell (Dev 1), HSO1/HSO2 cell (Dev 2), and HSO1/Al2O3/HSO2 SRMC (Dev 3). e, f Simulated oxygen vacancy distributions in the trilayer SRMC and HSO1/HSO2 cell in the LRS (upper panel) and HRS (lower panel). The change of the distribution in each state was monitored in the time range (0–7200 s). g Retention of areal density of oxygen vacancies in the LRS in each layer of the trilayer and bilayer cells.

The calculation procedure is elaborated in the Methods section.

In our model, the dc electronic current in the steady-state is dictated by the electronic injection current at the cathode, which conforms to the Schottky emission. That is, the injection current is determined by the interfacial electric field that lowers the Schottky barrier height (SBH) by image force. The redistribution of oxygen vacancies by programming voltage alters the interfacial electric field at both interfaces because the Debye length for the oxygen vacancy density considered is larger than the thickness of our trilayer. The relation between the interfacial electric fields and oxygen vacancy distribution is best explained using Poisson’s equation.

$$\frac{{ {\mathrm{d}}E}}{{ {\mathrm{d}}x}}=\frac{q\rho \left(x\right)}{{\epsilon }_{r}{\epsilon }_{0}}\approx \frac{q{c}_{{{\mathrm{Vo}}}}\left(x\right)}{{\epsilon }_{r}{\epsilon }_{0}},$$

where the space charge density ρ is approximated to the oxygen vacancy density cVo because the free electron density is much lower than the vacancy density due to the large bandgap in the trilayer. The dielectric constant and vacuum permittivity are denoted by \({\epsilon }_{r}\) and \({\epsilon }_{0}\), respectively. That is, the trilayer is fully depleted. Solving the differential equation for the electric field at the bottom interface E(0) or the top interface E(d) yields the following equations.

$$\left\{\begin{array}{c}E\left(0\right)=-\frac{{V}_{{{\mathrm{ap}}}}}{d}-\frac{q}{{\epsilon }_{r}{\epsilon }_{0}d}{\int }_{0}^{d}{\int }_{0}^{x}{c}_{{{\mathrm{Vo}}}}\left({x}^{{\prime} }\right) {\mathrm{d}}{x}^{{\prime} }{ {\mathrm{d}}x}\\ E\left(d\right)=-\frac{{V}_{{{\mathrm{ap}}}}}{d}+\frac{q}{{\epsilon }_{r}{\epsilon }_{0}}{\int }_{0}^{d}\left({c}_{{{\mathrm{Vo}}}}\left(x^{\prime} \right)-\frac{1}{d}{\int }_{0}^{x}{c}_{{{\mathrm{Vo}}}}\left(x^{\prime} \right){ {\mathrm{d}}x}^{\prime} \right){ {\mathrm{d}}x}\end{array}\right.,$$
(1)

where d denotes the total thickness of the trilayer. From these equations, it is obvious that the change in vacancy distribution alters both interfacial electric fields. The key to non-volatile switching is that (i) set and reset switching operations cause \(\triangle {c}_{{{\mathrm{Vo}}},t\,=\,0}\left(={c}_{{{\mathrm{Vo}}},t\,=\,0}^{{ {\mathrm{LRS}}}}-{c}_{{{\mathrm{Vo}}},t\,=\,0}^{{ {\mathrm{HRS}}}}\right)\) sufficiently large to change E(0) and E(d) and (ii) the programmed distribution should be retained, \(\triangle {c}_{{{\mathrm{Vo}}},t\,=\,0}\approx \triangle {c}_{{{\mathrm{Vo}}},t\,=\,\infty }\).

We took into account the breakdown of the FOA of the drift-diffusion equation in that the oxygen vacancy migration velocity exponentially increases with the electrochemical potential gradient43,44,45. The breakdown of the FOA may be the substrate for the voltage-time dilemma46.

As boundary conditions, the TiN/HSO2 bottom interface forms oxygen-blocking contact while the Ru/HSO1 top interface allows oxygen vacancies to move through the interface (non-blocking contact) with a constant vacancy density on the Ru side of the interface. The parameters used in our modeling are listed in Supplementary Table 1, including several key parameters, e.g., vacancy diffusion coefficients in HSO1, HSO2, and Al2O347,48.

The response of our model to quasi-static stare-case voltage sweep (0.25 V/s) is plotted in Fig. 4b. The simulated IV loop is well consistent with the experimental data, identifying good reproducibility of experimental data in a quasi-static voltage domain. Subsequently, we tested the response of our model to voltage pulses of different widths (50 μs, 100 μs, and 1 ms) and amplitude (0.1–4.5 V). The results are shown in Fig. 4c. Similar to the experimental results, the set switching behavior represents the onset of switching at ~3 V, so that setting read-out voltage to 2 V was allowed as for the experimental measurements. The reset switching behavior (particularly, with 1 ms reset pulses) indicates a gradual change in resistance, in agreement with the experimental data.

The excellent LRS retention for our SRMC was successfully reproduced using our model as shown in Fig. 4d. We also modeled the other cells, HSO2-only cell (Dev 1) and HSO1/HSO2 cell (Dev 2) to identify their LRS retention characteristics. Note that for the two cells we used the same parameters as the SRMC model except their thicknesses. The comparison in Fig. 4d highlights the excellent LRS retention of our SRMC model in support of the experimental data.

As such, the key to data retention is the time-dependent redistribution of oxygen vacancies in each state. Our model simulation allows us to monitor the evolution of oxygen vacancy distribution at any time step. We set and reset the model and kept track of vacancy distribution for 7200 s. The monitored results are plotted in Fig. 4e; the upper and lower panels show the distributions for the LRS and HRS, respectively. The distributions indicate (i) the lower vacancy density in HSO2 than HSO1 in both resistance states, (ii) small change in vacancy density in both states over time, i.e., small \({c}_{{{\mathrm{Vo}}},t\,=\,0}-{c}_{{{\mathrm{Vo}}},t\,=\,7200}\) in both layers, and (iii) small difference in vacancy density between LRS and HRS, i.e., small \(\triangle {c}_{{{\mathrm{Vo}}},t\,=\,0}\) in HSO2. Considering the indication (i), the resistance state of our model is mainly dictated by the oxygen vacancy density in HSO1 rather than HSO2. This is because the interfacial electric fields are mainly determined by large space charge density as shown in Eq. (1); integrating the oxygen vacancy density over HSO2 is much smaller than over HSO1. The indication (ii) is the direct cause of the excellent LRS retention.

The indication (iii) is caused by the Al2O3 oxygen-blocking layer. The low diffusion coefficient of Al2O3 hinders oxygen vacancies from entering into (leaving from) HSO2 during set (reset) switching. This can be seen in comparison with the HSO1/HSO2 cell whose oxygen vacancy distributions in both states are plotted in Fig. 4f. Figure 4f identifies that the lack of the oxygen-blocking layer allows a large number of oxygen vacancies to enter into (leave from) HSO2, unlike the trilayer. Thus, the role of the Al2O3 oxygen-blocking layer in switching is to confine the active switching region to HSO1. The better LRS retention for the trilayer than HSO1/HSO2 is understood in terms of the confined switching to HSO1. According to Eq. (1), the larger the oxygen vacancy density, the larger the interfacial electric field evolves; the larger electric field in the vicinity of the top interface E(d) drives the more oxygen vacancies back to the source (Ru). The unconfined cell (HSO1/HSO2) holds the more oxygen vacancies over than the trilayer, and the larger E(d) releases the more oxygen vacancies to the vacancy source. To show this, we evaluated the areal density of oxygen vacancies in each layer for the trilayer and HSO1/HSO2 bilayer. The areal density was calculated by integrating the vacancy density over the HSO1 or HSO2 region. The results are shown in Fig. 4g, which identifies the larger decay in oxygen vacancy density in HSO2 in the bilayer than the trilayer.

Two-bit operation of SRMCs

The capability of multilevel programming was identified for four resistance levels: one HRS and three LRSs (L1, L2, and L3). The available resistance state ranges from 5 to 1 GΩ at a read-out voltage of 2 V, corresponding to 0.4–2 nA (Fig. 1b). The range was equally divided into four resistance bits, each with one of the four resistance states (0.4–0.8 nA for HRS, 0.8–1.2 nA for L1, 1.2–1.6 nA for L2, and 1.6–2 nA for L3). Each range (0.4 nA in width) was sub-divided into the available current range (0.16 nA) and forbidden range (0.24 nA) to reduce the state overlap probability (SOP) between neighboring states. We applied the incremental step pulse programming (ISPP)/error check and correction (ECC) method17,18 considering the available current range (0.16 nA in width) which is separated from neighboring states by the forbidden range (0.24 nA in width). The multiple resistance levels were programmed using two distinct schemes: (i) erase-and-program and (ii) erase-free schemes. The former fully erases the SRMC from the LRS before each reprogramming, whereas the latter programs a new LRS directly from its current state without the full erase process. The erase-free scheme reduces the time complexity in multilevel programming because the erase process is omitted. These methods are shown in detail in previous studies17.

To address the reliability of multilevel programming, 5 SRMCs were programmed into the four distinct resistance states at 85 °C using each programming protocol. Figure 5a identifies the multilevel operation of the five SRMCs (indexed #1–#5) using the erase-and-program scheme. Each subplot in Fig. 5a shows switching between the HRS and one of the three LRSs (L1, L2, and L3) over 50 cycles for one of the five SRMCs (#1–#5). The erase-free scheme was also applied to another set of five SRMCs subject to switching between L1 and L2, L2 and L3, and L1 and L3 over 50 cycles at 85 °C (Fig. 5b). During ISPP/ECC, the pulse amplitude required to program one of the three LRSs varied. We plotted the cumulative distribution of the pulse amplitudes for L1, L2, and L3 in Fig. 5c. The four measured resistance states were statistically analyzed to identify the SOP between the states as shown in Fig. 5d. The minimum distance in current between neighboring states arises from L2 and L3, which are separated by 5.0σ in a Gaussian distribution. This implies a 2.86 × 10–5% probability of errors in a multibit read-out. Additionally, retention of the 2-bit data is an important concern. We addressed the retention by monitoring the four resistance states at 85 °C for 2 h, yielding the profiles of read currents in Fig. 5e. The data indicate a barely noticeable change in the current level for the four states even at the elevated temperature.

Fig. 5: Two-bit states of SRMC.
figure 5

Two-bit states programmed using a erase-and-program scheme and b erase-free scheme on five SRMCs (indexed #1–#5). c Cumulative distribution of amplitudes of two-bit programming pulses. Average amplitude and standard deviation denoted by m and σ. d SOP between two-bit states. e Retention of two-bit states at 85 °C.

SRMCs in a passive crossbar array

We fabricated a 30 × 30 CA of the SRMCs, each of which was fully addressable. The layouts of the CA and morphology of a single SRMC are shown in Fig. 6a, b, respectively. To address a single cell, we applied an operation voltage (Vop) to the cell column-line (biasing line) with the row-line (ground line) being grounded. The current was measured on the ground line. Additionally, the other column- and row-lines were pulled up to the column- and row-inhibit voltage, respectively, to suppress the sneak current. We considered two biasing schemes: half-biasing (Scheme 1) and one-third biasing (Scheme 2). When addressing a cell, Scheme 1 pulls up the biasing line and half pulls up the column- and row-inhibit-biasing lines, whereas Scheme 2 pulls up the biasing line, pulls up the column-inhibit-biasing lines one-third, and pulls up the row-inhibit-biasing lines two-thirds. Schemes 1 and 2 are summarized in Table 2.

Fig. 6: 30 × 30 CA of SRMCs.
figure 6

a Top view of CA layout. b Scanning electron microscope image of the array. The inset shows an atomic force microscope image of a unit SRMC. Schematic of c Scheme 1 and d Scheme 2. Appended IV loops of 100 randomly chosen SRMCs (three loops for each SRMC), which were measured using e Scheme 1 and f Scheme 2. For Vop = 3 V, voltage across selected cell (red-filled circle), unselected group 1 cell (blue-filled circle), and unselected group 2 cell (black-filled circle) is indicated. Open circuit current was also plotted (blue line). The currents read using Scheme 1 and Scheme 2 on the 100 SRMCs are shown in the distributions in g and h, respectively. Scheme 1: the mean current m and standard deviation σ for HRS and LRS are (6.5 × 10−11 A, 1.8 × 10−11) and (9.3 × 10−10 A, 5.1 × 10−10), respectively. Scheme 2: (6.2 × 10−11 A, 3.2 × 10−11) and (1.0 × 10−9 A, 4.1 × 10−10) for HRS and LRS, respectively.

Table 2 Voltage across cells over a CA for Schemes 1–4.

One hundred different SRMCs in the 30 × 30 CA were randomly chosen to characterize their IV loops using Schemes 1 and 2, illustrated in Fig. 6c, d, respectively. The aim was twofold: the identification of disturbance from the unchosen cells and cell-to-cell variability of switching behavior. For each scheme, three IV loops for each of 100 cells, i.e., 300 loops in aggregate, are shown in Fig. 6e, f. First, a comparison between Fig. 6e (6 f) and Fig. 1a identifies negligible effects of the 899 parallel SRMCs on the selected SRMC, leveraging their self-rectifying and highly nonlinear I-V characteristics. Second, the appended IV loops show negligible variability. The variability was evaluated by statistical analysis on the read-out currents (at 2 V) of the 100 cells (Fig. 6g, h for Schemes 1 and 2, respectively). The distributions for both schemes highlight good cell-to-cell uniformity in the 30 × 30 CA, and thus no overlap between HRS and LRS.

The excellent uniformity in the IV loop is observed; this is due, in part, to the lack of electroforming, which is otherwise likely to endow each cell with uncontrollable random variability. Additionally, the current level in the inhibit region is comparable to the open circuit current level, implying extremely low current in the inhibit region, which is desirable when the CA size becomes large. In this CA configuration, each of the SRMCs is classified as (i) a selected cell between the biasing and ground lines, (ii) an unselected cell either between the column-inhibit-biasing and ground lines or between the biasing and row-inhibit-biasing lines, or (iii) an unselected cell between the column- and row-inhibit-biasing lines. The last two groups are named unselected groups 1 and 2, respectively. For each scheme, the theoretical voltage across an SRMC (voltage on a column-line minus voltage on a row-line) in each group is indicated in Fig. 6c, d. As shown in Fig. 6c, Scheme 1 allows half the pull-up voltage across the unselected group 1 cells (blue-filled circle) and zero voltage across the unselected group 2 cells (black-filled circle). Scheme 2 applies a positive one-third of the pull-up voltage (blue-filled circle) and negative one-third of the pull-up voltage (black-filled circle) to the unselected group 1 and 2 cells, respectively (Fig. 6d). Given a square CA (N × N), the number of cells in the unselected groups 1 and 2 is proportional to N and N2, respectively. Thus, the larger the array, the more dominantly the unselected group 2 cells contribute to the resistance parallel to the selected cell’s resistance. Although the effect of the unselected group 2 cells is barely noticeable in the 30 × 30 CA, it is likely to take effect in larger CAs, which is a challenge.

To address this challenge, we investigated the IV behavior of a predefined selected SRMC embedded in a 160 × 160 (~25 kb) and 320 × 320 (~100 kb) CA. The layouts of the CAs are shown in Supplementary Fig. 4. Note that these arrays were not random-accessible because the number of lines exceeds the number of currently available probes. Instead, we programmed the unselected group 2 cells simultaneously to LRS by leaving all row-lines (except the signal row-line) and all column-lines (except the signal column-line) common. The measurement configuration is depicted in Supplementary Fig. 5. The detail of the measurement is written in the Methods section. Schemes 1 and 2 also applied to the large CAs. For both schemes, the voltage across the selected SRMC and unselected groups 1 and 2 cells is indicated on the IV loop taken from Fig. 1a in Fig. 7a, b. Two additional biasing schemes (Schemes 3 and 4) were considered for comparison. Scheme 3 applies a one-third Vop and two-thirds Vop to the row- and column-inhibit lines, respectively. Therefore, the voltage across the unselected group 1 and 2 cells is two-thirds Vop and one-third Vop, respectively (Fig. 7c). Scheme 4 applies a one-third Vop to both the row- and column-inhibit lines, allowing one-third or two-thirds Vop across the unselected group 1 cell and zero voltage across the unselected group 2 cell (Fig. 7d). The details of Schemes 3 and 4 are summarized in Table 2.

Fig. 7: 160 × 160 and 320 × 320 CAs of SRMCs.
figure 7

ad Illustrations of Schemes 1–4 and voltage across different cells indicated by different colors. IV loops of selected cell that was embedded in e 160 × 160 and f 320 × 320 CA.

To examine the sneak current from the unselected groups 1 and 2 cells, we attempted to program all unselected groups 1 and 2 cells into their LRS and subsequently examined the IV characteristics of the selected cell. The measured IV loops for the selected cell embedded in the 160 × 160 CA are shown in Fig. 7e. The different biasing schemes caused a negligible difference in the IV loops of the selected cell. This confirms that the self-rectifying and nonlinear IV behavior of the SRMC maintains a sufficiently low current through the unselected cells to enable the true current to be read through the selected cell.

The 320 × 320 CA allows the sneak current to vary the ground line current more obviously than the smaller CAs, yielding more obviously distinct IV loops depending on the voltage-application scheme (Fig. 7f). Scheme 2 yields a lower current than Scheme 1 over the whole voltage range, whereas the largest current level was yielded by Schemes 3 and 4. This is because Scheme 2 applies the lowest voltage to unselected group 1 cells, which share the same row-line as the selected cell. Nevertheless, the switching behavior of the selected cell indicates two distinct states despite the sneak current in this seemingly worst-case conductance distribution.

Two-bit operation of SRMCs in the CAs was examined successfully. Owing to the random-accessibility of the 30 × 30 CA, five SRMCs were chosen randomly and subject to the two-bit operation, resulting in readable four states (Supplementary Fig. 6) as for the single cells. Additionally, we identified the two-bit operation of the predefined SRMC in the 320 × 320 CA, yielding clearly distinct four states (Supplementary Fig. 7).

Vector-matrix multiplication acceleration using the 30 × 30 crossbar array

Finally, we identified the feasible acceleration of vector-matrix multiplication (\({\bf{w}}\)×\({\bf{x}}\); \({\bf{w}}\in {{\mathbb{Z}}}^{30\times 30}\), \({\bf{x}}\in {{\mathbb{Z}}}^{30}\)) by reducing the computational complexity to O(n). To this end, we aimed to calculate a dot product \({\bf{w}}\left[i,:\right]\cdot{\bf{x}}\) at one cycle, where \({\bf{w}}\left[i,:\right]\) denotes the ith row of matrix \({\bf{w}}\). We restricted the elements w of matrix \({\bf{w}}\) to 2-bit integers (\(w\in \left\{0,1,2,3\right\}\)) and the elements x to 1-bit integers (\(x\in \left\{0,1\right\}\)). The matrix \({\bf{w}}\) was transposed and mapped onto our 30×30 SRMC CA (conductance of each cell \(\in \left\{{{\mathrm{HRS}}},L1,L2,L3\right\}\)). The vector \({\bf{x}}\) with 1-bit integer elements was encoded as a voltage array \({\bf{V}}_{\rm{ap}}\) (\({V}_{{ap}}\in \left\{0,2V\right\}\)) and applied to the 30 row-lines of the CA (Fig. 8a). The current measured at the ith column-line was the intermediate result of the dot product \({\bf{w}}\left[i,:\right]\cdot{\bf{x}}\). As depicted in Fig. 8b, we addressed one column at one cycle by pulling down the chosen column-line to the ground while inhibit voltages (Vinhibit) were applied to the rest of column-lines (2/3Vap), so that we reduced the complexity to O(n), which is otherwise O(n2).

Fig. 8: Acceleration of vector-matrix multiplication using the 30 × 30 CA.
figure 8

a Configuration of a 30 × 30 matrix w mapped onto a CA of the same size. Vector x is encoded as voltage signals (‘0’ = 0 V, ‘1’ = 2 V) and enters into the row-lines (ROW[0]–ROW[29]). The resulting current vector j as an intermediate product enters into sense amplifiers (SAs) to be quantized. b Schematic timing diagrams of row- and column-line signals. The inhibit voltages applied to unchosen column-lines are denoted by Vinhibit. c Statistics of four states (HRS, L1, L2, L3) in four random matrices (w1–w4). dg (upper panel) Conductance maps of the four random matrices (w1–w4) and (lower panel) measured current vectors j for the four matrices. We considered a vector x of ones. The measurement results are compared with the calculated current vectors using the measured currents on individual cells.

We chose four random matrices (w1, w2, w3, and w4) of different sparsities (0, 25, 51, and 55%, respectively). The percentage of each integer (0, 1, 2, 3) in each matrix is shown in Fig. 8c. The chosen matrices were mapped onto four 30 × 30 CAs such that the individual cells of the CAs were randomly accessed and programmed to the correct conductance states using Scheme 2. The programmed conductance map for each matrix is shown in Fig. 8d–g. The conductance of each SRMC was individually read out at a read-out voltage of 2 V to acquire the maps. We then performed the dot product \({\bf{w}}\left[i,:\right]\cdot{\bf{x}}\) for each i at one cycle with vector \({\bf{x}}\) of ones, i.e., \({\bf{x}}\) = [1, 1, …, 1]. The vector-matrix multiplication operation for each matrix thus consumes 30 column-line-addressing cycles, yielding a current vector \({\bf{j}}\) (\(\in {{\mathbb{R}}}^{30}\)) as the intermediate product (Fig. 8d–g). The measured current at each column-line is almost identical to the current value extrapolated from each cell current in the same column, indicating marginal disturbance from the unselected cells. For the multiplication with four matrices (\({\bf{w}}\)1, \({\bf{w}}\)2, \({\bf{w}}\)3, and \({\bf{w}}\)4), the CA domain consumes powers of 4.22, 3.44, 2.83, and 2.69 μW, respectively. The considered multiplication is the worst case in terms of power consumption because of the extremely dense vector x (of ones).

To output the final product \({\bf{z}}\) (\({\bf{z}}\) = \({\bf{w}}\)×\({\bf{x}}\); \({\bf{z}}\in {{\mathbb{Z}}}^{30}\)), current from the ith column ji for all i needs to be encoded as a binary number, which subsequently enters into the near-memory digital domain for additional processing. A common method is to convert the summed current to voltage and subsequently to quantize the converted voltage using an analog-to-digital converter (ADC)49. Alternatively, the summed current can directly be converted to a binary value using a current sense amplifier (CSA) with multiple reference currents that are iteratively compared with the summed current50. In either way, the important consideration is twofold: (i) energy consumption and (ii) bit-width of the product z. Regarding power consumption, ADCs are well known to consume a considerable amount of energy insomuch as the total energy consumption of an RRAM-based inference accelerator is dominated by the ADCs13. An alternative method using a CSA50 keeps the static current from the chosen line flowing while the summed current being converted iteratively, causing additional energy consumption. The bit-width should be chosen carefully to avoid the performance, i.e., inference, degradation by the quantization bit-width. As shown in quantized neural networks such as DoReFa-Net51, the resolution, i.e., bit-width, of activations more critically determines the inference accuracy than that of weights. The activation resolution is dictated by the bit-width of the output z. Therefore, the bit-width of the product \({\bf{z}}\) is an important consideration in the design of summed current-encoding circuits.

Regarding multibit factor \({\bf{x}}\), time-division multiplexing is a desirable method by encoding the vector x as shown in Fig. 9. Because of the nonlinear IV behavior in the LRS of our SRMCs, encoding a factor as input voltage amplitude is unsuitable unlike linear IV cases14,52. The l-bit elements \({\bf{x}}\)[i] are time-division multiplexed from the least significant bits (LSBs) to the most significant bits (MSBs) and are applied to the row-lines at one column-line addressing cycle for the dot product \({\bf{w}}\left[i,:\right]\cdot{\bf{x}}\). Thus, each dot product cycle includes l sub-cycles. The output current at each sub-cycle is encoded as a binary value and subsequently multiplied by 2k−1, where k denotes the digit corresponding to the sub-cycle. The results are finally summed to output the dot product \({\bf{w}}\left[i,:\right]\cdot{\bf{x}}\).

Fig. 9: Acceleration of multibit vector-matrix multiplication.
figure 9

a Configuration of a mapped weight matrix w (M × N) and multibit vector x (here, 3-bit). Elements x[i] are time-multiplexed, so that the multiplication delay is proportional to the bit-width of elements x[i]. b Timing diagrams of the signals to calculate w[:,i]·x for a given i with multibit elements including x[0] (=b101), x[1] (=b010), and x[29] (=b111). The resulting currents in the three time divisions (j[00], j[01], and j[02]) are first quantized by the SAs and subsequently multiplied by 1, 21, and 22, respectively, and summed in the processing elements (PEs).

Discussion

We proposed an SRMC based on a Hf0.8Si0.2O2/Al2O3/Hf0.5Si0.5O2 trilayer stack, which highlights large selectivity (~104), two-bit operation, low read power (4 nW for LRS and 0.8 nW for HRS), read latency (<10 μs), excellent data retention (>104 s at 85 °C), and CMOS compatibility (maximum supply voltage ≤5 V). Particularly, the large selectivity due to the high asymmetry and nonlinearity in the IV behavior potentially supports high-density passive CAs of the SRMCs, which is one of the key elements to memory-centric computing in support of deep learning acceleration. Feasibility was identified in 30 × 30, 160 × 160, and 320 × 320 arrays of our SRMCs. The IV behavior of an isolated SRMC was reproduced well in the arrays without significant effects on the unselected cells. These excellent characteristics may be attributed to nonfilamentary switching, i.e., switching on the grounds of Schottky barrier modulation at the cathode, which is homogeneous over the device area. A common issue of such nonfilamentary switching is data retention due to the rapid depolarization of point defects, which was overcome by using the engineered trilayer switching stack in this study. Furthermore, the low programming power (ca. 18 nW), latency (100 μs), and endurance (>106) highlight the energy-efficiency and highly reliable random-access memory of our SRMC.

Ideally, CAs may achieve the ultimate complexity O(1) of vector-matrix multiplication beyond the complexity O(n) by addressing all column-lines at one cycle. The basic premise is that all non-ideal factors, e.g., sneak current and line resistance effects, are excluded. The sneak current effect may be marginal because all bit-cells are supposed to be non-negatively biased when all column-lines are simultaneously addressed, i.e., grounded. However, the effect of finite line resistance is significant. The finite line resistance causes the inhomogeneous distribution of bit-cell voltages over the cells on the same row-line such that the further a bit-cell from the row-line contact, the lower voltage is applied across the bit-cell. Further, this effect is boosted when the bit-cells on the same row-line allow simultaneous current flow, which is the case of all column-line addressing.

Additionally, simultaneously addressing all column-lines requires one CSA and following logic circuit per column line, whereas addressing one column-line at a cycle allows one CSA to be shared among a group of column-lines through time-division multiplexing. This additional peripheral circuit-area overhead can be prohibitive in large-scale CAs. Therefore, the complexity reduction to O(1) may be realized only when these challenges are overcome.

Methods

Device fabrication

A 200-nm-thick TiN layer was sputtered on a SiO2/Si substrate and patterned to a shape of crossbar-type BE. The TiN BE was patterned by a conventional photolithography and dry-etching process by an inductively coupled plasma reactive ion etching (ICP-RIE). An ICP power of 200 W and a substrate bias power of 20 W were maintained during TiN etching. During the dry-etching process, the reactant gas flow rates of Ar and Cl2 were maintained at 5 standard cubic centimeters per minute (sccm) and 30 sccm, respectively. Furthermore, the process temperature was maintained at 25 °C by a water-circulation cooling system. The observed etching rate was ~70 nm/min. The residual photo-resist (PR) on the patterned TiN BE was removed by an acetone etchant and cleaned sequentially with isopropyl alcohol and deionized water. The 1-nm-thickness HSO1 and 2-nm-thickness HSO2 thin films layer were then deposited by traveling wave-type ALD at 250 °C using a tetrakis-ethylmethylamido hafnium (TEMA-Hf) and bis(diethylamino)silane (BDEAS) precursor, respectively, and H2O and O2 plasma as a source of Hf and Si oxidant, respectively. To form each of HSO1 and HSO2 layers, the super-cycle ALD processes with HfO2 and SiO2 layers were used. During the deposition of HSO1 and HSO2 thin film, the ALD cycle ratios of 1:3 and 1:1 for SiO2:HfO2 were set, respectively. Between the HSO1 and HSO2 layers, the Al2O3 thin film layer was deposited by traveling wave-type ALD at 150 °C using a tri-methyl aluminum (TMA) and H2O as a source of Al and oxidant, respectively. Subsequently, the crossbar-type TE pattern was formed by photolithography and then the 100-nm-thick Ru layer was deposited by DC magnetron sputtering. Finally, through the conventional lift-off process, the Ru/HSO1/Al2O3/HSO2/TiN stacked device was fabricated. All of the unit and CA devices have identical fabrication processes.

Structural analysis of SRMC device

The sample for TEM analysis was prepared by a focused ion beam (FIB, Helios NanoLabTM by FEI) operation. HR-TEM (Tecnai G2 F30 S-TWIN by FEI) analysis was then performed to obtain a cross-sectional view of the Ru/HSO1/Al2O3/HSO2/TiN stacked RS device. The XPS analysis was performed to examine the chemical binding status of the HSO1 and HSO2 layer with an X-ray photoelectron spectrometer (XPS, Thermo Fisher Scientific Inc.) using an Al source with a spot size of 400 μm and energy step size of 0.1 eV. The samples for XPS analysis were prepared using blanket-type HSO thin films on a TiN substrate. It should be noted that because the thicknesses of HSO1 and HSO2 in the device are very thin, additional samples were prepared for XPS analysis. The elemental depth profile was obtained using AES (ULVAC-PHI 700, coaxial full CMA type analyzer, 10 kV/10 nA of electron beam energy) measurements. During the AES measurement, a sputtering rate of ~0.2 Å/s was maintained. RBS (National Electrostatics Corp.) analysis for qualified chemical composition of our active layers was performed with separately prepared HSO1 and HSO2. (50-nm-thick each on SiO2/Si substrate) AFM (Digital Instruments Dimension 3000, Veeco Science) analysis was conducted to observe the CA device morphology and determine the exact feature sizes of the RS device. The CAs were observed using a scanning electron microscope (SEM, JEOL JSM-6700F).

Electrical measurements

The resistive switching characteristic of the device was measured using an HP4145B semiconductor parameter analyzer in the IV sweep mode. Measuring temperature was controlled by a hot stage using a temperature controller. The pulse-based electrical measurements were conducted using an HP4145B, arbitrary function generator (Agilent 81150 A), oscilloscope (MSOX3024T, Tektronix), and electromechanical radiofrequency electric-circuit switch box. Throughout the measurement processes, the voltage was biased to the Ru TE, while the TiN BE was electrically grounded. The resistance (or current) values of the programming and erasing were verified at 2 V using the SPA. Measuring the 2-bit RS operation, ISPP/ECC algorithms were performed using two convertible electrical circuits composed of [AFG-RS device-OSC] and [SPA-RS device], respectively. These two types of electrical circuits were approached alternately by the electromechanical RF electrical circuit switch boxes. In the random-access operation (Fig. 6), the 30 × 30 sized matrix switching zig was additionally equipped in the previous electrical circuit. All electrical measurements were performed using a LabViEWTM-based control program.

Modeling of resistive switching dynamics

We modeled the resistive switching behaviors of our SRMC regarding oxygen vacancy dynamics in response to the applied voltage. The one-dimensional model configuration considered is shown in Fig. 4a. The SRMC consists of five layers, HSO1/Al2O3/HSO2 plus two interfacial dipole layers between HSO1 and TE (IL1) and HSO2 and BE (IL2).

Note that we also considered an equivalent series resistor (ESR) with its resistance RESR. The voltage across the SRMC (Vc) at the applied voltage Vap is therefore expressed as

$${V}_{{{\mathrm{ap}}}}=-A{R}_{{{\mathrm{ESR}}}}j+{V}_{c},$$
(2)

where j and A denote the electric current density through the whole circuit and SRMC area, respectively. According to the Kirchhoff’s current law (KCL), the electric current density j should be invariant along the spatial axis x. The current density j consists of dc current density jdc and displacement current density as follows.

$$j={j}_{{{\mathrm{dc}}}}-{\epsilon }_{r}{\epsilon }_{0}\frac{d}{{{\mathrm{d}}t}}\left(\frac{{{\mathrm{d}}V}}{{{\mathrm{d}}x}}\right),$$
(3)

where \({\epsilon }_{r}\) and \({\epsilon }_{0}\) denote the dielectric constant and permittivity of vacuum, respectively. The dc current density jdc consists of an electronic contribution je and ionic contribution \({j}_{{{\mathrm{dd}}}}^{i}\) because the SRMC is considered as a mixed ionic-electronic conductor (MIEC). The ionic (oxygen vacancy) current density \({j}_{{{\mathrm{dd}}}}^{{Vo}}\) is attributed to the drift and diffusion of oxygen vacancies, which are driven by the gradients of internal electrostatic potential and chemical potential, respectively. The same holds for electronic dc current density je. Nevertheless, the much larger diffusion coefficient (and thus mobility) of electrons than that of oxygen vacancies allows us to use the quasi-static approximation in that the electronic distribution and current retain their equilibrium at given distributions of oxygen vacancies and internal potential across the SRMC at any given time.

Integrating Eq. (3) over x at a given time t yields

$$\frac{{\mathrm{d}}{V}_{c}}{{{\mathrm{d}}t}}=\int_{0}^{{d}_{{{\mathrm{tot}}}}}{\left({\epsilon }_{r}{\epsilon }_{0}\right)}^{-1}{j}_{{{\mathrm{dc}}}}{{\mathrm{d}}x}-j\int_{0}^{{d}_{{{\mathrm{tot}}}}}{\left({\epsilon }_{r}{\epsilon }_{0}\right)}^{-1}{{\mathrm{d}}x}$$
(4)

Note that the dielectric constant \({\epsilon }_{r}\) is a function of x given the multilayer structure, and the total current j is independent of x according to the KCL. Differentiating Eq. (2) with time t and subsequently entering Eq. (4) lead to

$$\frac{{\mathrm{d}}{V}_{{{\mathrm{ap}}}}}{{{\mathrm{d}}t}}=-A{R}_{{{\mathrm{ESR}}}}\frac{{dj}}{{{\mathrm{d}}t}}+\int_{0}^{{d}_{{{\mathrm{tot}}}}}{\left({\epsilon }_{r}{\epsilon }_{0}\right)}^{-1}{j}_{{{\mathrm{dc}}}}{{\mathrm{d}}x}-j\int_{0}^{{d}_{{{\mathrm{tot}}}}}{\left({\epsilon }_{r}{\epsilon }_{0}\right)}^{-1}{{\mathrm{d}}x}$$
(5)

Equation (5) was numerically solved using the finite different method, which allows us to discretize Eq. (5) in the time interval \({t}_{1}-\left({t}_{1}+\triangle t\right)\) as follows.

$${\triangle t}^{-1}\left[{V}_{{{\mathrm{ap}}}}\left({t}_{1}+\triangle t\right)-{V}_{{{\mathrm{ap}}}}\left({t}_{1}\right)\right]=-A{R}_{{{\mathrm{ESR}}}}{\triangle t}^{-1}\left[j\left({t}_{1}+\triangle t\right)-j\left({t}_{1}\right)\right]+{B}_{1}-{B}_{2}j\left({t}_{1}+\triangle t\right)$$
(6)

B1 in Eq. (6) is the integration of the dc current density jdc over axis x, which should be separately calculated across each layer because of the inhomogeneous dielectric constant distribution through the layers.

$${B}_{1}=\mathop{\sum}\limits_{i}\mathop{\sum}\limits_{j}{\left({\epsilon }_{r}^{\left(i\right)}{\epsilon }_{0}\right)}^{-1}{j}_{{{\mathrm{dc}}}}\left({t}_{1}+\triangle t\right){\triangle x}^{\left(j\right)},i\in \left\{{{{\mathrm{IL}}}}_{1},{{{\mathrm{HSO}}}}^{1},{{{\mathrm{Al}}}}_{2}{{\mathrm{O}}}_{3},{{{\mathrm{HSO}}}}^{2},{{{\mathrm{IL}}}}_{2}\right\}$$
(7)

B2 in Eq. (6) is the integration of \({\epsilon }_{r}{\epsilon }_{0}\) over axis x.

$${B}_{2}=\mathop{\sum}\limits_{i}{\left({\epsilon }_{r}^{\left(i\right)}{\epsilon }_{0}\right)}^{-1}{d}_{i},i\in \left\{{{{\mathrm{IL}}}}_{1},{{{\mathrm{HSO}}}}^{1},{{{\mathrm{Al}}}}_{2}{O}_{3},{{{\mathrm{HSO}}}}^{2},{{{\mathrm{IL}}}}_{2}\right\},$$
(8)

where di denotes the thickness of the layer i. Equation (5) is further arranged using Eq. (2):

$$j\left({t}_{1}+\triangle t\right)=-{\left(A{R}_{{{\mathrm{ESR}}}}+{B}_{2}\triangle t\right)}^{-1}\left[{V}_{{{\mathrm{ap}}}}\left({t}_{1}+\triangle t\right)-{V}_{c}\left({t}_{1}\right)-{B}_{1}\triangle t\right]$$
(9)

Therefore, the total current density j at the current time step t2 (= t1 + Δt) can be calculated using Eq. (9) with the voltage across the SRMC at the previous time step (Vc(t1)) and B1 at the current time step t2. The remaining task is to calculate B1.

As such, the dc current density jdc considers electronic je and ionic contributions \({z}_{{Vo}}q{j}_{{{\mathrm{dd}}}}^{{Vo}}\): \({j}_{{{\mathrm{dc}}}}={j}_{e}+{z}_{{{\mathrm{Vo}}}}q{j}_{{{\mathrm{dd}}}}^{{{\mathrm{Vo}}}}\), where zVo and q are the ionization number of an oxygen vacancy and elementary charge, respectively. Note that \({j}_{{{\mathrm{dd}}}}^{{{\mathrm{Vo}}}}\) is the flux of oxygen vacancies. Unlike the total current density j, the dc current density jdc varies along the layers because of the non-equilibrium distribution of oxygen vacancies at the applied voltage. This is due to the sluggish response of oxygen vacancies to the internal electric field because of their low diffusion coefficient DVo (and thus mobility MVo). The flux \({j}_{{{\mathrm{dd}}}}^{{{\mathrm{Vo}}}}\) is driven by the gradient of electrochemical potential of oxygen vacancies, \({\nabla \eta }_{{{\mathrm{Vo}}}}\left(={\nabla \mu }_{{{\mathrm{Vo}}}}+{z}_{{{\mathrm{Vo}}}}q\nabla V\right)\). The chemical potential of dilute oxygen vacancies is denoted by \({\mu }_{{{\mathrm{Vo}}}}\left(={\mu }_{{{\mathrm{Vo}}}}^{0}+{kT}{\rm{ln}}\left({c}_{{{\mathrm{Vo}}}}/{c}_{{{\mathrm{Vo}}}}^{0}\right)\right)\), where \({\mu }_{{{\mathrm{Vo}}}}^{0}\), \({c}_{{{\mathrm{Vo}}}}^{0}\), cVo, k, and T are the chemical potential and oxygen vacancy density at the reference state, oxygen vacancy density at a given state, Boltzmann constant, and lattice temperature, respectively.

In the first-order approximation (FOA), the oxygen vacancy flux \({j}_{{{\mathrm{dd}}}}^{{{\mathrm{Vo}}}}\) is given by

$${j}_{{{\mathrm{dd}}}}^{{{\mathrm{Vo}}}}=-{q}^{-1}{c}_{{{\mathrm{Vo}}}}{M}_{{{\mathrm{Vo}}}}{\nabla \eta }_{{{\mathrm{Vo}}}}$$
(10)

Considering the electrochemical potential gradient and Einstein relation, Eq. (10) becomes the celebrated drift-diffusion equation, \({j}_{{{\mathrm{dd}}}}^{{{\mathrm{Vo}}}}=-{D}_{{{\mathrm{Vo}}}}{\nabla {\rm{c}}}_{{{\mathrm{Vo}}}}-{z}_{{{\mathrm{Vo}}}}{c}_{{{\mathrm{Vo}}}}{M}_{{{\mathrm{Vo}}}}\nabla V\). However, the application of a high voltage to the SRMC of thin active layers (~4 nm) likely undermines the FOA, leading to the breakdown of FOA. In this case, we should consider the full velocity equation44:

$${v}_{{{\mathrm{Vo}}}}=\frac{{D}_{{{\mathrm{Vo}}}}}{{a}_{{{\mathrm{Vo}}}}}\left({e}^{-\frac{{a}_{{{\mathrm{Vo}}}}{\nabla \eta }_{{{\mathrm{Vo}}}}}{2{kT}}}-{e}^{\frac{{a}_{{{\mathrm{Vo}}}}{\nabla \eta }_{{{\mathrm{Vo}}}}}{2{kT}}}\right),$$
(11)

where aVo is the hopping distance of an oxygen vacancy. Note that the electric field involved in \({\nabla \eta }_{{{\mathrm{Vo}}}}\) in this equation is macroscopic electric field45. Equation (11) can further be arranged as

$${v}_{{{\mathrm{Vo}}}}=\frac{{D}_{{{\mathrm{Vo}}}}}{{a}_{{{\mathrm{Vo}}}}}{e}^{\frac{{a}_{{{\mathrm{Vo}}}}}{2{kT}}\left({z}_{{{\mathrm{Vo}}}}q\nabla {\rm{V}}+\nabla {\mu }_{{{\mathrm{Vo}}}}^{0}+{kT}\nabla {c}_{{{\mathrm{Vo}}}}\right)}\left({e}^{-\frac{{a}_{{{\mathrm{Vo}}}}}{{kT}}\left({z}_{{{\mathrm{Vo}}}}q\nabla {\rm{V}}+\nabla {\mu }_{{{\mathrm{Vo}}}}^{0}+{kT}\nabla {c}_{{{\mathrm{Vo}}}}\right)}-1\right)$$
(12)

Here, we consider the gradient of reference chemical potential \({\mu }_{{Vo}}^{0}\) that can be ignored when considering ion migration within a homogeneous medium. However, the reference state chemical potential may vary along the multilayer in our SRMC. Eventually, the oxygen vacancy flux \({j}_{{{\mathrm{dd}}}}^{{{\mathrm{Vo}}}}\left(={c}_{{{\mathrm{Vo}}}}{v}_{{{\mathrm{Vo}}}}\right)\) can be calculated at all edges for given distributions of internal electrostatic potential and oxygen vacancy density. Inversely, the oxygen vacancy flux distribution determines the oxygen vacancy density according to the Fick’s second law, \(d{c}_{i}/{{\mathrm{d}}t}=-\nabla {j}_{{{\mathrm{dd}}}}^{i}\), so that the oxygen vacancy flux and density are in a self-consistent relation at a given distribution of internal electrostatic potential, which can be calculated iteratively. The Fick’s second law was solved using the finite difference method, and the self-consistent relation was retained using the Newton-Raphson method. We used asymmetric boundary conditions such that the BE/HSO2 interface forms blocking contact for oxygen vacancy while the TE serves as an oxygen vacancy reservoir by maintaining a particular oxygen vacancy density value.

As stated, we applied the quasi-static approximation to electronic distribution and dc current density regarding the large difference in diffusion coefficient (and thus mobility) between electron and oxygen vacancy. That is, the electronic distribution and current density are at equilibrium at any given time step, leading to space-invariant dc current density \(\left(\nabla {j}_{e}=0\right)\) and the consequent time-invariant density \(\left(d{c}_{e}/{{\mathrm{d}}t}=0\right)\) at given distributions of internal electrostatic potential and oxygen vacancy density. For the simplicity of modeling, we used the following approximations: (i) both BE and TE conform to the free electron model with spherical Fermi surface, and (ii) both HSO1 and HSO2 have single conduction band minima. Given the experimentally observed thermally activated current, we considered the conduction band conduction of free electrons in the insulating trilayer structure:

$${j}_{{\mathrm{e}}}=n{M}_{{\mathrm{e}}}{\nabla {\rm{\epsilon }}}_{{\mathrm{f}}},$$
(13)

where n, Me, and \({{\rm{\epsilon }}}_{{\mathrm{f}}}\) denote free electron density, electron’s mobility, and Fermi level (electron’s electrochemical potential), respectively. The free electron density n is calculated from the density of states of electrons \(g\left(\epsilon \right)=8\pi {\left(2{m}_{{\mathrm{e}}}^{3}\right)}^{1/2}{h}^{-3}{\left(\epsilon -{\epsilon }_{{\mathrm{f}}}\right)}^{1/2}\) and the Fermi-Dirac distribution function \(f\left(\epsilon \right)\).

$$n=\int_{{\epsilon }_{c}}^{{{\infty }}}g\left(\epsilon \right)f\left(\epsilon \right)d\epsilon ,$$
(14)

where \({\epsilon }_{c}\) is the conduction band minimum. We assumed that the Al2O3 diffusion barrier is sufficiently thin to have negligible effects on the overall electron transport through the SRMC.

In the steady-state, the electronic current in Eq. (13) equals the interfacial electronic currents at bottom and top interfaces (je(B) and je(T), respectively), i.e., je(B) = je = je(T). The bottom interfacial current je(B) is the difference between electrode-to-insulator current (je(B)MI) and insulator-to-electrode current (je(B)IM), je(B) = je(B)MIje(B)IM.

The current je(B)IM is attributed to the electron transport from BE to insulator. Note that the current direction is opposite to the electron transport direction. Given that the experimental observations indicate the thermal activation of electron transport, we modeled je(B)IM using the Schottky emission equation, which considers the Schottky barrier height (SBH) lowered by image force when the interfacial electric field Eint is negative.

$${j}_{{\mathrm{e}}\left({\mathrm{B}}\right){{\mathrm{IM}}}}=-{A}^* {T}^{2}{e}^{-q{\phi }_{b}/{kT}},$$

where A* means the modified Richardson constant. The SBH \({\phi }_{b}\) with image force is given by

$${\phi }_{b}={\phi }_{b}^{0}-{\left(\frac{q}{4{\epsilon }_{r}^{\left(i\right)}{\epsilon }_{0}}{{\max }}\left(0,-{E}_{{{\mathrm{int}}}}\right)\right)}^{1/2},$$

where \({\phi }_{b}^{0}\) is the band offset (difference between the electron affinity of the insulator and work function of the cathode electrode). The function max(0,-Eint) indicates a max function which is used to include the barrier lowering effect only when the interfacial field is negative. The opposite current je(B)MI should be considered to keep a balance against je(B)IM. This current was modeled as \({j}_{{\mathrm{e}}\left({\mathrm{B}}\right){{\mathrm{MI}}}}=q{v}_{{{\mathrm{MI}}}}{n}_{{\mathrm{B}}}\), where \({v}_{{{\mathrm{MI}}}}={A}^{\ast }{T}^{2}{\left(q{N}_{{\mathrm{c}}}\right)}^{-1}\). The electron density at the bottom interface nB can be calculated using Eq. (14). The effective density of states of is denoted by Nc. Finally, we can calculate the electron current je(B) (= je(B)MIje(B)IM). The same holds for the top interfacial current je(T). Yet, the direction should be opposite; je(T) = je(T)IMje(T)MI.

The algorithm for resistive switching dynamics calculation is detailed in the following pseudocode.

Algorithm 1

Resistive switching dynamics.

Input: internal voltage profile V, oxygen vacancy profile cVo, total current density j, applied voltage Vap at time step t-1

Output: internal voltage profile V, oxygen vacancy profile cVo, total current density j at time step t

Parameters: iteration error minima δ(l)0 and δ(m)0

Evaluate j, jdc, V at time step t and applied voltage Vap:

while δ(l) > δ(l)0 do

      calculate je at given V(l)

      while δ(m) > δ(m)0 do

            calculate jdd(m) at given cVo(m), V(l)

            calculate cVo (m+1) using jdd(m) at given V(l) s.t. δ(m) (=|cVo(m) - cVo(m+1)|) decreases

            δ(m) ← |cVo (m)cVo(m+1)|

    calculate jdc(l) using je and jdd(m) at given V(l)

    calculate V(l+1) using jdc(l) s.t. δ(l) (=|V(l+1)−V(l)|) decreases

    δ(l) ← |V(l+1)V(l)|

    calculate j(l+1) using jdc(l+1) and V(l+1)

j[t] ← j(l+1)

jdc[t] ← jdc(l+1)

V[t] ← V(l+1)

Electrical Measurements of 160 × 160 and 320 × 320 CAs

Unlike the 30 × 30 CA, the 160 × 160 and 320 × 320 CAs are not random-accessible because the number of row- and column-lines to be connected exceeds the number of currently available probes. Nevertheless, we programmed the whole CA to identify the effect of sneak current on the single predefined cell under measurement by tying all row-lines (common row-inhibit-line) except the signal row-line and all column-lines except (common column-inhibit-line) the signal column as shown in Supplementary Fig. 5. We attempted to program the unselected group 2 cells into their LRS by pulling up the common row-inhibit-line and pulling down the common column-inhibit-line. Likewise, an attempt to program the unselected group 1 into their LRS was made by pulling up the signal column-line (Common column-inhibit-line) while the common row-inhibit-line (Signal row-line) is grounded. However, the LRS could not directly be verified on individual cells because of the lack of random-accessibility. Instead, the success in set switching was indirectly verified such that the total current through the cells in each group was compared with the current extrapolated from a single cell. The measured value was comparable to the extrapolated value, indirectly identifying the success in programming.

The IV measurement on the predefined selected cell was conducted by applying a stare-case voltage (Vop) to the signal column-line while (i) the signal row-line was grounded and (ii) stare-case inhibit voltages (Vinhibit1, Vinhibit2) were applied to the common row- and column-inhibit-lines, respectively. The step-wise changes of Vop, Vinhibit1, and Vinhibit2 were synchronized.