Recurrent Neural Networks Made of Magnetic Tunnel Junctions

Artificial intelligence based on artificial neural networks, which were originally inspired by the biological architecture of the human brain, has mostly been realized in software executed on conventional von Neumann computers, where the so-called von Neumann bottleneck limits the execution efficiency because the computing and storage units are separate. Therefore, a suitable hardware platform that can exploit all the advantages of brain-inspired computing is highly desirable. Based upon micromagnetic simulation of the magnetization dynamics, we demonstrate theoretically and numerically that recurrent neural networks consisting of as few as 40 magnetic tunnel junctions can generate and recognize periodic time series after they are trained with an efficient machine-learning algorithm. With ultrahigh operating speed, nonvolatile memory, high endurance, and reproducibility, spintronic devices are promising hardware candidates for neuromorphic computing.

a) Electronic mail: zyuan@bnu.edu.cn

In the past decade, significant progress has been made in artificial intelligence, where advanced algorithms using artificial neural networks (ANNs) have been successfully applied in image recognition, data classification, and other areas [1,2]. As an impressive example, the deep learning technique has shown an overwhelming advantage in the confrontation between a human and a computer in the game of Go [3-5]. ANNs, which simulate the biological architecture of the human brain, possess the intrinsic advantages of the brain, including parallel computation, distributed storage, and low energy consumption. Nevertheless, these advanced algorithms are mostly implemented in software and are still executed on conventional computers with the von Neumann architecture, where the advantages of brain-inspired computing are not fully exploited [5]. There have been many attempts to design and fabricate neuromorphic hardware devices [6-9], which are not limited by the von Neumann bottleneck and intrinsically possess all the aforementioned advantages. Neuromorphic chips using standard CMOS circuits, such as the IBM TrueNorth consisting of billions of transistors, can perform brain-inspired computing with remarkably low power [10,11]. Magnetic materials, however, have the potential to further increase the energy efficiency and areal density of devices by several orders of magnitude. One example is random number generation, where the most energy-efficient CMOS implementation consumes 2.9 pJ/bit and occupies a circuit area of 4004 µm² [12], whereas a device based on magnetic tunnel junctions (MTJs) costs only 20 fJ/bit and 2 µm² in area [13]. In addition, memristors made of resistive and phase-change materials have attracted much attention in the realization of ANNs [14-18]. Compared to resistive and phase-change memristors, magnetic materials have faster dynamics at a time scale of nanoseconds and a high endurance of more than 10¹⁵ cycles for magnetization switching [19-21]. More importantly, the magnetization dynamics can be well described by the phenomenological Landau-Lifshitz-Gilbert equation [22,23], which has been examined over the past half century in the research communities of magnetism and spintronics. Recently, spintronics-based brain-inspired computing was used to realize the Hopfield model of memory [24]. Sound recognition can be significantly improved with the help of spin-torque nano-oscillators [25,26], whose nonlinear magnetization dynamics with memory is essential to capture the distinct acoustic features encoded in frequencies. A voltage-controlled stochastic spintronic device has been implemented experimentally, where the stochastic behavior of magnetic switching is incorporated in an ANN to recognize handwritten digits [27]. To date, most spintronic devices for brain-inspired computing have been applied to the recognition of static images or patterns, and little is known about their capability for temporal signal processing.
Reservoir computing is particularly suitable for encoding time series [28,29]; the reservoir is physically a recurrent neural network (RNN) [30]. The sparse and usually random connections among the neurons in the RNN ensure the capability to describe sufficiently complex functions [31]. The relatively simple structure is another advantage of the RNN for hardware implementation [32,33]. In this paper, we report a spintronic realization of RNNs with MTJs, which are used as the basic units of spin-transfer-torque magnetic random access memory. The nonlinear magnetization dynamics of an MTJ driven by an electrical current allows us to replace one neuron in the RNN by a single MTJ. By performing micromagnetic simulations, we demonstrate that an RNN consisting of as few as 40 MTJs can generate and recognize periodic time series, and that the performance of the network can be significantly improved by increasing the number of MTJs.
We consider 40 MTJs as artificial neurons, which are randomly and sparsely connected with one another via either positive or negative unidirectional synapses, as schematically illustrated in Fig. 1(a). An MTJ consists of two thin magnetic layers separated by an insulating layer, as plotted in Fig. 1(b). The bottom layer has a fixed in-plane magnetization, usually pinned by an antiferromagnetic material via exchange bias [34]. The magnetization of the top (free) layer in the MTJ can be excited by an injected electrical current and follows the Landau-Lifshitz-Gilbert equation in the presence of current-induced spin-transfer torques [35-37]. Here, m is the magnetization direction of the free layer, and H_eff is the effective magnetic field, including the exchange, anisotropy, and demagnetization fields. The gyromagnetic ratio γ and the Gilbert damping α are both material parameters. The last two terms in Eq. (1) are the adiabatic and nonadiabatic spin-transfer torques, respectively, where m_p denotes the magnetization direction of the fixed magnetic layer, and the magnitude of the torque, τ = γPj/(µ_0 e M_s t), depends on the current polarization P, the current density j, the saturation magnetization M_s, and the free-layer thickness t. The Slonczewski parameter, Λ²/[(Λ² + 1) + (Λ² − 1) m · m_p], characterizes the angular dependence of the torque with the dimensionless parameter Λ ∈ [0, 1]. β is the nonadiabaticity of the spin-transfer torque and is usually much smaller than one.
In this work, the dynamic equation is solved numerically using the micromagnetic simulation program MuMax3 [38]. The magnetization of the free layer, which is perpendicular to that of the fixed layer at equilibrium, starts to precess about its easy axis when an external current is applied. Therefore, the total resistance R of the MTJ, which depends on the relative magnetization orientation of the two magnetic layers, oscillates as a function of time. In the regime of a small current density, the amplitude of the oscillation decays gradually [see Fig. 1(c)], and the output signal of the artificial neuron is quantitatively defined as the difference between the last maximum and minimum values of the resistance (∆R) within 1.5 ns after the electrical current is injected.
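For orientation, the Landau-Lifshitz-Gilbert equation with Slonczewski-type torques is commonly written in a form like the following. This is a sketch matched to the symbols defined above and is not necessarily the exact form of Eq. (1): the symbol ε for the Slonczewski parameter, the signs of the two torque terms, and the placement of ε in the last term are assumptions that should be checked against Refs. 35-37 and the MuMax3 documentation.

```latex
\begin{equation*}
\frac{\partial \mathbf{m}}{\partial t}
  = -\gamma\, \mathbf{m} \times \mathbf{H}_{\mathrm{eff}}
    + \alpha\, \mathbf{m} \times \frac{\partial \mathbf{m}}{\partial t}
    + \tau \epsilon\, \mathbf{m} \times \left( \mathbf{m}_p \times \mathbf{m} \right)
    + \tau \epsilon \beta\, \mathbf{m} \times \mathbf{m}_p ,
\qquad
\epsilon = \frac{\Lambda^2}{(\Lambda^2 + 1) + (\Lambda^2 - 1)\, \mathbf{m} \cdot \mathbf{m}_p} .
\end{equation*}
```

In this form the first two terms describe precession about the effective field and Gilbert damping, and the last two are the current-driven torques whose common magnitude is set by τ.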
In this way, the driving force of the magnetization precession, i.e., the injected electrical current density, can be defined as the input of the artificial neuron, while the resulting resistance oscillation amplitude ∆R corresponds to the response or output. If one increases the current density, ∆R increases monotonically and nonlinearly. Using micromagnetic simulation, we can determine the nonlinear response function of the artificial neuron, which is plotted in Fig. 1(d). We choose an electrical current density in the range [0.8, 1.0] × 10¹⁰ A/m² and a corresponding ∆R ∈ [114, 145] Ω in this work. Both the input and output are normalized to the range [0, 1] when we consider the signal transfer among MTJs; see Supplementary Material. Owing to the small range of j that we choose in this work, the resistance change ∆R is very small compared with the equilibrium value R_0 = 11.05 kΩ. In practice, ∆R/R_0 can be increased up to 20% by applying a larger current density, as shown in the inset of Fig. 1(d).
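The normalization of the neuron input and output can be illustrated with a short sketch. It assumes a simple linear map between the physical ranges quoted above and [0, 1]; the actual mapping is specified in the Supplementary Material, and the function names and the clipping of out-of-range inputs are hypothetical choices made here for illustration.

```python
# Physical ranges quoted in the text.
J_MIN, J_MAX = 0.8e10, 1.0e10    # current density, A/m^2
DR_MIN, DR_MAX = 114.0, 145.0    # resistance oscillation amplitude, ohm


def normalize(value, lo, hi):
    """Linearly map a physical value in [lo, hi] to [0, 1] (assumed linear)."""
    return (value - lo) / (hi - lo)


def denormalize(u, lo, hi):
    """Linearly map a normalized value in [0, 1] back to [lo, hi]."""
    return lo + u * (hi - lo)


def neuron_input_to_current(u):
    """Convert a normalized neuron input to a current density in A/m^2.

    Clipping to [0, 1] is an assumption of this sketch; it keeps the
    current inside the range for which the response was characterized.
    """
    u = min(max(u, 0.0), 1.0)
    return denormalize(u, J_MIN, J_MAX)


def resistance_to_output(delta_r):
    """Convert a simulated Delta R (ohm) to a normalized neuron output."""
    return normalize(delta_r, DR_MIN, DR_MAX)
```

For example, a normalized input of 0.5 corresponds to a current density of 0.9 × 10¹⁰ A/m², the midpoint of the chosen range.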
Every MTJ is connected to the "output neuron" of the RNN via two synapses: one transfers the output signal of the MTJ to the "output neuron" (w_out), and the other provides feedback from the "output neuron" to the MTJ (w_f). Here, only the weights w_out are varied during the "learning" process, while the synapses within the RNN and the feedback synapses w_f are all fixed. Such an RNN can maintain time-dependent activation through the mutual interactions of neurons even without an external input. The detailed parameters and learning algorithm of this network can be found in Supplementary Material.
As illustrated in Fig. 1(a), the weighted summation over the output signals of all the MTJs is defined as the output of the RNN. We first let the RNN generate a target sinusoidal function f(t) = A sin(2πt/T) with A = 0.87 and T = 45 ns. A very efficient algorithm called the "force-learning scheme" [39] is applied, in which the weights w_out are tuned according to the error between the RNN output and the desired target function. In practice, the RNN output follows the target function very quickly. As shown in Fig. 2(a), after 3000 ns the output already coincides with the target function. After learning for 4500 ns, we no longer vary w_out, and the network sustains the generation of the same function as its output. This demonstrates the success of the learning scheme in this artificial RNN, which consists of merely 40 MTJs.
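The weight update behind such a scheme can be sketched as a recursive-least-squares step, the rule commonly used in force learning. The sketch below is illustrative only: the MTJ response is replaced by a hypothetical tanh surrogate rather than a micromagnetic simulation, and the connectivity, gain, time step, and the names force_update and train_sine are assumptions, not parameters taken from the Supplementary Material. Whether the surrogate network converges depends on these assumed values.

```python
import numpy as np


def force_update(w_out, P, r, target):
    """One recursive-least-squares update of the readout weights.

    w_out: readout weights, P: running estimate of the inverse
    correlation matrix of r, r: current neuron outputs.
    Returns the updated weights, updated P, and the pre-update error.
    """
    err = w_out @ r - target          # output error before the update
    Pr = P @ r
    k = Pr / (1.0 + r @ Pr)           # gain vector
    P_new = P - np.outer(k, Pr)       # rank-1 update keeps P symmetric
    w_new = w_out - err * k           # move the output toward the target
    return w_new, P_new, err


def train_sine(n=40, steps=3000, dt=1.5, A=0.87, T=45.0, seed=0):
    """Train a surrogate RNN to output A*sin(2*pi*t/T) (t in ns)."""
    rng = np.random.default_rng(seed)
    p = 0.1                                        # assumed sparsity
    mask = rng.random((n, n)) < p
    scale = 1.5 / np.sqrt(p * n)                   # assumed gain
    J = np.where(mask, rng.normal(0.0, scale, (n, n)), 0.0)  # fixed recurrent synapses
    w_f = rng.uniform(-1.0, 1.0, n)                # fixed feedback synapses
    w_out = np.zeros(n)                            # trainable readout weights
    P = np.eye(n)
    r = rng.uniform(0.0, 1.0, n)
    errors = np.empty(steps)
    for i in range(steps):
        z = w_out @ r                              # RNN output fed back to neurons
        r = np.tanh(J @ r + w_f * z)               # surrogate neuron response (not an MTJ model)
        target = A * np.sin(2.0 * np.pi * i * dt / T)
        w_out, P, err = force_update(w_out, P, r, target)
        errors[i] = abs(err)
    return w_out, errors
```

A single call to force_update always reduces the magnitude of the output error for the current neuron state, by the factor 1/(1 + rᵀPr), which is why the output can follow the target quickly once training starts.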
At t = 5700 ns, we abruptly change the amplitude and period of the target sinusoidal function to A = 0.65 and T = 80 ns. At the same time, the force-learning algorithm is launched again to train the network by tuning the weights w_out. The output deviates significantly from the new target function immediately after 5700 ns, but the two coincide after 3000 ns of learning. We turn off the learning process at t = 8700 ns, and the RNN steadily generates the new sinusoidal function afterwards.
More complex time series can be learned using an RNN with more MTJs. For instance, by defining two-dimensional coordinates x and y, which are both time-dependent functions, we can reproduce handwritten Chinese characters. As schematically illustrated in Fig. 3(a), we construct an RNN with 800 MTJs and two output nodes for x and y. Instead of using feedback from the output nodes, we introduce two input nodes, where the ideal target functions are imported to the RNN to increase the learning efficiency. Moreover, we allow tunability of the random and sparse connections in the RNN to improve its flexibility and transferability, because a single RNN is used to generate the two coordinates simultaneously.
We choose the Chinese character meaning "teacher" written with a writing brush, as shown in Fig. 3(b). Two functions of time, x(t) and y(t), are defined in a two-dimensional coordinate system to follow the stroke order of this character. Since both the connections in the RNN and the output weights are adjusted in the learning process, we employ the so-called innate training algorithm [40]. Such an algorithm is more robust and converges more efficiently. In addition, the innate training algorithm is in practice highly resistant to noise and perturbations. Its implementation has two steps. In the first step, the connection weights inside the RNN are tuned to allow every MTJ to have its own sustained response to a pulse input. This training succeeds when the sustained response becomes invariant for different initial conditions of the MTJs. This step is essential to improve the robustness of the network and to make it insensitive to noise.
Having adjusted the connection weights in the RNN, we next apply the force-learning algorithm to tune the output weights w_out1 and w_out2. In this step, the target periodic functions x(t) and y(t), which are implemented with a period of 170 ns, are imported from the two input nodes. After training for 10 periods, the output is plotted in Fig. 3(c), which successfully reproduces the handwritten Chinese character.
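One way to obtain target functions x(t) and y(t) of this kind is to resample a digitized stroke path at uniform time steps over one period. The sketch below is only an illustration of that idea: the constant pen speed, the neglect of pen lifts between strokes, the number of samples, and the function name stroke_to_targets are all assumptions and do not describe how the targets were actually prepared.

```python
import numpy as np


def stroke_to_targets(points, period_ns=170.0, n_samples=170):
    """Resample an ordered stroke path into x(t) and y(t) over one period.

    points: sequence of (x, y) pairs listed in stroke order, with no
    repeated consecutive points. The pen is assumed to move at constant
    speed along the path (an assumption of this sketch).
    """
    pts = np.asarray(points, dtype=float)
    # Cumulative arc length along the polyline.
    seg = np.hypot(np.diff(pts[:, 0]), np.diff(pts[:, 1]))
    s = np.concatenate([[0.0], np.cumsum(seg)])
    # Uniform time grid over one period, mapped linearly to arc length.
    t = np.linspace(0.0, period_ns, n_samples, endpoint=False)
    s_query = t / period_ns * s[-1]
    x = np.interp(s_query, s, pts[:, 0])
    y = np.interp(s_query, s, pts[:, 1])
    return t, x, y
```

Repeating the returned arrays end to end gives periodic targets that can be fed to the two input nodes and compared with the two outputs.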
In addition to the generation of periodic functions, an RNN made of MTJs can also be applied to the recognition of time series. The structure of the network is shown in Fig. 4(a), where a time-dependent function is imported to the RNN from the input node. After the output weights w_out of the network are adjusted, the output responds differently to different input functions. To demonstrate recognition by the RNN, we input two simple functions into the RNN: a square wave 2 sgn[sin(ω_1 t)] and a sinusoidal function sin(ω_2 t) with ω_1 = 0.16 GHz and ω_2 = 0.21 GHz. The designed target functions for the sinusoid and the square wave are +1 and −1, respectively.
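The two input classes and their targets can be sketched as follows. The sketch takes the 60 ns window length and the roughly 5% noise level from the simulation described in this section, and it assumes that ω_1 and ω_2 act as angular frequencies in rad/ns, that the noise is uniformly distributed, and that a window is classified by the sign of its time-averaged output. The sampling step and the names make_input and classify are hypothetical.

```python
import numpy as np

OMEGA_SQUARE = 0.16   # omega_1, interpreted here as rad/ns (assumption)
OMEGA_SINE = 0.21     # omega_2, interpreted here as rad/ns (assumption)
WINDOW_NS = 60.0      # one input function is presented per 60 ns window


def make_input(kind, dt=0.5, noise=0.05, rng=None):
    """Return (t, noisy_signal, target) for one 60 ns input window."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(0.0, WINDOW_NS, dt)
    if kind == "square":
        clean = 2.0 * np.sign(np.sin(OMEGA_SQUARE * t))
        target = -1.0
    elif kind == "sine":
        clean = np.sin(OMEGA_SINE * t)
        target = 1.0
    else:
        raise ValueError("kind must be 'square' or 'sine'")
    # Uniform noise with about 5% of the signal magnitude (assumed distribution).
    amplitude = np.max(np.abs(clean))
    noisy = clean + noise * amplitude * rng.uniform(-1.0, 1.0, t.size)
    return t, noisy, target


def classify(outputs):
    """Label a window by the sign of the time-averaged RNN output."""
    return "sine" if np.mean(outputs) > 0.0 else "square"
```

During training, the readout weights would be adjusted so that the RNN output approaches the returned target within each window; afterwards only classify is applied to the recorded output.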
Every 60 ns, we input one type of function, either the sinusoid or the square wave, as plotted in Fig. 4(b). Here, random noise of approximately 5% of the magnitude of the function is superposed on the function. Then, we adjust the output weights w_out to let the output of the RNN match the required target function [Fig. 4(c)]. Such learning is performed 60 times in the first 3600 ns, and the weights are then fixed for later recognition. To avoid the influence of the previous recognition, we deliberately reset all the MTJs at the beginning of every recognition process, i.e., the initial output is always 0. The real RNN output is plotted in Fig. 4(d) as a function of time. Based on the averaged output value, the RNN successfully recognizes all 100 input waves (from 3600 ns to 9600 ns).
The numerical simulation we have done so far is a proof of concept, and hence MTJs with fast precession are chosen in the simulation to reduce the computational cost. In experiment, lower-frequency precession may be preferred, which can be achieved by using a vortex magnetization in the free layer [26]. Moreover, the high-frequency resistance is technically difficult to measure, so the measurable voltage of the MTJs can be used as the neuron output, which is just another nonlinear function of the input current. One also needs to consider two possible difficulties in experiment, i.e., non-identical MTJs and phase noise; the details are provided in Supplementary Material. We show that an RNN made of 40 MTJs with 25 different sizes works as well as the RNN with identical MTJs. Phase noise is one of the key issues limiting the functionality and performance of MTJ-based dynamical devices [41]. The RNN output is indeed affected by the phase noise, but it can be systematically improved by increasing the number of MTJs in the RNN.
The synapses here are not realized using magnetic devices. Instead, we merely consider a hybrid system with the artificial neurons modeled by MTJs and an external storage for the synaptic weights. The implementation is analogous to present neuromorphic chips, where static random access memory is employed to store the adjustable synaptic weights [8-11]. There are several proposals for trainable artificial synapses made of magnetic and resistive materials in the literature [20,42], such as Hall bars consisting of perpendicular magnetic multilayers [24], spintronic memristors based on domain walls [43-46], and MTJ-based devices with multiple electrical resistances [19,47,48]. The technical challenge in applying these proposals is the effective and precise adjustment of the synaptic weights in the training process. Imperfections in the synapses are explicitly examined in Supplementary Material, including fluctuation of the synaptic weights, signal delay in the RNN, and failure to update part of the output synapses during the training process. Nevertheless, the RNN can still learn to generate the target function, indicating a high tolerance of the RNN for imperfect synapses.
We have demonstrated that the generation and recognition of time series can be achieved by an MTJ-based recurrent neural network. Using micromagnetics to simulate the magnetization dynamics of the MTJs, we have shown that the RNN can learn to generate an arbitrary periodic function. With enough MTJs, an RNN can even be trained to reproduce a handwritten character of the Chinese writing system. The recognition of different time-dependent functions has also been successfully performed using such a network. Moreover, this MTJ-based RNN is found to have a high tolerance to size dispersion of the MTJs. In time series recognition, such an RNN is resistant to noise in the input signals.
The demonstration of this spintronic implementation of neuromorphic computing suggests that MTJs are very promising candidates for artificial neurons. Owing to their low energy cost and small geometric size, magnetic devices are expected to significantly improve the energy efficiency and integration density of neuromorphic devices. MTJs have ultrafast dynamics in the nanosecond regime and a high endurance of more than 10¹⁵ cycles because magnetization dynamics does not involve any atomic motion, unlike in diffusive memristors. In addition, MTJs can be naturally integrated with artificial synapses made of non-volatile magnetic memories, such that all-magnetic/spintronic neuromorphic chips can eventually be achieved. The proposed magnetic synapses also attract great attention in research [19,20,24,42-48], where the synaptic weight needs to be precisely adjusted during learning.