Paper The following article is Open access

Magnetic tunnel junction random number generators applied to dynamically tuned probability trees driven by spin orbit torque

, , , , , , , , and

Published 23 April 2024 © 2024 The Author(s). Published by IOP Publishing Ltd
, , Focus on Neuromorphic and Non-Boolean Computing with Nanomagnets Citation Andrew Maicke et al 2024 Nanotechnology 35 275204 DOI 10.1088/1361-6528/ad3b01

0957-4484/35/27/275204

Abstract

Perpendicular magnetic tunnel junction (pMTJ)-based true-random number generators (RNGs) can consume orders of magnitude less energy per bit than CMOS pseudo-RNGs. Here, we numerically investigate with a macrospin Landau–Lifshitz-Gilbert equation solver the use of pMTJs driven by spin–orbit torque to directly sample numbers from arbitrary probability distributions with the help of a tunable probability tree. The tree operates by dynamically biasing sequences of pMTJ relaxation events, called 'coinflips', via an additional applied spin-transfer-torque current. Specifically, using a single, ideal pMTJ device we successfully draw integer samples on the interval [0, 255] from an exponential distribution based on p-value distribution analysis. In order to investigate device-to-device variations, the thermal stability of the pMTJs are varied based on manufactured device data. It is found that while repeatedly using a varied device inhibits ability to recover the probability distribution, the device variations average out when considering the entire set of devices as a 'bucket' to agnostically draw random numbers from. Further, it is noted that the device variations most significantly impact the highest level of the probability tree, with diminishing errors at lower levels. The devices are then used to draw both uniformly and exponentially distributed numbers for the Monte Carlo computation of a problem from particle transport, showing excellent data fit with the analytical solution. Finally, the devices are benchmarked against CMOS and memristor RNGs, showing faster bit generation and significantly lower energy use.

Export citation and abstract BibTeX RIS

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

The generation of large, high-quality sets of random numbers is critical for many modern computing purposes, such as the modelling of quantum or high-energy physics systems, precise simulation of climate models, optimization, cryptography, and others. Traditionally, this has been achieved using deterministic computing architecture, which includes von Neumann machines such as clusters of graphics processing units or central processing units. This has two implications for the production of random numbers. First, the numbers are not truly stochastic in nature, and thus are more accurately referred to as 'pseudo'-random numbers. And second, significant energy and computational resources are required to appropriately emulate stochastic behavior using only deterministic inputs and operations on CMOS technology [1, 2].

Probabilistic approaches to computing have gained interest in recent years as potential solutions to these problems [3, 4]. For example, it is possible to leverage the inherent stochasticity of nanoscale devices such as the magnetic tunnel junction (MTJ) to produce probabilistic behavior at the device level, without needing complex mathematical functions or computer hardware to convert deterministic inputs into pseudo-random outputs [58]. This allows for 'true-random number' generation (tRNG), where the resulting random bitstreams are the results of directly sampling stochastic thermal processes.

The MTJ has been studied as a stochastic bit using various knobs to sample or control its stochasticity. An individual sampling of an MTJ is treated as a 'coinflip', i.e. the MTJ state at a given time is interpreted to be in one of two binary positions 'heads' and 'tails'. The MTJ p-bit has a low energy barrier between its two resistance states, sampled as it thermally fluctuates in time; as spin transfer torque switching of the MTJ free layer is stochastic, choosing the correct switching current can produce a 50:50 probability of switching. To avoid dependence on temperature, instead the MTJ energy barrier can be lowered and then raised in time to produce random binary bits, evaluated in our previous work [9]. The probability for the MTJ to be in either state when sampling can be controlled via direct application of spin transfer torque, spin–orbit torque, voltage-controlled magnetic anisotropy [5, 6], or a combination thereof; by taking a sequence of coinflips with appropriate biasing between samples, it is possible to assemble sets of true-random numbers.

While using MTJs to produce bitstreams of true-random numbers has been investigated, a less explored property of the MTJ as an RNG is the ability to dynamically tune the MTJ switching probability. This opens up the ability to sample from arbitrary probability distributions with the help of a probability tree [10]. This framework allows direct sampling of arbitrary distributions without needing to spend resources converting uniformly-distributed random data or use techniques such as rejection sampling to ensure bit quality, making it more energy efficient.

In this work, we use physics-based simulations to demonstrate the use of dynamically tunable MTJ RNGs for sampling from a probability distribution. A probability tree for the exponential distribution is chosen due to the usefulness ubiquity of such a distribution in physics applications [1113]. Using a single, ideal device it is shown via p-value analysis that the drawn samples are reasonably sampled from the desired distribution. Since the precision of such a probability distribution is essential to many of the applications, thermal energy cycle-to-cycle variation as well as device-to-device variation of state-of-the-art fabricated MTJ arrays [14, 15] are studied and the resulting effects on true-random number generation are shown. Specifically,when using a single, varied device the ability to accurately draw samples from the desired distribution is diminished when the variation gets too large; fortunately, this can be rectified by using a collection of varied devices to generate the entire random dataset. Finally, the framework is used to draw exponential and uniform random numbers for use in a Monte Carlo physics simulation, where excellent fit with the analytical data is shown. Comparing to CMOS pRNG and tRNG, several orders of energy savings are achieved.

The paper is divided as follows: section 2 describes the physical operation of the MTJ coinflip device in the probability tree to construct a random number pulled from a prescribed probability distribution. Section 3 describes the numerical model used for this investigation and the particular material and device properties simulated. Section 4 uses the ideal MTJ-based device to generate 8-bit integer samples from an exponential distribution. Section 5 introduces realistic device-to-device variations and analyzes the effects on the resulting sampled distributions, as well as mitigation strategies. Section 6 applies the described framework to random walk solutions for the computation of particle absorption. And finally, section 7 concludes the paper.

2. MTJ operation in the probability tree

A single MTJ is a nanoscale device with three critical layers: a ferromagnetic free layer (FL), an insulating layer, and a ferromagnetic pinned layer (PL) as shown in figure 1(a) with perpendicular magnetic anisotropy (PMA). The magnetization of the PL is held fixed, while the FL is allowed to rotate. When the magnetizations of the FL and PL are parallel (P), i.e. they point in the same direction, the MTJ is in a low resistance state. When the magnetizations of the two layers are anti-parallel (AP) the resistance of the MTJ is in a high resistance state. With an energy barrier between the states greater than thermal energy, each state is a local energy minimum, and the MTJ will remain in its state until application of an external signal.

Figure 1.

Figure 1. (a) PMA MTJ stack showing FL being held in-plane via applied SOT current between T1 and T3, and relaxing to P or AP states upon SOT current removal. Relaxation can be biased via application of STT current between T1 and T2. (b) Calculated ideal MTJ device S-curve shown in black, mapping desired weighting to applied STT current. Colored curves show S-curve distortion as MTJ parameters vary. (c) Mechanics of the tunable probability tree, showing a three-level tree operation with associated PDF and results as an example. Because the example distribution is an exponential, the weightings across branches on a given level are equal; for arbitrary PDFs this will not be the case.

Standard image High-resolution image

Considering the P and AP states as the two possible binary states, it is possible to construct an unbiased coinflip device by pulsing a spin–orbit current in a heavy metal layer beneath the FL between T1 and T3 as shown in figure 1(a). This rotates the magnetization of the FL to be in-plane, and perpendicular to that of the PL via the spin Hall effect [16]. Removing the spin–orbit current will cause the FL to stochastically relax back into either the P or AP states with equal probability at finite temperatures. Thus unlike a MTJ p-bit, the stochasticity of the SOT-MTJ device is driven via an external signal which temporarily lowers the energy barrier of the MTJ. After the removal of the external signal, the energy barrier of the device rises and the MTJ state can be read via external circuitry without needing to worry about errors introduced via random sampling. It is also possible to create a biased coinflip by applying a spin-polarized current between T1 and T2; the precise amplitude of the spin-polarized current sets the probability to fall into either the P or AP states. The relative weight given to each state as a result of the applied spin-polarized current is described by the sigmoidal device 'S-curve' [17], calculated in figure 1(b).

Interpreting the outputs of a series of unbiased coinflips specifically as the digits of a binary number, i.e. as 0's and 1's, allows for generation of true-random integers sampled from a uniform distribution. In order to sample from more complex distributions, a probability tree can be used to properly tune successive coinflips as shown in figure 1(c). Selecting reasonable lower and upper bounds to draw random samples from, the bias of the first coinflip is determined by dividing the probability distribution function (PDF) of the target distribution at the midpoint of the chosen bounds and the area-fractions underneath the PDF curve to the left and right of the division (readily calculable using the cumulative distribution function) represents the probability for the MTJ to relax into the P or AP states, respectively. This desired weighting is mappable to an applied spintransfer torque (STT) current via the device ideal S-curve. For the following coinflip, the PDF section corresponding to the result of the first flip (left side for P, right side for AP) is again split into two parts and used to determine the next bias. This process continues recursively down the tree until all desired bits have been sampled. Each leaf thus corresponds to a bin centered on the final drawn value; the width and number of available bins depends on the choice of initial lower and upper bounds, and on the number of levels in the probability tree. Figure 1(c) illustrates the process for generating random integers on the interval [0, 7] from an exponential distribution.

While a particular MTJ device can be characterized by an ideal S-curve that allows for accurate biasing with a STT current, inevitably device variations will be introduced in the manufacturing process [14]. Geometric imperfections, uneven growth rates, and other factors can cause device quantities such as the TMR, resistance-area product (RA), and thermal stability factor (Δ) to vary between devices fabricated on the same die. This leads to variations in the device S-curves, calculated as the colored lines in figure 1(b) by varying Δ within the range ±3%. As a result of these parameter variations, the sampled random bitstreams from these non-ideal devices will become biased [18, 19], and the random variables sampled via the probability tree may not belong to the intended target distribution. The impact of parameter deviations on the devices, as well as the result of using collections of devices with normally distributed variations to generate sets of randomly generator numbers, is studied in section 5.

3. Numerical model

To investigate the ability for SOT-driven MTJ devices to accurately sample numbers from arbitrary probability distributions using a tuned probability tree, the FL magnetization is modeled as a three-dimensional vector $\vec{m}$ obeying the time-dynamic Landau–Lifshitz-Gilbert equation (LLG) under a macrospin approximation. Including both STT and SOT terms, the LLG is written as:

Equation (1)

where $P$ is the spin-polarization, $\gamma $ the gyromagnetic ratio, $\alpha $ the Gilbert damping constant, ${\vec{m}}_{\mathrm{PL}}$ the magnetization of the PL, ${J}_{\mathrm{STT}}$ the amplitude of the biasing STT current density, and ${J}_{\mathrm{SOT}}$ the amplitude of the SOT current density through the heavy-metal layer, and ${\vec{\sigma }}_{\mathrm{SOT}}$ its direction. $\beta $ is defined as $\beta =\sqrt{\gamma \hslash /2{q}_{e}{t}_{f}{M}_{{\rm{S}}}}$ where ${q}_{e}$ is the electron charge, ${t}_{f}$ the FL thickness, and ${M}_{{\rm{S}}}$ the saturation magnetization of the FL and PL. The term ${\vec{H}}_{\mathrm{eff}}$ is an effective field term that includes both $\hat{z}$ directed PMA and stochastic thermal effects. Here, we focus on damping-like torques that produce switching dynamics in reasonable timescales. Equation (1) is discretized in time and solved with a time-integration scheme. The device simulated is a typical CoFeB PMA MTJ with MgO tunnel barrier; in [9], we previously showed that bitstreams generated by this type of SOT-MTJ device pass the National Institute of Standards and Technology uniformity tests [20]. Device parameters such as RA, TMR, and ${\rm{\Delta }}$ are based on data from fabricated devices in [14]; note that in order to maintain PMA under these conditions, the saturation magnetization (unreported in [14]) had to be sightly lowered past typical values, but is kept within realistic physical limits [21]. The full set of device and simulation parameters is listed in table 1, however other PMA devices are expected to work with the tuned probability tree framework provided accurate characterization of the ideal S-curve.

Table 1. Simulation parameters.

SymbolMTJ parameterValue
$\alpha $ Gilbert damping $0.03$
${M}_{{\rm{S}}}$ Saturation magnetization $6\times {10}^{5}{\rm{A}}\,{{\rm{m}}}^{-1}$
${\rm{\Delta }}$ Thermal stability factor $70$
${A}_{\mathrm{ex}}$ Exchange stiffness $4\times {10}^{-6}\,{\rm{erg}}\,{\rm{c}}{{\rm{m}}}^{-1}$
$\mathrm{RA}$ Resistance-area product $7\,{\rm{\Omega }}\cdot {\mu {\rm{m}}}^{2}$
$\mathrm{TMR}$ Tunnelling magnetoresistance ratio $150 \% $
$a$ MTJ diameter $50\times {10}^{-9}\,{\rm{m}}$
${t}_{\mathrm{FL}}$ Free layer thickness $1.1\times {10}^{-9}\,{\rm{m}}$
${t}_{\mathrm{MgO}}$ MgO layer thickness $1.5\times {10}^{-9}\,{\rm{m}}$
SymbolDevice operation parameterValue
${T}_{\mathrm{pulse}}$ Pulse time $10\times {10}^{-9}{\rm{s}}$
${T}_{\mathrm{relax}}$ Relax time $15\times {10}^{-9}{\rm{s}}$
${J}_{\mathrm{SOT}}$ SOT current density $5\times {10}^{11}{\rm{A}}\,{{\rm{m}}}^{-2}$
$P$ Spin polarization of tunnel current $0.6$
$T$ Temperature $300\,{\rm{K}}$

For demonstration of the method, integer samples are drawn on the interval [0, 255]. This requires 8 coinflips per sample, with each coinflip interpreted as the bit of an 8 bit integer number. The chosen probability distribution to draw random samples $x$ from is the exponential distribution, whose PDF is:

Equation (2)

where $\lambda \gt 0$ is the rate parameter. For this work, $\lambda $ is chosen to be 0.01, resulting in 92.19% of the expected draws to lay in the interval [0, 255]. As the exponential distribution is unbounded on the positive side of the real number line, there is some expected distribution error as a result of truncation of the sample domain, which will be explored in section 6.

To capture MTJ device variations, the thermal stability factor ${\rm{\Delta }}$ is randomly varied about the value 70 by $\pm 3 \% $ following characterization of fabricated devices in [14]. The thermal stability factor is related to both the energy barrier ${E}_{b}$ and the anisotropy energy density ${K}_{u}$ through the relation [22],

Equation (3)

where ${k}_{{\rm{B}}}$ is the Boltzmann constant, $T$ the device temperature and $V$ the FL volume. Certain other device variations, such as the MTJ critical dimension (MTJ diameter) are also expected to affect device performance, however we find ${\rm{\Delta }}$ has the most significant effect on the RNG behavior; see [23] for such an exploration of other parameters using reinforced learning. All simulations occur at $T=300{\rm{K}};$ operation at different temperatures shifts the device S-curve and temperature variation of the SOT-MTJ modelled as an effective field was studied in our previous work [9]. The effects of Joule heating due to the applied bias current are not included in the model, however these effects would be captured during device characterization (S-curve generation) and therefore are not expected to diminish the ability to accurately bias the device.

4. Drawing samples

Figure 2 shows the result of using the ideal MTJ device to draw 100k integer samples from an exponential distribution using the probability tree. The blue line in figure 2(a) represents the relative frequency of the sampled draws for each integer in the set [0, 255]. It can be seen that, with some apparent noise due to the random nature of the sampling, the observed MTJ-sampled distribution frequency closely follows that of the expected distribution. In order to evaluate the quality of the sampled distribution, a ${\chi }^{2}$ parameter is computed via

Equation (4)

where $N=256$ is the number of possible output numbers (degrees of freedom), ${O}_{i}$ is the sampled (observed) frequency of each number, ${E}_{i}$ the expected frequency calculated from the exponential PDF, and $S$ the total number of samples drawn. The ${\chi }^{2}$ parameter is therefore a sum of squared deviations between the observed and expected results; a small ${\chi }^{2}$ implies little deviation, while a large ${\chi }^{2}$ implies there is a meaningful difference between the observed data ${O}_{i}$ and the expected data ${E}_{i}.$ The ${\chi }^{2}$ parameter is used with (one-sided, right-tailed) $p$-value testing to determine the probability of obtaining a result with at least as much deviation as that observed; i.e. it is the expected likelihood that the result of an experiment would be at least as extreme as the observed data. $p$-values greater than 0.05 are generally considered to imply a good data-fit [24]. For the blue curve shown in figure 2(a), ${\chi }^{2}=214.57,$ corresponding to a $p$-value of 0.972 and thus can be considered to have successfully drawn observed samples from an exponential distribution.

Figure 2.

Figure 2. (a) Sampled frequencies, in linear units, of sampled numbers using the MTJ with a probability tree compared to the exponential distribution. (b) Repeated experiments on a log unit scale with the highest and lowest $p$-value datasets. (c) Distribution of $p$-values across all 128 experiments.

Standard image High-resolution image

However, it should be noted that the $p$-value itself is a random variable; repeated experiments will draw different observed data and therefore will compute different ${\chi }^{2}$ parameters corresponding to different $p$-values. Thus, the previous experiment is repeated 128 times; the distributions resulting from the experiments corresponding to the highest (best) and lowest (worst) $p$-values are compared to the exponential distribution on a log-scale in figure 2(b). It is seen that even the 'worst' set of observed data is not easily visually discernable from an exponential distribution, or even the 'best' set of observed data, demonstrating how a single $p$-value is not enough to characterize the tuned probability tree performance. Ideally, the distribution of $p$-values from a large set of repeated experiments should be uniform with mean 0.5 [25, 26]. The spread of $p$-values for the 128 performed experiments is captured in the histogram in figure 2(c), which shows that the $p$-value distribution has this desired behavior.

5. Device-to-device variations

In practice, realizing the performance from the previous section would require either accurate knowledge of each individual device's S-curve or a set of perfect devices. However, as seen in figure 1(b), when we include parameter deviations, the calculated S-curve becomes distorted from the ideal case. To investigate the effect of having device variations on the ability to sample from exponential distributions, figures 3(a)–(c) show the results of 128 experiments where a single MTJ is used whose thermal stability is varied from the ideal according to the discussion in section 2. Unlike the previous non-varying case, there is significant deviation in the quality of the observed data as the device parameters stray from the designed value. Specifically, in figure 3(a), it can be seen that the device with the largest positive deviation under-samples low integers and over-samples high ones. The opposite behavior is observed for the device with the largest negative deviation. Each extreme is essentially sampling exponentially distributed numbers from distributions with a modified $\lambda $ parameter. The $p$-values for each experiment as a function of thermal stability are shown in figure 3(b), where it can be seen that while the $p$-values are roughly uniformly distributed in the range $68\leqslant {\rm{\Delta }}\leqslant 71,$ they are nearly always near 0 otherwise. The distribution of all $p$-values is shown in figure 3(c), where it is obvious that they do not follow a uniform distribution nor do they have a mean of 0.5.

Figure 3.

Figure 3. (a) Sampled frequencies, in logarithmic units, for the experiments with the largest positive and negative ${\rm{\Delta }}$ deviation. (b) $p$-values for all device deviation experiments, showing workable deviation limits. (c) $p$-value distributions for the varied device experiments.

Standard image High-resolution image

This implies that there is a certain variation tolerance which can be handled by the device under an assumed ideal S-curve; if the device parameters stray too far from the designed values, the applied STT currents will not accurately bias the individual flips. In this case, it is more accurately the ratio of thermal stability (anisotropy energy density) to saturation magnetization ${K}_{u}/{M}_{s}$ that determines the quality of the device. This ratio controls the level of PMA in the MTJ device, which is necessary for proper operation. To see this, consider (as a simplification) the total effective anisotropy only due to ${K}_{u}$ and saturation magnetization:

Equation (5)

If ${K}_{\mathrm{eff}}\gt 0,$ the device will exhibit PMA; if ${K}_{\mathrm{eff}}\lt 0,$ it will be IMA. Thus, to remain PMA it must be true that ${K}_{u}\gt {\mu }_{0}{M}_{s}^{2}/2.$ When the ratio becomes too small, the device more readily exhibits in-plane magnetic anisotropy (IMA), meaning the magnetization of the FL prefers to remain in-plane even without application of a SOT current and does not properly relax to either the P or AP states. Instead, the observed data increasingly becomes the result of the (uniformly-distributed) thermal noise. When the ratio is too large, the SOT current cannot fully bring the FL magnetization to the perpendicular state even with assistance from the biasing STT current, and thus is likely to remain in whatever state it already happens to be in. Thus when using a single device to generate a large set of random numbers, it is critical to get the device parameters correct.

Instead of repeatedly using a single (potentially non-ideal) device to draw an entire set of random numbers in this manner, it may be instead preferrable to draw each new number from a random device. In this mode of operation, the user is 'agnostic' as to where the number comes from; they do not know and do not care ahead of time which MTJ device will generate the number. This operation strategy may be preferrable as while an MTJ can theoretically have unlimited endurance [27, 28], in practice MTJ devices are expected to have a cycle life of ${\sim }{10}^{15}$ cycles before degradation [29], and thus distributing the work can let devices last longer. Further, each device can generate numbers in parallel, instead of needing to sequentially be sampled millions or billions of times.

Figures 4(a)–(b) show another benefit of this approach. Here, the varied devices are again used to generate sets of 100k random numbers, but this time each sampled number comes from a unique (randomly-sampled) device with a random deviation in thermal stability. In figure 4(b), it is seen that the $p$-value distribution again becomes uniform with a mean centered at 0.5. This is because, assuming the variations in the devices are normally-distributed and centered around the device parameters chosen to correspond to the ideal S-curve, there is an equal likelihood for any device to be under- or over-biased compared to the desired bias level. The net result is that the collection of devices as a whole does reasonably sample from an exponential distribution. As before, it is seen in figure 4(a) that the differences between the observed best, observed worst, and exactly exponential distribution results are visually indiscernible.

Figure 4.

Figure 4. (a) Highest and lowest $p$-value datasets taking a device agnostic approach when drawing random samples. (b) $p$-value distribution for the device agnostic experiments.

Standard image High-resolution image

Using this strategy, it is possible to observe the shapes of the distributions as the number of pulled random samples increases. Figure 5 shows the distribution as the number of samples is increased to 1 M, 10 M, and 100 M. At these higher sample sizes, the visible random fluctuations present in the set of 100k samples disappears. Despite this, the ${\chi }^{2}$ parameters computed via (4) for each case start to diverge as the sample size $S$ increases. This is a known and expected phenomenon [30], as even the slightest deviation from the expected dataset becomes statistically significant when sample sizes are large. Thus all $p$- values will become 0 unless the observed data exactly recaptures the expected data, and will no longer be useful as a measure of data quality.

Figure 5.

Figure 5. Sampled distributions, in logarithmic units, when generating datasets of sizes 1 M, 10 M and 100 M using the device-variation agnostic approach. Note the three cases are arbitrarily offset from each other in the y-axis to show in a single plot.

Standard image High-resolution image

Instead, it is possible to simply observe visually in the 100 M sample case that, while the sampled distribution closely follows the expected distribution, there is a clear discontinuity between integer values of 127 and 128. This corresponds to an error in the most significant bit of the integers, which is the first bit determined using the probability tree structure. Close inspection of the 100 M curve reveals further, smaller discontinuities at other locations.

To understand the source of these discontinuities, the sensitivity of the sampled distribution as a result of variations at each individual level of the probability tree is investigated by using a single, ideal device for seven of the levels and a varied device for the eighth. This process is repeated for each level, with results on the sampled distributions corresponding to the variations in each of the three highest levels of the probability tree shown in figures 6(a)–(c). Figure 6(a) shows the largest positive and largest negative deviations in the device thermal stability for the most significant bit, affecting the biasing of coinflips at the top level of the probability tree. It is seen that when the deviation is large and positive, integers between 0 and 127 are under-sampled while integers between 128 and 255 are over-sampled; the reverse is true when the deviation is negative. When the second bit is varied as in figure 6(b), the sequence of alternating over- and under-estimates divides the domain equally into four segments. The pattern continues when varying the third bit in figure 6(c), although with each successive level down the probability tree the size of the deviations from the exponential distribution shrink. The discontinuities seen in the 100 M dataset in figure 5 are thus the result of the summation of these variation patterns over all levels in the probability tree; as the effect is most pronounced for variations at the top of the tree, the discontinuity is strongest at the center of the distribution. This implies that the quality of the first flip is the most critical to accurately sampling from distributions using the probability tree.

Figure 6.

Figure 6. (a) Largest positive and negative deviation dataset in the first coinflip (top of the probability tree) when not allowing deviations in the other coinflips. (b) Largest positive and negative deviation in the second coinflip, keeping other levels unvaried. (c) Largest positive and negative deviation in the third coinflip, keeping other levels unvaried.

Standard image High-resolution image

In addition to variations in the device parameters, there is also potential variation in the biasing STT currents. Supplying a current different from the intended value would map to an incorrect bias value on the S-curve. For the level of electrical current precision needed for this application (59 nA at 1% precision [9]), a supply noise of 0.6% can be achieved using operational amplifier current sources [31]. This level of uncertainty is much less than that introduced by the variations of device parameters; repeating the experiment shown in figure 2 with a consistent +0.6% error on every supplied STT current only slightly causes a deviation in the p-value distribution from uniform, with the distribution mean going to 0.466 instead of 0.5, a much smaller effect than shown in figure 3.

6. Application

As an additional test of the quality of tRNG using an MTJ via the tuned probability tree framework, the MTJ-sampled datasets are used during the numerical solution of physical phenomenon. Specifically, random walk solutions to partial integro-differential equations describing a simplified problem in particle physics transport are computed. The particular problem solved is a one-dimensional, initial-value, time-dependent problem describing the angular flux density ${\rm{\Phi }}\left(t,{\rm{\Omega }}\right)$ of a particle in an absorbing medium, with directional states ${\rm{\Omega }}\pm 1.$ The particle can either change states according to a Poisson process with some rate ${\sigma }_{s},$ or be absorbed according to another Poisson process with rate ${\sigma }_{a}.$ While there are analytic expressions which describe this process, its solution can be cast in a probabilistic representation and the solution estimated in a Monte Carlo fashion by averaging the result of many random walks. For full details on the simulated equations and probabilistic representation, see note SN3.1 in the supplementary material to [32]. The solution requires both exponentially and uniformly distributed random variables, which can each be sampled using the MTJ-based device in a tuned probability tree framework.

For this demonstration, scattering rate ${\sigma }_{s}=0.5$ and ${\sigma }_{a}=5.0.$ Thus, the sampled exponential distribution has parameter $\lambda =({\sigma }_{s}+{\sigma }_{a})=5.5.$ As 99.9983% of all expected sampled values fall in the interval [0, 2] for this exponential, the initial lower and upper bounds to truncate the distribution domain are chosen to be 0 and 2, respectively. In order to generate sampled numbers on the order of double-precision, this requires 55 consecutive flips for each sampled draw, as $\frac{2}{{2}^{55}}\approx 5.55{{\rm{E}}}^{-17}\lt 1.0{{\rm{E}}}^{-16}.$ Only 54 consecutive flips are required for each uniformly sampled number, as the uniform distribution is supported on [0,1] (and thus its bounds are 0 and 1). In total, 128k random walks are performed; as each random walk solution requires ∼80k exponential draws and ∼150k uniform draws, this necessitates ∼10 million exponential samples and ∼20 million uniform samples for the entire simulation. The MTJs used to generate samples have normally-distributed variations in their thermal stability according to section 4 and MTJ and simulation parameters follow those listed in table 1. The averaged results of the random walk simulations are compared to the analytical solution in figure 7(a), showing excellent agreement.

Figure 7.

Figure 7. (a) Averaged result of 128k random walk simulations using the MTJ-based device and tuned probability tree for simulating particle transport. (b) Truncation of the sampled distribution domain to the interval $\left[\mathrm{0,1}\right]$ causes underestimation of the true solution, due to the absence of rare, large draws.

Standard image High-resolution image

To demonstrate the importance of careful construction of the probability tree via proper truncation of the distribution domain the experiment is repeated to instead sample the exponential distribution on the interval [0, 1], which captures only 99.59% of the expected sampled values. This requires one fewer flip for each exponential sample to achieve 16-bit precision. The results of 128k random walks is shown in figure 7(b), where it can be seen that both the ${\rm{\Omega }}=+1$ and ${\rm{\Omega }}=-1$ lines are now beneath the analytical solution by about 5%. The likely reason for this is the absence of rare, large draws push the flux value upwards during the simulation. The necessary size of the sampled distribution domain and precision of the sampled numbers will depend on the problem being solved, but can be controlled via the correct choice of initial lower and upper domain bounds and number of probability tree levels.

In addition to the ability to generate large sets of random draws to accurately capture the physics of the particle transport problem, it is worth noting the time and energy usage savings to perform the computation as compared to traditional CMOS pRNGs. There are three stages to each individual MTJ coinflip: a pulse, a relaxation, and an MTJ read. With the given parameters in table 1 each coinflip requires 25 ns total for pulsing and relaxation, and draws on average 0.15 pJ of energy (the energy draw does not depend significantly on the weighting [9]). Assuming a 2 ns read stage [33] using a 10 mV Ohmic read, the total time for a single coinflip is 27 ns; the energy of the read is significantly less than the pulse and relax, and so is ignored. This calculation does not include the energy to generate the pulse, which will come from external CMOS circuitry. In the particle transport example, each exponentially distributed random number requires 55 total coinflips, and thus takes 1.49 $\mu $s and 8.25 pJ to generate. In contrast, CMOS pRNG of 16-bit integers useful for cryptographic purposes can take anywhere from ∼6 to 600 $\mu $s depending on platform and scheme, and draw no less than 0.35 [$\mu $J] per integer [1]. Further, in [2] 200,000 Gaussian-distributed random numbers are generated on a Nvidia GTX 970 GPU in ∼20 [ms] using the polar box-Muller method, or 0.1 ns per number. While this is faster than the MTJ-based device in the probability tree, it is noted that the Nvidia GTX 970 parallelizes pRNG across up to 1664 cores, while the MTJ is a single device; operating many MTJs in parallel will allow for much faster tRNG as each random number can be generated by an independent MTJ, speeding up tRNG by a factor proportional to the number of devices. It is also possible to compare to other types of stochastic number generation (SNG). [34] uses a two-terminal memristor device integrated with 65 nm CMOS technology to achieve 1 bit per 10 ns and 0.8 pJ per bit. Extrapolated to 55 generated bits this would be 55 ns and 44 pJ, slightly better than the SOT MTJ device presented here, though is unable to be dynamically tuned. And finally, [35] used HSPICE to model a MTJ for (voltage pulse) biased-SNG. It found it was able to do so at a rate of 1 bit per 8.21 ns with an energy of 0.53 pJ, with similar interpretation as the previous SNG example. For a visual comparison of the reported MTJ-based device energy usage and bitrate to CMOS pRNG, Memristor SNG, and a selection of other MTJ-based devices, see figure 8 and table 2. While the MTJ-based devices all produce random bits at much lower energy than either memristor SNG or CMOS technology, it is also useful to compare between the MTJ-based devices. Specifically, the superparamagnetic low-barrier magnet MTJ in [43] has the lowest energy usage, but is slower and susceptible to thermal and process variation due to the much lower thermal stability. [41] and [42] use medium-barrier magnets, which also have lower thermal energy barrier (∼20–40 ${k}_{{\rm{B}}}T$) than this work. [44] uses high-barrier magnets (109 ${k}_{{\rm{B}}}T$), but with slower bitstream generation as a result. [45] generates bits more quickly than this work, but is difficult to tune as the stochastic behavior is due to precessional nature of the device. [46] uses parallel MTJ-devices to improve bitstream quality, at higher energy cost. And [47] counts the precessions of a spin-torque nano-oscillator (STNO) during a specified period to generate bitstreams, which are 'biased' by adjusting a count threshold. Thus, SOT MTJ in a tunable probability tree is both faster and more energy efficient than modern CMOS technologies, and competitive with other non-CMOS stochastic devices with the additional ability to be dynamically tuned.

Figure 8.

Figure 8. Benchmarking random bit generation for CMOS, Memristor, and MTJ based technologies. Data is reported in [1] as time and energy per integer, and so converted to bitwise data for plotting purposes. This work is indicated by the red arrow; it does not include energy or delay effects needed to generate SOT pulse in external CMOS circuitry. See table 2 for the underlying data.

Standard image High-resolution image

Table 2. RNG benchmarking.

ReferencesTechnologyEnergy/bit [pJ]Bitrate [Mb/s]
[1]CMOS pRNG $1.6{{\rm{E}}}^{4}-1.1{{\rm{E}}}^{6}$ 0.024–16
[36]CMOS tRNG2.92400
[37]CMOS tRNG23231.6
[34]Memristor SNG0.81100
[38]Memristor SNG52300.04
[39]Memristor SNG0.80.016
[40]Memristor SNG3.150.006
[35]MTJ SNG0.526122
[41]SMART0.092500
[42]SMART0.29250
[43]Superparamagnetic0.020.1
[44]Spin dice0.1220.6
[45]Precessional0.1244
[46]Parallel MTJs0.7100
[47]STNO5020
${\rm{This\; work}}$ SOT-pulsed0.1537

7. Conclusion

This paper studies how MTJs can function as tunable probability trees for the purpose of generating random samples from arbitrary probability distribution functions. Through repeated numerical experiments and observing $p$-value distribution behavior, it was shown that the MTJ could reasonably draw integer samples from an exponential distribution with a given rate parameter. Through varying the MTJ thermal stability Δ, it was shown that device parameters distort the ideal S-curve of an individual device and cause the device samples to become biased away from the desired distribution. However, by choosing to draw new samples from random devices and choosing the ideal S-curve such that there are normally distributed device variations, it is possible to draw numbers of the same quality as those drawn from an ideal device. It was also shown that the device variations induce different amounts of bias at different levels of the probability tree, with the highest level of the tree being most sensitive to device variations.

The probability tree framework was used to generate exponentially and uniformly distributed numbers for Monte Carlo-type simulation of a particle's angular flux density for a problem in particle physics transport. Generating 30 M total random samples, the solution generated using the MTJ-based device showed excellent agreement with the analytical solution. Further, the random number generation was achieved at significantly lower energy usage than traditional CMOS pRNG, and at comparable energy usage and bitrate as other MTJ-based tRNG. The proper construction of the probability tree, via correct identification of the distribution domain bounds and level of precision required, was demonstrated in this problem by over-truncating the distribution domain, causing the MTJ-based device solutions to deviate from the analytical solution.

Acknowledgments

The authors acknowledge support from the DOE Office of Science (ASCR/BES) Microelectronics Co-Design project COINFLIPS. SL acknowledges support from the National Science Foundation graduate research fellowship program under Grant No. 2021311125. The authors acknowledge the Texas Advanced Computing Center (TACC) at the University of Texas at Austin, Austin, TX, USA, for providing the High Performance Computing resources that have contributed to the research results reported in this article. URL: http://www.tacc.utexas.edu. This article describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the article do not necessarily represent the views of the US Department of Energy or the United States Government. Sandia National Laboratories is a multimission laboratory managed and operated by the National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc. for the U.S. Department of Energy's National Nuclear Security Administration under Contract DE-NA0003525.

Data availability statement

The data cannot be made publicly available upon publication because they are not available in a format that is sufficiently accessible or reusable by other researchers. The data that support the findings of this study are available upon reasonable request from the authors.

Please wait… references are loading.
10.1088/1361-6528/ad3b01