Computational Capacity and Energy Consumption of Complex Resistive Switch Networks

Resistive switches are a class of emerging nanoelectronics devices that exhibit a wide variety of switching characteristics closely resembling behaviors of biological synapses. Assembled into random networks, such resistive switches produce emerging behaviors far more complex than that of individual devices. This was previously demonstrated in simulations that exploit information processing within these random networks to solve tasks that require nonlinear computation as well as memory. Physical assemblies of such networks manifest complex spatial structures and basic processing capabilities often related to biologically-inspired computing. We model and simulate random resistive switch networks and analyze their computational capacities. We provide a detailed discussion of the relevant design parameters and establish the link to the physical assemblies by relating the modeling parameters to physical parameters. More globally connected networks and an increased network switching activity are means to increase the computational capacity linearly at the expense of exponentially growing energy consumption. We discuss a new modular approach that exhibits higher computational capacities and energy consumption growing linearly with the number of networks used. The results show how to optimize the trade-off between computational capacity and energy efficiency and are relevant for the design and fabrication of novel computing architectures that harness random assemblies of emerging nanodevices.


Introduction
Emerging nanoscale resistive switches provide possible solutions for creating future computing architectures that are faster, less expensive and more energy-efficient by exploiting their intrinsic switching characteristics [11]. Recent progress in the fabrication of atomic switches [15], as one class of emerging nanodevices, allows the random assembly of these devices into larger networks [1,29,32]. In [12] it was argued that the complexity of these networks resembles that of biological brains, in which the complex morphology and interactions between heterogeneous network elements are responsible for powerful and energy-efficient information processing [31]. Contrary to designed computation, where each device has a specified role, computation in random resistive switch networks does not rely on specific devices, but is encoded in the collective nonlinear dynamic switching behavior as a result of an applied input signal.
Atomic switches, as well as memristive devices [10,34], are history-dependent resistive switches. Application of a bias voltage can change the conductance of the device. The nonlinear relationship between an applied input and the resulting conductance change in these devices performs a nonlinear transformation of the input, similar to the dynamics found in biological synapses [18,19,25,27].
Harnessing the intrinsic nonlinear characteristics of these emerging nanodevices assembled in random structures has been shown for the memristor [5,6,20] as well as for the atomic switch [1,12,29,32]. Simulated random memristor networks have been employed to implement reservoir computing (RC) [17,23]. RC is a computational approach initially inspired by cortical microcircuits, in which computation takes place by translating the intrinsic dynamics of an excited medium, called a reservoir, into a desired output. By observing the nonlinear responses of a random resistive switch network to different input signals, it was possible to perform simple pattern classification [6,20] as well as computationally more demanding tasks by using multiple independent random assemblies [5]. In contrast to the simulation-based results, work on atomic switch networks (ASN) has demonstrated physical random assemblies and discussed fabrication parameters that determine the network morphology [12]. Based on these fundamental device and network characteristics that resemble the complexity of biological brains, it was argued that these networks are viable candidates for the physical realization of brain-inspired information processing, such as reservoir computing.
Here we present a modeling and simulation framework that enables a detailed analysis of resistive switch network morphologies as determined by nanowire lengths, distributions and density. The computational capabilities of different network morphologies are analyzed with respect to the compressibility of the measured network signals. We put the computational capabilities into perspective by comparing them to corresponding energy-consumption data. This comparison outlines a trade-off between computation and energy consumption. A modular approach is presented that provides a more energy-efficient architecture while achieving higher computational capabilities. Our results demonstrate what constitutes computationally useful networks with respect to network morphology, density and signal amplitudes. Based on the modeling parameters used, future fabrication of random resistive switch networks can be guided to achieve a desired trade-off between computational capacity and energy consumption.

Memristive Devices and Atomic Switches
Memristive devices and atomic switches have been investigated in the context of random networks and reservoir computing. In this section we will establish the functional similarities and argue that the methods and results presented here are valid for both device types, and hence for a larger range of resistive switches. Detailed comparisons of memristive devices and atomic switches can be found in [7,14].
The metal-insulator-metal (MIM) structure of the memristor, where the insulator is typically a metal-oxide such as TiO₂ or WOₓ [7,34], establishes changes in the device conductance by redistributing oxygen vacancies within the metal-oxide. In the absence of an input bias, the oxygen vacancies remain at their positions, which leads to the history-dependent conductance of the device. Volatile behavior was demonstrated for a WOₓ memristive device [9]. Here spontaneous diffusion of the oxygen vacancies causes a dissolution of conducting channels in the WOₓ thin-film and a return to a low-conducting ground state of the device.
The atomic switch, a gap-type device based on crossing Ag₂S and Pt wires, achieves conductance changes by changing the concentration of Ag⁺ cations, which allows metal protrusions of Ag atoms to grow and eventually form a bridge between the two wires. The width of that bridge determines the conductance of the device [14]. In the absence of an input bias, the thermodynamically unstable atomic bridge eventually dissolves and the atomic switch returns to a low-conductance equilibrium state [25].
An important functional similarity between memristors and atomic switches is the nonlinear relation between the applied input and the resulting device state and current. Strukov and Williams presented an exponential ionic drift model that describes how an applied electric field changes the effective activation barrier and velocity for ionic drift within the metal-oxide [35]. Similarly, Tamura et al. observed an exponential dependence of the switching time on the applied bias and argued that it is caused by a minimum activation energy required to form a metal bridge within the atomic switch [36].
Having established the qualitative similarities between memristive devices and atomic switches that are relevant to nonlinear computation, the rest of this paper will refer to both types of devices as resistive switches, allowing the presented concepts to be translated to specific device implementations.

Resistive Switch Modeling
In this section we will describe the device model used in our simulations. Our focus is on capturing the fundamental switching characteristics of the discussed devices, not on precisely reproducing empirical data obtained from a specific device. As outlined in the preceding section, resistive switches are characterized by history-dependent nonlinear conductance changes and state decay.
We adopt a memristor model as presented for WOₓ devices in [7,8]. In the original model the conductance and the state change are defined by equations 1 and 2, respectively. Here the internal device state is modeled by w, V is the applied input bias, and the remaining symbols, among them θ, γ, δ, λ, η, and τ, are the model parameters calculated from the experimental data. The nonlinear switching, as explained by the exponential ionic drift model [35], is modeled by the sinh term. This model captures well the nonlinear switching as a function of the applied input and the current state.
However, such a first-order model, using only one variable to describe the device conductance, does not capture effects such as Ag⁺ cation concentration changes. Such a change affects a device's response to future bias signals, but is not reflected in the actual device conductance. In [14] this difference was described as the memristor using a single variable to model the size of an ion-doped area, while the atomic switch uses two variables, one to model the height of an Ag protrusion, and another to model the width of the atomic bridge that emerges after the Ag protrusion has reached a sufficient height. Recently a second-order memristor model was presented that follows a similar modeling approach to the atomic switch [19]. Here the second variable that functions as an enabler for a subsequent change in conductance is the internal device temperature. Application of a bias signal increases the internal device temperature due to Joule heating, which in turn affects the drift and diffusion processes described earlier.
To account for effects such as Joule heating or Ag protrusions, we use equation 2 to model an internal state w that is not directly reflected in the device conductance. Furthermore, we extend the model to implement the different state decays based on the device state. As shown in [9,25], the rate of dissolution of the atomic bridge or the diffusion of ions to an equilibrium state is state-dependent and enables short- and long-term memory within a single device. We adopt equation 2 and additionally describe a second state variable that models the device conductance. For our simulations we employ a binary switching function f that thresholds w to create two distinct conductances. Binary behavior of atomic switches was shown in [33] and is also found in some memristive devices [13]. Different levels of sub-surface Ag⁺ concentrations are required to establish or dissolve an atomic bridge (length and width of Ag protrusion) [24]. This implies different threshold values for the internal state variable w to perform device switching. We model this by applying a hysteresis function to w.
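The binary switching with hysteresis described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the threshold names `w_on` and `w_off`, the two conductance values, and the sample trace are all assumptions chosen for readability.

```python
def make_hysteresis_switch(w_on, w_off, g_off, g_on):
    """Return a stateful function f(w) mapping the internal state w
    to one of two conductances.

    All names and values are illustrative assumptions. The device turns
    on when w reaches w_on and turns off only when w falls to w_off
    (with w_off < w_on), which produces the hysteresis.
    """
    if not w_off < w_on:
        raise ValueError("w_off must be smaller than w_on")
    is_on = False

    def f(w):
        nonlocal is_on
        if w >= w_on:
            is_on = True
        elif w <= w_off:
            is_on = False
        # Between the two thresholds the previous state is kept.
        return g_on if is_on else g_off

    return f


switch = make_hysteresis_switch(w_on=0.7, w_off=0.3, g_off=1e-6, g_on=1e-3)
# w rises past the upper threshold, then falls below the lower one.
trace = [switch(w) for w in (0.1, 0.5, 0.8, 0.5, 0.2, 0.5)]
```

The same value w = 0.5 yields the low conductance on the way up and the high conductance on the way down, which is the history dependence the two thresholds are meant to capture.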
As random assemblies of resistive switch networks will exhibit variations in nanowire diameters, and hence in the resulting device sizes (e.g., gap size of atomic switches), variation is an integral system characteristic [12]. In [29] device parameter ranges for ASN were shown. These parameter ranges were not simply described by a Gaussian distribution around a mean value, but could span several orders of magnitude. For our experiments we apply a simplified model to create device variation where we draw device parameters uniformly from respective parameter ranges.

Network Modeling
While the application of memristive networks to reservoir computing has been entirely simulation-based [5,6,20], physical assemblies of atomic switch networks were reported in [1,12,29,32,33]. A simple modeling approach with focus on localized conductance changes of such networks was shown in the supplementary documentation of [29].
Here we expand the modeling of such networks with the aim to investigate the relation between network morphologies, computational capabilities, and energy consumption. Network morphology is defined by the average length of nanowires. Their density can be controlled by the underlying copper seed posts. In [29] the effects of the copper seed posts' shape, size, and pitch on the network's nanowire length and density were described. Smaller seed posts lead to long wires while larger seed posts lead to more fractal local structures. The pitch of the seed posts is a control parameter for the network density.
The connection of the random assembly of nanowires with the size of the seed posts implies that the distributions of the nanowire lengths can be described in terms of some probability density function. Large seed posts would cause more localized connections while small seed posts would result in long-range connections. Hence, we use a probability density function (PDF) that can capture different distributions. We use a beta-distribution for our purposes. A beta-distribution B produces values in the interval [0, 1] and is controlled by the parameters α and β that define the mean value and the skewness around the mean:

$$P(x, \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt}$$

The integral in the denominator is a normalization factor ensuring that P(x, α, β) is a probability measure. We use this PDF to model the distribution of the relative nanowire lengths within a random network. To translate the PDF into nanowire lengths we create a map with normalized Euclidean distances ([0, 1]) from every seed post within the network. Starting from an initial node, we add a nanowire to the network by drawing a value from P(x, α, β) that determines the nanowire length. Beta-distributions with a focus on local connections (α < β) will produce more fractal structures as nanowires branch out around the selected starting node for the wire. Distributions favoring long-range connections (α > β) will connect distant parts of the underlying grid of seed posts. In Fig. 1a we show some examples of P(x, α, β) as a function of the control parameters α and β.
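The wire-placement step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the square unit-pitch grid, the helper names `grid_posts` and `add_nanowire`, the per-post normalization of distances, and the rule of picking the post closest to the drawn length are all choices made here, not details taken from the paper.

```python
import math
import random


def grid_posts(n):
    """Seed posts on an n x n grid with unit pitch (illustrative layout)."""
    return [(x, y) for x in range(n) for y in range(n)]


def add_nanowire(start, posts, alpha, beta, rng):
    """Draw a relative length from Beta(alpha, beta) and return the post whose
    normalized Euclidean distance from `start` is closest to that length."""
    others = [p for p in posts if p != start]
    dists = [math.dist(start, p) for p in others]
    d_max = max(dists)  # assumption: distances are normalized per start post
    target_len = rng.betavariate(alpha, beta)  # value in [0, 1]
    best = min(range(len(others)),
               key=lambda i: abs(dists[i] / d_max - target_len))
    return others[best], dists[best] / d_max


rng = random.Random(1)
posts = grid_posts(7)
# alpha < beta favors short, local wires; alpha > beta favors long-range wires.
local = [add_nanowire((3, 3), posts, 1, 10, rng)[1] for _ in range(500)]
far = [add_nanowire((3, 3), posts, 10, 1, rng)[1] for _ in range(500)]
mean_local = sum(local) / len(local)
mean_far = sum(far) / len(far)
```

With these settings the mean relative wire length stays well below 0.5 for (α, β) = (1, 10) and well above 0.5 for (10, 1), matching the local versus long-range distinction made in the text.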
For the modeling of the underlying seed post grid we define two post types. Interface seed posts allow the application and reading of network voltages. These posts are established at a more coarse-grained pitch. The second grid is a supporting grid and more fine-grained than the interface grid. This supporting grid improves the formation of fractal structures for localized wire growth [12] and the establishment of more complex structures between the interface nodes; without a supporting grid, the establishment of multiple devices between two interface nodes is not favored, which limits the morphological diversity. The density of a network is defined by the indegree ξ of a node. ξ defines the average number of devices connected to each grid node.
The number of resulting devices, and hence the density, is then given by the total number of nodes in the supporting grid multiplied by ξ. The fabrication of networks with different densities was demonstrated in [29]. In Fig. 1b we show an example network model with 16 interface nodes, a supporting grid at a third of the pitch of the interface grid, and a mix of short and long-range connections. For all simulations we have used 16 interface nodes arranged in a 4 × 4 grid as shown in Fig. 1b. If not mentioned otherwise, we have used a supporting grid at half the pitch of the interface grid. Throughout this paper we will refer to this network as of size 16 1 , meaning that it has 16 interface nodes and 1 node between two interface nodes (half the pitch).
For a more intuitive understanding of how the defined network parameters inter-relate, we give five examples and describe the resulting network morphology (Table 1). In Figs. 2 and 3, we added the network IDs to the presented results to illustrate the relation between network morphology and information processing capacities.
We simulate the resistive switch networks by treating them as temporarily stationary resistive networks that can be solved efficiently using the modified nodal analysis (MNA) algorithm [22]. After calculating one time step using the MNA, we update the memristive devices based on the node voltages present in the network to account for the dynamic state changes of memristors.
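To illustrate the stationary solve performed at each time step, the following is a minimal MNA sketch for a purely resistive network with one ideal voltage source to ground. The function name `solve_mna`, the edge-list format, and the three-node example are assumptions made for this illustration; the simulator used for the results may be organized differently.

```python
import numpy as np


def solve_mna(n_nodes, edges, src, gnd, v_in):
    """Solve the node voltages of a resistive network with one ideal source.

    edges: list of (node_a, node_b, conductance). The ground node is
    eliminated, and one extra unknown holds the source branch current,
    which is the extension that distinguishes MNA from plain nodal analysis.
    """
    keep = [k for k in range(n_nodes) if k != gnd]
    idx = {node: i for i, node in enumerate(keep)}
    m = len(keep)
    A = np.zeros((m + 1, m + 1))
    z = np.zeros(m + 1)
    # Stamp every conductance into the nodal part of the matrix.
    for a, b, g in edges:
        for p, q in ((a, b), (b, a)):
            if p != gnd:
                A[idx[p], idx[p]] += g
                if q != gnd:
                    A[idx[p], idx[q]] -= g
    # Stamp the voltage source between src and ground: v[src] = v_in.
    A[idx[src], m] = 1.0
    A[m, idx[src]] = 1.0
    z[m] = v_in
    x = np.linalg.solve(A, z)
    v = np.zeros(n_nodes)
    for node, i in idx.items():
        v[node] = x[i]
    return v, -x[m]  # node voltages, current delivered by the source


# Two equal conductances in series form a voltage divider.
v, i_src = solve_mna(3, [(0, 1, 1.0), (1, 2, 1.0)], src=0, gnd=2, v_in=2.0)
```

In a time-stepped simulation, the conductances in `edges` would be recomputed from the device states after each call, using the voltage drop `v[a] - v[b]` across every device.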

Network Morphology and Computational Capacity
As was outlined by Demis et al. [12], the structural similarities of ASN and biological brains (i.e., fractal branching similar to dendritic trees) suggest that such complex random assemblies provide hardware platforms for efficient brain-inspired computing. A characteristic feature of cognitive architectures is the nonlinear transformation of an input signal into a high-dimensional representation more suitable for information processing [4,26]. In particular, for N > M, the input signal $u \in \mathbb{R}^M$ is transformed to $x \in \mathbb{R}^N$ by the dynamics of the cortical microcircuits driven by sensory inputs. In the case of random resistive switch networks this means that the measured signals at the interface nodes are ideally nonlinearly related to the input signal and then provide a platform for brain-inspired computing.
A suitable transformation of the input requires rich dynamics that can preserve relevant distinctions between different input signals in the high-dimensional space [23]. This property is attributed to the dynamics of a system in the critical regime where the distinctions between states do not diverge or converge [2,21,30]. To provide such rich dynamics, the activity of the nodes should show the least amount of redundancy. In other words, the dynamics of the nodes should be as uncorrelated as possible. The question is how one could measure this and how the measure relates to the parameters of the system.
Here we introduce a simple measure that can describe the dynamics of random network signals in a way that is meaningful in the context of information processing. We study the compressibility of the network dynamics as a proxy for its richness. Specifically, we use principal component analysis (PCA) to transform the system dynamics into a principal component space [3].
The distribution of variations in the principal component space indicates the amount of redundancy between the different dimensions of the original system. The variation in each principal component is given by the corresponding normalized eigenvalue of the covariance matrix of the system. To calculate these, we record the network dynamics on all interface nodes as a result of an applied network input. All network signals are expressed in the network state matrix X. The covariance matrix of the network dynamics is then given by $C = X^T X$, and the eigenvalues are obtained by diagonalizing, $C = U \Lambda U^{-1}$. The diagonal elements of Λ, $\{\Lambda_1, \Lambda_2, \ldots, \Lambda_N\}$, are the eigenvalues of the corresponding dimension i and can be normalized as $\lambda_i = \Lambda_i / \sum_{j=1}^{N} \Lambda_j$. Since the $\lambda_i$ are normalized as a probability measure, we can describe their evenness using a single number $H = -\sum_{i=1}^{N} \lambda_i \log_2(\lambda_i)$. This is an entropy measure and describes how evenly the $\lambda_i$ are distributed. In one extreme case, where the system nodes are all maximally correlated, only one eigenvalue will be 1 and the rest will be zero, and the resulting entropy will be H = 0. In the other extreme case, where the nodes are maximally uncorrelated, the eigenvalues will be identical and equal to $1/N$, and the resulting entropy will be $H = \log_2 N$. In the latter case, every node in the network represents something unique about the properties of the input signal that cannot be described by a combination of the rest of the nodes.
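The entropy measure defined above can be computed as in the following sketch. The function name `pca_entropy`, the mean-centering of the columns, and the two synthetic test signals are assumptions made for this illustration; they are chosen so that the two extreme cases discussed in the text (H = 0 and H = log₂ N) can be reproduced.

```python
import numpy as np


def pca_entropy(X):
    """Entropy (in bits) of the normalized eigenvalue spectrum of X^T X.

    X has one row per time step and one column per interface node.
    Columns are mean-centered here (an assumption) so that X^T X is
    proportional to the covariance matrix of the node signals.
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)
    eig = np.linalg.eigvalsh(X.T @ X)   # C is symmetric, so eigvalsh applies
    eig = np.clip(eig, 0.0, None)       # remove tiny negative round-off
    lam = eig / eig.sum()               # normalize to a probability measure
    lam = lam[lam > 0]                  # treat 0 * log2(0) as 0
    return float(-(lam * np.log2(lam)).sum())


t = 2 * np.pi * np.arange(1000) / 1000
# Four nodes carrying scaled copies of one signal: maximally correlated.
redundant = np.outer(np.sin(t), [1.0, 2.0, 3.0, 4.0])
# Four nodes carrying mutually orthogonal signals of equal power.
rich = np.column_stack([np.sin(k * t) for k in (1, 2, 3, 4)])
h_redundant = pca_entropy(redundant)
h_rich = pca_entropy(rich)
```

For the redundant signals the entropy is close to 0, and for the four orthogonal signals it is close to log₂ 4 = 2 bits.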
In the following we present entropy measurements as a function of the network morphology. As outlined in section 2.3, the morphology is modeled based on a beta distribution with parameters α and β as well as the indegree ξ that defines the device density. We apply a 5 Hz sine wave to the upper left interface node of a 16 1 network and connect the lower right node to ground (0 V). Fig. 2 shows the network entropies as a function of α, β, and ξ. Across all densities, the highest entropies, and hence the least linearly dependent network states, are achieved with a majority of the nanowires being equal to or longer than half the normalized maximum Euclidean distance (lower left triangles in Fig. 2 where α ≥ β). Long-range connections spatially distribute the input signal across the network without much voltage drop. Hence, different areas of the network experience a sufficient bias voltage to exhibit switching dynamics useful to information processing. Increasing the network density, that is, the number of devices per area, creates more signal paths and, due to device parallelism, also more highly conductive connections. This also leads to a better distribution of the input signal and to an expansion of the morphologies for which larger entropies can be achieved.
Besides the network morphology, the amplitude v of the input signal also greatly affects the network dynamics due to the exponential dependence between the applied bias and either the activation energy to form an atomic bridge (for atomic switches) or the velocity of ionic drift (for memristive devices). In Fig. 3 we show the entropy as a function of α, β, and v for networks with an indegree ξ = 6. It can be seen that the average entropy increases as we increase v. This is related to larger voltages enabling more devices to exhibit switching activity and hence affect the network dynamics. The similarity in the plots as compared to Fig. 2 implies that increased input voltages also allow better spatial distribution of the input and more areas of the network to exhibit switching activity. Note that these plots present qualitative results on the dependence of v and entropy, but the absolute values of v are device dependent and can change for devices with different threshold behavior.
Figure 3. Increasing signal amplitudes v lead to higher bias voltages for individual network devices and result in higher switching activity. The highest entropies are again achieved for more globally connected networks with α ≥ β (lower triangles of the plots). Markers Nx refer to Table 1.
We also studied network sizes 16 0 and 16 2 . 16 0 networks have shown very similar results to the 16 1 networks presented here. The morphological similarity between 16 0 and more globally connected 16 1 networks is that they do not form many local fractal structures. In contrast, the 16 2 networks are characterized by more complex morphologies between the interface nodes. While this might closely resemble complex structures in biological brains, in the context of passive electrical networks these fine-grained fractal structures with many devices in between interface nodes lead to weakened individual switching dynamics. While this could be circumvented in simulations by increasing the input amplitude v, in practice this is not a viable approach for reasons of safe operation and energy consumption.

AIMS Molecular Science
Volume 3, Issue x, xxx-xxx page.

Energy Consumption
In the previous section we have shown that the best computational capabilities are achieved for dense, globally connected networks (α ≥ β) and larger signal amplitudes v. The viability of these results has to be evaluated with respect to the energy that would be consumed by such networks. Based on the application of a 5 Hz sine wave with amplitude v, we calculate the total energy consumption of a network over time T as $E(t) = \sum_{i=0}^{T} V_i(t)\, I_i(t)\, dt$, with V(t) being the time-dependent signal amplitude and I(t) the current drawn by the network. As the energy consumption is very application-specific, we consider our findings as qualitative measures that highlight the general relations between the network parameters and the consumed energy. Absolute energy numbers presented here are not of relevance, only the information on how drastically energy consumption changes with network parameters. Fig. 4a shows the resulting energy consumption for different α and β, averaged over different ξ and v. The distributions of the energy data across the α and β plane resemble the entropy distributions seen in Figs. 2 and 3, which confirms that high entropies come at the cost of high energy consumption (relative to the consumption at low entropies) caused by highly conducting paths in denser networks. This finding is further supported by plotting energy vs. entropy (Fig. 4b). Here we plot the averaged energy against the averaged entropy for corresponding setups. We can see how entropy grows with energy. However, as the energy grows exponentially, increasing entropy can lead to disproportionate energy consumption.
As the computational performance of a single random network is limited by exponentially increasing energy consumption or strong linear dependence at low energies, we will outline an approach to increase entropy with linear growth of energy. In [33] a hierarchical approach of ASN was presented that combined multiple small-world networks on a single chip. Similarly, we have shown a hierarchical approach that embedded independent networks (similar sizes as presented here) in a reservoir computing architecture and showed that a memory and computationally demanding application could be solved [5].
The concept relies on extracting only a subset of signals from each independent network. This allows harnessing the different processing caused by the different random structures of the independent networks. Furthermore, in [5] we have extracted a differential signal from each network. By retrieving a network output as the difference of two network nodes, differences in the spatial and temporal dynamics within networks can be better captured than by using signals measured with respect to a common ground. In Fig. 5 we show two examples of 16 network signals. Fig. 5b shows signals obtained from a single random network (qualitatively comparable to signals presented in [12]). Readouts from a single network show some nonlinearities; however, they are mostly characterized by strong linear dependence across the signals (low entropy of 0.37). In contrast, with 16 independent networks exposed to the same input (Fig. 5c), we can significantly increase the richness of the measured signals. In this example the entropy is 1.79. The total energy consumption of the hierarchical independent networks scales linearly with the number of networks. Considering the gain in entropy, this is a much more viable approach for brain-inspired information processing than relying on single networks with high density and high signal amplitudes. A comparison of how energy and entropy relate in the two presented setups is shown in Fig. 6. The plotted data represent data collected from single networks as well as from the 16-network approach with ξ = 4 and increasing v. The circled areas approximately mark the energy and entropy data obtained with a 2 V input signal. The average energy difference is a factor of 16, which corresponds to the number of networks used. However, compared with single networks that exhibit similar energy consumption (around v = 4 to 6), significantly higher entropies can be achieved.
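The energy estimate used in this section, and the factor of 16 between a single network and 16 independent networks, can be illustrated with a short sketch. A fixed conductance stands in for the network here, which is a strong simplification since the real network current depends on the switching state; the amplitude, conductance, and time step are illustrative assumptions.

```python
import math


def consumed_energy(voltages, currents, dt):
    """Discrete energy estimate: sum of V_i * I_i * dt over all time steps."""
    return sum(v * i for v, i in zip(voltages, currents)) * dt


dt = 1e-4
steps = 10_000                    # total time T = 1 s
amp, freq, g = 2.0, 5.0, 1e-3     # amplitude (V), frequency (Hz), conductance (S)
v_t = [amp * math.sin(2 * math.pi * freq * k * dt) for k in range(steps)]
i_t = [g * v for v in v_t]        # stand-in for the current drawn by a network
e_single = consumed_energy(v_t, i_t, dt)
# 16 independent, identical networks driven by the same input signal.
e_sixteen = 16 * e_single
```

For a fixed conductance the result agrees with the analytic value g·v²·T/2 = 2 mJ, and the total for 16 such networks is 16 times larger, which is the linear scaling referred to in the text.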

Discussion
We have presented a detailed analysis of how the morphology of random resistive switch networks affects computational capacity and energy consumption. It will be an application-specific choice how much entropy is required from the individual networks and what costs one is willing to pay in terms of energy. The hierarchical approach and its better entropy-per-energy ratio are possible because we utilized different independent networks. This implies that cognitive computing is not merely the product of sufficient excitement of network elements but is also rooted in the heterogeneous morphologies across networks, as demonstrated for neural populations in cortical microcircuits [16]. This ability to harness randomness is a strong argument for the future fabrication of nanoscale resistive switch networks. We found that networks with strong fractal structures (i.e., 16 2 networks) performed worse. However, in biological brains such fractal structures (dendrites) are an integral part of the information processing. Biological dendrites are active components with voltage- and calcium-gated ion channels that regulate synaptic responses [28], while the fractal structures in resistive switch networks are purely passive elements. Further investigation of the functional differences might provide better fabrication techniques and methods to utilize fractal resistive switch networks.

Conclusion
The nonlinear dependence of input bias and state transitions in resistive switches allows random assemblies of such devices to exhibit complex behavior useful to computation. The design parameters used to control the simulated network morphologies relate to design parameters for the physical fabrication of the networks. We have shown how network morphologies correlate with computational capacities and energy consumption. A hierarchical approach was presented that allows us to increase computational capacities more energy-efficiently as compared to increasing information processing in single networks. These findings provide insights into the abilities and limitations of random nanoscale resistive switch networks and should serve as a design guide for investigation and fabrication of future nanoscale computing architectures.

Figure 1. (a) Different beta distributions and the resulting relative nanowire lengths. The x-axis represents the normalized nanowire length and the y-axis the probability of creating a nanowire of that length. (b) Example graph of modelling a random 16 2 memristive/atomic switch network with α = 2, β = 5, and ξ = 4. The large circles represent the interface nodes to the underlying CMOS layer. The small circles create a narrower supporting grid that guides the density and morphology of the network structure. The graph edges represent memristive devices and the edge width is an indicator for the connectivity range, with increasing edge widths representing longer-range connections between nodes.

Figure 2. Entropy as a function of the network topology defined by the beta distribution values α, β and the indegree ξ (v = 8 V). Independent of the density, the best entropies are achieved for more globally connected network morphologies described by α ≥ β (lower triangles of the plots). With respect to the network density, an indegree of ξ = 6 resulted in the best entropies. Lower ξ values provide fewer signal paths and hence less network activity. Average numbers of network devices were 150, 250, 350, 450 for ξ = 2, 4, 6, 8, respectively. Markers Nx refer to Table 1.

Figure 4. Energy consumption in random resistive switch networks. (a) Averaged energy consumption for different network morphologies expressed as log₁₀(E). Energy grows exponentially when transitioning from locally connected (α = 1, β = 10) to more globally connected networks (α ≥ β). The link between energy and entropy is shown in (b). Entropy grows linearly with exponential energy increase.

Figure 5. Examples of networks and node signals. (a) Example 16 2 network with α = 2, β = 5, and ξ = 4 indicating the 16 interface nodes. (b) 16 signals measured from a single 16 1 network with the input applied to node 1 and the ground connected to node 16. The presented signals have an entropy of 0.37. The numbers correspond to the physical location of the nodes within the network. (c) Signals measured from 16 independent 16 1 networks. The entropy for this example is 1.79. Signals were measured as the difference between interface nodes 2 and 9. The numbers represent the network number. All networks were created with α = 1, β = 5, ξ = 4, v = 2 V.

Figure 6. Energy vs. entropy for single networks and a hierarchical approach with 16 independent networks. The circled areas highlight an example of networks with the same network parameters (v = 2, ξ = 2). Scaling up the number of independent networks is more energy- as well as computationally efficient as compared to increasing the entropy of a single network by means of scaling up either density or voltage. Data obtained with v = 2 and ξ = 2, …, 8.

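Table 1. Examples of network parameters
ID | α | β | ξ | Nanowire length | Density
… | … | … | … | medium (small σ) | semi-sparse
N5 | 5 | 5 | 2 | medium (medium σ) | sparse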