Structural plasticity on an accelerated analog neuromorphic hardware system

In computational neuroscience, as well as in machine learning, neuromorphic devices promise an accelerated and scalable alternative to neural network simulations. Their neural connectivity and synaptic capacity depend on specific design choices, but are always intrinsically limited. Here, we present a strategy to achieve structural plasticity that optimizes resource allocation under these constraints by constantly rewiring the pre- and postsynaptic partners while keeping the neuronal fan-in constant and the connectome sparse. In our implementation, the algorithm is executed on a custom embedded digital processor that accompanies a mixed-signal substrate consisting of spiking neurons and synapse circuits. We evaluated our proposed algorithm in a simple supervised learning scenario, showing its ability to optimize the network topology with respect to the nature of its training data, as well as its overall computational efficiency.


Introduction
Experimental data shows that plasticity in the brain is not limited to changing only the strength of connections. The structure of the connectome is also continuously modified by removing and creating synapses (Grutzendler et al., 2002; Zuo et al., 2005; Bhatt et al., 2009; Holtmaat and Svoboda, 2009; Xu et al., 2009). Structural plasticity allows the nervous system to reduce its spatial and energetic footprint by limiting the number of fully expressed synaptic spine heads and maintaining sparsity (Knoblauch and Sommer, 2016). The lifetime of dendritic spines, involved at least in excitatory projections, varies dramatically (Trachtenberg et al., 2002).
The process of spine removal depends on the spine head size: smaller spines are removed while larger ones persist (Holtmaat et al., 2005, 2006). At the same time, new spines are continuously created. The spine volume also shows a strong correlation with the amplitude of the respective synaptic currents (Matsuzaki et al., 2001), hence suggesting a coupling between a connection's lifetime and its synaptic efficacy.
Neuromorphic devices implement novel computing paradigms by taking inspiration from the nervous system. With the prospect of solving shortcomings of existing architectures, they often also inherit some restrictions of their biological archetypes. The exact form and impact of these limitations depend on the overall design and architecture of a system. Ultimately, however, all physical information processing systems, with neuromorphic ones being no exception, have to operate on finite resources. For most neuromorphic systems, synaptic fan-in is, to various degrees, one of these limited resources. This applies to analog as well as digital platforms, especially when they implement fast on-chip memory.
While TrueNorth does make use of time-division multiplexing, it allocates a fixed memory region for 256 synapses per neuron (Akopyan et al., 2015). In digital neuromorphic multi-core platforms like SpiNNaker (Furber et al., 2013), the number of synapses per neuron can often be traded against overall network size or simulation performance. Loihi allows up to 4096 distinguishable presynaptic partners per group of up to 1024 neurons located on a single core (Davies et al., 2018). ODIN features a fan-in of 256 synapses per neuron (Frenkel et al., 2018). Since digital systems often make use of time-multiplexed update logic, these constraints can often be alleviated by increasing memory sizes, albeit at the cost of prolonged simulation times.
Analog and mixed-signal systems mostly do not allow this trade-off, because their synapses are implemented physically and therefore often constitute a fixed resource. Examples include DYNAP-SEL (providing 64 static synapses per neuron on four of its cores and 256 learning synapse circuits on a fifth core, Moradi et al., 2018; Thakur et al., 2018), Spikey (256 synapses per neuron, Schemmel et al., 2007), and BrainScaleS-1 (220 synapses per neuron, Schemmel et al., 2010). For this manuscript, we have used a prototype system of the BrainScaleS-2 architecture with 32 synapses per neuron. At full scale, the system features 256 synapses per neuron, with the additional option of merging multiple neuron circuits into larger logical entities in order to increase their overall fan-in (Aamir et al., 2018a), similarly to its predecessor BrainScaleS-1 (Schemmel et al., 2010).

Figure 1: The analog neuromorphic core contains neuronal and synaptic circuits, which are accompanied by, inter alia, an analog parameter storage and the CADC for digitizing synaptic correlation data. It is surrounded by digital logic which interfaces the full-custom circuits and handles configuration data as well as spike traffic. The PPU is closely attached to the analog core, allowing it to access synaptic weights, address labels, and digitized correlation traces from the CADC.

In this paper we present an efficient structural plasticity mechanism and an associated on-chip implementation for the BrainScaleS-2 system, which directly exploits the synapse array's architecture. It leverages the fact that the network connectivity is partially defined and resolved within each synapse, which is enabled by local event filtering. The update algorithm is implemented on the embedded plasticity processor, which directly interfaces the synaptic memory through a vector unit. This near-memory design allows efficient parallel updates to the network's topology and weights.
Our approach enables fully local learning in a sparse connectome while inherently keeping the synaptic fan-in of a neuron constant. We further demonstrate its ability to optimize the network topology by forming clustered receptive fields and study its robustness with respect to sparsity constraints and choice of hyperparameters. While enabling an efficient, task-specific allocation of synaptic resources through learning, we also point out that our implementation of structural plasticity is computationally efficient in itself, requiring only a small overhead compared to the computation of, e.g., synaptic weight updates.

Methods
The BrainScaleS-2 architecture, which we discuss in section 2.1, provides all features required to implement flexible plasticity rules, including our proposed mechanism for structural reconfiguration. Section 2.2 describes the algorithm for pruning and reassignment of synapses as well as an optimized implementation thereof. This structural plasticity scheme can be coupled with various weight dynamics. In this work, we employ a correlation-based weight update rule, which is described in section 2.3. The combination of both is tested in a supervised classification task, as outlined in section 2.4.

Figure 2: Events are identified with an address denoting their source (numbered and marked by color). Spike trains from different origins can be overlayed and injected into a single synapse row. Synapses filter afferent events by comparing the source address to a label stored in their local SRAM and forward only matching spikes to the postsynaptic neurons. Addresses and labels can be reconfigured by the PPU to implement weight dynamics and structural changes.

BrainScaleS-2 architecture
BrainScaleS-2 is a family of mixed-signal neuromorphic systems implemented in a 65 nm process (Fig. 1). It is centered around an analog neural network core implementing neuron and synapse circuits that behave similarly to their biological archetypes. State variables such as membrane potentials and synaptic currents are physically represented in the respective circuits and evolve in continuous time. Leveraging the intrinsic capacitances and conductances of the technology, time constants of neuron and synapse dynamics are rendered 1000 times smaller compared to typical values found in biology. This thousandfold acceleration facilitates the execution of time-consuming tasks, such as performing high-dimensional parameter sweeps, the investigation of learning and meta-learning, or statistical computations requiring large volumes of data (Cramer et al., 2019; Bohnstingl et al., 2019). The analog core features 32 silicon neurons (Aamir et al., 2018b) implementing leaky integrate-and-fire (LIF) dynamics C_m dV_m/dt = −g_l (V_m − E_l) + I_syn, where V_m represents the membrane potential, C_m the membrane capacitance, g_l the leak conductance, and E_l the resting potential. Synaptic currents I_syn are modeled as superpositions of spike-triggered exponential kernels. The membrane is connected to a reset potential by a programmable conductance for a finite refractory period as soon as the membrane potential crosses a firing threshold V_th. All neurons are individually configurable via an on-chip analog parameter memory (Hock et al., 2013) and a set of digital control values.
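The LIF dynamics can be sketched with a simple forward-Euler integrator. All parameter values below are illustrative assumptions (chosen so that the membrane time constant C_m/g_l is an accelerated 1 µs), not calibrated hardware constants:

```python
import numpy as np

def simulate_lif(i_syn, c_m=2e-12, g_l=2e-6, e_l=0.0, v_th=0.5,
                 v_reset=0.0, t_ref=1e-6, dt=1e-8):
    """Forward-Euler integration of C_m dV_m/dt = -g_l (V_m - E_l) + I_syn.

    Returns the membrane trace and the indices of emitted spikes.
    """
    v, refractory = e_l, 0
    trace, spikes = [], []
    for step, i in enumerate(i_syn):
        if refractory > 0:
            refractory -= 1        # membrane clamped during refractoriness
            v = v_reset
        else:
            v += dt / c_m * (-g_l * (v - e_l) + i)
            if v >= v_th:          # threshold crossing: spike and reset
                spikes.append(step)
                refractory = int(t_ref / dt)
                v = v_reset
        trace.append(v)
    return np.array(trace), spikes
```

With a constant suprathreshold input current, this toy model fires regularly; with zero input, the membrane stays at rest.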
Each neuron is associated with a column of 32 synapse circuits (Friedmann et al., 2017), which receive their inputs from the chip's digital backend. Incoming events are tagged with addresses, which denote their presynaptic origins (Fig. 2). A 6 bit label is stored alongside the 6 bit weight in the synapse-local static random-access memory (SRAM). It allows filtering afferent spike trains by their addresses: only an event matching the locally stored label is forwarded to the postsynaptic neuron circuit. Each synapse also implements an analog circuit for measuring pairwise correlations between pre- and postsynaptic spike events (Friedmann et al., 2017), enabling access to various forms of learning rules based on nearest-neighbour spike-timing-dependent plasticity (STDP). The analog correlation traces are made accessible by the column-parallel analog-to-digital converter (CADC).

Figure 3: Illustration of weight dynamics. The evolution of synaptic weights is governed by a Hebbian potentiation term and a regularizing force of opposing sign. A stochastic component in the weight update term leads to a random walk. Synapses with an efficacy below the pruning threshold θ_w are regularly reassigned to new receptors, allowing neurons to find more informative presynaptic partners, to which the connections can then be strengthened.
The versatility of the BrainScaleS-2 architecture is substantially augmented by the incorporation of a freely programmable embedded microprocessor (Friedmann et al., 2017). Together with the single instruction, multiple data (SIMD) vector unit, which is tightly coupled to the synapse array's SRAM controller and the CADC, it forms the plasticity processing unit (PPU), which allows efficient control of synaptic plasticity. Access to the on-chip configuration bus further allows the processor to also reconfigure all other components of the neuromorphic system during experiment execution. The PPU can thus be used for a vast array of applications such as near-arbitrary learning rules, on-line circuit calibration, or the co-simulation of an environment capable of continuous interaction with the network running on the neuromorphic core. On the prototype system used in this work, the plasticity processor runs with a frequency of 100 MHz. Its SIMD unit operates in parallel on slices of 16 synapses.
A field-programmable gate array (FPGA) is used to interface the application-specific integrated circuit (ASIC) with a host computer. It also provides sequencing mechanisms for experiment control and spike handling. Our experiments were based on this paradigm. However, it was shown that the PPU can replace all of the FPGA's functionality during experiment runtime (Wunderlich et al., 2019), dramatically reducing the overall system's power consumption. In this case, the FPGA is only used for initial configuration as well as to read out and store observables for later analysis and visualization. This is an essential prerequisite for the scalability of the BSS-2 architecture.
Algorithm 1: Plasticity algorithm including weight updates and structural reconfiguration. The update algorithm is applied iteratively to the synapse rows. Synapses within a row are processed in parallel. The PPU supports SIMD vector instructions including arithmetic operations and access to the synaptic memory (synram_weights_{read,write}(), synram_labels_write()) and CADC data (correlation_read()). It also has access to the neuronal firing rates (rates_read()) and uniform pseudo-random number generators (rng()).
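Since the listing of Algorithm 1 itself did not survive extraction, the following Python sketch reconstructs its row-wise structure using mock stand-ins for the named PPU primitives. Array sizes, hyperparameters, and the exact arithmetic of the weight update are our assumptions, not the original on-chip code:

```python
import numpy as np

rng = np.random.default_rng(1234)

# Toy stand-ins for the PPU primitives named in the caption. On hardware
# these operate on 16-synapse vector slices; here a whole row is
# processed at once. All names and sizes are illustrative.
N_ROWS, N_COLS, N_SOURCES = 4, 32, 8
weights = rng.integers(0, 20, size=(N_ROWS, N_COLS))   # 6 bit weights
labels = rng.integers(0, N_SOURCES, size=(N_ROWS, N_COLS))

def synram_weights_read(row):     return weights[row].copy()
def synram_weights_write(row, w): weights[row] = w
def synram_labels_write(row, l):  labels[row] = l
def correlation_read(row):        return rng.integers(0, 16, size=N_COLS)
def rates_read():                 return rng.integers(0, 8, size=N_COLS)
def rng_read():                   return rng.integers(-2, 3, size=N_COLS)

THETA_W, W_INIT, W_MAX = 4, 2, 63   # assumed hyperparameters

def plasticity_update(row, prune_now):
    """One iteration of the row-wise update loop of Alg. 1 (sketch)."""
    w = synram_weights_read(row)
    # Hebbian term from the correlation sensors, a rate-dependent
    # regularizer, and a random walk (cf. section 2.3).
    dw = correlation_read(row) - rates_read() * w // 16 + rng_read()
    w = np.clip(w + dw, 0, W_MAX)
    if prune_now:
        # Prune weak synapses and reassign them to randomly drawn
        # presynaptic partners by rewriting the address labels; the
        # postsynaptic in-degree stays constant.
        prune = w < THETA_W
        synram_labels_write(row, np.where(
            prune, rng.integers(0, N_SOURCES, size=N_COLS), labels[row]))
        w = np.where(prune, W_INIT, w)
    synram_weights_write(row, w)
```

Iterating `plasticity_update` over all rows each epoch, with `prune_now` set every fifth epoch, mirrors the schedule used in the experiments.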

Pruning and reassignment of synapses
We propose a mechanism and an optimized hardware implementation for structural plasticity inspired by two well-established biological observations. First, we assume that important, informative synapses have larger absolute weights. In our particular setting, this is achieved by Hebbian learning, augmented by slow unlearning, as outlined in Section 2.3, but this assumption holds for many other plasticity mechanisms as well (Oja, 1982;Urbanczik and Senn, 2014;Frémaux and Gerstner, 2016;Mostafa, 2017;Zenke and Ganguli, 2018). Second, we enable the network to manage its limited synaptic resources towards potentially improving its performance by removing weak synapses and creating new ones instead.
A synapse's eligibility for pruning is determined by the value of its weight: it is removed in case its efficacy falls below a threshold θ_w (Fig. 3). Whenever an afferent synapse is removed, the postsynaptic neuron replaces it with a connection to a randomly selected presynaptic partner, thus conserving its in-degree. The newly created synapse is initialized with a low weight w_init. The pruning process takes place on a slower timescale than the network dynamics and weight updates, giving the synaptic weights time to develop and integrate over multiple update periods.
The implementation on BrainScaleS-2 exploits an in-synapse resolution of the connectome. Each event carries a label denoting its origin, allowing synapses to distinguish different sources. A synapse filters afferent spike trains by comparing this event address to the locally stored value and forwards only matching events to its postsynaptic neuron. Pruning and reassigning of synapses is implemented by remapping the label stored in the synapse-local SRAM, which effectively eliminates the previous connection.
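The in-synapse event filtering that underlies this mechanism can be illustrated in a few lines of Python; the `Synapse` class and `forward_events` helper are hypothetical, but mirror the label-matching behavior described above:

```python
from dataclasses import dataclass

@dataclass
class Synapse:
    label: int   # 6-bit source address this synapse listens to
    weight: int  # 6-bit weight

def forward_events(events, synapse_row):
    """Each synapse in a row sees all events injected into that row and
    forwards only those whose address matches its stored label.

    `events` is a list of (time, address) tuples; the function returns,
    per synapse, the weighted events reaching its postsynaptic neuron.
    Rewriting `Synapse.label` reroutes the connection, which is exactly
    how pruning and reassignment are realized.
    """
    out = []
    for syn in synapse_row:
        out.append([(t, syn.weight) for (t, addr) in events if addr == syn.label])
    return out
```

A synapse whose label matches no active source is effectively dormant, which is how unrealized connections are represented.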
Compared to other synaptic pruning and reassignment strategies, our algorithm and implementation of structural plasticity requires a particularly low overhead. Due to the in-synapse definition of the connectome and the resulting locality of the reassignment mechanism, we can avoid global access patterns; e.g., no reordering of routing tables is required, which could otherwise lead to increased computational complexity. At its core, reassignment only involves a single SRAM access. Also the evaluation of the pruning condition and the selection of a new presynaptic partner can be realized with just a few simple instructions (Alg. 1).

Correlation-driven weight update algorithm
The synaptic reassignment algorithm described above is accompanied by Hebbian weight dynamics. The temporal evolution of the synaptic weights, illustrated in Fig. 3, is governed by the update rule (Eqn. 1), which consists of three terms. The first term represents an implementation of STDP and depends on the post- and presynaptic spike trains, defined as vectors of ordered spike times. The STDP kernel is exponential and positive for causal presynaptic spikes and zero for anti-causal ones, with a cutoff at a maximum value (Eqn. 2). The second term implements homeostasis (by penalizing large postsynaptic firing rates) and forgetting (as an exponential decay). This regularizer encourages competition between the afferent synapses of a neuron. The third term induces exploration by means of a uniformly drawn random variable leading to an unbiased random walk. The three components are weighted with individual positive factors.
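The typeset equations (Eqns. 1 and 2) did not survive extraction; the following LaTeX is a hedged reconstruction from the verbal description above, with all symbol names (η₊, η₋, η_r, ν_i, β, a_max, τ₊) chosen by us rather than taken from the original:

```latex
% Hedged reconstruction of Eqns. (1) and (2); symbol names are assumptions.
\begin{align}
  \Delta w_{ij} &= \eta_{+} \sum_{t^i_\mathrm{post},\, t^j_\mathrm{pre}}
      K\!\left(t^i_\mathrm{post} - t^j_\mathrm{pre}\right)
    - \eta_{-} \left(\nu_i + \beta\right) w_{ij}
    + \eta_{r}\, \xi\,, \quad \xi \sim \mathcal{U}(-1, 1) \\
  K(\Delta t) &= \Theta(\Delta t)\,
      \min\!\left(a_\mathrm{max},\, e^{-\Delta t / \tau_{+}}\right)
\end{align}
```

Here Θ denotes the Heaviside step function (the kernel vanishes for anti-causal pairings), ν_i the postsynaptic firing rate (the homeostatic penalty), and β a constant implementing the exponential decay, i.e. forgetting.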
All three contributions to the weight update rule can be mapped to specialized hardware components. The STDP-derived term is based on correlation traces. These observables are measured in analog synapse-local circuits and then digitized using the CADC (section 2.1).
As stated above, the correlation values are capped. This is required to reduce the imbalance introduced by fixed-pattern deviations in the correlation measurement circuits' sensitivity, as some of these analog sensors might systematically detect stronger correlation values than others. This can lead to an overly strong synchronisation of the respective receptor and label neurons, in turn resulting in a self-amplifying potentiation of the corresponding weight and a resulting dominance over the teacher spike train. In principle, a decrease of the Hebbian scaling factor could dampen such feedback, but the corresponding reduction of the exponential STDP kernel can be difficult to reconcile with fixed-point calculations of limited precision.
The homeostatic component requires access to the postsynaptic firing rates, which are read from spike counters via the on-chip configuration bus. Stochasticity is provided by an xorshift algorithm (Marsaglia, 2003) implemented in software; later versions of the system feature hardware acceleration for the generation of pseudo-random numbers. The individual contributions are processed and accumulated on the embedded processor: using the SIMD vector unit, it is able to handle slices of 16 synapses in parallel.

Figure 4: (A) The two-layer network consists of a group of receptors and a label population. One teacher per label neuron ensures excitation of the correct labels during learning. The inputs project onto the label layer with a potential all-to-all connectivity (gray), but only a subset of synapses is realized (blue). (B) The receptors are uniformly distributed on the two-dimensional feature space, which is spanned by the petal widths and lengths of Iris flowers belonging to the three classes setosa, versicolor, and virginica. A receptor's activity is calculated from its Euclidean distance to a data point according to a triangular kernel with radius r.

Classification task
We applied the presented plasticity mechanism, including structural reconfiguration, to a two-layer network trained to perform a classification task. The network consisted of a group of spike sources in a receptor layer and a set of label neurons. These layers were set up such that every postsynaptic neuron could potentially receive input from any presynaptic partner in the receptor layer. Only a fixed fraction of these potential synapses was expressed at each point in time; the others were dormant, resulting in a sparse connectome. In addition to the feed-forward connections, label neurons were stimulated by teacher spike sources. These supervisory projections ensured excitation of a label neuron whenever an input belonging to its respective class was presented.
The network was trained on the Iris dataset (Fisher, 1936). We reduced the four-dimensional dataset to two dimensions by selecting petal widths and lengths, renormalized to values between 0.2 and 0.8. The resulting two-dimensional feature space is shown in Fig. 4 B. On this plane, virtual receptors were placed at random locations drawn from a uniform distribution. These receptor neurons emitted Poisson-distributed spike trains with an instantaneous rate determined by their respective Euclidean distances to a presented data point. The firing rate was calculated according to a triangular kernel ν(d) = ν̂ · max(0, 1 − d/r), with ν̂ = 50 kHz. This corresponds to a biologically plausible firing rate of 50 Hz when taking the system's speedup into account. The radius r of the receptors was scaled inversely with the square root of the receptor count to ensure a reasonable coverage of the feature space.
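The receptor activation can be sketched as follows; the function names are ours, and the fixed radius value stands in for the square-root scaling with the receptor count:

```python
import numpy as np

def receptor_rates(receptors, sample, nu_max=50e3, radius=0.2):
    """Triangular kernel nu(d) = nu_max * max(0, 1 - d / radius).

    `receptors` is an (N, 2) array of positions in the normalized
    feature plane, `sample` a single 2D data point.
    """
    d = np.linalg.norm(receptors - np.asarray(sample), axis=1)
    return nu_max * np.maximum(0.0, 1.0 - d / radius)

def poisson_spike_counts(rates, duration, rng):
    """Draw Poisson spike counts for one stimulus presentation."""
    return rng.poisson(rates * duration)
```

A receptor sitting exactly on the data point fires at the full 50 kHz hardware rate, while receptors farther away than the radius stay silent.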
To impose a certain level of sparsity, we used the following procedure. Receptors were randomly grouped into disjoint bundles of size s, and each bundle was injected into a single synapse row. Within a bundle, each receptor was assigned a unique address. The sparsity of the connectome, defined as the ratio between the number of unrealized synapses and the number of potential synapses, was thus set to 1 − 1/s = 1 − n/N, where n denotes the number of synapses per neuron and N = s · n the number of receptors. This setup allowed two degrees of freedom in the control of network sparsity (Fig. 7). Increasing the number of receptors N for a fixed synapse count n increased the bundle size s and thus the sparsity as well. On the other hand, for constant sparsity 1 − 1/s, reducing the synapse count n incurred a reduction of the receptor count N. The dataset, containing a total of 150 data points, was randomly divided into 120 training and 30 test samples to allow cross-validation. Samples were presented to the network in random order. For each presented data point, the network's state was determined by a winner-take-all mechanism implemented in software, which compared the firing rates of the label neurons. Synaptic weights were updated according to Eqn. 1 after each epoch. The pruning condition was evaluated regularly every five epochs.

Figure 5: (A) Exemplary evolution of the realized afferent weights of the "setosa" label neuron during the course of a single experiment. The line color is determined by the average feature-space distance between the respective receptor and all "setosa" data points. Synapses that receive inputs from relevant receptors (i.e., those lying close to the features that are relevant for their postsynaptic label neuron) are strengthened towards values that lie above the pruning threshold θ_w. All other, less informative synapses remain below θ_w and are pruned at regular intervals of five epochs. For each pruned synapse, a new one is initialized at w_init, between the same label neuron and a previously unconnected receptor. (B) Distribution of synaptic weights during the last 50 epochs over 20 randomly initialized runs. Note that the histogram only takes into account realized synapses, which at all times are only 18 out of 144 potential ones. (C) Exemplary evolution of all synaptic weights between the receptor population and the "setosa" label neuron. At all times, only N/s = 6 synapses are realized. The transition from blue to red marks the pruning threshold θ_w. Note how gray/blue (subthreshold) and white (non-existent) states alternate, marking the pruning of weak synapses and re-initialization of new ones. One of these reassignments is highlighted and referenced to the corresponding threshold crossing in panel A. (D) Evolution of the turnover rate (fraction of pruned synapses per epoch) for the 20 runs. The solid line marks the mean and the gray area represents the 20th and 80th percentiles. As time progresses, the turnover rate converges to approximately 20 %, indicating that all relevant receptors (on average five) have been found. The remaining "free" synapses (on average one) keep switching between all other receptors, but are pruned regularly as they are not informative for the respective class.

Figure 6: Self-organized formation of receptive fields. The probability of synapse expression depends on the location of receptors in the feature space and the class of label neurons. Each square is shaded according to the probability for a label neuron to have formed a synapse with a receptor lying within that area (lighter for higher probability), estimated from the state at the end of training in 100 experiments with random initial conditions. The size of the three emerging clusters is determined by the receptor radius r.
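The bundling procedure can be sketched as follows; `make_bundles` is a hypothetical helper illustrating how the sparsity 1 − 1/s arises from grouping receptors into rows:

```python
import numpy as np

def make_bundles(n_receptors, bundle_size, rng):
    """Randomly group receptors into disjoint bundles of size s; each
    bundle shares one synapse row, and receptors within a bundle get
    unique addresses. Connectome sparsity is then 1 - 1/s, since only
    one of the s sources per row can be realized per synapse.
    """
    assert n_receptors % bundle_size == 0
    perm = rng.permutation(n_receptors)
    bundles = perm.reshape(-1, bundle_size)   # one row per synapse row
    addresses = np.tile(np.arange(bundle_size), (bundles.shape[0], 1))
    sparsity = 1.0 - 1.0 / bundle_size
    return bundles, addresses, sparsity
```

For example, 48 receptors with a bundle size of 8 yield 6 synapse rows per neuron and a sparsity of 7/8, matching the setup shown in Fig. 5.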

Results
In this section, we describe experimental results of learning on the BrainScaleS-2 prototype using the plasticity rule and classification task outlined above. We evaluated the network's performance under varied sparsity constraints and performed sweeps on the hyperparameters to study the robustness of the learning algorithm and demonstrate its efficient use of limited synaptic resources. Moreover, we highlight the speed of our structural plasticity algorithm, especially in conjunction with its implementation on the BrainScaleS-2 system.

Self-configuring receptive fields
Depending on the nature of the data to be learned, i.e., the distribution of data points in the feature space, some receptors can be more informative than others (Fig. 4 B). Our learning rule naturally selects the most informative receptors, thereby creating a topological order of the label neurons' receptive fields. This clustering of receptors is driven by the synaptic weight evolution as described by Eqn. 1 (Fig. 3). Fig. 5 shows this evolution during the course of an experiment. Starting from their initial values, synapses that contributed causally to the firing of their postsynaptic neurons were potentiated. After escaping the pruning threshold, they continued evolving until reaching an equilibrium with the homeostatic force. Weaker connections were regularly pruned and reassigned; the common initialization value manifests itself in a strongly pronounced peak in the weight distribution (Fig. 5 B).
The turnover rate, defined as the fraction of pruned synapses, also reflects the formation of receptive fields. As the receptors were randomly initialized at the beginning of the experiment, they did not reflect the spatial distribution of the dataset. This resulted in frequent pruning, indicated by a high turnover rate. Over time, a set of stable synapses was formed and the turnover rate gradually decreased. The topology of the emergent connectome can be reconstructed from the synaptic labels. By repeating the experiment with varying seeds and therefore initial conditions, it is possible to calculate a probability density for a synapse to be expressed at a given point on the feature plane. This map closely resembles the distribution of the presented data (Fig. 6): the receptive fields of the respective label neurons cluster around the corresponding samples. The radius of these clusters is determined by the spread of the data as well as the support and shape of the receptors' kernels.

Increased network performance with structural plasticity
During the course of the training phase, the network's performance was repeatedly evaluated by presenting the test data to the receptor layer. In this phase, the network's weights and connectome were frozen by disabling weight updates and structural modifications. To test the network's ability to generalize and to reduce the impact of the specific positioning of receptors or initial conditions, we trained and evaluated the network starting from 20 randomly drawn initial states. The evolution of the network's accuracy can be observed in Fig. 7 A. Starting from approximately chance level, the performance increased during training and converged to a stable value. In this specific experiment, we swept the bundle size s while keeping the number of utilized synapse rows n constant, resulting in a variable number of receptors N = s · n. This corresponds to a scenario where the limited afferent synaptic resources per neuron are fully utilized and structural plasticity is required to expand the number of virtual presynaptic partners. For s = 1 the network was trained without structural reconfiguration and only had access to a small pool of receptors, resulting in a correspondingly low performance. As more receptors became available, the classification accuracy increased as well, up to 92.3 % for structural plasticity with a bundle size of s = 8.
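The software winner-take-all readout used for these accuracy measurements can be sketched as follows (our own minimal formulation, not the authors' code):

```python
import numpy as np

def classify(label_rates):
    """Winner-take-all readout: the label neuron with the highest
    firing rate determines the predicted class."""
    return int(np.argmax(label_rates))

def accuracy(rate_matrix, targets):
    """Fraction of samples for which the WTA prediction matches the
    target; `rate_matrix` has shape (n_samples, n_labels)."""
    predictions = np.argmax(rate_matrix, axis=1)
    return float(np.mean(predictions == np.asarray(targets)))
```

Because the readout only compares rates, it needs no additional trained parameters; all learning happens in the sparse feed-forward connectome.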
In a second sweep, we kept the number of receptors N constant and varied the bundle size s. This resulted in a variable number of realized synapses and hence different levels of sparsity. The classification accuracy's evolution for s ∈ {2, 4, 8} is shown in Fig. 7 B. The network achieved a comparable performance of approximately 92 % for all of the sparsity levels. In this experiment, we showed that learning with structural plasticity allows reducing the utilization of synaptic resources while conserving the overall network performance. These results demonstrate that our learning algorithm enables a parsimonious utilization of hardware resources. The resulting pool of "free" synapses can then be used for other purposes, such as the realization of deeper network structures. For larger receptor pools, we also note that learning converges more slowly, as the label neurons need more time to explore their respective receptive fields (Fig. 7 A,B).
Both of the aforementioned experiments can be embedded into a more extensive sweep over receptor counts N and bundle sizes s. In Fig. 7 C, the two experiments correspond to the dashed and dotted lines, respectively. Classification performance primarily depends on the number of available receptors and, to a much lesser extent, on the amount of utilized hardware resources. For the employed classification task, six synapses were sufficient to reach levels of accuracy otherwise only attainable with 32 or more synapses.
The network's performance also depends on the selection of hyperparameters for the learning rule. Since the pruning condition is based on the synaptic weights, the selection of the pruning threshold must take into account the distribution of learnt efficacies (Fig. 5). Thus, θ_w must be high enough to allow uninformative synapses to be pruned, but low enough not to affect previously found informative synapses. Fig. 8 displays different performance metrics as a function of the pruning threshold. These analyses are shown for varied strengths of the regularizing term, as the weight distribution and scale depend on the balance between the positive Hebbian force and this negative one. All three metrics exhibit broad plateaus of good performance, which coincide over a relatively wide range of θ_w.

On-the-fly adaptation to switching tasks
As demonstrated, structural plasticity enables learning in sparse networks by exploring the input space and forming informative receptive fields. So far we have considered experiments with a randomly initialized connectome and, most importantly, a homogeneous weight distribution. In another experiment, we tested the plasticity mechanism's ability to cope with a previously learned and therefore already structured weight distribution. We achieved this by abruptly changing the task during training. After 200 epochs, the receptors were moved to new, random locations, resulting in a misalignment of receptive fields and data points. The plasticity rule was executed continuously, before and after this task switch. As shown in Fig. 9, the accuracy dropped to approximately chance level as the receptors were shuffled. This decline, however, was directly followed by a rapid increase of the turnover rate. The negative contribution of the regularization term outweighed the Hebbian forces, thereby resulting in decreasing synaptic efficacies. After a few epochs, most of the weights had fallen below the pruning threshold θ_w and were eligible for pruning. This process allowed the network to successfully unlearn previous connections, thus rekindling exploration of the input space.

Fast and efficient hardware emulation
In our proposed implementation, structural reconfiguration only induces a small computational overhead. Synaptic pruning and reassignment is enabled by exploiting the synaptic filtering of spike events by their source address. Since the connectome is essentially defined by the address labels stored in the synapses' memory, it can also be reconfigured with local operations only.
The algorithm can effectively be dissected into four steps (Alg. 1): accessing the synaptic weights, evaluation of the pruning condition, potential reassignment of the synaptic label, and a final write access to the synapse SRAM. The exact time required for executing the respective instructions depends on the neuromorphic system's architecture and the design of the plasticity processing unit. In general, memory access and the generation of pseudo-random numbers can be regarded as the most expensive operations. The former primarily depends on the system's design and can be optimized for low access times. Random number generation can also be sped up by implementing dedicated hardware accelerators.

Figure 9: Restoration of network performance after task switch. After training for 200 epochs, the receptor layer is randomly rearranged, leading to a mismatch in receptive fields. Ongoing structural plasticity unlearns the previously established connectome and quickly starts to again explore the input space. This process can be observed in an elevated turnover rate after the task switch, similar to the initial phase of the experiment.
Our implementation on BrainScaleS-2 is enabled by the PPU and its tight coupling to the neuromorphic core. Access to the synapse array as well as arithmetic operations are optimized by a parallel processing scheme. Performing a structural plasticity update on a single slice of 16 synapses takes approximately 110 clock cycles, which corresponds to 1.1 µs at a PPU clock frequency of 100 MHz (Fig. 10). This amounts to about seven clock cycles, or 69 ns, per synapse. In comparison, the Hebbian term, which is executed five times more often, requires approximately 3.8 µs for a slice or 240 ns per synapse. The regularizer and random walk take 69 ns and 97 ns per synapse, respectively. These terms were realized separately and not particularly optimized for performance; sharing memory accesses or intermediate results between them would lead to an overall speedup of the plasticity mechanism.
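These figures are mutually consistent, as a quick back-of-the-envelope check shows:

```python
# Consistency check of the quoted timings at a 100 MHz PPU clock.
CLOCK_HZ = 100e6
SLICE_SIZE = 16

cycles_per_slice = 110
slice_us = cycles_per_slice / CLOCK_HZ * 1e6          # -> 1.1 us per slice
per_synapse_cycles = cycles_per_slice / SLICE_SIZE    # -> ~6.9 cycles
per_synapse_ns = per_synapse_cycles / CLOCK_HZ * 1e9  # -> ~69 ns
```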
The time spent on the generation of pseudo-random numbers, highlighted in Fig. 10, constitutes a significant portion of both the random walk and the pruning term. On the full-size BrainScaleS-2 system, hardware accelerators allow this contribution to be reduced to a comparatively negligible 0.08 clock cycles per synapse.
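The accelerator design itself is not detailed here; as an illustration of why pseudo-random numbers are cheap to produce in dedicated hardware, a Marsaglia xorshift generator, a common choice for lightweight hardware PRNGs, requires nothing but shifts and XORs:

```python
def xorshift32(state):
    """One step of a 32-bit xorshift generator (Marsaglia, 2003): three
    shifts and three XORs, all of which map onto trivial logic circuits.
    Shown only to illustrate the low cost of hardware random numbers."""
    state ^= (state << 13) & 0xFFFFFFFF
    state ^= state >> 17
    state ^= (state << 5) & 0xFFFFFFFF
    return state
```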
Hence, our implementation of structural plasticity is doubly efficient. Not only can it effectively optimize the utilization of synaptic resources, but it can also achieve this at the cost of only a small overhead to the calculation of synaptic weight updates (Fig. 10).
The accelerated nature of the BrainScaleS-2 system also contributes to a rapid evaluation of plasticity schemes in general and structural reconfiguration in particular. Emulating a single epoch, corresponding to 24 ms of biological time, required a total of 137 µs on our system. Excluding the overhead induced by on-the-fly generation of input spike trains in Python, this number drops to less than 50 µs, which corresponds to a speedup factor of about 500. As shown by Wunderlich et al. (2019), this overhead can be dramatically reduced by porting the experiment control from the host and FPGA to the PPU. This further makes it possible to reduce the system's power consumption to below 60 mW, with only a weak dependence on the nature of ongoing network activity and plasticity (Wunderlich et al., 2019).

Figure 10: Contributions of the individual terms to the overall update duration, taking into consideration that pruning and reassignment are executed five times less often than synaptic weight updates.
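For reference, the speedup arithmetic, assuming an epoch spans 24 ms of biological time (the value consistent with the quoted factor of about 500):

```python
# Sketch of the speedup calculation and the host-side overhead.
bio_time_s = 24e-3            # biological time per epoch (assumed 24 ms)
wall_time_s = 50e-6           # wall-clock upper bound without Python overhead
speedup = bio_time_s / wall_time_s        # -> 480, i.e. about 500

overhead_us = 137.0 - 50.0    # time spent generating input spike trains
```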

Discussion
We have presented a fully local structural plasticity mechanism together with an efficient implementation on a prototype of the BrainScaleS-2 architecture. The algorithm makes it possible to train a network with a sparse connectome, thereby utilizing synaptic resources more efficiently. We showcased this implementation in a supervised learning task with weight updates driven by Hebbian potentiation. For this classification task, it was possible to drastically increase the sparsity of the connectome without significant performance loss. Self-configuring receptive fields led to near-perfect accuracy and a better utilization of synaptic resources without prior knowledge of the input data. Structural plasticity has been successfully applied to networks with various topologies and learning paradigms (Butz et al., 2009;George et al., 2017;Bogdan et al., 2018;Kappel et al., 2015). Some of these approaches were designed to mimic biological findings, others focused on computational principles. For some of the aforementioned work, neuromorphic implementations already exist. A processor-based solution was proposed to augment a real-time analog neuromorphic ASIC (George et al., 2017). However, structural reconfiguration only took place on an FPGA and acted on spike trains before injecting them into the neuromorphic substrate. Fully digital approaches were demonstrated on the two SpiNNaker generations (Bogdan et al., 2018;Yan et al., 2019). In particular, the addition of hardware accelerators enabled an efficient implementation on the second generation of the system. The implementation of sparsity on SpiNNaker was similar to ours, but was applied to the synaptic fan-out rather than the fan-in. Also on SpiNNaker-2, a nonspiking deep learning framework incorporating structural plasticity demonstrated efficiency gains for a functional network. It did, however, not use an optimized memory layout and thus introduced a large overhead for rewiring.
Similar plasticity schemes have also been demonstrated on the BrainScaleS platforms (Wunderlich et al., 2019;Schmitt et al., 2017). The presented implementation of structural rewiring can be employed in many of these frameworks, where the pruning of low-weight synapses is not detrimental or can even be beneficial to the overall network performance. Our approach can alleviate the ubiquitous issue of limited fan-in, whether plasticity calculations are performed on- or off-chip. In the latter case, it is particularly appealing due to its low computational overhead.
We note that the accelerated nature of the BrainScaleS-2 system is especially relevant in the context of modeling biological rewiring processes. In vivo, structural changes to the connectome typically take place on time scales of hours to days (Lamprecht and LeDoux, 2004), which allows synapses to process large amounts of information and evolve accordingly before being potentially pruned. This throughput of information, essentially spikes, per unit of time is directly contingent on the specific time constants of the neuro-synaptic dynamics. Consequently, the acceleration factor of the BrainScaleS-2 system also translates directly into a corresponding speedup of structural plasticity.
Our implementation scales well with growing system sizes, since it is fully based on synapse-local quantities. In particular, it profits directly from the parallel handling of synaptic updates. On such large systems, this would especially benefit the more complex network structures and associated larger synapse arrays required when tackling more difficult tasks.

Contributions
SB and BC conceived the idea and designed, implemented, and executed the experiments. MP and DK discussed the ideas and results. KS designed the experiment setup. JS, as the lead designer and architect of the neuromorphic platform, conceived and implemented the synapse circuits. All authors discussed and contributed to the manuscript.