Computational properties of multi-compartment LIF neurons with passive dendrites

Mixed-signal neuromorphic computers often emulate some variant of the LIF neuron model. While, in theory, two-layer networks of these neurons are universal function approximators, single-layer networks consisting of slightly more complex neurons can, at the cost of universality, be more efficient. In this paper, we discuss a family of LIF neurons with passive dendrites. We provide rules that describe how input channels targeting different dendritic compartments interact, and test to what extent these interactions can be harnessed in a spiking neural network context. We find that a single layer of two-compartment neurons approximates some functions with smaller errors than similarly sized hidden-layer networks. Single-layer networks with three-compartment neurons can approximate functions such as XOR and four-quadrant multiplication well; adding more compartments only offers small improvements in accuracy. From the perspective of mixed-signal neuromorphic systems, our results suggest that only small modifications to the neuron circuit are necessary to construct more computationally powerful and energy-efficient systems that move more computation into the dendritic, analogue domain.


Introduction
Biological neurons possess intricately detailed dendrites. Both theoretical and empirical studies suggest that the distribution of synaptic sites along these dendrites has a significant impact on neural computation [1][2][3]. Still, there is no widely accepted high-level theory of dendritic computation that, for example, operates at a similar level of abstraction as the LIF neuron [4].
A better understanding of dendritic computation is not only of interest to neurobiologists. Being able to systematically harness nonlinear dendritic interaction for computation can inform neuromorphic hardware design and thereby increase the computational power of neuromorphic computers at relatively small cost. This is particularly true for mixed-signal neuromorphic systems, that is, computers that use analogue model neurons in conjunction with a digital interconnect (e.g., [5][6][7][8]). As we demonstrate below, combining two or more model neurons into a multi-compartment neuron in the analogue domain can sometimes outperform multi-layer networks of the same size while reducing digital communication overhead.
Compared to the complex active dendritic mechanisms often discussed in the theoretical neuroscience literature (e.g., [4,9,10]), our notion of dendritic computation is decidedly simple. Specifically, we do not include active effects such as dendritic spikes in our model. Instead, we investigate to what degree passive nonlinear interaction between dendritic compartments provides a substantial computational advantage over standard LIF neurons. This results in a family of 'n-compartment LIF', or 'n-LIF' neurons. Forgoing active dendrites has the benefit of keeping neural dynamics mathematically tractable; however, our models do not possess the computational power ascribed to dendrites by the studies listed above. That is, an n-LIF neuron with fixed parameters is generally not as powerful as a multi-layer perceptron (MLP).
Our work is an extension of the neural engineering framework (NEF; [11]) and a direct continuation of our previous study on passive dendritic computation [12]. In a nutshell, the NEF is a method for constructing biologically constrained functional spiking neural networks. This allows researchers to ground high-level function

Function approximation in spiking neural networks
Before we analyse the potential computational benefits of multi-compartment neurons, we describe our approach to constructing neural networks more carefully. Note that the ideas presented in this section have previously been described in greater detail elsewhere [12,18].

Synaptic weight optimisation problem
We generally assume that our spiking neural networks are partitioned into neuron populations. Each population represents a vectorial quantity x over some compact set X ⊂ R^d. This representation is established by assigning a normative tuning constraint, or tuning curve, to each neuron i. The tuning curve defines the average activity a_i(x) that we expect a neuron to exhibit if its population represents a specific value x:

a_i(x) = G[J_i(x)] = G[α_i ⟨x, e_i⟩ + β_i].    (1)

Here, G[J] is a mapping from a somatic input current J onto an average firing rate, and J_i(x) maps from x onto a current. Lastly, α_i, β_i are a gain and bias, and e_i is a unit encoding vector.
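As a concrete illustration, this tuning-curve construction can be sketched in a few lines of NumPy. The rate function below is the standard LIF rate approximation; the parameter ranges and population size are illustrative assumptions, not values from the paper.

```python
import numpy as np

def lif_rate(J, tau_rc=20e-3, tau_ref=2e-3, J_th=1.0):
    """Standard LIF rate approximation G[J]: zero below threshold J_th."""
    J = np.asarray(J, dtype=float)
    rates = np.zeros_like(J)
    m = J > J_th
    rates[m] = 1.0 / (tau_ref + tau_rc * np.log1p(J_th / (J[m] - J_th)))
    return rates

# Tuning curves a_i(x) = G[alpha_i * <x, e_i> + beta_i] for a small,
# hypothetical one-dimensional population (n = 5 neurons).
rng = np.random.default_rng(0)
n, xs = 5, np.linspace(-1, 1, 101)
alpha = rng.uniform(1.0, 3.0, n)         # gains (illustrative range)
beta = rng.uniform(0.5, 1.5, n)          # biases (illustrative range)
e = rng.choice([-1.0, 1.0], n)           # unit encoders in one dimension
J = alpha[:, None] * (e[:, None] * xs[None, :]) + beta[:, None]
A = lif_rate(J)                          # shape (n, 101): one tuning curve per neuron
```

Each row of `A` is one neuron's tuning curve sampled over the represented range; diverse gains, biases, and encoders are what make the population a useful function basis.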

Multi-channel neurons
In this paper, we are mostly interested in neuron models with k > 1 input channels. That is, we do not just have a response curve G[J], but a response surface G[g] that maps some channel state g = (g_1, . . . , g_k) onto an average neural activity. However, when solving for synaptic weights W_i, we typically do not use G directly. Instead, we decompose the multivariate response surface G into functions G[J] and H(g) such that G[g] = G[H(g)]. Here, G[J] maps a current onto an average rate as before, and H(g) maps g onto a current. This decomposition allows us to solve for weights in current-space.
Importantly, we always use equation (1) to assign a tuning curve to each neuron, even for multi-channel neurons. We merely harness the additional degrees of freedom provided by multi-channel neurons to-as we discuss next-realise this tuning and to compute some functions more accurately.

Current-space weight optimisation
Consider two neuron populations representing quantities x ∈ X ⊂ R^d and y ∈ Y ⊂ R^d, respectively, where the population representing x projects onto the population representing y. By adjusting the synaptic weights between these two populations, we can approximate functions f : X → Y. In order to do this systematically, we need to find synaptic weights W_i ∈ R^{k×m} for each post-synaptic neuron i. These weights must both realise the normative tuning constraint, and, at the same time, approximate the desired function. To simplify this process, we assume that the pre-synaptic neurons already possess the desired tuning. We can then minimise the following loss:

W_i = arg min_W  Σ_{x ∈ X} E(J_i(f(x)), H(W a^pre(x))).    (2)

Here, a^pre ∈ R^m are the combined activities of the pre-populations according to equation (1), and f is the function that we would like to compute in the connection from the pre- to the post-population. Furthermore, E is the superthreshold error function, defined as follows:

E(J_tar, J_dec) = (J_tar − J_dec)²  if J_tar > J_th,
E(J_tar, J_dec) = (max(0, J_dec − J_th))²  otherwise.    (3)

This function relaxes the optimisation problem in case the target current J_tar is below the neuron's firing threshold J_th. In this case, the magnitude of the decoded current J_dec does not matter, as long as it is below the firing threshold. We refer to including E in equation (2) as subthreshold relaxation; this can substantially reduce approximation errors [12]. Equation (2) can further be combined with constraints such as forcing certain entries in W_i to be zero, or requiring that W_i be nonnegative. This can be used to construct networks that, for example, adhere to Dale's principle [19,20] or spatial connectivity constraints [15].
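A minimal sketch of a weight solver with subthreshold relaxation is given below. It replaces the SQP used in the paper with a simple iterative masked least-squares scheme: samples whose target current is subthreshold only contribute to the loss while the decoded current overshoots the threshold. All sizes and the regularisation factor are illustrative assumptions.

```python
import numpy as np

def solve_weights(A_pre, J_tar, J_th=0.0, reg=1e-6, iters=10):
    """Iterative least squares with subthreshold relaxation (a sketch;
    the paper uses an SQP solver). A_pre is (samples x pre-neurons),
    J_tar the target somatic currents."""
    N, m = A_pre.shape
    w = np.zeros(m)
    for _ in range(iters):
        # a sample is "active" if its target is superthreshold, or if
        # the current decode wrongly exceeds the threshold
        active = (J_tar > J_th) | (A_pre @ w > J_th)
        Aa, Ja = A_pre[active], J_tar[active]
        # regularised normal equations over the active samples only
        w = np.linalg.solve(Aa.T @ Aa + reg * N * np.eye(m), Aa.T @ Ja)
    return w
```

Deactivating subthreshold samples frees the solver from fitting current magnitudes that never influence the spike output, which is exactly the relaxation described above.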

Theoretical and practical considerations when using multi-channel neurons
Neuron populations that realise diverse tuning according to equation (1) are universal approximators [21]. By increasing the number of hidden neurons, any function f over X can be linearly decoded from the neural activities with arbitrarily small errors.
Of course, this raises the question: why should we use multi-channel neurons at all? To answer this question, consider the case where two separate neuron populations represent variables x_1 and x_2 (figure 1(a)). To compute f with nonlinear interaction between x_1 and x_2 (i.e., functions other than f(x_1, x_2) = f_1(x_1) + f_2(x_2)), both x_1 and x_2 must be represented in a common population with encoders e_i that are not axis-aligned. This can be accomplished by computing the linear function f(x_1, x_2) = (x_1, x_2) in the connection to an intermediate population, and then computing the desired function in the connection to a target population. However, relying on intermediate populations is not always desirable.

Multi-channel neurons can be more compatible with biological connectivity constraints
One reason to avoid intermediate populations is connectivity constraints in models of neurobiological systems. For example, attention circuits in cerebral cortex do not seem to contain intermediate populations that represent both the signal and the attentional gain [22]. Instead, one hypothesis for how attentional gain could be realised is that gain modulation (i.e., f(x_1, x_2) = x_1 x_2) arises from nonlinear interaction between excitatory and inhibitory input channels [23][24][25]. Such a 'dendritic computation' scheme is illustrated in figure 1(b).

Dendritic computation can support computing some functions more efficiently
A second reason for avoiding intermediate populations is efficiency. Being able to decode any function f from the pre-synaptic activities implies that a^pre(x) implicitly spans a function basis. Assume that a one-dimensional population with m neurons produces decodable (within a certain error) basis functions φ_j(x).

Figure 2. Number of Legendre polynomials that can be decoded with an error smaller than the indicated threshold from a ReLU neuron population of size m (mean is over 100 trials). The three columns correspond to a one-, two- and three-dimensional population. The total number of basis functions tested for each data point is 20^d, where d is the dimensionality of the population. The number of decodable basis functions is not drastically larger for higher-dimensional neuron populations.
A d-dimensional function basis should then contain m^d decodable basis functions, namely all product terms φ_{j_1}(x_1) · · · φ_{j_d}(x_d). Empirically speaking, however, increasing the dimensionality of a neuron population only results in a moderate increase in the number of decodable basis functions (figure 2); reaching m^d basis functions would take exponentially more neurons. It can thus be more efficient (i.e., require fewer neurons and lead to a smaller error) to decode k one-dimensional functions and to instead rely on nonlinear interaction between input channels to produce the necessary product terms.

Dendritic computation is not universal
Unfortunately, such a dendritic computation scheme cannot be a universal function approximator. Specifically, if d > 1, then for any fixed H(g_1, . . . , g_k) in C^∞, ε > 0, and arbitrary f_i^j(x_i) in C^∞, there always exists an f(x_1, . . . , x_d) in C^∞ such that the following holds almost everywhere on a compact set X ⊂ R^d:

|f(x_1, . . . , x_d) − H(Σ_i f_i^1(x_i), . . . , Σ_i f_i^k(x_i))| > ε.

This follows from the limited number of independent product terms in H(. . .) when performing a Taylor series expansion. As a consequence, dendritic computation can only approximate some functions well; yet, as we demonstrate below, this includes practically useful functions.

Example: XOR and four-quadrant multiplication
To illustrate our notion of 'dendritic computation', consider the XOR problem [26]. Here, the goal is to assign a label y ∈ {−1, 1} to the four corners of a square, where the two diagonals belong to different classes (figure 3(a)). This can also be phrased as a product: the sign of x_1 x_2 for (x_1, x_2) ∈ {−1, 1}² is exactly the XOR pattern (figure 3(b)). 'Four-quadrant multiplication' f(x_1, x_2) = x_1 x_2 is thus a continuous version of the XOR problem; we use this function as a benchmark in our experiments below. Now, in order to solve the XOR problem, consider the aforementioned case where we represent x_1, x_2 in separate populations. For post-synaptic neurons with a monotonic response G[ξ], we can only decode additive functions of the form f_1(x_1) + f_2(x_2). Although such functions can be highly nonlinear (figure 3(c)), it is impossible to solve the XOR problem in this way. However, introducing an intermediate layer representing (x_1, x_2) (i.e., constructing an MLP) allows us to decode arbitrary functions over these two variables, including XOR and four-quadrant multiplication (figure 3(d)). Yet, as we discussed above in section 2.2, establishing a powerful multivariate basis in the intermediate population may require a relatively large number of neurons.
The idea behind dendritic computation is to compute functions such as XOR by changing the response curve of the neuron. For example, if we had G′[ξ] = G[ξ² − 1], then G′[f_1(x_1) + f_2(x_2)] contains product terms between x_1 and x_2 and we can solve the XOR problem (figure 3(e)). Similarly, for a two-channel neuron with G[ξ_1, ξ_2] = ξ_1 ξ_2, we could (trivially) compute four-quadrant multiplication. However, as stated above, we can always find functions that cannot be approximated in this way.
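The effect of such a squared response curve can be checked numerically. The sketch below (with hypothetical, hand-picked functions) verifies that no additive model fits the XOR pattern, while passing sums through ξ → ξ² − 1 introduces exactly the product term needed.

```python
import numpy as np

# The four XOR corners and their labels y = sign(x1 * x2).
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], float)
y = X[:, 0] * X[:, 1]                    # XOR pattern: +1, -1, -1, +1

# Additive model: best least-squares fit over the basis {1, x1, x2}.
basis_add = np.column_stack([np.ones(4), X[:, 0], X[:, 1]])
coef, *_ = np.linalg.lstsq(basis_add, y, rcond=None)
err_add = np.max(np.abs(basis_add @ coef - y))   # stays at 1: XOR is not additive

# Squared response: ((x1 + x2)**2 - 1) - ((x1 - x2)**2 - 1) = 4 * x1 * x2,
# so two 'neurons' with the modified response curve recover the product.
out = ((X[:, 0] + X[:, 1]) ** 2 - 1) - ((X[:, 0] - X[:, 1]) ** 2 - 1)
err_sq = np.max(np.abs(out / 4 - y))             # exactly the XOR labels
```

The additive fit collapses to zero output (all three basis functions are orthogonal to the XOR labels on these four points), whereas the squared response solves the problem exactly.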

Mathematical analysis of multi-compartment LIF neurons
As we discussed above, dendritic computation may offer an advantage in scenarios where neural resources are not sufficient to establish a good multivariate function basis. In the next section, we test whether we can observe such a computational advantage in spiking networks using neurons from a family of n-compartment LIF neurons, henceforth called 'n-LIF'. However, before we perform these experiments, we first need to define n-LIF neurons more carefully and derive the nonlinearity H required for our optimisation problem in equation (2). To gain a better understanding of the theoretical properties of n-LIF neurons, we furthermore provide simple rules that (on a theoretical level) characterise the interaction between input channels. We then explore to what extent these predictions hold for biologically plausible n-LIF configurations.
Note that neuron models very similar to the n-LIF neurons discussed here have already been implemented on mixed-signal neuromorphic hardware. For example, the BrainScaleS-2 system contains analogue switches that can be used to connect the membranes of neighbouring neurons through an adjustable coupling conductance [27,28].

Mathematical description of n-LIF neurons
An n-LIF neuron is a connected graph of resistively coupled capacitive compartments. Each neuron possesses exactly one active somatic compartment with LIF dynamics, while the other compartments represent passive dendrites. Compartments may hold any number of passive current-and conductance-based channels. Channels are either constant (e.g., leak channels), or receive external input (synapses). Compartments are connected through coupling conductances c ij . Examples of such neurons are depicted in figure 4.

Superthreshold dynamics
In contrast to the standard LIF model, the active n-LIF compartment has an explicit spike model. This is important, as the spike potential causes a substantial current to backpropagate into the dendritic compartments [29].
More specifically, we model the superthreshold dynamics as follows. Whenever the membrane potential v surpasses the threshold v th , v is clamped to a 'spike potential' v spike for a period τ spike and subsequently forced to v reset for the refractory period τ ref . We use τ spike = 1 ms and τ ref = 2 ms in our models.
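This spike mechanism can be sketched with a forward-Euler simulation of a single compartment. τ_spike and τ_ref follow the values above; the remaining constants (membrane time constant, threshold, spike and reset potentials) are illustrative assumptions.

```python
import numpy as np

def simulate(J, T=1.0, dt=1e-4, tau_rc=20e-3,
             v_th=1.0, v_reset=0.0, v_spike=2.0,
             tau_spike=1e-3, tau_ref=2e-3):
    """Single-compartment LIF with an explicit spike shape: on threshold
    crossing, clamp v to v_spike for tau_spike, then to v_reset for
    tau_ref. Returns the firing rate in s^-1."""
    v, clamp_until, clamp_v, spikes = 0.0, 0.0, 0.0, 0
    for step in range(int(T / dt)):
        t = step * dt
        if t < clamp_until:
            # spike phase first, then the refractory reset phase
            v = clamp_v if t < clamp_until - tau_ref else v_reset
            continue
        v += dt * (J - v) / tau_rc          # leaky integration
        if v >= v_th:
            spikes += 1
            clamp_until, clamp_v = t + tau_spike + tau_ref, v_spike
    return spikes / T
```

In a multi-compartment simulation, the clamped spike potential is what drives the backpropagating current into the dendritic compartments; here the clamp only shapes the soma's own trajectory.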

Subthreshold dynamics
The dynamics of an n-LIF neuron can be described in matrix form. Let k be the number of non-static input channels, and g ∈ R^k a vector representing all input channel states. We have

C_m ∘ (d/dt) v(t) = A[g(t)] v(t) + b[g(t)].    (4)

Here, C_m ∈ R^n is a vector of membrane capacitances, '∘' denotes elementwise multiplication, A[g(t)] ∈ R^{n×n} is the 'voltage feedback matrix', and b[g(t)] ∈ R^n describes the input to the system. We further decompose A, b into input-independent and input-dependent terms,

A[g] = L + diag(a + Â g),   b[g] = b̂ + B̂ g,

where the graph Laplacian L ∈ R^{n×n} and the vectors a ∈ R^n, b̂ ∈ R^n describe the input-independent portions of the system, whereas the matrices Â ∈ R^{n×k} and B̂ ∈ R^{n×k} describe its input-dependent parts. The construction of these matrices is detailed in appendix A.
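The structure of this system can be illustrated for a minimal soma-plus-dendrite neuron (n = 2) with a single excitatory conductance channel (k = 1) on the dendrite. The matrices below are hand-picked, normalised-unit stand-ins for the appendix A construction, not the paper's parameters.

```python
import numpy as np

C_m = np.ones(2)                              # membrane capacitances
L = np.array([[-1.0, 1.0], [1.0, -1.0]])      # coupling term (graph Laplacian)
a = np.array([-1.0, -1.0])                    # static leak conductances
b0 = np.zeros(2)                              # static bias currents (E_L = 0)
A_in = np.array([[0.0], [-1.0]])              # g_E loads the dendritic membrane...
B_in = np.array([[0.0], [1.0]])               # ...and drives it toward E_E = 1

A_of = lambda g: L + np.diag(a + A_in @ g)    # voltage feedback matrix A[g]
b_of = lambda g: b0 + B_in @ g                # input vector b[g]

g = np.array([0.5])                           # constant channel state
v = np.zeros(2)
for _ in range(5000):                         # forward Euler on the dynamics
    v = v + 1e-2 * (A_of(g) @ v + b_of(g)) / C_m
v_eq = np.linalg.solve(-A_of(g), b_of(g))     # analytic equilibrium for fixed g
```

For constant g the system is linear and settles to v_eq; the conductance input appears both on the diagonal of A[g] (loading the membrane) and in b[g] (driving it toward the reversal potential), which is the source of the divisive behaviour discussed below.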

Deriving a surrogate model of the dendritic nonlinearity H
To solve our current-space weight optimisation problem in equation (2), we need to find a function H(g) that maps a synaptic state g = W_i a^pre onto an average somatic current J. Trivial cases aside, this H cannot be stated in closed form. This is due to the nonlinear interaction between the membrane potential and conductance-based input channels, as well as the nonlinear superthreshold dynamics. Still, we can derive a parametrised 'surrogate model' that approximates H well and that can, as we discuss below, be fitted to empirical data. For the purpose of building networks of spiking neurons, we are primarily interested in the superthreshold regime. We can thus exploit the fact that n-LIF neurons are tonically spiking. That is, the somatic compartment oscillates between the reset and threshold potential with a fixed frequency; the somatic membrane potential is effectively clamped to some value v̄ between the reset and threshold potential.
To estimate the average current flowing into the somatic compartment, we replace the system from equation (4) with a reduced system described by matrices Ã[g], b̃[g]. For constant g, we obtain an equilibrium state ṽ_eq:

ṽ_eq = −Ã[g]⁻¹ b̃[g].    (5)

As we describe in more detail in appendix A, the reduced system is specifically constructed such that ṽ_eq converges to the voltage that we would obtain if the somatic compartment were clamped to v̄. In the somatic compartment itself, the voltage difference (ṽ_eq)_1 − v̄ is proportional to the current flowing into the compartment. Given a vector c̃ of somatic coupling conductances with ⟨c̃, 1⟩ = 1, the average current H(g) flowing into the soma is modelled as H(g) = ⟨c̃, ṽ_eq⟩.
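For a two-compartment neuron the clamped-soma reasoning can be written out directly: hold the soma at an average potential v̄, solve for the dendritic equilibrium, and read off the coupling current into the soma. The constants below are illustrative assumptions, and the closed form is only valid for this minimal configuration.

```python
import numpy as np

c12, g_L, E_E, v_bar = 1.0, 1.0, 1.0, 0.35   # illustrative normalised constants

def H(g_E):
    """Surrogate somatic current for a soma + single dendrite with one
    excitatory conductance channel g_E on the dendrite."""
    # dendritic equilibrium with the soma clamped to v_bar:
    #   0 = -g_L * v + c12 * (v_bar - v) + g_E * (E_E - v)
    v_dend = (c12 * v_bar + g_E * E_E) / (g_L + c12 + g_E)
    return c12 * (v_dend - v_bar)            # coupling current into the soma
```

Note that H saturates in g_E (toward c12 · (E_E − v̄)) rather than growing linearly: the conductance input acts divisively on itself, which is the single-compartment 'shunting' behaviour characterised below.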

Theoretical properties of the n-LIF nonlinearity
As we discussed in section 2.2, there must be some nonlinear interaction between different input channels for dendritic computation to offer any advantage. Analysing the dendritic nonlinearity H(g) sheds some light on the extent to which this is the case for n-LIF neurons (see appendix B for a mathematical derivation). To summarise, and as is illustrated in figure 5, there are three types of interaction between inputs: affine, shunting, and multiplication.

Affine terms
The 'computationally weakest' n-LIF neurons are those where the somatic current is merely an affine function over the input channels. This is the case for n-LIF neurons that only possess current-based input channels, or any n-LIF neuron where input is solely fed into the somatic compartment (figures 4(a), (b) and 6(a), (b); [30]).

Shunting
Nonlinear interaction between the input channels of an individual dendritic compartment is limited to shunting (cf [31], section 1.5). Specifically, conductance-based input channels act 'divisively' on the other input channels in that compartment; an example would be the two-compartment neuron in figure 4(c) where the excitatory and inhibitory conductances g E and g I both act divisively on H (cf figure 6(c)). Shunting in a single compartment cannot be used to compute 'interesting' functions such as XOR [12]. Still, as is visible in our experiments below, and as is described in [12], a single layer of two-compartment neurons can compute a wide range of functions with a substantially smaller error than a single layer, or, in some cases, even two layers of standard LIF neurons.

Product terms
Input channels targeting different dendritic compartments on the same branch interact multiplicatively (figures 4(d), (e) and 6(d)). Notably, there is at most one current-based input channel per product term; current-based inputs cannot interact nonlinearly. Correspondingly, n-LIF neurons with three or more compartments could, at least in theory, compute four-quadrant multiplication.
However, exploiting the product terms of conductance-based inputs to compute multiplication over all four quadrants is difficult. Multiplicative terms are similar in magnitude to the linear terms. We must thus find synaptic weights that compensate for the linear terms, and, in addition, compute the desired function. Furthermore, there is a tradeoff between the coupling conductances c ij and the maximum somatic current. If c ij is small, it becomes more difficult for distal compartments to influence the soma. However, countering this by choosing larger c ij results in more linear H, which makes these inputs less computationally useful.

Evaluating the model quality
Our theoretical observations are only of value if the neuron actually behaves as our model predicts. Recall that the somatic current model H (cf section 3.2) is the result of major simplifications. For example, we assumed that the somatic compartment is effectively clamped to a constant potential v̄, that g is constant, and that spike generation has no effect on the dendritic compartments. These assumptions are readily violated in practice.
Thus, equation (5) is better interpreted as a 'template' for the overall mathematical shape of the equilibrium potential ṽ_eq(g) and the nonlinearity H(g) = ⟨c̃, ṽ_eq(g)⟩. We thus declare the non-zero entries in Ã, B̃ to be free parameters and use gradient-based optimisation such as Adam [32], L-BFGS-B, or a sequential quadratic programme (SQP) to calibrate these parameters. In our testing, all of these methods perform equally well. In the context of this paper, we use the SQP implemented in our software library libnlif, which in turn is based on the 'OSQP' solver library [33]. In appendix C, we describe a conceptually very similar SQP that we use to solve for synaptic weights and that is also implemented in libnlif.

Figure 7 depicts empirical spike rate measurements together with our model predictions H(g) after fitting the model to the depicted ground truth for the two-, three-, and four-compartment models depicted in figure 4 (see table D1 for parameters). For two-compartment neurons (figure 7(a)), H can be made to fit empirical rates extremely well (with RMSEs below 2 s⁻¹). For three- and four-compartment neurons (figures 7(b) and (c)), prediction errors are significantly higher at about 5 s⁻¹ to 10 s⁻¹, but H qualitatively captures the shape of the empirical measurements. Without fitting the model parameters, errors are well above 30 s⁻¹.
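The calibration step can be sketched as a generic least-squares fit of the free surrogate parameters. Here a plain finite-difference gradient descent on synthetic 'measurements' stands in for the SQP (or Adam / L-BFGS-B) mentioned above, and the surrogate is the minimal two-compartment H; everything below is illustrative rather than the libnlif procedure.

```python
import numpy as np

def H(theta, g):
    """Two-compartment surrogate with free parameters theta."""
    c12, g_L, E_E, v_bar = theta
    v_dend = (c12 * v_bar + g * E_E) / (g_L + c12 + g)
    return c12 * (v_dend - v_bar)

# Synthetic stand-in for empirical (g, somatic current) measurements.
rng = np.random.default_rng(1)
g_s = rng.uniform(0.0, 5.0, 200)
theta_true = np.array([1.2, 0.8, 1.0, 0.3])
J_meas = H(theta_true, g_s)

loss = lambda th: np.mean((H(th, g_s) - J_meas) ** 2)

theta = np.array([1.0, 1.0, 1.0, 0.2])        # deliberately wrong initial guess
for _ in range(5000):                          # finite-difference gradient descent
    grad = np.array([(loss(theta + 1e-6 * e) - loss(theta - 1e-6 * e)) / 2e-6
                     for e in np.eye(4)])
    theta -= 0.2 * grad
```

The fit need not recover the 'true' parameters (the surrogate has scaling degeneracies); what matters for the weight solver is only that the fitted H(g) matches the measured currents.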

Empirical evaluation of the n-LIF nonlinearity
According to our analysis from section 3.3, adding more compartments and input channels to n-LIF neurons should increase their computational power. We now test to what extent this holds empirically. Of course, we cannot test every possible n-LIF neuron configuration and parameter set; however, due to symmetries in n-LIF neurons with multiple branches, the sequential arrangements of compartments in figure 4 are, at least in theory and according to preliminary tests, the most computationally powerful n-LIF configurations. We manually tuned the coupling conductances to balance maximum somatic current and nonlinearity (cf section 3.3.3).

Methods
Consider a scenario where two variables x_1, x_2 are represented in two separate one-dimensional neuron populations (figure 1(b)). To assess the 'computational power' of different H in an unbiased manner, we generate random current functions J_ρ(x_1, x_2) with increasing frequency content ρ⁻¹ (cf figure 8(a); ρ is the standard deviation of a low-pass filter).
We sample 256 training samples (x_1, x_2) from a 100 × 100 grid; the test error is computed over all 10 000 points with normal-distributed noise (standard deviation of 10⁻² a_max) added to the pre-activities to test generalisation. We assume J_th = 0 nA for subthreshold relaxation, and use an SQP implemented in libnlif to locally optimise the synaptic weights W_i in equation (2) for the H corresponding to the different neuron models with the fitted parameters from above. Again, it is possible to use Adam or quasi-Newton methods such as L-BFGS-B to optimise the synaptic weights, but we did not observe a substantial difference in the resulting approximation errors, and our SQP-based solver (appendix C) typically converges within thirty epochs.
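One way to generate such band-limited random current functions is to filter white noise with a Gaussian kernel in the frequency domain. This construction is an assumption about how J_ρ could be produced, not the paper's exact procedure.

```python
import numpy as np

def random_current(rho, res=100, seed=0):
    """Random current function J_rho(x1, x2) on a res x res grid spanning
    [-1, 1]^2: white noise low-pass filtered with a Gaussian kernel of
    standard deviation rho (smaller rho = higher frequency content)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((res, res))
    fs = np.fft.fftfreq(res, d=2.0 / res)          # spatial frequencies
    fx, fy = np.meshgrid(fs, fs)
    # Fourier transform of a spatial Gaussian with standard deviation rho
    kernel = np.exp(-2.0 * (np.pi * rho) ** 2 * (fx ** 2 + fy ** 2))
    J = np.real(np.fft.ifft2(np.fft.fft2(noise) * kernel))
    return J / np.max(np.abs(J))                   # normalise to max |J| = 1
```

Sweeping ρ from large to small values yields targets ranging from nearly-affine surfaces to highly oscillatory ones, which is what makes the benchmark sensitive to the expressiveness of H.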

Results
Results are depicted in figure 8(b). Note that the performance of the three- and four-compartment neuron is statistically indistinguishable for ρ⁻¹ > 0.4. Surprisingly, the difference between the two- and three-compartment neuron is relatively subtle, with the largest difference at ρ⁻¹ ≈ 1.5 and a median reduction in error of 38%. Still, the three- and four-compartment neurons perform slightly better than a multiplicative baseline (i.e., H(g_1, g_2) = g_1 g_2), while the two-compartment neuron is slightly worse and features a much higher variance. As in our previous study [12], we find that two-compartment neurons substantially outperform one-compartment neurons.

Figure 7. Dotted black-and-white lines indicate the cross-section location. E denotes the spike-rate RMSE over the entire grid and E the error for super-threshold ground-truth samples (with a rate above 12.5 s⁻¹). Each diagram depicts a different slice through the input space g. The model prediction is good for two-compartment LIF neurons (A), but is of lower quality as the number of compartments increases (B and C).

Discussion
The computational advantage of three-and four-compartment neurons is not as clear-cut as is suggested by our theoretical analysis. While better performance compared to the multiplicative baseline indicates that threeand four-compartment neurons are able to compute multiplicative and not just merely divisive terms, this only results in a small average improvement in performance for random functions J ρ .
This negligible increase in performance for four-compartment neurons likely stems from the complications with product terms listed in section 3.3.3. It might be possible to find more balanced configurations of the three-and four-compartment neurons that offer a larger computational benefit, but we were not able to do so in our experiments.
Another caveat with this experiment is that the optimisation problems for three-and four-compartment neurons are highly non-convex. The weight solver is neither guaranteed nor expected to converge to an optimal solution. Correspondingly, three-and four-compartment neurons may be more powerful than what is suggested here; however, we were not able to improve upon our results using a more exhaustive stochastic optimisation scheme such as Adam with random restarts. In contrast, equation (2) is rather docile in the case of the two-compartment neuron and our solutions tend to be close to the global optimum [18].

Spiking neural network experiment
Our theoretical analysis of the n-LIF neuron family in section 3.3 predicted that models with three or more compartments can compute products of the input channel state variables g 1 , . . . , g k . Further evidence for this is that the three-compartment nonlinearity model H slightly outperforms the multiplicative baseline H(g 1 , g 2 ) = g 1 g 2 (section 3.5). We furthermore saw in section 3.4 that our model H can be used to predict the empirical neuron rate quite well-at least after fitting neuron parameters.
As such, it might be possible to solve the XOR problem, and to compute four-quadrant multiplication (cf section 2.3) with these neurons. The goal of this section is to put this theory to the test by placing the neurons  discussed above in a spiking neural network and solving for synaptic weights that compute some benchmark functions.

Solving for biologically plausible weights with Dale's principle
We compare the performance of our 'dendritic computation' network (figure 1(b)) to a two-layer neural network (figure 1(a)) with a similar amount of neural resources in terms of the total number of compartments. The total number of compartments is 200 + 100n in our n-LIF experiments; the intermediate population experiment requires 500 compartments. The number of non-zero synaptic weights is about 20 000 for the one- and two-compartment experiments, 60 000 for the two-layer experiment, and 20 000(n − 1) for three- and four-compartment neurons.
Data are collected over a ten-second simulation during which the inputs x 1 (t), x 2 (t) follow a fourth order Hilbert curve. We report the NRMSE between the decoded network output over time and a reference signal passed through the same synaptic filters (see figure 10, which we refer to later, as an example). We use the SQP solver described in appendix C.
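The reported error measure can be sketched as follows, assuming a first-order low pass as the synaptic filter model; the filter constants and time step are illustrative assumptions.

```python
import numpy as np

def lowpass(signal, dt=1e-3, tau=100e-3):
    """First-order synaptic filter applied sample by sample."""
    out, state = np.empty_like(signal), 0.0
    for i, s in enumerate(signal):
        state += (dt / tau) * (s - state)
        out[i] = state
    return out

def nrmse(decoded, target, dt=1e-3, tau=100e-3):
    """NRMSE between the decoded network output and the ideal target
    passed through the same synaptic filter."""
    ref = lowpass(target, dt, tau)
    return np.sqrt(np.mean((decoded - ref) ** 2)) / np.std(ref)
```

Filtering the reference with the same synapse model ensures the error reflects the network's function approximation quality rather than the unavoidable phase lag introduced by the synaptic filter itself.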
Results are listed in the upper portion of table 1 under 'standard parameters'. Notably, dendritic computation with multi-compartment neurons outperforms the two-layer network for most benchmark functions. However, among our n-LIF neurons, baseline errors tend to increase with the number of compartments n. That is, errors for the three-compartment neuron are slightly higher than those for the two-compartment neuron, etc.
A notable exception to this is four-quadrant multiplication, i.e., x_1 × x_2 over [−1, 1]². Here, the three- and four-compartment neurons clearly outperform the LIF and two-compartment LIF neurons. The one-compartment LIF neuron reaches an NRMSE of 100%: the network outputs a constant zero, which indeed is the best linear approximation of multiplication over all four quadrants. Similarly, since the two-compartment LIF neuron can only approximate two out of the four quadrants, the error is at 79%. In contrast, the three- and four-compartment neurons reach errors of about 36%; the resulting network solves the XOR problem, however, the output does not properly match four-quadrant multiplication at all points. The two-layer network clearly outperforms all other setups with a 13% error. However, keep in mind that this is the only benchmark function where the two-layer network has a smaller error compared to the dendritic computation schemes, while requiring more resources.

Table 1. Function approximation errors E_net for spiking networks using various n-LIF neurons. Results are the mean and standard deviation over 100 trials. The best result for a particular target function is set in bold; darker background colours indicate a worse ranking of the result. λ is a regularisation factor, ξ_0 denotes the range of the tuning-curve x-intercepts (cf section 2.1).

Improving four-quadrant multiplication performance
The experiment in section 3.5 indicated that our three- and four-compartment neuron models should outperform a multiplicative baseline nonlinearity. We would thus expect that we can approximate four-quadrant multiplication with errors much lower than the observed 30%. The reason why we observe much higher errors in a network context is illustrated in figure 9. The current functions J_i(x_1, x_2) = α_i ⟨e_i, x_1 x_2⟩ + β_i induced by target population tuning curves with negative x-intercepts ξ_0 (i.e., β_i > 0) can indeed be approximated well. In other words, we can compute four-quadrant multiplication extremely well for individual post-neurons. However, the weight solver does not find good solutions for the current functions resulting from tuning curves with positive intercepts.
Potential culprits are the increased steepness of the current function, our random sampling not being dense enough to cover the relevant portions of J i (x 1 , x 2 ), and systematic biases in the weight solver. A pragmatic solution to this problem is to simply change the tuning curves of the target population. However, this affects the classes of functions that can be decoded from this population.
The bottom portion of table 1 labelled 'adapted parameters' lists the results of an experiment where we tune the target-population x-intercepts ξ 0 to be strictly negative. We furthermore do not account for Dale's principle-this effectively increases the number of pre-neurons and doubles the number of synaptic weights.
With these changes, we are able to reach a mean dynamic network error E_net of about 20%, albeit with high inter-trial standard deviations of ±5%. Some trials reach errors below 15%. Figure 10 depicts an example trial. Errors are mostly visible at the extrema, that is, the target population does not quite represent 1 and −1. Apart from these deviations, three-compartment neurons seem to be able to, qualitatively speaking, compute four-quadrant multiplication well.

Discussion
We have explored dendritic computation as a method for function approximation in spiking neural networks. We saw that, from a theoretical perspective, networks utilising dendritic computation can outperform multilayer neural networks with fixed encoders. To put this to the test, we analysed a family of n-compartment LIF neurons with passive dendrites, one of the simplest possible neuron models that supports dendritic computation and that can be realised on mixed-signal neuromorphic hardware platforms by connecting multiple neuron circuits.
Our theoretical analysis of n-LIF neurons in section 2.2 suggested that three-compartment neurons can compute multiplicative terms and should be capable of solving the XOR problem. We confirmed this in a spiking neural network experiment, where two-compartment neurons outperform larger two-layer networks in a wide range of benchmark tasks. Specifically, we found (cf section 4.1) that two-compartment neurons often reach half the approximation error of the two-layer network while using about half the neural resources, counted in compartments.
The reason dendritic computation outperforms the two-layer network for various target functions is that the two-layer network must establish a two-dimensional basis with only 200 neurons. This results in a substantial static decoding error (cf [11], section 2.2). Dendritic computation in a two-compartment neuron can thus, especially in noisy spiking neural network contexts, be a viable alternative. Moreover, given that our neural compartments are merely passively coupled LIF neurons without active circuitry, we would expect that the two-compartment network uses at most half as much power as the two-layer network. In practice, we expect the reduction in power usage to be even more pronounced, since joining LIF compartments eliminates the overhead for digital spike-based communication, which accounts for a substantial portion of the power budget of neuromorphic systems [7].
With some manual tuning, a network of three-compartment neurons can competitively (compared to a two-layer network) compute four-quadrant multiplication, a continuous version of the XOR problem that cannot be solved by two-compartment neurons. However, in general, we observed larger function approximation errors for three- and four-compartment neurons compared to two-compartment neurons. While it may seem as if these neurons are thus less 'powerful', this is not the case. Smaller static decoding errors for these functions (i.e., the same experiment as in section 3.5, but for the specific J_i; data not shown) indicate that the increase in error stems from the somatic current model H not predicting the neuron precisely enough (cf section 3.4). A more charitable interpretation is thus that three- and four-compartment neurons reach errors similar to the two-compartment neuron for most functions, despite the surrogate model H being inaccurate. It should thus be possible to reach smaller function approximation errors if we were able to improve H.
To conclude, the methods presented here extend the NEF, an already widely used set of methods for programming neuromorphic hardware [17], towards harnessing nonlinear synapses and dendrites. Our results offer guidance when designing new mixed-signal neuromorphic systems that can be programmed using the NEF. Being able to combine up to three neuron compartments provides purely analogue computation as a resource for high-level computation, while, at the same time, equipping researchers modelling biological systems with the ability to better map real neural circuits onto the hardware. Researchers interested in simulating n-LIF neurons may find our open source library libnlif useful, which provides efficient simulation and weight optimisation routines.
Our observations are also interesting from a biological perspective. The fact that increasing the number of compartments beyond three does not substantially increase the computational power of these neurons (at least when using biologically plausible reversal potentials) suggests that passive dendrites alone can, under biological constraints, at most compute functions such as XOR. This may explain the prevalence of active dendrites, which have been hypothesised to be more powerful [4, 9, 10], in many brain regions.
Future work may include further exploring the power of larger n-LIF neurons, and improving our models of H to better exploit the computational power of more compartments. As well, there may be some further potential when leaving biologically plausible parameter ranges behind (i.e., providing symmetric excitatory and inhibitory reversal potentials). Finally, it would be interesting to validate our results on real neuromorphic hardware that has the ability to combine multiple neuron circuits, such as BrainScaleS-2 [27].

Table A1. Matrix representations of the first four neuron models in figure 4. See text for details.

Table A2. Reduced matrix representations of the first four multi-compartment neuron models in figure 4. The somatic compartment is disconnected from the remaining neuron model. Connections to the somatic compartment are replaced by a static conductance-based channel with reversal potential v̄. The voltage difference between v̄ and the equilibrium potential of the new model is proportional to the current flowing into the somatic compartment.
In order to fully determine an n-LIF neuron, we must specify the matrices L ∈ R^{n×n}, A ∈ R^{n×k}, B ∈ R^{n×k} and the vectors a ∈ R^n and b ∈ R^n. More specifically, the Laplacian L is the difference between the weighted degree matrix and the adjacency matrix, and, in our case, is given as L = diag(C1) − C, where 1 is the all-ones vector and (C)_{ij} = c_{ij} is the coupling conductance between the ith and the jth compartment. The vector a consists of the sums of the static channel conductances for each compartment (such as the leak conductance), and b contains the sums of the static channel conductances multiplied by their reversal potentials. The matrix A contains one-entries for input variables influencing a conductance-based channel in the corresponding compartment. B contains a 'one' for each variable current-based channel in the corresponding compartment, and the reversal potential for each variable conductance-based channel in a compartment. Examples of these matrices for the first four n-LIF neuron models depicted in figure 4 are given in table A1.
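The Laplacian construction above can be sketched numerically. A minimal example, assuming placeholder coupling conductances for a hypothetical three-compartment chain (the values are illustrative, not taken from the paper):

```python
import numpy as np

# Hypothetical coupling conductances c_ij (nS) for a three-compartment
# chain soma - dendrite 1 - dendrite 2; the values are placeholders.
C = np.array([[ 0.0, 50.0,  0.0],
              [50.0,  0.0, 20.0],
              [ 0.0, 20.0,  0.0]])

# Graph Laplacian: weighted degree matrix minus adjacency matrix,
# i.e. L = diag(C 1) - C, where 1 is the all-ones vector.
L = np.diag(C @ np.ones(C.shape[0])) - C
```

As expected for a graph Laplacian, each row of L sums to zero and L is symmetric for symmetric couplings.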

A.2. Constructing the reduced system matrices
The reduced system matrices are used to compute the equilibrium potential ṽ_eq under the assumption that the somatic compartment is clamped to a constant potential v̄ (cf equation (5)). We once more decompose the system matrices Ã and b̃. To obtain L̃ ∈ R^{n×n}, Ã ∈ R^{n×k}, ã ∈ R^n, B̃ ∈ R^{n×k}, b̃ ∈ R^n, we set the first column and row of L̃ to zero and replace connections to the somatic compartment with static conductance-based channels. In the somatic compartment, all conductance-based channels are replaced by current-based channels weighted by the difference between v̄ and the channel reversal potential (cf [34]); furthermore, we let (ã)_1 = 1 and add v̄ to (b̃)_1.
We provide examples of reduced systems in table A2.

A.3. Conditioning the n-LIF system
The reduced system described above is conceptually useful, but notoriously ill-conditioned. In fact, our weight and parameter optimisation methods do not work properly without preconditioning the reduced system. We suggest the following series of transformations as a preconditioning step. Apart from the final scaling step, this transformation does not alter the current predicted by H in any way.
Step 1. Offset all voltages such that v̄ = 0. This is accomplished by updating b̃ and B̃. This rule follows from applying an offset o to the equilibrium state of the n-LIF system; expanding Ã[g]o with o = −v̄ yields the corresponding update equations.
Step 2. Scale all voltages such that c̃_i ∈ {0, 1} or ã_i = 1. Together with v̄ = 0, this implies that the equilibrium state of each compartment connected to the soma directly expresses the current flowing into the soma. To this end, we first assemble a positive vector of scaling factors α ∈ R^n and update L̃, ã, Ã, c̃ accordingly. This update rule follows from scaling the equilibrium state of the n-LIF system; again, expanding Ã[g]diag(α)^{−1} yields the first three update equations. Scaling c̃ is necessary to preserve the correct output current. In general, if v̄ ≠ 0, then v̄ must be multiplied by diag(α).
Step 3. Scale the system input and output. Inputs to the n-LIF system are typically in the microsiemens (μS) or nanoampere (nA) range, while outputs are usually single-digit nanoampere values. We hence suggest scaling all inputs by α_in = 10^6 (before passing them into the system) and all outputs by α_out = 10^9. Given that v̄ = 0, we can adapt the system accordingly.

Appendix B. Mathematical derivation of the n-LIF properties
In section 3.3, we briefly described rules that govern the interaction of n-LIF input channels. To summarise, there is no nonlinear interaction between current-based inputs, or between inputs targeting the somatic compartment. In particular, note that the sum-of-rational-functions structure directly follows from plugging Ã[g]^{−1} from equation (B.1) into equation (B.2). As will become apparent in a moment, the product between the inverted block matrix C̃_m[g]^{−1} for branch m and the corresponding section of the input vector b̃[g] forms the individual rational functions. By construction, this rational function can only depend on the input channels targeting the branch m. That is, each sum term m can only depend on the variables g_{r_1}, …, g_{s_k}, J_{r_1}, …, J_{s_k}, where r, s are as defined above.
To see that the product between the inverted block matrix and the input vector forms a rational function, remember that the inverse of a matrix can be written as its adjoint scaled by the inverse of its determinant [e.g., 36, theorem 1.9, p. 366]: C̃_m[g]^{−1} = adj(C̃_m[g]) / det(C̃_m[g]). Crucially, the determinant fully determines the denominator H^A_m of each rational function. The division by the determinant is the only division in each product between the inverted block matrix C̃_m[g]^{−1} and the corresponding portion of the input vector b̃[g].
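The adjoint-over-determinant identity can be checked numerically. A small sketch (the matrix values are arbitrary placeholders):

```python
import numpy as np

def adjugate(A):
    """Classical adjoint: (adj(A))_{lk} is (-1)^{k+l} times the
    determinant of A with the kth row and lth column deleted."""
    n = A.shape[0]
    adj = np.empty_like(A, dtype=float)
    for k in range(n):
        for l in range(n):
            minor = np.delete(np.delete(A, k, axis=0), l, axis=1)
            adj[l, k] = (-1.0) ** (k + l) * np.linalg.det(minor)
    return adj

# Arbitrary invertible placeholder matrix
A = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 5.0]])
A_inv = adjugate(A) / np.linalg.det(A)  # equals np.linalg.inv(A)
```

Note that the single division by det(A) is the only division in the computation, mirroring the argument above.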
In addition to the properties listed above, claim B states that H^A_m only depends on the conductance-based inputs, and that it only contains products between inputs targeting different compartments. The first statement follows from Ã[g] only containing conductance-based inputs by construction. The second statement follows from the permutation expansion of the determinant [e.g., 36, section 4.I.3, p. 337]: we have det(C̃_m[g]) = Σ_{π∈P} sgn(π) ∏_i (C̃_m[g])_{i,π(i)}, where P = S(1, …, n − 1).
Here, S(X) denotes the set of permutations of the set X, and sgn(π) ∈ {−1, 1} is the 'signum' of the permutation π. Importantly, each product in the determinant contains each diagonal element d_i[g] at most once, and there are no product terms between inputs targeting the same compartment i.
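The permutation expansion can be verified against a standard determinant routine. A minimal brute-force sketch (only practical for small matrices; the test matrix is an arbitrary placeholder):

```python
import itertools
import numpy as np

def det_leibniz(A):
    """Determinant via the permutation (Leibniz) expansion:
    sum over permutations pi of sgn(pi) * prod_i A[i, pi(i)].
    Each product term uses every row index exactly once, so each
    diagonal element appears at most once per term."""
    n = A.shape[0]
    total = 0.0
    for pi in itertools.permutations(range(n)):
        # sgn(pi) = (-1)^(number of inversions)
        inversions = sum(pi[i] > pi[j]
                         for i in range(n) for j in range(i + 1, n))
        term = (-1.0) ** inversions
        for i in range(n):
            term *= A[i, pi[i]]
        total += term
    return total

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
```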
Finally, claim C states that the numerators H^B_m only contain product terms between inputs targeting different compartments, and that each product term features at most one current-based input. The second statement is a result of Ã[g] only depending on conductance-based inputs, and all current-based inputs only influencing b̃[g]. Multiplications with current-based inputs are a result of the final matrix-vector product between Ã[g]^{−1} and the input vector b̃[g], and there can be only one current-based input per product term.
The first statement follows from the structure of the adjoint. The numerator H^B_m is determined by the matrix product between the adjoint of C̃_m[g] and the corresponding portion of b̃[g]. Notably, the adjoint of a matrix A is a matrix of determinants of A in which the kth row and the ℓth column have been deleted; in the permutation expansion of such a minor, the sum runs over P = S(1, …, ℓ − 1, ℓ + 1, …, n − 1).
Hence, the entry (adj(C̃_m[g]))_{ℓk} contains neither the term d_k[g] nor d_ℓ[g], and the product terms formed by computing the matrix-vector product between adj(C̃_m[g]) and b̃[g] contain each input channel at most once. This concludes our proof.

Appendix C. Solving for n-LIF synaptic weights
In order to integrate n-LIF neurons into spiking neural networks, we must solve for synaptic weights. That is, given a set of pre-activities a and a set of target currents J , we would like to find weights w ij such that the somatic current predicted by our model is close to the desired target. The same approaches outlined here can analogously be applied to the model calibration (cf section 3.4).
Just as we discussed above, we assume that each of the k input channels is given as g = Wa, where a ∈ R^m is the vector of all pre-activities, and W ∈ R^{k×m} is a nonnegative sparse connection matrix. Entries in W corresponding to invalid connections between pre- and post-neurons must be forced to zero when performing weight optimisation.
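Forcing invalid connections to zero can be implemented with an element-wise mask on W. A minimal sketch with hypothetical sizes (k = 3 channels, m = 5 pre-neurons) and placeholder values:

```python
import numpy as np

# `mask` encodes which pre/post connections are valid; invalid
# entries of W are forced to zero during weight optimisation.
mask = np.array([[1, 1, 0, 0, 0],
                 [0, 0, 1, 1, 0],
                 [0, 0, 0, 0, 1]], dtype=float)

W_raw = np.array([[0.2, 0.1, 0.5, 0.3, 0.4],
                  [0.6, 0.2, 0.1, 0.4, 0.3],
                  [0.3, 0.5, 0.2, 0.1, 0.7]])
W = W_raw * mask                             # nonnegative, sparse weights

a = np.array([10.0, 20.0, 5.0, 0.0, 15.0])   # pre-activities
g = W @ a                                    # per-channel inputs g = W a
```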
Let E be the superthreshold error function (equation (3)), and let L̃, c̃, ã, Ã, b̃, B̃ describe the reduced and conditioned system matrices of the neuron. Given N samples (a, J), we must minimise a corresponding loss function with respect to the non-zero entries of W. We explore two different approaches to solving this problem: first, using gradient-based methods, and second, by constructing an SQP (sequential quadratic programme). Both algorithms are implemented in our software library libnlif.

C.1. Gradient descent
To determine the loss function gradient, we use the matrix inverse differential tensor (see [37]). Note that the superthreshold error function E(J_dec, J_tar) is a piecewise function with three separate cases; luckily, the above equation handles all three at the same time. In case J_tar < J_th and J_dec < J_th, the output of E is zero, and thus the gradient is zero as well. In the other two cases, E is equal to J_dec with some offset, which does not affect the gradient.
To enforce nonnegativity of W, and thus nonsingularity of Ã[Wa], weights should be clipped to zero after each gradient descent step. In the case of quasi-Newton methods, a bounded algorithm such as L-BFGS-B must be used.
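The clipping step amounts to projected gradient descent. A minimal sketch, with a toy quadratic loss standing in for the actual superthreshold loss (the mask and target values are placeholders):

```python
import numpy as np

mask = np.array([[1.0, 1.0, 0.0],
                 [0.0, 1.0, 1.0]])       # valid connections
target = np.array([[0.5, 0.2, 0.0],
                   [0.0, 0.3, 0.1]])     # placeholder optimum

W = np.ones_like(mask)
for _ in range(200):
    grad = W - target                    # gradient of 0.5 * ||W - target||^2
    W = W - 0.1 * grad                   # plain gradient descent step
    W = np.maximum(W, 0.0) * mask        # project: clip negatives, keep sparsity
```

The projection keeps every iterate nonnegative and sparse, so the surrogate model remains well defined throughout the optimisation.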

C.2. Sequential quadratic programme
An alternative approach is to use an SQP. Let θ be a vector of the non-zero weights in W. Introducing N auxiliary variables v_1, …, v_N ∈ R^n, we can split the optimisation problem accordingly.
Expanding E_2, we find that there is only one product term containing both θ and the auxiliary variables. Linearising this product term in E_2 at (θ_0, v_{1,0}, …, v_{N,0}) using a Taylor expansion yields an expression in linear least-squares form over θ and v_1, …, v_N. Combining this with a soft trust-region term E_3, as well as regularisation factors λ_1 and λ_2, gives the final loss function E. This loss function, including a nonnegativity constraint on θ and subthreshold relaxation, can be expressed as a quadratic programme. We obtain an SQP by minimising E and using the resulting θ as θ_0 in a subsequent iteration.
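The Taylor linearisation of the bilinear coupling term can be illustrated in isolation. A minimal sketch for a scalar product term (the actual E_2 couples vectors; this only shows the expansion step):

```python
# First-order Taylor expansion of the bilinear term t(theta, v) = theta * v
# around an expansion point (theta0, v0):
#   t(theta, v) ~= theta0 * v + v0 * theta - theta0 * v0
def bilinear_lin(theta, v, theta0, v0):
    return theta0 * v + v0 * theta - theta0 * v0

# The residual is exactly (theta - theta0) * (v - v0), i.e. second order
# in the distance from the expansion point, motivating the trust region.
err = 1.01 * 2.02 - bilinear_lin(1.01, 2.02, 1.0, 2.0)
```

Because the residual grows quadratically away from (θ_0, v_0), the soft trust-region term E_3 keeps each SQP iterate close to the previous expansion point.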

Appendix D. Neuron parameters
See table D1.