Mathematical studies of the dynamics of finite-size binary neural networks: A review of recent progress

Traditional mathematical approaches to studying analytically the dynamics of neural networks rely on the mean-field approximation, which is rigorously applicable only to networks of infinite size. However, all existing real biological networks have finite size, and many of them, such as microscopic circuits in invertebrates, are composed only of a few tens of neurons. Thus, it is important to be able to extend to small-size networks our ability to study analytically neural dynamics. Analytical solutions of the dynamics of finite-size neural networks have remained elusive for many decades, because the powerful methods of statistical analysis, such as the central limit theorem and the law of large numbers, do not apply to small networks. In this article, we critically review recent progress on the study of the dynamics of small networks composed of binary neurons. In particular, we review the mathematical techniques we developed for studying the bifurcations of the network dynamics, the dualism between neural activity and membrane potentials, cross-neuron correlations, and pattern storage in stochastic networks. Finally, we highlight key challenges that remain open, future directions for further progress, and possible implications of our results for neuroscience.


Introduction
Understanding the dynamics of networks of neurons, and of how such networks represent, process and exchange information by means of the temporal evolution of their activity, is one of the central problems in neuroscience. Real networks of neurons are highly complex both in terms of structure and physiology. Introducing details of this complexity greatly complicates the tractability of the models. Thus, a mathematical model of neural networks needs to be carefully designed, by finding a compromise between the elements of biological complexity and plausibility that are introduced, and the analytical tractability of the resulting model [51].
A wide set of mathematical models have been proposed to investigate the behavior of biological neural networks [5,27,42]. Typically, these models attempt to simplify as much as possible the original system they describe, without losing the properties that give rise to the most interesting emergent phenomena observed in biological systems. Binary neural network models [1,19,20,40,50] represent one of the most successful examples in finding a good compromise between keeping simplicity to enhance tractability, while yet achieving a rich dynamics with a set of complex emergent network properties.
Binary models describe the dynamical properties of networks composed of threshold units, which integrate their inputs to produce a binary output, namely a high (respectively low) output firing rate when their membrane potential does (respectively does not) exceed a given threshold (see Sec. (2) for more details). Among the wide set of neural network models proposed by computational neuroscientists, threshold units represent one of the most convenient tools for studying the dynamical and statistical properties of neural circuits. The relative ease with which these models can be investigated analytically, networks, and possible implications of our results for neuroscience.

The binary network model
In this review we assume that neural activity evolves in discrete time steps, and that the threshold units are synchronously updated. These assumptions are often used in studying network dynamics, see e.g. [1,27,50,64]. A threshold unit, or artificial neuron, is a logic gate or a mathematical function that mimics the working mechanisms of a biological neuron. Typically the unit receives several inputs, which can be loosely interpreted as postsynaptic potentials at neural dendrites, and sums them to produce a binary or digital output (also known as activation or neural activity). Usually, each input of a threshold unit is multiplied by a so called synaptic weight, which represents the strength of connections between pairs of neurons. Moreover, the sum of the weighted inputs is passed through a piecewise-constant (or Heaviside) thresholding function, also known as activation function or transfer function. If the sum of the weighted inputs exceeds a threshold, the output is set to one, and the artificial neuron is said to fire at that rate. On the contrary, if the sum is below the threshold, the output is set to zero, and the neuron is quiescent. For this reason, the binary output of a threshold unit can be loosely interpreted as the firing rate of the postsynaptic neuron, namely as the number of spikes per second of its action potential, which propagates along its axon toward other neurons in the network.
The basic set of equations defining the dynamics of the discrete-time binary network is: Ji,jAj (t) + Ii + which describes the temporal evolution of the neural activity A i of the ith neuron, from the time instant t to the time instant t + 1. In Eq. (1), N represents the size (namely the number of threshold units) of the network. The matrix J = [J i,j ] i,j=0,··· ,N −1 is the (generally asymmetric) synaptic connectivity matrix of the network, whose entries J i,j are time-independent and represent the strength or weight of the synaptic connection from the jth (presynaptic) neuron to the ith (postsynaptic) neuron. Moreover, represents the total external input current (i.e. the stimulus) to the ith neuron. In more detail, I i is the time-independent deterministic component of the stimulus, while its stochastic component is the sum of N random variables σ N i,j N j (t), each one having zero mean and standard deviation σ N i,j . The vector N def = N 0 , . . . , N N −1 T represents a collection of stochastic variables with unit standard deviation, whose joint probability distribution p N is arbitrary. Then, in Eq. (1), H (·) is the Heaviside activation function with threshold θ, which is defined as follows: It is important to observe that, unlike the classic Hopfield network [40], which is symmetric and asynchronously updated, a Lyapunov function for synchronous networks with asymmetric synaptic weights, like ours, is generally not known. For this reason, the analytical investigation of the network dynamics determined by Eq. (1) proves much more challenging. In Secs. (3)-(5), we will review the techniques, that we developed in [29][30][31], for investigating the dynamical and probabilistic properties of Eq. (1) in small networks.

Analysis of bifurcations in deterministic networks
An important problem in the theory of binary networks is represented by the study of the qualitative changes in the dynamics of their neuronal activity, which typically are elicited by variations in the external stimuli. These changes of dynamics are named in mathematical terms as bifurcations [46]. Seminal work in physics focused on the study of bifurcations in infinite-size spin networks, see e.g. [22,56,70]. On the other hand, the theory of bifurcations of non-smooth dynamical systems composed of a finite number of units, including those with discontinuous functions such as binary networks, has been developed mostly for continuous-time models, see e.g. [7,37,48,49,52], and for piecewise-smooth continuous maps [62]. Despite the importance of discontinuous maps in computational neuroscience, the development of new techniques for studying their bifurcation structure has received much less attention [6].
In [29,31] we tackled the problem of deriving the bifurcations in the dynamics of the neural activity, for finite-size binary networks with arbitrary connectivity matrix. As is common practice, we performed the bifurcation analysis in the zero-noise limit, i.e. for σ N i,j → 0 ∀i, j (see Eq. (1)). In particular, we studied how the dynamics of neural activity switches between stationary states and neural oscillations, when varying the external stimulus to the network. Because of the discrete nature of the neural activity, there exists only a finite number of stationary and oscillatory solutions to Eq. (1). This allowed us to introduced a combinatorial brute-force approach for studying the bifurcation structure of binary networks, which we describe briefly below.
We introduce the vector A def = A 0 , . . . , A N −1 T containing the activities of all N neuron, and the sequence S (0, T ) of activity vectors A (0) → A (1) → · · · → A (T ) , for some 1 ≤ T ≤ 2 N . Given a network with P distinct input currents I 0 , · · · , I P−1 , we also define Γ Iα to be the set of neurons that share the same external current I α (namely Γ Iα . Then, in [31] we proved that the sequence S (0, T ) is a solution of Eq. (1) in the time range [0, T ] (i.e. A (t = j) = A (j) for j = 0, · · · , T ), for every combination of stimuli (I 0 , · · · , I P−1 ) ∈ V = V 0 × · · · × V P−1 , where: A neural sequence loses its stability, turning into another sequence, at the boundaries Λ α and Ξ α , which therefore represent the coordinates of the bifurcation points of the neural activity. In this review, we focus specifically on the subset of sequences that satisfy the additional constraint A (0) = A (T ) : these sequences represent the candidate oscillatory solutions with period T of Eq. (1). We also observe that, in the special case T = 1, we obtain the set of candidate stationary solutions of the network equations. For this reason, the bifurcation diagram of a binary network can be decomposed into two panels, the oscillation and the multistability diagrams. These diagrams describe the relationship between the oscillatory/stationary solutions of Eq. (1), and the set of stimuli. In other words, these diagrams display the fragmentation of the stimulus space into areas where several oscillatory solutions occur, and/or where the network is (multi)stable.
It is important to note that only the sequences whose hyperrectangles V have positive hypervolumes (i.e. the sequences that satisfy the condition Λ α < Ξ α , for every α and j such that Γ (j+1) Iα,1 = ∅) are solutions of Eq. (1), for some combinations of stimuli. On the contrary, if the hypervolume of V is zero, the corresponding neural sequence is never a solution of Eq. (1). Unfortunately, the sequences with positive hypervolumes are not known a priori, therefore they must be found through a brute-force searching procedure. Because of the combinatorial explosion of the number of possible sequences for increasing N , typically brute-force algorithms have at least exponential complexity with respect to the network size. Therefore they can be applied only to small networks (typically N < 30), regardless of the density of their synaptic connections. However, real cortical circuits are typically very sparse [45], therefore in [31] we developed an efficient algorithm specifically designed for networks with a low density of the synaptic connections. This efficient algorithm takes advantage of the information provided by the absence of the synaptic connections among the threshold units to speed up the detection of the oscillatory and stationary solutions of Eq. (1). In other words, the sparse-efficient algorithm avoids checking the sequences of neural activity vectors that are not compatible with the topology of the synaptic connections, resulting in a much faster calculation of the bifurcation structure of the network model. The interested reader is referred to [31] for a detailed discussion of the algorithm.
In Fig. (1) we show an example of bifurcation diagram, that we obtained from Eq. (2), in the specific case of the network parameters reported in Tab. (1).
In this example, we consider a network with heterogeneous random synaptic weights, which is composed of 3 excitatory neurons and 2 inhibitory neurons. Moreover, in Fig. (2), we show two examples of state-to-state transitions, obtained by solving Eq. (1) in the zero-noise limit, for all the 2 N initial conditions of the network dynamics (i.e. from A (t = 0) = 0, . . . , 0 T to A (t = 0) = 1, . . . , 1 T ).

Dualism between neural activity and membrane potentials
In this section we show the existence of a dualism between the neural activity states and the membrane potentials in a network composed of binary neurons. These variables are intrinsically related, but have distinct dynamical aspects. As we explained in Sec. (2), the term (1) can be loosely interpreted as the weighted sum of postsynaptic potentials at neural dendrites. Therefore this term, plus the eventual

Multistability Diagram
Oscillation Diagram A B Figure 1: An example of bifurcation diagram. This figure shows the bifurcation diagram of the binary network, obtained for the parameters in Tab. (1). We supposed that the excitatory neurons with indexes i = 0 and i = 1 receive an arbitrary external stimulus I0 = I1 = IE, which represents the first bifurcation parameter, while the excitatory neuron with index i = 2 receives a fixed stimulus I2 = 10. Moreover, we assumed that the inhibitory neuron with index i = 3 receives an arbitrary stimulus I3 = II , which represents the second bifurcation parameter, while the inhibitory neuron with index i = 4 receives a fixed stimulus I4 = 5. Then, we plotted the multistability and oscillation diagrams in the IE − II plane, according to Eq. (2). A) Multistability diagram. Each color represents a different degree of multistability (white = astable, red = monostable, green = bistable, blue = tristable). B) Oscillation diagram. Each color represents a different set of oscillatory solutions of Eq. (1) (the notation x : y reveals the formation of y distinct oscillations with period T = x). For example, for every combination of stimuli (IE, II ) that lies in the yellow area, Eq. (1) has 2 oscillatory solutions with period T = 2, while for every combination in the green areas, the equation has an oscillatory solution with period T = 3. Note that, for other values of the network parameters, oscillations with distinct periods may coexist in the same area. external stimulus to the ith neuron, can be interpreted as the total membrane potential V i of that neuron, namely: Since Eq. (1) does not depend on the variables V i (t), it can be solved without knowing the behavior of the membrane potentials. For this reason, the calculation of the probability distribution of V i (t) has always been neglected in the literature. Interestingly, we show that it is possible to exactly derive the set of equations satisfied by the membrane potentials, and that these equations provide a complementary description of the network dynamics with respect to Eq. (1). Under the change of variables Eq. (3), we observe that Eq. (1) can be equivalently transformed into the following set of equations: provided the matrix J is invertible (note that this condition can be eventually relaxed, see the supplementary information of [29] for further details). Note also that Eqs. (1) and (3) imply , 1}, and that the binary output of a threshold unit can be loosely interpreted as the firing rate of that neuron: It is important to observe that the neural activities are discrete random variables, therefore they are described by probability mass functions (pmfs). We introduce the vector A = A 0 , . . . , A N −1 T containing the activities of all N neurons at time t + 1, and A the vector of the activities of all neurons at time t. We also define: where the matrix Ψ is invertible by hypothesis. Moreover, we introduce the matrix C = [C i,j ] i,j=0,··· ,2 N −2 , such that: where δ i,j is the Kronecker delta, and: In Eq. (6) is defined as follows: where D (ν) is the decimal representation of the binary vector ν. For example, for N = 2, we get: (note for example that, for any ( , whose decimal representation is 3). Finally, we define: Hj.
Then, in [29] we proved that the conditional probability distribution of A given A , and the stationary joint distribution of A in the limit t → +∞, are: On the other hand, the membrane potentials V i (t) are continuous variables, therefore they are described by probability density functions (pdfs). By introducing the vector contains the membrane potentials of all N neurons at time t + 1, and the vector V of the membrane potentials of all neurons at time t, the conditional probability distribution of V given V , and the stationary joint distribution of V in the limit t → +∞, are [29]: By comparing Eqs. (7) and (8) with, respectively, Eqs. (9) and (10), we observe that there exists a close relationship between the neural activities A and the membrane potentials V , which we further investigate in SubSec. (4.2). It is also important to note that, generally, the integrals in Eqs. (5)- (8) can be calculated only numerically or through analytical approximations. However, in the specific case when the matrix Ψ is diagonal, and the stochastic variables N i (t) are independent, exact analytical solutions can be found in terms of the cumulative distribution functions of the noise sources (see [29]). In Fig. (3) we show an example of the joint probability distributions P (A, t) and p (V , t) in the largetime limit, obtained for the network parameters that we reported in Tab. (1). In this figure we considered two distinct distributions of the noise sources (namely correlated normally-distributed variables N i (t), as well as independent sources with Laplace distributions), and we showed how they differently shape the probability distributions of the neural activity and of the membrane potentials.

Cross-neuron correlations
The joint probability distributions P (A, t) and p (V , t), that we reported in the previous section, provide a complete probabilistic description of the network in the limit t → +∞. In particular, these distributions can be used for calculating cross-neuron correlations, which represents a powerful tool for quantifying the exchange of information between neurons. In [29], we derived exact analytical expressions of the Pearson correlation coefficient for t → +∞, in the case when the matrix Ψ is diagonal (i.e. , while the noise sources are independent and normally distributed. By applying Eq. (8), we found that the pairwise correlation between the neural activities of two neurons with indexes i and j is: In a similar way, from Eq. (10) we derived the following formula for the pairwise correlation between the membrane potentials: In Fig. (4) we plotted some examples of cross-correlations, obtained for the the network parameters that we reported in Tab. (1). This figure shows that variations of the external stimuli I switches the binary network between synchronous (i.e. highly correlated) and asynchronous (i.e. uncorrelated) states. Moreover, we observe that low (respectively high) correlations between neural activities do not necessarily correspond to low (respectively high) correlations between the membrane potentials. In other words, the linear relationship between the neural activity and the membrane potentials, as given by Eq. (3), is not reflected by the correlation structure of these variables. This result proves that, despite the similarity of the corresponding equations (which we already observed in SubSec. (4.1), by comparing Eqs. (7) and (8) with, respectively, Eq. (9) and (10)) and their linear relationship, neural activity and the membrane potentials represent two opposed aspects of binary networks. The interested reader is referred to [29] for a detailed description of the conditions under which synchronous and asynchronous states occur in the network, and for the extension of Eqs. (11) and (12) to encompass higher-order (i.e. groupwise) correlations among an arbitrary number of neurons.

Pattern storage and retrieval in presence of noise
In this section we consider the problem of storing D sequences of neural activity vectors A (i,0) → A (i,1) → · · · → A (i,Ti) , for i = 0, . . . , D − 1. In the context of content-addressable memories, one aims to determine a synaptic connectivity matrix J that stores these sequences in the binary network, so that each sequence can be retrieved by initializing the network state to A (t = 0) = A (i,0) , even in the presence of noise. Any method for calculating such a connectivity matrix is typically called learning rule.
Each transition A (i,ni) → A (i,ni+1) in the sequences is noise-resistant whenever P A (i,ni+1) , t ni + 1|A (i,ni) , t ni ≈ 1, since under this condition the probability that the state A (i,ni) switches to a state other than A (i,ni+1) at the time instant t ni + 1 is negligible. Therefore, according to Eq. (7), the sequences of neural activity can be stored in the network by solving the following set of equations: with respect to the connectivity matrix J. Generally, these equations can be solved only numerically or through analytical approximations. However, in the specific case when the matrix Ψ is diagonal and the stochastic variables N i (t) are independent, exact analytical solutions can be found.
In [29], we considered the case of of independent normally-distributed noise sources, and we found that, if the network is fully-connected without self connections (so that J i,i = 0), the matrix J that stores the D neural sequences satisfies the following sets of linear algebraic equations: In Eq. (13), where K to the magnitude of P A (i,n i +1) , tn i + 1|A (i,n i ) , tn i (see Eq. (7)), so that the arrows are white for every pair of states A (i,n i ) , A (i,n i +1) such that P A (i,n i +1) , tn i + 1|A (i,n i ) , tn i ≈ 0, while they are blue if the conditional probability is close to 1. The noise sources are supposed to be independent and normally distributed, so that the synaptic connectivity matrix J that stores the patterns can be calculated according to Eq. (13).
In these examples, we set K .
In particular, we observe that whenever A (i,0) = A (i,Ti) , the ith neural sequence is an oscillatory solution of Eq. (1) with period T i , so that if the matrix J is calculated by solving Eq. (13) and the network is initialized to any state of the oscillation, the network will cycle repeatedly through the same set of states. Moreover, in the special case T = 1, the neural sequence represents a stationary solution of Eq. (1) . In Fig. (5), we show some examples of storage of stationary patterns and oscillatory sequences with T = 3. This figure is obtained for the the network parameters reported in Tab. (2), and proves that the learning rule Eq. (13) can be used to store safely sequences of neural activity also in very noisy networks.

Networks with quenched disorder
The results reported in Secs. (3) and (4) are valid for binary neural networks with arbitrary topology of the synaptic connections (which does not evolve over time). For this reason, they can be applied to networks with regular connectivity matrices J, as well as to random networks with frozen synaptic weights (see e.g. the network parameters reported in Tab. (1)). In other words, the connectivity matrix of random networks can be interpreted as a single realization of the synaptic wiring among neurons, generated according to some known probability distribution p J . These models are said to present quenched disorder [39,43,70]. Each realization of the connectivity matrix, generated according to the distribution p J , usually produces a distinct matrix J, which in turn gives rise to distinct dynamical properties of the neural activity. In particular, each realization typically produces distinct bifurcation diagrams. For this reason, in order to obtain statistically representative results, one needs to average the coordinates of the bifurcation points over the variability of the matrix J. More generally, one would be interested in determining the probability distribution of the bifurcation points over the matrix J.
In [30] we derived semi-analytical expressions of these probability distributions. For simplicity, we focused on the bifurcation points of the stationary states in the zero-noise limit (σ N i,j → 0 ∀i, j ). We supposed that the entries J i,j of the connectivity matrix can be decomposed as the product of a synaptic weight, W i,j (which represents the random strength of interaction of the jth neuron on the ith neuron), with another random variable, T i,j (which represents either the presence, for T i,j = 1, or the absence, for T i,j = 0, of a synaptic connections from the jth to the ith neuron). The random variable W i,j is supposed to be continuous, and distributed according to some distribution p Wi,j . On the other hand, the variable T i,j is discrete, and such that T i,j = 1 with probability P i,j ∈ [0, 1], while T i,j = 0 with probability 1−P i,j . We also supposed that the variables {W i,j , T i,j } i,j=0,··· ,N −1 are statistically independent. Because of these assumptions, the random variable I i (as given by Eq. (2); note that the superscript (j) can be omitted in the case of stationary states studied in this section) is distributed as follows: where P (R) represents the power set of R def = {i ∈ {0, · · · , N − 1} : A i = 1}. Moreover, we call F Ii the cumulative distribution function of I i . Then, the coordinates of the bifurcation points, Λ α and Ξ α , are distributed as follows: for X ∈ {Λ α , Ξ α }. In Eq. (14), δ (·) is the Dirac delta function, p X c is the component of p X that describes the statistical behavior of the continuous values of X, and F X is the cumulative distribution function of X. Since, according to Eq. (2), Λ α and Ξ α are, respectively, the maximum and minimum of the independent variables I i , they must be distributed according to order statistics [9,10,36,75]. By calling per (·) the matrix permanent, in [30] we proved that X c is distributed as follows: is a γ α,u ×v matrix, and I γα,0,γα,0−1 is the γ α,0 ×(γ α,0 − 1) all-ones matrix.
Moreover, in Eq. (14), {x q } q∈D represents the set of the discrete values of X, at which the cumulative distribution function F X , namely: is (possibly) discontinuous. Note that D = Γ Iα,1 and D = Γ Iα,0 , for Λ α and Ξ α respectively, while The mean multistability diagram of the network is the plot of the bifurcation points Λ α and Ξ α , averaged over the realizations of the synaptic connectivity matrix J. In other words, the mean bifurcation points Λ α and Ξ α (where the brackets · represent the mean over the realizations) correspond to the values of the stimulus I α at which a given neural activity state A loses its stability on average, turning into another stationary state or an oscillation. In [30] we proved that: where the functions p X c (x) and F X (x) are given, respectively, by Eqs. (15) and (16). The probability that a given activity state A is stationary for a fixed combination of stimuli I = I 0 , · · · , I P−1 T , corresponds to the probability that I ∈ V, where the coordinates of the hyperrectangle V are calculated from Eq. (2) for the given state A. In [30] we proved that this probability can be calculated from the cumulative distribution of the bifurcation points as follows: Moreover, the probability to observe the state A in the whole multistability diagram of a single realization of the matrix J (i.e. the probability that A is stationary, regardless of the specific combination of stimuli), corresponds to the probability that the hyperrectangle V has positive hypervolume vol (V ). In [30] we proved that this probability has the following expression: Eqs. (14)- (19) provide a complete description of the statistical properties of the stationary states in networks with quenched disorder. It is also important to note that these equations are semi-analytical, since they are expressed in terms of 1D integrals containing the distribution p Wi,j . These integrals may be calculated exactly for some p Wi,j , for example in the case of normally-distributed weights. However, for simplicity, in this review and in [30], they are calculated through numerical integration schemes, because fully-analytical expressions may be very cumbersome. In Figs. (6) and (7) we show an example of these results for a specific distribution of the connectivity matrix.
We consider a network composed of 3 excitatory neurons (with indexes i = 0, 1, 2), and 2 inhibitory neurons (i = 3, 4). The excitatory and inhibitory neurons receive, respectively, external stimuli I E and I I . We also assume that the synaptic weights W i,j are distributed according to the following powerlaw distribution: where S and W represent, respectively, the horizontal shift and the width of the support of the distribution. To conclude, in Tab. (3) we reported the values of the parameters P, θ, S and W that we chose for this network.   (7) The symbol × in the matrices S and W means that the probability distributions of the stationary states and of the bifurcation points are not affected by those parameters, since the corresponding synaptic connections are absent (P i,j = 0).  . (19)), while the blue bars are calculated numerically through a Monte Carlo method (see [30] for more details).
mean multistability diagram of the network, as well as the occurrence probability of the activity states for fixed stimuli (i.e. I E = 1 and I I = 0) and regardless of the stimuli (see, respectively, Eqs. (17), (18) and (19)).

Discussion
New mathematical techniques for analytically investigating finite-size and small-size neural network models are invaluable theoretical tools for studying the brain at its multiple scales of spatial organization, that complement the already existing mean-field approaches. Studying how the complexity and dynamics of neuronal network activity change with the network size is of fundamental importance to understand why networks in the brain appear organized at multiple spatial scales. In this article, we reviewed the effort we made in this direction, trying to fill the gap in the current neuro-mathematical literature. In the following, we discuss strengths and weaknesses of the approach we developed so far, the implications of our work for specific issues related to bifurcation dynamics and learning, and for future progress in the understanding of the function of networks in the brain.

Bifurcation analysis of binary networks
An effective tool for studying spin networks in physics is represented by their energy (Hamiltonian) function. In order to study the low-temperature physical properties of the network at the thermodynamic equilibrium, one is often interested in finding out the global (and possibly degenerate) minimum energy state of the network. This is known as the ground state of the system, and it can be calculated by minimizing the energy function over the space of all possible spin configurations at absolute temperature. This is an optimization problem, which, for networks on non-planar or three-or higher-dimensional lattices, has been proven to be NP-hard [61].
In the study of discrete-time binary neural networks, energy (or, more generally, Lyapunov) functions typically are known only for asynchronously updated neurons with symmetric synaptic connections [40]. For neural networks with asymmetric connectivity matrices and/or synchronous update, like the one we considered in this review, the search for the ground state(s) turns into the more general problem of determining the long-time (non-equilibrium) states in the zero-noise limit. It is important to observe that this problem is even more formidable than the search for the ground states, due to the intractable number of oscillatory sequences that can eventually be observed during the network dynamics. We are not aware of any algorithm that performs efficiently this highly demanding combinatorial analysis in synchronouslyupdated networks with asymmetric connections. The algorithm that we introduced in [31] represents an attempt to tackle this problem, in the specific case of networks with sparse synaptic connections.
Once the set all the possible stationary and oscillatory solutions has been evaluated, Eq.
(2) provides a fast, analytical way to calculate the bifurcation diagram of the network. On the other hand, the bifurcation analysis of networks composed of neurons with graded output typically requires numerical continuation techniques [46], which do not provide any analytical intuition of the mechanisms underlying the changes of dynamics.

Probability distributions and cross-neuron correlations
The analytical results reported in SubSecs. (4.1) and (4.2) provide a complete description of the probabilistic behavior of the neural activity and of the membrane potentials in the long-time regime of single network realizations. These results provide new qualitative insights into the mechanisms underlying stochastic neuronal dynamics, which hold for any network size, and therefore are not restricted to small networks only.
However, the main drawback of Eqs. (8) and (10) (and, as a consequence, also of Eqs. (11) and (12)), is represented by the quantitative evaluation of the probability distributions and of the cross correlations. This requires the calculation of a number of coefficients H i that increases exponentially with the network size, and therefore proves intractable already for networks composed of a few tens of neurons. A more efficient quantitative estimation of these quantities for large networks can be performed numerically through Monte Carlo methods.
In order to provide a complete description of the stationary behavior of networks with quenched disorder, the calculation of the probability distributions of the bifurcation points across network realizations, that we reported in Sec. (5) (see Eqs. (14)- (17)), as well as the calculation of the occurrence probability of the stationary states (Eqs. (18) and (19)), must be performed for every stationary state of the model. Unfortunately, these states are not known a priory, therefore the calculation of the probability distributions must be repeated for all the 2 N combinations of the neural activity states. A possible solution to this problem, in the specific case of sparse networks, is represented by the sparse-efficient algorithm that we introduced in [31]. This algorithm allows a fast evaluation of the stationary solutions of the network, so that the calculation of the probability distributions can be performed only on the actual stationary states detected by the sparse-efficient algorithm.
To conclude, another disadvantage of Eqs. (14)- (19) is represented by the numerical calculation of the matrix permanent, which is computationally demanding. The fastest known technique for calculating the permanent of arbitrary matrices is the Balasubramanian-Bax-Franklin-Glynn (BBFG) formula [8,11,12,35], which has complexity O 2 N −1 N 2 . In order to alleviate this computational bottleneck, in [30] we derived a closed-form analytical expression of the permanent of uniform block matrices, which proved much faster than the BBFG formula. This solution allowed us to speed up considerably the calculation of the bifurcation structure of statistically-homogeneous multi-population networks with quenched disorder, which are often considered to be a good approximation of biologically realistic circuits, see e.g. [16,32,39].

Learning rule
The algorithm we introduced in SubSec. (4.3) for storing some desired sequences of neural activity, was obtained in [29] by manipulating the conditional probability distribution of the neural activity (see Eq. (7)). This distribution does not depend on the coefficients H i , therefore the learning rule has polynomial complexity. For this reason, its applicability is not restricted to small networks only.
Another interesting property of this learning rule is represented by the possibility to store sequences of neural activity also in noisy networks. Typically, noise can break a neural sequence if the stochastic fluctuations are sufficiently strong. However, our algorithm is designed for being noise-resistant, namely the probability of breaking the sequence under the influence of noise can be made arbitrarily small.
In the literature, several learning rules have been proposed for networks of binary neurons. A mechanism for storing and retrieving static patterns of neural activity in networks with symmetric connectivity was proposed by Hopfield [40]. The storage of sequences of temporally evolving patterns was investigated by Sompolinsky, Kanter and Kleinfeld [44,72], for networks with non-instantaneous synaptic transmission between neurons. In [25], Dehaene et al introduced an alternative approach based on temporally evolving synapses. Then, Buhmann and Schulten [15] showed that asymmetric connectivity and noise are sufficient conditions for storing and retrieving temporal sequences, without further assumptions on the biophysical properties of the synaptic connections.
It is important to observe that the learning rule introduced in SubSec. (4.3) does not require transmission delays, time-dependent synaptic strengths, or the presence of noise. Only asymmetric synaptic connections are required. This is compatible with experimental observation, in that the vast majority of synapses in real biological networks are asymmetric [24]. Our result can be considered as an extension to stochastic networks of the associating learning rule for deterministic models, introduced by Personnaz et al in [65].

Open problems and future directions for mathematical developments
While in [31] we proposed an efficient solution for performing the bifurcation analysis of binary networks with sparse connectivity, fast algorithms for dense networks proved more difficult to develop. In [31] we showed that, in the specific case of homogeneous networks with regular topologies, it is possible to take advantage of the symmetries of the network equations to speed up the calculation of the bifurcation diagram. This observation allowed us to introduce an algorithm which runs in linear time with respect to the network size. However, the development of efficient algorithms for dense networks with arbitrary topology of the synaptic connections represents a much more difficult challenge, and it still remains an open problem to be addressed in future work.
Another open problem is represented by the bifurcation analysis of large and medium-size networks. Since computer's processing power increases over time, the size of the networks that can be studied through combinatorial approaches, such as the one we introduced in [31], is expected to increase accordingly. For this reason, we believe that these methods are key to the future development of computational neuroscience and of the physics of complex systems. Beyond brute-force processing power, other techniques can be developed for accelerating combinatorial algorithms; in particular, the algorithm that we developed in [31] lends itself to be parallelized over several processors. It also is important to note that our algorithm calculates exact bifurcation diagrams, while the calculation of approximate bifurcation diagrams, through a heuristic search of the oscillatory and stationary solutions of the network equations, would prove much faster.
To conclude, an important open problem in the study of networks with quenched disorder, is represented by models with correlated synaptic weights. In Sec. (5), we calculated the probability distribution of the bifurcation points, through the results derived in [9,10,36,75] for the order statistics of a set of independent random variables. On the other hand, it is known that synaptic weights in real cortical circuits are correlated, as a consequence, for example, of synaptic plasticity mechanisms. However, a generalization of order statistics to sets of arbitrarily correlated random variables is still out of reach. More generally, the analytical investigation of networks with correlated synaptic weights has proven a formidable problem in mathematical neuroscience, that has challenged also the mean-field theories of large-size systems. Because of its biological relevance, the presence of synaptic correlations represents an important ingredient in the study of neural network dynamics, which needs to be addressed in future research for increasing the biological plausibility of the models.

Possible implications of this mathematical progress for neuroscience
Being able to understand analytically the dynamics of finite-size networks from the circuit's equations is potentially important for improving our understanding of how neural circuits work and of how and why their function is impaired in certain neural disorders. The pattern and power of the external inputs, the pattern of anatomical connectivity, the synaptic strength and the relative firing rate of excitatory and inhibitory neurons, are all key elements in determining the functional organization and output of a circuit. Yet, it is not known how exactly these factor combine to produce brain functions and to cause dysfunctions. Mathematical work to understanding neural networks of arbitrary size could be useful to address two questions relevant to these issues.
The first question regards the relationship between anatomy, functional coupling and population coding in local neural circuits. Our own work (e.g. [59,60,68,79]), and that of many others [14,18,26,57,71,78], has shown that both the dynamics of individual neurons and of populations, and the functional coupling between cells, is crucial for shaping information in population codes and for behavior. Functional coupling between cells may arise both because of anatomical connectivity between neurons, but also because of other factors such as common inputs. Thus, the relationship between functional coupling, circuit's anatomy, and population-level information coding has remained largely unaddressed. However, recent advances in experimental techniques allow the simultaneous functional imaging of activity of several neurons in mice during sensory or cognitive tasks, as well as the post-mortem measure by Electron-Microscopy of the anatomical connectivity of the same set of neurons that were functionally imaged in vivo [47]. The mathematical work reviewed here develops a set of tools that could be used to complement the measure of anatomy and physiology from the same circuits, and help bridging the gap between these two measures. Our tools, when coupled with modern experimental techniques such as those described above, could be used, in particular, to understand what is the consequence of specific patterns of recurrent anatomical connectivity on population coding and circuit dynamics, and then to test these theoretical relationships on real data.
A second possible direction of relevance for neuroscience of our work is to use these tools to understand the neural origin of certain brain disorders. For example, it is thought that Autism Spectrum Disorders (ASD) result, at least in part, from abnormal changes in the functional organization and dynamics of neural circuits [41,67,69,77]. However, although many changes in parameters such as the strength of synaptic connections and/or the change in firing properties of certain classes of neurons have been observed in ASD, it is still unclear how different elementary changes in neural parameters combine to change the circuit's function. Being able to test directly, and understand mathematically at a deep level, how the changes of such basic neural properties affect the circuit's dynamics at different spatial scales, including scales that involve finite-size neural networks, could be useful to understand the origin and consequences of aberrant circuit function in ASD conditions, as well as in other neural disorders.