Stochastic pair approximation treatment of the noisy voter model

We present a full stochastic description of the pair approximation scheme to study binary-state dynamics on heterogeneous networks. Within this general approach, we obtain a set of equations for the dynamical correlations, fluctuations and finite-size effects, as well as for the temporal evolution of all relevant variables. We test this scheme for a prototypical model of opinion dynamics known as the noisy voter model that has a finite-size critical point. Using a closure approach based on a system size expansion around a stochastic dynamical attractor we obtain very accurate results, as compared with numerical simulations, for stationary and time-dependent quantities whether below, within or above the critical region. We also show that finite-size effects in complex networks cannot be captured, as often suggested, by merely replacing the actual system size N by an effective network dependent size Neff.


I. INTRODUCTION
From classical problems in statistical physics [1,2] to questions in biology and ecology [3][4][5], and over to the spreading of opinions and diseases in social systems [6][7][8][9][10], stochastic binary-state models have been widely used to study the emergence of collective phenomena in systems of stochastically interacting components. In general, these components are modeled as binary-state variables -spin up or down-sitting at the nodes of a network whose links represent the possible interactions among them. While initial research focused on the limiting cases of a well-mixed population, where each of the components is allowed to interact with any other, and regular lattice structures, later works turned to more complex and heterogeneous topologies [11][12][13][14]. A most important insight derived from these more recent works is that the macroscopic dynamics of the system can be greatly affected by the particular topology of the underlying network. In the case of systems with critical behavior, different network characteristics have been shown to have a significant impact on the critical values of the model parameters [15][16][17], such as the critical temperature of the Ising model [18][19][20] and the epidemic threshold in models of infectious disease transmission [21][22][23][24][25]. Thus, the identification of the particular network characteristics that have an impact on the dynamics of these models, as well as the quantification of their effect, are of paramount importance.
The first theoretical treatments introduced for the study of stochastic, binary-state dynamics on networks relied on a global-state approach [2,26], i.e. they focused on a single variable -for example, the number of nodes in one of the two possible states-assumed to represent the whole state of the system. In order to write a master equation for this global-state variable, some approximation is required to move from the individual particle transition rates defining the model to some effective transition rates depending only on the chosen global variable. Within this global-state approach, the effective field approximation assumes that all nodes have the same rate of switching to the other state, such that local densities can be replaced by their global equivalents and all details of the underlying network are completely lost. As a consequence, results are only accurate for highly connected homogeneous networks, closer to fully connected topologies.
More recent theoretical treatments have proposed a node-state approach, departing from a master equation for the N -node probability distribution and, thus, taking into account the state of each individual node. From this master equation, general evolution equations for the moments and correlations of the relevant dynamical variables can be derived. Depending on the particular functional form of the individual particle transition rates, this system of dynamical equations can be open -an infinite hierarchy of equations, each of them depending on higher-order correlation functions-or closed. For open systems of equations, different schemes have been proposed for their closure: a mean-field approximation [25,27,28], in which all moments are replaced by products of individual averages, and a van Kampen type system-size expansion [29,30]. Other variants of the moments expansion have been previously considered in the literature with different types of moment closure assumptions [31][32][33][34], see also the review [35] of these techniques. Even when the system of equations is closed, further approximations are required to deal with the appearance of terms depending on the specific network structure. In particular, an annealed network approximation has been successfully used in a number of models [36][37][38][39][40][41][42] in order to deal with these terms. This approximation involves replacing the original network by a complementary, weighted, fully connected network, whose weights are proportional to the probability of each pair of nodes being connected. In this way, highly precise analytical solutions can be obtained for all relevant dynamical variables.
A third group of analytical treatments has sought to establish an intermediate level of detail in its description of the dynamics, in-between the global-state and the node-state approaches. Similar to the global-state method, though significantly improving on it, a reduced set of variables is chosen to describe the state of the system. In particular, information is aggregated for nodes of a given degree. In this manner, some essential information about the network structure is kept -different equations for nodes with different degrees-while the tractability of the problem is greatly improved. As with the global-state approach, in order to write a master equation for this reduced set of variables, some further approximation is required to translate the individual particle transition rates that define the model into some effective transition rates depending only on the chosen set of variables. The number of these variables, as well as their nature, define different levels of approximation, which can be ordered from more to less detailed as: Approximate Master Equation (AME) [16,17], Pair Approximation (PA) [43][44][45][46][47][48][49] and Heterogeneous Mean Field (HMF) [25,37,50,51]. Many works based on these approaches have further neglected dynamical correlations, fluctuations and finite-size effects, which corresponds to a mean field type of analysis. While there have been attempts to relax these restrictions and deal with finite-size effects, these efforts have been usually based on an adiabatic elimination of variables [37,43,48,49,[51][52][53], whose range of validity is limited.
In this paper we consider this intermediate level of description focusing on the Pair Approximation that chooses as a reduced set of relevant variables the number of nodes with a given degree in a given state and in the number of links connecting nodes in different states. We develop a full stochastic treatment of the pair approximation scheme in which we avoid mean-field analysis or adiabatic elimination of variables. In particular, after writing a master equation for the reduced set of variables, we obtain effective transition rates by introducing the pair approximation as described in Ref. [43] and elaborating on the assumptions used on this approach as well as on their quantitative implications. Furthermore, after writing general equations for the moments and correlations, we propose two different strategies for their closure and solution. The first one, that we term S1PA, is based on a van Kampen type system-size expansion [29,54] around the deterministic solution of the dynamics, that is, around the lowest order term of the expansion which corresponds to the thermodynamic limit of infinite system size. This approach allows us to derive linear equations for the correlation matrix and first-order corrections to the average values. The second strategy, that we term S2PA, is based on a system-size expansion around a dynamical attractor, and it allows us to obtain expressions not only for the stationary value of all relevant quantities but also for their dynamics.
Although our methodology is very general for any set of transition rates, only for the first closure strategy -S1PAone can apply straightforwardly our general method to any model under study, to end up with the desired equations. The second one -S2PA-requires specific rates for the analysis to proceed. Thus, for the sake of concreteness, we focus on the noisy voter model as an explicit example for which we carry out a full analytical treatment. The noisy voter model is a variant of the original voter model [4,55] which, apart from pairwise interactions in which a node copies the sate of a randomly selected neighbor, it also includes random changes of state. It is a paradigmatic example of a stochastic binary-state model, with applications in the study of non-equilibrium systems in a wide range of fields, and it has been studied by, at least, five mutually independent strands of research, largely unaware of each other. Namely, random processes in genetics [56], percolation processes in strongly correlated systems [57], heterogeneous catalytic chemical reactions [58,59], herding behavior in financial markets [60], and probability theory [61]. The behavior of the noisy voter model is characterized by the competition between two mechanisms: On the one hand, the pairwise copy interactions tend to order the system, driving it towards a homogeneous configuration -all spins in the same state, whether up or down-. On the other hand, the random changes of state tend to disorder the system, pulling it away from the homogeneous configurations. These homogeneous configurations are the absorbing states of the ordinary voter model, which lose their absorbing character due to the random changes of state. Depending on the values of the model parameters, which set the relative strength of one mechanism over the other, the system can be found in a mostly ordered regime -dominated by pairwise interactions-or a mostly disordered regime -dominated by noise-. Furthermore, a noise-induced, finite-size transition can be observed between these two behavioral regimes. The finite-size character of the transition is due to the fact that the critical point tends to zero in the thermodynamic limit of large system sizes. Although most of the initial literature about the noisy voter model focused only on regular lattices [57,61] and a fully-connected network [60,62], recent studies have addressed more complex topologies, both from an effective field perspective [26,63,64] and using an annealed network approximation [42]. In particular, while the effective field approach was only able to broadly capture the effect of the network size and mean degree on the results of the model for highly homogeneous and connected networks, the annealed approximation [42] was, in addition, able to reproduce the impact of the degree heterogeneity -variance of the underlying degree distributionon the critical point of the transition and the temporal correlations with a high level of accuracy, as well as the main effects on the local order parameter, though with significantly less accuracy. Further studies of the noisy voter model include the effects of zealots [65], nonlinear group interactions [53] and ageing [66].
When applied to the noisy voter model, the first of the approaches proposed in this paper -S1PA-is able to improve on the accuracy of previous methods only for the region above the critical point, while it fails in its proximity and in the region below it. This failure is due to the existence of a mathematical divergence as the noise intensity approaches zero in the analytical expressions obtained for the fluctuations and the local order parameter. The second approach proposed in this paper -S2PA-, on the contrary, leads to highly accurate results for all relevant variables and for all values of the parameters, whether below, within, or above the critical region, thus significantly improving on previous methods. Furthermore, this approach is not limited to stationary values, but rather it is able to provide highly accurate analytical expressions for their time evolution. Finally, this second approach allows us to show that finite size effects in a complex network can not be simply reduced to replacing the system size by an effective one, as most previous studies suggest [37,43,48,49,51].
Summing up, this paper contributes to the analysis of stochastic binary-state models in three main ways. First, it presents a general methodology with clearly identified and well-justified arguments and approximations, including an analytical assessment of their validity ranges. Second, our approach allows to find analytical expressions for the correlations, fluctuations and finite-size effects, as well as for the temporal evolution of all relevant variables, without needing to resort to crude adiabatic elimination of variables. Third, the results obtained for the noisy voter model represent a significant improvement in accuracy, being valid both above and below the critical point as well as in the critical region.
The paper is organized as follows: in Section II we introduce the stochastic model and define the main quantities of interest namely, the fluctuations of the density of nodes in a particular state and the density of active links. In Section III we revisit some of the previous theoretical treatments introduced for the study of the stochastic binarystate dynamics in the two limiting cases of a global-state approach and a node-state approach. In the same section we explain the main approximations required to close and eventually solve the dynamical equations for the moments and correlations of the relevant dynamical variables. In Section IV we introduce our stochastic description based on a master equation for a reduced set of dynamical variables. In this section we use the pair approximation to derive the hierarchy of equations for the moments and correlations of the dynamical variables. Two different closure schemes are presented and discussed in Sections V and VI based, respectively, on an expansion around the deterministic solution and around a stochastic dynamical attractor. The main results of both schemes as well as other approaches in the literature are compared against the results of numerical simulations in Section VII in the steady state. In Section VIII we extend our method to determine the dynamical evolution of the correlation function and its comparison with numerical simulations. Finally, in Section IX we present the main conclusions of our study. The technical details of the solutions of some equations are presented in A and C. In B we present a simple example using linear equations in which one of our closure schemes can be carried out in full detail. In D we explain the method of adiabatic elimination to obtain a partial solution of some of the variables under study.

II. MODEL
We consider an undirected network consisting of N nodes. Links between nodes are mapped into the usual (symmetric) adjacency matrix A = (A ij ) as A ij = 1 if nodes i and j are connected, and A ij = 0 otherwise. Connected nodes are "neighbors" of each other. Each node i = 1, . . . , N holds a time-dependent binary variable n i = 0, 1. We do not adopt in this paper any specific interpretation, but the state n i = 1 (resp. 0) will be labeled as "up" (resp. "down"), and a link between nodes in different states will be defined as active. A stochastic dynamics is introduced by which the node variable n i (t) can switch its state at time t, either 0 → 1 with rate r + i , or 1 → 0 with rate r − i . The rates r ± i (n) depend, in general, on the whole set of node states n = (n 1 , . . . , n N ) but are assumed to not depend explicitly on time. Although our method is rather general, we will use throughout the paper the noisy voter (Kirman) model rates as an archetypal example being k i = j A ij the degree (number of connected nodes) of node i. The term proportional to h represents a "herding" mechanism by which the transition rates are proportional to the fraction of neighbor nodes holding the opposite state, while the term proportional to a represents random jumps between states at a constant rate and is also known as the "noise term".
We introduce the probability P (n; t) of finding node configuration n at time t. This probability can be understood at different levels, as the detailed configuration n(t) depends on (i) the initial condition, (ii) the realization of the stochastic process, and (iii) the network realization A. For a given network configuration, let N k = #{i| j A ij = k} be the number of nodes with degree k. By definition, N k = 0 for k < k min and for k > k max , the minimum and maximum values of the degree for that given network. We introduce as the probability P k that a randomly chosen node has degree k, and the m−moment µ m of the degree distribution, and for brevity µ ≡ µ 1 . The variance of the degree distribution is σ 2 k = µ 2 − µ 2 and we introduce the degree heterogeneity κ = σ 2 k /µ 2 . We will focus on the time evolution of the first moment m(t) and variance σ 2 [m(t)] = m 2 (t) − m(t) 2 of the rescaled density of nodes in the up-state, as well as on the average ρ(t) of the density of active links. The precise definitions of these quantities are where L is the number of active links, and the denominator 1 2 N µ = 1 2 ij A ij is the total number of links. While the "magnetization" m(t) ∈ [−1, 1] describes the global state of the system, ρ(t) ∈ [0, 1] can be considered as a measure of disorder, corresponding the value ρ = 1/2 to a random distribution of states, and ρ = 0 to full order or consensus where all nodes are in the same state, either n i = 0 or n i = 1. All averages F [n(t)] are understood with respect to the probability P (n; t) F [n(t)] = n1=0,1;···;n N =0,1 F (n)P (n; t).
Results can be later averaged over the ensemble of networks with a given degree distribution, always after performing the average in Eq. 5. We also introduce the correlation functions · · · · · · · · · · · · · · · where ∆ i = n i − n i . As n 2 i = n i , note that C ii = n i − n i 2 and that

III. NODE-STATE AND GLOBAL-STATE APPROACHES
A. The node-state approach For general rates r ± i , the probability function satisfies the master equation [29,67] ∂P (n; t) being E i the step operator E ± i f (n 1 , ..., n i , ..., n N ) = f (n 1 , ..., n i ± 1, ..., n N ). From the master equation it is possible to get general evolution equations for the moments as [74] To proceed forward, one would need to solve these equations, introducing explicit expressions for the transition rates r ± i and finding solutions for n i (t) and C ij (t), which can then be introduced in equations (3), (8), and (9) in order to obtain expressions for the desired quantities m(t) , σ 2 [m(t)], and ρ(t) . When introducing explicit expressions for r ± i for any given model, however, one might find two different cases depending on whether the resulting system of equations is closed or not, which will determine the need for different types of approximations: • Non-closed system of equations: With the remarkable exception of the noisy voter model, the set of equations (11)(12) is, in general, not closed. In particular, when the explicit expressions for r ± i are introduced, the right-hand-side of the equations will contain higher-order correlation functions, C i1i2...im with m ≥ 3. While it is also possible to obtain evolution equations for these higher-order correlation functions, the resulting hierarchy of equations is, in principle, infinite. Thus, some approximations are needed in order to close this hierarchy. In general, there are two basic schemes for the closure of equations (11) and (12) (see [35] for alternative techniques): -Mean-field approximation: In the mean-field approximation, all moments are replaced by products of individual averages, e.g.
-System-size scaling hypothesis of the correlation functions: The usual van Kampen system-size expansion [29] splits a relevant dynamical variable as x(t) = x(t) + Ω 1/2 ξ(t), being Ω a large parameter (typically the system size or volume). This assumes that the dynamical variable scales as x(t) ∼ Ω. When trying to apply this approach to the master equation (10) one faces the fact the dynamical variables n i (t) = 0, 1 do not scale with system size N . A more sophisticated expansion was developed in Ref. [30] with the main ansatz that the correlation functions scale as C i1i2.
Once this ansatz is assumed, it turns out that one can still use the mean-field results (13) in the evolution equations for the mean values (11), thus finding for n i (t) the same result as the mean-field approximation. Regarding the correlation C ij , however, this approach leads to non-trivial evolution equations, allowing one to improve on the mean-field approach.
• Closed system of equations: In the particular case of the noisy voter model, and due to its linear transition rates, Eqs. (11) and (12) are already closed, and thus there is no need for any closure approximation. Of course, closing the equations is just the first step towards solving them. Typically, after closing the equations one finds factors of the form j A ij n j , whose complexity for a general lattice structure makes further analytical progress very difficult. These terms can be dealt with by assuming the: -Annealed approximation: This approach replaces the original adjacency matrix A ij by a complementary, weighted, fully connected adjacency matrix A ij , whose weights are proportional to the probability of each pair of nodes being connected [38][39][40][41]. For uncorrelated networks of the configuration ensemble this can be written as where the normalization of the weights is chosen so that both the degree sequence and the total number of links remain unchanged [37,[68][69][70]. This approximation reduces the aforementioned sums to being f j [n j , k j ] any function that depends on the degree k j of the node j and/or on the node variable n j . In the particular case that f j [n j , k j ] = f [k j ], a further simplification is possible in terms of the degree distribution P k = N k /N as There is one case in which the annealed approximation is exact. This is when the adjacency matrix is A ij = 1, ∀i, j, meaning that every node is connected to every other node, a situation known as all-to-all coupling or fully connected network. The results of the treatment in terms of the full set of node-states and the annealed approximation have been described in Ref. [42].
The main results for the noisy voter model using the mean-field and the annealed approximations will be compared in Section VII with our new treatment using a stochastic pair approximation scheme. Before that, let us explain in the next subsection yet another, very simple, approximation which is capable of capturing the main phenomenological features of the noisy voter model.

B. The global-state approach
Instead of the full state n = (n 1 , . . . , n N ), this approach proposes a much coarser description, in which the only relevant variable is the number of nodes in the up state N 1 = #{i|n i = 1}. This is clearly an important reduction with respect to the N variables needed in the n representation, leading to a much simpler analytical treatment but also to less accurate results in general network topologies and exact results only for fully connected networks. As far as the global variable N 1 is concerned, there are only two possible outcomes of the stochastic process: be the probability that the number of nodes in the up state takes the value N 1 at time t. It obeys the master equation where W + and W − are effective rates, depending only on the global variable N 1 , and the step operator now acts as . From here we derive equations for the average value and the second moment of the global variable as Some approximation is needed to find the effective rates W + and W − in such a way that they depend only on the N 1 variable and in a closed form. The simplest approximation is to assume that all nodes have the same rate of switching to the other state, such that we can replace the local density of nodes in the up state by the global density of nodes in the up state, leading to Note that in this effective-field approximation, the details of the network (any dependence on the connectivity matrix) are completely lost. In the case of a fully connected network, where each node is connected to every other node, the effective-field approximation is exact. Once the effective-field approximation has been assumed, no further approximations are needed, for after replacing Eqs. (21)(22) in Eqs. (18)(19), the evolution equations for the mean value m(t) and the variance The density of active links ρ can be calculated using the all-to-all ansatz as the probability of finding a node in the up state times the probability of finding one of its neighbors in the down state or vice versa ρ = 2 N1 N N −N1 In the steady state we have Further analysis shows that the noisy voter model has a transition separating regions in which the steady state probability P st (N 1 ) has a single peak at N 1 = N 2 and regions in which it has two peaks at N 1 = 0, N . The critical condition of the transition is thus that the probability distribution becomes flat P st (N 1 ) = 1 N +1 for N 1 = 0, . . . , N . This condition leads to a variance σ 2 st [m] = N +2 3N , from which we can obtain the critical value a c = h/N , such that P st (N 1 ) is double peaked for a < a c and single peaked for a > a c (see [60,62] for a different method of finding this critical value). These predictions for the transition point and the detailed dependence of σ 2 st [m] and ρ st on the system parameters will be analyzed in the next sections.
As the critical value a c = h/N depends on N , the single to double peaked transition observed for P st (N 1 ) is said to be a finite-size-transition [71]. Certainly, the transition disappears in the thermodynamic limit N → ∞ where, for all values of a, the distribution P st (N 1 ) is always single peaked. Therefore, from the perspective of a strict statistical mechanics formulation, the transition does not exist. Nevertheless, the behavior near a = 0 bears many similarities with a true phase transition, as there are valid scaling laws, divergence of fluctuations with characteristic exponents, critical slowing down, etc., as we will show in the next sections. For this reason, and being clearly an abuse of language, it is customary to refer to the small a ∼ h/N values as the "critical region". It is also possible to identify the critical dimension d c = 2 above which the "critical" exponents are independent of dimension [61].
As shown in Eq. (26), which, we recall, is exact in the all-to-all coupling scenario, when As the critical dimension of this model is d c = 2, this scaling with system size is the correct one in the all-to-all coupling. This simple global approach captures one of the main difficulties we will encounter when closing the equations for the moments in more complicated setups, namely, the fact that the fluctuations scale differently near the critical point and far away from it. A standard van Kampen approach [29] consisting in splitting m(t) in deterministic φ(t) and stochastic ξ(t) contributions m(t) = φ(t) + N −1/2 ξ(t), will lead invariably to fluctuations of the order σ[m] = O(N −1/2 ), and it is hence not appropriate near the critical point.

IV. STOCHASTIC PAIR APPROXIMATION
In this section, we introduce the stochastic pair approximation, starting from a master equation description of the dynamics and using an intermediate level of detail in our characterization, in-between the node-state and the globalstate approaches. Our method is a full stochastic treatment of the pair approximation considered by Vazquez and Eguíluz [43] in their study of the (noiseless) voter model. We elaborate further on this approach and we apply it to the case of the noisy voter model. We derive in this section the evolution equations for the moments and correlations of the dynamical variables and leave for the next two sections the closure of the hierarchy and the solution of the equations.
As explained, the starting point is not the full node-state configuration n = (n 1 , . . . , n N ), but a reduced set of variables {N 1 , L}, where L is the number of active links and N 1 = (N 1,kmin , N 1,kmin+1 , . . . , N 1,kmax ) with N 1,k the number of nodes in the up state with degree k. The total number of variables in this description is thus k max −k min +2, still much smaller than the N variables needed in the n representation.
In the elementary process n i = 0 → n i = 1, we have N 1,ki → N 1,ki + 1, being k i the degree of node i. At the same time, the number of active links L can vary in different amounts L → L + k i − 2q i depending on the number q i ∈ (0, k i ) of neighbors of node i in the up state. Similarly, in the elementary process n i = 1 → n i = 0 it is N 1,ki → N 1,ki − 1 and L can vary in different amounts L → L − k i + 2q i . Introducing P (N 1 , L; t) as the probability of the state {N 1 , L} at time t, we derive the master equation with E N 1,k and E L the step operators acting on N 1,k and L, respectively. From this master equation and following standard techniques, one can derive the equations for the moments and correlations of the rescaled variables m k = 2 where As in the global state approach, some approximation is needed in order to find the effective rates W (±,k,q) that appear in the master equation. Writing q i as q and k i as k, the rates of the elementary processes n i = 0 → n i = 1 and n i = 1 → n i = 0 are, respectively, Therefore, the total rate of the process N 1,k → N 1,k + 1, L → L + k − 2q is calculated as the product of the elementary rate R + k,q , the number of nodes in the down state among the population with degree k, and the fraction (probability) of the population of nodes in the down state with degree k that have q neighbors in the up state P 0 (k, q), i.e.
A similar reasoning leads to the total rate of the process In general, those probabilities depend on the detailed configuration n and adjacency matrix A, but we need to write them in terms only of the description variables {N 1 , L}. We then make the pair approximation [43] where is the ratio between the total number of active links and the total number of links connected to nodes in state n = 0, 1. This c 0/1 can be interpreted as the single event probability that a node in the state 0/1 has a neighbor in the opposite state 1/0. The pair approximation assumes that the probability of having q nodes in the up state among the k nodes connected to a node in state 0/1 is a sequence of q independent processes and thus binomial. The pair approximation is known to be accurate for uncorrelated networks [17,43,48].
For the noisy voter model, encoded in the transition rates (39) and (40) using the known moments of the binomial distribution k q=0 qP 0/1 (k, q) = kc 0/1 and k q=0 q 2 P 0/1 (k, q) = kc 0/1 (1+(k−1)c 0/1 ), after lengthy but straightforward algebra we arrive at with the link (or "degree-weighted") magnetization m L defined as The complete expression for G ρ being too long, only a simplified version valid in the stationary state will be displayed later [see Eq. (70)]. Eqs. (29)(30)(31)(32)(33) together with Eqs. (46)(47)(48)(49)(50) are the basis of our subsequent analysis. They are not yet closed. Note, however, that m L (t) satisfies the exact equation whose solution m L (t) = m L (0) e −2at indicates that m L st = 0. Remarkably, m L is self-governed and fulfills an equation equivalent to m in the effective-field approximation [see Eq. (23)]. This will not be the only similarity between these two approximations and variables, as we will show in section VI B.

A. The deterministic solution
In order to close and eventually solve Eqs. (29)(30)(31)(32)(33), we first study the deterministic solution using a mean-field assumption, i.e., neglecting all correlations. Introducing the notation φ , and substituting Eqs. (46) and (47) into Eqs. (29) and (30), we findφ where the dot indicates a time derivative. The equation for φ would be identical to Eq. (52) with the exception of changing φ k → φ. This system of equations has only one (stable) fixed point φ k = 0, φ ρ = ξ, where ξ is the positive solution of the quadratic equation For a = 0 this is ξ = µ−2 2(µ−1) . The other solution (unstable) of Eq. (54) is negative and thus non-physical, except in the case a = 0, which is φ ρ = 0. The stability of the fixed point φ k = 0 and φ ρ = ξ can be assessed using the elements of the Jacobian matrix J, defined as J a,b = − ∂φa For future reference, let us also mention here the coefficients of the Hessian matrices H, evaluated at the fixed point, Note that the cross-terms J k,ρ and J ρ,k are zero, which means that the deterministic dynamics of φ k and φ ρ are uncoupled -at least in the linear regime-. The eigenvalues of the J matrix (obviating the trivial φ ρ part) are (i) 2a (slow) and (ii) 2(a + hξ) (fast). Both of them are positive, which means that the fixed point is a stable node (note that the Jacobian matrix has been defined with a minus sign). The first eigenvalue has multiplicity one while the second one has multiplicity d − 1, where d = k max − k min + 1 is the dimensionality of the space φ = {φ k }. The corresponding eigenvectors are (i) v 1 = (1, ...., 1) and (ii) the vectors v 2 contained in the plane p · v 2 = 0, where p = (k min P kmin , ..., k max P kmax )/µ is the normal vector to the plane. The linear deterministic dynamics is then given by (0), which has a clear geometric interpretation. As shown in Fig. 1, in the early stages of the dynamics, the fast part dominates and trajectories evolve close to the parallel plane p · φ(t) = φ L (0). Later, the slow part plays the important role and trajectories bend following the direction v 1 , an effect which is more pronounced the larger the separation of time scales is.

B. Expansion around the deterministic solution
The first approach that we will consider to close and solve Eqs. (29-33) is a system size expansion around the deterministic trajectory. We term this approach as S1PA. In the spirit of van Kampen's expansion [29], we propose here the expansion where γ k , γ ρ , η k , η ρ are stochastic processes, while φ k , φ ρ are the deterministic terms as given by the solution of Eqs. (52) and (53). If we introduce the ansatz (63) and (64) into Eqs. (29)(30)(31)(32)(33) and equate powers of N , we are able to close the system of equations for the moments [75]. From now on, we restrict ourselves to the stationary state and set all time derivatives to zero. For the stochastic variables γ k and γ ρ we find γ k st = 0 and γ ρ st = 0 for the average values, and the following set of equations for the correlation matrix with G k , G k,ρ and G ρ evaluated at the steady state Although the system of linear equations (65-67) might be difficult to solve in general, the decoupling J k,ρ = J ρ,k = G st k,ρ = 0, which can be shown to be a general consequence of the up-down symmetry in the rates R − k,q = R + k,k−q , in addition to the simplicity of the terms J k,k for the noisy voter model, simplifies significantly the problem. As shown in A, the solution is The variance of the global magnetization m can now be obtained from Eq. (8), expressed in the form where we have defined an effective system size N eff ≡ N µ 2 /µ 2 . For a given finite network, N eff adopts a finite value.
In the large N limit, however, N eff can be characterized by a non-trivial dependence on N , as the moments of the distribution can diverge. For instance, in a Barabási-Albert network it is N eff ∼ N/ log N . From Eq. (74) we obtain that, in the limit a → 0, the fluctuations diverge as N σ 2 st [m] ∼ hξ a µ2 µ 2 . In the same limit, the correlations behave as C k,k ∼ N σ 2 st [m], independent of k, k , which is strongly related to the geometric picture of the linear deterministic dynamics that we described earlier, where variables fluctuate along the slow direction v 1 with an amplitude N 1/2 σ st [m]. In a homogeneous network, where µ 2 = µ 2 , we find N σ 2 st [m] = a+hξ a , which in the limit N → ∞ coincides with the effective-field result in Eq. (26) whenever ξ = 1/2, which happens, for instance, in the limit of large connectivity µ → ∞. On the other hand, below or around the critical point, when a = O(N −1 ), the van Kampen expansion fails to reproduce the all-to-all result and, hence, it provides the wrong scaling dependence with system size near the critical region (as can be seen in Fig. 2b). The critical point can be obtained from the condition of a flat probability distribution of the magnetization, σ 2 st [m] = N +2 3N , which leads to It is also possible to calculate the O(N −1 ) stochastic corrections to the average value of the description variables defined in Eqs. (63) and (64), which are led by η k st and η ρ st . In the stationary state, the equations for η k st and Again, due to the particularities of the model, these equations greatly simplify, leading to from where we obtain η k st = 0, a result arising again from the up-down symmetry of the model, and where D is given by Eq. (A5). Finally, the density of active links in the stationary state is In the all-to-all limit µ 2 → µ 2 → ∞, we have η ρ st = − h 4a and ρ st coincides with the effective-field result in Eq. (27) to the order O(N −1 ). On the other hand, in the limit of a → 0 we have which leads to unphysical results for the density of active links, as, by definition, it should be ρ ≥ 0. In Fig. 2 we test the validity of these analytical results for a wide range of values of a for an Erdös-Rényi network with N = 2500 nodes and average connectivity µ = 4 (a more detailed comparison for other network types is postponed  26) and (27), the stochastic pair approximation combined with a van Kampen expansion (S1PA), Eqs. (74) and (81), and the prediction for the density of active links which is obtained neglecting fluctuations, ρ st = ξ from Eq. (55).
to the next section). The results in this figure suggest that the expansion S1PA around the deterministic solution works perfectly for a a c = hN −1 , while it fails below the critical point a a c , where it is not able to reproduce even qualitatively the observed dependence. We also plot the prediction for the density of active links which is obtained neglecting fluctuations, ρ st = ξ, which performs very poorly, specially for small a. In the same panel we show the results of the simple global approach, Eqs. (25) and (26), which is surprisingly accurate for the variance of the magnetization but not so good for the density of active links. Nevertheless, we note that the global approach, equivalent to the all-to-all approximation, completely disregards the details of the network connectivity and, as shown in the the next sections, is not capable of explaining the differences observed between different networks. The reason for the failure near the critical region of the stochastic pair approximation combined with a van Kampen expansion lies in the ansatz stated in Eqs. (63) and (64), which requires σ 2 [m] to scale as ∼ N −1 and also to be small enough for the expansion to be accurate, which is only true in the limit a a c [see Eq. (26)]. This is strongly related to the associated deterministic dynamics and to the slow eigendirection v 1 having an infinite time scale as a → 0, which eventually leads to large fluctuations following this direction. If we want an expansion to be also valid in the critical region, we have to explore the full set of nonlinear deterministic dynamical equations (52-53) and find a solution (in this case a privileged slow trajectory) around which the dynamics has a finite time scale in the entire parameter region. Then, we can propose an expansion similar to Eqs. (63) and (64) around this solution. This is the method that we are going to apply in the next section and whose procedure is made clear in B with a simple example of a set of linear equations.

A. The dynamical attractor
As explained before, the second method we propose to tackle Eqs. (29)(30)(31)(32)(33) starts by finding a particular deterministic solution around which we can include stochastic corrections. The solution of Eqs. (52) and (53) depends on the initial conditions φ k (0), φ ρ (0) besides the time t. We are interested, however, in a particular solution that we will name φ * k (t), φ * ρ (t) and that depends on time only through φ L (t), namely φ * k (φ L (t)), φ * ρ (φ L (t)), i.e., in a particular solution such that the system reacts instantly to the self-governed variable φ L (t) = φ L (0)e −2at . This solution has to fulfill the constraints φ * = ±1 and φ * ρ = 0 when φ L = ±1, which arise from the definition of the variables when all the nodes are up or down. By inspection, we note that φ * k = φ L is an exact solution of Eq. (52) that fulfills these constraints, while the dependence φ * ρ (φ L ) is not so trivial. Introducing φ = φ L in Eq. (53) and changing the time derivative a Riccati type of equation whose exact solution, carried out in C, depends on a combination of Gauss hypergeometric functions. A simpler form, more amenable to analytical treatment, is also derived in C as In order to show that this simpler φ * ρ is a stable attractor of the deterministic dynamics, we linearize Eqs. (52) and (53) around the particular trajectory with δ(t) = k P k δ k (t) and J ρ,ρ given by Eq. (58). The solution of these linear equations is where c is a constant determined by the initial conditions. The time scale of δ k is fast (compared to the time scale of φ L (t)) and corresponds to the second eigenvalue calculated in Section V A, while δ ρ has three terms: the first two are fast and the last one is slow (for a 1). It is important to note that this last slow term appears because we used the approximate expression of the trajectory in Eq. (84), instead of the exact one. According to the discussion in C, this term can be neglected, since it is of order O(N −1 ) as long as the approximation is accurate. We thereby conclude that this trajectory is an attractor of the dynamics and that the attractor is reached in a time scale which is longer than the time scale within the attractor. This approach captures a critical feature that the expansion around the deterministic solution overlooked. While φ k (t) is exactly the same for both approaches, φ ρ (t) depends on φ L (t) following Eq. (84), and so the dynamics are not uncoupled as in the previous case.

B. Expansion around the dynamical attractor
In Figs. 3a and 3b we plot, respectively, m(t) and ρ(t) versus m L (t) for different realizations of the stochastic process in different network structures. It is apparent from this figure that m(t) and ρ(t) fluctuate around the dynamical attractor m(t) = m L (t), ρ(t) = φ * ρ (m L (t)). Based upon this picture of the phase space, our second approach to solve Eqs. (29)(30)(31)(32)(33), which we name S2PA, consists in splitting the m k (t) and ρ(t) variables into the dynamical attractor contribution plus additional fluctuations as where m L , ε k and ε ρ are all stochastic variables. According to the discussion above, section VI A, ε k and ε ρ are fast variables and we assume that they can be approximated by a van Kampen type expansion δ k and δ ρ being the deterministic equations (87) and (89), while λ k ,λ ρ , ν k are stochastic. The order O(N −1 ) of ε ρ will not be necessary for the expansion procedure in this case. We now proceed to find the statistical properties of ε k and ε ρ . Introducing Eqs. (90-93) in the equations for the moments (29-33) and equating powers of N , we find for the average values and for the correlations, where, for simplicity and brevity of the expressions, we have assumed that the deterministic contributions δ k (t), δ ρ (t) are well captured by the linear system of equations (85) and (86) and that δ k (0) = δ ρ (0) = 0. An important difference with respect to the expansion around the deterministic solution, Eqs. (63)(64), is that the equations for the moments are not closed, as indicated by the presence of the last term in Eq. (98). Nevertheless, we will argue later on in this same section that this term can be neglected at this level of approximation. From these equations, it follows directly that λ k st = λ ρ st = λ k m L st = 0. Furthermore, due to the up-down symmetry of the model, we know that ν k st = 0, and thus it follows that λ k λρ 1−m 2 L st = 0. Performing in Eq. (98) the double sum k,k kk P k P k and taking into account that, by definition, k kP k ε k (t) = 0 (the same identity holds for δ k (t), λ k (t) and ν k (t) separately), we find which, together with Eq. (51), proves that m L (t) behaves as an autonomous stochastic variable. Furthermore, its first and second moments fulfill equivalent equations to those of m(t) in the all-to-all scenario [see Eqs. (23) and (24)] if we change h → 2hξ and N → N eff . The stationary solution reads and the transient regime Note that aN m 2 L st = O(1), implying that the order of Eq. (98) is consistently O(1) and also supporting the reasoning of C. As shown in D, it is also possible to obtain these statistical properties starting from a closed master equation for the link magnetization m L (t).
Performing in Eq. (98) the partial sum k k P k we obtain an expression for ν k m L that depends on m 2 L , We now set A justification of this assumption relies on the use of the van Kampen expansion Eqs. (63,64), that determines that at order O(N −1/2 ) the average of Eq.(103) is proportional to γ k γ k γ ρ , which is zero as the stochastic variables γ a are Gaussian [54]. This analysis is strictly valid only in the regime of large a a c , but in Fig. 4 we provide numerical evidence of its validity in the whole range of parameters of a. We observe that the assumption works very well for the Erdös-Rényi and Barabási-Albert networks, while for the dichotomous network there is a narrow parameter region around the critical point where the average Eq.(103) is non-zero but still negligible. Once this term has been neglected, the solution of Eq. (102) in the stationary state is Replacing this result in Eq. (98), we obtain the following correlations in the stationary state We have now all necessary ingredients to calculate the first and second moments of the global magnetization m = k P k m k , both in the stationary and transient regimes. For example, the stationary variance is which leads to The critical point is found by solving σ 2 st [m] = N +2 3N , taking into account that ξ depends on a, The average value of ρ can be determined in the stationary regime as At variance with the results of the expansion around the deterministic solution [see Eqs. (81) and (82)], the above expression is not divergent for a → 0. It is important to stress the similarities and differences of our stochastic pair approximation method with the one developed in Ref. [43] for the noiseless voter model. In that paper the authors use a master equation approach for the link-magnetization m L which can readily be extended to the noisy case (see D) yielding for m L the same statistical properties as the ones obtained by our more sophisticated approach. However, the authors of Ref. [43] make, implicitly, the extra assumption that the link magnetization m L and the node magnetization m share the same statistical properties. To check the validity of this assumption, and using a function of the degree heterogeneity and the parameters of the model (the ratio of time scales). From this expression we conclude that it is true that in the noiseless voter model, with a = 0, (m − m L ) 2 st becomes zero (rather, of order N −1 , which is the accuracy of our expansion), validating the approach of Ref. [43] in that case. Remarkably, in the limit of a homogeneous network, with N eff = N , the statistical properties of m and m L also coincide, consistently with the fact that m and m L are the same quantity for those networks. Furthermore, in the critical zone a = O(N −1 ) and for a heterogeneous network we obtain that (m−m L ) 2 st = O(N −1 ), meaning that when there is time scale separation m does not differ significantly from m L . Our analysis shows that in the noisy voter model the assumption that the statistical properties of the magnetization are those of the link magnetization can not be held for heterogeneous networks far away from the critical region, while for the voter model without noise (a = 0) the statistical differences between both quantities turn out to be negligible. The relevance of this difference for other stochastic processes defined in heterogeneous networks is under study.

VII. COMPARISON WITH NUMERICAL SIMULATIONS
We now compare the different theoretical approaches based on the stochastic pair approximation and others with the results of numerical simulations. We focus on the steady state values σ 2 st [m] and ρ st as functions of the noise parameter a and the degree heterogeneity κ ≡ µ 2 /µ 2 − 1, for six different network structures keeping a constant value for the average connectivity µ = 4 and the total number of nodes N = 2500, namely: 1) an Erdös-Rényi random network with a fixed number of links µN/2 that are connected randomly between pairs of nodes, thus leading to a degree heterogeneity κ = 0.19; 2) a Barabási-Albert preferential-attachment network, with a power-law degree distribution P k ∼ k −3 and a degree heterogeneity κ = 3.05; 3) a dichotomous network where a fraction 23/24 of nodes has k = 2 neighbors and the remaining fraction 1/24 has k = 50 neighbors, leading to a degree heterogeneity κ = 5.75; 4) a z-regular network where each node has exactly k = 4 neighbors but the connections amongst them are chosen at random; 5) a one-dimensional (1D) linear lattice where a node is connected to its closest 4 neighbors (2 to the right and 2 to the left); 6) and a two-dimensional square lattice (2D) where each node is connected to its 4 nearest neighbors.
The last three of these networks are called "homogeneous", as κ = 0. For each one of these networks we plot the theoretical predictions for the variance of the magnetization σ 2 st [m] and the average density of active links ρ st in the steady state as given by: a) the global-state-approach as presented in Eqs. (26) and (27), which is independent of all network details (the comparison has already been presented in Fig. 2); b) the annealed-approximation as developed in Ref. [42]; c) the expansion around the deterministic solution (S1PA) as given by Eqs. (74) and (81); d) the expansion around the dynamical attractor (S2PA) as given by Eqs. (107) and (109); e) and, for comparison, we have also included the variance σ 2 st [m L ], which would have been the result of extending the theory developed at Ref. [43] to the noisy voter model, neglecting the difference between the magnetization m(t) and the link-magnetization m L (t).
In Fig. 5 we plot N σ 2 st [m] as a function of the noise parameter a. For clarity, we have split the comparison in several panels. We can appreciate in Figs. 5b and 5d how S2PA reproduces very accurately the numerical results for the heterogeneous and the z-regular networks in the whole range of values of a, both above and below the critical point a c . The results of S1PA are indistinguishable from those of S2PA above the critical point a > a c , while below it a < a c the S1PA prediction for N σ 2 st [m] diverges as a → 0, at odds with S2PA and the numerical simulations. We expect a full agreement between these two approaches only in the thermodynamic limit, in which N → ∞ with a fixed, as Eq. (107) and Eq. (74) would then coincide. The data is also analyzed in Fig. 6 as a function of the degree heterogeneity κ. While for a > a c (Fig. 6a) is proportional to the degree heterogeneity in accordance with Eq. (74), for a < a c (Fig. 6b) the dependence is not linear but well captured by Eq. (107). On the other hand, only below the critical point a < a c or for homogeneous networks κ → 0, where the difference between m and m L is irrelevant, as discussed in the previous section. The annealed approximation developed in Ref. [42] reproduces correctly both the dependence on a and κ, but it is less accurate in general than the pair approximation (S2PA). The global state approach Eq. (26) has the correct scaling, but it is accurate only for networks with low levels of heterogeneity and does not predict any dependence with κ. In the limit µ 2 = µ 2 → ∞, and for all values of a below and above the critical point, the relative difference between Eqs. (107) and (26) is of order N −1 . We also check the predicted values for the critical point a c in Fig. 7. Again, the S2PA approach offers the best fit to the numerical data, while the S1PA and the annealed approximations coincide for all values of the degree heterogeneity.
In Fig. 8 we plot the results of ρ st as a function of a. We have split the comparison for heterogeneous random (Fig. 8a) and homogeneous (Fig. 8b) networks. We can observe again a very high degree of accuracy of S2PA for random networks in the whole range of values of a. The results using the expansion around the deterministic solution S1PA are accurate above the critical point, but below it they show a divergence leading to non-physical results ρ < 0. Comparing with other treatments, such as the annealed and all-to-all approximations, it is clear that the stochastic pair approximation S2PA developed here perfectly captures the shape of the curve ρ st vs a, while the annealed approximation discussed in Ref. [42] fails even qualitatively. This failure of the annealed and all-to-all approximations is directly related to the fact that the average degree µ is finite, as one can prove that in the limit κ → 0 Eq. (109) and Eq. (27) coincide only when µ → ∞ and thus ξ → 1/2. Note that the expression of S2PA for ρ st given at Eq. (109) is independent on whether the spin m and link m L magnetizations are assumed to be equal or not, and thus this approximation leads to the same result as the extension of the theory developed at Ref. [43] for ρ st .
Both the variance of the magnetization and the average density of active links show significant discrepancies between the results of numerical simulations and the proposed analytical methods in the case of regular networks of low dimensionality d = 1, 2. This is specially notorious for the 1D case, where even the scaling m 2 . This is the reason why there is no finite size critical point in Fig. 7, as defined previously, for the 1D and 2D cases. As the critical dimension of the noisy voter model is d = 2, for any dimension above this value the same scaling properties are expected, in accordance to the all-to-all result. A detailed analysis of the finite-size scaling for regular lattices will be published elsewhere [72].

VIII. TIME DEPENDENCE
The method S2PA, developed in the previous sections, allows us to determine the time evolution of m(t) and ρ(t) . From the construction of the variables we have m k (t) = δ k (t) + m L (t) , which leads to assuming that we start with a fixed initial condition m(0) and m L (0). A preferable quantity to measure this time evolution, that does not depend on the initial conditions, is the stationary autocorrelation function Figs. 5b and 5d show the comparison with the methods developed in this paper, S1PA and S2PA, for heterogeneous (Fig. 5b) and homogeneous (Fig. 5d) networks, while Figs. 5a and 5c show the corresponding comparison for the annealed approximation of Ref. [42], and the results obtained neglecting the difference between the magnetization m and the link magnetization mL. Dots correspond to numerical simulations while lines are analytical results according to the legend.
The existence of two exponentials in this expression is directly related to the degree heterogeneity, as already pointed out with the annealed approximation in Ref. [42]. In Fig. 9 we plot the time evolution of K st [m](t) for the two extreme cases of a network with no degree heterogeneity (z-regular random network) and a highly heterogeneous degree distribution (dichotomous network). While for the z-regular network there is only one slow exponential, for the dichotomous network the early stages are dominated by the fast exponential and later the slow part dominates, in accordance to Eq. (112). Hence, by fitting a double exponential to the autocorrelation data we are then capable of obtaining information about the network heterogeneity.
While it has not been studied before in the context of the noisy version of the model, the time evolution of the interface density of active links ρ is commonly used to describe the dynamics of the (noiseless, a = 0) voter model. In that case, the voter model is characterized by the existence of two absorbing states (m = −1 and m = 1), both of them corresponding to full order or consensus (ρ = 0). Therefore, the focus is on how the system approaches consensus and if it does so in the thermodynamic limit. In particular, the ordering process is found to be determined by the dimensionality d of the underlying topology. For d ≤ 2, and in the infinite size limit, the system orders by a coarsening process, i.e., by a continuous and unbounded growth of spatial domains of one of the opinions. The ensemble average interface density decays as ρ ∼ t −1/2 for d = 1 and as ρ ∼ (log t) −1 for d = 2. After an initial transient, infinite systems with d > 2 are observed to fall into a metastable, highly disordered state (0 < ρ < 1/2)  75) and (108) for S1PA and S2PA, respectively. The annealed network approximation coincides with S1PA.
where coarsening processes have stopped. In these cases, the average interface density behaves as ρ ∼ b − c · t −d/2 , and thus complete order is never reached. Finite-size systems, on the contrary, are always led to complete order by finite-size fluctuations. As a consequence, finite systems follow the above described behaviors only for a finite time, after which an exponential decay to complete order is observed. Notably, it has been shown that this characteristic decay time is proportional to the effective system size N eff (for d > 2).
According to the theoretical description presented in the previous Section VI, the time dependence of ρ(t) can be determined as where δ ρ (0) = ρ(0) − ξ and, for simplicity, we assume m k (0) = m L (0) = 0 so that the term δ ρ (0)e −Jρ,ρt is the only one that remains from Eq. (89). The form of this expression is in line with the qualitative description given in Ref. [64], where the average interface density is said to be characterized by a short initial transient after which a plateau is reached at a value ρ * = ξ > 0. In a more quantitative description, Eq. (113) contains the initial transient δ ρ (0)e −Jρ,ρt , the exponential decay due to finite size effects after the plateau ξ m 2 L st e −t/τ with τ = N eff 4(aN eff +hξ) and the final steady state described by Eq. (109). Note how the decay time of the initial transient is independent of the degree heterogeneity, unlike the one due to finite size effects. For a < a c , τ ∼ N eff 1/J ρ,ρ , whereas τ and 1/J ρ,ρ are both of the order N 0 for a > a c . The difference between the plateau and the steady state ξ m L (t) 2 is of order 1/N for a > a c and of order aN for a < a c .
Numerical results for the time evolution of the average order parameter are presented in Fig. 10 for the six different networks studied. Note that, due to their particular ordering dynamics and for comparison with the voter model, we have included both one-and two-dimensional regular networks. In this figure, we can clearly distinguish the two stages in the dynamics of the average interface density identified in Ref. [64] and described by Eq. (113): an initial ordering transient, with a significant decrease of the order parameter from its initial, close to complete disorder value ρ = 1/2, corresponding to a random initial distribution of states; and a final steady state, where the order parameter fluctuates around a characteristic level of disorder. As shown in the figure, for a < a c the initial ordering behavior of the noisy voter model is remarkably similar to the ordering dynamics of the voter model described above: a power-law decay as ρ ∼ t −1/2 for d = 1 (regular 1D), a logarithmic decay as ρ ∼ (ln t) −1 for d = 2 (regular 2D), and a fast decay to the plateau followed by a slow exponential decay ρ ∼ ρ * e −t/τ for d > 2 (z-regular, Erdös-Rényi, Barabási-Albert, dichotomous). Therefore, as in the voter model, the initial ordering of systems with dimension d ≤ 2 is characterized by a coarsening process with growth of spacial domains, while the ordering of systems with dimension d > 2 occurs mainly by finite-size fluctuations after a period in a metastable, highly disordered state. We can also note that, apart from coarsening, finite-size fluctuations are present too for d ≤ 2, being responsible for the exponential decay interrupting the coarsening process. It is important to note, however, that, while in the voter model any ordering process in a finite system is a transient towards complete order, in the noisy voter model it is a transient towards a non-zero steady state of the order parameter. Thus, the voter model-like initial ordering of the noisy voter model is interrupted whenever the average order parameter reaches its steady state value. In general, the behavior for a > a c is quite different, as one can notice from the figure. This is because the influence of finite size effects on ρ becomes weaker as a is increased. Thus, the value at the plateau ρ = ρ * is closer to the steady state solution ρ st and the time scale of the fast initial transient is not that different from the one of finite size effects τ , which mixes the two behaviors. Furthermore, for d ≤ 2 the previous voter-like time dependence is no longer true.

IX. CONCLUSIONS
We have developed a full stochastic approach to the pair approximation scheme to study binary state dynamics on heterogeneous networks. Our starting point is the master equation for the set of variables {L, N 1,k }, being N 1,k the number of nodes with degree k in the state up and L the number of active links. The effective rates of the master equation are obtained using the pair approximation as discussed in Ref. [43] in the context of the noiseless voter model. For the noisy voter model, however, we show that the spin magnetization and the link magnetization cannot be identified. Once the equations for the time evolution of the moments and correlations are formulated, we proceed to close the hierarchy by using two different closure schemes. The first one, S1PA, is based on a van Kampen type of expansion around the deterministic solution of the dynamics corresponding to the thermodynamic limit, while the second one, S2PA, is based on an expansion around a stochastic dynamical attractor.
In the first approach, we expand the solution around the deterministic fixed point. In this way we are able to derive a linear set of equations for the correlation matrix and first corrections to the average values. Although our methodology is very general for any set of rates, we have focused on the noisy voter model, for which we have been able to carry out a full analytical treatment. This first analysis, based on the ansatz that the fluctuations of the magnetization scale with system size as σ 2 st [m] ∼ N −1 , predicts that the fluctuations depend linearly on the degree heterogeneity of the network. We also obtained an expression for the average density of active links ρ st . Both expressions have a mathematical divergence as the noise intensity approaches a → 0. Compared to numerical simulations, these results turn out to be very accurate well above the critical point of the noisy voter model, a/h N −1 , but fail below or close to it. The failure of this approach is ultimately linked to the ansatz of the scaling of the correlations with system size. As we approach the critical point a → 0, it is not true that fluctuations diverge but there is a change in the scaling with N , in the noisy voter case becoming σ 2 st [m] ∼ N 0 . Our second approach is based on a system size expansion around a stochastic dynamical attractor. Although this method turns out to be more precise than the expansion around the deterministic solution and fits much better the numerical data for the noisy voter model, it is much more involved and its applicability must be analyzed for the particular model under study. Exploring the phase space of deterministic solutions, we were able to identify a special trajectory m(t) = m L (t) and ρ(t) ≈ ξ(1 − m L (t) 2 ) (m L being the link magnetization) around which the dynamics is fast and the fluctuations do fulfill a van Kampen type of scaling, namely σ 2 st [m − m L ] ∼ N −1 , for all values of a. Using this expansion we are able to obtain expressions for all the desired quantities, not limited to the stationary values but including the dynamics of ρ(t) and m(t) , which match perfectly compared to numerical simulations and work better than previous approaches to the same model, such as all-to-all or annealed network approximations. We highlight that the results of σ 2 st [m L ] are the same as the all-to-all network if we rescale the herding as h → 2hξ and the system-size as N → N eff = N µ 2 µ2 in terms of the first µ and second µ 2 moments of the degree distribution, a result that was previously reported for the noiseless voter model. However, due to the effect of noise a = 0, it is not true that σ 2 st [m] ≈ σ 2 st [m L ], which implies that finite size effects in a complex network can not be reduced to replacing the system size N by an effective one N eff .
Appendix A: Solution of Eqs. (65)(66)(67) Replacing the Jacobian Eqs. (56)(57)(58) and the steady state value Eqs. (68)(69) into Eqs. (65)(66)(67), these latter can be reduced to From Eq. (A3) we get C ρ,ρ = G ρ /2J ρ,ρ , while from Eq. (A2), by inspection, we note that C k,ρ is a constant that does not depend on k. Reinserting this independence of k back into the equation, we notice that the only possibility is the trivial solution C k,ρ = 0. Eq. (A1) is more involved. Let us define D k = k P k k µ C k,k and D = k P k k µ D k , so the equation becomes We now perform the double sum k P k k µ k P k k µ , obtaining in this way a closed relation for D If we perform instead the single sum k P k k µ , we obtain an expression for D k as a function of D and we then solve C k,k Finally, replacing Eq. (A5) into Eq. (A6) and this latter into Eq. (A7), we obtain Eq. (73).

Appendix B: A linear noise example
Consider the simple example of two coupled linear stochastic equations for variables x(t) and y(t) where ξ x,y (t) are two uncorrelated Gaussian white noise variables ξ a (t)ξ b (t ) = δ a,b δ(t − t ). The x variable is independent of y and has a stationary variance x 2 st = τ x /(2N ). This variance fulfills the van Kampen ansatz as long as τ x = O(1), while when the time scale diverges τ x = O(N α ) with 0 ≤ α ≤ 1, the variance has an anomalous scaling x 2 st = O(N α−1 ). The deterministic part of the equation has the solution y(t) = x(t) which is an attractor of trajectories. We then propose to split the y variable as y(t) = x(t) + ε(t), with two stochastic parts: a slow one x(t), based on the solution of the deterministic attractor, and a fast one ε(t). Introducing this change of variables in Eq. (B2), we findε The equation is with α = h a (1 − 2/µ) − 2, β = 2h a (1 − 1/µ) and boundary conditions φ * ρ (φ L = ±1) = 0. Note that, the value φ * ρ (φ L = 0) ≡ y 0 is determined by the (positive) solution of 1 + αy 0 − βy 2 0 = 0 or φ * ρ (0) = ξ, where ξ is given by Eq. (54).
To find the general solution of the Ricatti-type equation (C1) we follow a standard procedure. We first introduce the change of variables s = φ 2 L , which reduces the equation to leads to the second-order linear differential equation whose solution is z(s) = Cs λ1 F (λ 1 , 1 + λ 2 ; 1 + λ 1 + λ 2 ; s) + s −λ2 F (−λ 2 , 1 − λ 1 ; 1 − λ 1 − λ 2 ; s) , where C is an integration constant (another irrelevant global multiplicative constant has been set equal to one). Note that F (a, b; c; s) is the Gauss hypergeometric function and that we have introduced the notation We note that, independently of the value of the constant C, the solution verifies φ * ρ (0) = ξ. The constant C is obtained by demanding that φ * ρ (1) = 0. Using the known expansion [73] of the hypergeometric function which is valid for s → 1, we arrive at The solution is finally written as where G(s) = − sz (s) λ 2 z(s) .
For a more explicit expression of φ * ρ (φ L ) in terms of hypergeometric functions we could perform the derivative z (s) = dz(s) ds using dF (a, b; c; s) ds = ab c F (a + 1, b + 1; c + 1; s) .
By expanding G(s) around s = 0 it is possible to find the following bound with Λ = 2µ h(µ − 2) 2 + 2aµ 2 + µ 4(1 − µ)h 2 + µ 2 (h + 2a Note that the difference φ * ρ (φ L ) − ξ(1 − φ 2 L ) is a measure of the error introduced by using the approximate form ξ(1 − φ 2 L ) instead of the exact function φ * ρ (φ L ), as done in Eq. (84), and thus the previous equation establishes an upper bound for this error. The validity of the approximation and its error bound are graphically proven in Fig. 11, where we plot the difference φ * ρ (φ L ) − ξ(1 − φ 2 L ) as a function of m 2 L and the straight line Λam 2 L . If we now assume that m L fulfills the same scaling properties as m described earlier in Eqs. (25)(26), then m 2 L st ∼ O(N 0 ) if a ∼ O(N −1 ) and m 2 L st ∼ O(N −1 ) if a ∼ O(N 0 ), and it turns out that the difference between the exact function φ * ρ and the approximation Eq. (84) is of order N −1 and thus it can be neglected at the order of our approximation.

Appendix D: Master equation for the link magnetization
We present in this appendix a simpler method, based on a master equation approach, to compute the statistical properties of the link magnetization, including fluctuations and finite-size effects [53]. This approach is very similar to the all-to-all approximation, and it can be derived following the same steps. We only consider one description variable m L to be relevant. The possible processes are m L → m L ± ∆ k , with ∆ k = 2k µN , where the sign − (resp. +) corresponds to the switching of a node of degree k from the up (resp. down) to the down (resp. up) state. The master equation then reads where W ± k are the effective rates of the proposed processes, that only depend on the description variable m L . We assume that all nodes with degree k have the same rate of switching to the other state, replacing the local density j Aij nj ki by the global density of up nodes connected to nodes in state up or down, which coincides with the c 0/1 calculated in Eq. (45) as ρ 1±m L . Additionally, and this is the key point, we assume that all the other variables follow strictly the deterministic attractor and do not deviate at all, i.e., m k = m L , ρ = ξ(1 − m 2 L ). We then have, for the noisy voter model model, Following [43], we take the continuum limit by expanding the master equation (D1) in powers of ∆ k to second order where now P (m L ; t) must be understood as a probability density function and the functions f L (m L ) and g L (m L ) are defined as The time evolution of the moments m L (t) and m L (t) 2 can be obtained by integrating both sides of the Fokker-Planck equation (D4) respectively as m L dm L and m 2 L dm L , such that one reproduces the results obtained in Eqs. (51) and (99).
If we take the same continuum limit in the master equation (17) of the all-to-all approach, we find, for the probability P (m; t), an equation similar to Eq. (D4), but dependent now on functions f (m) and g(m), defined as