Connectome architecture, gene expression and functional co-activation shape the propagation of misfolded proteins in neurodegenerative disease

It is becoming increasingly clear that brain network organization shapes the course and expression of neurodegenerative diseases. Parkinson’s disease (PD) is marked by progressive spread of atrophy from the midbrain to subcortical structures and eventually, to the cerebral cortex. Recent discoveries suggest that the neurodegenerative process involves the misfolding of endogenous proteins (α-synuclein) and prion-like spread of these pathogenic proteins via axonal projections. However, the mechanisms that translate local “synucleinopathy” to large-scale network dysfunction and atrophy remain unknown. Here we use an agent-based epidemic spreading model to integrate structural connectivity, functional connectivity and gene expression, and to predict sequential volume loss due to neurodegeneration. We demonstrate three key findings. First, the dynamic model replicates the spatial distribution of empirical atrophy identified in an independent dataset of PD patients. Second, the model implicates the substantia nigra as the disease epicenter, consistent with previous literature. Third, we reveal a significant role for both connectome topology and spatial embedding (geometry) in shaping the distribution of atrophy. Gene expression and functional co-activation further amplify the course set by connectome architecture. Altogether, these results support the notion that the progression of neurodegenerative disease is a multifactorial process that depends on both cell-to-cell spreading of misfolded proteins and local regional vulnerability. The model proves powerful in modelling neurodegeneration and provides insights into developing preventative procedures.


Fig. 1. Agent-based S-I-R model | (a)
Misfolded α-synuclein (red) may diffuse through synaptic connections into adjacent neurons, causing misfolding of normal α-synuclein (gray). Accumulation of misfolded α-synuclein induces neuronal loss. (b) At the macroscopic level, misfolded α-synuclein propagates via structural connections, estimated from diffusion-weighted imaging. Simulated neuronal loss (atrophy) is compared against empirical atrophy, estimated from PD patients using deformation-based morphometry (DBM). the pathogenic spread hypothesis. We also investigate whether selective vulnerability may influence from structural magnetic resonance imaging (MRI) scans in PPMI and healthy connectomes that the fiber tracts between region i and j, reflecting our intuition that agents in longer edges have lower 139 probability of exiting the edge; (b) remain in edge (i, j) with probability 1 − 1/l i j . The synthesis 140 rate α i and clearance rate β i in region i are the SNCA and GBA expression z-scores respectively 141 in region i converted to [0, 1] using the standard normal cumulative distribution function. The 142 system has only one stable point which can be found numerically (see Supplementary Information), 143 suggesting that the growth of α-synuclein will deterministically converge to an equilibrium state 144 set by the connectome and the gene expression profiles. The regional density of normal agents 145 (number of agents per voxel) solved at the stable point was taken as the initial state of the misfolded 146 α-synuclein spreading process. 147 Synuclein misfolding. We next initiated the pathogenic spread by injecting misfolded α-148 synuclein agents into the seed region, here chosen as the substantia nigra. The updating rules 149 of normal agents (above) were adapted to account for their susceptibility to infection from con-150 tact with misfolded agents. Apart from the rules defined in the aforementioned growth process, 151 normal (susceptible) agents in region i that survive degradation can be infected with probability 152 γ i thereby becoming misfolded (infected) agents (see Methods). In the absence of any molecular 153 evidence to the contrary, misfolded agents are updated with the same mobility (exiting/remaining 154 in regions/edges) and degradation (clearance rate) as normal agents. The new system seeded with 155 misfolded α-synuclein has two fixed points: (1) one represents the scenario in which misfolded 156 α-synuclein dies out, cleared by metabolic mechanisms before being able to transmit the infection to Simulated neuronal loss replicates the spatial pattern of atrophy 165 We first investigated whether misfolded α-synuclein spreading on the healthy connectome can 166 replicate the spatial patterning of atrophy observed in PD patients. We simulated the propagation 167 of misfolded agents and the accrual of atrophy due to the toxic accumulation of the aggregates. 168 Two factors that may induce neuronal loss were accounted for: (1) the accumulation of misfolded 169 α-synuclein that will cause region-specific cell death directly; (2) atrophy due to deafferentation After reaching the peak value (r = 0.63, r = 1.71 × 10 −5 ), the model fit slightly drops and finally stabilizes. In scanning along t to find the peak value, beginning timeframes were discarded to avoid picking up spurious correlation value as the peak (See Supplementary Information) As the misfolded agents propagate and accumulate in the system, the model fit increases up 174 to a maximum value (r=0.63, p=1.71×10 −5 , FIG. 2(a)) after which it drops slightly and stabilizes. 175 Note that we discarded early-spreading timeframes when scanning along t to find the maximum correlation value (see Supplementary Information) to avoid picking up spurious model fit. We posit that the slight decrease following the peak occurs because simulated atrophy becomes increasingly 178 widespread as the propagation of misfolded agents progresses, while the empirical atrophy was 179 derived from de novo PD patients at their first-visit in PPMI. FIG. 2(b) shows the linear relationship 180 between simulated and empirical atrophy across all nodes at peak fit, while FIG. 2(c) shows the 181 spatial correspondence between simulated and empirical atrophy. Model fit assessed by Pearson's 182 correlation coefficient produced comparable results (r=0.56, p=1.52×10 −4 at the peak, FIG. 2(a)(b)). 183 Interestingly, the model fit finally stabilizes with increasing t as the regional accumulation of  S3). 189 We next investigated if the pattern of model fit as a function of t and its peak value were 190 consistent across variations in structural network densities. We selected varying subsets of the 191 most commonly occurring edges in the individual structural connectivity matrices, varying the 192 binary density of the group structural network matrix from 25% to 45%. We then simulated the 193 spreading processes on each network, derived the neuronal loss at each region and compared it with Fig. 3. The full dynamic model outperforms static network measures across multiple network densities | The full spread model has more predictive power than static topological metrics, including node degree (yellow), node strength (purple) and eigenvector centrality (green). Moreover, simulated atrophy (blue) yielded higher correlation with empirical atrophy than the density of misfolded α-synuclein (red, peak correlation along t at each density), suggesting that neuronal death induced by misfolded α-synuclein is a better measure to model atrophy in PD than the mere accumulation of misfolded α-synuclein. Model fit was assessed using Spearman's correlation coefficient. The overall pattern of results was consistent across multiple network densities. Using Pearson's correlation coefficient yielded similar results (FIG. S4).
Finally, we investigated whether the observed atrophy patterns could be directly reproduced from 204 simpler topological measures, without invoking metapopulation dynamics. We first tested whether 205 simple regional variation in GBA or SNCA expression are associated with regional variation in dictates the mobility pattern of the agents such that hub regions have a higher probability of 211 misfolded α-synuclein infiltration. We correlated the atrophy map with node-level network metrics 212 including node degree, node strength, and eigenvector centrality at each network density ranging 213 from 25% to 45%. Hubs, or nodes with greater degree connectivity or centrality, tend to be 214 more atrophied (FIG. 3,   Identifying the disease epicenter 221 We next investigate whether the model yielded a disease epicenter consistent with previous 222 literature. In the aforementioned process of normal α-synuclein growth, we solved the regional likewise, in our model, higher regional abundance of normal α-synuclein agents indicates greater 227 likelihood of exposure to and growth of infectious agents, higher chance of disease transmission, 228 and consequently, greater vulnerability to the accumulation of misfolded α-synuclein. 229 We compared the regional density of normal α-synuclein agents with the empirical selective 230 vulnerability in patients to identify if highly vulnerable regions, such as substantia nigra, also 231 manifest abundance of α-synuclein. We find that, of the 42 left hemisphere regions, substantia nigra 232 has the highest normal α-synuclein level (FIG. 4, Fig. 4. Identifying the disease epicenter | Densities of normal α-synuclein (blue) at equilibrium (represented by the stable point) and spread threshold (red). Spread threshold was inverted by − log 10 , so higher values indicate lower thresholds. Spread thresholds reflect the susceptibility of a region to trigger an epidemic. Basal ganglia regions are rich in endogenous α-synuclein (caudate ranks among the top 42.9% regions; putamen, 31.0%; pallidum, 28.6%) and have relatively low spread threshold (caudate ranks among the lowest 35.7%; putamen, 38.1%; pallidum, 16.7%). Substantia nigra has the highest normal α-synuclein level and lowest spread threshold, making it the most probable epicenter of disease outbreak.
An alternative definition of disease epicenter is the seed node most likely to initiate a disease 247 outbreak. As explained in the previous section, the metapopulation model has two fixed points 248 representing disease extinction or major outbreak. Although the choice of seed region and injection 249 number of misfolded α-synuclein agents does not affect the magnitude of misfolded α-synuclein 250 accumulation, it can initially shift the properties of the two fixed points, determining which one 251 the system will converge to. We posited that the probability of triggering an outbreak indicates 252 the plausibility of acting as an epicenter. Therefore, we quantified the spread threshold for each 253 seed region, i.e., the minimally-required injection amount of misfolded α-synuclein to initiate an 254 outbreak. In traditional epidemic disease models that do not consider spatial structure or synthesis   Connectome architecture shapes disease spread 268 We next asked whether model fit would be facilitated or degraded by disrupting the connectome's 269 topology or spatial embedding (geometry). To address this question, we implemented two types of  p 35% < 0.001, p 40% < 0.001) and suggests that the high correspondence between simulated and 286 empirical atrophy in PD is jointly driven by connectome topology and geometry.  5. Effects of network topology and geometry | Systematic disruption of (a) connectome topology (rewired null) or (b) spatial embedding (spatial null) significantly degrades model fit as measured by Spearman's correlation. Red = real structural network (empirical network); grey = null networks. Rewired null: p 25% = 0.0014, p 30% < 0.001, p 35% < 0.001, p 40% = 0.0018; spatial null: p 25% < 0.001, p 30% < 0.001, p 35% < 0.001, p 40% < 0.001).

Gene expression shapes disease spread
We next sought to directly assess the influence of local gene expression on spreading patterns 289 of neurodegeneration. Based on molecular evidence, the model uses regional expression of GBA 290 and SNCA as determinants of α-synuclein clearance and synthesis rate. Regional GBA and SNCA 291 expressions were shuffled 10,000 times respectively by re-assigning the expression scores in each 292 parcel (FIG. 6(a),(b) respectively). We then implemented the dynamic models with randomized 293 expression levels and compared differences in model fit when using the empirical gene expression 294 levels (FIG. 6, red) and permuted gene expression levels (FIG. 6, grey).  An alternative explanation for these results is that simply introducing regional heterogeneity in 303 gene expression levels improves model fit, for example because of differences in general transcrip-304 tion levels between cortex and subcortex. To address this possibility, we further assessed model fit 305 in the cases where GBA and SNCA expression is made uniform across all brain regions. Instead of 306 using empirical gene expression, we set uniform synthesis/clearance rates across all regions using 307 the mean expression score, converted to a scalar value between [0, 1] using the standard normal 308 cumulative distribution function. We then computed the model fit (peak Spearman's correlation 309 value) for this "uniform" model. Critically, models using uniform transcription profiles under-  6). Altogether, these results demonstrate that regional expression of GBA and SNCA shapes 318 the spatial patterning of atrophy in addition to connectome topology and spatial embedding. 319 Fig. 6. Assessing the contribution of GBA and SNCA gene expression | To assess the influence of gene expression on atrophy, model fit using real expression values (red) is compared to null models in which node-wise expression profiles of GBA and SNCA (reflecting, respectively, α-synuclein clearance and synthesis) were shuffled. Both manipulations significantly reduce model fit regardless of network density (GBA: p 25% = 0.0031, p 30% < 0.001, p 35% < 0.001, p 40% = 0.0024; SNCA: p 25% = 0.0102, p 30% = 0.0201, p 35% = 0.0084, p 40% = 0.0334). Notably, uniform transcription profiles, in which all nodes have identical expression values (blue) yield above-chance model fit, but perform poorly compared to the model with real expression values (GBA uniform correlations: r 25% = 0.4479, r 30% = 0.3869, r 35% = 0.3672, r 40% = 0.3481; SNCA uniform correlations: r 25% = 0.5653, r 30% = 0.5780, r 35% = 0.5767, r 40% = 0.5794). 320 Finally, we tested whether neuronal activity or pre-and post-synaptic co-activation may facilitate pathways may be biased by synchronous activity between the pre-and post-synaptic cells, such that the agents are more likely to move towards regions with higher functional connectivity to a seed 328 region.

329
To address this question, we integrated resting-state fMRI functional connectivity into the model. 330 We introduce a term e k×fc (i, j) to rescale the probability of moving from region i to region j previously 331 defined by the connection strength of edge (i, j) while keeping the sum of the probabilities equal to 1 332 to preserve the multinomial distribution (see Methods). As k is increased, the influence of functional 333 connectivity is greater: stronger co-activation patterns play a more influential role in modulating non-significant functional connections were set to zero. 338 We varied k from 0 (no influence of functional connectivity) to 5 and derived the corresponding 339 peak values of model fit using the same four structural connectome densities as before (FIG. 7).

340
Model fit was improved by progressively increasing the importance of functional connectivity, but 341 only up to a point (k 25% = 1, k 30% = 2.5, k 35% = 2.5, k 40% = 2.5). Beyond this point, the influence 342 of functional connectivity dominates the agents' mobility pattern resulting in diminished model 343 fit. The results were consistent across network densities. These results provide evidence for the 344 notion that while α-synuclein propagation and resultant brain atrophy patterns occur via anatomical 345 connections, they may also be biased by neuronal activity.

346
An alternative explanation is that inclusion of functional connectivity simply leads to over-347 fitting the model. To test this possibility, we investigated if the same improvement in model fit 348 could be observed if α-synuclein spread is biased by randomized functional connectivity patterns. 349 We generated "null" functional connectivity matrices by randomly re-assigning the parcellated  Incorporating functional connectivity improves model fit | Resting-state fMRI functional connectivity was incorporated in the model by tuning the probability of α-synuclein propagation along structural connections. As the influence of functional connectivity is increased, α-synuclein spreading is biased towards structural connections that exhibit high functional connectivity. Model fit is shown for a range of structural connection densities. A balanced effect of functional connectivity and structural connectivity improves model performance, while excessive influence of functional connectivity degrades model fit. The same beneficial effect is not observed when randomized, "null" functional connectivity patterns are used (FIG. S6).
Second, model fits based on null functional connectivity do not have the same peaked shape as 354 observed when using real functional connectivity. This further support the conclusion that atrophy providing evidence that the emergent dynamics of synucleinopathy depend on network connectivity 400 (topology) and geometry (spatial embedding).

401
While we did find that network metrics predict brain atrophy, the full dynamic agent-based cell-autonomous factors on hubs, which cannot be solely explained by network structure. Thus the 416 dynamics arise from an interplay between regional vulnerability and network-wide propagation.

417
Regional vulnerability may also depend on local cell-autonomous factors such as gene expres-  It is also known that α-synuclein is secreted in an activity dependent manner (Paillusson et al. 448 2013). We therefore tested the influence of rs-fMRI derived measures of functional connectivity on  2(b)) that remarkably impedes model fit serves as an example. Nucleus accumbens is one of 487 the least atrophied regions in the dataset used here, whereas it exhibits high atrophy in the model.

488
One possible reason for this disagreement is that we did not include the different subsections of These values constitute a 3D deformation map for each subject, on which an un-paired t test is 530 conducted to derive the statistical difference (t-score) between the PD patients and the healthy 531 controls at each voxel as a measure of local atrophy.  connectivity matrix with binary density ranging from 25% to 45%. These group-level matrices 578 were finally symmetrized to represent (un-directed) brain networks. Likewise, we also constructed a 579 group-level length matrix in which elements denote mean lengths of the corresponding streamlines, 580 which were used to model the mobility pattern of agents in the edges.

588
It assumes that α-synuclein molecules are independent agents with mobility patterns and lifespans 589 characterized by the connectome's architecture, neuronal activity, and regional gene expression.

590
The normal α-synuclein agents, synthesized continuously under the modulation of regional SNCA Therefore, in determining the baseline regional density of normal α-synuclein, we increment the population of normal agents N i with: After the system reaches the stable point (error tolerance < 10 −7 ), we initiate the pathogenic spread and update the population of normal (N) and misfolded (M) agents with: The system has two fixed points, whose final positions will not be affected by the initial conditions where w i j is the connection strength of edge (i, j) (fiber tracts density between region i and j). The probability of remaining in the current region i, ρ i , was set to 0.5 for all i (see FIG. S8(a) for other choices of ρ i ; we note that the model fit is robust to variations in ρ i ). Analogously, the agents in edge (i, j) may exit the edge or remain in the same edge per unit time with binary probabilities: where l i j is the length of edge (i, j) (the mean length of the fiber tracts between region i and region j). We use N (i, j) , M (i, j) to denote the normal/misfolded population in edge (i, j). After a total time ∆t, the increments of N i , M i in region i are: Likewise, We adopt an asynchronous implementation in which the propagation of normal and misfolded agents 621 is operated before the synthesis, clearance and infection at each ∆t. We have also tried other imple-622 mentations, including propagation after synthesis/clearance/infection at each ∆t and synchronous 623 implementation, and found the differences are negligible, suggesting that our results are independent 624 of the modules' update order. The simulator is available at https://github.com/yingqiuz/SIR_simulator.

625
Accrual of neuronal death (atrophy). We model neuronal death as the result of two processes: direct toxicity from accumulation of native misfolded α-synuclein and deafferentation (reduction in neuornal inputs) from neuronal death in neighbouring (connected) regions. The atrophy accrual at t within ∆t in region i is given by the sum of the two processes: where r i (t) is the proportion of misfolded agents in region i at time t, and 1 − e −r i (t)∆t quantifies the 626 increment of atrophy caused by accumulation of native misfolded α-synuclein aggregates within ∆t 627 at time t. The second term 1−e −r j (t−1)∆t , weighted by w ji / j w ji and summed up across j, accounts 628 for the increment of atrophy induced by deafferentation from its neighbouring regions within ∆t 629 at t − 1. k 1 , k 2 are the weights of the two terms with k 1 + k 2 = 1. We set k 1 = k 2 = 0.5 such 630 that native α-synuclein accumulation and the deafferentation have equal importance in modelling 631 the total atrophy growth (see FIG. S8(b) for other choices of k 1 , k 2 ; we note that the model fit is 632 consistent across k 1 /k 2 ranging from 0.1 to 10).

633
Integration of functional connectivity 634 We used resting-state functional MRI (rs-fMRI) scans from the Human Connectome Project where fc (i, j) is the functional connectivity between region i and region j. Therefore the probability that agents move from region i to edge (i, j) per unit time is determined by Note that increasing k makes the influence of functional connectivity more differentiated across 648 the edges: the stronger functional connectivity values will be enhanced while the weaker ones may 649 be suppressed. weight of atrophy accrual due to native misfolded agents' accumulation k 1 + k 2 = 1 controling the contribution of native misfolded agents' accumulation to total atrophy growth k 2 weight of atrophy accrual due to deafferentation k 1 + k 2 = 1 controling the contribution of deafferentation to total atrophy growth.
r i (t) the ratio of misfolded agents in region i measuring the burden of misfolded agents in region i at t. 1 − e −r i (t)∆t is the increment of neuronal death at t in region i TABLE S1. Parameter List | Note that only k, ρ i , k 1 , k 2 are free parameters: k was scanned from 0 to 5 to study the effect of functional connectivity on disease spread (FIG. 7); ρ i = 0.5 for all the regions so that agents have equal chance of staying in the same region or moving out; k 1 = k 2 = 0.5 so that the two factors ((i) native misfolded α-synuclein accumulation; (ii) deafferentation from connected regions) contributed equally to the total atrophy growth. We also note that model fit is robust across multiple choices of ρ i , k 1 , k 2 (FIG. S8).
Although there is no analytical solution for α-synuclein concentration, phase plane analysis is 862 helpful in finding fixed points of the system. Considering that the rates of incoming and outgoing 863 agents in the edges are equal when the system is at the stable point, and that no clearance, synthesis, 864 or misfolding occurs in the edges, the effects of the propagation module are negligible in the analysis 865 of the system's fixed points. Therefore, we sought to use the "overall" or "total" synthesis rate (α), 866 normal agent clearance rate (β 1 ), misfolded agent clearance rate (β 2 ), and transmission rate (γ) to 867 approximately analyze the entire system using a series of differential equations (equation (S1)-(S3), 868 see below). Likewise, we use N, M to represent the total population of normal and misfolded agents 869 in the entire brain. β 1 , β 2 , γ depend on the actual N i , M i and thus are not static values (to see this, 870 for example, the total cleared normal agents per unit time is i β i N i , so the "overall" clearance rate 871 β 1 = i β i N i /N depends on real-time N i ; it is not the "real" clearance rate but an approximation of 872 the total rate of clearance); the total synthesis rate α = i α i S i , where α i is the empirical synthesis 873 rate in region i in the formal model and S i is region size. Note that the actual spreading dynamics 874 cannot be fully described by the following differential equations (S1)-(S3). However, they are 875 helpful in analyzing the possible states of disease propagation.

878
• Spread of misfolded α-syn. The system with misfolded agents injected behaves like: The nullclines of N, M are Therefore, the monotony and position of line (S5) relative to line (S4) becomes crucial. To find the monotony of (S5), we take its first order derivative Obviously, when M = 0, the first order derivative is 0. We then take the second derivative

892
To study why different choice of seed region and injected α-synuclein may lead to 893 different states (extinction or outbreak), we also investigated on what conditions the fixed 894 points are stable. This can be studied by taking the jacobian matrix of the system linearized 895 around the fixed points: has eigenvalues −β 1 and −β 2 −

906
Therefore it is also possible, in theory, that certain choices of parameters may lead to an 907 outbreak followed by gradual extinction. The vector field (arrows) denotes the direction of the gradient at each position (i.e., the system at that point will move along the direction of the corresponding arrow).  (b) Regional arrival time of misfolded α-synuclein is defined as the time steps required for misfolded α-synuclein amount to exceed 1 (after seeding at the substantia nigra with one misfolded agent). This roughly follows the Braak staging hypothesis. (c) Line chart of regional arrival time of misfolded α-synuclein.
though assessing model fit using Spearman's correlation takes into account only relative magnitudes 913 of simulated neuronal loss and inevitably discards the information of data points' values, thus can 914 be sensitive to small changes that alter the rank order of regional neuronal loss, it is better capable 915 of capturing the similarity of the rank orders between two variables. It is preferred in our present 916 study as it measures the resemblance between the simulated atrophy and empirical atrophy as to 917 which region(s) display more atrophy compared to other regions. Moreover, the simulated neuronal 918 loss does not exhibit a normal distribution pattern, making the Pearson's correlation less suitable 919 in out study. However, to ensure the robustness of model fit, we also derived Pearson's correlations 920 across the same set of network densities, which all yield comparable results (FIG. S4). Also note 921 that, considering that the low spatial resolution of structural MRI scans of PD patients may cause 922 inaccuracy in assessing the atrophy in substantia nigra, it was excluded in the correlations. 923 Fig. S4. Model fit based on Pearson's correlation coefficient yielded comparable results across network density from 25% to 45% | The model integrated with gene expression levels has more predictive power than the density of misfolded α-synuclein (red) and the static network metrics, including node degree (yellow), node strength (green), or eigenvector (purple) centrality.
Cutoff of the early-spreading timeframes 924 . The sensitivity of Spearman's correlation to rank orders may be problematic in the early 925 spreading period, as the measure takes in ranking information only so that even small increments 926 that alter ranking order of the original neuronal loss may cause substantial changes to model fit.

927
Therefore, to avoid picking up spurious peak correlation value (model fit) in the early timeframes 928 after seeding, we discarded the timeframes where change of misfolded α-synuclein densities exceeds 929 1% within ∆t = 0.01 in at least one region, resulting in a cutoff point at around 1000 time steps 930 depending on the network density. 931 We also adopted a less rigorous cutoff point, removing only the first 100 timeframes. The 932 difference in results is negligible (FIG. S5). At each k, rs-fMRI time series were re-assigned to construct null FC matrices, which degrades model fit monotonously as k increases. At smaller k, simulations based on real FC yield significantly higher model fit than the null settings as indicated by the 95% confidence interval (gray bar), while at larger k, real FC ceases to have advantage over null FC matrices in facilitating model fit and can even become significantly more harmful than the nulls.  For the atrophy in region i, k 1 controls the contribution of α-synuclein accumulation inside region i, while k 2 controls the contribution of deafferentation induced by atrophy in connected regions. k 1 + k 2 = 1. The model fit is consistently over 0.5 across k 1 /k 2 ranging from 0.1 to 10. These results suggest that the predicative power of the model is robust to variations in free parameters ρ i or k 1 /k 2 .