Stochastic calculus of protein filament formation under spatial confinement

The growth of filamentous aggregates from precursor proteins is a process of central importance to both normal and aberrant biology, for instance as the driver of devastating human disorders such as Alzheimer's and Parkinson's diseases. The conventional theoretical framework for describing this class of phenomena in bulk is based upon the mean-field limit of the law of mass action, which implicitly assumes deterministic dynamics. However, protein filament formation processes under spatial confinement, such as in microdroplets or in the cellular environment, show intrinsic variability due to the molecular noise associated with small-volume effects. To account for this effect, in this paper we introduce a stochastic differential equation approach for investigating protein filament formation processes under spatial confinement. Using this framework, we study the statistical properties of stochastic aggregation curves, as well as the distribution of reaction lag-times. Moreover, we establish the gradual breakdown of the correlation between lag-time and normalized growth rate under spatial confinement. Our results establish the key role of spatial confinement in determining the onset of stochasticity in protein filament formation and offer a formalism for studying protein aggregation kinetics in small volumes in terms of the kinetic parameters describing the aggregation dynamics in bulk.

Here, we introduce a kinetic framework based on stochastic calculus [46,47] that allows us to investigate the stochastic behavior of protein filament formation processes in small volumes. Starting from a purely deterministic description of filamentous growth kinetics [32][33][34][35][36][37], we derive a stochastic differential equation governing the early-time evolution of protein filament formation under spatial confinement. We exploit this formalism to describe the statistical properties of individual small-volume protein aggregation reactions and derive explicit expressions for the probability distribution of lag-times. Moreover, we establish the increasing importance of rare primary nucleation events in controlling the characteristic properties of aggregation in the stochastic limit.

Deterministic aggregation kinetics
The dynamics of protein filament formation in bulk is modeled in terms of a set of deterministic kinetic equations, the mean-field master equation [38], that describes the time evolution of the concentrations of aggregates of different sizes in terms of the elementary mechanisms of aggregation. The fundamental microscopic events that drive protein filament formation (figure 1(b)) generally involve an initial primary nucleation step, where the smallest fibrillar units (nuclei) are formed spontaneously from monomers, coupled to growth through filament elongation at the ends of filaments. In many cases, including the aggregation of diseaserelated peptides and proteins such as Alzheimerʼs Amyloid-β peptide, the aggregation process is accelerated by secondary nucleation processes, whereby new fibrils can be formed through the interaction between monomers and existing fibrils [32,35,36,48]. The most commonly experimentally accessible observables include the monomer concentration m(t) as well as the number and mass concentrations of fibrils, denoted with P(t) and M(t). These quantities are defined in terms of the concentrations f (t, j) of aggregates of size j as = å ( ) ( ) P t f t j , The general dynamic equations that describe the time evolution of P(t) and M(t) for a system evolving through primary and secondary nucleation pathways are thus obtained by summation of the (a) The time course of fibril mass concentration during an aggregation reaction in bulk (volumes > 1 μl) involves an initial phase of exponential growth, where the monomer concentration stays approximately constant at its initial value. Exponential growth is followed by saturation due to monomer depletion at late times. In the present paper, we focus on stochastic effects during the earlytime exponential growth phase. mean-field master equation over aggregation number j, yielding [32][33][34][35][36][37]: where k n , k 2 and k + are the rate constants for primary nucleation, secondary nucleation and filament elongation and n c and n 2 are the reaction orders for primary and secondary nucleation with respect to the monomer concentration. Note that equation (2.2) tacitly assumes aggregate growth to be the dominant term contributing to changes in aggregate mass concentration. This assumption, however, is always satisfied in filamentous aggregating systems, because the formation of long aggregates necessarily requires nucleation processes to be slow compared to aggregate growth.

Early stages of aggregation
Kinetic equations such as (2.1) and (2.2) describe a sigmoidal-type kinetics characterized by an initial lag-phase, followed by rapid growth and a final phase of approach to the plateau ( figure 1(a)) [49]. The system dynamics during the initial stages of aggregation (lag-phase and rapid growth) are captured very accurately by assuming that the monomer concentration is not significantly depleted, i.e. m(t)≈m(0), where m(0) is the initial concentration of monomers. Under these circumstances, the deterministic aggregation kinetics are captured by a simpler set of linearized kinetic equations, obtained from equations (2.1) and (2.2) in the limit of constant monomer concentration, m(t)≈m(0) [32][33][34][35][36][37]: where we have introduced the notation a b = = ( ) ( ) k m k m 0 , 0 n n n 2 c 2 and μ=2k + m(0). In this regime, both the filament number and mass concentrations then increase exponentially with time, P(t)→α/(2κ)e κt and M(t)→α/(2β)e κt after a short initial adjustment period, where k mb is the fibril multiplication rate. Hence, P(t) and M(t) are proportional to each other, P(t)=(β/κ) M(t); equivalently, the average fibril length is constant in this regime, at L=κ/β. Through this proportionality relationship, equations (2.3) and (2.4) can be simplified further to give a single and linear differential equation that describes the early-time evolution of filament mass concentration M(t) [41,50]: . Equation (2.5) has a straightforward physical interpretation: ξ is an input term describing the rate at which new filaments are introduced in the system through primary nucleation, while the parameter κ describes an effective rate of aggregate multiplication through secondary nucleation. We have effectively mapped our system onto one without explicit elongation, such that all filaments are of length L. New filaments of this length are added by both the input term, and the multiplication term.

Stochastic aggregation kinetics
The kinetic approach for describing the growth kinetics of protein filaments discussed in the previous section is based on the assumption that the growth rate κ and the input parameter ξ entering equation (2.5) are perfectly known quantities. In the following, we generalize equation (2.5) by accounting for the effects of intrinsic noise associated with small-volume effects.

Stochastic differential equation model for aggregation in small volumes
To see how system volume V enters explicitly equation (2.5) and determines the onset of stochasticity, it is useful to note that the parameter κ has units of inverse time and, thus, the first term on the right-hand side of equation (2.5) is independent of system volume. By contrast, the input term ξ, which describes the rate of increase of aggregate mass due to primary nucleation, has units of concentration and inverse time and hence carries an intrinsic dependence on V. In particular, the number of primary nucleation events per unit time is where N A is Avogadroʼs number, while the quantity ν=ξ/λ represents the increase in aggregate mass concentration associated with a single primary nucleation event. Thus, reducing the system volume V leads to few and rare primary nucleation events. Consequently, aggregation curves become highly unpredictable because variations in the actual number of primary nucleation events that occur, and the statistics of the times at which these nucleation events take place, come to determine the overall aggregation behavior. This type of behavior has been observed in microdroplet experiments of aggregation in small volumes [45]; in these experiments, aggregation is initiated by rare nucleation events and then propagated deterministically by the secondary pathway (figure 1(e)). Furthermore, primary nucleation events are discrete realizations that cause finite jumps of size ν in fibril mass concentration. Consequently, noise in protein aggregation in small volumes is discontinuous and the most appropriate way of extending equation (2.5) to account for intrinsic noise is to replace the term ξ by a random jump process [46,47].
Based on the above considerations, we model the stochastic time course of protein filament formation in small volumes in terms of the following stochastic differential equation with an additive noise jump process, as a generalization of equation (2.5) 4 : Here, dN(t) is the differential form of a Poisson process with intensity l = table 1) [46,47]. The constant κ represents the drift of the continuous part, whereas the parameter ν=ξ/λ defines the jump size of the Poisson process (ν has units of concentration). Poisson statistics apply because, in addition to nucleation being a discrete process, our rate of nucleation is constant, nucleation events are independent and rare, and the probability of a nucleation event occurring in a given time interval is proportional to the length of that interval. The intensity parameter λ has the physical interpretation of the rate of nucleus formation in volume V, such that λt is the expected value for the number of nuclei formed in volume V and time t. A sample path for the process N (t) is shown in table 1 and it is characterized by a series of jumps of size ν that occur at random times T n . We note that because increments of N(t) are Poisson distributed, the time T n+1 −T n that the system spends in state n before an additional nucleus is formed follows an exponential distribution with parameter λ. We note also that in the limit of a very large system, equation (3.1) naturally recovers the deterministic kinetic equation (2.5), as the Poisson process N(t) will be characterized by increasingly frequent jumps, such that the noise term ν dN(t) will approach ξ dt. We finally note that other stochastic models of protein filament formation could be explored within this formalism, e.g. by considering the effect of adding a multiplicative noise term at the level of fibril selfreplication or growth.  [46,47]. A Poisson process is defined as

Distribution of waiting times
T t e n n t 1 4 As a technical note, it is important to clarify that equation (3.1) must be understood as dM(t)=κM(t − ) dt+ν dN(t) where f (t − ) stands for the left limit of f, i.e.
. This clarification is important as it highlights the fact that the fluxes depend on the value of M(t) immediately before the jump, thus ensuring causality. For notational simplicity, however, the subscript t --will be dropped throughout the paper, although the left limit will always be tacitly assumed.

Stochastic aggregation curves
The explicit solution to the stochastic differential equation (3.1) can be found from direct integration as This expression can be re-written using the definition of stochastic integrals for jump processes [46,47], such that the solution to equation (3.1) can be obtained explicitly as: where the stochastic process N(t) has jumps at random times T n . Equation 3) describes the deterministic propagation of aggregate mass following a stochastic nucleus formation event that occurred at time T n . Hence, the measured stochastic polymerization curves explicitly depend on the current number N(t) of nuclei that happened to have formed at time t and the times T n at which these nuclei were formed. The variability of the measured M(t) profiles thus originates because N(t) is a stochastic process and the times T n are random. Figure 2(a) shows 10 sample aggregation curves generated using equation (3.3). Clearly, all realizations of M(t) differ from each other because both N(t) and T n are random. Equation   where, in the last step, we have used ξ=λν. Note that this result could have been obtained directly by taking the average of equation (3.1). We also note that in the limit of large volumes and fast nucleation, corresponding to λ?κ, equation (3.6) reduces to which is the solution to the deterministic kinetic equation (2.5). In the opposite limit Note that in the deterministic limit, κ ? λ, the variance goes to zero. In the stochastic regime, κ = λ, the leading term of the variance is

Lag-time distribution
A key feature of linear protein polymerization reactions is the observation of an initial lag-phase during which no aggregation is detected. For reproducible reaction curves, quantification of the delay in the polymerization is typically given in terms of a lag-time τ(x), which, for a reaction starting with a concentration M(0)=x of seed aggregates, is defined as the time at which M(t) reaches an arbitrarily chosen concentration threshold Hence, the deterministic value of the lag-time can be determined from (2.5) and reads t k For reaction kinetics exhibiting stochastic behavior, however, the duration of the lag-phase can no longer be defined as in equation (3.11), because the individual kinetic curves are not reproducible. Instead, the lag-time τ becomes a random variable, whose variability reflects the fluctuations in the underlying polymerization reaction [46,51]. Under these circumstances, finding τ translates into a first hitting time problem. We define τ(x) as the time at which a sample path of the stochastic process M(t) first hits the barrier M th  The underlying mechanism of barrier hitting is sketched in figure 2(b). Of particular interest are the cumulative probability distribution function of the lag-time, defined as t h for all t0, and the associated probability density function In fact, the theory of exit times for Markov processes [46,51] allows us to re-formulate the first passage problem in terms of a partial differential equation with appropriate initial and boundary conditions. In particular, the cumulative probability distribution function of lag-times for a reaction starting at M(0)=x, Q(t, x), satisfies the backward Kolmogorov equation: subject to the boundary condition 1, 3.17 th and the initial condition Here,  is the backward Kolmogorov operator of the SDE 3.1, which is given by 5

. 2 2
Noticing that jump step sizes due to primary nucleation are very small compared to M th , i.e. n  M 1 th , we can re-write  most conveniently as follows: where, we have replaced the first derivative of f (x) with the corresponding finite difference. Equation with boundary condition According to equation (3.15), the lag-time probability density function T(t, 0) is finally obtained by differentiating equation (3.26), yielding [42,50]: Thus, by explicit calculation of the random aggregation trajectories we verified the results derived previously in [42,50].
3.5. Characteristics of lag-time distribution and average lag-time Equation (3.27) describes in closed form the probability distribution of lag-times for a reaction with no initial aggregates ( figure 2(d)). In the long time limit, the probability density function for lag-times decays exponentially whereby the characteristic decay time is 1/λ. In the early-time limit, T(t, 0) and all its derivatives decay to zero, i.e. T(t, 0) has an essential singularity at t=0. The function T(t, 0) attains its maximum at which can be interpreted as the most probable time at which the threshold concentration M th is reached. Since θ ? 1, this is effectively equal to the deterministic lag-time. The expected value for the lag-time, t á ñ, is obtained by integrating equation (3.27), yielding The variance of the lag-time distribution is computed similarly and reads  Figure 2(c) illustrates a plot of t á ñ as a function of inverse system volume. In the limit of large volumes or high nucleation rate, corresponding to l k t á ñ  , tends to the deterministic bulk lag-time, or the most-probable lag-time In the opposite limit of small volumes or slow nucleation, corresponding to λ=κ, the sum in equation (3.30) is dominated by the j=0 term. Consequently, t á ñ becomes In the limit of small volumes, the rate of primary nucleation thus becomes so slow that the aggregation reaction is limited by the waiting time for the first few primary nucleation events to occur. In this regime, t á ñ displays a marked volume dependence. In particular, in a plot of t á ñ against 1/V, equation (3.30) approaches the bulk limit τ bulk approximately as a straight line of gradient c n . Since c n depends explicitly on the rate constant for primary nucleation, k n , a measurement of average lag-times with inverse system volume under spatial confinement provides a useful strategy for investigating the early stages of protein filament formation. These stages are otherwise highly challenging to probe using traditional bulk experiments, as aggregate concentration readouts in bulk tend to be dominated by secondary nucleation rather than primary nucleation. The transition between the bulk regime and equation (3.33) occurs at a characteristic volume V c =κ c n . Estimates of V c are typically in the picoliter range, which is many orders of magnitude smaller than reaction volumes in bulk. Microfluidic techniques have emerged as a powerful method for probing such volumes in a high throughput manner. Distinct small-volume aggregation reactions may be carried out (and measured) in parallel in large numbers of microdroplets, generating sufficient statistics to determine average lag-times as a function of volume.

Growth rate
Another observable of key experimental interest is the normalized average growth rate at time t, which in the protein aggregation literature is commonly defined as: Note that other definitions of the growth rate are equally possible, including, for instance, . The growth rate (3.34) can be evaluated by considering the averaged version of equation (3.1) To evaluate the term , we first note that, since N(t) is a sum of equally-sized step functions with steps at times T n , its derivative must be a uniformly-weighted sum of delta functions at the same times: Hence, since the jumps are independent, and their times T n are Gamma-distributed (see table 1), we find: 3 . 3 7 n n n 1 1 Since νλ=ξ, it follows: for times t?κ −1 the expression (3.38) can be simplified to: Equation (3.39) states that the normalized growth rate equals κ irrespective of system volume, V. This is because, once an appreciable fraction of monomer has aggregated, the processes that control further growth, i.e. elongation and secondary nucleation, are deterministic in nature.

Correlation between normalized growth rate and lag-time
In the deterministic limit, there is a strong linear correlation between inverse lag-time and normalized growth rate [33]. This correlation has been verified experimentally for different systems in bulk [52,53] and emerges because both the inverse lag-time and the normalized growth rate scale with κ; moreover, the proportionality constant between inverse lag-time and normalized growth rate is in the form of a logarithmic correction that depends only weakly on different experimental conditions. Our results show that this correlation is lost gradually as system volume V is decreased and the system enters the stochastic regime ( figure 3). This is because the normalized growth rate still equals κ in the stochastic limit, but the lag-time τ tends to 1/λ. This effect reflects that the lag-time no longer represents solely the time for taken for M(t) to multiply up to the threshold value, but also the time for the first primary nucleation events to occur. Thus, reducing the system volume Figure 3. Breakdown of the correlation between lag-time and normalized growth rate under spatial confinement. Lag-times and normalized growth rates are simulated for several values of κ and several different volumes from V=10 fl to V=1 μl (the latter being distinguished by color). In the deterministic limit (large volumes), there is a strong correlation between the inverse lag-time and the normalized growth rate, as indicated by the solid black line with slope −1. This correlation is lost as the system volume is decreased and stochastic effects become important. decouples the effect of the primary nucleation process from that of the other growth processes, permitting its direct measurement.

Conclusions
In this paper, we have studied stochastic effects in protein aggregation phenomena in small volumes. Using the framework of stochastic calculus with jump processes, we have obtained expressions that describe the stochastic time course of protein aggregation reactions and the probability distribution of lag-times. We have further demonstrated the gradual loss of correlation between protein aggregation rate and lag-times as the system size shrinks. Our results highlight the crucial role played by the system volume in determining the inherent variability characterizing protein aggregation phenomena under confinement and provide a strategy for linking bulk parameters characterizing large-volume kinetics with behavior in small volumes.