Distinguishing between different percolation regimes in noisy dynamic networks with an application to epileptic seizures

In clinical neuroscience, epileptic seizures have been associated with the sudden emergence of coupled activity across the brain. The resulting functional networks—in which edges indicate strong enough coupling between brain regions—are consistent with the notion of percolation, which is a phenomenon in complex networks corresponding to the sudden emergence of a giant connected component. Traditionally, work has concentrated on noise-free percolation with a monotonic process of network growth, but real-world networks are more complex. We develop a class of random graph hidden Markov models (RG-HMMs) for characterizing percolation regimes in noisy, dynamically evolving networks in the presence of edge birth and edge death. This class is used to understand the type of phase transitions undergone in a seizure and, in particular, to distinguish between different percolation regimes in epileptic seizures. We develop a hypothesis testing framework for inferring putative percolation mechanisms. As a necessary precursor, we present an EM algorithm for estimating parameters from a sequence of noisy networks only observed at a longitudinal subsampling of time points. Our results suggest that different types of percolation can occur in human seizures. The type inferred may suggest tailored treatment strategies and provide new insights into the fundamental science of epilepsy.


Details on ML Estimation Regime for RG-HMM
Although our ultimate goal is testing for two competing hypotheses of percolation regimes, learning the RG-HMM from a sequence of noisy networks observed only at a longitudinal subsampling of times is a necessary precursor and of independent interest.
Throughout this section, we assume α < 0.5 and β < 0.5. Under this assumption, the parameters of the model are identifiable (a proof is provided in Section 4). We present an Expectation-Maximization (EM) algorithm for estimating the parameters of an RG-HMM (i.e. p, q, γ, α, β) from a given sequence of noisy networks observed only at a longitudinal subsampling of time points, i.e. g⋆_{1:M}. We assume that the network in the hidden layer changes faster than the observation rate, so that many unobserved small changes are likely to occur between consecutive observation times, as is typically the case for functional connectivity networks used in the study of epilepsy.
Estimation is done conditional on the latent state W(t_1) = 1, G(t_1) = g⋆(t_1) at the first observation time t_1. As in [1], this has the advantage that no initial distribution assumption is needed for the latent Markov chain, and the estimated parameters refer exclusively to the dynamics of the network.
The algorithm consists of an E-step, wherein we calculate the expected value of the complete-data log-likelihood l(α, β, γ, p, q) = log f(W_{2:M}, G_{2:M}, g⋆_{2:M}), with respect to the distribution of the unknown latent variables W_{2:M}, G_{2:M}, given the observed networks g⋆_{1:M} and the current parameter estimates. We then maximize the expected log-likelihood in the M-step.
The E-step is non-trivial in that there are two levels of unknown latent variables in our model: we do not know the true networks and binary variables W(t_m), G(t_m) at the observation times, nor do we know the intermediate path that connects one true state W(t_{m−1}), G(t_{m−1}) to the next W(t_m), G(t_m). EM is standard for parameter estimation in general HMMs. However, we face some unique challenges: (1) direct calculation of the expectation is computationally infeasible due to the enormously large state space of size 2^(N choose 2); (2) the observed-data likelihood is difficult to calculate due to the unobserved changes occurring between consecutive observation times. Particle filtering and augmented-data sampling using MCMC are used to tackle these challenges.

Complete-data Log-likelihood
For the complete-data scenario, we assume we have samples w_{2:M}, g_{2:M} for the hidden layer. Then the complete-data log-likelihood for the RG-HMM, conditional on W(t_1) = 1, G(t_1) = g⋆(t_1), can be written as

l⋆(α, β, p, q, γ) = log f(w_{2:M}, g_{2:M}, g⋆_{2:M}) = Σ_{m=2}^{M} log f_{α,β}(g⋆_m | g_m) + Σ_{m=2}^{M} log f_{p,q,γ}(w_m, g_m | w_{m−1}, g_{m−1}),   (1.1)

where g_m, g⋆_m abbreviate g(t_m) and g⋆(t_m), respectively; α is the type I error rate; β is the type II error rate; p and q are the edge birth and death rates; and γ is the rate parameter of the continuous-time percolation model in which the hidden chain of the RG-HMM is embedded.
The first term in (1.1) is based on the conditional distribution of the observed network given the truth, which takes the form

f_{α,β}(g⋆_m | g_m) = (1 − β)^{a_m} β^{b_m} α^{c_m} (1 − α)^{d_m},   (1.2)

where a_m, b_m, c_m, d_m are, respectively, the counts of true edges, false non-edges, false edges, and true non-edges in the error process from g_m to g⋆_m at observation time t_m. Details are shown in Table 1.1.
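Under this count convention (a true edges, b false non-edges, c false edges, d true non-edges, matching the identifiability argument of Section 4), the per-snapshot error log-likelihood can be sketched as follows (hypothetical helper names; networks represented as edge sets, and 0 < α, β < 1 assumed so all logarithms are finite):

```python
import math

def error_counts(g_true, g_obs, n_pairs):
    """a = true edges, b = false non-edges (missed edges),
    c = false edges (spurious edges), d = true non-edges."""
    a = len(g_true & g_obs)
    b = len(g_true - g_obs)
    c = len(g_obs - g_true)
    d = n_pairs - a - b - c
    return a, b, c, d

def obs_loglik(g_true, g_obs, n_pairs, alpha, beta):
    """log f_{alpha,beta}(g*_m | g_m) under independent edge-wise errors."""
    a, b, c, d = error_counts(g_true, g_obs, n_pairs)
    return (a * math.log(1 - beta) + b * math.log(beta)
            + c * math.log(alpha) + d * math.log(1 - alpha))
```

Summing `obs_loglik` over m = 2, ..., M gives the first term of (1.1).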
The second term in (1.1) is the log-likelihood of the embedded discrete-time Markov chain in the hidden layer, which cannot be calculated in closed form due to the unobserved small changes occurring between consecutive observation times in the hidden layer. Therefore, we consider the log-likelihood for augmented data in order to obtain a more easily computed likelihood. The details are as follows.

EM for Incomplete-data
The goal is to find arg max_Γ f(g⋆_{2:M} | G_1 = g⋆_1, W_1 = w_1), where Γ = (p, q, γ, α, β).

E-step. We compute the expected value of the complete-data log-likelihood with respect to the true networks and true binary variables, given the observed networks and the current parameter estimates; denote this expectation by (1.4).

M-step. The first term in (1.4) can be easily maximized since it has the closed-form log-likelihood in (1.2), which yields the following:

α̂ = E[Σ_{m=2}^{M} c_m | g⋆_{1:M}, Γ^{(k−1)}] / E[Σ_{m=2}^{M} (c_m + d_m) | g⋆_{1:M}, Γ^{(k−1)}],
β̂ = E[Σ_{m=2}^{M} b_m | g⋆_{1:M}, Γ^{(k−1)}] / E[Σ_{m=2}^{M} (a_m + b_m) | g⋆_{1:M}, Γ^{(k−1)}].   (1.5)

The formula for α̂ is the expected number of false edges in the observed networks divided by the expected number of non-edges in the true networks, given the observed networks g⋆_{1:M} and current parameter estimates Γ^{(k−1)}. The formula for β̂ is the expected number of false non-edges in the observed networks divided by the expected number of edges in the true networks, given the observed networks g⋆_{1:M} and current parameter estimates Γ^{(k−1)}. The second term in (1.4) can be maximized by setting the score function to zero, as in (1.6), where θ = [p, q, γ] and S(θ; w_{2:M}, g_{2:M}) = ∂/∂θ log f(w_{2:M}, g_{2:M}) is the partial-data score function, which cannot be calculated in closed form. Note that the score function for the augmented data is in closed form, and by the Missing Information Principle used in [1], the partial-data score function can be calculated by taking the conditional expectation of the total-data score function, as stated in (1.7). The proof is provided in Appendix C. Therefore, solving (1.6) yields closed-form solutions for γ, p and q, given in (1.8)-(1.10), where R_{m−1} is the total number of transition times between t_{m−1} and t_m in the augmented sample path and τ^{m−1}_r is the r-th transition time between t_{m−1} and t_m.
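The α̂ and β̂ updates above are ratios of expected error counts; replacing the expectations by averages over Ψ sampled true-network sequences gives the following Monte Carlo sketch (hypothetical function name; networks as edge sets, and at least one edge and one non-edge assumed present across the samples so the denominators are nonzero):

```python
def m_step_error_rates(sampled_true_seqs, obs_seq, n_pairs):
    """Monte Carlo version of the (1.5)-style updates: the conditional
    expectations over latent sequences are replaced by averages over the
    Psi sampled true-network sequences."""
    false_edges = non_edges = false_non_edges = true_edges_tot = 0.0
    for true_seq in sampled_true_seqs:
        for g_true, g_obs in zip(true_seq, obs_seq):
            a = len(g_true & g_obs)          # true edges
            b = len(g_true - g_obs)          # false non-edges
            c = len(g_obs - g_true)          # false edges
            d = n_pairs - a - b - c          # true non-edges
            false_edges += c
            non_edges += c + d               # non-edges in the true network
            false_non_edges += b
            true_edges_tot += a + b          # edges in the true network
    return false_edges / non_edges, false_non_edges / true_edges_tot
```

The common normalizing constant 1/Ψ cancels in each ratio, so the raw sums suffice.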

Particle Filtering
The expectation step of our EM algorithm is difficult to calculate. Since the state space of the latent variables is prohibitively large, it is not feasible to directly calculate the conditional expectations in the aforementioned MLEs by summing over all possible network sequences, binary variable sequences and sample paths in the hidden layer. Instead, we propose to estimate the conditional expectations in (1.5), (1.8), (1.9), (1.10) by sampling from the state space of the latent variables. Note that there are two levels of expectation in (1.8), (1.9), and (1.10). The first is the outer expectation, which is taken with respect to the conditional distribution of W_{2:M}, G_{2:M} in the hidden layer given the observed networks g⋆_{1:M}, which we propose to sample from by a particle-filtering-based sampling scheme described in [2] (i.e. a Sequential Monte Carlo method). This sampling scheme involves first approximating f(w(t_m), g(t_m) | g⋆_{1:m−1}, Γ^{(k−1)}) with B particles (i.e. samples for the true network and binary variable pair) at each observation time t_m, then tracing the ancestral lines, or particle paths, of the particles as samples from f(w_{2:M}, g_{2:M} | g⋆_{1:M}, Γ^{(k−1)}). The B particles at each observation time reduce the space from which we sample the particle paths, thus allowing us to effectively sample some number Ψ of network and binary variable sequences that are more likely to generate the observed networks in the error process. A visual representation of this procedure is provided in Fig 1.1.

Fig 1.1: Particle filtering sampling scheme. B particles are sampled at each observation moment following a two-stage process: 1) particles at the previous moment are sampled with probability proportional to the conditional distribution of the true network given the observed network; 2) a true network and binary variable at the current observation moment, i.e. a new particle, is then simulated starting from one of the particles sampled in step 1, according to the ER/PR process. Ψ ancestral lines are obtained by tracing back the ancestral lines of Ψ particles at time t_M, sampled with probability proportional to the conditional distribution of the observed network given the true network at time t_M.
2. To create B particles at observation time t_2, denoted {w^b(t_2), g^b(t_2)}_{b=1}^{B}, repeat the following for b = 1, · · · , B:
(a) Start from W(t_1) = 1, G(t_1) = g⋆(t_1) and simulate a continuous-time ER/PR process up until time t_2 − t_1. Then set w^b(t_2), g^b(t_2) equal to the current binary variable value and network in the simulated chain.
3. For m = 2, · · · , M − 1, repeat the following:
(a) For each of the simulated pairs {w^b(t_m), g^b(t_m)}_{b=1}^{B}, calculate the conditional probability of the observed network given the true network, with current parameter estimates.
(b) To create B particles at time t_{m+1}, repeat the following for b = 1, · · · , B:
i. Sample a particle index η from {1, · · · , B} with probability proportional to the conditional probabilities computed in (a).
ii. Start from W(t_m) = w^η(t_m), G(t_m) = g^η(t_m) and simulate a continuous-time ER/PR process up until time t_{m+1} − t_m. Then set w^b(t_{m+1}), g^b(t_{m+1}) equal to the current binary variable value and network in the simulated chain.
4. To obtain Ψ samples from the conditional distribution of W_{2:M}, G_{2:M} given the observed networks g⋆_{1:M}, we sample Ψ ancestral lines, or particle paths, denoted {w^ψ_{2:M}, g^ψ_{2:M}}_{ψ=1}^{Ψ}, with the following procedure. We first calculate the conditional probability of the observed network given the true network at the last observation time t_M, with current parameter estimates, for b = 1, · · · , B; then repeat the following for ψ = 1, ..., Ψ:
(a) Sample a particle index η_M from {1, · · · , B} with probability proportional to those conditional probabilities at time t_M.
(b) Trace back the ancestral line of the η_M-th particle, i.e. (w^{η_M}(t_M), g^{η_M}(t_M)), at time t_M. In other words, identify the ancestral particle index of the η_M-th particle at time t_M and call it η_{M−1}; then identify the ancestral particle index of the η_{M−1}-th particle at time t_{M−1} and call it η_{M−2}; and continue backward for the η_m-th particle at time t_m, m = M − 2, · · · , 2. Note that at each observation moment, each particle has exactly one parent. Then set the ancestral line, i.e. {(w^{η_m}(t_m), g^{η_m}(t_m))}_{m=2}^{M}, as one sample from the conditional distribution of W_{2:M}, G_{2:M} given g⋆_{2:M}.
The particle filtering allows us to sample Ψ network and binary variable sequences that are more likely to generate the observed networks in the error process. It generates B possible latent variables at each observation time, thus effectively reducing the space from which we sample the particle paths. It also allows us to sample true network and binary variable sequences without actually having the transition probabilities in closed form.
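The two stages above can be sketched generically as follows (a bootstrap-style sketch with hypothetical names; `simulate` stands in for the continuous-time ER/PR propagation and `obs_weight` for the conditional probability of the observed network given the true one):

```python
import random

def particle_filter_paths(g_obs, init_state, simulate, obs_weight,
                          B=100, Psi=10, seed=0):
    """Propagate B particles across the observation times in g_obs, record
    each particle's parent index, and finally trace Psi ancestral lines."""
    rng = random.Random(seed)
    gens, parents = [], []          # gens[m]: B particles at observation m
    for m in range(len(g_obs)):
        if m == 0:
            ancestors = [-1] * B                     # root: the initial state
            sources = [init_state] * B
        else:
            # stage 1: resample ancestors by observation weight at time m-1
            w = [obs_weight(x, g_obs[m - 1]) for x in gens[m - 1]]
            ancestors = rng.choices(range(B), weights=w, k=B)
            sources = [gens[m - 1][a] for a in ancestors]
        # stage 2: propagate each sampled ancestor forward
        gens.append([simulate(s, rng) for s in sources])
        parents.append(ancestors)
    # trace Psi ancestral lines, weighting the final generation
    w_last = [obs_weight(x, g_obs[-1]) for x in gens[-1]]
    paths = []
    for _ in range(Psi):
        b = rng.choices(range(B), weights=w_last)[0]
        line = []
        for m in range(len(gens) - 1, -1, -1):
            line.append(gens[m][b])
            b = parents[m][b]       # step to the parent at the previous time
        paths.append(line[::-1])
    return paths
```

Because each particle keeps exactly one parent, tracing back is a simple index walk, mirroring step 4(b) above.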

Simulating the Sample Path
The second level of expectation in (1.8), (1.9), and (1.10) is the inner expectation, which is taken with respect to the conditional distribution of the sample path S, V that connects the given true sequences w_{2:M}, g_{2:M} at the observation times in the hidden layer. We sample from this conditional distribution by simulating paths using an MCMC method. We draw upon the work of [1], where draws are generated by the Metropolis–Hastings algorithm, using a proposal distribution consisting of small changes to a sample path that brings one network to the next. Specifically, for each m = 2, · · · , M, we need to generate a sample path s_{m−1}, v_{m−1} that brings (w(t_{m−1}), g(t_{m−1})) to (w(t_m), g(t_m)). The probability function of the sample path, conditional on (w(t_{m−1}), g(t_{m−1})), is given in (A.4). To simplify the notation, denote the sample path and its conditional probability function by u and p(u), respectively. The acceptance ratio of the Metropolis–Hastings algorithm for a current state u and a proposed state ũ is

A(ũ, u) = min{ 1, [p(ũ) q(u | ũ)] / [p(u) q(ũ | u)] },

where q(ũ | u) is the proposal distribution under which a new candidate sample path ũ is proposed by making small changes to the current sample path u. The small changes consist of three operations: paired deletion, paired insertion and permutation. Paired deletion involves removing a pair of identical edge changes from a sample path. For example, if sample path u adds an edge at one time and deletes the same edge afterwards, then a new sample path ũ can be obtained by removing that pair of edge changes from u, since the two edge changes offset each other, producing no net change in the network. Similarly, paired insertion involves adding a pair of identical edge changes to the current sample path, and permutation involves permuting all edge changes in the current sample path.
All of these operations are done in such a way that the proposed new sample path ũ is still a feasible path, with only one edge change at a time, connecting the two networks at the consecutive observation times in the hidden layer. More discussion of this MCMC procedure can be found in [1].
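A sketch of the three proposal operations, representing a sample path as an ordered list of single-edge toggles (hypothetical names; this simplification omits the binary variables w and the feasibility check against the ER/PR support, and paired deletion returns an unchanged copy, i.e. a no-op proposal, when no repeated edge exists):

```python
import random
from collections import Counter

def propose_path(path, all_pairs, rng):
    """One small-change Metropolis-Hastings proposal on a sample path.
    Each move preserves the net effect of the path on the network, i.e. the
    set of edges toggled an odd number of times."""
    move = rng.choice(["pair_delete", "pair_insert", "permute"])
    new = list(path)
    if move == "pair_delete":
        dup = [e for e, k in Counter(new).items() if k >= 2]
        if dup:                       # remove two toggles of the same edge
            e = rng.choice(dup)
            new.pop(new.index(e))
            new.pop(new.index(e))
    elif move == "pair_insert":       # insert two toggles of the same edge
        e = rng.choice(all_pairs)
        new.insert(rng.randrange(len(new) + 1), e)
        new.insert(rng.randrange(len(new) + 1), e)
    else:                             # permute the order of the toggles
        rng.shuffle(new)
    return new
```

Preserving the net effect is what keeps a proposal connecting the same pair of endpoint networks; the full method additionally rejects proposals that leave the feasible set.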

Putting it Together -the EM Algorithm
The EM algorithm resulting from the combination of all elements described in the previous subsections, for maximum likelihood estimation of the parameters p, q, γ, α, β in our RG-HMM, is shown as Algorithm 1.
The convergence criterion we use in practice is ∥Γ^(k+1) − Γ^(k)∥_2 / ∥Γ^(k)∥_2 < 0.10. Based on our numerical experience, the proposed EM algorithm usually converges within 5 iterations when all parameters are initialized at 0.5. Fewer iterations are needed if parameters are initialized close to the true values. The choices of B (the number of particles sampled at each observation moment), Ψ (the number of true network series sampled for the maximization over the error rates α and β) and H (the number of true network series sampled for the maximization over p, q, γ) may vary depending on the size of the network and the number of observation time points. Our recommended starting point is B = 50,000, Ψ = 40,000 and H = 10. It is important to keep in mind that the maximization over p, q, γ involves a computationally expensive MCMC procedure. In our experience, setting H to a small number efficiently reduces the computational burden without compromising much on the accuracy of the parameter estimates.
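The stopping rule can be written as a one-line relative-change check (hypothetical name; Γ stored as a flat parameter vector):

```python
import math

def converged(gamma_new, gamma_old, tol=0.10):
    """Relative-change criterion: ||G_new - G_old||_2 / ||G_old||_2 < tol."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(gamma_new, gamma_old)))
    den = math.sqrt(sum(b ** 2 for b in gamma_old))
    return num / den < tol
```
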
In our EM algorithm, the time complexity is O(BM) for the particle filtering step, O(HM) for the MCMC step, and O(ΨM) for sampling the Ψ ancestral lines. Although the estimation procedure scales linearly in B, M and H, it is still computationally expensive for large networks, because simulating the true networks in the hidden layer involves sampling from the edges or non-edges of an N-node network, which scales quadratically in N using standard techniques.
Our experience in the reported simulations and in working with empirical data sets is that the algorithm converges well. However, it is time-consuming, especially for larger networks requiring more particles for the approximation. For example, when working with a 30-node network with 30 observation time points and 50,000 particles, one testing procedure, including ML estimation under the two hypotheses and the calculation of the Bayes factor, takes approximately 1 hour to run on a high-performance Linux computing cluster using 4 cores with appropriate parallelization for speedups. If we multiply the number of particles by 10, i.e. B = 500,000, the algorithm takes approximately 10 hours. In principle, the larger B is, the more accurate the estimates will be, but this comes with a trade-off between computational time and accuracy. While it is feasible to run our algorithm on even larger networks in manageable time with a relatively small number of particles, we must interpret the results cautiously in such cases. We then recommend running multiple testing trials and averaging the approximated Bayes factors to reduce the high variance caused by a relatively small number of particles.
In the E-step of our EM algorithm, we apply particle filtering to approximate the expected complete-data log-likelihood. Although a central limit theorem [3] exists for the particle estimate of the likelihood, which assures the asymptotic unbiasedness of the approximation in the E-step as B → ∞, it is still difficult to theoretically investigate the convergence properties of our EM algorithm with the rather complicated likelihood function of the RG-HMM. Therefore, we resort to simulation to evaluate the empirical performance of our EM algorithm in the main text.

Algorithm 1: EM Algorithm for MLEs
Input: Network snapshots g⋆_{1:M}.

Derivation of the Recurrence Relation in Eq. (5) of the Main Text
Define Z_m := P(G⋆_{2:m} = g⋆_{2:m} | W(t_1) = 1, G(t_1) = g⋆(t_1)) for each m = 2, · · · , M, and Z_1 := 1. To alleviate the notational burden, we omit the conditioning at t_1 in the probability expressions when no confusion is possible.

A Forward Algorithm with Particle Filtering
We now present a forward algorithm with particle filtering to compute Z^B_M, i.e. the particle approximation of the marginal probability of the observations. The time complexity is O(BM). Most of the computation is in the simulation of (⋆), which can be done as in step 3 of Section 1.

Algorithm 2: Forward Algorithm with Particle Filtering
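The standard SMC estimate of a marginal likelihood multiplies, across observation times, the average observation weight of the propagated particles; a generic sketch of this recursion (hypothetical names; `simulate` and `obs_lik` stand in for the hidden-layer propagation (⋆) and the error process):

```python
import random

def smc_marginal_likelihood(g_obs, init_state, simulate, obs_lik, B=200, seed=0):
    """Particle approximation Z^B_M of the marginal probability of the
    observations: a running product of per-step average weights."""
    rng = random.Random(seed)
    particles = [init_state] * B
    Z = 1.0
    for g_star in g_obs:
        particles = [simulate(x, rng) for x in particles]
        w = [obs_lik(x, g_star) for x in particles]
        total = sum(w)
        if total == 0.0:
            return 0.0              # all particles inconsistent with the data
        Z *= total / B              # average observation weight at this step
        particles = rng.choices(particles, weights=w, k=B)   # resample
    return Z
```

Running the recursion under each competing percolation hypothesis yields the two marginal likelihoods whose ratio approximates the Bayes factor.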

Details on Automatic Segment Finder
Before we apply our estimation and testing framework to the constructed networks, we first need to choose a segment of the network time series consistent with percolation, in connection with different stages of the seizure, e.g. seizure onset and the period preceding termination. The choice of an appropriate time series segment from the overall non-stationary signal is nontrivial. We developed a simple, automatic procedure for this purpose, shown in Algorithm 3. The procedure consists of the following steps: 1. Subset the time series by windowing so that it contains only the network evolution in the region of interest (ROI).
2. Split the time series into different low-high-low segments based on the GCC/density of the network time series. Each low-high-low segment starts when the size of the GCC/density is below the low level and ends when it reaches the low level for the first time after reaching the high level. The two levels are set to the first and third quartiles of the size of the GCC/density over time.
3. For each low-high-low segment, trim off the tail of decreasing GCC/density, i.e. the portion from the last time it reaches the high level to the end.
4. Use the identified low-high segments for testing.
The purpose of steps 2-3 is to automatically identify all segments where both the size of the GCC and the density ramp up in the ROI. There are several possible ways to choose the ROIs in a network time series. One is to choose the segment from 50 s before seizure onset to 50 s after seizure onset. The second is subjective determination by an expert. The third is dynamic community detection on the network time series [4], choosing the region where the large dynamic community appears. We used the third in our analysis, since it is the most objective approach and yields reliable ROIs as assessed by an epileptologist.
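The low-high-low logic of steps 2-3 can be sketched on a single metric trace as follows (a simplified reading with a hypothetical name; quartiles computed crudely by sorting, and the series is assumed to actually ramp between the two levels):

```python
def find_segments(c):
    """c: GCC size (or density) over time.  Returns (start, end) index pairs
    of low-high segments, with the decreasing tail trimmed off."""
    s = sorted(c)
    q1 = s[len(s) // 4]               # low level (first quartile)
    q3 = s[(3 * len(s)) // 4]         # high level (third quartile)
    segments, i, n = [], 0, len(c)
    while i < n:
        while i < n and c[i] > q1:    # wait until the series is low
            i += 1
        start = i
        while i < n and c[i] < q3:    # climb until it is high
            i += 1
        if i == n:
            break                     # never reached the high level
        last_high = i
        while i < n and c[i] > q1:    # run until it falls back low
            if c[i] >= q3:
                last_high = i         # last time at the high level
            i += 1
        segments.append((start, last_high))   # tail after last_high trimmed
    return segments
```

Running the sketch once on the GCC trace and once on the density trace, then intersecting the resulting index ranges, mirrors the two-metric variant described under Algorithm 3.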

Algorithm 3: Automatic Segment Finder
Input: network time series, metric on networks: GCC or Density
Output: a network time series segment g⋆_{a:b}
... ▷ Take a subsection of the time series
4 Q1 ← first quartile of c_{l:r}
5 Q3 ← third quartile of c_{l:r}
6 anchor ← l ▷ Start segmenting the time series with a sliding window
7 while c_anchor > Q1 do
8   anchor ← anchor + 1 ▷ Anchor the left point of a potential segment
9 end
10 set = {set, anchor}
11 while not finished segmenting c_{l:r} do ...

Note that instead of using either GCC or density alone to segment the network time series, we can use both to find the proper segment for testing. Specifically, we first run the Automatic Segment Finder on the network time series with one metric, then run it again with the other metric, thus obtaining two segmented network time series. We then take the overlap of the two segments as input for testing, which is what we do in this analysis.

Model Identifiability for RG-HMM
In this section, we address the identifiability of the RG-HMM. The model is identifiable if distinct values of Γ correspond to distinct probability distributions, i.e. if Γ_1 ≠ Γ_2, then also P_{Γ_1} ≠ P_{Γ_2}.
We show that f(g⋆) is identifiable if the type I error rate α < 0.5 and the type II error rate β < 0.5.
We now show that, without this restriction, two distinct parameterizations can induce the same distribution of the observed networks. Let x(t_m) = (g(t_m), w(t_m)) denote the latent variables in the hidden layer, and condition throughout on x(t_1) = (g⋆(t_1), 1). For any potential true network g(t_m), there exists an 'opposite' network g′(t_m) obtained by flipping all edges of g(t_m) to non-edges and all non-edges to edges. Hence, if, given the observed network g⋆(t_m), g(t_m) contains a true edges, b false non-edges, c false edges and d true non-edges, then, given g⋆(t_m), g′(t_m) will contain c true edges, d false non-edges, a false edges and b true non-edges. Note that the transition probability f(x(t_m) | x(t_{m−1})) is a sum over all possible sample paths connecting x(t_{m−1}) and x(t_m); details of the pmf of a sample path are discussed in Appendix A. Let u′ be the 'opposite' sample path of u, obtained by flipping each (g(τ_r), w(τ_r)) in u into (g′(τ_r), 1 − w(τ_r)). Hence, if u is a sample path connecting x(t_{m−1}) and x(t_m), then u′ is a sample path connecting x′(t_{m−1}) and x′(t_m).
If we fix the rate parameter γ_2 = γ_1, it remains to compare the embedded transition probabilities f(g(τ_r), w(τ_r) | g(τ_{r−1}), w(τ_{r−1})) and f(g′(τ_r), w′(τ_r) | g′(τ_{r−1}), w′(τ_{r−1})) for each τ_r. Assume that g(τ_{r−1}) contains a edges and b non-edges; then g′(τ_{r−1}) contains b edges and a non-edges. Under the ER process, (4.4) and (4.5) are equivalent if p_2 = q_1 and q_2 = p_1, i.e. if the roles of the birth and death rates are swapped. For the PR process this is not necessarily true, since f(g(τ_r) | w(τ_r), g(τ_{r−1})) depends on the sizes of the connected components of g(τ_r), and the relation between the connected components of g(τ_r) and those of g′(τ_r) is intractable and hard to express in analytical form.
Since each summand in (4.1) under the parameterization Γ_1 = (p_1, q_1, γ_1, α_1, β_1) has one and only one match to a summand under the parameterization Γ_2 = (q_1, p_1, γ_1, 1 − β_1, 1 − α_1), the probability distributions of the observed networks are the same under the two parameterizations, causing nonidentifiability. By restricting the parameter space to α < 0.5, β < 0.5, we make the model identifiable.
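The count swap (a, b, c, d) → (c, d, a, b) underlying this argument is easy to check numerically (hypothetical helper names; networks as edge sets):

```python
def error_counts(g_true, g_obs, all_pairs):
    """a = true edges, b = false non-edges, c = false edges, d = true non-edges."""
    a = len(g_true & g_obs)
    b = len(g_true - g_obs)
    c = len(g_obs - g_true)
    d = len(all_pairs) - a - b - c
    return a, b, c, d

def opposite(g, all_pairs):
    """The 'opposite' network: every edge flipped to a non-edge and vice versa."""
    return set(all_pairs) - set(g)
```

Since the error likelihood is (1 − β)^a β^b α^c (1 − α)^d, the swap shows it is matched exactly by the opposite network under (α, β) → (1 − β, 1 − α), which is the source of the nonidentifiability.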

Asymptotic normality
We provide a convergence result for the MLE in our proposed RG-HMM in its stationary period. Let Γ_0 and Γ̂ denote the true parameter values and the MLE of the RG-HMM, respectively. To simplify the notation, let X(t_m) denote the pair (W(t_m), G(t_m)).
The asymptotic behavior ofΓ in our RG-HMM can be established using results from [5] on asymptotic normality of the MLE for a general stationary HMM.
Theorem 5.1. Let π_Γ(x) be the stationary distribution of the Markov chain in the RG-HMM {X(t_m), G⋆(t_m)} and let L_0 be the limiting covariance matrix of m^{−1/2} ∂/∂Γ log p_Γ(G⋆_{1:m}). Assume (C1) that t_m − t_{m−1} = t_{m+1} − t_m = ∆t for all m ≥ 2, and that stationarity is reached at t_1; (C2) that for all x ∈ X, Γ → π_Γ(x) has two continuous derivatives in some small neighborhood of Γ_0; and (C3) that L_0 is nonsingular. Then √M (Γ̂ − Γ_0) converges in distribution to N(0, L_0^{−1}) as M → ∞.

The proof of Theorem 5.1 is given in the following subsection. We establish the asymptotic property of the MLE for our RG-HMMs in the stationary period using asymptotic results for general HMMs with stationary Markov chains [5]. Unfortunately, no proof is yet available for our model in its nonstationary period, when the networks evolve through the percolation phase transition. The most relevant work on asymptotic theory for HMMs with nonstationary Markov chains is in [6]. This is an interesting subject for future research.

Proof of Theorem 5.1
Proof.
[5] provide a central limit theorem for the MLE of a general HMM with a stationary Markov chain under 6 regularity conditions (A1)-(A6). Our approach is to show that (A1)-(A6) hold for our RG-HMM under assumptions (C1)-(C3).
Let P_{ab}(t) denote the probability that the Markov chain {X(t), t ≥ 0} in our RG-HMM, presently in state a, will be in state b after an additional time t; let P_{ab} denote the one-step transition probability that the process next enters state b when it leaves state a (details in Appendix B); and let P^n_{ab} denote the n-step transition probability.
Under (C1), the M observation moments t_1 < t_2 < · · · < t_M are evenly spaced in the stationary period of the Markov chain, so the transition probability of the discrete-time Markov chain {X(t_m), m ≥ 1} in our RG-HMM is P(X(t_m) = b | X(t_{m−1}) = a) = P_{ab}(t_m − t_{m−1}) = P_{ab}(∆t) for all m, which is independent of m. Hence, (C1) ensures that {X(t_m)} is a stationary Markov chain with homogeneous transition probability matrix {α_{Γ_0}(a, b)} = {P_{ab}(∆t)}. We now restate (A1)-(A6) and show that they all hold for our RG-HMM {X(t_m), G⋆(t_m)}.
In our RG-HMM, all states of the Markov chain {X(t_m)} communicate with each other, since any two networks can be connected by a path of single-edge changes leading from one to the other; hence the chain is irreducible. Since |X| < ∞, {X(t_m)} is positive recurrent, as every irreducible Markov chain on a finite state space is positive recurrent. Moreover, the transition probability P_{ab}(∆t) is positive for all a, b ∈ X when p, q ∈ (0, 1) and γ ∈ (0, +∞). The k-step transition matrix {α^k_{Γ_0}(a, b)} can be calculated by multiplying the matrix {α_{Γ_0}(a, b)} by itself k times, yielding positive diagonal elements, which means that each state can return to itself in any number of transition steps; hence the chain is aperiodic. Therefore, (A1) holds, which also assures the existence of a unique stationary distribution for {X(t_m)}.
Note that P_{ab} is given in Appendix B (an example with a 3-node network is provided at the end of the proof), and the matrix {P^n_{ab}} can be calculated by multiplying the matrix {P_{ab}} by itself n times; this yields the expression in (5.5), where {c_{ji}}_{j=0}^{4} and k(n) are constants depending on the values of a, b and n. Here c_{0i} is the term corresponding to the product of probabilities of the random graph process, so c_{0i} ≤ 1; k(n) can be interpreted as the number of feasible paths from a to b of length n, and each path contains c_{1i} + c_{4i} edge additions and c_{2i} + c_{3i} edge deletions, so 0 ≤ c_{1i}, c_{2i}, c_{3i}, c_{4i} ≤ n. We also have the expression for g_Γ(g⋆ | a) in (5.6)-(5.7), where a, b, c, d are constants depending on the true network a and the observed network g⋆. From the analytical forms of α_{Γ_0}(a, b) and g_Γ(g⋆ | a) given in (5.5), (5.6) and (5.7), along with assumption (C2), the identifiability addressed in Section 4, and the finite state space X, we can verify that when p, q ∈ (0, 1), γ > 0 and α, β ∈ (0, 0.5), (A2)-(A6) all hold. (A3) holds since the derivatives of g_Γ(g⋆ | a) w.r.t. α and β are bounded in some neighborhood of Γ_0, and the expectation and integral involve finite sums of finite terms, and are thus finite as well. (A4) holds since g_Γ(g⋆ | a) ≠ 0 with α, β ∈ (0, 0.5), and ρ(g⋆) < ∞ for all g⋆ and a. (A5) holds since Θ = {(p, q, γ, α, β) | p, q ∈ (0, 1), γ ∈ (0, ∞), α, β ∈ (0, 0.5)} is an open set. (A6)-(i)(iii)(iv)(v) all hold since g_Γ(g⋆ | a) is bounded in some neighborhood of Γ_0, and the model is identifiable on the set Θ.
As it is hard to verify analytically the continuity of Γ → π_Γ(a), we make it assumption (C2). We can verify that (A2) and (A6)-(ii) also hold under (C2). To see this, we show, as an example, that α_{Γ_0}(a, b) is continuous in p ∈ (0, 1) and that its first and second derivatives w.r.t. p are continuous in some neighborhood of p_0; the continuity w.r.t. the other parameters q, γ, α, β can be verified by a similar argument. Define f_n(p) := P^n_{ab} · exp(−γ∆t) (γ∆t)^n / n!. Note that for any γ, p and n, |f_n(p)| ≤ M_n := exp(−γ∆t) (γ∆t)^n / n!, and Σ_{n=1}^{∞} M_n ≤ 1 < ∞; then by the Weierstrass M-test, Σ_{n=1}^{∞} f_n(p) converges uniformly to f(p) on p ∈ (0, 1). Since each f_n(p) is continuous, f(p) is continuous on (0, 1). Under uniform convergence, we can interchange the order of summation and differentiation; applying the same M-test argument to the differentiated series shows that Σ_{n=1}^{∞} f′_n(p) and Σ_{n=1}^{∞} f′′_n(p) converge uniformly to f′(p) and f′′(p) in some neighborhood of p_0. Since f′_n(p) and f′′_n(p) are continuous in some neighborhood of p_0 and the series converge uniformly, f′(p) and f′′(p) are continuous in some neighborhood of p_0.
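The M-test bound here is a Poisson weight, since |P^n_{ab}| ≤ 1; a quick numerical check (hypothetical name) that the bounds sum to a finite value:

```python
import math

def poisson_weight(n, rate_dt):
    """M_n = exp(-rate_dt) * rate_dt**n / n!, the n-th term bound in the
    Weierstrass M-test (|f_n(p)| <= M_n because |P^n_ab| <= 1)."""
    return math.exp(-rate_dt) * rate_dt ** n / math.factorial(n)
```

Summing the weights over n = 0, 1, 2, ... gives 1 (they form a Poisson pmf with mean γ∆t), so the tail sum from n = 1 is 1 − exp(−γ∆t) ≤ 1, as used in the argument above.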
Example. The one-step transition matrix {P_{ab}} for networks with 3 nodes under the ER model is shown in (5.14), where the state space is X = ({0, 1} × {1, 2, · · · , 8}) \ {(0, 8), (1, 1)}, as the complete network cannot be obtained by deleting an edge and the empty network cannot be obtained by adding an edge. The mapping between labels and networks is given in Figure 5.

Supplementary Simulation Results
This section provides numerical results for Figure 6 and Figure 7 in the main text, and the testing results for the second simulation study designed to evaluate the effect of observation rate κ on the rate of detection for ER and PR using Bayes factor.

Testing Results
A second simulation study is designed to evaluate the effect of the observation rate κ on the rate of detection for ER and PR using the Bayes factor. We set B = 500,000, while keeping the rate of change fixed at γ = 2. We let κ take values in {0.5, 1, 1.5} and N take values in {10, 20, 30}. For each combination of settings, we generate 100 ER sequences and 100 PR sequences, and conduct hypothesis testing on each sequence. We also fit a GLMM to analyze the testing results. The simulation and GLMM results are shown in Figures 6.1 and 6.2 and Table 6.3. We can see that the observation rate significantly affects the rate of detection: the rate of detection increases significantly as κ increases from 0.5 to 1, but does not continue to increase significantly as κ increases further.

Supplementary Application Results
This section provides a full set of results for seizures 1, 2 and 3 on the left hemisphere. For each seizure, we report the mean and standard deviation of the parameter estimates under the two percolation models, the likelihood evaluated at those estimates, and the estimated Bayes factors on the log scale.
The results for all chosen segments (bold) in all three seizures are given in the tables in this section. Figure 7.1 and Tables 7.1, 7.2, 7.3 are for seizure 1; Figure 7.2 and Tables 7.4, 7.5, 7.6 are for seizure 2; Figure 7.3 and Tables 7.7, 7.8, 7.9, 7.10 are for seizure 3.

Table 7.3: Mean and standard deviation of parameter estimates for the ER/PR process, log-likelihood estimates and Bayes factor (log scale) estimate based on 10 trials for seizure 1 on the left brain hemisphere (third bold segment).
Table 7.5: Mean and standard deviation of parameter estimates for the ER/PR process, log-likelihood estimates and Bayes factor (log scale) estimate based on 10 trials for seizure 2 on the left brain hemisphere (second bold segment).
Table 7.6: Mean and standard deviation of parameter estimates for the ER/PR process, log-likelihood estimates and Bayes factor (log scale) estimate based on 10 trials for seizure 2 on the left brain hemisphere (third bold segment).
Table 7.8: Mean and standard deviation of parameter estimates for the ER/PR process, log-likelihood estimates and Bayes factor (log scale) estimate based on 10 trials for seizure 3 on the left brain hemisphere (second bold segment).
Table 7.9: Mean and standard deviation of parameter estimates for the ER/PR process, log-likelihood estimates and Bayes factor (log scale) estimate based on 10 trials for seizure 3 on the left brain hemisphere (third bold segment).
Table 7.10: Mean and standard deviation of parameter estimates for the ER/PR process, log-likelihood estimates and Bayes factor (log scale) estimate based on 10 trials for seizure 3 on the left brain hemisphere (fourth bold segment).

Robustness of Results to Network Construction and Segment Identification
To explore the robustness of the testing results on percolation regimes to the functional connectivity network construction from the ECoG data and the subsequent segment identification, we construct the network time series using a different threshold value (q = 0.05) and rerun the testing procedure on the resulting networks for each of the three seizures. Recall that each snapshot in the network time series is inferred by measuring the coupling strength between pairs of nodes within a sliding window of time and then assigning edges where coupling is judged to be sufficiently strong. Following the network construction method in [7], we assess coupling strength between node pairs through multiple statistical hypothesis tests and declare edges between node pairs whose tests are assessed to be significant under a pre-specified FDR level, denoted q. The FDR level can be regarded as the threshold in our network construction. While all results in the main text are from networks constructed under q = 0.5, in this section we provide a counterpart of the results using networks constructed under q = 0.05. Figures 8.1, 8.2, and 8.3 show, respectively, for the three seizures, the resulting network density/GCC over time, the dynamic communities detected using the dynamic plex percolation method (DPPM) proposed in [4], and the segments chosen in ROIs where large dynamic communities appear. Tables 8.1–8.6 show the estimation and testing results for each segment identified. In contrast to the three plots in section 7, we can spot several differences in the newly constructed networks: 1) the network density is reduced, in that fewer edges are declared when the FDR level is lowered; 2) the network topology dynamics change, but not substantially, as many similar patterns persist; 3) the dynamic communities detected from the newly constructed network time series remain similar for some time periods in the first seizure, but change prominently for the other two seizures.
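The role of the FDR level q as an edge-declaration threshold can be illustrated with a minimal Benjamini–Hochberg sketch (the exact multiple-testing procedure used in [7] may differ; this is only a generic illustration): given per-pair p-values, edges are declared for all pairs up to the largest rank k with p_(k) ≤ k·q/m.

```python
def bh_edges(pvals, q):
    """Benjamini-Hochberg edge declaration: given {(i, j): p-value}
    for all node pairs tested, return the set of pairs declared
    significant at FDR level q."""
    items = sorted(pvals.items(), key=lambda kv: kv[1])  # ascending p
    m = len(items)
    k = 0
    for rank, (_, p) in enumerate(items, start=1):
        if p <= rank * q / m:  # BH step-up criterion
            k = rank
    return {pair for pair, _ in items[:k]}

pvals = {(0, 1): 0.001, (0, 2): 0.008, (1, 2): 0.039,
         (1, 3): 0.041, (2, 3): 0.900}
edges = bh_edges(pvals, q=0.05)  # only the two smallest p-values pass
```

Lowering q from 0.5 to 0.05 shrinks the per-rank thresholds k·q/m by a factor of ten, which is exactly why the robustness-check networks are sparser than those in the main text.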
Given that the ROIs where segments are identified for percolation regime testing correspond with the dynamic communities detected, the prominent change in ROIs leads to the change in the segments identified for testing.
Specifically, for the first seizure, three segments are selected for testing. Two are at similar places as before, whereas one appears at a new place. Similarly, for the second seizure, one selected segment is at a similar place as before, while the other is at a new place. Regardless of the change in segment identification, the Bayes factors on the log scale are positive for all segments in seizures 1 and 2, which is consistent with the conclusions drawn in the main text based on networks constructed under q = 0.5. For seizure 3, all large dynamic communities detected in the previous networks are missing in the newly constructed networks. There appears to be only one legitimate community detected, near time 130, but its size and duration both shrink significantly compared with what is obtained in the previous networks. The Bayes factor on the log scale for the segment identified in this ROI is positive, which is consistent with our finding in the main text at a similar place. However, under the newly constructed networks, the segment around time 190, where we found signal for PR percolation in the previous networks, is not identified as a qualified segment for testing, as the dynamic community detected around this time is small and transient.
In conclusion, from this robustness check of limited scope, our findings on percolation regimes for seizures 1 and 2 are quite robust to the network construction. For seizure 3, however, the evidence for PR percolation at the late ictal stage is sensitive to the network construction, because the ROI and segment identification are themselves sensitive to the network construction for this seizure.

Table 8.5: Robustness check on seizure 2. Mean and standard deviation of parameter estimates for the ER/PR process, log-likelihood estimates, and Bayes factor (log scale) estimate based on 10 trials for seizure 2 on the left brain hemisphere (second bold segment).

Table 8.6: Robustness check on seizure 3. Mean and standard deviation of parameter estimates for the ER/PR process, log-likelihood estimates, and Bayes factor (log scale) estimate based on 10 trials for seizure 3 on the left brain hemisphere.

Goodness of Fit for RG-HMM
The most commonly used approach for assessing goodness-of-fit of network models is to (i) use Monte Carlo methods to generate networks randomly from the fitted model and then (ii) compute a particular set of statistical summaries on the generated networks for comparison with the same summaries on the observed networks. This type of numerical approach is heavily used for assessing goodness-of-fit in exponential random graph models (ERGMs) [8] and is referred to as a graphical test of goodness-of-fit. In this section, we adopt this graphical test approach to evaluate how well our models fit the observed networks constructed from the real seizure data. Specifically, for each fitted model on each segment identified from each seizure, we generate 500 network time series from the RG-HMM with the estimated parameters under the inferred percolation process. We compare these generated networks with the observed seizure networks on three structural aspects: the size of the giant connected component (GCC), the size of the second-largest connected component (SLC), and the density.
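A minimal sketch of the summaries underlying the graphical test, under the assumption that the statistics are computed per time point and compared against pointwise Monte Carlo quantile bands (helper names are ours, not from the paper's code):

```python
def gcc_slc_density(n, edges):
    """Return (GCC size, SLC size, density) for an undirected graph
    on n nodes given as a list of edge pairs, via union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    sizes = {}
    for v in range(n):
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    ordered = sorted(sizes.values(), reverse=True)
    gcc = ordered[0] if ordered else 0
    slc = ordered[1] if len(ordered) > 1 else 0
    density = 2 * len(edges) / (n * (n - 1))
    return gcc, slc, density

def graphical_band(samples, level=0.95):
    """Pointwise Monte Carlo band: (lo, hi) empirical quantiles of one
    statistic across the generated networks at one time point."""
    xs = sorted(samples)
    lo = xs[int((1 - level) / 2 * (len(xs) - 1))]
    hi = xs[int((1 + level) / 2 * (len(xs) - 1))]
    return lo, hi
```

Repeating `graphical_band` over the 500 generated series at each time point yields the 95% envelopes against which the observed GCC/SLC/density curves are plotted.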
One adjustment we made in generating networks from the fitted model is that we scale down the type-I error rate estimate (α) by a factor of 0.3. This is because our earlier simulation results (described in the Results section of the main text) indicate that our estimation algorithm (based on particle filtering) tends to overestimate the error rate parameters when a limited number of particles is used for computational feasibility, and the percolation curve is sensitive to the type-I error rate. The factor of 0.3 is chosen based on our experience in the simulation study. Figure 9.1 shows a comparison of the Monte Carlo results for one segment in seizure 1: one with the type-I error rate adjusted to account for overestimation, the other without. Without the adjustment, the size of the GCC is completely off, the size of the SLC is also somewhat off, and the density is least affected. This is as expected, in that the false-positive edges introduced into the observed networks by even a small type-I error rate typically impact the connected components the most and the network density the least. With the adjustment for overestimation in α, the behavior of the generated networks and the actual seizure networks is qualitatively quite similar for much of the time period. The observed curve largely falls within the 95% range of the generated networks for the majority of the seizure segments, although in a few plots the seizure curve partially falls outside the range. Overall, the fitted ER/PR model does a respectable job of recreating the GCC/SLC/density dynamics in the selected seizure segments, particularly in light of how simple these models are compared to the complexity of the underlying physiological processes.
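The sensitivity of the GCC to α can be demonstrated directly: superimposing false-positive edges at rate α on a sparse true graph acts like adding an independent Erdős–Rényi layer, which produces a spurious giant component once α exceeds roughly 1/N. A seeded sketch (illustrative parameters only; not the paper's fitted values):

```python
import random

def giant_component_size(n, edges):
    """Size of the largest connected component via union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    counts = {}
    for v in range(n):
        r = find(v)
        counts[r] = counts.get(r, 0) + 1
    return max(counts.values())

def noisy_observation(n, true_edges, alpha, rng):
    """Keep all true edges and add each absent pair independently
    with probability alpha (false positives only, for clarity)."""
    obs = set(true_edges)
    for i in range(n):
        for j in range(i + 1, n):
            if (i, j) not in obs and rng.random() < alpha:
                obs.add((i, j))
    return obs

rng = random.Random(42)
n = 200
true_edges = {(i, i + 1) for i in range(0, n - 1, 4)}  # sparse, fragmented
gcc_raw = giant_component_size(n, noisy_observation(n, true_edges, 0.02, rng))
gcc_adj = giant_component_size(n, noisy_observation(n, true_edges, 0.02 * 0.3, rng))
```

At α = 0.02 the false-positive layer alone has mean degree near 4 and connects almost the whole graph, whereas the scaled rate 0.02 × 0.3 sits much closer to the connectivity threshold, so the GCC curve changes dramatically while the density shifts only modestly.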
Meanwhile, we observe that the quality of the resulting fit differs when assessed under different network statistics. The model appears to fit the seizure data better from the perspective of the GCC and SLC than from the perspective of the density. Since connected components play a key role in percolation models, we regard the GCC and SLC as more important than the density in assessing fit. Therefore, based on this graphical assessment of goodness-of-fit, the ER/PR percolation model appears to fit the seizure data reasonably well.