MEG cortical microstates: Spatiotemporal characteristics, dynamic functional connectivity and stimulus-evoked responses

Highlights
• EEG microstate analysis is a method to study brain states in health and disease.
• We present a source-space microstate pipeline, giving additional anatomical insight.
• Simulations validate source-space microstates.
• We identify 10 resting microstates and their patterns of functional connectivity.
• Microstate activation likelihoods change in response to an auditory stimulus.


S1.1 Mathematical description of the generalized microstate pipeline
Let X ∈ R^(N×T) be an electrophysiological data set, where N is the spatial dimensionality (number of M/EEG sensors, or source-space dipoles/ROIs) and T is the number of time samples. Each column of X is a spatial map, i.e. a sample of data x(t) ∈ R^(N×1). The aim of microstate analysis is then to identify a set of spatial maps with high signal-to-noise ratio (SNR) and cluster these maps into a small number of discrete states (Pascual-Marqui et al., 1995; Koenig et al., 1999; Khanna et al., 2015; Michel and Koenig, 2018). The most widely used methodology for EEG is the modified k-means algorithm presented by Pascual-Marqui et al. (1995) and Koenig et al. (1999), although a range of methods exist (Michel and Koenig, 2018) which have been demonstrated to give consistent spatial topographies (Khanna et al., 2014) and information-theoretically invariant temporal sequences (von Wegner et al., 2018). Some steps in the EEG microstate methodology limit its application to MEG sensor space or M/EEG source space data (outlined in Main Text Section 2.1). To overcome this, we here present the EEG microstate k-means algorithm (Pascual-Marqui et al., 1995; Koenig et al., 1999) in a framework which generalizes to other large-scale electrophysiological datasets, such as M/EEG sensor and source space, based on an appropriate modality-specific transformation of each spatial map, y = f(x).
In this generalized framework, the EEG algorithm (Pascual-Marqui et al., 1995; Koenig et al., 1999) is recovered by re-referencing the data to a common average, i.e. using the transformation y = x − µ_x, where µ_x is the average of x over sensors. The equivalent transformations for other modalities are given in Table S1. For sensor MEG, no transformation of the data is required (including no re-referencing). For source space, where source flipping may be an issue, there are two options. One is to use the amplitude envelope of the data (row labelled "Amplitude" in Table S1); however, this requires a narrowband signal and rejects phase information. A second option is to simply take the (element-wise) absolute value of the samples (row labelled "Source" in Table S1), which can be applied to broadband data and largely maintains phase information (excluding phase differences of π).
The first step of the pipeline involves extracting samples with local peaks of SNR and map stability. These samples are peaks of the global field power (GFP), defined at a given time within our generalized framework as

σ(t) = sqrt( (1/N) Σ_{n=1}^{N} y_n(t)² ).    (1)

The GFP is therefore the total deviation of the transformed map y from zero across space. For EEG, in which y_n = x_n − µ_x, this is equal to the standard deviation of the map, i.e. the total variation from the common average reference. In MEG and source space, the function is more appropriately viewed as a normalized vector norm of the map, and is therefore the total variation from zero field/current. Peaks of the GFP were here found using Matlab's findpeaks function. Appropriate pre-processing of the data, including bandpass filtering, is important here to generate a smooth GFP signal for accurate identification of peaks.

Modality  | Spatial transformation
EEG       | y = x − µ_x
MEG       | y = x
Source    | y = (|x_1|, |x_2|, . . . , |x_N|)^T
Amplitude | y = (|H(x_1)|, |H(x_2)|, . . . , |H(x_N)|)^T

Supplementary Table S1: Transformations used in the generalized formulation of the clustering analysis for each modality. Here, x ∈ R^(N×1) is a spatial topography with spatial dimension N (i.e. a single time sample of multivariate data) with elements (x)_n = x_n, µ_x is the mean of x, |·| denotes absolute value, and H(·) denotes the (temporal) Hilbert transform (therefore |H(·)| is the instantaneous amplitude envelope).
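As an illustration of this step, the modality-specific transformations and the generalized GFP might be sketched as follows (a minimal Python/numpy sketch with our own function names; the paper's implementation is in Matlab, and a manual local-maximum search stands in for findpeaks, with no bandpass filtering applied):

```python
import numpy as np

def transform(X, modality):
    """Modality-specific spatial transformation of each column (cf. Table S1)."""
    if modality == "EEG":        # re-reference to common average
        return X - X.mean(axis=0, keepdims=True)
    if modality == "MEG":        # no transformation required
        return X
    if modality == "Source":     # element-wise absolute value
        return np.abs(X)
    raise ValueError(modality)

def gfp(Y):
    """Global field power: root-mean-square of the transformed map at each sample."""
    return np.sqrt((Y ** 2).mean(axis=0))

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 1000))      # 32 channels, 1000 samples of toy data
sigma = gfp(transform(X, "EEG"))         # GFP time course

# local maxima of the GFP (stand-in for Matlab's findpeaks)
peaks = np.flatnonzero((sigma[1:-1] > sigma[:-2]) & (sigma[1:-1] > sigma[2:])) + 1
```

Note that for the EEG transformation the GFP reduces to the per-sample standard deviation across channels, as stated in the text.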
Once the set of GFP peaks has been extracted, we cluster the set of transformed spatial maps at these peaks into k clusters. The k-means algorithm is outlined in Table 2 of the main text, and a clear graphical representation is given in Koenig et al. (1999) and Michel et al. (2009). The number of states k is a free parameter, and must be chosen appropriately for the data. The algorithm initializes (step #1 in Main Text Table 2) by randomly selecting k transformed maps as initial state centroids C ∈ R^(N×k). Here, we apply the k-means++ method of selecting initial maps (Arthur and Vassilvitskii, 2007; Tait et al., 2020), which has better performance and faster convergence (Arthur and Vassilvitskii, 2007) than the uniform sampling commonly used in microstate studies (Pascual-Marqui et al., 1995; Koenig et al., 1999; Michel and Koenig, 2018).
The subsequent steps of the algorithm are iterated until convergence criteria are met. Firstly, the distance between each transformed map and each cluster centroid is calculated (step #2 in Main Text Table 2). It is convenient to first define the similarity between a (transformed) map y and a centroid c (i.e. a column of C) as

R(y, c) = |y^T c| / (‖y‖ ‖c‖).    (2)

This equation is often called the cosine similarity, as it can equivalently be written as

R(y, c) = |cos θ(y, c)|,    (3)

where θ(y, c) is the angle between the two vectors y and c, and hence it is clear that this value is in the range zero (orthogonal maps) to one (maps pointing in the exact same direction, ignoring polarity). Subsequently, we can define the distance between maps as

D(y, c) = 1 − R(y, c).    (4)

For EEG, where both y and c have zero mean, it is clear from Equation 2 that the similarity is the absolute value of correlation, in line with the classic microstate k-means clustering algorithm (Pascual-Marqui et al., 1995; Koenig et al., 1999). For MEG and source data this can more readily be interpreted as the cosine similarity, i.e. the absolute value of the cosine of the angle between the vectors y and c. For all modalities, 0 ≤ D ≤ 1, where a value of one represents uncorrelated/orthogonal spatial maps, while a value of zero represents identical maps up to a scaling factor, i.e. D(y, c) = 0 ⇐⇒ y = αc, where α ∈ R \ {0}.
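The polarity-invariant similarity and distance described above amount to very little code (a sketch; function names are ours):

```python
import numpy as np

def similarity(y, c):
    """Polarity-invariant cosine similarity between a map and a centroid."""
    return abs(y @ c) / (np.linalg.norm(y) * np.linalg.norm(c))

def distance(y, c):
    """Map-to-centroid distance: one minus the similarity."""
    return 1.0 - similarity(y, c)

y = np.array([1.0, 2.0, -1.0])
d_same = distance(y, -3.0 * y)   # identical map up to a (negative) scaling
d_orth = distance(np.array([1.0, 0.0]), np.array([0.0, 1.0]))  # orthogonal maps
```

The two toy cases illustrate the stated bounds: identical maps up to scaling (including a polarity flip) give distance 0, while orthogonal maps give distance 1.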
Each map is subsequently labelled as belonging to the cluster with which it has minimal distance (or maximal similarity) to the cluster centroid (step #3 in Main Text Table 2), i.e.

L_i = argmin_j D(y_i, c_j),

where subscript i represents the i'th GFP peak with map y_i, j represents the j'th cluster with centroid c_j, and L_i ∈ {1, 2, . . . , k} is the label assigned to peak i. The centroid for each cluster is then updated (step #4 in Main Text Table 2). For a given cluster j containing M maps, let us define the matrix Y_j ∈ R^(N×M) such that each column of Y_j is one of the transformed maps y in the cluster. As in the algorithm of Pascual-Marqui et al. (1995), the updated centroid of cluster j is here defined as the eigenvector of Y_j Y_j^T corresponding to the largest eigenvalue. While not strictly necessary for the k-means algorithm described above, it is typical to normalize the resulting centroids to unit vector norm (Pascual-Marqui et al., 1995), a process which is performed automatically in many implementations of eigenvalue decomposition, including the implementation used here (Matlab's eig function).
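The centroid update can be sketched as follows (our own sketch, with numpy's eigh on the N×N scatter matrix standing in for Matlab's eig; the toy cluster contains noisy, randomly sign-flipped copies of one template map, which the dominant eigenvector should recover up to polarity):

```python
import numpy as np

def update_centroid(Yj):
    """Centroid = dominant eigenvector of the N x N scatter matrix, unit-normalized.
    Yj has one column per transformed map assigned to the cluster."""
    S = Yj @ Yj.T
    evals, evecs = np.linalg.eigh(S)          # eigh: S is symmetric
    c = evecs[:, np.argmax(evals)]
    return c / np.linalg.norm(c)              # eigh already gives unit vectors

rng = np.random.default_rng(1)
template = rng.standard_normal(16)
signs = rng.choice([-1.0, 1.0], size=50)      # polarity-invariant cluster members
Yj = np.outer(template, signs) + 0.1 * rng.standard_normal((16, 50))
c = update_centroid(Yj)
cos_to_template = abs(c @ template) / np.linalg.norm(template)
```

Because the eigenvector is defined only up to sign, the recovered centroid is compared to the template with the polarity-invariant similarity of Equation 2.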
With the updated cluster centroids, the algorithm returns to step #2 (Main Text Table 2), iteratively recalculating distances between maps and centroids, relabelling states, and recalculating the centroids. These steps are iterated until convergence is met, i.e. once there is no change in assigned cluster labels between subsequent iterations (Koenig et al., 1999), or up to a fixed maximum number of iterations, here chosen to be 100 (Tait et al., 2020). Since the choice of initial maps is random, different runs of the algorithm may give different results on the same dataset. For this reason, we repeat the clustering 20 times using a different initial seed on each repeat (Koenig et al., 1999; Tait et al., 2020). The repetition with the highest globally explained variance (GEV; Murray et al., 2008) is used in subsequent analysis, where GEV is defined as

GEV = Σ_i σ_i² R(y_i, c_{L_i})² / Σ_i σ_i²,    (5)

where i denotes summation over samples, σ_i is the GFP at sample i (Equation 1), and R(y_i, c_{L_i}) is the similarity between the map at sample i and the centroid of its assigned cluster (Equation 2). It should be noted here that the term GEV is maintained in this study to align with the microstate literature, but in reality this measure is only a true measure of explained variance (by a linear model) when R² is squared correlation, i.e. the case of sensor EEG signals. However, interpretation is highly similar for all modalities. GEV is always in the range [0, 1]; when GEV = 1 the cluster centroid maps fully explain the data, while GEV = 0 indicates the maps do not explain any of the data.
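The GEV defined above might be computed as follows (a sketch with our own names; the toy data are constructed so that every map is an exact scaled copy of its assigned centroid, for which GEV should be 1):

```python
import numpy as np

def gev(Y, sigma, C, labels):
    """Globally explained variance: GFP-weighted squared similarity between
    each map (column of Y) and its assigned centroid (column of C)."""
    num = 0.0
    for i in range(Y.shape[1]):
        c = C[:, labels[i]]
        R = abs(Y[:, i] @ c) / (np.linalg.norm(Y[:, i]) * np.linalg.norm(c))
        num += (sigma[i] * R) ** 2
    return num / np.sum(sigma ** 2)

rng = np.random.default_rng(2)
C = rng.standard_normal((16, 3))                     # 3 toy centroids
labels = rng.integers(0, 3, size=40)
Y = C[:, labels] * rng.uniform(0.5, 2.0, size=40)    # perfect maps, varying strength
sigma = np.sqrt((Y ** 2).mean(axis=0))               # GFP at each sample
g = gev(Y, sigma, C, labels)
```
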
In the EEG microstate literature, a range of methods exist for clustering states (Michel and Koenig, 2018). The most widely used alternative to k-means is topographic atomize and agglomerate hierarchical clustering (TAAHC). TAAHC has been demonstrated to give consistent spatial topographies (Khanna et al., 2014) and information-theoretically invariant temporal sequences (von Wegner et al., 2018) when compared to k-means on EEG microstate sequences. TAAHC is notably more computationally expensive than k-means for large datasets (von Wegner et al., 2018), motivating our choice of k-means in this study. However, the TAAHC algorithm suffers from the same technical limitations when applied to source/MEG data as the original k-means formalism, and therefore the spatial transformations and generalized definitions of GFP and map (dis)similarity presented here may also be used to adapt the TAAHC algorithm for source/MEG microstates.

S1.2 Optimum number of microstates
A number of methods exist for choosing the optimum number of clusters/microstates from the data (Michel and Koenig, 2018). The most widely used in the EEG microstate literature are the Cross Validation Index (CVI) (Pascual-Marqui et al., 1995) and the Krzanowski-Lai (KL) criterion (Murray et al., 2008). CVI is highly dependent on the number of sensors/ROIs, and is known to perform poorly for large numbers of ROIs (≥ 64 according to Murray et al. (2008)). Here, we used 274-channel MEG mapped to 230 source space ROIs, so we did not use CVI. While the KL criterion has been used for resting-state microstate analysis (Hatz et al., 2015; Tait et al., 2020), it is believed to work appropriately for event-related potential analysis but to result in spurious peaks for resting-state data (Michel and Koenig, 2018). Alternatively, Custo et al. (2017) considered eleven different criteria for the optimum number of microstates and developed a meta-criterion combining these solutions.
Here, we use the kneedle algorithm (Satopää et al., 2011) to choose the number of states. As the number of clusters k increases, so does the variance explained. Increasing k up to the true number of states gives large increases in GEV as more states of the brain are accounted for, whereas increasing k above the true number of states only explains noise and hence gives marginal increases in GEV. Therefore the plot of GEV vs k has a characteristic 'knee' shape. The kneedle algorithm aims to find the number of clusters in the dataset by finding the value of k at which this knee occurs. This criterion is simple, intuitive, and easy to interpret. It does not depend on the number of channels, and does not generate multiple spurious peaks due to the smooth increase of GEV with number of clusters. For this reason, we here used kneedle to choose the optimum number of states.
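The idea can be illustrated with a toy GEV curve (a simplified knee finder in the spirit of kneedle, not the full published algorithm, which additionally smooths the curve and handles multiple candidate knees; the saturating toy curve is our own):

```python
import numpy as np

def knee_point(ks, gevs):
    """Knee of a saturating curve: the k whose min-max normalized GEV lies
    furthest above the straight line joining the curve's endpoints."""
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = (gevs - gevs[0]) / (gevs[-1] - gevs[0])
    return ks[np.argmax(y - x)]

ks = np.arange(2, 16)
gevs = 1.0 - 0.5 * np.exp(-0.6 * (ks - 2))   # toy curve saturating with k
knee = knee_point(ks, gevs)
```
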

S1.3 Simulations
Simulations were performed to test the ability of the microstate methodology to estimate a ground truth microstate sequence and spatial maps. Simulations were repeated for k = 4 and k = 10 states. The three main steps to simulations are described below.
Firstly, a generative microstate sequence was simulated. While any sequence of numbers could be used here, we aimed to generate a sequence with EEG-microstate-like properties and avoid assumptions known to be invalid for EEG microstates, such as Markovianity or stationarity. To do so, we used a random walk decision tree approach, which reverse engineers the method commonly used to transform EEG microstate sequences into random walks using a decision tree (Van De Ville et al., 2010; von Wegner et al., 2016, 2018). k − 1 random walks were generated with Hurst exponent 0.7 (Van De Ville et al., 2010). Each random walk was then converted to a binary sequence based on the sign of its derivative, i.e. +1 if increasing and -1 if decreasing. Each binary sequence corresponds to a branch of a decision tree, which places a particular time point within a state. For the example of four states, the first split separates states {1,2} from states {3,4}, while the second split separates states 1 and 2, and the third split separates states 3 and 4. For example, consider a time point at which the three binary sequences have values of (+1,-1,+1). The first binary sequence is +1, so the current state is either state 1 or 2 (a value of -1 would indicate state 3 or 4). To decide which of these two states is the current sample, we consider the 2nd sequence, which separates states 1 and 2 (if the first sequence had a value of -1 we would instead consider the 3rd sequence, which separates states 3 and 4). Values of +1/-1 in the 2nd sequence correspond to states 1 and 2 respectively, so since the 2nd sequence had a value of -1 the state at this time point is 2. Prior to binarizing the states, we smoothed the random walks with a moving average filter. The width of the filter was chosen for each simulation such that the mean state duration was approximately 50 ms (Tait et al., 2020).
We will denote the resulting sequence L, which is a vector of length T (the number of samples), where the t'th sample L(t) ∈ {1, 2, . . . , k}. This method resulted in generative sequences with a mean duration of approximately 50 ms, Hurst exponents of approximately 0.7-0.8, and which were highly non-Markovian, in line with past EEG microstate studies (Van De Ville et al., 2010; von Wegner et al., 2017, 2018; Tait et al., 2020).
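The sequence-generation step can be sketched as follows for k = 4 (our own simplification: a plain Gaussian random walk stands in for the Hurst-0.7 fractional walks, and the smoothing width is arbitrary rather than tuned to give 50 ms mean durations):

```python
import numpy as np

def decision_tree_sequence(T, smooth=25, rng=None):
    """Generate a 4-state sequence from 3 smoothed random walks via the
    decision tree described in the text (sketch; Gaussian walks only)."""
    if rng is None:
        rng = np.random.default_rng(0)
    walks = np.cumsum(rng.standard_normal((3, T + 1)), axis=1)
    kernel = np.ones(smooth) / smooth                       # moving average filter
    walks = np.apply_along_axis(
        lambda w: np.convolve(w, kernel, mode="same"), 1, walks)
    signs = np.sign(np.diff(walks, axis=1))                 # +1 increasing, -1 decreasing
    seq = np.empty(T, dtype=int)
    for t in range(T):
        if signs[0, t] > 0:                                 # branch {1,2} vs {3,4}
            seq[t] = 1 if signs[1, t] > 0 else 2            # 2nd walk splits 1 vs 2
        else:
            seq[t] = 3 if signs[2, t] > 0 else 4            # 3rd walk splits 3 vs 4
    return seq

seq = decision_tree_sequence(5000)
```

Consistent with the worked example in the text, a time point with sequence values (+1, -1, +1) is assigned to state 2.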
Neural dynamics were then generated for each microstate. Each state was assigned a Wilson-Cowan neural mass model in a parameter regime previously used to model resting-state dynamics (Deco et al., 2009; Abeysuriya et al., 2018). For state j, the equations of the model were

τ_E dE_j/dt = −E_j + φ(w_EE E_j − w_EI I_j + P_0 + P_ms + ν_E),
τ_I dI_j/dt = −I_j + φ(w_IE E_j − w_II I_j + ν_I).

Here, E_j and I_j correspond to the firing rates of pools of excitatory and inhibitory neurons associated with microstate j, respectively. ν_E and ν_I are independent Gaussian noise terms with variance 1. The sigmoid function φ(x) is given by

φ(x) = 1 / (1 + exp(−a(x − θ))).

The remaining parameters and their interpretations are given in Table S2. The input P_0 was chosen by Deco et al. (2009) such that (in the absence of microstate-related drive P_ms) the model has a single stable fixed point close to a Hopf bifurcation, and as such exhibits low-amplitude aperiodic noise-driven fluctuations with resonance at an alpha band frequency. Therefore all k − 1 neural masses with P_ms = 0 at any given time are in this regime. When microstate-associated drive is added (P_ms = P_1), the system moves across the Hopf bifurcation into a high-amplitude stable periodic alpha oscillation. The LFP of each microstate was given by the sum of synaptic input onto the excitatory population,

s_j = w_EE E_j − w_EI I_j + P_0 + P_ms,

where s_j is the observed LFP for microstate j.
Simulations were performed using the stochastic Heun algorithm with a time step of 0.1 ms. Each simulation was initialized with 2 seconds to reach steady state, after which 60 seconds of data was simulated. Finally, the states were assigned spatial maps. As with the choice of generative sequence, the choice of spatial map is arbitrary.
Here we used open access fMRI-derived resting-state network maps (Smith et al., 2009) aligned to our 230-ROI atlas by averaging all voxels within an ROI. The result was a matrix W ∈ R^(230×k), whose columns are the spatial maps for each state. If we define S ∈ R^(k×T) to be a matrix containing the neural dynamics of each state, i.e. S_{j,t} = s_j(t), then the simulated LFP at the 230 ROIs was given by X = WS + λη, where X ∈ R^(230×T) has rows corresponding to time series for each ROI. η ∈ R^(230×T) is pink noise, and λ is a scalar factor for the noise chosen such that the SNR was 1.
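The final mixing step can be sketched as follows (our sketch: random stand-ins for W and S, spectrally shaped noise for η, and SNR taken as the ratio of mean signal power to mean noise power, which is our assumed definition):

```python
import numpy as np

rng = np.random.default_rng(3)
N, k, T = 230, 10, 2000
W = rng.standard_normal((N, k))        # spatial maps (one column per state)
S = rng.standard_normal((k, T))        # state time courses (stand-in for s_j(t))

# pink noise per ROI: shape a white-noise spectrum by 1/sqrt(f)
freqs = np.fft.rfftfreq(T, d=1.0)
shape = np.where(freqs > 0, 1.0 / np.sqrt(np.maximum(freqs, 1e-12)), 0.0)
eta = np.fft.irfft(np.fft.rfft(rng.standard_normal((N, T)), axis=1) * shape,
                   n=T, axis=1)

signal = W @ S
# choose lambda so that signal power / noise power = 1 (our SNR definition)
lam = np.sqrt(np.mean(signal ** 2) / np.mean(eta ** 2))
X = signal + lam * eta
snr = np.mean(signal ** 2) / np.mean((lam * eta) ** 2)
```
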

S1.3.1 Comparison with other state-estimation methodologies
Because simulations have ground truth maps and microstate sequences, they were used to compare k-means against other clustering algorithms:
• Principal Component Analysis (PCA), using Matlab's implementation (https://uk.mathworks.com/help/stats/pca.html). PCA is an outer product model in which each component is described by an activation map and a component time course. In PCA, the first component is the map/time-course pair which maximally explains variance across the dataset. Each subsequent component is temporally uncorrelated with previous components and maximises the remaining variance explained. Component time courses were thresholded in a winner-takes-all approach to obtain discrete state time courses.
• Independent Component Analysis (ICA), using the fastica algorithm (Hyvarinen, 1999). Much like PCA, ICA is an outer product model in which each component is described by an activation map and a component time course. However, while PCA tries to maximise variance explained, ICA aims to unmix true independent sources underlying the signal by maximising non-Gaussianity of component time courses. As in PCA, discrete state activations were achieved using winner-takes-all.
• Hidden Markov Modelling (HMM), using the HMM-MAR toolbox (https://github.com/OHBA-analysis/HMM-MAR). HMMs are a broad framework in which the data is modelled as being generated by a series of hidden states, each of which is associated with an observation model. We used the recommended settings in the HMM-MAR toolbox for raw source data and for amplitude data, the latter based upon Baker et al. (2014). For amplitude data, the observation model associated with each state is a Gaussian process generated from a state-specific covariance matrix. For raw source data, the observation model is assumed to be a time-delay embedded covariance matrix, which gives both time-lagged covariance and spectral information for each state.
Four statistics were used to assess the quality of the state estimation. Firstly, GEV was used as a purely data driven quantity. Secondly, we correlated the estimated spatial maps for each state against the ground truth maps using a template matching algorithm. Third, we assessed the temporal sequence accuracy using the normalized mutual information (MI) between the ground truth and estimated sequence. Finally, we also calculated the normalized mutual information only at the GFP peaks (MIg), since these are the samples with locally maximal SNR.
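Normalized MI between two discrete state sequences can be sketched as below (our own sketch; normalizing by the mean of the two sequence entropies is our choice, as the exact normalization is not specified here):

```python
import numpy as np

def normalized_mi(a, b):
    """Normalized mutual information between two discrete state sequences."""
    a, b = np.asarray(a), np.asarray(b)
    sa, sb = np.unique(a), np.unique(b)
    # joint distribution over (state in a, state in b)
    pab = np.array([[np.mean((a == x) & (b == y)) for y in sb] for x in sa])
    pa, pb = pab.sum(axis=1), pab.sum(axis=0)
    nz = pab > 0
    mi = np.sum(pab[nz] * np.log(pab[nz] / np.outer(pa, pb)[nz]))
    ent = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))
    denom = 0.5 * (ent(pa) + ent(pb))
    return mi / denom if denom > 0 else 0.0

x = np.array([1, 1, 2, 2, 3, 3] * 50)
nmi_same = normalized_mi(x, x)                 # identical sequences
nmi_flat = normalized_mi(x, np.ones_like(x))   # constant (uninformative) sequence
```

Identical sequences give a normalized MI of 1, while a constant estimated sequence carries no information about the ground truth and gives 0.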

S1.3.2 Phase locking simulations
To demonstrate differences in interpretation between the different clustering methods described in the previous section, we additionally ran simulations of a simple 5-node model consisting of 3 states:
• In state 1, all nodes had a 1/f spectrum with an alpha (10 Hz) peak. All nodes were uncorrelated (initial phases for each frequency and node were IID and drawn from U(0, 2π)).
• State 2 was identical to state 1, except alpha rhythms were coherent (initial phases for frequencies 8-12 Hz were IID and drawn from N(0, 0.1), introducing phase locking in the alpha frequency band).
• State 3 was identical to state 1, except the peak was at a beta frequency (17 Hz) instead of 10 Hz.
Dynamics for node i in state j were given by

x_i(t) = Σ_f A_j(f) sin(2πft + φ_0^j(f)),

where A_j(f) is a 1/f amplitude spectrum with a peak at the state-specific peak frequency f_0,j, and the state-specific peak frequencies (f_0,j) and initial phases (φ_0^j(f)) are described in the descriptions of each state above. Frequencies were simulated from 1-100 Hz in steps of 0.2 Hz.
We simulated 20 trials, each containing 5 minutes of data sampled at 200 Hz. State durations were uniformly distributed from 20 ms to 1 s, while transitions were drawn from a Markov chain with no self-loops. The elements of the Markov matrix were randomly drawn from a uniform distribution (except the diagonals, which were zero), with each row normalized to sum to 1. State estimation via each of the clustering methods (k-means, HMM, PCA, ICA) was performed at the group level by concatenating all trials.
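A short segment of these state dynamics can be sketched as follows (our sketch: a Gaussian bump on a 1/f amplitude spectrum for the peak, 2 s segments, and a phase-locking check at the 10 Hz Fourier bin as our own diagnostic rather than part of the pipeline):

```python
import numpy as np

rng = np.random.default_rng(4)
fs, n_nodes = 200, 5
t = np.arange(0, 2.0, 1 / fs)                 # 2 s segment at 200 Hz
freqs = np.arange(1.0, 100.0, 0.2)            # simulated frequencies

def make_state(peak_f, coherent_band=None):
    """Sum of sinusoids with 1/f amplitudes plus a Gaussian peak at peak_f.
    Inside coherent_band, phases are tight (N(0, 0.1)) across nodes."""
    amp = 1.0 / freqs + np.exp(-((freqs - peak_f) ** 2) / 2.0)
    x = np.zeros((n_nodes, t.size))
    for i in range(n_nodes):
        for f, a in zip(freqs, amp):
            if coherent_band and coherent_band[0] <= f <= coherent_band[1]:
                phi = rng.normal(0.0, 0.1)          # tight phases -> locking
            else:
                phi = rng.uniform(0.0, 2 * np.pi)   # independent phases
            x[i] += a * np.sin(2 * np.pi * f * t + phi)
    return x

state1 = make_state(10.0)                            # incoherent alpha peak
state2 = make_state(10.0, coherent_band=(8.0, 12.0)) # alpha phase locking

# diagnostic: phase consistency across nodes at the 10 Hz bin (0.5 Hz resolution)
F = np.fft.rfft(state2, axis=1)
plv_alpha = abs(np.mean(np.exp(1j * np.angle(F[:, 20]))))
```
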
Crucially, these simple simulations aim to highlight some differences between methods. Microstates, PCA, and ICA define states in terms of patterns of instantaneous activation (topographic maps for microstates, spatial vectors for PCA/ICA components). The time-delay embedded HMM conversely defines states in terms of covariance and spectral content. Hence, the states for this simulation are set up in such a manner that microstates, PCA, and ICA are likely to fail (due to all states having equal power across all nodes), but HMMs should succeed in differentiating the states.

S1.4 Microstate segmented functional connectivity
Our method is described below, and a visual overview of the pipeline is given in Figure S1. We first assign each time point in the broadband (1-30 Hz) data a microstate class using the methods described above. We subsequently filter the data into one of three narrow bands (theta 4-8 Hz, alpha 8-13 Hz, beta 13-30 Hz), and calculate the analytic signal using the Hilbert transform. For a given microstate, we concatenate all samples of the analytic signal within this microstate class, and use this to derive time courses of instantaneous phases and amplitudes. While these time series are no longer continuous, the Hilbert transform was performed on the continuous data, so they represent instantaneous phases and amplitudes of the continuous signal at these samples. Under the assumption that microstates are associated with specific patterns of phase synchronization (i.e. the hypothesis we wish to test), we can subsequently calculate functional connectivity from the concatenated phase/amplitude time courses. A large number of metrics exist to quantify functional connectivity. Since amplitude coupling is typically computed on a slower time scale through lowpass filtering and downsampling, here we chose to address microstate synchronization using phase coupling, specifically the weighted phase lag index (wPLI) (Vinck et al., 2011; Colclough et al., 2016), which is a reliable estimator of phase coupling (Colclough et al., 2016) while attenuating spurious connections due to leakage (Vinck et al., 2011). The (unweighted) phase lag index was used by Hatz et al. (2015) in their study of microstate connectivity, but wPLI is more reliable and less biased by small noisy perturbations than PLI (Vinck et al., 2011; Colclough et al., 2016). This method is repeated for each microstate class, to obtain functional connectivity patterns for each class.
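For a given class and band, wPLI over the concatenated analytic-signal samples can be sketched as below (our own names; the toy example uses two signals with a consistent quarter-cycle lag, for which wPLI should be near 1):

```python
import numpy as np

def wpli(Z):
    """Weighted phase lag index from analytic signals Z (ROIs x samples):
    |E[Im(Sxy)]| / E[|Im(Sxy)|], with Sxy = Zx * conj(Zy)."""
    n = Z.shape[0]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            im = np.imag(Z[i] * np.conj(Z[j]))
            denom = np.mean(np.abs(im))
            out[i, j] = out[j, i] = 0.0 if denom == 0 else abs(np.mean(im)) / denom
    return out

# two analytic signals with a consistent quarter-cycle lag plus small noise
t = np.arange(2000) / 256.0
noise = np.random.default_rng(5).standard_normal(2000) * 0.01
z1 = np.exp(1j * 2 * np.pi * 10 * t)
z2 = np.exp(1j * (2 * np.pi * 10 * t - np.pi / 2)) + noise
M = wpli(np.vstack([z1, z2]))
```

Because wPLI weights phase differences by the magnitude of the imaginary cross-spectrum, zero-lag (leakage-like) contributions are attenuated, consistent with the motivation given above.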
The methodology described above assumes that microstates are associated with a unique pattern of phase synchronization, and therefore by concatenating samples within a particular class we obtain a stationary time course from which it is meaningful to calculate a single functional network. To test this hypothesis, we applied multi-variate pattern analysis (MVPA) using the MVPA-Light toolbox (Treder, 2020). For each frequency band, the following analysis was performed. Firstly, for each participant, scan, and microstate class, we followed the methodology above for calculating microstate functional connectivity. However, prior to calculation of wPLI, the concatenated phase/amplitude time-courses were segmented into non-overlapping windows of 1280 samples (equivalent to 5-seconds). Across all participants, scans, and microstate classes this resulted in 5413 networks per frequency. Each network was associated with a single microstate class. For each network, the weighted degree distribution (i.e. sum of all edges connecting to a node) was calculated, resulting in a 230 (number of ROIs) dimensional vector for each network. These 230 dimensional vectors were used as features in a multi-class linear discriminant analysis classifier, with the associated microstate class as the class-labels to be predicted. Classification used default values in the MVPA-Light toolbox, while 5-fold cross-validated classification accuracy was used as the metric of performance. If high classification accuracy is achieved, this is strongly suggestive that microstates are associated with unique patterns of phase synchronization, since it suggests that given any 1280 samples from a given microstate class (across participants/scans), this class can be predicted based on the functional connectivity pattern. Permutation testing was used to test for statistically significant classification accuracy, using 1000 permutations.
Finally, for visualization purposes only, we plotted the deviation of the microstate functional connectivity pattern from the static connectivity pattern. This was done because plotting raw microstate functional connectivity patterns gave visually similar results, suggesting an underlying static connectivity associated with all states. Firstly, we obtained a single microstate network (for each class) by averaging the network across all 5-second segments within the class, participants, and scans. We obtained a single static network (used for all classes) by segmenting the original data into 5-second epochs, calculating wPLI, and averaging the resulting networks over all epochs, participants, and scans. We subsequently normalized the microstate connectivity by the static connectivity using the equation

wPLI_dev^(i,j) = (wPLI_ms^(i,j) − wPLI_static^(i,j)) / wPLI_static^(i,j),

where superscripts i, j denote ROIs i and j. This equation has in the past been used to remove linear connectivity from non-linear connectivity matrices (Rummel et al., 2015), but is here applied to remove static connectivity from microstate connectivity matrices. For visualization, the top 1% of edges from the wPLI deviation matrix are shown.

S2.1 Reproducibility and reliability of maps
To test the reliability and reproducibility of the estimated microstate maps, we repeated the k-means clustering 250 times, using a different random seed and sampling a different selection of 150,000 GFP peaks each time. For each of these permutations, only a single repetition of the algorithm was performed (unlike the original run, where 20 repetitions were used and the repetition with highest GEV was chosen). The 10 maps estimated on each of these runs were aligned to the original 10 maps using iterative template matching, i.e. map similarity between the estimated maps and the templates (the original 10 maps) was calculated according to Equation 2, and the template/map pair with highest similarity were given the same label. Then both this template and map were removed from the pool. This was repeated until all maps were aligned. To quantify how tightly the 250 runs were clustered about the original maps, we used the Silhouette coefficient (Rousseeuw, 1987). The silhouette coefficient has a value for each sample (i.e. 250 repetitions and 10 clusters), ranging from -1 (poorly clustered) to +1 (well clustered). Distance between maps was calculated using Equation 4. The measure rewards low intra-cluster distances and high inter-cluster distances. Therefore, if the estimated maps for all 250 runs are highly repeatable, the template matching algorithm will place each run in approximately the same location in the abstract space spanned by the map values, resulting in low intra-cluster distances (and high inter-cluster distances relative to the low intra-cluster distances), and therefore a large silhouette coefficient.
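The iterative template-matching procedure can be sketched as follows (our own greedy implementation; the toy maps are permuted, sign-flipped, noisy copies of the templates, which the matching should recover):

```python
import numpy as np

def align_maps(maps, templates):
    """Greedy template matching: repeatedly pair the map/template with the
    highest polarity-invariant similarity, then remove both from the pool."""
    k = templates.shape[1]
    sim = np.abs(templates.T @ maps) / (
        np.linalg.norm(templates, axis=0)[:, None]
        * np.linalg.norm(maps, axis=0)[None, :])
    labels = np.full(k, -1)
    for _ in range(k):
        tmpl, m = np.unravel_index(np.argmax(sim), sim.shape)
        labels[m] = tmpl
        sim[tmpl, :] = -1        # remove template from pool
        sim[:, m] = -1           # remove map from pool
    return labels                # labels[m] = matched template index for map m

rng = np.random.default_rng(6)
templates = rng.standard_normal((30, 5))
perm = np.array([2, 0, 4, 1, 3])
maps = -templates[:, perm] + 0.05 * rng.standard_normal((30, 5))  # permuted, flipped, noisy
labels = align_maps(maps, templates)
```
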
Results are shown in Figure S2. On average, there was high similarity between our microstate maps and maps estimated from subsequent runs, and the mean (± standard error) of similarity values across all classes and runs was 0.9668±0.0019. The silhouette coefficient was 0.7726±0.0047, indicating that the various repetitions were well clustered. Of the maps, the frontal and left/right parietal were the least reproducible (still having

S2.2 Assessing non-stationarity
An important point to note is that the k-means clustering algorithm will cluster the data even in the absence of true stable microstates. We therefore additionally tested whether the statistics of the microstate sequences could be replicated by random fluctuations in a stationary Gaussian process, or whether they required genuine non-stationarity and the appearance of stable states. To do so, for each participant and scan, we generated 100 multivariate iterative amplitude adjusted Fourier transform (mvIAAFT) surrogates (Schreiber and Schmitz, 2000). These surrogate time series are stationary Gaussian processes with the same power/cross spectra and distributions as the original dataset (Schreiber and Schmitz, 2000; Rummel et al., 2011). Therefore, if the microstate statistics from the empirical MEG data were drawn from the same distribution as the microstate statistics from the surrogate data, it is probable that the observed 'microstates' are really just due to random fluctuations of a stationary process. Similar approaches have been taken to statistically analyse non-stationarity in dynamic functional connectivity in neuroimaging datasets (O'Neill et al., 2018). Here, we found no surrogate distributions were significantly different from a normal distribution (the minimum p-value over all participants, scans, and microstate statistics was p = 0.2310 using a Kolmogorov-Smirnov test), and we therefore quantified differences between the empirical statistics and the surrogates via a z-score, i.e. z = (x − µ_surrogate)/σ_surrogate, where x is the empirical statistic of interest, and µ_surrogate and σ_surrogate are the mean and standard deviation of the surrogate distributions respectively. For each statistic, the z-scores were subsequently averaged over scans, giving a single z-value per participant. To test whether the resulting distribution of z-scores across participants was likely drawn from a normal distribution with mean zero (i.e.
whether the z-scores were significantly different from the surrogates), a one-sample t-test was used for each statistic. For this analysis, the statistics which were class-specific (duration, coverage, occurrence) were transformed to a standard deviation across classes, so the test estimated specifically whether heterogeneity between microstates in these properties could be accounted for by a stochastic process. Results of the non-stationarity analysis are shown in Supplementary Table S3 and Supplementary Figure S3. Mean duration of microstates was significantly larger than could be explained by random fluctuations in a stationary Gaussian process with the same power spectra, cross-spectra (and covariances), and distribution of data, suggestive of non-stationarity and the occurrence of stable microstates in empirical resting-state MEG. Heterogeneity in durations across states (quantified by standard deviation; stdDur) was also significantly larger in empirical data than expected by chance.

S2.3 Comparing state-estimation methods based on simulations
In this section, we aim to compare and contrast the different methods of brain-state estimation outlined in section S1.3.1 via simulations. Two models were used, outlined in Supplementary Table S4.
The 'activation' model (section S1.3, main text Figure 1) is set up in such a way that it meets the assumptions of k-means, PCA, and ICA. States are defined in terms of spatial maps, i.e. each state has a unique pattern of activations. While the ground-truth state sequences were discrete, i.e. only one state was active at a given time, some degree of mixing between states was allowed since even inactive states exhibited low-amplitude noise-driven resonant oscillations, while active states exhibited high-amplitude limit-cycle oscillations. These simulations were not well set up for HMMs, since states were not defined in terms of covariance matrices or spectral content. Furthermore, the non-Markovian nature of the state sequences may be a limiting factor for the performance of HMMs (Koenig and Brandeis, 2016). In addition, the complexity of HMMs greatly increases with dimensionality, and hence a 230-node system may require a large amount of data to train (e.g. Baker et al. (2014) used 9 subjects × 10 minutes of data with 40 nodes, and Vidaurre et al. (2018) used 55 subjects × 5 minutes of data with 42 nodes), while in these simulations clustering was performed separately for each trial.

Supplementary Table S5: Mutual information between ground-truth and estimated state sequences in the various simulation scenarios. Values are given as a percentage, and shown are the median and interquartile range (in brackets). Abbreviations: Act/source/k is the activation model applied to raw time courses with k simulated states; Act/amp/k is the activation model applied to amplitude envelopes with k states; Phase represents the phase-locking model.
Results of the activation simulations are shown in Supplementary Table S5. Across both raw time courses and amplitude envelopes, and for both 4 and 10 states, HMMs consistently performed worst, as expected. MI between simulated and estimated state sequences ranged from 0.8% to 4.2%, depending on model configuration. Conversely, k-means, PCA, and ICA achieved values ranging from 27.2% to 45.4% across all model configurations. Broadly speaking, PCA and k-means performed best in terms of both state-sequence estimation (Supplementary Table S5) and spatial-map estimation (Supplementary Table S6).
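The mutual information between two discrete state sequences can be computed from their empirical joint and marginal distributions. A minimal sketch follows; note the paper does not state which normalization converts MI to a percentage, so dividing by the ground-truth entropy is an assumption here:

```python
from collections import Counter
from math import log2

def mutual_information(seq_a, seq_b):
    """Empirical mutual information (bits) between two equal-length
    discrete state sequences."""
    n = len(seq_a)
    counts_a = Counter(seq_a)
    counts_b = Counter(seq_b)
    counts_ab = Counter(zip(seq_a, seq_b))
    mi = 0.0
    for (a, b), c in counts_ab.items():
        p_ab = c / n
        # p_ab * log2( p_ab / (p_a * p_b) )
        mi += p_ab * log2(p_ab * n * n / (counts_a[a] * counts_b[b]))
    return mi

def entropy(seq):
    """Empirical entropy (bits) of a discrete sequence."""
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in Counter(seq).values())

def mi_percent(ground_truth, estimated):
    """MI as a percentage of the ground-truth entropy (one plausible
    normalization; assumed for illustration)."""
    return 100.0 * mutual_information(ground_truth, estimated) / entropy(ground_truth)
```

A perfectly recovered sequence gives 100%, while an independent sequence gives 0%, bracketing the values reported in Supplementary Table S5.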
The 'phase-locking' model (section S1.3.2) is set up in such a way that it meets the assumptions of HMMs, while failing to meet any of the assumptions of k-means, PCA, or ICA. States are defined in terms of phase-locking and frequency peaks, while all states have uniform activation across all nodes. Hence, in theory, the methods which define states in terms of activation patterns should fail. To facilitate HMM training, we kept the dimensionality low (5 nodes), simulated 20 trials of 5 minutes each, and performed the analysis at the group level (i.e. fit states to all trials simultaneously). Results are shown in Supplementary Table S5. k-means, PCA, and ICA all performed poorly (MI between ground-truth and estimated sequences ranging from 1.4% to 2.6%), while HMMs performed well (MI of 47.2%).
It is important to note that both sets of simulations are likely extreme examples, and empirical M/EEG data will to some degree meet the assumptions of both models. For example, if the variances of nodes (i.e. the diagonals of the covariance matrix) change between states, then this will be reflected in patterns of activation. Similarly, transient events such as bursts will likely have unique spatial patterns, which will again be reflected as patterns of activation. Furthermore, in our study we found a significant association between patterns of activation and functional connectivity, suggesting that patterns of activation likely drive covariance matrices. A detailed study of how source-space estimates of brain states from different algorithms compare in real data would be of much interest.

Supplementary Table S6: Map similarity between ground-truth maps and estimated spatial maps associated with k-means, PCA, and ICA (note that HMM states are defined in terms of (lagged) covariance matrices rather than spatial maps, so HMM is excluded from this table). Values are Fisher z-scored for better differentiation between high values, and shown are the median and interquartile range (in brackets). Abbreviations as in Table S5.
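The Fisher z-scoring applied to map similarities in Supplementary Table S6 is the standard inverse hyperbolic tangent transform; a short illustration (assuming similarities behave like correlation values in (−1, 1)):

```python
import math

def fisher_z(r):
    """Fisher z-transform of a similarity value r in (-1, 1).
    Expands the scale near |r| = 1, so that very high map
    similarities become easier to differentiate."""
    return math.atanh(r)

# Two maps that are both very similar to ground truth are clearly
# separated after the transform: fisher_z(0.99) - fisher_z(0.95)
# is much larger than the raw difference 0.99 - 0.95.
```

Medians and interquartile ranges can then be computed on the transformed values, as in the table.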
Supplementary Figure S1: Calculating microstate functional connectivity. (A) Raw MEG source space data. Note that the data shown here are simulated for visualization purposes; empirical data were used in the analysis. (B) By appropriately preprocessing the data and performing microstate analysis, we can assign each sample a microstate class (denoted here by colour). (C) To calculate narrowband functional connectivity, the data must be filtered into an appropriate frequency band. (D) Using the Hilbert transform, calculate instantaneous phases or amplitudes from the narrowband data. (E) Concatenate all samples labelled with a given state. Here, we show all samples from the state with a light-blue background. (F) Window the concatenated data into epochs of 5 seconds' worth of samples. (G) For each window, calculate functional connectivity. Here, we used wPLI. (H) To obtain a group-level representation of the functional network associated with this microstate, average all functional connectivity matrices derived from this state. (I) To statistically test whether microstates are associated with a specific functional network structure, each matrix (along with the sets of matrices from all other microstate classes) can be submitted to multivariate pattern analysis.
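Steps (D) and (G) above can be sketched as follows. This is a minimal illustration, assuming narrowband channels-by-samples input; the analytic signal is computed with an FFT-based Hilbert transform, and the wPLI of a window is |E[Im(S_ij)]| / E[|Im(S_ij)|], where S_ij is the per-sample cross-spectrum:

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal of real narrowband data
    (channels x samples), analogous to scipy.signal.hilbert."""
    n = x.shape[-1]
    spectrum = np.fft.fft(x, axis=-1)
    h = np.zeros(n)
    h[0] = 1
    if n % 2 == 0:
        h[n // 2] = 1
        h[1:n // 2] = 2
    else:
        h[1:(n + 1) // 2] = 2
    return np.fft.ifft(spectrum * h, axis=-1)

def wpli(z):
    """Weighted phase-lag index matrix from analytic signals z
    (channels x samples) for one window."""
    # Imaginary part of the per-sample cross-spectrum for all pairs.
    imag_csd = np.imag(z[:, None, :] * np.conj(z[None, :, :]))
    num = np.abs(imag_csd.mean(axis=-1))
    den = np.abs(imag_csd).mean(axis=-1)
    with np.errstate(invalid="ignore", divide="ignore"):
        w = num / den
    # Diagonal (and any zero-lag pair) is 0/0; map it to zero.
    return np.nan_to_num(w)
```

Per-state connectivity then follows the caption: concatenate the samples labelled with one microstate class, cut into 5-second windows, apply `wpli` to each window, and average the matrices across windows.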
Supplementary Figure S2: Reproducibility of resting-state microstates. The k-means clustering algorithm was repeated 250 times with different sets of GFP peaks and initial seeds, and the resulting maps were matched to the original set using a template matching algorithm. Each point corresponds to a map estimated by a different repeat of the algorithm, while the classification according to the original maps is given on the x-axis. (Top) Map similarity between the estimated and original maps. The y-axis labels correspond to the raw values of map similarity, but the axis itself was transformed according to a Fisher z-transform to help differentiate large values of map similarity. (Bottom) Silhouette coefficients of each point.

Supplementary Figure: Replication of the analysis of Koenig and Brandeis (2016) using resting-state source MEG microstates, with modified definitions of GFP and map (dis)similarity as described in section S1.1. Upper rows show the observed data, while lower rows show shuffled data simulating the assumptions of Hidden Markov Modelling (i.e. independence of GFP and topographic change; Koenig and Brandeis (2016)). The left column shows the original data, while the right column shows log-log transformed data. After log-log transformation, there is a correlation of −0.6225 ± 0.0189 (mean ± standard error across participants). The data shown here are from a single representative participant, with a correlation of −0.6466. This figure demonstrates that the inverse relationship between GFP and topographic change holds in MEG source space data, justifying the use of GFP peaks (as the most stable samples) in the k-means clustering algorithm and demonstrating that the assumptions of Hidden Markov Modelling may be inappropriate for this data (Koenig and Brandeis, 2016).
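The GFP-versus-topographic-change correlation underlying this analysis can be sketched as below. This uses the classical EEG definitions (spatial standard deviation for GFP, GFP of the difference of normalized maps for dissimilarity) purely for illustration; the source-space definitions are modified as described in section S1.1:

```python
import numpy as np

def gfp(X):
    """Global field power per sample: spatial standard deviation of
    each map (column) of the channels x samples array X."""
    return X.std(axis=0)

def dissimilarity(X):
    """Topographic dissimilarity between consecutive maps: GFP of the
    difference of average-referenced, GFP-normalized maps."""
    Y = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.diff(Y, axis=1).std(axis=0)

def gfp_dissimilarity_correlation(X, log_log=True):
    """Correlation between GFP at sample t and the topographic change
    from t to t+1, optionally after a log-log transform as in the
    right column of the figure."""
    g = gfp(X)[:-1]
    d = dissimilarity(X)
    if log_log:
        g, d = np.log(g), np.log(d)
    return np.corrcoef(g, d)[0, 1]
```

A strongly negative correlation, as reported above, indicates that topographies are most stable at GFP peaks, which motivates clustering GFP-peak maps.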