Uncovering hidden resting state dynamics: A new perspective on auditory verbal hallucinations

In the absence of sensory stimulation, the brain transits between distinct functional networks. Network dynamics such as transition patterns and the time the brain stays in each network link to cognition and behavior and are subject to much investigation. Auditory verbal hallucinations (AVH), the temporally ﬂuctuating unprovoked ex- perience of hearing voices, are associated with aberrant resting state network activity. However, we lack a clear understanding of how diﬀerent networks contribute to aberrant activity over time. An accurate characterization of latent network dynamics and their relation to neurocognitive changes necessitates methods that capture the sub-second temporal ﬂuctuations of the networks’ functional connectivity signatures. Here, we critically evaluate the assumptions and sensitivity of several approaches commonly used to assess temporal dynamics of brain connectivity states in M/EEG and fMRI research, highlighting methodological constraints and their clinical relevance to AVH. Identifying altered brain connectivity states linked to AVH can facilitate the detection of predictive disease markers and ultimately be valuable for generating individual risk proﬁles, diﬀerential diagnosis, targeted intervention, and treatment strategies. network interactions. A high probability of switching between two particular brain states could hint towards a close temporal interplay between two functionally related (sub-)networks or critical hubs within a network. In turn, aberrant transition patterns could link to a deﬁcient use of network resources, facilitating cognitive errors, e.g., the misattribution of internal sensory events to external sources, which may manifest in hallucinations.


Introduction
The brain's resting state (RS) is characterized by spontaneous fluctuations of neural activity in the absence of an overt task. The neural time series of RS data capture the interaction of underlying resting state networks (RSNs), i.e., the transitions in and out of distinct connectivity patterns over time. Functional magnetic resonance imaging (fMRI) research demonstrated high spatial consistency of RSNs; however, functional connectivity (FC) within and between networks is temporally non-linear, complex, and dynamic on a sub-second timescale ( Hutchison et al., 2013 ). Here, we consider the brain as a dynamic system that fluctuates between transient periods of high and low connectivity, stable and unstable (i.e., noisy) episodes, and engaged and disengaged functional networks, each associated with varying oscillatory signatures over time. This perspective is in line with the idea of segmenting neural time series data into a sequence of recurrent distinct states (i.e., network FC patterns). Such segmentation methods have been successfully applied to fMRI ( Hunyadi et al., 2019 ;Kottaram et al., 2019 ) and M/EEG data ( Baker et al., 2014 ;Trujillo-Barreto et al., 2019 ;Vidaurre et al., 2017 ;> 30 min), and cease without a clear termination point ( Daalman et al., 2011 ). Research on the behavioral dynamics of hallucinations using experience sampling methods found that they are often preceded by fluctuations of delusional ideation ( Oorschot et al., 2012 ) or an increase in negative affect ( So et al., 2021 ). This suggests that the experience of hallucinations is not bound to their on-and offsets but involves a complex and dynamic interplay of cognitive-emotional processes. This is further supported by state and trait studies of AVH showing that alterations in behavior, cognition, and neurophysiology are not restricted to the hallucination periods but include mechanisms that may predispose an individual to hallucinate ( Ford et al., 2012 ;Kühn and Gallinat, 2012 ). Evidence supports the idea of a psychosis continuum, in which more pronounced and persistent symptoms in combination with genetic and environmental risk factors are often associated with a higher need for care and increased clinical relevance ( Johns et al., 2014 ;Johns and Van Os, 2001 ;Van Os et al., 2009 ). In the context of AVH, this continuum is often referred to as hallucinatory proneness (HP; Johns et al., 2014 ;Johns and Van Os, 2001 ).
The characteristics of AVH can differ substantially between nonclinical and clinical voice-hearers, including the phenomenology, emotional appraisal, perceived control over the voices, and associated level of distress ( de Leede- Smith and Barkus, 2013 ;Hill and Linden, 2013 ). Nonclinical AVH tend to be of higher linguistic complexity (i.e., sentences and conversations), whereas AVH in psychotic disorders are often linguistically less complex (i.e., single words, derogatory patterns; de Leede- Smith and Barkus, 2013 ). The frequency of occurrence of AVH is usually higher in patients with psychotic disorders as compared to healthy voice hearers. Nonclinical AVH seem to fluctuate considerably across time and, in some cases, cease completely ( Honig et al., 1998 ;Larøi et al., 2012 ). Conversely, persistence of AVH and the formation of secondary delusions, affective disturbances, or subclinical levels of schizotypal phenomena are risk factors for the development of clinical AVH ( de Leede- Smith and Barkus, 2013 ;Sommer et al., 2010 ). Moreover, while patients with AVH often report low controllability over the voices (e.g., regarding their on-and offset) and a negative appraisal of the voices, non-clinical voice-hearers tend to perceive higher levels of control and evaluate the voices as emotionally neutral or positive. Hence, the perceived level of control is assumed to be a strong indicator of the emotional distress associated with AVH and the need for care ( de Leede-Smith and Barkus, 2013 ;Larøi et al., 2012 ).
The experience of voices is a transdiagnostic phenomenon occurring in various neuropsychiatric disorders including Borderline Personality disorder (BPD), Parkinson's disease (PD), epilepsy, and mood disorders ( Fénelon et al., 2000 ;Larøi et al., 2012 ;Slotema et al., 2012 ;Toh et al., 2015 ). Patients with PD commonly experience hallucinations in the visual and less frequently in the auditory domain as compared to ScZ patients. If AVH are present in PD, they are often nonimperative and have no affective component ( Larøi et al., 2012 ). The phenomenology of AVH in BPD has been found to be similar to those in ScZ ( Slotema et al., 2012 ). AVH in mood disorders are associated with childhood trauma and the presence of primary or secondary delusions. Improved insight into the neurocognitive mechanisms of AVH in these disorders could benefit differential diagnosis and research on overlapping symptomatology of distinct clinical profiles. As the majority of research on AVH concerns Schizophrenia spectrum disorders or healthy voice-hearers, the psychosis continuum (and hallucination proneness in the general population) will be the focus of the current review.
The continuity of psychotic experiences in the general population is supported by behavioral, electrophysiological, as well as structural and functional neuroimaging studies. Several mechanisms have been proposed to underlie AVH, including defective monitoring of inner speech, source misattribution, lack of inhibitory control, and intrusive cognitions. Importantly, these mechanisms are not mutually exclusive, however, they may account for AVH to a different degree in clinical and non-clinical voice-hearers ( Badcock and Hugdahl, 2012 ). For instance, AVH have been linked to deficient contextual integration and inner-speech monitoring leading to misattribution of externally versus internally generated sensory events (i.e., source misattribution; Allen et al., 2005Allen et al., , 2006Johns et al., 2014 ;Pinheiro et al., 2019 ). This cognitive deficit is observed in patients with hallucinations and in healthy but hallucination-prone individuals with delusional ideation ( Allen et al., 2006( Allen et al., , 2004. Further, AVH are associated with involuntary and dysfunctional verbal intrusions of long-term memory, which is consistent with imaging findings of heightened activation of medial temporal regions during AVH ( Badcock and Hugdahl, 2012 ). This assumption is further supported by a higher rate of intrusion errors in verbal word recall tasks and deficient voluntary inhibitory control (i.e., resistance to interference) in hallucination-prone individuals ( Paulik et al., 2008 ). Consistent with the continuum perspective of AVH, these behavioral and cognitive aberrancies are more pronounced in ScZ patients with AVH ( Badcock and Hugdahl, 2012 ). Moreover, non-clinical AVH and high HP link to event-related potential changes indicative of disturbed sensory feedback to voices ( Pinheiro et al., 2020 ), white matter abnormalities ( Johnson et al., 2021 ), and structural changes (i.e., decreased volume, fractional anisotropy, and orientation dispersion) in the superior temporal gyrus (STG), a brain area that is critically involved in auditory and language processing ( Spray et al., 2018 ). Functional changes include altered connectivity in language production areas (i.e., inferior frontal gyrus, IFG), auditory processing areas (e.g., STG), and brain regions involved in memory (parahippocampal areas), pointing towards altered functional interaction and information transmission between the respective cortical networks . The altered FC between IFG and parahippocampal regions in non-clinical voice-hearers might facilitate memory retrieval, thereby providing additional evidence for the hypothesized intrusion of long-term memory into ongoing cognitive processes ( Ćur či ć-Blake et al., 2017 ;Diederen et al., 2012 ).
Network analyses of RS fMRI data revealed that non-clinical voicehearers display heightened connectivity in the cingulate gyrus, precuneus, insula, and left STG as compared to healthy controls without AVH ( van Lutterveld et al., 2014 ). These regions are critical hubs of the default mode network (DMN), the salience network (SN), and the auditory network. Connectivity changes within and among these networks could result in an increased focus on and increased salience attached to internal cues (e.g., spontaneously retrieved memories), combined with altered auditory processing. The 'resting state hypothesis of AVH' introduced by Northoff and Qin (2011) summarizes the interaction between different RSNs as an interplay of dysfunctional bottom-up and top-down processes, facilitating neural confusion between spontaneously generated and externally induced activity in the auditory cortex (i.e., leading to source misattribution). This highlights the importance of precisely timed and efficient (dis-)engagement of large-scale neural networks for healthy brain function ( Menon, 2011 ). In summary, these findings demonstrate that AVH i) are the result of a complex interaction of multiple neurocognitive processes, ii) may originate from dysfunctional allocation of neural resources at rest, and iii) may link to similar functional alterations in clinical and non-clinical populations, with the latter potentially representing an attenuated version of network dysfunction observed in patients with a psychotic disorder ( Geng et al., 2020 ;Menon, 2011 ;van Lutterveld et al., 2014 ;Weber et al., 2020 ).

Dynamic resting state analyses and their relevance for AVH
Dynamic FC (dFC) has been shown to better predict functional alterations and disease as compared to static FC measures ( Du et al., 2018 ;Preti et al., 2017 ). This is supported by results showing that dynamic metrics of FC provide higher accuracy for classifying ScZ, Bipolar disorder, and healthy controls ( Rashid et al., 2016 ). To our knowledge, the classification performance of static versus dynamic measures of FC has not been directly tested to differentiate voice-hearers from controls. However, given the temporally fluctuating nature of AVH (on the cognitive-behavioral and neural level) in the absence of external sen-sory input, dynamic network connectivity analyses of the resting brain should be particularly suited to reveal their neural basis.
Despite extensive research on fMRI-derived RSNs and respective M/EEG correlates, we still lack a clear understanding of the temporal dynamics of these network connectivity patterns on a sub-second timescale, that is, how each network and its functional interaction with other networks contribute to altered neurocognition and behavior over time. Moreover, FC patterns among functional networks are hidden, i.e., they cannot be observed directly. Hence, capturing these sub-second dynamics requires not only high-temporal resolution measurements such as M/EEG but also suitable analysis tools that accurately model the signatures of these hidden state patterns and allow inference of brain connectivity states from the observations. Likewise, the relationship between AVH and RSN dynamics is not straightforward. In the past, different segmentation approaches have been applied to find associations between temporal characteristics of the identified connectivity states and AVH. However, these approaches differ significantly in their assumptions, methodological advantages and limitations, and the data modality that they are commonly applied to ( Hutchison et al., 2013 ).
In the following, we discuss several commonly used segmentation methods (fMRI sliding window analysis, EEG microstates, Hidden (Semi-)Markov Modeling) to assess the temporal dynamics of brain connectivity states based on RS data. For each method, we will describe the typical workflow and highlight the most important constraints. We further discuss the insights we have gained from each approach into the underlying mechanisms of AVH. Considering the limitations of commonly applied segmentation methods, we propose a generative modeling approach to be used as an alternative or complementary analysis tool in future studies on AVH to overcome the major limitations of the previously discussed methods, and thereby increase our knowledge about RSN dynamics in healthy and pathological contexts. Accurate characterization of temporal (and spatial) markers of healthy and impaired network dynamics could help to elucidate the mechanisms of altered cognition and behavior, differentiate between neurocognitive profiles in distinct patient populations, and facilitate the identification of individuals with an increased risk of transitioning to a clinical status.

fMRI sliding window approach
The sliding window analysis reveals functional relationships within and between large-scale neural networks. A predefined fixed time window is shifted along with the continuous time series data by a specified number of time points, such that the windows are temporally overlapping to prevent information loss. The data points in each window are used to quantify the associations between spatially distinct but temporally correlated brain regions or voxels of interest. These associations are often calculated using Pearson's linear correlation coefficient or covariance matrices. Comparing this metric as a function of time allows quantification of temporal variability in connectivity strength ( Savva et al., 2019 ). Sliding window analyses are often combined with successive clustering methods to detect recurrent transient patterns of FC (i.e., connectivity states; Allen et al., 2014 ;Damaraju et al., 2014 ;Rashid et al., 2014 ). These connectivity states are obtained by calculating FC matrices for each time window and then clustering them into a predefined number of states. A widely applied clustering procedure is the k-means algorithm. This algorithm groups a larger number of observations (here, the individual covariance matrices) into a smaller set of clusters so that each observation belongs to the cluster with the nearest mean, referred to as cluster centroid ( Na et al., 2010 ). Thus, each connectivity state is acquired by maximizing the correlation within that cluster. Lastly, the resulting clusters, i.e., the connectivity states are projected back onto the continuous time series data to infer temporal variability in FC, by finding the best fit between the data covariance at each time point to one of the clustered FC states (see Fig. 1 ). Zhang et al. (2018) investigated the relation between RS fMRI FC metrics and AVH in ScZ patients, specifically focusing on the interplay between speech production and comprehension areas. To this end, they computed connectivity windows at different time points by means of sliding windows and used k-means clustering to identify recurring connectivity states. Compared to patients without AVH (AVH-), patients with AVH (AVH + ) displayed reduced FC variability and strength between the left frontal and left temporal auditory areas, involved in speech production and comprehension, respectively. Importantly, their analyses did not reveal any significant differences in static FC between AVH-and AVH + patients. This emphasizes that averaging FC metrics across the whole scanning period can fail to detect functionally meaningful connectivity variations.

Sliding window applications to AVH
Similarly, Geng et al. (2020) analyzed fMRI dynamic inter-and intranetwork FC using sliding windows to unveil altered RS dynamics in AVH + and AVH-ScZ patients. Based on previous meta-analyses, they used several large-scale neural networks implicated in hallucinatory experiences as regions of interest (ROIs) and segmented their FC patterns into six distinct states. The results indicated that AVH + patients were characterized by connectivity states representing decreased FC within the auditory network and between the executive control and language networks. Moreover, two connectivity states showed significant differences in their temporal features (i.e., their duration and switching behavior) between the two groups. AVH + patients were associated with less time spent in a state capturing high anti-correlation between the DMN and the language network, as well as a lower probability of entering this state. The attenuated antagonism between the DMN and language-related brain areas arguably facilitates a heightened focus on internal speech processing and verbal intrusions as commonly observed in AVH + individuals ( Badcock and Hugdahl, 2012 ). Results obtained by Weber et al. (2020) support the conclusion that altered DMN activity contributes to hallucinations. Particularly, ScZ patients displayed less dynamic DMN interactions with other networks, e.g., patients were more likely to spend more time in a state representing high FC within the DMN. Clinical ratings of hallucination severity were not significantly associated with the FC abnormalities on the day of fMRI scanning. Yet, (auditory) hallucination proneness, as measured by the Positive and Negative Syndrome Scale (PANSS; Kay et al., 1989 ) showed a significant positive relationship with DMN state dwell times (i.e., time spent in the state) over one year.
These results support previous theoretical and evidence-based frameworks of AVH based on static FC and network analysis, highlighting aberrant (i.e., reduced) anti-correlation of the DMN and other networks such as the auditory and language networks ( Northoff and Qin, 2011 ;van Lutterveld et al., 2014 ). The latter might promote abnormally heightened involvement of auditory processing areas even in the absence of task demands. This in turn may result in source-attribution deficits, i.e., the attribution of internally generated stimuli to an external source, hence, the experience of hallucinations. While these findings suggest DMN instability as a potential marker of trait hallucination proneness, the directionality of effects and the contribution of other RSN interactions remain inconsistent ( Northoff and Qin, 2011 ;Waters et al., 2012 ;Weber et al., 2020 ).

Limitations of the fMRI sliding window analysis
The reviewed findings suggest that the sliding window approach unveiled critical aspects of FC abnormalities in AVH. The high spatial resolution of fMRI yields high reproducibility of results with great withinand between-subject consistency ( Fox et al., 2005 ) and is therefore particularly suited for precise localization of (dys-)functional brain areas. However, the method has several considerable limitations. Commonly, Fig. 1. Sliding window analysis. Note. A . fMRI BOLD time courses. The different colors in the time series represent activity from different brain areas. B . A fixed time window is shifted in time by a predefined number of data points, defining the amount of overlap between successive windows. Covariance matrices, capturing the connectivity strength between different brain areas, are obtained based on the data within each window separately. C. The covariance matrices are grouped using a clustering algorithm, e.g., k-means. D. The resulting clusters are fitted back to the original time series to obtain a sequence of transient connectivity states.
sliding window analyses cluster covariance matrices into different connectivity states a posteriori. The temporal evolution of network fluctuations is therefore not considered during the clustering procedure itself but is only indirectly inferred when the obtained states are projected back to the time series data. In other words, this method provides connectivity metrics that are computed separately for each time interval at several moments of a recording period, hence assuming temporal independence of time windows. Consequently, employed metrics may capture rather static properties of FC, hence permitting only limited conclusions about the true temporal variation of network states. This may be especially problematic if the timescale of FC fluctuations does not match the sliding window size ( Hutchison et al., 2013 ;Liuzzi et al., 2019 ).
Another key disadvantage of the sliding window approach concerns the need to predefine a fixed window size (i.e., a fixed number of data points), which inevitably dictates the timescale and sensitivity of subsequent analyses. The length of the sliding window is commonly derived to maximize the signal-to-noise ratio (SNR). In the context of BOLD fMRI, the window size should be long enough to fully capture the slow hemodynamic response and to ensure robust estimation of the SNR, while being narrow enough to sensitively detect quickly fluctuating patterns of network connectivity. Hence, the sliding-window approach indirectly assumes that the RSN dynamics are sufficiently captured by the fixed window size. However, Liuzzi et al. (2019) showed that mere variability in connectivity metrics can be an artifact of a short window length and does not necessarily provide evidence for underlying network dynamics. They further demonstrated that sliding windows techniques are particularly limited in detecting fast connectivity patterns, i.e., brain states with very short durations, even in the case of short window lengths ( Liuzzi et al., 2019 ). Thus, it remains questionable whether fixed temporal windows account for functional aberrancies underlying a, by definition, fluctuating phenomenon such as AVH of which the timescales are largely unknown.
Lastly, a more general concern goes beyond the sliding window analysis as such and is related to the low temporal resolution of fMRI. Hemo-dynamic fluctuations measured with RS fMRI are slow and yield only an indirect measure of neural activity ( Hutchison et al., 2013 ). Consequently, fMRI lacks the required temporal sensitivity to monitor the dynamically evolving connectivity patterns and corresponding cognitive operations. In contrast, M/EEG are direct measures of the activity of large neural assemblies, providing temporal precision at the sub-second level ( Hutchison et al., 2013 ). It has been frequently shown that the brain rapidly switches between distinct connectivity patterns, which can only be captured by sufficiently high temporal resolution techniques as well as computational algorithms accounting for sub-second nonstationarity ( Baker et al., 2014 ;Vidaurre et al., 2016 ;Woolrich et al., 2013 ).

EEG microstate analysis
The EEG microstate analysis has been frequently employed to detect changes in the topography of RS EEG data, and to investigate the relationship between these topographies and clinically relevant betweensubject differences that predict behavioral abnormalities and pathology. Microstates are short periods of quasi-stable scalp potential fields with varying amplitudes across time that are assumed to be generated by an underlying network of simultaneously active sources ( Lehmann et al., 1998 ;Michel and Koenig, 2018 ). Microstates are commonly identified by i) calculating the Global Field Power (GFP) and extracting local maxima ii) plotting the scalp activity distribution at GFP peaks and clustering them using k-means, and iii) assigning the original activity maps to one of the microstate classes based on spatial similarity, i.e., the class showing highest correlation with the original scalp topography. Finally, the data is re-expressed by a series of microstates (see Fig. 2 ). The GFP peaks are of particular interest as they represent the highest field strength and SNR .
There is a consensus regarding a four-cluster microstate solution to represent the EEG time course. The four typical microstate clusters labeled A-D, seem to correspond to fMRI RSNs associated with phono- Fig. 2. EEG microstate analysis. Note. A. Pre-processed multi-channel RS EEG time series data. The different colors in the time series represent different channels. The pink dots mark the local GFP peaks. B. Calculation of Global Field Power (GFP) and extraction of local maxima, representing instances of highest field strength and SNR. C. The scalp activity distribution at GFP peaks is plotted and subsequently clustered using k-means based on their topographical similarity. D. The original maps obtained from the GFP maxima are assigned to the microstate cluster with the highest spatial correlation. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) logical processing, the visual network, the salience network, and the attention network, respectively Yuan et al., 2012 ). Alternatively, Pascual-Marqui et al. (2014) suggested that the classic EEG microstates represent fragments of the metabolically derived DMN, indicating that the fMRI RSNs might represent a low-pass filtered version of more quickly fluctuating electric neural activity. Further, it is proposed that microstates characterize qualitative aspects of cognition and represent spontaneously fluctuating and task-independent brain activity Lehmann et al., 1998 ).

EEG microstate analysis applied to AVH
The classic EEG microstate analysis has been applied in different clinical populations, yet most of the microstate literature is concerned with ScZ. Changes in temporal features of microstates (e.g., shorter durations or aberrant inter-class transitions) align with ScZ characteristics such as abnormal FC, decreased functional organization, and disturbed information processing ( Lehmann et al., 2005 ). Further, shortened microstate durations are particularly associated with positive ScZ symptoms such as hallucinations ( Nishida et al., 2013 ;Strelets et al., 2003 ). Moreover, between-state transition patterns tend to differ between ScZ patients and healthy controls, emphasizing disturbed network interaction and information processing. This may favor source misattribution, which is assumed to underlie positive ScZ symptoms ( Kapur, 2003 ;Lehmann et al., 2005 ). Likewise, ScZ patients and controls show different microstate topographies, but results remain controversial ( Lehmann and Skrandies, 1980 ;Strelets et al., 2003 ). A meta-analysis by Rieger et al. (2016) revealed medium-sized effects for microstate C and D in ScZ patients. While cluster C microstate occurred more frequently, microstate D showed significantly shorter durations, suggesting that the underlying temporal evolution of RSNs is different than in healthy controls. This finding was reinforced by showing that antipsychotic medication normalizes the temporal patterns of microstates C and D ( Kikuchi et al., 2007 ).
Importantly, shorter microstate durations are characteristic of the presence of AVH and are independent of a ScZ diagnosis ( Kindler et al., 2011 ). It was proposed that the premature termination of these microstates reflects underlying RSN disruptions, facilitating the misattribution of self-generated sensory events (e.g., speech, thoughts) to external sources, thereby leading to AVH. Additionally, Andreou et al. (2014) reported that EEG microstate alterations are already characteristic of the prodromal ScZ stage, suggesting that abnormal network communication, represented by an altered microstate profile, is associated with an increased risk for psychosis. This supports our conceptualization of AVH and other psychotic symptoms as a continuum ( Johns et al., 2014 ;Johns and Van Os, 2001 ). Considering that the psychosis continuum has been supported by cognitive-behavioral, electrophysiological, and neuroimaging studies, it appears plausible that non-clinical AVH would show a similar but attenuated version of the state differences observed in AVH + ScZ patients and individuals in the prodromal stage ( Andreou et al., 2014 ;Kindler et al., 2011 ;Nishida et al., 2013 ). However, this assumption remains to be empirically tested and validated.

Limitations of the EEG microstate analysis
The extensive body of research available on classic EEG microstates applied to FC abnormalities in ScZ and positive psychotic symptoms such as AVH has obtained quite consistent results. The microstate approach is also used to investigate functional changes in dementia ( Grieder et al., 2016 ), panic disorder ( Kikuchi et al., 2007 ), and narcolepsy ( Drissi et al., 2016 ). While these microstate profiles are distinct from those reported in ScZ patients, other disorders such as dementia, panic disorder, and narcolepsy seem to be characterized by very similar microstate patterns (i.e., decreased occurrence of microstate cluster D and changes in A and B). This may indicate that microstate features (e.g., occurrence) alone are insufficiently informative to reliably differentiate between distinct clinical profiles.
Moreover, the temporal dynamics of EEG microstates are inferred a posteriori, and the topographies obtained for the timepoints at GFP peaks are clustered based on their spatial similarity only. This means that the temporal sequence of FC fluctuations is essentially disregarded, yet artificially generated when fitting the microstate clusters back to the original time series. Further, the microstate time course is inferred under the restricting assumption that state switching exclusively happens at the local maxima of the GFP ( Rukat et al., 2016 ). Thus, the temporal information between two GFP peaks is dismissed. However, RSN activity yields richer dynamic properties than currently shown by the microstate analysis. We emphasize that it is important to character-ize brain connectivity states based on their temporally fluctuating and switching behaviors, i.e., dFC, and not based on their topographical similarity at the sensor level alone. Lastly, while most research has reported seven ( Seeley et al., 2007 ) or more RSNs from fMRI ( Damoiseaux et al., 2006 ), the EEG microstate literature limits itself to 4 clusters. Therefore, the microstate solution may yield a simplified representation of underlying RSNs.

Integration of the fMRI sliding window and EEG microstate analyses
Several studies proposed a functional link between the EEG microstates and the well-known BOLD-fMRI ICA-derived RSNs ( Bréchet et al., 2018 ;Britz et al., 2010 ;Musso et al., 2010 ;Yuan et al., 2012 ). However, these studies derive different conclusions, challenging a one-to-one match between the normative microstate classes and brain networks. Abreu et al. (2021) investigated the neural underpinnings of EEG microstates by quantifying their ability to predict dFC states obtained by means of fMRI sliding windows. The results indicate a high accuracy of EEG microstates based on time-frequency decomposition predicting dFC states obtained from fMRI sliding windows. The results further suggest that the fMRI dFC states were best characterized by a combination of microstates, challenging the previously hypothesized matching of the four microstate classes to distinct RSNs Michel and Koenig, 2018 ).
Taken together, the fMRI sliding window and EEG microstate approaches aim to assess transient connectivity states and their relation to behavioral and cognitive changes. Although differing considerably in their spatial and temporal resolution, both methods follow a similar logic: First, a particular feature of interest is extracted from the data. While the sliding window approach to fMRI time series extracts the correlation matrices reflecting FC between different brain areas within a pre-defined time window, the EEG microstate analysis extracts the topographies at GFP peaks, which are assumed to reflect periods of highest field strength. Subsequently, the features are clustered using k-means according to a similarity criterion. Lastly, the temporal evolution of the feature clusters (i.e., connectivity states or microstates) is inferred a posteriori by matching the obtained clusters to the time series data by assigning the state with the highest correlation to each point of the time series. State abnormalities in diagnosed ScZ patients may be prominent enough to show in both approaches Geng et al., 2020 ;Lehmann et al., 2005 ;Nishida et al., 2013 ;Weber et al., 2020 ). However, this can be problematic for two reasons. Firstly, even though group differences are significant, they provide insufficient evidence for sub-second dynamic abnormalities contributing to the phenomenon of AVH, due to i) the low temporal resolution of fMRI and ii) the discussed methodological limitations (e.g., fixed window size, restriction to GFP peaks, post-hoc clustering of FC matrices/topographies). Secondly, more subtle nonstationary changes in brain dynamics that may be indicative of non-clinical voice-hearing, i.e., AVH as an isolated phenomenon, may remain undetected. In conclusion, the suitability of fMRI sliding window analyses and EEG microstates to derive reliable conclusions about the true temporal dynamics of network connectivity in the context of AVH remains to be further specified.

Probabilistic generative models -hidden (Semi -)Markov models
Hidden (Semi-)Markov Models (H(S)MMs) are generative probabilistic models that describe the brain as a hybrid dynamic system (HDS), which exhibits discrete and continuous dynamic behavior. The first underlying, non-observable (hidden), and discrete process defines the sequence of recurring brain states, which "emit " (generate) the continuous time series data, i.e., the observations ( Yu, 2010 ). Each state is characterized by distinct statistical properties (i.e., means and covariances) of the data, and changes of these properties mark the points in time at which a state switches to another ( Baker et al., 2014 ;Quinn et al., 2018 ;Trujillo-Barreto et al., 2019 ).
Recently, Bayesian formulations of the HMMs have successfully been applied to M/EEG and fMRI data ( Baker et al., 2014 ;Hunyadi et al., 2019 ;Vidaurre et al., 2018 ). Unlike sliding window estimations, H(S)MMs estimate the state time courses in a self-contained manner and thereby overcome the temporally restricting assumptions underlying sliding windows or the microstate analysis ( Woolrich et al., 2013 ). While the H(S)MM states are estimated at the group level (i.e., the states are fixed and common for all subjects), subject-specific state features can be directly obtained from the state sequences. This provides insight into the temporal dynamics of individual connectivity patterns and allows between-subject comparisons. To this end, different summary statistics can be computed, including i) state occurrences, the number of times a state is visited, ii) fractional occupancy, the percentage of time that is occupied by a state, iii) state duration, also frequently referred to as dwell time, and iv) the probabilities of transitioning between the states ( Trujillo- Barreto et al., 2019 ).
H(S)MM parameters are learned directly from the input data during the training process. The models comprise three main components: The emission/observation model governs the sequence of observations generated by the underlying latent variables and can be adjusted according to the assumptions about the hidden discrete process ( Baker et al., 2014 ). Each state's observation model defines a probability distribution from which the observed EEG data is drawn whilst the system is in that current state. These observations are assumed to be normally distributed, i.e., modeled by a multivariate Gaussian observation model. Each Gaussian state is characterized by a unique topography (state map), which is bound to the state's mean activation and covariance, which can be interpreted as a network matrix of functional connectivity . The initial state model defines the probability of the sequence beginning in a particular hidden state and generating corresponding observations. Lastly, the transition model captures the probabilities of transitioning from a state to any other state, i.e., the points at which the underlying data distribution changes.
Although the main assumptions of the HMM and HSMM are similar, they differ in how they consider the between-state transitions and the activation time of each state (state durations). Standard HMMs assume that at each time point, an active brain state emits one observation at a time, which is a direct consequence of the strictly Markovian betweenstate transitions. This further implies Geometrically distributed state durations (i.e., the probability decreases as the state duration increases). The HSMM is a more flexible adaptation of the HMM. As implied by its name, the HSMM does not assume the state sequence to be strictly Markovian but allows the underlying process to be a semi-Markov chain. Hence, in contrast to the standard HMM, the HSMM considers each state as a "super-state " emitting a sequence of observations. Fig. 3 schematically depicts the difference between the HMM and the HSMM and shows how the hidden states relate to the sequence of observations given the respective assumptions about state durations.

Explicit modeling of the state durations
In the HSMM case, the state durations are determined by the number of successive observations that are generated by the currently active state, which allows longer and more realistic periods of state activation. Thus, in addition to the already introduced emission model, initial state model, and transition model, the HSMM comprises a duration model as a fourth component, enabling explicit modeling of the state duration distribution ( Trujillo- Barreto et al., 2019 ).
Explicit modeling of the state durations means that their probability distribution can be determined according to realistic assumptions, i.e., to fit the data as closely as possible. This implies that state durations are not dependent on the transition model (as in the HMM case) but are drawn from an explicitly defined probability distribution. First applications of the HSMM to RS EEG data provide evidence for increased Fig. 3. Schematic comparison of the HMM and HSMM. Note. A. HMM. State S1 transitions to state S2 with a probability of a12 or transitions to itself with a probability of a11 (auto-transition). Each time S1 is active, it emits one observation oS1. B. HSMM. Each state emits a random number of observations (i.e., observation sequence o1 to od1). d1 refers to the duration of S1 and is defined as the number of successive observations emitted by S1. The state durations are explicitly modeled by the duration model, here depicted as a lognormal probability distribution for each state, e.g., p(d1). S1 transitions to S2 with a probability of a12, while auto-transitions are not permitted. model fit for lognormally (instead of normally or geometrically) distributed state durations ( Trujillo-Barreto et al., 2019 ). Fig. 4 visualizes the difference between the HMM and HSMM (with a lognormal duration model) regarding the empirical state durations and transition probabilities. Previous models of brain functioning suggested that structural and functional brain parameters such as synaptic weights, neural firing rates, axon diameters, and the synchronous discharge of large neuron groups follow a normal (i.e., Gaussian) distribution. Yet, those parameters, as well as the amplitude fluctuations of neural oscillations, tend to vary by several orders of magnitude and their distributions are rather heavy-tailed (e.g., log-normally distributed), which has fundamental implications to structural and functional brain organization ( Buzsáki and Mizuseki, 2014 ;Roberts et al., 2015 ). The log-normal distribution can be found at many hierarchically connected levels of the brain. For instance, at the synaptic level, strong synapses and fast-firing neurons constitute a small minority, yet they substantially contribute to overall brain activity and reliable information flow. Meanwhile, most neuronal synapses are attached to more slowly firing neurons, transmitting information in a more energy-efficient way ( Buzsáki and Mizuseki, 2014 ). Similarly, axon diameters are log-normally distributed. Axons with larger diameters typically constitute inter-hemispheric connections, while smalldiameter axons are most common in frontal cortices. The former allows fast neural communication between nodes and tends to comprise frequently and strongly firing neurons ( Innocenti et al., 2014 ;Wang et al., 2008 ). Taken together, it seems that various brain functions on microto macroscopic levels, including local circuits and the brain connectome are lognormally distributed. This suggests that the lognormal distribution may be the result of experience-dependent and genetically defined modifications. It is therefore reasonable to assume that dynamic network connectivity patterns and associated cognitive functions follow the same rule, which may partly determine the activation time of a given network until the brain switches to another. Hence, lognormally distributed state durations mirror this behavior at yet another hierarchically connected level ( Buzsáki and Mizuseki, 2014 ). Crucially, highly sensitive, yet flexible computational models, such as the HSMM can account for such fundamental properties of brain structure and function ( Trujillo-Barreto et al., 2019 ).

Applications of the H(S)MM to AVH
Research focusing on sliding windows and EEG microstates points towards aberrant connectivity among and within large-scale brain networks at rest, particularly involving the DMN ( Andreou et al., 2014 ;Geng et al., 2020 ;Manoliu et al., 2014 ;Weber et al., 2020 ). Supportive evidence was obtained by Kottaram et al. (2019) , who investigated differences in brain dynamics between ScZ patients and healthy controls using an HMM to model fMRI BOLD hemodynamics in RSNs. The inferred set of 12 brain states represented recurring patterns of activation among RSNs. ScZ patients exhibited aberrant state transition in and out of a state identified as the DMN. Further, the DMN was less frequently activated; however, once active, it remained active for an abnormally long period. These periods were further associated with decreased activity in sensory networks. Moreover, the mean duration of a brain state characterized by increased sensory network activity combined with low activation in cognitive control networks was significantly higher for ScZ patients than for controls. Importantly, these functional changes were positively correlated with the severity of positive symptoms, e.g., hallucinations and delusions. Lastly, aberrant state dynamics were predictive of an individual's clinical status, which highlights the model's ability to guide early risk detection. The study conducted by Kottaram et al. (2019) is the first to apply an HMM to investigate temporal network connectivity changes in ScZ. Following previous research, they established a link between aberrant DMN dynamics and positive psychotic features, confirming that ScZ links to disturbed temporal interaction between neurocognitive networks ( Geng et al., 2020 ;Weber et al., 2020 ). The timescale of the obtained metrics (e.g., state durations), however, is bound to the low temporal resolution of the employed imaging modality, i.e., fMRI. Hence, sub-

Note. A. Training Input.
Pre-processed (filtered or broad-band) and concatenated multichannel RS EEG time series data (input to the model training). Different colors represent different channels. B. Training. The EEG time series are modeled as a finite set of sequentially recurring spatially and temporally distinct states (clear circles) labeled S1, S2, …, SN (N representing the total number of state visits). In the HSMM, each active state emits a random number of observations (filled circles). In the HMM, each active state emits one observation only. C. Transitions & Durations * . In the HSMM, auto-transitions are prohibited (i.e., diagonal elements are equal to 0) and the state durations are explicitly modeled as lognormal probability distributions. The HMM allows auto-transitions (i.e., the diagonal elements are close to 1), which favors short durations and fast state switching. In both duration plots, the different colored lines represent different states. D. State sequence. Based on the trained model parameters, the states are allocated to the EEG time series. The resulting state sequence represents the recurring pattern of state activity across the whole data recording time and can be used to directly compute statistics for between-group or -subject comparisons. E. State Maps. The state maps represent the most representative topographies for each state S1 to S4, which are directly computed from the state's mean activation (amplitude) and covariance captured by the multivariate Gaussian observation model. * Transition probabilities and state durations depicted here are based on simulated data and serve illustrative purposes only (i.e., the state durations might look different for real RS M/EEG data). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) second fluctuations of network FC likely remained undetected. Moreover, this study did not specifically focus on AVH, but only assessed relationships between the state metrics and the patients' score on the PANSS, a clinical measure of positive and negative symptoms of ScZ. Nevertheless, this finding reflects the usefulness of the HMM to identify markers for pathology with increased temporal sensitivity for network fluctuations.
To date, no study has investigated HSMM-derived brain state dynamics linked to any clinical phenomenon, including AVH. In the following, we will summarize recent applications of the HSMM to RS M/EEG as well as RS and task-based fMRI data to illustrate the HSMM's advantages for the modeling of state characteristics.
Trujillo-Barreto et al. (2019) directly compared the performance between HMM and HSMM based on simulated and resting-state EEG data Table 1 Common H(S)MM state metrics, their definitions, and example interpretations of brain state changes.

Fractional occupancy (FO)
Relative data time occupied by a state given as a percentage.
Different network connectivity patterns likely compete with each other. The relative time of a brain state representing a distinct and transient connectivity profile may provide insights into network dominance and (in)flexibility. Thus, increased or decreased state FO could indicate perseverance or instability of network activity, respectively, both contributing to cognitive dysfunction. State occurrences (SO) The total number of times a state is visited during the data recording time.
Distinct network connectivity patterns are assumed to recur sequentially in a dynamically fluctuating, yet predictive way. SO likely covaries with state FO (i.e., a higher occupancy links to more frequent state visits) Given a sufficient data recording time, and in combination with other metrics (e.g., FO and TPM), SO can provide critical information to predict activity sequences and abnormalities thereof. State durations (SD) The probability distribution of state durations throughout the data recording time. Each duration is calculated as the number of successive timepoints a brain state is active.
RSNs alternate between periods of activation and deactivation with fluctuating strengths and directionality of functional connections. Once active, a network will stay active for a variable period of time to serve a cognitive function. The temporal interaction of RSNs further preserves an 'equilibrium state' (i.e., baseline excitability of the brain). Therefore, the activity duration can inform about network engagement and stability. For instance, increased duration of a state representing high within-DMN connectivity may suggest an enhanced focus on internal processing and increased mind-wandering, thereby suppressing anti-correlated, e.g., attention-related networks. Transition probability matrix (TPM) Probability of switching from a particular state to any other state represented in a k x k matrix, with k being the number of states in the model.
The TPM yields information about the frequency and nature of network interactions. A high probability of switching between two particular brain states could hint towards a close temporal interplay between two functionally related (sub-)networks or critical hubs within a network. In turn, aberrant transition patterns could link to a deficient use of network resources, facilitating cognitive errors, e.g., the misattribution of internal sensory events to external sources, which may manifest in hallucinations. and assessed differences between a log-normal versus a normal distribution implemented into the HSMM duration model. Their approach revealed that the semi-Markov chain (as opposed to the classic Markov chain in the HMM case) implemented in the HSMM framework accurately estimates the properties of empirically observed brain states such as non-Markovian/non-geometric duration distributions (i.e., heavy tailed), cycling transitions, and non-stationarity. Thus, by i) encoding the whole data covariance structure and estimating the statistical properties of the time series in terms of probabilistic generative states, ii) explicitly defining the state duration probability distribution according to realistic assumptions, and iii) estimating the between-state transition probabilities accordingly, the HSMM provides a suitable model of nonstationary neural time series.
When directly comparing HSMMs with a lognormal or a normal duration distribution, the HSMM with lognormal state duration distribution showed a better model fit (evaluated by means of nFE) as compared to the normal brain state durations, supporting Buzsáki's conceptualization of the log-dynamic brain ( Buzsáki and Mizuseki, 2014 ). In summary, accurate modeling of the state durations is essential for correctly characterizing non-Markovian EEG dynamics. Hence, the explanatory power and ability to accommodate true brain dynamics of the HSMM justify the increased complexity due to additional parameter estimation (i.e., the parameters of the duration model) and outperforms other approaches.
Similar to the standard HMM, the HSMM can be applied to taskdependent data and other data modalities. Shappell et al. (2019) proposed an HSMM approach to infer the temporal dynamics of brain networks from RS and task-fMRI data. In their framework, a brain state represents a unique FC pattern involving five ROIs that sequentially repeats over time. Their findings confirmed that the HSMM solution of the task-based data perfectly aligned with the experimental design. Specifically, most participants' state sequence originated in state 2 representing the fixation period, transitioned to state 1 during the task instruction, and transitioned back to state 2 during the final fixation period. These results demonstrate that the HSMM adequately models task demands and corresponding variation in network connectivity in an unsupervised manner, i.e., without prior information about the task design. A comparison of these results with those obtained from a standard HMM indicated that the explicit estimation of the state durations provided by the HSMM fit the data very well and outperformed the HMM solution. Lastly, Shappell et al. (2019) modeled a 12-state HSMM based on RS fMRI data and reported evidence for a close relationship between HSMM state features and cognitive functioning. Participants with lower sustained attention scores tended to dwell shorter in 3 (out of 12) states and transitioned quicker into other states as compared to those with high sustained attention scores. Importantly, those states are assumed to represent parts of the DMN and the dorsal/ventral attention network. Taken together, the results provide empirical support for perturbed dynamics of RSNs and their whole-brain connectivity state signatures are indicative of cognitive and behavioral dysfunction. As argued in the previous sections, the temporal sensitivity of the method is strictly dependent on the data used for analysis and fed into the model. Hence, the temporal accuracy of the modeled brain state characteristics using an HSMM is highest when combined with high temporal resolution measurements, like M/EEG instead of fMRI. Table 1 provides a summary of commonly analyzed brain state metrics, their definitions, as well as example implications of brain state alterations. We build on the premise that changes in H(S)MM brain states indicate disturbed network functioning and link to cognitive disturbances ( Kottaram et al., 2019 ;Shappell et al., 2019 ;Vidaurre et al., 2017 ). Importantly, each metric may uniquely inform about network functioning in a more detailed manner. Note that some metrics might covary with each other, as demonstrated by Trujillo-Barreto et al. (2019) . For instance, a state occupying a relatively large fraction of the recording time will likely be associated with a high number of state occurrences, i.e., more frequent visits to this particular state. Yet, this relationship does not always hold. It is plausible that high state occupancy links with few state occurrences but long state durations. Hence, a thorough quantitative assessment of all state metrics warrants a comprehensive evaluation of dynamic network functioning. As empirical evidence of the HSMM is lacking, future research is needed to assess these brain state metrics using the HSMM in healthy voice-hearers to isolate brain dynamics associated with AVH and further confirm the psychosis continuum hypothesis on the level of dFC.

Integration of the fMRI sliding windows, EEG microstates, and H(S)MM
When comparing the fMRI sliding window approach and the H(S)MM, the most critical difference concerns the time scale of connectivity fluctuations that can be captured by both approaches. The sliding window technique relies on the a priori specification of the window length and is therefore restricted to temporal fluctuations that match this particular window. Consequently, FC signatures that are considerably slower or faster than the window length might be either disregarded or merged together ( Zhang et al., 2019 ). The H(S)MM encodes the whole time series and thereby learns the relevant timescales of the data directly during the training process, i.e., it can capture temporal dynamics as fast as the data modality allows. This implies that the H(S)MM approach can account for non-stationary FC of a priori unknown timescales, while the sliding window approach does not ( Vidaurre et al., 2017 ;Zhang et al., 2019 ).
To fully exploit the temporal sensitivity of the H(S)MM, it is straightforward to focus on its application to high temporal resolution data modalities such as M/EEG. Rukat et al. (2016) compared results obtained from the EEG microstate analysis as described above and from an HMM, both applied to RS EEG data from six healthy participants. They found that a four-state solution for both approaches shows similarities in the states' average durations and topographies. However, temporal switching patterns of the brain states, as expressed by the transition probabilities, were different. For instance, the microstates' time courses displayed a strong positive pairwise temporal correlation, whereas the HMM state time courses displayed more varying and sometimes negative pairwise correlations. If we assume that brain states reflect distinct network connectivity profiles (e.g., underlying RSNs) and are thereby associated with different, supposedly temporally independent, cognitive operations, the pattern displayed by the microstate analysis unrealistically reflects true brain dynamics ( Smith et al., 2012 ). Contrastingly, the HMM provided a different temporal profile, suggesting that the HMM integrates critical temporal information that the microstate analysis fails to incorporate ( Rukat et al., 2016 ). The explicit comparison between the microstate analysis and the HMM outlined above was based on a small sample size, therefore, replication of such qualitative and methodological comparisons with a larger dataset would warrant more reliable conclusions about the superiority of either approach.
More recently, Coquelet et al. (2021) used simultaneous M/EEG RS recordings to investigate the reproducibility and robustness of brain states across different modalities (MEG vs. EEG) and analysis methods (microstate analysis vs. HMM). Their results support that GFPbased microstates and HMM states computed on the signal envelope show spatially and temporally different characteristics. While the microstates were not reproducible across the two modalities, the M/EEG-HMM states reflected transient activity in similar functional networks. This provides evidence for the high stability and reliability of the HMMderived conclusions about underlying network interactions. The authors further emphasize that the microstate approach is i) biased towards the GFP local maxima and ii) largely dominated by the alpha frequency, and therefore should be interpreted as "moments of high-amplitude alpha rhythms within alpha bursts " ( Coquelet et al., 2021 ). Contrastingly, they provide evidence that the HMM is more sensitive to state transitions and the temporal structure of the time series and is driven not only by power fluctuations but also functional connections (i.e., crosscovariance; Coquelet et al., 2021 ).
We can conclude that the two approaches are conceptually different, and therefore, reflect distinct data features. The microstate analysis solely assesses the evolution of quasi-stable topographies (i.e., the clustering is based on topographical similarity). The H(S)MM relies on the temporal covariance structure of the data when estimating the model parameters, thus encompassing information about functional network connectivity ( Baker et al., 2014 ;Coquelet et al., 2021 ;Woolrich et al., 2013 ). Taken together, the two methods should be considered complementary, i.e., suitable to investigate different aspects of neural time series data with a different degree of sensitivity to temporal variability. If we aim to identify subtle dynamic changes in RSN connectivity, the preferred approach should be the one with higher temporal sensitivity. This may be particularly important in the context of a fluctuating phenomenon with varying intensity, such as AVH.

Limitations of the H(S)MM
So far, we emphasized that generative probabilistic models such as the HMM and HSMM are better suited to assess brain state dynamics as compared to the fMRI sliding window technique and the EEG microstate analysis. The HMM's limitation concerns the Markovian state switching and related Geometric duration distribution. Another related key limitation of the HMM's underlying assumptions concerns the Markovian memorylessness insinuated by the Geometric state duration distribution. The concept of memorylessness implies that the time between entering a state and switching to another state is independent of the time already spent in the current state. In other words, the probability to switch to another state independent from the path the brain took to get to the current state, i.e., it is strictly Markovian. This assumption fundamentally contradicts the long-range correlation, i.e., temporal dependence of M/EEG data ( Buzsáki and Mizuseki, 2014 ;Roberts et al., 2019 ). Van de Ville, Britz, and Michel (2010) provided evidence of (micro-)state durations critically influencing long-range dependencies of EEG time series, while the order of the state sequence does not. This finding highlights the importance of an accurate duration model.
The HSMM overcomes these limitations by allowing explicit modeling of the state durations as elaborated in Section 3.1 . Despite the HSMM's convincing advantages over previously described state segmentation methods, its additional value in clinical applications still needs to be established. As research on AVH using the HSMM is currently lacking, our evaluation is based on only a few applications to either simulated, RS EEG data, and task-based fMRI data of healthy participants ( Shappell et al., 2019 ;Trujillo-Barreto et al., 2019 ). These studies supported the theoretical advantages of the HSMM, however, the current lack of H(S)MM applications warrants caution about the generalizability of their underlying assumptions and the neurophysiological significance of the resulting brain states. Thus, it is indispensable to acquire more empirical evidence of the HSMM's performance in healthy and pathological contexts. Considering that the underlying temporal dynamics of RSNs and their contribution to AVH are still elusive, the HSMM may provide further critical insights into the sub-second interaction of functional networks.
Lastly, despite their flexibility in choosing prior distributions according to theoretically valid assumptions about the data, model-based approaches can be time-consuming and computationally expensive. To increase model robustness and reliability of post-hoc state metrics (e.g., fractional occupancies), the training usually requires many iterations of the underlying algorithm, which can be particularly problematic when employing large sample sizes with many data points per data set.

Discussion and future directions
Despite a large body of research on RSN FC in various populations, precise temporal dynamics underlying healthy brain function and pathological phenomena remain elusive. We consider the brain as a dynamic system that continuously and systematically transits in and out of distinct connectivity states. In this review, we provided an overview and qualitative evaluation of a few commonly used methods to segment neural time series data into a sequence of discrete brain states and link the states' temporal dynamics to AVH in clinical (ScZ) and non-clinical samples (healthy voice-hearers), including the fMRI slidingwindow approach, EEG microstate analysis, and two generative modeling approaches, the HMM and HSMM. Table 2 provides a summary of the methods' main differences and constraints.
Both fMRI sliding window and EEG microstate analyses have greatly advanced our understanding of RSN activity in various disease contexts ( Damoiseaux, 2012 ;Greicius, 2008 ;Song et al., 2019 ;Weber et al., 2020 ). Research has demonstrated that the dynamic interaction between core RSNs, including the DMN, SN, and Central Executive Network (CEN) is a fundamental characteristic of healthy brain function. Instability of these dynamics (e.g., reduced anticorrelation between the DMN and CEN) disrupts whole-brain network communication, facilitates neurocognitive errors, and manifests in aberrant behavioral performance ( Geng et al., 2020 ;Manoliu et al., 2014 ;Menon, 2011 ;Weber et al., 2020 ). Likewise, EEG microstates yielded reproducible group differences between ScZ patients (and other patient populations) and healthy controls, that are indicative of aberrant RSN func- tion ( Andreou et al., 2014 ;Lehmann et al., 2005 ;Nishida et al., 2013 ;Strelets et al., 2003 ). Both approaches further provided evidence for network changes linked to AVH based on studies either comparing ScZ patients with and without AVH, or by investigating AVH in non-clinical populations ( Andreou et al., 2014 ;van Lutterveld et al., 2014 ). The continuity of psychotic experiences is supported by electrophysiological and neuroimaging data, showing that some functional aberrancies linked to clinical and non-clinical AVH are similar but vary in their frequency of occurrence and severity Johns et al., 2014 ;Johns and Van Os, 2001 ;van Lutterveld et al., 2014 ). Consequently, hallucination-prone individuals in the general population might exhibit subtle functional changes (i.e., an attenuated version of those observed in clinical and non-clinical voice hearers). These changes include source misattribution, voice feedback and identity processing, abnormal functioning in the DMN and the auditory network, and alterations in white matter pathways subserving salience and attention processing ( Allen et al., 2006 ;Baumeister et al., 2017 ;Johns et al., 2014 ;Johns and Van Os, 2001 ;Johnson et al., 2021 ;Pinheiro et al., 2018 ;van Lutterveld et al., 2014 ). Higher severity and frequency of symptoms link to more prominent underlying network changes, and persistence of AVH are usually associated with a higher need for care ( Johns et al., 2014 ). Thus, already subtle changes in temporal network dynamics in healthy voice-hearers (or highly hallucination-prone individuals) can be indicative of a heightened risk of developing clinically relevant symptoms. However, the methodological constraints of some of the discussed approaches highlight that we still lack insights into sub-second RSN changes contributing to AVH. As a temporally fluctuating and unprovoked phenomenon, AVH require high-sensitivity methods that account for realistic assumptions about the dynamic brain.
We propose that the HSMM constitutes a strong candidate for gaining new insights into mechanisms contributing to disease markers that remained undetected by previous approaches. Firstly, the HSMM accounts for the temporal dynamics of RS activity by decoding the whole data covariance structure and finding the best fitting number of distinct hidden brain states. Thus, the HSMM i) does not rely on a predefined time window and ii) is not restricted to a predefined number of states ( Trujillo-Barreto et al., 2019 ). This is in clear contrast to the sliding window and EEG microstate approaches. The flexibility and temporal sensitivity of the HSMM are crucial for investigating phenomena like AVH, which are fluctuating in the absence of clear external triggers and varying in their severity along with increasing underlying dynamic neurocognitive changes.
Secondly, the HSMM offers flexibility for defining the model parameter distributions. Specifically, not only the observation model but also the duration model can accommodate realistic assumptions about the underlying distributions. In contrast, the HMM suffers from the unrealistic Geometric state duration distribution, which is a direct consequence of strictly Markovian state transitions. Here, we emphasized the importance of lognormally distributed brain parameters (e.g., axon diameters, synaptic firing rates) and their implications for overall cognitive functioning as discussed by ( Buzsáki and Mizuseki, 2014 ) and empirically demonstrated by Trujillo- Barreto et al. (2019) . Thirdly, and partly related to the previous point, the HSMM circumvents Markovian memorylessness. By modeling the state durations explicitly and more realistically, the assumptions underlying the HSMM agree with the temporal dependence of M/EEG data ( Buzsáki and Mizuseki, 2014 ;Roberts et al., 2019 ;Van de Ville et al., 2010 ).
Even though the HSMM overcomes several major limitations of other segmentation methods, the HSMM's assumptions (e.g., instantaneous state switching, common and fixed states) need further investigation and validation in healthy and pathological contexts. Considering nonclinical AVH in particular, it would be informative to correlate HSMM state metrics with behavioral assessments of hallucination proneness and the severity of psychotic symptoms (e.g., Launay Slade Hallucination scale; Positive and Negative Symptom Scale; Kay et al., 1989 ;Launay and Slade, 1981 ). Based on previous findings emphasizing an increased anti-correlation of the DMN and other RSNs ( Northoff and Qin, 2011 ;van Lutterveld et al., 2014 ), we would expect such dynamics to manifest in aberrant transition probabilities. For instance, HSMM states capturing the functional signatures of these networks may show altered switching patterns, e.g., a reduced probability to switch between each other as compared to other states.
Due to the quick switching behavior of brain states, high temporal resolution techniques are indispensable. It is important to emphasize that any RS data segmentation and analysis method can only identify dynamics at a resolution that is provided by the data modality. Hence, M/EEG should be the preferred measurement. Nonetheless, knowledge about the spatial sources of dynamic changes is critical e.g., to guide alternative treatments such as brain stimulation methods, which aim at noninvasively and painlessly manipulating network functioning such as transcranial magnetic stimulation (TMS; van der Werf et al., 2010 ) and transcranial direct current stimulation (tDCS; Kostova et al., 2020 ).
To this end, the joint modeling of fMRI and source-space M/EEG data using the HSMM, or combined applications of the HSMM and multivariate autoregressive (MAR) observation models, which can accommodate lagged dependencies between networks ( Chiang et al., 2008 ), may yield further support for aberrant RSN dynamics in various (sub-)clinical phenomena, including AVH.
In the past, brain state characterization using H(S)MMs was most commonly based on the (sensor-or source-space) amplitude envelopes of filtered or broadband M/EEG data ( Baker et al., 2014 ;Quinn et al., 2018 ;Trujillo-Barreto et al., 2019 ;Vidaurre et al., 2016 ). However, state dynamics can also be modeled based on other data features, e.g., cross-frequency coupling, as implemented in the dominant intrinsic coupling mode (DICM) framework ( Dimitriadis et al., 2019 ). Combining the DICM approach with semi-Markovian modeling of brain connectivity states could be an interesting and complementary step for future research. Alternatively, future research could focus on manipulating network function using psychopharmacological agents and the modeling of corresponding changes in brain state dynamics. Evidence suggests that common medication for various neuropsychiatric conditions (e.g., noradrenergic, anti-dopaminergic, or glutamate-modulating agents) alters RSN activity ( Cole et al., 2013 ;Metzger et al., 2016 ;Scheidegger et al., 2012 ). A finer-grained, i.e., higher temporal resolution characterization of these alterations, i.e., changes in state transitions or durations) could yield additional insights into alternative interventions.
Lastly, HSMM state characterization is particularly suited for, but not restricted to AVH. We expect additional benefits for the modeling of neurocognitive dynamics involved in other sensory phenomena such as bistable perception ( Dowlati et al., 2016 ;Kornmeier and Bach, 2006 ), neurological conditions such as epilepsy ( Centeno and Carmichael, 2014 ), and other neuropsychiatric disorders associated with RSN dysfunction, e.g., depression and dementia ( Menon, 2011 ). RS analysis methods with increased temporal sensitivity may help to identify and refine our knowledge about unique disease markers. More crucially, the temporal dynamics of brain states could inform about the early risk status of vulnerable individuals. Early intervention consistently leads to higher treatment success rates in many psychiatric disorders ( Bird et al., 2010 ;McGorry, 2008 ;Vieta et al., 2018 ). Identification of subtle connectivity changes and their respective temporal and spatial sources could thus facilitate selectively targeting affected individuals early in the disease process.

Conclusion
We conclude that the semi-Markovian properties of the brain require computational methods of investigation that build on realistic assumptions about the neural time series data (e.g., lognormally distributed state durations, long-range correlations) and are highly sensitive to dynamic network fluctuations. The HSMM is a promising temporally sensitive approach to characterize fast switching connectivity patterns, thereby providing innovative methodological possibilities for research on whole-brain network communication. Given that AVH are a transdiagnostic phenomenon experienced by individuals with psychotic disorders, Bipolar disorder, PD, and depression ( Bauer et al., 2011 ;Fénelon et al., 2000 ;Toh et al., 2015 ), elucidating common and differential mechanisms underlying hallucinations in these disorders will advance research and clinical care. Crucially, our reasoning likely applies to a variety of neuropsychiatric profiles that require a high temporal sensitivity to brain activity changes. Lastly, investigation of symptoms and corresponding neurocognitive changes at a subclinical level will lead to a holistic evaluation of individual risk and disease development, which in turn may benefit early prognosis, (differential) diagnosis, and intervention.

Data and code availability statement
Our submission to NeuroImage is a Review article. We did not perform any analysis and did not make use of data and/or code.