Neural responses in autism during movie watching: Inter-individual response variability co-varies with symptomatology

Naturalistic movie paradigms are exquisitely dynamic by nature, yet dedicated analytical methods typically remain static. Here, we deployed a dynamic inter-subject functional correlation (ISFC) analysis to study movie-driven functional brain changes in a population of male young adults diagnosed with autism spectrum disorder (ASD). We took inspiration from the resting-state research field in generating a set of whole-brain ISFC states expressed by the analysed ASD and typically developing (TD) subjects along time. Change points of state expression often involved transitions between different scenes of the movie, resulting in the reorganisation of whole-brain ISFC patterns to recruit different functional networks. Both subject populations showed idiosyncratic state expression at dedicated time points, but only TD subjects were also characterised by episodes of homogeneous recruitment. The temporal fluctuations in both quantities, as well as in cross-population dissimilarity, were tied to contextual movie cues. The prominent idiosyncrasy seen in ASD subjects was linked to individual symptomatology by partial least squares analysis, as different temporal sequences of ISFC states were expressed by subjects suffering from social and verbal communication impairments, as opposed to nonverbal communication deficits and stereotypic behaviours. Furthermore, the temporal expression of several of these states was correlated with the movie context, the presence of faces on screen, or overall luminosity. Overall, our results support the use of dynamic analytical frameworks to fully exploit the information obtained by naturalistic stimulation paradigms. They also show that autism should be understood as a multi-faceted disorder, in which the functional brain alterations seen in a given subject will vary as a function of the extent and balance of expressed symptoms.


Introduction
Naturalistic paradigms, in which the scanned volunteer is exposed to an ecological experimental design, have been gaining interest in functional magnetic resonance imaging (fMRI) studies over the recent years. Compared to traditional task-based recordings that alternate between task and rest epochs, they enable to probe brain function in more realistic settings, which has provided insights into our understanding of previously scarcely explored facets of cognition: this includes, for example, music perception with auditory stimuli (Toiviainen et al., 2014;Burunat et al., 2016); theory of mind and the understanding of emotional responses upon audiovisual stimulation (Hasson et al., 2004;Wolf et al., 2010;Lahnakoski et al., 2012;Karim and Perlman, 2017); or reward processing in cognitive control when playing a video game (Huskey et al., 2018).
Naturalistic stimulation is also increasingly seen as a possible replacement to resting-state recordings in some settings because of a better test-retest reliability, which enables more accurate fingerprinting of individuals based on functional connectivity Vanderwal et al., 2017;Wang et al., 2017). Further, watching a movie lowers head motion of the participants, easing paediatric studies (Vanderwal et al., 2018), while narrative stimuli have been suggested as a replacement solution for pre-surgical language mapping (Tie et al., 2015;Wilson et al., 2017).
Several methodological approaches have been used to study naturalistic paradigms: on the one hand, dedicated regressors have been employed to reveal the neural correlates of a function of interest through general linear models Kim et al., 2016;Karim and Perlman, 2017) or psychophysiological interaction analysis (Friston et al., 1997;McLaren et al., 2012;Huskey et al., 2018). Data-driven approaches extracting information without the need for prior knowledge have also been applied in the form of independent component analysis variants (Cong et al., 2014) or machine learning (Mandelkow et al., 2016;Hu et al., 2017).
Another popular data-driven approach assesses the similarity of regional time courses from separate subjects exposed to the same timelocked paradigm. If the activity of the same area is compared across subjects, one speaks of inter-subject correlation (ISC), which was pioneered by Hasson et al. (2004) in a movie watching experiment. If time courses from separate brain regions are compared across subjects, as introduced more recently (Simony et al., 2016), the technique is referred to as inter-subject functional correlation (ISFC). Specific advantages of cross-subject analyses are the damping of uncorrelated signal sources across subjects, such as head motion and overshadowing intrinsic activity changes, so that stimulus-driven fluctuations are more clearly revealed (Simony et al., 2016;Bolton et al., 2018). At the same time, these stimulus-driven responses do not need to be hypothesised. The assessment of response similarity through IS(F)C has advanced our understanding of clinical brain disorders, either through group-wise comparison to a healthy reference (Rosenthal et al., 2017;Tohka et al., 2018), or by uncovering the functional underpinnings of within-group symptoms' heterogeneity (Finn et al., 2018;Tei et al., 2018).
Although the potential of dynamic inter-subject measures has recently been demonstrated (Simony et al., 2016), even the most recent analyses often remain at a static level, where only one estimate is obtained for the whole duration of the stimulus-see for instance Finn et al. (2018); Tei et al. (2018); Salmi et al. (2019). This is somehow at odds with the highly dynamic nature of movies, in which a fast-paced sequence of diverse cues are sequentially presented. There are two deleterious consequences of static IS(F)C estimates: first, assuming a difference is found within or across groups, it becomes impossible to pinpoint which of the movie cues drove it; second, the detection of temporally localised differences in responses will likely be impeded by the presence of the other time points (lowered sensitivity). Accordingly, the development of dynamic inter-subject methods is emerging as a new frontier in the analysis of naturalistic paradigms (Bolt et al., 2018;Bolton et al., 2018).
Here, we wished to build on previous efforts of ours where we proposed a novel dynamic ISFC analytical pipeline extracting the moments of significant ISFC transients . The present work introduces a fully distinct method, which leverages state-based analysis, a popular resting-state dynamic functional connectivity approach-see Preti et al. (2017) for a review. We apply our new framework to compare the responses driven by an audiovisual documentary in typically developing (TD) volunteers, and in subjects diagnosed with autism spectrum disorder (ASD).
ASD is a particularly well-suited condition for ISFC investigations based on a naturalistic movie: indeed, associated behavioural symptoms include hypo/hyper-responsiveness to sensory stimulation. In addition, as we consider a documentary for youngsters centred on social interactions and (non)verbal communication between all protagonists, social impairments and altered communication-other cardinal symptoms of ASD-can also readily be addressed. Accordingly, previous movie-based studies have evidenced a greater idiosyncrasy in induced responses in ASD (Hasson et al., 2009), and its correlation with the degree of social impairments (Byrge et al., 2015).
In what follows, we first derive a set of whole-brain ISFC states from the combined TD/ASD subject population, and explore the specificities of their recruitment over time. We probe the involved neurobiological substrates, and relate ISFC state expression to the movie content. In a second step, we then show that the idiosyncrasy in state expression within ASD subjects is tied to the exact subject-specific symptomatology.

Subjects and data acquisition
16 typically developing (TD) subjects and 15 participants diagnosed with autism spectrum disorder (ASD), as assessed by DSM-IV criteria and confirmed with the Autism Diagnostic Interview-Revised-ADI-R (Lord et al., 1994), took part in the study, which was approved by the local ethics committee (Biomedical Inserm protocol C08-39). All subjects, or their legal representatives, provided written informed consent.
In all but three subjects, IQ measures were acquired with a short form of the WAIS-III scale (Weschler, 2000). In all but five (four ASD and one TD) participants, the Autism Spectrum Quotient-AQ ( Baron-Cohen et al., 2001)-was also obtained. In all but three ASD subjects, scores from the ADI-R were collected as well, spanning four different areas of impairment (1. social skills, 2. nonverbal communication, 3. verbal communication, 4. repetitive and stereotyped behaviours).
For each subject, the data was acquired in three separate runs: the first and second (RUN 1 and RUN 2 ) were combined movie watching/ resting-state (RS) acquisitions, with the movie displayed from 5 to 353 s (348 s ¼ 5.8 min duration), and the RS scanning lasting from 386 to 678 s (292 s ¼ 4.9 min). The third run (RUN 3 ) was a pure RS acquisition, which lasted for 310 s (5.2 min). All RS runs were conducted with eyes closed, instructing the subjects not to fall asleep. Subjects were asked to refrain from moving as much as they could, and RUN 3 was not recorded in 4 subjects due to claustrophobia. Part of this data was already previously analysed in a combined EEG/fMRI study of speech processing (Jochaut et al., 2015), and in a former dynamic ISFC analysis of ours .
In total, 340 echo-planar fMRI image volumes (Tim-Trio; Siemens, 40 transverse slices, voxel size ¼ 3 mm x 3 mm x 3 mm, repetition time ¼ 2 s, echo time ¼ 50 ms, field of view ¼ 192) were acquired during RUN 1 and RUN 2 , and 155 volumes during RUN 3 . A 7 min anatomical T 1 -weighted magnetisation-prepared rapid acquisition gradient echo sequence (176 slices, voxel size ¼ 1 mm x 1 mm x 1 mm, field of view ¼ 256) was also acquired at the end of scanning.
Participants watched an audiovisual scientific documentary about the dangers of sun exposure (the same in RUN 1 and RUN 2 ). They were instructed to attentively watch the program, and were told that they would need to summarise its content after the MRI session. The movie included moments of social interaction between the presenters and children, several engaging scientific explanations, and a broad array of sensory and visual cues. It can be watched at https://miplab.epfl.ch/ index.php/miplife/research/supplement-asd-study.

Preprocessing
Preprocessing of the data was similar to that of Bolton et al. (2018). Briefly, we used SPM12 (www.fil.ion.ucl.ac.uk) and a combination of in-house MATLAB scripts and scripts from the DPARSFA toolbox (Yan and Zang, 2010). Functional volumes were realigned to the first image and co-registered to the structural image, which was segmented to derive the nonlinear deformation field between native and MNI spaces. Voxel-wise functional time courses were detrended, and constant, linear, quadratic, average cerebrospinal fluid (CSF) and average white matter (WM) time courses were regressed out. To generate CSF and WM time courses, we averaged the signals contained in conservative masks from the DPARSFA toolbox. As an additional verification, we also included the regression of the global signal-a historically controversial preprocessing step (Murphy et al., 2009;Fox et al., 2009;Murphy and Fox, 2017), and compared the obtained results to those from our main analyses (see Supplementary Material, section 5).
All runs with excessive instantaneous framewise displacement-FD > 0.5 mm (Power et al., 2012)-in more than 30% of frames were discarded. In total, we thus kept 30 RUN 1 (15 TD), 29 RUN 2 (15 TD), and 24 RUN 3 (14 TD) recordings. Age, IQ, AQ and ADI-R scores for all subjects are summarised in Table 1, along with mean FD (excluding scrubbed frames) and the percentage of scrubbed frames for all the runs retained for analysis.
Subjects from both groups were matched for age (p ¼ 0:24, twotailed t-test), handedness (all right-handed) and gender (all males). Both populations significantly differed for IQ (p ¼ 1:93 Á 10 À4 ), which we capitalised on in our analytical pipeline to reduce data dimensionality (see Section 2.4). Mean FD did not differ significantly between the TD and ASD groups (p ¼ 0:08) as assessed by a group Â run ANOVA design; however, there was a significant difference for the percentage of scrubbed out frames (p ¼ 0:0059). This prompted us to perform additional analyses to verify that motion-induced biases remained minimal and did not influence our findings: we thus reran our methodological pipeline on the full dataset using an alternative strategy to scrubbing, wavelet despiking (Patel and Bullmore, 2015), where there is no censoring of data points. Despiking was applied prior to the detrending step. The results of these additional analyses can be found in the Supplementary Material, section 5.
The data was then regionally averaged using a previously derived functional parcellation of the brain (2-level temporal correlation algorithm from Craddock et al. (2012), which we downloaded at https://www.nitr c.org/projects/cluster_roi/). The R ¼ 267 regions of interest were present across all subjects.
For each run, the MNI space atlas was inverse warped into native space using the deformation field acquired upon segmentation. Atlased time courses were scrubbed, discarding all frames with FD larger than 0.5 mm (time t), as well as the following ones (time t þ 1), and reestimating them through cubic spline interpolation. Finally, regional time courses were high-pass filtered at 0.05 Hz, in accordance with conducting a sliding window analysis with W ¼ 20 s (Leonardi and Van De Ville, 2015).

Computation of ISFC
We measured dynamic inter-subject functional correlation-ISFC (Simony et al., 2016)-in sliding window fashion using a dedicated toolbox (Bolton et al., 2019b). For each pair of brain areas, Pearson's correlation coefficient was iteratively computed over a gradually shifted rectangular temporal window of length W ¼ 20 s, the inverse of the lowest frequency still present in the data (Leonardi and Van De Ville, 2015), with a step size of 2 s (see Fig. 1A for a schematic depiction). In more details, if x ½s i ðtÞ denotes the activity of region i for subject s at time t, an ISFC connectivity estimate between regions i and j for this subject is given by: In the above equation, G denotes the reference group of subjects to which s is compared, N G is the number of subjects in this group, and ρðx 1 ; Table 1 Demographic, cognitive and motion-related properties of the analysed subjects. The subjects did not differ in terms of age (p ¼ 0:24) or mean framewise displacement (p ¼ 0:08), but did in terms of IQ (p ¼ 1:93 Á 10 À4 ) and percentage of scrubbed frames (p ¼ 0:0059). All participants were right-handed males. Sessions excluded because of excessive motion have their associated entries written down in italics, and the mean for each score across all subjects is provided as the last row of the x 2 Þ is Pearson's correlation coefficient computed between x 1 and x 2 . The full inter-subject functional correlation matrix ISFC (size R Â R) is not symmetric, but for simplicity, we considered the symmetrical matrix given by 1 2 ðISFC þ ISFC > Þ. For both TD and ASD populations, we computed ISFC with respect to a reference group made of TD subjects, in order to assess response similarity to a typical reference. Following Byrge et al. (2015), a bootstrapping approach was employed, in which ISFC was computed 500 times, assessing synchrony with a different reference group of 6 sessions (3 RUN 1 and 3 RUN 2 ) at each fold. Fig. 1B schematically illustrates the ISFC computation process.
The majority of our results rely on ISFC measurements obtained from the movie watching segments of RUN 1 and RUN 2 . Resting-state portions of the acquired sessions are only employed to reduce data dimensionality, as detailed in the following section.

Reduction of data dimensionality
Due to the curse of dimensionality (there are 3.6 times more dimensions than available samples in our raw ISFC dataset), and given the likely presence of many irrelevant dimensions (i.e., connections not responding to the movie), reliable analytical outcomes may be difficult to obtain (Hinneburg and Keim, 1999). Furthermore, since the goal of our study is to understand ASD, it would be desirable to focus analyses on movie-driven changes that are not related to differences in IQ across our subjects, but to ASD per se.
Therefore, we deployed feature selection based on a recently developed metric that extracts, for each connection, the moments of significant ISFC transients in the data . Briefly, this operates through a comparison between movie watching and resting-state ISFC time courses, the latter being used to construct a connection-specific null distribution. Thresholding can then be performed at a selected α-level. For example, at α ¼ 0:05, the 5th and 95th percentiles of the null distribution of a connection of interest are selected as thresholds, and the movie watching ISFC values that exceed these thresholds are tagged as showing significant excursions. The metric of interest is the total number of detected transients.
Here, we derived significant ISFC transients for α ranging between 10 À6 and 0.5. In each case, separately for each connection, we correlated (Spearman's rank correlation) the number of transients with (1) IQ scores within the TD group, (2), IQ scores within the ASD group, and (3) IQ scores of the whole population after regressing out the IQ group difference. As soon as one of these three assessments yielded an uncorrected pvalue lower than 0.05, the probed connection was flagged IQ-related. To account for possible remaining impacts of head motion, FD (computed on non-scrubbed frames) was included as a covariate of no interest. Fig. 2A depicts how many connections were not linked to IQ, and hence retained for subsequent analyses, as a function of the α-level. As α increases (less conservative thresholding), there is first a larger amount of detected relationships with IQ-and hence, less retained connections for ensuing analyses, due to the fact that more of the ISFC transients become captured. For even larger α, the number of significant IQ relationships decreases due to the greater noise (false positive ISFC excursions) that starts contaminating the data. For the analyses presented in this work, we selected the case that yielded the lowest number of retained connections, so that those with possible link to IQ would be maximally discarded, while at the same time lowering the dimensionality of our data as much as possible. This optimal value was α ¼ 0:1 (depicted by an arrow in Fig. 2A); if we denote by C the set of retained connections, jC j ¼ 2604 non IQ-related connections that spanned the whole brain ( Fig. 2B) were kept for subsequent analyses. For a given pair of brain regions, ISFC measurements are computed over temporally subsequent sliding windows to generate a dynamic ISFC time course. When done for all pairs of areas, the obtained information can be expressed as a dynamic inter-subject functional connectome (i.e., a temporal series of ISFC matrices). N 1 and N 2 stand for the number of TD and ASD subjects, respectively. (B) To compute an ISFC measurement between regions 1 and 2, the subject of interest s has its ISFC computed with respect to a subset of subjects randomly sampled from a reference group, and this is repeated over N bootstrapping folds. The average across folds provides the desired ISFC estimate. (C) ISFC matrices can be clustered into a set of ISFC states to generate expression time courses across subjects. Measures of population-level homogeneity and idiosyncrasy, as well as dissimilarity between populations, can be computed and brought back to the movie. TD: typically developing. ASD: autism spectrum disorder.

ISFC state analysis
In a first analysis, we wished to track the evolution of whole-brain ISFC configurations over time in our subjects. To do so, we applied a state-based approach, where a set of connectivity states are derived from the whole sliding window ISFC dataset. Details regarding the generation of the states, and the origin of the approach-borrowed from the restingstate dynamic functional connectivity field, are provided in the Supplementary Material, section 1.
Below, we detail the aspects specific to the present work: the selection of an optimal number of states into which to segment the data (Section 2.5.1), the design of population-level metrics reflective of ISFC state expression (Section 2.5.2), the extraction of the most influential connections from the ISFC states (Section 2.5.3), and the link between state expression and quantitative features from the movie (Section 2.5.4).

Selection of the number of clusters
To select into how many clusters K to partition the data, we performed consensus clustering (Monti et al., 2003), a subsampling-based approach in which robustness of clustering is assessed across a range of candidate values. Here, we probed K values ranging from 10 to 250, and evaluated clustering quality by the proportion of ambiguously clustered pairs-PAC (Senbabaoglu et al., 2014). Details on the approach can be found in the Supplementary Material, section 2. As can be seen in Fig. 2C, the evolution of PAC showed an exponentially decreasing trend for increasing K, in accordance with previous work using the same methodology (Z€ oller et al., 2018;Bolton et al., 2019Bolton et al., , 2020. We selected three values for which PAC was particularly lower than what would be expected from that trend, spanning the full range of investigated options: K 1 ¼ 70; K 2 ¼ 130, and K 3 ¼ 190. The main results presented here will use K 2 , but our findings generalised to these two other choices (see Supplementary Material, section 6).

Metrics reflective of population-level ISFC state expression
Following clustering into K ISFC states (each of size jC j ¼ 2604), our goal was to extract measures reflective of population-level state expression in TD and ASD subjects. We considered three different metrics, detailed below: (1) within-group homogeneity of ISFC state expression, (2) within-group idiosyncrasy in ISFC state expression, and (3) crossgroup difference in ISFC state expression.  (top) and horizontal (bottom) brain slice representation, number of retained connections per considered brain region at α ¼ 0:1. Colour and size of the nodes are proportional to nodal degree. (C) Percentage of ambiguously clustered pairs (PAC) for increasing cluster number values K, with a lower PAC denoting a more robust parameter choice. The colour coding from black to yellow for a given K reflects an increasingly lower range of consensus values taken as intermediate (hence associated to ambiguous clustering). Three values (K 1 ¼ 70; K 2 ¼ 130; K 3 ¼ 190) in which PAC was particularly lower than the global exponentially decreasing trend were selected for subsequent investigations, with K 2 ¼ 130 used in our main analyses.
Moments when different members of the same subject population simultaneously visit the same ISFC state are likely driven by particularly salient movie features, and are thus relevant to the understanding of the underlying functional brain responses. We computed homogeneity within a population of subjects at time t as the maximal fraction of subjects expressing the same state: if S ses data sessions are considered, it can thus vary between 1 Sses (all subjects express a different state) and 1 (all subjects express the same state).
On the other hand, time points when subjects show a heterogeneous expression of ISFC states may also be worth investigating; indeed, they may reflect the use of different cognitive strategies, or the presence of distinct underlying functional alterations in ASD subjects. We thus measured the idiosyncrasy within a subject population as the number of different ISFC states expressed at a given time t, divided by S ses . Idiosyncrasy can thus range from 1 Sses (all subjects express the same state) to 1 (each subject expresses a different state).
Finally, we also wished to isolate moments in time when the TD and ASD populations significantly aligned, or differed, in their ISFC state expression. For this purpose, for each time t and defining S TD and S ASD as the number of TD and ASD sessions, respectively, we first generated a cost matrix of size S TD Â S ASD , where the ði; jÞ th entry was filled with 0 if the same state was being expressed in session i of the TD population and session j of the ASD population, and with 1 otherwise. We then computed the assignment cost with the Hungarian algorithm (Kuhn, 1955), and divided by the minimal number of subjects in a population (in our case, S ASD ¼ 29). The resulting cross-population dissimilarity measure takes values between 0 (exactly similar state expression in the TD and ASD populations) and 1 (fully distinct state expression patterns).
For all three metrics, we also devised a statistical assessment to determine significant time points. Our strategy was to perform nonparametric statistical testing, comparing the real metric time course to those computed from dedicated null data.
For homogeneity and idiosyncrasy, we proceeded similarly: first, we computed the median number of expressed states across time, n med , in the assessed population; this determines out of many states a null realisation should be constructed. Second, we created 998 surrogates of ISFC state expression in the assessed population: in each case, at time t, the state expressed by a given subject was randomly sampled with replacement out of n med possibilities. Because homogeneity and idiosyncrasy are directly dependent, we extracted significant time points twodimensionally, and retained the moments when the two-dimensional (Homogeneity, Idiosyncrasy) data point would fall outside of the area spanned by null values.
For population dissimilarity, at time t, each of 998 null realisations was built by randomly shuffling subject labels between the TD and ASD groups, and subsequently computing population dissimilarity from this transformed ISFC state expression data. We deemed significant the moments when dissimilarity exceeded the range of the null distribution.
As we considered 998 surrogates and a two-tailed assessment, in all three metric cases, statistical significance was thus assessed at a p-value of 1 499þ1 ¼ 0:002.

Extraction of key ISFC connections
We wanted to determine the most important brain connections at play in driving homogeneity, idiosyncrasy and cross-population dissimilarity in ISFC state expression. To do so, we applied a similar strategy for each time point t: For homogeneity in state expression, we computed the connectionwise mean ISFC value across all expressed states. A larger mean value indicates that the connection at hand is more influential in cross-regional synchronisation within the considered ISFC states. In the presented results, we separately show positive-valued and negative-valued entries, since they may reflect functionally distinct information (integration as opposed to segregation of brain systems, respectively). For idiosyncrasy in state expression, we computed the connectionwise standard deviation of ISFC values across all expressed states. A larger standard deviation indicates that the connection at hand is more distinctly expressed across the considered ISFC states. For cross-population dissimilarity, for each connection, we performed two-tailed t-testing between the ISFC values of the TD and ASD populations. A larger t-statistic highlights a connection that is differentially expressed by both groups to a greater extent. In the presented results, we separately display positive-valued and negativevalued entries, which respectively stand for connections that take more positive values in TD or ASD subjects.
In our results, we focus on the patterns seen at indicative time points showing statistically significant metric expression.

Determination of the movie content responsible for ISFC expression patterns
We also wished to quantitatively assess which types of movie cues drove the observed fluctuations in ISFC state expression. To do so, we generated a set of five movie regressors highlighting different aspects of relevance: (1) presence of faces on screen, (2) luminosity of the scene, (3) novelty of the soundtrack, (4) brightness of the soundtrack, and (5) movie context ("fun" versus "science"). In addition, we also labeled the time points showing a change in scenery throughout the movie. Details on how the regressors were constructed are provided in the Supplementary Material, section 3.
We individually correlated (Pearson's correlation coefficient) each of the five movie regressors to the TD and ASD time courses of state expression homogeneity/idiosyncrasy, and to the time course of crosspopulation dissimilarity in state expression, for a resulting matrix of size 5 Â 5.
Significance was established, in each case, by comparing the actual correlation value to a non-parametrically constructed null distribution, where movie regressor time points were randomly shuffled for each null realisation case. We assessed significance at a Bonferroni-corrected pvalue of 0.05 for 25 parallel statistical tests.
2.6. Links between idiosyncratic ISFC state expression and symptomatology in ASD 2.6.1. Partial least squares analysis As a second main analysis, we wanted to determine whether the idiosyncratic expression of ISFC states in the ASD population could be related to subject-specific symptomatology. To do so, we used partial least squares (PLS), an approach that extracts overlapping covariance relationships between two sets of data. Below, we summarise the gist of PLS analysis; details regarding significance assessment can be found in the Supplementary Material, section 4 (McIntosh and Lobaugh, 2004; Krishnan et al., 2011).
In our case, the first dataset consisted in ISFC state expression features. For each ASD subject, we used the time courses of expression of all ISFC states, thus inputing a space/time representation. The resulting data was summarised in the imaging features matrix F 2 R ðK Á TÞÂ24 , with K the number of ISFC states and T the number of time points when ISFC estimates were obtained. The matrix only contains 24 (instead of 29) columns, as data on individual symptomatology was not available for some subjects. The second dataset consisted in four scores from the Autism Diagnostic Interview-Revised: social component, nonverbal communication component, verbal communication component, and stereotyped behaviours component (note that larger score values highlight greater impairment), summarised in the behavioural matrix B 2 R 4Â24 . Both matrices were normalised across subjects prior to PLS analysis. PLS considers the cross-covariance matrix between ISFC features and behavioural domain scores: This matrix can be equivalently expressed through singular value decomposition as: where U 2 R 4Â4 contains the left singular vectors of R, arranged in successive columns. These vectors form an orthonormal basis; i.e., U > U ¼ I.
The same property applies to the right singular vectors, arranged in the columns of V 2 ℝ ðK Á TÞÂ4 . As for Σ 2 R 4Â4 , it is a diagonal matrix containing the singular values σ i ; i ¼ f1; 2; 3; 4g. We assume here that singular vectors and singular values are sorted in decreasing singular value order (that is, decreasing percentage of explained data covariance). The intuition behind this decomposition is that the full crosscovariance between both datasets is expressed as a weighted low-rank approximation: where u i and v i are the i th left and right singular vectors, respectively. In the case of this work, since we consider only 4 behavioural scores (limiting dimension of the decomposition), four pairs of singular vectors are derived, and represent four overlapping modes of covariance. The entries of the left singular vector u i are the behavioural salience weights associated to the i th mode: they specify by how much each behavioural score is multiplied to yield the linear combination that maximally covaries with imaging features. The same applies for the entries of the right singular vector v i , which are referred to as ISFC salience weights.
Since U and V are both orthonormal bases, it follows that the strength of expression of the modes of covariance in the investigated pool of subjects can be expressed as a projection: with L B 2 ℝ 4Â24 denoting the strength of expression of ASD scores, or behavioural latent weights, across the 24 subjects, and L F 2 ℝ 4Â24 that of ISFC state features (called ISFC latent weights). The i th row of L B or L F contains the weights associated to the i th covariance component, while the j th column contains all weights associated to subject j.

Determination of salient movie features and associated ISFC states
To determine the movie content that was responsible for the observed covariance between ASD symptoms and ISFC state expression, we applied a similar strategy as outlined in Section 2.5.4, by correlating our movie regressors with the output imaging salience weights from PLS, separately for the time course of each ISFC state. Accordingly, our results were Bonferroni-corrected for 5K 2 ¼ 650 parallel tests.
To gain insight into the ISFC patterns associated to these relationships, we also extracted the top 10 ISFC states with at least one significant relationship to the movie content. They were sorted in descending order of their largest correlation to any movie regressor, notwithstanding the sign.

Population-specific temporal fluctuations in ISFC state expression
ISFC estimates of the combined TD and ASD populations were aggregated, and subjected to k-means clustering for the generation of ISFC states. Fig. 3A shows which ISFC states were expressed in the analysed subjects along the movie (middle panels, with each ISFC state labeled by a different colour), and also highlights the set of movie regressors that we extracted. A clear structure can be seen along time, with state transitions occurring at moments that would often fit the switch from one movie scene to another-denoted by the horizontal white lines. For instance, the yellow state expressed from 280 s onwards in the TD population nicely matches the duration of a particular movie sequence where one of the presenters was interacting with children on the beach. Links are also seen when relating the state expression time courses to the set of movie regressors: for this particular example, the expression of the yellow state also matches the presence of faces on screen, a greater overall luminosity, and a "fun" context.
Considering the same data across subjects at a time point t, moments of homogeneous state expression as exemplified above were interspersed with more heterogeneous patterns. A quantitative monitoring of population-wise homogeneity and idiosyncrasy in state expression (Fig. 3B) revealed that at several moments, the TD population showed significantly homogeneous state expression, while at others, it was instead significantly idiosyncratic (see the left panel). In the ASD population, homogeneity in state expression was less clear, as seen in the right panel, where homogeneity was never larger than expected under the null. Idiosyncratic state expression was, however, still present, in levels that were more elevated than for the TD case.
The state expression characteristics of both populations were sometimes in line, such as around 150 s (light blue data points, with significant idiosyncrasy in both groups) or at t 4 ¼ 268 s (same pattern). An explicit computation of cross-population dissimilarity (Fig. 3C) revealed that it only reached significance at one time point (t 3 ¼ 226 s). Fig. 3D denotes the connections that were the most homogeneous, idiosyncratic, or different across groups at indicative movie time points, across the battery of expressed ISFC states. At t 1 ¼ 72 s, scientific explanations were provided on UV rays, and in accordance with executive processing, nodes with the largest average ISFC were located in the prefrontal brain. At t 2 ¼ 168 s, the strongest ISFC connections on average instead involved the somatomotor and auditory areas, at a moment when the presenter had just finished touching a display panel with a concomitant sound. At t 4 ¼ 268 s, during a moment of scientific explanations involving geometric shapes in motion and a brightly coloured panel, the most prominent positive ISFC nodes involved the occipitoparietal brain. In each of these cases, the most variable connections across expressed ISFC states were distributed similarly to the ones with largest average value, and similar overall patterns were seen across groups. Accordingly, cross-group differences were often captured in a different set of connections, for which ISFC values were thus less large on average, and less variable.
To demonstrate that the unraveled ISFC state expression patterns over time arise from the movie content, we correlated the population-wise homogeneity and idiosyncrasy time courses, as well as the crosspopulation dissimilarity time course, to the extracted movie regressors (Fig. 3E). Sound novelty significantly negatively correlated with homogeneity in state expression in the ASD population (i.e., a greater novelty in sound content led to less homogeneous state expression). In addition, the contextual "fun versus science" regressor significantly correlated with three metrics: homogeneity in the TD population (negatively), idiosyncrasy in the ASD population (positively), and population dissimilarity (positively). In other words, when science scenes occur, homogeneity is lowered in TD subjects, idiosyncrasy is enhanced in ASD subjects, and this is in a way that exacerbates the dissimilarity between both groups.

ISFC state expression differs as a function of individual ASD symptomatology
To test whether the more heterogeneous state expression observed in ASD subjects may be due to different extents of behavioural symptoms across individuals, we performed a PLS analysis between subject-wise ISFC state expression time courses, and scores indicative of ASD symptomatology (quantifying social and nonverbal/verbal communication impairments, as well as stereotypic behaviours).
(caption on next page) T.A.W. Bolton et al. NeuroImage 216 (2020) 116571 One significant mode of covariance was found (p ¼ 0:004). The associated ISFC and behavioural saliences are depicted in Fig. 4A and B. ISFC salience weights occurred over temporally localised time spans, and included both positive and negative values. Subjects that undergo the expression pattern denoted by positive-valued elements, and do not express the negative-valued entries, show worse nonverbal communication skills and more repetitive behaviours (the positive-valued behavioural salience weights), while at the same time showing less impaired social and verbal communication skills (the negative-valued behavioural salience weights). In the representation of latent weights (Fig. 4C), these would be the data points located at the top right of the graph. Conversely, the subjects depicted at the bottom left of that graph (who express negative latent weights) will express the opposite behavioural and ISFC expression contrast.
In many occasions, ISFC salience weight time courses showed significant correlations with the examined movie regressors (Fig. 4D). Positive correlations were observed between the expression of some states and the presence of faces on screen, luminosity, or the contextual fun versus science regressor (top left panel). Negative correlations were seen with the same regressors, as well as sound brightness (top right panel). Thus, the derived relationship between the expression of ISFC states and individual ASD symptomatology is driven by visual, auditory, social and contextual cues from the movie that was being watched by the participants. The involved ISFC states (see bottom panel for a few indicative examples) involved whole-brain ISFC patterns that included both positive ISFC (denoting integration of information across brain centres) and negative ISFC (reflective of functional segregation).

Movie-driven brain responses are tracked by the analysis of ISFC state expression
Naturalistic movies expose the brain to a wide array of temporally ordered cues, and by successive measurements of cross-subject synchronisation between brain regions, we could track these dynamic reconfigurations at the whole-brain level under the form of a temporal sequence of ISFC states. We observed sophisticated rearrangements over time (Fig. 3A), where the most synchronised connections across subjects spatially relocated to different functional brain systems as a function of the input movie cues (Fig. 3D). On top of taking consistently large ISFC values, these connections also showed the most variable strength across subjects, highlighting an individualised extent of recruitment of the underlying functional brain networks. These subject-specific characteristics are well in line with the successful individual fingerprinting efforts performed from naturalistic imaging Vanderwal et al., 2017).
At the population level, this duality of ISFC patterns was rendered by variable homogeneity and idiosyncrasy of ISFC state expression over the course of the movie (Fig. 3B): at some time points, brain systems were recruited to different extents across subjects, resulting in significant idiosyncrasy of expression. This was seen in both the TD and ASD groups, and may reflect differential processing of the incoming stimuli as a function of past experiences, or attentional focus. In ASD subjects, the switch from a fun to a scientific context positively correlated with idiosyncrasy in state expression (Fig. 3E): a possible explanation could be the development of different types of compensatory processing strategies across individuals, as a way to palliate to deficits in long-range crossnetwork interactions that impinge on executive functions (Just et al., 2012;Vissers et al., 2012).
At other time points, TD (but not ASD) subjects also aligned in their ISFC state expression beyond what would be expected by chance. The absence of moments of homogeneity in the ASD population is interesting in light of previous resting-state findings, where spatial variability regarding the exact location of functional brain territories has been reported (Hahamy et al., 2015;Nunes et al., 2018). In addition, we have previously shown that movie-driven ISFC changes show a similar mean, but a greater variance, across ASD subjects . In the TD group, more homogeneous state expression accompanied the transition from a science to a fun context (see the associated negative correlation in Fig. 3E): the specificity to TD subjects could arise from a more pronounced attribution of salience to such scenes compared to the ASD group, where such processes are known to be altered (Uddin et al., 2013).

Autism: a behaviourally and functionally multi-faceted condition
The more pronounced idiosyncrasy in ASD subjects hinted at a possible role of individual symptoms in shaping the expression of ISFC states. This was verified by explicitly relating the temporal profiles of state expression to the extent and balance in behavioural impairments by PLS analysis (Fig. 4), as a significant mode of covariance linking both modalities was found. It revealed that subjects with greater nonverbal communication impairments and more repetitive behaviours also have less social and verbal communication deficits (Fig. 4B); at the same time, they more strongly express specific temporal sequences of ISFC states (Fig. 4A), and thus, respond to the movie content differently as a function of their individual symptomatology. This was explicitly confirmed by a broad array of significant correlations between these differentially expressed patterns of ISFC state expression, and the movie regressors that we analysed (Fig. 4D): fitting with the previous analyses, many ISFC states could be linked, in particular, to the fun versus science contextual regressor, but there were also several relationships with luminosity and the presence of faces on screen. This is consistent with the notorious visual (Simmons et al., 2009) and face processing (Dalton et al., 2005) specificities seen in ASD.
The behavioural salience weights (Fig. 4B) are particularly interesting, as they provide precious insight into the relationship between different symptoms at play in autism. The presence of a combination Fig. 3. ISFC state expression homogeneity fluctuates over time in TD and ASD. (A) Along the course of the movie (top to bottom), indicative frames (left column), subject-specific ISFC expression time courses (middle panel, with different colours denoting distinct ISFC states), and time courses of quantitative movie features (right panel). State time courses from TD subjects are presented on the left, and the ones from ASD subjects are shown on the right. The blue, orange and green vertical bars respectively denote the time points when TD state expression is significantly homogeneous/idiosyncratic, ASD state expression is significantly homogeneous/idiosyncratic, or cross-population dissimilarity in state expression is significant; they are computed from (B) and (C). The four time labels denote the time points for which ISFC is examined at the whole-brain level in (D). (B) For TD (left panel) or ASD (right panel) subjects, summary of the population-wise homogeneity and idiosyncrasy levels as a two-dimensional representation. Each circle reflects the population-level estimate obtained at one time point, with increasingly warmer colours and smaller circles along the course of the movie. The homogeneity/idiosyncrasy combinations occupied at least once under a null setting are displayed as a light blue (in the TD case) or light orange (in the ASD case) surface beneath the cloud of circles. (C) Evolution of dissimilarity across groups over time (in dark green), with 99.8% confidence interval spanned by the two light green curves. (D) For four selected movie time points with a significant ISFC state expression metric (t 1 to t 4 ), mean ISFC across all expressed states in the TD and ASD groups (first and third columns), standard deviation of ISFC in the same groups (second and fourth columns), and tstatistic of cross-group difference in ISFC (fifth column). Top/bottom slices denote positive-valued/negative-valued ISFC patterns. Only the top 25% of connections are shown for each time point/metric combination. Colour coding and nodal size are proportional to nodal degree. (E) Correlations between the time courses of the five movie regressors (rows) and the five assessed population-level ISFC metrics (columns). Significant entries following Bonferroni correction are labeled with a star. In all panels, movie time always accounts for a haemodynamic delay of 5 s (Friston et al., 2000). (Bottom) For the top 10 ISFC states in descending order of maximal absolute correlation with a movie regressor, whole-brain ISFC patterns, separately depicted for positive-valued (top) and negative-valued (bottom) connections. The numbers below the slices depict the movie regressor(s) for which a significant correlation was obtained, with the minus sign denoting a negative correlation value. In all panels, movie time always accounts for a haemodynamic delay of 5 s (Friston et al., 2000). demonstrates that at least part of the functional underpinnings of ASD are shared across symptoms: had distinct symptoms been associated to different ISFC features, they would have been rather retrieved as separate significant PLS components. However, the fact that a contrast is retrieved (some symptoms have positive weights, and some have negative weights) also means that a subject expressing some symptoms to a greater degree-along with dedicated ISFC state expression patterns-will, at the same time, be less impaired along other behavioural axes. In short, our results thus demonstrate that autism is a multi-faceted condition.
Heterogeneity within a subject population has only recently become truly appreciated (Eavani et al., 2016), and fingerprinting of subjects based on functional connectivity properties (Finn et al., 2015) has enabled the prediction of social impairments in ASD (Lake et al., 2019). Future work should more systematically account for the presence of such a continuum of functional alterations, which not only exerts its effects within, but also across distinct brain disorders (Kebets et al., 2019), further questioning the relevance of relying on cross-group comparisons.

Advantages, limitations and perspectives of dynamic ISFC state analysis
Although naturalistic movie paradigms are highly dynamic by nature, the large majority of studies leveraging inter-subject (functional) correlation analysis rely on static estimates of regional synchronisation. This is in part owing to the great complexity of dynamic ISFC features, which largely exceeds the static case: in the present work, for instance, 35511 estimates would be obtained per subject for each of 167 temporal subwindows, rather than only 270/35511 estimates in the static ISC/ISFC case.
To render dynamic ISFC analysis feasible, we first reduced the complexity of the data in meaningful fashion, by selecting connections of interest through a previously introduced metric sensitive to ISFC transients ; that is, by detecting movie time points that elicit a significant change in functional brain synchrony. To study ASD notwithstanding IQ-related effects, we only retained the connections that did not show any statistical association with IQ ( Fig. 2A). However, alternative criteria can readily be employed: for example, it may be desirable to focus on the connections that respond during a particular subpart of the movie, or according to predefined regressors such as the ones considered here.
To track ISFC reconfigurations over time, we deployed a methodology originating from the resting-state research field, where the analysis of functional connectivity changes over time has been a prominent research area-see Hutchison et al. (2013) and Preti et al. (2017) for reviews. We extracted ISFC states; in the resting-state field, such an approach typically results in a characterisation based on only a handful of different states Damaraju et al., 2014;Barber et al., 2018), across which transitions operate at a slow time scale. This meta-stability of the brain (Hansen et al., 2015;Deco et al., 2017) has also been revealed upon movie watching (Tseng and Poppenk, 2019), but due to the large breadth of incoming cues, the diversity in visited states is much larger, as supported by the choice of K 2 ¼ 130 ISFC states for our analyses (Fig. 2C).
In supplementary analyses, we verified that our results were insensitive to the exact number of clusters used in deriving ISFC states (Supplementary Material, section 6). In particular, this extended to other candidate values that also showed a local optimum in clustering robustness (K 1 ¼ 70 and K 3 ¼ 190), which may have denoted other satisfying solutions with lowered or increased granularity, respectively. The robustness of our findings along this broad range is partly enabled by the undertaken analytical approach, which operates at the whole-brain level instead of focusing on selected regions or connections.
Other validations also showed that exact preprocessing choices, including whether to include the controversial global signal regression (Murphy and Fox, 2017) and how to account for the deleterious impacts of head motion (Power et al., 2015), only had little impact on our results (Supplementary Material, section 5). This highlights a particularly attractive feature of inter-subject analyses, which damp out physiological and motion artefacts since they are uncorrelated across subjects (Simony et al., 2016). As intrinsic dynamic reconfigurations are also abolished from the data owing to the same principle, an additional advantage compared to standard functional connectivity computations is the easier interpretation of obtained results: when performing our analyses with a more traditional set of dynamic functional connectivity states generated in within-subject fashion, relationships between state expression and movie cues were lost due to the mix between paradigm-driven and intrinsic effects in the data (Supplementary Material, section 7).
The ability to extract relationships between imaging features and the movie content crucially depends on the dynamic aspect of ISFC analyses: indeed, it is by generating one ISFC estimate per frame-from the W th time point of data, with W the window size-that correlations as the ones reported in this work can be performed, which enables to more clearly and convincingly support the role of movie cues in driving functional brain reconfigurations. In the specific context of autism, the application of dynamic approaches to naturalistic paradigms can be foreseen to broaden the scope of knowledge regarding the condition, as was the case in the resting-state setting regarding diagnosis (Wee et al., 2016) or mechanistic understanding (Falahpour et al., 2016).
There remains, however, some degree of uncertainty in applying sliding window ISFC analysis, as each windowed ISFC estimate is generated from a set of time points that will vary in size as a function of window length. The size of the window used will also make the analyses particularly sensitive to ISFC responses at a specific frequency (higher for smaller windows). In future technical developments, a transition to time/ frequency analysis (Chang and Glover, 2010;Yaesoubi et al., 2015; Bolton and Van De Ville, 2019) could be envisaged. Alternatively, ISFC fluctuations could also be quantified through phase synchronisation measures (Glerean et al., 2012;Bolt et al., 2018). Finally, yet another option could be to revert back to analyses centred on the blood oxygenation level-dependent time courses themselves, instead of the more indirect second-order connectivity information. Inspiration for this purpose could once more come from the resting-state field, where frame-wise maps of co-activating regions can be derived (Karahano glu and Van De Ville, 2015), and subsequently parameterised in terms of temporal expression .

Conclusion
In this work, we revealed that upon naturalistic stimulation, typically developing and autistic individuals continuously adjust their functional brain configurations as a function of incident movie cues. We showed that these rearrangements occur differentially not only across populations, but also across different subjects within a given group, in temporally complex fashion. The dynamic flavour of our analyses, as well as the insight into cross-subject heterogeneity, are essential avenues towards augmented analyses of naturalistic paradigms, and a refined understanding of the functional alterations induced by multi-faceted brain disorders.

Author contributions
Delphine Jochaut and Anne-Lise Giraud devised the experimental paradigm, acquired and provided the analysed data.
Lorena G. A. Freitas generated the examined movie regressors. Thomas A.W. Bolton performed all the analyses, and wrote the manuscript. He jointly designed the analytical strategy with Dimitri Van De Ville, who also overviewed the work throughout. 205321_163376), and by the Bertarelli Foundation.