Agito ergo sum: correlates of spatiotemporal motion characteristics during fMRI

The impact of in-scanner motion on functional magnetic resonance imaging (fMRI) data has a notorious reputation in the neuroimaging community. State-ofthe-art guidelines advise to scrub out excessively corrupted frames as assessed by a composite framewise displacement (FD) score, to regress out models of nuisance variables, and to include average FD as a covariate in group-level analyses. Here, we studied individual motion time courses at time points typically retained in fMRI analyses. We observed that even in this set of putatively clean time points, motion exhibited a very clear spatiotemporal structure, so that we could distinguish subjects into four groups of movers with varying characteristics Then, we showed that this spatiotemporal motion cartography tightly relates to a broad array of anthropometric, behavioral and clinical factors. Convergent results were obtained from two different analytical perspectives: univariate assessment of behavioral differences across mover subgroups unraveled defining markers, while subsequent multivariate analysis broadened the range of involved factors and clarified that multiple motion/behavior modes of covariance overlap in the data. Our results demonstrate that even the smaller episodes of motion typically retained in fMRI analyses carry structured, behaviorally relevant information. They call for further examinations of possible biases in current regression-based motion correction strategies.


Abstract
The impact of in-scanner motion on functional magnetic resonance imaging (fMRI) data has a notorious reputation in the neuroimaging community. State-ofthe-art guidelines advise to scrub out excessively corrupted frames as assessed by a composite framewise displacement (FD) score, to regress out models of nuisance variables, and to include average FD as a covariate in group-level analyses.
Here, we studied individual motion time courses at time points typically retained in fMRI analyses. We observed that even in this set of putatively clean time points, motion exhibited a very clear spatiotemporal structure, so that we could distinguish subjects into four groups of movers with varying characteristics.
Then, we showed that this spatiotemporal motion cartography tightly relates to a broad array of anthropometric, behavioral and clinical factors. Convergent results were obtained from two different analytical perspectives: univariate assessment of behavioral differences across mover subgroups unraveled defining markers, while subsequent multivariate analysis broadened the range of involved factors and clarified that multiple motion/behavior modes of covariance overlap in the data.
Resting-state functional magnetic resonance imaging (RS fMRI) has been a vibrant and flourishing research topic. Since its advent (Biswal et al. 1995), the assessment of statistical interdependence between brain regions, or functional connectivity (FC), has enabled the determination of large-scale functional brain networks (Damoiseaux et al. 2006, Power et al. 2011, Yeo et al. 2011, and the harvesting of their spatiotemporal properties towards a refined understanding of a constellation of brain disorders (Fox and Greicius 2010).
One of the most remarkable features of RS fMRI is that such analyses are already feasible from a few minutes of acquisition (Van Dijk et al. 2009). However, the reliance on low amounts of data also requires that the acquired time courses be impeccably cleaned from potential confounding signals. This is even more of a concern as the field starts moving towards time-varying and -resolved analyses, Amongst confounding signal sources, in-scanner head motion of the volunteers has been a leading cause of investigation. Its deleterious impacts may take many forms, and remain incompletely understood (see Power et al. 2015, Caballero-Gaudes and Reynolds 2017 for reviews). Some years ago, it was discovered that even short-lived episodes of motion might greatly bias FC analyses (Power et al. 2012, Van Dijk et al. 2012, Satterthwaite et al. 2012, and lead to erroneous interpretations in clinical or developmental studies (Deen andPelphrey 2012, Makowski et al. 2019). These observations of motion-biased results further fueled the development of robust post-processing strategies to free fMRI time courses from confounding motion effects.
Thanks to many rigorous and extensive studies (Satterthwaite et al. 2013, Yan et al. 2013, Power et al. 2014, Burgess et al. 2016, Ciric et al. 2017, Parkes et al. 2018, the field has reached a consensus as to what general steps are essential for a viable RS fMRI denoising pipeline. Their specificities, however, remain debated. In short, following the linear realignment of functional images, estimates of motion over time are obtained along three translational directions (left/right, anterior/posterior and dorsal/ventral, respectively termed X, Y and Z in what follows) and three rotational planes (roll, pitch and yaw, respectively referred to hereafter as α, β and γ). Framewise displacement (FD) is then computed as an aggregated measure across those 6 motion parameters 1 , in order to tag data points corrupted by excessive instantaneous motion and exclude them from subsequent analyses.
Estimated motion time courses are then linearly regressed out from the remaining fMRI data 2 , in a matrix of regressors that can be extended to include their quadratic expansions, their derivatives, and/or their squared derivatives. 1 Here, we will be discussing the FD metric suggested by Power et al. (2012), but other alternatives have also been put forward in the past literature (Jenkinson et al. 2002, Van Dijk et al. 2012). 2 Scrubbed data points can be accounted for in two ways: either by modeling them as individual single-point regressors (Lemieux et al. 2007), or by extracting fitting weights from solely nonscrubbed data points, and then applying the regression to the whole data (Power et al. 2014).
More parsimonious models lead to a greater amount of retained degrees of freedom in the data, while more exhaustive models may remove signal of interest (Bright and Murphy 2015), but enable to account for biophysically relevant nonlinear motion effects (Friston et al. 1996).
Finally, the addition of a covariate for group-level analyses has also been warranted (Ciric et al. 2018). However, this last step has been criticized for its risk of biasing some RS fMRI analyses: indeed, if the behavioral feature of interest in the study positively correlates with the extent of head motion, the investigated metric will be more strongly attenuated in larger movers, thus potentially lowering the true magnitude of the effect of interest.
To date, such concerns have been raised in attention or impulsivity studies (Kong et al. 2014, Wylie et al. 2014. Head motion has been posited to be a marker of cognitive control abilities (Zeng et al. 2014), showing clear heritability (Couvy-Duchesne et al. 2014), even if solely non-scrubbed frames are considered (Engelhardt et al. 2017), and sharing genetic influences with hyperactivity (Couvy-Duchesne et al. 2016) or body mass index (Hodgson et al. 2017). Very recently, an extended multivariate assessment isolated body mass index and weight as the major predictors of head motion, with mild additional impacts of impulsivity levels and alcohol/nicotine consumption (Ekhtiari et al. 2019). Thus, in light of current knowledge, the span of behavioral or clinical measures subject to bias remains limited.
A major limitation of all the above studies, however, is the use of average FD over time to quantify head motion levels. In other words, it is implicitly assumed that motion properties remain similar along the course of a scanning session, and do not differ across translational directions or rotational planes (an assumption that does actually not square well with the meager information available to date; see Wilke 2014). It is likely that the true spatiotemporal complexity of motion is so far overlooked, and that its relationship to behavior is thus only poorly understood. Since even the most sophisticated motion correction approaches summarized above are still unable to fully remove deleterious motion influences (Yan et al. 2013, Siegel et al. 2017, filling such possible gaps of knowledge is a critical question.
Our first question in the present work was thus whether we could find, consistently across subjects, spatiotemporal head motion properties going beyond time-and space-invariance. Our second question was then whether those more subtle motion profiles would consist in endophenotypes characterized by specific anthropometric, behavioral or clinical properties.

Motion Data Acquisition and Preprocessing
We considered a set of 224 healthy subjects from the Human Connectome Project (Smith et al. 2013), scanned at rest (eyes open) over four separate 15-minute sessions at a TR of 0.72 s. For each session, motion was estimated using rigidbody transformation with three translation parameters (along the X, Y and Z axes) and three rotation angles (in the α, β and γ planes respectively highlighting roll, pitch and yaw) with respect to a single-band reference image (SBRef) acquired at the start of each session, and FSL's FLIRT (Jenkinson et al. 2002). It resulted in 6 time courses (one per motion parameter) with 1200 time points each.
In the present work, we solely analyzed the motion time courses (not the fMRI data) from the first acquired session. Individual motion time courses were differentiated so that our analyses would focus on instantaneous displacement from time t to time t+1. Further, since time points linked to excessive displacement are typically removed from RS fMRI analyses, we only considered non-scrubbed motion instances according to Power's FD definition (Power et al. 2012) at a threshold of 0.3mm. Resorting to a more conservative (0.2 mm) or more lenient (up to 1 mm) threshold, or censoring not only tagged time points (time t) but also the following ones (time t+1), did not modify our findings (see Supplementary Material for a more detailed description).

Spatiotemporal Motion Characterization
We wished to assess whether different subjects would present distinct spatiotemporal motion characteristics in the data points that are typically conserved in RS fMRI analysis (i.e., not scrubbed out).
For each motion time course, we computed absolute valued instantaneous displacement. Thus, we did not consider the sign of the changes (e.g., moving positively as opposed to negatively in the X direction); this is because initial analyses indicated that positive-valued and negative-valued movements always compensated, to the exception of the X case (two-sided Wilcoxon rank sum test, p=0.0001).
Then, we averaged motion values within a) each motion type (X, Y, Z, α, β and γ) and b) each of 6 even-duration time intervals along the scanning sessions (2.4 min = 144 s each). This resulted in a total of 36 conditions. We chose 6 temporal sub-bins to give equal weight to spatial and temporal domain information in our decomposition of the data. Eventually, the values were zscored across subjects for each condition so that positive values highlight strong movers (at a given time and for a given motion parameter) with respect to the mean, and vice versa. It also follows that an equal weight is given to each condition.
Next, we used these 36 motion summary measures to separate subjects into different subgroups of movers through consensus clustering, a nonlinear dimensionality reduction approach (Von Luxburg 2007; see Supplementary Material for details). By taking into account such precise motion characteristics, we exploit complex motion profiles rather than simply dividing into high-and low-motion subjects, as is classically done on the basis of average FD.
To evaluate whether there was any significant effect of scanning duration, motion parameter or mover subtype, or any interaction between these factors, we conducted a three-way ANOVA (factor 1: scanning duration [time], factor 2: motion parameter [space], factor 3: mover subtype [group]) and assessed significance by comparing the obtained F-values with a null distribution generated non-parametrically over 10'000 folds, shuffling the three factors independently from each other.
To assess motion changes along time within a given group of subjects, a linear model was fitted along the six time bins (including a constant regressor of no interest), and the null hypothesis that the mean β value across subjects would be equal to 0 was assessed. To examine differences in motion along space, we conducted pair-wise two-tailed t-tests between all group pairs.

Behavioral Data Acquisition and Processing
For each subject, a battery of behavioral and demographic scores was also quantified. A list of all the investigated scores in the present study can be found in Supplementary Table 1. They were subdivided into several key sub-domains, largely following the original classification found in the HCP Data Dictionary 3 : For some scores, several entries were not acquired in a sub-fraction of subjects (mean: 1.52%, median: 0.89%, maximum: 21.43%). This was taken into account in behavioral data processing so that it would exert a minimal effect on the described findings. Some scores were also discarded due to various criteria, and the remaining ones were processed as in Smith et al. (2015), yielding a total of 45 scores for subsequent analyses, reflective of anthropometric properties, cognitive abilities or clinical features. Details are provided in the Supplementary Material.

Univariate link between motion subgroups and anthropometry/behavior
To determine whether some anthropometric/behavioral scores would differ across mover subgroups, we performed a univariate assessment. For each of the 45 assessed domains, we computed a score indicative of cluster-to-cluster distinction. Formally, following Gu et al. (2012): where xi is the vector of the i th domain scores across subjects, μ i is its average regardless of group classification, μ i k is its average within group k, and σ i k is the standard deviation within group k. A large score value indicates that the assessed behavioral domain shows distinct values between clusters.
To non-parametrically extract significant scores, we used permutation testing, by randomly shuffling subject motion entries 1'000 times. P-values were Bonferroni corrected for 45 tests. Scores were considered significant at a corrected p-value of 0.05.

Multivariate Links Between Motion Features and Anthropometry/Behavior
To go beyond univariate comparisons and test for multivariate patterns of motion-behavior interactions, we conducted a Partial Least Squares (PLS) analysis (McIntosh andLobaugh 2004, Krishnan et al. 2011). We summarize the gist of the approach below, and additional details can be found in the Supplementary Material.
We considered the matrix of behavioral scores (size 224 x 46) on the one hand, and the matrix of spatiotemporal motion features (size 224 x 36) on the other. Using PLS, we derived a set of so-called components. Each consists in a linear combination of motion scores, and a linear combination of behavioral scores, with maximized covariance. The associated weights are termed motion saliences and behavioral saliences, and are respectively arranged in U and V, two matrices of size 36 x 36 and 46 x 36. Motion saliences (i.e., the columns of U) are orthonormal, and so are behavioral saliences. Successive components explain gradually less of the covariance present in the data, as quantified by their singular values. Finally, the extent to which a motion salience or a behavioral salience is expressed in a given subject is termed the motion latent weight or behavioral latent weight, respectively.
To assess significance of the PLS components, we compared their singular values to a null distribution constructed from 1'000 shuffled datasets, following Zöller et al. (2017). We focused our interpretation on the components significant at p=0.02. To determine the stability of the saliences, we performed bootstrapping with 80% of the data.
For interpretation, we converted the 36-element motion saliences obtained from PLS analysis into a 6-element space and a 6-element time representation, by averaging across all time points or across all spatial directions, respectively. Stability was assessed on those summarizing values. For each behavioral or motion salience element, we considered it significant above a bootstrap score (mean salience across bootstrapping folds divided by the associated standard deviation) of 3, corresponding to a confidence interval of approximately 99% (Garrett et al. 2010;Zöller et al. 2017).
In addition, we performed correlation analyses between motion (or behavioral) latent weights of the analyzed components and FD or age, using Spearman's correlation and non-parametric significance assessment. We also performed a Wilcoxon rank sum test to probe for possible differences in motion (or behavioral) latent weights across gender. Results were Bonferroni corrected for 18 tests (3 components examined in terms of 3 separate parameters for 2 types of latent weights) and judged significant at a corrected p-value of 0.05.

Spatiotemporal Motion Diversity
Average motion across six even-duration time bins, and the 6 motion parameters, was quantified. This spatiotemporal motion profile characterization revealed the existence of four separate subgroups of movers ( Figure 1A): in the first one (n1=70, red patches), subjects showed low motion across all time and motion dimensions (negative z-score values in Figure 1C). In the second (n2=51, dark blue patches), subjects moved little, and less following the first sixth of the session, with stronger motion along the α and β rotational components. The third group (n3 = 67, orange patches) showed very strong motion spatiotemporally, which increased after the first sixth of the session, while the fourth one (n4 = 36, cyan patches) showed particularly strong motion in the γ rotational plane, which slightly attenuated after the first sixth of the session.
In addition, we also individually plotted scanning duration or motion parameter against cluster assignments ( Figure 1C), averaging over all entries from the other factor (e.g., the bar labeled 'X' denotes the average of motion along the X direction from t1 to t6).
Statistical analysis confirmed the above observations: on top of a significant effect of group (F = 1414.41, p<10 -5 ), there was a significant time x group interaction (F = 3.11, p<10 -5 ), and post-hoc assessment revealed that while groups 1, 2 and 4 showed a decrease in motion over time ( In terms of spatial properties, there was a significant effect of space (F=5.92, p<10 -5 ), as well as a significant space x group interaction (F = 83.88, p<10 -5 ).
Exhaustive results from a post-hoc assessment are displayed in Supplementary  Table 2. They show that subjects in group 4 moved the most in the γ plane (hence their blue shade in Figure 1B), while subjects from group 2 moved the least on that plane (hence their red and green tones). Group 1 featured the lowest movers in Y, Z, α and β, while in group 3, subjects moved most in X, Y, Z, α and β (thus, they appear in white in Figure 1B). Overall, each group could thus be clearly distinguished on the basis of spatial motion properties.

Univariate Links Between Motion and Anthropometry
Next, we related the spatiotemporal motion characteristics of the subjects (as summarized by their mover group assignment) to their anthropometric, behavioral and clinical features. Height, Weight, Blood pressure and Cognitive flexibility scores were significantly different across mover subtypes following Bonferroni correction (Figure 2). When applying FDR correction instead, Endurance, Somatic problems and Inattention also became significant.
Subsequent inspection of pair-wise group relationships (Figure 2, inset) showed that group 3 (i.e., the largest movers) showed greater weight, lower height, more elevated blood pressure and reduced endurance compared to all other groups, highlighting that they largely stand out in terms of anthropometric features. Members from group 4 (that is, the γ movers) mostly differed from groups 1 and 2 through larger somatic problems and inattention scores; hence, they can rather be distinguished on the basis of clinical measures (note that no pair-wise comparison survived Bonferroni correction in this case; this is unsurprising given that clinical scores were only found significant upon FDR correction in the above evaluations). Finally, subjects from group 1 (the lowest movers) showed significantly larger cognitive flexibility compared to groups 2 and 3, hence differing in terms of behavioral abilities.

Subtler Motion/Behavior Relationships Revealed by Multivariate Analysis
Finally, we attempted to extract significant multivariate relationship(s) between our spatiotemporal motion characteristics and the entire breadth of anthropomorphic/behavioral/clinical features (Figure 3).
There were three significant covariance components. Component 1 (p<0.001) explained 71.92% of the data covariance, and characterized motion regardless of space or time (apart from α, all values had an absolute bootstrap score larger than 3): subjects showing larger global motion values also showed worse endurance, larger sensitivity to auditory noise, worse working memory performance, worse spatial orientation (both in terms of larger response time and lower accuracy), lower cognitive flexibility, worse fluid intelligence, and worse body mass index (decreased height, enhanced weight). In addition, they were more anxious, showed more thought problems, aggressiveness, inattention and antisocial behaviors. When compared to the mover groups derived beforehand, the gradient in motion latent weights across subjects appeared to discriminate low movers (group 1) from large movers (group 3).
Component 2 (p=0.017) explained 15.2% of the covariance of the data. It largely corresponded to movement in the γ plane independent from time (none of the t-values were significant). Stronger movers in that plane also showed sleep problems, but performed better in cognitive flexibility and spatial orientation tasks. Their response time was larger when tested for sustained attention, and they showed a constellation of elevated clinical scores including anxiety, withdrawal, somatic problems, attentional problems, aggressiveness, hyperresponsiveness and rule breaking behaviors. They also showed stronger alcohol consumption. Motion latent weights contrasted the γ movers (group 4) from other groups.
Component 3 (p=0.008) explained 5.35% of data covariance, and contrasted translational (mostly X) and rotational (mostly α ) motion. Stronger translational and lower rotational movers showed greater weight and height, higher blood pressure, enhanced cognitive flexibility abilities, greater selfregulation abilities, and larger response time upon a sustained attention task. They were also better at discriminating emotions, but worse at contrast sensitivity. Clinically, they had more intrusive thoughts and showed less externalizing, withdrawal and thought problems. As for motion latent weights, they separated the α/β movers (group 2) from the others.

A cartography of in-scanner spatiotemporal motion
In RS fMRI analyses, time points associated with excessive instantaneous motion (as quantified by a composite FD score) are typically removed/scrubbed. Two implied assumptions are made through this process: the first is that motion in the retained, non-scrubbed time points is negligible and does not show any characteristic structure. The second assumption is that FD is sufficient to characterize motion.
Our results question both of the above assumptions. Indeed, we were able to separate scanned volunteers into four different mover subgroups even when excluding time points with high motion as quantified by FD. These groups differed in the extent of motion displayed by subjects along the three translational (X, Y and Z) and three rotational (α, β and γ) motion directions, as well as in the temporal evolution of motion along the scanning session.
More specifically, a change in motion extent along the scanning session was observed, as seen by a significant time x group interaction in our ANOVA design, as well as by significantly non-null post-hoc regression coefficients. The absence of time x space, or time x space x group interaction terms means that temporal changes were consistent across all motion directions. Further, a closer inspection revealed that the main drive of this result was a changed motion extent after the first sixth of the session (see Figure 1C): in subjects from group 3, motion increased after the first 2.4 min of recordings, while the opposite was seen for the other groups. This need for a few minutes before setting into a motion steady state suggests that to avoid one possible source of bias, if affordable, future fMRI analyses may ignore the first few minutes of a scan.
In addition to temporal motion properties, there were also much more noticeable group differences in terms of space, which contributed in major part to the organization revealed in the dimensionally reduced representation of Figure 1A/B. Two mover subgroups were opposite extremes: group 1 subjects moved very little as compared to the average across spatial directions (hence depicted in black in Figure 1B), while the subjects in group 3 consistently displayed very large displacements (hence represented in white). Rotational motion was the distinguishing feature of the last two groups, respectively in the α/β (group 2, seen as red and green shades) and the γ (group 4, color coded by a blue shade) planes.
All in all, the presence of strongly differing spatial motion profiles across subjects confirms the importance of subject-level motion correction strategies through regression. Further, since our work focused on instantaneous motion (t to t+1 changes), our results may be interpreted as an additional argument in favor of more complete regression models, at least to the point of incorporating motion time courses and their shifted counterparts (see below, however, for a more complete discussion).
Rather than considering four subgroups of movers, another perhaps more relevant interpretation of our data is the presence of three distinct axes of motion: the first is a global component, homogeneous across space. It is seen as the first dimension in our spectral clustering investigations ( Figure 1A/B), positioning it as the major discriminating factor across subjects. It is also found back as the first, most prominent component in the following PLS analysis ( Figure 3A, left panel). Other more subtle, but nonetheless significant spatial motion combinations then add up on top: motion along the γ plane and in the α/β planes are, respectively, captured in dimensions 2 and 3 in spectral clustering, and in components 2 and 3 in the PLS analysis.
The two methodological approaches employed strongly differ in their core properties: spectral clustering is a hard clustering technique in which each subject can only be assigned to one group, while PLS describes the motion properties of each subject as a linear combination of motion saliences that covary with anthropometry, behavior and clinical symptoms. The convergent findings obtained from both analytical perspectives strengthen our newly revealed cartography of spatiotemporal movers as a non-negligible feature.
Our findings question the accuracy of most motion correction assessment approaches, which exclusively rely on averaged FD over time. A separate assessment across motion parameters (or more elaborate approaches involving specific weighted combinations, such as our PLS motion saliences) appears to be necessary to better understand which motion impacts are removed, and which subsist in the data.

Agito ergo sum: bodily and behavioral underpinnings of motion
In 1644, René Descartes, in quest for a primal principle at the root of all knowledge, formulated his notorious cogito ergo sum (I think, therefore I am) 4 . 375 years later, we wish to summarize our findings by reformulating his words: agito ergo sum (I move, hence I am). By this, we mean that the defining aspects of someone (one's bodily features, abilities to interact with the world and ways to respond to the environment around) are reflected, in various and subtle ways, in how one moves during scanning.
As an example of this principle, while subjects from groups 1, 2 and 4 moved less after the first sixth of the recording session, high movers from group 3 moved more. Univariate evaluation following spectral clustering revealed that the latter mostly stood out in terms of anthropometric or fitness measures (Figure 2, inset): weight, height, blood pressure and endurance. Conversely, group 1 subjects stood out by their better cognitive flexibility scores.
Those observations enable to sketch a global picture of how the response to the scanning environment differs across subjects: low movers, who are able to efficiently cope with changes in environmental conditions (for example, by better adjusting to the loud MRI noise fluctuations), rapidly start moving less and maintain overall low head motion throughout scanning. Large movers, on the other hand, are intrinsically more prone to large head motions, possibly because of feeling more cramped inside the scanner, and become increasingly uneasy with the contiguous MRI environment, thus moving more.
A caveat of univariate approaches is the risk that more subtle behavioral or clinical correlates of motion remain undetected. Our follow-up multivariate PLS analysis confirmed this limitation: motion saliences from the most prominent component ( Figure 3A, left panel) were positive across space and time, with an increase following the first session sixth. This means that subjects expressing this component more strongly move more overall, and vice versa, as also confirmed by a strong positive correlation with FD ( Figure 4A). This component thus highlights similar motion features as the ones discriminating mover groups 1 and 3. The array of associated behavioral saliences not only included the dominating anthropometric factors mentioned above, but also showed that larger movers perform worse in working memory, fluid intelligence and spatial orientation scores. Further, they also exhibit worse aggressiveness, inattention and antisocial behavior clinical scores.
Overall, this global pattern is highly reminiscent of a positive-negative mode of population covariation previously described by Smith and colleagues (2015), and put forward as relating behavior, demographics and FC. The similarity may partly come from the fact that the authors resorted to Canonical Correlation Analysis (CCA), a multivariate technique with strong similarities to PLS. Our results raise the possibility that this mode may, at least in part, reflect differences in motion across the considered subjects.
The main PLS component highlights the dominating factor of motion/behavior covariance. On top of it, we also revealed subtler overlapping factors. Component 2 ( Figure 3B) specifically showcased the γ motion seen in group 4: those subjects that express it strongly move a lot along γ, but also move less along other directions (as indicated by the negative signs in Figure 3B, left panel). This is why motion saliences were negative (albeit non significantly) across time bins, and why a negative correlation was found between motion latent weights and FD ( Figure 4B). Component 2 may thus be seen as a positive marker of low motion.
No significant anthropometric associations were detected, but the subjects expressing component 2 more strongly showed sleep disturbances, good cognitive flexibility and spatial orientation performances, as well as greater carefulness in an attention response task (better sensitivity at the cost of larger response times; see Supplementary Material). This was accompanied by a wide scope of elevated clinical scores, including anxiety, somatic problems, aggressiveness, intrusiveness, ADHD and hyper-responsiveness, as well as by marked alcohol consumption.
We conjecture that this component may reflect efforts of the subjects to refrain from moving in the scanner: indeed, head motion along γ reflects yaw, and may highlight attempts at limiting translational displacements along X or Z by forcing the head to remain anchored on the bed. The efforts leading to this typical motion signature may be regulated by the subjects' good cognitive abilities, and were perhaps influenced by their clinical symptoms.
Interestingly, the expression of component 2 also negatively correlated with age, despite considering a relatively narrow age range in the present study (between 26 and 35 years old). Thus, older subjects express this component less, and move more. Since head motion has been a central question in developmental studies, it will be interesting to examine, in future work, whether the characterization of motion along γ rather than through FD may be a better strategy (especially given that component 1, accounting for the global motion effect, showed no significant relationship to age).
As for component 3 (Figure 3B), it contrasted motion along X and α (i.e., more strongly expressing subjects move more along X, but less along α). Positive motion and behavioral latent weights were seen in males, while the opposite was seen for female subjects (Figure 4C), implying that gender may be an underlying cause of that particular motion pattern. α reflects roll, occurring in the plane spanned by the X axis: motions along α and X are thus biophysically constrained to occur concurrently. The differential recruitment of both motions across genders may result from distinct anthropometric factors (larger weight, height and blood pressure in males), or from behavioral/clinical specificities of one of the genders (for example, better self-regulation abilities in males).

Implications, limitations and future perspectives
Our results have strong implications regarding RS fMRI studies: indeed, the observation that a broad array of behavioral and clinical characteristics relate to motion implies that the scope of studies reporting possibly biased findings with regard to clinical or cognitive group-level comparisons is perhaps much wider than envisaged so far. Earlier on, we discussed how the widely used extended subject-level regression designs enable to remove the spatiotemporally complex motion effects introduced here. However, their intricate and overlapping relationships with behavior raise the danger that, akin to including average FD as a covariate in group-level analyses, an unwanted bias with regard to clinical or cognitive analyses occurs at the single-subject level stage.
Assume, for example, that an experimenter is interested in studying sleep quality through assessments of FC at rest. From our results, subjects showing worse sleep quality will exhibit greater motion along the γ axis, in a way that is essentially constant at the time scale of minutes of recording ( Figure 3B). The use of a regressor encoding instantaneous motion changes along γ, as suggested by most for optimal data preprocessing, may result in the removal of a larger signal fraction in individuals with poor sleep quality, possibly leading to the underestimation of the effect of interest. For this reason, we encourage experimenters, in future analyses, to investigate the fitting coefficients obtained upon regression so that it can be verified whether a link exists between the extent of removed signal, and the behavioral feature of interest.
Of course, the exact impact of the regression step will depend on the precise temporal expression of γ motion and of sleep-related fMRI fluctuations, since one fitting coefficient is extracted depending on frame-wise similarities between the considered motion and the voxel-wise fMRI time courses. The obvious next step to perform, and the major limitation of the present analyses, is that we have not yet pushed our exploration to the level of fMRI time courses, but focused on motion estimates only.
Our aim, with this report, was not to design a new efficient motion correction strategy, but to dig into the complexity of motion per se, and by this mean, put forward possible caveats and improvements of existing approaches. Our code and results are fully available at https://c4science.ch/source/MOT_ANA.git, and we encourage the interested researchers to extend our current investigations at the level of the fMRI signal.
A second limitation of our work is that we solely considered motion, although many more factors are known to corrupt the fMRI signal (Biancardi et al. 2009, Birn 2012, Liu 2016. Particularly relevant to the present study is the recent work of Power et al. (2019), who showed that motion time courses from the HCP dataset contain an array of respiratory contributions. Given the impact of blood pressure on some of our components, it seems likely that cardiac or respiratory effects indeed contribute to head motion variability.
It is important to specify that although regression-based approaches are one of the major preprocessing avenues, other motion correction alternatives also exist and may less suffer from possible biases; they include original twists on traditional regression designs (

Conclusion
We demonstrated that head motion in the MR scanner during RS fMRI acquisitions, an infamous confounding factor of this imaging modality, exhibits spatiotemporal structure that is not fully accounted for by motion-correction strategies. Strikingly, one's motion characteristics can inform not only about one's anthropometry, but, more surprisingly, about one's behavior and psychiatric function. We hope that our findings will lead future clinical or cognitive fMRI studies to probe more extensively for the presence of motionrelated artifacts.
TB designed the study, ran the analyses and wrote a first draft of the manuscript. DZ provided the PLS software used in the work, as well as methodological insights. CC and EG suggested extra analytical steps. VK contributed to the cognitive interpretation of the findings. DVDV supervised the work. All authors helped in writing the final manuscript version.       Average z-score Average z-score   ) whether it was included in the analysis or not (first column), the reason if not included (1: not available upon download, 2: considered irrelevant, 3: discarded at Principal Components construction stage, or 4: discarded at Principal Components quality control stage), the score name, the number of analyzed Principal Components containing the score (in parenthesis, total Principal Components number computed from the score, with 0 denoting single scores directly fed to the analysis), and the percentage of missing entries. Color in the background denotes scores that were jointly included in a Probabilistic PCA.

Supplementary Material
This Supplementary Material is subdivided into three sections: in the first one, we give extended explanations on spectral clustering and Partial Least Square analysis, two approaches used in our work. In the second section, we provide details on the exact constituents of each of the 45 behavioral domain scores that we analyzed. In the third section, we assess the robustness of classification results and motion/behavioral saliences to changes in scrubbing settings.

Section 1: Analytical Details
Graph theory, spectral clustering and consensus clustering There exist many ways to cluster a dataset into a subset of distinct groups. In our case, we applied spectral clustering on our data matrix ∈ ℝ !!"!!" . Each of our data points can thus be expressed as a vector ∈ ℝ !"!! , arranged as the rows of X.
Spectral clustering necessitates (1) the definition of a graph summarizing the data at hand, (2) the extraction of meaningful components summarizing the data based on the graph architecture (which can be understood as a dimensionality reduction approach), and (3) classification using the extracted components. We go through those three steps in details below.

Graph definition
Let us consider a graph ≜ ( , ℰ), where is its constituting set of nodes and ℰ the set of edges linking those nodes. We denote by !,! the edge weight between nodes i and j; a larger value indicates a closer similarity between the nodes.
Graphs can be used to represent a wide array of systems, such as transportation networks, metabolic networks or social networks. In neuroscience, a classical approach has been to define nodes as brain regions, and edges as their structural or functional connectivity.
Here, we adopt another approach in which we define the nodes as the 224 subjects considered in our analyses. Edge weights were set using an N-nearest neighbor criterion, in which each node was linked to its N closest neighbors (as quantified from cosine similarity) only. Let ! the N-neighborhood of node i; edge weights were initialized as: ||! ! || ||! ! || is the cosine distance between data points i and j, and ! is the average of all distance between i and its N nearest neighbors. Weights can take values between 0 (infinitely distant/non-neighboring data points) and 1 (identical data points).
Edge weights can be efficiently summarized into the adjacency matrix A of the system.

Dimensionality reduction
We first define the symmetric, positive-definite Laplacian matrix of the system as = − , where D is a diagonal matrix containing nodal degrees. The nodal degree of node i is the sum of incoming edges: ! = !,! !∈ ! . In our analyses, we considered the normalized version of the Laplacian, given by: LN can equivalently be expressed as an eigenvalue decomposition, as ! = Σ ! . In this expression, U is the matrix of eigenvectors (arranged in columns) and Σ is a diagonal matrix containing the associated eigenvalues in its diagonal. We consider sorted eigenvalue/eigenvector pairs in decreasing eigenvalue order.
The first three eigenvectors with non-null eigenvalue happen to be an optimal basis for classification; expressing our 36-dimensional data points in this threedimensional space thus operates as a nonlinear dimensionality reduction approach.

Clustering
To partition the data into clusters, k-means clustering is performed on the 224 x 3 dimensionally reduced dataset. To select the optimal number of clusters, we used consensus clustering (Monti et al. 2003), a subsampling-based assessment of robustness.
In more details, the clustering process was repeatedly run (100 times) over 80% of the data points (i.e., 179 subjects), for cluster numbers k ranging from 2 to 17. For each k, a consensus matrix summarizing how frequently two data points would be clustered together was derived. Since the goal in a good clustering scheme is to either always cluster two data points together, or to never do so, the goal is to find a k for which the proportion of ambiguously clustered pairs (PAC; Șenbabaoğlu et al. 2014), linked to intermediate consensus values, is the lowest. As showed in Figure 1A, k = 4 stood as a clear optimum.

Partial Least Square (PLS) Analysis
PLS is a multivariate approach that enables to extract co-varying components between two types of measures. In what follows, we will be considering the matrix of spatiotemporal motion features ∈ ℝ !" ! !!" , and the matrix of behavioral domain scores ∈ ℝ !" ! !!" .
The goal in PLS analysis is to extract covariance components from the data. To do so, we first consider the covariance matrix between spatiotemporal motion features and behavioral domain scores: This matrix can be equivalently expressed in the form of a singular value decomposition: = Σ ! .
In the above equation, the matrix U contains the left singular vectors of R, arranged in successive columns. Those vectors form an orthonormal basis; i.e., ! • = . The same property applies to the right singular vectors, arranged in the columns of the matrix V. As for Σ, it is a diagonal matrix containing the singular values ! , = 1, . . . ,36 as its diagonal elements. We assume here that singular vectors and singular values are sorted in decreasing singular value order.
The intuition behind this decomposition is that the full covariance between both datasets is expressed as a weighted low-rank approximation: where ui and vi are the i th left and right singular vectors, respectively. Since U and V can both be seen as orthonormal bases, it follows that the strength of expression of the components in the investigated pool of subjects can be simply expressed as a projection: with LY denoting the strength of expression of behavioral domain scores, or behavioral latent weights, across subjects, and LX that of spatiotemporal motion features (called motion latent weights). The i th column of LY or LX contains the weights associated to the i th component, while the j th row contains all weights associated to subject j.

Generation of behavioral domain scores
In this work, we describe the relationships between spatiotemporal motion features and a set of 45 domain scores. Each of those scores is obtained as a weighted combination from original scores provided by the HCP. Below, we describe how we selected the original scores to use, and how we combined them. We also provide an interpretation of each domain scores.

Selection of measures of interest
We chose not include some types of HCP scores into our analysis, because they highlighted family relationships or attributes falling beyond the scope of the present work, or were available in a too limited fraction of subjects. This included: 1. Family relationships between subjects and twin status 2. Psychiatric history of the mother or father 3. Scores reflective of the menstrual cycle (only available in female subjects).
In several cases, Age adjusted and Unadjusted scores are provided. Both were kept and aggregated in further processing steps, as described below.
Because some domains included many more scores than others, and in order to conduct a balanced analysis, we converted the scores of each domain into only one value through Probabilistic Principal Component Analysis (PPCA; Bishop 1999), which enabled, at the same time, to fill in the few missing entries in each case. Some scores were not retained because they appeared irrelevant to us (labeled '2' in Supplementary Table 2; e.g., has blood been sampled?), and others because they were considered too specific (that is, would induce overfitting), or overlapped with others (labeled '3').
Eventually, we only retained the domain scores that were sufficiently accurate, according to criteria proposed by Smith et al. (2015); excluded scores at this stage are labeled '4'. Exclusion criteria were: 1. More than 5 unavailable entries (in the case of domain scores that were derived from only one HCP score and did thus not undergo PPCA) 2. The case max( ) > 100 , with = ( − med( )) , the vector ∈ ℝ !!" ! ! the values for a given score across subjects, and I a unitary vector of appropriate size 3. More than 95% of subjects showing the same score value.
Finally, the remaining domain scores underwent rank-based inverse Gaussian transformation. The final matrix of behavioral information used for the analyses had size 224 x 45 (subjects x domains).

Derivation of composite domain measures from individual HCP scores
Our analyses relate spatiotemporal motion features to 45 components reflective of anthropometric, behavioral or clinical features from the scanned subjects.
We assessed robustness of the clustering outcomes by computing the purity measure (Yang et al. 2012), which takes a value of 1 for perfect concordance of classification, and of 0 if no data point is clustered similarly.
For each set of saliences, we computed (1) the absolute valued Spearman's correlation between the reference and output saliences, and (2)