The representational space of observed actions

Categorizing and understanding other people’s actions is a key human capability. Whereas there exists a growing literature regarding the organization of objects, the representational space underlying the organization of observed actions remains largely unexplored. Here we examined the organizing principles of a large set of actions and the corresponding neural representations. Using multiple regression representational similarity analysis of fMRI data, in which we accounted for variability due to major action components (body parts, scenes, movements, objects, sociality, transitivity) and three control models (distance between observer and actor, number of people, HMAX-C1), we found that the semantic dissimilarity structure was best captured by patterns of activation in the lateral occipitotemporal cortex (LOTC). Together, our results demonstrate that the organization of observed actions in the LOTC resembles the organizing principles used by participants to classify actions behaviorally, in line with the view that this region is crucial for accessing the meaning of actions.

Humans can perform a striking number of different types of actions, from hammering a nail to performing open heart surgery. However, most of what we know about the way we control and recognize actions is based on a rich literature on prehension movements in humans and non-human primates. This literature revealed a widespread network of fronto-parietal regions, with a preference along the dorso-medial stream and the dorso-lateral stream for reaching and grasping movements, respectively (for reviews see e.g. [1–3]). Less is known regarding the organization of more complex actions (for exceptions, see [4–6]). According to which principles are different types of actions organized in the brain, and do these principles help us understand how we are able to tell that two actions, e.g. running and riding a bike, are more similar to each other than two other actions, e.g. riding a bike and reading a book? Are observed actions that we encounter on a regular basis organized according to higher-level semantic categorical distinctions (e.g. between locomotion, object manipulation, and communication actions) and further overarching organizational dimensions? Note that higher-level semantic categories often covary with more basic action-related aspects of perceived action scenes (e.g. locomotion actions often take place outdoors and do not involve objects, communicative actions often involve mouth/lip movements, etc.). Disentangling these levels neurally presents an analytical challenge that has not been addressed so far.

A number of recent studies used multivariate pattern analysis (MVPA) [7] to examine which brain areas are capable of distinguishing between different observed actions (e.g. opening vs closing, slapping vs lifting an object, or cutting vs peeling) [6,8–12].
The general result that emerged from these studies is that it is possible to distinguish between different actions on the basis of patterns of brain activation in the lateral occipitotemporal cortex (LOTC), the inferior parietal lobe (IPL) and the ventral premotor cortex (PMv). In line with this view, LOTC has been shown to contain information about action-related object properties [13]. LOTC and IPL, but not PMv, furthermore generalized across the way in which these actions were performed (e.g. performing the same action with different kinematics), suggesting that these areas represent actions at more general levels and thus possibly encode the meaning of the actions. However, previous studies could not unambiguously determine what kind of information was captured from observed actions: movement trajectories and body postures [6,11,12], certain action precursors at high levels of generality [9] (e.g. object state change), or more complex semantic aspects that go beyond the basic constituents of perceived actions and that represent the meaning of actions at higher integratory levels. In the latter case, the LOTC and the IPL should also reflect the semantic similarity structure of a wide range of actions: running shares more semantic aspects with riding a bike than with reading; therefore, the semantic representations of running and riding a bike should be more similar to each other than to the semantic representation of reading.

To test the prediction that the LOTC and the IPL reflect the semantic similarity structure of a wide range of actions, we carried out a behavioral and an fMRI study in the same group of participants. In the behavioral experiment, we used inverse multidimensional scaling (MDS) [14] to examine the structure of the similarity in meaning (which we will refer to as semantic similarity in the remainder of this paper) of a range of different actions (Figure 1).
Moreover, to control for more basic constituents of actions that often covary with action semantics, we carried out inverse MDS for action-related aspects that typically constitute the basic perceptual cues in naturalistic actions, namely, body parts, scenes, movement kinematics, and objects. In the fMRI study, we examined which brain regions capture the semantic similarity structure determined in the behavioral experiment, using representational similarity analysis (RSA) [15]. Finally, to examine which brain areas capture semantic similarity over and beyond the other action-related aspects we examined, we carried out a multiple-regression RSA for each of the behavioral models while accounting for the remaining models.

Figure 1. […] To increase perceptual variability, actions were shown from different viewpoints, in different scenes, using two different actors (see text for details) and different objects (for actions involving an object). For the full set of stimuli used in the fMRI experiment, see Supplementary Information, Figure S1. (B) Illustration of the behavioral experiment used for inverse multidimensional scaling. In the first trial of the experiment, participants were presented with an array of images arranged on the circumference of a grey circle (left panel). In each subsequent trial, an adaptive algorithm determined a subset of actions that provided optimal evidence for the pairwise dissimilarity estimates (see [14] and Methods for details). In different parts of the experiment, participants were asked to rearrange the images according to their perceived similarity with respect to a specific aspect of the action, namely, their meaning (or semantics), the body part(s) involved, the scene/context in which the action typically takes place, movement kinematics, and objects involved in the action. Right panel: example arrangement resulting from the semantic task of one representative participant.
Using inverse multidimensional scaling, we derived a behavioral model (see Figure 2) from this arrangement, individually for each participant, which we then used in the representational similarity analysis to identify those brain regions that showed a similar representational geometry (for details, see Methods section).

To investigate how similar participants judged the actions to be with regard to their meaning (referred to as semantics in the remainder of this paper) and action-related aspects (body parts, movement kinematics, objects, scenes), we extracted the pairwise Euclidean distances from the participants' inverse MDS arrangements to obtain dissimilarity matrices (DSMs) for each participant and each behavioral model. Figure 2 shows the DSMs for each model averaged across participants. For each model, we found significant inter-observer correlations (all p < 0.0001, surviving false discovery rate correction), i.e., the individual DSMs significantly correlated with the […]

To examine correlations between models, we computed correlations between each pairwise comparison of DSMs, both averaged across participants (Supplementary Information, Figure S2A […] that the optimal number of clusters for the semantic task was six. As can be seen in Figure S4, the first three components account for the largest amount of variance. For ease of visualization, we show the first two components in Figure 3. A visualization of the first three principal components is shown in Figure S5. The analysis revealed clusters related to locomotion (e.g. biking, running), social/communicative actions (e.g. handshaking, talking on the phone), leisure-related actions (e.g. painting, reading), food-related actions (e.g. eating, drinking), and cleaning-related actions (e.g. showering, washing the dishes). Clusters obtained from the remaining models can be found in the Supplementary Information (Figure S6A–D).
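As a rough illustration of how principal components can be read off a group-averaged DSM, the Python sketch below treats each row of the DSM as the feature vector of one action and obtains component scores and explained-variance ratios via an SVD. This is an assumption: the text does not state exactly how the PCA was applied to the behavioral data (the authors used R), and the helper name `pca_on_dsm` and the synthetic data are hypothetical.

```python
import numpy as np


def pca_on_dsm(dsm: np.ndarray, n_components: int = 3):
    """Project actions onto principal components of a dissimilarity matrix.

    Hypothetical helper (assumption): each row of the (n_actions x n_actions)
    DSM is treated as the feature vector of one action. This is not the
    authors' R pipeline.
    """
    centered = dsm - dsm.mean(axis=0, keepdims=True)  # center each column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # action coordinates
    explained = s**2 / np.sum(s**2)                    # variance ratio per component
    return scores, explained[:n_components]


# Toy example: 28 "actions" placed at random 2-D positions (synthetic data).
rng = np.random.default_rng(0)
points = rng.normal(size=(28, 2))
toy_dsm = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
toy_dsm /= toy_dsm.max()  # normalize by the maximum, as described in the Methods
scores, explained = pca_on_dsm(toy_dsm)
print(scores.shape, np.round(explained, 3))
```

Plotting the first two columns of `scores` against each other gives a two-dimensional map of the kind shown in Figure 3; the exact layout of the published figure depends on the authors' own preprocessing.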

To identify neural representations of observed actions that are organized according to semantic similarity, we performed a searchlight correlation-based RSA using the semantic model derived from the behavioral experiment. We thereby targeted brain regions in which the similarity of activation patterns associated with the observed actions matches the participants' individual behavioral semantic similarity arrangement. We identified significant clusters in bilateral LOTC extending ventrally into inferior temporal cortex, bilateral posterior intraparietal sulcus (pIPS), and left inferior frontal gyrus/ventral premotor cortex (Figure 4, Table S2). The RSAs for the remaining models, which focus on action-related aspects, revealed maps that partly resembled those of the semantic model (Figure 5A–D). This was expected, since action-related features covary to some extent with semantic features (e.g. locomotion actions typically take place outdoors, cleaning-related actions involve certain objects, etc.; see also Figure S2). As explained above, an examination of the variance inflation factor indicated that these between-model correlations did not lead to problematic collinearities at the individual level, which allowed us to disentangle the effects of the models using multiple regression. However, given these covariations, it is impossible to determine precisely what kind of information drove the RSA effects in the identified regions on the basis of the correlation-based RSA alone.

Hence, to test which brain areas contain action information in their activity patterns that can be predicted by the semantic model over and above the action-related aspect models, we conducted a multiple regression RSA.
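The variance inflation factor (VIF) check referred to above can be illustrated with a short Python sketch. For each model, the vectorized DSM is regressed (with an intercept) on the other models' DSMs, and VIF_j = 1 / (1 − R_j²). The helper names and the synthetic model DSMs are hypothetical; the text does not report which VIF threshold was used.

```python
import numpy as np


def upper_triangle(m: np.ndarray) -> np.ndarray:
    """Vectorize the off-diagonal upper triangle of a square DSM."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]


def variance_inflation_factors(predictors: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j
    (with an intercept) on all other predictors (columns of `predictors`)."""
    n, k = predictors.shape
    vifs = np.empty(k)
    for j in range(k):
        y = predictors[:, j]
        design = np.column_stack([np.ones(n), np.delete(predictors, j, axis=1)])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        residual = y - design @ beta
        r_squared = 1.0 - residual @ residual / np.sum((y - y.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs


# Synthetic example: five model DSMs for 28 actions (not the study's data).
rng = np.random.default_rng(1)
models = []
for _ in range(5):
    pos = rng.normal(size=(28, 2))
    models.append(np.linalg.norm(pos[:, None] - pos[None], axis=-1))
X = np.column_stack([upper_triangle(m) for m in models])  # 378 action pairs x 5 models
vifs = variance_inflation_factors(X)
print(np.round(vifs, 2))
```

A VIF of 1 means a model DSM is uncorrelated with the others; commonly cited rules of thumb treat values above roughly 5 or 10 as problematic, but that cut-off is a convention rather than a value taken from this study.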
We hypothesized that if actions were organized predominantly according to the action-related aspects examined in the additional models (i.e., the body parts involved, the scene in which they typically occur, the movement kinematics, and the objects on which actions are performed), this analysis should not reveal any remaining clusters. As can be seen in Figure 6, the semantic model explained significant amounts of variance over and above the action-related aspect models in bilateral anterior LOTC at the junction to posterior middle temporal gyrus, right posterior superior temporal sulcus (pSTS), and left pIPS.

Figure S7 shows the results of the multiple regression RSA for the other models (body, scene, movement, and object models). For the body model (Figure S7A), we found clusters in left pSTS that explained significant amounts of variance over and above the remaining models. Likewise, we obtained a cluster in left dorsal middle occipital gyrus for the scene model (Figure S7B). The movement model explained observed variance over and above the other models in clusters partially overlapping with the clusters of the semantic model (Figure S7C). Notably, the clusters in LOTC found for the movement model were more posterior than those of the semantic model (for peak coordinates, see Table S3). By contrast, the clusters in pIPS were more anterior for the movement model than for the semantic model. Finally, we obtained additional clusters in the left posterior fusiform gyrus and anterior middle temporal gyrus/superior temporal sulcus for the object model (Figure S7D).

Note that the obtained RSA results are unlikely to be due to low-level features of the images we used.
First, to minimize the risk that results could be driven by trivial pixel-wise perceptual similarities, we introduced a substantial amount of variability for each of the 28 actions, using different exemplars in which we varied the actor, the viewpoint, the background/scene (e.g. kitchen A vs kitchen B), and the object (for actions that involved one). Second, a control analysis confirmed that there was no significant correlation between semantic and pixel-wise similarity (see Results, Behavioral).

Figure 4. Standard RSA, semantic model. Group results of the searchlight-based RSA using the semantic model (standard RSA, i.e. second-order correlation between neural data and behavioral model). The individual correlation maps were first Fisher transformed and then converted to t scores. Statistical maps only show the positive t values that survived a cluster-based nonparametric permutation analysis with Monte Carlo permutation (cluster statistic: max sum; initial p < 0.001 [16]). After the correction, data were converted to z scores, and only values greater than 1.65 (one-tailed test) were considered significant. This analysis revealed clusters in bilateral LOTC, bilateral IPL and the left precentral gyrus (see Table S2 for details).

[…] Table S2 for details). Black outlines indicate clusters of the correlation-based RSA for the semantic model (Figure 4). The analysis procedures and corrections were the same as for the semantic model (see Methods section and the caption of Figure 4).

[…] drinking, cutting, or getting from one place to the other, and actions that fulfil higher (social belonging, self-fulfillment) needs such as hugging, talking to someone, reading, listening to music). Interestingly, this distinction shows some similarity with Maslow's hierarchy of needs [17]. The third component might capture the degree to which an action is directed towards another person (hugging, holding hands, talking, etc.)
or not (running, swimming, playing video games, reading). Note that we do not claim that the five categories obtained in the current study are the only action categories. We expect that future studies, using an approach similar to the one described here with a wider range of actions, will reveal additional categories and overarching dimensions. However, given that we made an effort to select actions that we encounter on a regular basis, and given the agreement with semantic categories for words describing events, we can rule out the possibility that the categories obtained in the current study are entirely arbitrary.

To identify brain regions that encoded the similarity patterns revealed by the behavioral experiment, we conducted a searchlight-based RSA. We observed a significant correlation between the semantic model and the pattern of fMRI data in regions of the so-called action observation network [18], which broadly includes the LOTC, IPL, premotor and inferior frontal cortex. Using multiple regression RSA, we found that only bilateral LOTC and the left angular gyrus/pIPS contain action information as predicted by the semantic model over and above the remaining models. In line with these results, it has been demonstrated that it is possible to discriminate between observed actions based on the patterns of activation in the LOTC, generalizing across objects and kinematics [8], and at different levels of abstraction [9]. Moreover, the LOTC has been shown to be sensitive to categorical action distinctions such as whether actions are directed toward persons or objects [6]. Interestingly, studies using semantic tasks on actions with verbal stimuli [19] or action classification across videos and written sentences [20] tend to recruit anterior portions of the left LOTC, whereas studies using pictures find LOTC bilaterally and more posteriorly, closer to the cluster in the middle occipital gyrus (MOG) identified in the current study.
Together, these findings suggest that this area captures semantic aspects of actions at higher-order visual levels, whereas anterior portions of the left LOTC might capture these aspects at stimulus-independent or verbal levels of representation (see also [21]).

Another novel finding of our study was that semantic similarity of actions is also encoded in the left angular gyrus/pIPS. Interestingly, an area posterior/ventral to pIPS, at the dorsal part of the left middle occipital gyrus/retinotopic areas V3A and V3B (V3CD), posterior to the transverse occipital sulcus (TOS) [22] and also named occipital place area (OPA) [22,23], was found to represent the similarity of the scenes in which the actions took place. OPA has been shown to respond to local elements of scenes (e.g. furniture, floor) rather than to global scene properties (such as boundaries and spatial expanse, i.e., the gist of a scene) [24]. Moreover, OPA has been demonstrated to be sensitive to the navigational affordances of a scene (i.e., the pathways one may use to navigate through the scene) [25]. This profile is not only important for visually guided navigation and obstacle avoidance, but may also provide critical cues indicating potential interactions with the local environment. Being part of the dorsal "where" pathway, the IPS is known to be sensitive to spatial aspects of actions. Thus, a possible interpretation is that pIPS captures relationships between an action and the environment/scene in which it takes place. In analogy to object affordances, we suggest referring to such scene-based constraints of action possibilities as scene affordances [26] […] have been demonstrated to represent the similarity structure of object categories [29]. Regarding the results of the cluster analysis, salient distinctions at the behavioral and neural level are found between animate and inanimate objects [30,31, e.g. 32], which are further segregated into human and nonhuman objects [33], and manipulable and non-manipulable objects [34], respectively. The division between animate and inanimate objects, supported by neuropsychological, behavioral and neuroimaging findings, has been suggested to have a special status, likely due to evolutionary pressures that favored fast and accurate recognition of animals [32,33,35]. We conjecture that similar evolutionary mechanisms might have produced the distinction between actions belonging to different categories, such as locomotion (which might indicate the approach of an enemy), food-related actions (which might be critical for survival) and communicative actions (critical for survival within a group).

Finally, our data showed that people tended to organize actions in a similar way when judging their meaning and when judging the scenes in which they take place (Figure 3 and Figure S6B). This observation is corroborated by studies investigating scene categorization: the way humans categorize scenes can be predicted best by a model that describes which actions can be carried out in a scene [27]. In line with our findings, this model is also capable of predicting the representational organization of observed scenes in left LOTC, even if scenes only rarely contained persons or salient objects that are associated with specific actions [36]. These results support the idea that there might be a strong interplay between the neural mechanisms involved in processing the scene and the meaning of an action, most likely due to the statistical co-occurrence of certain actions and certain scenes (such as food-related actions and a kitchen) compared to others (such as food-related actions and a garage).

Using a combination of behavioral and fMRI data, we identified a number of meaningful semantic categories according to which participants arrange observed actions. The corresponding similarity structure was captured in the LOTC and the pIPS over and above the major elements of perceived actions (body parts, scenes, movements, and objects), supporting the view that these areas play a critical role in accessing the meaning of actions beyond the mere perceptual processing of action-relevant features. We propose that the LOTC encodes conceptual aspects of actions […]

[…] (e.g. running would be placed closer to each other than walking and drinking) and to press a button once they were satisfied with the arrangement. In each subsequent trial (trial 2 to Np, where Np is the total number of trials for participant p), a subset of stimuli was sampled from the original stimulus set. The subset of actions was defined using an adaptive algorithm that provided the optimal evidence for the pairwise dissimilarity estimates (which are inferred from the 2D arrangement of the items on the screen, see [14] […]

In contrast to previous studies that used a small set of actions, we aimed to cover a wide range of actions that we encounter on a daily basis. To this aim, we initially carried out an online survey using Google Forms to identify actions that a large sample of people consider common. We asked 36 participants (different from those who took part in the fMRI study) to spontaneously write down, within 10 minutes, all the actions that came to mind that they or other people are likely to perform or observe. As expected, participants often used different words to refer to similar meanings (e.g. talking and discussing) or mentioned different specific objects associated with the same action (e.g. drinking coffee and drinking water). Two of the authors (EB, AL) thus assigned these different instances of similar actions to a unique label. Actions were selected if they were mentioned by at least 20% of the participants. In total, we identified 37 actions (see Table S1).

As a next step, we selected the subset of actions that was best suited for the fMRI experiment. Specifically, we aimed to choose a set of actions that were arranged consistently across participants according to their perceived similarities in meaning. To this aim, we retrieved images depicting the 37 actions from the Internet. Using these images, we carried out inverse MDS (see corresponding section and [14] for details) with 15 new participants. Each participant had 20 minutes to complete the arrangement. In three additional 20-minute sessions, participants were furthermore instructed to arrange the actions according to their perceived similarity in terms of the scenes in which these actions typically take place, movement kinematics, and the objects typically associated with these actions.
The order in which these four tasks (semantics, scenes, movements, objects) were administered was counterbalanced across participants. To rule out that the obtained arrangements were driven by the specific exemplars chosen for each action, we repeated the same experiment with a new group of participants (N = 15) and an independent set of 37 images taken from the Internet.
To construct representational dissimilarity matrices (RDMs), we averaged the dissimilarity estimates for each pair of actions (e.g. the dissimilarity between biking and brushing teeth), separately for each participant and each task, across trials. For each participant, we thus constructed dissimilarity matrices based on the Euclidean distance between each pair of actions that resulted from the inverse MDS experiment. The dissimilarity matrices were then normalized by dividing each value by the maximum value of each matrix. Each row of this matrix represented the dissimilarity judgment of one action with respect to every other action. To select the most suitable actions for the fMRI experiment, we aimed to evaluate which of the 37 actions were arranged similarly across participants in the different tasks. To this aim, we carried out a cosine distance analysis, which allowed us to determine, for each action, the agreement across all participants. The cosine distance evaluates the similarity of orientation between two vectors and is defined as one minus the cosine of the angle between two vectors of an inner product space: a cosine distance of 0 indicates that the two vectors have the same orientation (maximum similarity/minimum dissimilarity), whereas a cosine distance of 1 indicates that they are orthogonal (maximum dissimilarity/minimum similarity). In general, the cosine distance can range between 0 and 2; however, because every row of an RDM contains only non-negative values (the dissimilarity scores between one action and every other action, ranging from 0, minimum dissimilarity, to 1, maximum dissimilarity), the cosine distance between two RDM rows can only range between 0 and 1. For each action, we therefore computed the pairwise cosine distances between the corresponding row of each participant's matrix and the same row of every other participant's matrix.
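The cosine-distance agreement measure described above can be sketched in Python as follows. As simplifying assumptions, each participant contributes a single 2-D arrangement (rather than estimates averaged over inverse MDS trials), and the helper names and synthetic layouts are hypothetical.

```python
import numpy as np
from itertools import combinations


def rdm_from_arrangement(xy: np.ndarray) -> np.ndarray:
    """Euclidean distances between item positions, divided by the maximum."""
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    return d / d.max()


def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    """One minus the cosine of the angle between u and v."""
    return 1.0 - float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))


def mean_row_disagreement(rdms: np.ndarray) -> np.ndarray:
    """rdms: (n_participants, n_actions, n_actions).

    For each action, average the cosine distance between that action's RDM row
    over all pairs of participants (0 = identical configuration)."""
    n_sub, n_act, _ = rdms.shape
    result = np.empty(n_act)
    for a in range(n_act):
        pair_dists = [cosine_distance(rdms[i, a], rdms[j, a])
                      for i, j in combinations(range(n_sub), 2)]
        result[a] = np.mean(pair_dists)
    return result


# Synthetic example: 15 participants place 37 actions near a shared layout.
rng = np.random.default_rng(2)
layout = rng.uniform(size=(37, 2))
rdms = np.stack([rdm_from_arrangement(layout + rng.normal(scale=0.05, size=layout.shape))
                 for _ in range(15)])
disagreement = mean_row_disagreement(rdms)
print(np.round(disagreement[:5], 3))
```

Because all RDM entries are non-negative, every value in `disagreement` falls between 0 and 1, matching the range argument given in the text.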
For each action, a small cosine distance between two participants indicates that they agreed on the geometrical configuration of that action with respect to every other action; a value close to 1 indicates disagreement. For each action, we computed the mean across the pairwise cosine distances of all participants in both behavioral pilot experiments and kept only those actions whose cosine distance was within one standard deviation of the averaged cosine distance in all tasks and both stimulus sets. Thirty-two actions fulfilled this criterion, whereas five (getting dressed, cleaning floor, brushing teeth, singing and watering plants) had to be discarded. We also decided to remove two additional actions (grocery shopping and taking the train) because these could not be considered single actions but implied a sequence of actions (e.g. entering the shop, choosing between products, etc.; waiting for the train, getting on the train, sitting on the train, etc.).

At the end of this procedure, we had identified 30 actions that could be used for the next step, which consisted of creating the final stimulus set. To this aim, we took photos of 29 of the 30 actions using a Canon EOS 400D camera. To maximize perceptual variability within each action, and thus to minimize systematic low-level feature differences between actions, we varied the actors (2), the scenes (2) and the perspectives (3), for a total of 2 × 2 × 3 = 12 exemplars per action. Exemplars for the action 'swimming' were collected from the Internet because of the difficulty of taking photos in a public swimming pool. The distance between the camera and the actor was kept constant within each action (across exemplars).
Since some actions consisted of hand-object interactions (such as painting, drinking) and thus required finer details, whereas other actions involved the whole body (such as dancing, running) and thus required a certain minimum distance to be depicted, it was not possible to maintain the same distance across all actions. The two actors were instructed to maintain a neutral facial expression and were always dressed in similar neutral clothing. If an action involved an object, the actor used two different exemplars of the object (e.g. two different bikes for biking) or two different objects (e.g. a sandwich or an apple for eating). Furthermore, some actions required the presence of an additional actor (handshaking, hugging, talking). The brightness of all pictures was adjusted using PhotoPad Image Editor (www.nchsoftware.com/photoeditor/). Pictures were then converted to greyscale and resized to 300 × 400 pixels using FastStone Photo Resizer (www.faststone.org). In addition, we equated the brightness of the images using custom-written Matlab code (mean brightness across all images: 115.80; standard deviation: 0.4723).

To ensure that the final set of pictures was comprehensible and that the pictures were identified as the actions we intended to investigate, we furthermore validated the stimuli through an online survey using Qualtrics and Amazon Mechanical Turk with 30 participants. Specifically, the 30 × 12 = 360 pictures were randomly assigned to three groups of 120 images. Each group was assigned to ten participants, who had to name the actions depicted in the images. For each participant, the images were presented in a random order. Since most of the participants failed to correctly name some of the exemplars of making coffee and switching on lights, these actions were excluded from the stimulus set. Therefore, the final number of actions chosen for the main experiment was 28 (see Table S1).
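The brightness equation step above was done with the authors' own Matlab code, which is not described in detail. The Python sketch below shows one plausible approach, offered as an assumption rather than the authors' method: shift every greyscale image so that its mean intensity matches the grand mean, then round and clip to the 0 to 255 range. Rounding and clipping explain why the resulting means can differ slightly (a small but non-zero standard deviation). The function name and synthetic images are hypothetical.

```python
import numpy as np


def equalize_mean_brightness(images, target=None):
    """Shift each greyscale image so its mean intensity matches `target`.

    Hypothetical sketch (assumption), not the authors' Matlab code.
    images: list of 2-D uint8 arrays (0-255). If target is None, the grand
    mean over all images is used. Values are rounded and clipped to 0-255,
    so the resulting means can deviate slightly from the target.
    """
    as_float = [img.astype(np.float64) for img in images]
    if target is None:
        target = float(np.mean([img.mean() for img in as_float]))
    equalized = [
        np.clip(np.rint(img + (target - img.mean())), 0, 255).astype(np.uint8)
        for img in as_float
    ]
    return equalized, target


# Synthetic example: four 400 x 300 images with different mean brightness.
rng = np.random.default_rng(3)
imgs = [rng.integers(low, low + 100, size=(400, 300), dtype=np.uint8)
        for low in (30, 60, 90, 120)]
out, target = equalize_mean_brightness(imgs)
means = [o.mean() for o in out]
print(round(target, 2), np.round(means, 2), round(float(np.std(means)), 4))
```

A mean shift preserves each image's contrast; matching contrast as well (e.g. equating the standard deviation of pixel values) would be a different, stronger normalization that the text does not mention.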

The fMRI experiment consisted of twelve functional runs and one anatomical sequence halfway through the experiment. Each functional run started and ended with 15 seconds of fixation. Between runs, participants could rest. Stimuli were back-projected onto a screen (60 Hz frame rate) via a projector (Sanyo, PLC-XP-100L) and viewed through a mirror mounted on the head coil (distance between mirror and eyes: about 12 cm). The background of the screen was uniform grey. Stimulus presentation and response collection were controlled using ASF [37] […]

To ensure that participants paid attention to the stimuli, we included seven (out of 63; 11.1%) catch trials in each run, which consisted of the presentation of an image depicting the same action (but not the same exemplar) as the action presented in trial N−1 (e.g. eating an apple, actor A, scene A, followed by eating a sandwich, actor B, scene B). Participants were instructed to press a button with their right index finger whenever they detected an action repetition. Within the entire experimental session, all 28 actions could serve as catch trials; each action was selected randomly without replacement, such that the same action could not be used as a catch trial twice within the same run. After a set of four runs, each of the 28 actions had been used as a catch trial once, and the selection process started from scratch. Catch trials were discarded from the multivariate data analysis.

Before entering the scanner, participants received written instructions about their task and familiarized themselves with the stimulus material for a couple of minutes. Next, participants carried out a short practice run to ensure that they properly understood the task.
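The catch-trial selection rule above (28 actions, seven catch trials per run, sampling without replacement so that every action is used once per cycle of four runs) can be sketched in Python as follows. The function names are hypothetical, and picking a different exemplar index is a simplification: the text specifies only that the catch exemplar differs from the one shown in trial N−1, as in the example with a different actor and scene.

```python
import numpy as np

N_ACTIONS = 28
N_RUNS = 12
CATCH_PER_RUN = 7
RUNS_PER_CYCLE = N_ACTIONS // CATCH_PER_RUN  # 4 runs exhaust all 28 actions


def catch_trial_schedule(rng):
    """Return one list of catch-trial action indices per run (hypothetical helper).

    Within each cycle of 4 runs, the 28 actions are drawn without replacement,
    so no action repeats within a run and every action is used once per cycle.
    """
    schedule = []
    for _ in range(N_RUNS // RUNS_PER_CYCLE):
        order = rng.permutation(N_ACTIONS)  # random order, no replacement
        schedule.extend(order.reshape(RUNS_PER_CYCLE, CATCH_PER_RUN).tolist())
    return schedule


def catch_exemplar(previous_exemplar, rng, n_exemplars=12):
    """Pick an exemplar of the same action other than the one shown in trial N-1.

    Simplification (assumption): any other exemplar index is allowed.
    """
    options = [e for e in range(n_exemplars) if e != previous_exemplar]
    return int(rng.choice(options))


rng = np.random.default_rng(4)
schedule = catch_trial_schedule(rng)
print(schedule[0])
print(catch_exemplar(previous_exemplar=5, rng=rng))
```

Drawing one random permutation per cycle and splitting it into four chunks of seven enforces both constraints at once, so no rejection sampling is needed.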
Behavioral experiment and model selection

Following the fMRI experiment, either on the same or the next day, participants took part in an additional behavioral experiment in which they carried out an inverse MDS task using procedures similar to those described above (see Methods section and Figure 1B), with the same actions that were used during the fMRI experiment. In separate blocks of the experiment, participants were asked to arrange the actions according to their perceived similarity in terms of (a) meaning, (b) the body part(s) involved, (c) scene, (d) movement kinematics, and (e) the objects involved. The order of blocks was counterbalanced across participants.

To construct dissimilarity matrices for each participant and task (a–e), we used the same procedure described in the section Stimulus selection, i.e. we determined the Euclidean distance between each pair of actions that resulted from inverse MDS and normalized the dissimilarity matrices by dividing each value by the maximum value of each matrix. Individual dissimilarity matrices were used as models for the representational similarity analysis of fMRI data (see next section).

To further characterize the structure that emerged from the inverse MDS, we used principal component analysis (PCA) as implemented in the R package cluster to identify the principal components along which the actions were organized. To characterize the observed clusters, we furthermore used the K-means clustering method [44]. The K-means method requires the number of clusters as an input, which was one of the parameters we wished to estimate from the data. To this aim, we used the Silhouette method [45] as implemented in the R package factoextra to estimate the optimal number of clusters. Specifically, for each candidate number of clusters, this method computes the average silhouette width (how much closer each item is, on average, to the members of its own cluster than to the members of the nearest other cluster) and selects the number of clusters that maximizes it.

Representational Similarity Analysis (RSA)

The aim of the RSA was to identify those brain regions that were best explained by the models obtained behaviorally and thus to infer the representational geometry that these areas encoded. We therefore conducted an RSA over the entire cortical surface using a searchlight approach [46] in individual brain space. Each searchlight consisted of 100 features (1 central vertex + 99 neighbors). All multivariate analyses were carried out using custom-written Matlab functions and CoSMoMVPA [41].
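The K-means clustering with Silhouette-based selection of the number of clusters, described above, can be illustrated with a self-contained NumPy sketch. This swaps in a minimal Lloyd's K-means and a hand-written silhouette in place of the R packages the authors used. Which coordinates were clustered (e.g. DSM rows or principal component scores) is not specified, so the input here is simply a hypothetical matrix of action coordinates.

```python
import numpy as np


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Plain Lloyd's K-means with random restarts; returns labels of the best run."""
    rng = np.random.default_rng(seed)

    def assign(centers):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        return sq.argmin(axis=1)

    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            labels = assign(centers)
            new_centers = np.array([
                X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                for c in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        labels = assign(centers)
        inertia = ((X - centers[labels]) ** 2).sum()  # within-cluster sum of squares
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def mean_silhouette(X, labels):
    """Average silhouette width, s(i) = (b - a) / max(a, b)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # singleton cluster: silhouette set to 0 by convention
        a = dist[i, own].sum() / (own.sum() - 1)  # mean distance to own cluster, self excluded
        b = min(dist[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


def choose_k(X, k_values=range(2, 9)):
    """Return the k with the largest average silhouette width, plus all widths."""
    widths = {k: mean_silhouette(X, kmeans(X, k)) for k in k_values}
    return max(widths, key=widths.get), widths


# Synthetic example: three well-separated groups of 10 points each.
rng = np.random.default_rng(5)
group_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(10, 2)) for c in group_centers])
best_k, widths = choose_k(X)
print(best_k)
```

For real analyses, library implementations (e.g. the R packages named in the text) are preferable; the sketch only makes the selection criterion explicit.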
For the multivariate analysis, the design matrix consisted of 142 volumes × 28 predictors of interest (one per action) plus nuisance predictors consisting of the catch trials, the parameters resulting from motion correction, and a constant term. Thus, for each participant, we obtained 28 beta maps at the volume level. Before the analysis, the beta maps were averaged across runs and normalized (z-transformed) across the voxels of each searchlight. We adopted two approaches for the representational similarity analysis: a standard and a multiple regression RSA approach.

Standard RSA. First, to identify clusters in which the neural data reflected the dissimilarity patterns derived from the different behavioral tasks, we conducted a standard RSA in which we correlated the dissimilarity matrix (DSM) derived from the neural data of each searchlight with the normalized DSMs of each of the five behavioral models. The correlation values were assigned to the central node of each searchlight, yielding a correlation map for each of the five models, separately for each participant. The correlation maps of this first-level analysis were then resampled to the common space and Fisher-transformed to normalize their distribution across participants for the second-level analysis. Specifically, for each of the five models, the correlation maps of all N participants were tested against zero using a t-test at each vertex. The resulting t maps were corrected for multiple comparisons using a cluster-based nonparametric Monte Carlo permutation analysis (5000 iterations; initial threshold p < 0.001 [16]).

Multiple regression RSA. To identify clusters in which the neural DSM correlated with a given behavioral DSM while accounting for the remaining behavioral DSMs, we conducted a multiple regression analysis at each searchlight in which the DSMs derived from the five models served as regressors of interest, to evaluate which predictor best explained the observed variance. This analysis provided beta maps for each predictor (i.e., behavioral model), which were then entered into a second-level (group) analysis testing the individual beta maps against zero. The procedure for multiple comparisons correction was the same as described in the previous paragraph.

Behavioral and fMRI data, materials and code used in the current study are available from the corresponding author upon request.
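The per-searchlight computations behind the standard and multiple regression RSA can likewise be sketched. This is a hypothetical pure-Python illustration, not the CoSMoMVPA implementation: the neural DSM is assumed to use correlation distance (1 - Pearson r), which the text does not specify; model and neural DSMs are compared by Pearson correlation of their lower triangles (also an assumption); the regression is ordinary least squares with an intercept; and the group-level step is shown only as a one-sample t statistic, without the cluster-based permutation correction. All function names are illustrative.

```python
import math


def zscore(values):
    """Z-transform a vector (population SD); assumes the SD is non-zero."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def neural_dsm(patterns):
    """Correlation-distance DSM (assumed: 1 - r) between action patterns.

    patterns: one list of searchlight feature values per action (e.g. 28 x 100),
    each z-scored across features as described in the text.
    """
    z = [zscore(p) for p in patterns]
    return [[1 - pearson(a, b) for b in z] for a in z]


def lower_triangle(dsm):
    """Entries below the diagonal, i.e. each pair of actions once."""
    return [dsm[i][j] for i in range(len(dsm)) for j in range(i)]


def standard_rsa(neural, model):
    """Fisher-transformed Pearson correlation between neural and model DSMs."""
    return math.atanh(pearson(lower_triangle(neural), lower_triangle(model)))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def regression_rsa(neural, models):
    """OLS betas [intercept, model_1, ..., model_k] for the neural DSM."""
    y = lower_triangle(neural)
    cols = [[1.0] * len(y)] + [lower_triangle(m) for m in models]
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


def one_sample_t(values):
    """Group-level t statistic of per-participant values against zero."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean / (sd / math.sqrt(n))
```

Because correlation distance is itself invariant to z-scoring each pattern, the explicit normalization only changes the result for other distance measures (e.g. Euclidean); it is kept here to mirror the procedure described above. In the regression variant, each beta reflects the unique contribution of one behavioral DSM once the others are accounted for.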