Different kinds of simulation during literary reading: Insights from a combined fMRI and eye-tracking study

Mental simulation is an important aspect of narrative reading. In a previous study, we found that gaze durations are differentially impacted by different kinds of mental simulation. Motor simulation, perceptual simulation, and mentalizing as elicited by literary short stories influenced eye movements in distinguishable ways (Mak & Willems, 2019). In the current study, we investigated the existence of a common neural locus for these different kinds of simulation. We additionally investigated whether individual differences during reading, as indexed by the eye movements, are reflected in domain-specific activations in the brain. We found a variety of brain areas activated by simulation-eliciting content, both modality-specific brain areas and a general simulation area. Individual variation in percent signal change in activated areas was related to measures of story appreciation as well as personal characteristics (i.e., transportability, perspective taking). Taken together, these findings suggest that mental simulation is supported by both domain-specific processes grounded in previous experiences, and by the neural mechanisms that underlie higher-order language processing (e.g., situation model building, event indexing, integration).


Introduction
Many readers experience mental simulation while they read. Mental simulation has been defined as, "the reenactment of perceptual, motor, and introspective states acquired during experience with the world, body, and mind" (Barsalou, 2008, p. 618). This definition implies the presence of different kinds of simulation: perceptual simulation, motor simulation, and the simulation of introspective states (more commonly called "mentalizing"). Research has shown that, indeed, there is evidence for a difference between these three kinds of simulation in language processing. For example, in a study in which participants listened to audio-narratives, Nijhof and Willems (2015) found that motor simulation and mentalizing activated different brain areas. Moreover, the second part of Barsalou's definition states that simulation is "the reenactment of ... states acquired during experience with the world, body and mind". As each person's experiences with the world, body and mind are different from everyone else's experiences, this grounding of mental simulation in previous experiences implies that individuals will experience mental simulation in different ways. Indeed, individual differences in the connectivity between areas in both the visual and motor cortices were found to be related to personal experience in a study which looked at the relationship between the understanding of narrative events (i.e., actions, visual descriptions of scenes) and functional connectivity in the brain (Chow et al., 2015). Additionally, in the aforementioned study by Nijhof and Willems (2015), there was a negative correlation between the levels of activation in motor areas and mentalizing areas, indicating that some readers are more prone to motor simulation as opposed to mentalizing, and the other way around.
In a previous eye-tracking study we found that the three different kinds of mental simulation as described by Barsalou indeed have qualitatively different effects on eye movements during reading (Mak & Willems, 2019). Here we ask whether these different kinds of simulation share an overlapping neural locus, and/or are domain-specific processes. We examine if and how mental simulation during literary reading is visible on the neural level and if this differs between different kinds of simulation. Secondly, we attempt to find out whether there are individual differences in brain activation levels in the brain regions implicated in simulation.
Before explaining the hypotheses for the current study, we will first discuss what is known about eye movements and brain activation during reading.

1.1.
Eye movements and brain activation during reading A lot of research into eye movements during reading has shown that eye tracking can measure attention (e.g., to words, passages), and processing speed during reading (e.g., Rayner, 1998Rayner, , 2009. For example, increased attention to certain words or passages is associated with longer gaze duration. This can be seen in important text features such as the lexical frequency of words or word length. Words with a high lexical frequency, are words that occur often in day-to-day language. Because they occur frequently, these words are easier to recognize and often more predictable, warrant less attention, and are easier to process than words that infrequently occur in language. Similarly, shorter words are easier to process and thus warrant less attention than longer words. Indeed, highly frequent words and shorter words both are associated with shorter gaze durations (e.g., shorter reading times for frequent words; Rayner, 1998). Moreover, reading speed is not only related to the characteristics of these words themselves, but also relies on the context in which a word occurs. If a word is highly predictable within its context (regardless of its frequency), this word is easier to process (and warrants less attention) than unlikely words, and is indeed associated with shorter gaze durations (Goodkind & Bicknell, 2018;Hale, 2001;Levy, 2008). The predictability of a word within its context will be referred to as surprisal for the remainder of this paper (Lopopolo, Frank, van den Bosch, & Willems, 2017). When looking at simulation-inviting language in stories, this could be associated with reading speed in a similar manner. If simulation is associated with an increased processing load (because it takes time to link the meaning of words to a mental picture, for example), processing speedand therefore reading speed -will decrease. Likewise, if passages that invite simulation attract more attention than passages that don't invite simulation, this will result in longer or more frequent fixations on those passages. However, if simulation aids in processing (for example if the mental picture formed based on the previous text renders an upcoming word more predictable), reading speed will increase.
Apart from eye movements, brain activation has also been found to be informative in the study of reading. When people are reading, they are integrating single words into sentences and binding these sentences together. There are multiple brain networks involved in the execution of this process (Hagoort, 2019). These do not only include the perisylvian areas historically linked to language processing (i.e., Broca's and Wernicke's areas), but many other parts of the temporal and parietal and frontal cortex play an important role in language processing too (Hagoort, 2019). For example, more areas of the Left Inferior Frontal cortex are involved in language than just Broca's area (BA 44 and 45). Finally, not only cortical areas have been found to be involved in language processing, but also subcortical areas such as the thalamus and basal ganglia, as well as parts of the cerebellum (Hagoort, 2019). Activity in these brain areas during reading are modulated by word characteristics, such as word length, frequency and surprisal (Desai, Choi, & Henderson, 2020;Schuster, Hawelka, Hutzler, Kronbichler, & Richlan, 2016;Shain, Blank, van Schijndel, Schuler, & Fedorenko, 2020).
Reading stories does not only involve brain activation in parts of the brain associated with language processing per se. When people are reading stories, they are also often making inferences about a character's thoughts and behavior, or are incorporating new parts of a story into their situation model of the story (Zwaan, 2009). Pragmatic inferencing has been linked to areas within the Theory of Mind network (such as the temporoparietal junction, medial prefrontal cortex), whereas the integration of utterances into the situation model has been linked to the inferior frontal gyrus and angular gyrus (Hagoort, 2019).
The networks described so far are related to the mental processing of language and its integration into a wider context and a wider knowledge about the world. However, reading is also a behavior in a more physical sense. When people are reading, they are making eye movements, and these eye movements are accompanied by brain activation. Therefore, it is important to point out what areas of the brain are involved in the execution of eye movements, even in the absence of congruent language. When comparing word reading with pseudoword reading (under the assumption that both require eye movements, but only word reading requires language processing), Richlan et al. (2014) found that eye movements in general activate mainly bilateral occipito-parietal brain areas, whereas word reading (or, language processing) also activates language processing networks. Similarly, Choi, Desai, and Henderson (2014) found that eye movements were associated with activity in an eye movement network including the frontal eye fields, supplementary eye fields and intraparietal sulci, as well as an attentional network including the cingulate and parietal cortex.

1.2.
Differences between motor, perceptual and mental simulation: eye movements In our previous eye tracking study (Mak & Willems, 2019) we found differences in reading behavior between perceptual simulation, motor simulation, and mentalizing. Reading behavior was studied in an eye-tracking experiment in which participants read literary short stories. It was found that motor simulation reduced gaze duration (faster reading), whereas perceptual simulation and mentalizing increased gaze duration (slower reading). Additionally, individual differences in the effect of simulation on gaze duration were found, which were most striking in the case of mentalizing: although on average mentalizing increased gaze duration, there was a sizeable number of participants for whom mentalizing actually decreased gaze duration at the individual level. These individual differences in simulation were related to aspects of story world absorption and story appreciation. For example, the more attention someone paid to the story, the less their gaze behavior was affected by mental simulation. In contrast, the higher someone's emotional response to the story, the more their gaze behavior was affected by mental simulation.
These findings show that different kinds of mental simulation during narrative reading exist, and that people differ in how much they engage in either kind of simulation. In the current paper, we investigate which brain areas are sensitive to mental simulation and how the strength of activation in these brain areas is associated with measures of subjective reading experiences such as absorption and appreciation. In doing so, we aim to replicate and extend the findings of our previous paper (Mak & Willems, 2019) in the context of a combined eye-tracking and fMRI experiment.

Hypotheses
The changes in gaze duration in reaction to the different kinds of simulation found by Mak and Willems (2019) gave rise to the question what these changes in gaze duration reflect on a neural level. There are three possible answers to this question. The first possibility is that the different types of simulation are all represented in the same brain area, for example an area involved in the general process of constructing a coherent representation of the content of narratives, or situation model building (e.g., Martín-Loeches, Casado, Hern andez-Tamames, & Alvarez-Linera, 2008; see also Smirnov et al., 2014). Candidate areas for such a process would be the posterior cingulate cortex, anterior cingulate cortex, angular gyrus, precuneus, insula, dorsolateral and medial prefrontal cortex, and the superior frontal gyrus (with the first three being found to be involved in situation model building in multiple studies; see Hartung Kurby & Zacks, 2008;Speer, Reynolds, Swallow, & Zacks, 2009;Speer, Zacks, & Reynolds, 2007).
A final possibility would be that we see both a common "situation-model building" simulation area, and modalityspecific brain areas that respond to one specific kind of simulation. Evidence for both options has been found before (as has been reported above), but to our knowledge no previous studies have investigated the possibility of both common simulation areas and modality specific brain areas for all three kinds of simulation.

1.4.
Individual differences in the effect of simulation A second, more exploratory, question that we would like to answer in this study, concerns possible variation between participants in the strength of the brain activation associated with simulation. Individual differences in the relationship between simulation and gaze behavior were found to be related to differences in subjective reading experiences, notably appreciation and absorption (Mak & Willems, 2019). If there is a common mechanism by which simulation is associated with subjective experiences, a similar result may be found for neuroimaging data. For example, individual differences in the strength of activation in simulationsensitive brain areas may be correlated with individual differences in subjective reading experiences. In the current experiment, we will therefore analyze if individual differences in the strength of the brain activation associated with simulation (operationalized as the individual percent signal change in areas associated with simulation) are associated with subjective reading experiences (story world absorption, story appreciation), reading habits, and certain personal characteristics (i.e., empathy and transportability). In this context, story world absorption refers to an experiential state c o r t e x 1 6 2 ( 2 0 2 3 ) 1 1 5 e1 3 5 in which readers are feeling as if they are "lost in a story" (Kuijpers, 2014).

Current experiment
In the current experiment, participants read two Dutch literary short stories while they were in an MRI scanner and simultaneously had their eye-movements tracked. This allowed us to link eye-tracking data to neuroimaging data, within participants. We were interested in the responses of participants at the word level: we measured fixation duration and brain activation as a response to the number of times the fixated words were underlined for being part of motor descriptions, perceptual descriptions, and mental event descriptions. The scoring (underlinings) of motor, perceptual, and mental event content was acquired in a separate pretest with different participants. The stories were presented visually, and participants were asked to read the stories at their own pace. After reading the two stories, participants performed four localizer tasks and (while still in the scanner) completed questionnaires regarding their experience related to the story they just read (story world absorption, story appreciation). After scanning, questionnaires regarding reading habits in daily life (directly and indirectly measured), and personal characteristics questionnaires regarding empathy and transportability were administered.

Methods
We report how we determined our sample size, all data exclusions, all inclusion/exclusion criteria, whether inclusion/ exclusion criteria were established prior to data analysis, all manipulations, and all measures in the study.

Pretest
In a pretest described in detail in Mak and Willems (2019), all words in the two stories used in the current experiment were rated by 30 participants on whether these words were part of a motor description, a perceptual description, or a mental event description. A total of 90 participants took part in the pretest: each type of description was rated by a different group of 30 participants. Motor descriptions are defined as "concrete acts or actions performed by a person or object," such as "They reached the bus-stop shelter" (Story B: Symbols & Signs). Perceptual descriptions are "things that are perceivable with the senses," such as "A tiny unfledged bird" (Story B: Symbols & Signs). Mental event descriptions are "explicit descriptions of the thoughts, feelings and opinions of a character" and/or "reflection[s] by a character on his own or someone else's thoughts, feelings or behaviour". For example, "She thought of the recurrent waves of pain" (Story B: Symbols & Signs). It was counted how many participants underlined each word for each of the three types of description. This resulted in scores ranging from 0 to 30 per word, per type of description. These scores were taken as regressors for the fMRI and eye-tracking data analyses. The number of descriptive words per story per type of description can be seen in Fig. 1. There was no clear association between the ratings per word for motor, perceptual and mental event descriptions on the one hand, and other word characteristics on the other hand (i.e., lexical frequency, word length, surprisal; see Appendix A). The rationale for using these ratings instead of, for example, existing ratings of concreteness (which is highly correlated with imageability; Brysbaert, Stevens, De Deyne, Voorspoels, & Storms, 2014) or sensory modality (Speed & Brysbaert, 2022), is that the ratings obtained in our pretest take the context of the stories into Fig. 1 e Number of times words were underlined for mental event, motor and perceptual descriptions in stories (A) The people who had everything delivered and (B) Symbols and Signs. Note: in story A, 1562 words, 659 words and 1070 words were underlined by none of the pre-test participants for mental event, motor and perceptual descriptions, respectively. In story B, 646, 586 and 788 words were underlined by none of the pre-test participants for mental event, motor and perceptual descriptions, respectively. c o r t e x 1 6 2 ( 2 0 2 3 ) 1 1 5 e1 3 5 account. Since we were interested in mental simulation during the reading of complete literary short stories (a contextualized process; see Willems & Peelen, 2021), we believed it fitting to use contextualized ratings.

Participants
Forty participants took part in the current experiment (16 male). Participants were between 18 and 43 years old (M ¼ 24.61, SD ¼ 5.22). A power analysis based on Mak and Willems (2019) showed that power would be above .8 to capture small effects, in a study with 40 participants, reading two stories, with minimally 400 descriptive words per story per type of description (power analysis based on Jobe, 2009). We tested participants from the participant database of the Radboud University. Participants had no dyslexia, and had normal vision or vision correction of maximally þ4 or À4 (vision correction in the scanner was done with contact lenses or MR compatible glasses that were attached to the head coil). Other exclusion criteria were epilepsy, claustrophobia, pregnancy, brain surgery, or non-removable metal in or on the body. All inclusion and exclusion criteria were established prior to data collection. Participants gave informed consent prior to the study and were allowed to withdraw their consent at any point throughout the experiment, in accordance with the declaration of Helsinki. This experiment was approved under the ethical approval of the ethical committee CMO Arnhem/ Nijmegen (CMO 2014/288; version 2).

Stories
Two existing Dutch short stories were presented to the participants (also used in Mak & Willems, 2019). Story selection was based on the length of the stories, the presence of descriptive content, and the probability that the stories were unknown to the participants (in the study of Mak & Willems, none of the participants reported having read one of these stories before  Nabokov (translation in: Nabokov, 2003). The stories are 2988 and 2143 words long, respectively, and take around 10 min to read (Story A: M ¼ 10.08, SD ¼ 3.01; Story B: M ¼ 9.70, SD ¼ 2.94). All participants read both stories, in counterbalanced order. Legal copyright restrictions prevent public archiving of the stories used in this experiment which can be obtained from the copyright holders in the cited references.

Questionnaires
After reading the stories, participants completed a set of questionnaires. The questionnaires measuring reading experiences were filled out twice, directly after reading each story. The rest of the questionnaires were completed at the end of the experiment (see also Procedure below).  Kuijpers et al., 2014) more specifically aimed at measuring the experience of different kinds of simulation (mainly perceptual and motor simulation, e.g., I could see the events in the story happening as if I could see through the eyes of the main character; I could easily depicture the characters in the story). The SWAS is a validated scale consisting of 18 items with high internal validity , which measures 4 dimensions of story world absorption via the subscales Attention, Transportation, Emotional Engagement and Mental Imagery. Participants rate each question on a 7point scale (1 ¼ disagree, 7 ¼ agree). Legal copyright restrictions prevent public archiving of the SWAS which can be obtained from the copyright holders in the cited references. Story appreciation was measured with a questionnaire consisting of a general score of story liking (How did you like the story; 1 ¼ It was very bad, 7 ¼ It was very good) and thirteen adjectives (e.g., [did you find the story] Entertaining, … Ominous) that can be used to describe the stories (adapted from Knoop, Wagner, Jacobsen, & Menninghaus, 2016). These adjectives are taken from a list of adjectives that were found to be most often used by people to describe their opinion of poetry, and which can also be used to describe aesthetic appeal in the domain of literature (Knoop et al., 2016;Mak, Faber, & Willems, under review). Finally, 6 questions are asked regarding the enjoyment of the story (from Kuijpers et al., 2014; e.g., I was constantly curious about how the story would end; I thought the story was written well). Participants rate both the adjectives and the questions regarding enjoyment on a 7point scale (1 ¼ disagree, 7 ¼ agree). The appreciation questionnaire can be found on https://osf.io/9rwqn/. Both the SWAS and the appreciation questionnaire were also used in the previous eye-tracking experiment. ) on a 7-point scale (e.g., Becoming extremely involved in a good book or movie is somewhat rare for me; When I'm upset at someone, I usually try to "put myself in his shoes" for a while). The Fantasy subscale measures the extent to which someone tends to get mentally involved in the stories they encounter, to the point at which they imagine themselves being part of the story (transportability). The Perspective Taking subscale measures the extent to which someone is able to take someone else's perspective in daily life. Legal copyright restrictions prevent public archiving of the IRI which can be obtained from the copyright holders in the cited references. We do not have legal permission to publicly archive all materials used in this study. Readers seeking access to the materials should contact the first author.

Procedure
Participants first read the two stories in the MRI scanner, while their eye movements were being tracked. Stories were presented in counterbalanced order. Participants were instructed to read the stories the way they would also read for their own leisure. There was no additional task, and participants were able to proceed through the stories at their own pace. To proceed through the pages in the story, participants pressed a button with their right index finger when they finished reading a page. Both stories were divided into 30 pages. After each story, participants were allowed to take a short break from reading, to fill in the SWAS and appreciation questionnaire about the story they just read (while remaining inside the MRI scanner). After reading the two stories, participants performed four localizer tasks (see Appendix B). Prior to this experiment, it was unknown whether individual differences in reading behavior would be detectable at the whole brain level. We therefore included the localizer tasks (now described in Appendix B) to have the opportunity to obtain functional regions of interest (ROIs). One important downside of these localizer tasks, however, was that they contain decontextualized stimuli, whereas our experiment and research question were specifically aimed at the neural processes that underlie naturalistic, contextualized reading (see Willems & Peelen, 2021 for a review about the differences in processing in response to contextualized versus decontextualized stimuli). Given that our whole brain analysis yielded functional ROIs that could be interrogated further, data obtained using the localizer tasks were not used. After the localizer tasks, participants left the MRI scanner, and completed the final questionnaires in a separate booth. First, participants answered the comprehension check questions about the two stories, after which the questions regarding reading habits and personal characteristics were asked.
No part of the study procedures was pre-registered prior to the research being conducted.

Stimulus presentation
Stimuli were presented page by page on a projection screen (http://www.macada-innovision.nl) at the end of the bore, using a EIKI LC -XL100 beamer with a native resolution of 1024 Â 768, with Presentation software (NBS, Berkeley, California). Participants could view the screen via a mirror (https://www.pgo-online.com/intl/katalog/cold-mirrors.html) mounted on the head coil. Pages consisted of maximally eight triple spaced lines. The distance between the mirror (110 Â 100 mm) and the projection screen (369 Â 277 mm) was 855 mm, and the distance between the mirror and the eye about 100 mm (depending on how high a participant's head lies in the head coil).

Eye movement data acquisition and pre-processing
An MR compatible ceiling mounted Eyelink 1000 eye tracker (SR Research, Ottawa, Canada), with a sampling rate of 1000 Hz was used for eye movement data acquisition during scanning. The eye tracker records infrared light reflected by the eyes, via a mirror attached to the head coil. The eye tracker was calibrated and calibration was validated before the presentation of each story. Using SR Research's Eyelink Data Viewer, all fixations were checked before data analysis, and, if necessary, manually aligned. If this was impossible, because data were too noisy, data were excluded on a page-by-page basis. If too many pages had to be excluded within a participant, all data for this participant were excluded. This resulted in the exclusion of all data for three participants, the exclusion of one story for five participants (for one of these participants this was due to tracker malfunction rather than poor data quality), and in the exclusion of one to five pages in six participants (14 pages in total in these six participants). This amounts to a total of 14.33% of data loss based on eye tracking issues. After preprocessing, data for 37 participants were retained (full data for 27 participants, rejection of one story for five participants, rejection of a small portion of data for another five participants).
If entire story readings (or entire participants) needed to be removed due to poor eye tracking quality, the fMRI data for these story readings were also discarded (as we needed the eye tracking data to be able to analyze the fMRI data). In the cases where only one to five pages of eye tracking data needed to be removed, we did not discard the fMRI data for these participants. After preprocessing, we were able to use all fMRI data (two stories) for 32 participants, and fMRI data for one story for five participants. To be able to still analyze the fMRI data for the pages of which eye tracking data were discarded in the five participants for whom a small portion of the eye c o r t e x 1 6 2 ( 2 0 2 3 ) 1 1 5 e1 3 5 tracking data was rejected, we needed to impute the eye fixations for these data. To stay as close as possible to the participant's natural reading behavior, we modelled the onset and duration of the fixations, but imputed the mean value of the word characteristics we wanted to model as the weights of these fixations. This way the discarded part of the data would have no influence on the results of our analyses.

2.7.
fMRI data acquisition and pre-processing Preprocessing was carried out using FEAT (version 6.00) in FSL (FMRIB's Software Library, www.fmrib.ox.ac.uk/fsl). The first ten or eleven volumes (ten or 11 s) were discarded (depending on the task programming) to allow for magnetic field saturation. Using FLIRT (Jenkinson, Bannister, Brady, & Smith, 2002;Jenkinson & Smith, 2001), functional images were registered to the high resolution structural images (using Rigid-Body Transformation (6 DOF) and Boundary-Based Registration (Greve & Fischl, 2009)) after non-brain tissue was removed using BET (Smith, 2002). Motion correction was performed using MCFLIRT (Jenkinson et al., 2002), and values for the framewise displacement (average of rotation and translation parameter differences, using weighted scaling; Power, Barnes, Snyder, Schlaggar, & Petersen, 2012) as calculated using FSLMotionOutliers were saved as a confound EV for the first level analyses. High resolution structural images were registered to standard (MNI152 template, 2 Â 2x2mm) space using FNIRT (Andersson, Jenkinson, & Smith, 2007a;2007b) nonlinear registration (!12 DOF). Spatial smoothing was performed using SUSAN noise reduction (Smith & Brady, 1997) with a 5 mm FWHM Gaussian kernel. Grand-mean intensity normalisation of the entire 4D dataset was done by a single multiplicative factor. High pass temporal filtering was applied using Gaussian-weighted least-squares straight line fitting, with sigma ¼ 45.0 s. fMRI data preprocessing resulted in the additional exclusion of one story for one participant (1.25% data loss in addition to the data loss due to poor eye tracking quality). Note that there was a lot of overlap between the quality of the eye tracking and fMRI data: participants who move much during scanning, tend to have both poor eye tracking and poor fMRI data.

Data analysis
No part of the analyses was pre-registered prior to the research being conducted.

Eye tracking data
The eye tracking data were analyzed in a similar way as the eye tracking data in Mak and Willems (2019) to make direct comparison of the eye tracking results possible. We analyzed how motor description, perceptual description and mental event descriptions related to gaze duration, while controlling for lexical frequency, word length and surprisal value as regressors of no interest, and allowing for random slopes and intercepts for the three types of descriptions over the interaction between subject and story to allow for individual variation between subjects and stories. Lexical frequency was derived from the SUBTLEX-NL database and consisted of the logarithm of the frequency with which a word appeared in the database (Keuleers, Brysbaert, & New, 2010). Word length was determined by counting the number of characters for each word. Surprisal value was derived from perplexity (i.e., an indication of the accuracy of a language model, in this case the perplexity is an indication of the accuracy with which a word is predicted by the previous words, cf. Lopopolo et al., 2017), calculated using a 3-g model trained by SRILM (Stolcke, 2002) on 1 million sentences from the NLCOW2012 corpus (Sch€ afer & Bildhauer, 2012). Perplexity was equal to 10 to the power of negative surprisal. As in the analyses in Mak and Willems (2019), we used the values for the descriptions and word characteristics of the previous word, as this allowed us to look in the spillover regions. We analyzed this with a Bayesian Multilevel Model using the package brms (Bü rkner, 2017, 2018) and Stan (Stan Development Team, 2020) in R version 4.0.3 (R Core Team, 2021), see https:// osf.io/8eqya/. The rationale for calculating a Bayesian multilevel model as opposed to a "classical" frequentist model, was that Bayesian models are more flexible and more capable of fitting complex models (e.g., Bü rkner, 2018; Nalborczyk, Batailler, Loevenbruck, Vilain, & Bü rkner, 2019). Additionally, the analyses of the fMRI data were also done within a Bayesian framework (Beckmann, Jenkinson, & Smith, 2003;Woolrich, Behrens, Beckmann, Jenkinson, & Smith, 2004;Woolrich et al., 2009). Rather intuitively, Bayesian multilevel models calculate the range of the most probable values of each parameter, a 95% Credible Interval. If this Credible Interval does not cross zero for a given parameter, this indicates a 95% certainty that the true value of this parameter is distinguishable from zero. In our model, we used weakly informative, normallydistributed priors with a mean of 0 and a standard deviation of 10 for all fixed effects. These priors are considered relatively conservative (McElreath, 2016). For the population-level intercept we used an informative, normally-distributed prior with a mean of 250 and a standard deviation of 50, since gaze durations are generally between 200 and 300 ms long on average. As variance can only be positive, weakly regularizing, half-cauchy priors with a mean of 0 and a standard deviation of 1 were used for the variance of the random effects as well as the overall variance (as suggested by Gelman, 2006;McElreath, 2016). The Gelman-Rubin diagnostic (Rhat) was 1.0 for all parameters (except for the intercept, for which it was 1.01), indicating that the model had converged.

fMRI data
The fMRI-data were analyzed using a fixation-based analysis (comparable to an event-related analysis; see Richlan et al., 2014). In this analysis, the onset of a fixation was seen as the event onset, and the duration of the fixation as the event duration. These fixation events were then convolved with the HRF. From the eye-tracking data, we extracted the fixation (event) onsets and durations per word (which were determined automatically by SR Research's default parsing algorithm), to determine which word was looked at, at any given time during reading. Data analyses were performed in Feat (FMRI Expert Analysis Tool) Version 6.00, part of FSL (FMRIB's Software Library, www.fmrib.ox.ac.uk/fsl). Time-series statistical analysis was carried out using FILM with local autocorrelation correction (Woolrich, Ripley, Brady, & Smith, 2001). For the first level analysis, we ran a GLM per participant (per run, one analysis for each story) where we modelled the onset and duration of each fixation, weighted by the scores for motor descriptions, perceptual descriptions, mental event descriptions, and the first principal component 1 of lexical frequency, word length and surprisal value of the word that was fixated (to control for these word characteristics). This way we determined which brain areas respond specifically to either of the three types of descriptions in stories, while controlling for differences in word characteristics.
For statistical inference we contrasted each type of description with the other types of descriptions, to find the activation that was specific to that type of description (i.e., weighted contrasts [1 À.5 À.5] for motor > perceptual and mental event, perceptual > motor and mental event, mental event > motor and perceptual). Additionally, we contrasted each type of description with baseline (contrasts [1 0 0], [0 1 0] and [0 0 1], and visualized which areas were commonly activated by all three types of descriptions (as a conjunction analysis). The z-statistic images resulting from the contrasts were thresholded using clusters determined by z > 3.1 and a (corrected) cluster significance threshold of p ¼ .05. The cluster threshold was determined based on the theory of Gaussian Random Fields (Worsley, 2001). This effectively controls the multiple comparisons problem introduced by the massive univariate approach taken at a family-wise error rate of p < .05 (Worsley, 2001).
As participants each read two stories, in two separate runs, we first aggregated the results for the two stories at the participant level using a standard weighted fixed effects model in FLAME (FMRIB Local Analysis of Mixed Effects). In this model, the variances from the first level analysis were used as the fixed effect error variances, and the random effects variance was forced to zero (Beckmann et al., 2003;Woolrich, 2008;Woolrich et al., 2004). The output from the fixed effects models per participant was used as input for the second level analysis. The second level analysis was performed using FLAME (FMRIB Local Analysis of Mixed Effects) stage 1 with automatic outlier detection, which estimates between-subject random effects using MCMC (Beckmann et al., 2003;Woolrich, 2008;Woolrich et al., 2004).
As a final, exploratory analysis, we investigated how individual differences in brain activation in response to the different types of descriptions are related to the experience of narrative reading as well as individual difference measures. We did this to find out whether brain activation due to simulationeliciting content occurs equally or differently across individuals and to find out whether any individual differences could be explained by reading experiences or personal characteristics. The analysis was done by first extracting the percent signal change (per participant and per story, from the first level analyses) in five or six regions of interest for each of the three types of descriptions. We selected regions of interest that were (1) significantly activated by one or all of the three types of descriptions on the group level, that were (2)  We derived the regions of interest from the results of our group analysis: we extracted the by-participant by-story percent signal change from areas that were found to be commonly activated by these descriptions. We then built models to predict percent signal change in each area, by scores on the Story World Absorption Scale, the appreciation questionnaire, the Fantasy and Perspective taking subscales of the Interpersonal Reactivity Index, the Author Recognition Test, and the questions about reading habits (see heading "Questionnaires" below for more information on these questionnaires). We built separate models for Story World Absorption and for appreciation, to make sure that any conceptual overlap between the two would not skew our results. We analyzed this with Bayesian Multilevel Models using the package brms (Bü rkner, 2017, 2018) and Stan (Stan Development Team, 2020) in R version 4.0.3 (R Core Team, 2021), see https://osf.io/8eqya/. In our models, we used weakly informative, normally-distributed priors with a mean of 0 and a standard deviation of 1 for all fixed effects. For the population-level intercept we used a weakly informative, normally-distributed prior with a mean of 0 and a standard deviation of 10. These priors are considered relatively conservative (McElreath, 2016). As variance can only be positive, weakly regularizing, half-cauchy priors with a mean of 0 and a standard deviation of 1 were used for the variance of the random effects as well as the overall variance (as suggested by Gelman, 2006;McElreath, 2016). In all models, the Gelman-Rubin diagnostic (Rhat) was 1.0 for all parameters, indicating that the models had converged.

Results
All anonymized  Bartlett's test of sphericity showed sufficient correlation between items, c2 (55) ¼ 1879.28, p < .001. Based on the scree-plot in combination with the eigenvalues found in an initial analysis (Kaiser's method) and the model fit (fit based upon off diagonal values), it was decided to retain five components in the final model. This model explained 77% of the variance.
The first component that we found corresponded to Interest (consisting of items Suspenseful, Interesting, Captivating, Gripping, and Boring (À)); the second component to Sadness (Tragic, Sad); the third component to Special (Special); the fourth component to Positive Affect (Witty, Beautiful); and the final component to Ominous (Ominous, Funny (À), and Entertaining (À)). The pattern matrix for the factor loadings after rotation can be found in Table 1. 3.1.1.3. COMPREHENSION CHECK. On the comprehension check for story A (three multiple choice questions, with four answer options each), three participants made one mistake. All other participants got all three questions correct. For story B, ten participants made one mistake, and three participants made two mistakes. Seven out of the ten participants who made one mistake and all three participants who made two mistakes, answered the second question incorrectly (this question was answered incorrectly 25% of the time). Therefore, we decided not to reject participants based on this question. Hence, none of the data was rejected based on the comprehension check for either of the stories.
3.1.1.4. READING HABITS. Answers on the reading habits questionnaire were measured on a scale ranging from 1 to 5 on four of the five questions, but from 1 to 4 on the final question. Therefore, z-scores were calculated for all questions on this questionnaire (higher values indicating more reading experience). Overall reliability was good if the question about nonfiction reading was excluded, Cronbach's a ¼ .80.

Eye tracking data
Our model predicting the gaze durations on individual words, by the values of motor descriptions, perceptual descriptions, mental event descriptions, lexical frequency, word length and surprisal value of the previous word (to account for spillover effects, cf. Mak & Willems, 2019), showed that motor descriptions were associated with shorter gaze durations on the next word (i.e., faster reading; see Table 2, Fig. 2B). Perceptual and mental event descriptions were associated with longer gaze durations on the next word (i.e., slower reading; see Table 2, Fig. 2C and D). These results nicely replicate the results in Mak and Willems (2019), indicating that eye movements could reliably be tracked inside the fMRI scanner. Unlike the results in Mak and Willems (2019), the studied word  c o r t e x 1 6 2 ( 2 0 2 3 ) 1 1 5 e1 3 5 characteristics (lexical frequency, word length and surprisal value) did not reliably influence gaze durations in the spillover region (see Table 2, Fig. 2E and F). Looking at the individual variation in the relationship between the three types of descriptions and gaze durations, we see that these relationships varied between all story*subject combinations (the standard deviation of the slope of Motor

3.2.
Neuroimaging results All localizations reported here are based on the Har-vardeOxford Cortical Structural Atlas, HarvardeOxford Subcortical Structural Atlas, or the Cerebellar Atlas in MNI152 space after normalization with FNIRT.

Activations specific to each of the three types of descriptions
Activations specific to each of the three types of descriptions were clusters of voxels that were reliably activated to one type of description but not to the two other types of descriptions. Peak coordinates for the activation clusters can be found in

Shared activation of all three types of descriptions
Shared activation of all three types of descriptions (i.e., the overlap between the contrasts of each type of description with the implicit baseline 2 ; conjunction analysis) was found in the left anterior supramarginal gyrus (see Fig. 7; see Table 4 for the peak coordinates). No other reliable clusters of activation were visible in all three contrasts.

Individual differences: percent signal change analysis
To find out whether brain activation due to simulationeliciting content occurs equally or differently across individuals and to find out whether any individual differences could be explained by reading experiences or personal characteristics, we calculated spherical regions of interest (10 mm diagonally around a peak coordinate) in regions that had been significantly activated by the three types of descriptions on the group level, and that we believed would be good candidates for finding individual differences in simulation. After creating ROI maps of these regions of interest in MNI space, we converted the maps to each participant's native space, in order to be able to extract data on percent signal change per participant from the first level analyses. Percent signal change data was extracted per participant and per story using Featquery. We decided to analyze the 90th percentile of the percent signal change, meaning that within our ROIs we looked at the percent signal change in response to the three types of descriptions, in the 10% of voxels in which this percent signal change was highest in each participant in each story. We chose to look at the 90th percentile of the percent signal change, to make our findings more robust to outliers than if we would have looked in the 95th percentile.
We then created models predicting percent signal change in each of the ROIs, by mean absorption (per participant and story), the five principal components of appreciation (per participant and story), and scores on the Fantasy and Perspective Taking subscales of the Interpersonal Reactivity Index, on the Author Recognition Test, and for daily life reading habits (per participant). We will report all statistically reliable results here (i.e., the instances where questionnaire scores were reliably associated with the percent signal change; see Fig. 8), and provide tables with all results for all ROIs in Appendix C.
3.2.3.1. MOTOR DESCRIPTIONS. The results of the models predicting percent signal change for the five ROIs for the motor contrast showed that there was a positive association between percent signal change and the Special component of appreciation in the right precuneus (Estimate ¼ .10; 95% CI ¼ .01; .18), showing that experiencing a story as special is associated with higher percent signal change in this brain there was no fixation on words (e.g., saccades, blinks). c o r t e x 1 6 2 ( 2 0 2 3 ) 1 1 5 e1 3 5 Table 3 e Clusters of regions activated specifically for words belonging to motor descriptions, perceptual description and mental event descriptions in the stories. Volume is given for the entire cluster, with the first structure in that cluster. c o r t e x 1 6 2 ( 2 0 2 3 ) 1 1 5 e1 3 5 region. In the left frontal orbital gyrus, we found a positive association between percent signal change and the Positive Affect component of appreciation (Estimate ¼ .07, 95% CI ¼ .03; .11), meaning that participants who experienced stories as witty or beautiful showed a larger percent signal change in the frontal orbital gyrus. Finally, for the anterior cingulate gyrus, we found a positive association between percent signal change and the Fantasy subscale of the IRI (Estimate ¼ .09, 95% CI ¼ .03; .14), indicating that people who are prone to getting mentally involved in stories showed higher percent signal change in the cingulate gyrus.

Region
3.2.3.2. PERCEPTUAL DESCRIPTIONS. The results of the models predicting percent signal change for the five ROIs for the perceptual contrast showed that there was a positive association between percent signal change and the Fantasy subscale of the IRI, in both the left (Estimate ¼ .08, 95% CI ¼ .02; .13) and right (Estimate ¼ .08, 95% CI ¼ .01; .15) inferior temporal gyrus, and in the right temporal fusiform cortex (Estimate ¼ .07, 95% CI ¼ .00; .13), showing that people who are prone to getting mentally involved in stories show higher percent signal change in response to perceptual descriptions in these three areas. Additionally, there was a positive association between percent signal change and the Perspective Taking subscale of the IRI in the left inferior temporal gyrus (Estimate ¼ .08, 95% CI ¼ .02; .14) and in the left temporal fusiform cortex (Estimate ¼ .06, 95% CI ¼ .01; .10), indicating that people who more often take other people's perspectives in daily life show higher percent signal change in these areas.  .01), indicating that participants who experienced the stories as witty and beautiful showed a lower percent signal change in this area.

Discussion
In this study, combined eye tracking and fMRI scanning was used to study narrative reading. Participants read two literary short stories in the scanner while their eye movements were being measured so that each participant's individual eye movements could be linked to their brain activity. With this study, we examined whether different kinds of simulation were associated with different patterns of brain activation and/or if there was one general area in the brain that can be associated with all three kinds of simulation under study. Secondly, we attempted to find out whether individual differences in reading experiences can be linked to differences in activation levels in specific brain regions.  c o r t e x 1 6 2 ( 2 0 2 3 ) 1 1 5 e1 3 5 4.1.
Differences in the effect of the three kinds of simulation on brain activation The first question that we attempted to answer with the current experiment was whether different kinds of simulation were associated with different patterns of brain activation, or, alternatively, if there was one general area in the brain that can be associated with all three kinds of simulation under study. In our study, we found both modality-specific activations and a brain area that was activated in response to all kinds of simulation. We will first review the patterns of brain activation that were specific to motor simulation, perceptual simulation and mentalizing, before discussing a possible domain-general "simulation area".
In line with our expectations, we found that motor descriptions were associated with activity in the cingulate and paracingulate cortex, precuneus, parahippocampal gyrus, superior frontal gyrus and middle frontal gyrus. These areas have been observed to be activated in studies of motor simulation before (Kurby & Zacks, 2013;Moody & Gennari, 2010;Nijhof & Willems, 2015). However, we did not find activation in important motor areas such as precentral and postcentral cortex, superior temporal sulcus, cingulate cortex, supplementary motor area, middle and superior temporal gyrus, inferior parietal lobule, intraparietal sulcus. These areas have been implicated in motor simulation in previous studies focusing on the processing of contextualized language (e.g., Kurby & Zacks, 2013;Nijhof & Willems, 2015). Although no activation in response to motor descriptions was found in these primary and premotor areas (as one may expect based on the grounded cognition account of mental simulation; but note that a lack of activation in the primary motor cortex in response to motor simulation has been found before; Willems, Toni, Hagoort, & Casasanto, 2010), activation was found in the precuneus, which is situated next to the primary sensory and motor areas and is extensively connected with the premotor and supplementary motor areas (Cavanna & Trimble, 2006). Functionally, the precuneus has been implicated in motor imagery (Cavanna & Trimble, 2006), although it has been found to play a role in episodic memory, self-processing and mentalizing as well (Cavanna & Trimble, 2006;Schurz, Radua, Aichhorn, Richlan, & Perner, 2014).
Furthermore, a range of brain areas that have been associated with inferencing, event segmentation, and situation model building were activated when people read motor descriptions. For example, the angular gyrus, subcallosal cortex and frontal medial cortex have all been associated with event segmentation and situation model updating, together with the modality-specific motor areas mentioned above (Kurby & Zacks, 2008;Speer et al., 2007Speer et al., , 2009. Perhaps inferencing, event segmentation and situation model building is mainly done through interpretations of actions, something that fiction authors make use of when they decide to "show, don't tell" about their characters minds and intentions. This could also explain why readers speed up while reading motor descriptions: if people are used to inferring other people's minds and intentions based on their actions, this might make motor simulation a relatively automatic, fast process compared to other kinds of simulation (i.e., perceptual and mental event simulation).  c o r t e x 1 6 2 ( 2 0 2 3 ) 1 1 5 e1 3 5 For perceptual simulation, we found several modalityspecific areas that we had expected to find based on previous work. Areas in the fusiform and parahippocampal gyri have previously been associated with visual simulation (Chow et al., 2015), and were activated by perceptual descriptions in the current study as well. Additionally, we also found activation related to perceptual descriptions in the inferior temporal gyrus, inferior frontal gyrus, amygdalae, temporal pole, frontal pole, precentral gyrus, superior lateral occipital cortex, frontal orbital cortex, and angular gyrus. The inferior temporal gyrus is considered to be an important part of the ventral visual pathway, a pathway implicated in object, face and scene perception (Grill-Spector & Weiner, 2014; Kanwisher, 2010). Other areas of this pathway include the parahippocampal gyrus, fusiform cortex and temporal pole, and this pathway is functionally connected to the amygdalae (for the processing of emotionally salient stimuli) and to the precentral gyrus (see for a review Kravitz, Saleem, Baker, Ungerleider, & Mishkin, 2013). Taken together, our findings provide evidence for the idea that a network of visual processing areas, and in particular the ventral pathway, might support perceptual simulation, suggesting a domain-specific mechanism.
Reading mental event descriptions was associated with activations in the temporal pole, parietal operculum, anterior cingulate (situated closely to the medial prefrontal cortex) and Fig. 8 e Scatter plots and estimated slopes for the relationships between percent signal change and components of appreciation or subscales of the Interpersonal Reactivity Index. Plots AeC are relationships found for motor simulation, plots DeH are relationships found for perceptual simulation, and plots I and J are relationships found for mental event simulation.

4.2.
Overlap between the three kinds of simulation Apart from these modality-specific brain areas, all three kinds of simulation were associated with activity in an area in the supramarginal gyrus which is situated in the inferior parietal lobe. Although the supramarginal gyrus has been associated with phonological processing (Hartwigsen et al., 2010;Stoeckel, Gough, Watkins, & Devlin, 2009), it has also been associated with literary reading (Hartung et al., 2021) and with referential indexing: the integration of references into a context (Matchin, Hammerly, & Lau, 2017). Together with the temporoparietal junction (TPJ), the inferior parietal lobe (IPL) is a hub where multiple brain networks come together (i.e., the frontoparietal control network, default mode network, mentalizing network, cingulo-opercular network, ventral attention network; Igelstr€ om & Graziano, 2017). Functionally, the TPJ and IPL have been associated with a variety of tasks, from bottom-up perception tasks (such as automatic, stimulusdriven attention) to higher-order tasks (such as selfperception, mind wandering and social cognition; Igelstr€ om & Graziano, 2017). The fact that this area is involved in such a variety of brain networks and functions, implies that it is crucial for the integration and regulation of a multitude of neural processes (Igelstr€ om & Graziano, 2017). Therefore, it is likely that this area plays a role in situation model updating, which consists of the integration of new information from a variety of sources into an existing situation model (Zwaan, Langston, & Graesser, 1995;Zwaan & Radvansky, 1998). Situation model updating has been suggested to play a role in mental simulation in general (Kurby & Zacks, 2013;Zwaan, 2009), which is why it is not surprising that this area has been found to be associated with all three kinds of simulation. Taken together, the neuroimaging findings from the current study suggest that, although there are different kinds of simulation distinguishable at the neural level, these different kinds of simulation are all supported by the same, overarching neural mechanism.

Links with individual differences in appreciation and absorption
For all three kinds of descriptions, we found areas in which individual differences in the percent signal change as a response to these descriptions was associated with measures of story appreciation (story-specific effects) as well as personal characteristics (trait-based measures). Percent signal change was not associated with story world absorption (statebased measures). Apparently, state-based individual differences do not explain the effects of reading on the neural level, whereas trait-based individual differences do. This is in line with results found in multiple previous studies, where traitbased individual differences (as opposed to state-based individual differences) were also more strongly associated with simulation (Faber, Mak, & Willems, 2020;Hartung et al., 2021;Hartung, Hagoort, & Willems, 2017;Mak, De Vries, & Willems, 2020;Van den Hoven, Hartung, Burke, & Willems, 2016). Seemingly, simulation is more strongly associated with stable characteristics than with reading experiences such as absorption and appreciation.

4.4.
Recommendations for future research The analysis of the eye tracking data in the current study showed similar effects of perceptual simulation, motor simulation and mentalizing on gaze durations as were found in our previous study (Mak & Willems, 2019). Motor simulation was associated with shorter gaze durations (faster reading), whereas perceptual simulation and mentalizing were associated with longer gaze durations (slower reading), with effect sizes that were highly similar to the previously found effect sizes (Mak & Willems, 2019). Given the importance of replication and reproducibility of results, we were happy to see that the eye tracking results are largely consistent with earlier results obtained in a non-fMRI setting. This indicates that the reading in the fMRI scanner appears to be consistent with reading in other settings, suggesting that our findings are likely to generalize to other reading contexts, which is encouraging for future studies attempting to find the relationship between eye movements and brain activation during reading. An interesting remaining question is how the findings at the neural level and at the level of reading behavior relate to each other. Different kinds of mental simulation were associated with either faster or slower reading, and it would be interesting to find out exactly how these changes in reading behavior relate to (changes in) brain activity. Similarly, an open question remains how individual differences in the strength of the association between simulation and reading behavior relate to brain activity. These are questions that the experiments and analyses reported in this paper have not been able to answer, but that deserve further exploration.
The dataset collected for this paper, can be used to answer many more questions concerning eye movements and brain activation during reading. For example, in the current study we used contextualized ratings of the words that were being read as regressors in our models. However, the effects of other aspects of the words in the stories that were read could be studied using existing ratings of, for example, concreteness (which is highly correlated with imageability; Brysbaert et al., 2014), sensory modality (Speed & Brysbaert, 2022), or other word characteristics.

Conclusion
In the current experiment participants read literary short stories in the MRI scanner while their eye movements were being tracked. With this study, we aimed to determine if reading motor descriptions, perceptual descriptions and mental event descriptions are associated with differential neural activation. It was found that reading motor descriptions was associated with activation in modality specific motor areas and in areas involved in event segmentation and situation model building. Reading perceptual descriptions was associated with activation in modality specific perceptual processing areas. Reading mental event descriptions was associated with activation in mentalizing and language processing areas. All three kinds of descriptions were associated with activation in the supramarginal gyrus, which is part of a brain area involved in the integration and regulation of a multitude of neural processes. We propose that it is likely that this area is involved in referential indexing and situation model building. Furthermore, within the activated areas we found some individual differences in percent signal change that were mainly associated with trait-level individual differences (i.e., perspective taking, transportability). This shows that certain personal characteristics can influence how strongly readers respond on the neural level to mental simulation as elicited by stories. Taken together, these findings suggest that mental simulation is supported by both domain-specific processes grounded in previous experiences, and by the neural mechanisms that underlie higher-order language processing (e.g., situation model building, event indexing, integration).

Data statement
For this paper, a dataset was created that is available from https://doi.org/10.34973/hrax-yn39. It is stored on the repository of the Donders Center for Cognitive Neuroimaging.
The data have been archived in a Research Documentation Collection, were anonymized and archived as a Data Sharing Collection in the Donders Repository, according to the policies of the Donders Institute. The data and analysis code for the behavioral results and the individual difference analysis have been published on https://osf.io/8eqya/.

Declarations of interest
None.

Open Practices Section
The study in this article earned Open Data badge for transparent practices. The data for this study is available at: https:// osf.io/8eqya/3. The section Data Availabilty project or the sentence the authors do not have permission to share data. should not appear anywhere in any version of this article.