Emotional valence and arousal affect reading in an interactive way: Neuroimaging evidence for an approach-withdrawal framework

A growing body of literature shows that the emotional content of verbal material affects reading, wherein emotional words are given processing priority compared to neutral words. Human emotions can be conceptualised within a two-dimensional model comprised of emotional valence and arousal (intensity). These variables are at least in part distinct, but recent studies report interactive effects during implicit emotion processing and relate these to stimulus-evoked approach-withdrawal tendencies. The aim of the present study was to explore how valence and arousal interact at the neural level during implicit emotion word processing. The emotional attributes of written word stimuli were orthogonally manipulated based on behavioural ratings from a corpus of emotion words. Stimuli were presented during an fMRI experiment while 16 participants performed a lexical decision task, which did not require explicit evaluation of a word's emotional content. Results showed greater neural activation within right insular cortex in response to stimuli evoking conflicting approach-withdrawal tendencies (i.e., positive high-arousal and negative low-arousal words) compared to stimuli evoking congruent approach vs. withdrawal tendencies (i.e., positive low-arousal and negative high-arousal words). Further, a significant cluster of activation in the left extra-striate cortex was found in response to emotional compared to neutral words, suggesting enhanced perceptual processing of emotionally salient stimuli. These findings support an interactive two-dimensional approach to the study of emotion word recognition and suggest that the integration of valence and arousal dimensions recruits a brain region associated with interoception, emotional awareness and sympathetic functions.

Several theoretical models of emotion have been proposed, including amongst others, models which propose a small number of universal underlying emotional states, i.e., discrete emotions such as joy, fear, etc. (see Levenson, 2011 for a review), appraisal models, which suggest that specific emotions are importantly influenced by appraisal processes which integrate the situational context of an event (see Ellsworth & Scherer, 2003), and dimensional models, which may be particularly useful for investigating the emotional processing of language.
Dimensional models suggest that emotion is best understood as occurring within a dimensional space, most commonly a two-dimensional space spanning valence and arousal. Emotional valence describes the extent to which an emotion is positive or negative, whereas arousal refers to its intensity, i.e., the strength of the associated emotional state (Feldman Barrett & Russell, 1999; Lang, Bradley, & Cuthbert, 1997; Russell, 2003). These models typically assume valence and arousal to be at least in part distinct dimensions (Feldman Barrett & Russell, 1999; Reisenzein, 1994). However, behavioural ratings of emotion word stimuli show that highly positive and highly negative stimuli tend to be more arousing, and that negative stimuli are generally rated higher in arousal than positive stimuli (e.g., Citron, Weekes, & Ferstl, 2012). Support for a distinction between these two dimensions comes from neuroimaging studies that demonstrate dissociable cortical representations during processing of odours, tastes, and written words. Specifically, the orbitofrontal and ventral anterior cingulate cortices respond more to valence, whereas the amygdala and anterior insular cortex respond more to arousal (Colibazzi et al., 2010; Lewis, Critchley, Rotshtein, & Dolan, 2007; Posner et al., 2009; Small et al., 2003; Winston, Gottfried, Kilner, & Dolan, 2005).
Despite this evidence, some empirical work shows that valence and arousal affect processing of emotional stimuli in an interactive way (Robinson, Storbeck, Meier, & Kirkeby, 2004). The authors propose a model according to which stimuli with negative valence (e.g., bitter taste) or with high arousal (e.g., a loud noise) elicit a withdrawal tendency and corresponding mental set, because they represent a possible threat; in contrast, stimuli with positive valence (e.g., sweets) or with low arousal (e.g., a newsletter) elicit an approach tendency because they are perceived as safe. According to this account, these two tendencies are initiated independently at a pre-attentive level and subsequently integrated in order to evaluate the stimulus for further action (Robinson et al., 2004). Thus, positive low-arousal and negative high-arousal stimuli will be easier to process because they elicit congruent tendencies (approach and withdrawal, respectively), whereas positive high-arousal and negative low-arousal stimuli will be more difficult to process because they elicit conflicting approach-withdrawal tendencies. According to this model, these opposite tendencies are integrated at an implicit processing level, before explicit stimulus evaluation.
In a series of experiments, Robinson et al. (2004) asked participants to judge the emotional valence (positive vs. negative) of pictures as well as written words. The results showed consistent interactive effects of valence and arousal, whereby reaction times (RTs) were faster for stimuli eliciting congruent approach or withdrawal tendencies compared to stimuli eliciting conflicting tendencies (Robinson et al., 2004). In these studies, participants were asked to explicitly evaluate the emotional connotation of the stimuli. Thus, it is difficult to tease apart whether the interactive effects of valence and arousal are caused by truly automatic processes, i.e., implicit integration of approach-withdrawal tendencies, or instead by intentional stimulus evaluation as well as strategic processes.
To this end, Eder and Rothermund (2010) devised a task to assess the emotional evaluations of pictorial stimuli indirectly and observed the same interaction reported by Robinson et al. (2004). Further support for implicit interactive effects of valence and arousal during reading of emotionally-laden words comes from studies using a lexical decision task (LDT), i.e., deciding whether a letter string is a real word or not. This task allows assessment of the implicit processing of a word's emotional connotation (Bayer, Sommer, & Schacht, 2012; Citron, Weekes, & Ferstl, under review; Hofmann, Kuchinke, Tamm, Võ, & Jacobs, 2009; Larsen, Mercer, Balota, & Strube, 2008). Slower LD latencies are reported for words eliciting conflicting approach-withdrawal tendencies compared to words eliciting congruent tendencies.
Neural evidence for interactive effects of valence and arousal comes from studies showing modulation of the amplitude of emotion-related event-related potential (ERP) components during the implicit processing of emotional pictures (Feng et al., 2012) as well as words (Citron, Weekes, & Ferstl, 2013;Hofmann et al., 2009) (but see Bayer et al., 2012 for distinct ERP effects of the two emotional dimensions).
The aim of the present study is to test for interactive effects of valence and arousal on regional neural activity, in order to identify which brain regions are responsible for the implicit integration of approach-withdrawal tendencies during reading of emotionally-laden words. This is the first hemodynamic neuroimaging study to explore the interaction rather than the dissociation of emotional dimensions. In fact, previous functional magnetic resonance imaging (fMRI) studies have tested for the dissociation of brain activation between valence and arousal dimensions and employed either tasks requiring explicit and deep processing of a word's emotional content (Colibazzi et al., 2010; Posner et al., 2009) or self-referential processing, which tends to evoke a bias toward "yes" responses to positively valenced words, which possibly enhances the processing of these trials (Lewis et al., 2007). Such studies support the multidimensional account of emotion processing, but do not speak to the interrelationship between valence and arousal during implicit emotion processing in reading.
In an event-related fMRI design, we presented participants with written positive and negative words, high or low in arousal, and neutral words. Stimuli were intermixed with non-words and participants performed an LDT, thus evoking implicit emotion processing. According to Robinson et al.'s model and the extant supportive empirical evidence, we predicted slower LD latencies, lower accuracy and enhanced BOLD signal response for words eliciting conflicting approach-withdrawal tendencies (i.e., positive high-arousal and negative low-arousal words) compared to words eliciting congruent tendencies (i.e., positive low-arousal and negative high-arousal words). More specifically, we expected enhanced BOLD responses in the insula and/or ACC. In fact, the former subserves affective/interoceptive awareness, i.e., integration of bodily sensations and cognitive, evaluative processes (Brooks, Zambreanu, Godinez, Craig, & Tracey, 2005; Craig, 2009; Critchley, Wiens, Rotshtein, Ohman, & Dolan, 2004), whereas the latter is associated with error detection (Botvinick, Nystrom, Fissell, Cater, & Cohen, 1999) and conflict processing (Kanske & Kotz, 2011; Ullsperger, Harsay, Wessel, & Ridderinkhof, 2010). Further, both insula and ACC show activation when the task requires a minimum degree of processing depth (as required by the LDT) (Phan, Wager, Taylor, & Liberzon, 2002). We also more generally predicted better performance and enhanced activation of emotion-related brain regions in response to emotionally-laden words compared to neutral words. Further, we predicted faster LD latencies, higher accuracy and enhanced activation of the classical lexico-semantic neural network in response to words compared to non-words (cf. Fiebach, Friederici, Mueller, & von Cramon, 2002; Price, 2012).

Participants
Nineteen native British English-speakers from the University of Sussex (10 women, 9 men), aged between 18 and 37 years (mean ± SD = 23.7 ± 5.6 years) took part in the experiment. They were all right-handed with normal or corrected-to-normal vision, had no learning disabilities and took no medication for mood disorders. Participants either received course credits or were paid £10 for their participation. They all gave written informed consent before participating. Three participants were excluded from the fMRI analyses during image processing due to head movement artefacts exceeding 3 mm. Due to failure to record behavioural data from two other participants, only seventeen participants were included in RT and accuracy analyses.

Word selection and manipulation
One-hundred and seventy-five words were selected from a corpus of English words, containing subjective ratings for affective features (emotional valence, arousal) and linguistic, or more specifically lexico-semantic, characteristics (word familiarity, age of acquisition (AoA) and imageability). Seven-point Likert scales were used to quantify the different variables and the extremes were labelled as follows: valence ranged from −3 (very negative) to +3 (very positive); arousal, familiarity and imageability were scaled from 1 (not at all) to 7 (very high); for AoA, age ranges in years were given: 0-2, 2-4, 4-6, 6-9, 9-12, 12-16, older than 16, subsequently recoded in 1-to-7 points. The absolute values of emotional valence were used to form an additional variable called "emotionality". This variable gives a measure of valence that is independent of the direction of the rating (positive versus negative) and thus provides an absolute measure of the rated emotionality of a word. Length in letters, phonemes, syllables and frequency (spoken and written) were taken from the web-based CELEX (Max Planck Institute for Psycholinguistics, 2001). Written word neighbourhood size (N-size) and frequency (N-frequency) values were taken from the ELP database (Balota et al., 2007). N-size reflects the number of words generated by changing one letter of the target word and N-frequency reflects the number of words that share letters with the target word.
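The two derived variables described above (emotionality as absolute valence, and AoA ranges recoded to points) can be illustrated with a minimal sketch. The function names and the example ratings are hypothetical and are not taken from the corpus; only the scale bounds and the age-range labels come from the text.

```python
# Illustrative sketch only: function names and example ratings are hypothetical.
AOA_RANGES = ("0-2", "2-4", "4-6", "6-9", "9-12", "12-16", "older than 16")


def emotionality(valence):
    """Return absolute valence on the -3..+3 scale (0 = neutral, 3 = extreme)."""
    if not -3 <= valence <= 3:
        raise ValueError("valence must lie between -3 and +3")
    return abs(valence)


def recode_aoa(age_range):
    """Map an age-of-acquisition range label onto the 1-to-7 point scale."""
    return AOA_RANGES.index(age_range) + 1


if __name__ == "__main__":
    # Made-up ratings: equally extreme positive and negative words share one score.
    for valence in (2.4, -2.4, 0.1):
        print(valence, "->", emotionality(valence))
    print("9-12 ->", recode_aoa("9-12"))
```

Because emotionality discards the sign of the rating, it can be matched across positive and negative conditions independently of valence direction.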
The two subjective ratings of interest to testing our hypotheses are emotional valence and arousal. These ratings were used to select 35 positive high-arousal (PH), 35 positive low-arousal (PL), 35 negative high-arousal (NH) and 35 negative low-arousal words (NL). In addition, 35 neutral words were selected, whose arousal level was comparable to the level of low-arousal valenced words.
Descriptive statistics for the 5 conditions are presented in Table 1. Words in all 5 conditions were matched for rated imageability, length in letters, phonemes and syllables, logarithm of frequency of use, word N-size and also N-frequency (Fs(4, 170) < 2.23, ns). Positive and negative high-arousal words were matched for emotionality and for arousal ratings; similarly, positive and negative low-arousal words were also matched (all ts(68) < 2.02, ns). There was no linear correlation between ratings of emotional valence and arousal (r = −0.10, ns), but a strong quadratic correlation (r² = 0.60, p < 0.0001), i.e., a correlation between emotionality and arousal. Thus, valence and arousal were manipulated in an orthogonal design.
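The following toy example shows why a U-shaped relationship can yield a near-zero linear correlation between valence and arousal together with a strong correlation between emotionality (absolute valence) and arousal. The ratings are invented for illustration, and correlating arousal with absolute valence is used here as a simple stand-in for the quadratic fit reported above; it is not the authors' analysis.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented ratings: arousal rises toward both ends of the valence scale.
valence = [-3, -2, -1, 0, 1, 2, 3]
arousal = [6.0, 4.5, 3.0, 2.0, 3.0, 4.5, 6.0]

linear_r = pearson_r(valence, arousal)
emotionality_r = pearson_r([abs(v) for v in valence], arousal)

if __name__ == "__main__":
    print("valence vs. arousal:      r =", round(linear_r, 3))
    print("emotionality vs. arousal: r =", round(emotionality_r, 3))
```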

Pseudoword selection
One-hundred and seventy-five pseudowords were selected from the ARC nonword database (Rastle, Harrington, & Coltheart, 2002). Pseudowords are nonexistent words in English that nevertheless follow the orthographic and phonological rules of English. Length of pseudowords ranged from 3 to 10 letters and from 2 to 8 phonemes. Pseudowords were matched with the 175 words for length in letters, t(316.11) = 0.28, ns, and number of phonemes, t(302.21) = 1.32, ns.

Procedure
The experiment was conducted at the Clinical Imaging Sciences Centre (CISC) at the University of Sussex. The experiment was programmed in Matlab using the Cogent toolbox (Wellcome Laboratory of Neurobiology, http://www.vislab.ucl.ac.uk/cogent.php). Stimulus order and timings were optimised to maximise the statistical efficiency of the task design by using OPTSEQ2 (Dale, 1999), which created a randomised sequence of experimental conditions and null events of varying durations (i.e., jittered). Using this sequence template, 4 different string (word or pseudoword) orders were implemented. The 385 experimental trials lasted 3300-5000 ms, and an additional 166 null events lasted 3315-24061 ms.
Participants gave informed consent for the fMRI procedure, following written and oral instructions on how to perform the task. A structural image scan lasting approximately 5 min was acquired before the main experiment. At the beginning of the experiment, 3 filler letter strings were presented, that were later excluded from the analysis. The experiment was divided into 3 sessions containing 196, 196 and 197 events each (fillers, strings, null events). In between sessions, the scanner was stopped and participants had a few minutes to rest.
Each trial began with a central fixation cross, visible for 1300-2999 ms (jittered interval length). Subsequently, a string appeared for 250 ms, followed by a 100-ms blank screen, then by a question mark, which prompted a response and remained present until a response was given. Participants were required to read the letter strings and decide whether the stimulus was an English word or not, as accurately and as quickly as possible. A response pad with two buttons corresponding to "yes/no" answers was provided and the button configuration was counterbalanced across participants. A fixed time interval of 1650 ms between the onset of the question mark and presentation of the next trial was used to ensure that the trial duration was at least 3300 ms (corresponding to the TR). The mean trial length was 3802 ms (SD = 288, range = 3300-4299 ms). Overall, the experiment lasted approximately 1 h and 40 min, including preparation, structural scanning, 55 min of functional scanning time and debriefing. Approximately 1000 functional volumes per participant were acquired.
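As a worked check of the trial timing, the sketch below adds up the stated components (jittered fixation, 250-ms string, 100-ms blank, and the fixed 1650-ms interval from question-mark onset to the next trial). The constant and function names are hypothetical. Under the stated fixation range the total runs from 3300 ms to 4999 ms, which agrees with the minimum of one TR and with the trial durations of 3300-5000 ms given in the Procedure.

```python
# Durations taken from the text; names are illustrative, not from the experiment code.
FIXATION_RANGE_MS = (1300, 2999)  # jittered fixation cross
STRING_MS = 250                   # letter string on screen
BLANK_MS = 100                    # blank screen before the question mark
RESPONSE_INTERVAL_MS = 1650       # question-mark onset to next trial (fixed)
TR_MS = 3300                      # repetition time of the functional sequence


def trial_duration_ms(fixation_ms):
    """Total trial length for a given jittered fixation duration."""
    low, high = FIXATION_RANGE_MS
    if not low <= fixation_ms <= high:
        raise ValueError("fixation duration outside the jitter range")
    return fixation_ms + STRING_MS + BLANK_MS + RESPONSE_INTERVAL_MS


if __name__ == "__main__":
    shortest = trial_duration_ms(FIXATION_RANGE_MS[0])
    longest = trial_duration_ms(FIXATION_RANGE_MS[1])
    print("shortest trial:", shortest, "ms; longest trial:", longest, "ms")
    print("shortest trial covers one TR:", shortest >= TR_MS)
```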

MRI data acquisition and preprocessing
Hemodynamic responses were acquired by means of a 1.5 T scanner (Siemens Avanto) with a standard head matrix coil. For each participant, full-brain, T1-weighted structural scans were acquired: 192 slices, 0.9 mm thick with a 15° flip angle, 0.9 mm isotropic voxels without gap, MPRAGE, TR 11.6 s, TE 4.4 s, 300 ms inversion time, 250 × 250 matrix per slice. For functional images, 36 slices were acquired, 3 mm thick with 90° flip angle, 3 × 3 × 3.75 mm voxels with gap, TR 3300 ms, TE 50 ms, 64 × 64 mm matrix per slice.
Image processing and statistical analyses were performed using SPM5 (Wellcome Trust Centre, http://www.fil.ion.ucl.ac.uk/spm/), employing spatial realignment and sequential coregistration (6-parameter rigid body spatial transformation). Structural images were segmented into grey and white matter and cerebrospinal fluid (CSF) and iteratively normalised to standard space (Montreal Neurological Institute, MNI). Transformation parameters for structural images were then applied to functional images. Subsequently, functional volumes were spatially smoothed with an 8-mm Gaussian kernel to adjust for between-participants anatomical differences. The first 5 functional volumes were discarded to allow for equilibration of net magnetisation. In order to detect further movement artefacts after realignment, the software ArtRepair (http://cibsr.stanford.edu/publications/publications.htm) was used (z threshold = 11, movement threshold = 3) and additional movement regressors for outliers were created.

Behavioural data
Lexical decision latencies and accuracy were analysed by means of 3 different designs: lexicality (words, pseudowords), emotionality (neutral, positive, negative) and valence (positive, negative) by arousal (high, low). As is standard in psycholinguistic research (cf. Clark, 1973), we conducted analyses by participant (1 subscripted), in which the raw data are averaged within each experimental condition and compared in a within-subjects design, as well as analyses by item (2 subscripted), in which the data points for each single stimulus are averaged across participants and the words belonging to each condition are compared in a between-subjects design. More specifically, we used t-tests or ANOVAs depending on the number of levels for each factor. For the lexicality design, we used one-directional t-tests. If the main emotionality effect and the valence by arousal interaction were significant, one-directional planned contrasts between neutral vs. emotionally-valenced words and between words eliciting conflicting vs. congruent tendencies were performed. In case of violation of the sphericity assumption, we used the Greenhouse-Geisser correction and in case of inhomogeneity of variances, we used Welch statistics. Only correctly-responded trials were included in RT analyses and, for each participant, outlier correction of RTs ± 3 SDs was applied. A significance level of P < 0.05 was used.
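The preprocessing steps named above (keeping correct trials only, trimming RTs beyond 3 SDs, and averaging by participant or by item) can be sketched as follows. This is a minimal illustration under assumptions: the trial records, field names and function names are hypothetical, and the trimming is shown as a separate helper that would be applied to each participant's RTs before averaging.

```python
from collections import defaultdict
from statistics import mean, stdev


def trim_rts(rts, n_sd=3.0):
    """Keep RTs within n_sd sample standard deviations of the mean."""
    if len(rts) < 2:
        return list(rts)
    centre, spread = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - centre) <= n_sd * spread]


def cell_means(trials, unit):
    """Mean RT per (unit, condition) cell, using correct trials only.

    unit="participant" yields the cells for by-participant analyses;
    unit="item" yields the cells for by-item analyses.
    """
    cells = defaultdict(list)
    for trial in trials:
        if trial["correct"]:
            cells[(trial[unit], trial["condition"])].append(trial["rt"])
    return {key: mean(rts) for key, rts in cells.items()}


# Invented trials for illustration only.
trials = [
    {"participant": "p1", "item": "w1", "condition": "PH", "rt": 700, "correct": True},
    {"participant": "p1", "item": "w2", "condition": "PL", "rt": 600, "correct": True},
    {"participant": "p2", "item": "w1", "condition": "PH", "rt": 800, "correct": True},
    {"participant": "p2", "item": "w2", "condition": "PL", "rt": 650, "correct": False},
]

if __name__ == "__main__":
    print(cell_means(trials, "participant"))
    print(cell_means(trials, "item"))
```

The by-participant cells would then enter within-subjects tests and the by-item cells between-items tests, as described in the text.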

Neuroimaging data
A general linear model was used in an event-related design. Hemodynamic responses were time-locked to the stimulus onset and convolved with the canonical hemodynamic response function of SPM5. Six separate regressors were used to model each condition: pseudowords, PH, PL, NH, NL and neutral words. In order to account for signal changes not related to the conditions of interest, six head movement regressors were added as covariates. For some participants, additional artefact regressors, created with the ArtRepair toolbox, were added to the model.
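To illustrate how an event-related regressor is built, the sketch below convolves a train of stimulus onsets with a generic double-gamma response function and samples the result once per TR. It is not SPM5's implementation: the shape parameters (peak near 5-6 s, later undershoot scaled by 1/6), the 0.1-s time grid, the 32-s kernel length and all function names are assumptions chosen for illustration; only the TR of 3.3 s comes from the text.

```python
import math


def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1.0 / 6.0):
    """Generic double-gamma response at time t (seconds); assumed parameters."""
    if t <= 0:
        return 0.0
    peak = t ** (peak_shape - 1) * math.exp(-t) / math.gamma(peak_shape)
    under = t ** (under_shape - 1) * math.exp(-t) / math.gamma(under_shape)
    return peak - ratio * under


def build_regressor(onsets_s, n_scans, tr_s=3.3, dt=0.1, hrf_len_s=32.0):
    """Convolve stimulus onsets with the response function; sample once per TR."""
    n_fine = int(round(n_scans * tr_s / dt))
    sticks = [0.0] * n_fine
    for onset in onsets_s:
        index = int(round(onset / dt))
        if 0 <= index < n_fine:
            sticks[index] += 1.0
    kernel = [double_gamma_hrf(i * dt) for i in range(int(round(hrf_len_s / dt)))]
    convolved = [0.0] * n_fine
    for i, stick in enumerate(sticks):
        if stick == 0.0:
            continue
        for j, weight in enumerate(kernel):
            if i + j < n_fine:
                convolved[i + j] += stick * weight
    step = tr_s / dt  # fine-grid samples per scan
    return [convolved[int(round(scan * step))] for scan in range(n_scans)]


if __name__ == "__main__":
    # One event at time zero, ten scans: predicted signal rises, peaks, undershoots.
    for scan, value in enumerate(build_regressor([0.0], 10)):
        print(scan, round(value, 4))
```

One such regressor per condition, plus the movement covariates, would form the columns of the design matrix.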
As with the behavioural analyses, lexicality, emotionality and valence by arousal factorial designs were employed for the imaging data, by defining T-contrasts for each participant. For the lexicality design, words were contrasted with pseudowords. For the emotionality design, valenced words were contrasted with neutral words; in addition, positive and negative words were separately contrasted with neutral words. For the valence by arousal design, main effects were tested by contrasting positive and negative words, as well as high- and low-arousal words. The interaction between factors was tested by contrasting PH and NL words with PL and NH words. Further pair-wise comparisons were also performed. At the second (group) level analysis, one-sample t-tests in both directions were performed using the contrast images created at the first (single-participant) level. For significance levels, a voxel-level threshold of P < 0.001 uncorrected was chosen, along with a cluster-level threshold of P < 0.05, corrected for family-wise error (FWE).
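The contrasts described above can be written as weight vectors over the six condition regressors. The sketch below is one plausible encoding, not the authors' actual SPM contrast specification: the regressor order, the integer scaling of the weights and the function names are assumptions. Integer weights are used so that each contrast sums to exactly zero; rescaling a contrast does not change its t-statistic.

```python
# Assumed regressor order; the text lists the six conditions but not their order.
CONDITIONS = ("pseudowords", "PH", "PL", "NH", "NL", "neutral")

# Weights per contrast; conditions not listed receive a weight of zero.
CONTRASTS = {
    "words > pseudowords": {"pseudowords": -5, "PH": 1, "PL": 1, "NH": 1, "NL": 1, "neutral": 1},
    "valenced > neutral": {"PH": 1, "PL": 1, "NH": 1, "NL": 1, "neutral": -4},
    "positive > negative": {"PH": 1, "PL": 1, "NH": -1, "NL": -1},
    "high > low arousal": {"PH": 1, "NH": 1, "PL": -1, "NL": -1},
    "PH + NL > PL + NH": {"PH": 1, "NL": 1, "PL": -1, "NH": -1},
}


def weight_vector(name):
    """Expand a named contrast into a weight per condition, in regressor order."""
    weights = CONTRASTS[name]
    return [weights.get(condition, 0) for condition in CONDITIONS]


def contrast_value(name, betas):
    """Weighted sum of condition estimates for one voxel or region."""
    return sum(w * betas[c] for c, w in zip(CONDITIONS, weight_vector(name)))


if __name__ == "__main__":
    for name in CONTRASTS:
        print(name, weight_vector(name))
```

A positive value of the interaction contrast indicates a larger response to the conflicting conditions (PH, NL) than to the congruent ones (PL, NH).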

Behavioural results
Mean accuracy overall was 97%. Descriptive statistics are reported in Table 2 and displayed in Fig. 1a.

Lexicality
Several brain regions were significantly activated for the contrast words > pseudowords (refer to Table 3 for a detailed list). Increased activations for words were found in the left inferior and superior frontal gyri (IFG, SFG) and in the left dorsomedial prefrontal cortex (dmPFC). Clusters of activation were also found bilaterally within the middle and superior temporal gyri (MTG, STG), and in the right middle cingulate cortex (CC). These areas are known to be part of a general language network (cf. Ferstl, Neumann, Bogler, & von Cramon, 2008), but more specifically, they are activated in response to the retrieval of lexical and semantic word representations (Fiebach et al., 2002; Price, Wise, & Frackowiak, 1996).

Emotionality
The contrast between emotionally-valenced and neutral words (positive + negative > neutral words) revealed a cluster of significant activation in the right superior occipital gyrus (SOG) and cuneus, both part of the extra-striate cortex (see Table 4 and Fig. 2). A similar cluster was also significant for negative words compared to neutral ones, whereas activation of the SOG did not reach corrected cluster level significance in the contrast positive vs. neutral words.

Valence by arousal
A significant cluster of activation in the right insula extending to the superior temporal gyrus (STG) was observed in response to the interaction between valence and arousal (PH + NL > PL + NH). As can be seen in Fig. 3a, this region showed increased activation for PH and NL conditions, stronger for the former, and very little response to PL and NH conditions. A second cluster within the left posterior insula did not reach corrected cluster level significance (T = 6.12, p = 0.078). No main effects of either valence or arousal were found.
Further planned pair-wise comparisons extended the results found for the interaction by showing a bigger cluster of activation in the right insula and STG for the contrast PH > PL (see Table 4). Again, a cluster of activation in the left posterior insula was observed at a level just below corrected cluster level significance. In addition, a significant activation of the left parahippocampal gyrus was found, with increased activation for PH, but no response to PL (see Fig. 3b). No other pair-wise comparisons showed significant clusters of activation. Nevertheless, clusters in the left parahippocampal gyrus and in right STG were visible for the contrast PH > NH and a cluster in the right pulvinar of the thalamus was apparent for the contrast NL > NH (see Table 4).

Post-hoc collection of behavioural data from an independent sample
The most important result of our study, namely the interaction between valence and arousal dimensions in insular cortex, was supported by a similar interactive pattern in the accuracy rates, but not in the reaction times. In our view, behavioural data collected in the scanner might be noisier and have higher variance than data collected while sitting in front of a computer screen, because of greater fatigue (e.g., scanner noise, movement constraints, etc.). Therefore, we decided to conduct a second, independent behavioural replication study.
Table note: Hemi. = hemisphere, L = left, R = right; cluster size is in voxels; T = peak t-value; x, y, z = MNI stereotactic space coordinates. * Significant clusters (with correction).

Participant sample
Eighteen native English-speakers living in the Berlin area (8 women, 10 men), aged between 18 and 30 years (mean ± SD = 23.9 ± 3.2 years) took part in the experiment. Participants came from different English-speaking countries. They all had normal or corrected-to-normal vision and 16 of them were right-handed. Participants were paid 5€ for their participation. They all gave written informed consent before participating.

Methods
The experiment was conducted in a quiet room, where participants sat in front of a computer screen. They responded by pressing two buttons highlighted on the keyboard. All other details regarding the programming of the experiment, the timing of stimulus presentation, the randomisations, as well as the data analyses, are identical to the ones in the original experiment.

Group comparison
The Berlin sample showed high mean accuracy overall (97%), not different from the accuracy of the Sussex sample (t1(33) = 0.02, ns), but significantly faster RTs (t(25.83) = 5.20, p < 0.0001) and much lower variance. Please refer to Table 2 for descriptive statistics.

Berlin sample: lexicality
As in the previous sample, words were responded to significantly faster (t1(17) = 7.51, p < 0.0001; t2(348) = 12.38, p < 0.0001) and more accurately (t1(17) = 1.66, p = 0.06; t2(321.60) = 2.64, p < 0.01) than pseudowords (see Table 2). The difference in accuracy was smaller than in the Sussex sample, possibly due to the fact that the Berlin sample came from a more heterogeneous language background. In fact, some participants expressed their awareness of the British spelling and the possibility of not knowing some specific British words.

Discussion
The present study examined how emotional valence and arousal affect hemodynamic brain responses during implicit emotion word processing, within a framework that predicts interactive effects (Robinson et al., 2004). To this end, we employed a lexical decision task and manipulated valence and arousal dimensions orthogonally, by controlling for other lexico-semantic variables that are known to affect written word recognition.
Our main finding, in line with our first hypothesis, was an interaction between the two dimensions of emotion, expressed via increased neural responses within right insular cortex to stimuli eliciting incongruent approach-withdrawal tendencies (PH and NL words, e.g., rollercoaster and weak, respectively) compared to stimuli eliciting congruent approach vs. withdrawal tendencies (PL and NH words, e.g., flower and bomb, respectively). In addition, pairwise comparisons showed increased activation in the very same region as well as in the left parahippocampal gyrus for the contrast PH > PL. The interaction at the neural level was supported by behavioural data showing slower LD latencies and lower accuracy for stimuli eliciting conflicting orientations, in line with previous behavioural results (Citron et al., under review; Robinson et al., 2004).
Insular cortex, and more specifically anterior insula, is responsible for the integration of afferent information about the physiological state of the body with on-going cognitive and evaluative processes (Critchley et al., 2004; Gray, Harrison, Wiens, & Critchley, 2007). Initial somatic afferent representations within the posterior insula may underlie consciously accessible feeling states following integrative processing within anterior insula regions: this integration gives rise to emotional awareness.
Figure caption: Regions showing significant BOLD signal changes to emotional words compared to neutral words. A significantly smaller decrease in activation for emotional words was found in the left extra-striate cortex. A small cluster of activation is also visible in the right homologous region, although it is not significant. The left diagram shows the signal change (β values) for positive, negative and neutral words; these conditions are broken down by valence and arousal in the right diagram.
In our study, implicit integration of conflicting approach-withdrawal tendencies elicited by specific emotionally-laden stimuli was processed in the right insula, consistent with increased sympathetic arousal, and suggesting that these stimuli demand more energy in order to be processed. Further, the proposed functional role of insula suggests that both automatic reaction tendencies and cognitive stimulus evaluation were involved.
Our finding lends support to the multidimensional model proposed by Robinson et al. (2004) and confirms previous findings of interactive effects of valence and arousal during implicit emotion processing (Citron et al., under review;Eder & Rothermund, 2010;Feng et al., 2012;Hofmann et al., 2009;Larsen et al., 2008). Moreover, this result extends previous behavioural and ERP findings by suggesting a possible functional neural correlate of the integration of pre-attentive approach-withdrawal tendencies elicited by salient stimuli, namely the right insula.
The model proposed by Robinson et al. (2004) differs from previous models in that it predicts approach vs. withdrawal tendencies for low vs. high arousal independently of whether the stimulus is positive or negative. For example, other models of emotion processing predict appetitive behaviour toward highly arousing positive stimuli such as sugary food or sexually attractive pictures, given that they are positive in valence (e.g., Lang et al., 1997; Lang, Bradley, & Cuthbert, 1999). The prediction of conflicting approach-withdrawal tendencies for PH stimuli made by Robinson et al. is based on distinct effects of valence and arousal dimensions and cannot be predicted by previous models, which associate an increase in (positive or negative) valence with a necessary increase in emotional arousal (Lang et al., 1997). In this respect, empirical support for partial distinction (e.g., Ito, Larsen, Smith, & Cacioppo, 1998) and a non-perfect correlation between these two variables allows room for interactive effects that can only be predicted by an interactive model. As an example, rollercoaster represents something positive and exciting one might want to approach, but also something very intense that might elicit withdrawal.
Figure caption (beginning missing): [...] Table 4 for exact MNI coordinates. Diagrams of increase or decrease in activation (β values) are reported for the 4 conditions. Error bars represent standard errors of the mean.
The interaction between emotional variables elicited no activation in the ACC. This null finding is consistent with the notion that the ACC is typically engaged by emotional (or cognitive) conflict elicited by the task requirements (cf. Kanske & Kotz, 2011;Ullsperger et al., 2010); for example, in the go/no-go or in the Stroop task, the participant needs to put effort in order to avoid a strong, automatic response tendency and to successfully perform the task. In such situations, the individual is aware of the conflict and explicitly acts in order to solve it (Ullsperger et al., 2010). This importantly differs from the type of conflict we propose in our study, which is induced by the implicit integration of conflicting stimulus-driven response tendencies to emotional valence and arousal.
Besides the insula, the anterior portion of the right superior temporal gyrus (STG), also referred to as anterior temporal lobe (ATL), showed enhanced activation in the interaction as well as in the PH > PL contrast. Bilateral ATL activation is associated with semantic/conceptual categorisation (Rogers et al., 2006) as well as comprehension of coherent, comprehensible text (Ferstl et al., 2008). More specifically, right ATL showed enhanced activation during comprehension of emotionally and chronologically inconsistent stories compared to consistent ones (Ferstl, Rinck, & von Cramon, 2005), thus suggesting a possible role of this region in making sense of emotionally incongruent information. Besides language, ATL is also involved more generally in social-emotional cognition (cf. Wong & Gallate, 2012); for example, the right anterior STG contributes to encoding facial expressions, as it responds to dynamic changes in face features (Haxby, Hoffman, & Gobbini, 2002) and is more strongly activated in response to judgment of emotion from facial expression than to simple face detection (Streit et al., 1999). Thus this region is also involved in decoding the emotional content of visual information.
Activation of the parahippocampal gyrus in response to PH compared to PL words was not specifically predicted either, but is not surprising: this region is part of the Papez circuit, one of the major pathways of the limbic system, involved in the cortical control of emotion as well as in maintaining novel information in working memory (cf. Bear, Connors, & Paradiso, 2006; Hasselmo & Stern, 2006). PH words elicited the highest increases in activation in the right insula, both in the contrast and in the interaction. These words, even though matched for emotionality and arousal level with the NH words, might represent the most strongly emotionally-laden words in our study. In fact, negative stimuli are naturally more intense than positive ones (cf. Citron et al., 2012). In our manipulation, highly arousing negative words such as rape, war or death had to be excluded in order to obtain a good match with the corresponding positive words.
A second important finding, in line with our second general hypothesis and with previous research (Compton et al., 2003; Herbert et al., 2009), was significant activation within the left extrastriate cortex in response to emotionally-valenced words compared to neutral words. This result is interesting for two reasons. First, it suggests enhanced processing (or stronger attention capture) of emotionally salient stimuli (Compton et al., 2003; Kissler et al., 2007) in regions that are functionally associated with perceptual, i.e., visual processing. Second, source localisation techniques found the left extrastriate cortex to be the source of an early ERP component (200-300 ms, with posterior scalp distribution) associated with implicit emotion word processing, namely the early posterior negativity (EPN; Kissler et al., 2007). Thus, the emotional connotation of verbal stimuli seems to modulate not only emotion-related processes, but also early, perception-related processes, during reading.
The absence of clusters of activation for the contrast emotional > neutral in emotion-related areas (e.g., OFC, amygdala) might be due to the relatively small difference in arousal level between the two conditions. Our neutral words had the same arousal level as half of our valenced words and, as can be seen in the bottom right diagram of Fig. 2, the effect was mainly driven by high-arousal valenced words (i.e., PH, NH), which showed the least decrease of activation in this region. Previous studies reporting OFC activation compared highly arousing positive and negative words with neutral ones (e.g., Kuchinke et al., 2005). Further, OFC activation is typically elicited by tasks that require deep encoding of the emotional material, such as valence decision (Dolcos, LaBar, & Cabeza, 2004; Small et al., 2003), associated with retrieval of emotional memories (Posner et al., 2009), or self-referential tasks (Lewis et al., 2007).
Activation of the amygdala has also been typically reported in studies employing highly arousing positive and negative words (Hamann & Mao, 2002; Lewis et al., 2007) and low-arousal neutral words (Hamann & Mao, 2002). Further, this region is associated with perceptual processing of emotionally-laden material (Garavan, Pendergrass, Ross, Stein, & Risinger, 2001) and its activation may be attenuated or suppressed by cognitively demanding tasks (Phan et al., 2002). In fact, amygdala activation was absent in Kuchinke et al. (2005), who employed a LDT, but was instead reported by Herbert et al. (2009) during silent reading. In our study, the subtle manipulation of arousal levels within valenced words as well as between valenced and neutral words, along with the employment of a LDT, might have hindered amygdala activation.
As an additional note, single words elicit little BOLD-signal response compared to emotional pictures (Citron, 2012), and only a few neuroimaging studies on implicit emotion word processing exist in the literature (i.e., Herbert et al., 2009; Kuchinke et al., 2005; Lewis et al., 2007). See also Schlochtermeier et al. (2013) for a detailed investigation of the effects of the type of emotional material (verbal vs. pictorial) and its visual complexity.
At the behavioural level, we report better performance for valenced than neutral words, in line with previous literature (e.g., Kousta et al., 2009). Furthermore, faster RTs and higher accuracy for positive than negative words are in line with Kuchinke et al. (2005), who interpreted this effect in light of a more interconnected network of lexical and semantic representations for positive words, therefore making their processing easier (cf. Ashby, Isen, & Turken, 1999). Finally, the lack of an arousal effect within positive and negative words can again be attributed to the relatively small difference in arousal level between high- and low-arousal valenced words. Furthermore, arousal effects within valenced words are not well-established at the behavioural level (cf. Bayer et al., 2012; Hofmann et al., 2009).
The behavioural data collected outside of the scanner showed stronger and more consistent effects than the data collected while scanning, i.e., most effects were significant in both participant and item analyses and the interactive pattern between valence and arousal was replicated in the reaction times, beyond accuracy rates. These results are not surprising, given that performance in the scanner might be affected by a number of factors (e.g., noise, strain, movement constraints), which lead to increased noise and variance in the data. In fact, the Sussex sample showed significantly slower RTs and much larger variance than the Berlin sample. Accuracy in a LDT is typically very high and in this study it was not affected by the environmental conditions.
A possible limitation of our 2 × 2 design is the fact that the interactive pattern of effects may be driven by other variables. For example, the fact that the difference in BOLD response between PH and PL conditions is larger than the difference between the corresponding negative conditions, along with the significant pairwise comparison, may suggest that the interactive effect found is actually driven by an arousal effect within positive words. We cannot currently rule out this interpretation, but only suggest that future confirmation of this pattern within positive and negative stimuli is needed. One promising approach would be to use a more naturalistic word selection. In particular, selecting highly arousing negative words (which would not need to match positive words in arousal level) might induce a larger difference between high- and low-arousal valenced words. In fact, first results of a pilot study using German words yield a clear interactive pattern, driven by both positive and negative words.
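This limitation can be made concrete by writing the interaction as a difference of simple arousal effects. The following is only an illustrative sketch: the symbols are introduced here for exposition and are not taken from the analysis itself, with \bar{y}_{c} standing for the mean BOLD response in condition c (PH, PL, NH, NL) and I for the valence by arousal interaction contrast.

```latex
\begin{align*}
I &= (\bar{y}_{PH} + \bar{y}_{NL}) - (\bar{y}_{PL} + \bar{y}_{NH}) \\
  &= (\bar{y}_{PH} - \bar{y}_{PL}) - (\bar{y}_{NH} - \bar{y}_{NL})
\end{align*}
% If the arousal effect within negative words is close to zero,
% i.e. \bar{y}_{NH} - \bar{y}_{NL} \approx 0, then
% I \approx \bar{y}_{PH} - \bar{y}_{PL}.
```

Under this formulation, a positive I is expected whenever conflicting conditions (PH, NL) exceed congruent ones (PL, NH), but the same value of I also arises from an arousal effect confined to positive words. Distinguishing the two accounts therefore requires a reliable simple effect within negative words as well.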
Finally, our experimental design could also be used to conduct research on mood disorders, such as anxiety or depression, that typically disrupt processing of emotionally salient stimuli (cf. Mathews & MacLeod, 1994).

Conclusions
The present study provides new empirical evidence from fMRI in support of a multidimensional, interactive model of emotion processing (Robinson et al., 2004), whereby valence and arousal dimensions affect the processing of emotional stimuli interactively: positive high-arousal and negative low-arousal words elicit conflicting approach-withdrawal tendencies and therefore require more processing resources than positive low-arousal and negative high-arousal words, which elicit congruent approach vs. withdrawal tendencies, respectively.
Our findings add to previous research showing that interactive effects arise during implicit processing of a stimulus' emotional content (e.g., Eder & Rothermund, 2010; Feng et al., 2012) and propose for the first time a specific neural correlate reflecting the integration of pre-attentive approach-withdrawal tendencies elicited by salient stimuli, namely the right insular cortex.
Finally, our findings support the claim that emotional variables affect even highly abstract cognitive processes such as reading, beyond other well-known lexico-semantic variables, and this could have implications for formal education in school, as well as the assessment and diagnosis of mood disorders.