When language gets emotional: irony and the embodiment of affect in discourse. Acta

Although there is increasing evidence to suggest that language is grounded in perception and action, the relationship between language and emotion is less well understood. We investigate the grounding of language in emotion using a novel approach that examines the relationship between the comprehension of a written discourse and the performance of affect-related motor actions (hand movements towards and away from the body). Results indicate that positively and negatively valenced words presented in context influence motor responses (Experiment 1), whilst valenced words presented in isolation do not (Experiment 3). Furthermore, whether discourse context indicates that an utterance should be interpreted literally or ironically can influence motor responding, suggesting that the grounding of language in emotional states can be influenced by discourse-level factors (Experiment 2). In addition, the finding of affect-related motor responses to certain forms of ironic language, but not to non-ironic control sentences, suggests that phrasing a message ironically may influence the emotional response that is elicited.


Introduction
Although language can, intuitively, evoke strong emotional responses in the reader or listener, the relationship between language and emotion is poorly understood. Recent theoretical developments in grounded cognition (see e.g., Barsalou, 2010, for a review) provide a framework in which this relationship can be investigated. When applied to language, these theories claim that neural systems involved in nonlinguistic activities such as perception, action, and emotion are utilised during language comprehension. Specifically, it is assumed that the same modality-specific (sensorimotor) representations that are activated whilst interacting with the environment are re-enacted or 'simulated' when reading about a similar experience (e.g., Barsalou, 1999, 2008; Crocker, Knoeferle, & Mayberry, 2010; Fischer & Zwaan, 2008; Glenberg, 2008; Glenberg & Gallese, 2012; Glenberg & Robertson, 2000; Zwaan, 2004). In the current paper, we investigate the grounding of language in emotion simulation, using a novel approach that examines the relationship between the reading and comprehension of a written discourse and the performance of affect-related motor actions.
There is increasing evidence to suggest that language-induced simulations play a vital role in text comprehension, particularly with respect to action and perception. For instance, it has been demonstrated that semantic sensibility judgements for action phrases such as aim a dart (Klatzky, Pellegrino, McCloskey, & Doherty, 1989), close the drawer (Glenberg & Kaschak, 2002), and turn down the volume (Zwaan & Taylor, 2006; see also Taylor, Lev-Ari, & Zwaan, 2008; Zwaan, Taylor, & de Boer, 2010) are produced faster if the motor response to make the judgement matches the movement direction implied by the phrase (e.g., turning a dial anticlockwise as opposed to clockwise when judging the phrase Eric turned down the volume). These studies suggest that comprehending action-based language can influence the performance of related actions. Interestingly, other work has shown that performing certain actions can also influence language comprehension (Glenberg, Sato, & Cattaneo, 2008).
Similar findings have been obtained with respect to sentences that may evoke perceptual simulations (imagery). For example, after reading a sentence like The ranger saw the eagle in the sky, participants are faster to recognise a picture of an eagle with extended wings than with folded wings, suggesting that reading the sentence resulted in a perceptual representation of an eagle in flight (Zwaan, Stanfield, & Yaxley, 2002; see also Kaschak et al., 2005; Kaup, Yaxley, Madden, Zwaan, & Lüdtke, 2007; Solomon & Barsalou, 2004; Vandeberg, Eerland, & Zwaan, 2012; Zwaan & Pecher, 2012; Zwaan & Yaxley, 2004). Recent research also suggests that prior exposure to an object in a particular orientation which mismatches with the orientation implied in a subsequently presented sentence can produce disruption to reading as evidenced in both eye-tracking (Wassenburg & Zwaan, 2010) and event-related brain potentials (Coppens, Gootjes, & Zwaan, 2012). Functional neuroimaging findings also point to the contribution of both perceptual and action-related simulations during language comprehension (e.g., Boulenger, Hauk, & Pulvermüller, 2009; Raposo, Moss, Stamatakis, & Tyler, 2009; Speer, Reynolds, Swallow, & Zacks, 2009).
In contrast, the role of emotional simulation during language comprehension is less well understood (see Glenberg, Webster, Mouilso, Havas, & Lindeman, 2009, for a review). In particular, although we typically encounter words in context rather than in isolation, very little is known about language-induced simulations of emotion in sentence and discourse comprehension as compared to single words. To our knowledge, so far only Havas, Glenberg, and Rinck (2007) have investigated the embodied conceptualisation of affective content in sentence comprehension versus isolated words (see Chwilla, Virgilito, & Vissers, 2011, for mood-related influences on comprehension). Havas et al. (2007) report that covert manipulation of emotional facial posture (either an induced smile or an induced pout; cf. Strack, Martin, & Stepper, 1988) interacts with sentence valence when measuring both the amount of time to judge the emotional valence of a sentence (Experiment 1), and to judge whether the sentence is easy to understand, a task unrelated to emotion (Experiment 2). In each case, judgement times were faster when facial posture and sentence valence matched than when they mismatched (see also Havas, Glenberg, Gutowski, Lucarelli, & Davidson, 2010, for related evidence that sentence reading times for sad and angry sentences, but not happy sentences, are influenced by injection of Botulinum Toxin A into muscles that control frowning). Since Havas et al. (2007) found that facial posture did not influence RT to valenced words that were presented in isolation in a lexical decision task (Experiment 3), it seems unlikely that facial postures merely prime specific positively or negatively valenced words in the semantic memory system, thereby producing the observed RT effects. In light of their contrasting findings for isolated words, Havas et al. proposed that "simulation using emotional systems is predominantly a sentence- or phrase-level phenomenon" (p. 439).
More specifically, in accord with the indexical hypothesis (Glenberg & Robertson, 1999), they assume that text comprehension and the processing of valenced words use simulations in the emotion system. It is fair to mention though that Havas et al. do not strictly exclude word-based simulation effects, arguing that these might be present for "words that directly name emotions (e.g. happy)" (p. 439) or for motor variables different from facial posture (e.g., approach-avoidance movements, to be discussed below). It is also possible that the tasks employed by Havas et al. in Experiment 1 (judging whether sentences described pleasant or unpleasant events) and 2 (judging whether sentences were easy or hard to understand) demanded a deeper level of semantic (conceptual and affective) processing than lexical decisions, and hence, differential task demands might have contributed to Havas et al.'s discrepant findings for affective words presented in isolation versus in a sentence context.
Other studies have also suggested that the observation of effects which could be attributed to the grounding of affect in motor actions may be task-dependent (e.g., Bamford & Ward, 2008; Van Dantzig, Pecher, & Zwaan, 2008; Wentura, Rothermund, & Bak, 2000). For example, Niedenthal (2007; see also Niedenthal, Winkielman, Mondillon, & Vermeulen, 2009) reported a study in which participants had to make affective or non-affective judgements about single words. Isolated valenced words generated emotion-specific facial activation as measured by electromyogram (EMG) recordings only in the emotion-related task (e.g., muscles involved in smiling were activated when reading joyful words). When participants had to perform an emotion-unrelated task, by judging whether the words were printed in upper or lower case, no such EMG effects were observed, suggesting that valenced words do not automatically prime associated facial expressions. According to Niedenthal (2007; Niedenthal et al., 2009), their findings support the view of task-dependent simulations in the emotion system, that is, emotional simulations are only recruited if they are required in order to perform the specific task.
In contrast, some studies have provided evidence in support of an automatic link between emotion evaluation and specific motor actions when the valence of the stimuli was task-irrelevant, as summarised in Table 1. Following a gradual, feature-based definition of automaticity (cf. Moors & De Houwer, 2006), the term "automatic" is used in the present paper to refer to a fast-operating process that is independent from evaluation goals (cf. Krieglmeyer, De Houwer, & Deutsch, 2013). Initial evidence for a relationship between isolated positively or negatively valenced words and particular muscle actions was obtained in studies that employed an affect-movement compatibility task (e.g., Chen & Bargh, 1999;Neumann, Hess, Schulz, & Alpers, 2005;Solarz, 1960). For example, Chen and Bargh (1999, Experiment 2) instructed participants to push or pull a lever as soon as they detected the presence of a word on the screen. Even though the task was unrelated to the emotional nature of the stimuli, participants were faster to pull the lever towards themselves for positive words and to push for negative words. In light of these findings, Chen and Bargh (1999) argued that positive and negative stimuli are automatically evaluated and linked in a fixed manner to specific approach-avoidance actions. That is, according to this muscle-specific motivational view, positive emotional stimuli automatically activate 'approach' tendencies, thus facilitating hand movements towards the participant's body (flexions), and negative emotional stimuli activate 'avoid' tendencies, thus facilitating hand movements away from the body (extensions) (e.g., Lang, 1995).
However, both the extent and nature of such automatic approach-avoidance tendencies have been debated recently (for a review, see Krieglmeyer et al., 2013). As pointed out by Rotteveel and Phaf (2004), the low demands of the detection task might have allowed participants to evaluate stimulus valence. As a result, the affect-movement compatibility effects observed by Chen and Bargh (1999, Experiment 2) might reflect a non-automatic rather than an automatic effect. In support of this possibility, Rotteveel and Phaf (2004) failed to observe an affect-movement compatibility effect when participants judged, by making up (flexion) or down (extension) arm movements, a nonaffective stimulus dimension (gender) of faces displaying happy versus angry expressions. In contrast, the effect was clearly present when the task was to evaluate whether the facial expression was either happy or angry. These findings led Rotteveel and Phaf to assume that muscle-specific action tendencies (flexion vs. extension) depend on the conscious appraisal of affective stimuli.
However, it should be noted that Rotteveel and Phaf did not use linguistic stimuli, nor did their arm movements involve a change in the distance between self and affective stimulus that characterises approach-avoidance movements (e.g., Markman & Brendl, 2005). Also in contrast to the arguments of Rotteveel and Phaf, more recent two-choice RT studies (Krieglmeyer, Deutsch, De Houwer, & de Raedt, 2010) showed an affect-movement compatibility effect for positively and negatively valenced words when participants performed (distance-changing and goal-independent) approach-avoidance responses based on a nonaffective stimulus feature (e.g., grammatical word category).
In summary, affect-related motor embodiment effects have been clearly demonstrated for the processing of isolated valenced words when the task itself is emotion-related, whereas evidence in favour of such effects is somewhat mixed when evaluation of the emotional content of the target word is not required (cf. Table 1). It is difficult to point to a single factor that would explain this inconsistency in findings, specifically, as the reviewed studies differ with respect to tasks, materials, and response conditions. It is further evident from Table 1 that the processing of valenced words in context has received little attention so far, which is surprising given the fact that emotion simulation is assumed to be contextualised. In this respect, the Havas et al. (2007) study is exceptional in that they demonstrated that facial posture influences emotion comprehension for words presented in a sentence context, but not for words presented in isolation. Yet, as pointed out earlier, this differential effect might be related to possible differences in task or processing demands for sentence sensibility versus lexical decision judgements, with the former potentially affording conscious affective evaluations. Consequently, in the current paper, our main aim is to further investigate the influence of context on emotion simulation.
Since studies using approach-avoidance responses have revealed automatic affect-movement compatibility effects even to single valenced words (cf. Table 1), this methodology appears well suited to test the notion of context-dependent emotion simulations during text comprehension. Crucially, it provides the opportunity to keep the task the same regardless of whether participants are presented with words in context, or in isolation. To this end, we use a novel approach that combines the comprehension of a written discourse with a variant of the affect-movement compatibility task, in which participants produce approach-avoidance movements to an affect-irrelevant stimulus dimension (word colour) by pushing or pulling a lever (see Kaup, 2011, and Brookshire, Ivry, & Casasanto, 2010, for a related paradigm). As participants are not explicitly asked to evaluate the emotional content of the text, a goal-independent affect-movement compatibility effect would be demonstrated by faster pull than push responses to positive materials and faster push than pull responses to negative materials. More specifically, if this effect were obtained for affective target words only when embedded in context but not when presented in isolation, this outcome would corroborate Havas et al.'s (2007) findings. Importantly, if we could further demonstrate that the nature of the affect-movement compatibility effect depends on the wider discourse context in which the target sentence appears, this finding would even more strongly support the view that readers produce context-dependent emotion simulations during text comprehension. This is because different contexts would allow us to establish a situational frame for the interpretation of the target sentence. Specifically, the content of the target sentence would remain identical across conditions, but would be interpreted differently, depending on context.
Firstly, we aim to establish whether an automatic affect-movement compatibility effect can be observed using this task for words presented in a context which affords the positive or negative valence of the target word (Experiment 1). We then further explore contextual effects by examining whether emotional simulations can be modulated by the wider discourse context in which the sentence appears, specifically, by using context to determine whether the target word is intended literally or ironically (Experiment 2). Finally, we assess whether this effect is obtained when valenced words are presented in isolation (Experiment 3).

Experiment 1
Experiment 1 employed materials in which the valence (positive vs. negative) of the final word was manipulated, as in the following examples:
1. David was out doing some last minute shopping. It was only two days until Christmas.
2. The coastguard's attention was caught by the woman in the white dress. She was very clearly in distress.
The second sentence of each material was presented one word at a time and participants had to respond to the colour of the final word by pushing or pulling a slider (cf. Fig. 1). Specifically, 500 ms following word onset, the final word changed from white to either a 'bluish' or 'greenish' colour. Depending on which group participants were assigned to, they had either to pull the slider towards themselves if the word was bluish, and to push it away if the word was greenish, or vice versa. Thus, participants were responding to a stimulus dimension that was irrelevant to the emotional content. If participants produce faster pull than push responses in the case of a positive word (e.g., Christmas) and faster push than pull responses in the case of a negative word (e.g., distress), this affect-movement compatibility effect would support Havas et al.'s (2007) argument that emotion simulation during text comprehension occurs even when participants are not explicitly asked to evaluate the emotional content of the text.
Table 1. Overview of studies examining the automaticity of the influence of valenced words (or faces with emotional expressions in the case of Rotteveel & Phaf, 2004)

Participants
Forty-eight participants completed the experiment to gain course credits for the undergraduate Psychology degree at the University of Glasgow. They comprised 12 males and 36 females. The age range of the participants was between 17 and 35 years (M = 21 years, SD = 3.1 years). The native language of all of the participants was English. Participants were assessed on the 'Edinburgh Handedness Inventory' (Oldfield, 1971). The mean score was 0.69; 44 participants were right-handed and four were left-handed.

Apparatus
Participants viewed materials on a computer monitor. Positioned to their right was a response device to record continuous movements in the horizontal plane, consisting of a metal platform, where a slider with an attached handle could be moved along a 200-mm straight track (see Ulrich et al., 2006, for a photograph). The start position was located 100 mm away from each end point. A spring kept the slider in the start position and a force of ~14.0 N was required to move the slider towards each end point. At the start point, touch-sensitive keys recorded movement onset, that is, reaction time was recorded when the slider began to be moved from its start position.

Materials and design
For this experiment, 80 materials were created, 40 with positively valenced final words (e.g., Christmas, as in 1), and 40 with negative final words (e.g., distress, as in 2). Materials were designed so that the context readily afforded the intended positive or negative nature of the valenced target word. Some of the materials included positive or negative events in the context (approximately half of the positive materials and three quarters of the negative materials), and the remainder did not indicate that the described situation was positive or negative until the target word was encountered. The sentence-final words were selected from the Affective Norms for English Words (ANEW) database (Bradley & Lang, 1999), and from the stimuli used by Meier and Robinson (2004). A fresh sample of 40 participants rated the valence of the words used in both Experiments 1 and 2 (for a statistical analysis, see Experiment 2). Instructions followed the ANEW procedure. That is, participants indicated how they felt whilst reading each word using a scale from 1 to 9, with 1 indicating that they feel completely unhappy, annoyed, unsatisfied, melancholic, despaired, or bored, and 9 indicating that they feel completely happy, pleased, satisfied, contented, or hopeful. The mean valence for positive words used in Experiment 1 was 7.40 (SD = 1.43) and for negative words 2.22 (SD = 1.36).

Procedure
The experiment started with 36 practice trials to familiarise participants with discriminating the colours and with operating the slider, in order to reduce the variance in reaction and movement times as much as possible. Each practice trial began with a white fixation point which was presented in the centre of the screen for 600 ms, after which it was replaced by a 'bluish' or a 'greenish' square which was displayed until response onset. Instructions referred to the mapping of square colours to push-pull responses (see Appendix A). Squares were used instead of words since the aim of the practice trials was simply to practise performing the slider movement as a function of stimulus colour. Half of the participants were instructed to pull the slider if the square was bluish, and to push if the square was greenish, and the other half had to pull the slider if the square was greenish, and push if the square was bluish. Participants were instructed to respond as quickly and as accurately as possible. After participants repositioned the slider in the centre (start) position, the next trial started.
Fig. 1. After reading the context sentence, participants initiated the word-by-word presentation by a button press. The final word was displayed in white for 500 ms and then turned bluish or greenish, upon which participants were to perform a push or pull response as a function of word colour.
There were then six further practice trials in which participants were presented with sample sentence materials in order to familiarise them with the rapid serial visual word presentation (RSVP) procedure in combination with the embedded colour discrimination task (see Appendix B).
As can be seen in Fig. 1, each trial started with the presentation of the context sentence of a sentence pair. The context sentence was presented in white Helvetica 14-point font. Participants pressed the spacebar when they had finished reading it. A blank interval of 500 ms followed, after which the word-by-word presentation of the second sentence started. Words were presented in white Helvetica 16-point font. Participants were asked to maintain fixation at the centre of the screen. Except for the final word, each word was displayed centrally for 300 ms, with 200-ms blank intervals between successive word presentations. The last word was presented in white for 500 ms, after which it was displayed in the 'bluish' or 'greenish' colour that was used for the square in the initial practice trials. At this point the participant was required to respond with either a push or pull movement of the slider as indicated by the colour, using the same colour-to-response mapping as during the initial practice. They then completed 136 experimental trials, in which the 80 experimental materials from Experiment 1 were pseudorandomly interleaved with the 56 experimental materials from Experiment 2 (described below).
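The timing of the word-by-word presentation can be laid out as a simple schedule. The following is a minimal sketch, not the authors' presentation software: it assumes that onsets are counted from the spacebar press and that the 200-ms blank also precedes the final word. The function name and the tuple layout are hypothetical.

```python
def rsvp_schedule(words, blank_after_context_ms=500, word_ms=300,
                  gap_ms=200, final_white_ms=500):
    """Return (word, phase, onset_ms) tuples for one target sentence.

    Onsets are measured from the spacebar press that ends the context
    sentence (an assumption made for this illustration).
    """
    events = []
    onset = blank_after_context_ms
    # Every non-final word: 300 ms on screen, then a 200 ms blank.
    for word in words[:-1]:
        events.append((word, "white", onset))
        onset += word_ms + gap_ms
    # Final word: 500 ms in white, then recoloured until the response.
    events.append((words[-1], "white", onset))
    onset += final_white_ms
    events.append((words[-1], "coloured", onset))
    return events


if __name__ == "__main__":
    for event in rsvp_schedule("It was only two days until Christmas.".split()):
        print(event)
```

Under these assumptions, RT would be measured from the onset of the "coloured" phase, since the colour is the response-relevant feature.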
Participants were asked to read for comprehension. Randomly throughout the experiment, a total of 15 comprehension questions (see Appendix C for examples), one during the practice block, were presented and required an unspeeded yes-no response with the left or right shift key of the computer keyboard. On average, participants responded correctly to 96% of comprehension questions, indicating good comprehension of the materials.

Data analysis
Trials with RT < 100 ms (0%) or RT > 1400 ms (0.49%) as well as trials for which movement direction was not coded due to movements not reaching the end position (1.3%) were excluded from the analysis. For all studies reported in the current paper, mean RT was analysed for correct response trials only. Analysis of errors, defined as any trial where a participant initiated a movement in the wrong direction, was performed on arcsine-transformed data (Winer, 1971). RT and error data were analysed by repeated measures ANOVAs with factors valence (negative vs. positive) and movement direction (push vs. pull) using the ezANOVA function of the R package ez (version 4.2-2; Lawrence, 2013) within the R environment for statistical computing (version 3.0.2; R Development Core Team, 2013). The affect-movement compatibility effect is indicated by a significant interaction of the factors valence and movement direction.¹ In addition, to include by-item random effects, we analysed the RT data using linear mixed-effects modelling (LME; e.g., Baayen, Davidson, & Bates, 2008) with the lmer function of the lme4 package (version 1.0-5; Bates, Maechler, & Bolker, 2013) in R. Following the recommendation of Barr, Levy, Scheepers, and Tily (2013), we fitted the full mixed-effects model justified by the experimental design. That is, as fixed effects we entered valence and movement direction, and the Valence × Movement Direction interaction. As random effects, we included intercepts for both subjects and items and also by-subject random slopes for each fixed effect. We obtained p-values by likelihood ratio tests comparing the model with and without the fixed effect term of interest.
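The exclusion criteria and the compatibility contrast can be illustrated with a short sketch. It is written in Python rather than the R used for the reported analyses, it operates on one flat list of trials instead of per-participant means, and the dictionary keys (`rt`, `valence`, `direction`, `correct`) are hypothetical names chosen for this example.

```python
from statistics import mean

RT_MIN_MS, RT_MAX_MS = 100, 1400  # trimming bounds given in the text


def keep_trial(trial):
    """Keep a trial for the RT analysis only if its RT is in range,
    its movement direction was coded, and the response was correct."""
    return (RT_MIN_MS <= trial["rt"] <= RT_MAX_MS
            and trial["direction"] is not None
            and trial["correct"])


def compatibility_effect(trials):
    """Mean RT of incompatible cells minus mean RT of compatible cells."""
    cells = {}
    for t in filter(keep_trial, trials):
        cells.setdefault((t["valence"], t["direction"]), []).append(t["rt"])
    m = {cell: mean(rts) for cell, rts in cells.items()}
    # Compatible: push/negative and pull/positive.
    compatible = mean([m[("negative", "push")], m[("positive", "pull")]])
    # Incompatible: push/positive and pull/negative.
    incompatible = mean([m[("positive", "push")], m[("negative", "pull")]])
    return incompatible - compatible
```

A positive value indicates faster responses when valence and movement direction are compatible. The inferential test itself (the Valence × Movement Direction interaction) is not reproduced here.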

Results
Results showed a significant Valence × Movement Direction interaction, F(1, 47) = 15.64, p < .001, ηp² = .25. For negative valence materials, push responses were faster than pull responses (M = 559 vs. 580 ms), F(1, 47) = 6.92, p = .012, whereas for positive materials pull responses were faster than push responses (M = 574 vs. 593 ms), F(1, 47) = 5.41, p = .024. That is, responses in which emotional valence was compatible with the direction of movement were faster than those that were incompatible (M = 566 vs. 587 ms), demonstrating the affect-movement compatibility effect. This finding was corroborated by the LME analysis which showed that the model including the Valence × Movement Direction interaction fitted the data better than the model including only the fixed main effects, χ²(1) = 47.81, p < .001. Percentage errors (M = 0.67%; cf.

Discussion
Importantly, an affect-movement compatibility effect was found in Experiment 1 even though the participant's task was to respond to an emotion-unrelated stimulus dimension (i.e., target word colour). This provides evidence suggesting that responses elicited by linguistic stimuli are influenced by positively or negatively valenced words independent of an explicit evaluation goal. When viewed within a grounded cognition framework, the current data add support to the notion that processing language with an emotional content activates or re-activates an emotional state in the reader.
Can this effect be influenced by the nature of the wider discourse? To answer this question, in the next experiment we introduce a manipulation in which the emotional content of the target utterance that is afforded by the wider discourse, that is, information that is outside of the target sentence, is not the same as that provided by the target word or sentence in isolation.

Experiment 2
With Experiment 2 we investigated this issue using a common communicative tool: irony. Importantly for the current study, one purported function of irony is to effectively communicate the opposite of the literal interpretation of the utterance (e.g., Grice, 1975). Consider (3), in which the context indicates that an ironic interpretation of the target word is afforded:
3. John finished the race way behind the other competitors. His friend laughed and said to him, "You are so fast!"
In such a context, which indicates that the target word is to be interpreted ironically, the word fast is actually intended to mean not fast, and is tainted with a negative connotation (ironic criticism, or sarcasm). In contrast, when the same utterance occurs in a literal context, the connotation is positive (e.g., 4):
4. John finished the race way ahead of the other competitors. His friend laughed and said to him, "You are so fast!"
Conversely, when uttered in a context which indicates an ironic interpretation of the target word (e.g., 5), the word slow (or arguably the whole utterance) takes on a positive connotation (ironic praise), whereas it would be negative when uttered in a context in which the word is intended to be interpreted literally (6):
5. John finished the race way ahead of the other competitors. His friend laughed and said to him, "You are so slow!"
6. John finished the race way behind the other competitors. His friend laughed and said to him, "You are so slow!"
Thus, for Experiment 2 a set of stimuli consisting of short discourses was created. The final target words were either positive or negative (as indicated by valence ratings reported in the Materials and design section, below), and were intended to be interpreted either literally or ironically (see Examples 3-6, above). Crucially, the positive or negative nature of the target word is reversed when the wider context indicates that it should be interpreted ironically. If information in the wider discourse can influence emotion simulations in text comprehension we would therefore expect to find faster and possibly more accurate responses when the direction of movement is compatible with the contextually determined emotional connotation of the target word (i.e., 4 and 5 with pull responses and 3 and 6 with push responses).
¹ The interaction is equivalent to a statistical comparison of the means in the two compatible conditions (push response to negative stimuli and pull response to positive stimuli) with the means of the two incompatible conditions (push/positive and pull/negative). The size of the compatibility effect is given by the average of the means in the incompatible conditions minus the average of the means in the compatible conditions.
It is important to consider here the possible mechanisms via which irony may have an effect on emotion simulations. Firstly, there is the account outlined above (essentially the irony-as-negation account) in which irony simply reverses the valence of the literal target word, following which we would predict the opposite pattern of results for ironic materials compared to their literal counterparts.
In addition to this straightforward irony-as-negation account, there are a number of other accounts relating to the social functions of irony which may make predictions regarding the nature of emotional responses to ironic vs. literal language. Firstly, there is the possibility that verbal irony may reduce the strength of a statement, that is, criticism becomes less negative, and praise less positive, if phrased ironically (e.g., Dews, Kaplan, & Winner, 1995; Harris & Pexman, 2003; Jorgensen, 1996; Matthews, Hancock, & Dunham, 2006). Specifically, Dews and Winner developed the Tinge Hypothesis, which states that the ironic meaning is 'tinged' with the literal meaning. For example, "That was just terrific", uttered as ironic criticism, is tinged with the literal, positive, meaning of terrific, and is thus viewed as being less negative than a literal criticism. In terms of ironic praise, a comment such as "That was just awful" would be tinged with the literal meaning of awful, thus becoming less positive than literal praise. Following this account, we might expect the size of the affect-movement compatibility effect to be larger for literal than ironic materials.
Alternatively, it has been proposed that ironic criticism (or sarcasm) may enhance the (specifically) negative emotions felt by the recipient, such as anger, irritation, disgust (Leggitt & Gibbs, 2000), criticism (Colston, 2007; Toplak & Katz, 2000) and condemnation (Colston, 2007; see also Blasko & Kazmerski, 2006, and Bowes & Katz, 2011). One explanation for an enhanced emotional response to sarcastic compared to literal language is that, as well as conveying information in the text, the use of sarcasm also conveys information relating to the speaker's attitude towards the recipient. Specifically, it has been argued that this form of language is considered especially appropriate if the speaker wishes to convey a hostile attitude towards the addressee (Lee & Katz, 1998). Thus, in contrast to the tinge hypothesis, this view would predict larger affect-movement compatibility effects for ironic than literal language (at least for ironic criticism; it is unclear what this account would predict for ironic praise).
It is clear from the above discussion that most theorists would agree that emotions play a role in the use of irony, yet the emotional impact of verbal irony compared to literal language is currently unclear. The results of the current study may further speak to this debate.

Method
The materials for Experiment 2 were interleaved with those from Experiment 1 in a single experimental session, and thus the participants, apparatus, and procedure were identical.

Materials and design
Fifty-six materials were created (see examples 3-6, above, and Appendix C for further examples). The first sentence of each material provided a context which would afford either a literal or ironic interpretation of the target sentence. The connotation of the target word (which was always embedded in direct speech) could be either positive or negative, and would be influenced by the context (reflecting either ironic criticism or ironic praise in the ironic conditions). The mean valence for positive words (e.g., fast) was 6.49 (SD = 1.58) and for negative words (e.g., slow) was 3.33 (SD = 1.46). Comparison of off-line ratings for the words used in Experiment 1 to those used in Experiment 2 with a repeated measures ANOVA (with factors Experiment and Valence) yielded a significant Experiment × Valence interaction, F(1, 39) = 265.55, p < .001, ηp² = .87. Simple main effects revealed that for negative words, scores were lower (and therefore more negative) for Experiment 1 than Experiment 2 (M = 2.22 vs. 3.33), F(1, 39) = 206.72, p < .001, ηp² = .84, whereas for positive words, scores were significantly higher (and therefore more positive) for Experiment 1 than for Experiment 2 (M = 7.40 vs. 6.49), F(1, 39) = 154.31, p < .001, ηp² = .80.

Pre-test
A questionnaire was completed by 140 native-English speaking participants to ensure that the full materials were interpreted as intended (i.e., literally or ironically). There were four different versions of the questionnaire. Each material appeared in only one of its four possible conditions (ironic/positive, ironic/negative, non-ironic/positive, non-ironic/negative) in a given version, but appeared in all conditions over the four versions. Each participant rated 56 materials, 14 in each condition. Participants were instructed to rate each material based on how ironic they thought it was, on a scale of 1 (not at all ironic or sarcastic) to 6 (definitely ironic or sarcastic). Ironic materials were rated as being significantly more ironic or sarcastic than their non-ironic counterparts (M = 5.01 vs. 1.78), F(1, 139) = 1420.43, p < .001, ηp² = .91. In the main experiment, these 56 materials were arranged in four different stimulus presentation files. Each item appeared in only one of its four possible versions in a given file, but appeared in all conditions over the four files. Thus each participant viewed all 56 experimental materials, 14 in each condition. Each file also included the 80 materials from Experiment 1. All items were presented in a fixed pseudorandom order, such that no more than two items in the same condition appeared in a row.
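The counterbalancing and ordering constraints described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the rotation scheme, the random seed, and the reshuffle-until-valid strategy are all assumptions made for illustration, and the sketch covers only the 56 Experiment 2 materials (not the 80 interleaved Experiment 1 materials).

```python
import random

# The four conditions named in the text (context x valence).
CONDITIONS = [
    ("ironic", "positive"),
    ("ironic", "negative"),
    ("non-ironic", "positive"),
    ("non-ironic", "negative"),
]


def build_lists(n_items=56, n_lists=4):
    """Rotate every item through the four conditions across the lists.

    Hypothetical Latin-square rotation: item i appears in condition
    (i + list_index) mod 4, so each list holds 14 items per condition.
    """
    lists = []
    for list_index in range(n_lists):
        trials = []
        for item in range(n_items):
            condition = CONDITIONS[(item + list_index) % len(CONDITIONS)]
            trials.append((item, condition))
        lists.append(trials)
    return lists


def max_run_length(trials):
    """Return the longest run of consecutive trials sharing a condition."""
    longest = 0
    current = 0
    previous = None
    for _, condition in trials:
        current = current + 1 if condition == previous else 1
        longest = max(longest, current)
        previous = condition
    return longest


def pseudorandomise(trials, max_run=2, seed=0):
    """Reshuffle until no condition occurs more than max_run times in a row."""
    rng = random.Random(seed)  # fixed seed gives a fixed order, as in the text
    order = list(trials)
    while True:
        rng.shuffle(order)
        if max_run_length(order) <= max_run:
            return order
```

Calling `pseudorandomise(build_lists()[0])` would give one fixed order for the first presentation file under these assumptions.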

Data analysis
Trials with RT < 100 ms (0%) or RT > 1400 ms (1.57%) were excluded from the analysis. Arcsine-transformed error data and mean RT data were analysed using 2 Context (literal vs. ironic) × 2 Valence (negative vs. positive) × 2 Movement Direction (push vs. pull) repeated measures ANOVAs. In addition, RT data were analysed using the same LME modelling approach as in Experiment 1. The full LME model (Barr et al., 2013) did not converge; hence, random slope terms that accounted for the least variance were successively removed until the model converged.
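The trimming criteria and the error transform can be sketched as follows. The text does not state which arcsine variant was used, so the arcsine square-root form below is an assumption, as are the function names and the inclusive treatment of the boundary values.

```python
import math

RT_MIN_MS = 100   # trials faster than this are treated as anticipations
RT_MAX_MS = 1400  # trials slower than this are treated as outliers


def trim_rts(rts_ms):
    """Keep only trials with RT_MIN_MS <= RT <= RT_MAX_MS."""
    return [rt for rt in rts_ms if RT_MIN_MS <= rt <= RT_MAX_MS]


def arcsine_transform(error_proportion):
    """Arcsine square-root transform of a proportion in [0, 1] (assumed variant)."""
    return math.asin(math.sqrt(error_proportion))
```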
As fixed effects, the final model contained context, valence, and movement direction, and the respective interaction terms. As random effects, the model included intercepts for both subjects and items, together with by-subject random slopes for context, movement direction, and their interaction.
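One way to write out the final model described above is given below. The notation is an assumption introduced here for illustration: C, V, and M denote the coded predictors context, valence, and movement direction for subject s and item i, and the coding scheme itself is not specified in the text.

```latex
\begin{aligned}
\mathrm{RT}_{si} ={}& \beta_0 + \beta_1 C + \beta_2 V + \beta_3 M
  + \beta_4 CV + \beta_5 CM + \beta_6 VM + \beta_7 CVM \\
 &+ S_{0s} + S_{1s} C + S_{2s} M + S_{3s} CM + I_{0i} + \varepsilon_{si}
\end{aligned}
```

Here the β terms are the fixed effects, S_{0s} and I_{0i} are the by-subject and by-item random intercepts, and S_{1s} to S_{3s} are the by-subject random slopes for context, movement direction, and their interaction.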
The analysis of error data revealed no significant main or interaction effects, all Fs < 2.4, ps > .13, with the exception of a marginally significant main effect of valence, F(1, 47) = 3.37, p = .073, ηp² = .07. Error rate was slightly higher for negative than positive materials (M = 1.29 vs. 0.69%).

Discussion
The major novel finding from Experiment 2 is that of an affect-movement compatibility effect for ironic (specifically, for ironic praise), but not literal materials, with the former effect being opposite in direction to the one which might be expected simply based on the valence of the target words themselves. An affect-movement compatibility effect was not present in the literal sentence context condition, which might seem surprising given that effects were found for the literal materials used in Experiment 1. One possible contributing factor is that the target words used in Experiment 2 were not as strongly valenced as those used in Experiment 1. In addition, there were relatively fewer stimuli per condition in Experiment 2, which may have led to a reduction in power. Finally, in Experiment 1, completely different contexts, as well as different final words, were used across positive and negative conditions, whereas in Experiment 2, typically a single word was altered in the context across ironic and non-ironic materials. However, although it is necessary to consider the reasons for differences between the two studies, the lack of an effect for non-ironic materials does not detract from the key finding from Experiment 2, that is, that the emotional simulation of a target valenced word can be modulated by the wider context in which the word appears.
As previous studies employing the presentation of isolated valenced words showed an affect-movement compatibility effect (in contrast to Havas et al., 2007, cf. Table 1), we conducted Experiment 3 in order to assess whether such an effect would be observed with the valenced words and colour task used in Experiments 1 and 2.

Experiment 3
To allow for direct comparison with Experiments 1 and 2, we included in Experiment 3 a condition in which participants had to make the colour-related judgement after the word had already been presented for 500 ms in white and then changed to either a 'bluish' or 'greenish' colour (i.e., stimulus onset asynchrony (SOA) = 500 ms). In addition, for the sake of comparison with similar previous work (Chen & Bargh, 1999; Krieglmeyer et al., 2010; Neumann et al., 2005; cf. Table 1), we also included a condition which required an instant colour judgement upon appearance of the target word, which was immediately presented in either a 'bluish' or 'greenish' colour (SOA = 0 ms). As before, participants responded in both SOA conditions to the emotion-irrelevant stimulus dimension by pulling a slider if the word was bluish and pushing it if the word was greenish, or vice versa. Thus, if participants produced faster pull than push responses in the case of a positive word (e.g., Christmas) and faster push than pull responses in the case of a negative word (e.g., distress), this affect-movement compatibility effect would indicate that the emotional content of the word is automatically evaluated.
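The compatibility effect defined above can be made concrete with a short Python sketch. The trial format and function name are hypothetical. The effect is computed here as mean RT on incompatible trials (positive-push, negative-pull) minus mean RT on compatible trials (positive-pull, negative-push), so a positive value indicates faster compatible responses.

```python
from statistics import mean

# Valence/direction pairings treated as compatible in the text.
COMPATIBLE = {("positive", "pull"), ("negative", "push")}


def compatibility_effect(trials):
    """Mean incompatible RT minus mean compatible RT, in ms.

    `trials` is an iterable of (valence, direction, rt_ms) tuples;
    this layout is an assumption made for illustration.
    """
    compatible = [rt for valence, direction, rt in trials
                  if (valence, direction) in COMPATIBLE]
    incompatible = [rt for valence, direction, rt in trials
                    if (valence, direction) not in COMPATIBLE]
    return mean(incompatible) - mean(compatible)
```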

Participants
Eighty native English speakers from the University of Glasgow community (who had not taken part in Experiments 1 and 2) participated. The age range of the participants was between 17 and 43 years (M = 22.1 years, SD = 4.8 years; mean handedness score = 0.79; 33 males and 47 females, no left-handers). Forty participants were randomly assigned to each of the two SOA conditions (0 ms and 500 ms).

Apparatus, materials and design
The apparatus was identical to that used in Experiment 1. The materials consisted of the 40 positive and 40 negative words used in Experiment 1 and the target words from the 56 materials created for Experiment 2, half of which were positive and half of which were negative. There were two within-subjects factors, valence (negative vs. positive) and movement direction (push vs. pull), and one between-subjects factor, which was SOA (0 vs. 500 ms).

Footnote 2. Note that the cross-over interaction becomes less symmetrical due to faster responses to positively than negatively valenced words, indicated by the trend for the valence main effect (M = 615 vs. 630 ms). Consequently, it is somewhat difficult to infer a diminished or absent affect-movement compatibility effect in the former condition.

Procedure
The procedure used to familiarise participants with discriminating the colours, and with operating the slider, was identical to that used in Experiments 1 and 2 except that, given the single word presentation, no additional six practice trials for the RSVP procedure were presented before experimental trials. Instructions for practice and experimental trials were identical to each other with the exception of mentioning either the square or the word ("A {square/word} coloured in […]"; cf. Appendix A). Participants were asked to maintain fixation at the centre of the screen. The words were presented in Helvetica 16-point font, in the centre of the screen. In the 0-ms SOA condition, the target word appeared in either a bluish or greenish colour, which was used for the square in the initial practice trials, and the participant thus had to make the push-pull movement immediately on encountering the word. In the 500-ms SOA condition, the word was presented in white for 500 ms, after which it was displayed in the 'bluish' or 'greenish' colour. At this point the participant was required to respond with either a push or pull movement of the slider as indicated by the colour. Again, the same colour-to-response mapping was used as during the initial practice.

Data analysis
RT and error data were analysed using a mixed design ANOVA with the between-subjects factor SOA (0 vs. 500 ms) and the repeated-measures factors word set (Experiment 1 vs. Experiment 2 words), valence (negative vs. positive) and movement direction (push vs. pull). In the 500-ms SOA condition, one participant was dropped from the analyses due to an excessive number of anticipation errors (i.e., more than 80% of trials with RT < 100 ms).

Results
Trials with RT < 100 ms (0.03%) or RT > 1400 ms (0.77%) were excluded from the analysis. There was a main effect of SOA, F(1, 77) = 12.21, p < .001, ηp² = .14, indicating slower responses in the 0-ms than the 500-ms SOA condition (523 vs. 460 ms). The trend for the SOA × Set interaction, F(1, 77) = 1.67, p = .079, ηp² = .04, suggested a 4-ms larger SOA effect for Experiment 1 than Experiment 2 words. As indicated by the non-significant Valence × Movement Direction interaction, F(1, 77) = 0.98, p = .32, ηp² = .013, response times to isolated words for which word valence was compatible with the direction of movement (negative-push and positive-pull) and to those for which it was incompatible (negative-pull and positive-push) did not reliably differ (492 ms vs. 490 ms). This null interaction effect was not modulated by SOA, F(1, 77) = 0.35, p = .56, ηp² = .005. The main effect of word set and all interactions including this factor were non-significant, all Fs ≤ 1.7, ps > .19, indicating no differential RT effects for the two sets of positive and negative words used in Experiment 1 versus Experiment 2.

Discussion
The lack of a significant affect-movement compatibility effect for valenced words presented in isolation for which no explicit emotion-related judgement is required agrees with findings from a number of previous studies which used approach/avoidance movement responses (e.g., Havas et al., 2007, Experiment 3; Niedenthal et al., 2009, Experiment 4; cf. Table 1). Accordingly, this result would appear to support the notion that valenced words presented out of context do not elicit automatic (goal-independent) action tendencies. However, there are other approach/avoidance studies which do report such an effect (e.g., Chen & Bargh, 1999, Experiment 2; Krieglmeyer et al., 2010, Experiments 2 and 3; Neumann et al., 2005, Experiment 2). Thus, the discrepancy between these latter findings and the current results needs further consideration.
One difference that is apparent between experiments that do find reliable approach-avoidance effects for isolated words compared to those which do not is the nature of the task employed (cf. Table 1). It seems to be the case that studies which have required participants to focus on other aspects of the stimuli and to perform choice responses have mainly resulted in absent affect-movement compatibility effects (e.g., colour in the current study, font in Niedenthal et al.'s study, or lexical decision in Havas et al.'s study; but see Krieglmeyer et al., 2010, using grammatical word judgements). In contrast, studies that have not required participants to focus on other aspects of the stimuli by using a stimulus detection task (e.g., Chen & Bargh, 1999, Experiment 2; Neumann et al., 2005, Experiment 2) have consistently demonstrated such effects. It is therefore possible that the present colour discrimination task is sufficiently demanding to minimise contributions from the controlled evaluation of valenced words to the observed affect-movement compatibility effect (cf. Rotteveel & Phaf, 2004).
However, there are a number of other differences between Experiments 1 and 2 and Experiment 3, which require further consideration. For example, the relative salience of the valence of the materials may have differed across experiments. Specifically, Experiments 1 and 2 involved reading for comprehension, which may make valence more relevant than in Experiment 3, where the task could be performed without fully processing the meaning of the words (see Brookshire et al., 2010, for evidence that focusing the task towards word meaning induces embodiment effects, whereas focusing the task towards processing the colour of the stimulus does not). In addition, the inclusion of ironic materials in Experiment 2 may have made valence more salient in the respect that the social functions of irony are clearly related to emotion.
It is noteworthy, however, that differences in valence across the two word sets used in Experiment 3 (i.e., the words used in Experiment 1 vs. those used in Experiment 2) did not result in a modulation of the affect-movement compatibility effect. From a broader perspective, the potential contribution of strength of valence to the mixed findings reported in the literature is somewhat difficult to assess, given that available studies have rarely reported ratings for the valenced words that were presented in isolation (cf. Table 1). Thus, it seems worthwhile to further investigate the boundary conditions under which affect-movement compatibility effects can be obtained for valenced words presented in isolation.

General discussion
The current study revealed a number of key findings. Firstly, results from Experiments 1 and 2 suggest that affect-movement compatibility effects can be obtained in a novel task in which participants make judgements about the stimulus that are unrelated to its emotional content (i.e., the colour of the text in which the word is presented). This adds further to the debate on whether such effects for emotional stimuli occur automatically on encountering the stimulus. In addition, current findings (Experiments 1 and 3) corroborate Havas et al.'s view that the embodiment of affect may not be evoked at the word level, and significantly extend it by demonstrating that it is a discourse-level phenomenon (Experiment 2).
Before discussing the implications of the present findings for current views of text comprehension and irony processing, we will first elaborate on the nature of the affect-movement compatibility effects observed in Experiments 1 and 2. Firstly, an alternative cognitive interpretation of the present affect-movement compatibility effect might be framed along Lakens' (2012) account of metaphor congruency effects (e.g., Meier & Robinson, 2004). According to this account, binary stimulus and response dimensions are asymmetrically processed depending on their polarity differences. With regard to the present study, positively and negatively valenced words reflect +polar and −polar endpoints, respectively, of the word dimension. In the same way, approach and avoidance movements reflect +polar and −polar endpoints, respectively, of the response dimension. Critically, +polar dimensions are typically processed faster than −polar dimensions. Moreover, the polarity correspondence principle (Proctor & Cho, 2006) further states that response selection proceeds faster if S-R polarity codes match than if they mismatch. As a result, observed RT effects should reflect the sum of dimension-specific polarity effects plus the S-R polarity (non-)correspondence effect. Thus, this account predicts shortest RTs when word and response are both +polar and hence polarity codes match as well. In all other cases RTs should be longer, because a single +polar word or response involves mismatching S-R polarity codes (+S/−R and −S/+R), whereas in the remaining matching case both word and response are −polar. Consequently, a clear affect-movement compatibility effect should emerge for positively valenced (+polar) words but not negatively valenced (−polar) words. However, and in contrast to this prediction, we observed a symmetric compatibility effect for positively and negatively valenced words in Experiment 1. Therefore, we consider this an unlikely account of the present findings.
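The asymmetric prediction of the polarity account can be illustrated with a small additive sketch in Python. All numerical values (a 500-ms baseline and equal 10-ms costs) and the function names are hypothetical, chosen only to show how summing the dimension-specific polarity costs and the mismatch cost yields a compatibility effect for +polar words but not for −polar words.

```python
def predicted_rt(word_polarity, response_polarity, base=500,
                 word_cost=10, response_cost=10, mismatch_cost=10):
    """Additive RT prediction (ms) under the polarity account.

    Polarities are "+" (positive word / approach) or "-" (negative
    word / avoidance). Cost values are illustrative assumptions.
    """
    rt = base
    if word_polarity == "-":
        rt += word_cost          # -polar stimulus is processed more slowly
    if response_polarity == "-":
        rt += response_cost      # -polar response is selected more slowly
    if word_polarity != response_polarity:
        rt += mismatch_cost      # S-R polarity codes do not correspond
    return rt


def compatibility_effects(**costs):
    """Return (effect for positive words, effect for negative words) in ms."""
    positive = predicted_rt("+", "-", **costs) - predicted_rt("+", "+", **costs)
    negative = predicted_rt("-", "+", **costs) - predicted_rt("-", "-", **costs)
    return positive, negative
```

With the default values, only the +/+ pairing yields the 500-ms baseline and the other three pairings each yield 520 ms, so the predicted compatibility effect is 20 ms for positive words and 0 ms for negative words. More generally, the effect for negative words equals the mismatch cost minus the response polarity cost, so it vanishes only when these two costs are equal.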
Another possible interpretation of the affect-movement compatibility effect stresses the importance of the evaluative congruency of stimulus and response codes (Eder & Rothermund, 2008), with the labelling of the responses as positive and negative being critical. The duration of response selection is shorter if S-R labels match than if they mismatch, bringing about the compatibility effect. Thus, Eder and Rothermund found that positively labelled responses (towards, upward) were faster to positive than negative stimuli and negatively labelled responses (away, downward) were faster to negative than positive stimuli, irrespective of the direction of distance change (approach vs. avoidance) and muscle flexion versus extension. It must be noted, though, that other studies (e.g., Krieglmeyer et al., 2010; Lavender & Hommel, 2007) failed to find an evaluative coding-dependent compatibility effect when stimulus valence was a task-irrelevant dimension. Given that in our studies, stimulus valence was also task-irrelevant, we argue that it is less likely that the present results reflect this cognitive, response selection view of the affect-movement compatibility effect, and more likely that our stimuli triggered motivational tendencies in a goal-independent, yet flexible (muscle-unspecific and context-dependent) manner.
Still, one might argue that participants may have consciously evaluated the valenced words and hence evaluative congruency of stimulus and response codes mattered. To us, this seems an unlikely proposition for two reasons. Firstly, the present task demands are similar to those employed in the Stroop task (Stroop, 1935), where word meaning is taken to automatically influence task-relevant colour processing (e.g., Cohen, Dunbar, & McClelland, 1990). Secondly, and more importantly, if participants had labelled the words as positive versus negative, it is difficult to see why the affect-movement compatibility effect observed in Experiment 2 was obtained only for ironic utterances, and specifically for negatively valenced words. Therefore, we take the affect-movement compatibility effects observed in the present paper to be triggered by an automatic, fast process that is independent of evaluative goals. Of course, further studies should assess the automaticity issue in a more comprehensive manner with respect also to further key defining features, such as unconscious and effortless processing (Moors & De Houwer, 2006).
Given that we argue against evaluative strategies and stimulus-response congruency effects as a basis for our findings, it is important to further discuss why valenced language might influence hand movements towards and away from the body. Emotions are thought to be strongly related to certain action tendencies (e.g., Frijda, 1986; Lang, 1995; Lang, Bradley, & Cuthbert, 1990; see also Heberlein & Atkinson, 2009; Neumann & Strack, 2000, for discussion). Of relevance to the current study, it has been argued that positive objects in the environment predispose an approach action, whereas negative stimuli prepare the body for an avoidance response (e.g., Chen & Bargh, 1999; see also Havas et al., 2007, for a discussion of the possibility that positive affect enhances the simulation of approach actions). Specifically, research has indicated that different emotional (facial) expressions are closely linked with two different neural structures which have been assumed to be involved in the production of approach and avoidance behaviours (Davidson, Ekman, Saron, Senulis, & Friesen, 1990). Consequently, when the valence of emotion simulations matches that of the action, response times are faster than if they mismatch.
Coming back to the role of the discourse context for the interpretation of language input, the finding of a (reversed) affect-movement compatibility effect for ironic contexts provides the first evidence that information in the wider discourse rapidly determines the emotional interpretation of a target utterance. This emotional interpretation then influences motor responding to a target word in a task that does not involve explicit evaluation of the emotional content of the stimulus. On a general level, this result appears to fit well with the assumption that emotions gain meaning via their situated conceptualisation (e.g., Barrett, 2009).
Following this, it is important to consider why context seems crucial to emotional simulation in language comprehension (in the current studies, at least). Readers and listeners typically experience language in some kind of meaningful context, from which they can construct a situation model representing the events that are being described (e.g., Zwaan, 2004; Zwaan & Radvansky, 1998). Thus, it may be that the simulation of a relevant event or state of affairs is what is important here, for example, the simulation of a state of affairs which the reader can empathise with as being pleasant or unpleasant for the characters involved. In specific relation to the scenarios used in Experiment 1, given that some of the materials included positive or negative events in the context, and others did not, it is currently unclear whether it is the modulation of the valenced word that is crucial, or the accumulation of positive/negative valenced information in the context. Thus, it is clear that the factors involved in emotional simulation in context are likely to be complex (see also Leuthold, Filik, Murphy, & Mackenzie, 2012), and that further research is needed in this area.
In specific relation to the processing of ironic vs. literal comments, the current findings would suggest that participants experienced an emotional response to the ironic materials but not to non-ironic counterpart materials (in the respect that no such affect-movement compatibility effect was found for non-ironic materials). The finding that results for ironic materials were reversed with respect to the literal meaning of the target word is in line with the irony-as-negation account outlined above. However, it is also necessary to discuss, in relation to the accounts discussed in the Introduction to Experiment 2, why ironic language should evoke an emotional response in contrast to non-ironic language conveying a similar message. To recap, the tinge hypothesis of Dews and Winner suggests that ironic criticism is viewed as less negative than literal criticism, and ironic praise less positive than literal praise, due to ironic comments being tinged by their literal (opposite) meaning, leading to a 'muted' emotional response to ironic materials. This account is clearly not supported by the current findings, in which an affect-movement compatibility effect was found for ironic materials but not for their literal counterparts. Alternatively, it has been proposed that irony (in particular ironic criticism or sarcasm) can enhance the emotional impact of a message, for instance by conveying a hostile attitude towards the addressee (see e.g., Bowes & Katz, 2011, for recent discussion). In contrast, corresponding literal statements may be regarded as a somewhat bland statement of the obvious that is "dull and almost uninformative" (Giora, 1995, p. 259). This account would seem to be more in line with the current findings, but cannot explain them completely. Specifically, we found a clear affect-movement compatibility effect for the ironic praise materials, whereas the corresponding effect for ironic criticism (sarcasm) did not reach significance (although see Footnote 2).
Thus, it is evident that more work is needed to investigate the social and emotional functions of irony, in particular, to clarify the roles of ironic criticism vs. ironic praise. However, it should be noted that claims about whether irony enhances or mitigates emotional force may depend on a number of factors, for example, the relationship between the speaker and addressee, the social context in which the utterance occurs, and the specific type of irony examined (see e.g., Leggitt & Gibbs, 2000). These are interesting new avenues for future investigation.

Conclusions
In sum, the current results suggest that emotion simulations may contribute to language comprehension, as evidenced by the modulations of response times in the novel affect-movement compatibility task introduced in the current paper, in which participants respond to an irrelevant stimulus dimension (e.g., its colour). In support of previous findings, our results suggest that this may not be a word-level phenomenon. Furthermore, we extend this grounded cognition view of language comprehension to the discourse level, by demonstrating that the emotional content of the stimulus can be determined by the wider discourse context, in this case, whether the phrase is uttered ironically. The results inform theories of how emotional language is represented (in terms of embodiment) and theories concerned with the role of (contextualised) emotional processing.