Everyday language contains many words that refer to people’s mental states, such as anger, doubt, exhaustion, or recognition. How do we understand such concepts? Grounded cognition theories suggest that conceptual understanding involves the mental simulation of sensory states (Barsalou, 1999; Gallese & Lakoff, 2005). On this account, the representation of abstract concepts, including mental states, involves complex, multimodal simulations. The link between understanding mental states and simulation has mostly been explored through studies of emotion concepts (Glenberg, Webster, Mouilso, Havas, & Lindeman, 2009; Havas, Glenberg, & Rinck, 2007; Niedenthal, 2007; Niedenthal, Winkielman, Mondillon, & Vermeulen, 2009; Oosterwijk, Rotteveel, Fischer, & Hess, 2009; Oosterwijk, Topper, Rotteveel, & Fischer, 2010; Wicker et al., 2003; for a review, see Winkielman, Niedenthal, & Oberman, 2008). In the present research, we move beyond emotion concepts and explore the role of simulation in understanding nonemotional as well as emotional terms. More importantly, though, we explore a novel issue concerning the role of mental perspective in multimodal simulation. Specifically, we compare a focus on internal properties of mental states (i.e., properties accessible through introspection) with a focus on external properties of mental states (i.e., properties accessible through vision).

Table 1 Internal and external focus ratings for the eight different sentence categories (standard deviations are in parentheses)
Table 2 Mean reaction times in milliseconds (RT) and percent errors (PE) for sensibility judgments in the nonswitch and switch conditions, split by emotional and nonemotional target sentences with internal or external focus (standard deviations are in parentheses)

One type of simulation that is relevant to mental states is the simulation of internal experiences. After all, we feel something when we are angry, exhausted, experience a sense of familiarity, or struggle to recall an event. Internal experiences include introspections (i.e., subjective experiences resulting from self-reflection), interoceptive states (i.e., sensations from the body), and experiences that comprise the general term “affect.” Several experiments support the idea that internal experiences characterize mental states, including emotions (e.g., anger, joy, fear), visceral states (e.g., hunger, dizziness), and states classically seen as cognitive (e.g., familiarity, intuition, thinking). For instance, thinking and recalling can feel easy or difficult, as reflected in subjective ratings and physiological indices of effort (Schwarz & Clore, 2007; von Helversen, Gendolla, Winkielman, & Schmidt, 2008). Furthermore, valence and arousal are seen as intrinsic parts of perception (Barrett & Bar, 2009; Barrett & Bliss-Moreau, 2009), recognition (Winkielman & Cacioppo, 2001), and memory (Phaf & Rotteveel, 2005). Consequently, understanding language that describes mental states from an internal perspective, such as the sentence he retrieved the memory from his mind, may involve simulation of internal experiences.

Nevertheless, mental states can also be described from an external perspective. Exhaustion and anger, for example, are associated with external manifestations on the face (a frown) or body (clenched fists)—information that is “on the outside.” A focus on external components may therefore involve the simulation of relevant visual features. Thus, for instance, understanding the sentence contempt was showing on his face may invoke an external perspective and, consequently, a visual simulation.

In short, simulation of mental states may be different depending on the context in which the mental state is situated (Barsalou, Niedenthal, Barbey, & Ruppert, 2003). When a mental state is described in terms of internal experiences, simulation of introspectively accessible features may be relevant for understanding. When a mental state is described in terms of external, expressive manifestations, however, simulation in the visual system may be more relevant.

The idea that language can invoke different perspectives that guide simulations relevant for understanding has been explored concerning spatial perspective. Several studies demonstrated that language comprehension involves a spatial situation model, a mental representation of the linguistically described situation that includes spatial information such as distances and relative positions between elements. In a spatial situation model, attention is focused at specific locations. Thus, a person who comprehends language has a mental simulation of the situation in which he or she is an observer from a particular spatial perspective (Morrow, Greenspan, & Bower, 1987; see Bower & Morrow, 1990; and Zwaan & Radvansky, 1998, for reviews). Researchers have also shown that object properties are more accessible if the spatial perspective allows for perception of the property, as compared to when it does not. For example, from the simulated perspective of inside a restaurant, participants were faster to verify that restaurants have tables than from the perspective of outside a restaurant (Borghi, Glenberg, & Kaschak, 2004; see also Brunyé, Ditman, Mahoney, Augustyn, & Taylor, 2009; Horton & Rapp, 2003; Wu & Barsalou, 2009). In addition, Spivey and Geng (2001) showed that the perspective of a story affected the direction of participants’ eye movements, even though they were looking at a blank screen. Thus, when participants understand language, they construct a situation model from a specific perspective.

Building upon previous research on the role of spatial perspective in comprehension, the present research examined the novel hypothesis that understanding abstract concepts, such as mental states, involves simulating internal experiences or externally observable features, depending on perspective. This study extends not only traditional research on language comprehension, but also extant grounded cognition models, which typically focus on motor actions and the classic sensory modalities (vision, audition, smell, taste, and touch). In fact, such models at least implicitly assume that what is central for simulation is some form of perception and action “in the world” (cf. Wilson, 2002). Two important exceptions are Barsalou’s (1999) influential article, which explicitly highlights the possible role of introspective simulations in the comprehension of abstract concepts (see also Barsalou & Wiemer-Hastings, 2005), and simulation accounts that propose an important role for interoceptive and introspective simulation in emotion (Bastiaansen, Thioux, & Keysers, 2009; Glenberg et al., 2009; Niedenthal, 2007).

In the present study, we used a switching cost paradigm to investigate the roles of internal and external focus in understanding mental states. Previous research has found switching costs when properties of verified concepts come from different sensory modalities, rather than from the same sensory modality (Marques, 2006; Pecher, Zeelenberg, & Barsalou, 2003; van Dantzig, Pecher, Zeelenberg, & Barsalou, 2008; Vermeulen, Niedenthal, & Luminet, 2007). For instance, Pecher et al. demonstrated that people verify that an apple is shiny more quickly after verifying that a flag is striped than after verifying that an airplane is noisy. This effect is explained by flexible simulations in the modalities relevant for verifying the different properties. For example, in order to verify that an apple is shiny, the conceptual system will use the visual modality to simulate seeing an apple, whereas in order to verify that an airplane is noisy, the conceptual system will use the auditory modality to simulate hearing an airplane. If these different modality-specific features are represented by their respective sensorimotor systems, a switching cost is predicted because attention has to switch between different systems (cf. Spence, Nicholls, & Driver, 2001).

To investigate whether internal and visual forms of simulation play different roles in understanding mental states depending on perspective, we contrasted sentences about mental states that emphasized internal experiences (internal focus) with sentences that emphasized visual features (external focus). If mental states are grounded in simulations that vary with internal or external focus, then switching costs should occur when people process sentences with different focuses. More specifically, target sentences preceded by prime sentences with the same focus should be processed faster than target sentences preceded by prime sentences with a different focus.

In the present study, we presented prime and target sentences from different domains of experience. That is, sentences describing mental states from an emotional domain (e.g., fear, anger, pride, shame) primed sentences describing mental states from nonemotional domains (e.g., thinking, remembering, dizziness, hunger), and vice versa. This design was important for two reasons. First, it isolates the effects of focus from other potential similarities between the sentences: it is unlikely that emotional states have strong semantic associations with cognitive and visceral states that could otherwise explain any switching costs. In this way, our method offers a strong test of the role of internal versus external focus in sentence comprehension.

Second, using primes and targets from different domains allowed us to assess the similarity between internal simulations associated with different types of mental states. This is relevant to a long-standing discussion in psychology about whether emotion and cognition share processing resources. Some researchers have argued that processing operations and neural substrates of emotion are separate from cognition and other states (Zajonc, 2000). Other views predict overlap in processing between emotional, cognitive, and visceral domains, due to the common involvement of internal states (see Duncan & Barrett, 2007; Pessoa, 2008). If emotion and cognition truly rely on completely different subprocesses, then simulating feeling angry would require access to different resources than retrieving a memory. On this account, one would not expect additional switching costs for moving from internal to external focus, because processing prime and target sentences would always depend on different resources. If, however, switching costs related to internal and external focus are observed regardless of the fact that people switch between the emotional and nonemotional domains, this would challenge a strong “separationist” view of cognition and emotion, at least for language comprehension.

In summary, we tested the hypothesis that representations of mental states can differ according to internal or external focus. We predicted switching costs for sentences with different focuses. Observing switching costs in this design would provide strong evidence that the simulation mechanisms underlying the processing of sentences with an internal or an external focus are similar for very different categories of mental states. In other words, finding switching costs would support the counterintuitive notion that emotional, cognitive, and visceral mental states are understood via similar mechanisms of representation.

Method

Participants and design

In total, 169 students from the University of California, San Diego (UCSD), participated for course credit. The experiment had a 2 × 2 × 2 design. The first two factors were varied within participants, manipulating internal versus external focus (target focus) and same versus different focus (switching). The third factor was varied between participants, manipulating whether emotion sentences served as primes and nonemotion sentences as targets, or vice versa (order).

Stimulus materials

We created 200 sensible sentences referring to 10 emotional states (i.e., guilt, shame, disappointment, sadness, fear, anger, disgust, pride, happiness, and love) and 10 nonemotional states (i.e., meditation, dizziness, intuition, doubt, hunger, thinking, remembering, tired, puzzled, and visualization) and varied the internal or external focus of these sentences. The total set consisted of the following four subsets: 50 nonemotion sentences with internal focus (He was famished by the end of the race. The phone number came back to her in a flash.), 50 nonemotion sentences with external focus (She shook her head in doubt. After spinning she lost her balance.), 50 emotion sentences with internal focus (Hot embarrassment came over her. Being at the party filled her with happiness.), and 50 emotion sentences with external focus (His nose wrinkled with disgust. She lowered her head with disappointment.). Internal and external sentences incorporated the same, previously specified, set of 10 abstract concepts in order to ensure that sentences with different focuses did not differ in terms of the mental states they described. A full listing of the sentences can be found online in the Supplementary Materials.

In a separate norming study, 51 students from UCSD provided ratings for the sentences. Approximately half of the participants rated the sentences on internal focus (n = 23), and the remaining participants rated the sentences on external focus (n = 28). Internal/external focus was introduced as “the extent to which a sentence describes internal/external aspects of an experience.” For internal focus, it was emphasized that internal aspects of experiences can only be observed by the person himself, whereas for external focus it was emphasized that external aspects can be observed by outsiders. Internal and external focus were rated on a scale from 1 (no internal/external focus at all) to 5 (very high in internal/external focus).

As can be seen in Table 1, rated internal focus was significantly higher for sentences that were created to produce internal focus as compared to sentences that were created to produce external focus. This effect was present for both the emotion sentences, t(22) = 5.43, p < .001, and the nonemotion sentences, t(22) = 6.44, p < .001. Rated external focus, on the other hand, was significantly higher for sentences that were created to produce external focus as compared to sentences that were created to produce internal focus. This effect was present for both emotion sentences, t(27) = 8.81, p < .001, and nonemotion sentences, t(27) = 9.52, p < .001. In short, the norming study established that the emotional and nonemotional sentences used in the main experiment indeed produced the intended external and internal focuses.
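To make the norming analysis concrete, the following sketch (in Python, with a hypothetical file and hypothetical column names; the authors' actual analysis scripts are not available) illustrates how paired t-tests could compare each rater's mean ratings for internal-focus versus external-focus sentences within one domain.

```python
# Hypothetical sketch of the norming analysis: paired t-tests comparing each
# rater's mean rating for internal-focus vs. external-focus sentences.
# File name and column names (rater, scale, emotionality, sentence_focus,
# rating) are illustrative assumptions, not the authors' actual data format.
import pandas as pd
from scipy.stats import ttest_rel

ratings = pd.read_csv("norming_ratings.csv")  # one row per rater x sentence

# Average each rater's ratings per sentence category.
means = (ratings
         .groupby(["rater", "scale", "emotionality", "sentence_focus"])["rating"]
         .mean()
         .unstack("sentence_focus"))

# Example: raters who used the internal-focus scale, emotion sentences only.
cell = means.xs(("internal_scale", "emotion"), level=("scale", "emotionality"))
t, p = ttest_rel(cell["internal"], cell["external"])
print(f"t({len(cell) - 1}) = {t:.2f}, p = {p:.3g}")
```

The same call would be repeated for the nonemotion sentences and for the raters who used the external-focus scale.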

Procedure

In the main experiment, we randomly combined the 200 normed sentences to form prime–target pairs. The resulting 100 experimental pairs were crossed on the same–different dimension and the external–internal dimension, creating four groups (i.e., internal–internal, external–internal, external–external, and internal–external). Prime and target sentences were fully counterbalanced over groups, and sentences in different groups were matched on length. In addition, we also fully counterbalanced the content of the prime and target sentences within the experimental pairs in terms of the mental states described in the sentences. Emotion sentences and nonemotion sentences served either as targets or primes. Half of the participants were presented with the emotion sentences as primes and the nonemotion sentences as targets, and the other half were presented with the nonemotion sentences as primes and the emotion sentences as targets.
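The sketch below (an illustration, not the authors' actual script) shows one way to build the four pair types from sentence lists for one prime domain and one target domain; it omits the length matching and the counterbalancing of sentence content across groups described above.

```python
# Illustrative construction of prime-target pairs crossing internal/external
# focus. All names are hypothetical; lists hold sentences from one domain.
import random

def make_pairs(primes_by_focus, targets_by_focus, n_per_cell=25):
    """Cross prime focus with target focus (internal/external).

    primes_by_focus / targets_by_focus: dicts mapping 'internal' and
    'external' to lists of 2 * n_per_cell sentences.
    Returns a shuffled list of pair dicts with a 'switch' flag.
    """
    pairs = []
    # Shuffled copies so the split into cells differs across runs.
    primes = {f: random.sample(s, len(s)) for f, s in primes_by_focus.items()}
    targets = {f: random.sample(s, len(s)) for f, s in targets_by_focus.items()}
    for i, prime_focus in enumerate(("internal", "external")):
        for j, target_focus in enumerate(("internal", "external")):
            cell_primes = primes[prime_focus][j * n_per_cell:(j + 1) * n_per_cell]
            cell_targets = targets[target_focus][i * n_per_cell:(i + 1) * n_per_cell]
            for prime, target in zip(cell_primes, cell_targets):
                pairs.append({"prime": prime,
                              "target": target,
                              "prime_focus": prime_focus,
                              "target_focus": target_focus,
                              "switch": prime_focus != target_focus})
    random.shuffle(pairs)
    return pairs
```

Each sentence is used exactly once, and each of the four focus combinations receives 25 pairs, yielding the 100 experimental pairs.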

As in previous research with sentences, participants were asked to judge sensibility (Glenberg & Kaschak, 2002). The experiment consisted of trials presenting sensible sentences and trials presenting nonsensible sentences (The curtains were dry with fear.). Participants made judgments about the sensibility of these sentences using the “sensible” (m) or the “nonsensible” (z) key. To balance the numbers of sensible and nonsensible responses and to obscure the fact that the experimental sentences were systematically paired, we mixed the 200 experimental sentences with 400 filler sentences. These filler sentences were combined into 50 sensible–nonsensible, 50 nonsensible–sensible, and 100 nonsensible–nonsensible filler pairs.
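A quick arithmetic check, under the assumption that each trial presents one prime–target pair, confirms that this mix yields equal numbers of sensible and nonsensible sentences (300 each):

```python
# Counts of prime-target pair types (sensibility of prime, sensibility of target).
pair_counts = {
    ("sensible", "sensible"): 100,        # experimental pairs
    ("sensible", "nonsensible"): 50,      # filler pairs
    ("nonsensible", "sensible"): 50,      # filler pairs
    ("nonsensible", "nonsensible"): 100,  # filler pairs
}
n_sensible = sum(n * ((a == "sensible") + (b == "sensible"))
                 for (a, b), n in pair_counts.items())
total_sentences = 2 * sum(pair_counts.values())
print(n_sensible, total_sentences - n_sensible)  # 300 sensible, 300 nonsensible
```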

Participants first completed 12 practice trials, followed by 300 experimental trials. Every trial started with a fixation stimulus (*****) presented for 500 ms, followed by the prime sentence. The prime sentence was removed from the screen when the participant gave a response or after 4,500 ms. After a 1,000-ms interstimulus interval, the fixation stimulus was presented again, followed by the target sentence. The target sentence remained on screen until a response was made (but no longer than 4,500 ms). Response times (RTs) were measured from the onset of the target sentence. Participants received feedback if they made an error (“incorrect”) or responded more slowly than 4,500 ms (“too slow”).
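The text does not specify the presentation software; the sketch below illustrates the trial timeline in PsychoPy under that assumption (error and "too slow" feedback are omitted for brevity; function and variable names are hypothetical).

```python
# Sketch of one trial's timeline; timings follow the procedure described above.
from psychopy import visual, core, event

win = visual.Window(fullscr=True, color="white")
clock = core.Clock()

def show_sentence(text, max_wait=4.5):
    """Show a sentence until a keypress ('m' = sensible, 'z' = nonsensible)
    or until max_wait seconds elapse; return (key, rt) or (None, None)."""
    stim = visual.TextStim(win, text=text, color="black")
    stim.draw()
    win.flip()
    clock.reset()  # RT is measured from sentence onset
    keys = event.waitKeys(maxWait=max_wait, keyList=["m", "z"],
                          timeStamped=clock)
    win.flip()  # remove the sentence from the screen
    return tuple(keys[0]) if keys else (None, None)

def run_trial(prime_text, target_text):
    fixation = visual.TextStim(win, text="*****", color="black")
    fixation.draw(); win.flip(); core.wait(0.5)   # 500-ms fixation
    prime_resp = show_sentence(prime_text)        # prime, up to 4,500 ms
    core.wait(1.0)                                # 1,000-ms interstimulus interval
    fixation.draw(); win.flip(); core.wait(0.5)   # fixation again
    target_resp = show_sentence(target_text)      # target, RT from onset
    return prime_resp, target_resp
```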

Before data analysis, we excluded participants who made more than 20% errors (23 participants, or 14%), indicating that they were not performing the task as instructed or had poor reading skills. The analyses were performed on the remaining 146 participants. We chose this stringent exclusion criterion because 37% of UCSD students do not speak English as their native language (www.ucsd.edu/explore/about/facts.html). Nonetheless, a more lenient criterion that excluded only participants who made 35% errors or more (removing 6 participants) did not change the pattern of results. Mean RTs were computed for each condition. Trials on which the response to the prime or target sentence was incorrect, and trials on which the RT was more than three standard deviations from the participant's mean, were excluded.
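A minimal sketch of these exclusion steps, assuming a long-format trial table with hypothetical column names (participant, prime_correct, target_correct, rt), might look as follows; the original preprocessing script is not available.

```python
import pandas as pd

trials = pd.read_csv("trial_data.csv")  # hypothetical file name

# 1. Drop participants with more than 20% errors. (Computed here on target
#    responses only; the paper does not specify whether prime errors count.)
error_rate = 1 - trials.groupby("participant")["target_correct"].mean()
keep = error_rate[error_rate <= 0.20].index
trials = trials[trials["participant"].isin(keep)]

# 2. Keep only trials on which both prime and target were answered correctly.
correct = trials[(trials["prime_correct"] == 1) & (trials["target_correct"] == 1)]

# 3. Remove RTs more than 3 SDs from each participant's own mean.
def trim(group):
    m, s = group["rt"].mean(), group["rt"].std()
    return group[(group["rt"] - m).abs() <= 3 * s]

clean = correct.groupby("participant", group_keys=False).apply(trim)
```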

Results

We predicted that participants would be faster to judge successively presented, unrelated sentences with the same focus (internal–internal or external–external) than with different focuses (internal–external or external–internal). This prediction was tested with a repeated measures ANOVA on the RTs, with switching (switch, no switch) and target focus (internal, external) as within-subjects factors and the emotional status of the target sentences (emotional, nonemotional) as a between-subjects factor. Most importantly, this ANOVA showed the expected switching effect, F(1, 144) = 6.51, p = .01, ηp² = .04. RTs on nonswitch trials (M = 1,673 ms) were faster than RTs on switch trials (M = 1,696 ms). The interaction between switching and target focus was not significant, F(1, 144) < 1, p = .87, indicating that the switching effect was equally strong for targets with an internal focus and targets with an external focus (see Table 2). We were also interested in whether emotional and nonemotional sentences differed in their ability to prime focus. Although the switching effect was numerically larger for nonemotional targets (31 ms) than for emotional targets (16 ms), the interaction between switching and the emotional status of the target sentences was not statistically significant, F(1, 144) < 1, p = .41.
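A minimal sketch of the within-subjects part of this analysis, using statsmodels' AnovaRM (an assumption; the analysis software actually used is not reported), is given below. The between-participants factor (emotional status of the targets) would require a mixed-design ANOVA and is omitted from the sketch; file and column names are hypothetical.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical output of the trimming step sketched in the Method section.
clean = pd.read_csv("cleaned_trials.csv")

# One mean RT per participant for each switching x target-focus cell.
agg = (clean
       .groupby(["participant", "switching", "target_focus"], as_index=False)["rt"]
       .mean())

# 2 (switching) x 2 (target focus) repeated measures ANOVA on the mean RTs.
res = AnovaRM(data=agg, depvar="rt", subject="participant",
              within=["switching", "target_focus"]).fit()
print(res)
```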

In addition, the repeated measures ANOVA on the RTs showed a main effect of target focus, F(1, 144) = 20.74, p < .001, ηp² = .13, and a significant interaction between target focus and the emotional status of the target sentences, F(1, 144) = 7.94, p < .01, ηp² = .05. These effects are theoretically irrelevant and may reflect slight differences in linguistic properties between different types of sentences.

A repeated measures ANOVA on the error rates did not show a switching cost effect, F(1, 144) = 0.88, p = .35. There was, however, a theoretically uninteresting significant interaction between target focus and the emotional status of the target sentences, F(1, 144) = 19.50, p < .001, ηp² = .12.

Discussion

The present study demonstrated that sentences describing internal aspects of mental states were judged more quickly when primed with sentences with the same focus (internal) than when primed with sentences with a different focus (external). A similar switching effect was present for sentences with an external focus. An advantage of our design is that these results are difficult to explain by differences in semantic similarity between sentences with the same focus and sentences with different focuses, because the prime and target sentences referred to experiences in different domains (emotional vs. nonemotional). In our view, therefore, a switch in perspective between internal and external focus is a more likely explanation than one based on semantic similarity.

The presence of switching costs suggests that language about both emotional and nonemotional mental states can be understood from at least two different perspectives. By highlighting the contrast between internal and external perspectives, the present findings extend previous work on the role of perspective in language comprehension, which explored how people construct different perspectives when understanding external events (Borghi et al., 2004; Brunyé et al., 2009; Horton & Rapp, 2003; Morrow et al., 1987; Spivey & Geng, 2001; Wu & Barsalou, 2009). Moreover, by introducing an internal perspective, the present findings extend previous work on switching effects that has mainly focused on the classic sensory modalities (Marques, 2006; Pecher et al., 2003; van Dantzig et al., 2008; but see Vermeulen et al., 2007).

In accordance with theories of grounded cognition, we propose that internal and external sentences are understood through simulation in different systems (Barsalou, 1999; Glenberg & Robertson, 2000). Whereas sentences with an external focus may be understood predominantly through simulations in visual systems, sentences with an internal focus may be understood predominantly through simulations in systems associated with internal experiences. Consequently, the present results may be interpreted as support for Barsalou’s (1999) proposal that simulations of introspective states play an important role in understanding abstract mental concepts (see also Barsalou & Wiemer-Hastings, 2005). Highlighting the role of internal experiences is also important because some embodiment theories tend to focus on perception and action “in the world,” without explicitly acknowledging the roles that attention to and perception of internal events may play in concept representation (for discussion, see Wilson, 2002).

Most concepts, including abstract concepts, involve a mix of different modalities (see, e.g., van Dantzig, Cowell, Zeelenberg, & Pecher, 2011). Consider, for example, the emotion anger: it is associated with internal experiences (e.g., high arousal, raised body temperature, a sense of urgency) as well as external features (e.g., clenched fists, a frown, a red face). As such, both internal and external simulations could underlie understanding of this concept (see also Niedenthal et al., 2009). This is consistent with the ratings we collected for our sentences, which indicated that the sentences were not exclusively internal or external. Hence, the complete simulation that accompanies sentence understanding may be multimodal. Nevertheless, the switching cost found in the present study suggests that within this mix of modalities, more attention may be given to simulations in contextually relevant modalities than to those in contextually irrelevant modalities. We propose that the internal or external focus of the present sentences drew attention to internal or external components of the described concepts, which resulted in dominance of the relevant modalities in the simulation (see also Connell & Lynott, 2009).

The present study demonstrated switching costs even though prime and target sentences always described mental states from different domains. This finding points to important similarities in the simulations that underlie the understanding of conceptual references to emotional and nonemotional mental states (Barsalou et al., 2003). Most importantly, even though our results leave open the possibility of more specialized introspective subprocesses, as suggested by Barsalou (1999), we show that at least some aspects of internal simulation are similar across emotional, cognitive, and visceral domains. Consequently, our findings challenge a view in which cognitive, emotional, and visceral mental states are seen as processes that can be strictly separated. Instead, our findings are consistent with several recent studies that have highlighted the overlap of emotional and nonemotional circuitry in the brain (Duncan & Barrett, 2007; Lindquist, Wager, Kober, Bliss-Moreau, & Barrett, in press; Pessoa, 2008). Notably, our findings also support the premise that internal experiences (or affect) are a common “ingredient” of all mental events (Barrett, 2009).

Furthermore, our findings are relevant to the recent suggestion by Craig (2002, 2009) that the anterior insular cortex (AIC), a brain region associated with interoception and feeling states, might be involved in the processing of many different mental experiences, ranging from basic visceral states (pain, coldness, hunger) to emotional states (disgust, anger, sadness) and cognitive states (sudden insight, feeling of knowing). Hence, an interesting and important avenue for further research will be to test whether understanding sentences that describe internal components of emotional and nonemotional mental states is accompanied by increased activity in the AIC. Such a finding would be an important addition to brain-imaging studies that have reported activity in “classic” modality-specific areas when people verify perceptual properties of concepts (Goldberg, Perfetti, & Schneider, 2006; Kan, Barsalou, Solomon, Minor, & Thompson-Schill, 2003).

In short, our findings highlight two important points. First, switching between sentences about mental states with internal and external focus has processing costs. This finding emphasizes the importance of perspective in simulations of mental states. Second, although important aspects of diverse mental states—such as anger, exhaustion, and remembering—are processed uniquely, their processing may rely on a shared simulation mechanism. This mechanism allows us to grasp mental states from inside out.