Introduction

Engaging in joint attention is at the heart of social interaction, be it learning about objects from others (Csibra and Gergely 2009), coordinating interpersonal actions (Clark and Krych 2004; Richardson and Dale 2005; Sebanz et al. 2006) or figuring out what others have in mind (Baron-Cohen 1991). Two aspects of attending together have predominantly been addressed in previous research. First, research on gaze following has been concerned with bottom-up, perceptual influences of joint attention. It has been shown that other people’s gaze automatically draws our attention towards the attended-to location, providing a perceptual benefit for this location (Driver et al. 1999; Ristic et al. 2002; for a review, see Frischen et al. 2007).

Second, joint-attention research has addressed the role of shared representations. During joint attention, a triadic relationship is formed, including the attendees, the attended object as well as the knowledge that the respective other is attending to the same object as oneself. Engaging in shared attention with others was found to enhance infants’ focus on relevant aspects of the environment (Striano et al. 2006) and is thought to play a crucial role in the development of imitation, social cognition and language (Barresi and Moore 1996; Hobson 2002; Tomasello et al. 2005).

Only recently have studies started to explore how shared attention influences perceptual processing in adults (Richardson et al. 2009), and in particular, how differences in perspective modulate perception. Does attending to an object from different perspectives influence how we perceive that object? Given that people normally process objects from their own perspective, within an egocentric reference frame (Klatzky 1998), another’s attention from a different perspective may induce a switch to an allocentric perspective, where objects can be more easily processed in relation to the other’s body.

There are some indications in the literature that people spontaneously consult the perspective of others. In a series of experiments by Samson et al. (in press), participants judged the number of dots on virtual walls from either their own perspective or from the perspective of an avatar present in the scene. When participants judged how many dots they saw themselves, the avatar’s perspective interfered with their own, as demonstrated by slower responses when the avatar saw a different number of dots. The process underlying this effect was suggested to be a rapid, efficient computation of the avatar’s perspective. When confronted with someone else having a different perspective, participants had difficulties maintaining their purely egocentric view of the scene.

Findings by Tversky and Hard (2009) suggest that another’s perspective also affects judgments about the spatial configuration of objects. When participants were asked to describe spatial relations between objects in a picture, they showed a tendency to report the scene from the perspective of the person in the picture, especially when the question about the objects referred to object use. The authors claimed that although an egocentric perspective constitutes the default frame of reference, spatial perspective-taking occurs and “in some situations, taking the other’s perspective appears to be more natural and spontaneous than taking one’s own” (p. 129). However, this study relied on verbal descriptions, and it is unclear whether such modulations of perspective would manifest themselves in tasks that do not involve language use.

The aim of the present study was to investigate whether joint attention from different perspectives modulates the reference frame that people adopt to process objects. Spatial characteristics of an object are usually encoded with respect to a reference frame. Reference frames can be egocentric, where objects are encoded relative to the perceiver, or allocentric, where objects are encoded relative to the environment rather than the perceiver (Klatzky 1998; Soechting and Flanders 1992; Volcic and Kappers 2008). We employed a rotation task that required gradual mental transformations of hands. This allowed us to measure differential effects of different perspectives, unlike previous studies in which binary responses were collected (‘left’ versus ‘right’ in Tversky and Hard 2009; ‘yes’ versus ‘no’ when the number of dots was either the same or different in Samson et al. in press). We predicted that jointly looking at the same stimuli from different spatial perspectives would lead people to adopt an allocentric reference frame, where objects can be encoded relative to the environment or, respectively, to another person’s body orientation. This should be reflected in differential effects on mental rotation, depending on the degree of rotation.

A further question that remains unanswered by earlier studies is whether the mere presence of another individual is sufficient to make people consider another’s perspective or whether sharing attention plays a critical role. In order to address this question, we manipulated whether attention was shared or not while keeping the physical presence of the other person constant.

Participants were sitting opposite each other while attending to objects on a flat screen placed in between them. Attending alone or together, they performed a rotation task in which two pictures of hands were presented in succession, the second picture being rotated (handedness task). Using different angles of rotation, it is possible to get a parametric estimate of how participants perform mental transformations when attending to the same stimuli alone or together.

When handedness is judged by mentally transforming hand pictures, reaction times (RTs) are typically found to increase with the difference in orientation between the hand picture and participants’ own hand (Parsons 1987a, b, 1994; Parsons et al. 1995). Furthermore, RTs depend on the awkwardness of the depicted hand posture, suggesting that participants use motor imagery whereby they imagine the movement of their own hand to match the orientation depicted by the hand picture (de Lange et al. 2006; Kosslyn et al. 2001).

Performing rotations of body parts based on motor imagery involves an egocentric reference frame. However, mental transformation processes of body parts can also be performed within an allocentric reference frame. This allows for body parts to be processed in relation to others’ bodies. It has been suggested that such transformations do not involve motor imagery of the depicted body parts but are accomplished by mentally mapping the body parts onto a body axis (head–feet, left–right; see Lakoff and Johnson 1999; Amorim et al. 2006).

If joint attention leads participants to adopt an allocentric rather than an egocentric reference frame, this should be reflected in differential effects on mental rotation, depending on the degree of rotation. In particular, the rotation curve in the joint-attention condition should be flattened. Participants should become faster for large rotation angles if largely rotated hand pictures are processed within an allocentric reference frame, where the hands can be mapped onto the other’s body axis.

Alternatively, another person’s attention may increase the saliency of stimuli overall or it may increase participants’ motivation. This should be reflected in a general effect, e.g., in an overall improvement of performance in the joint-attention condition. In that case, the slopes of the rotation–performance curves should not be affected, so the curves for the single and the joint-attention condition should remain parallel.

These predictions were tested in experiment 1. In two further experiments, we investigated whether the joint-attention effect is modulated by social context (cooperation versus competition, experiment 2) and by the degree to which the preceding trial primed the other’s perspective (experiment 3).

Experiment 1

This experiment investigated whether engaging in joint attention from different spatial perspectives leads participants to adopt an allocentric reference frame.

Methods

Participants

Thirteen pairs of undergraduate students (mean age 20.6 years; 18 women; 22 right-handed) participated in the experiment and received course credits or payment for their participation. They were fellow students or friends. All of them reported normal or corrected-to-normal vision and signed informed consent prior to the experiment.

Stimuli and procedure

Participants were tested in same-sex pairs and were seated at opposite sides of a table (see Fig. 1). In between them was a 17-in TFT monitor that was fixed to the table so that the screen faced the ceiling. The viewing distance to the monitor was 70 cm. Ambient light was kept at a constant level.

Fig. 1

a Schematic drawing of the experimental setting. Two people were sitting opposite each other with a flat screen in between them. Both of them responded by pressing keys with their right hand. Both participants placed their left hand under the table. Each participant’s right hand was hidden inside a box. b Sequence of events on each trial

Each trial started with the presentation of a tone (900 Hz) presented for 100 ms (see Fig. 1). This tone cued the participants to open their eyes and to look at each other. After 1,500 ms, one of three tones appeared together with a fixation cross (size 0.8° visual angle, presented in the centre of the screen). A 400-Hz tone indicated that it was participant A’s turn to perform the subsequent mental rotation task (and participant B’s turn to close the eyes). A 1,400-Hz tone signalled that it was B’s turn (A closing the eyes). A 900-Hz tone indicated that both participants should attend to the screen and perform the subsequent mental rotation task.

In the mental rotation task, participants saw two successive pictures of hands. They were instructed to indicate whether or not the second picture depicted the same hand as the initial picture (e.g. right hand when the initial picture also depicted a right hand versus left hand when the initial picture depicted a right hand). The initial hand picture always showed a right hand. The first picture was shown 1,500 ms after the tone indicating whose turn it was and was presented for 700 ms. After 300 ms, the second hand stimulus appeared and remained on screen until participants’ responses were recorded, for a maximum of 4,000 ms. There was a 500-ms inter-trial interval after the response. Stimuli of the rotation task consisted of one photograph of a female hand (height: 14.7° visual angle, width: 9.0° visual angle). The hand was always shown with the palm facing downwards. This photograph had been edited with the software Photoshop CS3 Extended (version 10.0.1, 2007) in order to create identical pictures of a right and a left hand.

The initial hand picture of the rotation task was presented either from the first-person perspective of participant A (rotation level 0°) (implying that participant B saw the hand from a third-person perspective/rotation level 180°) or from the first-person perspective of participant B (implying that A saw the hand from a third-person perspective). The second stimulus showed a picture of a hand that was rotated relative to the first hand by 0°, 30°, 60°, 90°, 120°, 150°, 180°, 210°, 240°, 270°, 300°, 330° or 360°.

Participants were asked to respond as fast and as accurately as possible to the appearance of the second hand picture by pressing one of two keys with the index and middle fingers of their right hand. Responses were collected using two keyboards with two horizontally arranged active keys each (‘W’ and ‘R’ for participant A, and ‘3’ and ‘5’ for participant B). In order to prevent participants from using the sight of their own hands as cues for the rotation task, cardboard boxes were placed above participants’ hands. These boxes also prevented participants from observing each other’s responses.

Ten experimental blocks followed two practice blocks. Each block consisted of 42 trials and was followed by a short rest. Trials were randomized within blocks. The assignment of stimuli (same versus different hand) to responses (index versus middle finger) was counterbalanced across subjects. After the session, participants were debriefed. During debriefing, participants were asked whether they thought the other’s attention influenced the way they solved the task or their performance. They were then asked to attempt to guess in which way they thought that the other’s attention had affected their behaviour.

Design

A 2 (attention condition) × 7 (rotation) factorial within-subject design was employed. Participants performed one-third of the trials alone (single-attention trials), and one-third simultaneously with the other participant (joint-attention trials). On the remaining third of the trials, their eyes were closed (single-attention trials of the respective other participant). Thus, 50% of the responses came from single-attention trials and 50% from joint-attention trials. Rotations to the left and to the right side were considered equivalent. As a consequence, there were 7 different levels of rotation: no rotation (0° and 360°), level 1 (30° and 330°), level 2 (60° and 300°), level 3 (90° and 270°), level 4 (120° and 240°), level 5 (150° and 210°) and level 6 (180°).
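
The folding of the 13 presented angles into 7 rotation levels can be sketched as follows. This is a minimal illustration rather than the authors' analysis code; the function name `rotation_level` is hypothetical, and the sketch assumes that angles are multiples of 30° between 0° and 360°, as in the stimulus set described above.

```python
def rotation_level(angle_deg: int) -> int:
    """Map a presented rotation angle to its level (0 = no rotation, 6 = 180 deg).

    Hypothetical helper: left and right rotations are treated as equivalent,
    so an angle is folded onto its shortest distance from upright.
    """
    if angle_deg % 30 != 0 or not 0 <= angle_deg <= 360:
        raise ValueError("angle must be a multiple of 30 between 0 and 360")
    folded = min(angle_deg, 360 - angle_deg)  # e.g. 330 -> 30, 360 -> 0
    return folded // 30


# All 13 presented angles and the level each one is assigned to.
levels = {angle: rotation_level(angle) for angle in range(0, 361, 30)}
```

With this folding, every level except 180° pools two presented angles, which is one reason the 180° data points are also examined separately.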

Data analysis

In order to assess the effect of joint attention on the mental rotation pattern, we compared intercepts and slopes of the rotation curves of the single and joint-attention condition (for analysis of slopes in mental rotation tasks, see Shepard and Metzler 1971; Cooper 1975; Amorim et al. 2006). To this end, two linear regression equations were calculated for each participant (see Lorch and Myers 1990, method 3; for a review, see Fias et al. 1996); one for the single condition and one for the joint-attention condition. Angle of rotation served as predictor variable, RTs and errors as dependent variables. Intercepts (indicating response times for non-rotated stimuli) and slopes (reflecting the time taken for rotation processes; see Just and Carpenter 1985) for the single and the joint-attention condition were compared with t tests. By means of this method, the rotation effect can be judged as a main effect and can be quantified in size (slope).
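
The per-participant regression approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the helper names `fit_line` and `paired_t` are hypothetical, the fit is ordinary least squares on mean RTs per rotation level, the comparison is a paired-samples t statistic, and the RT values are made up for three imaginary participants.

```python
from math import sqrt
from statistics import mean, stdev


def fit_line(angles, values):
    """Ordinary least-squares fit of values on angles; returns (intercept, slope)."""
    mx, my = mean(angles), mean(values)
    sxx = sum((x - mx) ** 2 for x in angles)
    sxy = sum((x - mx) * (y - my) for x, y in zip(angles, values))
    slope = sxy / sxx
    return my - slope * mx, slope


def paired_t(a, b):
    """Paired-samples t statistic for the differences a - b; returns (t, df)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


ANGLES = [0, 30, 60, 90, 120, 150, 180]  # the 7 rotation levels in degrees

# Made-up mean RTs (ms) per rotation level, one row per hypothetical participant.
single = [
    [700, 740, 780, 820, 860, 900, 940],
    [650, 680, 710, 740, 770, 800, 830],
    [720, 765, 810, 855, 900, 945, 990],
]
joint = [
    [730, 755, 780, 805, 830, 855, 880],
    [690, 710, 730, 750, 770, 790, 810],
    [760, 790, 820, 850, 880, 910, 940],
]

# One regression per participant and condition, then compare slopes across conditions.
single_fits = [fit_line(ANGLES, rts) for rts in single]
joint_fits = [fit_line(ANGLES, rts) for rts in joint]
t_slope, df = paired_t([s for _, s in single_fits], [s for _, s in joint_fits])
t_intercept, _ = paired_t([i for i, _ in joint_fits], [i for i, _ in single_fits])
```

A positive `t_slope` here means steeper slopes in the single condition, which is the direction the allocentric-frame hypothesis predicts; the same functions apply to error rates by swapping in error percentages for RTs.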

We focused on trials in which the initial hand picture was seen from a first-person perspective (1st PP trials). It can be assumed that in these trials, an egocentric reference frame is taken by default (Klatzky 1998; Tversky and Hard 2009). Thus, these trials allow us to test whether joint attention leads to a change from an egocentric to an allocentric reference frame. In contrast, it is unlikely that participants would adopt an egocentric reference frame when seeing the first hand picture rotated by 180° (3rd PP trials; see Saxe et al. 2006; Vogeley and Fink 2003). Therefore, these trials are unsuitable for testing whether joint attention leads to changes from an egocentric to an allocentric reference frame. Note that showing the initial hand picture from a third-person perspective in 50% of the trials was necessary to collect data from both participants, who sat opposite each other.

Therefore, the main analyses only included trials for each participant in which the initial hand picture was seen from a first-person perspective. In an additional analysis of 1st PP trials, data points of the 180° rotation condition were excluded in order to assess whether the pattern of results holds without these data points. If participants in the 180° condition of the rotation tasks applied flipping strategies (flipping the picture along its horizontal axis), one should see a ‘dip’ in the performance rotation curve when stimuli are rotated by 180° (Cooper and Shepard 1973).

Third-person perspective trials (3rd PP trials) were analysed separately. Assuming that participants adopt an allocentric reference frame in 3rd PP trials, no firm predictions can be made regarding differences between the individual condition and the joint-attention condition. The reason is that using an allocentric reference frame should allow a participant to flexibly map different stimuli along their own body axis or along the other’s body axis.

All analyses included trials in which both pictures depicted the same hand (first: right hand; second: right hand) and trials in which the two pictures depicted different hands (first: right hand; second: left hand).

Results

Four participants were excluded due to error rates that were more than two SDs above average (8%). The remaining 22 participants had a mean age of 20.9 years (13 women, 18 right-handed).
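
An exclusion rule of this kind can be sketched as follows. This is only an illustration: the function name `exclude_high_error` and the error rates are hypothetical, and the sketch assumes that the cutoff is the sample mean plus two sample standard deviations (n − 1 denominator) computed over all participants, a detail the text does not specify.

```python
from statistics import mean, stdev


def exclude_high_error(error_rates, n_sd=2.0):
    """Return (excluded ids, cutoff) for error rates above mean + n_sd * SD.

    Assumption: mean and SD are computed over the full sample, including
    the participants who end up being excluded.
    """
    rates = list(error_rates.values())
    cutoff = mean(rates) + n_sd * stdev(rates)
    excluded = sorted(pid for pid, rate in error_rates.items() if rate > cutoff)
    return excluded, cutoff


# Made-up proportions of errors for eight hypothetical participants.
rates = {
    "p01": 0.05, "p02": 0.06, "p03": 0.07, "p04": 0.08,
    "p05": 0.06, "p06": 0.07, "p07": 0.09, "p08": 0.40,
}
excluded, cutoff = exclude_high_error(rates)
```

In this made-up sample only `p08` lies above the cutoff; the remaining participants would enter the slope and intercept analyses.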

Reaction times

Only trials with correct responses were included in the analysis. We found the typical mental rotation pattern, that is, an increase in RTs with increasing angle of rotation (slope tested against zero) [t(21) = 7.6, p < .001; see Table 1]. The comparison of slopes for the single and the joint-attention condition revealed a significant difference. Slopes were considerably flatter when both participants were jointly attending [t(21) = 3.7, p < .001; see Fig. 2]. Intercepts differed significantly [t(21) = 3.4, p < .01]. Participants were slower at processing non-rotated stimuli in the joint-attention condition compared to the single-attention condition.

Table 1 Slopes (ms/deg; per cent error/deg) and intercepts (ms; per cent error) for RTs and error rates of 1st PP trials in experiment 1, experiment 2 (separate for the cooperation and the competition group) and experiment 3 (separate for trials following 1st PP trials and trials following 3rd PP trials)
Fig. 2

Reaction times and linear fits for 1st PP trials in both attention conditions of experiment 1. The single-attention condition is depicted in grey (squares), the joint-attention condition in black (triangles). The trend line for the single condition is depicted in grey, R² = .99. The trend line for the joint-attention condition is shown in black, R² = .95

Errors

Error rates increased significantly with increasing rotation [t(21) = 7.0, p < .001]. No effect of attention on slopes was present in error rates [t(21) < 1], nor was there any effect on intercepts [t(21) < 1]. See Table 1 for intercepts and slopes of both attention conditions.

Debriefing session

Participants indicated that they thought their behaviour and their performance had been unaffected by the other’s attention. None of the participants guessed that joint attention had affected their performance differentially depending on degree of rotation. When asked to guess in which way their performance might have been different in the joint-attention condition, approximately half of the participants indicated that they thought attending together had made them faster, whereas the other half of participants guessed that attending together had made them slower overall.

Exclusion of 180° data

All findings held when data at the 180° level were excluded from the analysis. RT increased significantly with increasing angle of rotation [t(21) = 8.4, p < .001], while slopes were flattened in the joint-attention condition [t(21) = 2.6, p < .05]. Intercepts differed significantly [t(21) = 3.2, p < .01].

Additional analysis including 3rd PP trials

A 2 × 2 ANOVA with the factors perspective of first hand picture and attention showed a significant main effect of the factor perspective of first hand picture [RTs: F(1, 21) = 43.0, p < .001; errors: F(1, 21) = 23.3, p < .001] on slopes. This was due to the fact that the rotation curve was nearly flat in trials in which the first hand picture was shown from a third-person perspective [RTs and errors: ts(21) < 1; see Fig. 4]. However, as can be seen in Fig. 4, RTs on 0° trials were faster than RTs on other trials (0° contrasted with all other degrees: [F(1, 21) = 15.8, p < .01]). When 0° was excluded from the analysis, slopes of the rotation curves were still not different from zero [ts(21) < 1]. Importantly, there was a significant two-way interaction of attention and perspective of first hand in RTs [F(1, 21) = 8.1, p < .01]. This was due to the fact that attention affected only 1st PP trials, but not 3rd PP trials [t(21) < 1]. There was no general difference in RTs between joint and single-attention trials [ts(21) < 1].

Error rates were significantly higher when the initial hand picture was seen from a third-person view [t(21) = 3.1, p < .01] as compared to a first-person view.

Discussion

The results of experiment 1 showed increasing RTs and error rates with increasing hand rotation. Most importantly, the results confirmed our prediction that jointly attending to stimuli from different perspectives modulates the processing of these stimuli. The rotation curve was flattened when two people jointly attended to the same stimuli, as performance in ‘easy’ trials (small angles of rotation) was slowed down compared to the single-attention condition, while responses were faster in ‘difficult’ trials (larger angles of rotation). Thus, the other’s attention had a differential effect on the levels of rotation: the more the stimulus was turned towards the other person, the more participants benefitted from joint attention. The same pattern of results was found when data of the 180° rotation condition were excluded. Participants did not seem to strategically flip the stimulus when it was rotated by 180°.

The results suggest that when attending jointly from different points of view, participants may have suspended their egocentric reference frame and adopted an allocentric reference frame. This implies a transformation process whereby the rotated hand is processed by making use of the other’s body axis (Tversky 2005). Mapping the depicted hand onto the other’s body axis is beneficial in high rotation angles where the hand is seen upside down and easily fits the other’s body orientation. This explains why RTs for higher angles of rotation were faster in the joint-attention condition than in the single-attention condition, where participants likely used motor imagery from an egocentric perspective. Joint attention thus may provide a benefit for stimuli rotated towards the other person.

We also found that participants were slower for smaller rotation angles in the joint-attention condition compared to the single-attention condition. This may indicate interference between the egocentric reference frame and the allocentric reference frame primed through joint attention. When a hand is not rotated or only slightly rotated, motor imagery, which may constitute the default (Parsons 1994), is easily accomplished because the hand looks as if it belonged to one’s own body. In joint-attention trials, however, mapping hand stimuli onto the other’s body axis might interfere with motor imagery at these small rotation angles, leading to an increase in RTs. This interpretation is consistent with the claim that body parts can be spatially transformed by means of two different transformation processes, namely by motor imagery and by mapping stimuli onto a body axis (Amorim et al. 2006). Taken together, the present results are in line with the interpretation of a switch from an ego- to an allocentric reference frame in joint-attention trials.

The results provide evidence that joint attention, but not the mere presence of another person, triggered a switch from an egocentric to an allocentric perspective. A co-actor’s attention to the same location may highlight the co-actor’s perspective and thereby change the reference frame that is used for spatial processing. This extends earlier findings showing that differences in perspective affect stimulus processing and verbal descriptions of visual scenes (Samson et al. in press; Tversky and Hard 2009).

Results of the debriefing session provided no indication that participants were aware of any change in behaviour or performance. This speaks against deliberate use of perspective-taking strategies and suggests that people can rather effortlessly switch from an egocentric reference frame to an allocentric reference frame. Nonetheless, the task context may modulate the extent to which the other’s perspective is taken into account. If the task context calls for ‘fading out’ the other, it could be possible that the influence of the other’s perspective declines, or, vice versa, that it increases when the context demands focusing on the other. This was tested in experiment 2.

3rd PP trials

In 3rd PP trials (where the initial hand picture was rotated by 180°), no systematic relation between degrees of rotation and RTs was found, except for faster responses to pictures showing the same degree of rotation as the initial hand. This suggests that beyond 0° trials (which may have been faster due to a perceptual benefit of seeing the same position twice), participants neither selectively engaged in mentally aligning all second hand pictures with the initial hand picture (180°), nor in aligning them with their own hand (0°). Presenting initial hands in a third-person perspective may have primed participants to adopt an allocentric reference frame (note that stimuli seen from a third-person perspective are often referred to as ‘allocentric’; e.g., see Saxe et al. 2006; Vogeley and Fink 2003). While the initial hand (rotated by 180°) highlighted the other’s body axis, the second hand highlighted the participant’s own body axis, especially when there were large rotations relative to the initial hand. This might have elicited a parallel mapping of the second hand onto the other’s body axis and the participant’s own body axis. The results are in line with this assumption because participants never completely ignored the other’s body frame, even when performing trials where the second hand picture was fully aligned with their own body (180°). Accordingly, responses in these trials were quite slow in 3rd PP trials (904 ms) as compared to 1st PP trials (734 ms). At the same time, participants never neglected their own body frame, as is noticeable in slower responses to no-rotation trials in 3rd PP trials (836 ms) as compared to 1st PP trials (734 ms).

Given that participants in 3rd PP trials did not adopt an egocentric reference frame to begin with, joint attention could not further modulate the mental transformations employed to solve the task.

Experiment 2

The aim of experiment 2 was to investigate whether task context modulates the influence of the co-actor’s perspective on mental transformation. Recent evidence suggests that in competitive situations, participants focus on their own performance and ignore their co-actor. Bekkering and colleagues found that participants processed their partner’s errors like their own only in a cooperative, but not in a competitive setting (Bekkering et al. 2009). Although error processing is thought to occur early and automatically, the social setting modulated how other people’s errors were processed.

We manipulated social context in order to test whether the effect of joint attention observed in experiment 1 is sensitive to the type of social interaction participants are engaged in. If the tendency to adopt an allocentric reference frame depends on social context, the effect of the other’s perspective should be more pronounced in one of the two settings. If, by contrast, the effect of joint attention is immune to social context, it should be found in both a competitive and a cooperative setting.

Methods

Participants

Twenty-six same-sex pairs of undergraduate students participated in the experiment and received course credits or payment for participation. They were fellow students or friends and were randomly assigned to the two social context groups (13 pairs participated in the competition condition, 13 pairs in the cooperation condition). There were no differences in mean age, gender and handedness between groups (cooperation group: 21 women, mean age: 21.0, 3 left-handed; competition group: 20 women, mean age: 21.6, 4 left-handed). All of them reported normal or corrected-to-normal vision and signed informed consent prior to the experiment.

Stimuli and procedure

See experiment 1

Design

The design was the same as in experiment 1, with the additional between-subject factor type of social interaction. Participants in the competition group were informed that the person with faster reaction times and fewer errors would be paid an extra 5 Euros. Participants in the cooperation group were playing together against other pairs. Participants were informed that pairs that performed better than 50% of all other pairs would be paid an extra 5 Euros each. Thus, the chance of getting 5 Euros extra was as high in the competition group as in the cooperation group. To further emphasize individuality versus group belongingness, colours were assigned to either participants or groups (Patterson and Bigler 2007). Each participant in the competition group was assigned a different colour and so was each group in the cooperation condition.

Data analysis

The data were analysed in the same way as in experiment 1 (analysis of slopes and intercepts of the rotation curves with the factor attention condition). A 2 × 2 ANOVA with the between-subject factor type of social interaction and the within-subject factor attention was performed. As in Experiment 1, the main analyses included only 1st PP trials. Additional analyses were performed on the 1st PP data without the 180° rotation condition and separately on 3rd PP trials.

Results

Two participants in the cooperation condition and four participants in the competition condition were excluded due to error rates that were more than two SDs above average.

Reaction times

RTs increased significantly with increasing rotation [t(45) = 9.4, p < .001, see Fig. 3]. There was a significant difference between slopes in the single and the joint-attention condition. Overall, slopes were flatter when the other participant was attending as well [F(1, 45) = 11.2, p < .01, see Table 1]. There was no main effect of type of social interaction [F(1, 45) < 1] and no significant two-way interaction of attention and social interaction [F(1, 45) < 1]. Intercepts were significantly smaller in the competition condition than in the cooperation condition [F(1, 45) = 4.5, p < .05]. Intercepts were marginally higher in the joint-attention condition compared to the single-attention condition [F(1, 45) = 3.4, p = .07]. There was no significant two-way interaction of attention and social interaction [F(1, 45) < 1].

Fig. 3

Reaction times and linear fits for 1st PP trials in both attention conditions of experiment 2. Left Cooperation group. Right Competition group. The single-attention condition is depicted in grey (squares), the joint-attention condition in black (triangles). The linear trend line for the single condition is depicted in grey, R² = .99 in the cooperation group and R² = .98 in the competition group. The linear trend line for the joint condition is shown in black, R² = .95 in the cooperation group and R² = .97 in the competition group

Exclusion of 180° data

All findings held when data at the 180° level were excluded from the analysis. RT increased significantly with increasing angle of rotation [t(45) = 8.8, p < .001], and slopes were significantly flatter in the joint-attention condition [t(45) = 2.1, p < .05]. There was no main effect of type of social interaction [F(1, 45) < 1] and no significant two-way interaction of attention and social interaction [F(1, 45) < 1]. Intercepts were significantly smaller in the competition condition than in the cooperation condition [F(1, 45) = 3.5, p < .05]. Intercepts did not differ between attention conditions [F(1, 45) < 1]. There was no significant two-way interaction of attention and social interaction [F(1, 45) < 1].

Errors

Error rates increased significantly with increasing level of rotation [t(45) = 3.7, p < .01]. No effect of attention or social interaction on slopes was present in error rates [Fs(1, 45) < 1]. Intercepts were not affected by any of the factors [Fs(1, 45) < 1]. Slopes and intercepts are listed in Table 1.

Additional analysis including 3rd PP trials

A 2 × 2 × 2 ANOVA with the factors perspective of first-hand picture, attention and social interaction showed a significant main effect of perspective of first-hand picture on slopes [RTs: F(1, 45) = 9.7, p < .01; errors: F(1, 21) = 7.3, p < .01]. This was because the rotation curve was nearly flat on trials in which the first-hand picture was shown from a third-person perspective [RTs and errors: ts(45) < 1]. RTs were marginally faster on 0° trials than on trials including rotations [contrast between 0° and all larger angles: F(1, 45) = 2.9, p = .09]. When 0° was excluded from the analysis, slopes were still flat [ts(45) < 1] (see Fig. 4 for 3rd PP trials in experiments 1 and 2).

Fig. 4

Reaction times for 3rd PP trials in both attention conditions in experiments 1 and 2. Left Experiment 1. Middle Cooperation group. Right Competition group. The single-attention condition is depicted in grey (squares), the joint-attention condition in black (triangles)

There was a significant two-way interaction of attention and perspective of first-hand picture in RTs [F(1, 45) = 4.1, p < .05]. This was because attention affected only 1st PP trials, but not 3rd PP trials [t(45) = 1.2, p = .23]. Participants were faster in joint-attention trials than in single-attention trials in the competition group [t(21) = 2.5, p < .05], but not in the cooperation group [t(23) < 1].

Cooperation only

Reaction times

Increasing angles of rotation elicited an increase in RTs [t(23) = 8.4, p < .001]. We found a significant difference between slopes in the single and the joint-attention condition; slopes were considerably flatter when the other participant was attending as well [t(23) = 2.5, p < .05]. Intercepts were not affected by attention [t(23) = 1.5, p = .14].

Errors

The mean error rate was 7.1%. Error rates increased with increasing angle of rotation [t(23) = 3.5, p < .01]. No effect of attention on slopes was present in error rates [t(23) < 1]. There were no effects on intercepts [t(23) < 1].

Competition only

Reaction times

Increasing angles of rotation elicited an increase in RTs [t(21) = 5.7, p < .001]. We found a significant difference between slopes in the single and the joint-attention condition; slopes were significantly flatter when the other participant was attending as well [t(21) = 2.2, p < .05]. Analysis of intercepts revealed no significant differences [t(21) = 1, p = .33].

Errors

The mean error rate was 8.6%. Error rates increased with increasing angle of rotation [t(21) = 4.9, p < .001]. No effect of attention on slopes was present in error rates [t(21) = 1.5, p > .1]. Intercepts were not influenced by attention [t(21) < 1].

Discussion

The results replicate the main finding of experiment 1. When both participants attended jointly, slopes reflecting the relation between RTs and angle of rotation were flatter than when participants attended alone. The more stimuli were rotated towards the co-attending person, the more participants benefitted from the other’s attention. The joint-attention effect held when trials of the 180° rotation condition were excluded from the analysis.

The aim of the present experiment was to examine whether social context modulates this effect. We found that the slope of the rotation curve in the joint condition was flattened in both the cooperative and the competitive setting and that there was no interaction of group and attention condition. Thus, the type of social interaction between participants did not change the effect of the other’s attention. Even when the social context called for concentrating on one’s own performance (competition group), participants could not help taking the other’s perspective into account. This suggests that joint attention in both social contexts led participants to adopt an allocentric frame of reference.

However, social setting affected general performance. Participants in the cooperation group were generally slower than participants in the competition group. Competing against each other led to faster RTs than collaborating, suggesting that participants complied with the instructions.

Contrary to experiment 1, intercepts for the single and the joint-attention condition only differed marginally in experiment 2. Thus, although participants benefited from the other’s attention when stimuli were rotated towards the other, they were not slowed down as much by the other’s attention on non-rotated stimuli. This finding may be explained by the assumption that participants were highly focused on speeding up their responses because speed was rewarded in both groups. As the non-rotated stimuli were the easiest ones, they were the obvious candidates for speeding-up without making more errors. The attempt to respond as fast as possible might have prevented responses to non-rotated stimuli from being slowed down by the other’s attention.

Taken together, the effect of joint attention on mental rotation first observed in a neutral setting seems quite robust as the effect of joint attention on larger angles of rotation could be replicated in both a competitive and a cooperative setting. This effect seems best explained by the assumption that joint attention leads participants to adopt an allocentric reference frame.

3rd PP trials

As in experiment 1, no systematic relation between degrees of rotation and RTs was found in 3rd PP trials; except for faster responses in 0° trials, performance curves were rather flat. Presenting initial hands in a third-person perspective may have primed participants to adopt an allocentric reference frame. As in the previous experiment, participants may have mapped stimuli in parallel onto their own and the other’s body axis. This would explain why, again, participants did not speed up when the second hand fit their own body posture and were slower to respond to 0° trials in the 3rd PP condition than in the 1st PP condition. As for 1st PP trials, participants were significantly faster in joint-attention trials compared to single-attention trials in the competitive setting, implying that participants followed the instructions.

Experiment 3

The third experiment aimed at clarifying the mechanisms underlying the effect of joint attention on the slope of the rotation curve. The flattening of the rotation curve in the joint condition can be explained by assuming that joint attention leads participants to abandon their egocentric reference frame and to adopt an allocentric reference frame in order to transform the hand picture. The task we employed may have primed an allocentric perspective because on half of the trials, the initial hand picture was seen from the other’s first-person perspective (implying a third-person perspective for the participant).

This raises the question of whether effects of the other’s attention are stronger after priming an allocentric frame of reference. Previously, it has been reported that some brain areas have a preference for processing allocentric over egocentric views of bodies (Chan et al. 2004) and body parts (Saxe et al. 2006). Seeing a hand from a third-person perspective may prime a tendency towards interpreting stimuli within an allocentric reference frame. Are people more prone to taking the co-actor’s perspective into account after seeing a hand picture displayed from a third-person, allocentric view?

To keep the task as similar as possible to the two previous experiments, we manipulated the perspective of the initial hand picture in a given trial (first person vs. third person; see Fig. 5) and studied how this affected performance on subsequent trials. The underlying logic of manipulating the orientation of the initial picture on a trial and studying the effect on a subsequent trial is as follows. If the initial hand picture were always seen from one’s own perspective, there would be no reference to the other’s perspective at all. In contrast, if the initial hand picture were always seen from the other’s perspective, there would be a strong emphasis on the difference in perspectives. Thus, varying the orientation of the initial hand picture in the preceding trial is an effective way of manipulating the reference to the other’s perspective and of priming an allocentric reference frame.

Fig. 5

Upper graph Schematic illustration of two subsequent trials where participants saw the first stimulus of the pair in the preceding trial from a first-person perspective (leftmost picture). Lower graph Schematic drawing of two subsequent trials where participants saw the first stimulus of the pair in the preceding trial from a third-person perspective (leftmost picture)

If an allocentric perspective can be primed, seeing the initial hand picture in the preceding trial from a third-person view should enhance the joint-attention effect. In contrast, seeing the initial picture from one’s own perspective should lead to a reduced joint-attention effect in the subsequent trial.

Methods

Participants

Twenty-two undergraduate students (mean age 22 years; 17 women; 2 left-handed) participated in the experiment and received course credits or payment for participation. All of them reported normal or corrected-to-normal vision and signed informed consent prior to the experiment.

Stimuli and procedure

These were the same as in experiment 1, except that participants were assigned to a confederate.

Design

This was the same as in experiment 1, with the following exception. In order to investigate the effect of initial hand perspective in the directly preceding trial, the orientation of the initial hand was manipulated and participants’ responses in the subsequent trial were analysed (see Fig. 5). In the trials directly following the ‘orientation-manipulation trials’, the initial hand picture was always seen from the participant’s first-person perspective, as only this condition was of interest for the analysis. We employed a 2 (orientation in preceding trial) × 2 (attention condition) factorial design and analysed slopes and intercepts.
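The coding of trials by the perspective of the directly preceding trial can be sketched as follows. This is only one possible way to implement such a coding step, under the assumption that each trial is stored with a perspective label; the labels `"1PP"` and `"3PP"`, the function name `following_trials` and the trial sequence are hypothetical.

```python
def following_trials(perspectives):
    """Label each first-person-perspective trial with the perspective
    of the directly preceding trial.

    Returns a list of (trial_index, preceding_perspective) tuples.
    The first trial has no predecessor and is therefore skipped.
    """
    labelled = []
    for i in range(1, len(perspectives)):
        if perspectives[i] == "1PP":
            labelled.append((i, perspectives[i - 1]))
    return labelled

# Hypothetical trial order (initial hand perspective per trial).
sequence = ["3PP", "1PP", "1PP", "1PP", "3PP", "1PP"]
print(following_trials(sequence))
```

Each labelled trial would then enter the 2 (orientation in preceding trial) × 2 (attention condition) analysis according to its preceding-perspective label and its attention condition.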

Results

One participant was excluded due to error rates that were more than two SDs above the average (8%).
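An exclusion criterion of this kind can be illustrated with a short sketch. It assumes that the cutoff is the group mean plus two sample standard deviations, computed over all participants including the potential outlier; the text does not specify these details. The participant labels, the error rates and the function name `flag_outliers` are hypothetical.

```python
from statistics import mean, stdev

def flag_outliers(error_rates, n_sd=2.0):
    """Return the participants whose error rate exceeds mean + n_sd * SD.

    error_rates maps participant labels to error rates (proportions).
    Assumption: the sample SD is computed over all participants.
    """
    rates = list(error_rates.values())
    cutoff = mean(rates) + n_sd * stdev(rates)
    return sorted(p for p, rate in error_rates.items() if rate > cutoff)

# Hypothetical error rates for ten made-up participants.
rates = {f"p{k:02d}": 0.05 for k in range(1, 10)}
rates["p10"] = 0.40
print(flag_outliers(rates))
```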

Reaction times

Overall, there was a significant increase in RTs with increasing level of rotation [t(20) = 8.6, p < .001]. No main effects of preceding trial [F(1, 20) = 2.9, p > .1] or attention [F(1, 20) < 1] were found. However, there was a significant two-way interaction of attention and preceding trial [F(1, 20) = 8.7, p < .01]. This was due to a significant flattening of the slope in the joint-attention condition when the preceding trial showed the initial hand picture from a third-person perspective [t(20) = 2.3, p < .05] and no such effect when the preceding trial showed the initial hand picture from a first-person perspective [t(20) = 1.4, p > .1; see Fig. 6].

Fig. 6

Reaction times and linear fits for both attention conditions in experiment 3. Left Preceding trial showed first-hand picture from the first-person perspective. Right Preceding trial showed first-hand picture from the third-person perspective. The single-attention condition is depicted in grey (squares), the joint-attention condition in black (triangles). The linear trend line for the single condition is depicted in grey, R² = .92 for trials following first-person perspective trials (left) and R² = .97 following third-person perspective trials (right). The linear trend line for the joint condition is shown in black, R² = .93 following first-person perspective and R² = .93 following third-person perspective trials

Analysis of intercepts did not reveal a significant main effect of preceding trial [F(1, 20) < 1]. Attention had a marginally significant effect on intercepts [F(1, 20) = 4.0, p = .058], due to faster responses to non-rotated stimuli in the single-attention condition. The two-way interaction of preceding trial and attention was significant [F(1, 20) = 4.8, p < .05]. RTs were slower in the joint condition when the preceding trial showed the initial hand picture from a third-person perspective [t(20) = 3.0, p < .01]. RTs were unaffected when the preceding trial showed the initial hand picture from a first-person perspective [t(20) < 1]. Intercepts and slopes are summarized in Table 1.

Exclusion of 180° data

RTs increased significantly with increasing angle of rotation [t(20) = 9.9, p < .001]. The factors preceding trial [F(1, 20) = 1.3, p = .26] and attention condition [F(1, 20) = 2.0, p = .18] were not significant. Slopes were flattened in the joint-attention condition following 3rd PP trials [t(20) = 2.3, p < .05], but not following 1st PP trials [t(20) < 1], as reflected in a two-way interaction of attention and preceding trial [F(1, 20) = 4.5, p < .05].

Attention condition [F(1, 20) = 1.3, p = .27] and preceding trial [F(1, 20) = 1.5, p = .23] did not affect intercepts. The two-way interaction of preceding trial and attention was not significant [F(1, 20) = 2.7, p = .12], as RTs in the joint condition were only marginally faster when the preceding trial showed the initial hand picture from a third-person perspective [t(20) = 2.1, p = .058] as compared to no effect when the preceding trial showed the initial hand picture from a first-person perspective [t(20) < 1].

Errors

Error rates increased with increasing rotation [t(20) = 6.1, p < .001]. No effect of attention or preceding trial on slopes was present in error rates [ts(20) < 1]. Intercepts were not significantly affected by preceding trial [F(1, 20) = 1.5, p = .25] or by attention [F(1, 20) < 1], nor was there a significant interaction [F(1, 20) < 1].

Discussion

In this experiment, we manipulated the degree to which the directly preceding trial primed an allocentric rather than an egocentric frame of reference. The initial hand picture of the preceding trial could either be seen from the first-person perspective of the participant or from the first-person perspective of the task partner. As in the previous experiments, we found that joint attention led to a flattening of the rotation–performance curve. However, this effect was only present following trials that primed an allocentric reference frame. When an allocentric perspective was primed in the previous trial, joint attention in the subsequent trial triggered a switch from an egocentric to an allocentric reference frame. These findings corroborate our interpretation of the joint-attention effect in terms of a change in reference frame. Importantly, priming an allocentric reference frame alone cannot explain the observed effect, as the flattening of the rotation–performance curve occurred specifically on joint-attention trials.

Contrary to experiments 1 and 2, the effect of attention on the slope of the rotation curve did not reach significance in this experiment when trials in which the initial hand was depicted from a first-person perspective and trials in which it was depicted from a third-person perspective were combined. Re-analyses of experiments 1 and 2 confirmed that the joint-attention effect was due to responses following trials that depicted the initial hand from a third-person perspective, while the effect was absent when the preceding trial showed the initial hand from the participant’s own perspective. Thus, the only difference between the results of experiment 3 and experiments 1 and 2 consists in the size of the overall effect of attention. This is likely due to the fact that experiments 1 and 2 contained more trials overall where initial hands were shown from the other’s perspective. In experiments 1 and 2, participants saw the initial hand equally often from a first-person perspective and from a third-person perspective (50% each). In experiment 3, the initial hand picture was displayed from a first-person perspective on 75% of the trials and from a third-person perspective only on 25% of the trials. However, the absence or presence of the overall effect of joint attention does not affect the interpretation of the results of experiment 3.

General discussion

The present experiments aimed at bringing together two aspects of joint attention that were addressed separately in previous research. Whereas research on gaze following has mainly focused on bottom-up, perceptual influences of joint attention, approaches to shared attention and shared intentionality have focused on the awareness of what is shared. The question we addressed here reaches into both domains and concerns the impact of sharing attention from different perspectives on object processing. Based on earlier findings (Tversky and Hard 2009), it can be hypothesized that joint attention triggers a switch from an egocentric to an allocentric reference frame. To recall, in an egocentric reference frame, objects are represented relative to the perceiver, whereas in an allocentric reference frame, objects are represented relative to the environment (Klatzky 1998; Soechting and Flanders 1992; Volcic and Kappers 2008).

In three experiments where participants judged the handedness of rotated hand pictures while engaging in joint or single attention, we found flatter slopes of the rotation–performance curves when both participants attended to the same stimuli. This indicates that during joint attention participants suspended their egocentric frame of reference and adopted an allocentric frame of reference. Experiment 2 investigated whether social context modulates this joint-attention effect. Participants in this experiment took the other’s perspective into account in both cooperative and competitive settings. Finally, in experiment 3 the effect of joint attention on mental transformation was only observed following trials that primed an allocentric perspective. Taken together, the results provide evidence that sharing attention affects the processing of jointly attended objects. More precisely, the present results point towards a switch from an egocentric to an allocentric reference frame when people attend to objects jointly from different perspectives.

This switch cannot be explained by the mere presence of another person (single-attention condition), suggesting that joint attention, by highlighting the perspective of the co-actor, plays a crucial role in triggering an allocentric perspective. It seems that participants computed the observed actor’s epistemic relation towards the object (Barresi and Moore 1996) only when the other was actually looking at the object. This implies that only when the other’s relation to the object and the difference to participants’ own relation were highlighted through joint attention did they give up their egocentric reference frame to adopt an allocentric reference frame.

We suggest that taking an allocentric perspective implies a change in the processes that people use to mentally manipulate objects. In single attention, where an egocentric perspective was held, the mental transformation task was likely solved through motor imagery, whereby participants imagined moving their own hand to match the position of the rotated hand (de Lange et al. 2006; Kosslyn et al. 2001; Parsons 1987a, b, 1994; Parsons et al. 1995; Wexler et al. 1998). In contrast, in joint attention, the allocentric reference frame enabled participants to map a rotated hand onto the other’s body axis (Amorim et al. 2006; Lakoff and Johnson 1999; Tversky 2005). The flattened slope in the joint-attention condition suggests that this process was beneficial at larger rotation angles; when the rotated hand was in line with the other’s body, it could easily be mapped onto the other’s body axis. Therefore, the more stimuli were rotated, the faster participants were in joint-attention as compared to single-attention trials.

Adopting an allocentric reference frame when jointly attending from opposite perspectives thus facilitated object processing, especially when objects were turned towards the other. In contrast, slower responses to small angles of rotation in the joint-attention condition indicate that mapping the hand picture onto a body axis interfered with the default process of motor imagery occurring when the hand looked as if it belonged to one’s own body. However, unlike the benefit for larger angles of rotation, the slow-down at smaller angles was not present in all experiments. Especially when the instruction stressed speed (experiment 2), the cost was reduced for trials where the objects were not rotated towards the other. Hence, the costs of an allocentric reference frame seem less reliable than the benefits.

Taking an allocentric reference frame provides co-actors with a processing benefit for objects that are depicted from the other’s perspective (thus are more easily processed from the other’s perspective). This processing benefit may support the efficiency and fluency of joint actions from different spatial orientations (Sebanz et al. 2006). In joint action contexts, co-actors often hold different views. Adopting an allocentric reference frame may help to integrate the consequences of one’s own and others’ actions, to predict each other’s impending actions (Sebanz and Knoblich 2009), and to work towards joint goals (Vesper et al. 2010). Adopting an allocentric reference frame (in which objects can more easily be interpreted in relation to a co-actor’s body) may also facilitate imitation (Wohlschlaeger et al. 2003) and other forms of joint learning (Csibra and Gergely 2009).

It seems that participants were not explicitly aware of any change in behaviour or performance, suggesting that switching from an ego- to an allocentric reference frame may be rather effortless. Although this may seem surprising, previous studies have reported similar findings, and it has been argued that different perspectives can be rapidly and effortlessly computed (Samson et al. in press). In fact, taking an allocentric perspective in some situations may happen more naturally and spontaneously than taking an egocentric view (Tversky and Hard 2009).

It could be argued that both in the single and in the joint-attention condition, the hand pictures were mentally transformed by using purely visual strategies, merely comparing the visual shapes of stimuli (cf. Corradi-Dell’Acqua and Tessari 2010). This would imply that participants perceptually compared the shapes of the two hands rather than engaging in motor imagery or mapping the hands onto a body axis. However, a direct and continuous comparison of the shapes of the two hands was not possible in the present experiments because they were never displayed simultaneously. Therefore, it is rather unlikely that participants engaged in purely visual rotation (Grabherr et al. 2007). In agreement with this, earlier studies suggest that participants use motor imagery as a default strategy when mentally transforming body parts and only use visual strategies when instructed to do so (Tomasino and Rumiati 2004).

A further question concerns the role of action. In our task, we operationalized joint attention by having a second person attending to stimuli in order to act. This captures natural settings in which joint attention takes place, because we normally attend to objects with the intention to act on them (Humphreys and Riddoch 2004). Both participants attended to the stimuli with the same intention, which ensured that participants in joint-attention trials would direct their gaze to the screen in order to perceive the stimuli. A limitation of the present study is therefore that the role of action and the role of attention cannot be disentangled. It remains to be tested whether the joint-attention effect on mental rotation generalizes to settings where the other merely attends without acting. Note, however, that in order to minimize any potential effects of the other’s action, we made sure that the other’s actions could not be seen. Furthermore, both participants had the same stimulus–response mapping; thus, interference could not be caused by incompatibility of the two responses (Sebanz et al. 2005).

There are several open questions that will be interesting for future research. We have suggested that joint attention from different spatial perspectives may lead people to take the other’s perspective into account and to process stimuli within an allocentric reference frame. Consequently, it would be insightful to test whether brain structures that are related to processing body parts from an allocentric versus egocentric perspective (Saxe et al. 2006) are selectively activated by joint attention from different spatial perspectives. Additionally, besides the processing of body parts, joint attention from opposite perspectives might also affect processing of other kinds of objects. Is a co-attendant’s frame of reference, for instance, also beneficial when processing letters or words seen from a third-person perspective? Given that motor imagery within an egocentric reference frame seems to be involved at least to some extent in transformations of abstract objects (Wexler et al. 1998), it is conceivable that priming an allocentric reference frame through joint attention modulates transformations of objects other than body parts.

To summarize, we suggest that jointly attending to the same stimuli from different visual perspectives leads people to switch from a default egocentric reference frame to an allocentric reference frame. As other people’s perspective and body orientation can more easily be taken into account within an allocentric reference frame (Amorim et al. 2006), the switch to an allocentric reference frame induced by joint attention may provide a mechanism for creating perceptual common ground in joint action and communication (Clark and Krych 2004; Richardson and Dale 2005; Richardson et al. 2007).