In an ensemble, musicians collaborate with each other to achieve better performance. This is accomplished through the use of verbal, auditory, or visual cues. Verbal interaction among performers before a live performance plays an important role in an ensemble (e.g., Davidson & Good, 2002; Ginsborg et al., 2006; Ginsborg & King, 2012; Murnighan & Conlon, 1991). Nonetheless, once a live performance begins, verbal communication is restricted. Following performance etiquette, performers then generally rely on auditory and visual cues to communicate with coperformers. With respect to communication through auditory cues, several studies have focused on synchronization. Numerous studies have examined tapping (reviewed by Repp, 2005; Repp & Su, 2013), and several others have investigated synchronization in music ensembles. Rasch (1979) showed that the timing lag in a small instrumental ensemble is generally 30–50 ms, with the performer leading the ensemble starting to play earlier than the others. Shaffer (1984) and Keller and Appel (2010) observed a similar level of asynchrony in piano duo performance. Loehr, Large, and Palmer (2011) examined musicians’ coordination of rhythmic musical sequences and suggested that an oscillator-based account was favored. With respect to timing delay conditions, Bartlette, Headlam, Bocko, and Velikic (2006) showed that, when latencies were above 100 ms, duet performers’ ratings of musicality decreased and asynchrony increased. Familiarity with the piece or the performance style also influenced coordination (Keller et al., 2007). In addition, prior studies have suggested that anticipation or auditory imagery of performance is essential in an ensemble (e.g., Keller & Appel, 2010; Keller et al., 2007; Pecenka & Keller, 2011; reviewed in Hubbard, 2010).

Various types of visual cues also arise during ensemble performance. Among visual interactions, prior studies have predominantly examined body movement. Woodwind duos interact closely with each other using movement cues such as body sway and nodding in order to achieve a consistent expressive goal (Davidson, 2012). Ensemble performers’ sound is synchronized with the body movement of a conductor (Luck & Toiviainen, 2006). Maduell and Wing (2007) analyzed flamenco performance and discussed bodily cues for performers’ coordination. In terms of interperformer coordination during ensemble performance, recent studies have shown the importance of performers’ movements, including action simulation (reviewed by Keller, 2012; Palmer, 2013). Loehr and Palmer (2011) suggested the importance of action co-representation, in which people activate both their own and their partners’ mental representations during joint action in piano performance. Pianists’ body movements become more synchronized as auditory feedback is reduced (Goebl & Palmer, 2009). Keller and Appel (2010) found evidence for interdependencies between sound synchrony and body sway. Keller (2008) reviewed the cognitive processes of musical ensemble performance and highlighted anticipatory auditory imagery, prioritized integrative attention, and adaptive timing with respect to joint action in music performance.

However, apart from body movements, visual channels in an ensemble have barely been studied, although many kinds of channels (e.g., facial expression, gazing, posture, and proxemics) are used in everyday interpersonal communication (Argyle, 1988). In particular, although a large number of performers and commentators have mentioned its importance, only a few studies have focused on gazing behavior or eye contact. Indeed, performers often utilize eye contact in popular music bands (Kurosawa & Davidson, 2005). By analyzing how long the performers looked at a videotaped conductor, Fredrickson (1994) showed that it was 28% of the performance duration and that visual cues of the conductor aided better performance. Davidson (2005) counted the nonverbal behaviors of performers in a popular music band and reported that they frequently made eye contact. Other studies on duo performance reported that gazing behaviors of north Indian instrumental duo musicians and other types of duo musicians occurred with fairly consistent durations of 1–4 s during improvised music performance (Moran, 2010). The proportion of eye contact in piano duo performance increased with each performance during “important” parts and was strongly affected by communication of intense musical moments and the relaxed familiarity between performers (Williamon & Davidson, 2002). The direction of performers’ gaze might depend on the musical structure and social role in the ensemble (Kawase, 2009a, 2009b). These studies demonstrated that systematic gazing behavior occurred frequently and gazing seemed to contribute to the achievement of good performance. However, little substantial data has clarified whether the presence and timing of gazing actually facilitates ensemble performance or is merely an ancillary event that occurs while collecting visual information (e.g., body movements) about a coperformer. Therefore, a quantitative analysis may be more fruitful.

From the perspective of everyday interpersonal communication, the importance of gazing behavior has been well established in the following two milestone studies. Kendon (1967) pointed out that gazing has three roles: emotional or attitudinal expression, information collection, and smooth coordination of conversation. Baron-Cohen (1995) analyzed the role of gazing from a developmental psychological perspective using mechanisms such as an eye-direction detector, an intentionality detector, and a shared-attention mechanism. Moreover, gazing is also important for coordination. Shockley, Richardson, and Dale (2009) suggested that gaze coordination supports common ground knowledge and visual information and is related to mutual understanding in conversation. During conversation, the speaker coordinates gaze and speech on a micro level in order to confirm responses from listeners (Bavelas et al., 2002). In a study on synchrony, the movements of two participants swinging handheld pendulums synchronized unintentionally during visual interaction, whereas the movements of the pair did not correlate during verbal interaction (Richardson et al., 2005). These findings suggest that gazing behavior is needed during ensemble performance, in which coordination, emotional communication, or attention is necessary.

To establish whether performers regard gazing behavior as crucial and what kinds of roles gazing behavior plays, a preliminary survey was administered (Appendix A). In this open-ended survey, the gaze channel (i.e., a means of communication via gaze, such as eye contact or a glance) was the most frequently mentioned concept, referred to by 66.3% of the 86 surveyed amateur ensemble performers. Responses about gazing behavior were classified into the following categories: synchronization with coperformers (e.g., “I coordinate by looking at my coperformers’ eyes”) and social relationships, such as intimacy with coperformers (e.g., to create a sense of fellowship).

Taking into account the results of the preliminary survey, I examined the importance of gazing behavior in terms of coordination with coperformers in the present study. As was mentioned above, recent studies focusing on body movement have demonstrated the effects of visual information on synchronization during music performance (e.g., Goebl & Palmer, 2009). Furthermore, during conversation, gazing relates to dyadic coordination (Shockley et al., 2009). However, few studies have demonstrated the role of gazing in synchronization or provided fundamental data on gazing behavior during performance, which was reported as important by the performers in the preliminary survey.

Performers’ gazing behavior was measured during piano duos since this is the smallest ensemble size, in order to examine its roles during ensemble performance from the aspect of synchronization. Specifically, focus was on the moments at which performers might have difficulty with coordination of timing, because prior studies had suggested that performers frequently look at their coperformers at the major boundary points of a piece and barely look at them during pieces with only small tempo changes (e.g., Keller & Appel, 2010; Williamon & Davidson, 2002). The results of those studies implied that gazing was necessary for effective ensemble performance only at specific points.

Experiment 1

The purpose of Experiment 1 was to explore whether gazing cues influence coordination. The moments when a performer looked toward a coperformer during piano duo performance were measured by altering visual-cue conditions.

Method

Participants

Six pianists (all females; age range = 20–31 years, M age = 25.8 years, mean performance experience = 21.8 years) participated; all were professionals or had been recommended by a lecturer of a music school. Four of them were award-winning performers or had experience in teaching. To prevent performers from sharing particular rules or performance cues with coperformers, the partners in the piano duos did not play with each other regularly. In addition, since prior studies focusing on body movements showed that social relationships (e.g., leader–follower) altered performers’ behavior (Goebl & Palmer, 2009; Keller & Appel, 2010), pairs were selected who had an equal relationship in order to eliminate such influence.

Material

In this experiment, the players performed “Prologue de Coq’licot,” the first of the series of four tunes Quatre Tableaux Féeriques, composed by Yumiko Kano (Kano, 1994; see Appendix B). The piece was chosen because it incorporates two parts, primo and second, and nine changes of tempo, during which the two performers need to coordinate timing and begin to play simultaneously after a long pause (approximately 1.5 to 3 s). The piece provides information about tempo via musical terms (e.g., from lento to allegretto; see Appendix B). It thus contains nine moments of tempo change. In addition, the piece had a duration suitable for this experiment (approximately 3 min) and was easy for both performers to play. An interview after the performance also confirmed that the participants played their own parts easily.

Procedure

Each performer was positioned in a separate soundproof room (see Fig. 1). They received a score on the day of the experiment and selected one of two parts, primo or second. They could neither see each other nor listen to the coperformer’s performance during practice. After adequate practice, the participants played the piece three times on an electric piano (P-155, Yamaha) under each of four visual-contact conditions, presented in the following order: invisible, body visible, head (face) visible, and face-to-face (in which the participant could see the coperformer entirely; Fig. 2), with the exception of one pair who played under the body-visible condition prior to the head-visible condition. The participant could hear only her own and her coperformer’s piano sounds and made visual contact via a glass window during the coperformer-visible conditions. A screen between the performers set the visual limits under each condition. The condition order was determined in light of studies on the high reproducibility of skilled performers’ performance. Skilled performers can reproduce their performance accurately, especially under normal conditions (e.g., Highben & Palmer, 2004; Seashore, 1938; Shaffer, 1984). To prevent performances memorized under conditions with adequate audiovisual information from carrying over to later conditions, the invisible condition occurred first in the sequence, and the normal (i.e., face-to-face) condition last. The participants began to play without guidance, such as a metronome, after the experimenter stated “Please start to play” in each instance. Although the participants had to start to play spontaneously under the invisible condition, they could coordinate with one another within one or two bars of the piece. The audio from each performance was recorded on a multi-track recorder (SX-1, TEAC).

Fig. 1. Experimental settings

Fig. 2. Visual-contact conditions

Data analysis

Performance was recorded on four video cameras, and each performer’s gazing behavior was analyzed frame by frame using the Behaviour Coding System software (IFS-18C, DKH). The temporal resolution of the video was 29.97 frames per second (NTSC standard). To obtain accurate data without disturbing the players’ performances, the observational method was adopted (e.g., Argyle & Dean, 1965). All trials were videotaped by a camera placed behind and slightly to the side of the coperformer (to collect data under the head-visible and face-to-face conditions). Another camera was placed in front and slightly to the side of the coperformer (to collect data under the body-visible and face-to-face conditions). Both cameras were placed such that each performer’s gaze was directed toward the camera when she looked toward the coperformer (Fig. 1). To confirm this, before the experiment a picture of each performer was taken after she had been asked to look at her coperformer. These procedures were also applied to the second member of the duo (i.e., the coperformer). As is indicated by Fig. 1, the performers sat at a slightly oblique angle. When looking at the coperformer, they averted their eyes from the score and turned their head and eyes toward her. This behavior was quite clear (e.g., Doherty & Anderson, 2001). Therefore, it was easy to determine whether the performers were gazing at their coperformers.
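Because gazing was coded frame by frame, every gaze time reported below is quantized to the video frame period. The following minimal sketch shows how a frame index could be converted to a time relative to a sound onset. It assumes that the stated 29.97 frames per second corresponds to the exact NTSC ratio 30000/1001; the function names are illustrative and are not part of the coding software.

```python
from fractions import Fraction

# Assumption: "29.97 fps" is the rounded form of the exact NTSC ratio 30000/1001.
NTSC_FPS = Fraction(30000, 1001)


def frame_to_seconds(frame_index: int) -> float:
    """Start time of a video frame in seconds, counted from frame 0."""
    return float(Fraction(frame_index) / NTSC_FPS)


def gaze_time_relative_to_onset(frame_index: int, onset_seconds: float) -> float:
    """Time of a coded frame relative to a sound onset.

    Negative values mean the frame precedes the onset.
    """
    return frame_to_seconds(frame_index) - onset_seconds


# Duration of one frame in milliseconds (roughly 33.4 ms), which bounds
# the temporal resolution of the gaze coding.
frame_duration_ms = 1000 * float(1 / NTSC_FPS)
```

Under this assumption, gaze times can be resolved only to about one thirtieth of a second, which is of the same order as the timing lags between performers.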

Mutual gazing was defined as both performers looking toward their partners. Mutual gazing technically differs from eye contact, in which both performers look into their partners’ eyes. Eye contact could occur during mutual gazing under the head-visible and face-to-face conditions. However, eye contact could not occur under the body-visible condition, although mutual gazing toward partners’ bodies did occur.

The lag of the tone between performers was measured using Sound Forge (Sony Pictures Digital Inc.) with reference to the waveform and sound recorded in a separate track. To avoid mixing of acoustical information, the performances of the two participants were recorded on separate tracks. At the times when coordination was required (i.e., when the tempo changed and the pianists had to synchronize with each other; henceforth, moments of tempo change), no other sound interfered with the measurement or overlapped with the onset. Because the change in waveform could easily be observed, sound onset was defined as the moment when the waveform began to change and the sound started, in order to examine the timing lag (i.e., asynchrony) between the two performers (Fig. 3 shows an example). To confirm the accuracy and validity of this method, the experimenter first randomly selected 54 moments of tempo change. Then, the experimenter and a collaborator independently measured these moments, and their measurements were compared. The average absolute difference between the two sets of measurements was 0.63 ms (SD = 0.62 ms). This discrepancy is much smaller than the asynchrony between performers shown in the later results (approximately 50 ms).
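The two quantities described above, the asynchrony between performers and the agreement between the two raters, can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, onsets are assumed to be given in seconds, and the sample standard deviation is assumed because the source does not state which form was used.

```python
from statistics import mean, stdev


def asynchrony_ms(onset_a_s: float, onset_b_s: float) -> float:
    """Absolute timing lag between two performers' sound onsets, in ms."""
    return abs(onset_a_s - onset_b_s) * 1000.0


def rater_agreement(lags_rater1_ms, lags_rater2_ms):
    """Mean and SD of the absolute differences between two raters' lags.

    Assumption: sample SD (n - 1 denominator); the source does not specify.
    """
    if len(lags_rater1_ms) != len(lags_rater2_ms):
        raise ValueError("Both raters must measure the same moments.")
    abs_diffs = [abs(a - b) for a, b in zip(lags_rater1_ms, lags_rater2_ms)]
    return mean(abs_diffs), stdev(abs_diffs)
```

Applied to the 54 moments measured by both raters, a calculation of this kind would yield the mean absolute difference and SD reported above.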

Fig. 3. Example of waveforms at a moment of tempo change

Point-biserial correlation coefficients were calculated between timing lag and gaze as an index of the relationship between gazing pattern and timing lag. The point-biserial correlation coefficients were calculated as Pearson’s product-moment correlation coefficients between timing lag and binarized gazing behavior. The procedure was as follows: First, the gazing behavior of each pair was measured, with gazing behavior being classified at any specific time point into three categories: mutual gazing, solitary gazing, and absence of gazing. Each type of gazing behavior at each measured time point was assigned a binary value (i.e., it occurred [1] or did not [0]). For example, when mutual gazing occurred at a time point (i.e., solitary gazing and absence of gazing did not occur), that time point was quantified as follows: mutual gazing [1], solitary gazing [0], and absence of gazing [0]. Each condition included 27 moments of tempo change (3 trials × 9 moments of tempo change). The variables were the occurrence of each type of gazing behavior (0 or 1) and timing lag. The set of 27 timing lags was the same at every time point; however, since gazing behaviors varied over time, the set of 27 binarized gazing values differed from one time point to the next. Finally, average correlation coefficients among the three pairs were obtained by first transforming each correlation to a z score using Fisher’s z-transformation, and then transforming the average z score back to a correlation. This procedure was also applied to Experiment 2. The coefficient was used only as an index, not as a statistical test, because the number of coefficients obtained from the three pairs varied from time point to time point. Namely, no correlation coefficient could be obtained at time points at which all 27 gazing behaviors followed the same pattern (e.g., the absence of gazing). Accordingly, the threshold of significance also changed from time point to time point.
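The steps above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the original analysis script: the function names and data layout are hypothetical, the point-biserial coefficient is computed as a Pearson correlation between a 0/1 variable and the timing lags (as described in the text), and a coefficient is treated as unavailable when the binarized gaze values are all identical.

```python
import math
from typing import List, Optional, Sequence


def classify_gaze(primo_looks: bool, secondo_looks: bool) -> str:
    """Classify one time point as mutual, solitary, or absent gazing."""
    if primo_looks and secondo_looks:
        return "mutual"
    if primo_looks or secondo_looks:
        return "solitary"
    return "absent"


def binarize(categories: Sequence[str], target: str) -> List[int]:
    """1 where the target category occurred, 0 otherwise."""
    return [int(c == target) for c in categories]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation; with binary x this is the point-biserial r.

    Returns None when either variable is constant (e.g., gazing never
    occurred at this time point), so no coefficient can be obtained.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def fisher_average(rs: Sequence[float]) -> float:
    """Average correlations via Fisher's z (atanh), then back-transform.

    Note: math.atanh raises ValueError for r = 1 or r = -1.
    """
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))
```

In this sketch, each pair would contribute one coefficient per time point and gaze category (from its 27 timing lags and 27 binary values), and `fisher_average` would be applied only to the pairs for which a coefficient is available.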

Results

The frequency of the performer’s gaze toward the coperformer under each condition was highest around tempo changes. First, the duration of the entire piece was calculated, except for the opening and ending (i.e., from the first to the ninth moment of tempo change). In this period, 93.7% of all gazing toward the coperformer occurred in the interval between 4 s before and 2 s after the sound onset of either performer at points at which the performers had to play simultaneously (i.e., the moments of tempo change), even though the total duration of the nine moments of tempo change was only 25.0% of the duration of the entire piece excluding the opening and ending. In other words, gazing behavior occurred very frequently in these 6-s-long intervals. Figure 4 presents the gazing behavior under each condition during moments of tempo change. Zero seconds on the horizontal axis represents the onset of the next phrase, played by either participant, at which the performers resumed playing after a long pause. Negative values indicate that the performer looked toward the coperformer before the onset of performance. Figure 4a depicts the average data from 162 samples (6 participants × 3 trials × 9 moments of tempo change). The vertical axis represents the frequency of gazing behavior at all moments of tempo change. The results show that each performer looked toward the coperformer just before the moments of tempo change. The times at which a performer looked toward the coperformer most frequently were −0.44 and −0.42 s under the head-visible condition (rate = .85; i.e., performers looked toward coperformers 137 times out of 162 measuring points); −0.22 and −0.23 s under the body-visible condition (rate = .83); and −0.44 s under the face-to-face condition (rate = .86). Mutual gazing (3 pairs × 3 trials × 9 moments of tempo change) also occurred just before the coordination points, and the most frequent mutual gazing timings were −0.44, −0.42, −0.41, −0.37, and −0.36 s (rate = .70) under the head-visible condition; −0.22 and −0.23 s under the body-visible condition (rate = .69); and −0.44, −0.45, and −0.46 s under the face-to-face condition (rate = .73) (Fig. 4b).
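The rates reported above are proportions of samples in which gazing was coded at a given time relative to the onset. A minimal sketch of this calculation follows; the function names and the dictionary layout are illustrative assumptions rather than the original analysis code.

```python
from typing import Dict, List, Sequence, Tuple


def gaze_rate(samples: Sequence[bool]) -> float:
    """Proportion of samples in which gazing occurred at one time offset."""
    return sum(samples) / len(samples)


def peak_offsets(rates_by_offset: Dict[float, float]) -> Tuple[List[float], float]:
    """Time offsets (s, relative to onset) sharing the highest gaze rate."""
    peak = max(rates_by_offset.values())
    offsets = sorted(t for t, r in rates_by_offset.items() if r == peak)
    return offsets, peak
```

For example, 137 gazes out of 162 measuring points correspond to `gaze_rate([True] * 137 + [False] * 25)`, which rounds to .85, matching the rate reported for the head-visible condition.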

Fig. 4. Gazing behavior at coordination moments. Zero seconds on the horizontal axis represents onset of the tone. The vertical axis represents the frequency of gazing behavior at all moments of tempo change. (a) All performers’ gazing behavior toward the coperformer. (b) Mutual gazing.

Average timing lags were calculated for all nine moments of tempo change on each trial of each pair. Figure 5 indicates the average absolute timing lags between the performers at the nine moments of tempo change under each visual-contact condition. A two-way (3 trials × 4 visual-contact conditions) within-subjects ANOVA on timing lag yielded a significant main effect of visual-contact condition [F(3, 22) = 7.520, p = .001, ηp² = .506], but no main effect of trials or interaction. Multiple comparisons revealed significant differences between the invisible and other conditions (Bonferroni’s method, p < .05).
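The reported effect size can be cross-checked from the F statistic and its degrees of freedom. The following minimal sketch assumes the standard identity, partial eta squared = (F × df_effect) / (F × df_effect + df_error); the function name is illustrative and is not part of the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its dfs."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)
```

With F(3, 22) = 7.520, this identity gives a value of approximately .506, in agreement with the effect size reported for the visual-contact condition.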

Fig. 5. Average absolute timing lags under each visual-contact condition: partner invisible, body visible, head visible, and face to face. Error bars represent standard errors.

Next, the point-biserial correlation coefficient was calculated between timing lag and the binarized value of gazing behavior at each moment of tempo change (Fig. 6). A negative correlation coefficient at a certain time point thus indicates that when the gazing behavior occurred at that time point, the timing lag between performers was smaller. In Fig. 6, the horizontal axis represents timing (onset of sound is 0), and the vertical axis represents the correlation coefficient. Even though the correlation coefficients at time points at which gazing behavior rarely occurred were either extremely high or extremely low, the figure depicts these values as raw data. Time points at which all gazing values were either 0 or 1 are not plotted.

Fig. 6. The point-biserial correlation coefficients between the timing lag and gaze at each time instant. The horizontal axis represents the timing of performance sound (zero means the starting time of performance sound). The vertical axis represents the point-biserial correlation coefficient between timing lag and the gazing behavior at each point of tempo change. Each panel depicts the results of mutual gazing, solitary gazing, and the absence of gazing, respectively, under each condition.

The results showed that the correlation coefficient of mutual gazing was low just before (approximately 0.5 s) the moment of tempo change under all conditions. At the same moments, the correlation coefficients of both solitary and absence of gazing behaviors were positive or had near-zero values.

Discussion

The participants frequently looked toward coperformers around the moments of tempo change under all visual conditions. This seems to be consistent with a prior study in which pianists looked at each other at important parts (Williamon & Davidson, 2002). However, it appears to run counter to several earlier studies on visual cues in duos. Keller and Appel (2010) observed that piano duo performers did not need to look at each other under face-to-face conditions while playing a piece with only small tempo changes. Davidson (2012) argued that glances were less frequent than expected when a flute-clarinet duo played a short piece, despite finding that glances assisted musical coordination. In that study, glances mainly occurred at the major boundaries (e.g., the start of the piece), which agrees with previous work (Williamon & Davidson, 2002). This indicates that visual cues may be adopted at moments of tempo change with marked temporal changes, when the onset of a coperformer’s next tone is difficult to predict, as was assumed in the introduction. On the other hand, visual cues may not be necessary during the less variable parts of the piece.

The results also suggest that performers might utilize movement cues. Timing lag differed significantly only between the partner-invisible condition and the other conditions, which means that the timing lag hardly changed according to which body parts were visible. The timing lag under the conditions with visual cues ranged from 47 to 62 ms. Shaffer (1984) observed timing lags of several tens of milliseconds during piano duo performance. Horiuchi, Mitsui, Imiya, and Ichikawa (1996) suggested that performers do not recognize a timing lag of approximately 100 ms in piano duo performance. Rasch (1979) showed 30- to 50-ms gaps during small instrumental ensemble performance. According to these findings, the timing lag under the conditions with visual cues in the present study was sufficiently small. Hence, visual cues derived from specific parts of a coperformer’s body may not contribute to coordination. The participants could employ the following channels under each condition: only performance sound under the partner-invisible condition; sound and body (movement) under the body-visible condition; and sound, head (movement), facial expression, and gazing under the head-visible condition. All of the above elements were available under the face-to-face condition. In other words, “movement” is a common component of the channels under all conditions with visual information. Overall, performers can coordinate during ensemble performance if “movement” channels are available, which agrees with the finding that participants employed movement cues in ensemble performance (e.g., Davidson, 2012). Movement plays fundamental roles in synchronization. Detecting information about attention and movement is important for synchronization (Richardson et al., 2007). In a previous study, pairs of participants constantly synchronized pendulum movement while looking at vibrating stimuli (Schmidt et al., 2007). Considering these studies, it may not be surprising that performers utilize movement cues for coordination.

Then, are gazing cues not necessary for synchronization? On the basis of the present results, although mutual gazing (i.e., partners looking toward each other) just before the coordination moment facilitated synchronization, eye contact (i.e., looking into partners’ eyes) might not be of much importance. Under all visibility conditions, the correlation coefficients between gazing behavior and timing lag were considerably negative when the participants reciprocally looked toward each other just before the onset of the tone. This result suggests a coordination function of mutual gazing, not of eye contact. First, eye contact never occurred under the body-visible condition, because the performers could not see their partners’ eyes. Second, the correlation coefficients of solitary gazing and absence of gazing at the same points were positive or near zero. Hence, only mutual gazing just before the onset of sound reduced the timing lag, whereas neither solitary gazing nor absence of gazing at specific points (e.g., just before coordination) facilitated synchronization between performers. In addition, mutual gazing at the moment of the onset of sound did not serve to reduce the timing lag. This tendency under each visual condition supports the above hypothesis that participants utilize movement cues to predict the onset of the tone. Consequently, participants might predict the onset of the tone by using movement cues, whereas gazing toward a coperformer was unnecessary at the very moment of the onset of sound.

The results also suggest that performers need to adapt mutually to their partners for better synchronization. Konvalinka, Vuust, Roepstorff, and Frith (2010) investigated joint finger tapping between paired participants and found a mutual attempt to synchronize with one another. They suggested that successful coordination was based not only on the accuracy with which the partner’s future action was predicted, but also on mutual adaptation to that action. Their experiment was carried out under a partner-invisible condition. However, such mutual adaptation may also occur under conditions with visual cues. If one performer merely sent movement cues and the coperformer merely received and utilized them, the present results would not have been obtained. Thus, performers both send and receive movement cues during performance, although exceptions do occur in actual performances: A piano accompanist with a vocalist often plays under the solitary gazing condition, an orchestra conductor cannot look at all other members, and some performers may not attempt to see coperformers at all.

A discrepancy between actual behavior and musicians’ self-reports is thus apparent (see the preliminary survey in Appendix A). The present results reveal that a performer might utilize movement cues, but eye contact itself might not be of much importance. This slightly contradicts performers’ comments in the preliminary survey that eye contact facilitates coordination. Experiment 2 was therefore designed to explore whether gazing is important for synchronization between performers.

Experiment 2

In this experiment, the question was whether gazing itself affects coordination. To analyze the roles of gazing behavior, gazing behavior and the timing lag during moments of tempo change were measured by restricting head movement—that is, by excluding movement cues.

Method

Participants

Twelve proficient pianists (all females; age range = 21–41 years, M age = 26.7 years, mean performance experience = 23.2 years) played in piano duos. Nine of the participants were award-winning performers or had experience in teaching. As in Experiment 1, piano duos were formed in which the partners did not play with each other regularly. Also, pairs were selected who had an equal relationship in order to eliminate the influence of social relationships (e.g., leader–follower). Since the piece included fewer coordination points than the one in Experiment 1, more participants were employed here than in Experiment 1.

Material

A piece composed by a professional composer was newly written for the present study on a single page to exclude large motions such as page turning, which would hinder performance under fixed head conditions. The piece incorporated four changes of tempo (e.g., from moderato to allegro), during which the two performers need to coordinate timing during notes with fermata, which shows the end of a phrase or indicates the prolongation of a note or a rest beyond its usual value (Fuller, 2001), and begin to play simultaneously after a long pause (see Appendix C). To examine gazing behavior in terms of synchronization, the analysis focused on these four moments of tempo change.

Procedure

The experimental settings, procedure, and methods used to measure timing lag and gazing were the same as those used in Experiment 1, except that the visual-contact conditions were partner invisible, only movable head visible, and only fixed head visible. The movable-head-visible condition was the same as the head-visible condition in Experiment 1. The fixed-head-visible condition was implemented by using a chinrest made so as not to hinder the performance movement of the body, except for the head. The participants were instructed to place their chins on the chinrest during the performance. In addition, they were also instructed not to move their heads on the chinrest. The participants performed the piece three times under each condition. First, they played under the partner invisible condition, and then they played under the other two conditions, ordered randomly for each pair. Five measurements were eliminated in which either performer began playing the subsequent note (i.e., missed the beginning of the next phrase), and one measurement in which one performer made an error.

In addition to the asynchrony between performances at moments of tempo change marked with a fermata, the durations of fermatas were also measured. In the present study, the duration of a fermata was defined as the interonset interval (i.e., the period from the later onset of the note with a fermata, played by either participant, to the earlier onset of the next phrase played by either participant; see Fig. 7).
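The definition of fermata duration given above can be written out as a short calculation. The sketch below is illustrative only: the function name and the representation of onsets as pairs of times in seconds (one per performer) are assumptions.

```python
from typing import Tuple


def fermata_duration(
    fermata_onsets_s: Tuple[float, float],
    next_phrase_onsets_s: Tuple[float, float],
) -> float:
    """Interval from the later onset of the fermata note (either performer)
    to the earlier onset of the next phrase (either performer), in seconds."""
    return min(next_phrase_onsets_s) - max(fermata_onsets_s)
```

Taking the later of the two fermata onsets and the earlier of the two next-phrase onsets means that the measured interval covers only the time during which both performers are holding the fermata.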

Fig. 7. A frame format of the definition of fermata duration

Results

Figure 8 indicates the gazing behavior under two visibility conditions at four moments of tempo change. First, the duration of the entire piece was calculated, except for the opening and ending (i.e., from the first to the fourth moments of tempo change). In this period, 93.6% of all gazing toward the coperformer occurred in the interval between 4 s before and 2 s after the sound onset of either performer at points at which the performers had to play simultaneously (i.e., the moments of tempo change), even though the total duration of the four moments of tempo change was 70.1% of the duration of the entire piece excluding the opening and ending. Figure 8a represents the average rate of gazing toward the coperformer at four moments in three trials under each condition. The horizontal axis represents time, and 0 s is the onset of the tone. The vertical axis represents the ratio of gaze within 144 samples: 12 (participants) × 3 (trials) × 4 (moments of tempo change), with time. The largest rates were as follows: .85 (t = −0.63 to −0.61 s) under the movable-head condition, and .85 (t = −0.66 to −0.64 s) under the fixed-head condition. Figure 8b depicts the rates of mutual gazing within 72 samples: 6 (pairs) × 3 (trials) × 4 (moments of tempo change). The horizontal axis represents time. The largest rates were as follows: .71 at t = −0.63 to −0.61 and −0.56 s under the movable-head condition, and .72 at t = −0.66 to −0.64, −0.57, and −0.56 s under the fixed-head condition. The rate of gazing toward the coperformer or mutual gazing became the largest just before the onset of the tone.

Fig. 8. Gazing behavior at the points of tempo change. (a) Gazing patterns of all performers. (b) Ratios of mutual gazing
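The gaze rates plotted in Fig. 8 are proportions of samples at each time point. As a rough sketch of how such rates could be computed, assuming (hypothetically) that gaze is coded as 0 or 1 per time bin for each sample, the per-bin rate is the mean across samples, and mutual gazing is the bin-wise conjunction of the two performers' codes. The data and function names below are illustrative and are not taken from the study.

```python
def gaze_rate(samples):
    """Proportion of samples with gaze present in each time bin.

    samples: list of equal-length lists of 0/1 codes, one list per sample
    (e.g., one performer at one moment of tempo change in one trial).
    """
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def mutual_gaze(gaze_a, gaze_b):
    """Code a time bin as 1 only when both performers gaze at each other."""
    return [int(a and b) for a, b in zip(gaze_a, gaze_b)]


# Hypothetical 0/1 gaze codes over four time bins for one pair
performer_a = [0, 1, 1, 0]
performer_b = [1, 1, 0, 0]

rates = gaze_rate([performer_a, performer_b])
mutual = mutual_gaze(performer_a, performer_b)
print(rates, mutual)
```

With the full data set, `gaze_rate` would be applied to 144 individual samples for Fig. 8a and to 72 mutual-gaze samples for Fig. 8b.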

The average timing lags for all four moments of tempo change on each trial were calculated for each pair. Figure 9 shows the timing lags between performers at the four moments of tempo change. A two-way ANOVA on timing lag revealed significant main effects of condition [F(2, 40) = 52.562, p < .001, ηp² = .724] and trial [F(2, 40) = 8.169, p = .001, ηp² = .290], whereas the interaction was not significant. Multiple comparisons indicated significant differences among all conditions (Bonferroni’s method, p < .05). The timing lags of the third trial were smaller than those of the other trials (Bonferroni’s method, p < .05).

Fig. 9. Average absolute timing lags under each visual-contact condition. Error bars represent standard errors
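The partial eta squared values reported with the ANOVA on timing lag can be checked against the F ratios and degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies this identity to the reported statistics; the function name is ours and is used only for illustration.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Reported statistics for timing lag: condition and trial effects, F(2, 40)
eta_condition = partial_eta_squared(52.562, 2, 40)
eta_trial = partial_eta_squared(8.169, 2, 40)

print(f"condition: {eta_condition:.3f}, trial: {eta_trial:.3f}")
```

Both values agree with the reported effect sizes to three decimal places.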

Figure 10 shows the duration of the fermata, that is, the period from the later onset of a tone with a fermata, played by either participant, to the earlier onset of the next phrase played by either participant. A two-way ANOVA on fermata duration showed a significant main effect of condition [F(2, 40) = 48.033, p < .001, ηp² = .706]. Neither the main effect of trial nor the interaction was significant. Multiple comparisons showed significant differences between the invisible condition and each of the visible conditions (Bonferroni’s method, p < .05).

Fig. 10. Durations of the fermata under each visual-contact condition. Error bars represent standard errors

Next, to elucidate the relationship between gazing behavior and synchronization at each moment, behavior was investigated at the moments of tempo change, and point-biserial correlation coefficients were calculated between the timing lag and the occurrence of gazing behavior (Fig. 11). The correlation coefficient for mutual gazing was low approximately 0.5 s before the points of tempo change under both conditions. At the same moments, the correlation coefficients for both solitary gazing and the absence of gazing took positive or near-zero values under both conditions.

Fig. 11. Correlation coefficients between the timing lag and gaze at each time point. The scale is the same as in Fig. 6. Each panel represents the results of mutual gazing, solitary gazing, and the absence of gazing, respectively, under each condition
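A point-biserial correlation relates a dichotomous variable (here, whether a given gazing behavior occurred at a time point) to a continuous one (the timing lag). The following is a minimal sketch of the textbook formula, r = (M1 − M0) / s × sqrt(n1 n0 / n²), where s is the population standard deviation of the continuous variable. The data are hypothetical and the function is illustrative; it is not the authors' analysis code.

```python
import math


def point_biserial(binary, values):
    """Point-biserial correlation between 0/1 codes and continuous values.

    binary: 0/1 codes (e.g., mutual gazing absent/present at a time point).
    values: continuous values (e.g., timing lag in ms) for the same samples.
    """
    n = len(values)
    group1 = [v for b, v in zip(binary, values) if b == 1]
    group0 = [v for b, v in zip(binary, values) if b == 0]
    n1, n0 = len(group1), len(group0)
    mean1 = sum(group1) / n1
    mean0 = sum(group0) / n0
    mean_all = sum(values) / n
    # Population standard deviation of the continuous variable
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    return (mean1 - mean0) / sd * math.sqrt(n1 * n0 / n ** 2)


# Hypothetical data: gaze present (1) or absent (0), with timing lags in ms
gaze = [1, 1, 0, 0]
lag_ms = [10.0, 20.0, 30.0, 40.0]

r = point_biserial(gaze, lag_ms)
print(f"r_pb = {r:.3f}")
```

A negative coefficient, as in this toy example, means that timing lags tended to be smaller when the gazing behavior was present; this is the sense in which a negative value for mutual gazing indicates better synchronization.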

Discussion

The occurrence of mutual gazing just before the moment of tempo change improves the accuracy of synchronization. This conclusion is supported by the following findings. The correlation coefficients between gazing behavior and the timing lag showed a tendency similar to that in Experiment 1. In particular, the correlation coefficients between mutual gazing and the timing lag were negative just before the onset of the tone, regardless of head motion. In contrast, the positive correlation coefficients for solitary gazing suggest that the timing lag increased when only one of the performers looked toward the other. The correlation coefficients for the absence of gazing around the onset of the tone were negative under the fixed-head-visible condition. This suggests that, at the onset of the tone, the timing lag decreased when the performers did not look toward each other. In summary, the most effective behavior for facilitating synchronization was as follows: Immediately prior to the keystroke, the participants looked toward one another; then, at the moment of the keystroke, both of them averted their gazes from their coperformers, particularly under the restricted-movement condition.

The results also showed that mutual gazing correlates with the duration of the last tone or the pause at the moments of tempo change. The performers looked toward their coperformers just before the moments of tempo change under both the movable- and fixed-head visible conditions. Because the performers’ behavior was similar regardless of whether the head was fixed, movement of the head was not likely to affect gazing behavior. The performers looked toward their coperformers after the tone with the fermata (gazing was most frequent approximately 0.6 s before the onset of the tone), although the duration of the fermata was more than 2 s. Thus, gazing was not a cue for the onset of the tone with the fermata, whereas it seems to be associated with the duration of that tone.

Another finding is that gazing alone could to some extent enhance coordination even though movement cues were not available, because the timing lag under the restricted movement conditions was smaller than that under the invisible condition. This suggests that gazing provided some coordination cues, although movement cues are necessary for strict coordination. This issue is taken up again in the General Discussion.

The results also indicated that movement cues are necessary for reducing the timing lag between performers. The timing lags differed significantly among the conditions. The two head-visible conditions differed in the presence or absence of head movement cues. The timing lags under the movable-head visible condition, in which performers could move their heads, were similar to those under the head-visible condition in Experiment 1. This means that the cue of head movement significantly enhanced coordination and that the timing lag did not depend on the piece. According to prior studies, performers might also communicate with one another emotionally about aspects such as estimation of performance under the movable-head condition, because head movements convey the performer’s intent (Dahl & Friberg, 2007; Davidson, 1994). These findings may support the present result that movement cues are likely to be crucial for strict synchronization.

In this experiment, a practice effect may have reduced the timing lags as trials progressed within each condition, because the timing lags of the third trial were smaller than those of the other trials. However, no main effect of trials on either the duration of the fermata or the timing lags was observed in Experiment 1. In addition, the timing lags were similar to those under the same conditions (i.e., the partner-invisible and the head-visible conditions) in Experiment 1. Further investigation will be necessary to explore this discrepancy in the effect of trials.

The question remains whether the fixed-head condition created an unnatural performance situation. Did this fixed condition hinder each performer’s timing? The results suggest that any such disruption affected only micro-level coordination, because there was no significant difference in the duration of the fermata between the movable- and fixed-head conditions. If performers had found it difficult to play because of the fixed head, their entire performance would have been influenced by this restricted condition. However, the conditions did not alter the duration of the fermata. Thus, fixing the head did little to encumber the remarkable artistic temporal expression of the piece.

General discussion

The present study showed that (1) piano duo performers frequently looked toward coperformers at moments of tempo changes, in which reciprocal gazing just before the moments of tempo change facilitated synchronization; (2) mutual gaze modulates remarkable and arbitrary temporal expressions such as fermata; (3) gazing without movement cues somewhat enhanced synchronization, although the performers seemed to coordinate with each other not by employing movement of specific parts of the coperformer’s body but by watching the body parts containing movement cues.

First, the results suggest that mutual gazing modulates remarkable temporal expression such as a fermata. One possible explanation for this is the nature of the fermata, whose duration is arbitrary or varies among individuals. Namely, the duration of the fermata note is decided through a negotiation between individual interpretation and the unity of the ensemble, which makes it difficult to predict the beginning of the note that follows the fermata while the fermata note is being played. Consequently, in the present study, the performers may have adjusted their perspectives on the temporal expression of the performance (e.g., the duration of the fermata) by employing gazing behavior. This is supported by the fact that the duration of the fermata was similar regardless of whether movement was available. That is, movement cues did not influence the duration of the fermata. The present result is consistent with findings regarding the importance of visual information in coordination. Specifically, eye contact frequently occurs at important parts (Williamon & Davidson, 2002), and visual information influences dyadic synchronization while participants are coordinating with each other well (Richardson et al., 2005). In contrast, other studies indicated that performers within a piano duo do not look at the coperformer while playing a piece without remarkable temporal changes (Davidson, 2012; Keller & Appel, 2010). Repp and Penel (2002) showed that auditory information was superior to visual information in conveying temporal precision, even when participants paid attention to the visual sequence. These studies suggest that if performers can adequately predict the onset of the tone owing to sufficient practice, or when playing pieces without remarkable temporal changes, visual information or gazing may not necessarily be important for coordination.

Gazing might be a cue for the beginning or end of temporal coordination or of arbitrary expression, such as a fermata, in ensemble performance. The performers looked toward the coperformer just before the onset of the tone at temporal changes, where the next onset is difficult to predict. This might resemble the role of gazing in daily communication, despite the difference in situation. Studies on conversation have shown roles of gazing such as turn taking, whereby a speaker looks toward a listener while the listener averts his/her gaze from the speaker just before turn taking during conversation (e.g., Kendon, 1967). The speaker coordinates gaze and speech on a micro level in order to confirm the response from listeners (Bavelas et al., 2002). Daibo (1998) noted that the speaking turn may unintentionally transfer to the listener if a speaker looks toward a listener even though the speaker wants to continue to speak. In addition, unnecessary gazing interferes with spontaneous speech (Beattie, 1981). These studies suggest that looking at a partner may be a cue for a forthcoming event, whereas unnecessary gaze toward a partner may lead to misunderstandings. This seems to be a possible explanation for behaviors in the present analysis and in prior studies on ensemble performances, in which performers look toward coperformers at the moments of tempo change and do not do so at other parts (e.g., Davidson, 2012; Keller & Appel, 2010). Namely, gazing might be a cue for flexible temporal expression, such as a fermata, in ensemble performance. In contrast, a low frequency of eye contact or gazing during performance of a piece with less tempo variation might prevent the sending of unnecessary information.

Second, gazing alone could somewhat enhance coordination without movement cues because performers coordinated with one another more successfully under the movement-restricted condition than the invisible condition. One interpretation of this phenomenon is that performers might catch the signal of the ending of the current event (e.g., a fermata) through gaze cues, even under conditions without movement cues. Such a signal would contribute to determining the rough onset timing of next phrase. However, because movement cues were not available, synchronization was not sufficiently accurate. This also implies that performers’ eye information did not play a role in predicting the “precise” onset of the next phrase, although such rough coordination via eye information should be investigated in detail.

Finally, the role of gazing in strict synchronization might be to catch the partner’s movement cues reciprocally. This also suggests that movement cues used by both performers were important for strict synchronization. Indeed, the results indicated that asynchrony was smaller under the free-movement condition than under the restricted-movement condition. The present results confirm prior studies suggesting that body movement relates to interperformer interaction (e.g., Davidson, 2005; Goebl & Palmer, 2009; Keller, 2012; Luck & Toiviainen, 2006; Maduell & Wing, 2007). Similarly, these results are compatible with evidence for the importance of movement cues for synchronization in nonmusical contexts (Richardson et al., 2005; Shockley et al., 2009). In addition, our findings on the mutual utilization of body movement might parallel studies suggesting the importance of action copresentation, in which people activate both their own and their partner’s mental presentation during joint action (Loehr & Palmer, 2011; Sebanz et al., 2003; Vesper et al., 2010), or of musicians’ action simulation associated with auditory and motor images (reviewed in Keller, 2012). On the basis of these studies, for synchronization during piano duo performance, it seems to be important to predict the mutual effect of one’s own and the partner’s actions and to simulate performance actions for keystrokes. Accordingly, the present results on strict synchronization imply that performers might anticipate and coordinate the next-tone onset through mutual movement cues, simulating the keystroke action via the partner’s action.

In summary, the process of gazing behavior at moments of tempo change might occur as follows. First, both performers recognize the moments as difficult to coordinate. Second, they look toward each other just before the moment of synchronization, which enables the exchange of movement cues. Subsequently, they predict the onset of the next tone by using movement cues and anticipate the length of the interval from the end of the tone at the tempo change to the beginning of the tone in the next part. Finally, they avert their gaze from the coperformer, moving their arms on the basis of their prediction, and begin to play the next part. Alternatively, averting their gaze may itself be the cue for ending the pause.

Another finding of the present study is the discrepancy between performers’ self-reports and actual gazing behavior. In the preliminary study, many participants responded that eye contact contributes to coordination. However, eye contact (i.e., looking into coperformers’ eyes) did not contribute to micro-level synchronization, whereas mutual gazing (i.e., simply looking toward the partner) facilitated synchronization under the motion visible condition even when they could not look at their partner’s eyes. Thus, it can be inferred that performers consider that they make eye contact rather than just looking at their coperformer’s motion on the basis of subjective criteria.

Future studies should confirm that performers actually predict tone onset by gazing at their coperformer’s movement; in the present study, no whole-body or hand movement data were collected. A time-lapse analysis of both movement and gazing information might test this assumption. Researchers should also consider the influence of social relationships among performers, although in the present study their effect was intentionally excluded. Participants’ focus on gazing behavior in the preliminary survey might indicate its importance in maintaining social relationships (e.g., intimacy among performers). In a piano duo, the amount of gazing by leaders decreases, while that by followers increases, regardless of the playing part, primo or secondo (Kawase, 2011). Furthermore, the movements of the leader become explicit during a piano duet (Goebl & Palmer, 2009). These influences of social relationships remain to be examined.