Imitation during phoneme production

https://doi.org/10.1016/j.neuropsychologia.2006.04.004

Abstract

Does listening to and observing a speaking interlocutor influence phoneme production? In two experiments, female participants were required to recognize and then repeat the string of phonemes /aba/ presented by actors visually, acoustically, or audiovisually. In experiment 1 a male actor presented the string of phonemes, and the participants’ lip kinematics and voice spectra were compared with those of a reading control condition. In experiment 2 female and male actors presented the string of phonemes, and the lip kinematics and voice spectra of the participants’ responses to the male actors were compared with their responses to the female actors (control condition). In both experiments, the lip kinematics in the visual presentations and the voice spectra in the acoustical presentations shifted, relative to the control conditions, toward the male actors’ values, which differed from those of the female participants and actors. The variation in lip kinematics also induced changes in voice formants, but only in the visual presentation. The data suggest that features of both the lip kinematics and the voice spectra tend to be imitated automatically when repeating a string of phonemes presented by a visible and/or audible speaking interlocutor. The use of imitation, in place of the participants’ usual lip kinematics and vocal features, points to an automatic and unconscious tendency of the perceiver to interact closely with the interlocutor. This accords with the idea that resonant circuits are activated by the activity of the mirror system, which relates observation to execution of arm and mouth gestures.

Introduction

Imitation is one of the many human capacities that have allowed for cognitive advancement. It may derive from the activity of “mirror” neurons discovered in the monkey premotor cortex, which discharge both when observing and when executing the same arm or mouth action (Ferrari, Gallese, Rizzolatti, & Fogassi, 2003; Gallese, Fadiga, Fogassi, & Rizzolatti, 1996). Kohler et al. (2002) recorded acoustical “mirror” neurons that discharge both when hearing the sound associated with an action and when executing that action. On the basis of these data, Rizzolatti, Fogassi, and Gallese (2002) proposed that a perceiver imitating mouth or arm gestures in response to audiovisually presented actions automatically goes into resonance with the actor. This resonance activity may allow the perceiver to understand the interlocutor’s intention. In monkeys, imitation also seems to be used for learning (for a review see Byrne & Russon, 1998), and it is possible that in humans imitation, triggered by an evolved “mirror” system, acquired new capabilities for cognitive activities (for a review see Rizzolatti & Craighero, 2004).

In humans, imitation may be innate, although the experimental evidence for this hypothesis is still highly debated (see Heyes, 2001 for a review). Facial acts (tongue protrusion, lip protrusion, mouth opening), as well as hand and head acts, seem to be imitated already in the first hours of life (Heimann, 1998; Meltzoff, 2002); in particular, facial movements were reportedly imitated without any vision of one’s own movements (for a review see Meltzoff, 2002). Among many other functions, imitation triggered by observation may serve language learning: imitation of lip gestures is used in the earliest stages of language acquisition. For example, children who imitate tongue protrusion at 3 months display more vocal imitation 9 months later (Heimann, 1998). The close relationship between speech and imitation is supported by neuroimaging studies. Broca’s area, which is involved in encoding phonological representations in terms of mouth articulation gestures (Demonet et al., 1992; Paulesu, Frith, & Frackowiak, 1993; Zatorre, Evans, Meyer, & Gjedde, 1992), is also activated by the observation of speaking faces, the imagination of faces expressing emotion, and the imitation of hand movements (for a review see Bookheimer, 2002; Buccino et al., 2004; Iacoboni et al., 1999; Leslie, Johnson-Frey, & Grafton, 2004; Petersen, Fox, Posner, Mintun, & Raichle, 1988).

The aim of the present study was to determine whether automatic imitation is behaviorally observable during phoneme production, i.e. when a string of phonemes is repeated in response to a speaking interlocutor. We verified whether the kinematics pattern of the mouth pronouncing a string of phonemes presented in the visual modality alone, and/or the voice spectra of the string presented only acoustically (i.e. the visual and acoustical features of the phonemes), are imitated by the observer/listener. In addition, we tested a possible integration between imitation of the visual and of the acoustical stimulus by comparing imitation in unimodal (visual and acoustical) presentations with that in bimodal (audiovisual) presentation. In short, the aim was to determine which sensory modalities can activate resonant circuits in phoneme reproduction. We exploited the well-known fact that male labial movements are usually larger, and male voice formants lower, than those of females (Ferrero, Magno Caldognetto, & Cosi, 1996; Pickett, 1999). Consequently, if female perceivers imitated male actors presenting strings of phonemes visually and acoustically, we expected, respectively, an increase in lip kinematics parameters and a decrease in voice formants. In experiment 1, female participants were required to recognize and then repeat strings of phonemes presented visually, acoustically, or audiovisually by a male actor. We compared the data collected in the three conditions with a control condition in which the participants silently read, and then repeated aloud, the strings of phonemes. In experiment 2, female participants were required to recognize and then repeat the strings of phonemes presented by male and female actors in the same modalities as in experiment 1 (i.e. visual, acoustical and audiovisual). We compared the responses to the male actors (the test condition) with those to the female actors (the control condition). In this experiment, task and type of presented stimulus were equated between the control and test conditions. In experiment 1, by contrast, the task (to read and to repeat) and the type of stimulus (a string of letters) in the control condition differed from those in the test conditions (to recognize and to repeat an actor’s face/voice), and these varying factors might themselves have been responsible for differences between the control and test conditions.
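The prediction about voice formants can be made concrete with a minimal, self-contained sketch (not part of the original study): we synthesize a toy /a/-like vowel as the impulse response of cascaded resonators with known resonance frequencies, then recover those frequencies by linear predictive coding (LPC), a standard technique for estimating formants from voice spectra. The specific frequencies, bandwidths, and sampling rate below are illustrative assumptions, not values from the experiments.

```python
import numpy as np
from scipy.signal import lfilter
from scipy.linalg import solve_toeplitz

FS = 10_000  # sampling rate in Hz (illustrative choice)

def resonator(f_hz, bw_hz):
    """Denominator coefficients of a 2nd-order all-pole resonator."""
    r = np.exp(-np.pi * bw_hz / FS)          # pole radius from bandwidth
    theta = 2 * np.pi * f_hz / FS            # pole angle from frequency
    return np.array([1.0, -2 * r * np.cos(theta), r * r])

def synth_vowel(formants, n=2048):
    """Toy vowel: impulse response of cascaded formant resonators."""
    x = np.zeros(n)
    x[0] = 1.0
    for f, bw in formants:
        x = lfilter([1.0], resonator(f, bw), x)
    return x

def lpc_formants(x, order=4):
    """Estimate formant frequencies via autocorrelation-method LPC."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    # Yule-Walker equations: Toeplitz(r[0..p-1]) a = r[1..p]
    a = solve_toeplitz(r[:order], r[1:order + 1])
    # Roots of A(z) = 1 - sum_k a_k z^{-k}; keep one root per conjugate pair
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[roots.imag > 0]
    freqs = np.angle(roots) * FS / (2 * np.pi)
    return np.sort(freqs)

# Illustrative F1/F2 and bandwidths for an /a/-like vowel
vowel = synth_vowel([(730, 60), (1090, 90)])
f1, f2 = lpc_formants(vowel)
```

Because the synthetic signal exactly matches the all-pole model, the LPC estimates land on the synthesized resonances; lowering the frequencies passed to `synth_vowel` lowers the recovered formants correspondingly, which is the direction of change predicted here for female participants imitating male actors.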

Section snippets

Participants

Fourteen female volunteers (age 22–25 years), classified as right-handed according to the Edinburgh Inventory (Oldfield, 1971), participated in the experiment. All of them were naïve to the purpose of the experiment. The Ethics Committee of the Medical Faculty of the University of Parma approved the study.

Apparatus and stimuli

Participants sat on a chair in front of a table. The stimulus was either a printed string of letters or a short video-clip showing the face of a male actor (face: 21.0 × 14.0 degrees of visual

Experiment 2

In experiment 1, the comparison was between data collected in a task of repeating a string of phonemes presented visually, acoustically, or audiovisually and data collected in a reading task (control condition). The observed differences between the control condition and the other conditions might also depend on the different type of stimulus (i.e. string of letters versus actor's face/voice) and task. To exclude this possibility, in experiment 2 we compared the string-of-phonemes

Discussion

The participants in the present experiment were required to recognize and then repeat strings of phonemes. Although the task did not require any imitation, the participants in fact imitated the actors’ lip movements and voice when repeating the string of phonemes /aba/. In the visual presentation, only the mouth and face movements of the speaking actors were presented. The kinematics analysis showed that the participants imitated the actor's lip movements. As a matter of fact, in

Acknowledgments

We wish to thank Nicola Bruno and Cinzia Di Dio for their comments on the manuscript. The work was supported by a grant from Ministero dell’Istruzione, dell’Università e della Ricerca (MIUR) to M.G.

References (32)

  • G. Buccino et al.

    Neural circuits involved in the recognition of actions performed by nonconspecifics: An fMRI study

    Journal of Cognitive Neuroscience

    (2004)
  • R.W. Byrne et al.

    Learning by imitation: A hierarchical approach

    The Behavioral and Brain Sciences

    (1998)
  • J.F. Demonet et al.

    The anatomy of phonological and semantic processing in normal subjects

    Brain

    (1992)
  • L. Fadiga et al.

    Speech listening specifically modulates the excitability of tongue muscles: A TMS study

    European Journal of Neuroscience

    (2002)
  • P.F. Ferrari et al.

    Mirror neurons responding to the observation of ingestive and communicative mouth actions in the monkey ventral premotor cortex

    European Journal of Neuroscience

    (2003)
  • F. Ferrero et al.

    Nozioni di fonetica acustica

    (1979)