Imitation during phoneme production
Introduction
Imitation is one of the many human capacities that have allowed for cognitive advancement. It may derive from the activity of “mirror” neurons, discovered in monkey premotor cortex, which discharge both when observing and when executing the same arm or mouth action (Ferrari, Gallese, Rizzolatti, & Fogassi, 2003; Gallese, Fadiga, Fogassi, & Rizzolatti, 1996). Kohler et al. (2002) recorded acoustical “mirror” neurons that discharge both when hearing the sound related to an action and when executing the same action. On the basis of these data, Rizzolatti, Fogassi, and Gallese (2002) proposed that a perceiver imitating either mouth or arm gestures in response to audiovisually presented actions automatically goes into resonance with the actor. This resonance activity may allow the perceiver to understand the interlocutor's intention. In monkeys imitation also seems to be used for learning (for a review see Byrne & Russon, 1998), and it is possible that in humans imitation, triggered by an evolved “mirror” system, acquired new capabilities for cognitive activities (for a review see Rizzolatti & Craighero, 2004).
In humans imitation may be innate, although experimental evidence in favor of this hypothesis is still highly debated (see Heyes, 2001 for a review). Facial acts (tongue protrusion, lip protrusion, mouth opening) as well as hand and head acts seem to be imitated already in the first hours of life (Heimann, 1998; Meltzoff, 2002); in particular, facial movements were reported to be imitated without any vision of one's own movements (for a review see Meltzoff, 2002). Among many other activities, imitation triggered by observation may be used for language learning. Indeed, imitation of lip gestures is used in the earliest stages of language learning: for example, children imitating tongue protrusion at 3 months display more vocal imitation 9 months later (Heimann, 1998). The strict relationship between speech and imitation is supported by experimental evidence from neuroimaging studies. Broca's area, which is involved in encoding phonological representations in terms of mouth articulation gestures (Demonet et al., 1992; Paulesu, Frith, & Frackowiak, 1993; Zatorre, Evans, Meyer, & Gjedde, 1992), is also activated by the observation of speaking faces, the imagination of faces expressing emotion, and the imitation of hand movements (for a review see Bookheimer, 2002; Buccino et al., 2004; Iacoboni et al., 1999; Leslie, Johnson-Frey, & Grafton, 2004; Petersen, Fox, Posner, Mintun, & Raichle, 1988).
The aim of the present study was to determine whether automatic imitation is behaviorally observable during phoneme production, i.e. when a string-of-phonemes is repeated in response to a speaking interlocutor. We verified whether the kinematics pattern of the mouth pronouncing the string-of-phonemes presented in the sole visual modality, and/or the voice spectra of the string presented only acoustically (i.e. the visual and acoustical features of the phonemes), are imitated by the observer/listener. In addition, we tested a possible integration between imitation of the visual and of the acoustical stimulus. To this purpose we compared imitation in unimodal (visual and acoustical) presentation with that in bimodal (audiovisual) presentation. Summing up, the aim of the present experiment was to determine which sensory modalities can activate resonant circuits in phoneme reproduction. We exploited the well-known fact that male labial movements are usually larger, and male voice formants lower, than those of females (Ferrero, Magno Caldognetto, & Cosi, 1996; Pickett, 1999). Consequently, if female perceivers imitated male actors presenting strings of phonemes visually and acoustically, we expected, respectively, an increase in lip kinematics parameters and a decrease in voice formants. In experiment 1 female participants were required to recognize and then to repeat strings of phonemes presented visually, acoustically, or audiovisually by a male actor. We compared the data collected in the three conditions with a control condition in which the participants silently read, and then repeated aloud, the strings of phonemes. In experiment 2 female participants were required to recognize and then to repeat strings of phonemes presented by male and female actors in the same modalities as in experiment 1 (i.e. visual, acoustical, and audiovisual). We compared the responses to the male actors (the test condition) with those to the female actors (the control condition). In this experiment we equalized task and type of presented stimulus between the control and the test conditions. In fact, in experiment 1 the task (to read and to repeat) and the type of stimulus (a string of letters) in the control condition differed from those in the test conditions (to recognize and to repeat an actor's face/voice), and these varying factors might be responsible for differences between the control and test conditions.
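The direction-of-effect logic of this prediction can be sketched numerically. The values below are hypothetical and purely illustrative (they are not the study's data): if female participants imitate a male actor's lower formants, the mean paired shift of their formant frequencies relative to the control condition should be negative.

```python
import statistics

def mean_shift(test_hz, control_hz):
    """Mean paired difference (test - control) between formant measurements in Hz."""
    return statistics.mean(t - c for t, c in zip(test_hz, control_hz))

# Hypothetical F1 values (Hz) for one participant repeating /aba/
# after a male actor (test) and after a female actor (control).
f1_test = [780, 765, 790, 770]
f1_control = [800, 795, 810, 805]

# Under the imitation hypothesis the shift is negative (formants lowered);
# lip kinematics parameters would be predicted to shift in the opposite direction.
print(mean_shift(f1_test, f1_control))  # -26.25 on these made-up values
```

The same paired-comparison logic applies to the kinematics parameters, with the expected sign reversed (larger lip movements after a male actor).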
Section snippets
Participants
Fourteen female volunteers (age 22–25 years), classified as right-handed according to the Edinburgh Inventory (Oldfield, 1971), participated in the experiment. All of them were naïve to the purpose of the experiment. The Ethics Committee of the Medical Faculty of the University of Parma approved the study.
Apparatus and stimuli
Participants sat on a chair in front of a table. The stimulus was either a printed string of letters or a short video-clip showing the face of a male actor (face: 21.0 × 14.0 degrees of visual
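The stimulus size is reported in degrees of visual angle; converting between physical size and visual angle is standard. The sketch below shows the conversion; the 57 cm viewing distance is an assumption chosen for illustration (a common psychophysics convention), not a parameter reported here.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an object of size_cm viewed at distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# At a hypothetical 57 cm viewing distance, a face image about 21 cm wide
# subtends roughly the 21.0 degrees reported for the video-clip stimulus.
print(round(visual_angle_deg(21.1, 57.0), 1))  # → 21.0
```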
Experiment 2
In experiment 1 the comparison was performed between the data collected in a task of repeating strings-of-phonemes presented visually, acoustically, and audiovisually and the data collected in a reading task (control condition). The observed differences between the control condition and the other conditions might also depend on the different type of stimulus (i.e. string of letters versus actor's face/voice) and task. To exclude this possibility, in experiment 2 we compared the string-of-phonemes
Discussion
The participants in the present experiment were required to recognize and then to repeat strings of phonemes. Even though the task did not require any imitation, the participants in fact imitated the actors’ lip movements and voice when repeating the string-of-phonemes /aba/. In the visual presentation only the mouth and face movements of the speaking actors were presented. The kinematics analysis showed that the participants imitated the actor's lip movements. As a matter of fact, in
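The kinematics measures referred to above (lip aperture and its velocity) can be sketched as follows. The marker layout, sample values, and the 250 Hz sampling rate are assumptions made for illustration, not the study's recording setup.

```python
def lip_aperture(upper_y, lower_y):
    """Vertical lip aperture per sample: distance between upper- and lower-lip markers."""
    return [u - l for u, l in zip(upper_y, lower_y)]

def peak_velocity(samples, dt):
    """Peak absolute velocity estimated by finite differences over sampling interval dt."""
    return max(abs(b - a) / dt for a, b in zip(samples, samples[1:]))

# Hypothetical marker trajectories (mm) while pronouncing /aba/,
# sampled at an assumed 250 Hz (dt = 0.004 s).
upper = [30.0, 31.0, 33.0, 35.0, 33.0, 31.0]
lower = [10.0, 9.0, 7.0, 5.0, 7.0, 9.0]

aperture = lip_aperture(upper, lower)
print(max(aperture))                    # maximal aperture: 30.0 mm
print(peak_velocity(aperture, 0.004))   # peak aperture velocity (mm/s)
```

Larger maximal aperture and higher peak velocity in responses to male actors, relative to the control, would be the kinematic signature of imitation described in the text.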
Acknowledgments
We wish to thank Nicola Bruno and Cinzia Di Dio for their comments on the manuscript. The work was supported by a grant from the Ministero dell’Istruzione, dell’Università e della Ricerca (MIUR) to M.G.
References (32)
- Speech and gesture share the same communication system. Neuropsychologia, 2006.
- Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Current Biology, 2000.
- Heyes. Causes and consequences of imitation. Trends in Cognitive Sciences, 2001.
- Leslie, Johnson-Frey, & Grafton. Functional imaging of face and hand imitation: Towards a motor theory of empathy. NeuroImage, 2004.
- The grammars of speech and language. Cognitive Psychology, 1970.
- The motor theory of speech perception revised. Cognition, 1985.
- Oldfield. The assessment and analysis of handedness: The Edinburgh Inventory. Neuropsychologia, 1971.
- Rizzolatti, Fogassi, & Gallese. Motor and cognitive functions of the ventral premotor cortex. Current Opinion in Neurobiology, 2002.
- Seeing and hearing speech excites the motor system involved in speech production. Neuropsychologia, 2003.
- Bookheimer. Functional MRI of language: New approaches to understanding the cortical organization of semantic processing. Annual Review of Neuroscience, 2002.
- Buccino et al. Neural circuits involved in the recognition of actions performed by nonconspecifics: An fMRI study. Journal of Cognitive Neuroscience, 2004.
- Byrne & Russon. Learning by imitation: A hierarchical approach. Behavioral and Brain Sciences, 1998.
- Demonet et al. The anatomy of phonological and semantic processing in normal subjects. Brain, 1992.
- Speech listening specifically modulates the excitability of tongue muscles: A TMS study. European Journal of Neuroscience.
- Ferrari, Gallese, Rizzolatti, & Fogassi. Mirror neurons responding to the observation of ingestive and communicative mouth actions in the monkey ventral premotor cortex. European Journal of Neuroscience, 2003.
- Ferrero, Magno Caldognetto, & Cosi. Nozioni di fonetica acustica [Notions of acoustic phonetics], 1996.