
Speech Communication

Volume 50, Issue 4, April 2008, Pages 278-287

The vocal communication of different kinds of smile

https://doi.org/10.1016/j.specom.2007.10.001

Abstract

The present study investigated the vocal communication of naturally occurring smiles. Verbal variation was controlled in the speech of 8 speakers by asking them to repeat the same sentence in response to a set sequence of 17 questions, intended to provoke reactions such as amusement, mild embarrassment, or just a neutral response. After coding for facial expressions, a sample of 64 utterances was chosen to represent Duchenne smiles, non-Duchenne smiles, suppressed smiles and non-smiles. These audio clips were used to test the discrimination skills of 11 listeners, who had to rely on vocal indicators to identify different types of smiles in speech. The study established that listeners can discriminate different smile types and further indicated that listeners utilize prototypical ideals to discern whether a person is smiling. Some acoustical cues appear to be taken by listeners as strong indicators of a smile, regardless of whether the speaker is actually smiling. Further investigation into listeners’ prototypical ideals of vocal expressivity could prove worthwhile for voice-synthesis technology endeavoring to make computer simulations more naturalistic.

Introduction

Affect is expressed throughout the body (Trevarthen and Malloch, 2000), and is detectable through different senses (De Gelder and Vroomen, 2000, Scherer et al., 1986). Therefore, different channels (or modes) of expression are likely to be involved in a single communicative act. The ability to discriminate audibly between vocal expressions of different categorical emotions has been found in different cultures and in some different languages (Scherer et al., 2001, Scherer and Wallbott, 1994). Investigating the auditory detection of facial expression is crucial not just for our understanding of perceptual processes but also because it could be vital for helping people with sensory deficits. People who are blind, for instance, may rely heavily on distinctions in emotional tone and cadence in voice in order to facilitate communication.

Although research on the vocal communication of affect is growing (Douglas-Cowie et al., 2003, Juslin and Laukka, 2001, Ladd et al., 1985), the data are often limited to the use of “unnatural” speech samples involving either synthetic manipulation or production by actors who have been asked to focus on “pure” and “intense” exemplars of emotional expression (Juslin and Laukka, 2003, Lieberman and Michaels, 1962, Scherer, 2003). This artificiality has been criticized in research on facial displays of affect (see Fernández-Dols and Ruiz-Belda, 1997). Keltner and Ekman (2000, p. 244) argue that this concentration on “universal, prototypical facial expressions, … ignor(es)… individual variation in such expressions”. It is equally problematic for vocal expressions of affect: the use of prototypical examples obscures the nuances that accompany the regulation and moderation of expression by individuals during social interaction. So despite the acknowledged centrality of multi-modal expressiveness in communication, there has been limited research on the vocal communication of regulated expressions of emotion.

The expression of regulated emotion, although little understood (Gross, 1998), is a regular feature of everyday life (Morris and Reilly, 1987). Some consider variations in the expression of the same categorical emotion to be the result of “pull effects” (Johnstone and Scherer, 2000, Scherer, 1985) or “display rules” (Ekman and Friesen, 1969, Kirouac and Hess, 1999), terms that describe the constraints governing socially acceptable expressions of emotion, e.g. having to stifle a yawn in a meeting, or hide a smile in a serious situation. Others, such as Fridlund (1991) in his ecological behavioral theory, argue that variations themselves constitute different social intentions and acts. In either case, people are continuously managing and regulating their emotional displays in the rapid ebb and flow of social exchange. These modulated expressions, rather than the prototypical displays, are the “normal” forms of emotional expression that need to be explored. Although a number of studies have explored display rules (Ceschi and Scherer, 2003, Levenson, 1994, Levenson, 2002), virtually nothing seems to be known about the display rules for vocalizations.

Smiles are ideal for exploring the effects of regulation on the vocal communication of affect, since they offer a rich source of natural modulation within interaction. Yet despite having been well researched as visual displays, much less investigation has been conducted into the vocal expression of smiles. Smiles can express a large variety of meanings, ranging from embarrassment to amusement, triumph, bitterness and even anger. Despite this, smiles are often distinguished using only the criterion of the activation of the orbicularis oculi muscles (i.e. the presence of “crow’s feet” wrinkles around the eyes), differentiating Duchenne smiles (DS) from non-Duchenne smiles (NDS). Often dubbed “genuine” or “felt” smiles, DS have attracted much debate concerning how indicative they are of positive affect (e.g. LaFrance and Hecht, 1999, Zaalberg et al., 2004) and whether they are simply a more intense version of NDS (Messinger et al., 1997), which have been considered by some to be more indicative of feigned enjoyment (e.g. Ekman and Friesen, 1988). There are, however, many more subtle types of smiles – Ekman (2001, p. 127) claims that his Facial Action Coding System (FACS) can distinguish more than 50 different smiles, and at least some of these have been shown to involve different facial acts such as suppression and control (Keltner, 1995, Keltner and Buswell, 1997).

The existing findings about the effects of smiles on speech (Aubergé and Cathiard, 2003, Tartter, 1980, Tartter and Braun, 1994) have yet to be explored in conjunction with what we already know about the different social effects of smiles (Ekman and Friesen, 1988, Fridlund, 1991, LaFrance and Hecht, 1999). Studies on how smiling affects vocalizations have typically focused on the acoustical effects of a mechanical smile gesture (e.g. Tartter and Braun, 1994) or amused smiles (Aubergé and Cathiard, 2003) and have not yet considered the vocal effects of other smiles or indeed suppressed smiles (SS). This suppression may be as evident in the voice as in the face, or even more so, given the speaker’s greater awareness of facial actions over bodily and vocal displays (Ekman and Friesen, 1974).

In light of the literature, DS, NDS, and SS could either be considered distinctive categories (each with distinct social intentions) or points on a dimensional scale of “smile intensity” (with DS being most “smiley”, then NDS, and finally SS). What remains to be determined is whether differences between smiles (that have arisen either as a consequence of motivational requisites or as a result of affect intensity) have an influence on their vocal expressivity and auditory availability. The present study explores the distinction between naturally occurring DS, NDS, and SS, and whether these have implications for vocal accessibility. One of the major issues when conducting research into vocal communication is how to control for verbal content. Previous studies have overcome this issue by using acted portrayals or synthesized speech, which has resulted in a dearth of information on natural expressions (Juslin and Laukka, 2003). In order to improve on this methodology, the present study utilizes a novel interview technique designed to induce varying facial expressions while the speakers repeatedly utter the same words. Ensuring utterances are standardized not only controls the verbal content, but also provides a platform from which to study both the encoding and decoding components of the communicative process (as called for by Juslin and Laukka, 2003).


Method

The present study investigated the acoustical basis for the discrimination of Duchenne Smiles (DS), Non-Duchenne Smiles (NDS), and Suppressed Smiles (SS) from No Smiles (NS). The study was conducted in three main stages: (1) inducing smiles in speakers to obtain the auditory stimuli, (2) coding and extracting appropriate utterances, and (3) the testing of perceptual discrimination.

How listeners assigned utterances to categories

Listeners’ usage of the three response categories (Open Smile, Suppressed Smile, and No Smile) was reliably associated with the different stimulus categories, χ²(6, N = 704) = 79.8, p < .001 (see Table 1). If the four stimulus categories are considered as falling along a dimension of “smileyness” (with DS being the most smiley, followed by NDS, then SS, and lastly NS), the use of the responses can also be seen to vary along this dimension. The “No Smile” response category is used less frequently as the stimulus categories become more “smiley”.
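The reported association is a standard chi-square test of independence on a 4 (stimulus) × 3 (response) contingency table, giving (4 − 1)(3 − 1) = 6 degrees of freedom over the 704 judgments (64 utterances × 11 listeners). The sketch below illustrates the computation with SciPy using hypothetical placeholder counts, not the study’s actual data (the real table is Table 1 of the paper):

```python
# Chi-square test of independence for a 4 (stimulus) x 3 (response)
# contingency table. The counts are illustrative placeholders chosen only
# to match the study's totals (176 judgments per stimulus type, N = 704);
# they are NOT the published data.
from scipy.stats import chi2_contingency

# Rows: DS, NDS, SS, NS stimuli.
# Columns: "Open Smile", "Suppressed Smile", "No Smile" responses.
observed = [
    [96, 50, 30],   # Duchenne smiles
    [60, 60, 56],   # non-Duchenne smiles
    [40, 70, 66],   # suppressed smiles
    [25, 45, 106],  # no smiles
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2({dof}, N = {sum(map(sum, observed))}) = {chi2:.1f}, p = {p:.2e}")
```

A significant result here only says that response usage depends on stimulus category; the ordinal “smileyness” trend described above requires inspecting how the response proportions shift across the ordered rows.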

Conclusions

The present research has demonstrated that listeners can, with varying degrees of success, hear different types of smiles in the voices of strangers in the absence of visual cues. Listeners are very good at discriminating ‘Duchenne Smiles’ from ‘No Smiles’. They can also, to a lesser degree, successfully discriminate ‘Non-Duchenne Smiles’ from ‘No Smiles’, and ‘Suppressed Smiles’ from ‘No Smiles’. These findings support previous research demonstrating that smiles can be communicated vocally.

Acknowledgements

With special thanks to all the participants who lent their voices and ears to this study. Thank you also to Dr. Bridget Waller, Dr. Gwenda Simmons, Monja Knoll, Dr. Paul Morris, and Dr. Maria Uther for their time and expertise.

References (56)

  • De Gelder, B., Vroomen, J., 2000. The perception of emotions by ear and by eye. Cognition and Emotion.
  • Eibl-Eibesfeldt, I., 1970. Ethology: The Biology of Behaviour.
  • Ekman, P., 2001. Telling Lies: Clues to Deceit in the Marketplace, Politics, and Marriage.
  • Ekman, P., Friesen, W.V., 1969. The repertoire of nonverbal behavior: categories, origins, usage and coding. Semiotica.
  • Ekman, P., Friesen, W.V., 1974. Detecting deception from the body or face. J. Pers. Soc. Psychol.
  • Ekman, P., Friesen, W.V., 1988. Smiles when lying. J. Pers. Soc. Psychol.
  • Ekman, P., et al., 1980. Facial signs of emotional experience. J. Pers. Soc. Psychol.
  • Ekman, P., Friesen, W.V., Hager, J.C., 2002. FACS. Research Nexus, Salt Lake...
  • Fernández-Dols, J.M., et al., 1997. Spontaneous facial behavior during intense emotional episodes: artistic truth and optical truth.
  • Fitch, W.T., 1997. Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques. J. Acoust. Soc. Amer.
  • Frick, R.W., 1985. Communicating emotion: the role of prosodic features. Psychol. Bull.
  • Fridlund, A.J., 1991. Sociality of solitary smiling: potentiation by an implicit audience. J. Pers. Soc. Psychol.
  • Fridlund, A.J., 1994. Human Facial Expression: An Evolutionary View.
  • Gross, J.J., 1998. Antecedent- and response-focused emotion regulation: divergent consequences for experience, expression and physiology. J. Pers. Soc. Psychol.
  • Hayman, C.G., et al., 1989. Contingent dissociation between recognition and fragment completion: the method of triangulation. J. Exp. Psychol.: Learning, Memory, Cognition.
  • Johnstone, T., Scherer, K.R., 2000. Vocal communication of emotion.
  • Jonckheere, A.R., 1970. Design and analysis; review. Techniques for ordered contingency tables. In: Riemersma, J.B.J., ...
  • Juslin, P.N., Laukka, P., 2001. Impact of intended emotion intensity on cue utilization and decoding accuracy in vocal expression of emotion. Emotion.

    Portions of this work were presented in “The Auditory Discrimination of Socially Mediated Smiles”, Proceedings of 10th European Conference: Facial Expression, Measurement and Meaning, Rimini, Italy, September 2003.
