ABSTRACT
We examined the effects of virtual characters’ body type and voice pitch on perceived audio-visual correspondence and believability. For our within-group study (N = 72), we developed nine experimental conditions using a 3 (body type: ectomorph vs. mesomorph vs. endomorph) × 3 (voice pitch: low vs. medium vs. high fundamental frequency [F0]) design. We found statistically significant main effects of voice pitch and statistically significant interaction effects between a virtual character’s body type and voice pitch on both the perceived audio-visual correspondence and the believability of female and male virtual characters. For female virtual characters, we also observed a statistically significant main effect of body type and a statistically significant interaction effect between the participant’s biological sex and the virtual character’s voice pitch on both perceived audio-visual correspondence and believability. Moreover, the results show that perceived believability is highly correlated with perceived audio-visual correspondence. Our findings have practical implications for applications in which the virtual character is meant to be an emotional or informational guide and therefore requires some level of perceived believability: they suggest that the perceived believability of virtual characters can be enhanced by generating appropriate voices through pitch manipulation of existing voices.
Index Terms
- Effects of Body Type and Voice Pitch on Perceived Audio-Visual Correspondence and Believability of Virtual Characters