A survey on perceived speaker traits: Personality, likability, pathology, and the first challenge

https://doi.org/10.1016/j.csl.2014.08.003

Abstract

The INTERSPEECH 2012 Speaker Trait Challenge aimed at a unified test-bed for perceived speaker traits – the first challenge of this kind: personality in the five OCEAN personality dimensions, likability of speakers, and intelligibility of pathologic speakers. In the present article, we give a brief overview of the state-of-the-art in these three fields of research and describe the three sub-challenges in terms of the challenge conditions, the baseline results provided by the organisers, and a new openSMILE feature set, which has been used for computing the baselines and which has been provided to the participants. Furthermore, we summarise the approaches and the results presented by the participants to show the various techniques that are currently applied to solve these classification tasks.

Introduction

From 2009 to 2012, challenges (Schuller et al., 2009, Schuller et al., 2010, Schuller et al., 2011, Schuller et al., 2012, Schuller et al., 2013a, Schuller et al., 2013) featuring several different aspects of paralinguistics were organised at the INTERSPEECH conferences: the topics of interest were not what the speaker said, i.e., word recognition, or the semantics behind the words, e.g., hot spots or ontologies, but how it was said. Pertinent information can be found between words (vocal, non-verbal events), it can be modulated onto the word chain (typically supra-segmental phenomena such as prosody or voice quality), or it can be encoded in the (types of) words chosen and in the connotations of these words. Catalogues of (short-term) speaker states such as emotions and of (long-term) speaker traits such as gender or personality are given in Schuller et al. (2013) and Schuller and Batliner (2014). In the 2012 challenge, and accordingly in the present article, we address speaker traits whose reference labels were obtained by perceptual annotation and not by some ‘objective’ measurement such as placing subjects on a scale to find out about their weight, or simply by deciding between male and female.

There are different definitions for the field that deals with ‘how’ instead of ‘what’; traditionally, paralinguistics is mostly conceived as dealing with the non-verbal, vocal aspects of communication, sometimes including, sometimes excluding multi-modal behaviour such as facial expression, hand gesture, gait, body posture. Here, we follow the definition given in Schuller and Batliner (2014): paralinguistics is “[...] the discipline dealing with those phenomena that are modulated onto or embedded into the verbal message, be this in acoustics (vocal, non-verbal phenomena) or in linguistics (connotations of single units or of bunches of units).” Thus, we exclude multi-modality but include verbal phenomena: although most of the contributions to our challenges so far concentrated on acoustics, i.e. on vocal phenomena modulated onto or embedded into the verbal message, we do not want to exclude linguistic approaches such as the modelling of interjections, hesitations, part-of-speech, or n-grams.

Speech is produced by speakers, and when we aim at paralinguistics, a specific type of speech (friendly speech, pathological speech) characterises a specific type of speaker: such speakers display friendliness or pathological speech traits. Thus, we could subsume all these phenomena under Speaker Characterisation or Speaker Classification, as was done by Müller (2007, p. V): “[...] the term speaker classification is defined as assigning a given speech sample to a particular class of speakers. These classes could be Women vs. Men, Children vs. Adults, Natives vs. Foreigners, etc.”. Ultimately, it is simply a matter of perspective whether we call the object of our investigation “type of speech” (indicated by specific speech characteristics) or “speaker traits” (indicated by specific speech characteristics extracted from the speech of specific speakers).

Irrespective of the term chosen, it is always about assigning one individual sample (speech or speaker) to one of n groups (classes) of speakers, k = 1, …, n; the larger n is, the more likely we are to employ regression procedures instead of classification. Of course, it is always possible to map more or less continuous attributions such as rating scales onto a few classes. For challenges like the present one, we as organisers have to know which class a speaker in the test set belongs to. As mentioned above, this ‘reference’ (or ‘ground truth’, ‘gold standard’) can be obtained by (sort of) objective measures (for instance, speaker weight classes following the ‘body mass index’) or by perceptive evaluation. In this challenge on perceived speaker traits, we presented three sub-challenges where all speakers were assigned to (two) different classes, based on perceptive evaluation.

Perceptual judgements as the basis for reference classes set specific edge conditions: basically, they mostly result in ranked/ordinal scales; nevertheless, parametric procedures such as Pearson's correlation are often used. Human annotators do not always agree; thus, we need some measure of agreement and some method for ending up with one ‘unified’ label per token. This is normally the mean of the rating scale scores of all annotators. If we aim at classes, we have to partition the scale at appropriate points (mean, median, etc.).
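
As a minimal sketch of this aggregation step (hypothetical ratings on a 7-point scale; the concrete scale and partition point are design decisions of the corpus builders):

```python
import numpy as np

# Hypothetical ratings: rows = tokens (speech samples), columns = annotators,
# values = scores on a 7-point ordinal rating scale.
ratings = np.array([
    [5, 6, 4],
    [2, 3, 2],
    [7, 6, 6],
    [3, 4, 3],
])

# One 'unified' label per token: the mean over all annotators' scores.
unified = ratings.mean(axis=1)

# Mapping the (more or less) continuous scores onto two classes by
# partitioning the scale at the median of the unified labels.
labels = (unified > np.median(unified)).astype(int)  # 1 = 'high', 0 = 'low'
print(unified, labels)  # [5.  2.33 6.33 3.33] -> [1 0 1 0]
```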

When some of the authors started organising challenges back in 2009, the main motivation was to establish a certain standard of comparability in the field of Computational Paralinguistics, by introducing concepts like

  • a partitioning of the database into train, development, and test data; previously, there were often only train and test partitions, and researchers defined the partitions of the very same corpus in different ways

  • a clear-cut stratification of subjects for the partitions, if necessary and feasible, for instance, into male/female, old/young, etc.

  • the ‘open microphone setting’, which means that all data recorded and available should be processed; this pertains especially to realistic data, which often were preselected based on labeller agreement, quality of the recordings, and the like

  • adequate performance measures such as Unweighted Average Recall (UAR), that is, the mean, unweighted by the number of instances in each class, of the per-class recalls on the diagonal of the confusion matrix; especially for more than two classes, this measure is more adequate than the usual Weighted Average Recall (a minimal sketch contrasting the two follows after this list)

  • both feature extraction and machine learning procedures done with open source tools, to guarantee strict comparability (e.g. of different features, using exactly the same learning algorithm, and of various learning algorithms, using exactly the same features) and repeatability (ensuring, also by means of software configuration management, that baseline results can be reproduced by anyone with access to the data and open source software, at any time)

  • comparability between studies both within the setting of the challenge (this is easy to obtain because the organisers can define the settings in a strict way) and later on, after the challenge (this cannot be ascertained in a strict way, of course, but authors often refer to and apply the challenge settings)
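
To make the difference between the two measures concrete, here is a minimal sketch (illustrative numbers, plain NumPy; not the challenge evaluation code) contrasting UAR with Weighted Average Recall (WAR), i.e., overall accuracy:

```python
import numpy as np

# Illustrative confusion matrix for a two-class task
# (rows = reference, columns = hypothesis); class 0 is five times
# more frequent than class 1.
cm = np.array([
    [90, 10],   # class 0: 100 instances, 90 recognised correctly
    [10, 10],   # class 1:  20 instances, 10 recognised correctly
])

# Per-class recall: diagonal divided by the row sums.
recall = np.diag(cm) / cm.sum(axis=1)

uar = recall.mean()                 # unweighted mean: (0.90 + 0.50) / 2 = 0.70
war = np.diag(cm).sum() / cm.sum()  # weighted by class size: 100 / 120 ~ 0.83

print(f"UAR = {uar:.2f}, WAR = {war:.2f}")
```

On such imbalanced data, a trivial classifier that always hypothesises the majority class would reach a WAR of about 0.83 but only 0.50 UAR; chance level for UAR is 1/n for n classes, independent of the class distribution.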

In the later challenges from 2010 to 2012, we basically kept these conditions, with slight modifications: we introduced further performance measures (correlation and the area under the ROC (Receiver Operating Characteristic) curve (AUC)); we employed not only free interaction (as in our Speaker Personality Corpus, see Section 3.2) but also controlled, prompted data (as in our likability and pathology corpora, see Sections 3.3 (Speaker Likability Database, SLD) and 3.4 (NKI CCRT Speech Corpus, NCSC)); and we implemented larger feature vectors, see Section 3.1.
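
As a minimal sketch of how these additional measures can be obtained, assuming scikit-learn and SciPy as stand-ins (the challenge prescribed the measures, not a particular toolkit; all numbers below are hypothetical):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import roc_auc_score

# Hypothetical binary references and classifier scores for ten test instances.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.3, 0.9, 0.6, 0.5])

# AUC: the probability that a randomly drawn positive instance is scored
# higher than a randomly drawn negative one (area under the ROC curve).
auc = roc_auc_score(y_true, scores)

# Correlation between continuous predictions and (hypothetical) mean ratings;
# since perceptual ratings are ordinal, the rank-based Spearman coefficient
# is shown alongside Pearson's r.
mean_ratings = np.array([2.0, 3.1, 3.0, 5.5, 2.4, 5.0, 2.9, 6.2, 4.8, 3.9])
r, _ = pearsonr(scores, mean_ratings)
rho, _ = spearmanr(scores, mean_ratings)

print(f"AUC = {auc:.2f}, Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```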

The three speaker traits dealt with in the challenge are described in Section 2. Previous studies on these traits are summarised below in order to motivate research on their automatic recognition as well as to demonstrate its feasibility. In Section 3, after briefly presenting the challenge and the unified machine learning framework (feature vectors and learning algorithms employed for computing the baseline results), we introduce the three challenge corpora, together with baseline results. Section 4 presents the contributions to each of the three sub-challenges, and the winners – in contrast to the general literature review, this section serves to review state-of-the-art methods in a comparable setting and to provide a form of quantitative meta-analysis. Section 5 aims at summarising what we have learnt from the challenge.


Three speaker traits

In this section, we want to give a short account of the state-of-the-art in research on perceived speaker traits within computational paralinguistics. The recognition of perceived speaker traits is exemplified by personality, likability, and pathology. These three traits have been chosen based on the quantity of available labelled data (a crucial prerequisite for meaningful machine learning experiments) and the existence of feasibility studies on automatic classification.

The first challenge on perceived speaker traits: personality, likability, pathology

Whereas the first open comparative challenges in the field of paralinguistics targeted more ‘conventional’ phenomena such as emotion, age, and gender, there still exists a multiplicity of not yet covered but highly relevant speaker states and traits. In the previous 2011 challenge, we focused on medium-term speaker states, namely sleepiness and intoxication. Consequently, we now wanted to focus on long(er)-term speaker traits. In that regard, the INTERSPEECH 2012 Speaker Trait Challenge

Challenge results

One of the requirements for participation in the challenge was the submission and acceptance of a paper to the INTERSPEECH 2012 Speaker Trait Challenge, which was organised as a special event at the INTERSPEECH conference. Overall, 52 research groups registered for the challenge, and finally, 18 papers were accepted for presentation in the regular review process of the conference. All participants were encouraged to compete in all three sub-challenges. Table 9 shows how many participants took

Summary: challenge setup and results

In this INTERSPEECH 2012 Speaker Trait Challenge, we focused on perceived speaker traits, i.e., on traits that have to be annotated by humans. The recording settings were realistic with respect to specific applications: radio broadcast in the case of personality, mobile and landline phone in the case of likability, and office environment in the case of pathological speech. The types of data were spontaneous, prompted, and read speech. Annotation was carried out using rating scales.

To keep the conditions

Acknowledgement

The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007–2013) under grant agreements No. 289021 (ASC-Inclusion) and No. 338164 (ERC Starting Grant iHEARu), and from the German Research Foundation (DFG grant WE 5050/1-1). The authors would further like to thank the sponsors of the challenge, the HUMAINE Association and Telekom Innovation Laboratories, and Catherine Middag for adding phoneme alignments for the Pathology


References (122)

  • B. Rammstedt et al., Measuring personality in one minute or less: a 10-item short version of the Big Five inventory in English and German, J. Res. Personal. (2007)
  • R. Ranganath et al., Detecting friendly, flirtatious, awkward, and assertive speech in speed-dates, Comput. Speech Lang. (2013)
  • A. Rosenberg et al., Charisma perception from text and speech, Speech Commun. (2009)
  • B. Schuller et al., Recognising realistic emotions and affect in speech: state of the art and lessons learnt from the first challenge, Speech Commun. (2011)
  • B. Schuller et al., Paralinguistics in speech and language – state-of-the-art and the challenge, Comput. Speech Lang. (2013)
  • A. Afifi et al., Statistical Analysis. A Computer Oriented Approach (1979)
  • G.K. Anumanchipalli et al., Text-dependent pathological voice detection
  • A.E. Aronson et al., Clinical Voice Disorders (2009)
  • E. Aronson et al., Social Psychology (2009)
  • Y. Attabi et al., Anchor models and WCCN normalization for speaker trait classification
  • K. Audhkhasi et al., Speaker personality classification using systems based on acoustic-lexical cues and an optimal tree-structured Bayesian network
  • J. Biesanz et al., Personality coherence: moderating self-other profile agreement and profile consensus, J. Pers. Soc. Psychol. (2000)
  • W.J. Blot et al., Smoking and drinking in relation to oral and pharyngeal cancer, Cancer Res. (1988)
  • T. Bocklet et al., Voice assessment of speakers with laryngeal cancer by glottal excitation modeling based on a 2-mass model
  • T. Bocklet et al., Detection of persons with Parkinson's disease by acoustic, vocal, and prosodic analysis
  • D.H. Brown et al., Postlaryngectomy voice rehabilitation: state of the art at the millennium, World J. Surg. (2003)
  • R. Brueckner et al., Likability classification – a not so deep neural network approach
  • H. Buisman et al., The log-Gabor method: speech classification using spectrogram image analysis
  • K. Bunton et al., Listener agreement for auditory-perceptual ratings of dysarthria, J. Speech Lang. Hear. Res. (2007)
  • F. Burkhardt et al., A database of age and gender annotated telephone speech
  • F. Burkhardt et al., ‘Would you buy a car from me?’ – On the likability of telephone voices
  • N. Campbell et al., Voice quality: the 4th prosodic dimension
  • C. Chastagnol et al., Personality traits detection using a parallelized modified SFFS algorithm
  • R.P. Clapham et al., NKI-CCRT Corpus – speech intelligibility before and after advanced head and neck cancer treated with concomitant chemoradiotherapy
  • S. Cloninger, Conceptual issues in personality theory
  • N. Cummins et al., A comparison of classification paradigms for speaker likeability determination
  • N. Dahlbäck et al., Similarity is more important than expertise: accent effects in speech interfaces
  • P. Ekman et al., Relative importance of face, body, and speech in judgments of personality and affect, J. Pers. Soc. Psychol. (1980)
  • F. Eyben, Real-time speech and music classification by large audio feature space extraction (2014)
  • F. Eyben et al., Recent developments in openSMILE, the Munich open-source multimedia feature extractor
  • F. Eyben et al., openSMILE – the Munich versatile and fast open-source audio feature extractor
  • M.J. Ferguson et al., Likes and dislikes: a social cognitive perspective on attitudes
  • L. Ferrier et al., Dysarthric speakers’ intelligibility and speech characteristics in relation to computer speech recognition, Augment. Altern. Commun. (1995)
  • D. Funder, Personality, Ann. Rev. Psychol. (2001)
  • A. Gravano et al., Acoustic and prosodic correlates of social behavior
  • M. Grimm et al., Evaluation of natural emotions using self assessment manikins
  • T. Haderlein, Automatic Evaluation of Tracheoesophageal Substitute Voices (2007)
  • M. Hall et al., The WEKA data mining software: an update, SIGKDD Explorations Newsletter (2009)
  • A.E. Harrison, Speech Disorders: Causes, Treatment and Social Effects (2010)

    Björn Schuller received his diploma in 1999, his doctoral degree in 2006, and his habilitation in 2012, all in electrical engineering and information technology from TUM in Munich/Germany, where he is tenured, heading the Machine Intelligence & Signal Processing Group. He is further a Senior Lecturer at Imperial College London/U.K. In 2013 he also chaired the Institute for Sensor Systems at the University of Passau/Germany. From 2009 to 2010 he was with the CNRS-LIMSI in Orsay/France and a visiting scientist at Imperial College London. In 2012 he was with Joanneum Research in Graz/Austria, and in 2013 Visiting Professor of the Harbin Institute of Technology in Harbin/P. R. China and of the University of Geneva/Switzerland. Dr. Schuller is president of the Association for the Advancement of Affective Computing (AAAC), elected member of the IEEE SLTC, member of the ACM, IEEE, and ISCA, and has (co-)authored more than 390 peer-reviewed publications leading to more than 6100 citations – his current h-index equals 39.

    Stefan Steidl received his diploma degree in Computer Science in 2002 from Friedrich-Alexander University Erlangen-Nuremberg in Germany (FAU). In 2009, he received his doctoral degree from FAU for his work on Vocal Emotion Recognition. He is currently a member of the research staff of ICSI in Berkeley/USA and the Pattern Recognition Lab of FAU. His primary research interests are the classification of naturally occurring emotion-related states and of atypical speech (children's speech, speech of elderly people, pathological voices). He has (co-)authored more than 40 publications in journals and peer reviewed conference proceedings and been a member of the Network-of-Excellence HUMAINE.

    Anton Batliner received his M.A. degree in Scandinavian Languages and his doctoral degree in phonetics in 1978, both at LMU Munich/Germany. He has been a member of the research staff of the Institute for Pattern Recognition at FAU Erlangen/Germany since 1997. He is co-editor of one book and author/co-author of more than 200 technical articles, with a current h-index of 37 and more than 5000 citations. His research interests are all aspects of prosody and paralinguistics in speech processing. He repeatedly served as Workshop/Session (co)-organiser and has been Associate Editor for the IEEE Transactions on Affective Computing.

    Elmar Nöth is a professor of Applied Computer Science at the University of Erlangen-Nuremberg. He studied in Erlangen and at M.I.T. and received the Dipl.-Inf. (univ.) degree and the Dr.-Ing. degree from the University of Erlangen-Nuremberg in 1985 and 1990, respectively. From 1990 he was an assistant professor at the Institute for Pattern Recognition in Erlangen. Since 2008 he has been a full professor at the same institute and head of the speech group. Since 2013 he has also been an Adjunct Professor at King Abdulaziz University in Saudi Arabia. He is on the editorial board of Speech Communication and the EURASIP Journal on Audio, Speech, and Music Processing. His current interests are prosody, analysis of pathologic speech, computer-aided language learning, and emotion analysis.

    Alessandro Vinciarelli is a Lecturer at the University of Glasgow (UK) and a Senior Researcher at the Idiap Research Institute (Switzerland). His main research interest is Social Signal Processing, the domain aimed at the modelling, analysis, and synthesis of nonverbal behaviour in social interactions. He has published more than 80 works (1700+ citations, h-index 23), organised the IEEE International Conference on Social Computing, and chaired twenty international scientific events. Furthermore, he is or has been Principal Investigator of several national and international projects, including a European Network of Excellence (the SSPNet, www.sspnet.eu). Last, but not least, Alessandro is co-founder of Klewel (www.klewel.com), a knowledge management company recognised with several awards.

    Felix Burkhardt does tutoring, consulting, research, and development in the fields of human-machine dialogue systems, text-to-speech synthesis, speaker classification, ontology-based natural language modelling, and emotional human-machine interfaces. Originally an expert in speech synthesis at the Technical University of Berlin, he wrote his PhD thesis on the simulation of emotional speech by machines, recorded the Berlin acted-emotions database EmoDB, and maintains the open-source emotional speech synthesiser Emofilt. He has been working for Deutsche Telekom AG since 2000, currently for the Telekom Innovation Laboratories in Berlin.

    Rob van Son received a master's degree from Radboud University in Nijmegen and a PhD in Phonetics from the University of Amsterdam. He has worked for the Amsterdam Center for Language and Communication (ACLC, University of Amsterdam) and the NKI-AVL in Amsterdam on a number of projects in the fields of phonetics, psycholinguistics, and speech technology.

    Felix Weninger received his diploma in computer science (Dipl.-Inf. degree) from TUM in 2009. He is currently pursuing his PhD degree as a researcher in the Machine Intelligence & Signal Processing Group at TUM's Institute for Human-Machine Communication. He has (co-)authored more than 60 publications in peer-reviewed books, journals and conference proceedings covering the fields of robust audio analysis, computational paralinguistics and medical informatics. Mr. Weninger serves as a reviewer for the IEEE Transactions on Audio, Speech and Language Processing, IEEE Transactions on Affective Computing and other high-profile journals and international conferences.

    Florian Eyben obtained his diploma in Information Technology from TUM. He is currently pursuing his PhD degree in the Machine Intelligence & Signal Processing Group. His research interests include large scale hierarchical audio feature extraction and evaluation, automatic emotion recognition from the speech signal, recognition of non-linguistic vocalisations, automatic large vocabulary continuous speech recognition, statistical and context-dependent language models, and Music Information Retrieval. He has over 90 publications in peer-reviewed books, journals, and conference proceedings covering many of his areas of research, leading to over 1900 citations and an h-index of 23.

    Tobias Bocklet received his diploma degree in computer science in 2007 and his PhD in 2012 both from the University of Erlangen-Nuremberg. In 2008 he was with the speech group at SRI International working on automatic speaker identification. From 2009 to 2013 he was a member of the research staff of the Institute of Pattern Recognition at the University of Erlangen-Nuremberg and the Department of Phoniatrics and Pedaudiology of the University Clinics Erlangen. In his work he focused on the assessment of speech and language development and pathologies. Tobias is now a researcher at Intel Corporation.

    Gelareh Mohammadi is a postdoctoral researcher at the Idiap Research Institute, Martigny, Switzerland. Her work investigates the effect of nonverbal vocal behaviour on personality perception. She received her BSc in Biomedical Engineering from Amirkabir University of Technology, Iran, in 2003, her MSc in Electrical Engineering from Sharif University of Technology, Iran, in 2006, and her PhD in Electrical Engineering from EPFL in 2013. Her research interests include social signal processing, machine learning, and pattern recognition.

    Benjamin Weiss received his PhD in Linguistics in 2008 from Humboldt University of Berlin, with a dissertation on speech tempo and pronunciation. In the same year, he evaluated embodied conversational agents as a Visiting Fellow at the MARCS Auditory Laboratories, University of Western Sydney. Currently, he is working on the likability of voices and multimodal human-computer interaction at the Telekom Innovation Laboratories of TU Berlin.

    This paper has been recommended for acceptance by L. ten Bosch.
