ABSTRACT
In this work, we provide an empirical analysis of differences in word use between genders in telephone conversations, complementing the considerable body of sociolinguistic work on linguistic differences between genders. Experiments are performed on a large speech corpus of roughly 12,000 conversations. We employ machine learning techniques to automatically categorize the gender of each speaker given only the transcript of his/her speech, achieving 92% accuracy. An analysis of the most characteristic words for each gender is also presented. Experiments reveal that the gender of one conversation side influences the word use of the other side. A surprising result is that we were able to classify male-only vs. female-only conversations with almost perfect accuracy.
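The abstract does not name the classifier behind the 92% figure or the measure used to rank characteristic words. As a hedged illustration of the general setup (bag-of-words features from a transcript, one gender label per conversation side), the sketch below uses a multinomial Naive Bayes model with add-one smoothing and ranks words by a smoothed log-probability ratio. Both are standard text-classification techniques that are assumed here, not confirmed as the authors' method. The function names `train_nb`, `predict_nb`, and `characteristic_words` are hypothetical, and the toy tokens (`tok_a`, `tok_b`, ...) are invented placeholders that say nothing about real male or female word use.

```python
import math
from collections import Counter

# Illustrative sketch only: multinomial Naive Bayes is an ASSUMED stand-in,
# not necessarily the classifier used in the paper.


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model over bag-of-words counts.

    docs: list of token lists (one per conversation side).
    labels: class labels, parallel to docs.
    alpha: additive (Laplace) smoothing constant.
    """
    classes = sorted(set(labels))
    vocab = {tok for doc in docs for tok in doc}
    doc_counts = Counter(labels)
    word_counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
    # Class priors estimated from the fraction of training sides per label.
    log_prior = {c: math.log(doc_counts[c] / len(docs)) for c in classes}
    # Smoothed per-class word probabilities P(w | c).
    log_lik = {}
    for c in classes:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
    return classes, vocab, log_prior, log_lik


def predict_nb(model, doc):
    """Return the class with the highest posterior log-score for one transcript."""
    classes, vocab, log_prior, log_lik = model
    scores = {}
    for c in classes:
        # Tokens never seen in training are ignored.
        scores[c] = log_prior[c] + sum(log_lik[c][w] for w in doc if w in vocab)
    return max(scores, key=scores.get)


def characteristic_words(model, cls, other, k=3):
    """Rank words by log P(w | cls) - log P(w | other), highest first."""
    _, vocab, _, log_lik = model
    ratio = {w: log_lik[cls][w] - log_lik[other][w] for w in vocab}
    return sorted(ratio, key=lambda w: (-ratio[w], w))[:k]


# Hypothetical toy data: placeholder tokens, not real corpus statistics.
docs = [
    ["tok_a", "tok_a", "tok_b"],
    ["tok_a", "tok_c"],
    ["tok_d", "tok_d", "tok_b"],
    ["tok_d", "tok_e"],
]
labels = ["female", "female", "male", "male"]
model = train_nb(docs, labels)

print(predict_nb(model, ["tok_a", "tok_b"]))  # female
print(predict_nb(model, ["tok_d", "tok_e"]))  # male
print(characteristic_words(model, "female", "male", k=2))  # ['tok_a', 'tok_c']
```

Naive Bayes is shown because it is simple and needs only word counts, which matches the "transcript only" condition in the abstract; a discriminative classifier such as an SVM over the same features would be an equally plausible choice.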
- A quantitative analysis of lexical differences between genders in telephone conversations