Automatic stress exaggeration by prosody modification to assist language learners perceive sentence stress

Lu, Jingli; Wang, Ruili; De Silva, Liyanage C.

doi:10.1007/s10772-011-9124-2

Published: 11 November 2011

Volume 15, pages 87–98 (2012)

International Journal of Speech Technology

Jingli Lu^1,2,
Ruili Wang^1,2 &
Liyanage C. De Silva^1,3

Abstract

In this paper, we propose a set of automatic stress exaggeration methods that enlarge the differences between stressed and unstressed syllables. These methods can be used in computer-aided language learning systems to help second language learners perceive stress patterns. They are intended to support hyper-pronunciation training, which teachers commonly use in classrooms. In hyper-pronunciation training, exaggeration is used to raise learners' awareness of acoustic features and to help them apply these features effectively in their own pronunciation. Duration, pitch and intensity have been claimed to be the main acoustic features closely related to stress in English. Thus, four stress exaggeration methods are proposed in this paper: (i) duration-based stress exaggeration, (ii) pitch-based stress exaggeration, (iii) intensity-based stress exaggeration, and (iv) a combined stress exaggeration method that integrates the duration-based, pitch-based and intensity-based methods. Our perceptual experiments show that stimuli resynthesised by the proposed stress exaggeration methods help learners of English as a Second Language (ESL) perceive English stress patterns significantly better.
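
The abstract does not say how each method computes its modification, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes syllables are already segmented and labelled as stressed or unstressed, and it only computes exaggerated prosody targets; resynthesising audio from those targets is not shown. The `Syllable` type, the `exaggerate` function and all default factors are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Syllable:
    """Hypothetical per-syllable prosody summary (not from the paper)."""
    text: str
    duration_s: float    # syllable duration in seconds
    pitch_hz: float      # mean fundamental frequency in Hz
    intensity_db: float  # mean intensity in dB
    stressed: bool


def exaggerate(
    syllables: List[Syllable],
    duration_factor: float = 1.0,    # assumed: multiply stressed durations
    pitch_semitones: float = 0.0,    # assumed: raise stressed pitch by N semitones
    intensity_gain_db: float = 0.0,  # assumed: add N dB to stressed syllables
) -> List[Syllable]:
    """Return new prosody targets with stressed syllables exaggerated.

    Unstressed syllables are returned unchanged, so the gap between
    stressed and unstressed syllables grows on each modified dimension.
    """
    result = []
    for syl in syllables:
        if syl.stressed:
            syl = replace(
                syl,
                duration_s=syl.duration_s * duration_factor,
                # A shift of n semitones multiplies frequency by 2**(n/12).
                pitch_hz=syl.pitch_hz * 2 ** (pitch_semitones / 12),
                intensity_db=syl.intensity_db + intensity_gain_db,
            )
        result.append(syl)
    return result


if __name__ == "__main__":
    utterance = [
        Syllable("to", 0.10, 120.0, 60.0, stressed=False),
        Syllable("DAY", 0.20, 150.0, 65.0, stressed=True),
    ]
    # (i) duration-based, (ii) pitch-based, (iii) intensity-based, (iv) combined
    duration_only = exaggerate(utterance, duration_factor=1.5)
    pitch_only = exaggerate(utterance, pitch_semitones=3.0)
    intensity_only = exaggerate(utterance, intensity_gain_db=6.0)
    combined = exaggerate(
        utterance, duration_factor=1.5, pitch_semitones=3.0, intensity_gain_db=6.0
    )
    for name, variant in [
        ("duration", duration_only),
        ("pitch", pitch_only),
        ("intensity", intensity_only),
        ("combined", combined),
    ]:
        print(name, variant[1])
```

Leaving a parameter at its default keeps that dimension untouched, so the three single-feature methods and the combined method in the abstract map onto one function with different arguments. The specific factors (1.5, 3 semitones, 6 dB) are placeholders, not values reported by the authors.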

Author information

Authors and Affiliations

Massey University, Palmerston North, New Zealand
Jingli Lu, Ruili Wang & Liyanage C. De Silva
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Jingli Lu & Ruili Wang
Faculty of Science, University of Brunei Darussalam, Bandar Seri Begawan, Brunei Darussalam
Liyanage C. De Silva

Corresponding author

Correspondence to Ruili Wang.

About this article

Cite this article

Lu, J., Wang, R. & De Silva, L.C. Automatic stress exaggeration by prosody modification to assist language learners perceive sentence stress. Int J Speech Technol 15, 87–98 (2012). https://doi.org/10.1007/s10772-011-9124-2

Received: 28 July 2011
Accepted: 20 October 2011
Published: 11 November 2011
Issue Date: June 2012
DOI: https://doi.org/10.1007/s10772-011-9124-2
