Supporting Collaborative Transcription of Recorded Speech with a 3D Game Interface

Luz, Saturnino; Masoodian, Masood; Rogers, Bill

doi:10.1007/978-3-642-15384-6_42

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6279)

Abstract

The amount of speech data available on-line and in institutional repositories, including recordings of lectures, “podcasts”, news broadcasts, etc., has increased greatly in the past few years. Effective access to such data demands transcription. While current automatic speech recognition technology can help with this task, the results of automatic transcription alone are often unsatisfactory. Recently, approaches that combine automatic speech recognition and collaborative transcription have been proposed, in which geographically distributed users edit and correct automatically generated transcripts. These approaches, however, are based on traditional text-editor interfaces, which provide little satisfaction to the users who perform these time-consuming tasks, most often on a voluntary basis. We present a 3D “transcription game” interface that aims to improve the user experience of the transcription task and, ultimately, to create an extra incentive for users to engage in collaborative transcription in the first place.

Author information

Authors and Affiliations

School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland
Saturnino Luz
Department of Computer Science, The University of Waikato, Hamilton, New Zealand
Masood Masoodian & Bill Rogers

Editor information

Editors and Affiliations

School of Engineering, Cardiff University, The Parade, CF24 3AA, Cardiff, UK
Rossitza Setchi
Dept. of Computer Science and Software Engineering, University of Portsmouth, Buckingham Building, Lion Terrace, PO1 3HE, Portsmouth, UK
Ivan Jordanov
KES International, 145-157 St. John Street, EC1V 4PY, London, UK
Robert J. Howlett
School of Electrical and Information Engineering, University of South Australia, Adelaide, Mawson Lakes Campus, 5095, SA, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Luz, S., Masoodian, M., Rogers, B. (2010). Supporting Collaborative Transcription of Recorded Speech with a 3D Game Interface. In: Setchi, R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2010. Lecture Notes in Computer Science, vol 6279. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15384-6_42

DOI: https://doi.org/10.1007/978-3-642-15384-6_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15383-9
Online ISBN: 978-3-642-15384-6
eBook Packages: Computer Science, Computer Science (R0)
