Supporting Collaborative Transcription of Recorded Speech with a 3D Game Interface

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6279)

Abstract

The amount of speech data available on-line and in institutional repositories, including recordings of lectures, “podcasts”, news broadcasts, etc., has increased greatly in the past few years. Effective access to such data demands transcription. While current automatic speech recognition technology can help with this task, the results of automatic transcription alone are often unsatisfactory. Recently, approaches that combine automatic speech recognition and collaborative transcription have been proposed, in which geographically distributed users edit and correct automatically generated transcripts. These approaches, however, are based on traditional text-editor interfaces which provide little satisfaction to the users who perform these time-consuming tasks, most often on a voluntary basis. We present a 3D “transcription game” interface which aims at improving the user experience of the transcription task and, ultimately, creating an extra incentive for users to engage in a process of collaborative transcription in the first place.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Luz, S., Masoodian, M., Rogers, B. (2010). Supporting Collaborative Transcription of Recorded Speech with a 3D Game Interface. In: Setchi, R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2010. Lecture Notes in Computer Science, vol 6279. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15384-6_42

  • DOI: https://doi.org/10.1007/978-3-642-15384-6_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15383-9

  • Online ISBN: 978-3-642-15384-6

  • eBook Packages: Computer Science, Computer Science (R0)
