DOI: 10.1145/2501988.2501993

Research article · Open Access

Content-based tools for editing audio stories

Published: 08 October 2013

ABSTRACT

Audio stories are an engaging form of communication that combine speech and music into compelling narratives. Existing audio editing tools force story producers to manipulate speech and music tracks via tedious, low-level waveform editing. In contrast, we present a set of tools that analyze the audio content of the speech and music and thereby allow producers to work at a much higher level. Our tools address several challenges in creating audio stories, including (1) navigating and editing speech, (2) selecting appropriate music for the score, and (3) editing the music to complement the speech. Key features include a transcript-based speech editing tool that automatically propagates edits in the transcript text to the corresponding speech track; a music browser that supports searching based on emotion, tempo, key, or timbral similarity to other songs; and music retargeting tools that make it easy to combine sections of music with the speech. We have used our tools to create audio stories from a variety of raw speech sources, including scripted narratives, interviews, and political speeches. Informal feedback from first-time users suggests that our tools are easy to learn and greatly facilitate the process of editing raw footage into a final story.
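The transcript-based editing idea described in the abstract can be illustrated with a small sketch. This is not the paper's implementation; the data structures and the greedy matching strategy here are illustrative assumptions. It presumes a word-level forced alignment (each transcript word paired with its start and end time in the speech track), so that deleting words from the transcript text directly yields the audio regions to cut:

```python
# Illustrative sketch of transcript-driven speech editing (hypothetical,
# not the authors' implementation). Assumes a word-level forced alignment:
# each transcript word is paired with its (start, end) time in seconds.

from dataclasses import dataclass

@dataclass
class AlignedWord:
    text: str
    start: float  # seconds into the speech track
    end: float

def spans_to_keep(alignment, edited_words):
    """Match the edited transcript against the alignment (greedy, in order)
    and return the time spans of the words that survive the edit."""
    spans = []
    i = 0
    for word in edited_words:
        # Advance past aligned words that were deleted from the transcript.
        while i < len(alignment) and alignment[i].text != word:
            i += 1
        if i < len(alignment):
            spans.append((alignment[i].start, alignment[i].end))
            i += 1
    return spans

def merge_adjacent(spans, gap=0.05):
    """Merge spans separated by less than `gap` seconds so contiguous
    speech is rendered as one region rather than word by word."""
    merged = []
    for s, e in spans:
        if merged and s - merged[-1][1] <= gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged

alignment = [
    AlignedWord("the", 0.00, 0.15), AlignedWord("quick", 0.15, 0.50),
    AlignedWord("um", 0.50, 0.90), AlignedWord("brown", 0.90, 1.30),
    AlignedWord("fox", 1.30, 1.70),
]
# The producer deletes the filler "um" in the transcript text:
edited = ["the", "quick", "brown", "fox"]
print(merge_adjacent(spans_to_keep(alignment, edited)))
# → [(0.0, 0.5), (0.9, 1.7)]  (speech before and after the cut)
```

The final step, splicing the kept regions back into a waveform, would be handled by the audio engine; the point of the sketch is that the edit itself is expressed entirely in transcript terms.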


Supplemental Material

uist127.mov (MOV, 33.6 MB)


Published in

UIST '13: Proceedings of the 26th Annual ACM Symposium on User Interface Software and Technology
October 2013, 558 pages
ISBN: 9781450322683
DOI: 10.1145/2501988

Copyright © 2013 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 8 October 2013

Qualifiers

research-article

Acceptance Rates

UIST '13 paper acceptance rate: 62 of 317 submissions (20%). Overall acceptance rate: 842 of 3,967 submissions (21%).
