ABSTRACT
Audio stories are an engaging form of communication that combine speech and music into compelling narratives. Existing audio editing tools force story producers to manipulate speech and music tracks via tedious, low-level waveform editing. In contrast, we present a set of tools that analyze the audio content of the speech and music and thereby allow producers to work at a much higher level. Our tools address several challenges in creating audio stories, including (1) navigating and editing speech, (2) selecting appropriate music for the score, and (3) editing the music to complement the speech. Key features include a transcript-based speech editing tool that automatically propagates edits in the transcript text to the corresponding speech track; a music browser that supports searching based on emotion, tempo, key, or timbral similarity to other songs; and music retargeting tools that make it easy to combine sections of music with the speech. We have used our tools to create audio stories from a variety of raw speech sources, including scripted narratives, interviews, and political speeches. Informal feedback from first-time users suggests that our tools are easy to learn and greatly facilitate the process of editing raw footage into a final story.
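The abstract does not say how transcript edits are propagated to the speech track. As a rough illustration only, the sketch below assumes that each transcript word has already been aligned to a start and end time in the recording, so that deleting words in the text reduces to computing which time ranges of the audio to keep. The names `AlignedWord` and `kept_segments` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class AlignedWord:
    """A transcript word with its assumed time span in the speech track."""
    text: str
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording


def kept_segments(words: List[AlignedWord], deleted: Set[int]) -> List[Tuple[float, float]]:
    """Return the audio time ranges that survive a transcript edit.

    `deleted` holds the indices of words removed from the transcript.
    Runs of consecutive kept words are merged into one (start, end) range.
    """
    segments: List[Tuple[float, float]] = []
    for i, word in enumerate(words):
        if i in deleted:
            continue
        if i > 0 and (i - 1) not in deleted:
            # The previous word was kept, so extend the current range.
            segments[-1] = (segments[-1][0], word.end)
        else:
            # First word, or the previous word was deleted: start a new range.
            segments.append((word.start, word.end))
    return segments


# Hypothetical alignment for a four-word phrase; the filler word is deleted.
words = [
    AlignedWord("the", 0.0, 0.2),
    AlignedWord("um", 0.2, 0.5),
    AlignedWord("story", 0.5, 1.0),
    AlignedWord("begins", 1.0, 1.6),
]
print(kept_segments(words, deleted={1}))
```

A real editor would also need to smooth the joins between kept ranges (for example with short crossfades) and to handle reordered or inserted words, which this sketch leaves out.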
Index Terms
- Content-based tools for editing audio stories