research-article

Open Access

Content-based tools for editing audio stories

Authors:
Steve Rubin, University of California, Berkeley, Berkeley, USA
Floraine Berthouzoz, University of California, Berkeley, Berkeley, USA
Gautham J. Mysore, Adobe Research, San Francisco, USA
Wilmot Li, Adobe Systems, San Francisco, USA
Maneesh Agrawala, University of California, Berkeley, Berkeley, USA

UIST '13: Proceedings of the 26th annual ACM symposium on User interface software and technology, October 2013, pages 113–122. https://doi.org/10.1145/2501988.2501993

Published: 8 October 2013

ABSTRACT

Audio stories are an engaging form of communication that combine speech and music into compelling narratives. Existing audio editing tools force story producers to manipulate speech and music tracks via tedious, low-level waveform editing. In contrast, we present a set of tools that analyze the audio content of the speech and music and thereby allow producers to work at a much higher level. Our tools address several challenges in creating audio stories, including (1) navigating and editing speech, (2) selecting appropriate music for the score, and (3) editing the music to complement the speech. Key features include a transcript-based speech editing tool that automatically propagates edits in the transcript text to the corresponding speech track; a music browser that supports searching based on emotion, tempo, key, or timbral similarity to other songs; and music retargeting tools that make it easy to combine sections of music with the speech. We have used our tools to create audio stories from a variety of raw speech sources, including scripted narratives, interviews, and political speeches. Informal feedback from first-time users suggests that our tools are easy to learn and greatly facilitate the process of editing raw footage into a final story.
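The abstract describes a transcript-based editor that propagates text edits to the speech track. The sketch below shows one simple way such propagation could work, assuming the transcript has already been aligned to the audio so that every word carries start and end times (how that alignment is obtained is not described here). `AlignedWord`, `kept_segments`, and `apply_edit` are hypothetical names, and the audio is modeled as a plain list of samples. This is not the authors' implementation: it makes hard cuts exactly at word boundaries, with no crossfades and no snapping of cuts to pauses, which a production tool would likely need to avoid audible artifacts.

```python
from dataclasses import dataclass
from typing import AbstractSet, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class AlignedWord:
    """A transcript word and its time span in the speech track, in seconds.

    Hypothetical structure: assumes word timings come from a prior alignment step.
    """
    text: str
    start: float
    end: float


def kept_segments(words: Sequence[AlignedWord],
                  deleted: AbstractSet[int]) -> List[Tuple[float, float]]:
    """Return the time spans of audio that survive deleting words by index.

    Consecutive kept words are merged into one span, so the pauses between
    them are preserved; each deleted word (plus the gaps around it) is cut.
    """
    segments: List[Tuple[float, float]] = []
    run_start: Optional[float] = None
    run_end: Optional[float] = None
    for i, word in enumerate(words):
        if i in deleted:
            # Close the current run of kept words, if any.
            if run_start is not None:
                segments.append((run_start, run_end))
                run_start = None
            continue
        if run_start is None:
            run_start = word.start
        run_end = word.end
    if run_start is not None:
        segments.append((run_start, run_end))
    return segments


def apply_edit(samples: Sequence[float], sample_rate: int,
               words: Sequence[AlignedWord],
               deleted: AbstractSet[int]) -> List[float]:
    """Splice the kept spans together (naive hard cuts, no crossfades)."""
    out: List[float] = []
    for start, end in kept_segments(words, deleted):
        first = round(start * sample_rate)
        last = round(end * sample_rate)
        out.extend(samples[first:last])
    return out


# Illustrative, made-up word timings.
words = [
    AlignedWord("so", 0.0, 0.5),
    AlignedWord("um", 0.6, 1.0),
    AlignedWord("we", 1.2, 1.5),
    AlignedWord("began", 1.5, 2.4),
]
# Deleting "um" (index 1) from the transcript removes 0.5-1.2 s of audio.
print(kept_segments(words, {1}))  # [(0.0, 0.5), (1.2, 2.4)]
```

Merging consecutive kept words into a single span, rather than copying each word separately, keeps the natural pauses between surviving words, so only the deleted material and its surrounding gaps disappear from the speech track.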

Supplemental Material

uist127.mov (MOV video, 33.6 MB)


Index Terms

Content-based tools for editing audio stories
1. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction paradigms
      1. Graphical user interfaces

Recommendations

Generating emotionally relevant musical scores for audio stories
UIST '14: Proceedings of the 27th annual ACM symposium on User interface software and technology

Highly-produced audio stories often include musical scores that reflect the emotions of the speech. Yet, creating effective musical scores requires deep expertise in sound production and is time-consuming even for experts. We present a system and ...
UnderScore: musical underlays for audio stories
UIST '12: Proceedings of the 25th annual ACM symposium on User interface software and technology

Audio producers often use musical underlays to emphasize key moments in spoken content and give listeners time to reflect on what was said. Yet, creating such underlays is time-consuming as producers must carefully (1) mark an emphasis point in the ...
Dynamic Authoring of Audio with Linked Scripts
UIST '16: Proceedings of the 29th Annual Symposium on User Interface Software and Technology

Speech recordings are central to modern media from podcasts to audio books to e-lectures and voice-overs. Authoring these recordings involves an iterative back and forth process between script writing/editing and audio recording/editing. Yet, most ...


Published in
UIST '13: Proceedings of the 26th annual ACM symposium on User interface software and technology
October 2013
558 pages
ISBN:9781450322683
DOI:10.1145/2501988
General Chairs:
Shahram Izadi, Microsoft Research, UK
Aaron Quigley, University of St Andrews, UK
Program Chairs:
Ivan Poupyrev, Disney Research, USA
Takeo Igarashi, The University of Tokyo, Japan
Copyright © 2013 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
audio editing
music browsing
music retargeting
storytelling
transcript-based editing

Acceptance Rates
UIST '13 paper acceptance rate: 62 of 317 submissions (20%). Overall acceptance rate: 842 of 3,967 submissions (21%).
Article Metrics
- 47 total citations
- 1,607 total downloads
- 198 downloads (last 12 months)
- 19 downloads (last 6 weeks)