Abstract
Trained musicians intuitively produce expressive variations that add to their audience’s enjoyment. However, there is little quantitative information about the kinds of strategies used in different musical contexts. Since the literal synthesis of notes from a score is bland and unappealing, there is an opportunity for learning systems that can automatically produce compelling expressive variations. The ESP (Expressive Synthetic Performance) system generates expressive renditions using hierarchical hidden Markov models trained on the stylistic variations employed by human performers. Furthermore, the generative models learned by the ESP system provide insight into a number of musicological issues related to expressive performance.
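The abstract describes sampling expressive variations from a two-level generative model: phrase-level hidden states govern note-level hidden states, which emit per-note deviations. The toy sketch below illustrates that hierarchical-HMM idea only; all state names, transition probabilities, and deviation values are invented for illustration and are not the ESP system's learned parameters.

```python
import random

# Toy two-level generative model in the spirit of a hierarchical HMM.
# All states, probabilities, and deviations are made-up illustrations,
# NOT the ESP system's trained model.

# Top level: phrase-shaping states.
PHRASE_TRANS = {"build": {"build": 0.6, "relax": 0.4},
                "relax": {"build": 0.3, "relax": 0.7}}

# Bottom level: note states, conditioned on the active phrase state.
NOTE_TRANS = {
    "build": {"steady": {"steady": 0.5, "push": 0.5},
              "push":   {"steady": 0.3, "push": 0.7}},
    "relax": {"steady": {"steady": 0.7, "pull": 0.3},
              "pull":   {"steady": 0.4, "pull": 0.6}},
}

# Each note state emits a tempo deviation (fraction of nominal duration).
EMIT = {"steady": 0.0, "push": -0.05, "pull": 0.08}

def sample(dist):
    """Draw a key from a dict mapping keys to probabilities."""
    r, acc = random.random(), 0.0
    for k, p in dist.items():
        acc += p
        if r < acc:
            return k
    return k  # guard against floating-point shortfall

def render(n_phrases=2, notes_per_phrase=4, seed=0):
    """Sample a sequence of per-note tempo deviations."""
    random.seed(seed)
    phrase, devs = "build", []
    for _ in range(n_phrases):
        phrase = sample(PHRASE_TRANS[phrase])
        note = "steady"  # bottom level re-enters at each new phrase
        for _ in range(notes_per_phrase):
            note = sample(NOTE_TRANS[phrase][note])
            devs.append(EMIT[note])
    return devs

print(render())
```

In a trained system the transition and emission parameters would be estimated from human performances (e.g., via EM) rather than written by hand; the hierarchy lets phrase-scale shaping and note-scale timing be modeled jointly.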
Editor: Gerhard Widmer
Grindlay, G., Helmbold, D. Modeling, analyzing, and synthesizing expressive piano performance with graphical models. Mach Learn 65, 361–387 (2006). https://doi.org/10.1007/s10994-006-8751-3