Abstract
Algorithmic music composition is the practice of composing musical pieces with minimal to no human intervention. Recurrent neural networks are traditionally applied to many sequence-to-sequence prediction tasks, including successful implementations of music composition, but their standard supervised learning approach, which maps inputs to outputs, tends to produce a limited variety of notes. This makes them potentially unsuitable for creative tasks such as music generation. Generative adversarial networks, by contrast, learn the generative distribution of the data and therefore produce more varied samples. This work implements and compares adversarial and non-adversarial training of recurrent neural network music composers on MIDI data. The resulting music samples are evaluated by human listeners, and their preferences are recorded. The evaluation indicates that adversarial training produces more aesthetically pleasing music.
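To make the contrast concrete, the following is a minimal illustrative sketch, not the authors' implementation, of the two training regimes in PyTorch. The model sizes, optimizer settings, and the random stand-in for MIDI note data are all assumptions for illustration: the supervised composer takes a maximum-likelihood next-note-prediction step with teacher forcing, while the adversarial pair follows the WGAN recipe of a weight-clipped critic and a generator updated to raise the critic's score on its own soft, differentiable output.

```python
# Illustrative sketch only (assumed architecture and hyperparameters).
import torch
import torch.nn as nn

SEQ_LEN, N_NOTES, HIDDEN = 32, 128, 64  # assumed toy dimensions


class LSTMComposer(nn.Module):
    """LSTM mapping a note sequence to logits over the next note."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_NOTES, HIDDEN)
        self.lstm = nn.LSTM(HIDDEN, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, N_NOTES)

    def forward(self, notes):           # notes: (batch, seq) note numbers
        h, _ = self.lstm(self.embed(notes))
        return self.head(h)             # (batch, seq, N_NOTES) logits


class Critic(nn.Module):
    """LSTM critic scoring a (one-hot or soft) note sequence."""

    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(N_NOTES, HIDDEN, batch_first=True)
        self.score = nn.Linear(HIDDEN, 1)

    def forward(self, seq):             # seq: (batch, seq, N_NOTES)
        h, _ = self.lstm(seq)
        return self.score(h[:, -1]).mean()


real = torch.randint(0, N_NOTES, (8, SEQ_LEN))  # toy stand-in for MIDI notes

# 1) Non-adversarial (supervised) step: maximum-likelihood next-note
#    prediction with teacher forcing.
composer = LSTMComposer()
opt = torch.optim.Adam(composer.parameters(), lr=1e-3)
logits = composer(real[:, :-1])
nll = nn.functional.cross_entropy(
    logits.reshape(-1, N_NOTES), real[:, 1:].reshape(-1))
nll.backward()
opt.step()

# 2) Adversarial (WGAN-style) step: the critic learns to score real
#    sequences above generated ones; the generator is updated to raise
#    the critic's score on its output.
generator, critic = LSTMComposer(), Critic()
g_opt = torch.optim.RMSprop(generator.parameters(), lr=5e-5)
c_opt = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

fake = torch.softmax(generator(real[:, :-1]), dim=-1)      # soft one-hots
real_onehot = nn.functional.one_hot(real[:, 1:], N_NOTES).float()

c_loss = critic(fake.detach()) - critic(real_onehot)       # critic update
c_loss.backward()
c_opt.step()
for p in critic.parameters():                              # WGAN clipping
    p.data.clamp_(-0.01, 0.01)

c_opt.zero_grad()
g_loss = -critic(fake)                                     # generator update
g_loss.backward()
g_opt.step()
```

In practice the critic would take several updates per generator update, and discrete note sampling replaces the soft outputs at generation time; the sketch only shows the shape of each objective.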
A Appendix: Listener Comments
- “I couldn’t get a feel of where the encoder-decoder song is going, the WGAN sample has a nice classical feel to it.”
- “They both sound musical but the sound quality is bad.”
- “I like the pace of the WGAN sample.”
- “The encoder-decoder sample had too many silent spaces, the quiet spots emphasize the loud spots, which sounded like rambling.”
- “The WGAN sample is a little chaotic but generally creates a good atmosphere, the encoder-decoder song has a good rhythm but no actual melody.”
- “I’d say the encoder-decoder song is better, has more rhythm, the WGAN sample sounds too fast and just noise to me.”
- “The WGAN sample sounds like me when I am under dissertation stress, terrible.”
- “The WGAN sample sounds more creative, the encoder-decoder sample wins on rhythm although it takes so long to get there.”
- “What’s important to me is they all sound musical.”
- “I don’t listen to this genre so my view may be way off.”
- “Most of the songs sound similar to me.”
- “Wow, can’t believe the WGAN sample was generated by a machine, though it’s funny at the end LOL.”
- “Can’t you get them to generate longer songs? Perhaps with words? I like the encoder-decoder sample.”
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Mots’oehli, M., Bosman, A.S., De Villiers, J.P.: Comparison of adversarial and non-adversarial LSTM music generative models. In: Arai, K. (ed.) Intelligent Computing. SAI 2023. Lecture Notes in Networks and Systems, vol. 711. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-37717-4_28