
Melody generation based on deep ensemble learning using varying temporal context length

  • Published in: Multimedia Tools and Applications (2024)

Abstract

Music has always been a powerful medium for expressing human emotions and feelings that mere words sometimes cannot convey. As a result, generating music with machine learning and deep learning approaches has been popular for some time. The task is as challenging as it is interesting, since imitating human creativity is not easy. This paper attempts effective melody generation using sequential deep learning models, particularly Long Short-Term Memory (LSTM) networks. Previous works exhibit two principal limitations in this context. First, a significant majority of studies rely on RNN variants that cannot effectively remember long past sequences. Second, they often do not consider varying temporal context lengths during data modeling for melody generation. In this work, experiments are performed with different LSTM variants, namely Vanilla LSTM, Multi-Layer LSTM, and Bidirectional LSTM, and with different temporal context lengths for each of them, in order to find the optimal LSTM model and the optimal timestep for efficient melody generation. Moreover, ensembles of the best-performing techniques for each genre (e.g., classical, country, jazz, and pop) are implemented to see whether they can generate even better melodies than the corresponding individual models. Finally, a qualitative evaluation of the generated melodies is carried out through a survey circulated among fellow colleagues and within the ISMIR community, in which participants rated each audio sample on a scale of 1-5; this helped in assessing the quality of the generated music samples. All models have been validated on four datasets, manually prepared by genre, namely Classical, Jazz, Country, and Pop.
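
To make the modeling choices above concrete, the sketch below builds the three LSTM variants over a configurable temporal context length (timestep) and combines trained models with a simple averaging ensemble. This is a minimal illustration in Keras, not the authors' implementation; the pitch vocabulary size, layer widths, and the prediction-averaging ensemble rule are all assumptions.

    # Minimal sketch (not the paper's code): the three LSTM variants over a
    # configurable temporal context length, plus an averaging ensemble.
    # Vocabulary size, layer width, and the ensembling rule are assumptions.
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    VOCAB = 128  # assumed pitch vocabulary (MIDI note numbers)

    def build_model(variant: str, timesteps: int) -> keras.Model:
        """One of the three compared LSTM variants, with a given timestep."""
        inputs = keras.Input(shape=(timesteps, VOCAB))
        if variant == "vanilla":
            x = layers.LSTM(256)(inputs)
        elif variant == "multilayer":
            x = layers.LSTM(256, return_sequences=True)(inputs)
            x = layers.LSTM(256)(x)
        elif variant == "bidirectional":
            x = layers.Bidirectional(layers.LSTM(256))(inputs)
        else:
            raise ValueError(f"unknown variant: {variant}")
        outputs = layers.Dense(VOCAB, activation="softmax")(x)
        model = keras.Model(inputs, outputs)
        model.compile(optimizer="adam", loss="categorical_crossentropy")
        return model

    def make_windows(onehot_seq: np.ndarray, timesteps: int):
        """Slice a one-hot note sequence into (context, next-note) pairs;
        each choice of `timesteps` yields a different training set."""
        X = np.stack([onehot_seq[i:i + timesteps]
                      for i in range(len(onehot_seq) - timesteps)])
        y = onehot_seq[timesteps:]
        return X, y

    def ensemble_next_note(models, context: np.ndarray) -> int:
        """Average the next-note distributions of several trained models."""
        probs = np.mean([m.predict(context[None], verbose=0)[0]
                         for m in models], axis=0)
        return int(np.argmax(probs))

Under such a setup, a grid over variants and timesteps (e.g., 16, 32, or 64 notes of context) could be trained per genre, and the strongest per-genre models pooled into the ensemble used for generation.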


Data Availability

The source of the benchmark dataset is provided in the manuscript.

Notes

  1. https://colinraffel.com/projects/lmd/
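
Files from the Lakh MIDI Dataset linked above must be reduced to symbolic note sequences before training. One possible preprocessing step is sketched below with the music21 toolkit; the monophonic reduction (keeping only the top note of each chord) is an illustrative assumption, not necessarily the paper's exact pipeline.

    # Sketch: reduce one MIDI file to a monophonic pitch sequence.
    # music21 is one possible toolkit here; taking the top note of
    # each chord is an illustrative assumption.
    from music21 import converter, note, chord

    def midi_to_pitches(path: str) -> list:
        score = converter.parse(path)
        pitches = []
        for element in score.flatten().notes:  # notes and chords in time order
            if isinstance(element, note.Note):
                pitches.append(element.pitch.midi)
            elif isinstance(element, chord.Chord):
                pitches.append(max(p.midi for p in element.pitches))
        return pitches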


Author information

Corresponding author

Correspondence to Sarbani Roy.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Nag, B., Middya, A.I. & Roy, S. Melody generation based on deep ensemble learning using varying temporal context length. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18270-4
