
Melody generation based on deep ensemble learning using varying temporal context length

  • Published in: Multimedia Tools and Applications (2024)

Abstract

Music has always been a powerful medium for expressing human emotions and feelings that mere words sometimes cannot convey. As a result, generating music with machine learning and deep learning approaches has been popular for some time. The task is as challenging as it is interesting, since imitating human creativity is not easy. This paper attempts effective melody generation using sequential deep learning models, particularly Long Short-Term Memory (LSTM) networks. Previous works exhibit two principal limitations in this context. First, a significant majority of studies rely on RNN variants that cannot effectively remember long past sequences. Second, they often do not consider varying temporal context lengths during data modeling for melody generation. In this work, experiments are performed with different LSTM variants, namely Vanilla LSTM, Multi-Layer LSTM, and Bidirectional LSTM, and with different temporal context lengths for each of them, in order to find the optimal LSTM model and the optimal timestep for efficient melody generation. Moreover, ensembles of the best-performing techniques for each genre (e.g., classical, country, jazz, and pop) are implemented to see whether they can generate even better melodies than the corresponding individual models. Finally, a qualitative evaluation of the generated melodies is carried out through a survey circulated among fellow colleagues and within the ISMIR community, in which participants rated each audio sample on a scale of 1-5; this helped in assessing the quality of the generated music samples. All models have been validated on four datasets, manually prepared by genre, namely Classical, Jazz, Country, and Pop.
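
To make the modeling choices above concrete, the sketch below builds the three LSTM variants over a configurable temporal context length (timestep) and combines trained models with a simple averaging ensemble. This is a minimal illustration in Keras, not the authors' implementation; the pitch vocabulary size, layer widths, and the prediction-averaging ensemble rule are all assumptions.

    # Minimal sketch (not the paper's code): the three LSTM variants over a
    # configurable temporal context length, plus an averaging ensemble.
    # Vocabulary size, layer width, and the ensembling rule are assumptions.
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    VOCAB = 128  # assumed pitch vocabulary (MIDI note numbers)

    def build_model(variant: str, timesteps: int) -> keras.Model:
        """One of the three compared LSTM variants, with a given timestep."""
        inputs = keras.Input(shape=(timesteps, VOCAB))
        if variant == "vanilla":
            x = layers.LSTM(256)(inputs)
        elif variant == "multilayer":
            x = layers.LSTM(256, return_sequences=True)(inputs)
            x = layers.LSTM(256)(x)
        elif variant == "bidirectional":
            x = layers.Bidirectional(layers.LSTM(256))(inputs)
        else:
            raise ValueError(f"unknown variant: {variant}")
        outputs = layers.Dense(VOCAB, activation="softmax")(x)
        model = keras.Model(inputs, outputs)
        model.compile(optimizer="adam", loss="categorical_crossentropy")
        return model

    def make_windows(onehot_seq: np.ndarray, timesteps: int):
        """Slice a one-hot note sequence into (context, next-note) pairs;
        each choice of `timesteps` yields a different training set."""
        X = np.stack([onehot_seq[i:i + timesteps]
                      for i in range(len(onehot_seq) - timesteps)])
        y = onehot_seq[timesteps:]
        return X, y

    def ensemble_next_note(models, context: np.ndarray) -> int:
        """Average the next-note distributions of several trained models."""
        probs = np.mean([m.predict(context[None], verbose=0)[0]
                         for m in models], axis=0)
        return int(np.argmax(probs))

Under such a setup, a grid over variants and timesteps (e.g., 16, 32, or 64 notes of context) could be trained per genre, and the strongest per-genre models pooled into the ensemble used for generation.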


Data Availability

The source of the benchmark dataset is provided in the manuscript.

Notes

  1. https://colinraffel.com/projects/lmd/
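
Files from the Lakh MIDI Dataset linked above must be reduced to symbolic note sequences before training. One possible preprocessing step is sketched below with the music21 toolkit; the monophonic reduction (keeping only the top note of each chord) is an illustrative assumption, not necessarily the paper's exact pipeline.

    # Sketch: reduce one MIDI file to a monophonic pitch sequence.
    # music21 is one possible toolkit here; taking the top note of
    # each chord is an illustrative assumption.
    from music21 import converter, note, chord

    def midi_to_pitches(path: str) -> list:
        score = converter.parse(path)
        pitches = []
        for element in score.flatten().notes:  # notes and chords in time order
            if isinstance(element, note.Note):
                pitches.append(element.pitch.midi)
            elif isinstance(element, chord.Chord):
                pitches.append(max(p.midi for p in element.pitches))
        return pitches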


Author information

Corresponding author

Correspondence to Sarbani Roy.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Nag, B., Middya, A.I. & Roy, S. Melody generation based on deep ensemble learning using varying temporal context length. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18270-4
