A post-processing of onset detection based on verification with neural network

  • Conference paper
Proceedings of the 7th Conference on Sound and Music Technology (CSMT)

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 635)

Abstract

Onset detection, a primary task in music transcription, aims to find the start time of each note and is directly associated with beat perception in the auditory system. Researchers have attempted to find a data representation that yields a universal onset function, but onset detection does not generalize to all cases. For example, onset detection for solo singing performs worse than for solo instruments in the MIREX challenge every year. This paper presents a post-processing step for singing onset detection whose sole purpose is to reduce falsely detected onsets. In this step, the system checks the candidate onsets picked from the local maxima of the onset function and uses a neural network model to classify the features at each candidate as onset or non-onset, rather than designing a more complicated onset function. The performance of the network is closely related to the overall onset detection performance. On a public dataset for singing transcription research, the pipeline with post-processing achieves higher performance than the standard and novelty methods on the onset task, because it reduces the false alarms produced by the feature-based methods. The data-driven approach offers an effective way to eliminate spurious peaks, which can support further research on singing transcription and may reach state-of-the-art performance in singing onset detection.
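The two stages described in the abstract (peak picking on an onset function, then verification of each candidate) can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the function names, the `context` and `min_prob` parameters, and the toy `energy_rise_classifier` are all assumptions introduced here, and the toy classifier merely stands in for the trained neural network, whose architecture and input features are not given in the abstract.

```python
import numpy as np


def pick_peaks(onset_function, threshold=0.0):
    """Return indices of strict local maxima of the onset function above a threshold."""
    x = np.asarray(onset_function, dtype=float)
    if x.size < 3:
        return np.array([], dtype=int)
    mid = x[1:-1]
    is_peak = (mid > x[:-2]) & (mid > x[2:]) & (mid > threshold)
    return np.flatnonzero(is_peak) + 1


def verify_onsets(features, candidates, classifier, context=2, min_prob=0.5):
    """Keep only the candidate frames that the classifier accepts as onsets.

    features:   array of shape (n_frames, n_bins), e.g. a spectrogram (assumed).
    classifier: callable mapping a (2 * context + 1, n_bins) patch to a
                probability; a hypothetical stand-in for the neural network.
    """
    features = np.asarray(features, dtype=float)
    # Pad in time so that every candidate has a full context window.
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    kept = []
    for idx in candidates:
        patch = padded[idx : idx + 2 * context + 1]
        if classifier(patch) >= min_prob:
            kept.append(int(idx))
    return kept


def energy_rise_classifier(patch):
    """Toy verifier (assumption): accept if mean energy rises across the patch."""
    half = patch.shape[0] // 2
    before = patch[:half].mean()
    after = patch[half + 1 :].mean()
    return 1.0 if after > before else 0.0


# Tiny synthetic example: two peaks in the onset function, one real onset.
onset_function = [0, 1, 0, 0, 1, 0, 0]
features = np.array([0, 0, 1, 1, 1, 0, 0], dtype=float).reshape(-1, 1)

candidates = pick_peaks(onset_function)
verified = verify_onsets(features, candidates, energy_rise_classifier)
```

In this toy run both peaks are picked as candidates, but only the one followed by a rise in energy survives verification. The second stage can only remove candidates, never add them, which matches the abstract's statement that the post-processing solely reduces false detections.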

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Lin, M., Feng, Y. (2020). A post-processing of onset detection based on verification with neural network. In: Li, H., Li, S., Ma, L., Fang, C., Zhu, Y. (eds) Proceedings of the 7th Conference on Sound and Music Technology (CSMT). Lecture Notes in Electrical Engineering, vol 635. Springer, Singapore. https://doi.org/10.1007/978-981-15-2756-2_6

  • DOI: https://doi.org/10.1007/978-981-15-2756-2_6

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-2755-5

  • Online ISBN: 978-981-15-2756-2

  • eBook Packages: Engineering, Engineering (R0)
