A post-processing of onset detection based on verification with neural network

Lin, Mingtai; Feng, Yin

doi:10.1007/978-981-15-2756-2_6

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 635)

Abstract

Onset detection, a primary task in music transcription, aims to find the start time of each note, which is directly associated with beat perception in the auditory system. Researchers have attempted to find a data representation for a universal onset function, but onset detection does not generalize to all cases. For example, in the MIREX challenge, onset detection for solo singing performs worse every year than onset detection for solo instruments. This paper presents a post-processing step for singing onset detection whose sole purpose is to reduce falsely detected onsets. In this step, the system checks the onsets picked from the local maxima of the onset function and uses a neural network model to classify each candidate as onset or non-onset, rather than designing a more complicated onset function. The performance of the network is closely related to that of the onset detection. On a public dataset for singing transcription research, the pipeline with post-processing outperforms the standard and novelty methods on onsets, because it reduces the false alarms produced by feature-based methods. The data-driven approach provides an effective way to eliminate spurious peaks, which can further support research on singing transcription and may represent the state of the art in singing onset detection.
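The two-stage idea described in the abstract (pick candidate onsets at local maxima of an onset function, then keep only the candidates that a classifier accepts) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the functions `pick_peaks`, `verify_onsets` and `toy_classifier`, the threshold values, the context size, and the toy data. In particular, `toy_classifier` is a hand-written rule standing in for the trained neural network, whose architecture and input features the abstract does not specify.

```python
from typing import Callable, List, Sequence


def pick_peaks(odf: Sequence[float], threshold: float) -> List[int]:
    """Return frame indices that are local maxima of the onset function above a threshold."""
    return [
        i
        for i in range(1, len(odf) - 1)
        if odf[i] > threshold and odf[i] > odf[i - 1] and odf[i] >= odf[i + 1]
    ]


def verify_onsets(
    candidates: Sequence[int],
    features: Sequence[float],
    classifier: Callable[[Sequence[float]], float],
    context: int = 1,
    min_prob: float = 0.5,
) -> List[int]:
    """Keep only the candidates whose surrounding feature window the classifier accepts."""
    kept = []
    for idx in candidates:
        lo = max(0, idx - context)
        hi = min(len(features), idx + context + 1)
        # The classifier returns a score in [0, 1]; low scores are treated as false alarms.
        if classifier(features[lo:hi]) >= min_prob:
            kept.append(idx)
    return kept


def toy_classifier(window: Sequence[float]) -> float:
    """Hypothetical stand-in for the trained network: accept a window whose energy rises."""
    return 1.0 if window[-1] - window[0] > 0.2 else 0.0


# Toy onset function and per-frame energy (made-up values, not data from the paper).
odf = [0.0, 0.1, 0.9, 0.2, 0.1, 0.6, 0.1, 0.0, 0.8, 0.3]
energy = [0.1, 0.1, 0.2, 0.8, 0.8, 0.8, 0.8, 0.7, 0.7, 1.0]

candidates = pick_peaks(odf, threshold=0.5)
onsets = verify_onsets(candidates, energy, toy_classifier)
print(candidates)  # [2, 5, 8]
print(onsets)      # [2, 8]
```

In this toy run the peak at frame 5 is rejected because the energy around it is flat, which mirrors the stated goal of removing spurious peaks without changing the onset function itself. The verification stage can only discard candidates, so any onset missed by peak picking stays missed.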

Author information

Authors and Affiliations

Department of Artificial Intelligence, Xiamen University, Xiamen, 361000, China
Mingtai Lin & Yin Feng

Editor information

Editors and Affiliations

School of Computer Science and Tech., Harbin Institute of Technology (HIT), Harbin, Heilongjiang, China
Haifeng Li
Beijing Univ. of Posts and Telecom., Beijing, China
Shengchen Li
School of Computer Science and Tech., Harbin Institute of Technology (HIT), Harbin, Heilongjiang, China
Lin Ma
School of Computer and Information Eng., Heilongjiang University of Science and Technology, Harbin, Heilongjiang, China
Chunying Fang
The Acoustical Society of Beijing, Xicheng, Beijing, China
Yidan Zhu

About this paper

Cite this paper

Lin, M., Feng, Y. (2020). A post-processing of onset detection based on verification with neural network. In: Li, H., Li, S., Ma, L., Fang, C., Zhu, Y. (eds) Proceedings of the 7th Conference on Sound and Music Technology (CSMT). Lecture Notes in Electrical Engineering, vol 635. Springer, Singapore. https://doi.org/10.1007/978-981-15-2756-2_6

DOI: https://doi.org/10.1007/978-981-15-2756-2_6
Published: 22 December 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2755-5
Online ISBN: 978-981-15-2756-2
eBook Packages: Engineering, Engineering (R0)
