Abstract
Onset detection is a primary task in music transcription: it aims to find the start time of each note, which is directly associated with beat perception in the auditory system. Researchers have attempted to find a data representation that yields a universal onset function, but onset detection does not generalize to all cases. For example, in the MIREX challenge, onset detection for solo singing performs worse every year than onset detection for solo instruments. This paper presents a post-processing step for singing onset detection whose sole purpose is to reduce falsely detected onsets. In this step, the system checks the candidate onsets picked from the local maxima of the onset function and uses a neural network model to classify each candidate's features as onset or non-onset, rather than relying on a more complicated onset function. The performance of the network is closely related to the performance of the overall onset detection. On a public dataset for singing transcription research, the pipeline with post-processing achieves higher onset detection performance than the standard and novelty methods, in that it reduces the false alarms produced by the feature-based methods. These results can further support research on singing transcription: the data-driven approach provides an effective way to eliminate spurious peaks and may represent the state of the art in singing onset detection.
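The two-stage idea described in the abstract (pick candidate peaks, then verify each one with a classifier) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `threshold`, `context`, and `min_prob` parameters, and the toy classifier standing in for the trained neural network are all assumptions introduced here.

```python
from typing import Callable, List, Sequence


def pick_local_maxima(onset_fn: Sequence[float], threshold: float) -> List[int]:
    """Return frame indices that are local maxima above `threshold`."""
    peaks = []
    for i in range(1, len(onset_fn) - 1):
        if (onset_fn[i] > threshold
                and onset_fn[i] > onset_fn[i - 1]
                and onset_fn[i] >= onset_fn[i + 1]):
            peaks.append(i)
    return peaks


def verify_onsets(
    candidates: Sequence[int],
    features: Sequence[Sequence[float]],
    classifier: Callable[[List[Sequence[float]]], float],
    context: int = 2,        # hypothetical: frames of context on each side
    min_prob: float = 0.5,   # hypothetical: acceptance threshold
) -> List[int]:
    """Keep only the candidates that the classifier accepts as onsets."""
    kept = []
    for t in candidates:
        lo, hi = max(0, t - context), min(len(features), t + context + 1)
        window = list(features[lo:hi])
        if classifier(window) >= min_prob:
            kept.append(t)
    return kept


def toy_classifier(window: List[Sequence[float]]) -> float:
    """Stand-in for the trained network (assumption): accept a candidate
    when the energy from the window midpoint onward exceeds the energy
    before it."""
    mid = len(window) // 2
    before = sum(sum(frame) for frame in window[:mid])
    after = sum(sum(frame) for frame in window[mid:])
    return 1.0 if after > before else 0.0


# Toy data: two peaks in the onset function, one of them spurious.
onset_fn = [0.0, 0.2, 0.9, 0.3, 0.1, 0.6, 0.2, 0.0]
features = [[0.1], [0.1], [0.8], [0.9], [0.9], [0.2], [0.1], [0.1]]

candidates = pick_local_maxima(onset_fn, threshold=0.5)
onsets = verify_onsets(candidates, features, toy_classifier)
print(candidates, onsets)
```

In this sketch the verification stage can only remove candidates, never add them, which mirrors the abstract's statement that the post-processing step solely reduces falsely detected onsets.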
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Lin, M., Feng, Y. (2020). A post-processing of onset detection based on verification with neural network. In: Li, H., Li, S., Ma, L., Fang, C., Zhu, Y. (eds) Proceedings of the 7th Conference on Sound and Music Technology (CSMT). Lecture Notes in Electrical Engineering, vol 635. Springer, Singapore. https://doi.org/10.1007/978-981-15-2756-2_6
DOI: https://doi.org/10.1007/978-981-15-2756-2_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2755-5
Online ISBN: 978-981-15-2756-2
eBook Packages: Engineering, Engineering (R0)