Abstract
Today, many private households as well as broadcasting and film companies own large collections of digital music recordings. These are time series that differ from, e.g., weather reports or stock market data. The task is normally one of classification, not prediction of the next value or recognition of a shape or motif. New methods for extracting features that allow audio data to be classified have been developed. However, developing appropriate feature extraction methods is tedious, particularly because every new classification task requires tailoring the feature set anew.
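As a concrete picture of what hand-tailored feature extraction from a value series means, the following minimal Python sketch maps a raw series of any length to a fixed-length vector that a standard classifier can consume. The two features (zero-crossing rate and RMS energy) are common textbook audio features chosen only for illustration. They are an assumption, not the feature set used in this article, and `extract_features` is a hypothetical helper.

```python
import math

def zero_crossing_rate(signal):
    """Fraction of consecutive sample pairs whose signs differ."""
    if len(signal) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(signal) - 1)

def rms_energy(signal):
    """Root-mean-square amplitude of the series."""
    if not signal:
        return 0.0
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def extract_features(signal):
    """Hypothetical helper: map a series of any length to a fixed-length vector."""
    return [zero_crossing_rate(signal), rms_energy(signal)]

# Two synthetic "recordings" of different lengths yield vectors of equal length.
tone = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
noise_like = [(-1) ** t * 0.5 for t in range(60)]
print(extract_features(tone))
print(extract_features(noise_like))  # → [1.0, 0.5]
```

Every new task (a different genre split, a different user's taste) may need a different such list, which is exactly the manual effort the abstract describes.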
This paper presents a unifying framework for feature extraction from value series. Operators of this framework can be combined automatically into feature extraction methods using a genetic programming approach. The construction of features is guided by the performance of the learning classifier that uses them. Our approach to automatic feature extraction requires balancing the completeness of the methods on one side against the tractability of searching for appropriate methods on the other. Some theoretical considerations in this paper illustrate this trade-off. After feature extraction, a second process learns a classifier from the transformed data. The practical use of the methods is shown by two types of experiments: classification by genre and classification according to user preferences.
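The idea of composing operators automatically, with classifier performance as the guide, can be sketched as a small evolutionary loop. This sketch assumes a hypothetical operator set (`abs`, `diff`, `square`, `window_mean` transformations followed by a `mean`, `max` or `std` aggregate), a leave-one-out 1-nearest-neighbour classifier whose accuracy serves as the wrapper fitness, and a mutation-only elitist (mu + lambda) scheme. It is a simplified stand-in for genetic programming (no crossover, fixed chain depth), and none of these names come from the article.

```python
import math
import random

# Hypothetical operators (not from the article): transforms map a series to a
# series; aggregates collapse a series to a single number.
TRANSFORMS = {
    "abs": lambda xs: [abs(x) for x in xs],
    "diff": lambda xs: [b - a for a, b in zip(xs, xs[1:])] or [0.0],
    "square": lambda xs: [x * x for x in xs],
    "window_mean": lambda xs: [sum(xs[i:i + 4]) / len(xs[i:i + 4])
                               for i in range(0, len(xs), 4)] or [0.0],
}

def std(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

AGGREGATES = {"mean": lambda xs: sum(xs) / len(xs), "max": max, "std": std}
MAX_CHAIN = 3  # depth limit keeps the search space tractable

def apply_feature(feature, series):
    """A feature is (tuple of transform names, aggregate name)."""
    chain, agg = feature
    for op in chain:
        series = TRANSFORMS[op](series)
    return AGGREGATES[agg](series)

def evaluate(features, X, y):
    """Wrapper fitness: leave-one-out accuracy of 1-NN on the extracted features."""
    vecs = [[apply_feature(f, s) for f in features] for s in X]
    correct = 0
    for i, v in enumerate(vecs):
        _, label = min((math.dist(v, w), y[j]) for j, w in enumerate(vecs) if j != i)
        correct += label == y[i]
    return correct / len(vecs)

def random_feature(rng):
    chain = tuple(rng.choice(list(TRANSFORMS)) for _ in range(rng.randint(0, MAX_CHAIN)))
    return chain, rng.choice(list(AGGREGATES))

def mutate(features, rng):
    """Add a feature, remove one, or edit one feature's chain and aggregate."""
    child = list(features)
    move = rng.choice(["add", "remove", "alter"])
    if move == "add":
        child.append(random_feature(rng))
    elif move == "remove" and len(child) > 1:
        child.pop(rng.randrange(len(child)))
    else:
        i = rng.randrange(len(child))
        chain, agg = child[i]
        if chain and rng.random() < 0.5:
            chain = chain[:-1]  # drop the last operator
        elif len(chain) < MAX_CHAIN:
            chain = chain + (rng.choice(list(TRANSFORMS)),)  # append an operator
        if rng.random() < 0.3:
            agg = rng.choice(list(AGGREGATES))
        child[i] = (chain, agg)
    return child

def evolve(X, y, generations=30, pop_size=8, seed=0):
    """Elitist search; returns ((fitness, features), best fitness per generation)."""
    rng = random.Random(seed)
    scored = [(evaluate(f, X, y), f) for f in ([random_feature(rng)] for _ in range(pop_size))]
    scored.sort(key=lambda p: -p[0])
    history = [scored[0][0]]
    for _ in range(generations):
        children = [mutate(rng.choice(scored)[1], rng) for _ in range(pop_size)]
        scored += [(evaluate(c, X, y), c) for c in children]
        scored = sorted(scored, key=lambda p: -p[0])[:pop_size]  # parents compete with children
        history.append(scored[0][0])
    return scored[0], history

def make_dataset():
    """Synthetic stand-in for audio: class 0 = low-frequency tones, class 1 = high."""
    X, y = [], []
    for label, freqs in ((0, (1, 2)), (1, (12, 14))):
        for f in freqs:
            for a in (0.5, 0.75, 1.0, 1.25, 1.5):
                X.append([a * math.sin(2 * math.pi * f * t / 64) for t in range(64)])
                y.append(label)
    return X, y

X, y = make_dataset()
(best_fit, best_features), history = evolve(X, y)
print(best_fit, best_features)
```

Because parents compete with their children, the best fitness never decreases from one generation to the next. On this toy data a chain such as `(("diff", "abs"), "max")` separates the classes, while a plain `mean` or `max` of the raw series does not, so the search has to compose operators to succeed. A fuller version could add crossover and a penalty on feature-set size; the penalty is one way to trade expressiveness against search cost.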
Cite this article
Mierswa, I., Morik, K. Automatic Feature Extraction for Classifying Audio Data. Mach Learn 58, 127–149 (2005). https://doi.org/10.1007/s10994-005-5824-7