ABSTRACT
Supervised learning requires labeled data. However, labels may not be available, or creating a labeled dataset may be costly. Even when data is labeled, the labels are often inconsistent, incomplete, or inaccurate. If the data changes over time, the model also needs to be retrained periodically. A machine learning model therefore needs to learn from data "in the wild", not just from an initial training dataset. This problem can be addressed by techniques that combine clustering and classification with user feedback. This paper describes one such technique in the form of a pattern: Incremental Analysis. The target audience includes developers who have little experience using machine learning in dynamic environments. This is the first in a series of planned papers on patterns for machine learning.
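The abstract does not spell out how Incremental Analysis combines clustering, classification and user feedback, so the following is only a rough illustration under stated assumptions, not the paper's method. It assumes a simple leader-style clustering with a fixed distance threshold. Each new item either joins the nearest existing cluster or starts a new one. Items that fall into a labeled cluster are classified with that cluster's label, and unlabeled clusters are queued for a user to label. The names `IncrementalAnalyzer`, `radius`, `process`, `pending` and `give_feedback` are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Cluster:
    center: Point                      # leader-style: the first item seen fixes the center
    members: List[Point] = field(default_factory=list)
    label: Optional[str] = None        # None until a user labels the cluster


@dataclass
class IncrementalAnalyzer:
    """Hypothetical sketch: clustering + classification + user feedback."""
    radius: float                      # assumed threshold for joining an existing cluster
    clusters: List[Cluster] = field(default_factory=list)

    def process(self, x: Point) -> Optional[str]:
        """Assign x to the nearest cluster within `radius`, or start a new one.

        Returns the cluster's label (the classification), or None if the
        cluster is still unlabeled and therefore needs user feedback.
        """
        nearest = min(self.clusters, key=lambda c: math.dist(c.center, x), default=None)
        if nearest is None or math.dist(nearest.center, x) > self.radius:
            nearest = Cluster(center=x)  # unfamiliar data: open a new cluster
            self.clusters.append(nearest)
        nearest.members.append(x)
        return nearest.label

    def pending(self) -> List[Cluster]:
        """Clusters that are waiting for a user to supply a label."""
        return [c for c in self.clusters if c.label is None]

    def give_feedback(self, cluster: Cluster, label: str) -> None:
        """Record a user's label; later items in this cluster inherit it."""
        cluster.label = label


if __name__ == "__main__":
    analyzer = IncrementalAnalyzer(radius=1.0)
    print(analyzer.process((0.0, 0.0)))   # None: new, unlabeled cluster
    print(analyzer.process((0.5, 0.0)))   # None: joins the same cluster
    for c in analyzer.pending():
        analyzer.give_feedback(c, "spam")
    print(analyzer.process((0.2, 0.1)))   # spam
    print(analyzer.process((5.0, 5.0)))   # None: far away, so a new cluster
```

In this sketch, labeling effort is spent per cluster rather than per item, and the model adapts as data arrives: a drifting input distribution simply produces new clusters for the user to label, instead of requiring a full retraining run.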