
Neurocomputing

Volume 150, Part A, 20 February 2015, Pages 240-249

Towards cost-sensitive adaptation: When is it worth updating your predictive model?

https://doi.org/10.1016/j.neucom.2014.05.084

Abstract

Our digital universe is rapidly expanding: more and more daily activities are digitally recorded, data arrives in streams, needs to be analyzed in real time, and may evolve over time. In the last decade many adaptive learning algorithms and prediction systems, which can automatically update themselves with new incoming data, have been developed. The majority of those algorithms focus on improving predictive performance and assume that a model update is always desired, as soon as possible and as frequently as possible. In this study we consider a potential model update as an investment decision, which, as in the financial markets, should be taken only if a certain return on investment is expected. We introduce and motivate a new research problem for data streams – cost-sensitive adaptation. We propose a reference framework for analyzing adaptation strategies in terms of costs and benefits. Our framework allows us to characterize and decompose the costs of model updates, and to assess and interpret the gains in performance due to model adaptation for a given learning algorithm on a given prediction task. Our proof-of-concept experiment demonstrates how the framework can aid in analyzing and managing adaptation decisions in the chemical industry.

Introduction

Learning from evolving data has become a popular research topic in the last decade. As distributions of real-world data often evolve over time [1], predictive models need mechanisms to update themselves by regularly taking new data into account; otherwise their predictive performance will degrade. Such adaptive predictive algorithms have been developed in different research fields, such as data mining and machine learning in general [2], [3], [4], recommender systems [5], user modeling and personalization [6], information retrieval [7], intrusion detection [8], robotics [9], time series analysis [10], and chemical engineering [11]. The majority of those algorithms focus on optimizing the prediction accuracy over time. Several studies (e.g. [12], [13]) consider the time and memory consumed for this operation as additional performance criteria. Different predictive analytics applications may operate in different environments, generate very different volumes of data, and have different complexities of predictive decisions and different sensitivities to errors. Naturally, there is no single best adaptation strategy or algorithm for all situations (“no free lunch” [14]).

From a practical perspective, excessive adaptation (e.g. adapting too often) may be a waste of resources that provides only incremental, insignificant benefits to predictive performance. Consider as an example a chemical production process where an adaptive predictive model estimates the quality of a product given sensor readings as inputs. Sensor readings may arrive every second. However, changes in the process that require a model update, such as new suppliers or replacement of sensors, are not likely to happen every second, but rather on a yearly basis or so. During that year an incremental learning algorithm would update itself every second, resulting in about 30 million incremental updates. The question is whether this would be a desirable investment of resources.
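To check the figure: at one reading per second, a year of operation amounts to 60 × 60 × 24 × 365 ≈ 31.5 million observations, so a fully incremental learner would indeed perform on the order of 30 million updates, while the changes that genuinely require adaptation (new suppliers, replaced sensors) may occur only a handful of times over the same period.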

In response to this question we introduce the research problem of cost-sensitive adaptation, where model adaptation is treated as an investment decision. Computational resources are invested in updating the models, and labor resources in obtaining feedback (the true labels), in the expectation of improving predictive performance. In the financial markets, investment decisions are made on the basis of the expected return on investment (ROI) [15]. In predictive systems, to estimate the ROI of adaptation we need to assess the costs and benefits of running an adaptive algorithm. This assessment needs to be performed in a standardized way, such that different algorithms and their implementations can be compared before an adaptive system is put into operation. Moreover, in order to fully utilize the opportunities presented by the modern market of computing resources, such as cloud computing, we need to be able to decompose adaptive algorithms into critical and optional components and to assess the costs and contributions of each component independently.

In this study we propose a reference framework for assessing the utility of an adaptive algorithm for a given prediction problem. The framework includes a methodology for identifying cost components, a model for analytically combining those components, and a setting for experimentally assessing ROI before putting the algorithm into online operation. We propose using our analytical framework and the ROI – the gain in predictive performance per resources invested – for comparing and justifying algorithmic decisions when designing adaptive prediction systems.
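As an illustration of the underlying idea (the exact formulation of the framework appears in Section 3, which is only excerpted in this preview), a minimal Python sketch of an ROI check on a candidate model update could look as follows; the names gain_estimate, update_cost and roi_threshold are illustrative assumptions, not the paper's notation.

# Hedged, minimal sketch: decide whether a candidate model update is worth its cost.
# 'gain_estimate' is the expected improvement in prediction utility, expressed in the
# same units (e.g. monetary) as 'update_cost', which combines computing costs and the
# labor cost of obtaining true labels.

def worth_updating(gain_estimate, update_cost, roi_threshold=0.0):
    """Return True if the expected return on investment exceeds the threshold."""
    if update_cost <= 0:
        # A costless update is worth taking whenever it is expected to help.
        return gain_estimate > 0
    roi = (gain_estimate - update_cost) / update_cost
    return roi > roi_threshold

# Example: an update expected to recover 120 units of error cost while consuming
# 100 units of compute and labeling effort has ROI = 0.2.
print(worth_updating(gain_estimate=120.0, update_cost=100.0, roi_threshold=0.1))  # True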

Our study makes the following contributions. Firstly, we introduce and motivate a new research problem for data streams – cost-sensitive adaptation. Secondly, we systematically characterize the costs of adaptive learning, which are typically ignored in theoretical work but are critical for real-world applications. Thirdly, the proposed reference framework makes it possible to compare adaptive algorithms within a given application context in terms of the costs and benefits of adaptation. Different businesses naturally have different costs and benefits. Even if the same measure is used for assessing predictive performance, the implications of a 1% improvement may be very different across applications. For instance, in the airline industry a 10% improvement in demand prediction accuracy can bring 2–4% additional monetary revenue [16]. Our proof-of-concept experiments demonstrate how the proposed framework can help in deciding upon an optimal adaptation strategy in a chemical production application.

The paper is organized as follows. Section 2 discusses the requirements for adaptation, and overviews adaptation possibilities offered by available computing resources. In Section 3 we propose a framework and accompanying methodology for quantifying utility of adaptation retrospectively and online in real time. Section 4 characterizes existing adaptive learning algorithms following our framework. Section 5 presents a proof-of-concept experimental analysis that demonstrates how the framework can be used for analyzing learning algorithms. Section 6 discusses related work, and Section 7 concludes the study.

Section snippets

Overview of requirements and resources

This section provides the context for our study. We first discuss the requirements for adaptation reported in the research literature and illustrate the need for adaptation with a few application examples. Then we present an overview of currently available computing resources and discuss the possibilities for adaptivity from a technical point of view. The goal of this section is to analyze how adaptivity can be organized, and to what extent adaptivity needs to be, and can be, flexible (on demand).

Assessing costs and benefits of adaptation

In this section we consider how to measure the utility of adaptation. First we formally define the setting of adaptive supervised learning and then we present the framework for assessing utility of adaptation and discuss its components.

Adaptation strategies

Many adaptive learning algorithms have been developed, each offering different benefits, but also requiring different resources for adaptation. In terms of resources, five types of adaptation strategies can be distinguished.

Fully Incremental (FI): The predictive model is updated using only the previous model and the latest data observation: $L_T = f(L_{T-1}, \{X_t, y_t\})$, where $f$ is the model update function, $t$ is the current time, and $T = t$ is the counter of model adaptations. Adaptation is performed on a
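To make the FI strategy concrete, here is a minimal Python sketch, assuming a linear model trained by stochastic gradient descent stands in for the update function $f$; this is an illustrative stand-in, not the specific learner considered in the paper.

# Hedged sketch of the Fully Incremental (FI) strategy,
# L_T = f(L_{T-1}, {X_t, y_t}): the model is revised once per arriving observation.
# A linear model with a squared-error SGD step stands in for the update function f.

def fi_update(weights, x, y, lr=0.01):
    """One incremental update of the model from a single observation (x, y)."""
    prediction = sum(w * xi for w, xi in zip(weights, x))
    error = prediction - y
    return [w - lr * error * xi for w, xi in zip(weights, x)]

# Streaming loop: the number of updates equals the number of observations seen,
# which is what makes per-second data so costly under this strategy.
weights = [0.0, 0.0]
stream = [([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0), ([1.0, 4.0], 9.0)]
for x_t, y_t in stream:
    weights = fi_update(weights, x_t, y_t)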

Experimental analysis

The goal of this experimental analysis is to demonstrate how a basic instantiation of the proposed evaluation framework could help in the evaluation of adaptive learning algorithms. The intention is to present a proof-of-concept case rather than examining a wide spectrum of methods and problems.

Related work

From the algorithmic perspective our approach to monitoring the performance over time may resemble optimization with adaptive learning rates [32], [33]. Adaptive learning rates in optimization refer to dynamically adjusting the search step in search for the optimum. The goal is typically to arrive at the optimum faster. While the adaptive learning rate procedure is concerned with finding the final optimum, which is fixed, our monitoring of ROI aims to remain at the optimum position while the

Conclusions and future work

We introduced the new research problem of cost-sensitive adaptation and proposed a reference framework for assessing the utility of adaptation in online predictive modeling tasks. The proposed framework defines the components of gains and costs in adaptive online learning, and proposes a way to measure the return, on the resources invested, that results from adapting the model. As we saw in the airline example, businesses can estimate concrete monetary gains that can result from improved predictive

Acknowledgments

This research has been supported by the EC within the Marie Curie IAPP Programme (Grant agreement no. 251617), and the Academy of Finland CoE ALGODAN (Grant no. 118653).


References (36)

  • Y. Le Borgne et al., Adaptive model selection for time series prediction in wireless sensor networks, Signal Process. (2007)
  • P. Kadlec et al., Review of adaptation mechanisms for data-driven soft sensors, Comput. Chem. Eng. (2011)
  • D. Hand, Classifier technology and the illusion of progress, Stat. Sci. (2006)
  • G. Widmer et al., Learning in the presence of concept drift and hidden contexts, Mach. Learn. (1996)
  • R. Klinkenberg, Learning drifting concepts: example selection vs. example weighting, Intell. Data Anal. (2004)
  • E. Ikonomovska et al., Learning model trees from evolving data streams, Data Min. Knowl. Discov. (2011)
  • Y. Koren, Collaborative filtering with temporal dynamics, Commun. ACM (2010)
  • D. Billsus et al., User modeling for adaptive news access, User Model. User-Adapt. Interact. (2000)
  • F. Menczer et al., Topical web crawlers: evaluating adaptive algorithms, ACM Trans. Internet Technol. (2004)
  • W. Lee et al., Adaptive intrusion detection: a data mining approach, Artif. Intell. Rev. (2000)
  • D. Stavens, G. Hoffmann, S. Thrun, Online speed adaptation using supervised learning for high-speed, off-road...
  • A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, R. Gavalda, New ensemble methods for evolving data streams, in:...
  • A. Bifet, G. Holmes, B. Pfahringer, E. Frank, Fast perceptron decision tree learning from evolving data streams, in:...
  • D. Wolpert et al., No free lunch theorems for optimization, IEEE Trans. Evol. Comput. (1997)
  • D. Brealey et al., Principles of Corporate Finance (1999)
  • S. Riedel et al., Combination of multi level forecasts, J. VLSI Signal Process. (2007)
  • A. Bifet et al., MOA: massive online analysis, J. Mach. Learn. Res. (2010)
  • C. Giraud-Carrier, A note on the utility of incremental learning, AI Commun. (2000)

Indrė Žliobaitė is a research scientist at Aalto University, Finland. Prior to joining Aalto University she was a research task leader in the INFER.eu project and a lecturer in Computational Intelligence at Bournemouth University, UK. She received her PhD from Vilnius University, Lithuania, in 2010. Her research interests and expertise concentrate around predictive modeling from streaming data.

Marcin Budka received his dual BA+MA degree in Finance and Banking from the University of Economics, Katowice, Poland, in 2003, a BSc in Computer Science from the University of Silesia, Poland, in 2005, and a PhD in Computational Intelligence from Bournemouth University, UK, in 2010, where he currently holds a lecturer position in Computational Intelligence. His current research interests include information-theoretic learning, metalearning, adaptive systems, ensemble models and complex networked systems, with a focus on the evolution and dynamics of social networks.

Frederic Stahl is a lecturer at the University of Reading. His research interests are in the area of big data analytics, in particular parallel and distributed data mining, data stream mining, and data mining in resource-constrained environments. Before joining the University of Reading, Dr. Stahl worked at Bournemouth University as a lecturer and as a senior research associate at the University of Portsmouth. He obtained his PhD from the University of Portsmouth in ‘Parallel Rule Induction’ in 2010.
