Elsevier

Expert Systems with Applications

Volume 93, 1 March 2018, Pages 283-298

DEER: Distant and Essential Episode Rules for early prediction

https://doi.org/10.1016/j.eswa.2017.10.035

Highlights

  • An algorithm for mining episode rules in an event sequence for early prediction.

  • The extracted rules have a temporally distant consequent and a minimal antecedent.

  • The algorithm outperforms traditional algorithms in running time and scalability.

Abstract

Event prediction in a sequence of events is a challenging task that can be approached with data mining. In this paper, we focus on the specific case of early prediction of distant events. We aim at mining episode rules with a consequent temporally distant from the antecedent and an antecedent as small as possible, both in number of events and in occurrence duration. We refer to these rules as essential rules. To reach this goal, we propose an original algorithm, DEER: Distant and Essential Episode Rules. This algorithm differs from traditional algorithms in three points. First, it determines the consequent of episode rules at an early stage in the mining process, which makes it possible to mine rules with an antecedent as small as possible. Second, it applies a minimal gap constraint between the antecedent and the consequent to guarantee a suitable distance between both elements. Third, its stop criterion considers both the support and the confidence of the rules, unlike traditional algorithms that use only the support.

Experiments on both synthetic and real datasets show that DEER runs faster than several state-of-the-art algorithms and also scales well to large datasets. Furthermore, by studying in detail the episode rules mined from a real dataset of blog messages, we demonstrate not only the efficiency of our algorithm for mining interesting essential rules with a distant consequent, but also that these rules can be used to accurately predict distant events.

Introduction

Data mining is concerned with discovering patterns, trends or relationships hidden in data. Discovering patterns in an event sequence is an important field of data mining. By event sequence, we mean data made up of a single long sequence of events, ordered by a given criterion such as time or position (Laxman & Sastry, 2006). An event is represented by an event type (from a finite set of items) and an occurrence timestamp. Mining an event sequence generally comes down to mining episodes, where an episode is a temporal pattern made up of relatively close, partially ordered events that appears often throughout the sequence or in a part of it (Mannila, Toivonen, & Verkamo, 1997). When the order of the events is total, the episode is said to be serial; when the order is not considered, the episode is parallel. Besides episodes, it is also possible to mine episode rules. Episode rules are generally used to predict the occurrence of events, namely the consequent of the rules (Achar, Laxman, Viswanathan, & Sastry, 2012; Mannila et al., 1997). Predicting events can be viewed as a way to perform recommendations (Letham, Rudin, & Madigan, 2013), i.e. by recommending the predicted events or other events that are known as a reaction to this prediction. Let us consider some examples. In the context of web user behavior analysis, rules can be discovered from interaction logs (Laxman, Tankasali, & White, 2008) and used to predict user behavior. This prediction can in turn be used to recommend the pages to be prefetched by the browser. In the context of road traffic analysis (Cho, Zheng, Wu, & Chen, 2008), rules may be used to predict a traffic jam on a particular road. Recommendations can then be made to limit the traffic flow from specific roads. When analyzing fault log data in a manufacturing plant (Laxman, Sastry, Shadid, & Unnikrishnan, 2009), rules can be used to predict the final state of products, so that solutions can be recommended to improve the plant throughput.
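The definitions above can be made concrete with a minimal Python sketch. It assumes that an event sequence is a list of `(event_type, timestamp)` pairs sorted by timestamp, and that an episode occurrence must fit within a window of width `span`; the function names `occurs_serial` and `occurs_parallel` and this window semantics are illustrative assumptions, not the formal definitions used in the paper.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Event = Tuple[str, int]  # (event type, timestamp)


def occurs_serial(seq: List[Event], episode: Sequence[str], span: int) -> bool:
    """True if the events of `episode` occur in this exact order, with the
    first and last matched events at most `span` time units apart."""
    for i, (etype, start) in enumerate(seq):
        if etype != episode[0]:
            continue
        k = 1  # number of episode events matched so far
        for etype2, t in seq[i + 1:]:
            if k == len(episode) or t - start > span:
                break
            if etype2 == episode[k]:
                k += 1  # greedy earliest match of the next event
        if k == len(episode):
            return True
    return False


def occurs_parallel(seq: List[Event], episode: Sequence[str], span: int) -> bool:
    """True if all events of `episode` occur, in any order, inside some
    window [t, t + span] that starts at an event of the sequence."""
    needed = Counter(episode)
    for _, start in seq:
        present = Counter(e for e, t in seq if start <= t <= start + span)
        if all(present[e] >= n for e, n in needed.items()):
            return True
    return False


seq = [("A", 1), ("B", 2), ("C", 4), ("A", 6), ("C", 7), ("B", 8)]
print(occurs_serial(seq, ["A", "B", "C"], span=3))    # True: A@1, B@2, C@4
print(occurs_serial(seq, ["A", "B", "C"], span=2))    # False
print(occurs_parallel(seq, ["A", "B", "C"], span=2))  # True: window [6, 8]
```

With `span = 2`, the three event types appear together only in the window starting at time 6, and there in the order A, C, B, so the parallel episode occurs while the serial episode A, B, C does not.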

Since its introduction by Mannila et al. (1997), the discovery of episodes and episode rules in an event sequence has been an important and active field in pattern mining (Gan & Dai, 2011; Tatti & Cule, 2012; Zhu, Wang, He, Li, Wang, & Shi, 2010).

We introduce here the two challenges we take up in this paper in the context of event prediction through the mining and use of episode rules. We also illustrate these challenges with concrete examples of requirements.

Challenge 1: The rule mining task, whether for association rules (Agrawal, Imieliński, & Swami, 1993) or episode rules (Mannila et al., 1997), is usually decomposed into two sub-tasks (Gan & Dai, 2011; Mannila et al., 1997). The first one is the discovery of frequent itemsets or frequent episodes (which are, by definition, made up of relatively close events) that have a support higher than a predefined threshold. The second one is the generation of rules from those frequent itemsets or episodes, under the condition that their confidence exceeds a predefined threshold (Agrawal et al., 1993). Rules are generated by considering some items in the itemset, or the last events in the episode, as the consequent of the rule, and the rest of the items/events as the antecedent. Since the second sub-task is quite straightforward, most research focuses on the first one: the extraction of frequent itemsets or episodes. Relying on this first sub-task makes the consequent of the mined rules close to the antecedent in the event sequence (i.e. at a relatively small temporal distance). These rules thus make it possible to predict events that are temporally close to the antecedent. This characteristic is relevant in many applications, such as the prediction of user behavior on the web (Laxman et al., 2008) (as presented above) or of customer behavior in stores (Nakahara & Yada, 2012). Furthermore, mining rules dedicated to the prediction of temporally close events has the advantage of limiting the running time of the mining process, as a relatively small span is used.
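The second sub-task can be sketched in a few lines of Python. The sketch assumes that the first sub-task has already produced a table mapping each frequent serial episode (a tuple of event types) to its support, and follows the common convention that the confidence of a rule is the support of the whole episode divided by the support of its antecedent; `generate_rules` and the support counts are hypothetical. Because the consequent is taken from inside an already-mined episode, it can never lie further from the antecedent than the span used in the first sub-task, which is the limitation discussed here.

```python
from typing import Dict, List, Tuple

Episode = Tuple[str, ...]
Rule = Tuple[Episode, str, float]  # (antecedent, consequent, confidence)


def generate_rules(supports: Dict[Episode, int], min_conf: float) -> List[Rule]:
    """Turn frequent serial episodes into rules whose consequent is the
    last event and whose antecedent is the remaining prefix."""
    rules: List[Rule] = []
    for episode, sup in supports.items():
        if len(episode) < 2:
            continue
        antecedent, consequent = episode[:-1], episode[-1]
        # The prefix of a frequent episode is expected to be frequent too,
        # so its support should already be in the table.
        ant_sup = supports.get(antecedent)
        if not ant_sup:
            continue
        conf = sup / ant_sup
        if conf >= min_conf:
            rules.append((antecedent, consequent, conf))
    return rules


# Hypothetical support counts produced by the first sub-task.
supports = {("A",): 10, ("B",): 8, ("A", "B"): 6, ("A", "B", "C"): 3}
rules = generate_rules(supports, min_conf=0.5)
print(rules)  # [(('A',), 'B', 0.6), (('A', 'B'), 'C', 0.5)]
```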

However, in some particular applications, it is preferable to predict temporally distant events. This is for example the case when one needs time to react before the predicted event occurs. To explain this, let us consider the framework of social networks and blogs. The flow of messages and posts can be viewed as a single sequence of messages, and episode rules can be mined and used to predict what will be said. In the context of banking and finance, an interesting prediction is that a customer will go into serious debt. If this prediction can be made early, it will be all the more useful, as the bank will have time to react and prevent the occurrence of this event. Let us consider again the example of fault log data analysis in a manufacturing plant (Laxman et al., 2009). Suppose that a product needs twenty days to be produced, by adding several components or by performing particular processes each day. Episode rules can be extracted from the fault log sequence and used to predict the final state of the product. The plant manager may be interested in predicting a failure (a distant event) in the final state of the product early in the production process, in order to be able to rectify the potential errors.

In this work, we focus on the mining of episode rules with the aim of predicting distant events. We assume that, to predict distant events, the rules mined have to reflect a distant relationship between the antecedent and the consequent. We thus face the challenge of mining episode rules with a distant consequent, which we will refer to as distant episode rules.

In the banking domain, an example of such a rule is R: [credit card problem, attractive proposition of the concurrence, bad relation with consultant] → [close account], with a temporal distance of 15 days between the antecedent and the consequent. This rule reflects the fact that when a customer has a problem with his/her credit card, then receives offers from a competing bank, and in addition has a bad relation with his/her bank consultant, the customer will probably close his/her bank account 15 days later. This rule is useful, as the [close account] event can be predicted about 15 days before the effective closure of the account, so the bank has time to act to prevent it. By contrast, the rule R′: [credit card problem, attractive proposition of the concurrence, bad relation with consultant, not replying customer mails, bad telephone conversation] → [close account], with a temporal distance of 2 days between the antecedent and the consequent, is not useful: the prediction is made too late, and the bank will not have time to act to prevent the closure of the account. Notice that R′ differs from R by the presence of two additional events at the end of the antecedent.

To the best of our knowledge, mining distant episode rules has never been proposed in the literature. In addition, traditional episode rule mining algorithms are designed to mine episode rules with a consequent temporally close to the antecedent, mainly because of a small span constraint. To mine rules with a distant consequent using these algorithms, a post-processing step has to be run: the rules formed are filtered to keep only those whose consequent may occur far from the antecedent. However, it is known that incorporating the constraints (the distance constraints in our case) within the rule mining algorithm is more efficient, in terms of running time, than running a post-processing step (Srikant, Vu, & Agrawal, 1997). Our challenge thus lies in the design of an algorithm that integrates the distance within the mining process.
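The post-processing approach described here can be sketched as follows, using the two rules R and R′ from the banking example. The `EpisodeRule` class, the `postprocess_distant` function and the 10-day threshold are hypothetical; the point is that both rules must first be mined in full before the distance filter discards R′, which is the work that an algorithm integrating the distance constraint can avoid.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EpisodeRule:
    antecedent: Tuple[str, ...]
    consequent: str
    distance: int  # temporal distance (days) between antecedent and consequent


def postprocess_distant(rules: List[EpisodeRule], min_gap: int) -> List[EpisodeRule]:
    """Keep only the already-mined rules whose consequent lies at least
    `min_gap` time units after the antecedent."""
    return [r for r in rules if r.distance >= min_gap]


R = EpisodeRule(
    ("credit card problem", "attractive proposition of the concurrence",
     "bad relation with consultant"),
    "close account", distance=15)
R_prime = EpisodeRule(
    R.antecedent + ("not replying customer mails", "bad telephone conversation"),
    "close account", distance=2)

kept = postprocess_distant([R, R_prime], min_gap=10)  # hypothetical threshold
print([r.distance for r in kept])  # [15]
```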

Challenge 2: In addition to predicting distant events, we aim at performing this prediction as soon as possible, in order to maximize the amount of time available to react once the consequent is predicted. Let us consider the previously presented rule R: [credit card problem, attractive proposition of the concurrence, bad relation with consultant] → [close account], with a temporal distance of 15 days between the antecedent and the consequent. Let us take another rule R′: [credit card problem, attractive proposition of the concurrence] → [close account], with a temporal distance of 20 days between the antecedent and the consequent, whose antecedent is made up of only the first two events of the antecedent of R. If R′ predicts the [close account] event as reliably as R, we consider R′ more useful than R. Indeed, the temporal distance between the antecedent and the consequent is larger in R′ than in R: with R′, the consequent is predicted 20 days before its occurrence, instead of 15 days with R. We thus consider that a rule with a small antecedent, i.e. an antecedent made up of the first events of the antecedent of another rule, is more interesting than the rule with the larger antecedent (as in the example above), as this characteristic increases the temporal distance between the antecedent and the consequent.

Notice that a third rule R′′: [attractive proposition of the concurrence, bad relation with consultant] → [close account], with a temporal distance of 15 days between the antecedent and the consequent and an antecedent made up of only the last two events of the antecedent of R, is, from our point of view, not more interesting than R: the consequent can be predicted at the same moment whether R or R′′ is used.

We thus aim at tackling the challenge of mining episode rules with an antecedent as small as possible, i.e. one that contains the smallest number of events sufficient to predict the consequent. We call these rules essential rules. Traditional algorithms form rules with an antecedent that is minimal in time (through the minimal occurrence frequency and a span that bounds the antecedent), but cannot guarantee that the antecedent is minimal in number of events. To obtain such essential rules, they have to rely, once more, on a post-processing step that filters the large set of extracted rules, which is highly time-consuming.
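For comparison, the post-processing that a traditional algorithm would need in order to keep only essential rules can be sketched as below. Following the examples of R, R′ and R′′, the sketch assumes that a confident rule is redundant when another confident rule with the same consequent has an antecedent that is a proper prefix of its own; a shared suffix, as in R′′, does not make the prediction earlier and is therefore not used. The function names are hypothetical, and the whole set of confident rules has to be materialized before the filter can run.

```python
from typing import List, Tuple

Rule = Tuple[Tuple[str, ...], str]  # (serial antecedent, consequent)


def is_proper_prefix(p: Tuple[str, ...], q: Tuple[str, ...]) -> bool:
    """True if p is a strictly shorter prefix of q."""
    return len(p) < len(q) and q[:len(p)] == p


def keep_essential(confident_rules: List[Rule]) -> List[Rule]:
    """Drop every rule whose antecedent extends the antecedent of another
    confident rule with the same consequent."""
    return [
        (ant, cons)
        for ant, cons in confident_rules
        if not any(cons2 == cons and is_proper_prefix(ant2, ant)
                   for ant2, cons2 in confident_rules)
    ]


cc, offer, rel = ("credit card problem", "attractive proposition of the concurrence",
                  "bad relation with consultant")
R = ((cc, offer, rel), "close account")
R_prime = ((cc, offer), "close account")    # first two events of R's antecedent
R_second = ((offer, rel), "close account")  # last two events of R's antecedent

print(keep_essential([R, R_prime]))   # only R_prime survives
print(keep_essential([R, R_second]))  # both survive: a suffix does not help
```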

Fig. 1 presents a rule that we aim to extract (shown in blue): R: [credit card problem, attractive proposition of the concurrence] → [close account]. The temporal distance between the antecedent and the consequent is 20 days. This rule is essential.

To face these two challenges of mining distant and essential episode rules, while avoiding the time-consuming post-processing step used in traditional mining algorithms, we propose a new episode rule mining algorithm, DEER: Distant and Essential Episode Rules. DEER relies neither on an episode mining step nor on a post-processing step to form episode rules. The originality of DEER lies in the moment at which the consequent of the rule is identified. Indeed, the consequent is identified at an early stage, which guarantees that it is temporally distant from the antecedent. This feature of the mining process also makes it possible to evaluate the confidence of the rules during mining, and thus to form essential rules without relying on a post-processing step. Furthermore, we expect the early identification of the consequent to significantly decrease the running time, as many candidate occurrences of the rule will be filtered out early in the mining process. In addition, we choose to mine rules with a consequent that contains only one event, which is common in rule mining (Agrawal et al., 1993). Notice that DEER can easily be extended to mine rules with a consequent made up of several events.
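The following Python sketch illustrates the idea of fixing the consequent first; it is a deliberately simplified illustration under our own assumptions, not the DEER algorithm as specified in Section 3. Each window starting at an antecedent occurrence is assumed to be split into an antecedent part of length `w_ant`, an ignored gap of length `gap` and a consequent part of length `w_consq`; support counts the antecedent occurrences followed by the consequent, and confidence divides it by the number of antecedent occurrences. Antecedents are grown by appending one event at a time, and a candidate is no longer extended as soon as it is confident, so that only minimal antecedents are reported. All names and parameters are hypothetical.

```python
from typing import List, Tuple

Event = Tuple[str, int]  # (event type, timestamp), sorted by timestamp


def antecedent_starts(seq: List[Event], ant: Tuple[str, ...], w_ant: int) -> List[int]:
    """Start times of occurrences of serial episode `ant` fitting in
    [start, start + w_ant], one per possible first event (greedy matching)."""
    starts = []
    for i, (etype, t0) in enumerate(seq):
        if etype != ant[0]:
            continue
        k = 1
        for etype2, t in seq[i + 1:]:
            if k == len(ant) or t - t0 > w_ant:
                break
            if etype2 == ant[k]:
                k += 1
        if k == len(ant):
            starts.append(t0)
    return starts


def rule_stats(seq, ant, consq, w_ant, gap, w_consq):
    """Support and confidence of ant -> consq: the consequent must fall in
    (t0 + w_ant + gap, t0 + w_ant + gap + w_consq], after the ignored gap."""
    starts = antecedent_starts(seq, ant, w_ant)
    hits = 0
    for t0 in starts:
        lo = t0 + w_ant + gap
        hi = lo + w_consq
        if any(e == consq and lo < t <= hi for e, t in seq):
            hits += 1
    return hits, (hits / len(starts) if starts else 0.0)


def mine_essential_rules(seq, consq, w_ant, gap, w_consq, min_sup, min_conf):
    """Consequent fixed first; antecedents grown level-wise; a candidate is
    not extended once confident, so reported antecedents stay minimal.
    Assumes min_sup >= 1, which guarantees termination."""
    alphabet = sorted({e for e, _ in seq})
    rules = []
    frontier = [(e,) for e in alphabet]
    while frontier:
        next_frontier = []
        for ant in frontier:
            sup, conf = rule_stats(seq, ant, consq, w_ant, gap, w_consq)
            if sup < min_sup:
                continue  # support cannot grow when the antecedent is extended
            if conf >= min_conf:
                rules.append((ant, consq, conf))  # stop: antecedent is essential
            else:
                next_frontier.extend(ant + (e,) for e in alphabet)
        frontier = next_frontier
    return rules


seq = [("A", 0), ("B", 1), ("Z", 6), ("A", 10), ("B", 11), ("Z", 16),
       ("A", 20), ("C", 21)]
rules = mine_essential_rules(seq, consq="Z", w_ant=2, gap=3, w_consq=2,
                             min_sup=2, min_conf=0.8)
print(rules)  # [(('A', 'B'), 'Z', 1.0)]
```

In this toy sequence, A alone is followed by Z in only two of its three occurrences and is therefore extended, while the antecedent [A, B] is confident and is reported without being extended further.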

A simpler version of DEER was introduced in Fahed, Brun, and Boyer (2014a, 2014b) and Fahed (2016). This paper presents a more comprehensive version of DEER, with the associated challenges, definitions, detailed examples, theoretical comparisons with state-of-the-art works, and extensive experiments.

The rest of this paper is organized as follows: Section 2 presents related work on episode and episode rule mining and the associated concepts. DEER is introduced in Section 3, followed by a comparative discussion in Section 4. Experimental results are presented in Section 5. We conclude and provide some perspectives in Section 6.


Related works

This section is dedicated to the definition of episodes, episode rules and some related concepts. It also presents some works from the literature that deal with episode and episode rule mining.

How to mine essential episode rules with distant consequents?

In this section, we start by presenting in detail the limitations of traditional algorithms and introducing the way we propose to mine essential episode rules with distant consequents through the algorithm DEER; we then present the new concepts and redefinitions on which DEER relies, and finally we explain DEER in detail.

Comparative discussion: efficiency of DEER in terms of number of extracted rules

In this section, we demonstrate the efficiency of DEER, especially the impact of the minimal gap constraint (represented by the sub-window Win_invisible) and of the position of the consequent in the sub-window Win_consq.
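To visualize the role of the two sub-windows, here is a small Python sketch that assigns the events of one window to an antecedent part, a Win_invisible part and a Win_consq part. It assumes, as the name suggests, that events falling in Win_invisible are ignored, and that the total window length is kept fixed while the minimal gap grows; `split_window` and its parameters are hypothetical and do not reproduce DEER's actual data structures.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (event type, timestamp)


def split_window(seq: List[Event], t0: int, w_ant: int, w_inv: int,
                 w_consq: int) -> Dict[str, List[Event]]:
    """Assign the events of the window starting at t0 to three sub-windows:
    antecedent [t0, t0 + w_ant], invisible gap of length w_inv (ignored),
    then consequent of length w_consq."""
    ant_end = t0 + w_ant
    inv_end = ant_end + w_inv
    consq_end = inv_end + w_consq
    parts: Dict[str, List[Event]] = {"ant": [], "invisible": [], "consq": []}
    for e, t in seq:
        if t0 <= t <= ant_end:
            parts["ant"].append((e, t))
        elif ant_end < t <= inv_end:
            parts["invisible"].append((e, t))
        elif inv_end < t <= consq_end:
            parts["consq"].append((e, t))
    return parts


seq = [("A", 0), ("B", 1), ("C", 3), ("D", 4), ("E", 6), ("F", 7)]
p0 = split_window(seq, 0, w_ant=2, w_inv=0, w_consq=5)  # no minimal gap
p3 = split_window(seq, 0, w_ant=2, w_inv=3, w_consq=2)  # gap of 3 time units
print([e for e, _ in p0["consq"]])  # ['C', 'D', 'E', 'F']
print([e for e, _ in p3["consq"]])  # ['E', 'F']
```

With the same total window length, raising the gap from 0 to 3 removes C and D from the candidate consequents, so fewer candidate rules are formed and every remaining consequent lies more than 3 time units after the end of the antecedent part.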

Concerning the goal of mining essential rules (with a small antecedent and a distant consequent), we present how DEER behaves, in terms of the number of rules mined at the end of the mining process, relying on three examples of confident rules and compared to the behavior of a

Experimental results

This section is dedicated to the evaluation of the DEER algorithm, in comparison to two state-of-the-art algorithms. The evaluations are run on synthetic and real-world datasets. Experiments are performed on a computer with a 2.3 GHz Intel Core i7 processor and 16 GB of memory, running OS X. All the algorithms are implemented in Java.

Conclusion and perspectives

In this paper, we have proposed an original algorithm that mines essential episode rules (with distant consequents and minimal antecedents) learned from an event sequence. The originality of the algorithm lies in the fact that it directly mines serial episode rules, without relying on any episode extraction phase, by determining the distant consequent at an early stage. In addition to the classical support and confidence thresholds, a new confidence measure, the temporal confidence, was proposed

References (57)

  • R. Agrawal et al.

    Fast algorithms for mining association rules

    Proceedings of the 20th international conference on very large data bases, VLDB

    (1994)
  • X. Ao et al.

    Online frequent episode mining

    IEEE 31st international conference on data engineering (ICDE)

    (2015)
  • X. Ao et al.

    Mining precise-positioning episode rules from event sequences

    IEEE 33rd international conference on data engineering (ICDE)

    (2017)
  • A. Boudane et al.

    Enumerating non-redundant association rules using satisfiability

    Pacific-Asia conference on knowledge discovery and data mining

    (2017)
  • Center for Ultra-scale Computing and Information Security (2006)....
  • R. Chan et al.

    Mining high utility itemsets

    Third IEEE international conference on data mining (ICDM 2003)

    (2003)
  • V. Chaurasia et al.

    Early prediction of heart diseases using data mining techniques

    Caribbean Journal of Science and Technology

    (2013)
  • C.-W. Cho et al.

    A tree-based approach for event prediction using episode rules over event streams

    Database and expert systems applications

    (2008)
  • B. Cule et al.

    Marbles: Mining association rules buried in long event sequences

    Statistical Analysis and Data Mining: The ASA Data Science Journal

    (2014)
  • L. Fahed

    Prédire et influencer l’apparition des événements dans une séquence complexe [Predicting and influencing the occurrence of events in a complex sequence]

    Ph.D. thesis

    (2016)
  • L. Fahed et al.

    Efficient discovery of episode rules with a minimal antecedent and a distant consequent

    International joint conference on knowledge discovery, knowledge engineering, and knowledge management

    (2014)
  • L. Fahed et al.

    Episode rules mining algorithm for distant event prediction

    Research Report

    (2014)
  • P. Fournier-Viger et al.

    Phm: mining periodic high-utility itemsets

    Industrial conference on data mining

    (2016)
  • P. Fournier-Viger et al.

    A survey of sequential pattern mining

    Data Science and Pattern Recognition

    (2017)
  • P. Fournier-Viger et al.

    A survey of itemset mining

    Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

    (2017)
  • Frequent Itemset Mining Implementations Repository (2006)....
  • M. Gan et al.

    Fast mining of non-derivable episode rules in complex sequences

    Modeling decision for artificial intelligence

    (2011)
  • M. Granroth-Wilding et al.

    What happens next? Event prediction using a compositional neural network model

    Proceedings of the thirtieth AAAI conference on artificial intelligence

    (2016)

1 This work was prepared while the first author was at Lorraine University (LORIA / KIWI Team). The first author now works at IMT-Atlantique, Lab-STICC / DECIDE Team, University of Bretagne Loire, F-29238 Brest, France.
