DEER: Distant and Essential Episode Rules for early prediction
Introduction
Data mining is concerned with discovering patterns, trends, or relationships hidden in data, and discovering patterns in an event sequence is an important field of data mining. By event sequence, we mean data made up of a single long sequence of events, ordered by a given criterion such as time or position (Laxman & Sastry, 2006). An event is represented by an event type (from a finite set of items) and an occurrence timestamp. Mining an event sequence generally comes down to mining episodes, where an episode is a temporal pattern made up of relatively close, partially ordered events that appears frequently throughout the sequence or in a part of it (Mannila, Toivonen, & Verkamo, 1997). When the order of the events is total, the episode is said to be serial; when the order is not considered, the episode is parallel. Besides episodes, it is also possible to mine episode rules. Episode rules are generally used to predict the occurrence of events, namely the consequents of the rules (Achar, Laxman, Viswanathan, & Sastry, 2012; Mannila, Toivonen, & Verkamo, 1997). Predicting events can be viewed as a way to make recommendations (Letham, Rudin, & Madigan, 2013), i.e., by recommending the predicted events or other events known to be reactions to this prediction. Let us consider some examples. In the context of web user behavior analysis, rules can be discovered from interaction logs (Laxman, Tankasali, & White, 2008) and used to predict user behavior. This prediction can in turn be used to recommend the pages to be prefetched by the browser. In the context of road traffic analysis (Cho, Zheng, Wu, & Chen, 2008), rules may be used to predict a traffic jam on a particular road. Recommendations can then be made to limit the traffic flow from specific roads.
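To make the distinction between serial and parallel episodes concrete, here is a minimal Python sketch. It assumes an event sequence stored as (event type, timestamp) pairs sorted by time, and it treats an occurrence as lying inside a half-open time window [start, end); the function names and this window convention are illustrative assumptions, not part of the cited formalisms.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Event = Tuple[str, int]  # (event type, timestamp); the sequence is assumed sorted by timestamp


def events_in_window(seq: Sequence[Event], start: int, end: int) -> List[str]:
    """Event types whose timestamps fall in the half-open window [start, end)."""
    return [etype for etype, t in seq if start <= t < end]


def serial_occurs(seq: Sequence[Event], episode: Sequence[str], start: int, end: int) -> bool:
    """Serial episode: its event types must appear in this exact order inside the window."""
    matched = 0
    for etype in events_in_window(seq, start, end):
        # Greedy left-to-right matching is sufficient for a subsequence test.
        if matched < len(episode) and etype == episode[matched]:
            matched += 1
    return matched == len(episode)


def parallel_occurs(seq: Sequence[Event], episode: Sequence[str], start: int, end: int) -> bool:
    """Parallel episode: its event types must all appear in the window, in any order."""
    available = Counter(events_in_window(seq, start, end))
    needed = Counter(episode)
    return all(available[etype] >= count for etype, count in needed.items())


seq = [("A", 1), ("B", 2), ("C", 4), ("A", 5), ("B", 7)]
print(serial_occurs(seq, ["A", "C"], 0, 5))    # True: A at t=1, then C at t=4
print(serial_occurs(seq, ["C", "A"], 0, 5))    # False: the only A after C is at t=5
print(parallel_occurs(seq, ["C", "A"], 0, 5))  # True: order does not matter
```

The same window contains the parallel episode {C, A} but not the serial episode C → A, which is exactly the difference between a total order and no order.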
When analyzing fault log data in a manufacturing plant (Laxman, Sastry, Shadid, & Unnikrishnan, 2009), rules can be used to predict the final state of products, so that solutions can be recommended to improve the plant throughput.
Since its introduction by Mannila et al. (1997), the discovery of episodes and episode rules in an event sequence has been an important and active field in pattern mining (Gan & Dai, 2011; Tatti & Cule, 2012; Zhu, Wang, He, Li, Wang, & Shi, 2010).
We introduce here the two challenges we take up in this paper in the context of event prediction through the mining and use of episode rules. We also illustrate these challenges with examples of concrete requirements.
Challenge 1: The rule mining task, whether for association rules (Agrawal, Imieliński, & Swami, 1993) or episode rules (Mannila et al., 1997), is usually decomposed into two sub-tasks (Gan & Dai, 2011; Mannila, Toivonen, & Verkamo, 1997). The first one is the discovery of frequent itemsets or frequent episodes (which are, by definition, made up of relatively close events), i.e., those with a support higher than a predefined threshold. The second one is the generation of rules from these frequent itemsets or episodes, under the condition that their confidence exceeds a predefined threshold (Agrawal et al., 1993). Rules are generated by considering some items of the itemset, or the last events of the episode, as the consequent of the rule, and the remaining items/events as the antecedent. Since the second sub-task is quite straightforward, most research focuses on the first one: the extraction of frequent itemsets or episodes. Relying on this first sub-task places the consequent of the mined rules close to the antecedent in the event sequence (i.e., at a relatively small temporal distance). These rules thus make it possible to predict events that are temporally close to the antecedent. This characteristic is relevant in many applications, such as the prediction of user behavior on the web (Laxman et al., 2008), as presented above, or the prediction of customer behavior in stores (Nakahara & Yada, 2012). Furthermore, mining rules dedicated to the prediction of temporally close events has the advantage of limiting the running time of the mining process, as a relatively small span is used.
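The second sub-task can be sketched in a few lines of Python for serial episodes, taking the last event as the consequent. This is a minimal illustration under stated assumptions: the first sub-task (support counting, for example over sliding windows or minimal occurrences) is assumed to have been done elsewhere, the supports in `support` are hypothetical toy values, and `generate_rules` is an illustrative name rather than a function from the cited works.

```python
from typing import Dict, List, Tuple

Episode = Tuple[str, ...]          # serial episode: event types in temporal order
Rule = Tuple[Episode, str, float]  # (antecedent, consequent, confidence)


def generate_rules(support: Dict[Episode, int], min_sup: int, min_conf: float) -> List[Rule]:
    """Sub-task 2: turn frequent serial episodes into rules whose consequent is the last event."""
    # Sub-task 1 is assumed done elsewhere: `support` holds precomputed supports.
    frequent = {ep: s for ep, s in support.items() if s >= min_sup}
    rules: List[Rule] = []
    for episode, sup in frequent.items():
        if len(episode) < 2:
            continue  # a rule needs a non-empty antecedent and a consequent
        antecedent, consequent = episode[:-1], episode[-1]
        ant_sup = frequent.get(antecedent)
        if not ant_sup:
            continue  # antecedent missing or infrequent: confidence cannot be computed here
        confidence = sup / ant_sup
        if confidence >= min_conf:
            rules.append((antecedent, consequent, confidence))
    return rules


# Hypothetical supports, for illustration only.
support = {("A",): 10, ("B",): 8, ("A", "B"): 6, ("A", "B", "C"): 3, ("A", "C"): 2}
for rule in generate_rules(support, min_sup=3, min_conf=0.5):
    print(rule)
# (('A',), 'B', 0.6)
# (('A', 'B'), 'C', 0.5)
```

Because every rule is read off a frequent episode, its antecedent and consequent can be no further apart than the span used to count that episode, which is why this scheme yields temporally close consequents.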
However, in some particular applications, it is preferable to predict temporally distant events. This is the case, for example, when one needs time to react before the predicted event occurs. To illustrate this, let us consider the framework of social networks and blogs. The flow of messages and posts can be viewed as a single sequence of messages, and episode rules can be mined and used to predict what will be said. In banking and finance, an interesting prediction is that a customer will go into serious debt. The earlier this prediction is made, the more useful it is, as the bank will have time to react and prevent this event from occurring. Let us consider again the example of fault log data analysis in a manufacturing plant (Laxman et al., 2009). Suppose that a product needs twenty days to be produced, with several components added or particular processes performed each day. Episode rules can be extracted from the fault log sequence and used to predict the final state of the product. The plant manager may be interested in predicting a failure (a distant event) in the final state of the product early in the production process, in order to be able to rectify the potential errors.
In this work, we focus on the mining of episode rules with the aim of predicting distant events. We assume that, to predict distant events, the rules mined have to reflect a distant relationship between the antecedent and the consequent. We thus face the challenge of mining episode rules with a distant consequent, which we will refer to as distant episode rules.
In the banking domain, an example of such a rule is R: [credit card problem, attractive proposition of the concurrence, bad relation with consultant] → [close account], with a temporal distance of 15 days between the antecedent and the consequent. This rule reflects the fact that when a customer has a problem with his/her credit card, then receives offers from a competing bank, and in addition has a bad relationship with his/her bank consultant, the customer will probably close his/her bank account 15 days later. This rule is useful, as the [close account] event can be predicted about 15 days before the account is actually closed, so the bank has time to act to prevent this event. By contrast, the rule R′: [credit card problem, attractive proposition of the concurrence, bad relation with consultant, not replying customer mails, bad telephone conversation] → [close account], with a temporal distance of 2 days between the antecedent and the consequent, is not useful, as the prediction is made too late and the bank will not have time to act to prevent the closure of the account. Notice that R′ differs from R only by the two additional events at the end of its antecedent.
To the best of our knowledge, mining distant episode rules has never been proposed in the literature. In addition, traditional episode rule mining algorithms are designed to mine episode rules with a consequent temporally close to the antecedent, mainly due to a small span constraint. To mine rules with a distant consequent using these algorithms, a post-processing step has to be run: the rules formed are filtered to keep only those whose consequent may occur far from the antecedent. However, it is known that incorporating the constraints (distance constraints in our case) within the rule mining algorithm is more efficient, in terms of running time, than running a post-processing step (Srikant, Vu, & Agrawal, 1997). Our challenge thus lies in designing an algorithm that integrates the distance within the mining process.
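The post-processing route described above can be sketched in Python using the two banking rules R and R′ from the example. The sketch assumes that each mined rule already carries a single temporal distance (how that distance is measured over the rule's occurrences is not specified in this excerpt), and the event labels are shortened, hypothetical identifiers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EpisodeRule:
    antecedent: Tuple[str, ...]
    consequent: str
    distance: int  # hypothetical: days between the end of the antecedent and the consequent


def keep_distant(rules: List[EpisodeRule], min_distance: int) -> List[EpisodeRule]:
    """Post-processing filter: discard rules whose consequent is too close to the antecedent.

    It only runs after every rule has been mined, so the work spent on the
    discarded (close) rules is wasted.
    """
    return [rule for rule in rules if rule.distance >= min_distance]


base = ("card_problem", "competitor_offer", "bad_relation")
r = EpisodeRule(base, "close_account", 15)
r_prime = EpisodeRule(base + ("unanswered_mails", "bad_call"), "close_account", 2)
print(keep_distant([r, r_prime], min_distance=10))  # only R survives
```

The filter itself is trivial; the cost lies upstream, since every close rule such as R′ must first be generated before it can be thrown away.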
Challenge 2: In addition to predicting distant events, we aim at making this prediction as early as possible, in order to maximize the time available to react once the consequent is predicted. Let us consider the previously presented rule R: [credit card problem, attractive proposition of the concurrence, bad relation with consultant] → [close account], with a temporal distance of 15 days between the antecedent and the consequent. Let us take another rule R′: [credit card problem, attractive proposition of the concurrence] → [close account], with a temporal distance of 20 days between the antecedent and the consequent, whose antecedent is made up of only the first two events of the antecedent of R. If R′ predicts the [close account] event as reliably as R, we consider R′ more useful than R. Indeed, the temporal distance between the antecedent and the consequent is larger in R′ than in R: with R′, the consequent is predicted 20 days before its occurrence, instead of 15 days with R. We thus consider that a rule with a smaller antecedent, i.e., with an antecedent made up of the first events of the antecedent of another rule (as in the example above), is more interesting than the rule with the larger antecedent, as this characteristic increases the temporal distance between the antecedent and the consequent.
Notice that a third rule R′′: [attractive proposition of the concurrence, bad relation with consultant] → [close account], with a temporal distance of 15 days between the antecedent and the consequent, whose antecedent is made up of only the last two events of the antecedent of R, is not, from our point of view, more interesting than R: the moment at which the consequent can be predicted is the same whether R or R′′ is used.
We thus aim at tackling the challenge of mining episode rules with an antecedent as small as possible, i.e., one that contains the smallest number of events sufficient to predict the consequent. We call these rules essential rules. Traditional algorithms form rules with an antecedent that is minimal in time (through the minimal occurrence frequency and a span that bounds the antecedent), but they cannot guarantee that the antecedent is minimal in number of events. To obtain such essential rules, they have to rely, once more, on a post-processing step that filters the large set of extracted rules, which is highly time-consuming.
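The post-processing filter for essential rules mentioned above (the step DEER is designed to avoid) can be sketched in Python. Following the R / R′ / R′′ discussion, the sketch assumes that only prefixes of an antecedent matter, since dropping trailing events lets the prediction fire earlier while dropping leading events does not. The confidence values and event labels are hypothetical toy inputs, and `essential_rules` is an illustrative name.

```python
from typing import Dict, List, Tuple

Antecedent = Tuple[str, ...]
RuleKey = Tuple[Antecedent, str]  # (antecedent, consequent)


def essential_rules(confidence: Dict[RuleKey, float], min_conf: float) -> List[RuleKey]:
    """Post-processing filter: keep a confident rule only if no shorter prefix of its
    antecedent already predicts the same consequent with sufficient confidence."""
    confident = {key for key, conf in confidence.items() if conf >= min_conf}
    kept = []
    for antecedent, consequent in confident:
        # Only prefixes are checked (assumption based on the example of rule R'' above).
        has_confident_prefix = any(
            (antecedent[:k], consequent) in confident for k in range(1, len(antecedent))
        )
        if not has_confident_prefix:
            kept.append((antecedent, consequent))
    return sorted(kept)  # sets are unordered; sort for a deterministic result


# Hypothetical confidences, for illustration only.
confidence = {
    (("card_problem",), "close_account"): 0.40,
    (("card_problem", "competitor_offer"), "close_account"): 0.85,
    (("card_problem", "competitor_offer", "bad_relation"), "close_account"): 0.90,
}
print(essential_rules(confidence, min_conf=0.8))
# [(('card_problem', 'competitor_offer'), 'close_account')]
```

With a confidence threshold of 0.8, the three-event rule is discarded because its two-event prefix is already confident, mirroring the preference for R′ over R.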
Fig. 1 presents a rule that we aim to extract (shown in blue): R: [credit card problem, attractive proposition of the concurrence] → [close account]. The temporal distance between the antecedent and the consequent is 20 days. This rule is essential.
To face these two challenges of mining distant and essential episode rules, while avoiding the time-consuming post-processing step used in traditional mining algorithms, we propose a new episode rule mining algorithm, DEER (Distant and Essential Episode Rules). DEER relies neither on an episode mining step nor on a post-processing step to form episode rules. The originality of DEER lies in the moment at which the consequent of the rule is identified: the consequent is identified at an early stage, which guarantees that it is temporally distant from the antecedent. This feature of the mining process also makes it possible to evaluate the confidence of the rules during the mining process, and thus to form essential rules without relying on a post-processing step. Furthermore, we expect the early identification of the consequent to significantly decrease the running time, as many candidate occurrences of the rule will be filtered out early in the mining process. Finally, we choose to mine rules whose consequent contains only one event, which is common in the rule mining task (Agrawal et al., 1993). Notice that DEER can easily be extended to mine rules with a consequent made up of several events.
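To illustrate why identifying the consequent first guarantees a distant consequent, the Python sketch below anchors the search on the occurrences of a fixed consequent event and only looks for candidate antecedent events at least a minimal gap earlier. It is a simplified illustration, not DEER itself: the parameters `gap` and `span`, the window bounds, and the counting step are assumptions made for this example, and DEER's actual windowing and confidence computation are described in Section 3 and are not reproduced here.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Event = Tuple[str, int]  # (event type, timestamp), sorted by timestamp


def antecedent_windows(seq: Sequence[Event], consequent: str, gap: int, span: int) -> List[List[str]]:
    """For each occurrence of `consequent` at time t, collect the event types observed in
    [t - gap - span, t - gap): every candidate antecedent event then precedes the
    consequent by at least `gap` time units (hypothetical parameters)."""
    windows = []
    for etype, t in seq:
        if etype == consequent:
            lo, hi = t - gap - span, t - gap
            windows.append([e for e, u in seq if lo <= u < hi])
    return windows


seq = [("card", 1), ("offer", 3), ("call", 10), ("close", 21),
       ("card", 30), ("offer", 31), ("close", 50)]
windows = antecedent_windows(seq, "close", gap=10, span=10)
print(windows)  # [['card', 'offer', 'call'], ['card', 'offer']]
# For each event type, the number of consequent occurrences whose window contains it:
print(Counter(e for w in windows for e in set(w)))
```

Since candidates are only ever drawn from windows that end `gap` time units before a consequent occurrence, no close rule is generated in the first place, in contrast with the post-processing approach.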
A simpler version of DEER was introduced in Fahed, Brun, and Boyer (2014a, 2014b) and Fahed (2016). This paper presents a more comprehensive version of DEER, with the associated challenges, definitions, detailed examples, theoretical comparisons with state-of-the-art works, and extensive experiments.
The rest of this paper is organized as follows: Section 2 presents related work on episode and episode rule mining, together with the associated concepts. DEER is introduced in Section 3, followed by a comparative discussion in Section 4. Experimental results are presented in Section 5. We conclude and provide some perspectives in Section 6.
Related works
This section is dedicated to the definition of episodes, episode rules and some related concepts. It also focuses on the presentation of some works from the literature that deal with episode and episode rules mining.
How to mine essential episode rules with distant consequents?
In this section, we first present in detail the limitations of traditional algorithms and introduce the way we propose to mine essential episode rules with distant consequents through the algorithm DEER; we then present the new concepts and redefinitions on which DEER relies, and finally we explain DEER in detail.
Comparative discussion: efficiency of DEER in terms of number of extracted rules
In this section, we demonstrate the efficiency of DEER, in particular the impact of the minimal gap constraint (represented by the sub-window Win_invisible) and of the position of the consequent in the sub-window Win_consq.
Concerning the goal of mining essential rules (with a small antecedent and a distant consequent), we present how DEER behaves, in terms of the number of rules mined at the end of the mining process, relying on three examples of confident rules and compared to the behavior of a
Experimental results
This section is dedicated to the evaluation of the DEER algorithm, in comparison with two state-of-the-art algorithms. The evaluations are run on synthetic and real-world datasets. Experiments are performed on a computer with a 2.3 GHz Intel Core i7 processor and 16 GB of memory, running OS X. All the algorithms are implemented in Java.
Conclusion and perspectives
In this paper, we have proposed an original algorithm that mines essential episode rules (with distant consequents and minimal antecedents) learned from an event sequence. The originality of the algorithm lies in the fact that it directly mines serial episode rules, without relying on any episode extraction phase, by determining the distant consequent at an early stage. In addition to the classical support and confidence thresholds, a new confidence measure, the temporal confidence, was proposed
References (57)
- et al. Pattern-growth based frequent serial episode discovery. Data & Knowledge Engineering (2013).
- et al. Influencer events in episode rules: a way to impact the occurrence of events. Procedia Computer Science (2015).
- et al. Efficient mining of frequent episodes from complex sequences. Information Systems (2008).
- et al. Efficient discovery of periodic-frequent patterns in very large databases. Journal of Systems and Software (2016).
- et al. Applying the maximum utility measure in high utility sequential pattern mining. Expert Systems with Applications (2014).
- et al. Discovering utility-based episode rules in complex event sequences. Expert Systems with Applications (2015).
- et al. Constraint based temporal event sequence mining for glioblastoma survival prediction. Journal of Biomedical Informatics (2016).
- et al. An efficient algorithm for mining high utility patterns from incremental databases with one database scan. Knowledge-Based Systems (2017).
- et al. Discovering injective episodes with general partial orders. Data Mining and Knowledge Discovery (2012).
- et al. Mining association rules between sets of items in large databases. ACM SIGMOD Record (1993).
- Fast algorithms for mining association rules. Proceedings of the 20th International Conference on Very Large Data Bases (VLDB).
- Online frequent episode mining. IEEE 31st International Conference on Data Engineering (ICDE).
- Mining precise-positioning episode rules from event sequences. IEEE 33rd International Conference on Data Engineering (ICDE).
- Enumerating non-redundant association rules using satisfiability. Pacific-Asia Conference on Knowledge Discovery and Data Mining.
- Mining high utility itemsets. Third IEEE International Conference on Data Mining (ICDM 2003).
- Early prediction of heart diseases using data mining techniques. Caribbean Journal of Science and Technology.
- A tree-based approach for event prediction using episode rules over event streams. Database and Expert Systems Applications.
- Marbles: Mining association rules buried in long event sequences. Statistical Analysis and Data Mining: The ASA Data Science Journal.
- Prédire et influencer l’apparition des événements dans une séquence complexe [Predicting and influencing the occurrence of events in a complex sequence]. Ph.D. thesis.
- Efficient discovery of episode rules with a minimal antecedent and a distant consequent. International Joint Conference on Knowledge Discovery, Knowledge Engineering, and Knowledge Management.
- Episode rules mining algorithm for distant event prediction. Research Report.
- PHM: mining periodic high-utility itemsets. Industrial Conference on Data Mining.
- A survey of sequential pattern mining. Data Science and Pattern Recognition.
- A survey of itemset mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery.
- Fast mining of non-derivable episode rules in complex sequences. Modeling Decisions for Artificial Intelligence.
- What happens next? Event prediction using a compositional neural network model. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence.
Cited by (14)
Discovering event episodes from sequences of online news articles: A time-adjoining frequent itemset-based clustering method
2020, Information and Management
A survey of episode mining
2023, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Natural Exponent Inertia Weight-based Particle Swarm Optimization for Mining Serial Episode Rules from Event Sequences
2023, IETE Journal of Research
Pattern discovery in colored strings
2021, ACM Journal of Experimental Algorithmics
1 This work was prepared while the first author was at Lorraine University (LORIA / KIWI Team). The first author is now at IMT-Atlantique, Lab-STICC / DECIDE Team, University of Bretagne Loire, F-29238 Brest, France.