Mining sequential patterns for classification

Fradkin, Dmitriy; Mörchen, Fabian

doi:10.1007/s10115-014-0817-0

Regular Paper
Published: 03 January 2015

Volume 45, pages 731–749 (2015)

Knowledge and Information Systems

Dmitriy Fradkin¹ &
Fabian Mörchen²

2056 Accesses
53 Citations

Abstract

Although a number of efficient sequential pattern mining algorithms have been developed over the years, they can still take a long time and produce a huge number of patterns, many of which are redundant. These properties are especially frustrating when the goal of pattern mining is to find patterns for use as features in classification problems. In this paper, we describe BIDE-Discriminative, a modification of BIDE that uses class information for direct mining of predictive sequential patterns. We then extensively evaluate, on nine real-life datasets, the different ways in which the basic BIDE-Discriminative can be used in real multi-class classification problems, including 1-versus-rest and model-based search tree approaches. The results of our experiments show that 1-versus-rest provides an efficient solution with good classification performance.

Author information

Authors and Affiliations

Siemens Corporate Technology, 755 College Rd East, Princeton, NJ, 08540, USA
Dmitriy Fradkin
Amazon, Seattle, WA, USA
Fabian Mörchen

Corresponding author

Correspondence to Dmitriy Fradkin.

About this article

Cite this article

Fradkin, D., Mörchen, F. Mining sequential patterns for classification. Knowl Inf Syst 45, 731–749 (2015). https://doi.org/10.1007/s10115-014-0817-0

Received: 15 May 2014
Revised: 08 October 2014
Accepted: 26 December 2014
Published: 03 January 2015
Issue Date: December 2015
DOI: https://doi.org/10.1007/s10115-014-0817-0
