Abstract
Owing to important applications such as mining web page traversal sequences, many algorithms have been introduced in the area of sequential pattern mining over the last decade, most of which have also been modified to support concise representations such as closed, maximal, incremental, or hierarchical sequences. This article investigates these algorithms by introducing a taxonomy that classifies the sequential pattern-mining techniques in the literature according to the key features they support, with web usage mining as an application. The classification aims to improve understanding of sequential pattern-mining problems, the current status of the available solutions, and the direction of research in this area. The article also attempts to provide a comparative performance analysis of many of the key techniques and discusses theoretical aspects of the categories in the taxonomy.
Index Terms
- A taxonomy of sequential pattern mining algorithms