
A taxonomy of sequential pattern mining algorithms

Published: 03 December 2010

Abstract

Driven by important applications such as mining web page traversal sequences, many sequential pattern-mining algorithms have been introduced over the last decade, and most of them have since been modified to support concise representations such as closed, maximal, incremental, or hierarchical sequences. This article investigates these algorithms through a taxonomy that classifies the sequential pattern-mining techniques in the literature by the key features they support, with web usage mining as the application. The classification aims to improve understanding of sequential pattern-mining problems, the current status of the available solutions, and the direction of research in this area. The article also attempts to provide a comparative performance analysis of many of the key techniques and discusses theoretical aspects of the categories in the taxonomy.

