SPADE: An Efficient Algorithm for Mining Frequent Sequences

Zaki, Mohammed J.

doi:10.1023/A:1007652502315

Published: January 2001

Machine Learning, Volume 42, pages 31–60 (2001)

Abstract

In this paper we present SPADE, a new algorithm for fast discovery of sequential patterns. Existing solutions to this problem make repeated database scans and use complex hash structures that have poor locality. SPADE uses combinatorial properties to decompose the original problem into smaller sub-problems that can be solved independently in main memory using efficient lattice search techniques and simple join operations. All frequent sequences are discovered in only three database scans. Experiments show that SPADE outperforms the best previous algorithm by a factor of two, and by an order of magnitude with some pre-processed data. It also scales linearly with the number of input sequences and with a number of other database parameters. Finally, we discuss how the results of sequence mining can be applied in a real application domain.
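The abstract mentions "simple join operations" without detail. The following is a minimal sketch of one such join, assuming a vertical layout in which each item maps to a list of (sequence id, event id) pairs. The function names and the toy database are hypothetical and chosen only for illustration; this is not the paper's own code.

```python
from collections import defaultdict

def build_idlists(database):
    """Map each item to a sorted list of (sid, eid) pairs (assumed vertical layout)."""
    idlists = defaultdict(list)
    for sid, events in database.items():
        for eid, items in events:
            for item in items:
                idlists[item].append((sid, eid))
    return {item: sorted(pairs) for item, pairs in idlists.items()}

def temporal_join(idlist_a, idlist_b):
    """Id-list for 'a, later b': keep (sid, eid_b) if some a occurs earlier in the same sid."""
    first_a = {}
    for sid, eid in idlist_a:
        # Only the earliest occurrence of a per sequence matters for the comparison.
        first_a[sid] = min(eid, first_a.get(sid, eid))
    return [(sid, eid) for sid, eid in idlist_b
            if sid in first_a and first_a[sid] < eid]

def support(idlist):
    """Support is the number of distinct input sequences in the id-list."""
    return len({sid for sid, _ in idlist})

# Hypothetical toy database: sid -> list of (eid, itemset) events.
database = {
    1: [(10, {"A", "B"}), (20, {"C"})],
    2: [(15, {"A"}), (25, {"B"}), (30, {"C"})],
    3: [(10, {"C"}), (20, {"A"})],
}
idlists = build_idlists(database)
a_then_c = temporal_join(idlists["A"], idlists["C"])
print(a_then_c, support(a_then_c))
```

Because the join needs only the two id-lists, the support of a candidate sequence can be computed in memory without rescanning the database, which is the kind of property the abstract appeals to.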

Author information

Authors and Affiliations

Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, 12180-3590
Mohammed J. Zaki

About this article

Cite this article

Zaki, M.J. SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning 42, 31–60 (2001). https://doi.org/10.1023/A:1007652502315

Issue Date: January 2001
DOI: https://doi.org/10.1023/A:1007652502315
