Abstract
In this paper we present SPADE, a new algorithm for fast discovery of sequential patterns. Existing solutions to this problem make repeated database scans and use complex hash structures that have poor locality. SPADE uses combinatorial properties to decompose the original problem into smaller sub-problems that can be solved independently in main memory using efficient lattice search techniques and simple join operations. All sequences are discovered in only three database scans. Experiments show that SPADE outperforms the best previous algorithm by a factor of two, and by an order of magnitude with some pre-processed data. It also scales linearly with the number of input sequences and with a number of other database parameters. Finally, we discuss how the results of sequence mining can be applied in a real application domain.
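The abstract does not spell out what the "simple join operations" look like. The following minimal sketch assumes a vertical layout in which each item carries an id-list of (sid, eid) pairs (sequence id, event id), and shows how the support of a two-item sequence such as A followed by B might be obtained by joining two id-lists. The toy database and the names `build_idlists`, `temporal_join`, and `support` are hypothetical, chosen for illustration only.

```python
from collections import defaultdict

# Toy database (an assumption for illustration): (sid, eid, items) rows,
# where sid is the sequence id and eid is the event (time) id.
DATABASE = [
    (1, 10, {"A", "B"}),
    (1, 20, {"B"}),
    (1, 30, {"A", "B"}),
    (2, 20, {"A", "C"}),
    (2, 30, {"A", "B", "C"}),
    (3, 10, {"B"}),
    (3, 30, {"A"}),
]


def build_idlists(database):
    """Map each item to its sorted list of (sid, eid) occurrences."""
    idlists = defaultdict(list)
    for sid, eid, items in database:
        for item in items:
            idlists[item].append((sid, eid))
    return {item: sorted(pairs) for item, pairs in idlists.items()}


def temporal_join(first, second):
    """Id-list of 'first -> second': occurrences of `second` that come
    strictly after some occurrence of `first` in the same sequence."""
    earliest = {}
    for sid, eid in first:
        earliest[sid] = min(eid, earliest.get(sid, eid))
    return [(sid, eid) for sid, eid in second
            if sid in earliest and eid > earliest[sid]]


def support(idlist):
    """Number of distinct sequences in which the pattern occurs."""
    return len({sid for sid, _ in idlist})


idlists = build_idlists(DATABASE)
a_then_b = temporal_join(idlists["A"], idlists["B"])
print(a_then_b, support(a_then_b))
```

In this sketch the original database is read only once, to build the id-lists. Each candidate's support then comes from an in-memory join, which is the kind of locality the abstract contrasts with repeated scans and hash structures.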
Cite this article
Zaki, M.J. SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning 42, 31–60 (2001). https://doi.org/10.1023/A:1007652502315