Sequential Pattern Mining in Multi-Databases via Multiple Alignment

Kum, Hye-Chung; Chang, Joong Hyuk; Wang, Wei

doi:10.1007/s10618-005-0017-3

Published: 20 April 2006

Volume 12, pages 151–180, (2006)

Data Mining and Knowledge Discovery

Hye-Chung Kum¹,
Joong Hyuk Chang² &
Wei Wang¹

Abstract

To efficiently find global patterns from a multi-database, information in each local database must first be mined and summarized at the local level. Then only the summarized information is forwarded to the global mining process. However, conventional sequential pattern mining methods based on support cannot summarize the local information and are ineffective for global pattern mining from multiple data sources. In this paper, we present an alternative local mining approach for finding sequential patterns in the local databases of a multi-database. We propose the theme of approximate sequential pattern mining, roughly defined as identifying patterns approximately shared by many sequences. Approximate sequential patterns can effectively summarize and represent the local databases by identifying the underlying trends in the data. We present a novel algorithm, ApproxMAP, to mine approximate sequential patterns, called consensus patterns, from large sequence databases in two steps. First, sequences are clustered by similarity. Then, consensus patterns are mined directly from each cluster through multiple alignment. We conduct an extensive and systematic performance study over synthetic and real data. The results demonstrate that ApproxMAP is effective and scalable in mining large sequence databases with long patterns. Hence, ApproxMAP can efficiently summarize a local database and reduce the cost of global mining. Furthermore, we present an elegant and uniform model to identify both high vote sequential patterns and exceptional sequential patterns from the collection of consensus patterns from each local database.
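
The two steps named in the abstract (cluster by similarity, then derive a consensus pattern per cluster through alignment) can be illustrated with a toy sketch. This is not the ApproxMAP algorithm itself: the abstract does not specify the distance measure, the clustering method, or the alignment procedure, so everything below is an assumption made for illustration. Sequences are plain strings of single items, similarity is unit-cost edit distance, clustering is a greedy threshold pass, and the "multiple alignment" is a simple alignment of each member against the first member of its cluster. The names `align_to`, `cluster`, `consensus`, `max_dist`, and `min_strength` are hypothetical.

```python
from collections import Counter


def align_to(ref, seq):
    """Align seq to ref by unit-cost edit distance.

    Returns (distance, aligned), where aligned[i] is the item of seq placed
    under ref[i], or None when ref[i] faces a gap.
    """
    n, m = len(ref), len(seq)
    # dp[i][j] = edit distance between ref[:i] and seq[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == seq[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match or substitute
                           dp[i - 1][j] + 1,         # gap under ref[i-1]
                           dp[i][j - 1] + 1)         # extra item in seq
    # Trace back to recover which seq item sits under each ref position.
    aligned = [None] * n
    i, j = n, m
    while i > 0 and j > 0:
        cost = 0 if ref[i - 1] == seq[j - 1] else 1
        if dp[i][j] == dp[i - 1][j - 1] + cost:
            aligned[i - 1] = seq[j - 1]
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return dp[n][m], aligned


def cluster(sequences, max_dist):
    """Greedy clustering (assumed): join the first cluster whose first
    member is within max_dist edits, otherwise start a new cluster."""
    clusters = []
    for seq in sequences:
        for members in clusters:
            if align_to(members[0], seq)[0] <= max_dist:
                members.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


def consensus(members, min_strength):
    """Keep, per position of the first member, the most common aligned item
    if at least min_strength of the cluster's sequences carry it."""
    ref = members[0]
    columns = [Counter() for _ in ref]
    for seq in members:
        _, aligned = align_to(ref, seq)
        for col, item in zip(columns, aligned):
            if item is not None:
                col[item] += 1
    pattern = []
    for col in columns:
        item, count = col.most_common(1)[0]
        if count / len(members) >= min_strength:
            pattern.append(item)
    return pattern


# Toy local database: two underlying trends with small variations.
db = ["ABCD", "ABXD", "ABCDE", "PQRS", "PQS"]
patterns = ["".join(consensus(c, 0.6)) for c in cluster(db, 1)]
print(patterns)
```

On this toy input the sketch yields one pattern per cluster, `ABCD` and `PQS`. The point is the one made in the abstract: a handful of consensus patterns stands in for the whole local database, so only those need to be forwarded to a global mining step.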

Author information

Authors and Affiliations

Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, U.S.A.
Hye-Chung Kum & Wei Wang
Department of Computer Science, Yonsei University, Seoul, 120-749, Korea
Joong Hyuk Chang

Corresponding author

Correspondence to Hye-Chung Kum.

About this article

Cite this article

Kum, HC., Chang, J.H. & Wang, W. Sequential Pattern Mining in Multi-Databases via Multiple Alignment. Data Min Knowl Disc 12, 151–180 (2006). https://doi.org/10.1007/s10618-005-0017-3

Received: 05 April 2005
Accepted: 29 August 2005
Published: 20 April 2006
Issue Date: May 2006
DOI: https://doi.org/10.1007/s10618-005-0017-3
