Efficient Mining of Frequent Closed XML Query Pattern

Feng, Jian-Hua; Qian, Qian; Wang, Jian-Yong; Zhou, Li-Zhu

doi:10.1007/s11390-007-9081-z

Regular Paper
Published: 25 September 2007

Volume 22, pages 725–735, (2007)

Journal of Computer Science and Technology


Abstract

Previous research has argued convincingly that a frequent pattern mining algorithm should mine only the closed frequent patterns rather than all frequent patterns, because the closed set is more compact yet still complete and can be mined more efficiently. Once frequent closed XML query patterns are discovered, indexing and caching can be applied effectively to improve query performance. Most previous algorithms for finding frequent patterns rely on a straightforward generate-and-test strategy. In this paper, we present SOLARIA*, an efficient algorithm for mining frequent closed XML query patterns without candidate maintenance or costly tree-containment checking. SOLARIA* uses an efficient sequence mining algorithm to discover frequent tree-structured patterns, replacing expensive containment testing with cheap parent-child checking in sequences. It uses the parent-child relationship constraint to prune unrelated parts of the search space during frequent pattern enumeration. A thorough experimental study on various real-life data demonstrates the efficiency and scalability of SOLARIA* over the previously known alternative. SOLARIA* also scales linearly with respect to the size of the XML queries.
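
To make the notion of a closed pattern concrete, here is a minimal sketch of the definition only: a frequent pattern is closed when no proper super-pattern has the same support. It is not SOLARIA*; it is exactly the kind of naive post-filter over candidates that the abstract says SOLARIA* avoids. The encoding of a query pattern as a set of (parent label, child label) edges, the function name closed_patterns, and the support counts are all assumptions made for illustration, and edge-set inclusion is only a simplified stand-in for tree containment.

```python
from typing import Dict, FrozenSet, Tuple

Edge = Tuple[str, str]      # (parent label, child label); toy encoding, an assumption
Pattern = FrozenSet[Edge]   # simplified stand-in for a query pattern tree


def closed_patterns(support: Dict[Pattern, int], min_sup: int) -> Dict[Pattern, int]:
    """Return the frequent patterns that no proper super-pattern matches in support."""
    # Keep only patterns that meet the minimum support threshold.
    frequent = {p: s for p, s in support.items() if s >= min_sup}
    closed = {}
    for p, s in frequent.items():
        # p is absorbed if some strictly larger frequent pattern has equal support.
        absorbed = any(p < q and s == frequent[q] for q in frequent)
        if not absorbed:
            closed[p] = s
    return closed


# Hypothetical supports, as if counted over a small XML query log.
support = {
    frozenset({("book", "title")}): 3,
    frozenset({("book", "author")}): 4,
    frozenset({("book", "title"), ("book", "author")}): 3,
    frozenset({("book", "title"), ("book", "price")}): 1,
}
result = closed_patterns(support, min_sup=2)
```

With these made-up counts, the single-edge pattern book/title is dropped because the larger pattern containing both title and author has the same support, while book/author stays because its support is strictly higher than that of any super-pattern. The closed set is smaller, yet the support of every frequent pattern can still be read off from it, which is the compactness and completeness the abstract refers to.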

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Jian-Hua Feng, Qian Qian, Jian-Yong Wang & Li-Zhu Zhou

Corresponding author

Correspondence to Jian-Hua Feng.

Additional information

This work is supported in part by the National Natural Science Foundation of China under Grant No. 60573094, the National Grand Fundamental Research 973 Program of China under Grant No. 2006CB303103, the National High Technology Development 863 Program of China under Grant No. 2006AA01A101, and Tsinghua Basic Research Foundation under Grant No. JCqn2005022.

Electronic Supplementary Material

Supplementary material - Chinese Abstract (PDF 83 Kb).

About this article

Cite this article

Feng, JH., Qian, Q., Wang, JY. et al. Efficient Mining of Frequent Closed XML Query Pattern. J Comput Sci Technol 22, 725–735 (2007). https://doi.org/10.1007/s11390-007-9081-z

Received: 12 October 2006
Revised: 03 April 2007
Published: 25 September 2007
Issue Date: September 2007
DOI: https://doi.org/10.1007/s11390-007-9081-z
