Online mining of frequent sets in data streams with error guarantee

Dang, Xuan Hong; Ng, Wee-Keong; Ong, Kok-Leong

doi:10.1007/s10115-007-0106-2

Regular Paper
Published: 22 September 2007

Knowledge and Information Systems, volume 16, pages 245–258 (2008)

Xuan Hong Dang¹,
Wee-Keong Ng¹ &
Kok-Leong Ong²

Abstract

For most data stream applications, the volume of data is too large to be stored on permanent devices or to be scanned more than once. Approximate answers are therefore usually considered sufficient: a good approximation obtained in a timely manner is often better than an exact answer that is delayed beyond the window of opportunity. Unfortunately, this is not the case for mining frequent patterns over data streams, where algorithms capable of processing data streams online do not conform strictly to a precise error guarantee. Since the quality of approximate answers is as important as their timely delivery, algorithms must be designed to meet both criteria at the same time. In this paper, we propose an algorithm that processes streaming data online while guaranteeing that the support error of frequent patterns stays strictly within a user-specified threshold. Our theoretical and experimental studies show that the algorithm is an effective and reliable method for finding frequent sets in data stream environments when both constraints need to be satisfied.

Author information

Authors and Affiliations

School of Computer Engineering, Nanyang Technological University, Nanyang Avenue, Singapore, 639798, Singapore
Xuan Hong Dang & Wee-Keong Ng
School of Engineering and IT, Deakin University, Waurn Ponds, VIC, 3217, Australia
Kok-Leong Ong

Corresponding author

Correspondence to Xuan Hong Dang.

About this article

Cite this article

Dang, X.H., Ng, WK. & Ong, KL. Online mining of frequent sets in data streams with error guarantee. Knowl Inf Syst 16, 245–258 (2008). https://doi.org/10.1007/s10115-007-0106-2

Received: 30 November 2006
Revised: 21 June 2007
Accepted: 14 July 2007
Published: 22 September 2007
Issue Date: August 2008
DOI: https://doi.org/10.1007/s10115-007-0106-2
