Online mining of frequent sets in data streams with error guarantee

  • Regular Paper
Knowledge and Information Systems

Abstract

For most data stream applications, the volume of data is too large to be stored on persistent devices or to be scanned thoroughly more than once. It is therefore widely accepted that approximate answers are usually sufficient: a good approximation obtained in a timely manner is often better than an exact answer delayed beyond the window of opportunity. Unfortunately, this is not the case for mining frequent patterns over data streams, where algorithms capable of processing data streams online do not conform strictly to a precise error guarantee. Since the quality of approximate answers is as important as their timely delivery, algorithms must be designed to meet both criteria at the same time. In this paper, we propose an algorithm that processes streaming data online while guaranteeing that the support error of frequent patterns stays strictly within a user-specified threshold. Our theoretical and experimental studies show that the algorithm is an effective and reliable method for finding frequent sets in data stream environments when both constraints must be satisfied.

Author information

Corresponding author

Correspondence to Xuan Hong Dang.

About this article

Cite this article

Dang, X.H., Ng, WK. & Ong, KL. Online mining of frequent sets in data streams with error guarantee. Knowl Inf Syst 16, 245–258 (2008). https://doi.org/10.1007/s10115-007-0106-2

  • DOI: https://doi.org/10.1007/s10115-007-0106-2