DualMiner: A Dual-Pruning Algorithm for Itemsets with Constraints

Data Mining and Knowledge Discovery

Abstract

Recently, constraint-based mining of itemsets for questions like “find all frequent itemsets whose total price is at least $50” has attracted much attention. Two classes of constraints, monotone and antimonotone, have been very useful in this area. There exist algorithms that efficiently take advantage of either one of these two classes, but no previous algorithms can efficiently handle both types of constraints simultaneously. In this paper, we present DualMiner, the first algorithm that efficiently prunes its search space using both monotone and antimonotone constraints. We complement a theoretical analysis and proof of correctness of DualMiner with an experimental study that shows the efficacy of DualMiner compared to previous work.
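The two constraint classes named in the abstract can be illustrated with a small sketch. This is not the DualMiner algorithm itself, only a brute-force enumeration that shows what the constraints mean. The prices, transactions, thresholds, and function names below are hypothetical and invented for illustration. A minimum-support constraint is antimonotone (if an itemset violates it, so does every superset), while "total price is at least 50" is monotone when prices are non-negative (if an itemset satisfies it, so does every superset).

```python
from itertools import combinations

# Hypothetical toy data, invented for illustration only.
PRICES = {"a": 10, "b": 20, "c": 30, "d": 40}
TRANSACTIONS = [
    {"a", "b", "c"},
    {"b", "c", "d"},
    {"a", "c", "d"},
    {"b", "c"},
]


def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def is_frequent(itemset, min_support=2):
    """Antimonotone constraint: if a set fails, every superset fails too."""
    return support(itemset) >= min_support


def is_expensive(itemset, min_total=50):
    """Monotone constraint (prices are non-negative):
    if a set passes, every superset passes too."""
    return sum(PRICES[i] for i in itemset) >= min_total


def all_itemsets(items):
    """Enumerate every non-empty subset of `items` (exponential; toy sizes only)."""
    items = sorted(items)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def brute_force_answers():
    """Itemsets satisfying both constraints, found without any pruning."""
    hits = (sorted(s) for s in all_itemsets(PRICES)
            if is_frequent(s) and is_expensive(s))
    return sorted(hits, key=lambda s: (len(s), s))
```

On this toy data, `brute_force_answers()` returns `[['b', 'c'], ['c', 'd']]`. The brute-force loop tests all 15 non-empty itemsets; an algorithm that exploits both closure properties could skip every superset of an infrequent set and could accept the price constraint for every superset of a set that already satisfies it, which is the kind of two-sided pruning the abstract describes.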



Author information

Cristian Bucilă is a Ph.D. student at Cornell University. He received his Bachelor's degree in computer science at the Technical University of Cluj-Napoca, Romania. His current research interests are in data mining.

Johannes Gehrke is an Assistant Professor in the Department of Computer Science at Cornell University. He obtained his Ph.D. in computer science from the University of Wisconsin-Madison in 1999. Johannes’ research interests are in the areas of data mining and novel distributed database technology. Johannes has received a National Science Foundation Career Award, an Alfred P. Sloan Fellowship, an IBM Faculty Award, and the Cornell College of Engineering James and Mary Tien Excellence in Teaching Award. He co-authored the textbook “Database Management Systems” (McGraw-Hill, currently in its third edition).

Daniel Kifer is a Ph.D. student at Cornell University. He received a Bachelor's degree in mathematics and in computer science at New York University. His current research interests are databases and data mining.

Walker White is an assistant professor in the mathematics department at the University of Dallas, a liberal arts college, where he is responsible for developing their new computer science program. He received his Bachelor's degree in mathematics from Dartmouth College and both a Ph.D. in mathematics and a Master's in computer science from Cornell University. His primary research is in mathematical logic and its applications to computer science.


Cite this article

Bucilă, C., Gehrke, J., Kifer, D. et al. DualMiner: A Dual-Pruning Algorithm for Itemsets with Constraints. Data Mining and Knowledge Discovery 7, 241–272 (2003). https://doi.org/10.1023/A:1024076020895

  • DOI: https://doi.org/10.1023/A:1024076020895
