Subgroup Discovery Algorithms: A Survey and Empirical Evaluation

  • Survey
  • Journal of Computer Science and Technology

Abstract

Subgroup discovery is a data mining technique that discovers interesting associations among variables with respect to a property of interest. Existing subgroup discovery methods employ different strategies for searching, pruning and ranking subgroups, so it is crucial to understand which features of an algorithm matter for generating high-quality subgroups. A number of reviews of subgroup discovery have been conducted. Although they provide a broad overview of some popular methods, they use only a few datasets and evaluation measures, and the existing measures do not allow subgroups to be appraised from all perspectives. Our work performs an extensive analysis of some popular subgroup discovery methods, using a wide range of datasets and defining new measures for subgroup evaluation. The results help in understanding the major subgroup discovery methods, uncovering gaps for further improvement, and selecting a suitable category of algorithms for specific application domains.
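
To make "searching" and "ranking" concrete, the following is a minimal sketch, not the method or the new measures of this paper. It assumes nominal attributes, subgroup descriptions that are conjunctions of attribute=value tests, and weighted relative accuracy (WRAcc), p(Cond) · (p(Class|Cond) − p(Class)), as the quality measure; WRAcc is one standard measure in the subgroup discovery literature (used, for example, by CN2-SD). The search is a plain exhaustive enumeration up to a depth limit, with no pruning; real algorithms add pruning or use heuristic and evolutionary search. The function names, the toy dataset and its attributes (`smoker`, `age`, `risk`) are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Selector = Tuple[str, str]   # hypothetical: an (attribute, value) equality test
Record = Dict[str, str]      # one row of nominal data


def covers(description: Tuple[Selector, ...], record: Record) -> bool:
    """A record belongs to the subgroup if it satisfies every selector (conjunction)."""
    return all(record.get(attr) == val for attr, val in description)


def wracc(description: Tuple[Selector, ...], records: List[Record],
          target: str, positive: str) -> float:
    """Weighted relative accuracy: p(Cond) * (p(Class|Cond) - p(Class))."""
    N = len(records)
    if N == 0:
        return 0.0
    P = sum(1 for r in records if r[target] == positive)
    covered = [r for r in records if covers(description, r)]
    n = len(covered)
    if n == 0:
        return 0.0
    p = sum(1 for r in covered if r[target] == positive)
    return (n / N) * (p / n - P / N)


def top_k_subgroups(records: List[Record], target: str, positive: str,
                    max_depth: int = 2, k: int = 3):
    """Exhaustively enumerate conjunctions of up to max_depth selectors, rank by WRAcc."""
    selectors = sorted({(a, v) for r in records for a, v in r.items() if a != target})
    scored = []
    for depth in range(1, max_depth + 1):
        for desc in combinations(selectors, depth):
            # Two tests on the same attribute can never both hold, so skip them.
            if len({a for a, _ in desc}) < depth:
                continue
            scored.append((wracc(desc, records, target, positive), desc))
    # Highest quality first; ties broken by description for a stable order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:k]


# Hypothetical toy data; the property of interest is risk == "high".
data = [
    {"smoker": "yes", "age": "old",   "risk": "high"},
    {"smoker": "yes", "age": "young", "risk": "high"},
    {"smoker": "yes", "age": "old",   "risk": "high"},
    {"smoker": "no",  "age": "old",   "risk": "low"},
    {"smoker": "no",  "age": "young", "risk": "low"},
    {"smoker": "no",  "age": "young", "risk": "high"},
]

for score, desc in top_k_subgroups(data, "risk", "high"):
    print(f"{score:.3f}", " AND ".join(f"{a}={v}" for a, v in desc))
# 0.167 smoker=yes
# 0.111 age=old AND smoker=yes
# 0.056 age=young AND smoker=yes
```

WRAcc trades off coverage against unusualness: `smoker=yes AND age=old` is as pure as `smoker=yes` but covers fewer records, so it ranks lower. Swapping in a different quality function changes the ranking without touching the search loop, which is why comparisons of subgroup discovery algorithms need to look at both the search strategy and the evaluation measures.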

Author information

Correspondence to Sumyea Helal.

About this article

Cite this article

Helal, S. Subgroup Discovery Algorithms: A Survey and Empirical Evaluation. J. Comput. Sci. Technol. 31, 561–576 (2016). https://doi.org/10.1007/s11390-016-1647-1

  • DOI: https://doi.org/10.1007/s11390-016-1647-1