Abstract
This paper focuses on the modeling and the implementation as a multi-objective optimization problem of a Pittsburgh classification rule mining algorithm adapted to large and imbalanced datasets, as encountered in hospital data. We associate to this algorithm an original post-processing method based on ROC curve to help the decision maker to choose the most interesting rules. After an introduction to problems brought by hospital data such as class imbalance, volumetry or inconsistency, we present MOCA-I - a Pittsburgh modelization adapted to this kind of problems. We propose its implementation as a dominance-based local search in opposition to existing multi-objective approaches based on genetic algorithms. Then we introduce the post-processing method to sort and filter the obtained classifiers. Our approach is compared to state-of-the-art classification rule mining algorithms, giving as good or better results, using less parameters. Then it is compared to C4.5 and C4.5-CS on hospital data with a larger set of attributes, giving the best results.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
International classification of diseases; http://www.who.int/classifications/icd/en/
References
Fernández, A., Garciá, S., Luengo, J., Bernadó-Mansilla, E., Herrera, F.: Genetics-based machine learning for rule induction: state of the art, taxonomy, and comparative study. IEEE Trans. Evol. Comput. 14(6), 913–941 (2010)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco (1993)
Chawla, N.V.: Data mining for imbalanced datasets: an overview. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, 2nd edn, pp. 875–886. Springer, New York (2010)
Jo, T., Japkowicz, N.: Class imbalances versus small disjuncts. ACM SIGKDD Explor. Newsl. 6(1), 40–49 (2004)
Geng, L., Hamilton, H.J.: Interestingness measures for data mining: a survey. ACM Comput. Surv. (CSUR) 38(3), 1–32 (2006)
Ohsaki, M., Abe, H., Tsumoto, S., Yokoi, H., Yamaguchi, T.: Evaluation of rule interestingness measures in medical knowledge discovery in databases. Artif. Intell. Med. 41, 177–196 (2007)
Greco, S., Pawlak, Z., Slowiński, R.: Can bayesian confirmation measures be useful for rough set decision rules? Eng. Appl. Artif. Intell. 17(4), 345–361 (2004)
Bayardo, J., Agrawal, R.: Mining the most interesting rules. In: Proceedings of the Fifth ACM SIGKDD, ser. KDD ’99, pp. 145–154 (1999)
Rissanen, J.: Modeling by shortest data description. Automatica 14(5), 465–471 (1978)
Reynolds, A., de la Iglesia, B.: Rule induction for classification using multi-objective genetic programming. In: Obayashi, S., Deb, K., Poloni, C., Hiroyasu, T., Murata, T. (eds.) EMO 2007. LNCS, vol. 4403, pp. 516–530. Springer, Heidelberg (2007)
Bacardit, J., Stout, M., Hirst, J.D., Sastry, K., Llorà, X., Krasnogor, N.: Automated alphabet reduction method with evolutionary algorithms for protein structure prediction. In: GECCO, pp. 346–353 (2007)
Corne, D., Dhaenens, C., Jourdan, L.: Synergies between operations research and data mining: the emerging use of multi-objective approaches. Eur. J. Oper. Res. 221(3), 469–479 (2012)
Srinivasan, S., Ramakrishnan, S.: Evolutionary multi objective optimization for rule mining: a review. Artif. Intell. Rev. 36(3), 205–248 (2011)
Coello Coello, C.A, Dhaenens, C., Jourdan, L. (eds.): Advances in Multi-Objective Nature Inspired Computing. SCI, vol. 272. Springer, Heidelberg (2010)
Casillas, J., Martínez, P., Benítez, A.: Learning consistent, complete and compact sets of fuzzy rules in conjunctive normal form for regression problems. Soft Comput. (A Fusion of Foundations, Methodologies and Applications) 13, 451–465 (2009)
Liefooghe, A., Humeau, J., Mesmoudi, S., Jourdan, L., Talbi, E.-G.: On dominance-based multiobjective local search: design, implementation and experimental analysis on scheduling and traveling salesman problems. J. Heuristics 18, 317–352 (2012)
Liefooghe, A., Jourdan, L., Talbi, E.-G.: A software framework based on a conceptual unified model for evolutionary multiobjective optimization: paradiseo-moeo. Eur. J. Oper. Res. 209(2), 104–112 (2011)
Fawcett, T.: An introduction to roc analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006)
Alcalá-Fdez, J., et al.: Keel: a software tool to assess evolutionary algorithms for data mining problems. Soft Comput. (A Fusion of Foundations, Methodologies and Applications) 13, 307–318 (2009)
Plantevit, M., Laurent, A., Laurent, D., Teisseire, M., Choong, Y.W.: Mining multidimensional and multilevel sequential patterns. ACM TKDD 4(1), 1–37 (2010)
Zhang, J., Bala, J.W., Hadjarian, A., Han, B.: Learning to rank cases with classification rules. In: Fürnkranz, J., Hüllermeier, E. (eds.) Preference Learning, pp. 155–177. Springer, Heidelberg (2011)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jacques, J., Taillard, J., Delerue, D., Jourdan, L., Dhaenens, C. (2013). MOCA-I: Discovering Rules and Guiding Decision Maker in the Context of Partial Classification in Large and Imbalanced Datasets. In: Nicosia, G., Pardalos, P. (eds) Learning and Intelligent Optimization. LION 2013. Lecture Notes in Computer Science(), vol 7997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-44973-4_5
Download citation
DOI: https://doi.org/10.1007/978-3-642-44973-4_5
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-44972-7
Online ISBN: 978-3-642-44973-4
eBook Packages: Computer ScienceComputer Science (R0)