Metaheuristics for data mining: survey and opportunities for big data

  • Original Research
  • Annals of Operations Research

Abstract

In the context of big data, many scientific communities aim to provide efficient approaches to accommodate large-scale datasets. This is the case for the machine-learning community and, more generally, the artificial intelligence community. The aim of this article is to explain how data mining problems can be considered as combinatorial optimization problems, and how metaheuristics can be used to address them. Four primary data mining tasks are presented: clustering, association rules, classification, and feature selection. This article follows the publication of a book on this subject in 2016 (Dhaenens and Jourdan in Metaheuristics for big data, Wiley, Hoboken, 2016) and an article published in 4OR (Dhaenens and Jourdan in 4OR 17(2):115–139, 2019); additionally, updated references and an analysis of current trends are presented.

References

  • Abdul-Rahman, S., Bakar, A. A., & Mohamed-Hussein, Z. A. (2013). Optimizing big data in bioinformatics with swarm algorithms. In 2013 IEEE 16th international conference on computational science and engineering (pp. 1091–1095).

  • Abubaker, A., Baharum, A., & Alrefaei, M. (2015). Automatic clustering using multi-objective particle swarm and simulated annealing. PLoS ONE, 10(7), e0130995.

  • Agrawal, R., Imielinski, T., & Swami, A. N. (1993). Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD international conference on management of data (pp. 207–216). ACM Press.

  • Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In VLDB ’94: Proceedings of the 20th international conference on very large data bases (pp. 487–499). Morgan Kaufmann Publishers Inc.

  • Alam, S., Dobbie, G., Koh, Y. S., Riddle, P., & Rehman, S. U. (2014). Research on particle swarm optimization based clustering: A systematic review of literature and techniques. Swarm and Evolutionary Computation, 17, 1–13.

  • Alatas, B., Akin, E., & Karci, A. (2008). Modenar: Multi-objective differential evolution algorithm for mining numeric association rules. Applied Soft Computing, 8(1), 646–656.

  • Alba, E., García-Nieto, J., Jourdan, L., & Talbi, E. G. (2007). Gene selection in cancer classification using pso/svm and ga/svm hybrid algorithms. In IEEE congress on evolutionary computation, 2007. CEC 2007 (pp. 284–290). IEEE.

  • Alshammari, S., Zolkepli, M. B., & Abdullah, R. B. (2020). Genetic algorithm based parallel K-means data clustering algorithm using MapReduce programming paradigm on Hadoop environment (GAPKCA). In Recent advances on soft computing and data mining (SCDM). Advances in Intelligent Systems and Computing, 978, 98–108.

  • Anand, R., Vaid, A., & Singh, P. K. (2009). Association rule mining using multi-objective evolutionary algorithms: Strengths and challenges. In World congress on nature & biologically inspired computing, 2009. NaBIC 2009 (pp. 385–390). IEEE.

  • Baalamurugan, K., & Bhanu, S. V. (2018). An efficient clustering scheme for cloud computing problems using metaheuristic algorithms. Cluster Computing, 22(5), 12917–12927.

  • Bacardit, J., & Butz, M. V. (2007). Data mining in learning classifier systems: Comparing XCS with GAssist. Learning Classifier Systems, 4399, 282–290.

  • Bala, J., Huang, J., Vafaie, H., DeJong, K., & Wechsler, H. (1995). Hybrid learning using genetic algorithms and decision trees for pattern classification. In IJCAI (1) (pp. 719–724).

  • Bandyopadhyay, S., & Maulik, U. (2001). Nonparametric genetic clustering: Comparison of validity indices. IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, 31(1), 120–125.

  • Bandyopadhyay, S., Mukhopadhyay, A., & Maulik, U. (2007). An improved algorithm for clustering gene expression data. Bioinformatics, 23(21), 2859–2865.

  • Barba-González, C., García-Nieto, J., Nebro, A. J., & Aldana-Montes, J. F. (2017). Multi-objective big data optimization with jmetal and spark. In International conference on evolutionary multi-criterion optimization (pp. 16–30). Springer.

  • Barros, R. C., Basgalupp, M. P., de Carvalho, A. C., & Freitas, A. A. (2012). A hyper-heuristic evolutionary algorithm for automatically designing decision-tree algorithms. In Proceedings of the 14th annual conference on genetic and evolutionary computation (pp. 1237–1244). ACM.

  • Basgalupp, M. P., Barros, R. C., & Podgorelec, V. (2015). Evolving decision-tree induction algorithms with a multi-objective hyper-heuristic. In Proceedings of the 30th annual ACM symposium on applied computing (pp. 110–117). ACM.

  • Begum, S., Chakraborty, S., Banerjee, A., Das, S., Sarkar, R., & Chakraborty, D. (2018). Gene selection for diagnosis of cancer in microarray data using memetic algorithm. In V. Bhateja, C. A. Coello Coello, S. C. Satapathy, & P. K. Pattnaik (Eds.), Intelligent engineering informatics (pp. 441–449). Springer.

  • Bezdek, J. C., Boggavarapu, S., Hall, L. O., & Bensaid, A. (1994). Genetic algorithm guided clustering. In International conference on evolutionary computation (pp. 34–39).

  • Bong, C. W., & Rajeswari, M. (2011). Multi-objective nature-inspired clustering and classification techniques for image segmentation. Applied Soft Computing Journal, 11(4), 3271–3282.

  • Borges, H. B., & Nievola, J. C. (2005). Attribute selection methods comparison for classification of diffuse large b-cell lymphoma. In Proceedings. Fourth international conference on machine learning and applications, 2005 (pp. 6–pp). IEEE.

  • Boryczka, U., & Kozak, J. (2010). Ant colony decision trees—A new method for constructing decision trees based on ant colony optimization. In J.-S. Pan, S.-M. Chen, & N.T. Nguyen (Eds.), Computational collective intelligence. Technologies and applications (pp. 373–382). Springer.

  • Boryczka, U., & Kozak, J. (2015). Enhancing the effectiveness of ant colony decision tree algorithms by co-learning. Applied Soft Computing, 30, 166–178.

  • Bursa, M., Lhotska, L., & Macas, M. (2007). Hybridized swarm metaheuristics for evolutionary random forest generation. In 7th international conference on hybrid intelligent systems, 2007. HIS 2007 (pp. 150–155).

  • Can, U., & Alatas, B. (2017). Automatic mining of quantitative association rules with gravitational search algorithm. International Journal of Software Engineering and Knowledge Engineering, 27(03), 343–372.

  • Cano, A., Luna, J. M., & Ventura, S. (2013). High performance evaluation of evolutionary-mined association rules on gpus. The Journal of Supercomputing, 66(3), 1438–1461.

  • Che, D., Safran, M., & Peng, Z. (2013). From big data to big data mining: Challenges, issues, and opportunities. In B. Hong, X. Meng, L. Chen, W. Winiwarter, & W. Song (Eds.), Database systems for advanced applications (pp. 1–15). Springer.

  • Corne, D., Dhaenens, C., & Jourdan, L. (2012). Synergies between operations research and data mining: The emerging use of multi-objective approaches. European Journal of Operational Research, 221(3), 469–479.

  • Cowgill, M., Harvey, R., & Watson, L. (1999). Genetic algorithm approach to cluster analysis. Computers and Mathematics with Applications, 37(7), 99–108. https://doi.org/10.1016/S0898-1221(99)00090-5

  • Dankolo, M. N., Radzi, N. H. M., Sallehuddin, R., & Mustaffa, N. H. (2017). A study of metaheuristic algorithms for high dimensional feature selection on microarray data. In AIP conference proceedings (vol. 1905, p. 040010). AIP Publishing.

  • Dean, J., & Ghemawat, S. (2008). Mapreduce: Simplified data processing on large clusters. Communications of the ACM, 51(1), 107–113.

  • Defays, D. (1977). An efficient algorithm for a complete link method. The Computer Journal, 20(4), 364–366.

  • de la Iglesia, B., Reynolds, A., & Rayward-Smith, V. J. (2005). Developments on a multi-objective metaheuristic (momh) algorithm for finding interesting sets of classification rules. In C. A. Coello Coello, A. Hernández Aguirre, & E. Zitzler (Eds.), Evolutionary multi-criterion optimization (pp. 826–840). Springer.

  • de la Iglesia, B., Richards, G., Philpott, M. S., & Rayward-Smith, V. J. (2006). The application and effectiveness of a multi-objective metaheuristic algorithm for partial classification. European Journal of Operational Research, 169(3), 898–917.

  • del Jesus, M. J., Gamez, J. A., Gonzalez, P., & Puerta, J. M. (2011). On the discovery of association rules by means of evolutionary algorithms. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(5), 397–415.

  • Derouiche, A., Layeb, A., & Habbas, Z. (2020). Metaheuristics guided by the apriori principle for association rule mining: Case study-CRO metaheuristic. International Journal of Organizational and Collective Intelligence (IJOCI), 10(3), 14–37.

  • Dhaenens, C., & Jourdan, L. (2016). Metaheuristics for big data. Wiley.

  • Dhaenens, C., & Jourdan, L. (2019). Metaheuristics for data mining: Survey and opportunities for big data. 4OR, 17(2), 115–139.

  • Diao, R., & Shen, Q. (2015). Nature inspired feature selection meta-heuristics. Artificial Intelligence Review, 44(3), 311–340.

  • Djenouri, Y., Bendjoudi, A., Mehdi, M., Nouali-Taboudjemat, N., & Habbas, Z. (2015). Gpu-based bees swarm optimization for association rules mining. The Journal of Supercomputing, 71(4), 1318–1344.

  • Djenouri, Y., Djenouri, D., Habbas, Z., & Belhadi, A. (2018). How to exploit high performance computing in population-based metaheuristics for solving association rule mining problem. Distributed and Parallel Databases, 36(2), 369–397.

  • Djenouri, Y., Drias, H., & Habbas, Z. (2014). Bees swarm optimisation using multiple strategies for association rule mining. International Journal of Bio-Inspired Computation, 6(4), 239–249.

  • Dussaut, J. S., Vidal, P. J., Ponzoni, I., & Olivera, A. C. (2018). Comparing multiobjective evolutionary algorithms for cancer data microarray feature selection. In 2018 IEEE congress on evolutionary computation (CEC) (pp. 1–8).

  • Ebrahimpour, M. K., Nezamabadi-Pour, H., & Eftekhari, M. (2018). Ccfs: A cooperating coevolution technique for large scale feature selection on microarray datasets. Computational Biology and Chemistry, 73, 171–178.

  • Ezugwu, A. E. (2020). Nature-inspired metaheuristic techniques for automatic clustering: A survey and performance study. SN Applied Sciences, 2, 273.

  • Fahad, A., Alshatri, N., Tari, Z., Alamri, A., Khalil, I., Zomaya, A. Y., Foufou, S., & Bouras, A. (2014). A survey of clustering algorithms for big data: Taxonomy and empirical analysis. IEEE Transactions on Emerging Topics in Computing, 2(3), 267–279.

  • Fong, S., Wong, R., & Vasilakos, A. V. (2016). Accelerated pso swarm search feature selection for data stream mining big data. IEEE Transactions on Services Computing, 9(1), 33–45.

  • Freitas, A. A. (2008). A review of evolutionary algorithms for data mining. In O. Maimon, & L. Rokach (Eds.), Soft computing for knowledge discovery and data mining (pp. 79–111). Springer.

  • Freitas, A. A. (2013). Data mining and knowledge discovery with evolutionary algorithms. Springer.

  • Friedrichs, F., & Igel, C. (2005). Evolutionary tuning of multiple svm parameters. Neurocomputing, 64, 107–117.

  • Fong, S., Deb, S., & Yang, X. S. (2018). How meta-heuristic algorithms contribute to deep learning in the hype of big data analytics. In P. K. Sa, M. N. Sahoo, M. Murugappan, Y. Wu, & B. Majhi (Eds.), Progress in intelligent computing techniques: Theory, practice, and applications (pp. 3–25). Springer.

  • Gao, W. (2016). Improved ant colony clustering algorithm and its performance study. Computational Intelligence and Neuroscience, 2016, 14.

  • García-Nieto, J., Alba, E., Jourdan, L., & Talbi, E. G. (2009). Sensitivity and specificity based multiobjective approach for feature selection: Application to cancer diagnosis. Information Processing Letters, 109, 887–896.

  • García Piquer, Á. (2012). Facing-up challenges of multiobjective clustering based on evolutionary algorithms: Representations, scalability and retrieval solutions. Ph.D. thesis, Universitat Ramon Llull.

  • Gheraibia, Y., Moussaoui, A., Djenouri, Y., Kabir, S., & Yin, P. Y. (2016). Penguins search optimisation algorithm for association rules mining. Journal of Computing and Information Technology, 24(2), 165–179.

  • Ghosh, A., Halder, A., Kothari, M., & Ghosh, S. (2008). Aggregation pheromone density based data clustering. Information Sciences, 178(13), 2816–2831.

  • Ghosh, A., & Nath, B. (2004). Multi-objective rule mining using genetic algorithms. Information Sciences, 163(1), 123–133.

  • Green, R. C., Wang, L., & Alam, M. (2012). Training neural networks using central force optimization and particle swarm optimization: Insights and comparisons. Expert Systems with Applications, 39(1), 555–563.

  • Gu, S., Cheng, R., & Jin, Y. (2018). Feature selection for high-dimensional classification using a competitive swarm optimizer. Soft Computing, 22(3), 811–822.

  • Gupta, G. P., & Jha, S. (2018). Integrated clustering and routing protocol for wireless sensor networks using cuckoo and harmony search based metaheuristic techniques. Engineering Applications of Artificial Intelligence, 68, 101–109.

  • Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. The Journal of Machine Learning Research, 3, 1157–1182.

  • Han, J. (2005). Data mining: Concepts and techniques. Morgan Kaufmann Publishers Inc.

  • Han, X., Quan, L., Xiong, X., Almeter, M., Xiang, J., & Lan, Y. (2017). A novel data clustering algorithm based on modified gravitational search algorithm. Engineering Applications of Artificial Intelligence, 61, 1–7.

  • Handl, J., & Knowles, J. (2004). Evolutionary multiobjective clustering. In Proceedings of the eighth international conference on parallel problem solving from nature (pp. 1081–1091). Springer.

  • Handl, J., & Knowles, J. (2012). Clustering criteria in multiobjective data clustering. Lecture notes in computer science. In C. Coello, V. Cutello, K. Deb, S. Forrest, G. Nicosia, & M. Pavone (Eds.), Parallel Problem Solving from Nature—PPSN XII (Vol. 7492, pp. 32–41). Springer.

  • Handl, J., Knowles, J., & Kell, D. (2005). Computational cluster validation in post-genomic data analysis. Bioinformatics, 21(15), 3201–3212.

  • Handl, J., & Knowles, J. D. (2007). An evolutionary approach to multiobjective clustering. IEEE Transactions on Evolutionary Computation, 11(1), 56–76.

  • Handl, J., & Meyer, B. (2007). Ant-based and swarm-based clustering. Swarm Intelligence, 1(2), 95–113.

  • Heraguemi, K. E., Kamel, N., & Drias, H. (2016). Multi-swarm bat algorithm for association rule mining using multiple cooperative strategies. Applied Intelligence, 45(4), 1021–1033.

  • Hilderman, R., & Hamilton, H. J. (2013). Knowledge discovery and measures of interest (Vol. 638). Springer.

  • Holden, N., & Freitas, A. A. (2005). A hybrid particle swarm/ant colony algorithm for the classification of hierarchical biological data. In SIS (pp. 100–107).

  • Holden, N., & Freitas, A. A. (2008). A hybrid pso/aco algorithm for discovering classification rules in data mining. Journal of Artificial Evolution and Applications, 2008, 2:1–2:11.

  • Hruschka, E., Campello, R., Freitas, A., & de Carvalho, A. (2009). A survey of evolutionary algorithms for clustering. IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, 39(2), 133–155.

  • Hu, J., & Yang-Li, X. (2007). Association rules mining using multi-objective coevolutionary algorithm. In International conference on computational intelligence and security workshops, 2007. CISW 2007 (pp. 405–408). IEEE.

  • Huang, D. S., & Du, J. X. (2008). A constructive hybrid structure optimization methodology for radial basis probabilistic neural networks. IEEE Transactions on Neural Networks, 19(12), 2099–2115.

  • Igel, C., Wiegand, S., & Friedrichs, F. (2005). Evolutionary optimization of neural systems: The use of strategy adaptation. In D. H. Mache, J. Szabados, & M. G. de Bruin (Eds.), Trends and applications in constructive approximation (pp. 103–123). Springer.

  • Fister Jr, I., Galvez, A., Osaba, E., Ser, J. D., Iglesias, A., & Fister, I. (2019). Discovering dependencies among mined association rules with population-based metaheuristics. In Proceedings of the genetic and evolutionary computation conference companion (pp. 1668–1674).

  • Jacques, J., Martin-Huyghe, H., Lemtiri-Florek, J., Taillard, J., Jourdan, L., Dhaenens, C., Delerue, D., Hansske, A., & Leclercq, V. (2020). The detection of hospitalized patients at risk of testing positive to multi-drug resistant bacteria using MOCA-I, a rule-based “white-box” classification algorithm for medical data. International Journal of Medical Informatics, 142, 6.

  • Jacques, J., Taillard, J., Delerue, D., Dhaenens, C., & Jourdan, L. (2015). Conception of a dominance-based multi-objective local search in the context of classification rule mining in large and imbalanced data sets. Applied Soft Computing, 34, 705–720.

  • Jacques, J., Taillard, J., Delerue, D., Jourdan, L., & Dhaenens, C. (2013). The benefits of using multi-objectivization for mining pittsburgh partial classification rules in imbalanced and discrete data. In Proceedings of the 15th annual conference on genetic and evolutionary computation (pp. 543–550). ACM.

  • José-García, A., & Gómez-Flores, W. (2016). Automatic clustering using nature-inspired metaheuristics: A survey. Applied Soft Computing Journal, 41, 192–213.

  • Juliusdottir, T., Corne, D., Keedwell, E., & Narayanan, A. (2005). Two-phase ea/k-nn for feature selection and classification in cancer microarray datasets. In Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2005, Embassy Suites Hotel La Jolla, La Jolla, CA, USA, November 14 & 15, 2005 (pp. 1–8). IEEE.

  • Kaufman, L., & Rousseeuw, P. (1990). Finding groups in data: An introduction to cluster analysis. Wiley Series in Probability and Statistics. Wiley.

  • Kaufman, L., & Rousseeuw, P. J. (2008). Partitioning around medoids (program PAM), chap. 2 (pp. 68–125). Wiley.

  • Kaya, M. (2006). Multi-objective genetic algorithm based approaches for mining optimized fuzzy association rules. Soft Computing, 10(7), 578–586.

  • Kaya, M., & Alhajj, R. (2005). Genetic algorithm based framework for mining fuzzy association rules. Fuzzy Sets and Systems, 152(3), 587–601.

  • Kazmi, S., Javaid, N., Mughal, M. J., Akbar, M., Ahmed, S. H., & Alrajeh, N. (2017). Towards optimization of metaheuristic algorithms for iot enabled smart homes targeting balanced demand and supply of energy. IEEE Access.

  • Khabzaoui, M., Dhaenens, C., & Talbi, E. G. (2004). A multicriteria genetic algorithm to analyze microarray data. In Congress on evolutionary computation, 2004. CEC2004 (Vol. 2, pp. 1874–1881).

  • Khabzaoui, M., Dhaenens, C., & Talbi, E. G. (2005). Parallel genetic algorithms for multi-objective rule mining. MIC2005. The 6th

  • Khabzaoui, M., Dhaenens, C., & Talbi, E. G. (2008). Combining evolutionary algorithms and exact approaches for multi-objective knowledge discovery. RAIRO-Operations Research-Recherche Opérationnelle, 42(1), 69–83.

  • Khan, K., & Sahai, A. (2012). A comparison BA, GA, PSO, BP and LM for training feed forward neural networks in e-learning context. International Journal of Intelligent Systems and Applications, 4(7), 23.

  • Kim, Y., Street, W., & Menczer, F. (2002). Data mining: Opportunities and challenges, chap. Feature selection in data mining (pp. 80–105). Idea Group.

  • Kira, K., & Rendell, L. A. (1992). A practical approach to feature selection. In Proceedings of the ninth international workshop on Machine learning (pp. 249–256).

  • Kohavi, R., & John, G. H. (1997). Wrappers for feature subset selection. Artificial intelligence, 97(1), 273–324.

  • Krishna, K., & Murty, M. (1999). Genetic k-means algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 29(3), 433–439.

  • Kurada, R. R., Pavan, D. K. K., & Rao, D. A. (2013). A preliminary survey on optimized multiobjective metaheuristic methods for data clustering using evolutionary approaches. arXiv preprint arXiv:1312.2366.

  • Laney, D. (2001). 3d data management: Controlling data volume, velocity and variety. Gartner Retrieved, 6.

  • Larose, D. T. (2014). Discovering knowledge in data: An introduction to data mining. Wiley.

  • Leung, S., Tang, Y., & Wong, W. K. (2012). A hybrid particle swarm optimization and its application in neural networks. Expert Systems with Applications, 39(1), 395–405.

  • Li, L., Wan, M., Xiao, J., Wang, C., & Yang, Y. (2012). Data clustering using bacterial foraging optimization. Journal of Intelligent Information Systems, 38(2), 321–341.

  • Liu, H., & Motoda, H. (2007). Computational methods of feature selection (Chapman & Hall/Crc data mining and knowledge discovery series). Chapman & Hall/CRC.

  • Liu, W., & Wang, J. A. (2019). Brief survey on nature-inspired metaheuristics for feature selection in classification in this decade. In IEEE 16th international conference on networking, sensing and control (ICNSC) (pp. 424–429).

  • Ma, B. B., Fong, S., & Millham, R. (2018). Data stream mining in fog computing environment with feature selection using ensemble of swarm search algorithms. In 2018 conference on information communications technology and society (ICTAS) (pp. 1–6). IEEE.

  • Maimon, O., & Rokach, L. (2007). Soft computing for knowledge discovery and data mining. Springer.

  • Maimon, O., & Rokach, L. (2010). Data mining and knowledge discovery handbook (2nd ed.). Springer.

  • Manikandan, R., & Kalpana, A. (2017). Feature selection using fish swarm optimization in big data. Cluster Computing, 22(5), 10825–10837.

  • Marinakis, Y., Marinaki, M., Doumpos, M., Matsatsinis, N., & Zopounidis, C. (2008). Optimization of nearest neighbor classifiers via metaheuristic algorithms for credit risk assessment. Journal of Global Optimization, 42(2), 279–293.

  • Matthews, S. G., Gongora, M. A., & Hopgood, A. A. (2011). Evolving temporal fuzzy association rules from quantitative data with a multi-objective evolutionary algorithm. In E. Corchado, M. Kurzyński, & M. Woźniak (Eds.), Hybrid artificial intelligent systems (pp. 198–205). Springer.

  • Maulik, U., & Bandyopadhyay, S. (2000). Genetic algorithm-based clustering technique. Pattern Recognition, 33(9), 1455–1465.

  • MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (pp. 281–297).

  • Meisel, S., & Mattfeld, D. (2010). Synergies of operations research and data mining. European Journal of Operational Research, 206(1), 1–10.

  • Mlakar, U., Zorman, M., Fister, I., Jr., & Fister, I. (2017). Modified binary cuckoo search for association rule mining. Journal of Intelligent & Fuzzy Systems, 32(6), 4319–4330.

  • Mohanty, P. P., Nayak, S. K., Mohapatra, U. M., & Mishra, D. (2019). A survey on partitional clustering using single-objective metaheuristic approach. International Journal of Innovative Computing and Applications, 10(3–4), 207–226.

  • Mukhopadhyay, A., & Maulik, U. (2011). A multiobjective approach to MR brain image segmentation. Applied Soft Computing, 11(1), 872–880.

  • Mukhopadhyay, A., Maulik, U., & Bandyopadhyay, S. (2009). Multiobjective genetic algorithm-based fuzzy clustering of categorical attributes. IEEE Transactions on Evolutionary Computation, 13(5), 991–1005.

  • Mukhopadhyay, A., Maulik, U., & Bandyopadhyay, S. (2015). A survey of multiobjective evolutionary clustering. ACM Computing Surveys, 47(4), 61.

  • Mukhopadhyay, A., Maulik, U., Bandyopadhyay, S., & Coello, C. (2014). Survey of multiobjective evolutionary algorithms for data mining: Part ii. IEEE Transactions on Evolutionary Computation, 18(1), 20–35.

  • Murthy, C., & Chowdhury, N. (1996). In search of optimal clusters using genetic algorithms. Pattern Recognition Letters, 17(8), 825–832. https://doi.org/10.1016/0167-8655(96)00043-8

  • Nanda, S. J., & Panda, G. (2014). A survey on nature inspired metaheuristic algorithms for partitional clustering. Swarm and Evolutionary Computation, 16, 1–18.

  • Narendra, P. M., & Fukunaga, K. (1977). A branch and bound algorithm for feature subset selection. IEEE Transactions on Computers, 26(9), 917–922.

  • Nunez, S. G., & Attoh-Okine, N. (2014). Metaheuristics in big data: An approach to railway engineering. In 2014 IEEE international conference on big data (big data) (pp. 42–47). IEEE.

  • Olafsson, S., Li, X., & Wu, S. (2008). Operations research and data mining. European Journal of Operational Research, 187(3), 1429–1448.

  • Otero, F. E., Freitas, A. A., & Johnson, C. G. (2012). Inducing decision trees with an ant colony optimization algorithm. Applied Soft Computing, 12(11), 3615–3626.

  • Ozbakir, L., & Turna, F. (2017). Clustering performance comparison of new generation meta-heuristic algorithms. Knowledge-Based Systems, 130, 1–16.

  • Pandove, D., Goel, S., & Rani, R. (2018). Systematic review of clustering high-dimensional and large datasets. ACM Transactions on Knowledge Discovery from Data, 12(2), 16:1-16:68.

  • Qodmanan, H. R., Nasiri, M., & Minaei-Bidgoli, B. (2011). Multi objective association rule mining with genetic algorithm without specifying minimum support and minimum confidence. Expert Systems with Applications, 38(1), 288–298.

  • Rana, S., Jasola, S., & Kumar, R. (2011). A review on particle swarm optimization algorithms and their applications to data clustering. Artificial Intelligence Review, 35(3), 211–222.

  • Rebentrost, P., Mohseni, M., & Lloyd, S. (2013). Quantum support vector machine for big feature and big data classification. arXiv preprint arXiv:1307.0471.

  • Refaeilzadeh, P., Tang, L., & Liu, H. (2009). Cross-validation. In L. Liu, & M. Özsu (Eds.), Encyclopedia of database systems (pp. 532–538). Springer.

  • Salama, K. M., Abdelbar, A. M., & Otero, F. E. (2015). Investigating evaluation measures in ant colony algorithms for learning decision tree classifiers. In 2015 IEEE symposium series on computational intelligence.

  • Salama, K. M., & Otero, F. E. (2014). Learning multi-tree classification models with ant colony optimization. In Proceedings international conference on evolutionary computation theory and applications (ECTA-14) (pp. 38–48).

  • Salleb-Aouissi, A., Vrain, C., & Nortet, C. (2007). Quantminer: A genetic algorithm for mining quantitative association rules. In IJCAI (Vol. 7).

  • Sarkar, M., Yegnanarayana, B., & Khemani, D. (1997). A clustering algorithm using an evolutionary programming-based approach. Pattern Recognition Letters, 18(10), 975–986.

  • Sawhney, R., Mathur, P., & Shankar, R. (2018). A firefly algorithm based wrapper-penalty feature selection method for cancer diagnosis. In O. Gervasi, B. Murgante, S. Misra, E. Stankova, C. M. Torre, A. M. A. Rocha, D. Taniar, B. O. Apduhan, E. Tarantino, & Y. Ryu (Eds.), Computational science and its applications—ICCSA 2018 (pp. 438–449). Springer International Publishing.

  • Sayed, A. A., Abdallah, M. M., Zaki, A. M., & Ahmed, A. A. (2020). Big data analysis using a metaheuristic algorithm: Twitter as case study. In 2020 IEEE international conference on innovative trends in communication and computer engineering (ITCE) (pp. 20–26).

  • Selvi, R. S., & Valarmathi, M. L. (2017). An improved firefly heuristics for efficient feature selection and its application in big data. Biomedical Research, 28, S236–S241.

  • Shah, S. C., & Kusiak, A. (2004). Data mining and genetic algorithm based gene/snp selection. Artificial Intelligence in Medicine, 31(3), 183–196.

  • Sheikh, R. H., Raghuwanshi, M. M., & Jaiswal, A. N. (2008). Genetic algorithm based clustering: A survey. In First international conference on emerging trends in engineering and technology (pp. 314–319). IEEE.

  • Shelokar, P., Jayaraman, V., & Kulkarni, B. (2004). An ant colony approach for clustering. Analytica Chimica Acta, 509(2), 187–195.

  • Shenoy, P. D., Srinivasa, K., Venugopal, K., & Patnaik, L. M. (2003). Evolutionary approach for mining association rules on dynamic databases. In K.-Y. Whang, J. Jeon, K. Shim, & J. Srivastava (Eds.), Advances in knowledge discovery and data mining (pp. 325–336). Springer.

  • Shenoy, P. D., Srinivasa, K., Venugopal, K., & Patnaik, L. M. (2005). Dynamic association rule mining using genetic algorithms. Intelligent Data Analysis, 9(5), 439–453.

  • Shi, S. Y., Suganthan, P. N., & Deb, K. (2004). Multiclass protein fold recognition using multiobjective evolutionary algorithms. In Proceedings of the 2004 IEEE symposium on computational intelligence in bioinformatics and computational biology, 2004. CIBCB’04 (pp. 61–66). IEEE.

  • Shvachko, K., Kuang, H., Radia, S., & Chansler, R. (2010). The Hadoop distributed file system. In Proceedings of the 2010 IEEE 26th symposium on mass storage systems and technologies (MSST), MSST ’10 (pp. 1–10). IEEE Computer Society, Washington, DC, USA.

  • Shukla, A. K., Tripathi, D., Reddy, B. R., & Chandramohan, D. (2020). A study on metaheuristics approaches for gene selection in microarray data: Algorithms, applications and open challenges. Evolutionary Intelligence, 13, 309–329.

  • Sibson, R. (1973). SLINK: An optimally efficient algorithm for the single-link cluster method. The Computer Journal, 16(1), 30–34.

  • Siedlecki, W., & Sklansky, J. (1989). A note on genetic algorithms for large-scale feature selection. Pattern Recognition Letters, 10(5), 335–347.

  • Sklansky, J., & Vriesenga, M. (1996). Genetic selection and neural modeling of piecewise-linear classifiers. International Journal of Pattern Recognition and Artificial Intelligence, 10(5), 587–612.

  • Song, A., Song, J., Ding, X., Xu, G., & Chen, J. (2017). Utilizing bat algorithm to optimize membership functions for fuzzy association rules mining. In International conference on database and expert systems applications (pp. 496–504). Springer.

  • Sörensen, K. (2015). Metaheuristics-The metaphor exposed. International Transactions in Operational Research, 22(1), 3–18.

  • Suthaharan, S. (2015). Machine learning models and algorithms for big data classification: Thinking with examples for effective learning (Vol. 36). Springer.

  • Suttorp, T., & Igel, C. (2006). Multi-objective optimization of support vector machines. In Y. Jin (Ed.), Multi-objective machine learning (pp. 199–220). Springer.

  • Talbi, E.-G. (2020). Optimization of deep neural networks: A survey and unified taxonomy. hal-02570804v2.

  • Tang, R., & Fong, S. (2018). Clustering big IoT data by metaheuristic optimized mini-batch and parallel partition-based DGC in Hadoop. Future Generation Computer Systems, 86, 1395–1412.

  • Triguero, I., Peralta, D., Bacardit, J., García, S., & Herrera, F. (2015). MRPR: A MapReduce solution for prototype reduction in big data classification. Neurocomputing, 150, 331–345.

  • Tripathi, A. K., Sharma, K., Bala, M., Kumar, A., Menon, V. G., & Bashir, A. K. (2021). A parallel military-dog-based algorithm for clustering big data in cognitive industrial internet of things. IEEE Transactions on Industrial Informatics, 17(3), 2134–2142.

  • Tripathi, A. K., Sharma, K., & Bala, M. (2018). A novel clustering method using enhanced grey wolf optimizer and MapReduce. Big Data Research, 14, 93–100.

  • Tsai, C. W., Chiang, M. C., Ksentini, A., & Chen, M. (2016). Metaheuristic algorithms for healthcare: Open issues and challenges. Computers & Electrical Engineering, 53, 421–434.

  • Tsai, C. W., Liu, S. J., & Wang, Y. C. (2018). A parallel metaheuristic data clustering framework for cloud. Journal of Parallel and Distributed Computing, 116, 39–49.

  • Tseng, L., & Yang, S. (2001). Genetic approach to the automatic clustering problem. Pattern Recognition, 34(2), 415–424.

  • Vandromme, M., Jacques, J., Taillard, J., Jourdan, L., & Dhaenens, C. (2020). A biclustering method for heterogeneous and temporal medical data. IEEE Transactions on Knowledge and Data Engineering.

  • Xu, X., Chen, L., & Chen, Y. (2004). A4C: An adaptive artificial ants clustering algorithm. In Proceedings of the 2004 IEEE symposium on computational intelligence in bioinformatics and computational biology, 2004. CIBCB’04 (pp. 268–275).

  • Xue, B., Zhang, M., & Browne, W. N. (2013). Particle swarm optimization for feature selection in classification: A multi-objective approach. IEEE Transactions on Cybernetics, 43(6), 1656–1671.

  • Xue, B., Zhang, M., Browne, W. N., & Yao, X. (2016). A survey on evolutionary computation approaches to feature selection. IEEE Transactions on Evolutionary Computation, 20(4), 606–626.

  • Yan, X., Zhang, C., & Zhang, S. (2009). Genetic algorithm-based strategy for identifying association rules without specifying actual minimum support. Expert Systems with Applications, 36(2), 3066–3076.

  • Yang, C. S., Chuang, L. Y., Chen, Y. J., & Yang, C. H. (2008). Feature selection using memetic algorithms. In Third international conference on convergence and hybrid information technology, 2008. ICCIT’08 (Vol. 1, pp. 416–423). IEEE.

  • Yifei, Z., Jia, L., & Cao, H. (2012). Multi-objective gene expression programming for clustering. Information Technology and Control, 41(3), 283–294.

  • Zhang, Y., Gong, D. W., & Cheng, J. (2017). Multi-objective particle swarm optimization approach for cost-based feature selection in classification. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14(1), 64–75.

  • Zheng, B., Zhang, J., Yoon, S. W., Lam, S. S., Khasawneh, M., & Poranki, S. (2015). Predictive modeling of hospital readmissions using metaheuristics and data mining. Expert Systems with Applications, 42(20), 7110–7120.

Author information

Corresponding author

Correspondence to Clarisse Dhaenens.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This is an updated version of the paper that appeared in 4OR, 17(2), 115–139 (2019).

About this article

Cite this article

Dhaenens, C., Jourdan, L. Metaheuristics for data mining: survey and opportunities for big data. Ann Oper Res 314, 117–140 (2022). https://doi.org/10.1007/s10479-021-04496-0

  • DOI: https://doi.org/10.1007/s10479-021-04496-0
