Abstract
A large number of association rules often reduces the reliability of data mining results; hence, a dimensionality reduction technique is crucial for data analysis. When analyzing massive datasets, existing models take more time to scan the entire database because they process items and transactions that are not needed for the analysis. To address this, the Fuzzy Rough Set-based Horse Herd Optimization (FRS-HHO) algorithm is proposed and integrated with the MapReduce algorithm to minimize query retrieval time and improve performance. Horse Herd Optimization (HHO) is employed because it is suited to high-dimensional optimization problems. The HHO algorithm removes unnecessary items and transactions with minimal support value from the dataset and maximizes a fitness function based on multiple objectives (support, confidence, interestingness, and lift) that evaluate the quality of association rules. The feature value of each item in the population is obtained by a MapReduce-based fitness function to generate optimal frequent itemsets in minimum time. The proposed FRS-HHO approach takes less time to execute across dimensions and has a space complexity of 38% for a total of 10 k transactions. The FRS-HHO approach also offers a speedup rate of 17% and a 12% decrease in input/output communication cost when compared to other approaches. Overall, the proposed FRS-HHO model improves performance in terms of execution time, space complexity, and speed.
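To make the rule-quality objectives named in the abstract concrete, the following is a minimal sketch of how support, confidence, and lift can be computed for a single rule with a MapReduce-style count. It is not the authors' implementation: the toy transactions, the function names (`map_count`, `rule_metrics`, `fitness`), the chunking scheme, and the equal-weight fitness are all assumptions made for illustration. Interestingness is left out because the abstract does not define it.

```python
from collections import Counter
from functools import reduce

# Toy transactions (hypothetical data, not taken from the paper).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def map_count(chunk, itemsets):
    """Map step: per chunk, count the transactions that contain each itemset."""
    return Counter({s: sum(1 for t in chunk if s <= t) for s in itemsets})


def rule_metrics(transactions, antecedent, consequent, n_chunks=2):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    itemsets = [a, c, a | c]
    # Split the data into chunks, as a stand-in for distributed partitions.
    chunks = [transactions[i::n_chunks] for i in range(n_chunks)]
    # Reduce step: sum the per-chunk counts into global counts.
    counts = reduce(
        lambda x, y: x + y,
        (map_count(chunk, itemsets) for chunk in chunks),
        Counter(),
    )
    n = len(transactions)
    support = counts[a | c] / n
    confidence = counts[a | c] / counts[a] if counts[a] else 0.0
    lift = confidence / (counts[c] / n) if counts[c] else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


def fitness(metrics, weights=None):
    """Assumed equal-weight sum of the objectives; the paper's fitness may differ."""
    weights = weights or {name: 1.0 / len(metrics) for name in metrics}
    return sum(weights[name] * value for name, value in metrics.items())


if __name__ == "__main__":
    m = rule_metrics(TRANSACTIONS, {"bread"}, {"milk"})
    print(m, fitness(m))
```

For the rule bread -> milk on this toy data, the support is 2/4, the confidence is 2/3, and the lift is (2/3)/(3/4) = 8/9. A lift below 1 indicates that the two items co-occur slightly less often than they would if they were independent.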
Availability of data and material
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Funding
Not applicable.
Contributions
All authors agreed on the content of the study. DS and MK collected all the data for analysis, agreed on the methodology, and completed the analysis based on the agreed steps. The results and conclusions were discussed and written together. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Human and animal rights
This article does not contain any studies with human or animal subjects performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
About this article
Cite this article
Sudha, D., Krishnamurthy, M. A fuzzy rough set-based horse herd optimization algorithm for map reduce framework for customer behavior data. Knowl Inf Syst (2024). https://doi.org/10.1007/s10115-024-02105-7
DOI: https://doi.org/10.1007/s10115-024-02105-7