A fuzzy rough set-based horse herd optimization algorithm for map reduce framework for customer behavior data

  • Regular Paper
  • Published in Knowledge and Information Systems

Abstract

A large number of association rules often reduces the reliability of data mining results; hence, a dimensionality reduction technique is crucial for data analysis. When analyzing massive datasets, existing models take more time to scan the entire database because they discover items and transactions that are unnecessary for the analysis. To address this, the Fuzzy Rough Set-based Horse Herd Optimization (FRS-HHO) algorithm is proposed and integrated with the MapReduce framework to minimize query retrieval time and improve performance. Horse Herd Optimization (HHO), an algorithm for high-dimensional optimization problems, removes unnecessary items and transactions with minimal support value from the dataset and maximizes a fitness built from multiple objectives (support, confidence, interestingness, and lift) that evaluate the quality of association rules. The feature value of each item in the population is obtained by a MapReduce-based fitness function to generate optimal frequent itemsets in minimum time. The proposed FRS-HHO approach takes less time to execute across dimensions and has a space complexity of 38% for a total of 10,000 transactions. It also offers a speedup rate of 17% and a 12% decrease in input–output communication cost compared with other approaches. Overall, the proposed FRS-HHO model improves performance in terms of execution time, space complexity, and speedup.
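The rule-quality measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' FRS-HHO implementation: the toy transactions, the function names, and the 0.3 support threshold are all assumptions made for illustration, the map and reduce steps run in a single process rather than on a cluster, and interestingness is omitted because the abstract does not define it. The sketch counts item occurrences in a map/reduce style, prunes low-support items, and then computes support, confidence, and lift using their standard definitions.

```python
from collections import Counter
from itertools import chain

# Toy transactions (hypothetical data, not taken from the paper).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "jam"},
]


def map_phase(transaction):
    """Map step: emit an (item, 1) pair for every item in one transaction."""
    return [(item, 1) for item in transaction]


def reduce_phase(pairs):
    """Reduce step: sum the emitted counts per item."""
    counts = Counter()
    for item, count in pairs:
        counts[item] += count
    return counts


def prune_items(transactions, min_support):
    """Drop items whose relative support falls below min_support."""
    pairs = chain.from_iterable(map_phase(t) for t in transactions)
    counts = reduce_phase(pairs)
    n = len(transactions)
    keep = {item for item, c in counts.items() if c / n >= min_support}
    return [t & keep for t in transactions]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_measures(antecedent, consequent, transactions):
    """Support, confidence and lift of the rule antecedent -> consequent.

    Assumes both sides occur at least once, so no division by zero.
    """
    s_rule = support(antecedent | consequent, transactions)
    s_ante = support(antecedent, transactions)
    s_cons = support(consequent, transactions)
    confidence = s_rule / s_ante
    lift = confidence / s_cons
    return {"support": s_rule, "confidence": confidence, "lift": lift}


# The 0.3 threshold is an arbitrary choice for this example.
pruned = prune_items(TRANSACTIONS, min_support=0.3)
measures = rule_measures({"bread"}, {"milk"}, pruned)
print(measures)
```

On this toy data the item "jam" is pruned, and the rule bread → milk has support of about 0.6, confidence of about 0.75, and lift of about 0.94. A lift below 1 indicates that the two items co-occur slightly less often than independence would predict. A multi-objective fitness of the kind the abstract describes would combine such measures, but the weighting used in the paper is not given here.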

Availability of data and material

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Funding

Not applicable.

Author information

Contributions

All authors agreed on the content of the study. DS and MK collected the data for analysis, agreed on the methodology, and completed the analysis following the agreed steps. The results and conclusions were discussed and written jointly. All authors read and approved the final manuscript.

Corresponding author

Correspondence to D. Sudha.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Human and animal rights

This article does not contain any studies with human or animal subjects performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sudha, D., Krishnamurthy, M. A fuzzy rough set-based horse herd optimization algorithm for map reduce framework for customer behavior data. Knowl Inf Syst (2024). https://doi.org/10.1007/s10115-024-02105-7

  • DOI: https://doi.org/10.1007/s10115-024-02105-7
