Fuzzy K-Means with M-KMP: a security framework in pyspark environment for intrusion detection

Multimedia Tools and Applications

Abstract

In recent times, the IDS (Intrusion Detection System) has become a significant tool for improving network security by distinguishing abnormal from normal data. It is vital because it allows one to identify and respond to incoming malicious traffic. As data volumes have grown, intruders have also increased their attacks on systems. Concurrently, ML (Machine Learning) algorithms can learn from the data supplied to them: as new data is provided, the accuracy and efficacy of an ML model's decisions improve with training. However, with the evolution of big data, conventional ML has become incapable of handling the interpretation of huge data volumes, which has left most conventional systems with high FP (False Positive) rates and low accuracy. This gave rise to the adoption of PySpark, which serves as a platform for addressing the issues that conventional ML methods fail to solve. ML in PySpark is scalable and easy to use. Considering this, the present research proposes ML-based algorithms for intrusion detection in a PySpark environment. The study proposes a security framework named Fuzzy K-Means with M-KMP (Modified Knuth-Morris-Pratt), in which clustering is performed by Fuzzy K-Means, which can identify data points that potentially belong to multiple clusters. M-KMP then performs information matching on the clustered data to assess the occurrence of information in the allocated threat data, which will assist security developers in attack prevention. The efficiency of the proposed work is confirmed through the results.
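The two building blocks named in the abstract can be illustrated with a minimal sketch. It is written in plain Python rather than PySpark so that it stays self-contained, and it rests on assumptions: the abstract does not describe the authors' modification to KMP or their exact clustering update, so the code below uses the textbook fuzzy c-means membership formula and the classic Knuth-Morris-Pratt matcher. The function names (`fuzzy_memberships`, `failure_table`, `kmp_count`) and the fuzzifier default `m=2.0` are illustrative choices, not taken from the paper.

```python
from math import dist


def fuzzy_memberships(point, centers, m=2.0):
    """Textbook fuzzy c-means membership of `point` in each cluster center.

    Assumption: standard formula u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1));
    the paper's exact variant is not given in the abstract.
    """
    distances = [dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in distances):
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


def failure_table(pattern):
    """For each prefix, length of its longest proper prefix that is also a suffix."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_count(text, pattern):
    """Count occurrences (overlaps included) of `pattern` in `text` with classic KMP."""
    if not pattern:
        return 0
    table = failure_table(pattern)
    count = 0
    k = 0  # number of pattern characters currently matched
    for ch in text:
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            count += 1
            k = table[k - 1]  # fall back so overlapping matches are found
    return count
```

Unlike hard k-means, the membership vector assigns each point a degree of belonging to every cluster, and the degrees sum to 1. A point can therefore relate to several clusters at once, which is the property the abstract highlights. The matcher then counts how often a given signature string occurs in a record, in time linear in the combined length of text and pattern.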

Funding

There is no funding for this study.

Author information

Corresponding author

Correspondence to Gousiya Begum.

Ethics declarations

Conflict of interest

There is no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Begum, G., Ul Huq, S.Z. & Kumar, A.P.S. Fuzzy K-Means with M-KMP: a security framework in pyspark environment for intrusion detection. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18180-5

  • DOI: https://doi.org/10.1007/s11042-024-18180-5
