Abstract
In recent times, the IDS (Intrusion Detection System) has become a significant tool for improving network security by distinguishing abnormal from normal data. It is vital because it allows incoming malicious traffic to be identified and acted upon. With the recent growth in data, intruders have also stepped up their attacks on systems. Concurrently, ML (Machine Learning) algorithms can learn from the data they are given: as new data is provided, the accuracy and efficacy of the ML model's decisions improve with training. However, with the evolution of big data, conventional ML has become unable to handle large-scale data interpretation, leaving most conventional systems with high FP (False Positive) rates and low accuracy. This motivated the use of PySpark, which serves as a platform for addressing the issues that conventional ML methods fail to solve. ML in PySpark is scalable and easy to use. Considering this, the present research proposes ML-based algorithms for classifying intrusions in a PySpark environment. The study proposes a security framework named Fuzzy K-Means with M-KMP (Modified-Knuth Morris Pratt), in which clustering is performed by Fuzzy K-Means, which can identify data points that potentially belong to multiple clusters. M-KMP then performs information matching on the clustered data to assess the occurrence of information in the allocated threat data, which can assist security developers in attack prevention. The results confirm the efficiency of the proposed work.
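As a rough illustration of the two building blocks named in the abstract, the sketch below shows how a fuzzy clustering step can assign one data point partial membership in several clusters, and how Knuth-Morris-Pratt matching locates a pattern in a sequence. It is not the authors' implementation: it uses plain Python rather than PySpark, the standard fuzzy c-means membership formula with an assumed fuzzifier `m = 2`, and the classic KMP algorithm rather than the modified M-KMP, whose details the abstract does not give. All function names are hypothetical.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Return the degree of membership of `point` in each cluster (sums to 1).

    Uses the standard fuzzy c-means formula; `m` > 1 is the fuzzifier
    (an assumed value, not taken from the paper).
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]


def kmp_failure_table(pattern):
    """For each position, length of the longest proper prefix that is also a suffix."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_find_all(text, pattern):
    """Return the start index of every (possibly overlapping) match of `pattern`."""
    if not pattern:
        return []
    table = kmp_failure_table(pattern)
    matches, k = [], 0
    for i, ch in enumerate(text):
        # On a mismatch, fall back using the table instead of rescanning text.
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = table[k - 1]
    return matches
```

For example, `fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])` gives the point a high membership in the nearer cluster and a small but nonzero membership in the farther one, which is the "multiple clusters" behaviour the abstract refers to. `kmp_find_all(text, pattern)` could then be run over records from a cluster to count how often a known threat string occurs.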
Funding
There is no funding for this study.
Ethics declarations
Conflict of interest
There is no conflict of interest.
About this article
Cite this article
Begum, G., Ul Huq, S.Z. & Kumar, A.P.S. Fuzzy K-Means with M-KMP: a security framework in pyspark environment for intrusion detection. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18180-5
DOI: https://doi.org/10.1007/s11042-024-18180-5