Abstract
Spam detection on social networks is increasingly important owing to the rapid growth of social network user base. Sophisticated spam filters must be developed to deal with this complex problem. Traditional machine learning approaches such as neural networks, support vector machines and Naïve Bayes classifiers are not effective enough to process and utilize complex features present in high-dimensional data on social network spam. Moreover, the traditional objective criteria of social network spam filters cannot cope with different costs assigned to type I and type II errors. To overcome these problems, here we propose a novel cost-sensitive approach to social network spam filtering. The proposed approach is composed of two stages. In the first stage, multi-objective evolutionary feature selection is used to minimize both the misclassification cost of the proposed model and the number of attributes necessary for spam filtering. Then, the approach uses cost-sensitive ensemble learning techniques with regularized deep neural networks as base learners. We demonstrate that this approach is effective for social network spam filtering on two benchmark datasets. We also show that the proposed approach outperforms other popular algorithms used in social network spam filtering, such as random forest, Naïve Bayes or support vector machines.
Similar content being viewed by others
References
Cormack GV (2006) Email spam filtering: a systematic review. Found Trends Inf Retr 1(4):335–455. https://doi.org/10.1561/1500000006
Nexgate (2013) State of social media spam. http://nexgate.com/wp-content/uploads/2013/09/Nexgate-2013-State-of-Social-Media-Spam-Research-Report.pdf. Accessed 20 Apr 2019
Statista (2018) Twitter: number of monthly active users 2010–2018. https://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/. Accessed 20 Apr 2019
Prieto VM, Alvarez M, Cacheda F (2013) Detecting linkedin spammers and its spam nets. Int J Adv Comput Sci Appl (IJACSA) 4(9):189–199
Shen H, Ma F, Zhang X, Zong L, Liu X, Liang W (2017) Discovering social spammers from multiple views. Neurocomputing 225:49–57. https://doi.org/10.1016/j.neucom.2016.11.013
Adewole KS, Anuar NB, Kamsin A, Varathan KD, Razak SA (2017) Malicious accounts: dark of the social networks. J Netw Comput Appl 79:41–67. https://doi.org/10.1016/j.jnca.2016.11.030
Soliman A, Girdzijauskas S (2016) Adaptive graph-based algorithms for spam detection in social networks. KTH Royal Institute of Technology, diva2:998690
Dutta S, Ghatak S, Dey R, Das AK, Ghosh S (2018) Attribute selection for improving spam classification in online social networks: a rough set theory-based approach. Soc Netw Anal Min 8(7):1–16. https://doi.org/10.1007/s13278-017-0484-8
Barushka A, Hajek P (2016) Spam filtering using regularized neural networks with rectified linear units. In: Adorni G, Cagnoni S, Gori M, Maratea M (eds) Conference of the Italian Association for artificial intelligence. Lecture notes in computer science, vol 10037. Springer, Cham, pp 65–75. https://doi.org/10.1007/978-3-319-49130-1_6
Bhowmick A, Hazarika SM (2018) E-mail spam filtering: a review of techniques and trends. In: Kalam A, Das S, Sharma K (eds) Advances in electronics, communication and computing. Lecture notes in electrical engineering, vol 443. Springer, Singapore, pp 583–590. https://doi.org/10.1007/978-981-10-4765-7_61
Almeida TA, Almeida J, Yamakami A (2011) Spam filtering: how the dimensionality reduction affects the accuracy of Naive Bayes classifiers. J Internet Serv Appl 1(3):183–200. https://doi.org/10.1007/s13174-010-0014-7
Choudhary N, Jain AK (2017) Towards filtering of SMS spam messages using machine learning based technique. In: Singh D, Raman B, Luhach A, Lingras P (eds) Advanced informatics for computing research. Communications in computer and information science, vol 712. Springer, Singapore, pp 18–30. https://doi.org/10.1007/978-981-10-5780-9_2
Kaur P, Singhal A, Kaur J (2016) Spam detection on Twitter: A survey. In: 2016 3rd international conference on computing for sustainable global development (INDIACom). IEEE, New Delhi, pp 2570–2573
Kaur R, Singh S, Kumar H (2018) Rise of spam and compromised accounts in online social networks: a state-of-the-art review of different combating approaches. J Netw Comput Appl 112:53–88. https://doi.org/10.1016/j.jnca.2018.03.015
Sanz JA, Bernardo D, Herrera F, Bustince H, Hagras H (2015) A compact evolutionary interval-valued fuzzy rule-based classification system for the modeling and prediction of real-world financial applications with imbalanced data. IEEE Trans Fuzzy Syst 23(4):973–990. https://doi.org/10.1109/TFUZZ.2014.2336263
Al-Janabi M, Quincey ED, Andras P (2017) Using supervised machine learning algorithms to detect suspicious URLs in online social networks. In: Proceedings of the 2017 IEEE/ACM international conference on advances in social networks analysis and mining 2017, ACM, pp 1104–1111. https://doi.org/10.1145/3110025.3116201
Jiménez F, Sánchez G, García JM, Sciavicco G, Miralles L (2017) Multi-objective evolutionary feature selection for online sales forecasting. Neurocomputing 234:75–92. https://doi.org/10.1016/j.neucom.2016.12.045
Barushka A, Hajek P (2018) Spam filtering in social networks using regularized deep neural networks with ensemble learning. In: Iliadis L, Maglogiannis I, Plagianakos V (eds) Artificial intelligence applications and innovations. AIAI 2018. IFIP advances in information and communication technology, vol 519. Springer, Cham, pp 38–49. https://doi.org/10.1007/978-3-319-92007-8_4
Statista (2018) Number of facebook users worldwide 2008–2018. https://www.statista.com/statistics/264810/number-of-monthly-active-facebook-users-worldwide/. Accessed 20 Apr 2019
Zheng X, Zeng Z, Chen Z, Yu Y, Rong C (2015) Detecting spammers on social networks. Neurocomputing 159:27–34. https://doi.org/10.1016/j.neucom.2015.02.047
Stringhini G, Kruegel C, Vigna G (2010) Detecting spammers on social networks. In: Proceedings of the 26th annual computer security applications conference. ACM, pp 1–9
Lee K, Caverlee J, Webb S (2010) Uncovering social spammers: social honeypots + machine learning. In: Proceedings of the 33rd international ACM SIGIR conference on research and development in information retrieval. ACM, pp 435–442
Wang AH (2010) Don’t follow me: spam detection in Twitter. In: Proceedings of the 2010 international conference on security and cryptography (SECRYPT). IEEE, pp 1–10
Benevenuto F, Magno G, Rodrigues T, Almeida V (2010) Detecting spammers on twitter. In: 6th collaboration, electronic messaging, anti-abuse and spam conference (CEAS), pp 1–12
Lee K, Eoff BD, Caverlee J (2011) Seven months with the devils: a long-term study of content polluters on Twitter. In: Proceedings of the 5th international AAAI conference on weblogs and social media, pp 185–192
Jin X, Lin C, Luo J, Han J (2011) A data mining-based spam detection system for social media networks. Proc VLDB Endow 4(12):1458–81461
Thomas K, Grier C, Song D, Paxson V (2011) Suspended accounts in retrospect: an analysis of twitter spam. In: Proceedings of the 2011 ACM SIGCOMM conference on internet measurement conference. ACM, pp 243–258
Song J, Lee S, Kim J (2011) Spam filtering in twitter using sender-receiver relationship. In: International workshop on recent advances in intrusion detection. Springer, Berlin, pp 301–317
Chu Z, Widjaja I, Wang H (2012) Detecting social spam campaigns on twitter. In: International conference on applied cryptography and network security. Springer, Berlin, pp 455–472. https://doi.org/10.1007/978-3-642-31284-7_27
Bosma M, Meij E, Weerkamp W (2012) A framework for unsupervised spam detection in social networking sites. In: Baeza-Yates R et al (eds) European conference on information retrieval. Springer, Berlin, pp 364–375. https://doi.org/10.1007/978-3-642-28997-2_31
Yang C, Harkreader R, Gu G (2013) Empirical evaluation and new design for fighting evolving twitter spammers. IEEE Trans Inf Forensics Secur 8(8):1280–1293. https://doi.org/10.1109/TIFS.2013.2267732
Martinez-Romo J, Araujo L (2013) Detecting malicious tweets in trending topics using a statistical analysis of language. Expert Syst Appl 40(8):2992–3000. https://doi.org/10.1016/j.eswa.2012.12.015
Lee S, Kim J (2013) Warningbird: a near real-time detection system for suspicious urls in twitter stream. IEEE Trans Dependable Secure Comput 10(3):183–195. https://doi.org/10.1109/TDSC.2013.3
Bhat SY, Abulaish M (2013) Community-based features for identifying spammers in online social networks. In: 2013 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM). IEEE, pp 100–107
Ahmed F, Abulaish M (2013) A generic statistical approach for spam detection in online social networks. Comput Commun 36(10–11):1120–1129. https://doi.org/10.1016/j.comcom.2013.04.004
Miller Z, Dickinson B, Deitrick W, Hu W, Wang AH (2014) Twitter spammer detection using data stream clustering. Inf Sci 260:64–73. https://doi.org/10.1016/j.ins.2013.11.016
Cao C, Caverlee J (2015) Detecting spam urls in social media via behavioral analysis. In: European conference on information retrieval. Springer, Cham, pp 703–714. https://doi.org/10.1007/978-3-319-16354-3_77
Antonakaki D, Polakis I, Athanasopoulos E, Ioannidis S, Fragopoulou P (2016) Exploiting abused trending topics to identify spam campaigns in Twitter. Soc Netw Anal Min 6(1):48. https://doi.org/10.1007/s13278-016-0354-9
Liu C, Wang G (2016) Analysis and detection of spam accounts in social networks. In: 2016 2nd IEEE international conference on computer and communications (ICCC). IEEE, pp 2526–2530. https://doi.org/10.1109/compcomm.2016.7925154
Wu F, Shu J, Huang Y, Yuan Z (2016) Co-detecting social spammers and spam messages in microblogging via exploiting social contexts. Neurocomputing 201:51–65. https://doi.org/10.1016/j.neucom.2016.03.036
Zheng X, Zhang X, Yu Y, Kechadi T, Rong C (2016) ELM-based spammer detection in social networks. J Supercomput 72(8):2991–3005. https://doi.org/10.1007/s11227-015-1437-5
Song L, Lau RYK, Kwok RCW, Mirkovski K, Dou W (2017) Who are the spoilers in social media marketing? Incremental learning of latent semantics for social spam detection. Electron Commer Res 17(1):51–81. https://doi.org/10.1007/s10660-016-9244-5
Chen C, Wang Y, Zhang J, Xiang Y, Zhou W, Min G (2017) Statistical features-based real-time detection of drifted twitter spam. IEEE Trans Inf Forensics Secur 12(4):914–925. https://doi.org/10.1109/TIFS.2016.2621888
Adewole KS, Anuar NB, Kamsin A, Sangaiah AK (2019) SMSAD: a framework for spam message and spam account detection. Multimed Tools Appl 78(4):3925–3960. https://doi.org/10.1007/s11042-017-5018-x
Watcharenwong N, Saikaew K (2017) Spam detection for closed Facebook groups. In: 2017 14th international joint conference on computer science and software engineering (JCSSE). IEEE, pp 1–6. https://doi.org/10.1109/jcsse.2017.8025914
Yu D, Chen N, Jiang F, Fu B, Qin A (2017) Constrained NMF-based semi-supervised learning for social media spammer detection. Knowl-Based Syst 125:64–73. https://doi.org/10.1016/j.knosys.2017.03.025
Chen W, Yeo CK, Lau CT, Lee BS (2017) A study on real-time low-quality content detection on Twitter from the users’ perspective. PLoS ONE 12(8):e0182487. https://doi.org/10.1371/journal.pone.0182487
Al-Zoubi AM, Faris H, Hassonah MA (2018) Evolving support vector machines using whale optimization algorithm for spam profiles detection on online social networks in different lingual contexts. Knowl-Based Syst 153:91–104. https://doi.org/10.1016/j.knosys.2018.04.025
Aswani R, Kar AK, Ilavarasan PV (2017) Detection of spammers in twitter marketing: a hybrid approach using social media analytics and bio inspired computing. Inf Syst Front. https://doi.org/10.1007/s10796-017-9805-8
Bindu PV, Mishra R, Thilagam PS (2018) Discovering spammer communities in twitter. J Intell Inf Syst. https://doi.org/10.1007/s10844-017-0494-z
Sedhai S, Sun A (2018) Semi-supervised spam detection in Twitter stream. IEEE Trans Comput Soc Syst 5(1):169–175. https://doi.org/10.1109/TCSS.2017.2773581
Sohrabi MK, Karimi F (2018) A feature selection approach to detect spam in the Facebook social network. Arab J Sci Eng 43(2):949–958. https://doi.org/10.1007/s13369-017-2855-x
Barushka A, Hajek P (2018) Spam filtering using integrated distribution-based balancing approach and regularized deep neural networks. Appl Intell 48(10):3538–3556. https://doi.org/10.1007/s10489-018-1161-y
Gogoglou A, Theodosiou Z, Kounoudes T, Vakali A, Manolopoulos Y (2016) Early malicious activity discovery in microblogs by social bridges detection. In: 2016 IEEE international symposium on signal processing and information technology (ISSPIT). IEEE, Limassol, pp 132–137. https://doi.org/10.1109/isspit.2016.7886022
Hinton G, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov R (2012) Improving neural networks by preventing co-adaptation of feature detectors. arXiv:1207.0580
Dhillon IS, Mallela S, Kumar R (2003) A divisive information-theoretic feature clustering algorithm for text classification. J Mach Learn Res 3:1265–1287. https://doi.org/10.1162/153244303322753661
Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40(1):16–28. https://doi.org/10.1016/j.compeleceng.2013.11.024
Jiménez F, Marzano E, Sánchez G, Sciavicco G, Vitacolonna N (2015) Attribute selection via multi-objective evolutionary computation applied to multi-skill contact center data classification. In: 2015 IEEE symposium series on computational intelligence. IEEE, pp 488–495. https://doi.org/10.1109/ssci.2015.78
Zhang Y, Wang S, Phillips P, Ji G (2014) Binary PSO with mutation operator for feature selection using decision tree applied to spam detection. Knowl-Based Syst 64:22–31. https://doi.org/10.1016/j.knosys.2014.03.015
Jia X, Shang L (2014) Three-way decisions versus two-way decisions on filtering spam email. In: Transactions on rough sets XVIII, Springer, Berlin, pp 69–91. https://doi.org/10.1007/978-3-662-44680-5_5
Maas AL, Hannun AY, Ng AY (2013) Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of the 30th international conference on machine learning, pp 1–6
Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: 13th international conference on machine learning, San Francisco, pp 148–156
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140. https://doi.org/10.1007/BF00058655
Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20(8):832–844. https://doi.org/10.1109/34.709601
Bermejo P, Gámez JA, Puerta JM (2011) Improving the performance of Naive Bayes multinomial in e-mail foldering by introducing distribution-based balance of datasets. Expert Syst Appl 38(3):2072–2080. https://doi.org/10.1016/j.eswa.2010.07.146
Pérez-Díaz N, Ruano-Ordás D, Fdez-Riverola F, Méndez JR (2012) SDAI: an integral evaluation methodology for content-based spam filtering models. Expert Syst Appl 39(16):12487–12500. https://doi.org/10.1016/j.eswa.2012.04.064
Cao J, Fu Q, Li Q, Guo D (2017) Discovering hidden suspicious accounts in online social networks. Inf Sci 394:123–140. https://doi.org/10.1016/j.ins.2017.02.030
Gao H, Chen Y, Lee K, Palsetia D, Choudhary AN (2012) Towards online spam filtering in social networks. NDSS 12(2012):1–16
Masood F, Almogren A, Abbas A, Khattak HA, Din IU, Guizani M, Zuair M (2019) Spammer detection and fake user identification on social networks. IEEE Access 7:68140–68152. https://doi.org/10.1109/ACCESS.2019.2918196
Barushka A, Hajek P (2019). Review spam detection using word embeddings and deep neural networks. In: MacIntyre J, Maglogiannis I, Iliadis L, Pimenidis E (eds) Artificial intelligence applications and innovations. AIAI 2019. IFIP Advances in information and communication technology, vol 559. Springer, Cham, pp 340–350. https://doi.org/10.1007/978-3-030-19823-7_28
Jang B, Jeong S, Kim CK (2019) Distance-based customer detection in fake follower markets. Inf Syst 81:104–116. https://doi.org/10.1016/j.is.2018.12.001
Acknowledgements
This article was supported by the scientific research project of the Czech Sciences Foundation Grant No: 16-19590S.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Barushka, A., Hajek, P. Spam detection on social networks using cost-sensitive feature selection and ensemble-based regularized deep neural networks. Neural Comput & Applic 32, 4239–4257 (2020). https://doi.org/10.1007/s00521-019-04331-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00521-019-04331-5