ABSTRACT
Twitter is one of the most popular microblogging social systems, providing a set of distinctive posting services that operate in real time. The flexibility of these services has attracted unethical individuals, so-called "spammers", who aim to spread malicious, phishing, and misleading information. Unfortunately, the existence of spam causes non-negligible problems related to search and users' privacy. To fight spam, various detection methods have been designed that automate the detection process by combining the "features" concept with machine learning methods. However, the existing features are not effective enough to keep up with spammers' tactics, because the features are easy to manipulate. Also, graph features are not suitable for Twitter-based applications, despite the high performance obtainable when applying such features.
In this paper, going beyond simple statistical features such as the number of hashtags and the number of URLs, we examine the time property by advancing the design of some features used in the literature and by proposing new time-based features. The new features are divided into robust advanced statistical features that explicitly incorporate the time attribute, and behavioral features that identify posting behavior patterns. The experimental results show that the new features correctly classify the majority of spammers, with an accuracy higher than 93% when using the Random Forest learning algorithm on a collected and annotated data set. These results outperform the accuracy of state-of-the-art features by about 6%, demonstrating the significance of leveraging time in detecting spam accounts.
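As a rough illustration of what a time-aware statistical feature could look like, the sketch below summarizes the gaps between an account's consecutive posts. It is not the feature set defined in the paper: the function names, the dictionary keys, and the choice of mean, standard deviation, and coefficient of variation are all assumptions made for this example.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def inter_post_gaps(timestamps):
    """Return the gaps, in seconds, between consecutive posts."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


def time_features(timestamps):
    """Summarize posting times as a small feature dictionary.

    The keys are hypothetical names chosen for this sketch.
    """
    gaps = inter_post_gaps(timestamps)
    if not gaps:
        # Fewer than two posts: no gap can be measured.
        return {"mean_gap": 0.0, "std_gap": 0.0, "gap_cv": 0.0}
    mu = mean(gaps)
    sigma = pstdev(gaps)
    # Coefficient of variation: near 0 when posts arrive at a fixed rate.
    cv = sigma / mu if mu > 0 else 0.0
    return {"mean_gap": mu, "std_gap": sigma, "gap_cv": cv}


# Hypothetical accounts: one posts exactly every 60 seconds, one irregularly.
start = datetime(2016, 7, 1, 12, 0, 0)
scheduled = [start + timedelta(seconds=60 * i) for i in range(5)]
irregular = [start + timedelta(seconds=s) for s in (0, 30, 400, 4000, 4100)]

scheduled_features = time_features(scheduled)
irregular_features = time_features(irregular)
```

In this sketch a perfectly regular posting schedule yields a `gap_cv` of zero, while irregular posting yields a positive value. Feature dictionaries of this kind could then be passed, alongside the simple counts mentioned above, to a classifier such as Random Forest.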
Index Terms
- Leveraging time for spammers detection on Twitter