Leveraging time for spammers detection on Twitter

Authors:
Mahdi Washha, University of Toulouse, Toulouse, France
Aziz Qaroush, Birzeit University, Ramallah, Palestine
Florence Sedes, University of Toulouse, Toulouse, France

MEDES: Proceedings of the 8th International Conference on Management of Digital EcoSystems, November 2016, Pages 109–116
https://doi.org/10.1145/3012071.3012078

Published: 01 November 2016

ABSTRACT

Twitter is one of the most popular microblogging social systems, providing a set of distinctive posting services that operate in real time. The flexibility of these services has attracted unethical individuals, so-called "spammers", who aim to spread malicious, phishing, and misleading information. Unfortunately, the existence of spam results in non-negligible problems related to search and users' privacy. To fight spam, various detection methods have been designed that automate the detection process by combining the concept of "features" with machine learning methods. However, the existing features are not effective enough to adapt to spammers' tactics, because the features are easy to manipulate. Moreover, graph features are not suitable for Twitter-based applications, despite the high performance obtainable when applying such features.

In this paper, going beyond simple statistical features such as the number of hashtags and the number of URLs, we examine the time property by advancing the design of some features used in the literature and by proposing new time-based features. The new features are divided into robust advanced statistical features that explicitly incorporate the time attribute, and behavioral features that identify any posting behavior pattern. The experimental results show that the new features correctly classify the majority of spammers, with an accuracy higher than 93% when using the Random Forest learning algorithm on a collected and annotated dataset. The results outperform the accuracy of the state-of-the-art features by about 6%, demonstrating the significance of leveraging time in detecting spam accounts.
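The abstract does not list the individual time-based features, so the following is only a minimal sketch of what features of this kind might look like, not the authors' definitions. It assumes each account is represented by a list of tweet timestamps; the function name `time_features` and the three feature names are hypothetical. It computes two statistical features over the gaps between consecutive posts and one behavioral feature (the entropy of the hour-of-day posting distribution).

```python
import math
from collections import Counter
from datetime import datetime, timezone
from statistics import mean, pstdev


def time_features(timestamps):
    """Return illustrative time-based features for one account.

    `timestamps` is a list of timezone-aware datetimes, one per tweet.
    The feature names are hypothetical and not taken from the paper.
    """
    ts = sorted(timestamps)
    # Gaps between consecutive posts, in seconds.
    gaps = [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
    # How the posts are distributed over the 24 hours of the day (UTC).
    hour_counts = Counter(t.astimezone(timezone.utc).hour for t in ts)
    total = sum(hour_counts.values())
    # Shannon entropy in bits: low values mean posting is concentrated
    # in a few hours, high values mean it is spread across the day.
    hour_entropy = -sum(
        (c / total) * math.log2(c / total) for c in hour_counts.values()
    ) if total else 0.0
    return {
        "mean_gap_s": mean(gaps) if gaps else 0.0,
        "std_gap_s": pstdev(gaps) if gaps else 0.0,
        "hour_entropy_bits": hour_entropy,
    }


if __name__ == "__main__":
    # An account posting exactly every 10 minutes: zero gap variance.
    regular = [datetime(2016, 7, 1, 0, m, tzinfo=timezone.utc)
               for m in (0, 10, 20, 30)]
    print(time_features(regular))
```

A feature dictionary like this, computed per account, could then be passed to a Random Forest or any other classifier. A very regular posting rhythm (near-zero gap deviation) is one plausible signal of automation, but whether it separates spammers from legitimate users is an empirical question that this sketch does not answer.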

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining
      1. Data stream mining

Published in
MEDES: Proceedings of the 8th International Conference on Management of Digital EcoSystems
November 2016
243 pages
ISBN: 9781450342674
DOI: 10.1145/3012071
General Chair: Richard Chbeir
Program Chairs: Rajeev Agrawal, Ismail Biskri
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Spam
honeypot
legitimate users
machine learning
time
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 267 of 682 submissions, 39%