research-article

Who is tweeting on Twitter: human, bot, or cyborg?

Authors:
Zi Chu, The College of William and Mary, Williamsburg, VA
Steven Gianvecchio, The College of William and Mary, Williamsburg, VA
Haining Wang, The College of William and Mary, Williamsburg, VA
Sushil Jajodia, George Mason University, Fairfax, VA

ACSAC '10: Proceedings of the 26th Annual Computer Security Applications Conference, December 2010, Pages 21–30. https://doi.org/10.1145/1920261.1920265

Published: 06 December 2010


ABSTRACT

Twitter is a new web application playing dual roles of online social networking and micro-blogging. Users communicate with each other by publishing text-based posts. The popularity and open structure of Twitter have attracted a large number of automated programs, known as bots, which are a double-edged sword for Twitter. Legitimate bots generate a large number of benign tweets delivering news and updating feeds, while malicious bots spread spam or malicious content. More interestingly, between humans and bots there has emerged the cyborg, referring to either a bot-assisted human or a human-assisted bot. To help human users identify whom they are interacting with, this paper focuses on classifying human, bot and cyborg accounts on Twitter. We first conduct a set of large-scale measurements on a collection of over 500,000 accounts. We observe differences among humans, bots and cyborgs in terms of tweeting behavior, tweet content, and account properties. Based on the measurement results, we propose a classification system with four parts: (1) an entropy-based component, (2) a machine-learning-based component, (3) an account properties component, and (4) a decision maker. The system combines features extracted from an unknown user to determine the likelihood that the user is a human, bot or cyborg. Our experimental evaluation demonstrates the efficacy of the proposed classification system.
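The abstract names an entropy-based component but does not give its formula. As an illustration only, the Python sketch below assumes such a component scores how regular an account's posting schedule is, using plain Shannon entropy over binned inter-tweet delays; the paper's actual entropy measure may differ. The function name `interval_entropy` and the 60-second `bin_width` are hypothetical choices, not taken from the paper.

```python
import math
from collections import Counter


def interval_entropy(timestamps, bin_width=60.0):
    """Return the Shannon entropy, in bits, of binned inter-tweet delays.

    Illustrative sketch only (assumed measure, not the paper's method).
    timestamps: posting times in seconds, in any order.
    bin_width: hypothetical histogram bin size in seconds.
    """
    ts = sorted(timestamps)
    if len(ts) < 2:
        raise ValueError("need at least two tweets to measure delays")
    # Delays between consecutive tweets.
    delays = [later - earlier for earlier, later in zip(ts, ts[1:])]
    # Discretize each delay so the empirical distribution has finite support.
    counts = Counter(int(d // bin_width) for d in delays)
    n = len(delays)
    # H = sum over bins of p * log2(1/p); equals 0 when every delay shares one bin.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


if __name__ == "__main__":
    scheduled_bot = [0, 600, 1200, 1800, 2400]  # posts exactly every 10 minutes
    irregular_user = [0, 30, 400, 1000, 4000]   # uneven gaps between posts
    print(interval_entropy(scheduled_bot))   # 0.0
    print(interval_entropy(irregular_user))  # 2.0
```

Under this assumption, a strictly scheduled account collapses into a single bin and scores 0 bits, while irregular gaps spread across bins and score higher; a downstream decision maker would then weigh this score alongside content and account-property features rather than relying on it alone.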


Index Terms

1. Security and privacy
  1. Network security


Published in
ACSAC '10: Proceedings of the 26th Annual Computer Security Applications Conference
December 2010
419 pages
ISBN: 9781450301336
DOI: 10.1145/1920261
Conference Chair: Carrie Gates, CA Labs
Program Chairs: Michael Franz, University of California, Irvine; John McDermott, Naval Research Lab
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Twitter
automatic identification
bot
cyborg
Acceptance Rates
Overall Acceptance Rate: 104 of 497 submissions, 21%

Article Metrics
- Total Citations: 324
- Total Downloads: 4,058
- Downloads (Last 12 months): 199
- Downloads (Last 6 weeks): 28