DOI: 10.1145/1920261.1920265
research-article

Who is tweeting on Twitter: human, bot, or cyborg?

Published: 06 December 2010

ABSTRACT

Twitter is a new web application that plays the dual roles of online social networking and micro-blogging. Users communicate with each other by publishing text-based posts. The popularity and open structure of Twitter have attracted a large number of automated programs, known as bots, which appear to be a double-edged sword for Twitter. Legitimate bots generate a large number of benign tweets that deliver news and update feeds, while malicious bots spread spam or malicious content. More interestingly, between human and bot there has emerged the cyborg, which refers to either a bot-assisted human or a human-assisted bot. To help human users identify who they are interacting with, this paper focuses on classifying human, bot, and cyborg accounts on Twitter. We first conduct a set of large-scale measurements on a collection of over 500,000 accounts. We observe differences among humans, bots, and cyborgs in tweeting behavior, tweet content, and account properties. Based on the measurement results, we propose a classification system with four parts: (1) an entropy-based component, (2) a machine-learning-based component, (3) an account properties component, and (4) a decision maker. The system uses a combination of features extracted from an unknown user to determine the likelihood that the user is a human, bot, or cyborg. Our experimental evaluation demonstrates the efficacy of the proposed classification system.
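As a rough illustration of why an entropy-based component can separate regular from irregular tweeting behavior, the sketch below computes the Shannon entropy of binned gaps between consecutive posts. The abstract does not specify the entropy measure, the features, or any thresholds, so the function name `interval_entropy`, the 60-second bin width, and the sample timestamps are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def interval_entropy(timestamps, bin_seconds=60):
    """Shannon entropy (in bits) of binned gaps between consecutive posts.

    Hypothetical helper: the bin width and the use of plain Shannon
    entropy are illustrative assumptions, not taken from the paper.
    """
    if len(timestamps) < 2:
        return 0.0  # no gaps to measure
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    # Group gaps into coarse bins so near-identical gaps count as one symbol.
    bins = Counter(int(gap // bin_seconds) for gap in gaps)
    total = len(gaps)
    return sum((n / total) * math.log2(total / n) for n in bins.values())


# Made-up posting times in seconds: one account posts every 10 minutes,
# the other posts at irregular moments.
regular = [i * 600 for i in range(20)]
irregular = [0, 45, 400, 2000, 2100, 9000, 9500, 20000, 20030, 50000]

print(interval_entropy(regular))
print(interval_entropy(irregular))
```

A perfectly periodic schedule puts every gap in one bin, so its entropy is zero, while irregular timing spreads gaps over many bins and scores higher. A real classifier would combine such a timing signal with content and account features, as the abstract describes.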

    • Published in

      ACSAC '10: Proceedings of the 26th Annual Computer Security Applications Conference
      December 2010
      419 pages
      ISBN: 9781450301336
      DOI: 10.1145/1920261

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 104 of 497 submissions, 21%
