DOI: 10.1145/2766462.2767799
short-paper

Time-Aware Authorship Attribution for Short Text Streams

Published: 9 August 2015

ABSTRACT

Identifying the authors of short texts on Internet- and social-media-based communication systems is an important tool against fraud and cybercrime. Besides the challenges raised by the limited length of these messages, the evolving language and writing styles of their authors make authorship attribution difficult. Most current short-text authorship attribution approaches address only the challenge of limited text length. However, neglecting the second challenge may lead to poor attribution performance for authors who change their writing styles over time.

In this paper, we analyse the temporal changes in word usage by authors of tweets and emails and, based on this analysis, propose an approach to estimate the dynamicity of authors' word usage. The proposed approach is inspired by time-aware language models and can be incorporated into any time-unaware authorship attribution method. Our experiments on tweets and the Enron email dataset show that the proposed time-aware authorship attribution approach significantly outperforms baselines that neglect the dynamicity of authors' writing.
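The abstract does not state the estimator itself, so the following is only a minimal sketch of the general idea of a time-aware author model, not the paper's method. It assumes a unigram language model per author in which each past message is down-weighted by an exponential recency factor exp(-decay_rate * age), in the spirit of Li and Croft's time-based language models, combined with Dirichlet smoothing against a background collection model (Zhai and Lafferty). The function names, the `decay_rate` and `mu` parameters, and the toy data are hypothetical illustrations. Setting `decay_rate=0` makes every message count equally, which corresponds to a time-unaware baseline.

```python
import math
from collections import Counter, defaultdict


def decayed_counts(messages, now, decay_rate):
    """Term counts for one author, each message weighted by exp(-decay_rate * age).

    messages: list of (timestamp, tokens) pairs (hypothetical input format).
    """
    counts = defaultdict(float)
    for ts, tokens in messages:
        weight = math.exp(-decay_rate * (now - ts))
        for tok in tokens:
            counts[tok] += weight
    return counts


def collection_model(authors):
    """Unweighted background model with add-one smoothing.

    The extra +1 in the denominator reserves probability mass for
    words never seen in any author's training messages.
    """
    cc = Counter(tok for msgs in authors.values() for _, toks in msgs for tok in toks)
    total, vocab = sum(cc.values()), len(cc)
    return lambda w: (cc[w] + 1) / (total + vocab + 1)


def log_likelihood(query, counts, p_coll, mu):
    """Dirichlet-smoothed log P(query | author model)."""
    size = sum(counts.values())
    return sum(
        math.log((counts.get(w, 0.0) + mu * p_coll(w)) / (size + mu)) for w in query
    )


def attribute(query, authors, now, decay_rate=0.0, mu=1.0):
    """Return the candidate author whose (time-weighted) model best explains the query."""
    p_coll = collection_model(authors)
    scores = {
        author: log_likelihood(query, decayed_counts(msgs, now, decay_rate), p_coll, mu)
        for author, msgs in authors.items()
    }
    return max(scores, key=scores.get)


# Toy data: alice used "hello" long ago but says "yo" recently; bob is the reverse.
authors = {
    "alice": [(0, ["hello"] * 4), (100, ["yo"])],
    "bob": [(0, ["yo"] * 4), (100, ["hello"])],
}
print(attribute(["hello"], authors, now=100, decay_rate=0.0))  # alice (time-unaware)
print(attribute(["hello"], authors, now=100, decay_rate=0.1))  # bob (recent usage dominates)
```

In this toy example the time-unaware model attributes "hello" to the author who used it most in total, while the decayed model favours the author who uses it now. This mirrors the abstract's point that ignoring changes in writing style can mislead attribution. The exponential decay is just one simple choice of recency weighting.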


      Published in

      SIGIR '15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
      August 2015
      1198 pages
      ISBN: 9781450336215
      DOI: 10.1145/2766462

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      SIGIR '15 Paper Acceptance Rate: 70 of 351 submissions, 20%
      Overall Acceptance Rate: 792 of 3,983 submissions, 20%
