short-paper

Time-Aware Authorship Attribution for Short Text Streams

Authors:
Hosein Azarbonyad, University of Amsterdam, Amsterdam, Netherlands
Mostafa Dehghani, University of Amsterdam, Amsterdam, Netherlands
Maarten Marx, University of Amsterdam, Amsterdam, Netherlands
Jaap Kamps, University of Amsterdam, Amsterdam, Netherlands

SIGIR '15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, August 2015, pages 727–730. https://doi.org/10.1145/2766462.2767799

Published: 9 August 2015

ABSTRACT

Identifying the authors of short texts on the Internet or on social-media-based communication systems is an important tool against fraud and cybercrime. Besides the challenges raised by the limited length of these short messages, the evolving language and writing styles of their authors make authorship attribution difficult. Most current short-text authorship attribution approaches address only the challenge of limited text length. However, neglecting the second challenge may lead to poor attribution performance for authors who change their writing styles.

In this paper, we analyse the temporal changes in word usage by authors of tweets and emails, and, based on this analysis, we propose an approach to estimate the dynamicity of authors' word usage. The proposed approach is inspired by time-aware language models and can be employed in any time-unaware authorship attribution method. Our experiments on tweets and on the Enron email dataset show that the proposed time-aware authorship attribution approach significantly outperforms baselines that neglect the dynamicity of authors.
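The abstract does not give the estimator itself, so what follows is only a hypothetical sketch of the general idea, not the authors' method. It assumes a unigram language-model attributor and makes it time-aware by down-weighting each of an author's older messages with an exponential decay, a choice borrowed from the exponential recency prior used in time-based language models (Li and Croft). The per-author dynamicity estimate suggested by the abstract is replaced here by a single global `decay_rate`, and profiles are scored with Dirichlet smoothing against a pooled background model. All names (`build_profiles`, `background_model`, `log_likelihood`, `attribute`, `decay_rate`, `mu`) are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch: a time-decayed unigram author profile.
# This is NOT the paper's estimator; the global decay_rate is an assumption.

def build_profiles(messages, now, decay_rate):
    """Build time-decayed term counts per author.

    messages: iterable of (author, timestamp, tokens).
    Each message is weighted by exp(-decay_rate * age), so decay_rate=0
    reduces to an ordinary, time-unaware unigram profile.
    """
    profiles = defaultdict(Counter)
    for author, ts, tokens in messages:
        weight = math.exp(-decay_rate * (now - ts))  # older messages count less
        for tok in tokens:
            profiles[author][tok] += weight
    return profiles

def background_model(profiles):
    """Pool all authors' weighted counts into one normalised distribution."""
    pooled = Counter()
    for counts in profiles.values():
        pooled.update(counts)
    total = sum(pooled.values())
    return {tok: c / total for tok, c in pooled.items()}

def log_likelihood(tokens, counts, background, mu):
    """Dirichlet-smoothed log P(tokens | author profile)."""
    length = sum(counts.values())
    score = 0.0
    for tok in tokens:
        p_bg = background.get(tok, 1e-9)  # tiny floor for words no author has used
        score += math.log((counts.get(tok, 0.0) + mu * p_bg) / (length + mu))
    return score

def attribute(tokens, profiles, mu=10.0):
    """Return the author whose profile gives the message the highest likelihood."""
    background = background_model(profiles)
    return max(profiles, key=lambda a: log_likelihood(tokens, profiles[a], background, mu))

if __name__ == "__main__":
    # Toy stream: alice used "lol" long ago but has since switched vocabulary.
    stream = [
        ("alice", 0, ["lol", "lol", "lol"]),
        ("alice", 100, ["haha", "great"]),
        ("bob", 100, ["lol", "ok"]),
    ]
    static = build_profiles(stream, now=100, decay_rate=0.0)
    dynamic = build_profiles(stream, now=100, decay_rate=0.1)
    print(attribute(["lol"], static, mu=1.0))   # alice
    print(attribute(["lol"], dynamic, mu=1.0))  # bob
```

Setting `decay_rate=0` recovers a time-unaware baseline, which loosely mirrors the claim that the time-aware component can be layered on top of an existing attributor. In the toy stream, alice's old heavy use of "lol" wins without decay, while with decay her recent vocabulary dominates and the message is attributed to bob. Estimating a separate decay per author from their observed vocabulary drift would be closer to the abstract's description, but this sketch does not attempt it.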


Index Terms

1. Information systems
  1. Information retrieval

Recommendations

Arabic Authorship Attribution: An Extensive Study on Twitter Posts
Law enforcement faces problems in tracing the true identity of offenders in cybercrime investigations. Most offenders mask their true identity, impersonate people of high authority, or use identity deception and obfuscation tactics to avoid detection ...

Authorship Attribution of Russian Forum Posts with Different Types of N-gram Features
NLPIR '19: Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval
Authorship attribution is an important field in online security. Recently there have been numerous successful works in authorship attribution in various European languages. Character n-grams are reported to be the best choice in authorship attribution, ...

Authorship Attribution for Short Texts with Author-Document Topic Model
Knowledge Science, Engineering and Management
The goal of authorship attribution is to assign the controversial texts to the known authors correctly. With the development of social media services, authorship attribution for short texts becomes very necessary. In the earlier works, topic ...


Published in
SIGIR '15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
August 2015
1198 pages
ISBN: 9781450336215
DOI: 10.1145/2766462
General Chair: Ricardo Baeza-Yates (Yahoo Labs, USA)
Program Chairs: Mounia Lalmas (Yahoo Labs, UK), Alistair Moffat (University of Melbourne, Australia), Berthier Ribeiro-Neto (Google, Brazil, and UFMG, Brazil)
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
authorship attribution
short text analysis
time-aware language models
Qualifiers
- short-paper
Acceptance Rates
SIGIR '15 Paper Acceptance Rate: 70 of 351 submissions, 20%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%

Article Metrics
- Total Citations: 15
- Total Downloads: 437
- Downloads (Last 12 months): 10
- Downloads (Last 6 weeks): 0