ABSTRACT
Identifying the authors of short texts on Internet- or social-media-based communication systems is an important tool against fraud and cybercrime. Besides the challenge raised by the limited length of these messages, the evolving language and writing styles of their authors make authorship attribution difficult. Most current short-text authorship attribution approaches address only the challenge of limited text length. However, neglecting the second challenge may lead to poor attribution performance for authors who change their writing styles.
In this paper, we analyse the temporal changes in word usage by authors of tweets and emails and, based on this analysis, propose an approach to estimate the dynamicity of authors' word usage. The proposed approach is inspired by time-aware language models and can be employed in any time-unaware authorship attribution method. Our experiments on tweets and the Enron email dataset show that the proposed time-aware authorship attribution approach significantly outperforms baselines that neglect the dynamicity of authors.
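As a rough illustration of the general idea of a time-aware language model for attribution, the sketch below weights each past message of an author by an exponential decay on its age before building a smoothed unigram model. It is not the method of the paper: the exponential decay, the additive smoothing, the per-author `decay_rates` (standing in for an estimate of how dynamic an author's word usage is), and all function names and toy data are assumptions made for this example.

```python
import math
from collections import defaultdict


def decayed_counts(messages, now, decay_rate):
    """Sum word counts, weighting each message by exp(-decay_rate * age)."""
    counts = defaultdict(float)
    for timestamp, text in messages:
        weight = math.exp(-decay_rate * (now - timestamp))
        for word in text.lower().split():
            counts[word] += weight
    return counts


def log_likelihood(text, counts, vocab_size, alpha=0.1):
    """Log-probability of text under an additively smoothed unigram model."""
    total = sum(counts.values())
    score = 0.0
    for word in text.lower().split():
        prob = (counts.get(word, 0.0) + alpha) / (total + alpha * vocab_size)
        score += math.log(prob)
    return score


def attribute(text, now, history, decay_rates):
    """Return (best_author, scores) for text written at time `now`.

    history maps author -> list of (timestamp, text) pairs.
    decay_rates maps author -> decay rate (hypothetical dynamicity value);
    a rate of 0.0 gives the ordinary time-unaware model.
    """
    vocab = {w for msgs in history.values() for _, t in msgs for w in t.lower().split()}
    vocab |= set(text.lower().split())
    scores = {}
    for author, msgs in history.items():
        counts = decayed_counts(msgs, now, decay_rates.get(author, 0.0))
        scores[author] = log_likelihood(text, counts, len(vocab))
    return max(scores, key=scores.get), scores


# Toy data (invented): author "a" wrote about football long ago and has
# since switched topics; author "b" mentioned football only recently.
history = {
    "a": [(0, "football football football match"), (1, "football match"),
          (90, "cricket game tonight"), (95, "cricket score update")],
    "b": [(10, "weather news report"), (92, "football match tonight after work")],
}

static_guess, _ = attribute("football match", 100, history, {"a": 0.0, "b": 0.0})
decayed_guess, _ = attribute("football match", 100, history, {"a": 0.1, "b": 0.0})
print(static_guess, decayed_guess)
```

On this toy data the time-unaware setting attributes the query to author "a" on the strength of old messages, while decaying a's history shifts the attribution to author "b". Because the decay only changes the counts that feed the language model, the same weighting could in principle be placed in front of other count-based attribution methods.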
Index Terms
- Time-Aware Authorship Attribution for Short Text Streams