Software Library for Authorship Identification

Authors

  • Ivan Ivanov, Faculty of Mathematics and Informatics, Sofia University, Bulgaria
  • Cvetina Hantova, Faculty of Mathematics and Informatics, Sofia University, Bulgaria
  • Maria Nisheva, Faculty of Mathematics and Informatics, Sofia University, Bulgaria; Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia, Bulgaria
  • Peter L. Stanchev, Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia, Bulgaria; Kettering University, Flint, USA
  • Phillip Ein-Dor, Faculty of Management, Tel Aviv University

DOI:

https://doi.org/10.55630/dipp.2015.5.8

Keywords:

text authorship identification, compression algorithms, normalized compression distance, n-grams, natural frequency zoned word distribution

Abstract

This paper reviews selected methods for text authorship attribution and discusses the development of a software library of tools for automatic authorship attribution. The presentation focuses on two groups of tools: (1) methods for feature extraction and (2) methods that compute the distance between character strings using data compression algorithms.
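
As a rough illustration of the two groups of tools named above, the following Python sketch computes a character n-gram profile (a simple feature extraction step) and the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C denotes compressed size. It is not the library described in the paper: the function names, the choice of zlib as the compressor, and the default n-gram length are assumptions made only for this example.

```python
import zlib
from collections import Counter


def compressed_size(data: bytes) -> int:
    # Size in bytes after compression; zlib is an assumed stand-in compressor.
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    # Normalized compression distance between two character strings.
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def char_ngrams(text: str, n: int = 3) -> Counter:
    # Frequency profile of overlapping character n-grams (hypothetical helper).
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))
```

Under these assumptions, a text of unknown authorship could be assigned to the candidate author whose known texts give the smallest distance, for example `min(candidates, key=lambda a: ncd(unknown, candidates[a]))`, where `candidates` is a hypothetical dictionary mapping author names to sample texts.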

Published

2015-09-30

How to Cite

Ivanov, I., Hantova, C., Nisheva, M., Stanchev, P. L., & Ein-Dor, P. (2015). Software Library for Authorship Identification. Digital Presentation and Preservation of Cultural and Scientific Heritage, 5, 91–97. https://doi.org/10.55630/dipp.2015.5.8
