The Grouped Author-Topic Model for Unsupervised Entity Resolution

Dai, Andrew M.; Storkey, Amos J.

doi:10.1007/978-3-642-21735-7_30

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6791)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

This paper describes a generative approach to the problem of entity resolution in a completely unsupervised setting, with no fixed assumption about the true number of identities. Entity resolution involves associating different references to authors (in a paper’s author list, for example) with real underlying identities. The references may be written in differing forms or may contain errors, and identical references may refer to different real identities. The approach taken here uses a generative model of both the abstract of a document and its list of authors to resolve identities in a corpus of documents. In the model, authors and topics are associated with latent groups. For each document, an abstract and an author list are generated conditioned on a given group. Results are presented on real-world datasets, where the model outperforms the best-performing unsupervised methods.
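
The generative story in the abstract (a latent group per document, with the author list and the abstract both drawn conditioned on that group) can be illustrated with a toy sampler. The sketch below is not the authors' model: it assumes a fixed, finite number of groups and topics, whereas the paper makes no fixed assumption about the number of identities. Every name, weight, and word list is hypothetical, and `render_reference` is an invented helper that only mimics how references can appear in differing forms.

```python
import random

# Hypothetical toy parameters (assumptions for illustration, not from the paper).
GROUP_WEIGHTS = [0.6, 0.4]                      # prior over latent groups
GROUP_AUTHORS = [["Alice Smith", "Bo Chen", "Alan Smith"],
                 ["Carla Diaz", "Dev Patel"]]   # authors tied to each group
GROUP_TOPICS = [[0.9, 0.1], [0.2, 0.8]]         # per-group weights over topics
TOPIC_WORDS = [["bayesian", "prior", "sampling"],
               ["parsing", "corpus", "grammar"]]


def sample_document(rng, group_weights, group_authors, group_topics,
                    topic_words, n_authors=2, n_words=8):
    """Draw one (group, author list, abstract words) triple from the toy model."""
    # 1. Choose the latent group for this document.
    g = rng.choices(range(len(group_weights)), weights=group_weights)[0]
    # 2. Author list: distinct authors drawn from that group's pool.
    pool = group_authors[g]
    authors = rng.sample(pool, k=min(n_authors, len(pool)))
    # 3. Abstract: each word picks a topic from the group's topic weights,
    #    then a word uniformly from that topic's word list.
    words = []
    for _ in range(n_words):
        t = rng.choices(range(len(topic_words)), weights=group_topics[g])[0]
        words.append(rng.choice(topic_words[t]))
    return g, authors, words


def render_reference(rng, full_name, p_abbrev=0.5):
    """Hypothetical noise step: sometimes abbreviate the given name to an initial."""
    given, _, family = full_name.rpartition(" ")
    if given and rng.random() < p_abbrev:
        return f"{given[0]}. {family}"
    return full_name


if __name__ == "__main__":
    rng = random.Random(0)
    group, authors, words = sample_document(
        rng, GROUP_WEIGHTS, GROUP_AUTHORS, GROUP_TOPICS, TOPIC_WORDS)
    print(group, [render_reference(rng, a) for a in authors], " ".join(words))
```

In this toy setup "Alice Smith" and "Alan Smith" can both be rendered as "A. Smith", which is the ambiguity that entity resolution has to undo: inference would run the story in reverse, using the abstract's words and the co-authors to decide which identity a reference most plausibly belongs to.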

Author information

Authors and Affiliations

Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, U.K.
Andrew M. Dai & Amos J. Storkey

Editor information

Editors and Affiliations

Department of Information and Computer Science, Aalto University School of Science, P.O. Box 15400, 00076, Aalto, Finland
Timo Honkela & Samuel Kaski
School of Physics, Astronomy and Informatics, Department of Informatics, Nicolaus Copernicus University, ul. Grudziadzka 5, 87-100, Torun, Poland
Włodzisław Duch
Department of Statistical Science, University College London, 1-19 Torrington Place, WC1E 7HB, London, UK
Mark Girolami

About this paper

Cite this paper

Dai, A.M., Storkey, A.J. (2011). The Grouped Author-Topic Model for Unsupervised Entity Resolution. In: Honkela, T., Duch, W., Girolami, M., Kaski, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2011. ICANN 2011. Lecture Notes in Computer Science, vol 6791. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21735-7_30

DOI: https://doi.org/10.1007/978-3-642-21735-7_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21734-0
Online ISBN: 978-3-642-21735-7
eBook Packages: Computer Science, Computer Science (R0)