Abstract
Name ambiguity is a critical problem in many applications, particularly in online bibliographic digital libraries. Although several clustering-based methods have been proposed, the problem remains a major challenge for both data integration and data cleaning. In this paper, we present a complementary study of author name disambiguation from a different point of view. We focus on common names, especially non-canonical ones. We propose an approach that automatically accesses authors' personal information on the Deep Web and computes the similarity of every pair of citations according to the following features: co-author names, author affiliation, e-mail address, and title. We then employ the Affinity Propagation clustering algorithm to attribute similar citations to the proper authors. We conducted experiments on five data sources: DBLP, CiteSeer, IEEE, ACM, and Springer LINK. Experimental results show that the proposed approach yields significant improvements.
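The pairwise similarity step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the four features are scored or combined, so everything below is an assumption: the Jaccard overlap measure, the exact-match rule for e-mail, the `WEIGHTS` values, the dictionary layout of a citation, and the function names are all hypothetical and are not taken from the paper.

```python
from itertools import combinations

# Hypothetical feature weights (assumption: the abstract gives no weighting scheme).
WEIGHTS = {"coauthors": 0.4, "affiliation": 0.2, "email": 0.3, "title": 0.1}


def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when either one is empty."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def words(text):
    """Lowercased whitespace tokens of a string."""
    return text.lower().split()


def citation_similarity(c1, c2, weights=WEIGHTS):
    """Weighted sum of per-feature similarities between two citation records."""
    # E-mail counts only when present and equal (case-insensitive).
    email_match = bool(c1["email"]) and c1["email"].lower() == c2["email"].lower()
    scores = {
        "coauthors": jaccard(
            [n.lower() for n in c1["coauthors"]],
            [n.lower() for n in c2["coauthors"]],
        ),
        "affiliation": jaccard(words(c1["affiliation"]), words(c2["affiliation"])),
        "email": 1.0 if email_match else 0.0,
        "title": jaccard(words(c1["title"]), words(c2["title"])),
    }
    return sum(weights[name] * scores[name] for name in weights)


def similarity_matrix(citations):
    """Symmetric matrix of pairwise similarities; the diagonal is set to 1.0."""
    n = len(citations)
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = citation_similarity(citations[i], citations[j])
    return sim
```

A matrix built this way could then be handed to an Affinity Propagation implementation that accepts precomputed similarities, for example scikit-learn's `AffinityPropagation(affinity="precomputed")`, so that each resulting cluster stands for one real-world author. Whether the paper's experiments used a comparable setup is not stated in the abstract.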
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, R., Shen, D., Kou, Y., Nie, T. (2010). Author Name Disambiguation for Citations on the Deep Web. In: Shen, H.T., et al. Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6185. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16720-1_21
DOI: https://doi.org/10.1007/978-3-642-16720-1_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16719-5
Online ISBN: 978-3-642-16720-1
eBook Packages: Computer Science, Computer Science (R0)