Author Name Disambiguation for Citations on the Deep Web

  • Conference paper
Web-Age Information Management (WAIM 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6185)

Abstract

Name ambiguity is a critical problem in many applications, particularly in online bibliographic digital libraries. Although several clustering-based methods have been proposed, the problem remains a major challenge for both data integration and data cleaning. In this paper, we present a complementary study of author name disambiguation from another point of view. We focus on common names, especially non-canonical ones. We propose an approach that automatically accesses authors’ personal information over the Deep Web and computes the similarity of every pair of citations according to the following features: co-author names, author’s affiliation, e-mail address, and title. We then employ the Affinity Propagation clustering algorithm to attribute similar citations to the proper authors. We conducted experiments on five data sources: DBLP, CiteSeer, IEEE, ACM, and Springer LINK. Experimental results show that the proposed approach yields significant improvements.
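To make the pairwise-similarity step concrete, the following is a minimal Python sketch of how two citations might be compared on the four features named in the abstract. It is not the authors' implementation: the abstract does not say how each feature is measured or combined, so the Jaccard overlap, the exact e-mail match, the `WEIGHTS` values, the `Citation` class, and the sample records are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical record for one citation of an ambiguous author name."""
    title: str
    coauthors: frozenset
    affiliation: str = ""
    email: str = ""


def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 if either is empty."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def tokens(text):
    """Lower-cased whitespace tokens of a string."""
    return set(text.lower().split())


# Assumed feature weights (not taken from the paper); they sum to 1.
WEIGHTS = {"coauthor": 0.4, "affiliation": 0.2, "email": 0.3, "title": 0.1}


def similarity(x, y):
    """Weighted combination of the four feature similarities."""
    email_match = 1.0 if x.email and x.email.lower() == y.email.lower() else 0.0
    return (
        WEIGHTS["coauthor"] * jaccard(x.coauthors, y.coauthors)
        + WEIGHTS["affiliation"] * jaccard(tokens(x.affiliation), tokens(y.affiliation))
        + WEIGHTS["email"] * email_match
        + WEIGHTS["title"] * jaccard(tokens(x.title), tokens(y.title))
    )


def similarity_matrix(citations):
    """Similarity of every pair of citations, as a list of rows."""
    return [[similarity(a, b) for b in citations] for a in citations]


# Made-up citations that share one ambiguous author name.
citations = [
    Citation("Clustering citations on the deep web",
             frozenset({"A. Chen", "B. Liu"}),
             "Example University", "w.wang@example.edu"),
    Citation("Deep web citation matching",
             frozenset({"A. Chen"}),
             "Example University", "w.wang@example.edu"),
    Citation("Protein folding simulation",
             frozenset({"C. Diaz"}),
             "Sample Institute", "wang@sample.org"),
]

S = similarity_matrix(citations)
```

A matrix like `S` is the kind of input a similarity-based clusterer such as Affinity Propagation consumes. In this toy data the first two citations score far higher with each other than either does with the third, which is the signal a clustering step would use to attribute them to the same author.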

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, R., Shen, D., Kou, Y., Nie, T. (2010). Author Name Disambiguation for Citations on the Deep Web. In: Shen, H.T., et al. Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6185. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16720-1_21

  • DOI: https://doi.org/10.1007/978-3-642-16720-1_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16719-5

  • Online ISBN: 978-3-642-16720-1

  • eBook Packages: Computer Science, Computer Science (R0)
