
A supervised and distributed framework for cold-start author disambiguation in large-scale publications

  • S.I.: Deep Social Computing
Neural Computing and Applications

Abstract

Names make up a large portion of search engine queries, and name ambiguity degrades the quality of search results. In digital academic systems, the problem appears as a large number of publications with ambiguous author names. Name ambiguity arises because many people share identical names and because names may be abbreviated. Although a number of methods have been proposed over the past decade, the problem is still not completely solved, and many subproblems remain to be studied. Because the available information is limited, distinguishing ambiguous authors accurately from internal information alone is a nontrivial task. In this paper, we focus on the cold-start disambiguation task with homonymous author names, i.e., distinguishing publications written by different authors who share an identical name. We present a supervised framework named DND (an abbreviation of Distributed Framework for Name Disambiguation) to solve the author disambiguation problem efficiently. DND uses the accessible information to train a robust function that measures the similarity between publications and then determines whether they belong to the same author. In traditional clustering-based approaches to author disambiguation, the number of clusters, which is the number of authors sharing the same name, is hard to predict in advance; DND transforms the clustering task into a linkage prediction task and thereby avoids specifying the number of clusters. We validate the effectiveness of DND on two real-world datasets. The experimental results indicate that DND achieves competitive performance compared with the baselines.
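To illustrate why recasting clustering as linkage prediction removes the need to fix the number of clusters, here is a minimal Python sketch. It is not the DND implementation. The scoring function `same_author_score`, its weights, the `coauthors` and `venue` fields, the threshold, and the use of connected components to recover author groups are all assumptions made for this example. In the framework described above, the similarity function is trained from labeled data rather than hand-written.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def same_author_score(p, q):
    """Hypothetical stand-in for a trained pairwise similarity function."""
    coauthor_sim = jaccard(p["coauthors"], q["coauthors"])
    venue_sim = float(p["venue"] == q["venue"])
    return 0.7 * coauthor_sim + 0.3 * venue_sim  # weights are illustrative


def group_by_links(pubs, score=same_author_score, threshold=0.5):
    """Link every pair scoring at least `threshold`, then return the
    connected components as groups of publication indices."""
    parent = list(range(len(pubs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(pubs)), 2):
        if score(pubs[i], pubs[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(pubs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Four toy publications that all carry the same ambiguous author name.
pubs = [
    {"coauthors": {"A. Kumar", "L. Chen"}, "venue": "KDD"},
    {"coauthors": {"A. Kumar", "L. Chen"}, "venue": "CIKM"},
    {"coauthors": {"M. Rossi"}, "venue": "Scientometrics"},
    {"coauthors": {"M. Rossi", "P. Novak"}, "venue": "Scientometrics"},
]
clusters = group_by_links(pubs)
print(clusters)  # [[0, 1], [2, 3]]
```

The number of groups is never passed in: it falls out of which pairs are predicted to be linked. Raising the threshold splits groups, and lowering it merges them.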


Notes

  1. http://howmanyofme.com, accessed July 1, 2020.

  2. https://dblp.uni-trier.de.

  3. https://academic.microsoft.com/.

  4. https://www.aminer.cn.

  5. http://spark.apache.org/.

  6. https://github.com/xxx/xxx.

  7. https://www.biendata.com/competition/aminer2019/.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under grant no. 61873288; in part by the CERNET Innovation Project (NGII20190517); in part by Technology Projects, Hunan Key Laboratory for Internet of Things in Electricity (2019TP1016); and in part by the Fundamental Research Funds for the Central Universities of Central South University (2020zzts594).

Author information

Corresponding authors

Correspondence to Liping Gao or Zhao Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest regarding the contents of the present article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Chen, Y., Jiang, Z., Gao, J. et al. A supervised and distributed framework for cold-start author disambiguation in large-scale publications. Neural Comput & Applic 35, 13093–13108 (2023). https://doi.org/10.1007/s00521-020-05684-y


  • DOI: https://doi.org/10.1007/s00521-020-05684-y
