Accuracy estimation of link-based similarity measures and its application

Zhang, Yinglong; Li, Cuiping; Xie, Chengwang; Chen, Hong

doi:10.1007/s11704-015-4570-7

Research Article
Published: 15 July 2015

Volume 10, pages 113–123, (2016)

Frontiers of Computer Science

Yinglong Zhang^1,3,
Cuiping Li²,
Chengwang Xie^1,3 &
…
Hong Chen²


Abstract

Link-based similarity measures play a significant role in many graph-based applications. Consequently, measuring node similarity in a graph is a fundamental problem in graph data mining. Personalized PageRank (PPR) and SimRank (SR) have emerged as the most popular and influential link-based similarity measures. Recently, a novel link-based similarity measure, penetrating rank (P-Rank), which enriches SR, was proposed. In practice, PPR, SR, and P-Rank scores are calculated by iterative methods, and the computational overhead grows with the number of iterations. Ideally, the computation would stop after the minimum number of iterations sufficient to guarantee a desired accuracy. However, the existing upper bounds are too coarse to be useful in general. Therefore, in this paper we focus on designing accurate and tight upper bounds for PPR, SR, and P-Rank. Our upper bounds are based on the following intuition: the smaller the difference between two consecutive iteration steps, the smaller the difference between the theoretical and iterative similarity scores. Furthermore, we demonstrate the effectiveness of our upper bounds for top-k similar-node queries, where they help accelerate query processing. We also run a comprehensive set of experiments on real-world data sets to verify the effectiveness and efficiency of our upper bounds.
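
The intuition in the abstract can be illustrated with a small sketch for PPR. This is not the bound derived in the paper (the full text is not reproduced here); it is the standard a posteriori estimate for a contraction mapping, which follows the same idea. Assuming the PPR update is x_{k+1} = c W x_k + (1 - c) e_s with a column-stochastic W, the L1 distance from x_{k+1} to the exact scores is at most c / (1 - c) times the L1 difference between x_{k+1} and x_k. The function name, the damping factor c = 0.85, the treatment of dangling nodes, and the toy graph are all assumptions made for illustration.

```python
def ppr_with_posterior_bound(out_links, source, c=0.85, eps=1e-6, max_iter=1000):
    """Power iteration for personalized PageRank with an a posteriori stop rule.

    out_links: dict mapping every node to a list of its successor nodes.
    Returns (scores, iterations, bound), where bound is an upper bound on the
    L1 distance between the returned scores and the exact PPR vector.
    """
    nodes = list(out_links)
    # Start with all probability mass on the source node.
    x = {v: (1.0 if v == source else 0.0) for v in nodes}
    bound = float("inf")
    for k in range(1, max_iter + 1):
        nxt = {v: 0.0 for v in nodes}
        nxt[source] += 1.0 - c  # restart mass returns to the source
        for u in nodes:
            succ = out_links[u]
            if succ:
                share = c * x[u] / len(succ)
                for v in succ:
                    nxt[v] += share
            else:
                # Assumption: a dangling node sends its mass back to the source,
                # which keeps the transition matrix column-stochastic.
                nxt[source] += c * x[u]
        # L1 difference between two consecutive iterates.
        diff = sum(abs(nxt[v] - x[v]) for v in nodes)
        x = nxt
        # Contraction argument: ||exact - x_k||_1 <= c / (1 - c) * diff.
        bound = c / (1.0 - c) * diff
        if bound <= eps:
            return x, k, bound
    return x, max_iter, bound


# Hypothetical toy graph, used only to exercise the function.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores, iterations, bound = ppr_with_posterior_bound(toy_graph, "a", eps=1e-8)
print(iterations, bound)
```

The stop rule uses only quantities available during the iteration, so no iteration count has to be fixed in advance. Whether it stops earlier than an a priori bound such as 2c^k depends on the graph.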



Author information

Authors and Affiliations

School of Software, East China Jiaotong University, Nanchang, 330045, China
Yinglong Zhang & Chengwang Xie
Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, and Department of Computer Science, Renmin University of China, Beijing, 100872, China
Cuiping Li & Hong Chen
Intelligent Optimization and Information Processing Laboratory, East China Jiaotong University, Nanchang, 330013, China
Yinglong Zhang & Chengwang Xie

Corresponding author

Correspondence to Cuiping Li.

Additional information

Yinglong Zhang received his PhD from Renmin University, China in 2014. He is a lecturer at East China Jiaotong University, China. His research interests include data mining and information network analysis.

Cuiping Li received her PhD from the Chinese Academy of Science, China in 2003. She is a professor and doctoral supervisor at Renmin University, China. Her research interests include databases, data mining, information network analysis, and data stream management.

Chengwang Xie received his PhD from Wuhan University, China in 2010. He is an associate professor at East China Jiaotong University, China. His research interests include evolutionary computation and data mining.

Hong Chen received her PhD from the Chinese Academy of Science, China in 2000. She is a professor and doctoral supervisor at Renmin University, China. Her research interests include databases, data mining, data stream analysis and management, and sensor network data management.

About this article

Cite this article

Zhang, Y., Li, C., Xie, C. et al. Accuracy estimation of link-based similarity measures and its application. Front. Comput. Sci. 10, 113–123 (2016). https://doi.org/10.1007/s11704-015-4570-7

Received: 15 December 2014
Accepted: 13 February 2015
Published: 15 July 2015
Issue Date: February 2016
DOI: https://doi.org/10.1007/s11704-015-4570-7
