Large-scale supervised similarity learning in networks

  • Regular paper
Knowledge and Information Systems

Abstract

The problem of similarity learning is relevant to many data mining applications, such as recommender systems, classification, and retrieval. This problem is particularly challenging in the context of networks, which contain different aspects such as the topological structure, content, and user supervision. These aspects need to be combined effectively in order to create a holistic similarity function. In particular, while most similarity learning methods in networks, such as SimRank, utilize the topological structure, the user supervision and content are rarely considered. In this paper, a factorized similarity learning (FSL) method is proposed to integrate the links, node content, and user supervision into a unified framework. The model is learned by using matrix factorization, and the final similarities are approximated by the span of low-rank matrices. The proposed framework is further extended to a noise-tolerant version by adopting a hinge loss as an alternative loss function. To facilitate efficient computation on large-scale data, a parallel extension is developed. Experiments are conducted on the DBLP and CoRA data sets. The results show that FSL is robust and efficient and outperforms the state of the art. The code for the learning algorithm used in our experiments is available at http://www.ifp.illinois.edu/~chang87/.
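
To make the general idea of a low-rank similarity with hinge-loss supervision concrete, the following is a minimal illustrative sketch. It is not the FSL objective or the authors' released code. It assumes that each node gets a low-rank vector, that the similarity of two nodes is the inner product of their vectors, that link weights are fit with a squared error, and that supervised pairs (labelled +1 for similar, -1 for dissimilar) are fit with a hinge loss. Node content is omitted, and all names (`train`, `links`, `pairs`, `rank`, `lam`) are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def hinge(label, score, margin=1.0):
    """Hinge loss for one supervised pair; label is +1 (similar) or -1 (dissimilar)."""
    return max(0.0, margin - label * score)


def train(n_nodes, links, pairs, rank=2, lr=0.05, lam=0.01, epochs=300, seed=0):
    """Learn one low-rank vector per node by plain (sub)gradient descent.

    links: (i, j, weight) triples with i != j, fit with a squared error.
    pairs: (i, j, label) triples with i != j, fit with the hinge loss.
    This is an assumed toy objective, not the one from the paper.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_nodes)]
    for _ in range(epochs):
        # Link term: move u_i . u_j toward the observed link weight.
        for i, j, w in links:
            err = dot(U[i], U[j]) - w
            ui, uj = U[i][:], U[j][:]
            for k in range(rank):
                U[i][k] -= lr * (err * uj[k] + lam * ui[k])
                U[j][k] -= lr * (err * ui[k] + lam * uj[k])
        # Supervision term: hinge subgradient, active only inside the margin.
        for i, j, y in pairs:
            if hinge(y, dot(U[i], U[j])) > 0.0:
                ui, uj = U[i][:], U[j][:]
                for k in range(rank):
                    U[i][k] += lr * y * uj[k]
                    U[j][k] += lr * y * ui[k]
    return U


# Toy network: two linked pairs, one "similar" and one "dissimilar" label.
links = [(0, 1, 1.0), (2, 3, 1.0)]
pairs = [(0, 1, +1), (0, 2, -1)]
U = train(4, links, pairs)
s01 = dot(U[0], U[1])  # learned similarity of the pair labelled similar
s02 = dot(U[0], U[2])  # learned similarity of the pair labelled dissimilar
```

In this toy setting the pair labelled similar ends up with a higher learned similarity than the pair labelled dissimilar. The hinge term stops contributing once a pair clears the margin, which is one common reason such a loss is considered more tolerant of noisy labels than a squared error.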

References

  1. Aggarwal CC (2003) Towards systematic design of distance functions for data mining applications. In: Proceedings of the ninth ACM SIGKDD, ACM, pp 9–18

  2. Bar-Hillel A, Hertz T, Shental N, Weinshall D (2005) Learning a Mahalanobis metric from equivalence constraints. J Mach Learn Res 6:937–965

  3. Birgin EG, Martínez JM, Raydan M (2000) Nonmonotone spectral projected gradient methods on convex sets. SIAM J Optim 10(4):1196–1211

  4. Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, New York

  5. Cai JF, Candès EJ, Shen Z (2010) A singular value thresholding algorithm for matrix completion. SIAM J Optim 20(4):1956–1982

  6. Chang S, Qi G, Aggarwal C, Zhou J, Wang M, Huang T (2014) Factorized similarity learning in networks. In: ICDM, pp 60–69

  7. Cheney W, Goldstein AA (1959) Proximity maps for convex sets. Proc Am Math Soc 10(3):448–450

  8. Davis JV, Kulis B, Jain P, Sra S, Dhillon IS (2007) Information-theoretic metric learning. In: ICML, pp 209–216

  9. Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113

  10. Deng H, Han J, Zhao B, Yu Y, Lin CX (2011) Probabilistic topic models with biased propagation on heterogeneous information networks. In: SIGKDD, pp 1271–1279

  11. Geerts F, Mannila H, Terzi E (2004) Relational link-based ranking. In: VLDB, pp 552–563

  12. Goldberger J, Roweis S, Hinton G, Salakhutdinov R (2004) Neighbourhood components analysis. In: NIPS, pp 513–520

  13. Han SP (1988) A successive projection method. Math Progr 40(1–3):1–14

  14. Hoi SCH, Liu W, Chang SF (2008) Semi-supervised distance metric learning for collaborative image retrieval. In: CVPR, IEEE computer society

  15. Jeh G, Widom J (2002) Simrank: a measure of structural-context similarity. In: SIGKDD, pp 538–543

  16. Koren Y, Bell R, Volinsky C (2009) Matrix factorization techniques for recommender systems. Computer 42(8):30–37

  17. Kotz S, Kozubowski T, Podgorski K (2001) The Laplace distribution and generalizations: a revisit with applications to communications, economics, engineering, and finance. Progress in mathematics series. Birkhäuser, Boston

  18. Kumar N, Kummamuru K, Paranjpe D (2005) Semi-supervised clustering with metric learning using relative comparisons. In: Fifth IEEE international conference on data mining, p 4

  19. Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791

  20. Li Z, Chang S, Liang F, Huang TS, Cao L, Smith JR (2013) Learning locally-adaptive decision functions for person verification. In: CVPR, 2013

  21. Lin Z, King I, Lyu M (2006) Pagesim: a novel link-based similarity measure for the world wide web. In: IEEE/WIC/ACM international conference on web intelligence, 2006. WI 2006, pp 687–693

  22. Liu X, Ji R, Yao H, Xu P, Sun X, Liu T (2008) Cross-media manifold learning for image retrieval and annotation. In: Lew MS, Bimbo AD, Bakker EM (eds) Multimedia information retrieval. ACM, New York, pp 141–148

  23. Ma H, Yang H, Lyu MR, King I (2008) Sorec: social recommendation using probabilistic matrix factorization. In: CIKM, pp 931–940

  24. Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New York

  25. McCallum AK, Nigam K, Rennie J, Seymore K (2000) Automating the construction of internet portals with machine learning. Inf Retr 3(2):127–163

  26. McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: homophily in social networks. Annu Rev Sociol 27:415–444

  27. Mnih A, Salakhutdinov R (2007) Probabilistic matrix factorization. In: NIPS, pp 1257–1264

  28. Nesterov Y (2004) Introductory lectures on convex optimization: a basic course, vol 87. Springer, Berlin

  29. Paatero P, Tapper U (1994) Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5(2):111–126

  30. Page L, Brin S, Motwani R, Winograd T (1999) The pagerank citation ranking: bringing order to the web. Technical report 1999-66, Stanford InfoLab, November 1999. Previous number = SIDL-WP-1999-0120

  31. Purushotham S, Liu Y, Kuo CCJ (2012) Collaborative topic regression with social matrix factorization for recommendation systems. In: ICML, 2012

  32. Qi GJ, Aggarwal C, Tian Q, Ji H, Huang T (2012) Exploring context and content links in social media: a latent space method. IEEE Trans Pattern Anal Mach Intell 34(5):850–862

  33. Qi GJ, Tang J, Zha ZJ, Chua TS, Zhang HJ (2009) An efficient sparse metric learning in high-dimensional space via l1-penalized log-determinant regularization. In: ICML, pp 841–848

  34. Qian B, Wang X, Wang F, Li H, Ye J, Davidson I (2013) Active learning from relative queries. In: Proceedings of the twenty-third international joint conference on artificial intelligence. AAAI Press, pp 1614–1620

  35. Qian B, Wang X, Wang J, Li H, Cao N, Zhi W, Davidson I (2013) Fast pairwise query selection for large-scale active learning to rank. In: IEEE 13th international conference on data mining (ICDM), 2013, pp 607–616

  36. Shalev-Shwartz S, Singer Y, Srebro N (2007) Pegasos: primal estimated sub-gradient solver for SVM. In: ICML, pp 807–814

  37. Tang J, Yan S, Hong R, Qi GJ, Chua TS (2009) Inferring semantic concepts from community-contributed images and noisy tags. In: SIGMM. ACM, pp 223–232

  38. Tseng P (2001) Convergence of a block coordinate descent method for nondifferentiable minimization. J Optim Theory Appl 109(3):475–494

  39. Vapnik VN (1995) The nature of statistical learning theory. Springer, New York

  40. Wang C, Blei DM (2011) Collaborative topic modeling for recommending scientific articles. In: SIGKDD, pp 448–456

  41. Weinberger KQ, Saul LK (2009) Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res 10:207–244

  42. Wen Z, Yin W, Zhang Y (2012) Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm. Math Progr Comput 4(4):333–361

  43. Xi W, Fox EA, Fan W, Zhang B, Chen Z, Yan J, Zhuang D (2005) Simfusion: measuring similarity using unified relationship matrix. In: SIGIR, pp 130–137

  44. Xing EP, Ng AY, Jordan MI, Russell S (2003) Distance metric learning, with application to clustering with side-information. In: NIPS, pp 505–512

  45. Zeng C, Jiang Y, Zheng L, Li J, Li L, Li L, Shen C, Zhou W, Li T, Duan B, Lei M, Wang P (2013) Fiu-miner: a fast, integrated, and user-friendly system for data mining in distributed environment. In: SIGKDD, pp 1506–1509

  46. Zhao P, Han J, Sun Y (2009) P-rank: a comprehensive structural similarity measure over information networks. In: CIKM, pp 553–562

  47. Zhou J, Lu Z, Sun J, Yuan L, Wang F, Ye J (2013) Feafiner: biomarker identification from medical data through feature generalization and selection. In: SIGKDD, pp 1034–1042

Acknowledgments

The work of Shiyu Chang and Thomas S. Huang was funded in part by the National Science Foundation under Grant Number 1318971 and the Samsung Global Research Program 2013 under Theme “Big Data and Network,” Subject “Privacy and Trust Management In Big Data Analysis.” This work was partially sponsored by the Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053.

Corresponding author

Correspondence to Shiyu Chang.

Additional information

This paper is an extended journal version of the ICDM 2014 best student paper [6] for the “Best of ICDM” special issue.

Cite this article

Chang, S., Qi, GJ., Yang, Y. et al. Large-scale supervised similarity learning in networks. Knowl Inf Syst 48, 707–740 (2016). https://doi.org/10.1007/s10115-015-0894-8

  • DOI: https://doi.org/10.1007/s10115-015-0894-8
