Network completion by leveraging similarity of nodes

  • Original Article
Social Network Analysis and Mining

Abstract

The analysis of social networks has attracted much attention in recent years. Link prediction is an important aspect of social network analysis, and a key research area within it is the network completion problem, where it is assumed that only a small sample of a network (e.g., a complete or partially observed subgraph of a social graph) is observed and we would like to infer the unobserved part of the network. In a typical network completion problem, standard methods such as matrix completion are inapplicable due to the nonuniform sampling of observed links. This paper investigates the network completion problem and demonstrates that by effectively leveraging side information about the nodes (such as their pairwise similarity), it is possible to predict the unobserved part of the network with high accuracy. To this end, we propose an efficient algorithm that decouples the completion stage from the transduction stage to effectively exploit the similarity information. This crucial difference greatly boosts performance when appropriate similarity information is used. The recovery error of the proposed algorithm is analyzed theoretically in terms of the richness of the similarity information and the size of the observed subnetwork. To the best of our knowledge, this is the first algorithm with provable guarantees that addresses network completion using the similarity of nodes. Through extensive experiments on four real-world datasets, we demonstrate that (1) leveraging side information in matrix completion by decoupling the completion from transduction significantly improves the link prediction performance, (2) the proposed two-stage method can deal with the cold-start problem that arises when a new entity enters the network, and (3) our approach is scalable to large-scale networks.
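
The decoupled two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names `complete_low_rank` and `transduce`, the use of iterative truncated SVD (hard-impute) for the completion stage, and the assumption that the full adjacency matrix lies in the span of the top eigenvectors of the similarity matrix are all choices made here for illustration only.

```python
import numpy as np


def complete_low_rank(obs, mask, rank, n_iters=200):
    """Stage 1 (completion), illustrative only: fill the unobserved entries
    of the sampled submatrix by iterating a rank-`rank` truncated SVD."""
    filled = np.where(mask, obs, 0.0)
    for _ in range(n_iters):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # Keep observed entries fixed; overwrite only the missing ones.
        filled = np.where(mask, obs, low_rank)
    return low_rank


def transduce(sub_completed, similarity, sampled, rank):
    """Stage 2 (transduction), illustrative only: extend the completed
    submatrix to all nodes, ASSUMING the adjacency matrix lies in the span
    of the top-`rank` eigenvectors of the symmetric similarity matrix."""
    eigvals, eigvecs = np.linalg.eigh(similarity)
    order = np.argsort(eigvals)[::-1][:rank]
    top = eigvecs[:, order]              # n x rank basis from side information
    top_sampled = top[sampled]           # rows belonging to the sampled nodes
    pinv = np.linalg.pinv(top_sampled)   # rank x m least-squares projector
    core = pinv @ sub_completed @ pinv.T # rank x rank coefficients
    return top @ core @ top.T            # n x n estimate of the full network
```

Here `sampled` is an index array of the observed nodes, `mask` marks which entries of the sampled submatrix were actually observed, and `rank` is a user-chosen hyperparameter. Keeping the two stages separate means the similarity information is used only to extend an already completed submatrix, which is the design point the abstract emphasizes.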

Notes

  1. A preliminary version of this paper appeared in the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) 2015.

  2. Unless otherwise stated, for most networks the algorithms from the literature are assumed to use 60% of the data for training, 20% for evaluation, and 20% for testing.

  3. We note that for many real-world social networks the underlying adjacency matrix is low rank (e.g., see Chiang et al. 2014).

  4. The names of the features have been anonymized for Facebook users, since the names of the features would reveal private data.

  5. http://snap.stanford.edu/data.

  6. A full list of all attributes and their values can be found at: https://snap.stanford.edu/data/egonets-Gplus.html.

  7. http://snap.stanford.edu/data.

  8. http://www.public.asu.edu/~jtang20/datasetcode/truststudy.htm.

  9. http://www.kddcup2012.org/c/kddcup2012-track1/data.

  10. http://cseweb.ucsd.edu/~akmenon/code/.

  11. Recall that the richer the side information of a network is, the more accurate the results are. Also, given the same side information, the accuracy of the results may differ across applications.

References

  • Abernethy J, Bach F, Evgeniou T, Vert J-P (2009) A new approach to collaborative filtering: operator estimation with spectral regularization. JMLR 10:803–826

  • Annibale A, Coolen ACC (2011) What you see is not what you get: how sampling affects macroscopic features of biological networks. Interface Focus 1(6):836–856

  • Barjasteh I, Forsati R, Masrour F, Esfahanian AH, Radha H (2015) Cold-start item and user recommendation with decoupled completion and transduction. In: Proceedings of the 9th ACM conference on recommender systems. ACM, pp 91–98

  • Barjasteh I, Forsati R, Ross D, Esfahanian A, Radha H (2016) Cold-start recommendation with provable guarantees: a decoupled approach. IEEE Trans Knowl Data Eng 28(6):1462–1474

  • Cai J-F, Candès EJ, Shen Z (2010) A singular value thresholding algorithm for matrix completion. SIAM J Optim 20(4):1956–1982

  • Candès EJ, Recht B (2009) Exact matrix completion via convex optimization. Found Comput Math 9(6):717–772

  • Chiang K-Y, Hsieh C-J, Dhillon IS (2015) Matrix completion with noisy side information. In: NIPS'15 Proceedings of the 28th International Conference on Neural Information Processing Systems. MIT Press, Cambridge, pp 3447–3455

  • Chiang K-Y, Hsieh C-J, Natarajan N, Dhillon IS, Tewari A (2014) Prediction and clustering in signed networks: a local to global perspective. JMLR 15(1):1177–1213

  • Fang Y, Si L (2011) Matrix co-factorization for recommendation with rich side information and implicit feedback. In: Proceedings of the 2nd international workshop on information heterogeneity and fusion in recommender systems. ACM, pp 65–69

  • Frank O (2005) Network sampling and model fitting. In: Models and methods in social network analysis. Cambridge University Press, Cambridge, pp 31–56

  • Gantner Z, Rendle S, Freudenthaler C, Schmidt-Thieme L (2011) MyMediaLite: a free recommender system library. In: Proceedings of the 5th ACM conference on recommender systems (RecSys 2011)

  • Gittens AA (2013) Topics in randomized numerical linear algebra. PhD thesis, California Institute of Technology

  • Goldberg A, Recht B, Xu J, Nowak R, Zhu X (2010) Transduction with matrix completion: three birds with one stone. In: Advances in neural information processing systems, pp 757–765

  • Guimerà R, Sales-Pardo M (2009) Missing and spurious interactions and the reconstruction of complex networks. Proc Nat Acad Sci 106(52):22073–22078

  • Hanneke S, Xing EP (2009) Network completion and survey sampling. J Mach Learn Res 5:209–215

  • Jain P, Dhillon IS (2013) Provable inductive matrix completion. arXiv preprint arXiv:1306.0626

  • Kim M, Leskovec J (2011) The network completion problem: inferring missing nodes and edges in networks. SDM SIAM 11:47–58

  • Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inf Sci Technol 58(7):1019–1031

  • Liu W, Wang J, Chang S-F (2012) Robust and scalable graph-based semisupervised learning. Proc IEEE 100(9):2624–2638

  • Masrour F, Barjasteh I, Forsati R, Esfahanian A-H, Radha H (2015) Network completion with node similarity: a matrix completion approach with provable guarantees. In: Proceedings of the 2015 IEEE/ACM international conference on advances in social networks analysis and mining 2015. ACM, pp 302–307

  • McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: homophily in social networks. Annu Rev Sociol 27:415–444

  • Menon AK, Chitrapura KP, Garg S, Agarwal D, Kota N (2011) Response prediction using collaborative filtering with hierarchies and side information. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 141–149

  • Menon AK, Elkan C (2011) Link prediction via matrix factorization. In: Gunopulos D, Hofmann T, Malerba D, Vazirgiannis M (eds) Joint European conference on machine learning and knowledge discovery in databases. Springer, Berlin, Heidelberg, pp 437–452

  • Pan W, Xiang EW, Liu NN, Yang Q (2010) Transfer learning in collaborative filtering for sparsity reduction. AAAI 10:230–235

  • Papagelis M, Das G, Koudas N (2013) Sampling online social networks. IEEE Trans Knowl Data Eng 25(3):662–676

  • Porteous I, Asuncion AU, Welling M (2010) Bayesian matrix factorization with side information and Dirichlet process mixtures. In: Proceedings of the 24th AAAI conference on artificial intelligence, pp 563–568

  • Recht B (2011) A simpler approach to matrix completion. JMLR 12:3413–3430

  • Shiga M, Takigawa I, Mamitsuka H (2007) Annotating gene function by combining expression data with a modular gene network. Bioinformatics 23(13):i468–i478

  • Srebro N, Rennie J, Jaakkola TS (2004) Maximum-margin matrix factorization. In: Advances in neural information processing systems, pp 1329–1336

  • Zhou T, Shan H, Banerjee A, Sapiro G (2012) Kernelized probabilistic matrix factorization: exploiting graphs and side information. SDM SIAM 12:403–414

Acknowledgements

This work was supported in part by the National Science Foundation under Awards IIS-096849, CCF-1117709, and 1331852. The authors would like to thank the anonymous reviewers for their insightful comments and suggestions. We also acknowledge the contributions of Farzan Masrour to the conference version of this work.

Author information

Corresponding author

Correspondence to Rana Forsati.

Cite this article

Forsati, R., Barjasteh, I., Ross, D. et al. Network completion by leveraging similarity of nodes. Soc. Netw. Anal. Min. 6, 102 (2016). https://doi.org/10.1007/s13278-016-0405-2

  • DOI: https://doi.org/10.1007/s13278-016-0405-2
