Abstract
The analysis of social networks has attracted much attention in recent years. Link prediction is an important aspect of social network analysis and an area of key research within that is the network completion problem, where it is assumed that only a small sample of a network (e.g., a complete or partially observed subgraph of a social graph) is observed and we would like to infer the unobserved part of the network. In a typical network completion problem the standard methods, such as matrix completion, are inapplicable due the nonuniform sampling of observed links. This paper investigates the network completion problem and demonstrates that by effectively leveraging the side information about the nodes (such as the pairwise similarity), it is possible to predict the unobserved part of the network with high accuracy. To this end, we propose an efficient algorithm that decouples the completion from transduction stage to effectively exploit the similarity information. This crucial difference greatly boosts the performance where appropriate similarity information is used. The recovery error of the proposed algorithm is analyzed theoretically based on the richness of the similarity information and the size of the observed subnetwork. To the best of our knowledge, this is the first algorithm that addresses the network completion with similarity of nodes with provable guarantees. Through extensive experiments on four real-world datasets, we demonstrate that (1) leveraging side information in matrix completion by decoupling the completion from transduction significantly improves the link prediction performance, (2) proposed two-stage method can deal with the cold-start problem that arises when a new entity enters the network, and (3) our approach is scalable to large-scale networks.
Similar content being viewed by others
Notes
A preliminary version of this paper is appeared in IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) 2015.
It is worth mentioning that for the most of networks, the literature algorithms are mostly assumed to use 60% of the data for their training, 20% for evaluation and 20% for their test, unless otherwise stated.
We note that for many real-world social networks the underlying adjacency matrix is low rank (e.g., see Chiang et al. 2014).
The names of the features have been anonymized for Facebook users, since the names of the features would reveal private data.
A full list of all attributes and their values can be found at: https://snap.stanford.edu/data/egonets-Gplus.html.
It is worth reminding that the richer the side information of a network is, the more accurate the results are. Also, given same side information, for different applications, the accuracy of the results might be different.
References
Abernethy J, Bach F, Evgeniou T, Vert J-P (2009) A new approach to collaborative filtering: operator estimation with spectral regularization. JMLR 10:803–826
Annibale A, Coolen ACC (2011) What you see is not what you get: how sampling affects macroscopic features of biological networks. Interface Focus 1(6):836–856
Barjasteh I, Forsati R, Masrour F, Esfahanian AH, Radha H (2015) Cold-start item and user recommendation with decoupled completion and transduction. In: Proceedings of the 9th ACM conference on recommender systems. ACM, pp 91–98
Barjasteh I, Forsati R, Ross D, Esfahanian A, Radha H (2016) Cold-Start Recommendation with Provable Guarantees: A Decoupled Approach. IEEE Trans Knowl Data Eng 28(6):1462-1474
Cai J-F, Candès EJ, Shen Z (2010) A singular value thresholding algorithm for matrix completion. SIAM J Optim 20(4):1956–1982
Candès EJ, Recht B (2009) Exact matrix completion via convex optimization. Found Comput Math 9(6):717–772
Chiang K-Y, Hsieh C-J, Dhillon IS (2015) Matrix completion with noisy side information. In: NIPS'15 Proceedings of the 28th International Conference on Neural Information Processing Systems. MIT Press, Cambridge, pp 3447–3455
Kai-Yang C, Cho-Jui H, Nagarajan N, Dhillon Inderjit S, Ambuj Tewari (2014) Prediction and clustering in signed networks: a local to global perspective. JMLR 15(1):1177–1213
Fang Y, Si L (2011) Matrix co-factorization for recommendation with rich side information and implicit feedback. In: Proceedings of the 2nd international workshop on information heterogeneity and fusion in recommender systems. ACM, pp 65–69
Frank O (2005) Network sampling and model fitting. In: Models Methods Social Network Analysis. Cambridge University Press, Cambridge, pp 31–56
Zeno G, Steffen R, Christoph F, Lars ST (2011) MyMediaLite: a free recommender system library. In: Proceedings of the 5th ACM conference on recommender systems (RecSys 2011)
Gittens AA (2013) Topics in randomized numerical linear algebra. PhD thesis, California Institute of Technology
Goldberg A, Recht B, Xu J, Nowak R, Zhu X (2010) Transduction with matrix completion: three birds with one stone. In: Advances in neural information processing systems, pp 757–765
Guimerà R, Sales-Pardo M (2009) Missing and spurious interactions and the reconstruction of complex networks. Proc Nat Acad Sci 106(52):22073–22078
Hanneke S, Xing EP (2009) Network completion and survey sampling. J Mach Learn Res 5:209–215
Jain P, Dhillon IS (2013) Provable inductive matrix completion. arXiv preprint arXiv:1306.0626
Kim M, Leskovec J (2011) The network completion problem: inferring missing nodes and edges in networks. SDM SIAM 11:47–58
Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inf Sci Technol 58(7):1019–1031
Liu W, Wang J, Chang S-F (2012) Robust and scalable graph-based semisupervised learning. Proc IEEE 100(9):2624–2638
Masrour F, Barjasteh I, Forsati R, Esfahanian A-H, Radha H (2015) Network completion with node similarity: a matrix completion approach with provable guarantees. In: Proceedings of the 2015 IEEE/ACM international conference on advances in social networks analysis and mining 2015. ACM, pp 302–307
McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: homophily in social networks. Annu Rev Sociol 27:415–444
Menon AK, Chitrapura KP, Garg S, Agarwal D, Kota N (2011) Response prediction using collaborative filtering with hierarchies and side information. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 141–149
Menon AK, Elkan C (2011) Link prediction via matrix factorization. In: Gunopulos D, Hofmann T, Malerba D, Vazirgiannis M (eds) Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer Berlin, Heidelberg, pp 437–452
Pan W, Xiang EW, Liu NN, Yang Q (2010) Transfer learning in collaborative filtering for sparsity reduction. AAAI 10:230–235
Papagelis M, Das G, Koudas N (2013) Sampling online social networks. Knowl Data Eng, IEEE Trans 25(3):662–676
Porteous I, Asuncion AU, Welling M (2010) Bayesian matrix factorization with side information and dirichlet process mixtures. In Proceedings of the 24th AAAI Conference on Artificial Intelligence, pp 563–568
Recht B (2011) A simpler approach to matrix completion. JMLR 12:3413–3430
Shiga M, Takigawa I, Mamitsuka H (2007) Annotating gene function by combining expression data with a modular gene network. Bioinformatics 23(13):i468–i478
Srebro N, Rennie J, Jaakkola TS (2004) Maximum-margin matrix factorization. In Advances in neural information processing systems, pp 1329–1336
Zhou T, Shan H, Banerjee A, Sapiro G (2012) Kernelized probabilistic matrix factorization: exploiting graphs and side information. SDM SIAM 12:403–414
Acknowledgements
This work was supported in part by the National Science Foundation under Awards IIS-096849, CCF-1117709, and 1331852. The authors would like to thank the anonymous reviewers for their insightful comments and suggestions. We also acknowledge contributions of Farzan Masrour in the conference version of this work.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Forsati, R., Barjasteh, I., Ross, D. et al. Network completion by leveraging similarity of nodes. Soc. Netw. Anal. Min. 6, 102 (2016). https://doi.org/10.1007/s13278-016-0405-2
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s13278-016-0405-2