ABSTRACT
In many online networks, nodes are partitioned into categories (e.g., countries or universities in online social networks, OSNs), which naturally defines a weighted category graph, i.e., a coarse-grained version of the underlying network. In this paper, we show how to efficiently estimate the category graph from a probability sample of nodes. We prove consistency of our estimators and evaluate their efficiency via simulation. We also apply our methodology to a sample of Facebook users to obtain a number of category graphs, such as the college friendship graph and the country friendship graph. We share and visualize the resulting data at www.geosocialmap.com.
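As a rough illustration of the idea, the sketch below estimates a category graph from a uniform sample of distinct nodes. It is not the estimator developed in the paper: the abstract does not define the edge weights, so the sketch assumes the weight between two categories is the fraction of possible cross-category node pairs that are connected, and it estimates that fraction from the pairs observed inside the sample. The names `estimate_category_graph`, `category`, and `has_edge` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def estimate_category_graph(sample, category, has_edge):
    """Estimate the edge density between every pair of sampled categories.

    sample   -- list of distinct node ids, assumed drawn uniformly at random
    category -- dict mapping node id -> category label (assumed known)
    has_edge -- function (u, v) -> bool, True if u and v are connected
    """
    # Number of sampled nodes falling in each category.
    sampled_per_category = Counter(category[v] for v in sample)

    # Number of observed edges between sampled nodes of different categories.
    observed_edges = Counter()
    for u, v in combinations(sample, 2):
        cu, cv = category[u], category[v]
        if cu != cv and has_edge(u, v):
            observed_edges[frozenset((cu, cv))] += 1

    # Assumed weight: observed cross edges / possible cross pairs in the sample.
    weights = {}
    for a, b in combinations(sorted(sampled_per_category), 2):
        possible = sampled_per_category[a] * sampled_per_category[b]
        weights[(a, b)] = observed_edges[frozenset((a, b))] / possible
    return weights


# Toy graph: six nodes in three categories, four edges.
toy_edges = {frozenset(e) for e in [(0, 2), (1, 2), (1, 3), (0, 4)]}
toy_category = {0: "A", 1: "A", 2: "B", 3: "B", 4: "C", 5: "C"}

# Sampling every node recovers the exact densities.
toy_weights = estimate_category_graph(
    list(range(6)), toy_category, lambda u, v: frozenset((u, v)) in toy_edges
)
print(toy_weights)  # {('A', 'B'): 0.75, ('A', 'C'): 0.25, ('B', 'C'): 0.0}
```

With a proper subsample, the same ratio serves as an estimate of the assumed density. Samples drawn with unequal probabilities (for example by a random walk) would need each observation reweighted, which this sketch does not attempt.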
Index Terms
- Coarse-grained topology estimation via graph sampling