Diffusion archeology for diffusion progression history reconstruction

  • Regular Paper
Knowledge and Information Systems

Abstract

Diffusion through graphs can model many real-world processes, such as the spread of diseases, social network memes, computer viruses, or water contaminants. Often, a real-world diffusion cannot be observed directly while it is occurring: it may go unnoticed until some time has passed, continuous monitoring may be too costly, or privacy concerns may limit data access. The present state of the diffusion must then be explained from partial diffusion data. Here, we tackle the problem of reconstructing a diffusion history from one or more snapshots of the diffusion state. This ability can be invaluable for learning when certain computer nodes were infected, or which people were the initial disease spreaders, in order to control future diffusions. We formulate this problem over discrete-time SEIRS-type diffusion models in terms of maximum likelihood. We design methods based on submodularity and on a novel Prize-Collecting Dominating-Set Vertex Cover relaxation that identify likely diffusion steps with some provable performance guarantees. Our methods are the first to be able to reconstruct complete diffusion histories accurately in real and simulated situations. As a special case, they also identify the initial spreaders better than existing methods for that problem. Our results for both meme and contaminant diffusion show that the partial diffusion data problem can be overcome with proper modeling and methods, and that hidden temporal characteristics of diffusion can be predicted from limited data.
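
To make the maximum-likelihood formulation concrete, the following is a minimal sketch, not the method of the paper. It assumes a much simpler discrete-time SI model (no exposed, recovered, or re-susceptible states), a single transmission probability p per edge per step, known seed nodes, and one snapshot taken at a known horizon. The names history_log_likelihood and most_likely_history are hypothetical, and the exhaustive search stands in for the submodularity-based and relaxation-based methods, which are not reproduced here.

```python
import math
from itertools import product

NEVER = math.inf  # infection time of a node still susceptible at the horizon


def history_log_likelihood(adj, times, p, horizon):
    """Log-likelihood of an infection-time assignment under a simple SI model.

    adj:     dict node -> list of neighbours (undirected graph)
    times:   dict node -> infection time in 0..horizon (0 = seed) or NEVER
    p:       per-step probability that an infected node infects a neighbour
    horizon: time step at which the snapshot was taken
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    total = 0.0
    for v, t_v in times.items():
        if t_v == 0:
            continue  # seeds are assumed given, so they add no probability term
        last_step = horizon if t_v == NEVER else int(t_v)
        for step in range(1, last_step + 1):
            # Neighbours infected strictly before this step can transmit in it.
            k = sum(1 for u in adj[v] if times[u] <= step - 1)
            if step == t_v:
                if k == 0:
                    return -math.inf  # infection without any infected neighbour
                total += math.log(1.0 - (1.0 - p) ** k)
            else:
                total += k * math.log1p(-p)  # v escapes all k infected neighbours
    return total


def most_likely_history(adj, seeds, infected, p, horizon):
    """Brute-force search over infection times consistent with one snapshot."""
    free = sorted(set(infected) - set(seeds))
    base = {v: NEVER for v in adj}
    base.update({s: 0 for s in seeds})
    best, best_ll = None, -math.inf
    for combo in product(range(1, horizon + 1), repeat=len(free)):
        times = dict(base)
        times.update(zip(free, combo))
        ll = history_log_likelihood(adj, times, p, horizon)
        if ll > best_ll:
            best, best_ll = times, ll
    return best, best_ll


# Path graph 0 - 1 - 2, seed 0, all three nodes infected in the snapshot at t = 3.
path = {0: [1], 1: [0, 2], 2: [1]}
best_history, best_ll = most_likely_history(path, [0], [0, 1, 2], p=0.8, horizon=3)
print(best_history, best_ll)
```

The search is exponential in the number of infected non-seed nodes, so it is only usable on toy graphs; it is meant to show what "most likely history given a snapshot" means, not how to compute it at scale.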

Notes

  1. http://www.cs.cmu.edu/~ckingsf/software/dhrec.

References

  1. Agresti A (2002) Categorical data analysis, Wiley series in probability and statistics, 2nd edn. Wiley-Interscience, New Jersey

  2. Ostfeld A et al (2008) The battle of the water sensor networks (BWSN): a design challenge for engineers and algorithms. J Water Resour Plan Manag 134(6):556–568

  3. Barabási A-L, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512

  4. Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):P10008

  5. Boykov Y, Kolmogorov V (2004) An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans Pattern Anal Mach Intell 26(9):1124–1137

  6. Buchbinder N, Feldman M, Naor JS, Schwartz R (2012) A tight linear time (1/2)-approximation for unconstrained submodular maximization. In: 2012 IEEE 53rd annual symposium on foundations of computer science, IEEE, pp 649–658

  7. Erdős P, Rényi A (1960) On the evolution of random graphs. In: Publication of the Mathematical Institute of the Hungarian Academy of Sciences, pp 17–61

  8. Feige U (1998) A threshold of ln n for approximating set cover. J ACM 45(4):634–652

  9. Feige U, Goemans M (1995) Approximating the value of two power proof systems, with applications to MAX 2SAT and MAX DICUT. In: Theory of computing and systems, 1995. Proceedings of the third Israel symposium, pp 182–189

  10. Feige U, Mirrokni VS, Vondrák J (2007) Maximizing non-monotone submodular functions. In: Proceedings of the 48th annual IEEE symposium on foundations of computer science. FOCS ’07. IEEE Computer Society, Washington, DC, USA, pp 461–471

  11. Gomez-Rodriguez M, Leskovec J, Schölkopf B (2013) Structure and dynamics of information pathways in online media. WSDM ’13. ACM, New York, pp 23–32

  12. Gupta A, Roth A, Schoenebeck G, Talwar K (2010) Constrained non-monotone submodular maximization: offline and secretary algorithms. CoRR, abs/1003.1517

  13. Hethcote HW (2000) The mathematics of infectious diseases. SIAM Rev 42(4):599–653

  14. Hochbaum DS (2000) Instant recognition of polynomial time solvability, half integrality and 2-approximations. In: APPROX ’00. Springer, Berlin, pp 2–14

  15. Holme P (2013) Epidemiologically optimal static networks from temporal network data. PLoS Comput Biol 9(7):e1003142

  16. IBM ILOG CPLEX Optimizer (2010) http://www.ilog.com/products/cplex/

  17. Kempe D, Kleinberg J, Tardos E (2003) Maximizing the spread of influence through a social network. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 137–146

  18. Kolmogorov V, Zabih R (2004) What energy functions can be minimized via graph cuts. IEEE Trans Pattern Anal Mach Intell 26:65–81

  19. Lappas T, Terzi E, Gunopulos D, Mannila H (2010) Finding effectors in social networks. In: KDD ’10. ACM Press, New York

  20. Lee J, Mirrokni VS, Nagarajan V, Sviridenko M (2009) Non-monotone submodular maximization under matroid and knapsack constraints. In: Proceedings of the forty-first annual ACM symposium on theory of computing. ACM, New York, pp 323–332

  21. Leskovec J, Adamic LA, Huberman BA (2007) The dynamics of viral marketing. ACM Trans Web 1(1):5

  22. Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. KDD ’05. ACM, New York, pp 177–187

  23. Prakash BA, Chakrabarti D, Faloutsos M, Valler N, Faloutsos C (2011) Threshold conditions for arbitrary cascade models on arbitrary networks. In: Proceedings of the 2011 IEEE 11th international conference on data mining. IEEE Computer Society, Washington, DC, pp 537–546

  24. Prakash BA, Vreeken J, Faloutsos C (2012) Spotting culprits in epidemics: how many and which ones? In: ICDM, pp 11–20

  25. Rossman L (1999) The EPANET programmer’s toolkit for analysis of water distribution systems. In: WRPMD ’99, pp 1–10

  26. Salathé M, Kazandjieva M, Lee JW, Levis P, Feldman MW, Jones JH (2010) A high-resolution human contact network for infectious disease transmission. Proc Natl Acad Sci USA 107(51):22020–22025

  27. Schrijver A (2003) Combinatorial optimization—polyhedra and efficiency. Springer, Berlin

  28. Sefer E, Kingsford C (2011) Metric labeling and semi-metric embedding for protein annotation prediction. In: Research in computational molecular biology. Springer, Berlin, pp 392–407

  29. Sefer E, Kingsford C (2014) Diffusion archaeology for diffusion progression history reconstruction. In: Data mining (ICDM), 2014 IEEE international conference on, pp 530–539

  30. Sefer E, Kingsford C (2015) Convex risk minimization to infer networks from probabilistic diffusion data at multiple scales. In: Data engineering (ICDE), 2015 IEEE 31st international conference on, pp 663–674

  31. Serazzi G, Zanero S (2003) Computer virus propagation models. In: Tutorials of the 11th IEEE/ACM international symposium on modeling, analysis and simulation of computer and telecommunications systems (MASCOTS03). Springer, Berlin

  32. Shah D, Zaman T (2011) Finding rumor sources on random graphs. arXiv:1110.6230

  33. Wolsey LA, Nemhauser GL (1999) Integer and combinatorial optimization. Wiley-Interscience, New Jersey

  34. Yang J, Leskovec J (2011) Patterns of temporal variation in online media. In: ACM international conference on web search and data mining (WSDM), pp 177–186

  35. Zhang Y-Q, Li X, Liang D, Cui J (2015) Characterizing bursts of aggregate pairs with individual Poissonian activity and preferential mobility. IEEE Commun Lett 19(7):1225–1228

  36. Zhu K, Ying L (2015) Source localization in networks: trees and beyond. arXiv preprint arXiv:1510.01814

Acknowledgments

We thank anonymous reviewers for their very useful comments and suggestions. This work has been partially funded by the US National Science Foundation (CCF-1256087, CCF-1319998) and US National Institutes of Health (R21HG006913 and R01HG007104). C.K. received support as an Alfred P. Sloan Research Fellow.

Author information

Corresponding author

Correspondence to Emre Sefer.

Additional information

A preliminary version of this paper appeared in ICDM 2014 [29].

About this article

Cite this article

Sefer, E., Kingsford, C. Diffusion archeology for diffusion progression history reconstruction. Knowl Inf Syst 49, 403–427 (2016). https://doi.org/10.1007/s10115-015-0904-x

  • DOI: https://doi.org/10.1007/s10115-015-0904-x
