DOI: 10.1145/2661829.2662014
research-article

Microblog Topic Contagiousness Measurement and Emerging Outbreak Monitoring

Published: 03 November 2014

ABSTRACT

A recent study on collective attention in Twitter shows that the epidemic-like spreading of hashtags is predominantly driven by external factors. We extend a time-series form of the susceptible-infectious-recovered (SIR) model to monitor emerging microblog outbreaks, taking both endogenous and exogenous drivers into account. In addition, we adopt the partially labeled Dirichlet allocation (PLDA) model to generate both background latent topics and hashtag topics. PLDA overcomes the small-sample problem in hashtag analysis by incorporating related but unlabeled tweets through inference. We standardize the measure of hashtag topic contagiousness as the effective reproduction number R, a quantity borrowed from epidemiology, which we obtain by Bayesian parameter estimation. Guided by R, one can profile and categorize emerging topics and generate alerts on potential outbreaks. Experimental results confirm the effectiveness of this approach.
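The abstract describes the pipeline only at a high level. As a rough illustration of how an SIR-style contagiousness score could be estimated from hashtag counts, the sketch below rests on assumptions that are ours, not the paper's: daily new-tweet counts follow a deterministic discrete-time SIR recurrence plus a constant exogenous adoption rate `eps`; the recovery rate `gamma` is known; observed counts are Poisson-distributed around the model mean; and the transmission rate `beta` has a flat prior evaluated on a grid. The effective reproduction number is then computed with the standard SIR relation R = (beta/gamma)·S/N, and an alert is flagged when R > 1. All function names and parameter values are hypothetical, and the PLDA topic-modelling step is not shown.

```python
import math
from typing import List, Tuple


def simulate_sir_with_exogenous(beta: float, gamma: float, n: int, i0: int,
                                eps: float, days: int) -> List[int]:
    """Hypothetical generator: each day adds round(beta*S*I/N + eps) adopters
    (endogenous spread plus exogenous drive), and a fraction gamma of I recovers."""
    s, i = float(n - i0), float(i0)
    counts = []
    for _ in range(days):
        new = min(s, float(round(beta * s * i / n + eps)))
        s -= new
        i += new - gamma * i  # right-hand side uses I from the start of the day
        counts.append(int(new))
    return counts


def reconstruct_states(counts: List[int], gamma: float, n: int,
                       i0: int) -> List[Tuple[float, float]]:
    """Rebuild (S, I) at the start of each day from observed daily new counts."""
    s, i = float(n - i0), float(i0)
    states = []
    for new in counts:
        states.append((s, i))
        s -= new
        i += new - gamma * i
    return states


def posterior_beta(counts: List[int], gamma: float, n: int, i0: int,
                   eps: float, grid: List[float]) -> List[float]:
    """Grid posterior over beta under a flat prior and a Poisson likelihood
    with mean lambda_t = beta * S_t * I_t / N + eps (assumed model)."""
    states = reconstruct_states(counts, gamma, n, i0)
    log_post = []
    for beta in grid:
        ll = 0.0
        for (s, i), y in zip(states, counts):
            lam = beta * s * i / n + eps
            ll += y * math.log(lam) - lam - math.lgamma(y + 1)
        log_post.append(ll)  # flat prior: log posterior = log likelihood + const
    top = max(log_post)  # subtract the max before exponentiating for stability
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def effective_reproduction_number(beta: float, gamma: float,
                                  s: float, n: int) -> float:
    """Standard SIR relation R = (beta / gamma) * S / N."""
    return (beta / gamma) * (s / n)


if __name__ == "__main__":
    gamma, n, i0, eps = 0.3, 10_000, 10, 2.0  # illustrative values only
    counts = simulate_sir_with_exogenous(0.6, gamma, n, i0, eps, days=10)
    grid = [k / 100 for k in range(1, 201)]
    probs = posterior_beta(counts, gamma, n, i0, eps, grid)
    beta_hat = sum(b * p for b, p in zip(grid, probs))  # posterior mean
    s_now = n - i0 - sum(counts)
    r = effective_reproduction_number(beta_hat, gamma, s_now, n)
    print(f"counts={counts}")
    print(f"beta_hat={beta_hat:.3f}  R={r:.2f}  alert={r > 1.0}")
```

The grid posterior is chosen only to keep the sketch dependency-free; with more unknowns (for example, an unknown `gamma` or a time-varying `eps`), a sampling-based method such as MCMC would be the more usual choice.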

References

  1. R. Anderson and R. May. Infectious Diseases of Humans: Dynamics and Control. Oxford Science Publications. OUP Oxford, 1992.
  2. M. Baker, A. McNicholas, N. Garrett, N. Jones, J. Stewart, V. Koberstein, and D. Lennon. Household crowding a major risk factor for epidemic meningococcal disease in Auckland children. The Pediatric Infectious Disease Journal, 19(10):983--990, 2000.
  3. L. M. Bettencourt and R. M. Ribeiro. Real time Bayesian estimation of the epidemic potential of emerging infectious diseases. PLoS One, 3(5):e2185, 2008.
  4. D. M. Blei. Probabilistic topic models. Communications of the ACM, 55(4):77--84, 2012.
  5. D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. The Journal of Machine Learning Research, 3:993--1022, 2003.
  6. M. Boily, C. Lowndes, and M. Alary. The impact of HIV epidemic phases on the effectiveness of core group interventions: insights from mathematical models. Sexually Transmitted Infections, 78(suppl 1):i78--i90, 2002.
  7. Centers for Disease Control and Prevention, USA. Influenza viruses isolated by WHO/NREVSS collaborating laboratories 2012 - 2013 season. http://www.cdc.gov/flu/weekly/weeklyarchives2012-2013/data/whoAllregt35.htm/, 2013. [Online; accessed 10-September-2013].
  8. V. W. Chu, R. K. Wong, C.-H. Chi, and P. C. K. Hung. Web service orchestration topic mining. In Web Services (ICWS), 2014 IEEE 21st International Conference on.
  9. F. C. Coelho, C. T. Codeço, and M. G. M. Gomes. A Bayesian framework for parameter estimation in dynamical models. PLoS One, 6(5):e19616, 2011.
  10. A. Cui, M. Zhang, Y. Liu, S. Ma, and K. Zhang. Discover breaking events with popular hashtags in Twitter. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pages 1794--1798. ACM, 2012.
  11. K. Dietz. Epidemics and rumours: a survey. Journal of the Royal Statistical Society. Series A (General), pages 505--528, 1967.
  12. J. N. Eisenberg, M. A. Brookhart, G. Rice, M. Brown, and J. M. Colford Jr. Disease transmission models for public health decision making: analysis of epidemic and endemic conditions caused by waterborne pathogens. Environmental Health Perspectives, 110(8):783, 2002.
  13. E. Fagiolini and C. Gruber. Entropy-based method for optimal temporal and spatial resolution of gravity field variations. In A. Abbasi and N. Giesen, editors, EGU General Assembly Conference Abstracts, volume 14 of EGU General Assembly Conference Abstracts, page 8916, Apr. 2012.
  14. N. M. Ferguson, D. A. Cummings, C. Fraser, J. C. Cajka, P. C. Cooley, and D. S. Burke. Strategies for mitigating an influenza pandemic. Nature, 442(7101):448--452, 2006.
  15. B. Fuglede and F. Topsoe. Jensen-Shannon divergence and Hilbert space embedding. In Information Theory, 2004. ISIT 2004. Proceedings. International Symposium on, page 31. IEEE, 2004.
  16. R. Gani and S. Leach. Transmission potential of smallpox in contemporary populations. Nature, 414(6865):748--751, 2001.
  17. M. Gomez Rodriguez, J. Leskovec, and A. Krause. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1019--1028. ACM, 2010.
  18. S. Goorha and L. Ungar. Discovery of significant emerging trends. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 57--64. ACM, 2010.
  19. J. Han, W. Gong, and Y. Yin. Mining segment-wise periodic patterns in time-related databases. In KDD, pages 214--218, 1998.
  20. T. Harko, F. S. Lobo, and M. Mak. Exact analytical solutions of the susceptible-infected-recovered (SIR) epidemic model and of the SIR model with equal death and birth rates. Applied Mathematics and Computation, 236:184--194, 2014.
  21. P. Harremoës. Binomial and Poisson distributions as maximum entropy distributions. IEEE Transactions on Information Theory, 47(5):2039--2041, 2001.
  22. H. W. Hethcote. The mathematics of infectious diseases. SIAM Review, 42(4):599--653, 2000.
  23. M. Hoffman, F. R. Bach, and D. M. Blei. Online learning for latent Dirichlet allocation. In Advances in Neural Information Processing Systems, pages 856--864, 2010.
  24. L. Hong and B. D. Davison. Empirical study of topic modeling in Twitter. In Proceedings of the First Workshop on Social Media Analytics, pages 80--88. ACM, 2010.
  25. T. House. Modelling epidemics on networks. Contemporary Physics, 53(3):213--225, 2012.
  26. E. T. Jaynes. Prior probabilities. IEEE Transactions on Systems Science and Cybernetics, 4(3):227--241, 1968.
  27. M. J. Keeling and P. Rohani. Modeling Infectious Diseases in Humans and Animals. Princeton University Press, 2008.
  28. W. O. Kermack and A. G. McKendrick. Contributions to the mathematical theory of epidemics. Part I. In Proceedings of the Royal Society of London. Series A, volume 115, pages 700--721, 1927.
  29. W. O. Kermack and A. G. McKendrick. Contributions to the mathematical theory of epidemics. II. The problem of endemicity. Proceedings of the Royal Society of London. Series A, 138(834):55--83, 1932.
  30. W. O. Kermack and A. G. McKendrick. Contributions to the mathematical theory of epidemics. III. Further studies of the problem of endemicity. Proceedings of the Royal Society of London. Series A, 141(843):94--122, 1933.
  31. J. Lehmann, B. Gonçalves, J. J. Ramasco, and C. Cattuto. Dynamical classes of collective attention in Twitter. In Proceedings of the 21st International Conference on World Wide Web, pages 251--260. ACM, 2012.
  32. C. A. Lin. Communicator in chief: How Barack Obama used new media technology to win the White House. Journal of Broadcasting & Electronic Media, 55(2):271--272, 2011.
  33. M. Lipsitch, T. Cohen, B. Cooper, J. M. Robins, S. Ma, L. James, G. Gopalakrishna, S. K. Chew, C. C. Tan, M. H. Samore, et al. Transmission dynamics and control of severe acute respiratory syndrome. Science, 300(5627):1966--1970, 2003.
  34. R. Mehrotra, S. Sanner, W. Buntine, and L. Xie. Improving LDA topic models for microblogs via tweet pooling and automatic labeling. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '13, pages 889--892, New York, NY, USA, 2013. ACM.
  35. D. Milajevs and G. Bouma. Real time discussion retrieval from Twitter. In Proceedings of the 22nd International Conference on World Wide Web Companion, pages 795--800. International World Wide Web Conferences Steering Committee, 2013.
  36. C. E. Mills, J. M. Robins, and M. Lipsitch. Transmissibility of 1918 pandemic influenza. Nature, 432(7019):904--906, 2004.
  37. D. Pittet, B. Allegranzi, H. Sax, S. Dharan, C. L. Pessoa-Silva, L. Donaldson, and J. M. Boyce. Evidence-based model for hand transmission during patient care and the role of improved practices. The Lancet Infectious Diseases, 6(10):641--652, 2006.
  38. D. Poole and A. E. Raftery. Inference for deterministic simulation models: the Bayesian melding approach. Journal of the American Statistical Association, 95(452):1244--1255, 2000.
  39. D. Ramage, D. Hall, R. Nallapati, and C. D. Manning. Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1-Volume 1, pages 248--256. Association for Computational Linguistics, 2009.
  40. D. Ramage, C. D. Manning, and S. Dumais. Partially labeled topic models for interpretable text mining. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 457--465. ACM, 2011.
  41. P. Rangachari. Evidence-based medicine: old French wine with a new Canadian label? Journal of the Royal Society of Medicine, 90(5):280, 1997.
  42. D. M. Romero, B. Meeder, and J. Kleinberg. Differences in the mechanics of information diffusion across topics: idioms, political hashtags, and complex contagion on Twitter. In Proceedings of the 20th International Conference on World Wide Web, pages 695--704. ACM, 2011.
  43. D. M. Scott. The new rules of marketing and PR: how to use social media, blogs, news releases, online video, and viral marketing to reach buyers directly. Wiley, 2009.
  44. X. Shuai, S. Chen, Y. Ding, Y. Sun, J. Busemeyer, and J. Tang. There is more than complex contagion: an indirect influence analysis on Twitter. In Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics, page 4. ACM, 2012.
  45. G. Smith. Models of Mycobacterium bovis in wildlife and cattle. Tuberculosis, 81(1):51--64, 2001.
  46. D. J. Spiegelhalter, N. G. Best, B. P. Carlin, and A. Van Der Linde. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4):583--639, 2002.
  47. J. Wallinga and P. Teunis. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. American Journal of Epidemiology, 160(6):509--516, 2004.
  48. W. X. Zhao, J. Jiang, J. Weng, J. He, E.-P. Lim, H. Yan, and X. Li. Comparing Twitter and traditional media using topic models. In Advances in Information Retrieval, pages 338--349. Springer, 2011.

        Published in

          ACM Conferences
          CIKM '14: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management
          November 2014
          2152 pages
          ISBN:9781450325981
          DOI:10.1145/2661829

          Copyright © 2014 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          CIKM '14 Paper Acceptance Rate: 175 of 838 submissions, 21%. Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%.
