ABSTRACT
A recent study of collective attention on Twitter shows that the epidemic spreading of hashtags is driven predominantly by external factors. We extend a time-series form of the susceptible-infectious-recovered (SIR) model to monitor emerging outbreaks in microblogs, taking both endogenous and exogenous drivers into account. In addition, we adopt the partially labeled Dirichlet allocation (PLDA) model to generate both background latent topics and hashtag topics. PLDA overcomes the problem of small sample sizes in hashtag analysis by including related but unlabeled tweets through inference. We standardize the measure of hashtag topic contagiousness as the estimated effective reproduction number R, a quantity borrowed from epidemiology and obtained here by Bayesian parameter estimation. Guided by R, one can profile and categorize emerging topics and generate alerts on potential outbreaks. Experimental results confirm the effectiveness of this approach.
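To make the role of R concrete, the following is a minimal sketch of a discrete-time SIR process with an added exogenous term, together with the textbook effective reproduction number beta * S / (gamma * N). It is not the authors' model: the function name `simulate_sir`, the additive form of the exogenous driver `exo`, and all parameter values are assumptions chosen for illustration, and R is computed directly from known parameters rather than by Bayesian estimation.

```python
def simulate_sir(n, i0, beta, gamma, exo, steps):
    """Discrete-time SIR with an additive exogenous infection rate (illustrative only).

    n      -- population size (e.g. users who could adopt a hashtag)
    i0     -- initial number of infectious users
    beta   -- endogenous transmission rate per step (assumed)
    gamma  -- recovery rate per step (assumed)
    exo    -- per-step probability that a susceptible user is infected
              by an outside source (hypothetical exogenous driver)
    """
    s, i, r = float(n - i0), float(i0), 0.0
    history = []
    for _ in range(steps):
        # Endogenous infections come from contact with infectious users;
        # exogenous infections arrive independently of the current infectious count.
        new_infected = min(s, beta * s * i / n + exo * s)
        new_recovered = gamma * i
        s -= new_infected
        i += new_infected - new_recovered
        r += new_recovered
        # Textbook effective reproduction number for the endogenous process.
        r_eff = beta * s / (gamma * n)
        history.append((s, i, r, r_eff))
    return history


if __name__ == "__main__":
    for step, (s, i, r, r_eff) in enumerate(simulate_sir(1000, 10, 0.5, 0.2, 0.0, 10)):
        print(f"step={step} S={s:.1f} I={i:.1f} R={r:.1f} R_eff={r_eff:.2f}")
```

In this sketch an R above 1 indicates that the endogenous process is still growing, and R falls as the susceptible pool is depleted. With `exo` greater than zero, adoption can start even when no user is initially infectious, which is the kind of externally driven behavior the abstract refers to.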
Index Terms
- Microblog Topic Contagiousness Measurement and Emerging Outbreak Monitoring