Abstract
Information spread on networks can be efficiently modeled by considering three features: the content of documents, their time of publication relative to other publications, and the position of the spreader in the network. Most previous works model at most two of these jointly, or rely on heavily parametric approaches. Building on the recent literature on Dirichlet-Point processes, we introduce the Houston (Hidden Online User-Topic Network) model, which jointly considers all three features in a non-parametric framework. It infers dynamic, topic-dependent underlying diffusion networks in a continuous-time setting, along with the topics themselves. The model is unsupervised: its input is an unlabeled stream of triplets of the form (time of publication, content of the information, spreading entity). Online inference is conducted using a sequential Monte-Carlo algorithm that scales linearly with the size of the dataset. Our approach yields substantial improvements over existing baselines on both cluster recovery and subnetwork inference tasks.
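To make the input format concrete, the triplet stream described above might be represented as in the following minimal sketch. The `Event` class, its field names, the token-list representation of content, and the `in_time_order` helper are all hypothetical choices made for illustration; they are not taken from the authors' implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Event:
    """One observation of the stream (hypothetical layout)."""
    time: float          # time of publication
    content: List[str]   # document content, here assumed to be a list of tokens
    entity: str          # spreading entity, e.g. a user or site identifier


def in_time_order(events: Iterable[Event]) -> Iterator[Event]:
    """Yield events one at a time in non-decreasing order of publication time."""
    for event in sorted(events, key=lambda e: e.time):
        yield event


# Toy, made-up observations; no labels (topics, edges) are attached to them.
events = [
    Event(2.5, ["election", "blog"], "user_b"),
    Event(0.0, ["election", "poll"], "user_a"),
    Event(4.1, ["football", "score"], "user_c"),
]

ordered = list(in_time_order(events))
for event in ordered:
    print(event.time, event.entity, event.content)
```

An online method would consume such events one by one in this order; the sorting step here only stands in for data that already arrives chronologically.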
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Poux-Médard, G., Velcin, J., Loudcher, S. (2023). Dirichlet-Survival Process: Scalable Inference of Topic-Dependent Diffusion Networks. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13981. Springer, Cham. https://doi.org/10.1007/978-3-031-28238-6_47
DOI: https://doi.org/10.1007/978-3-031-28238-6_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28237-9
Online ISBN: 978-3-031-28238-6
eBook Packages: Computer Science (R0)