DOI: 10.1145/2783258.2783338
research-article

Simultaneous Discovery of Common and Discriminative Topics via Joint Nonnegative Matrix Factorization

Published: 10 August 2015

ABSTRACT

Efficiently understanding large-scale document collections is an important problem. Document data usually come with other information (e.g., an author's gender, age, and location) and with links to other entities (e.g., co-authorship and citation networks). To analyze such data, we often have to reveal both the common and the discriminative characteristics of documents with respect to their associated information, e.g., male- vs. female-authored documents, old vs. new documents, etc. To address such needs, this paper presents a novel topic modeling method based on joint nonnegative matrix factorization, which simultaneously discovers common as well as discriminative topics given multiple document sets. Our approach is based on a block-coordinate descent framework and, through a novel pseudo-deflation approach, is capable of utilizing only the most representative, and thus meaningful, keywords in each topic. We perform both quantitative and qualitative evaluations using synthetic as well as real-world document data sets such as research paper collections and nonprofit micro-finance data. We show that our method has great potential for providing in-depth analyses by clearly identifying common and discriminative topics among multiple document sets.
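The abstract does not state the objective function, so the following is only a minimal sketch of what a joint factorization of two document sets could look like, not the paper's algorithm. It assumes two term-by-document matrices X1 and X2 over the same vocabulary, treats the first kc topic columns as common and the rest as discriminative, and couples the two factorizations with two penalties that are assumptions made for this illustration: alpha times the squared difference between the common columns, and beta times the summed inner products between the discriminative columns. It also swaps in simple multiplicative updates in place of the block-coordinate descent framework and omits the pseudo-deflation step entirely. All function and parameter names are hypothetical.

```python
import numpy as np


def objective(X1, X2, W1, H1, W2, H2, kc, alpha, beta):
    """Reconstruction error plus the two assumed coupling penalties."""
    fit = np.linalg.norm(X1 - W1 @ H1) ** 2 + np.linalg.norm(X2 - W2 @ H2) ** 2
    common = alpha * np.linalg.norm(W1[:, :kc] - W2[:, :kc]) ** 2
    discrim = beta * np.sum(W1[:, kc:].T @ W2[:, kc:])
    return fit + common + discrim


def update_w(X, W, H, W_other, kc, alpha, beta, eps):
    """Multiplicative update of one topic matrix with the other held fixed."""
    numer = 2.0 * X @ H.T
    denom = 2.0 * W @ (H @ H.T)
    # Common topics: pull the first kc columns toward the other set's columns.
    numer[:, :kc] += 2.0 * alpha * W_other[:, :kc]
    denom[:, :kc] += 2.0 * alpha * W[:, :kc]
    # Discriminative topics: penalize keyword overlap with the other set.
    denom[:, kc:] += beta * W_other[:, kc:].sum(axis=1, keepdims=True)
    return W * numer / (denom + eps)


def update_h(X, W, H, eps):
    """Standard multiplicative update of the topic-by-document weights."""
    return H * (W.T @ X) / (W.T @ W @ H + eps)


def joint_nmf(X1, X2, k, kc, alpha=1.0, beta=1.0, n_iter=200, seed=0, eps=1e-10):
    """Factor X1 ~ W1 @ H1 and X2 ~ W2 @ H2 with kc coupled (common) topics."""
    rng = np.random.default_rng(seed)
    m = X1.shape[0]
    W1, W2 = rng.random((m, k)), rng.random((m, k))
    H1, H2 = rng.random((k, X1.shape[1])), rng.random((k, X2.shape[1]))
    history = [objective(X1, X2, W1, H1, W2, H2, kc, alpha, beta)]
    for _ in range(n_iter):
        W1 = update_w(X1, W1, H1, W2, kc, alpha, beta, eps)
        W2 = update_w(X2, W2, H2, W1, kc, alpha, beta, eps)
        H1 = update_h(X1, W1, H1, eps)
        H2 = update_h(X2, W2, H2, eps)
        history.append(objective(X1, X2, W1, H1, W2, H2, kc, alpha, beta))
    return W1, H1, W2, H2, history


# Synthetic example: two document sets that share two topics and each
# have two topics of their own (30 terms, 40 and 50 documents).
rng = np.random.default_rng(1)
shared = rng.random((30, 2))
X1 = np.hstack([shared, rng.random((30, 2))]) @ rng.random((4, 40))
X2 = np.hstack([shared, rng.random((30, 2))]) @ rng.random((4, 50))
W1, H1, W2, H2, history = joint_nmf(X1, X2, k=4, kc=2)
print("objective before and after:", history[0], history[-1])
```

In this sketch, the largest entries in each column of W1 or W2 would be read as that topic's keywords. Larger alpha forces the common columns of the two sets closer together, and larger beta pushes the discriminative columns toward disjoint keyword sets.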

Supplemental Material

p567.mp4 (mp4, 209.9 MB)

Published in

KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2015
2378 pages
ISBN: 9781450336642
DOI: 10.1145/2783258

Copyright © 2015 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

KDD '15 Paper Acceptance Rate: 160 of 819 submissions, 20%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
