ABSTRACT
Clustering informal user-generated text is difficult, and in many such datasets each data point carries multiple labels. This paper therefore focuses on Non-negative Matrix Factorization (NMF) algorithms for Overlapping Clustering of customer inquiry and review data, a topic seldom discussed in previous literature. We extend Semi-NMF and Convex-NMF to Overlapping Clustering and develop a procedure for applying them to text data. The procedure is tested on customer review and inquiry datasets. Compared with a baseline model, Semi-NMF and Convex-NMF have the advantage that they achieve similarly strong clustering performance without adjusting parameters. Moreover, we compare different methods of picking labels to generate Overlapping Clustering results from Soft Clustering algorithms, and conclude that the thresholding-by-mean method is simpler and relatively more reliable than the maximum-n method.
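The two label-picking methods named in the abstract can be illustrated with a minimal sketch. The abstract does not define them precisely, so the following rests on assumptions: a soft clustering algorithm yields a non-negative membership matrix (called `G` here, a hypothetical name) with one row per document and one column per cluster; "thresholding by mean" is taken to mean keeping every cluster whose weight is at least the mean of that document's row; and "maximum n" is taken to mean keeping the n largest weights per row. The function names and toy values are illustrative only and are not taken from the paper.

```python
import numpy as np


def pick_by_mean_threshold(G):
    """Keep every cluster whose weight is >= the mean weight of its row.

    Assumed reading of "thresholding by mean"; no parameter is needed.
    """
    G = np.asarray(G, dtype=float)
    return G >= G.mean(axis=1, keepdims=True)


def pick_by_max_n(G, n):
    """Keep the n clusters with the largest weights in each row.

    Assumed reading of "maximum n"; n must be chosen by the user.
    """
    G = np.asarray(G, dtype=float)
    # Stable sort on negated weights: descending order, ties broken by index.
    top = np.argsort(-G, axis=1, kind="stable")[:, :n]
    mask = np.zeros(G.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    return mask


# Toy membership matrix (made-up values): 3 documents, 3 clusters.
G = np.array([
    [0.70, 0.20, 0.10],   # clearly one cluster
    [0.45, 0.45, 0.10],   # genuinely overlapping
    [0.05, 0.15, 0.80],   # clearly one cluster
])

mean_labels = pick_by_mean_threshold(G)
max2_labels = pick_by_max_n(G, n=2)
```

Under these assumptions, the mean threshold lets the number of labels vary per document (one label for the first and third rows, two for the second), whereas maximum n forces exactly n labels on every row, which is one way to read the abstract's remark that thresholding by mean is simpler.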
Index Terms
- Non-negative Matrix Factorization for Overlapping Clustering of Customer Inquiry and Review Data