Abstract
Cross-lingual event-centered news clustering groups news documents written in different languages so that each group describes the same event. To address the problem of computing similarity between documents in two languages, this paper proposes a new method based on the semantic correlations of news elements. First, a bilingual entity lexicon and term co-occurrences in news are used to acquire the semantic correlations between news elements in different languages. Then, on this basis, we compute the similarity between news documents in different languages using the generalized vector space model (GVSM). Finally, spectral clustering is applied to cluster the news stories. Experimental results show that the method achieves promising results in terms of the F value.
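The pipeline summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the four-term vocabulary, the correlation matrix `C`, the document weights `X`, and the function names `gvsm_similarity` and `spectral_bipartition` are all hypothetical. For brevity, the final step uses a two-way split on the sign of the Fiedler vector of the normalized Laplacian, a simplified stand-in for full k-way spectral clustering.

```python
import numpy as np


def gvsm_similarity(X, C):
    """Pairwise GVSM cosine similarity.

    X: (n_docs, n_terms) term weights over a joint bilingual vocabulary.
    C: (n_terms, n_terms) symmetric, positive semi-definite term correlations.
    """
    G = X @ C @ X.T  # correlation-weighted inner products
    norms = np.sqrt(np.clip(np.diag(G), 1e-12, None))
    return G / np.outer(norms, norms)


def spectral_bipartition(S):
    """Two-way split from the sign of the Fiedler vector (simplified)."""
    W = np.clip(S, 0.0, None)  # clip returns a copy, so S is not modified
    np.fill_diagonal(W, 0.0)   # drop self-similarity
    d_inv_sqrt = 1.0 / np.sqrt(np.clip(W.sum(axis=1), 1e-12, None))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]         # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)


# Hypothetical vocabulary: ["earthquake", "地震", "election", "选举"].
# Translation pairs get correlation 0.9; unrelated terms get 0.1 (assumed values).
C = np.array([
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
])

# Four toy documents, each using a single term of one language.
X = np.array([
    [2.0, 0.0, 0.0, 0.0],  # English, earthquake
    [0.0, 1.0, 0.0, 0.0],  # Chinese, earthquake
    [0.0, 0.0, 3.0, 0.0],  # English, election
    [0.0, 0.0, 0.0, 1.0],  # Chinese, election
])

S = gvsm_similarity(X, C)
labels = spectral_bipartition(S)
print(np.round(S, 2))
print(labels)
```

In this toy setup the English and Chinese earthquake documents share no surface terms, so their plain cosine similarity would be zero. The correlation matrix is what lets the GVSM score them as similar, which is the role the bilingual lexicon and co-occurrence statistics play in the method described above.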
Additional information
Supported by the National Natural Science Foundation of China (Grant No. 61175068).
About this article
Cite this article
Hong, X., Yu, Z., Tang, M. et al. Cross-lingual event-centered news clustering based on elements semantic correlations of different news. Multimed Tools Appl 76, 25129–25143 (2017). https://doi.org/10.1007/s11042-017-4838-z
DOI: https://doi.org/10.1007/s11042-017-4838-z