Cross-lingual event-centered news clustering based on elements semantic correlations of different news

Multimedia Tools and Applications

Abstract

Cross-lingual event-centered news clustering groups news documents written in different languages so that each group describes the same event. To address the problem of computing similarity between documents in two languages, this paper proposes a new method based on the semantic correlations of news elements. First, a bilingual entity lexicon and term co-occurrences in news are used to acquire the semantic correlations between news elements in different languages. Then, on this basis, the similarity between news in different languages is computed using the generalized vector space model (GVSM). Finally, spectral clustering is applied to group the news stories. Experimental results show that the method achieves promising results in terms of F value.
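
The similarity step can be illustrated with a minimal sketch of the generalized vector space model, in which a term-term correlation matrix G lets terms from different languages contribute to the score: sim(x, y) = xᵀGy / sqrt(xᵀGx · yᵀGy). This is not the authors' implementation. The vocabulary, the language pair, the document vectors, and the correlation weight of 0.9 are all hypothetical values chosen for illustration; in the method described above, the correlations would instead come from the bilingual entity lexicon and term co-occurrence statistics.

```python
import math


def bilinear(x, G, y):
    """Return x^T G y for dense Python lists."""
    return sum(x[i] * G[i][j] * y[j]
               for i in range(len(x))
               for j in range(len(y)))


def gvsm_similarity(x, y, G):
    """GVSM cosine: x^T G y / sqrt((x^T G x) * (y^T G y))."""
    denom = math.sqrt(bilinear(x, G, x) * bilinear(y, G, y))
    return bilinear(x, G, y) / denom if denom > 0 else 0.0


def correlation_matrix(vocab, correlations):
    """Build a symmetric term-term matrix with 1.0 on the diagonal."""
    index = {term: i for i, term in enumerate(vocab)}
    n = len(vocab)
    G = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (a, b), weight in correlations.items():
        G[index[a]][index[b]] = weight
        G[index[b]][index[a]] = weight
    return G


# Hypothetical joint vocabulary: two English and two Spanish terms.
vocab = ["earthquake", "rescue", "terremoto", "rescate"]

# Hypothetical correlations standing in for lexicon/co-occurrence evidence.
correlations = {
    ("earthquake", "terremoto"): 0.9,
    ("rescue", "rescate"): 0.9,
}

G = correlation_matrix(vocab, correlations)
identity = correlation_matrix(vocab, {})

# Toy term-frequency vectors over the joint vocabulary.
doc_en = [2.0, 1.0, 0.0, 0.0]
doc_es = [0.0, 0.0, 1.0, 1.0]

plain_cosine = gvsm_similarity(doc_en, doc_es, identity)
cross_lingual = gvsm_similarity(doc_en, doc_es, G)
```

With the identity matrix the model reduces to ordinary cosine similarity, so two documents that share no surface terms score zero; the off-diagonal correlations are what make the cross-lingual pair comparable. A matrix of such pairwise scores is the kind of input a spectral clustering step would take as its affinity matrix.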

Notes

  1. http://ictclas.nlpir.org

  2. http://jvntextpro.sourceforge.net/

Author information

Corresponding author

Correspondence to Zhengtao Yu.

Additional information

Supported by the National Natural Science Foundation of China (Grant No. 61175068).

About this article

Cite this article

Hong, X., Yu, Z., Tang, M. et al. Cross-lingual event-centered news clustering based on elements semantic correlations of different news. Multimed Tools Appl 76, 25129–25143 (2017). https://doi.org/10.1007/s11042-017-4838-z

  • DOI: https://doi.org/10.1007/s11042-017-4838-z
