Multi-Document Summarization Method of Reviews Using Word Embedding Clustering

  • Received : 2021.06.28
  • Accepted : 2021.07.14
  • Published : 2021.11.30

Abstract

A multi-document is a document that covers various topics rather than a single topic; online reviews are a typical example. Because online reviews contain a vast amount of information, there have been several attempts to summarize them. However, summarizing reviews all at once with an existing summarization model loses the various topics that make up the reviews. This paper therefore presents a method for summarizing reviews while minimizing topic loss. The proposed method classifies reviews through preprocessing, importance evaluation, conversion to embeddings using BERT, and embedding clustering. A trained Transformer summarization model then generates the final summary from the classified sentences. The proposed model was compared with an existing summarization model, the seq2seq model, using ROUGE scores and cosine similarity, and it produced better summaries than the existing model.
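The embedding clustering step in the abstract can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or which BERT output is used, so everything here is an assumption: the three-dimensional vectors are hand-made stand-ins for BERT sentence embeddings, the sentences are invented examples, and `kmeans_cosine` and `group_by_cluster` are hypothetical helper names implementing plain k-means over cosine similarity.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def kmeans_cosine(vectors: Sequence[Vector], k: int, iters: int = 20) -> List[int]:
    """Assign each vector to one of k clusters by cosine similarity."""
    points = [normalize(v) for v in vectors]
    # Deterministic start: take the first point, then repeatedly add the
    # point that is least similar to the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        far = min(points, key=lambda p: max(cosine(p, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(points)
    for step in range(iters):
        new_labels = [
            max(range(k), key=lambda j: cosine(p, centroids[j])) for p in points
        ]
        if step > 0 and new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return labels


def group_by_cluster(sentences: Sequence[str], labels: Sequence[int]) -> Dict[int, List[str]]:
    """Collect sentences that share a cluster label."""
    groups: Dict[int, List[str]] = {}
    for sentence, label in zip(sentences, labels):
        groups.setdefault(label, []).append(sentence)
    return groups


# Invented review sentences with hand-made stand-in embeddings (assumption:
# a real pipeline would obtain these vectors from a BERT encoder).
toy_sentences = [
    "The room was clean and spacious.",
    "Staff were friendly at check-in.",
    "Very tidy room with a big bed.",
    "The front desk staff helped us quickly.",
]
toy_embeddings = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.8, 0.2, 0.1],
    [0.0, 0.8, 0.2],
]

toy_labels = kmeans_cosine(toy_embeddings, k=2)
toy_groups = group_by_cluster(toy_sentences, toy_labels)
for label, members in sorted(toy_groups.items()):
    print(label, members)
```

In a pipeline like the one described, each group would then be passed separately to the summarization model, so that every topic contributes to the final summary instead of being averaged away.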

Acknowledgement

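This work was supported in 2021 by the University ICT Research Center support program of the Ministry of Science and ICT and the Institute of Information & Communications Technology Planning & Evaluation (IITP-2020-2020-0-01602, Development of Intelligent Cyber Threat Response Technology and Workforce Training).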
