
Automatic text summarization using latent semantic analysis


Abstract

This paper reviews state-of-the-art methods of automatic text summarization that build summaries in the form of generic extracts. The original text is represented as a numerical matrix whose columns correspond to sentences, each sentence being a vector in the term space. Latent semantic analysis is then applied to this matrix to construct a representation of the sentences in a topic space whose dimensionality is much smaller than that of the initial term space. The most important sentences are selected on the basis of their topic-space representation, and the number of selected sentences is determined by the required summary length. The paper also presents a new generic text summarization method that uses nonnegative matrix factorization to estimate sentence relevance. The proposed relevance estimate is based on normalizing the topic space and then weighting each topic using the sentence representations in that space. The proposed method shows better summarization quality and performance than state-of-the-art methods on the standard DUC 2001 and DUC 2002 data sets.
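To make the pipeline described in the abstract concrete, the sketch below illustrates both stages on a toy document: LSA (SVD) based selection of one top sentence per latent topic, and an NMF-based relevance estimate that normalizes the topic space and weights each topic before scoring sentences. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn's TfidfVectorizer, TruncatedSVD, and NMF as stand-ins for the term weighting and factorizations, and the particular normalization and topic-weighting formulas are one plausible reading of the abstract rather than the published method.

```python
# Sketch of extract-based summarization in a latent topic space.
# Assumptions (not from the paper): scikit-learn components and the
# specific normalization/weighting choices marked in the comments.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD, NMF


def lsa_summary(sentences, k=3, summary_len=3):
    """Pick one top-scoring sentence per latent topic (LSA/SVD variant)."""
    # Term-sentence matrix: rows are terms, columns are sentences.
    A = TfidfVectorizer().fit_transform(sentences).T          # (terms, sentences)
    svd = TruncatedSVD(n_components=k, random_state=0)
    V = svd.fit_transform(A.T).T                               # (topics, sentences)
    chosen = []
    for topic in range(min(k, summary_len)):
        order = np.argsort(-np.abs(V[topic]))                  # strongest sentences first
        best = next(j for j in order if j not in chosen)
        chosen.append(best)
    return [sentences[j] for j in sorted(chosen)]


def nmf_relevance_summary(sentences, k=3, summary_len=3):
    """Score sentences via an NMF topic space; weighting scheme is hypothetical."""
    A = TfidfVectorizer().fit_transform(sentences).T.toarray()  # (terms, sentences)
    nmf = NMF(n_components=k, init="nndsvd", random_state=0, max_iter=500)
    W = nmf.fit_transform(A)         # (terms, topics): topic space
    H = nmf.components_              # (topics, sentences): sentence coordinates
    # Normalize the topic space (assumption: L2 column norms of W),
    # moving the scale from W into H so topics become comparable.
    norms = np.linalg.norm(W, axis=0) + 1e-12
    H = H * norms[:, None]
    # Weight each topic by its total strength over all sentences (assumption).
    topic_weights = H.sum(axis=1) / (H.sum() + 1e-12)
    relevance = topic_weights @ H    # one relevance score per sentence
    top = np.argsort(-relevance)[:summary_len]
    return [sentences[j] for j in sorted(top)]


if __name__ == "__main__":
    doc = [
        "Latent semantic analysis maps sentences into a low-dimensional topic space.",
        "Nonnegative matrix factorization yields additive, interpretable topics.",
        "Sentence relevance can be estimated from the topic-space representation.",
        "Summaries are built by extracting the highest-scoring sentences.",
    ]
    print(lsa_summary(doc, k=2, summary_len=2))
    print(nmf_relevance_summary(doc, k=2, summary_len=2))
```

Either routine returns the extracted sentences in document order; in practice the number of topics k and the summary length would be chosen from the required compression rate.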



Author information


Corresponding author

Correspondence to I. V. Mashechkin.

Additional information

Original Russian Text © I.V. Mashechkin, M.I. Petrovskiy, D.S. Popov, D.V. Tsarev, 2011, published in Programmirovanie, 2011, Vol. 37, No. 6.


Cite this article

Mashechkin, I.V., Petrovskiy, M.I., Popov, D.S. et al. Automatic text summarization using latent semantic analysis. Program Comput Soft 37, 299–305 (2011). https://doi.org/10.1134/S0361768811060041

