short-paper

On the connections between explicit semantic analysis and latent semantic analysis

Authors:
Chao Liu, Tencent Inc, Beijing, China
Yi-Min Wang, Microsoft Research, Redmond, WA, USA
CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management, October 2012, Pages 1804–1808
https://doi.org/10.1145/2396761.2398521

Published: 29 October 2012

ABSTRACT

Semantic analysis tries to solve problems arising from polysemy and synonymy, which are abundant in natural languages. Recently, Gabrilovich and Markovitch proposed the Explicit Semantic Analysis (ESA) technique, which complements the well-known Latent Semantic Analysis (LSA) technique. In this paper, we show that the two techniques are not as distinct as their names suggest; instead, we find that ESA is equivalent to an LSA variant, and this equivalence generalizes to all kernel methods using kernels arising from the canonical dot product. Effectively, this result guarantees that ESA would not outperform the peak efficacy of LSA for any application using the above kernel methods. In short, this paper establishes, for the first time, the connections between ESA and LSA, quantifies their relative efficacy, and generalizes the result to a large category of kernel methods.
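
The kind of equivalence described in the abstract can be illustrated with a small numerical sketch. This is not the paper's own derivation, which is not reproduced on this page. It assumes that ESA maps a term vector d into concept space as M^T d, where M is a hypothetical term-by-concept matrix (for example, built from a knowledge base), and that the "LSA variant" keeps all singular directions of M and scales them by the singular values. The names esa_kernel and lsa_style_kernel are made up for illustration. Under these assumptions the two dot-product kernels coincide, because M M^T = U S^2 U^T.

```python
import numpy as np

def esa_kernel(d1, d2, M):
    """Dot product of two documents after mapping them into concept space.

    M is an assumed term-by-concept matrix (terms x concepts).
    """
    return float((M.T @ d1) @ (M.T @ d2))

def lsa_style_kernel(d1, d2, M):
    """Same similarity computed through the SVD of M.

    Documents are projected onto the left singular vectors of M and each
    coordinate is scaled by the matching singular value (an assumed,
    full-rank LSA-style variant, not standard truncated LSA).
    """
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    p1 = s * (U.T @ d1)
    p2 = s * (U.T @ d2)
    return float(p1 @ p2)

# Toy data: 6 terms, 4 concepts, two random documents.
rng = np.random.default_rng(0)
M = rng.random((6, 4))
d1 = rng.random(6)
d2 = rng.random(6)

k_esa = esa_kernel(d1, d2, M)
k_lsa = lsa_style_kernel(d1, d2, M)
print(np.isclose(k_esa, k_lsa))
```

Truncating the SVD to the top singular directions would turn the second function into a conventional low-rank projection, at which point the two kernels generally differ; the sketch only covers the untruncated case.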

References

M. Anderka, N. Lipka, and B. Stein. Evaluating cross-language explicit semantic analysis and cross querying. In Multilingual Information Access Evaluation I. Text Retrieval Experiments, volume 6241, pages 50--57. Springer Berlin / Heidelberg, 2010.
M. Anderka and B. Stein. The esa retrieval model revisited. In SIGIR '09, pages 670--671. ACM, 2009.
N. Cristianini, J. Shawe-Taylor, and H. Lodhi. Latent semantic kernels. J. Intell. Inf. Syst., 18(2--3):127--152, 2002.
S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by Latent Semantic Analysis. Journal of the American Society of Information Science, 41(6):391--407, 1990.
O. Egozi, E. Gabrilovich, and S. Markovitch. Concept-based feature generation and selection for information retrieval. In Proceedings of the 23rd national conference on Artificial intelligence - Volume 2, pages 1132--1137. AAAI Press, 2008.
E. Gabrilovich and S. Markovitch. Feature generation for text categorization using world knowledge. In IJCAI'05, pages 1048--1053.
E. Gabrilovich and S. Markovitch. Overcoming the brittleness bottleneck using wikipedia: enhancing text categorization with encyclopedic knowledge. In Proceedings of the 21st national conference on Artificial intelligence - Volume 2, pages 1301--1306. AAAI Press, 2006.
E. Gabrilovich and S. Markovitch. Computing semantic relatedness using wikipedia-based explicit semantic analysis. In IJCAI'07, pages 1606--1611, Hyderabad, India, 2007.
E. Gabrilovich and S. Markovitch. Wikipedia-based semantic interpretation for natural language processing. J. Artif. Int. Res., 34:443--498, March 2009.
J. Hu, L. Fang, Y. Cao, H.-J. Zeng, H. Li, Q. Yang, and Z. Chen. Enhancing text clustering by leveraging wikipedia semantics. In SIGIR '08, pages 179--186. ACM, 2008.
X. Hu, X. Zhang, C. Lu, E. K. Park, and X. Zhou. Exploiting wikipedia as external knowledge for document clustering. In KDD '09, pages 389--396. ACM, 2009.
C. D. Manning and H. Schuetze. Foundations of Statistical Natural Language Processing. The MIT Press, 1 edition, June 1999.
R. Mihalcea. Using Wikipedia for Automatic Word Sense Disambiguation. In North American Chapter of the Association for Computational Linguistics (NAACL 2007), 2007.
D. Milne and I. H. Witten. An effective, low-cost measure of semantic relatedness obtained from wikipedia links. In Proceedings of AAAI 2008, 2008.
R. Montague. The Proper Treatment of Quantification in Ordinary English. pages 221--242, 1973.
M. Potthast, B. Stein, and M. Anderka. A wikipedia-based multilingual retrieval model. In ECIR'08, pages 522--530. Springer-Verlag, 2008.
B. Scholkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, USA, 2001.
S. K. M. Wong, W. Ziarko, and P. C. N. Wong. Generalized vector spaces model in information retrieval. In SIGIR'85, pages 18--25. ACM, 1985.
T. Zesch and I. Gurevych. Wisdom of crowds versus wisdom of linguists? measuring the semantic relatedness of words. Natural Language Engineering, 16(01):25--59, 2010.

Index Terms

On the connections between explicit semantic analysis and latent semantic analysis
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection

Published in
CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
October 2012
2840 pages
ISBN: 9781450311564
DOI: 10.1145/2396761
General Chair: Xuewen Chen (Wayne State University, USA)
Program Chairs: Guy Lebanon (Georgia Institute of Technology), Haixun Wang (Microsoft Research Asia), Mohammed J. Zaki (Rensselaer Polytechnic Institute)
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
explicit semantic analysis
kernel methods
latent semantic analysis
Qualifiers
- short-paper
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%