DOI: 10.1145/3342827.3342837
research-article

HWE: Hybrid Word Embeddings For Text Classification

Published: 28 June 2019

ABSTRACT

Text classification is one of the most important tasks in natural language processing and information retrieval, owing to the increasing availability of documents in digital form and the ensuing need to access them in flexible ways. By assigning documents to labeled classes, text classification can reduce the search space and expedite the retrieval of relevant documents. In this paper, we propose a novel text representation method, Hybrid Word Embeddings (HWE), which combines semantic information obtained from WordNet with contextual information extracted from text documents to provide concise and accurate representations of text documents. By taking advantage of the semantic relationships extracted from WordNet, HWE can derive word semantics from text more efficiently and with a smaller training corpus. An experimental study on document classification shows that HWE outperforms existing methods, including Doc2Vec and Word2Vec, in terms of classification accuracy, recall, precision, and other metrics.
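The abstract does not give HWE's exact construction. Purely to illustrate the general idea of mixing a lexical-resource signal with a corpus signal, here is a minimal Python sketch under stated assumptions: a hypothetical toy concept map (`TOY_CONCEPTS`) stands in for WordNet, windowed co-occurrence counts stand in for contextual embeddings, and the two parts are combined by weighted concatenation with an assumed weight `alpha`. All function names are hypothetical, and this is not the authors' HWE algorithm.

```python
import math

# Hypothetical stand-in for WordNet: maps a word to the concepts it belongs to
# (its synset plus hypernyms). A real system would query WordNet instead.
TOY_CONCEPTS = {
    "car": ["motor_vehicle", "vehicle"],
    "automobile": ["motor_vehicle", "vehicle"],
    "banana": ["edible_fruit", "food"],
}


def l2_normalize(v):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def context_vectors(docs, vocab, window=2):
    """Contextual part: co-occurrence counts within +/- `window` tokens."""
    index = {w: i for i, w in enumerate(vocab)}
    vecs = {w: [0.0] * len(vocab) for w in vocab}
    for tokens in docs:
        for i, w in enumerate(tokens):
            if w not in index:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in index:
                    vecs[w][index[tokens[j]]] += 1.0
    return vecs


def semantic_vectors(vocab, concepts_of):
    """Semantic part: one indicator dimension per lexical concept."""
    concepts = sorted({c for w in vocab for c in concepts_of.get(w, [])})
    cidx = {c: i for i, c in enumerate(concepts)}
    vecs = {}
    for w in vocab:
        v = [0.0] * len(concepts)
        for c in concepts_of.get(w, []):
            v[cidx[c]] = 1.0
        vecs[w] = v
    return vecs


def hybrid_vectors(ctx, sem, alpha=0.5):
    """Concatenate the normalized parts, weighting context by alpha.

    This weighted concatenation is an assumed scheme, not necessarily HWE's.
    """
    return {
        w: [alpha * x for x in l2_normalize(ctx[w])]
        + [(1 - alpha) * x for x in l2_normalize(sem[w])]
        for w in ctx
    }


def doc_vector(tokens, word_vecs):
    """Represent a document as the mean of its known word vectors."""
    known = [word_vecs[t] for t in tokens if t in word_vecs]
    if not known:
        return None
    return [sum(col) / len(known) for col in zip(*known)]


docs = [
    ["the", "car", "drove", "fast"],
    ["an", "automobile", "parked"],
    ["banana", "tastes", "sweet"],
]
vocab = sorted({t for d in docs for t in d})
ctx = context_vectors(docs, vocab)
hyb = hybrid_vectors(ctx, semantic_vectors(vocab, TOY_CONCEPTS))

print(cosine(ctx["car"], ctx["automobile"]))  # 0.0: no shared context words
print(cosine(hyb["car"], hyb["automobile"]))  # about 0.5
print(cosine(hyb["car"], hyb["banana"]))      # 0.0
```

In this toy corpus, "car" and "automobile" never share a context word, so their purely contextual similarity is zero; only the semantic part links them, which is the kind of benefit the abstract attributes to WordNet when training text is limited. Averaging word vectors into a document vector is a common baseline choice here, not something the abstract specifies.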


    Published in

      ACM Other conferences
      NLPIR '19: Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval
      June 2019
      171 pages
      ISBN:9781450362795
      DOI:10.1145/3342827

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed limited
