ABSTRACT
Text classification is one of the most important tasks in natural language processing and information retrieval, owing to the increasing availability of documents in digital form and the ensuing need to access them in flexible ways. By assigning documents to labeled classes, text classification reduces the search space and expedites the retrieval of relevant documents. In this paper, we propose a novel text representation method, Hybrid Word Embeddings (HWE), which combines semantic information obtained from WordNet with contextual information extracted from text documents to provide concise and accurate representations of text documents. By exploiting the semantic relationships encoded in WordNet, HWE can derive word semantics from a smaller training corpus than purely corpus-driven methods require. An experimental study on document classification shows that HWE outperforms existing methods, including Doc2Vec and Word2Vec, in classification accuracy, recall, and precision.
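The abstract describes combining a corpus-derived contextual vector with a WordNet-derived semantic vector per word; the exact combination rule is not given here, so the following is only a minimal sketch under the assumption that the two sources are concatenated. The `contextual` and `semantic` dictionaries are hypothetical stand-ins for Word2Vec output and WordNet-derived features, respectively.

```python
import numpy as np

# Toy "contextual" vectors: hypothetical stand-ins for Word2Vec output (dim 4).
contextual = {
    "network": np.array([0.1, 0.3, -0.2, 0.5]),
    "graph":   np.array([0.2, 0.2, -0.1, 0.4]),
}

# Toy "semantic" vectors: hypothetical stand-ins for features derived from
# WordNet relations (e.g. hypernym depth, synset overlap) (dim 2).
semantic = {
    "network": np.array([0.8, 0.1]),
    "graph":   np.array([0.7, 0.2]),
}

def hybrid_embedding(word):
    """Assumed combination rule: concatenate contextual and semantic vectors."""
    return np.concatenate([contextual[word], semantic[word]])

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Documents can then be compared via similarities over their hybrid word vectors.
sim = cosine(hybrid_embedding("network"), hybrid_embedding("graph"))
```

In a real pipeline the contextual vectors would come from a trained embedding model and the semantic vectors from WordNet; the concatenation is one plausible fusion strategy, not the paper's confirmed formula.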
Index Terms
- HWE: Hybrid Word Embeddings For Text Classification