Learning Word Representation Considering Proximity and Ambiguity

Authors

  • Lin Qiu, Shanghai Jiao Tong University
  • Yong Cao, Microsoft Research
  • Zaiqing Nie, Microsoft Research
  • Yong Yu, Shanghai Jiao Tong University
  • Yong Rui, Microsoft Research

DOI:

https://doi.org/10.1609/aaai.v28i1.8936

Keywords:

Word Representation, Neural Networks

Abstract

Distributed representations of words (also known as word embeddings) have proven helpful in solving natural language processing (NLP) tasks. Training distributed representations of words with neural networks has lately been a major focus of researchers in the field. Recent work on word embedding, namely the Continuous Bag-of-Words (CBOW) model and the Continuous Skip-gram (Skip-gram) model, has produced particularly impressive results, significantly speeding up the training process and enabling word representation learning from large-scale data. However, neither CBOW nor Skip-gram pays enough attention to word proximity from a modeling perspective or to word ambiguity from a linguistic perspective. In this paper, we propose Proximity-Ambiguity Sensitive (PAS) models (i.e., PAS CBOW and PAS Skip-gram) that produce high-quality distributed representations of words by considering both word proximity and ambiguity. From the modeling perspective, we introduce proximity weights as parameters to be learned in PAS CBOW and used in PAS Skip-gram. By better modeling word proximity, we reveal the strength of pooling-structured neural networks in word representation learning. The proximity-sensitive pooling layer can also be applied to other neural network applications that employ pooling layers. From the linguistic perspective, we train multiple representation vectors per word, each corresponding to a particular group of POS tags for that word. With PAS models, we achieved a 16.9% increase in accuracy over state-of-the-art models.
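To make the proximity idea concrete, below is a minimal sketch of proximity-weighted context pooling in the spirit of PAS CBOW, together with the one-vector-per-POS-group treatment of ambiguity. This is an illustration under assumed details, not the authors' implementation: the function name `pas_cbow_context`, the toy sizes, and the normalization scheme are all assumptions.

```python
# Sketch: proximity-sensitive pooling for CBOW-style context averaging.
# All names and sizes are illustrative, not from the paper's code.
import numpy as np

rng = np.random.default_rng(0)

vocab_size, dim, window = 1000, 50, 2                 # toy sizes (assumed)
W_in = rng.normal(scale=0.1, size=(vocab_size, dim))  # input word vectors
prox = np.ones(2 * window)                            # proximity weights, one per
                                                      # context offset (learned in PAS CBOW)

def pas_cbow_context(context_ids):
    """Pool context vectors with per-offset proximity weights
    instead of the uniform average used by plain CBOW."""
    vecs = W_in[context_ids]         # (2*window, dim)
    weights = prox / prox.sum()      # normalize pooling weights (assumed scheme)
    return weights @ vecs            # proximity-weighted average -> (dim,)

# Ambiguity: keep one vector per (word, POS-tag group) instead of per word.
multi_vecs = {("bank", "NOUN"): rng.normal(scale=0.1, size=dim),
              ("bank", "VERB"): rng.normal(scale=0.1, size=dim)}

h = pas_cbow_context(np.array([12, 7, 93, 411]))  # ids of 2*window context words
print(h.shape)  # (50,)
```

In plain CBOW the hidden layer is the unweighted mean of the context vectors; the sketch simply replaces that mean with a learned, offset-dependent weighting, which is what makes the pooling layer proximity-sensitive.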

Published

2014-06-21

How to Cite

Qiu, L., Cao, Y., Nie, Z., Yu, Y., & Rui, Y. (2014). Learning Word Representation Considering Proximity and Ambiguity. Proceedings of the AAAI Conference on Artificial Intelligence, 28(1). https://doi.org/10.1609/aaai.v28i1.8936