Short Text Classification Based on Distributional Representations of Words

Chenglong MA; Qingwei ZHAO; Jielin PAN; Yonghong YAN

doi:10.1587/transinf.2016SLL0006

Special Section on Recent Advances in Machine Learning for Spoken Language Processing


Keywords: short text classification, word embedding, Gaussian model

2016 Volume E99.D Issue 10 Pages 2562-2565

Abstract

Short texts typically suffer from data sparseness because they do not provide sufficient term co-occurrence information. In this paper, we show how word embeddings can mitigate this problem in short text classification. We assume that a short text document is a specific sample from one distribution in a Gaussian-Bayesian framework. Furthermore, a fast clustering algorithm is used to expand and enrich the context of a short text in the embedding space. We compare this approach with classical bag-of-words approaches and neural-network-based methods. Experimental results validate the effectiveness of the proposed method.
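
The abstract does not spell out the model, so the following is only a minimal sketch of one possible reading of a Gaussian-Bayesian classifier over word embeddings, not the authors' implementation. It assumes that each class is modeled by a single Gaussian fitted to the embeddings of its training words, and that a document is scored by the class log-prior plus the summed log-likelihood of its word vectors. The function names, the regularization term `reg`, and the toy two-dimensional embeddings are all hypothetical; the clustering-based context expansion is not covered.

```python
import numpy as np

def fit_class_gaussians(docs, labels, embeddings, reg=1e-3):
    """Fit one Gaussian per class over the word vectors of its documents.

    Assumption: one full-covariance Gaussian per class; `reg` is added to
    the diagonal to keep the covariance invertible.
    """
    labels = list(labels)
    params = {}
    for c in sorted(set(labels)):
        vecs = np.array([embeddings[w]
                         for d, y in zip(docs, labels) if y == c
                         for w in d if w in embeddings])
        mean = vecs.mean(axis=0)
        cov = np.cov(vecs, rowvar=False) + reg * np.eye(vecs.shape[1])
        log_prior = np.log(labels.count(c) / len(labels))
        params[c] = (mean, cov, log_prior)
    return params

def log_gaussian(x, mean, cov):
    """Log density of a multivariate Gaussian at point x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (x.shape[0] * np.log(2 * np.pi) + logdet + maha)

def classify(doc, params, embeddings):
    """Return the class with the highest log-prior plus summed log-likelihood."""
    vecs = [embeddings[w] for w in doc if w in embeddings]
    scores = {c: log_prior + sum(log_gaussian(v, mean, cov) for v in vecs)
              for c, (mean, cov, log_prior) in params.items()}
    return max(scores, key=scores.get)

# Toy, hand-made 2-D "embeddings" (hypothetical values for illustration only).
embeddings = {
    "goal": np.array([1.0, 0.1]), "match": np.array([0.9, 0.2]),
    "team": np.array([1.1, 0.0]), "stock": np.array([0.0, 1.0]),
    "market": np.array([0.1, 0.9]), "bank": np.array([0.2, 1.1]),
}
docs = [["goal", "match"], ["team", "goal"],
        ["stock", "market"], ["bank", "stock"]]
labels = ["sports", "sports", "finance", "finance"]

params = fit_class_gaussians(docs, labels, embeddings)
print(classify(["team", "match"], params, embeddings))
print(classify(["market", "bank"], params, embeddings))
```

Because the score depends on where words lie in the embedding space rather than on exact term overlap, a test document can be classified even when it shares few or no terms with the training documents, which is the sparseness issue the abstract describes.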
