
GeSe: Generalized static embedding


Abstract

In natural language processing, most text representation methods fall into one of two paradigms: static and dynamic. Each has distinctive advantages, reflected in the cost of training resources, the scale of input data, and the interpretability of the representation model. Dynamic representation methods, such as BERT, have achieved excellent results on many tasks, but only on top of expensive pre-training. Moreover, this representation paradigm is a black box whose intrinsic properties cannot be measured by standard word similarity and analogy benchmarks. Most importantly, adequate resources and unlimited data are not available in every case. Static methods are solid alternatives in these scenarios because they can be trained efficiently with limited resources while retaining straightforward interpretability and verifiable intrinsic properties. Although many static embedding methods have been proposed, few attempts have been made to investigate the connections between these algorithms. It is therefore natural to ask which implementation is more efficient, and whether the merits of these algorithms can be combined in a generalized framework. In this paper, we explore answers to these questions by focusing on two popular static embedding models, Continuous Bag-of-Words (CBOW) and Skip-gram (SG), with a detailed analysis of their merits and drawbacks under both Negative Sampling (NS) and Hierarchical Softmax (HS) settings. We then propose a novel learning framework that trains generalized static embeddings in a unified architecture. The proposed method is estimator-agnostic, so it can be optimized with NS, HS, or any other equivalent estimator. Experiments show that embeddings learned with the proposed framework outperform strong baselines on standard intrinsic evaluations. We also test the proposed method on three extrinsic tasks, where empirical results show considerable improvements across all of them.
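To make the contrast between the two models concrete, the following is a minimal sketch of the CBOW and Skip-gram objectives under Negative Sampling. It is not the authors' GeSe implementation; all dimensions, data, and helper names are illustrative. Skip-gram scores each context word from the centre word's vector, while CBOW scores the centre word from the averaged context vectors.

```python
# Illustrative sketch of CBOW vs Skip-gram under Negative Sampling (NS).
# Toy sizes and random data; not the paper's GeSe framework.
import numpy as np

rng = np.random.default_rng(0)
V, D, K = 1000, 50, 5               # vocabulary size, embedding dim, negatives per pair
W_in = rng.normal(0, 0.1, (V, D))   # input (word) embeddings
W_out = rng.normal(0, 0.1, (V, D))  # output (context) embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ns_loss(h, target, negatives):
    """NS loss for one predicted vector h: pull the target closer, push sampled negatives away."""
    pos = -np.log(sigmoid(W_out[target] @ h))
    neg = -np.log(sigmoid(-(W_out[negatives] @ h))).sum()
    return pos + neg

def sg_loss(center, context, negatives):
    """Skip-gram: the centre word predicts each context word independently."""
    h = W_in[center]
    return sum(ns_loss(h, c, negatives) for c in context)

def cbow_loss(center, context, negatives):
    """CBOW: the averaged context window predicts the centre word."""
    h = W_in[context].mean(axis=0)
    return ns_loss(h, center, negatives)

center, context = 42, [7, 13, 99, 256]
negatives = rng.integers(0, V, size=K)
print("SG/NS   loss:", sg_loss(center, context, negatives))
print("CBOW/NS loss:", cbow_loss(center, context, negatives))
```

Because the NS terms enter only through ns_loss, swapping in a different estimator (e.g., Hierarchical Softmax) would leave the two model definitions untouched, which is the sense in which such a framework can be estimator-agnostic.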

Acknowledgements

We would like to thank the editors and anonymous reviewers for their insightful feedback, which helped greatly to improve the quality of this article.

This work was supported by the National Key R&D Program of China (No. 2018AAA0100300) and the Innovation Foundation of Science and Technology of Dalian via the project "Study on the Key Management and Privacy Preservation in VANET" (No. 2018J12GX045).

The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing official policies or endorsements of any supporting organizations or governments.

Author information

Corresponding author

Correspondence to Nianmin Yao.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gong, N., Yao, N. GeSe: Generalized static embedding. Appl Intell 52, 10148–10160 (2022). https://doi.org/10.1007/s10489-021-03001-1
