
Information Gain Based Feature Selection for Improved Textual Sentiment Analysis

Wireless Personal Communications

Abstract

Sentiment analysis, or opinion mining, is the process of extracting opinions and emotions from text. As a text-mining technique, it gauges the inclination of public opinion and helps analyse subjective information in a given context. Sentiment analysis classifies an opinion as positive, negative or neutral. Because sentiments are specific to the underlying content, they play a crucial role in depicting real-world scenarios. Sentiment analysis can be performed at three levels: document, sentence and feature. This paper proposes a novel Information Gain based Feature Selection algorithm that selects highly correlated features by removing irrelevant content. Using this algorithm, extensive sentiment analysis is performed at the document, sentence and feature levels. Datasets from Cornell and Kaggle are used in the experiments. Compared with other baseline classifiers, the proposed Information Gain based classifier achieved accuracies of 95%, 96.3% and 97.4% at the document, sentence and feature levels, respectively. The proposed method is also tested on higher-dimensional datasets, namely MovieLens 1M, 10M and 25M. The experimental results indicate that the proposed method continues to perform better even on high-dimensional datasets.
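The abstract does not spell out the selection procedure, so what follows is only a minimal sketch of the textbook information-gain criterion, IG(t) = H(C) − [P(t)·H(C|t) + P(¬t)·H(C|¬t)], applied to binary "term present in document" features. Whitespace tokenisation, the top-k cut-off and the helper names `entropy`, `information_gain` and `select_features` are illustrative assumptions, not the authors' algorithm.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(token_sets, labels, term):
    """IG of the binary feature 'term occurs in the document' w.r.t. the class labels."""
    with_term = [y for toks, y in zip(token_sets, labels) if term in toks]
    without_term = [y for toks, y in zip(token_sets, labels) if term not in toks]
    n = len(labels)
    # Expected class entropy after splitting the corpus on the presence of the term.
    conditional = (len(with_term) / n) * entropy(with_term) + (
        len(without_term) / n
    ) * entropy(without_term)
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Score every vocabulary term by information gain and keep the top k (hypothetical helper)."""
    # Assumption: naive lower-cased whitespace tokenisation, binary presence features.
    token_sets = [set(d.lower().split()) for d in docs]
    vocab = set().union(*token_sets)
    scores = {t: information_gain(token_sets, labels, t) for t in vocab}
    # Sort by descending IG; break ties alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical toy reviews, not data from the paper.
    docs = [
        "great movie loved it",
        "loved the acting great plot",
        "boring movie hated it",
        "hated the plot boring acting",
    ]
    labels = ["pos", "pos", "neg", "neg"]
    for term, score in select_features(docs, labels, k=4):
        print(f"{term}\t{score:.3f}")
```

On this toy corpus, "great" and "loved" occur only in positive reviews and "boring" and "hated" only in negative ones, so each scores a full 1 bit, while words shared by both classes (such as "movie" or "plot") score 0 and are discarded. Real pipelines would usually add stop-word removal and stemming before scoring.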


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5


Data Availability

The data are publicly available.


Funding

Not applicable.

Author information

Corresponding author

Correspondence to Madhumathi Ramasamy.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Ramasamy, M., Meena Kowshalya, A. Information Gain Based Feature Selection for Improved Textual Sentiment Analysis. Wireless Pers Commun 125, 1203–1219 (2022). https://doi.org/10.1007/s11277-022-09597-y



  • DOI: https://doi.org/10.1007/s11277-022-09597-y
