Comparison of Feature Selection Methods for Sentiment Analysis

  • Conference paper
Big Data, Cloud and Applications (BDCA 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 872)

Abstract

Sentiment analysis is the process of deriving the opinion or attitude expressed in an input text. In classification problems, feature selection aims to select the features that best discriminate between samples belonging to different classes. This paper evaluates the performance of three feature selection methods, mutual information (MI), chi-square (CHI) and analysis of variance (ANOVA), combined with three machine learning classification techniques, Naive Bayes (NB), support vector machines (SVM) and k-nearest neighbors (KNN), for sentiment analysis on an online movie review dataset. The results show that feature selection is an important task for sentiment-based classification.
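As a rough illustration of what a filter-style feature selection step can look like, the following minimal sketch scores terms with the chi-square statistic on a 2x2 term-presence/class table and keeps the top-scoring ones. It is not the authors' implementation, which is not shown in this preview; the function names (`chi_square_scores`, `select_top_k`) and the toy reviews are hypothetical, and the sketch assumes tokenized documents with exactly two class labels.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Chi-square score per term from its 2x2 presence/class table.

    Assumes binary labels; `docs` is a list of token lists.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch assumes exactly two classes")
    pos = classes[1]
    n = len(docs)
    n_pos = sum(1 for y in labels if y == pos)
    n_neg = n - n_pos

    # Document frequency of each term within each class.
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        (df_pos if y == pos else df_neg).update(set(tokens))

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # positive docs containing the term
        b = df_neg[term]   # negative docs containing the term
        c = n_pos - a      # positive docs without the term
        d = n_neg - b      # negative docs without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy data, invented for illustration only.
docs = [
    ["great", "movie"],
    ["great", "acting"],
    ["boring", "movie"],
    ["boring", "plot"],
]
labels = ["pos", "pos", "neg", "neg"]

scores = chi_square_scores(docs, labels)
selected = select_top_k(scores, 2)
```

On this toy data, "great" and "boring" occur in only one class each and receive the highest scores, while "movie" occurs equally in both classes and scores zero. The reduced vocabulary would then be passed to a classifier such as NB, SVM or KNN.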

Author information

Corresponding author

Correspondence to Soufiane El Mrabti.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

El Mrabti, S., Al Achhab, M., Lazaar, M. (2018). Comparison of Feature Selection Methods for Sentiment Analysis. In: Tabii, Y., Lazaar, M., Al Achhab, M., Enneya, N. (eds) Big Data, Cloud and Applications. BDCA 2018. Communications in Computer and Information Science, vol 872. Springer, Cham. https://doi.org/10.1007/978-3-319-96292-4_21

  • DOI: https://doi.org/10.1007/978-3-319-96292-4_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96291-7

  • Online ISBN: 978-3-319-96292-4

  • eBook Packages: Computer Science, Computer Science (R0)
