Abstract
Sentiment analysis is the process of deriving the opinion or attitude expressed in an input text. In classification problems, feature selection aims to select the features that best discriminate between samples belonging to different classes. This paper evaluates the performance of three feature selection methods, mutual information (MI), chi-square (CHI) and analysis of variance (ANOVA), combined with three machine learning classifiers, Naïve Bayes (NB), support vector machine (SVM) and k-nearest neighbors (KNN), for sentiment analysis on an online movie review dataset. The paper shows that feature selection is an important task in sentiment-based classification.
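To make the idea of filter-based feature selection concrete, here is a minimal sketch of chi-square term scoring for a two-class sentiment task. It is an illustration under stated assumptions, not the authors' implementation: the function names `chi_square_scores` and `select_top_k`, the whitespace tokenization, and the toy reviews are all hypothetical. Each term is scored from the 2x2 table of term presence against class label, and the highest-scoring terms are kept as features.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by the chi-square statistic of its 2x2 table
    (term present/absent versus class). Assumes exactly two classes."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    pos = classes[1]
    n = len(docs)
    n_pos = sum(1 for y in labels if y == pos)
    n_neg = n - n_pos

    # Document frequency of each term within each class.
    df_pos, df_neg = Counter(), Counter()
    for text, y in zip(docs, labels):
        terms = set(text.lower().split())  # naive whitespace tokenizer (assumption)
        (df_pos if y == pos else df_neg).update(terms)

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # docs of the first class containing the term
        b = df_neg[term]   # docs of the second class containing the term
        c = n_pos - a      # docs of the first class without the term
        d = n_neg - b      # docs of the second class without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy reviews, invented for illustration only.
docs = [
    "great movie great acting",
    "great fun film",
    "boring movie bad acting",
    "bad boring film",
]
labels = ["pos", "pos", "neg", "neg"]

scores = chi_square_scores(docs, labels)
print(select_top_k(scores, 3))
```

In this toy corpus, terms that occur in only one class ("great", "bad", "boring") receive the highest scores, while terms spread evenly across both classes ("movie", "film", "acting") score zero. Swapping in a different scoring function, such as mutual information or an ANOVA F-statistic, would leave the select-then-classify structure unchanged.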
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
El Mrabti, S., Al Achhab, M., Lazaar, M. (2018). Comparison of Feature Selection Methods for Sentiment Analysis. In: Tabii, Y., Lazaar, M., Al Achhab, M., Enneya, N. (eds) Big Data, Cloud and Applications. BDCA 2018. Communications in Computer and Information Science, vol 872. Springer, Cham. https://doi.org/10.1007/978-3-319-96292-4_21
DOI: https://doi.org/10.1007/978-3-319-96292-4_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96291-7
Online ISBN: 978-3-319-96292-4
eBook Packages: Computer Science, Computer Science (R0)