Opinion mining for app reviews: an analysis of textual representation and predictive models

Automated Software Engineering

Abstract

Popular mobile applications receive millions of user reviews. These reviews contain information relevant to software maintenance, such as bug reports and improvement suggestions. This information is a valuable knowledge source for software requirements engineering, since analyzing app reviews helps developers make strategic decisions to improve app quality. However, given the large volume of text, extracting the relevant information manually is impractical. Opinion mining is the field of study that analyzes people’s sentiments and emotions through opinions expressed on the web, for example on social networks, forums, and community platforms for product and service recommendations. In this paper, we investigate opinion mining for app reviews. In particular, we compare textual representation techniques for classification, sentiment analysis, and utility prediction from app reviews. We discuss and evaluate different techniques for the textual representation of reviews, from the traditional Bag-of-Words (BoW) model to recent state-of-the-art Neural Language Models (NLM). Our findings show that the traditional Bag-of-Words model, combined with a careful analysis of text pre-processing techniques, is still competitive: it obtains results close to those of the NLM in the classification, sentiment analysis, and utility prediction tasks. However, NLM proved more advantageous, since they achieved very competitive performance in all the predictive tasks covered in this work, provide significant dimensionality reduction, and deal more adequately with semantic proximity between the reviews’ texts.
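The abstract’s points about dimensionality and semantic proximity can be made concrete with a toy example. The sketch below is an illustration under stated assumptions, not the paper’s pipeline: the three reviews and the small stop-word list are hypothetical, and pre-processing is reduced to lowercasing, alphabetic tokenization, and stop-word removal. It builds term-frequency BoW vectors and compares two reviews with cosine similarity.

```python
import math
import re
from collections import Counter

# Hypothetical toy reviews; the first two both report a bug in different words.
REVIEWS = [
    "The app crashes every time I open it!",
    "Application freezes constantly after the update",
    "Please add a dark mode option",
]

# Tiny illustrative stop-word list (an assumption; real pipelines use larger ones).
STOP_WORDS = {"a", "after", "i", "it", "the"}


def preprocess(text: str) -> list[str]:
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bow_vectors(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Build a sorted vocabulary and one term-frequency vector per document."""
    vocab = sorted({t for doc in docs for t in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def cosine(u: list[int], v: list[int]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


tokens = [preprocess(r) for r in REVIEWS]
vocab, vectors = bow_vectors(tokens)
print(len(vocab))                       # 14 dimensions for three short reviews
print(cosine(vectors[0], vectors[1]))   # 0.0: no shared terms
```

Although the first two reviews both describe a bug, they share no terms after pre-processing, so their BoW cosine similarity is 0.0, and each vector has one dimension per vocabulary term, which grows with the corpus. Sentence embeddings from a neural language model have a fixed dimension independent of vocabulary size and can place such paraphrases close together, which is the kind of advantage the abstract attributes to NLM.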

Figs. 1–11

Notes

  1. We generated the BoW model with bigrams by using the bigram generator of the scikit-learn library.

  2. https://github.com/facundoolano/google-play-scraper.
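Note 1 states that the BoW model with bigrams was generated with scikit-learn. As a hedged, dependency-free sketch of what such features look like, the code below counts unigrams plus bigrams for a single review. Whether the paper kept unigrams alongside bigrams is an assumption here, and whitespace tokenization is a simplification. In scikit-learn, `CountVectorizer(ngram_range=(1, 2))` produces unigram-plus-bigram counts of this kind, though with its own default tokenization rules.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return space-joined n-grams from a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bow_with_bigrams(text: str) -> Counter:
    """Count unigrams plus bigrams for one review (simple whitespace tokenization)."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, 1) + ngrams(tokens, 2))


features = bow_with_bigrams("not good not working")
print(features["not"])          # 2
print(features["not good"])     # 1
print(features["not working"])  # 1
```

Bigrams such as "not good" keep short negations and phrases as single features, at the cost of a larger vocabulary.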

Author information

Corresponding author

Correspondence to Adailton F. Araujo.

About this article

Cite this article

Araujo, A.F., Gôlo, M.P.S. & Marcacini, R.M. Opinion mining for app reviews: an analysis of textual representation and predictive models. Autom Softw Eng 29, 5 (2022). https://doi.org/10.1007/s10515-021-00301-1

  • DOI: https://doi.org/10.1007/s10515-021-00301-1
