Opinion mining for app reviews: an analysis of textual representation and predictive models

Araujo, Adailton F.; Gôlo, Marcos P. S.; Marcacini, Ricardo M.

doi:10.1007/s10515-021-00301-1

Published: 06 October 2021

Volume 29, article number 5, (2022)
Automated Software Engineering

Adailton F. Araujo ORCID: orcid.org/0000-0002-2392-4818¹,
Marcos P. S. Gôlo¹ &
Ricardo M. Marcacini¹

Abstract

Popular mobile applications receive millions of user reviews. These reviews contain information relevant to software maintenance, such as bug reports and improvement suggestions. This information is a valuable knowledge source for software requirements engineering, since analyzing app reviews helps inform strategic decisions to improve app quality. However, due to the large volume of text, manually extracting the relevant information is impractical. Opinion mining is the field of study that analyzes people’s sentiments and emotions through opinions expressed on the web, such as in social networks, forums, and community platforms for product and service recommendation. In this paper, we investigate opinion mining for app reviews. In particular, we compare textual representation techniques for classification, sentiment analysis, and utility prediction from app reviews. We discuss and evaluate different techniques for the textual representation of reviews, from the traditional Bag-of-Words (BoW) to recent state-of-the-art Neural Language Models (NLM). Our findings show that the traditional Bag-of-Words model, combined with a careful analysis of text pre-processing techniques, is still competitive: it obtains results close to those of NLM in the classification, sentiment analysis, and utility prediction tasks. However, NLM proved more advantageous, since they achieve very competitive performance in all the predictive tasks covered in this work, provide significant dimensionality reduction, and deal more adequately with semantic proximity between the reviews’ texts.

Notes

We generated the BoW model with bigrams by using the bigram generator of the scikit-learn library.
https://github.com/facundoolano/google-play-scraper.
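
To illustrate what a BoW model with bigrams contains, the following is a minimal plain-Python sketch, not the authors' code. The tokenizer, the function names, and the two sample reviews are assumptions made for illustration. In scikit-learn, a comparable representation can be obtained from `CountVectorizer` through its `ngram_range` parameter; the exact settings used in the paper are not stated here.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase and keep alphanumeric runs (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bow_with_bigrams(docs):
    """Build a count matrix over unigrams and bigrams.

    Returns (vocabulary, rows), where rows[i][j] is the number of times
    vocabulary[j] occurs in docs[i].
    """
    counts = []
    for doc in docs:
        tokens = tokenize(doc)
        counts.append(Counter(tokens + ngrams(tokens, 2)))
    # The vocabulary is the sorted union of all terms seen in any document.
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


# Hypothetical reviews, invented for illustration only.
reviews = ["App crashes on login", "Love this app"]
vocabulary, rows = bow_with_bigrams(reviews)
```

Each review becomes one row of counts, with one column per unigram or bigram. The number of columns grows quickly with the corpus, which is the dimensionality cost that the abstract contrasts with neural language models.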


Author information

Authors and Affiliations

Institute of Mathematics and Computer Sciences, University of São Paulo (USP), PO Box 668, 13.560-970, São Carlos, SP, Brazil
Adailton F. Araujo, Marcos P. S. Gôlo & Ricardo M. Marcacini

Corresponding author

Correspondence to Adailton F. Araujo.

About this article

Cite this article

Araujo, A.F., Gôlo, M.P.S. & Marcacini, R.M. Opinion mining for app reviews: an analysis of textual representation and predictive models. Autom Softw Eng 29, 5 (2022). https://doi.org/10.1007/s10515-021-00301-1

Received: 06 May 2021
Accepted: 20 September 2021
Published: 06 October 2021
DOI: https://doi.org/10.1007/s10515-021-00301-1
