Automatically generating taxonomy for grouping app reviews — a study of three apps

Malgaonkar, Saurabh; Licorish, Sherlock A.; Savarimuthu, Bastin Tony Roy

doi:10.1007/s11219-021-09570-1

Published: 29 October 2021

Software Quality Journal, volume 30, pages 483–512 (2022)

Saurabh Malgaonkar ORCID: orcid.org/0000-0001-6897-0335¹,
Sherlock A. Licorish¹ &
Bastin Tony Roy Savarimuthu¹

Abstract

App reviews often contain end-users’ requests, issues or suggestions that can support app maintenance and evolution. Researchers have therefore evaluated several approaches for identifying and classifying such reviews. However, these classification approaches are driven by manually derived taxonomies. This is a limitation given the burden of human involvement, the large number of app reviews and the dependency on available domain knowledge to perform classification. In this study, we develop and evaluate a novel approach for automatically generating a dynamic taxonomy that groups related app reviews. Our approach uses natural language processing, feature engineering and word sense disambiguation to generate the taxonomy. In a pilot study, we validated the feasibility of the approach with app reviews extracted from the popular My Tracks app, where the outcomes revealed a 72% match with a manual taxonomy generated from domain knowledge provided by humans. We then extended the scope of the study by applying the automated taxonomy generation approach to app reviews of the TradeMe and Flutter apps. The outcomes revealed 80% and 71% matches with the manual taxonomies of these two apps. Thus, our approach shows promise for rapidly supporting software maintenance and evolution.
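To make the idea of a "match" between an automatically generated taxonomy and a manual one concrete, the following is a minimal sketch. It is not the authors' method: the abstract does not say how the match percentage is computed, and this toy uses plain term frequency rather than feature engineering or word sense disambiguation. The reviews, the stopword list, the manual labels and the function names `build_taxonomy` and `match_percentage` are all hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "i", "when", "after", "please", "for"}


def tokenize(review):
    """Lowercase a review and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", review.lower()) if w not in STOPWORDS]


def build_taxonomy(reviews, top_k=2):
    """Toy taxonomy: the top_k terms by document frequency become category
    labels, and each review is assigned to the first label it mentions."""
    doc_freq = Counter()
    for review in reviews:
        doc_freq.update(set(tokenize(review)))
    # Sort by descending frequency, then alphabetically for a stable result.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    labels = [term for term, _ in ranked[:top_k]]
    groups = {}
    for idx, review in enumerate(reviews):
        tokens = set(tokenize(review))
        groups[idx] = next((label for label in labels if label in tokens), "other")
    return groups


def match_percentage(auto, manual):
    """Assumed definition: share of reviews whose automatic group label
    equals the manually assigned label, as a percentage."""
    agree = sum(1 for idx, label in manual.items() if auto.get(idx) == label)
    return 100.0 * agree / len(manual)


# Hypothetical reviews and manual labels, not data from the study.
reviews = [
    "GPS signal drops when I start recording",
    "The GPS is inaccurate after the update",
    "Battery drains fast while recording a track",
    "Please add an export option for the map",
    "Battery usage is too high",
]
manual = {0: "gps", 1: "gps", 2: "battery", 3: "export", 4: "battery"}

auto = build_taxonomy(reviews)
score = match_percentage(auto, manual)
```

In this toy, a review on a topic that is too rare to become a label falls into "other" and counts as a mismatch, which illustrates one way an automatic taxonomy can diverge from a manually derived one.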

Acknowledgements

We would like to thank the reviewers for their detailed and insightful comments on the QUATIC 2020 conference paper titled ‘Towards Automated Taxonomy Generation for Grouping App Reviews: A Preliminary Empirical Study’, which this study extends. We would also like to thank the app developers of My Tracks, TradeMe and Flutter for providing the app reviews that made this study possible.

Funding

This work was partially funded by a Commerce Research Grant (CRG) at the Otago Business School.

Author information

Authors and Affiliations

Department of Information Science, University of Otago, Dunedin, New Zealand
Saurabh Malgaonkar, Sherlock A. Licorish & Bastin Tony Roy Savarimuthu

Corresponding author

Correspondence to Saurabh Malgaonkar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Malgaonkar, S., Licorish, S.A. & Savarimuthu, B.T.R. Automatically generating taxonomy for grouping app reviews — a study of three apps. Software Qual J 30, 483–512 (2022). https://doi.org/10.1007/s11219-021-09570-1

Accepted: 22 July 2021
Published: 29 October 2021
Issue Date: June 2022
DOI: https://doi.org/10.1007/s11219-021-09570-1
