Automatically generating taxonomy for grouping app reviews — a study of three apps

Software Quality Journal

Abstract

App reviews often reflect end-users’ requests, issues or suggestions, and can therefore support app maintenance and evolution. Hence, researchers have evaluated several approaches for identifying and classifying such reviews. However, these classification approaches are driven by manually derived taxonomies. This is a limitation given the burden of human involvement, the large number of app reviews and the dependency on available domain knowledge to perform classification. In this study, we develop and evaluate a novel approach for automatically generating a dynamic taxonomy that groups related app reviews. Our approach uses natural language processing, feature engineering and word sense disambiguation to generate the taxonomy. In a pilot study, we validated the feasibility of the proposed approach with app reviews extracted from the popular My Tracks app, where outcomes revealed a 72% match with a manual taxonomy generated from domain knowledge provided by humans. We then extended the scope of the study by applying the automated taxonomy generation approach to app reviews belonging to the TradeMe and Flutter apps. The outcomes revealed matches of 80% and 71% with the manual taxonomies of the latter two apps. Thus, our approach shows promise for rapidly supporting software maintenance and evolution.
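To make the reported match figures concrete, the following is a minimal sketch of one way a percentage match between an automated and a manual taxonomy could be computed. It assumes that "match" means the share of reviews placed in the same group by both taxonomies; the abstract does not define the measure, so this definition, the function name `match_percentage`, and the review IDs and group labels are all hypothetical and used for illustration only.

```python
from typing import Dict


def match_percentage(automated: Dict[str, str], manual: Dict[str, str]) -> float:
    """Return the share (in %) of reviews given the same group by both taxonomies.

    Assumption: agreement is counted per review; the study may define
    "match" differently.
    """
    shared = automated.keys() & manual.keys()
    if not shared:
        raise ValueError("the two taxonomies have no reviews in common")
    agreed = sum(1 for review_id in shared if automated[review_id] == manual[review_id])
    return 100.0 * agreed / len(shared)


# Hypothetical review IDs and group labels, invented for this example.
manual = {"r1": "gps", "r2": "battery", "r3": "map", "r4": "gps", "r5": "sync"}
automated = {"r1": "gps", "r2": "battery", "r3": "gps", "r4": "gps", "r5": "sync"}

print(f"{match_percentage(automated, manual):.0f}% match")
```

In this toy data the two taxonomies agree on four of five reviews. A per-review agreement count is only one possible reading; a comparison at the level of taxonomy nodes or group names would need a different calculation.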

Acknowledgements

We would like to thank the reviewers for their detailed and insightful comments on the QUATIC 2020 conference paper titled ‘Towards Automated Taxonomy Generation for Grouping App Reviews: A Preliminary Empirical Study’, which was extended in this study. We would also like to thank the app developers of My Tracks, TradeMe and Flutter for providing the app reviews that facilitated the execution of this study.

Funding

This work was partially funded by a Commerce Research Grant (CRG) at the Otago Business School.

Author information

Corresponding author

Correspondence to Saurabh Malgaonkar.

About this article

Cite this article

Malgaonkar, S., Licorish, S.A. & Savarimuthu, B.T.R. Automatically generating taxonomy for grouping app reviews — a study of three apps. Software Qual J 30, 483–512 (2022). https://doi.org/10.1007/s11219-021-09570-1

  • DOI: https://doi.org/10.1007/s11219-021-09570-1
