
Using frame semantics for classifying and summarizing application store reviews

Empirical Software Engineering

Abstract

Text mining techniques have recently been employed to classify and summarize user reviews on mobile application stores. However, due to the inherently diverse and unstructured nature of user-generated online textual data, text-based review mining techniques often produce excessively complicated models that are prone to overfitting. In this paper, we propose a novel approach, based on frame semantics, for app review mining. Semantic frames help to generalize from raw text (individual words) to more abstract scenarios (contexts). This lower-dimensional representation of text is expected to enhance the predictive capabilities of review mining techniques and reduce the chances of overfitting. Specifically, our analysis in this paper is two-fold. First, we investigate the performance of semantic frames in classifying informative user reviews into various categories of actionable software maintenance requests. Second, we propose and evaluate the performance of multiple summarization algorithms in generating concise and representative summaries of informative reviews. Three different datasets of app store reviews, sampled from a broad range of application domains, are used to conduct our experimental analysis. The results show that semantic frames can enable an efficient and accurate review classification process. However, in review summarization tasks, our results show that text-based summarization generates more comprehensive summaries than frame-based summarization. Finally, we introduce MARC 2.0, a review classification and summarization suite that implements the algorithms investigated in our analysis.
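The generalization from individual words to frames described in the abstract can be illustrated with a minimal sketch. It is not the paper's pipeline: the `WORD_TO_FRAME` lexicon, the frame labels, and the sample reviews below are hypothetical and chosen only for illustration, whereas a real system would obtain frames from a frame-semantic parser such as SEMAFOR. The sketch shows how differently worded reviews can collapse onto the same, much smaller, frame-based representation.

```python
from collections import Counter

# Hypothetical toy lexicon mapping words to frame labels. The labels are
# illustrative assumptions and are not checked against FrameNet.
WORD_TO_FRAME = {
    "want": "Desiring", "wish": "Desiring", "hope": "Desiring",
    "crashes": "Malfunction", "freezes": "Malfunction", "breaks": "Malfunction",
    "add": "Adding", "include": "Adding",
}


def bag_of_words(review):
    """Represent a review by the counts of its lowercased tokens."""
    return Counter(review.lower().split())


def bag_of_frames(review):
    """Represent a review by the counts of the frames its tokens evoke."""
    frames = (WORD_TO_FRAME.get(tok) for tok in review.lower().split())
    return Counter(f for f in frames if f is not None)


# Hypothetical sample reviews (not taken from the paper's datasets).
reviews = [
    "I wish you would add dark mode",
    "I hope you include dark mode",
    "app crashes on startup",
    "app freezes on startup",
]

# Feature space size under each representation.
word_vocab = set().union(*(bag_of_words(r) for r in reviews))
frame_vocab = set().union(*(bag_of_frames(r) for r in reviews))

print(len(word_vocab), "word features vs", len(frame_vocab), "frame features")
# Reviews with different wording share one frame representation.
print(bag_of_frames(reviews[0]) == bag_of_frames(reviews[1]))
```

In this toy setting, the two feature requests differ in wording but map to identical frame counts, which is the kind of lower-dimensional representation the abstract expects to reduce overfitting.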



Notes

  1. https://www.statista.com/topics/1729/app-stores/

  2. https://framenet.icsi.berkeley.edu/fndrupal/

  3. Our dataset is publicly available at http://seel.cse.lsu.edu/data/emse18.zip

  4. Randomization in our analysis is implemented using the .NET Random class

  5. www.cs.waikato.ac.nz/~ml/weka/

  6. www.cs.cmu.edu/~ark/SEMAFOR/

  7. http://seel.cse.lsu.edu/data/emse18.zip

  8. http://demo.ark.cs.cmu.edu/parse

  9. https://github.com/seelprojects/MARC-2.0

  10. https://framenet.icsi.berkeley.edu/fndrupal/current_status


Acknowledgments

This work was supported in part by the Louisiana Board of Regents Research Competitiveness Subprogram (LA BoR-RCS), contract number: LEQSF(2015-18)-RD-A-07.

Author information


Corresponding author

Correspondence to Anas Mahmoud.

Additional information

Communicated by: Paul Grünbacher and Anna Perini


About this article


Cite this article

Jha, N., Mahmoud, A. Using frame semantics for classifying and summarizing application store reviews. Empir Software Eng 23, 3734–3767 (2018). https://doi.org/10.1007/s10664-018-9605-x


  • DOI: https://doi.org/10.1007/s10664-018-9605-x
