Abstract
In the era of the Fourth Industrial Revolution, data and information have become valuable resources. In this data-driven economy, maintaining a high level of data quality is extremely important. Poor data quality can impose a significant business cost, so data quality and consistency are a primary challenge for contemporary enterprises. A concrete understanding of data quality, which is a key concern among data users, is therefore needed. This study analyses Twitter data to extract the sentiments and opinions expressed in unstructured texts and to identify the key topics that Twitter users discuss. Text classification and topic modelling techniques were applied to identify positive and negative sentiments and the key themes in polarized texts referring to data quality. The study followed a two-step process. In the first step, positive and negative sentiments were identified from Twitter feeds. In the second step, Latent Dirichlet Allocation was applied to discover the keywords in the text corpora that capture recurring themes; the method is widely used to analyse large sets of polarized texts and to identify the most common topics quickly and efficiently. The study contributes to the text mining literature by providing a framework for analysing public sentiment. The framework can help to understand the key themes in negative sentiments related to data quality among machine learning practitioners. It can also highlight the key concerns of the public and of data users and share them with the larger community.
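The two-step process described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which classifier or LDA software was used, so step 1 is replaced here by a toy lexicon count, and step 2 by a small collapsed Gibbs sampler for Latent Dirichlet Allocation written with the Python standard library only. The lexicons, the stop-word list, the example texts, and all function names are hypothetical and serve only as illustration; they are not data from the study.

```python
import random
from collections import Counter

# Hypothetical toy lexicons and stop words (assumptions, not from the study).
POSITIVE = {"good", "great", "clean", "reliable", "trusted"}
NEGATIVE = {"bad", "poor", "missing", "duplicate", "broken", "inconsistent"}
STOP = {"a", "and", "in", "is", "of", "our", "the", "to"}

def tokenize(text):
    """Lowercase, strip surrounding punctuation, keep alphabetic non-stop words."""
    words = (t.strip(".,!?#@:;").lower() for t in text.split())
    return [w for w in words if w.isalpha() and w not in STOP]

def polarity(text):
    """Step 1 (stand-in): label a text by counting lexicon hits."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, top_n=5, seed=0):
    """Step 2: collapsed Gibbs sampling for LDA; returns the top words per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (document, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assign = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return [
        [w for w, c in topic_word[k].most_common(top_n) if c > 0]
        for k in range(num_topics)
    ]

# Made-up example texts for illustration only.
tweets = [
    "Poor data quality again: missing values and duplicate records",
    "Our dashboard is broken because of inconsistent customer data",
    "Great to see clean and reliable data in the new warehouse",
    "Duplicate rows and missing labels make model training bad",
    "Inconsistent schemas keep the pipeline broken",
]

# Keep only the negative texts, then look for recurring themes in them.
negative_docs = [tokenize(t) for t in tweets if polarity(t) == "negative"]
topics = lda_gibbs(negative_docs, num_topics=2)
for k, words in enumerate(topics):
    print(f"topic {k}: {', '.join(words)}")
```

Filtering by polarity before topic modelling mirrors the order of the two steps in the abstract: the topics are then estimated only from texts of one polarity. With a corpus this small the topics are not meaningful; the sketch only shows how the two steps connect.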
The project was financed by the Ministry of Science and Higher Education within the “Regional Initiative of Excellence” Programme for 2019-2022. Project no.: 021/RID/2018/19. Total financing: 11 897 131.40 PLN.
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Dwivedi, D.N., Wójcik, K., Vemareddyb, A. (2022). Identification of Key Concerns and Sentiments Towards Data Quality and Data Strategy Challenges Using Sentiment Analysis and Topic Modeling. In: Jajuga, K., Dehnel, G., Walesiak, M. (eds) Modern Classification and Data Analysis. SKAD 2021. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-031-10190-8_2
DOI: https://doi.org/10.1007/978-3-031-10190-8_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10189-2
Online ISBN: 978-3-031-10190-8
eBook Packages: Mathematics and Statistics (R0)