Identification of Key Concerns and Sentiments Towards Data Quality and Data Strategy Challenges Using Sentiment Analysis and Topic Modeling

Dwivedi, Dwijendra Nath; Wójcik, Katarzyna; Vemareddyb, Anilkumar

doi:10.1007/978-3-031-10190-8_2

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Included in the following conference series:

Conference of the Section on Classification and Data Analysis of the Polish Statistical Association


Abstract

In the era of the Fourth Industrial Revolution, data and information have become valuable resources. In this data-driven economy, it is extremely important to maintain a high level of data quality. Poor data quality can be a significant business cost, and therefore data quality and consistency are a primary challenge for contemporary enterprises. There is a need for a concrete understanding of data quality, which is a key concern among data users. Hence, this study analyses Twitter data to extract sentiments and opinions from unstructured texts and to identify the key topics discussed by Twitter users. Text classification and topic modelling techniques were applied to identify positive and negative sentiments and the key themes represented in polarized texts referring to data quality. A two-step process was followed. In the first step, positive and negative sentiments were identified in Twitter feeds. In the second step, Latent Dirichlet Allocation (LDA) was applied to discover keywords in the text corpora that capture recurring themes; LDA is widely used to analyse large sets of polarized texts and to identify the most common topics quickly and efficiently. The study contributes to the text mining literature by providing a framework for analysing public sentiments. This can help to understand the key themes in negative sentiments related to data quality among machine learning practitioners. The key concerns of the public and of data users can also be highlighted and shared with the wider community.
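The two-step process described above (polarity labelling, then topic modelling of the polarized texts) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the chapter does not say which classifier it used, so step 1 substitutes a hypothetical word-list scorer (the `POSITIVE`, `NEGATIVE` and `STOPWORDS` sets are invented), and step 2 fits LDA to the negative texts with collapsed Gibbs sampling, one standard inference method for LDA. The hyperparameters `alpha`, `beta`, `iters` and the example tweets are also assumptions.

```python
import random
from collections import Counter

# Hypothetical toy lexicon and stop-word list (illustrative assumptions only).
POSITIVE = {"accurate", "reliable", "clean", "improved"}
NEGATIVE = {"missing", "duplicate", "wrong", "inconsistent", "broken", "messy"}
STOPWORDS = {"the", "a", "is", "are", "our", "in", "and", "of", "data",
             "again", "for", "no", "to", "made"}

def tokenize(text):
    """Lowercase, strip punctuation/hashtags from token ends, drop stop words."""
    words = (w.strip(".,!?#@").lower() for w in text.split())
    return [w for w in words if w.isalpha() and w not in STOPWORDS]

def polarity(tokens):
    """Step 1: label a text by lexicon hits; a zero score is 'neutral'."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    return "negative" if score < 0 else "neutral"

def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Step 2: collapsed Gibbs sampling for LDA; returns topic-word counts."""
    rng = random.Random(seed)
    v = len({w for d in docs for w in d})               # vocabulary size
    z = [[rng.randrange(k) for _ in d] for d in docs]   # topic of each token
    ndk = [[0] * k for _ in docs]                        # doc-topic counts
    nkw = [Counter() for _ in range(k)]                  # topic-word counts
    nk = [0] * k                                         # tokens per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                              # remove current assignment
                ndk[d][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]  # resample topic
                z[d][i] = t
                ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
    return nkw

def top_words(nkw, n=3):
    """Most frequent words per topic (the 'keywords' that name each theme)."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in nkw]

# Made-up example tweets, not data from the study.
tweets = [
    "Missing values and duplicate records again in our CRM data",
    "Duplicate customer records are wrong and inconsistent",
    "Our data pipeline is broken, missing fields everywhere",
    "Data governance made our reports accurate and reliable",
    "No owner for the data strategy, governance is messy",
    "Inconsistent definitions across teams, no data strategy",
]
docs = [tokenize(t) for t in tweets]
labels = [polarity(d) for d in docs]
negative_docs = [d for d, lab in zip(docs, labels) if lab == "negative"]
topic_word = lda_gibbs(negative_docs, k=2)
for j, words in enumerate(top_words(topic_word)):
    print(f"topic {j}: {words}")
```

In practice a trained sentiment classifier and a library implementation such as scikit-learn's `LatentDirichletAllocation` or gensim's `LdaModel` would replace these toy pieces; the hand-written sampler only keeps the sketch dependency-free and shows how each token's topic is resampled from document-topic and topic-word counts.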

The project has been financed by the Ministry of Science and Higher Education within the “Regional Initiative of Excellence” Programme for 2019-2022. Project no.: 021/RID/2018/19. Total financing: 11 897 131.40 PLN.



Author information

Authors and Affiliations

Cracow University of Economics, Kraków, Poland
Dwijendra Nath Dwivedi & Katarzyna Wójcik
University of Agricultural Sciences, Bangalore, India
Anilkumar Vemareddyb


Corresponding author

Correspondence to Katarzyna Wójcik.

Editor information

Editors and Affiliations

Department of Financial Investments and Risk Management, Wroclaw University of Economics and Business, Wrocław, Poland
Krzysztof Jajuga
Department of Statistics, Poznań University of Economics and Business, Poznań, Poland
Grażyna Dehnel
Department of Econometrics and Computer Science, Wroclaw University of Economics and Business, Jelenia Góra, Poland
Marek Walesiak


About this paper

Cite this paper

Dwivedi, D.N., Wójcik, K., Vemareddyb, A. (2022). Identification of Key Concerns and Sentiments Towards Data Quality and Data Strategy Challenges Using Sentiment Analysis and Topic Modeling. In: Jajuga, K., Dehnel, G., Walesiak, M. (eds) Modern Classification and Data Analysis. SKAD 2021. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-031-10190-8_2


DOI: https://doi.org/10.1007/978-3-031-10190-8_2
Published: 16 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10189-2
Online ISBN: 978-3-031-10190-8
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)
