Identification of Key Concerns and Sentiments Towards Data Quality and Data Strategy Challenges Using Sentiment Analysis and Topic Modeling

  • Conference paper
Modern Classification and Data Analysis (SKAD 2021)

Abstract

In the era of the Fourth Industrial Revolution, data and information have become valuable resources. In this data-driven economy, maintaining a high level of data quality is extremely important. Poor data quality can impose a significant business cost, so data quality and consistency are a primary challenge for contemporary enterprises. There is a need for a concrete understanding of data quality, which is a key concern among data users. The study was therefore carried out with the objective of analysing Twitter data to extract the sentiments and opinions expressed in unstructured texts and the key topics that Twitter users discuss. Text classification and topic modelling techniques were applied to identify positive and negative sentiments and the key themes represented in polarized texts referring to data quality. A two-step process was followed. In the first step, positive and negative sentiments were identified from Twitter feeds. In the second step, Latent Dirichlet Allocation was applied; this method discovers keywords in text corpora that capture recurring themes and is widely used to analyse large sets of polarized texts and identify the most common topics quickly and efficiently. The study contributes to the text mining literature by providing a framework for analysing public sentiments. This framework can help in understanding the key themes in negative sentiments related to data quality among machine learning practitioners. The key concerns of the public and of data users can also be highlighted and shared with the larger community.
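
The two-step process described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which classifier, lexicon, or LDA settings were used, so the word lists (`POSITIVE`, `NEGATIVE`, `STOPWORDS`), the example tweets, the function names, and the hyperparameters (`n_topics`, `alpha`, `beta`, `n_iter`) are all assumptions chosen for illustration. Step 1 is approximated by a simple lexicon count, and step 2 fits Latent Dirichlet Allocation with a small collapsed Gibbs sampler written with the standard library only.

```python
import random
from collections import Counter

# Hypothetical word lists for illustration; a real study would use a trained
# classifier or an established sentiment lexicon and stop-word list.
POSITIVE = {"good", "great", "clean", "reliable", "improved"}
NEGATIVE = {"poor", "bad", "missing", "dirty", "broken", "inconsistent"}
STOPWORDS = {"a", "and", "in", "our", "the", "to", "every"}

def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    words = (w.strip(".,!?#@:;\"'") for w in text.lower().split())
    return [w for w in words if w]

def polarity(text):
    """Step 1: label a text by counting positive and negative lexicon hits."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Step 2: fit LDA by collapsed Gibbs sampling; return per-topic word counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]        # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                       # tokens per topic
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(n_topics) for _ in doc]
        assign.append(zs)
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word

# Made-up example texts standing in for collected tweets.
tweets = [
    "Poor data quality broke our churn model again",
    "Missing values and inconsistent labels everywhere",
    "Great to see clean, reliable data in the new pipeline",
    "Bad labels and missing values make training unreliable",
    "Dirty data and poor governance slow every project",
]
labels = [polarity(t) for t in tweets]
negative_docs = [
    [w for w in tokenize(t) if w not in STOPWORDS]
    for t, label in zip(tweets, labels)
    if label == "negative"
]
topics = lda_gibbs(negative_docs)
for k, counts in enumerate(topics):
    top_words = [w for w, c in counts.most_common(5) if c > 0]
    print(f"topic {k}: {top_words}")
```

Running the topic model only on the texts labelled negative mirrors the abstract's focus on themes within negative sentiment. On a corpus this small the topics are not meaningful; the sketch only shows how the two steps connect.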

The project has been financed by the Ministry of Science and Higher Education within the “Regional Initiative of Excellence” Programme for 2019-2022. Project no.: 021/RID/2018/19. Total financing: 11 897 131.40 PLN.

Author information

Corresponding author

Correspondence to Katarzyna Wójcik.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Dwivedi, D.N., Wójcik, K., Vemareddyb, A. (2022). Identification of Key Concerns and Sentiments Towards Data Quality and Data Strategy Challenges Using Sentiment Analysis and Topic Modeling. In: Jajuga, K., Dehnel, G., Walesiak, M. (eds) Modern Classification and Data Analysis. SKAD 2021. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-031-10190-8_2
