A Thematic Review on Data Quality Challenges and Dimension in the Era of Big Data

  • Conference paper
Proceedings of the 12th National Technical Seminar on Unmanned System Technology 2020

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 770)

Abstract

Data quality is a primary concern for most organizations, often because databases are improperly maintained. Data obtained from various sources are frequently dirty, which reduces the accuracy of predicted results. Handling Big Data poses many challenges because it requires well-defined and precise measurement processes. These challenges stem from the characteristics of Big Data itself, where the V’s play an important role in measuring and determining data quality. Although the issue has been discussed for over 20 years, no guideline has been proposed for identifying the important dimensions of data quality in the context of Big Data. Therefore, the purpose of this systematic review is to review the literature on the issues, challenges, and dimensions of data quality in the era of Big Data using thematic review. The review included journal and conference proceedings papers from the ACM Digital Library, Scopus, and Science Direct published between 2016 and 2020. The inclusion and exclusion processes yielded 21 final articles for the review. A systematic review of these 21 articles focuses on the issues, challenges, and dimensions of data quality. The results of this study can benefit future work on the development of data quality dimensions and can serve as a guideline for researchers designing a data quality assessment framework.

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Ridzuan, F., Wan Zainon, W.M.N., Zairul, M. (2022). A Thematic Review on Data Quality Challenges and Dimension in the Era of Big Data. In: Isa, K., et al. Proceedings of the 12th National Technical Seminar on Unmanned System Technology 2020. Lecture Notes in Electrical Engineering, vol 770. Springer, Singapore. https://doi.org/10.1007/978-981-16-2406-3_56
