
Visual analysis for panel data imputation with Bayesian network

The Journal of Supercomputing

Abstract

A Bayesian network is built on conditional probabilities and is useful for inferring the next state of the observed variables. If data are lost or corrupted during collection or transfer, the characteristics of the original data may be distorted and biased, so values predicted by a Bayesian network built from data with missing entries are unreliable. Various statistical and machine learning techniques have been studied to repair such imperfect data, but because the complete data are unknown, there is no single optimal way to impute missing values. In this paper, we present a visual analysis system that supports decision-making when imputing missing values in panel data. The system allows data analysts to explore the causes of missing data in panel datasets. It also lets them compare the performance of candidate imputation models using Bayesian network accuracy and the Kolmogorov–Smirnov test. We evaluate how the system supports the imputation decision-making process with datasets from different domains.
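As a rough illustration of how a Kolmogorov–Smirnov comparison between imputation models might look, the sketch below scores two simple imputers on a synthetic series. It is not the system described in the abstract: the series, the 30% missing rate, the two imputers (`mean_impute`, `locf_impute`), and the choice to compare the observed values against the completed series are all assumptions made for this example. A smaller statistic suggests the completed series stays closer to the observed distribution.

```python
import bisect
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The gap between two step functions can only change at data points.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def mean_impute(values):
    """Hypothetical imputer: fill every gap with the mean of observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def locf_impute(values):
    """Hypothetical imputer: last observation carried forward.

    Leading gaps take the first observed value.
    """
    last = next(v for v in values if v is not None)
    completed = []
    for v in values:
        if v is not None:
            last = v
        completed.append(last)
    return completed


# Synthetic random-walk series (assumption: stands in for one panel variable).
rng = random.Random(0)
series = []
level = 0.0
for _ in range(500):
    level += rng.gauss(0, 1)
    series.append(level)

# Remove roughly 30% of the values completely at random (assumed mechanism).
masked = [None if rng.random() < 0.3 else v for v in series]
observed = [v for v in masked if v is not None]

# Compare the observed values with each completed series.
scores = {
    "mean": ks_statistic(observed, mean_impute(masked)),
    "locf": ks_statistic(observed, locf_impute(masked)),
}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: KS statistic = {score:.3f}")
```

The statistic alone does not say which imputer is correct, since the complete data are unknown in practice; it only flags completed series whose distribution drifts away from the observed one.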



Acknowledgements

This work was supported in part by the Basic Research Program through the National Research Foundation of Korea (NRF) funded by the MSIT under Grant 2019R1A4A1021702, and in part by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00242, Development of a Big Data Augmented Analysis Profiling Platform for Maximizing Reliability and Utilization of Big Data).

Author information

Corresponding author

Correspondence to Yun Jang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Yeon, H., Seo, S., Son, H. et al. Visual analysis for panel data imputation with Bayesian network. J Supercomput 78, 1759–1782 (2022). https://doi.org/10.1007/s11227-021-03934-x


  • DOI: https://doi.org/10.1007/s11227-021-03934-x
