Abstract
A Bayesian network is derived from conditional probability and is useful for inferring the next state of currently observed variables. If data are lost or corrupted during collection or transfer, the characteristics of the original data may be distorted and biased, so values predicted by a Bayesian network built from incomplete data are not reliable. Various statistical and machine learning techniques have been studied to address such imperfections, but because the complete data are unknown, there is no single optimal way to impute missing values. In this paper, we present a visual analysis system that supports decision-making when imputing missing values in panel data. The system allows data analysts to explore the causes of missing data in panel datasets and to compare the performance of candidate imputation models using Bayesian network accuracy and the Kolmogorov–Smirnov test. We evaluate how the system supports the decision-making process for data imputation with datasets from different domains.
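As a rough illustration of the Kolmogorov–Smirnov comparison mentioned above, the following minimal sketch scores two simple imputation methods by the two-sample KS statistic between the observed values and the imputed values. It is not the system described in the paper: the synthetic series, the 20% missing rate, the two imputers (`impute_mean`, `impute_linear`), and the choice to compare observed against imputed values are all assumptions made for illustration, and the Bayesian network accuracy measure is not covered.

```python
import bisect
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def impute_mean(series):
    """Replace every missing value (None) with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]


def impute_linear(series):
    """Interpolate linearly between the nearest observed neighbours.

    Gaps at either end copy the nearest observed value.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for i, v in enumerate(series):
        if v is not None:
            continue
        pos = bisect.bisect_left(known, i)
        left = known[pos - 1] if pos > 0 else None
        right = known[pos] if pos < len(known) else None
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled


# Hypothetical data: one trending series with about 20% of values removed.
rng = random.Random(0)
complete = [10 + 0.05 * t + rng.gauss(0, 1) for t in range(200)]
series = [None if rng.random() < 0.2 else v for v in complete]
observed = [v for v in series if v is not None]

# Score each imputer by how far its imputed values drift from the observed ones.
scores = {}
for name, imputer in {"mean": impute_mean, "linear": impute_linear}.items():
    filled = imputer(series)
    imputed_only = [f for f, v in zip(filled, series) if v is None]
    scores[name] = ks_statistic(observed, imputed_only)

for name, score in scores.items():
    print(f"{name}: KS statistic = {score:.3f}")
```

A smaller statistic means the imputed values follow a distribution closer to that of the observed values. This is a diagnostic rather than proof of a better imputation, since the true missing values remain unknown.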
Acknowledgements
This work was supported in part by the Basic Research Program through the National Research Foundation of Korea (NRF) funded by the MSIT under Grant 2019R1A4A1021702, and in part by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00242, Development of a Big Data Augmented Analysis Profiling Platform for Maximizing Reliability and Utilization of Big Data).
Cite this article
Yeon, H., Seo, S., Son, H. et al. Visual analysis for panel data imputation with Bayesian network. J Supercomput 78, 1759–1782 (2022). https://doi.org/10.1007/s11227-021-03934-x
DOI: https://doi.org/10.1007/s11227-021-03934-x