Visual analysis for panel data imputation with Bayesian network

Yeon, Hanbyul; Seo, Seongbum; Son, Hyesook; Jang, Yun

doi:10.1007/s11227-021-03934-x

Published: 21 June 2021

The Journal of Supercomputing, Volume 78, pages 1759–1782 (2022)

Yun Jang ORCID: orcid.org/0000-0001-7745-1158

Abstract

A Bayesian network is derived from conditional probabilities and is useful for inferring the next state of the currently observed variables. If data are missing or corrupted during collection or transfer, the characteristics of the original data may be distorted and biased. Therefore, predictions from a Bayesian network built on data with missing values are not reliable. Various statistical and machine learning techniques have been studied to resolve such imperfections in data, but since the complete data are unknown, there is no optimal way to impute missing values. In this paper, we present a visual analysis system that supports decision-making for imputing missing values in panel data. The visual analysis system allows data analysts to explore the causes of missing data in panel datasets. The system also enables us to compare the performance of suitable imputation models using the Bayesian network accuracy and the Kolmogorov–Smirnov test. We evaluate how the visual analysis system supports the decision-making process for data imputation with datasets from different domains.
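The abstract names the Kolmogorov–Smirnov (KS) test as one criterion for comparing imputation models. The following is a minimal sketch, not the authors' system: it assumes a hypothetical toy panel (`PANEL`, unit id mapped to a time series with `None` for missing entries), two simple stand-in imputation methods (pooled mean and last observation carried forward), and a hand-rolled two-sample KS statistic that compares the originally observed values with the imputed ones. The Bayesian network accuracy criterion is not modeled here.

```python
from bisect import bisect_right

# Hypothetical toy panel: unit id -> values over time; None marks a missing entry.
PANEL = {
    "A": [1.0, None, 3.0, 4.0],
    "B": [2.0, 2.5, None, 5.0],
}


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The ECDF difference only changes at sample points, so checking them suffices.
    for x in set(a) | set(b):
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def mean_impute(panel):
    """Stand-in method: fill every gap with the pooled mean of all observed values."""
    observed = [v for series in panel.values() for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return {u: [mean if v is None else v for v in s] for u, s in panel.items()}


def locf_impute(panel):
    """Stand-in method: last observation carried forward within each unit.
    A leading gap takes the unit's first observed value; an all-missing unit stays None."""
    out = {}
    for unit, series in panel.items():
        last = next((v for v in series if v is not None), None)
        filled = []
        for v in series:
            if v is not None:
                last = v
            filled.append(last)
        out[unit] = filled
    return out


def observed_and_imputed(panel, filled):
    """Split values into originally observed ones and those supplied by imputation."""
    observed, imputed = [], []
    for unit, series in panel.items():
        for orig, new in zip(series, filled[unit]):
            if orig is not None:
                observed.append(orig)
            elif new is not None:
                imputed.append(new)
    return observed, imputed


if __name__ == "__main__":
    for name, method in [("mean", mean_impute), ("locf", locf_impute)]:
        obs, imp = observed_and_imputed(PANEL, method(PANEL))
        print(name, round(ks_statistic(obs, imp), 3))
```

A larger D means the imputed values are distributed less like the observed ones. It is best read as a diagnostic flag rather than a verdict: when data are missing at random rather than completely at random, imputed and observed distributions can legitimately differ, and with samples this small D is very coarse. In practice, `scipy.stats.ks_2samp` computes the same statistic together with a p-value.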

Acknowledgements

This work was supported in part by the Basic Research Program through the National Research Foundation of Korea (NRF) funded by the MSIT under Grant 2019R1A4A1021702, and in part by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No.2019-0-00242, Development of a Big Data Augmented Analysis Profiling Platform for Maximizing Reliability and Utilization of Big Data).

Author information

Authors and Affiliations

Sejong University, Seoul, South Korea
Hanbyul Yeon, Seongbum Seo, Hyesook Son & Yun Jang

Corresponding author

Correspondence to Yun Jang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yeon, H., Seo, S., Son, H. et al. Visual analysis for panel data imputation with Bayesian network. J Supercomput 78, 1759–1782 (2022). https://doi.org/10.1007/s11227-021-03934-x

Accepted: 04 June 2021
Published: 21 June 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s11227-021-03934-x
