Target output distribution and distribution of bias for statistical model validation given a limited number of test data

  • Research Paper
  • Structural and Multidisciplinary Optimization

Abstract

A simulation model must be validated against experimental data before it can be used with confidence to predict the outputs of engineered systems. Pointwise comparison between the simulation prediction and experimental data is not appropriate for model verification and validation (V&V), because real-world phenomena are not deterministic owing to irreducible uncertainty. The output predicted by a simulation model therefore needs to be represented by a probability density function (PDF), and statistical model validation methods are needed to compare the model prediction with physical test data. Validating a simulation model requires extraordinarily detailed test data, which are expensive to generate, so practicing engineers can afford only a very limited number of test data. This paper proposes an effective method for validating a simulation model using a target output distribution that closely approximates the true output distribution. The proposed target output distribution accounts for a biased simulation model with stochastic outputs (that is, a simulation output distribution) using limited numbers of input and output test data. Because limited test data may contain outliers or be sparse, a data quality checking process is proposed to determine whether a given set of output test data needs to be balanced. If so, stratified sampling using cluster analysis is employed to sample balanced test data. Next, Bayesian analysis is used to obtain many candidate target output distributions, from which the one at the posterior median is selected. The distribution of bias can then be identified using Monte Carlo convolution. Three engineering examples demonstrate that (1) the developed target output distribution closely approximates the true output distribution and is robust under different sets of test data; (2) the test dataset reallocated by the quality checking process and balanced sampling matches the true output distribution more closely; and (3) the distribution of bias can be used effectively to assess the model’s accuracy and confidence in comparison studies.
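The Bayesian step described in the abstract (many candidate target output distributions, with the one at the posterior median selected) can be illustrated with a minimal sketch. The code below is not the authors' implementation. It assumes a fixed-bandwidth Gaussian KDE rather than AKDE, a Gaussian prior on the bandwidth, a leave-one-out likelihood, and a grid approximation of the posterior in place of MCMC. All function names and the synthetic test data are hypothetical.

```python
import numpy as np

def loo_log_likelihood(y, h0):
    """Leave-one-out log-likelihood of a Gaussian KDE with bandwidth h0 (assumed likelihood)."""
    n = y.size
    z = (y[:, None] - y[None, :]) / h0
    k = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * h0)
    np.fill_diagonal(k, 0.0)              # each point is excluded from its own density estimate
    dens = k.sum(axis=1) / (n - 1)
    return np.sum(np.log(dens + 1e-300))  # small guard against log(0)

def posterior_median_bandwidth(y, grid, prior_mean, prior_std):
    """Approximate the bandwidth posterior on a uniform grid and return its median."""
    log_prior = -0.5 * ((grid - prior_mean) / prior_std) ** 2   # Gaussian prior (assumption)
    log_like = np.array([loo_log_likelihood(y, h) for h in grid])
    log_post = log_prior + log_like
    w = np.exp(log_post - log_post.max())  # subtract the max for numerical stability
    w /= w.sum()                           # normalized posterior weights on the grid
    idx = np.searchsorted(np.cumsum(w), 0.5)
    return grid[idx], w

def kde_pdf(y_eval, y, h0):
    """Fixed-bandwidth Gaussian KDE evaluated at y_eval."""
    z = (y_eval[:, None] - y[None, :]) / h0
    return np.exp(-0.5 * z**2).sum(axis=1) / (y.size * h0 * np.sqrt(2.0 * np.pi))

# Synthetic stand-in for a limited set of output test data (hypothetical values).
rng = np.random.default_rng(0)
y_e = rng.normal(10.0, 2.0, size=15)

# Center the prior on Silverman's rule-of-thumb bandwidth (an assumption of this sketch).
silverman = 1.06 * y_e.std(ddof=1) * y_e.size ** (-0.2)
grid = np.linspace(0.2 * silverman, 3.0 * silverman, 200)
h_med, weights = posterior_median_bandwidth(
    y_e, grid, prior_mean=silverman, prior_std=0.5 * silverman
)

# Candidate target output PDF at the posterior-median bandwidth.
y_axis = np.linspace(y_e.min() - 4.0 * h_med, y_e.max() + 4.0 * h_med, 400)
target_pdf = kde_pdf(y_axis, y_e, h_med)
```

Each bandwidth on the grid corresponds to one candidate output PDF, so taking the posterior median of the bandwidth picks out a single candidate. A sampler such as MCMC could replace the grid if the posterior were harder to tabulate.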

Abbreviations

AKDE: adaptive KDE
\( B_i(\boldsymbol{x}) \): unknown model bias for ith output response
CAE: computer-aided engineering
CDF: cumulative distribution function
DKG: dynamic kriging
\( \hat{f}(y) \): output PDF using AKDE
\( G_i(\boldsymbol{x}), G_i^{true}(\boldsymbol{x}) \): biased simulation output and true output of ith constraint
\( h(y; h_0) \): adaptive bandwidth in AKDE
\( h_0 \): global fixed bandwidth for modeling output distribution
ISFC: indicated specific fuel consumption
\( K \): kernel
KDE: kernel density estimation
\( M \): number of MCS samples
MAE: mean absolute error
MAP: maximum a posteriori probability
MCMC: Markov Chain Monte Carlo
MCS: Monte Carlo simulation
MSE: mean squared error
\( \hat{\mu}_{h_0} \), \( \hat{\sigma}_{h_0}^2 \): mean and variance of prior distribution for \( h_0 \)
UQ: uncertainty quantification
PDF: probability density function
\( P(h_0) \): prior distribution of bandwidth
\( P(h_0 \mid \boldsymbol{y}^e) \): posterior distribution of bandwidth given output data
RBDO: reliability-based design optimization
RPM: revolutions per minute
STD: standard deviation
V&V: verification and validation
\( \boldsymbol{y}_i^e, \boldsymbol{y}^e \): ith output data and output data vector
\( \boldsymbol{x}_{ik}^e \): kth element of the collected input data vector \( \boldsymbol{x}_i^e \)
\( X_i \): ith input random variable
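
For orientation, the bandwidth symbols in this list can be read against the textbook fixed-bandwidth kernel density estimator and Bayes' rule. Only the standard forms are shown. The number of output test data \( n \) and the likelihood \( P(\boldsymbol{y}^e \mid h_0) \) are symbols introduced here for illustration. The paper's adaptive bandwidth \( h(y; h_0) \) is not reproduced because its definition does not appear in this preview.

\[
\hat{f}(y) = \frac{1}{n\, h_0} \sum_{i=1}^{n} K\!\left( \frac{y - \boldsymbol{y}_i^e}{h_0} \right),
\qquad
P(h_0 \mid \boldsymbol{y}^e) \propto P(\boldsymbol{y}^e \mid h_0)\, P(h_0).
\]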

Funding

Technical and financial support was provided by the RAMDO—U.S. Army SBIR Sequential Phase II sub-contract from RAMDO Solutions, LLC.

Author information

Corresponding author

Correspondence to K. K. Choi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Responsible Editor: Byeng D Youn

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Moon, MY., Choi, K.K. & Lamb, D. Target output distribution and distribution of bias for statistical model validation given a limited number of test data. Struct Multidisc Optim 60, 1327–1353 (2019). https://doi.org/10.1007/s00158-019-02338-z

  • DOI: https://doi.org/10.1007/s00158-019-02338-z
