Quantifying uncertainty in the estimation of probability distributions with confidence bands

We consider ordinary least squares parameter estimation problems where the unknown parameters to be estimated are probability distributions. A computational framework for quantification of uncertainty (e.g., standard errors) associated with the estimated parameters is given and sample numerical findings are presented.


Motivation
Development of an inverse problem computational methodology for the estimation of functional parameters in the presence of model and data uncertainty.
Applications involve the estimation of growth rate distributions in size-structured marine populations (Type II problem: aggregate or population-level longitudinal data).
Extension of the asymptotic standard error theory for finite-dimensional ordinary least squares (OLS) estimators to "functional" confidence bands that will aid in quantifying the uncertainty in estimated probability distributions.

Application: Size-Structured Shrimp Population
Use of shrimp as a scaffold organism to produce large amounts of a vaccine rapidly in response to a toxic attack on populations [Banks et al. 2006].
Joint project with ABN (Advanced Bionutrition Corporation) involving the development of a hybrid model of the shrimp biomass/countermeasure production system.
Being able to accurately model the dynamics of the size-structured shrimp population is important, since the output of the biomass model will serve as input to the vaccine production model.

Sinko-Streifer (SS) Model for Size-Structured Populations (1967)
Widely used to describe various age- and size-structured populations (cells, plants, and marine species).

Example of Aggregate Type Longitudinal Shrimp Data
Previous size-structured population data from a group in Texas indicate variability in size that could be a result of variability in growth rates and might suggest the use of the GRD model.

Growth Rate Distribution (GRD) Model (1988)
The deterministic growth model of (1) is not biologically reasonable for populations whose aggregate type longitudinal data exhibit a great deal of variability as time progresses.
The GRD model, a modification of the SS model, was developed by Banks et al. [Banks et al. 1988] to account for the variability observed in populations, such as size-structured mosquitofish populations that exhibit both dispersion and bifurcation in time.
Assumption of GRD: Individual growth rates vary across the population.

Early Growth Dynamics of Shrimp
We assume that the mortality rate and reproduction rate in (1) are both zero.
We also assume a size-dependent growth rate function for the shrimp of a form that was shown to provide reasonable fits to average size data for 50 randomly sampled shrimp [Banks et al. 2008].
The intrinsic growth rate b is a random variable taking values in a compact set B.
Analysis of previous data also suggested that the assumption of a normal distribution on the intrinsic growth rates leads to a lognormal distribution in size.
We choose a truncated normal distribution with mean µ_b and standard deviation σ_b.

Standard Parametric Approach: PAR(M, N)
In the parametric approach, we assume that we know the form of the distribution of the growth rates.
Assuming P is (absolutely) continuous (dP/db = p), the population density from the GRD model (2) is given by an integral over B against the density p, where θ ∈ R^M_+ represents the parameters (µ_b, σ_b) associated with the a priori probability density and distribution.
M represents the number of parameters in θ and N represents the number of quadrature nodes used to approximate that integral.

Parameter Estimation with PAR(M, N)
Ordinary least squares formulation (assuming a constant variance noise model).
We wish to solve for θ̂, the minimizer of the OLS cost.
We use MATLAB fmincon to determine the optimal values of θ = (µ_b, σ_b) used to generate the estimated probability density and distribution.

Confidence Intervals... Confidence Bands
Since we reduced the infinite-dimensional estimation problem to a finite-dimensional problem for θ, we are able to compute standard errors based on the established asymptotic standard error theory for OLS estimators [Seber and Wild 1989].
Standard errors are used to compute confidence intervals that quantify the uncertainty in the estimated finite-dimensional parameter θ.
How does one use the confidence intervals computed in the finite-dimensional setting to construct confidence bands in the infinite-dimensional setting?
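The finite-dimensional step (standard errors and confidence intervals for θ from the asymptotic OLS theory) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the standard asymptotic covariance approximation σ̂² (χᵀχ)⁻¹, where χ is the matrix of model sensitivities with respect to θ and σ̂² is the residual sum of squares divided by n − p. The function names, the forward-difference sensitivities, and the normal critical value 1.96 are all illustrative choices. The sketch does not address the band-construction question posed above.

```python
import math

def sensitivity_matrix(model, theta, ts, rel_step=1e-6):
    """Forward-difference approximation of chi[i][k] = d model(t_i, theta) / d theta_k."""
    base = [model(t, theta) for t in ts]
    chi = [[0.0] * len(theta) for _ in ts]
    for k in range(len(theta)):
        h = rel_step * max(abs(theta[k]), 1.0)
        shifted = list(theta)
        shifted[k] += h
        for i, t in enumerate(ts):
            chi[i][k] = (model(t, shifted) - base[i]) / h
    return chi

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols_standard_errors(model, theta_hat, ts, ys):
    """Return (standard errors, sigma2_hat) at a given OLS estimate theta_hat."""
    n, p = len(ys), len(theta_hat)
    residuals = [y - model(t, theta_hat) for t, y in zip(ts, ys)]
    sigma2 = sum(r * r for r in residuals) / (n - p)  # constant-variance estimate
    chi = sensitivity_matrix(model, theta_hat, ts)
    gram = [[sum(row[j] * row[k] for row in chi) for k in range(p)]
            for j in range(p)]                        # chi^T chi
    cov = [[sigma2 * v for v in row] for row in invert(gram)]
    return [math.sqrt(cov[k][k]) for k in range(p)], sigma2

def confidence_intervals(theta_hat, std_errs, crit=1.96):
    """Intervals theta_k +/- crit * SE_k; crit=1.96 is an assumed normal quantile."""
    return [(th - crit * se, th + crit * se) for th, se in zip(theta_hat, std_errs)]
```

In use, `model(t, theta)` would be any scalar model output evaluated at an observation point `t`, and `theta_hat` would come from whatever optimizer solved the OLS problem. For small samples a Student-t quantile with n − p degrees of freedom may be preferable to 1.96.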

Monte Carlo Sampling Study to Aid in the Design of Experiments
Goal: Determine the sampling size N_s and sampling frequency N_t needed to obtain reliable estimates of the probabilistic growth rate parameters in the GRD model (2), for experiments to be carried out at ABN and SCDNR (South Carolina Department of Natural Resources) [Banks et al. 2008].
Population data (total number of shrimp in each size class) are used in the inverse problem calculations, where ∆x is the length of the size class interval.
We simulated population data with N_s = 25, 50, 75, and 100 and with N_t set to twice a week, once a week, and once every two weeks.
Conclusions: The most desirable experiment used N_s = 100 sampled once a week; however, there appears to be little loss in accuracy if one uses N_s = 50.
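The aggregate data described above (counts of sampled shrimp per size class of width ∆x) can be illustrated with a small simulation. This is only a sketch under stated assumptions. The sizes are drawn directly from a lognormal distribution as a stand-in rather than generated by the GRD model, and the lognormal parameters, size range, and function names are hypothetical. Sizes falling outside the range are clamped into the end classes, which is an illustrative choice.

```python
import random

def size_class_counts(sizes, x_min, x_max, dx):
    """Count individuals in each size class [x_min + j*dx, x_min + (j+1)*dx)."""
    n_classes = int(round((x_max - x_min) / dx))
    counts = [0] * n_classes
    for x in sizes:
        j = int((x - x_min) // dx)
        j = min(max(j, 0), n_classes - 1)  # clamp out-of-range sizes (assumption)
        counts[j] += 1
    return counts

def simulate_sample(n_s, mu_log, sigma_log, rng):
    """Draw n_s sizes from a lognormal stand-in distribution (hypothetical parameters)."""
    return [rng.lognormvariate(mu_log, sigma_log) for _ in range(n_s)]

def replicate_counts(n_s, n_reps, mu_log, sigma_log, x_min, x_max, dx, seed=0):
    """Repeat the sampling n_reps times to see how the counts vary with n_s."""
    rng = random.Random(seed)
    return [size_class_counts(simulate_sample(n_s, mu_log, sigma_log, rng),
                              x_min, x_max, dx)
            for _ in range(n_reps)]
```

Calling `replicate_counts` for each of N_s = 25, 50, 75, and 100 gives a rough sense of how the sampling variability of the size-class counts shrinks as the sample grows. A full study would feed each replicate into the inverse problem.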

Parameter Estimation Results with ABN Data
Inverse problem with data (subsequently) collected from shrimp cultured in tanks at ABN.
Fifty shrimp were randomly sampled and measured once a week under relatively constant tank conditions.
Using our methodology, we determined estimates of the growth rate distribution and quantified the uncertainty associated with these estimates with confidence bands.
PAR(2,128) Results with Complete December Data

Summary and Ongoing Work
We have demonstrated how mathematical and statistical tools can be used to gain insight into the early growth dynamics of shrimp.
We are working on improving the fit of the model predictions to the shrimp population data by considering different parametric and non-parametric approaches [Banks and Davis 2005] in the GRD model.
Following the work of Seber and Wild, we are also working on fully developing the mathematical and asymptotic statistical theory ("functional" confidence bands) for OLS inverse problems where the parameter of interest is a probability distribution.
We would also like to determine if the confidence bands constructed in the non-parametric approximation methods (not discussed here today) are converging to some "true" smooth confidence bands.