Abstract
Combining information from different sources is an important practical problem in survey sampling. Using a hierarchical area-level model, we establish a framework for integrating auxiliary information to improve state-level area estimates. The best predictors are obtained as the conditional expectations of the latent variables given the observations, and an estimator of the mean squared prediction error is discussed. In work sponsored by the National Agricultural Statistics Service of the US Department of Agriculture, the proposed model is applied to the planted crop acreage estimation problem by combining information from three sources: the June Area Survey, obtained by probability-based sampling of land; administrative data on planted acreage; and the cropland data layer, a commodity-specific classification product derived from remote sensing data. The proposed model combines the available information at a sub-state level, called the agricultural statistics district, and aggregates the results to improve state-level estimates of planted acreage for different crops. Supplementary materials accompanying this paper appear online.
Acknowledgements
We are grateful to three referees and the Associate Editor for their constructive comments. This research was supported by the National Agricultural Statistics Service of the US Department of Agriculture.
Cite this article
Kim, J.K., Wang, Z., Zhu, Z. et al. Combining Survey and Non-survey Data for Improved Sub-area Prediction Using a Multi-level Model. JABES 23, 175–189 (2018). https://doi.org/10.1007/s13253-018-0320-2
DOI: https://doi.org/10.1007/s13253-018-0320-2