Combining Survey and Non-survey Data for Improved Sub-area Prediction Using a Multi-level Model

Journal of Agricultural, Biological and Environmental Statistics

Abstract

Combining information from different sources is an important practical problem in survey sampling. Using a hierarchical area-level model, we establish a framework to integrate auxiliary information to improve state-level area estimates. The best predictors are obtained as the conditional expectations of latent variables given the observations, and an estimator of the mean squared prediction error is discussed. In work sponsored by the National Agricultural Statistics Service of the US Department of Agriculture, the proposed model is applied to the planted crop acreage estimation problem by combining information from three sources: the June Area Survey, obtained by probability-based sampling of land; administrative data on planted acreage; and the cropland data layer, a commodity-specific classification product derived from remote sensing data. The proposed model combines the available information at a sub-state level, the agricultural statistics district, and aggregates it to improve state-level estimates of planted acreage for different crops. Supplementary materials accompanying this paper appear online.
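As a rough illustration of what "best predictor as a conditional expectation" means in an area-level setting, the sketch below uses a generic normal two-level model with known variances. This is not the model of the paper: the function name, the variance values, and the district numbers are all hypothetical, and summing district predictions to a state total is a simplifying assumption.

```python
def best_predictor(y, synthetic, sampling_var, model_var):
    """Conditional mean E[theta | y] under a normal area-level model.

    Assumed (hypothetical) model for one sub-area:
        y     = theta + e,      e ~ N(0, sampling_var)
        theta = synthetic + u,  u ~ N(0, model_var)
    """
    # Shrinkage weight in [0, 1]: close to 1 when the survey estimate is precise.
    gamma = model_var / (model_var + sampling_var)
    prediction = gamma * y + (1.0 - gamma) * synthetic
    # Prediction error variance when all model parameters are known.
    mspe = gamma * sampling_var
    return prediction, mspe


# Hypothetical numbers for three districts (not data from the paper).
survey = [120.0, 80.0, 200.0]         # direct survey estimates
auxiliary = [110.0, 90.0, 190.0]      # predictions built from auxiliary sources
sampling_vars = [25.0, 100.0, 400.0]  # sampling variances, assumed known
model_var = 100.0                     # between-district variance, assumed known

results = [
    best_predictor(y, s, d, model_var)
    for y, s, d in zip(survey, auxiliary, sampling_vars)
]
# Aggregate district-level predictions to a state-level total.
state_total = sum(pred for pred, _ in results)
```

Districts with a large sampling variance are pulled more strongly toward the auxiliary-based value, which is the sense in which non-survey data can stabilise the survey estimates. In practice the variance components would be estimated rather than assumed known, and the mean squared prediction error would need to account for that estimation.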

Acknowledgements

We are grateful to three referees and the Associate Editor for their constructive comments. This research was supported by the National Agricultural Statistics Service of the US Department of Agriculture.

Author information

Corresponding author

Correspondence to Jae Kwang Kim.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 4 KB)

Supplementary material 2 (pdf 167 KB)

About this article

Cite this article

Kim, J.K., Wang, Z., Zhu, Z. et al. Combining Survey and Non-survey Data for Improved Sub-area Prediction Using a Multi-level Model. JABES 23, 175–189 (2018). https://doi.org/10.1007/s13253-018-0320-2

  • DOI: https://doi.org/10.1007/s13253-018-0320-2
