Statistical matching of sample survey data: application to integrate Iranian time use and labour force surveys

Rezaei Ghahroodi, Zahra

doi:10.1007/s10260-023-00693-2

Original Paper
Published: 12 April 2023

Volume 32, pages 1023–1051 (2023)

Statistical Methods & Applications

Zahra Rezaei Ghahroodi¹

Abstract

Survey data are still regarded as one of the main sources of official statistics. However, because of the high cost of conducting a survey and the burden it places on respondents, it may not be possible to collect all variables of interest in a single data set. One way to obtain a more comprehensive data source is to integrate data from different sources, such as existing data sets, administrative registers, and official surveys. This helps to offset the shortcomings of each source and to make the most of their advantages. In this paper, a mixed method at the micro level is applied to integrate data from two surveys, the ‘Iranian Labour Force Survey’ and the ‘Iranian Time Use Survey’, both conducted in the fall of 2015. Besides increasing the coverage of variables across the two sources, the integration also makes it possible to study the distinctive features of work and life quality. To this end, we develop a micro-level statistical matching approach that uses the conditional predictive Dirichlet distribution and the conditional predictive multinomial distribution in the regression step of the mixed method. Finally, the quality of the matching and the similarity of the marginal distributions of the variables of interest before and after integration are assessed using several similarity measures and the Kolmogorov–Smirnov test.
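
The two-step logic of a mixed method can be illustrated with a minimal sketch. It is not the paper's procedure: an ordinary least-squares line stands in for the conditional predictive Dirichlet and multinomial models, the data are simulated, and all names (`x_donor`, `z_donor`, `x_recipient`, `mixed_match`) are hypothetical. The regression step predicts an intermediate value for each recipient record, the matching step imputes the observed value of the closest donor, and a two-sample Kolmogorov–Smirnov statistic then compares the imputed and donor marginal distributions.

```python
import random
from bisect import bisect_right


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def mixed_match(x_recipient, x_donor, z_donor):
    """Regression step followed by a nearest-donor step.

    Returns, for each recipient, an observed ("live") donor value of z.
    """
    a, b = fit_line(x_donor, z_donor)
    imputed = []
    for xr in x_recipient:
        z_hat = a + b * xr  # intermediate (predicted) value
        # Pick the donor whose observed z is closest to the prediction.
        j = min(range(len(z_donor)), key=lambda k: abs(z_donor[k] - z_hat))
        imputed.append(z_donor[j])
    return imputed


def ks_statistic(s1, s2):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(s1), sorted(s2)
    d = 0.0
    for v in a + b:
        f1 = bisect_right(a, v) / len(a)
        f2 = bisect_right(b, v) / len(b)
        d = max(d, abs(f1 - f2))
    return d


# Simulated stand-ins: x is a common variable observed in both files,
# z is observed only in the donor file (assumed linear relation plus noise).
rng = random.Random(0)
x_donor = [rng.uniform(20, 60) for _ in range(200)]
z_donor = [2.0 + 0.5 * x + rng.gauss(0, 3) for x in x_donor]
x_recipient = [rng.uniform(20, 60) for _ in range(150)]

z_imputed = mixed_match(x_recipient, x_donor, z_donor)
d = ks_statistic(z_imputed, z_donor)
print(f"KS statistic between imputed and donor z: {d:.3f}")
```

A small statistic indicates that the imputed marginal distribution stays close to the donor one. Because every imputed value is an observed donor value, the fused file contains no artificial values, which is the usual motivation for adding a matching step after the regression step.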

Author information

Authors and Affiliations

School of Mathematics, Statistics and Computer Science, University of Tehran, Tehran, Iran
Zahra Rezaei Ghahroodi

Corresponding author

Correspondence to Zahra Rezaei Ghahroodi.

About this article

Cite this article

Rezaei Ghahroodi, Z. Statistical matching of sample survey data: application to integrate Iranian time use and labour force surveys. Stat Methods Appl 32, 1023–1051 (2023). https://doi.org/10.1007/s10260-023-00693-2

Accepted: 12 March 2023
Published: 12 April 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s10260-023-00693-2
