Abstract
A novel exploratory approach to the analysis of a large table of counts is developed. It uses random-effects models in which the cells of the table (representing types of individuals) form the higher level of a multilevel model. The model includes Poisson variation and an offset to model the ratio of observed to expected values, thereby permitting the analysis of relative rates. The model is estimated as a Bayesian model through MCMC procedures, and the estimates are precision-weighted so that unreliable rates are down-weighted in the analysis. Once reliable rates have been obtained, graphical and tabular analysis can be deployed. The analysis is illustrated through a study of the occupational class distribution for people of different age, birthplace-origin and generation in Australia. The case is also made that, even where there is a full census, there is a need to move beyond a descriptive analysis to a proper inferential and modelling framework. We also discuss the relative merits of Full and Empirical Bayes approaches to model estimation.
Notes
For details on the TableBuilder facility see http://www.abs.gov.au/websitedbs/censushome.nsf/home/tablebuilder (accessed July 29, 2014).
The Normality of the cell differentials is a key assumption for the validity of the variance as a summary of the differences in relative risk. It can be informally assessed with a Normal probability plot. In practice we have found that this assumption is generally met, no doubt because of the log transform. Moreover, McCulloch and Neuhaus (2011) have found that model results are generally robust to the shape of the random-effects distribution. An exception would be marked outliers for particular cells; these could be accommodated by specifying separate fixed effects for those cells, which would make them immune to shrinkage.
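The informal check mentioned in this note can be sketched as follows. This is a minimal illustration, not the procedure used in the article: the function name `normal_qq_pairs` and the plotting positions (i - 0.5)/n are assumptions, and the input is taken to be the estimated cell differentials on the log scale.

```python
from statistics import NormalDist

def normal_qq_pairs(differentials):
    """Return (theoretical quantile, ordered differential) pairs.

    Plotting these pairs gives a Normal probability plot; points lying
    close to a straight line are consistent with Normality.
    """
    n = len(differentials)
    ordered = sorted(differentials)
    std_normal = NormalDist()  # standard Normal, mean 0 and sd 1
    # Plotting positions (i - 0.5) / n: one common convention (an assumption).
    return [
        (std_normal.inv_cdf((i - 0.5) / n), value)
        for i, value in enumerate(ordered, start=1)
    ]
```

A cell whose point sits far from the line formed by the others would be a candidate for the separate fixed effect described above.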
For ease of exposition (and as per normal practice) the imprecision in the ratio is taken to depend only on the imprecision of the observed count; the expected count is assumed to be precise. A more realistic formulation is given in Talbot et al. (2011). The specific nature of the weighting for this log-Normal model is considered by Papageorgiou and Ghosh (2012, Eqs. 1–3), albeit in an empirical Bayes formulation.
The weight is a form of intraclass correlation coefficient for each cell that measures the amount of true variability (the level 2 variance) in the underlying rates relative to the total observed variability. In the measurement literature, the reliability \(w_j\) is often symbolised by \(\rho_{yy}\) to convey the internal dependency of a measured y variable.
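The precision weighting described in this note can be sketched as follows. This is a hedged illustration under stated assumptions rather than the article's exact formula: the sampling variance of a cell's log ratio is approximated by 1/O_j (a delta-method approximation for a Poisson count, with the expected count treated as precise), and the names `reliability`, `shrunken_log_rate`, `sigma2_u` and `overall_mean` are hypothetical.

```python
import math

def reliability(sigma2_u, observed):
    """Weight w_j: between-cell variance relative to total variance.

    Assumes the sampling variance of log(O_j / E_j) is about 1 / O_j
    (an approximation introduced for this sketch).
    """
    if observed <= 0:
        return 0.0  # an empty cell carries no information of its own
    return sigma2_u / (sigma2_u + 1.0 / observed)

def shrunken_log_rate(observed, expected, sigma2_u, overall_mean=0.0):
    """Precision-weighted log relative rate for one cell."""
    if observed <= 0:
        return overall_mean  # shrink fully to the overall mean
    raw = math.log(observed / expected)
    w = reliability(sigma2_u, observed)
    # Weighted average of the cell's raw value and the overall mean.
    return w * raw + (1.0 - w) * overall_mean
```

With a level 2 variance of 0.25, a cell with 4 observed cases gets a weight of 0.5, so its raw log ratio is pulled halfway to the overall mean, whereas a cell with 400 cases gets a weight of about 0.99 and is left almost unchanged.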
The between cell variance at level 2 summarizes the differences between cells, but usefully it is not the variance of the shrunken differentials, but the variance of the raw differentials. Consequently it is not the estimated between group variance of the sample, but the estimated between-group variance in the population.
For a more general discussion of these advantageous properties see the classic papers of James and Stein (1961) and Lindley and Smith (1972). Their benefits are extolled in Kendall’s (1959) ‘song’, and in the expository paper of Efron and Morris (1977), which studies baseball averages and disease distributions.
The estimate of pD is given by the difference between the average deviance and the deviance at the expected value of the unknown parameters.
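The definition of pD in this note can be sketched for a Poisson model as follows. This is an illustrative sketch only: the layout of `draws` (one list of per-cell log means per MCMC iteration, with the offset already included) and the function names are assumptions, not the output format of any particular software.

```python
import math

def poisson_deviance(counts, log_means):
    """-2 * Poisson log-likelihood given per-cell log means."""
    loglik = 0.0
    for y, eta in zip(counts, log_means):
        loglik += y * eta - math.exp(eta) - math.lgamma(y + 1)
    return -2.0 * loglik

def effective_parameters(counts, draws):
    """pD = average deviance - deviance at the posterior mean.

    `draws` is a list of MCMC draws; each draw holds one log mean
    per cell (hypothetical input layout).
    """
    n_draws = len(draws)
    n_cells = len(counts)
    # Average of the deviance over the posterior draws.
    mean_deviance = sum(poisson_deviance(counts, d) for d in draws) / n_draws
    # Deviance evaluated at the posterior mean of each cell's log mean.
    eta_bar = [sum(d[j] for d in draws) / n_draws for j in range(n_cells)]
    return mean_deviance - poisson_deviance(counts, eta_bar)
```

Because the Poisson deviance is convex in the log mean, this quantity is non-negative here, and it grows as the posterior draws become more dispersed.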
References
Bell, A., Jones, K.: Explaining fixed effects: random effect modelling of time series, cross-sectional and panel data. Political Sci. Res. Methods (2014). doi:10.1017/psrm.2014.7
Bell, A., Jones, K.: Bayesian informative priors with Yang and Land’s hierarchical age-period-cohort model. Qual. Quant. (2014, in press). doi:10.1007/s11135-013-9985-3
Bernardinelli, L., Clayton, D., Montomoli, C.: Bayesian estimates of disease maps: how important are priors? Stat. Med. 14, 2411–2431 (1995)
Bernardinelli, L., Montomoli, C.: Empirical Bayes versus fully Bayesian analysis of geographical variation in disease risk. Stat. Med. 11, 983–1007 (1992)
Best, N., Richardson, S., Thomson, A.: A comparison of Bayesian spatial models for disease mapping. Stat. Methods Med. Res. 14, 35–59 (2005)
Borjas, G.J.: Ethnic capital and intergenerational mobility. Q. J. Econ. 107, 123–150 (1992)
Boyd, M., Grieco, E.: Triumphant transitions: socioeconomic achievement of the second generation in Canada. Int. Migr. Rev. 32, 853–876 (1998)
Breslow, N.E., Day, N.E.: Indirect standardization and multiplicative models for rates, with reference to the age adjustment of cancer incidence and relative frequency data. J. Chronic Dis. 28, 289–303 (1975)
Breslow, N.E., Day, N.E.: Statistical Methods in Cancer Research, Volume II: The Design and Analysis of Cohort Studies. International Agency for Research on Cancer, Lyon (1987)
Browne, W.J.: MCMC Estimation in MLwiN, v2.25. Centre for Multilevel Modelling, University of Bristol, Bristol, available at http://www.bristol.ac.uk/cmm/software/mlwin/download/manuals.html (2012)
Browne, W.J., Subramanian, S.V., Jones, K., Goldstein, H.: Variance partitioning in multilevel logistic models that exhibit over-dispersion. J. R. Stat. Soc. A 168, 599–614 (2005)
Clayton, D., Kaldor, J.: Empirical Bayes estimates of age-standardized relative risk for use in disease mapping. Biometrics 43, 671–681 (1987)
Draper, D.: Bayesian multilevel analysis and MCMC. In: de Leeuw, J., Meijer, E. (eds.) Handbook of Multilevel Analysis, pp. 77–139. Springer, New York (2008)
Efron, B., Morris, C.: Stein’s paradox in statistics. Sci. Am. 237, 119–127 (1977)
Forrest, J., Hermes, K., Johnston, R., Poulsen, M.: The housing resettlement of refugee immigrants to Australia. J. Refug. Stud. 20, 187–206 (2013)
Forrest, J., Poulsen, M., Johnston, R.: A ‘multicultural model’ of the spatial assimilation of ethnic minority groups in Australia’s major immigrant-receiving cities. Urban Geogr. 27, 451–463 (2006)
Gelman, A.: How Bayesian analysis cracked the red-state, blue-state problem. Stat. Sci. 29(1), 26–35 (2014)
Gelman, A., Hill, J., Yajima, M.: Why we (usually) don’t have to worry about multiple comparisons. J. Res. Educ. Eff. 5, 189–211 (2012)
Goldstein, H.: Multilevel Statistical Models, 4th edn. Wiley, Chichester (2011)
Gorard, S.: Research Design: Robust Approaches for the Social Sciences. Sage, London (2013)
Hawthorne, L.: “Picking winners”: the recent transformation of Australia’s skilled migration policy. Int. Migr. Rev. 39, 663–696 (2005)
Ho, C.: From social justice to social cohesion: a history of Australian multicultural policy. In: Jakubowicz, A., Ho, C. (eds.) For Those Who’ve Come Across the Seas: Australian Multicultural Theory, Policy and Practice, pp. 31–44. Australian Scholarly Publishing, North Melbourne (2013)
James, W., Stein, C.: Estimation with quadratic loss. Proc. Fourth Berkeley Symp. Math. Stat. Probab. 1, 361–379 (1961)
Jones, H.E., Spiegelhalter, D.J.: The identification of ‘unusual’ health-care providers from a hierarchical model. Am. Stat. 65(3), 154–163 (2011)
Jones, K., Bullen, N.: Contextual models of urban house prices: a comparison of fixed- and random-coefficient models developed by expansion. Econ. Geogr. 70, 252–272 (1994)
Jones, K., Kirby, A.: The use of chi-square maps in the analysis of census data. Geoforum 11, 409–417 (1980)
Jones, K., Subramanian, S.V.: Developing Multilevel Models for Analysing Contextuality, Heterogeneity and Change. Centre for Multilevel Modelling, University of Bristol, Bristol, available at http://www.bristol.ac.uk/cmm/software/mlwin/mlwin-resources.html (2014)
Jupp, J. (ed.): The Australian People: An Encyclopedia of the Nation, its Peoples and their Origins. Cambridge University Press, Oakleigh (2001)
Kendall, M.G.: Hiawatha designs: an experiment. Am. Stat. 13, 23–24 (1959)
Leckie, G., Pillinger, R., Jones, K., Goldstein, H.: Multilevel modelling of social segregation. J. Educ. Behav. Stat. 37, 3–30 (2012)
Leyland, A.H., Davies, C.A.: Empirical Bayes methods for disease mapping. Stat. Methods Med. Res. 14, 17–34 (2005)
Lindley, D., Smith, A.: Bayes estimates for the linear model. J. R. Stat. Soc. B 34, 1–41 (1972)
McCullagh, P., Nelder, J.A.: Generalized Linear Models. Chapman and Hall, London (1989)
McCulloch, C.E., Neuhaus, J.M.: Misspecifying the shape of a random effects distribution: why getting it wrong may not matter. Stat. Sci. 26, 388–402 (2011)
Owen, D., Jones, K.: Geographical inequalities in mortality: a model-based approach to analysing fine-grained differences over time: England and Wales, 2002–2012, in preparation (2014)
Papageorgiou, G., Ghosh, M.: Estimation of small area event rates and of the associated standard errors. J. Stat. Plan. Inference 142, 2009–2016 (2012)
Portes, A., Zhou, M.: The new second generation: segmented assimilation and its variants. Ann. Am. Acad. Political Soc. Sci. 530, 74–96 (1993)
Rasbash, J., Charlton, C., Browne, W.J., Healy, M., Cameron, B.: MLwiN Version 2.1. Centre for Multilevel Modelling, University of Bristol, Bristol (2009)
Rodriguez, G., Goldman, N.: An assessment of estimation procedures for multilevel models with binary responses. J. R. Stat. Soc. A 158, 73–90 (1995)
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., van der Linde, A.: Bayesian measures of model complexity and fit. J. R. Stat. Soc. B 64, 583–640 (2002)
Steenburgh, T.J., Ainslie, A., Engebretson, P.H.: Massively categorical variables: revealing the information in zip codes. Mark. Sci. 22, 40–57 (2003)
Subramanian, S.V., Duncan, C., Jones, K.: Multilevel perspectives on modelling census data. Environ. Plan. A 33, 399–417 (2001)
Sweetman, A., van Ours, J.C.: Immigration: what about the children and grandchildren? IZA Discussion Paper 7919. Institute for the Study of Labour, Bonn (2014)
Talbot, D., Duchesne, T., Brisson, J., Vandal, N.: Variance estimation and confidence intervals for the standardized mortality ratio with application to the assessment of a cancer screening program. Stat. Med. 30, 3024–3037 (2011)
Cite this article
Jones, K., Owen, D., Johnston, R. et al. Modelling the occupational assimilation of immigrants by ancestry, age group and generational differences in Australia: a random effects approach to a large table of counts. Qual Quant 49, 2595–2615 (2015). https://doi.org/10.1007/s11135-014-0130-8
DOI: https://doi.org/10.1007/s11135-014-0130-8