Abstract
We introduce a Bayesian theoretical formulation of the statistical learning problem concerning the genetic structure of populations. The two key concepts in our derivation are exchangeability in its various forms and random allocation models. We discuss the implications of our results for the empirical investigation of population structure.
Cite this article
Corander, J., Gyllenberg, M. & Koski, T. Random Partition Models and Exchangeability for Bayesian Identification of Population Structure. Bull. Math. Biol. 69, 797–815 (2007). https://doi.org/10.1007/s11538-006-9161-1
DOI: https://doi.org/10.1007/s11538-006-9161-1