Abstract
Chapter 2 introduced the bias-variance tradeoff, along with approaches that regulate model complexity through some parameter λ. But how should λ be chosen? This touches on a fundamental issue in statistical model fitting and parameter estimation: we usually have only a comparatively small sample from a much larger population, yet we want to make statements about the population as a whole. If we choose a sufficiently flexible model, e.g., a local or spline regression model with many parameters, we can always achieve a perfect fit to the training data, as we already saw in Chap. 2 (see Fig. 2.5). The problem is that such a fit may say little about the true underlying population, because we may have mainly fitted noise: we have overfit the data, and consequently our model will generalize poorly to new observations not used for fitting. As a side note, what matters here is not only the nominal number of parameters but also the functional form or flexibility of the model and the constraints placed on its parameters. For instance, a (globally) linear model cannot accurately capture a nonlinear functional relationship, regardless of how many parameters it has. Or, as noted before, in basis expansions and kernel approaches, the effective number of parameters may be much smaller than the nominal number, as the variables are constrained by their functional relationships. This chapter, especially the following discussion and Sects. 4.1–4.4, largely follows the exposition in Hastie et al. (2009); see also the brief discussion in Bishop (2006), which takes a slightly different angle.
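The overfitting problem described above can be illustrated with a small simulation. The following is a minimal sketch, not an example from the chapter: the sine-shaped target function, the noise level of 0.3, the sample sizes, and the polynomial degrees are all assumptions chosen for illustration, and polynomial regression stands in for the local or spline models mentioned in the text.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)


def true_function(x):
    # Hypothetical "population" relationship, assumed for this sketch.
    return np.sin(np.pi * x)


def fit_and_score(degree, x_train, y_train, x_test, y_test):
    """Least-squares polynomial fit; returns (training MSE, test MSE)."""
    coefs = P.polyfit(x_train, y_train, degree)
    train_mse = np.mean((y_train - P.polyval(x_train, coefs)) ** 2)
    test_mse = np.mean((y_test - P.polyval(x_test, coefs)) ** 2)
    return train_mse, test_mse


# Small training sample: 10 noisy observations at evenly spaced inputs.
x_train = np.linspace(-1.0, 1.0, 10)
y_train = true_function(x_train) + rng.normal(0.0, 0.3, x_train.size)

# Large independent sample standing in for the population.
x_test = rng.uniform(-1.0, 1.0, 1000)
y_test = true_function(x_test) + rng.normal(0.0, 0.3, x_test.size)

# Degree 9 has 10 coefficients, one per training point, so it can
# interpolate the training data exactly.
results = {
    degree: fit_and_score(degree, x_train, y_train, x_test, y_test)
    for degree in (1, 3, 9)
}

for degree, (train_mse, test_mse) in results.items():
    print(f"degree {degree}: train MSE = {train_mse:.4g}, test MSE = {test_mse:.4g}")
```

The training error can only decrease as the degree grows, because each lower-degree model is nested in the higher-degree one. The error on the independent sample carries no such guarantee, which is why training error alone is a poor guide for choosing model complexity.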
References
Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Proceedings of the Second International Symposium on Information Theory, Budapest, pp. 267–281 (1973)
Allefeld, C., Haynes, J.D.: Searchlight-based multi-voxel pattern analysis of fMRI by cross-validated MANOVA. Neuroimage. 89, 345–357 (2014)
Balaguer-Ballester, E., Lapish, C.C., Seamans, J.K., Durstewitz, D.: Attractor dynamics of cortical populations during memory-guided decision-making. PLoS Comput. Biol. 7, e1002057 (2011)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
Brusco, M.J., Stanley, D.: Exact and approximate algorithms for variable selection in linear discriminant analysis. Comput. Stat. Data Anal. 55, 123–131 (2011)
Demanuele, C., Bähner, F., Plichta, M.M., Kirsch, P., Tost, H., Meyer-Lindenberg, A., Durstewitz, D.: A statistical approach for segregating cognitive task stages from multivariate fMRI BOLD time series. Front. Human Neurosci. 9, 537 (2015a)
Demanuele, C., Kirsch, P., Esslinger, C., Zink, M., Meyer-Lindenberg, A., Durstewitz, D.: Area-specific information processing in prefrontal cortex during a probabilistic inference task: a multivariate fMRI BOLD time series analysis. PLoS One. 10, e0135424 (2015b)
Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, New York (1973)
Durstewitz, D., Vittoz, N.M., Floresco, S.B., Seamans, J.K.: Abrupt transitions between prefrontal neural ensemble states accompany behavioral transitions during rule learning. Neuron. 66, 438–448 (2010)
Efron, B.: Estimating the error rate of a prediction rule: some improvements on cross-validation. J. Am. Stat. Assoc. 78, 316–331 (1983)
Efron, B., Tibshirani, R.: Improvements on cross-validation: the .632+ bootstrap method. J. Am. Stat. Assoc. 92, 548–560 (1997)
Fahrmeir, L., Tutz, G.: Multivariate Statistical Modelling Based on Generalized Linear Models. Springer, New York (2010)
Ferraty, F., van Keilegom, I., Vieu, P.: On the validity of the bootstrap in non-parametric functional regression. Scand. J. Stat. 37, 286–306 (2010a)
Ferraty, F., Hall, P., Vieu, P.: Most-predictive design points for functional data predictors. Biometrika. 97(4), 807–824 (2010b)
Friedman, J.H.: On bias, variance, 0/1-loss, and the curse-of-dimensionality. Data Mining Knowl. Discov. 1, 55–77 (1997)
Friston, K.J., Harrison, L., Penny, W.: Dynamic causal modelling. Neuroimage. 19, 1273–1302 (2003)
Garg, G., Prasad, G., Coyle, D.: Gaussian Mixture Model-based noise reduction in resting state fMRI data. J. Neurosci. Methods. 215(1), 71–77 (2013)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, 2nd edn. Springer, New York (2009)
Kass, R.E., Raftery, A.E.: Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995)
Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data. Wiley, New York (1990)
Khamassi, M., Quilodran, R., Enel, P., Dominey, P.F., Procyk, E.: Behavioral regulation and the modulation of information coding in the lateral prefrontal and cingulate cortex. Cereb. Cortex. 25(9), 3197–3218 (2014)
Knuth, K.H., Habeck, M., Malakar, N.K., Mubeen, A.M., Placek, B.: Bayesian evidence and model selection. Dig. Signal Process. 47, 50–67 (2015)
Lapish, C.C., Durstewitz, D., Chandler, L.J., Seamans, J.K.: Successful choice behavior is associated with distinct and coherent network states in anterior cingulate cortex. Proc. Natl. Acad. Sci. U S A. 105, 11963–11968 (2008)
Penny, W.D.: Comparing dynamic causal models using AIC, BIC and free energy. Neuroimage. 59, 319–330 (2012)
Penny, W.D., Mattout, J., Trujillo-Barreto, N.: Chapter 35: Bayesian model selection and averaging. In: Friston, K., Ashburner, J., Kiebel, S., Nichols, T., Penny, W. (eds.) Statistical Parametric Mapping: The Analysis of Functional Brain Images. Elsevier, London (2006)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
Stephan, K.E., Penny, W.D., Daunizeau, J., Moran, R.J., Friston, K.J.: Bayesian model selection for group studies. Neuroimage. 46, 1004–1017 (2009)
Stone, M.: Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. Ser. B. 36, 111–147 (1974)
Vincent, T., Badillo, S., Risser, L., Chaari, L., Bakhous, C., Forbes, F., Ciuciu, P.: Flexible multivariate hemodynamics fMRI data analyses and simulations with PyHRF. Front. Neurosci. 8, 67 (2014)
Watanabe, T.: Disease prediction based on functional connectomes using a scalable and spatially-informed support vector machine. Neuroimage. 96, 183–202 (2014)
Witten, D.M., Tibshirani, R.: Covariance-regularized regression and classification for high dimensional problems. J. R. Stat. Soc. Ser. B (Statistical Methodology). 71, 615–636 (2009)
Witten, D.M., Tibshirani, R.: Penalized classification using Fisher’s linear discriminant. J. R. Stat. Soc. Ser. B. 73, 753–772 (2011a)
Young, G., Householder, A.S.: Discussion of a set of points in terms of their mutual distances. Psychometrika. 3, 19–22 (1938)
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Durstewitz, D. (2017). Model Complexity and Selection. In: Advanced Data Analysis in Neuroscience. Bernstein Series in Computational Neuroscience. Springer, Cham. https://doi.org/10.1007/978-3-319-59976-2_4
DOI: https://doi.org/10.1007/978-3-319-59976-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59974-8
Online ISBN: 978-3-319-59976-2
eBook Packages: Mathematics and Statistics (R0)