Abstract
Pattern classification using connectionist (i.e., neural network) models is viewed within a statistical framework. A connectionist network's subjective beliefs about its statistical environment are derived. This belief structure is the network's “subjective” probability distribution. Stimulus classification is interpreted as computing the “most probable” response for a given stimulus with respect to the subjective probability distribution. Given the subjective probability distribution, learning algorithms can be analyzed and designed using maximum likelihood estimation techniques, and statistical tests can be developed to evaluate and compare network architectures. The framework is applicable to many connectionist networks including those of Hopfield (1982, 1984), Cohen and Grossberg (1983), Anderson et al. (1977), and Rumelhart et al. (1986b).
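As a rough illustration of the classification idea summarized above, the following minimal sketch assumes that the subjective probability distribution takes a Gibbs form, p(x) proportional to exp(-E(x)), where E is a Hopfield-style energy function. That form, the three-unit weight matrix `W`, and the function names are illustrative assumptions and are not taken from the article. Classification is done by brute-force enumeration of the candidate responses, which is practical only for very small networks.

```python
import itertools
import math


def energy(x, W):
    """Hopfield-style energy E(x) = -1/2 * sum_ij W[i][j] * x_i * x_j."""
    n = len(x)
    return -0.5 * sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def subjective_distribution(W):
    """Assumed Gibbs form: p(x) proportional to exp(-E(x)) over x in {-1, +1}^n."""
    n = len(W)
    states = list(itertools.product((-1, 1), repeat=n))
    weights = [math.exp(-energy(x, W)) for x in states]
    z = sum(weights)  # normalizing constant
    return {x: w / z for x, w in zip(states, weights)}


def most_probable_response(stimulus, W, n_response):
    """Return the response r that maximizes p(stimulus, r).

    Maximizing the joint probability over r is equivalent to maximizing
    the conditional probability p(r | stimulus).
    """
    p = subjective_distribution(W)
    candidates = itertools.product((-1, 1), repeat=n_response)
    return max(candidates, key=lambda r: p[tuple(stimulus) + r])


# Hypothetical network: units 0 and 1 hold the stimulus, unit 2 the response.
# The symmetric weights make the response unit agree with both stimulus units.
W = [[0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]

response = most_probable_response((1, 1), W, n_response=1)
```

Under this assumed form, the most probable response is also the lowest-energy completion of the stimulus, which is one way an energy-minimizing network can be read as computing a "most probable" response.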
References
Ackley DH, Hinton GE, Sejnowski TJ (1985) A learning algorithm for Boltzmann machines. Cogn Sci 9:147–169
Anderson JA, Silverstein JW, Ritz SA, Jones RS (1977) Distinctive features, categorical perception, and probability learning: some applications of a neural model. Psychol Rev 84:413–451
Anderson JA, Golden RM, Murphy GL (1986) Concepts in distributed systems. In: Szu H (ed) Optical and hybrid computing. SPIE Vol 634, pp 260–276
Besag J (1974) Spatial interaction and the statistical analysis of lattice systems. J R Stat Soc, Ser B 36:192–236
Bierens HJ (1987) A consistent Hausman-type model specification test (unpublished)
Cohen FS, Cooper DB (1987) Simple parallel hierarchical and relaxation algorithms for segmenting noncausal Markovian random fields. IEEE Trans PAMI-9:195–219
Cohen MA, Grossberg S (1983) Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Trans SMC-13:815–825
Cox RT (1946) Probability, frequency, and reasonable expectation. Am J Phys 14:1–13
Duda RO, Hart PE (1973) Pattern classification and scene analysis. Wiley, New York
Gallager RG (1968) Information theory and reliable communication. Wiley, New York
Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans PAMI-6:721–741
Goffman C (1966) Introduction to real analysis. Harper and Row, New York
Golden RM (1986) The “brain-state-in-a-box” neural model is a gradient descent algorithm. J Math Psychol 30:73–80
Golden RM (1988) Probabilistic characterization of neural model computations. In: Anderson DZ (ed) Neural networks and information processing. AIP, New York (in press)
Greenwood D (1987) NASA JSC neural network survey results. In: Proceedings of the Space Operations, Automation and Robotics Conference. Johnson Space Center
Hanson SJ, Burr DJ (1988) Minkowski-r back-propagation: learning in connectionist models with non-euclidian error signals. In: Anderson DZ (ed) Neural networks and information processing. AIP, New York (in press)
Hinton GE (1987) Connectionist learning procedures (CMU-CS-87-115). Department of Computer Science Technical Report. Carnegie-Mellon University
Hinton GE, Anderson JA (1981) Parallel models of associative memory. Erlbaum, Hillsdale, NJ
Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA 79:2554–2558
Hopfield JJ (1984) Neurons with graded response have collective computational properties like those of two-state neurons. Proc Natl Acad Sci USA 81:3088–3092
Jennrich RI (1969) Asymptotic properties of non-linear least squares estimators. Ann Math Stat 40:633–643
Kohonen T (1984) Self-organization and associative memory. Springer, Berlin Heidelberg New York
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86
Lancaster T (1984) Note and comments: the covariance matrix of the information matrix test. Econometrica 52:1051–1053
Le Cun Y (1985) Une procédure d'apprentissage pour réseau à seuil asymétrique [A learning procedure for asymmetric threshold network]. Proc Cogn 85:599–604
Luce RD (1959) Individual choice behavior. Wiley, New York
Luenberger DG (1979) Introduction to dynamic systems: theory, models, and applications. Wiley, New York
Luenberger DG (1984) Linear and nonlinear programming. Addison-Wesley, Reading, Mass
McClelland JL, Rumelhart DE (1981) An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychol Rev 88:375–407
Marroquin JL (1985) Probabilistic solution of inverse problems. A. I. Memo 860, MIT Press, Cambridge
Montgomery DC, Peck EA (1982) Introduction to linear regression analysis. Wiley, New York
Parker DB (1985) Learning-logic (TR-47). Center for Computational Research in Economics and Management Science. MIT Press, Cambridge, Mass
Rissanen J (1983) A universal prior for integers and estimation by minimum description length. Ann Stat 11:416–431
Rumelhart DE, McClelland JL, and the PDP Research Group (1986a) Parallel distributed processing: explorations in the microstructure of cognition, vol 1: Foundations. MIT Press, Cambridge, Mass
Rumelhart DE, Hinton GE, Williams RJ (1986b) Learning internal representations by error propagation. In: Rumelhart DE, McClelland JL, and the PDP Research Group (eds) Parallel distributed processing: explorations in the microstructure of cognition, vol 1: Foundations. MIT Press, Cambridge, Mass
Rumelhart DE, Smolensky P, McClelland JL, Hinton GE (1986c) Models of schemata and sequential thought processes. In: McClelland JL, Rumelhart DE, and the PDP Research Group (eds) Parallel distributed processing: explorations in the microstructure of cognition, vol 2: Psychological and biological models. MIT Press, Cambridge, Mass
Savage LJ (1971) The foundations of statistics. Wiley, New York
Schneider W, Detweiler M (1987) A connectionist/control architecture for working memory. In: Bower G (ed) Psychology of learning and motivation, 21, pp 54–119
Schneider W, Mumme D (1987) Attention automaticity and the capturing of knowledge: a two-level cognitive architecture (unpublished)
Shepard RN (1957) Stimulus and response generalization: a stochastic model relating generalization to distance in psychological space. Psychometrika 22:325–345
Shepard RN (1986) Discrimination and generalization in identification and classification: comment on Nosofsky. J Exp Psychol 115:58–61
Smolensky P (1986) Information processing in dynamical systems: foundations of harmony theory. In: Rumelhart DE, McClelland JL, and the PDP Research Group (eds) Parallel distributed processing: explorations in the microstructure of cognition, vol 1: Foundations. MIT Press, Cambridge, Mass
Van Trees HL (1968) Detection, estimation, and modulation theory. Wiley, New York
White H (1982) Maximum likelihood estimation of misspecified models. Econometrica 50:1–25
Golden, R.M. A unified framework for connectionist systems. Biol. Cybernetics 59, 109–120 (1988). https://doi.org/10.1007/BF00317773
DOI: https://doi.org/10.1007/BF00317773