Generalized Unrestricted Models (GUMs), a flexible and interpretable tool for behavioral and neural analysis

Generalized Linear Model (GLM) analysis is a popular tool in psychophysics and neuroscience for inferring the relative influence of various experimental factors on choices, reaction times, neural activity and other observables. However, GLMs are intrinsically limited by their linearity assumption, and can lead to severe misattribution errors when (correlated) regressors contribute nonlinearly to the observed response. We show how this framework can be expanded to capture nonlinear functions. First, Generalized Additive Models (GAMs) capture a nonlinear contribution for each regressor. A Gaussian Process (GP) treatment of GAMs recovers the posterior distribution of each nonlinear mapping. Second, as neuroscience is often interested in the interaction of cognitive factors, we present a Bayesian treatment of Generalized Multilinear Models (GMMs), which capture multilinear interactions between different sets of regressors. GMMs can be applied, for example, when inferring the modulation of sensory processing by additional factors such as attention. Merging the frameworks of GAMs and GMMs yields Generalized Unrestricted Models (GUMs), a highly versatile and interpretable environment to capture cognitive determinants of behavior and neural activity. Crucially, these models can be estimated efficiently, even with limited data; Bayesian techniques can be applied to test which model is best supported by the data.


Limitations of GLMs
GLMs are represented by the generative model $E[y] = \sum_i w_i x_i$. When regressors $(x_1, \ldots, x_n)$ are correlated and the true underlying model is nonlinear, significant weights can be detected for non-contributing regressors. For example, we applied linear regression to the spike count of an MT neuron during viewing of a random-dot kinematogram (Bair, Zohary, & Newsome, 2001), with motion coherence and the accuracy of the monkey's response at the end of the sensory presentation as regressors. Surprisingly, this analysis concludes that the neuron's activity depended on whether the animal would respond correctly (p<0.005, figure 1a). However, this false positive can be accounted for by the fact (figure 1b) that neural activity depends nonlinearly on motion coherence (solid blue curve), and that accuracy (red curve) correlates with the residual (gray) from the linear trend (dotted blue line). Various enhancements of the GLM framework have been proposed to correct for these limitations, but they are seldom used in cognitive psychology and neuroscience.

Figure 1: False positive in a GLM analysis induced by an incorrect linearity hypothesis. a) Result of the GLM analysis with MT neuron spike count as the dependent variable, and signed coherence and monkey accuracy on the same trial as regressors. b) Average neuron firing rate (blue curve) and monkey accuracy (red curve) as a function of signed coherence. The blue dotted line represents the linear trend, the gray line the residual.
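The misattribution described above can be reproduced on synthetic data. The following Python sketch is an illustration under stated assumptions, not the analysis of the MT recordings: the exponential tuning curve, the accuracy function, the noise level and the helper `ols_t_stats` are all hypothetical choices, and a cubic polynomial in coherence stands in for a smooth GAM term.

```python
import numpy as np

def ols_t_stats(X, y):
    """Ordinary least squares: return coefficients and their t-statistics."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof                 # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)        # coefficient covariance
    return beta, beta / np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 2000
coh = rng.uniform(-1.0, 1.0, n)                  # signed coherence (hypothetical range)
# Accuracy improves with unsigned coherence (hypothetical psychometric curve).
correct = (rng.random(n) < 0.5 + 0.45 * np.abs(coh)).astype(float)
# Simulated response: nonlinear in coherence, with NO dependence on `correct`.
rate = np.exp(1.5 * coh) + rng.normal(0.0, 0.5, n)

ones = np.ones(n)
X_lin = np.column_stack([ones, coh, correct])                  # linear model
X_nl = np.column_stack([ones, coh, coh**2, coh**3, correct])   # nonlinear in coherence
_, t_lin = ols_t_stats(X_lin, rate)
_, t_nl = ols_t_stats(X_nl, rate)

print(f"t-statistic for accuracy, linear model:    {t_lin[-1]:.2f}")
print(f"t-statistic for accuracy, nonlinear model: {t_nl[-1]:.2f}")
```

In this simulation the accuracy regressor should receive a large t-statistic in the linear model although it does not enter the simulated response, because it correlates with the U-shaped residual of the linear trend; the statistic should shrink once the nonlinear dependence on coherence is modeled.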

GAMs model each regressor as a nonlinear mapping
GAMs fit functions of the form $E[y] = \sum_i f_i(x_i)$ (Wood, 2006). The traditional treatment of GAMs is limited, however, by the need to define a set of basis functions for each regressor a priori. The GP framework instead sets a prior $f_i \sim GP(0, K)$, where $K$ is a covariance function (e.g. a Gaussian kernel) defining the expected variance and smoothness of $f_i$ (formally, the covariance for any finite set of function evaluations). An (approximate) posterior distribution $p(f_i \mid y)$ is then computed for each mapping (Adam, Hensman, & Sahani, 2016). In figure 2a we illustrate how a nonlinear function $f_2$ (red curve) is estimated (blue curve) by observing binary responses $y$ generated from a model in which $x_2$ contributes nonlinearly, and fitting the generative model $p(y) = \sigma(w_1 x_1 + f_2(x_2))$. The smoothness parameter, fitted e.g. by maximizing the evidence lower bound (ELBO), naturally adapts to the dataset: a small dataset leads to an overestimated scale parameter, which retains only the most relevant trend in $f$ (figure 2b). The GP prior also improves weight estimation for ordinal regressors (e.g. when inferring the influence of sensory samples in a sequence), compared to vanilla or L2-regularized GLMs (figure 2c).

Figure 2: a) Estimated function (blue; dotted lines represent the standard deviation of the posterior distribution) against the true function (red). b) Impact of the number of observations. With a low number of observations (left), the fitted scale of the SE kernel remains high, essentially avoiding overfitting of the function (true function: black; estimated function: blue curve). With a higher number of observations (right), the fitted scale goes down, allowing finer details to be captured. c) Comparison of model fits when binary observations (choices) derive from the weighted sum of evidence provided by sequential samples, with exponentially decaying weights (referred to as the primacy effect in the literature). Applying a GP prior (right) allows a better estimation of the weights than standard techniques such as unregularized GLM or L2-regularized GLM (i.e. ridge logistic regression, left panel).

This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0

Generalized Multilinear Models (GMMs) capture multilinear interactions between different sets of regressors (Ahrens, Linden, & Sahani, 2008). While the problem is no longer convex, the fitting procedure for the different sets of weights (which involves alternately updating one set of weights using Iteratively Reweighted Least Squares (Bishop, 2006) while leaving the others unchanged) generally gives consistent results. Compared to the full interaction model $E[y] = \sum_{i,j} w_{ij} x_i z_j$, where $x$ and $z$ are two sets of $m$ and $n$ regressors, GMMs offer more interpretable results and require much less data, as the number of parameters scales as O(m+n) and not O(mn). We illustrate the method on choice data from a perceptual accumulation 2AFC task in which human subjects had to judge whether the overall orientation of a sequence of Gabor patches was tilted leftwards or rightwards (Wyart, de Gardelle, Scholl, & Summerfield, 2012). The standard GLM for responses is $p(r) = \sigma(w_0 + \sum_i w_i x_i)$, where $x_i = \cos(\theta_i - \theta_{\mathrm{ref}})$, $\theta_i$ is the orientation of the i-th sample in the sequence and $\theta_{\mathrm{ref}}$ is the reference orientation. GMMs with four interacting sets of regressors (sensory sample, previous trial outcome, block position and subject identity) captured the modulation of sensory processing by previous trial outcome (after-error correction, figure 3, second panel) and by block position (improvement during learning followed by fatigue-induced performance decay, figure 3, third panel).
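The alternating fitting scheme described for GMMs can be sketched for the simplest bilinear case with binary observations. This is a minimal illustration, not the authors' toolbox: the model $p(y=1) = \sigma((\sum_i w_i x_i)(\sum_j u_j z_j))$, the symbol names `w`, `u`, `X`, `Z`, the one-hot coding of the modulating factor, the small ridge term (added only for numerical stability) and the normalization of `u` to unit mean are all assumptions made for this example.

```python
import numpy as np

def sigmoid(a):
    # Clipping avoids overflow in exp for extreme logits.
    return 1.0 / (1.0 + np.exp(-np.clip(a, -30.0, 30.0)))

def irls_logistic(R, y, beta, n_steps=5, ridge=1e-3):
    """A few IRLS (Newton) steps for logistic regression of y on regressors R."""
    eye = np.eye(R.shape[1])
    for _ in range(n_steps):
        p = sigmoid(R @ beta)
        s = p * (1.0 - p)                              # IRLS weights
        hess = R.T @ (R * s[:, None]) + ridge * eye
        grad = R.T @ (y - p) - ridge * beta
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def fit_bilinear(X, Z, y, n_iter=15):
    """Fit p(y=1) = sigmoid((X @ w) * (Z @ u)) by alternating IRLS updates."""
    w, u = np.zeros(X.shape[1]), np.ones(Z.shape[1])
    for _ in range(n_iter):
        # With u fixed, the model is a GLM in w with rescaled regressors.
        w = irls_logistic(X * (Z @ u)[:, None], y, w)
        # With w fixed, the model is a GLM in u.
        u = irls_logistic(Z * (X @ w)[:, None], y, u)
        # Remove the scale ambiguity (w, u) -> (c w, u / c).
        c = u.mean()
        w, u = w * c, u / c
    return w, u

# Synthetic data: 5 sensory samples, gain set by one of 3 conditions.
rng = np.random.default_rng(0)
n = 6000
w_true = np.array([0.4, 0.5, 0.6, 0.8, 1.0])           # hypothetical sample weights
u_true = np.array([1.0, 0.5, 1.5])                     # hypothetical gains (mean 1)
X = rng.normal(size=(n, w_true.size))
Z = np.eye(u_true.size)[rng.integers(0, u_true.size, n)]   # one-hot condition
y = (rng.random(n) < sigmoid((X @ w_true) * (Z @ u_true))).astype(float)

w_hat, u_hat = fit_bilinear(X, Z, y)
true_outer = np.outer(w_true, u_true)
rel_err = np.linalg.norm(np.outer(w_hat, u_hat) - true_outer) / np.linalg.norm(true_outer)
print("w_hat:", np.round(w_hat, 2))
print("u_hat:", np.round(u_hat, 2))
print("relative error on the interaction matrix:", round(rel_err, 3))
```

The example estimates 5 + 3 parameters, whereas the full interaction model would need 5 x 3. Each half-update is a standard logistic regression, which is why the procedure can reuse IRLS even though the joint problem is not convex.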

GUMs combine the framework of GAMs and GMMs
In its general form, a GUM combines sums and products of functions $f_i(x_i)$ of individual regressors, where each $f_i(x_i)$ can be substituted by $w_i x_i$ if linearity in that regressor is assumed. This offers a particularly rich and flexible framework: depending on the cognitive model one wants to test, one can fit and compare models such as $E[y] = (f_1(x_1) + w_2 x_2) f_3(x_3) + w_4 x_4$ or $E[y] = (f_1(x_1) + f_2(x_2))(f_3(x_3) + f_4(x_4)) + \sum_{i=5}^{10} w_i x_i$. Inference on functions and weights can be run using a variety of Gaussian approximations, from the Laplace approximation to variational inference and Expectation-Maximization (Rasmussen & Williams, 2006). Results on synthetic data using various observation types (binary, continuous, counts) showed that fitting is generally well behaved (not shown). We applied GUMs to the accumulation task dataset to probe the sensory-to-perceptual mapping. Instead of assuming the optimal cosine mapping as in the standard GLM, we estimated $f_1$ using the GUM $p(r_k) = \sigma(w(\mathrm{subj}(k)) + \sum_i w_i f_1(\theta_i) f_2(\mathrm{subj}(k)))$. The fitted values show that the actual sensory-to-perceptual mapping is close to the optimal cosine mapping (figure 4b), and recover the previously reported recency effect in sample weighting (figure 4a) as well as a subject-dependent lateral bias w(subj(k)) (figure 4c).

Figure 4: Fits of a GUM to participants' choices in the visual accumulation task (same data as figure 3). a) Estimated weight of visual frames as a function of frame position. b) Nonlinear mapping from orientation to weight, closely following the normative cosine form. Maximal positive weight towards right responses is achieved when the orientation of the sample is aligned with the reference orientation for right responses (tilted 45 degrees rightwards), maximal negative weight for the opposite orientation. c) Modulation by subject identity $f_2$.
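The combination of a GP-distributed mapping with linear weights can be sketched on a reduced version of the model above, $p(r) = \sigma(\sum_i w_i f(\theta_i))$, without the subject-dependent terms. This is a minimal sketch under stated assumptions rather than the toolbox implementation: angles are discretized on a hypothetical grid, the periodic squared-exponential kernel has fixed hyperparameters (the text fits them, e.g. by maximizing the ELBO), `f` and `w` are updated alternately by Newton steps towards their MAP values, and the posterior standard deviation of `f` comes from a Laplace approximation conditional on the fitted `w`. All function and variable names are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-np.clip(a, -30.0, 30.0)))

def neg_hessian(R, beta, prior_prec):
    """Negative Hessian of the log posterior of a logistic model with logit R @ beta."""
    p = sigmoid(R @ beta)
    return R.T @ (R * (p * (1.0 - p))[:, None]) + prior_prec

def newton_map(R, y, beta, prior_prec, n_steps=5):
    """Newton steps towards the MAP under a zero-mean Gaussian prior on beta."""
    for _ in range(n_steps):
        grad = R.T @ (y - sigmoid(R @ beta)) - prior_prec @ beta
        beta = beta + np.linalg.solve(neg_hessian(R, beta, prior_prec), grad)
    return beta

def design_for_f(bins, w, n_bins):
    """Design matrix A such that (A @ f)[k] = sum_i w[i] * f[bins[k, i]]."""
    A = np.zeros((bins.shape[0], n_bins))
    rows = np.arange(bins.shape[0])
    for i in range(bins.shape[1]):
        A[rows, bins[:, i]] += w[i]
    return A

# Synthetic data (all values hypothetical).
rng = np.random.default_rng(1)
n, m, G = 5000, 6, 12                      # trials, samples per trial, angle bins
theta = np.arange(G) * 2.0 * np.pi / G     # angle grid
f_true = np.cos(theta - np.pi / 4.0)       # cosine mapping, reference at 45 degrees
w_true = np.linspace(0.4, 1.0, m)          # recency: later samples weigh more
bins = rng.integers(0, G, size=(n, m))     # angle bin of each sample
y = (rng.random(n) < sigmoid(f_true[bins] @ w_true)).astype(float)

# GP prior on f over the grid: periodic SE kernel, fixed length scale of 1.
d = theta[:, None] - theta[None, :]
K = np.exp(-2.0 * np.sin(d / 2.0) ** 2) + 1e-6 * np.eye(G)
K_inv = np.linalg.inv(K)
ridge = 1e-3 * np.eye(m)                   # weak Gaussian prior on the weights

w_hat, f_hat = np.ones(m), np.zeros(G)
for _ in range(10):
    f_hat = newton_map(design_for_f(bins, w_hat, G), y, f_hat, K_inv)  # update f given w
    w_hat = newton_map(f_hat[bins], y, w_hat, ridge)                   # update w given f

# Laplace approximation: posterior standard deviation of f given w_hat.
f_sd = np.sqrt(np.diag(np.linalg.inv(neg_hessian(design_for_f(bins, w_hat, G), f_hat, K_inv))))
f_corr = np.corrcoef(f_hat, f_true)[0, 1]
print("correlation between fitted and true mapping:", round(f_corr, 3))
print("fitted weights:", np.round(w_hat, 2))
```

The overall scale is shared between `f` and `w` (only their product is constrained by the data), so the comparison with the true mapping uses a correlation rather than absolute values; the priors on `f` and `w` only weakly pin this scale.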

Conclusions
A Matlab toolbox will be made available to allow rapid, flexible use of GUMs in cognitive psychology, neuroscience and other disciplines.