Fitting a Computational Model of Perceptual Inference to Principal Component Weights of ERP Responses

The mismatch negativity (MMN), a well-studied electrophysiological response to irregularities in the sensory input stream, has often been used to examine how the brain learns the statistics of its environment. This response has also been found to be systematically altered in clinical populations such as patients with schizophrenia. These deviations in electrophysiology, however, cannot easily be linked to inter-individual differences in cognitive processing style due to the lack of direct behavioral readouts, which limits the paradigm’s usefulness for cognitive science and computational psychiatry. To bridge this gap, we present a pipeline for inferring parameters of a generative model of learning and inference, the Hierarchical Gaussian Filter (HGF), given EEG recordings obtained as part of the auditory MMN paradigm. Our pipeline includes a data-driven feature selection step as well as a proposal for mapping belief updates to the EEG features.


Introduction
As part of our daily lives, we are constantly immersed in noisy sensory information from the outside world and are required to infer upon the hidden state of the environment. The study of perceptual inference and the integration of sensory information has a long and rich history. On the cognitive level, there is a variety of models that, given experimentally acquired readouts of behavior, allow for inference upon inter-individual differences in information processing. On the physiological level, neuroimaging studies have proven useful for identifying neural signatures of perceptual inference.
A striking example is the mismatch negativity (MMN), an electrophysiological response typically observed as part of an oddball paradigm, where a sequence of identical stimuli ('standards') is eventually interrupted by a stimulus differing in one of the stimulus dimensions ('deviant'). These stimuli are usually either auditory or visual. Here, we focused on the auditory MMN. The observed deviant-induced increase in negativity in the event-related potential (ERP) between 100 and 250 ms after stimulus onset constitutes the auditory MMN and is typically assessed by subtracting the average ERPs to standard stimuli from the average ERPs to deviant stimuli.
Since its discovery by Näätänen et al. (1978), the MMN has often been used to illustrate that humans learn the statistics of their environment (Winkler, 2007) and that deviations in the MMN response can be indicative of certain pathologies. For example, patients with schizophrenia are characterized by weaker MMN amplitudes (Avissar et al., 2018;Erickson et al., 2016). Schizophrenia has also been associated with aberrant NMDA-receptor function (Stephan et al., 2009;Friston et al., 2016) which, in turn, has been proposed to relate to prediction error signaling in predictive coding frameworks related to Bayesian inference (Friston, 2005).
While the MMN is well-studied, robust, and appears to be clinically relevant, inter-individual differences in the expression of this feature cannot readily be related to inter-individual differences in cognitive processing style in the absence of direct behavioral readouts. To bridge this gap, we here seek to relate EEG recordings acquired as part of an auditory MMN paradigm to a generative model of learning and inference, the Hierarchical Gaussian Filter (HGF) (Mathys et al., 2011). Put simply, we propose an approach that allows for fitting the HGF directly to electrophysiological (as opposed to behavioral) data. The key challenges in this approach are the high level of noise in the electrophysiological data and its high dimensionality, rendering a direct mapping from beliefs of our cognitive model to observable EEG responses challenging. We therefore first engaged in data-driven feature selection based on multilinear principal component analysis (MPCA) and subsequently constructed a response model for the HGF that generates trial-wise predictions of principal component weights for recorded electrophysiological responses.

210
This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0

Subjects
We draw on data from a total of 81 healthy volunteers from a previous pharmacological EEG study which examined the effect of dopaminergic and cholinergic antagonism on MMN expression (Weber et al., in prep.). Here, participants received either amisulpride, biperiden or placebo in a betweensubjects, double-blind design.

Experimental paradigm
Participants engaged in a new variant of the roving auditory MMN paradigm, introducing different levels of volatility to the underlying statistical structure of the tone sequence. In the typical auditory MMN paradigm, following a sequence of repeating sounds, a deviant sound is presented that differs in a certain stimulus dimension (typically in pitch). In the roving MMN paradigm, this initially odd sound is being repeated and, over time, becomes the new standard. In the volatile variant employed here, the probability governing those sound flips changes over time such that, over the n = 1800 trials, some periods are more stable than others (see Figure 1).

Data acquisition and preprocessing
Electrophysiological data were recorded using a 64-channel EEG cap (EASYCAP, BrainProducts). EEG analysis was performed using SPM12 and MATLAB. The continuous EEG signal was re-referenced to the average and filtered as follows: First, a high-pass Butterworth filter with a cutoff frequency of 0.1 Hz was applied. Next, the data were down-sampled to 250 Hz. Finally, a low-pass Butterworth filter with a cutoff frequency of 30 Hz was employed. The EEG data were epoched into 550 ms long segments, beginning 100 ms prior to stimulus onset. Epochs were baseline-corrected. Following the rejection of subjects with poor signal quality, EEG data of 72 subjects were further analyzed.

Feature selection
An MPCA-based approach To reduce data dimensionality while retaining as much of the rich signal as possible, we engaged in data-driven feature selection based on multilinear principal component analysis (MPCA) (Kroonenberg & De Leeuw, 1980), an extension of PCA that allows processing matrices with more than two dimensions.
We defined the data of a single subject as containing n = 1800 measurements (trials) of (m × k) 'two-dimensional' samples, with m = 139 time points and n = 63 EEG sensors.
MPCA was carried out using the implementation of Lu et al. (2008). Given a desired number of linearly uncorrelated temporal (l 1 < m) and spatial (l 2 < k) components, MPCA decomposes the data into a weight matrix, W (of size n × l 1 × l 2 ), and two coefficient matrices: T, of size l 1 × m, for the temporal components, and S, of size l 2 × k, for the spatial components. As W is three-dimensional, a joint reconstruction in space and time can be carried out.
After having successfully validated the consistency of principal components across subjects, we carried out a single MPCA on a three-dimensional data matrix, where subjects' recordings, each of size n × m × k, are stacked along the first dimension to form an input matrix X ∈ R (n×s)×m×k , where s represents the number of subjects. We therefore interpreted data from different subjects as additional observations of the same variables. This leads to all subjects sharing the same components and differing only in their component weights.
The first ten temporal and spatial principal components obtained through this MPCA are shown in Figures 2 and 3.

Component selection
In order to select the principal components for the subsequent modelling step, we assessed how much of the relevant signal inherent in the original data is maintained when projecting the MPCA weights back to their original space using only a small subset of spatial and temporal components. We defined the relevant 'signal' as the amplitude of the negativity in the typical MMN time window at a sensor which typically shows strong MMN effects. To quantify the signal-to-noise ratio (SNR) both in the original and in the MPCA-reduced data, we assessed the maximum negative voltage within a time window of +100 to +248 ms (relative to the average voltage across all other time points of the measurement) at sensor Fz for each trial, and compared the distribution of measures for all standard vs. deviant trials using Welch's t-test statistic for unequal variances.
Beginning with the frontal electrode Fz and using the approach outlined above, we assessed the SNR of our signal when using only a single spatial and temporal component, relative to that of the original EEG data.
We found that, for most subjects, the MPCA reconstruction using the combination of spatial component 1 and temporal component 8 resulted in higher scores than the raw data (see Figure 4), whereas for all other combinations, the opposite was the case. This combination of spatial and temporal component also turned out to be the winning feature when (i) instead of the maximal negative voltage within the specified time window, the area under the curve was measured; when (ii) data from different sensors (Fz, FCz, and Oz) were assessed; and when (iii) the component was chosen with the strongest correlation between trial-by-trial component weights and the trial-by-trial precision-weighted prediction error trajectories of a Bayesoptimal agent under the HGF.

Modeling
The perceptual model We hypothesized that subjects exposed to the MMN tone sequence would infer upon two quantities: (i) the probability of hearing a high-pitched vs. low-pitched tone and (ii) the volatility of this probability, or how quickly this quantity is currently changing. As perceptual model, we therefore chose the 3-level HGF for binary inputs (Mathys et al., 2011), where the second level describes the belief about the current tendency towards a tone category (high-pitch vs. low-pitch), i.e., the current regularity in the tone sequence, and the third level captures the agent's belief about environmental volatility.
In the HGF, the agent's beliefs about both of these quantities are updated on each trial k in response to the tone input, where the magnitude of the update of the posterior mean is proportional to a precision-weighted prediction error: where ∆µ i−1 represents the prediction error measuring the difference between the prediction made for, and the actual input observed at, the level below the current one. This prediction error is then weighed by a ratio of precisions, such that more precise predictions about the level below the current one (π (k) i−1 ) lead to stronger updates, while more precise beliefs at the current level (π (k) i ) are less impacted by PE-driven updates.
On the lower hierarchical level, the mean µ 2 of the belief about the regularity is updated according to Equation 2.
where δ 1 represents the prediction error about the stimulus and σ 2 is the uncertainty (inverse precision) about the current estimate of µ 2 . Together, they form the lower-level precisionweighted PE ε 2 .
The HGF has a number of perceptual parameters that describe the individual cognitive processing style of an agent and determine the exact belief updates. For the full model specification including all parameters, please refer to Mathys et al. (2011). Here, we were particularly interested in the tonic learning rate ω 2 , which quantifies an agent's general willingness to update her beliefs irrespective of (or in addition to) her current estimate of volatility. We thus focused on inferring this parameter from the EEG data. In the following, we always fixed all other perceptual parameters to their Bayes optimal value, which is the value resulting in the least surprise over the whole input sequence.
The response model Given the MMN input sequence and a choice of parameter values for the perceptual model, the HGF generates trial-by-trial trajectories of beliefs. Based on those trajectories, we designed a response model to generate trial-by-trial EEG responses y (k) -here, corresponding to the trial-by-trial weights of the chosen principal components. As the MMN has frequently been viewed as the manifestation of a precision-weighted prediction error (Lieder et al., 2013), we sought to generate our responses (see Equation 3) based on the lower-level prediction error that is irrespective of a particular stimulus category (the absolute of δ 1 ), weighted by its precision weight (σ 2 ), as shown in Equation 3.
where η ∼ N (0, ζ), Here, the response model parameter β 0 can be understood as the offset, β 1 as the effect of our logit-transformed and precision-weighted prediction error, and η models Gaussian noise with zero mean. The logit transform (see Equation 4) was applied to map the (absolute) prediction errors, which, due to the binary nature of the inputs, lie between 0 to 1, to a range from minus to plus infinity.
Parameter recovery Examining the practical identifiability of parameters with our model, we found that the recovery of a single parameter in isolation is unproblematic but that β 1 and ω 2 are highly correlated and that high levels of noise (ζ ≥ 2000) pose problems for a successful parameter recovery. In particular, in those noise regimes, only parameter sets including reasonably high values of ω 2 and β 1 can be recovered reliably.
Fitting the data Given the difficulty of simultaneously estimating β 1 and ω 2 , we started by fixing β 1 and estimating β 0 , ω 2 as well as ζ. Figure 5 shows the results for fitting ω 2 , β 0 , and ζ while fixing β 1 = −5. We see that, for all subjects, the fitting procedure succeeded and the posterior means differed, for all three parameters (β 0 , ω 2 , and ζ), from the corresponding priors. We next systematically varied β 1 and found internal consistency in the ω 2 estimates across the range of β 1 -values that, in a previous parameter recovery, were found to be identifiable. Figure 5: Fitting β 1 , ω 2 , and ζ, with β 1 = -5. The prior means for (β 0 , ω 2 , ζ) were set to (0, -3, 500); their corresponding variances to (4, 4 4 , 4). The red lines correspond to the prior means; the y-axes indicate the parameter values.

Discussion
Overall, while our results represent a first step towards successfully estimating perceptual parameters of the HGF based on EEG recordings, the exceedingly high estimates for our noise parameter as well as the high parameter correlation between β 1 and ω 2 currently still pose restrictions on the interpretation of our results. For future applications of this pipeline, it will be important to solve these issues and potentially explore other mappings from HGF quantities to EEG features. In the future, we hope that our approach will increase the utility of the MMN paradigm for cognitive neuroscience and computational psychiatry by helping to reveal the nature and time course of perceptual inference in schizophrenia and other illnesses.