Bayesian Model for Multisensory Integration and Segregation

Multisensory integration and segregation are important for processing perceived information in animals. Experimental data indicate that the brain processes information in a Bayesian way. We consider a recently proposed model that is able to perform both multisensory integration and segregation concurrently using congruent and opposite groups of neurons in each sensory module. By incorporating output-dependence in the noise of the neural dynamics, we show that the model is able to yield estimates in excellent agreement with Bayesian inference in the weak input limit, and in fairly good agreement for stronger inputs. When the prior consists of a correlated component and an independent component, we show that Bayesian inference can be achieved by incorporating an additional layer of neuron groups.


Introduction
Our brains process information from different sensory modalities. If two cues are received from the same source, the neural system will integrate those sensory signals. Otherwise, they should be segregated. Experimental data suggest that the brain can integrate visual and vestibular cues to infer heading direction according to Bayesian prediction (Fetsch, DeAngelis, & Angelaki, 2013). In the dorsal medial superior temporal (MSTd) area and the ventral intraparietal (VIP) area, there exist two types of neurons, congruent and opposite cells (Gu, Angelaki, & DeAngelis, 2008; Chen, DeAngelis, & Angelaki, 2013). Here, we consider a recently proposed model in which the congruent and opposite neurons play a role in multisensory integration and segregation respectively (Zhang et al., 2019). In the model, the neural circuit consists of two modules, each containing two groups of excitatory neurons: congruent and opposite neurons. It was shown that the proposed network yields the Bayesian posterior estimate in a broad range of parameters, but there are also parameter ranges in which the inference is only approximately Bayesian. Hence, in this paper we approach the dynamics analytically and propose improvements for achieving Bayesian inference. Furthermore, the Bayes-optimality in Zhang et al. was based on a prior distribution of stimuli that is fully correlated. In practice, there are many other scenarios described by priors with more than one component. For example, studies in causal inference consider prior distributions with a correlated and an independent component (Körding et al., 2007; Shams & Beierholm, 2010). In the second half of the paper, we propose a neural circuit with additional modules to tackle these cases.

Network Model
We consider a neural network model (Zhang et al., 2019) receiving external inputs of modalities 1 and 2, $I^{\mathrm{ext}}_m(y,t)$, $m = 1, 2$, where $y$ is an angular variable in the range $(-\pi, \pi]$ and $t$ is time. The two inputs are fed into two separate modules. Each module has two groups of neurons, congruent and opposite, each neuron having a preferred stimulus $y$. The recurrent connections within each group are excitatory and depend on the preferred stimuli of the neurons through a bump-shaped function of the separation of the stimulus positions, and there are global inhibitory connections connecting both groups, thus forming a continuous attractor neural network (Fung, Wong, & Wu, 2010). The congruent groups of each module are connected in a congruent manner, that is, neurons receiving inputs at position $x$ of each module are reciprocally connected to each other. Likewise, the opposite groups of each module are connected in an opposite manner, that is, neurons receiving inputs at position $x$ of one module are reciprocally connected to those at position $x + \pi$. Let $\psi_m(x,t)$ and $\bar{\psi}_m(x,t)$ be the synaptic input at position $x$ and time $t$ for the congruent and opposite groups respectively in module $m$, and denote as $\bar{m}$ the other module of module $m$. Then the neuronal dynamics of the congruent group is given by Eq. (1), where $J_{rc}$ and $J_{rp}$ represent the strengths of the recurrent and reciprocal couplings respectively. $R_m(y,t)$ is the firing rate of the neurons at position $y$ and time $t$. It is given by $R_m(y,t) \equiv \ldots$, where [...] is the global inhibition acting on the congruent group in module $m$. $V(y - y', a_0)$ is the von Mises function given by

$$V(y - y', a_0) = \frac{\exp[a_0 \cos(y - y')]}{2\pi I_0(a_0)},$$

where $a_0$ is referred to as the concentration of the von Mises function, and $I_0(a_0)$ is the modified Bessel function of order 0 introduced to normalize the von Mises function. In Eq. (1), we assign the external inputs to be the sum of a bump with constant background, plus its noisy component characterized by the Fano factor $F_0$, $I^{\mathrm{ext}}_m(y,t) = I_m V(y - x_m, \ldots) + \ldots$, where $\epsilon_m(y,t)$ is Gaussian white noise of zero mean and variance satisfying $\langle \epsilon_m(y,t)\, \epsilon_{m'}(y',t') \rangle = \delta_{mm'}\, \delta(y - y')\, \delta(t - t')$.
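As a small illustration of the normalized von Mises function described above, the following sketch evaluates it with the standard library only and checks numerically that it integrates to one over $(-\pi, \pi]$. The function names `bessel_i0`, `von_mises` and `integrate_periodic`, the series truncation and the grid size are illustrative assumptions, not part of the model.

```python
import math


def bessel_i0(a, terms=60):
    """Modified Bessel function of order 0 from its power series
    sum_k (a/2)^(2k) / (k!)^2 (truncation length is an assumption)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def von_mises(y, a0):
    """V(y, a0) = exp(a0 cos y) / (2 pi I0(a0)); y is the angular separation."""
    return math.exp(a0 * math.cos(y)) / (2.0 * math.pi * bessel_i0(a0))


def integrate_periodic(f, n=2000):
    """Rectangle rule on a uniform grid covering (-pi, pi]."""
    dy = 2.0 * math.pi / n
    return sum(f(-math.pi + (i + 1) * dy) for i in range(n)) * dy


if __name__ == "__main__":
    # A bump centred at x = 0.5 with concentration a0 = 3 (arbitrary values).
    area = integrate_periodic(lambda y: von_mises(y - 0.5, 3.0))
    print("integral of the bump over one period:", area)
```

Dividing by $2\pi I_0(a_0)$ is what makes the bump a probability density on the circle, so that the input strength $I_m$ alone sets the total input.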
These equations can be solved by numerical simulations to obtain the means and variances of the firing rates. In the framework of probabilistic population coding (Ma, Beck, Latham, & Pouget, 2006), the posterior estimates of the external inputs can be derived from these quantities. Using the projection method (Fung et al., 2010), we first attempted first order perturbation using the von Mises function and its derivative as the basis. They represent distortions of the height and position of the bump-shaped firing rate distributions respectively, but the calculated means and variances of the distributions deviated from the numerical results. A careful inspection of the firing rate distributions showed that their profiles were not calculated accurately. Hence, higher order perturbations describing the distortions in the height, position, width and skewness have to be introduced. We approximate the solution to the dynamical equations to be

$$\psi_m = u_{m0} + u_{m1} \cos(y - s_m) + u_{m2} \cos 2(y - s_m) + u_{m3} \sin 2(y - s_m), \quad m = 1, 2.$$

The background, height, position, width and skewness are largely determined by the coefficients $u_{m0}$, $u_{m1}$, $s_m$, $u_{m2}$ and $u_{m3}$ respectively. Multiplying both sides of Eqs. (1) and (4) by $1$, $\cos(y - s_m)$, $\cos 2(y - s_m)$ and $\sin 2(y - s_m)$ in turn and integrating over $y$, we obtain the steady state equations for this set of coefficients after averaging over noise. By considering the linear perturbation around the steady state, the variance $\hat{\sigma}_m^2$ of the peak positions $\hat{s}_m$ can be found.
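The projection step can be illustrated on a sampled profile. The sketch below is not the analytical derivation of the paper: it only shows, under the assumption that the profile is sampled on a uniform periodic grid and that the height coefficient is positive, how multiplying by the basis functions and integrating over $y$ recovers the five coefficients. The names `grid`, `ansatz` and `project_bump` are hypothetical.

```python
import cmath
import math


def grid(n):
    """Uniform grid of n preferred stimuli covering (-pi, pi]."""
    dy = 2.0 * math.pi / n
    return [-math.pi + (i + 1) * dy for i in range(n)]


def ansatz(y, u0, u1, s, u2, u3):
    """u0 + u1 cos(y-s) + u2 cos 2(y-s) + u3 sin 2(y-s)."""
    d = y - s
    return u0 + u1 * math.cos(d) + u2 * math.cos(2 * d) + u3 * math.sin(2 * d)


def project_bump(psi, ys):
    """Recover (u0, u1, s, u2, u3) from samples psi[i] = psi(ys[i]).

    The position s is read off from the phase of the first harmonic,
    which assumes u1 > 0; the remaining coefficients are projections
    onto 1, cos 2(y-s) and sin 2(y-s).
    """
    dy = 2.0 * math.pi / len(ys)
    u0 = sum(psi) * dy / (2.0 * math.pi)
    c1 = sum(p * cmath.exp(1j * y) for p, y in zip(psi, ys)) * dy / math.pi
    u1, s = abs(c1), cmath.phase(c1)
    u2 = sum(p * math.cos(2 * (y - s)) for p, y in zip(psi, ys)) * dy / math.pi
    u3 = sum(p * math.sin(2 * (y - s)) for p, y in zip(psi, ys)) * dy / math.pi
    return u0, u1, s, u2, u3


if __name__ == "__main__":
    ys = grid(64)
    true = (0.2, 1.0, 0.3, 0.1, -0.05)  # arbitrary illustrative coefficients
    psi = [ansatz(y, *true) for y in ys]
    print(project_bump(psi, ys))
```

On a uniform periodic grid the rectangle rule integrates low-order trigonometric polynomials exactly, so the recovered coefficients match the ones used to build the profile up to rounding.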

Bayesian Inference
Consider the task of inferring the stimuli $s_m$ ($m = 1, 2$) from the received cues $x_m$ ($m = 1, 2$). It has been shown that for uniform distributions of $s_m$ and cues $x_m$, the condition for the network to produce Bayesian inference is that the marginal posterior distribution of $s_1$ conditional on the direct cue $x_1$ and indirect cue $x_2$ is given by (Zhang et al., 2019)

$$p(s_1|x_1, x_2) \propto p(s_1|x_1)\, p(s_1|x_2), \tag{6}$$

where the marginal posterior distribution $p(s_1|x_2)$ given the indirect cue $x_2$ depends on the prior distribution $p(s_1, s_2)$ via

$$p(s_1|x_2) \propto \int ds_2\, p(x_2|s_2)\, p(s_1, s_2). \tag{7}$$
Hence, to verify whether the congruent groups of the proposed network are able to make Bayesian predictions, one may use them to estimate the posterior distributions of $s_1$ when the network receives cue 1 only, cue 2 only, and cues 1 and 2 combined, and test whether the result for the combined cues agrees with that predicted from the single cues according to Eq. (6).
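The test described above can be sketched in a few lines if one assumes that both single-cue posteriors are von Mises distributions, with concentrations written here as `kappa1` and `kappa2s` (the latter standing for the concentration of the posterior given the indirect cue). Under that assumption the product rule reduces to adding the vectors $\kappa e^{js}$ in the complex plane. The function names are hypothetical.

```python
import cmath
import math


def combine_cues(kappa1, s1, kappa2s, s2):
    """Bayesian prediction for combined cues from two single-cue estimates.

    Assumes both single-cue posteriors are von Mises, so their product is
    von Mises with kappa * exp(j s) equal to the sum of the two vectors.
    """
    z = kappa1 * cmath.exp(1j * s1) + kappa2s * cmath.exp(1j * s2)
    return abs(z), cmath.phase(z)


def log_product(s, kappa1, s1, kappa2s, s2):
    """Log of the unnormalized product of the two single-cue posteriors."""
    return kappa1 * math.cos(s - s1) + kappa2s * math.cos(s - s2)


if __name__ == "__main__":
    # Arbitrary illustrative single-cue estimates.
    kappa, s_hat = combine_cues(2.0, 0.3, 1.0, -0.4)
    # The exponent of the product equals kappa * cos(s - s_hat) at any s.
    for s in (-2.0, 0.0, 1.5):
        print(s, log_product(s, 2.0, 0.3, 1.0, -0.4), kappa * math.cos(s - s_hat))
```

In a comparison with simulations, the pair returned by `combine_cues` would be the Bayesian prediction, to be set against the mean and concentration measured when both cues are presented together.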
We first consider a prior in which the stimuli $s_1$ and $s_2$ are correlated. In particular, we consider the prior

$$p(s_1, s_2) = V(s_1 - s_2, \kappa_s). \tag{8}$$
The subscripts represent the non-vanishing stimuli applied to the network, and the hat accents represent the network estimates.

Figure 2: Network architecture for priors with correlated and independent components. Parallel arrows represent congruently connected couplings. Crossed arrows represent couplings shifted by $\pi$.
To segregate the information from the two cues, we consider the disparity information of stimulus 1 defined in Eq. (11). Noting that the cosine function satisfies $\cos(y - y' - \pi) = -\cos(y - y')$, we obtain Eq. (12). Hence, the mean $\Delta\hat{s}_1$ and concentration $\Delta\hat{\kappa}_1$ of the disparity information, to be estimated by the opposite groups of neurons, are given by

$$\Delta\hat{\kappa}_1 e^{j\Delta\hat{s}_1}\big|_{I_1, I_2} = \hat{\kappa}_1 e^{j\hat{s}_1}\big|_{I_1} - \hat{\kappa}_{2s} e^{j\hat{s}_2}\big|_{I_2}. \tag{13}$$

Equations (10) and (13) are used to generate Bayesian predictions from the single-input estimates, which are then compared with the combined estimates generated from network simulations (the estimated concentration is compared with the inverse of the variance). As shown in Fig. 1, we find that the network can implement Bayesian inference in the weak input limit. When the inputs are strong, the network prediction starts to deviate from Bayesian inference, but the estimates remain reasonably close. In contrast, in the original model of Zhang et al., $s_m$ and $\kappa_m$ cannot both be estimated accurately at the same time. This shows that the incorporation of $\hat{A}$ in the noise amplitude improves the accuracy of Bayesian prediction.
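The vector subtraction in Eq. (13) can be illustrated as follows. The sketch assumes that the single-cue estimates are available as (concentration, mean) pairs, and it checks that subtracting the indirect-cue vector is the same as adding that vector with its position shifted by $\pi$, which is the role played by the opposite connections. The function name `disparity` and the numerical values are illustrative assumptions.

```python
import cmath
import math


def disparity(kappa1, s1, kappa2s, s2):
    """Concentration and mean of the disparity information, Eq. (13):
    the direct-cue vector minus the indirect-cue vector."""
    z = kappa1 * cmath.exp(1j * s1) - kappa2s * cmath.exp(1j * s2)
    return abs(z), cmath.phase(z)


if __name__ == "__main__":
    k1, s1, k2s, s2 = 2.0, 0.3, 1.0, -0.4  # arbitrary single-cue estimates
    dk, ds = disparity(k1, s1, k2s, s2)

    # Same result from adding the indirect cue shifted by pi,
    # since cos(y - y' - pi) = -cos(y - y').
    z_shift = k1 * cmath.exp(1j * s1) + k2s * cmath.exp(1j * (s2 + math.pi))
    print((dk, ds), (abs(z_shift), cmath.phase(z_shift)))
```

When the two cues agree in position, the subtraction shrinks the resulting vector, so the disparity estimate carries little weight; it grows as the cues move apart.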

Priors with an Independent Component
So far we have considered the prior in Eq. (8), in which the two stimuli are correlated. However, there are many other scenarios described by priors with an additional independent component. Those priors are often used in causal inference tasks, in which the subject is required to determine whether the two cues originate from the same stimulus or are independent (Körding et al., 2007; Shams & Beierholm, 2010). Hence, we consider the two-component prior of Eq. (14). Using Eq. (7), the marginal posterior distribution $p(s_1|x_2)$ becomes Eq. (15), $p(s_1|x_1, \ldots$, where $C$ is the normalization constant. Hence, we add another layer of neurons to represent the posterior taking account of the two components of the prior. The second layer receives the input from congruent neurons (representing the first term in Eq. (15)), and the feedforward inputs from the cue (corresponding to the second term in Eq. (15)). The dynamics of this group of neurons is given by Eq. (16). Note that the neuron groups in the second layer do not have reciprocal connections from the other module. Hence, their output will be the weighted sum of the two types of input. Thus, for the case of combined cues, the output of the congruent group in module 1 will become $p_0 \hat{\kappa}_1 e^{j\hat{s}_1} + (1 - p_0) \kappa_1 e^{jx_1}$ for an appropriate choice of $c_k$, where $\kappa_1$ is the concentration of the output from the neuron group in the second layer.
Meanwhile, $\kappa_1$ does not change when this network only receives direct stimulus 1. The output will then be $p_0 \kappa_1 e^{jx_1} + (1 - p_0) \kappa_1 e^{jx_1}$. When the network only receives stimulus 2, the final output of the congruent group will be $p_0 \hat{\kappa}_{2s} e^{jx_2}$. In summary, the network behaves in a Bayesian way in all cases. Figure 3 shows the vector diagram for achieving information integration.
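The three cases above can be written as one weighted vector sum. The sketch below is only an illustration of that bookkeeping, not of the second-layer dynamics: `second_layer_output` is a hypothetical helper, `p0` is the weight of the correlated component, and an absent feedforward cue is represented by setting its concentration to zero, which is an assumption of this example.

```python
import cmath


def second_layer_output(p0, kappa_hat, s_hat, kappa_ff, x_ff):
    """Weighted sum p0 * kappa_hat e^{j s_hat} + (1 - p0) * kappa_ff e^{j x_ff}.

    (kappa_hat, s_hat): estimate passed on by the first-layer congruent group.
    (kappa_ff, x_ff): feedforward input from the direct cue (kappa_ff = 0
    when the direct cue is absent).
    Returns the concentration and mean of the resulting vector.
    """
    z = p0 * kappa_hat * cmath.exp(1j * s_hat) \
        + (1.0 - p0) * kappa_ff * cmath.exp(1j * x_ff)
    return abs(z), cmath.phase(z)


if __name__ == "__main__":
    p0 = 0.6  # arbitrary illustrative weight
    # Direct cue only: both terms point at x1 with concentration kappa1.
    print(second_layer_output(p0, 2.0, 0.4, 2.0, 0.4))
    # Indirect cue only: no feedforward term, output is p0 * kappa_2s at x2.
    print(second_layer_output(p0, 1.5, -0.2, 0.0, 0.0))
    # Combined cues: first-layer estimate and direct cue differ slightly.
    print(second_layer_output(p0, 2.8, 0.25, 2.0, 0.4))
```

Because the two inputs add as vectors, a small position disparity between them gives a single resultant direction rather than two separate peaks.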
In Fig. 4 we compare the network behaviour for weak and strong inputs, corresponding to $I = 0.01U_0$ and $I = 0.7U_0$ respectively. The outputs from the second layer behave in a Bayesian way in the weak input limit. Although the prior is the sum of two von Mises functions, the output is not double-peaked since the position disparity between inputs from the first layer and the external cue is small. Next, we consider information segregation. Using Eq. (11), the inverse of the disparity information is given by [...], where $C$ is the normalization constant. Note that the stimulus position is shifted by $\pi$ for the opposite group. Hence, we see that the opposite group in the second layer has the same structure as that of the congruent group, except that the positions of the outputs from the second layer are shifted by $\pi$. Figure 4 shows that the disparity information agrees with Bayesian prediction accurately in the weak input limit.

Conclusion
We have analyzed the dynamics of neural circuits for multisensory integration and segregation using separate modules for each stimulus modality, and congruent and opposite groups of neurons in each module. By incorporating output-dependence in the noise of neural dynamics, we found that the estimates of the integrated posteriors and disparity information agree with Bayesian prediction accurately in the weak input limit. This illustrates the significance of feedback information in neural information processing, and generates an experimentally testable prediction about noisy neural dynamics. We further showed that when the prior has more than one component, additional modules can be used to produce Bayesian prediction of the integrated information. This indicates the close relation between network architecture and the information structure of the environment, as represented by the prior distribution (Wang, Zhang, Wong, & Wu, 2017). For the composite prior with a correlated and an independent component, the additional module is one that processes direct stimuli. Such modules are readily found in biological systems, which are therefore endowed with the capacity to give Bayesian predictions in complex environments. It is possible to generalize the model to prior distributions with more than two components.