Autistic traits, but not schizotypy, predict increased weighting of sensory information in Bayesian visual integration

Recent theories propose that schizophrenia/schizotypy and autistic spectrum disorder (ASD) are related to impairments in Bayesian inference, that is, in how the brain integrates sensory information (likelihoods) with prior knowledge. However, existing accounts fail to clarify (i) how the proposed theories differ in their accounts of ASD vs. schizophrenia and (ii) whether the impairments result from weaker priors or enhanced likelihoods. Here, we directly address these issues by characterizing how 91 healthy participants, scored for autistic and schizotypal traits, implicitly learned and combined priors with sensory information. This was accomplished through a visual statistical learning paradigm designed to quantitatively assess variations in individuals' likelihoods and priors. The acquisition of the priors was found to be intact along both trait spectra. However, autistic traits were associated with more veridical perception and a weaker influence of expectations. Bayesian modeling revealed that this was due not to weaker prior expectations, but to more precise sensory representations.


Introduction
In recent years Bayesian inference has come to be regarded as a general principle of brain function that underlies not only perception and motor execution, but hierarchically extends all the way to higher cognitive phenomena, such as belief formation and social cognition.
Impairments of Bayesian inference have been proposed to underlie deficits observed in mental illness, particularly schizophrenia 1-3 and autistic spectrum disorder (ASD) 4-7. The general hypothesis for both disorders is that the weight, also called "precision", ascribed to sensory evidence and prior expectations is imbalanced, resulting in sensory evidence having relatively too much influence on perception.
In schizophrenia, overweighting of sensory information could explain the decreased susceptibility to perceptual illusions 8, as well as the peculiar tendency to jump to conclusions 9. Moreover, the systematically weakened low-level prior expectations might lead to forming compensatory strong and idiosyncratic high-level priors (beliefs), which would explain the emergence and persistence of delusions as well as reoccurring hallucinations 1-3.
In ASD, the relatively stronger influence of sensory information could explain hypersensitivity to sensory stimuli and extreme attention to details. The weaker influence of prior expectations would also result in more variability in sensory experiences. The desire for sameness and rigid behaviors could then be understood as an attempt to introduce more predictability in one's environment 4. Furthermore, this could lead to prior expectations which are too specific and which do not generalize across situations 5. While all theories agree that the relative influence of prior expectations is weaker in ASD, the primary source of this imbalance is debated: does it arise from increased sensory precision (i.e. sharper likelihood) or from reduced precision of prior expectations? 10-12 (Fig. 1). Some authors argue for attenuated priors 4,11, while others argue for increased sensory precision 6, 7, 10, 13, but conclusive experimental evidence is lacking.
A number of studies have aimed at testing Bayesian theories, either in a clinical population, or by studying individual differences in the general population 14-17 under the hypothesis of a continuum between autistic/schizotypal traits and ASD/schizophrenia 18-20. Attenuated slow-speed priors were reported in a motion perception task in individuals with ASD traits 14.
Autistic children also showed an attenuated central tendency prior in temporal interval reproduction 21. Attenuated priors were also reported in perceptual tasks that incorporate probabilistic reasoning 15,22. However, the direction-of-gaze priors 23 and the light-from-above priors 24 were found to be intact. Autistic children also demonstrated an intact ability to update their priors in a volatile environment in a decision-making task 25, but a follow-up study in ASD adults showed that they overestimate volatility in a changing environment 26.
In schizophrenia/schizotypal traits, Teufel et al. 16 reported increased influence of prior expectations when disambiguating two-tone images, while Schmack et al. 27,28 reported weakened influence of stabilizing predictions when observing a bistable rotating sphere.
Overall, the existing findings are not only mixed, but also employ very different paradigms, which makes their direct comparison difficult. Further, a critical limitation of most studies (except for Karaminis et al. 21 ) is the lack of formal computational models that can test whether behavioral differences originate from different priors or from different likelihoods.
Moreover, to our knowledge, despite the similarity of the Bayesian theories proposed for ASD and schizophrenia, there is no previous work investigating both autistic and schizotypal traits within the same experimental paradigm so as to test their differences. Here we address these questions empirically in the context of visual motion perception. We used a previously developed statistical learning task 29 in which participants have to estimate the direction of motion of coherently moving clouds of dots (Fig. 2). Chalk et al. 29 found that in this task healthy participants rapidly and implicitly develop prior expectations for the most frequently presented motion directions. This in turn alters their perception of motion on low contrast trials, resulting in attractive estimation biases towards the most frequent directions. In addition, prior expectations lead to reduced estimation variability and reaction times, as well as increased detection performance for the most frequently presented directions. When no stimulus is presented, the acquired expectations sometimes lead to false alarms ('hallucinations'), again mostly in the most frequent directions. Importantly, such biases were well described using a Bayesian model, where participants acquired a perceptual prior for the visual stimulus that is combined with sensory information and influences their perception. As such, this paradigm is well suited to quantitatively model variations in likelihoods and priors in individuals with ASD or schizotypal traits.

Results
Here, we investigated individual differences in statistical learning in relation to autistic and schizotypal traits in a sample of 91 healthy participants. Eight participants failed to perform the task satisfactorily and were excluded from the analysis (see Methods), leaving 83 participants in the study (41 women and 42 men, age range: 18-69; mean: 25.7).

Task behavior at low contrast
First, we investigated whether participants acquired priors on the group level. We discarded the first 170 trials, as that is how long it took for the 2/1 and 4/1 staircase contrast levels to converge (Supplementary Fig. 2) and for prior effects to become significant (Section 3 in Supplementary Material). We analyzed task performance at low contrast levels (converged 2/1 and 4/1 staircase contrast levels), where sensory uncertainty is high. Replicating the findings of Chalk et al. (2010), we found that on the group level people acquired priors that approximated the statistics of the task. Such priors were indicated by: attractive biases towards ±32° (Fig. 3a), less variability in estimations at ±32° (Fig. 3b;

No-stimulus performance
Another indicator of acquired priors is the distribution of estimation responses on trials when no actual stimulus was presented. We found that participants sometimes still reported seeing dots (experienced 'hallucinations'), but mostly so around ±32° (Fig. 3f, solid line). To quantify the statistical significance of 'hallucinations' around ±32°, the space of possible motion directions was divided into 45 bins of 16° and the probability of estimation within 8° of ±32° was multiplied by the total number of bins:

$$p_{\mathrm{rel}} = p\big(\theta_{\mathrm{est}} = \pm 32^\circ\,(\pm 8^\circ)\big) \cdot N_{\mathrm{bins}}, \tag{1}$$

where $N_{\mathrm{bins}}$ is the number of bins (45), each of size 16°. This probability ratio would be equal to 1 if participants were equally likely to estimate within 8° of ±32° as they were to estimate within other bins. We found that the median of $p_{\mathrm{rel}}$ was significantly greater than 1 (median($p_{\mathrm{rel}}$) = 1.6, p < 0.001, signed rank test). Furthermore, the estimation distribution when no dots were detected (Fig. 3f, dash-dot line) was found to be significantly flatter (median($p_{\mathrm{rel}}$) = 0, p < 0.001, signed rank test comparing with the median of $p_{\mathrm{rel}}$ for 'hallucinations'), suggesting that the 'hallucinations' were indeed of a perceptual nature (rather than related to a response bias).

Task performance and autistic/schizotypy traits
Participants were prescreened to make sure they covered a wide range of autistic and schizotypy scores. The AQ scores in our sample ranged from 6 to 41 with a mean (±SD) of 20.3 (±8.3). The RISC scores ranged from 8 to 55 with a mean of 31.7 (±11.9), and the SPQ scores ranged from 4 to 59 with a mean of 26.4 (±13.8).
We found significant effects of autistic traits on the performance at low contrast trials: autistic traits were associated with less bias (Fig. 4a; mean absolute estimation bias: ρ = −0.228, p = 0.039) and less variability in estimations (Fig. 4b; mean standard deviation of estimations: ρ = −0.357, p = 0.001). In the Bayesian framework, less bias could arise either due to wider priors or narrower sensory likelihoods, while less variability could be a result of either narrower priors or narrower likelihoods (see Fig. 1). Thus, observing less bias and less variability together suggests that the effects are driven by narrower likelihoods. An alternative is that the differences in variability could be due to differences in motor noise, which we further assess via modeling (below).
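The reasoning above can be illustrated with a minimal sketch. It assumes a single Gaussian prior and a Gaussian likelihood, which is a simplification of the bimodal circular model used in the paper, and all function names and parameter values are hypothetical. In this Gaussian case the pattern holds when the likelihood is narrower than the prior.

```python
def estimate_stats(theta_act, mu_prior, sigma_prior, sigma_sens):
    """Bias and trial-to-trial SD of the posterior-mean estimate.

    Assumes a Gaussian prior N(mu_prior, sigma_prior^2) and Gaussian
    sensory noise with SD sigma_sens (illustrative simplification).
    """
    # Weight given to the sensory observation (ratio of precisions).
    w = sigma_prior**2 / (sigma_prior**2 + sigma_sens**2)
    # Mean estimate is w*theta_act + (1-w)*mu_prior: bias points to the prior.
    bias = (1.0 - w) * (mu_prior - theta_act)
    # Observations scatter with SD sigma_sens; the estimate scales that by w.
    sd = w * sigma_sens
    return bias, sd


# Hypothetical values in degrees: stimulus at 16, prior centered on 32.
baseline = estimate_stats(16.0, 32.0, sigma_prior=20.0, sigma_sens=10.0)
wider_prior = estimate_stats(16.0, 32.0, sigma_prior=30.0, sigma_sens=10.0)
narrower_lik = estimate_stats(16.0, 32.0, sigma_prior=20.0, sigma_sens=6.0)

for name, (bias, sd) in [("baseline", baseline),
                         ("wider prior", wider_prior),
                         ("narrower likelihood", narrower_lik)]:
    print(f"{name}: bias={bias:.2f}, sd={sd:.2f}")
```

With these values, widening the prior lowers the bias but raises the variability, whereas narrowing the likelihood lowers both, which is the signature attributed to autistic traits in the text.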
Schizotypy traits (RISC and SPQ scores) were found to have no effect on task performance at

No-stimulus trials and autistic/schizotypal traits
We also investigated how the traits affected performance on trials when no actual stimulus was presented. First, we looked at the total number of estimations. We found that autistic traits were associated with fewer 'hallucinations' ( Specifically, we were interested in whether the traits predicted how densely 'hallucinations' were distributed around ±32°, as this could be considered to reflect the differences in the width of the underlying acquired prior distribution. To determine this, we looked at the fraction of total 'hallucinations' in the region around ±32° for three different-sized windows:

Group level results
To quantitatively evaluate the relationships between underlying perceptual mechanisms and task performance, we fitted a range of generative models. One class of models was Bayesian: it was based on the assumption that participants combine prior expectations with uncertain sensory information on a single-trial basis (Fig. 5).
To account for the possibility that the bimodal probability distribution of the stimuli, in addition to inducing prior expectations, has also affected the sensory likelihood, we
To compare the models, we computed BIC values for each individual for each model; we used individual BIC values as a summary statistic and compared the models using a signed rank test in order to preserve individual variability (Fig. 6a). We found that the BAYES model had significantly smaller BIC values than the remaining models (see the p-values within Fig. 6a).

Parameter recovery for BAYES
Finally, to verify that in our experimental paradigm the influence of stronger likelihoods can be distinguished from that of weaker priors 10,11, we performed parameter recovery for the winning BAYES model. Parameter recovery involves generating synthetic data with different sets of parameters ('actual parameters') and then fitting the same model to estimate the parameters ('recovered parameters') that are most likely to have produced the data. If actual and recovered parameters are in good agreement, it means that the effects of different parameters can be reliably distinguished. At the same time, parameter recovery is also affected by the parameter estimation methods and even more so by the amount of data used for model fitting. Therefore, parameter recovery provides an overall check for the reliability of modelling results and is recommended as an essential step in computational modelling approaches 30.
We found that overall the BAYES model (with MLE parameter estimation using a simplex optimization function) recovered parameters well (Fig. 8).
The parameter of highest relevance for our results, the uncertainty in the sensory likelihood, $\sigma_{\mathrm{sens}}$, was recovered most reliably (r = 0.88), followed by the fraction of random estimations, $\alpha$ (r = 0.87), the mean of the prior expectations distribution, $\theta_{\mathrm{exp}}$ (r = 0.77), and the uncertainty in prior expectations, $\sigma_{\mathrm{exp}}$ (r = 0.68).
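The logic of parameter recovery can be sketched with a toy example. This is not the BAYES model: it assumes a hypothetical one-parameter model (zero-mean Gaussian estimation errors with an unknown SD) so that the maximum-likelihood fit has a closed form. The function names and parameter ranges are illustrative only.

```python
import math
import random


def simulate_errors(sigma, n_trials, rng):
    """Generate synthetic estimation errors for one synthetic individual."""
    return [rng.gauss(0.0, sigma) for _ in range(n_trials)]


def fit_sigma(errors):
    """Closed-form MLE of the SD of a zero-mean Gaussian."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def recovery_correlation(n_subjects=80, n_trials=200, seed=0):
    """Correlate 'actual' with 'recovered' parameters across synthetic subjects."""
    rng = random.Random(seed)
    actual = [rng.uniform(5.0, 25.0) for _ in range(n_subjects)]
    recovered = [fit_sigma(simulate_errors(s, n_trials, rng)) for s in actual]
    return pearson_r(actual, recovered)


r_many = recovery_correlation(n_trials=200)
r_few = recovery_correlation(n_trials=5)
print(f"r with 200 trials: {r_many:.2f}; r with 5 trials: {r_few:.2f}")
```

Comparing the two runs illustrates the point made in the text that recovery quality depends strongly on the amount of data available per individual.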

Discussion
In this study, we investigated whether autistic and schizotypal traits are associated with differences in the implicit Bayesian inference performed by the brain. Specifically, we wanted to know whether autistic and schizotypal traits are accompanied by 1) differences in how the priors are updated and/or in their precision and/or by 2) differences in the precision with which the sensory information (the likelihood) is represented. We used a visual motion estimation task 29 that induces implicit prior expectations via more frequent exposure to two motion directions (±32°). We found that on the group level (N=83) participants acquired prior expectations towards ±32° motion directions. This was indicated by shorter estimation reaction times and better detection at ±32°, as well as attractive biases towards ±32° and reduced estimation variability at ±32°. Moreover, when no stimulus was presented, participants sometimes still reported seeing the stimulus, mostly around ±32°.
Performance was best explained by a simple Bayesian model, which provided a good fit to the data and captured the characteristic features of perceptual bias and variability. This model provided estimates of Bayesian priors and sensory likelihoods for each participant, which were then analyzed in relation to participants' schizotypal and autistic traits.
Schizotypal traits were found to have no measurable effect on perceptual biases in our task and, therefore, were not associated with any differences in the precision ascribed to priors and likelihoods. This finding challenges recent accounts of positive symptoms of schizophrenia that predict impaired updating of priors and an imbalance in precision ascribed to sensory information and prior expectations 1-3. An immediate explanation might be that the influence of schizotypal traits in the healthy population is not strong enough to lead to behavioral differences, even if the dimensionality assumption holds. It is likely, for example, that even if they sometimes scored high in schizotypal traits, our participants did not experience daily hallucinations. That they would not exhibit an overweighting of perceptual priors would then be consistent with the recent study of
Autistic traits were associated with significant behavioral differences: weaker biases and lower variability of direction estimation on low contrast trials. Modeling revealed that this was because of increased sensory precision as well as a reduction in motor noise, while there was no attenuation of acquired priors. Parameter recovery analysis confirmed that our methodology provides reliable parameter estimates and, in particular, allows disentangling variations in priors and likelihoods.
Autistic traits were also found to be associated with fewer false detections ('hallucinations') on trials when no stimulus was presented, consistent with the idea that prior expectations had less influence in individuals with higher AQ. In an attempt to measure those individual differences, we fitted a more sophisticated Bayesian model that could account not only for the estimation performance but also for the detection data (see S4 in Supplementary Material). This model provided a good fit to both estimation and detection data, and preserved the correlation between ASD traits and the precision of the motion direction likelihood (ρ = −0.235, p = 0.032). However, parameter recovery was not as good as for the BAYES model presented above (see Supplementary Fig. 11), and for this reason we focused on the simpler model in this paper.
Overall, our findings are in agreement with most of the recent Bayesian theories of ASD, namely, that autistic traits are associated with a relatively weaker influence of prior expectations. However, we find that this is due to enhanced sensory precision 6, 7, 10, 13, rather than attenuated priors per se 4. Other empirical studies inspired by the Bayesian accounts have reported either attenuated or intact priors, but most are subject to methodological limitations, either because they did not use computational modeling 15, 22-24 or because their model could not extract likelihoods and quantify their variations 14,26.
The idea that sensory processing could be enhanced in autism has long been proposed outside the Bayesian framework. Autistic traits have been associated with enhanced orientation discrimination 33, but only for first-order (luminance-defined) stimuli 34. This enhancement has been proposed to be a result of either enhanced lateral inhibition 34, or a failure to attenuate sensory signals via top-down gain control 6, both of which could be directly related to narrower likelihoods in the Bayesian framework 35. However, in motion perception, previous research did not find improved discrimination for first-order stimuli in autism, while for second-order (texture-defined) stimuli, the autistic group was found to underperform 36. Our findings challenge these results and call for more research in this area.
In ASD as in schizotypy, prior integration might function differently at different levels of sensory processing. For example, Pell et al. 23 reported intact direction-of-gaze priors for healthy individuals with high autistic traits and for highly functional individuals with a clinical diagnosis. The authors did not directly investigate differences in sensory precision, but the lack of behavioral differences suggests that there were none. Arguably, their paradigm involves more complex stimuli than those used in our task, which are also strongly associated with semantic content (faces). It would not be surprising if increased sensory precision does not extend to such stimuli. In fact, autistic individuals are known to exhibit differential performance based on the complexity of the stimulus 34, which also lies at the foundation of some theoretical accounts, such as 'Weak Central Coherence' 37.
In our paradigm, people acquire prior expectations very quickly, within 200 trials (see Section 3 in Supplementary Material), which did not allow us to study individual differences in the rate at which the priors are acquired. Bayesian accounts predict differences in the dynamic updating of the priors, namely that both autistic and schizotypal traits should be associated with an increased learning rate, which is the ratio of likelihood and posterior precisions 7. Our findings of increased sensory precision in autistic traits also suggest that their learning rate should be higher. Future work will aim at directly testing this.
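The link between sensory precision and learning rate can be made concrete with a small sketch. It assumes Gaussian beliefs, for which the posterior precision is the sum of the prior and likelihood precisions; the function names and numbers are hypothetical and are not taken from the study.

```python
def learning_rate(prior_precision, sensory_precision):
    """Ratio of likelihood precision to posterior precision (Gaussian case)."""
    posterior_precision = prior_precision + sensory_precision
    return sensory_precision / posterior_precision


def update_belief(prior_mean, prior_precision, observation, sensory_precision):
    """One precision-weighted update of a Gaussian belief."""
    lr = learning_rate(prior_precision, sensory_precision)
    # The belief moves toward the observation by a fraction lr of the
    # prediction error.
    posterior_mean = prior_mean + lr * (observation - prior_mean)
    return posterior_mean, prior_precision + sensory_precision


# Same prior, two hypothetical observers differing only in sensory precision.
print(learning_rate(prior_precision=1.0, sensory_precision=1.0))
print(learning_rate(prior_precision=1.0, sensory_precision=3.0))
```

Holding the prior fixed, the observer with the more precise likelihood has the higher learning rate, which is the prediction stated in the paragraph above.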
Another aspect that our paradigm could not test is the specificity of the acquired priors 32. Some Bayesian accounts 5 predict that priors may be overly context-sensitive in autism. This is in line with the view that generalization is impaired in autism 38. Furthermore, such over-specificity is thought to be stronger with more repetitive stimuli 39. Future research could address this using statistical learning paradigms that incorporate increasingly distinct contexts or stimuli.

Conclusion
We investigated statistical learning and Bayesian inference in a visual motion perception task along autistic and schizotypal traits. To our knowledge, this study is the first to investigate differences in Bayesian inference along both trait spectra in a single task.
Furthermore, this study is the first visual study to computationally disentangle and quantitatively assess the variations in individuals' likelihoods and priors. Surprisingly, schizotypal traits were found to have no effect on task performance and thus were not associated with any differences in the underlying statistical learning and Bayesian inference. For autistic traits, however, significant behavioral differences in prior integration were found, which were due to an increase in the precision of internal sensory representations in participants with higher AQ.

Personality Questionnaire (SPQ) 42. Finally, all participants were also asked to complete the Warwick-Edinburgh Mental Well-being Scale (WEMWBS) 43 in order to control for potential depression-induced differences in performance 44.

Apparatus
The visual stimuli were generated using Matlab Psychophysics Toolbox 45. Participants viewed the display in a dark room at a distance of 80-100 cm. The stimuli consisted of a cloud of dots with a density of 2 dots/deg² moving coherently (100%) at a speed of 9°/s. Dots appeared within a circular annulus with a minimum diameter of 2.2° and a maximum diameter of 7°. The stimuli were displayed on a Dell P790 monitor running at 1024×768 at 100 Hz. The display luminance was calibrated using a Cambridge Research Systems Colorimeter (ColorCal MKII).

The task
The task was developed previously in our laboratory 29 . Participants have to: i) estimate the direction of coherently moving simple stimuli (dots) that are presented at low contrast levels (estimation task) and then ii) indicate whether they have actually perceived the stimulus or not (detection task). Since Chalk et al. 29  presented, the bar still appeared for the estimation task to be completed.
After a 200 ms delay, the participants had to indicate whether they had actually detected the presence of dots in the estimation period (detection task). The display was divided into two parts by a vertical white line across the center of the screen, the left-hand side area reading "NO DOTS" and the right-hand side area reading "DOTS" (Fig. 2a). The cursor appeared in the center of the screen, and participants had to move it to the left or right and click to indicate their response. Immediate feedback for correct or incorrect detection responses was given by a cursor flashing green or red, respectively. The screen was cleared for 400 ms before the start of a new trial. Every 20 trials, participants were presented with feedback on their estimation performance in terms of average estimation error in degrees (e.g., "In the last 20 trials, your average estimation error was 23°"). Every 170 trials (i.e. on three occasions) participants were given a chance to "have a short break to rest their eyes", in order to prevent fatigue. Participants clicked when they were ready to continue.

Design
The stimuli were presented at four different levels of contrast: zero contrast (no-stimulus trials), two low contrast levels, and high contrast, randomly mixed across trials. There were 167 trials with no stimulus. The two low levels of contrast were determined using 4/1 and 2/1 staircases on detection performance 46. There were 243 trials following the 4/1 staircase and 90 trials following the 2/1 staircase. The remaining 67 trials were at high contrast, which was set to 3.51 cd/m² above the background luminance.
For the two low contrast levels, there was a predetermined number of possible directions: 0°, ±16°, ±32°, ±48°, and ±64° with respect to a reference direction. The reference direction was randomized for each participant. For the 2/1 staircased contrasts, each predetermined motion direction was presented equally frequently. Unbeknownst to participants, stimuli at high and 4/1 staircase contrasts were presented more frequently at −32° and +32° motion directions, resulting in a bimodal probability distribution (Fig. 1b).
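The text does not spell out the staircase rule. As a rough illustration only, the sketch below assumes that an "n/1" staircase lowers the contrast after n consecutive detections and raises it after a single miss; this reading, the fixed step size, the deterministic observer, and all names and values are assumptions rather than details taken from the study.

```python
def run_staircase(detects, n_down, start, step, n_trials, floor=0.0):
    """Assumed n_down-down/1-up staircase on detection performance.

    `detects(contrast)` is a hypothetical observer returning True on a
    detection. Returns the contrast presented on each trial.
    """
    contrast = start
    streak = 0
    history = []
    for _ in range(n_trials):
        history.append(contrast)
        if detects(contrast):
            streak += 1
            if streak == n_down:
                # n_down consecutive detections: make the task harder.
                contrast = max(floor, contrast - step)
                streak = 0
        else:
            # A single miss: make the task easier.
            contrast += step
            streak = 0
    return history


def threshold_observer(contrast):
    """Idealized observer that detects any contrast above 0.95 (arbitrary units)."""
    return contrast >= 0.95


history = run_staircase(threshold_observer, n_down=4, start=2.0,
                        step=0.1, n_trials=100)
print(min(history[-20:]), max(history[-20:]))
```

Under these assumptions the presented contrast descends from its starting value and then oscillates around the observer's threshold, which is the kind of convergence the excluded early trials are meant to allow for.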

Data analysis
Responses on high contrast trials were used as a performance benchmark to ensure that participants were performing the task adequately. Eight out of 91 participants failed to satisfy pre-defined performance criteria (at least 80% detection and less than 30° root mean squared error of estimations) and were excluded from further analysis (Supplementary Fig. 1).
Data analysis on the estimation of motion directions was performed on 4/1 and 2/1 staircased contrast levels only and only on trials where participants both validated their choice with a click within 3000 ms in the estimation part and clicked "DOTS" in the detection part. The first 170 trials of each session were excluded from the analysis, as this was the upper limit for the convergence of the staircases to stable contrast levels (Supplementary Fig. 2).
After removing these trials, the luminance levels achieved by the 2/1 and 4/1 staircases were found to be considerably overlapping (Supplementary Fig. 2). Therefore, the data for both of these contrast levels were combined for all further analysis.
To account for random estimations (either accidental or intentional) that participants made on some trials, we fitted each participant's estimation responses to the probability distribution:

$$(1-\alpha)\cdot V(\theta \mid \mu, \kappa) + \alpha, \tag{2}$$

where $\alpha$ is the proportion of trials in which the participant makes random estimates, and $V(\theta \mid \mu, \kappa)$ is the probability density function of the von Mises (circular normal) distribution for the estimated angle $\theta$, with mean $\mu$ and variance $1/\kappa$. The parameters $\mu$ and $\kappa$ of the von Mises distribution were determined by maximizing the likelihood of the distribution in Eq. (2) for each presented angle.
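A minimal sketch of fitting such a mixture by maximum likelihood is given below. It departs from Eq. (2) in two labeled ways: the random-estimate term is written as a normalized uniform density, α/(2π) with angles in radians, and a coarse grid search stands in for the optimizer. All function names, grid values, and the synthetic data are hypothetical.

```python
import math
import random


def bessel_i0(x):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term, k = 1.0, 1.0, 0
    while term * term > 1e-16 * total:
        k += 1
        term *= (x / 2.0) / k
        total += term * term
    return total


def von_mises_pdf(theta, mu, kappa):
    """Von Mises density on the circle, angles in radians."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def mixture_loglik(estimates, mu, kappa, alpha):
    """Log-likelihood of (1 - alpha) * von Mises + alpha * uniform (assumed form)."""
    norm = 2.0 * math.pi * bessel_i0(kappa)
    uniform = 1.0 / (2.0 * math.pi)
    total = 0.0
    for theta in estimates:
        vm = math.exp(kappa * math.cos(theta - mu)) / norm
        total += math.log((1.0 - alpha) * vm + alpha * uniform)
    return total


def fit_mixture(estimates, mus, kappas, alphas):
    """Grid-search MLE; returns (mu, kappa, alpha) with the highest likelihood."""
    best = max((mixture_loglik(estimates, m, k, a), m, k, a)
               for m in mus for k in kappas for a in alphas)
    return best[1:]


# Synthetic responses: 90% von Mises around 0.5 rad, 10% random guesses.
rng = random.Random(1)
estimates = [rng.uniform(-math.pi, math.pi) if rng.random() < 0.1
             else rng.vonmisesvariate(0.5, 8.0) for _ in range(300)]
mu_hat, kappa_hat, alpha_hat = fit_mixture(
    estimates,
    mus=[i * 0.1 for i in range(-10, 11)],
    kappas=[2.0, 4.0, 8.0, 16.0],
    alphas=[0.0, 0.1, 0.2, 0.3],
)
print(mu_hat, kappa_hat, alpha_hat)
```

Including the uniform component keeps occasional random responses from inflating the estimated spread of the von Mises component, which is the motivation given in the text.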
To analyze the distribution of estimations in no-stimulus trials, we constructed histograms with 16° bins. These histograms were converted into probability distributions by normalizing over all motion directions. We analyzed the estimation distribution when participants reported seeing dots (clicked "DOTS") within no-stimulus trials. We interpreted these false alarms as a simple form of perceptual "hallucination".

Bayesian models
Bayesian models assume that participants combined a learned prior of the stimulus directions with their sensory evidence in a probabilistic manner. We first assume that participants make noisy sensory observations of the actual stimulus motion direction ($\theta_{\mathrm{act}}$), with a probability

$$p_{\mathrm{sens}}(\theta_{\mathrm{sens}} \mid \theta_{\mathrm{act}}) = V(\theta_t, \kappa_{\mathrm{sens}}),$$

where $\theta_t$ itself varies from trial to trial around $\theta_{\mathrm{act}}$ according to

$$p(\theta_t \mid \theta_{\mathrm{act}}) = V(\theta_{\mathrm{act}}, \kappa_{\mathrm{sens}}).$$

While participants cannot access the "true" prior directly, we hypothesized that they learned an approximation of this distribution, denoted $p_{\mathrm{exp}}(\theta)$. This distribution was parameterized as the sum of two von Mises distributions, centered on motion directions $\theta_{\mathrm{exp}}$ and $-\theta_{\mathrm{exp}}$, and each with variance $1/\kappa_{\mathrm{exp}}$:

$$p_{\mathrm{exp}}(\theta) = \tfrac{1}{2}\left[V(-\theta_{\mathrm{exp}}, \kappa_{\mathrm{exp}}) + V(\theta_{\mathrm{exp}}, \kappa_{\mathrm{exp}})\right].$$

Combining these via Bayes' rule gives a posterior probability that the stimulus is moving in a direction $\theta$:

$$p_{\mathrm{post}}(\theta \mid \theta_{\mathrm{sens}}) \propto p_{\mathrm{exp}}(\theta) \cdot p_{\mathrm{sens}}(\theta_{\mathrm{sens}} \mid \theta).$$

The perceived direction, $\theta_{\mathrm{perc}}$, was taken to be the mean of the posterior distribution (almost identical results would be obtained by using the maximum instead). Finally, we accounted for motor noise and a possibility of random estimates on some trials via:

$$p(\theta_{\mathrm{est}} \mid \theta_{\mathrm{perc}}) = (1-\alpha) \cdot V(\theta_{\mathrm{perc}}, \kappa_m) + \alpha,$$

where $\alpha$ is the proportion of trials in which participants make random estimates and $1/\kappa_m$ is the variance associated with motor noise.
Increased exposure to some motion directions might not only give rise to prior expectations, but also affect the likelihood function 47. Therefore, we fitted two more model variants: 'BAYES_var', where $\kappa_{\mathrm{sens}}$ varied with the stimulus direction (i.e. it took five different values for each of the angles: 0°, ±16°, ±32°, ±48°, ±64°), and 'BAYES_varmin', where $\kappa_{\mathrm{sens}}$ was allowed to be different for ±32° but was the same for all other directions.
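The perceptual stage of the model described above can be sketched numerically. This is only an illustration under stated assumptions: the posterior is evaluated on a 1° grid, its mean is taken as a circular mean, motor noise and random estimates are omitted, and the parameter values are hypothetical rather than fitted values from the study.

```python
import math


def vm_kernel(theta_deg, mu_deg, kappa):
    """Unnormalized von Mises kernel, angles in degrees.

    Normalizing constants are dropped because they cancel when the
    posterior is normalized (both prior components share one kappa).
    """
    return math.exp(kappa * (math.cos(math.radians(theta_deg - mu_deg)) - 1.0))


def perceived_direction(theta_sens, theta_exp, kappa_exp, kappa_sens, step=1.0):
    """Circular mean of posterior = bimodal prior x sensory likelihood."""
    n = int(round(360.0 / step))
    sin_sum = 0.0
    cos_sum = 0.0
    for i in range(n):
        theta = -180.0 + i * step
        prior = 0.5 * (vm_kernel(theta, theta_exp, kappa_exp)
                       + vm_kernel(theta, -theta_exp, kappa_exp))
        likelihood = vm_kernel(theta_sens, theta, kappa_sens)
        post = prior * likelihood
        sin_sum += post * math.sin(math.radians(theta))
        cos_sum += post * math.cos(math.radians(theta))
    return math.degrees(math.atan2(sin_sum, cos_sum))


# A sensory observation at 24 deg with prior modes at +/-32 deg.
broad = perceived_direction(24.0, theta_exp=32.0, kappa_exp=20.0, kappa_sens=10.0)
sharp = perceived_direction(24.0, theta_exp=32.0, kappa_exp=20.0, kappa_sens=50.0)
print(f"broad likelihood: {broad:.1f} deg, sharp likelihood: {sharp:.1f} deg")
```

In this sketch the percept is pulled from the observation toward the nearer prior mode, and the pull shrinks as the likelihood becomes more precise, which is the mechanism the paper associates with higher autistic traits.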

Response strategy models
We wanted to test whether task behavior might be better explained by simple behavioral strategies. This class of models assumed that on trials when participants were unsure about the presented motion direction, they made an estimation based solely on prior expectations, while on the remaining fraction of trials they made unbiased estimates based solely on sensory inputs. The first model, 'ADD1', assumed that estimations derived from prior expectations were simply sampled from a learnt expected distribution, $p_{\mathrm{exp}}(\theta)$ (see Chalk et al. 29 and Supplementary Information). We also considered slight variations of the 'ADD1' and 'ADD2' models, denoted 'ADD1_m' and 'ADD2_m' respectively. These were identical to 'ADD1' and 'ADD2' except that $1/\kappa_{\mathrm{exp}}$ was set to zero; that is, on trials when perceptual estimates were derived only from expectations, they were equal to the mode of the learnt distribution (i.e. no uncertainty).

Parameter estimation
We used performance in high contrast trials to estimate motor noise, 1/κ m , for each individual. We assumed that, for those trials, sensory uncertainty was close to zero M) corresponds to average behavior in the task.
The parameters were estimated by maximizing the log-likelihood of the experimental data for each participant individually. The maximum likelihood was found using a simplex algorithm, via the "fminsearchbnd" Matlab function. To avoid convergence to a local maximum, we constructed a grid of initial $\kappa_{\mathrm{exp}}$ and $\kappa_{\mathrm{sens}}$ parameter values covering the range found in previous studies. We selected the resulting set of parameters that corresponded to the largest log-likelihood.

Model Comparison
To compare the model fits, we used the Bayesian Information Criterion (BIC), which approximates the log of model evidence 48:

$$-2 \cdot \log P(D \mid M) \approx \mathrm{BIC} = -2 \cdot \log P(D \mid M, \hat{\Theta}) + k \cdot \log(n), \tag{7}$$

where $M$ is the model, $D$ is the observed data and $P(D \mid M, \hat{\Theta})$ is the likelihood of generating the experimental data given the most likely set of parameters, $\hat{\Theta}$; $k$ is the number of model parameters and $n$ is the number of data points (or equivalently, the number of trials). BIC evaluates how well the model fits the data while penalizing model complexity (number of parameters); a lower BIC score indicates a better model.
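Eq. (7) translates directly into a short function. The sketch below uses hypothetical log-likelihoods, parameter counts, and trial counts purely to show how the complexity penalty enters; none of the numbers come from the study.

```python
import math


def bic(log_likelihood, n_params, n_trials):
    """Bayesian Information Criterion as in Eq. (7); lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_trials)


# Two hypothetical models with the same maximized log-likelihood
# but different numbers of free parameters.
simple_model = bic(log_likelihood=-100.0, n_params=4, n_trials=400)
complex_model = bic(log_likelihood=-100.0, n_params=6, n_trials=400)
print(f"simple: {simple_model:.2f}, complex: {complex_model:.2f}")
```

When two models fit equally well, the one with fewer parameters receives the lower BIC, so a more flexible variant has to improve the likelihood enough to offset its penalty.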

Parameter recovery
To determine whether the BAYES model can distinguish the effects of strong likelihoods from those of weak priors 10,11 and to evaluate the robustness of our methods, we performed parameter recovery. First, we generated 80 sets of parameters (i.e. 80 synthetic individuals) by randomly sampling each parameter from a Gaussian distribution centered on the mean value of each parameter found in our sample (

Statistical tests
Due to the presence of outliers in our data, we used Spearman's correlations for measuring the strength of the effects. We also used the Wilcoxon signed-rank test for repeated-measures analysis.
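The robustness of a rank-based correlation to outliers can be seen in a minimal, dependency-free sketch. The implementation below (average ranks for ties, then a Pearson correlation on the ranks) and the toy data are illustrative assumptions and are not the analysis code used in the study.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def spearman_rho(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Toy data with one extreme value in xs; the relationship is still monotonic.
xs = [1.0, 2.0, 3.0, 4.0, 100.0]
ys = [2.0, 4.0, 6.0, 8.0, 9.0]
print(f"Pearson r = {pearson_r(xs, ys):.2f}, Spearman rho = {spearman_rho(xs, ys):.2f}")
```

Because only the ordering of the values matters, a single extreme score changes the rank correlation far less than it changes a Pearson correlation computed on the raw values.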