A circuit mechanism for irrationalities in decision-making and NMDA receptor hypofunction: behaviour, computational modelling, and pharmacology

Decision-making biases can be systematic features of normal behaviour, or deficits underlying neuropsychiatric symptoms. We used behavioural psychophysics, spiking-circuit modelling and pharmacological manipulations to explore decision-making biases in health and disease. Monkeys performed an evidence integration task in which they showed a pro-variance bias (PVB): a preference to choose options with more variable evidence. The PVB was also present in a spiking circuit model, revealing a neural mechanism for this behaviour. Because NMDA receptor (NMDA-R) hypofunction is a leading hypothesis for neuropathology in schizophrenia, we simulated behavioural effects of NMDA-R hypofunction onto either excitatory or inhibitory neurons in the model. These were tested experimentally using the NMDA-R antagonist ketamine, yielding changes in decision-making consistent with lowered cortical excitation/inhibition balance from NMDA-R hypofunction onto excitatory neurons. These results provide a circuit-level mechanism that bridges across explanatory scales, from the synaptic to the behavioural, in neuropsychiatric disorders where decision-making biases are prominent.

Significance

People can make apparently irrational decisions because of underlying features in their decision circuitry. Deficits in the same neural circuits may also underlie debilitating cognitive symptoms of neuropsychiatric patients. Here, we reveal a neural circuit mechanism explaining an irrationality frequently observed in healthy humans making binary choices: the pro-variance bias. Our circuit model could be perturbed by introducing deficits in either excitatory or inhibitory neuron function. These two perturbations made specific, dissociable predictions for the types of irrational decision-making behaviour produced. We used the NMDA-R antagonist ketamine, an experimental model for schizophrenia, to test if these predictions were relevant to neuropsychiatric pathophysiology. The results were consistent with impaired excitatory neuron function, providing important new insights into the pathophysiology of schizophrenia.


Introduction

Schizophrenia is a debilitating neuropsychiatric disorder, associated with prominent deficits in cognitive function [1-3]. Despite being the focus of intensive research, the neural bases of its symptomatology remain poorly understood. Our current understanding of the pathophysiology of schizophrenia mainly focuses on disruptions at the synaptic level. One line of investigation implicates N-methyl-D-aspartate receptor (NMDA-R) dysfunction [4-6], and NMDA-R antagonists have been used as a pharmacological model of schizophrenia. When administered to healthy volunteers, they transiently reproduce multiple aspects of the symptoms of schizophrenia, especially cognitive deficits [7-9]. One interpretation of these observations is that NMDA-R hypofunction causes an imbalance of excitation and inhibition in cortical circuits [5,10,11]. However, linking these pathophysiological mechanisms to the cognitive impairment observed in patients has proved challenging.

One difficulty is to carefully isolate which cognitive computations underlie neuropsychiatric symptoms. Working memory deficits in patients with schizophrenia have been well-characterised, which has facilitated preclinical research providing insights into potential pathophysiological mechanisms [2,12]. However, whether these working memory deficits reflect a more general impairment in other temporally extended cognitive processes in the symptomatology of schizophrenia remains an open question. One closely related cognitive process is evidence accumulation: the decision process whereby multiple samples of information are combined over time to form a categorical choice [13]. It has been extensively studied using the random-dot motion (RDM) task, where subjects must decide the net direction of a moving dots stimulus [13,14]. Patients with schizophrenia have impaired perceptual discrimination on the RDM task [15-17], but the precise nature of this decision-making deficit is unclear. Previous studies have attributed it to an impaired representation of the sensory evidence in visual cortex [15,18], yet circuit-level alterations affecting visual cortex are likely also present in downstream cortical association areas involved in evidence accumulation and decision-making. It is therefore important to characterise precisely whether and how the underlying process of evidence accumulation may be affected in schizophrenia.

Recent research has advanced our understanding of how such evidence accumulation decisions are made in the healthy brain. Of particular relevance to psychiatric research, it has been possible to disentangle systematic biases in decision-making and reveal the mechanisms through which they occur. For instance, when choosing between two series of bars with distinct heights, people have a preference to choose the option where evidence is more broadly distributed across samples [19,20]. Although this "pro-variance bias" may appear irrational, and would not be captured by many normative decision-making models, it becomes the optimal strategy when the accumulation process is contaminated by noise [19]. These behaviours have presently been well-characterised using algorithmic level descriptions of decision formation. By extending this approach to psychiatric research, new insights could be gained into the decision-making deficits in schizophrenia. However, in order to understand how these decision biases might be affected by NMDA-R hypofunction, a more mechanistic explanation is needed.

An influential technique used to investigate evidence accumulation at the mechanistic level has been biophysically grounded computational modelling of cortical circuits [21-23]

-R antagonists have been tested during various decision-making tasks [27,28], the role of the NMDA-R in shaping the temporal process of evidence accumulation has not been characterised experimentally.

Here we used a psychophysical behavioural task in macaque monkeys, in combination with spiking cortical circuit modelling and pharmacological manipulations, to gain new insights into decision-making biases in both health and disease. We trained two subjects to perform a challenging decision-making task requiring the combination of multiple samples of information with distinct magnitudes. Replicating observations from humans, monkeys showed a pro-variance bias. The pro-variance bias was also present in the spiking circuit model, revealing an explanation of how it may arise through neural dynamics. We then investigated the effects of NMDA-R hypofunction in the circuit model, by perturbing NMDA-R function at distinct synaptic sites. Perturbations could either raise or lower the E/I ratio, with each effect making dissociable predictions for evidence accumulation behaviour. These model predictions were tested experimentally by administering a subanaesthetic dose of the NMDA-R antagonist ketamine (0.5 mg/kg, intramuscular injection) to the monkeys. Ketamine produced decision-making deficits consistent with a lowering of the cortical E/I ratio.

To study evidence accumulation behaviour in non-human primates, we developed a novel two-alternative perceptual decision-making task (Fig1a

To further probe the pro-variance bias, we studied choices from a larger pool of 'Regular' trials in which the mean evidences and variabilities of the two streams were set independently on each trial (Fig4a, b, Supplementary Fig. 2). 'Regular' trials allowed us to explore the pro-variance bias across a greater range of choice difficulties (Fig4c) and quantitatively characterise its effect using regression analysis. On 'Regular' trials, subjects also demonstrated a preference for options with broadly distributed evidence. Regression analysis confirmed that evidence variability was a significant predictor of choice (Fig4d; see Methods). In addition, we defined the pro-variance bias (PVB) index as the ratio of the regression coefficient for evidence standard deviation over the regression coefficient for mean evidence. This provides a unitless measure of the pro-variance bias relative to the subjects' sensitivity to the net evidence. A PVB index value of 0 thereby indicates no pro-variance bias, whereas a PVB index value of 1 indicates the subject is as sensitive to evidence standard deviation as they are to mean evidence. From the 'Regular' trials, the PVB index across both monkeys was

Recent work has suggested that when traditional evidence accumulation tasks are performed, it is hard to dissociate whether subjects are combining information across samples, or whether conventional analyses may be disguising a simpler heuristic [32,33]. In particular, an alternative decision-making strategy which does not involve temporal accumulation of evidence is to detect the single most extreme sample. Because the extreme sample will occur at different times in each trial, if a subject employed this strategy, the choice regression weights across time points would be distributed as in Fig2c,d. Therefore, it is possible for these findings to be mistakenly interpreted as reflecting evidence accumulation. We wanted to quantitatively confirm that subjects were using the strategy we envisioned when designing our task, namely evidence accumulation. Additionally, we wanted to further investigate the relative contributions of mean evidence and evidence variability on choices. A logistic regression approach probed the influence upon choice of mean evidence, evidence variability, first/last samples, and the most extreme samples within each stream (Supplementary Fig. 2e,h, see Methods). A cross-validation approach revealed choice was principally driven by the mean evidence, verifying that subjects performed the task using evidence accumulation (Supplementary Table 1, see Methods).

Although this analysis revealed choices were not primarily driven by an 'extreme sample detection' decision strategy, another concern was whether partially employing this strategy could explain the pro-variance effect we observed. To address this, we compared the influence of 'evidence variability' versus the influence of 'extreme samples' on subjects' choices. Cross-validation revealed that choices were better described by a model incorporating evidence variability, rather than the extreme sample values (Supplementary Table 2). We also demonstrated that including evidence variability as a co-regressor improved the performance of all combinations of nested models (Supplementary Table 3). In summary, although subjects integrated evidence across samples, their choices were additionally influenced by sample variability.
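The PVB index described in this section can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes the regression predictors are the left-minus-right differences in mean evidence and in sample standard deviation, that the standard deviation uses `ddof=1`, and that the fit is done by plain Newton iterations. The names `fit_logistic` and `pvb_index`, and the synthetic chooser with weights 0.3 and 0.1, are hypothetical and serve only to check that the ratio is recovered.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton's method.

    X: (n_trials, n_predictors) array; an intercept column is added here.
    y: (n_trials,) array of 0/1 choices.
    Returns coefficients ordered as [intercept, predictor 1, predictor 2, ...].
    """
    X = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, gradient)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def pvb_index(left, right, chose_left):
    """Ratio of the SD-difference coefficient to the mean-difference coefficient.

    left, right: (n_trials, n_samples) arrays of bar heights.
    chose_left: (n_trials,) array of 0/1 choices.
    """
    d_mean = left.mean(axis=1) - right.mean(axis=1)
    # Assumption: sample SD with ddof=1; the source does not specify this.
    d_sd = left.std(axis=1, ddof=1) - right.std(axis=1, ddof=1)
    beta = fit_logistic(np.column_stack([d_mean, d_sd]), chose_left)
    return beta[2] / beta[1]


# Synthetic check: a simulated chooser whose true index is 0.1 / 0.3.
rng = np.random.default_rng(0)
n_trials, n_samples = 20_000, 8
sigma_left = rng.choice([12.0, 24.0], size=n_trials)
sigma_right = rng.choice([12.0, 24.0], size=n_trials)
left = rng.normal(50.0, sigma_left[:, None], size=(n_trials, n_samples))
right = rng.normal(50.0, sigma_right[:, None], size=(n_trials, n_samples))
logit = (0.3 * (left.mean(axis=1) - right.mean(axis=1))
         + 0.1 * (left.std(axis=1, ddof=1) - right.std(axis=1, ddof=1)))
chose_left = (rng.random(n_trials) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
estimated_index = pvb_index(left, right, chose_left)
```

Because the index is a ratio of two coefficients from the same fit, it is insensitive to an overall rescaling of choice sensitivity, which is what makes it comparable across subjects and conditions.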

Existing algorithmic-level proposals for generating a pro-variance bias in human decision-making rely on the disregarding of sensory information before it enters the accumulation process, depending on its salience [19]. To investigate a possible alternative basis for the pro-variance bias, at the level of neural implementation, we sought to characterise decision-making behaviour in a biophysically-plausible spiking cortical circuit model (Fig5a, b, Supplementary Fig. 3

To understand the origin of the pro-variance bias in the spiking circuit, we mathematically reduced the circuit model to a mean-field model (Fig6a), which demonstrated similar decision-making behaviour to the spiking circuit (Fig6b, c, Supplementary Fig. 4). The mean-field model, with two variables representing the integrated evidence for the two choices, allowed phase-plane analysis to further investigate the pro-variance bias. A simplified case was considered where the broad and narrow streams have the same mean evidence, and the stimulus evidence varies over time in the broad stream but not the narrow stream (i.e. σ = 0) (Fig6e-h). This example provides an intuitive explanation for the pro-variance bias: a momentarily strong stimulus has an asymmetrically greater influence upon the decision-making process than a momentarily weak stimulus. It can be shown that such asymmetry arises from the expansive non-linearities of the firing rate profiles (Fig6d

Fig. 5b).

To explore these predictions experimentally, we collected behavioural data from both monkeys following the administration of a subanaesthetic dose (0.5 mg/kg, intramuscular injection) of the NMDA-R antagonist ketamine (see Methods, Fig8, Supplementary Fig. 6). After a baseline period of the subjects performing the task, either ketamine or saline was injected intramuscularly (

Fig. 6, Supplementary

The results from our spiking circuit modelling also provided a parsimonious explanation for the cause of the pro-variance bias within the evidence accumulation process. Specifically, strong evidence in favour of an option pushes the network towards an attractor state more so than symmetrically weak evidence pushes it away. In contrast, previous explanations for pro-variance bias proposed computations at the level of sensory processing upstream of evidence accumulation. In particular, a 'selective integration' model proposed that information for the momentarily weaker option is discarded before it enters the evidence accumulation process [19].

The minutes-long timescale of the NMDA-R mediated decision-making deficit we observed was also consistent with the psychotomimetic effects of subanaesthetic doses of ketamine in healthy humans [7,11]. As NMDA-R hypofunction is hypothesised to play a role in the pathophysiology of schizophrenia [5,6,10,11], our findings have important clinical relevance. Previous studies have demonstrated impaired perceptual discrimination in patients with schizophrenia performing the random-dot motion (RDM) decision-making task [15-17]. Although the RDM has predominantly been used to study evidence accumulation [13], previously this performance deficit in schizophrenia was interpreted as reflecting a diminished representation of sensory evidence in visual cortex [15,18]. Based on our task with precise temporal control of the stimuli, our findings suggest that NMDA-R antagonism alters the decision-making process in association cortical circuits. Dysfunction in these association circuits may therefore provide an important contribution to cognitive deficits, one that is potentially complementary to upstream sensory impairment. Crucially, our task uniquely allowed us to rigorously verify that the subjects used an accumulation strategy to guide their choices (cf. previous animal studies [13,14,46-48]), with these analyses suggesting the strategy our subjects employed was consistent with findings in human participants. This consistency further suggests our findings may translate across species, in particular to clinical populations.

Another related line of schizophrenia research has shown a decision-making bias known as jumping to conclusions (JTC) [49,50]. The JTC has predominantly been demonstrated in the 'beads task', a paradigm where participants are shown two jars of beads, one mostly pink and the other mostly green (typically 85%). The jars are hidden, and the participants are presented a sequence of beads drawn from a single jar. Following each draw, they are asked if they are ready to commit to a decision about which jar the beads are being drawn from. Patients with schizophrenia typically make decisions based on fewer beads than controls. Importantly, this JTC bias has been proposed as a mechanism for delusion formation. Based on the JTC literature, one plausible hypothesis for behavioural alteration under NMDA-R antagonism in our task may be a strong increase in the primacy bias, whereby only the initially presented bar samples would be used to guide the subjects' decisions. However, following ketamine administration, we did not observe a strong primacy bias; instead all samples received roughly the same weighting. There are important differences between our task and the beads task. In our task, the stimulus presentation is shorter (2 seconds, compared to slower sampling across bead draws), and is of fixed duration rather than terminated by the subject's choice, and therefore may not involve the perceived sampling cost of the beads task [51].

Our precise experimental paradigm and complementary modelling approach allowed us to meticulously quantify how monkeys weight time-varying evidence and robustly dissociate sensory and decision-making deficits, unlike prior studies using the RDM and beads tasks.

Subjects were trained to perform a two-alternative value-based decision-making task. A series of bars, each with different heights, were presented on the left and right-side of the computer monitor.
Following a post-stimulus delay, subjects were rewarded for saccading towards the side with either the higher or lower average bar-height, depending upon a contextual cue displayed at the start of the trial (see Fig1a inset). The number of pairs of bars in each series was either four ('ShortSampleTrial') or eight ('LongSampleTrial') during trials in each standard behavioural session. In this report, we only consider the results from the eight sample trials, though similar results were obtained from the four sample trials. The number of bars was always six during pharmacological sessions.

The bars were presented inside of fixed-height rectangular placeholders (width, 84px; height, 318px). The placeholders had a black border (thickness 9px), and a grey centre where the stimuli were presented (width, 66px; height, 300px). The bar heights could take discrete percentiles, occupying between 1% and 99% of the grey space. The height of the bar was indicated by a horizontal black line (thickness 6px). Beneath the black line, there was 45° grey gabor shading.

An overview of the trial timings is outlined in Fig1a. Subjects initiated a trial by maintaining their gaze on a central, red fixation point for 750ms. After this fixation was completed, one of four contextual cues (see Fig1a inset) was centrally presented for 350ms. Subjects had previously learned that two of these cues instructed them to choose the side with the higher average bar-height ('ChooseHighTrial'), and the other two instructed them to choose the side with the lower average bar-height ('ChooseLowTrial'). Next, two black masks (width, 84px; height, 318px) were presented for 200ms in the location of the forthcoming bar stimuli. These were positioned either side of the fixation spot (6° visual angle from centre). Each bar stimulus was presented for 200ms, followed by a 50ms inter-stimulus-interval where only the fixation point remained on the screen. Once all of the bar stimuli had been presented, the mask stimuli returned for a further 200ms. There was then a post-stimulus delay (250-750ms, uniformly sampled across trials). Following this, the colour of the fixation point was changed to green (go cue), and two circular saccade targets appeared, one on each side of the screen where the bars had previously been presented. This cued the subject to indicate their choice by making a saccade to one of the targets. Once the subject reported their decision, there were two stages of feedback. Immediately following choice, the green go cue was extinguished, and the contextual cue was re-presented centrally, along with the average bar heights of the two series of stimuli previously presented. The option the subject chose was indicated by a purple outline surrounding the relevant bar placeholder (width, 3.8°; height, 10°). Following 500ms, the second stage of feedback began. The correct answer was indicated by a white outline surrounding the bar placeholder (width, 5.7°; height, 15°). On correct trials, the subject was rewarded for a length of time proportional to the average height of the chosen option (directly proportional on a 'ChooseHighTrial', negatively proportional on a 'ChooseLowTrial'). On incorrect trials, there was no reward. Regardless of the reward amount, the second feedback stage lasted 1200ms. This was followed by an inter-trial-interval (1.946+/-0.051 secs; for Standard Session, across all completed included trials). The inter-trial-interval duration was longer on 'ShortSampleTrials' than 'LongSampleTrials', in order for the trials to be an equal duration, and facilitate a similar reward rate between the two conditions.

Subjects were required to maintain central fixation from the fixation period until they indicated their choice. If the initial fixation period was not completed, or fixation was subsequently broken, the trial was aborted and the subject received a 3000ms timeout (Trials in standard sessions: Monkey A, 22.46%; Monkey H, 15.27%). On the following trial, the experimental condition was not repeated. If subjects failed to indicate their choice within 8000ms, a 5000ms timeout was initiated (Trials in standard sessions: Monkey A, 0%; Monkey H, 0%).

Experimental conditions were blocked according to the contextual cue and evidence length. This produced four block types (ChooseHighShortSampleTrial (H4), ChooseHighLongSampleTrial (H8), ChooseLowShortSampleTrial (L4), ChooseLowLongSampleTrial (L8)). At the start of each session, subjects performed a short block of memory-guided saccades (MGS) [61], completing 10 trials. Data from these trials are not presented in this report. Following the MGS block, the first block of decision-making trials was selected at random. After the subject completed 15 trials in a block, a new block was selected without replacement. Each new block had to have either the same evidence length or the same contextual cue as the previous block. After all four blocks had been completed, there was another interval of MGS trials. A new evidence accumulation start block was then randomly selected. As there were four block types, and either the evidence length or the contextual cue had to be preserved across a block switch, there were two 'sequences' in which the blocks could transition (i.e. H4→H8→L8→L4; or H4→L4→L8→H8, if starting from H4). Following the intervening MGS trials, the blocks transitioned in the opposite sequence to those used previously, starting from the new randomly chosen block. This block switching protocol was continued throughout the session. At the start of each block, the background of the screen was changed for 5000ms to indicate the evidence length of the forthcoming block. A burgundy colour indicated an 8 sample block was beginning, a teal colour indicated a 4 sample block was beginning.
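The block switching protocol described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the task code: the four block types are placed on a cycle in which neighbours share either the contextual cue or the evidence length, so the two 'sequences' are the two directions around that cycle. The names `CYCLE`, `block_round` and `session_blocks` are hypothetical, and choosing the first direction at random is an assumption, since the text only states that later rounds use the opposite sequence to the previous one.

```python
import random

# Neighbours on this cycle differ in exactly one attribute
# (cue H/L or evidence length 4/8), including L4 -> H4 when wrapping.
CYCLE = ["H4", "H8", "L8", "L4"]


def block_round(start, direction):
    """Return the four blocks of one round, starting at `start`.

    direction = +1 walks the cycle forwards (H4 -> H8 -> L8 -> L4),
    direction = -1 walks it backwards (H4 -> L4 -> L8 -> H8).
    """
    i = CYCLE.index(start)
    return [CYCLE[(i + direction * k) % len(CYCLE)] for k in range(len(CYCLE))]


def session_blocks(n_rounds, rng=None):
    """Generate block orders for a session, with MGS trials between rounds."""
    rng = rng or random.Random()
    direction = rng.choice([1, -1])  # assumption: first direction is random
    rounds = []
    for _ in range(n_rounds):
        start = rng.choice(CYCLE)    # new start block chosen at random
        rounds.append(block_round(start, direction))
        direction = -direction       # opposite sequence after each MGS interval
    return rounds


example_session = session_blocks(3, rng=random.Random(0))
```

Each round visits all four block types exactly once, which corresponds to selecting blocks without replacement under the constraint that one attribute is preserved at every switch.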

The heights of the bars on each trial were precisely controlled. On the majority of trials (Regular Trials, Completed trials in standard sessions: Monkey A, 76.67%; Monkey H, 76.23%), the heights of each option were generated from independent Gaussian distributions (Fig4a, b). There were two levels of variance for the distributions, designated as 'Narrow' and 'Broad'. The mean of each distribution, μ, was calculated as μ = 50 + Z*σ, where Z ∼ U(-0.25, 0.25), and σ was either 12 or 24 for narrow and broad stimuli streams respectively. The individual bar heights were then determined by X ∼ N(μ, σ). The trial generation process was constrained so the samples reasonably reflected the generative parameters. These restrictions required bar heights to range from 1 to 99, and the actual σ for each stream to be no more than 4 from the generative value. On any given trial, subjects could be presented with two narrow streams, two broad streams, or one of each. The evidence variability was therefore independent between the two streams. For post-hoc analysis (Fig4) we defined one stream as the 'Lower SD' option on each trial, and the other the 'Higher SD' option, based upon the sampled/actual σ.

A proportion of 'irrationality trials' were also specifically designed to elucidate the effects of evidence variability on choice, and whether subjects displayed primacy/recency biases [20]. These trials occurred in equal proportions within all four block types. Only one of these irrationality trial types was tested in each behavioural session.

In all of these conditions, the generated samples had to be within 4 of the generating σ. This was because bar heights were rounded to the nearest integer (due to the limited number of pixels on the computer monitor) after the generating procedure and the plot reflects the presented bar heights.

Half-half trials (Completed trials in standard sessions: Monkey A, 8.46%; Monkey H, 8.00%) probed the effect of temporal weighting biases on choice [20]. The heights of each option were generated using the same Gaussian distribution (X ∼ N(μ_HH, 12), where μ_HH ∼ U(40, 60)). This distribution was truncated to form two distributions: X_High {mean(X)-0.5*SD(X), ∞}, and X_Low {-∞, mean(X)+0.5*SD(X)}. On each trial, one option was designated 'HighFirst', where the first half of bar heights was drawn from X_High and the second half of bar heights drawn from X_Low. This process was also constrained so that the mean of samples drawn from X_High had to be at least 7.5 greater than those taken from X_Low.

to maintain subject motivation, the most difficult 'Regular' and 'HalfHalf' trials were not presented. Following the trial generation procedures described above, in pharmacological sessions these trials were additionally required to have >4 mean difference in evidence strength. Of the 'Narrow-Broad' trials, only 'Ambiguous' conditions were used; but no further constraints were applied to these trials. In some sessions, a small number of control trials were used, in which the bar heights for each option were fixed across all of the samples. All analyses utilised 'Regular', 'Half-Half', and 'Narrow-Broad' trials. Monkey H did not always complete sufficient trials once ketamine was administered. Sessions where the number of completed trials was fewer than the minimum recorded in the saline sessions were discarded (6 of 18 sessions). Following ketamine administration, Monkey A did not complete fewer trials in any session than the minimum recorded in a saline session.
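As a rough illustration of the 'Regular' trial generation described in this section, the following Python sketch draws one stimulus stream by rejection sampling. Several details are assumptions rather than statements from the text: Z is taken to be uniform on (-0.25, 0.25), the sample standard deviation uses `ddof=1`, the constraints are checked before rounding to integers, and the mean is drawn once per stream rather than once per attempt. The function name `generate_stream` and the `max_tries` guard are hypothetical.

```python
import numpy as np


def generate_stream(sigma, n_samples=8, rng=None, max_tries=10_000):
    """Draw one stream of bar heights for a 'Regular' trial.

    sigma: generative standard deviation (12 for 'Narrow', 24 for 'Broad').
    Returns an integer array of n_samples bar heights between 1 and 99.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Generative mean: mu = 50 + Z * sigma, with Z assumed uniform.
    mu = 50.0 + rng.uniform(-0.25, 0.25) * sigma
    for _ in range(max_tries):
        heights = rng.normal(mu, sigma, size=n_samples)
        in_range = heights.min() >= 1 and heights.max() <= 99
        # Sample SD must lie within 4 of the generative value.
        sd_ok = abs(heights.std(ddof=1) - sigma) <= 4
        if in_range and sd_ok:
            # Rounding happens after the constraints are checked.
            return np.rint(heights).astype(int)
    raise RuntimeError("no valid stream found; constraints may be too strict")


rng = np.random.default_rng(1)
narrow_stream = generate_stream(12, rng=rng)
broad_stream = generate_stream(24, rng=rng)
```

Because rounding is applied after the checks in this sketch, the standard deviation of the presented heights can fall slightly outside the 4-unit tolerance, which is consistent with the remark above about rounded bar heights.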

To assess decision-making accuracy during standard sessions, we initially fitted a psychometric function [14,29] to subjects' choices pooled across 'Regular' and 'Narrow-Broad' trials (Fig2a, b). This

The temporal weights of stimuli were calculated using logistic regression. This function defined the probability (P_L) of choosing the left option:

$$\mathrm{logit}(P_L) = \beta_0 + \sum_{n=1}^{N} \beta_n \,(L_n - R_n)$$

where β_0 is a bias term, β_n reflects the weighting given to the nth pair of stimuli (of N pairs), and L_n and R_n reflect the evidence for the left and right option at each time point.

Regression analysis was used to probe the influence of evidence mean and evidence variability on choice during the 'Regular' trials (Fig4d, 5f, 6c, 7f-h, 8d-f, Supp2d,g, Supp6c,h). This function defined the probability (P_L) of choosing the left option:

process (FigSupp 2e,h, Supp 3b, Supp 4b, Supp 5b, Supp 6d,i).

The goodness-of-fit of various regression models with combinations of the predictors in the full model (equation 6) was compared using a 10-fold cross-validation procedure (Supplementary Tables 1-4).

745
Trials were initially divided into 10 groups. Data from 9 of the groups was used to train each 746 regression model and calculate regression coefficients. The likelihood of the subjects' choices in the 747 left-out group (testing group), given the regression coefficients, could then be determined. The log-748 likelihood was then summed across these left-out trials. This process was repeated so that each of 749 the 10 groups acted as the testing group. The whole cross-validation procedure was performed 100 750 times, and the average log-likelihood values were taken. 751 To initially explore the time course of drug effects on decision-making, we plotted choice accuracy 752 (combined across 'Regular', 'Half-Half' and 'Narrow-Broad' trials) relative to drug administration 753 (Fig8a). Trials were binned relative to the time of injection. Within each session, choice accuracy was 754 estimated at every minute, using a 6-minute window around the bin centre. Accuracy was then 755 averaged across sessions. To further probe the influence of drug administration on decision-making, 756 we defined an analysis window based upon the time course of behavioural effects. All trials before the 757 time of injection were classified as 'pre-drug'. All trials beginning 5-30 minutes after injection were 758 defined as 'on-drug' trials. These trials were then analysed using the same methods as described for 759 the Standard sessions. 760 To quantify the effect of ketamine administration on the PVB index (Fig 8f, FigSupp 6c,h), we 761 performed a permutation test. Trials collected during ketamine administration were compared with 762 those collected during saline administration. The test statistic was calculated as the difference 763 between the PVB index in ketamine and saline conditions. For each permutation, trials from the two 764 sets of data were pooled together, before two shuffled sets with the same number of trials as the 765 original ketamine and saline data were extracted. 
Next, the PVB index was computed in each permuted set, and the difference between the two PVB indices calculated. The difference measure for each permutation was used to build a null distribution with 1,000,000 entries. The difference measure from the true data was compared with the null distribution to calculate a p-value.
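The two resampling procedures described above, the repeated 10-fold cross-validation and the permutation test, can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the analysis code used in the study: the function names, the random assignment of trials to groups, the `fit`, `loglik` and `statistic` callbacks, and the two-sided p-value are all hypothetical choices, and the default number of permutations is far smaller than the 1,000,000 used in the text.

```python
import numpy as np


def cross_validated_loglik(X, y, fit, loglik, n_folds=10, n_repeats=100, seed=0):
    """Repeated k-fold cross-validation of a regression model.

    `fit(X_train, y_train)` returns model parameters and
    `loglik(params, X_test, y_test)` returns the summed log-likelihood of
    the held-out choices; both are user-supplied (hypothetical) callbacks.
    """
    rng = np.random.default_rng(seed)
    n_trials = len(y)
    totals = np.empty(n_repeats)
    for r in range(n_repeats):
        # Assumption: trials are assigned to the groups at random.
        folds = np.array_split(rng.permutation(n_trials), n_folds)
        total = 0.0
        for k in range(n_folds):
            test_idx = folds[k]
            train_idx = np.concatenate(
                [folds[j] for j in range(n_folds) if j != k]
            )
            params = fit(X[train_idx], y[train_idx])
            total += loglik(params, X[test_idx], y[test_idx])
        totals[r] = total  # log-likelihood summed over all left-out trials
    return totals.mean()   # averaged over repeats of the whole procedure


def permutation_test(trials_a, trials_b, statistic, n_perm=10_000, seed=0):
    """Permutation test on the difference of a statistic between two trial sets.

    `statistic(trials)` maps an array of trials (first axis = trial) to a
    scalar, e.g. a PVB index; here it is left as a generic callback.
    """
    rng = np.random.default_rng(seed)
    trials_a = np.asarray(trials_a)
    trials_b = np.asarray(trials_b)
    observed = statistic(trials_a) - statistic(trials_b)
    pooled = np.concatenate([trials_a, trials_b])
    n_a = len(trials_a)
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(pooled)  # shuffles along the trial axis
        null[i] = statistic(shuffled[:n_a]) - statistic(shuffled[n_a:])
    # Assumption: two-sided comparison against the null distribution.
    p_value = np.mean(np.abs(null) >= abs(observed))
    return observed, null, p_value
```

Passing the statistic as a callback keeps the shuffling logic separate from the definition of the PVB index, so the same routine could be reused for any trial-wise summary measure.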

A biophysically-based spiking circuit model was used to replicate decision-making dynamics in a local association cortical microcircuit. The model was based on [21], with minor modifications from a previous study [34]. The current model had one further change, in the input representation of the stimulus, described in detail below.

The circuit model consisted of $N_E = 1600$ excitatory pyramidal neurons and $N_I = 400$ inhibitory interneurons, all simulated as leaky integrate-and-fire neurons. All neurons were recurrently connected to each other, with NMDA and AMPA conductances mediating excitatory connections, and GABA$_A$ conductances mediating inhibitory connections. All neurons also received background inputs, while selective groups of excitatory neurons (see below) received stimulus inputs. Both background and stimulus inputs were delivered as Poisson spike trains mediated by AMPA conductances.

Within the population of excitatory neurons were two non-overlapping groups of 240 neurons each. Neurons within the two groups received separate inputs reflecting the left and right stimulus streams. Neurons in the same group preferentially connected to each other (with a multiplicative factor $w_+ > 1$ applied to the connection strength), allowing integration of the stimulus input. The connection strength to any other excitatory neuron was reduced by a factor $w_- < 1$ in a manner which preserved the total connection strength. Due to lateral inhibition mediated by interneurons, excitatory neurons in the two different groups competed with each other. Inhibitory neurons, as well as excitatory neurons not in the two groups, were insensitive to the presented stimuli and were non-selective toward either choice or either neuron group.

Momentary bar-stimulus evidence was modelled as Poisson inputs (from an upstream sensory area) to the two groups of excitatory neurons (Fig 5a).
The mean rate of Poisson input for any group, $\mu$, scaled linearly with the corresponding stimulus evidence:

$$\mu = \mu_0 + \mu'(h - 50) \qquad (7)$$

where $h \in [0, 100]$ represented the momentary stimulus evidence, equal to the bar height in ChooseHigh trials, and 100 minus bar height in ChooseLow trials. $\mu_0 = 30$ was the input strength when $h = 50$, and $\mu' = 1$. For simplicity, we assumed each bar stimulus lasted 250ms, rather than 200ms with a subsequent 50ms inter-stimulus interval as in the experiment.

The circuit model simulation outputs spike data for the two excitatory populations, which are then converted to population activity, smoothed on a 0.001s time-step with a causal exponential filter. Specifically, for each spike of a given neuron, histogram bins at times before that spike receive no weight, while histogram bins at times after the spike receive a weight of $\frac{1}{\tau_{\text{filter}}} \exp\left(-\frac{\Delta t}{\tau_{\text{filter}}}\right)$, where $\Delta t$ is the time of the histogram bin after the spike, and $\tau_{\text{filter}} = 20$ms.

From the population activity of the two excitatory populations, a choice is selected 2s after stimulus offset, based on the population with higher activity. Stimulus inputs in general drive categorical, winner-take-all competition, such that the winning population ramps up its activity until it reaches a high-activity attractor state (>30Hz, compared with a baseline firing rate of approximately 1.5Hz), while suppressing the activity of the other population below baseline via lateral inhibition (Fig 5b). It is also possible that neither population reaches the high-activity state. In that case, both populations remain at the spontaneous state with similarly low activities, such that the decision readout is random.

In addition to the control model, three perturbed spiking circuit models were considered [25,34]: lowered E/I balance, elevated E/I balance, and sensory deficit.
E/I perturbations were implemented through hypofunction of NMDA-Rs (Fig 7a), as this is a leading hypothesis in the pathophysiology of schizophrenia [4,5,10]. NMDA-R antagonists such as ketamine also provide a leading pharmacological model of schizophrenia [7,11]. NMDA-R hypofunction on excitatory neurons (reduced $G_{E \to E}$) resulted in lowered E/I ratio, whereas NMDA-R hypofunction on interneurons (reduced $G_{E \to I}$) resulted in elevated E/I ratio due to disinhibition [34]. Sensory deficit was implemented as weakened scaling of external inputs to stimulus evidence, resulting in reduced $\mu'$. For the exact parameters, the lowered E/I model reduced $G_{E \to E}$ by 1.75%, the elevated E/I model reduced $G_{E \to I}$ by 3.5%, and the sensory deficit model had $\mu' = 0.74$.

Each of the four circuit models completed 94,000 'Regular' trials, in which both streams were narrow in 25% of the trials, both streams were broad in 25% of the trials, and one stream was narrow and one broad in 50% of the trials. All trials were generated in the same way as in the standard session experiments. The control model also completed 47,000 standard session Narrow-Broad trials. The same permutation test described earlier for comparing PVB index between ketamine and saline conditions was also used to quantify whether the perturbed circuit models had different PVB indices relative to the control model (Fig 7h).
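The conversion of spikes to smoothed population activity, and the activity-based choice readout described earlier in this section, can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the pooling of all spike times from one population into a single array, the normalisation by population size, the treatment of a bin that coincides exactly with a spike, and the tie-breaking rule are all assumptions made for the sketch.

```python
import numpy as np


def causal_exp_filter(spike_times, n_neurons, t_max, dt=0.001, tau=0.020):
    """Smooth pooled spike times (in seconds) into a population rate.

    Each spike adds (1/tau) * exp(-delta_t / tau) to every time bin at or
    after the spike, and nothing to bins before it (causal filter).
    """
    t_bins = np.arange(0.0, t_max, dt)
    rate = np.zeros_like(t_bins)
    for t_spk in spike_times:
        delta = t_bins - t_spk
        after = delta >= 0  # assumption: a bin exactly at the spike counts
        rate[after] += np.exp(-delta[after] / tau) / tau
    # Assumption: divide by population size to obtain a per-neuron rate.
    return t_bins, rate / n_neurons


def read_out_choice(rate_a, rate_b, t_bins, t_readout):
    """Return 0 if population A is more active at t_readout, else 1.

    Ties go to population B, an arbitrary choice for this sketch.
    """
    idx = min(int(np.searchsorted(t_bins, t_readout)), len(t_bins) - 1)
    return 0 if rate_a[idx] > rate_b[idx] else 1
```

Because the kernel integrates to one over time, each spike contributes one spike's worth of activity to the smoothed trace, so the output can be read in spikes per second per neuron.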

The current spiking circuit model was mathematically reduced to a mean-field model, as outlined in [62], in the same manner as the reduction from [21] to [22]. The mean-field model consisted of two variables, namely the NMDA-R gating variables of the two groups of excitatory neurons, which represented the integrated evidence for the two choices. Using phase-plane analysis, the mean-field model provided an intuitive explanation for the pro-variance bias (see Fig 6).

The mean-field model completed 94,000 standard session 'Regular' trials, in the same manner as the circuit models.
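To give a concrete sense of what a two-variable mean-field model of this kind looks like, the sketch below integrates a pair of coupled NMDA gating variables with forward Euler. It is illustrative only and is not the reduction used here: the equations and parameter values are borrowed from the widely used two-variable reduced decision model of Wong and Wang (2006), and whether they match the present model is an assumption. The stimulus is held constant within a trial, noise is omitted so that the run is deterministic, and the function names are hypothetical.

```python
import numpy as np

# Assumed parameters, taken from the Wong & Wang (2006) reduced model.
A, B, D = 270.0, 108.0, 0.154      # input-output function (Hz/nA, Hz, s)
GAMMA, TAU_S = 0.641, 0.1          # gating gain and NMDA time constant (s)
J_SELF, J_CROSS = 0.2609, 0.0497   # self-excitation / cross-inhibition (nA)
I_BG = 0.3255                      # background current (nA)
J_EXT = 5.2e-4                     # stimulus coupling (nA per Hz)


def firing_rate(x):
    """Population firing rate (Hz) as a function of total input current (nA)."""
    y = A * x - B
    return y / (1.0 - np.exp(-D * y))


def simulate(h1, h2, t_max=2.0, dt=5e-4, s_init=0.1):
    """Euler-integrate the two gating variables for constant evidence h1, h2.

    Input rates follow the linear mapping of the text: 30 at h = 50, slope 1.
    Returns an array of shape (n_steps + 1, 2).
    """
    n_steps = int(round(t_max / dt))
    s = np.full((n_steps + 1, 2), s_init)
    drive = J_EXT * np.array([30.0 + (h1 - 50.0), 30.0 + (h2 - 50.0)])
    for k in range(n_steps):
        s1, s2 = s[k]
        recurrent = np.array([J_SELF * s1 - J_CROSS * s2,
                              J_SELF * s2 - J_CROSS * s1])
        x = recurrent + I_BG + drive
        ds = -s[k] / TAU_S + (1.0 - s[k]) * GAMMA * firing_rate(x)
        s[k + 1] = np.clip(s[k] + dt * ds, 0.0, 1.0)
    return s


trajectory = simulate(h1=60.0, h2=40.0)
```

Plotting `trajectory[:, 0]` against `trajectory[:, 1]` gives a single path in the phase plane of the two gating variables, which is the space in which the phase-plane analysis mentioned above operates.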

Stimuli generation and data analysis for the experiment were performed in MATLAB. The spiking circuit model was implemented using the Python-based Brian2 neural simulator [63], with a simulation time step of 0.02ms. Further analyses for both experimental and model data were completed using custom-written Python and MATLAB code. All code is available from the authors upon reasonable request.