Statistical context dictates the relationship between feedback-related EEG signals and learning

Learning should be adjusted according to the surprise associated with observed outcomes but calibrated according to statistical context. For example, when occasional changepoints are expected, surprising outcomes should be weighted heavily to speed learning. In contrast, when uninformative outliers are expected to occur occasionally, surprising outcomes should be less influential. Here we dissociate surprising outcomes from the degree to which they demand learning using a predictive inference task and computational modeling. We show that the P300, a stimulus-locked electrophysiological response previously associated with adjustments in learning behavior, does so conditionally on the source of surprise. Larger P300 signals predicted greater learning in a changing context, but less learning in a context where surprise was indicative of a one-off outlier (oddball). Our results suggest that the P300 provides a surprise signal that is interpreted by downstream learning processes differentially according to statistical context in order to appropriately calibrate learning across complex environments.

Acknowledgements:

We would like to thank Julie Helmers and Andrea Mueller for their help collecting EEG and behavioral data. This work was funded by NIH grants F32MH102009 and K99AG054732 (MRN), NIMH R01 MH080066-01 and NSF Proposal #1460604 (MJF). RB was supported by a Promos travel grant from the German Academic Exchange Service (DAAD). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Competing interests:

The authors have no financial or non-financial conflicts of interest related to this work.

[...] an observed prediction error promotes measurable behavioral updating. While changing environments require increased learning in the face of surprising information, stable environments with outliers ("oddballs") dictate less learning from surprising information (4). People are capable of this type of robust learning rate adjustment that deemphasizes surprising information (3,4,13), yet the learning signals measured under such conditions do not correspond directly to those observed in changing environments. Most notably, a number of candidate learning signals measured through fMRI do not reflect learning rate when considering a broader set of statistical contexts (4).

However, prior studies on EEG correlates of learning seem to favor the idea that a late, stimulus-locked positivity, referred to as the P300, tracks learning in a broader range of statistical contexts. The central parietal component of the P300 (P3b) reflects surprise (14) and relates to learning (15) even after controlling for the degree of surprise in changing environments (9,10). In a stationary environment where integration of sequential samples is required to make a subsequent decision, a late posterior positivity, reminiscent of the P300, predicts the degree to which a particular sample influences the subsequent decision (16). Interestingly, within this particular task, more surprising outcomes tended to exert less influence on decisions (3,13), suggesting that this late positivity might provide a general learning or updating signal, irrespective of statistical context. This idea would be in line with a prominent theory of P3b function, which emphasizes its role in updating context representations, sometimes defined in terms of items stored in working memory (17-20).

Here we tested the idea that the P3b provides a general learning signal that is independent of the statistical context. In particular, we measured learning behavior using a modified predictive inference task and normative learning model and examined how learning behavior and surprise related to evoked potentials measured through EEG. We found that people are capable of contextually adjusting learning in response to surprise: they tended to learn more from surprising outcomes when those outcomes were indicative of changepoints, but learned less from surprising outcomes when those outcomes were indicative of an oddball. Outcome evoked potentials reminiscent of a parietal P300 were related to surprising events irrespective of context. The magnitude of this P300 response on a given trial positively predicted learning in the presence of changepoints, but negatively predicted learning in the presence of oddballs. These relationships persisted even when controlling for variability in learning behavior that could be explained by the best behavioral model. Taken together, these findings suggest that the P300 does not naively reflect increased behavioral updating, but may play a role in adaptively increasing or decreasing learning in response to surprising information, depending on the statistical context.

Results

We used EEG to measure electrophysiological signatures of feedback processing while participants performed a modified predictive inference task (2) designed to dissociate surprise from learning. Predictions were made in the context of a video game that required participants to place a shield at a location on a circle in order to block cannonballs that would be fired from a cannon located at the center of the circle (Fig 1A). Surprise and learning were manipulated independently using two different task conditions.
In the oddball condition, the aim of the cannon drifted slowly from one trial to the next (Fig 1B, dotted line) and cannonball locations were distributed around the point of cannon [...]

Behavior of human participants and normative model

In both conditions, participants were instructed to place a shield on each trial in order to maximize the chances of blocking the upcoming cannonball (Figure 1B&C, orange line). However, behavior differed qualitatively in these two conditions, which can be observed clearly in the example participant data in Figure 1. In particular, shield placements were not updated in response to extreme outcomes in the oddball condition (oddballs; Fig 1B) but were updated dramatically in response to extreme outcomes in the changepoint condition (changepoints; Fig 1C).

To quantitatively analyze the differences between the two task conditions, we extended a previously developed normative learning model (2,7). The model approximates optimal inference using an error-driven learning rule by adjusting learning from trial to trial according to two latent variables. The first latent variable tracks the probability with which the most recent outcome was generated from an unexpected generative process (oddball probability in Fig 1D; changepoint probability in Fig 1E), whereas the second latent variable tracks the model's uncertainty about the true cannon aim (Fig 1D&E; uncertainty) [...] shield position fairly minimal on trials that include a spike in oddball probability (Fig 1B,D), but fairly large on trials that include a spike in changepoint probability (Fig 1C,E).

The normative model also makes quantitative prescriptions for how learning should be adjusted according to surprise differentially in the changepoint and oddball conditions. The surprise of a given outcome can be measured crudely through the degree to which a cannonball location differed from that which was predicted (e.g., the shield position). Larger absolute prediction errors indicate a higher degree of surprise, and higher oddball or changepoint probabilities depending on the task condition. Learning in this task can be measured through the degree to which a participant adjusts the shield position in response to a given prediction error (2), and a fixed rate of learning would correspond to a straight line mapping each prediction error onto a corresponding shield update, where the slope of the line can be thought of as the learning rate (Fig 2C, gray lines). The normative learning model does not prescribe a fixed learning rate across all levels of surprise; instead it prescribes higher learning rates for more surprising outcomes in the changepoint condition (Fig 2C, orange) and lower learning rates for more surprising outcomes in the oddball condition (Fig 2C, blue).

[...] were variable from one trial to the next (Fig 2D).
To summarize the degree to which updating behavior of individual subjects was contingent on key task variables, we constructed a linear regression model that described trial-by-trial updates in terms of prediction errors as well as key task variables thought to modulate the degree to which prediction errors are translated into updates (Fig 2E), including condition (changepoint versus oddball block), surprise (as measured by changepoint or oddball probability estimates from the normative model), and their multiplicative interaction (capturing the degree to which learning is increased for surprising outcomes in the changepoint context, but decreased for surprising outcomes in the oddball context). As expected, prediction error coefficients were positive, capturing a tendency for participants to update shield position toward the most recent cannonball position (Fig 1F, [...]

We took a data-driven approach to identify electrophysiological signatures of feedback processing. First, we regressed feedback-locked EEG data collected simultaneously with task performance onto an explanatory matrix that included separate binary variables reflecting changepoint and oddball trials, amongst other terms (Fig 3A, left). Spatiotemporal maps for changepoint and oddball coefficients were combined to create a surprise contrast (changepoint + oddball) and a learning contrast (changepoint - oddball) for each subject. Contrasts were aggregated across subjects to create a map of t-statistics (Fig 3A, right), and spatiotemporal clusters of electrode/timepoints exceeding a cluster-forming threshold were tested against a permutation distribution of cluster mass to identify spatially and temporally organized fluctuations in voltage that related to task variables.
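The behavioral regression described above (trial-by-trial updates explained by prediction error and its modulation by condition, surprise, and their interaction) can be sketched as follows. This is a minimal illustration on synthetic data, not the analysis code used in the study: the function name `fit_update_regression`, the +1/-1 coding of condition, and all simulated values are assumptions made for the example.

```python
import numpy as np

def fit_update_regression(pe, condition, surprise, update):
    """Ordinary least-squares fit of shield updates onto prediction error
    and its interactions with condition and surprise.

    condition: +1 for changepoint blocks, -1 for oddball blocks (assumed coding).
    surprise:  changepoint/oddball probability in [0, 1].
    Returns coefficients in the column order used below.
    """
    X = np.column_stack([
        np.ones_like(pe),           # intercept
        pe,                         # baseline learning rate
        pe * condition,             # PE x condition
        pe * surprise,              # PE x surprise
        pe * surprise * condition,  # PE x surprise x condition
    ])
    beta, *_ = np.linalg.lstsq(X, update, rcond=None)
    return beta

# Synthetic demo: learning rate rises with surprise in changepoint blocks
# and falls with surprise in oddball blocks.
rng = np.random.default_rng(0)
n = 2000
pe = rng.normal(0.0, 30.0, n)            # prediction errors (degrees)
condition = rng.choice([-1.0, 1.0], n)   # block type
surprise = rng.uniform(0.0, 1.0, n)      # model-derived surprise
true_lr = 0.4 + 0.3 * surprise * condition
update = true_lr * pe + rng.normal(0.0, 1.0, n)

beta = fit_update_regression(pe, condition, surprise, update)
print(np.round(beta, 3))
```

In this sketch a positive coefficient on the last regressor corresponds to the pattern described in the text: more learning from surprising outcomes in the changepoint context and less in the oddball context.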

When applied to the surprise contrast, this procedure yielded a large number of significant clusters distributed across electrodes and timepoints (Fig 3C). Two clusters of positive coefficients occurring 350 to 700 ms after onset of the cannonball location were of particular interest, given the consistency of their timing and direction with the canonical P300 response. Examining the spatial distribution of coefficients during this period reveals an early frontocentral locus [...] temporal profiles of our clusters were consistent with what has been referred to in previous literature as the P300, we will refer to the clusters peaking at 494 and 670 ms as early and late components of the P300, respectively.

In contrast to the EEG signature of surprise, which included a robust and extended P300 response, the only signals identified by the learning contrast (changepoint - oddball) were early (peak at 158 ms) and transient (Fig S3-1).

Behavioral relevance of the P300

Competing theories posit different functional roles for the signal underlying the P300. In particular, some theories suggest that the P300 reflects a general surprise signal, whereas others attribute a more specific role in accumulating information, for example about the current state of the world. To test how early and late P300 components may relate to learning behavior in our task, we extracted trial-to-trial measures of these components by taking the dot product of the cluster t-map and each single trial ERP (Fig 4A; (23)). The dot product indexes the degree to which a single trial ERP displays the profile of a given spatiotemporal cluster, thereby allowing us to test the degree to which the measured signal on any given trial might relate to behavior. We then examined how trial-to-trial behavioral updates in shield position related to these single trial EEG signal strengths using a regression model similar to that employed in the behavioral analysis (Fig 4B). The regression model included two key terms to characterize the influence of 1) the multiplicative interaction of prediction error with the EEG signal strength, and 2) the interaction between prediction error, EEG signal strength and condition. The first EEG-based term provided a measure of the relationship between learning and the P300 that was independent of condition, and thus allowed us to test the prediction that the P300 reflects a direct learning signal (Fig 4C). The second EEG-based term provided a measure of the relationship between learning and the P300 that depended on condition (conditional learning), and thus allowed us to test the prediction that any learning impact of the P300 is bidirectionally sensitive to the source of surprise (Fig 4D).
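The single-trial procedure described above (a dot product between a cluster t-map and each trial's ERP, followed by a regression with EEG-by-prediction-error terms) can be sketched as follows. This is an illustrative example on synthetic data under stated assumptions: the array shapes, the z-scoring of EEG strength, the +1/-1 condition coding, and the function names are hypothetical and not taken from the study's code.

```python
import numpy as np

def single_trial_strength(trials, t_map):
    """Dot product of each single-trial ERP with a cluster t-map.

    trials: array (n_trials, n_channels, n_times)
    t_map:  array (n_channels, n_times), assumed zero outside the cluster
    Returns one scalar signal strength per trial.
    """
    return np.tensordot(trials, t_map, axes=([1, 2], [0, 1]))

def eeg_learning_regression(pe, condition, eeg, update):
    """Regress updates on PE, PE x condition, PE x EEG, PE x EEG x condition."""
    z = (eeg - eeg.mean()) / eeg.std()   # standardize EEG strength (assumption)
    X = np.column_stack([
        np.ones_like(pe),
        pe,
        pe * condition,
        pe * z,               # "direct learning" term
        pe * z * condition,   # "conditional learning" term
    ])
    beta, *_ = np.linalg.lstsq(X, update, rcond=None)
    return beta

# Synthetic demo: a latent trial amplitude scales a fixed spatiotemporal template.
rng = np.random.default_rng(1)
n_trials, n_channels, n_times = 1000, 8, 50
template = rng.normal(size=(n_channels, n_times))
amplitude = rng.normal(size=n_trials)
trials = amplitude[:, None, None] * template + rng.normal(
    size=(n_trials, n_channels, n_times))

strength = single_trial_strength(trials, template)

pe = rng.normal(0.0, 30.0, n_trials)
condition = rng.choice([-1.0, 1.0], n_trials)   # +1 changepoint, -1 oddball
update = (0.4 + 0.2 * amplitude * condition) * pe + rng.normal(0.0, 1.0, n_trials)

beta = eeg_learning_regression(pe, condition, strength, update)
print(np.round(beta, 3))
```

In this toy setup the simulated signal has no condition-independent relationship with learning, so the fourth coefficient is near zero while the fifth is positive, mirroring the distinction between the direct and conditional learning terms.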
[...] derived from the regression model show that higher P300 signal strength predicts more learning in the changepoint condition (Fig 4E, orange), but less learning in the oddball condition (Fig 4E, blue). Thus, there was a systematic relationship between P300 and learning, but that relationship was oppositely modulated by the task condition and hence the inferred source of surprise.

The relationship between the P300 and participant learning behavior persisted even after controlling for all known sources of variability in learning behavior. In a model of shield updating behavior that included predictions from the behavioral model described previously (Fig 2E) [...]

We applied the same trial-by-trial behavioral analysis to the spatiotemporal clusters identified in our learning (changepoint - oddball) contrast and did not find systematic relationships between EEG signals and learning behavior (ps for all coefficients and spatiotemporal clusters > 0.07; Fig S4-1) even when predictions from our behavioral model (Fig 2E) were not included in the analysis.

[...]

The brain receives a steady stream of sensory inputs, but these inputs differ dramatically from moment to moment in the degree to which they should affect ongoing inferences about the world. People and animals do not treat each datum in this stream the same, and instead tend to rely more heavily on some pieces of information than others. Identifying the mechanisms through which these adjustments occur could be an important step toward understanding why learning occurs more rapidly in some domains or for some people, yet our understanding of these mechanisms has been heavily conditioned on specific statistical contexts, namely changing environments in which the degree to which one should learn from information is closely coupled to the surprise associated with it. Here we examined how relationships between learning and a specific brain signal, the P300 evoked EEG potential, depend on the statistical context in which they are measured.

We show that the P300 relates systematically to learning, but that the direction of this relationship depends critically on the statistical context.
In a context where surprising events indicated changepoints (Fig 1C,E) and participants learned more from surprising information (Fig 2), larger P300 responses predicted increased learning (Fig 4). In contrast, in a context where surprising events indicated oddballs (Fig 1B,D) and participants deemphasized surprising information (Fig 2), larger P300 responses predicted reduced learning (Fig 4).

These context-dependent predictive relationships explained variance in learning beyond what could be captured through computational modeling of behavior alone (Fig 5), suggesting that the P300 signal may be involved in adjustments of learning rate, but does so by mediating the subjective response to surprise, rather than translating surprise into a conditionally appropriate learning signal.

Implications for theories of P300 function

Our findings are consistent with a number of studies that have suggested the P300 is related to surprise (9,14,17,24), but extend them by demonstrating the role of the signal in controlling the degree to which new information affects updated beliefs. In contrast, our results are inconsistent with standard interpretations of the context updating theory of the P300 (17-20). If the P300 signal controlled the degree to which new information was loaded into working memory, one would expect a consistent positive relationship between the P300 and learning across conditions (Fig 4C), but our results reveal that this relationship differed markedly depending on the statistical context (Fig 4F,G).

However, as is the case with many verbal theories, predictions offered by the context updating theory depend critically on how specific concepts are linked to actual mechanistic processes. If the definition of context were changed to reflect the process that gave rise to the outcome (e.g., normal, changepoint, or oddball), for example, and we assume that participants expected each trial to be normal, then a context updating signal could account for our data (as more confident recognition of changepoints should lead to more learning, but more confident recognition of oddballs should lead to less learning). Thus, our results constrain potential interpretations of the context updating theory, although they do not falsify the theory altogether.

Similarly, our results could also be viewed as constraining more recent theories about P300 signaling. One more recent theory posits that the event-locked central parietal positivity reflects accumulated evidence for a particular decision or course of action (25,26). When accumulated evidence is framed in terms of the action ultimately executed (e.g., shield placement), one might extrapolate to predict that the P300 would predict higher learning in both contexts, which is not what we observed (Fig 4F&G). Nonetheless, it is difficult to extrapolate decision variables to our continuous task, and there are other mechanistic schemes in which an evidence accumulation signal over a binary decision categorizing outcome type (normal versus oddball or changepoint) might give rise to our observed results. Such an explanation would also call for response inhibition to prevent premature responding before the default category (e.g., non-oddball trial) was overturned, offering a potential link to another prominent theory of P300 function (24,27).
Nonetheless, our data do not arbitrate between these theories, and instead highlight their implications for learning when mechanistic interpretations are refined and applied to our task and data.

Neural representations of surprise and updating

A key question that has motivated a number of recent studies is how the brain represents surprise differently from the belief updating it sometimes prescribes. Under most conditions, the degree of surprise is tightly linked to the update that is required. However, recent fMRI studies have exploited cued updating paradigms (11), irrelevant stimulus dimensions (28,29), and complementary statistical contexts (4) in order to tease apart neural representations of surprise and updating. While there are trends that seem to generalize across task boundaries (for example, dorsal anterior cingulate cortex (dACC) reflecting updating in cued updating and irrelevant stimulus dimension paradigms (11,29)), there is also a good deal of inconsistency across different tasks in terms of the roles of specific signals. For example, even though BOLD responses in dACC were identified as reflecting updating in two studies, they were shown to represent surprise in another (4), and manipulations of statistical context failed to reveal any brain regions that provide a pure updating signal (4).

One possible explanation for this discrepancy is that the component processes of updating and non-updating might overlap in some specific paradigms. For example, the oddball outcomes that led to reduced learning in our paradigm and that of d'Acremont & Bossaerts were dissimilar to all previous outcomes and indistinguishable on other feature dimensions (in contrast to (11)). Thus, while these outcomes do not contain information pertinent to ongoing beliefs about future outcomes, they did contain information critical for perception, namely that prior expectations should not be used to bias their perceptual representations (30). Interestingly, recent work has suggested that people dynamically adjust the degree to which percepts are biased using systems, including the pupil-linked arousal system, that are closely linked to the systems implicated in adjusting learning rate (6,30-34). Thus, one possible explanation for the inconsistency in previous studies attempting to dissociate surprise from updating is that these studies have differed in the degree to which they inadvertently manipulated systems for controlling perceptual biases.

As in the previous fMRI study relying on statistical context to dissociate learning from surprise (4), our EEG results revealed a large number of signals related to surprise and no signals that convincingly reflected learning rate in a context-independent manner. This comes as somewhat of a surprise given previous work identifying EEG signals analogous to a late P300 component reflecting surprise, predicting learning and influence on choice even in paradigms where this influence was unrelated to surprise (3,9,10,15,16). In line with previous work from fMRI studies, we interpret the differences in our results from what might have been predicted based on previous work as pertaining to the unique strategy we employed for dissociating learning from surprise through the use of different statistical contexts.

Mechanisms of learning rate adjustment

Our results, particularly when taken in the context of previous studies examining how the brain adjusts learning in accordance with surprise, constrain possible models of learning rate adjustment in the brain. We show that the P300 signal, which positively predicts learning in changing environments (Fig 4E), also negatively predicts learning in a context with infrequent statistical outliers (Fig 4E). Thus, in a most basic sense, our results suggest that the P300 signal reflects an early contribution to learning rate adjustment, and that this signal is untangled according to statistical context at some downstream stage of processing. The lack of robust ERP correlates of direct learning signals (Fig S3-1 & S4-1) suggests that this downstream process does not have a task-locked electrophysiological signature.
One potential mechanism for learning rate adjustment that fits well with these constraints is the notion that adjustments in learning might be implemented through flexible replacement of state representations (35-37). Learning rate adjustment is adaptive in changing environments because it can effectively partition data relevant to the current predictive context from data that are no longer relevant to prediction (21,22). One possible implementation of this partitioning would be to change the active state representations that serve as the substrate for contextual associations. Recent work has identified signals in OFC, a region implicated in representations of latent states (38), that change more rapidly during periods of rapid learning (39). If this is indeed the implementation through which learning rate adjustments occur, observed learning rate signals might actually signal the need to adjust the representation of the latent state.

Interestingly, replacement of the active latent state, or partitioning of data more generally, might also be an effective way to implement the decreased learning observed in response to surprising observations in the oddball condition of our task. In the case of an oddball, one strategy would be to recognize the oddball as having been generated by an alternative causal process (e.g., oddball distribution) and to attribute learning to a latent representation of this process (40). Under such conditions, implementation would require a surprise signal that reflects the relevance of this oddball latent state. After the new observation is attributed to the oddball context, the system would require a transition back into the original "non-oddball" state in order to make a prediction that is unaffected by the most recent oddball outcome. The more effectively surprise is recognized and responded to through state changes (e.g., the stronger the surprise signal), the more effectively this implementation would partition an oddball observation from ongoing beliefs about the standard generative process, and therefore the smaller learning rates would be. Thus, one mechanistic interpretation of the P300 results might be that the P300 provides a partitioning signal that results in transitions in the internal state representation, which can either increase or decrease learning depending on the statistical context.

Confirming our proposed mechanistic interpretation of these results would require future studies more closely relating P300 signals to purported state representations (39). Furthermore, given that our study relied completely on computational modeling and correlations with behavior, our results raise important questions as to whether the observed associations could be manipulated directly pharmacologically or through biofeedback paradigms.
Thus, our work provides new insight into the underlying mechanisms of learning rate adjustment and the role of the P300 in this process, but leaves many unanswered questions to be addressed in future research.

Methods

Participants

Participants were recruited from the Brown University community: n = 37, 21 female, mean age = 20.2 (SD = 3.1, range = 18-36). Behavioral data from all participants were included in behavioral analyses. Data from 12 participants were excluded from EEG analysis due to low data quality (> 25% of epochs rejected during preprocessing). Thus, 37 participants were included in the behavioral analyses and 25 participants were included in the EEG analyses. All human subject procedures were approved by the Brown University Institutional Review Board and conducted in agreement with the Declaration of Helsinki.

Cannon Task

Participants performed a modified predictive inference task programmed in Matlab (The Mathworks, Natick, MA), using the Psychtoolbox-2 (http://psychtoolbox.org/) package. The task was based on predictive inference tasks in which participants are asked to predict the next in a series of outcomes (2,6,7), but differed from previous such tasks in the following ways: 1) the outcomes were generated from both changepoint and oddball processes to dissociate learning from surprise, 2) information necessary for performance evaluation was not available at time of outcome so that signals related to belief updating could be dissociated from valenced performance evaluation signals, 3) the task space was circular, and 4) the generative process was cast in terms of a cannon shooting cannonballs.

Participants were instructed to place a shield at some position along a circle subtending 5 degrees of visual angle in order to maximize the chances of catching a cannonball that would be shot on that trial (Fig 1A). During an instructional training period, the generative process that gave rise to cannonball locations was made explicit to participants. During this phase, participants were shown a cannon in the center of the screen. On each trial, a cannonball would be "shot" from that cannon with some angular variability (Von Mises distributed "Noise", concentration = 10 degrees). A key manipulation in our design was how the aim of the cannon evolved from one trial to the next. The cannon would either 1) remain stationary on the majority of trials and re-aim to a random angle with an average hazard rate of 0.14 (changepoint condition) or 2) change position slightly from one trial to the next according to a Von Mises distributed random walk with mean zero and concentration 30 degrees (oddball condition). In the changepoint condition, all cannonballs were displayed as originating at the cannon in the center of the circle, whereas in the oddball condition a small fraction (0.14) of trials were oddballs, in which the cannonball location was sampled uniformly across the entire circle and the cannonball appeared without a trajectory.
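The two generative processes described above can be illustrated with a short simulation. This is a hedged sketch, not the task code: the text reports the noise and drift concentrations "in degrees", and how those map onto a von Mises concentration parameter is not specified here, so `noise_kappa` and `drift_kappa` below are hypothetical placeholders passed directly as concentrations. Angles are handled in radians, and the function name and return format are assumptions.

```python
import numpy as np

def wrap(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + np.pi) % (2 * np.pi) - np.pi

def simulate_cannon(n_trials, condition, event_rate=0.14,
                    noise_kappa=10.0, drift_kappa=30.0, seed=0):
    """Simulate cannon aims and cannonball outcomes for one condition.

    condition: "changepoint" (aim re-sampled uniformly with probability
               event_rate, otherwise stationary) or "oddball" (aim drifts
               every trial; outcome uniform with probability event_rate).
    Returns (aims, outcomes, events), with events marking changepoint or
    oddball trials.
    """
    rng = np.random.default_rng(seed)
    aim = rng.uniform(-np.pi, np.pi)
    aims = np.empty(n_trials)
    outcomes = np.empty(n_trials)
    events = np.zeros(n_trials, dtype=bool)
    for t in range(n_trials):
        events[t] = rng.random() < event_rate
        if condition == "changepoint":
            if events[t]:
                aim = rng.uniform(-np.pi, np.pi)      # cannon re-aims
            outcomes[t] = rng.vonmises(aim, noise_kappa)
        elif condition == "oddball":
            aim = wrap(aim + rng.vonmises(0.0, drift_kappa))  # slow drift
            if events[t]:
                outcomes[t] = rng.uniform(-np.pi, np.pi)      # oddball
            else:
                outcomes[t] = rng.vonmises(aim, noise_kappa)
        else:
            raise ValueError("condition must be 'changepoint' or 'oddball'")
        aims[t] = aim
    return aims, outcomes, events

# 240 trials per condition, matching the four blocks of 60 trials in the text.
cp_aims, cp_outcomes, cp_events = simulate_cannon(240, "changepoint")
ob_aims, ob_outcomes, ob_events = simulate_cannon(240, "oddball")
```

The key contrast is visible in the two branches: a surprising outcome in the changepoint branch follows a lasting change in `aim`, whereas a surprising outcome in the oddball branch leaves `aim` essentially where it was.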

After completing the instructional training, in which the generative process was fully observable, participants were asked to perform the same basic task without being able to see the cannon. In this experimental phase participants were forced to use knowledge of the generative structure gained during training, along with the sequence of prior cannonball locations, in order to infer the aim of the cannon and to inform shield placement. Participants completed four blocks of 60 trials for each task condition (changepoint and oddball) in an order randomized across participants. The 240 experimental trials for each condition always followed the instructional training period for that condition in order to minimize ambiguity over which generative structure was giving rise to the experimental outcomes.

On each trial of the experimental task, participants would adjust the position of the shield through key presses (starting at the shield position from the previous trial) until they were satisfied with its location (Fig 1A; prediction phase). After participants locked in their prediction (through a key press) there was a 500 ms delay and then the cannonball location was revealed for 500 ms (Fig 1A; outcome phase). The cannonball then disappeared for 1000 ms before it reappeared, along with a full depiction of the participant's shield (Fig 1A; shield phase). The shield was always centered on the position indicated by the participant during the prediction phase, but differed in size from one trial to the next in a random and unpredictable fashion that ensured subjects could not predict whether they would successfully "catch" the cannonball during the outcome phase. Thus, information provided during the outcome phase provided all necessary information to update beliefs about the cannon aim, but did not contain sufficient information to determine whether the cannonball would be successfully caught on the trial. In addition to trial feedback provided during the shield phase, participants were also provided information about their performance at the end of each block that included the fraction of cannonballs that were caught. Participants were paid an incentive bonus at task completion that was based on the number of cannonballs that were caught.
Computational Model

Optimal inference in the changepoint condition would require considering all possible durations of stable cannon position (21,22) but can be approximated by collapsing the mixture of predictive distributions expected to arise from this optimal solution into a single Gaussian distribution. This approximation captures the posterior probability distribution over cannon locations, achieves near-optimal inference, reduces to an error-driven learning rule in which learning rate is adjusted from moment to moment according to environmental statistics, and provides a detailed account of human behavior (2,7). Similarly, the ideal observer for the oddball generative process would require tracking the predictive distributions and posterior probabilities associated with each possible sequence of oddball/non-oddball trials that could have preceded the time step of interest. As in the changepoint condition, this algorithm can be simplified by approximating the set of all possible predictive distributions with a single Gaussian distribution, leading to an error-driven learning rule in which learning rate is adjusted dynamically from trial to trial. This allows us to derive normative prescriptions for learning in both conditions.

While the normative model for the changepoint condition has been described elsewhere (7), the analogous model for the oddball condition has not, and thus we describe the normative account of oddball learning in full detail. In order to minimize the differences between experienced and modeled latent variables, we formulate our model in terms of the prediction errors made by participants on each trial (rather than those that would have been made by the model) (7). On each trial of the oddball condition, the normative model: 1) updated its representation of uncertainty, 2) observed a prediction error and computed the probability that the prediction error reflects an oddball, 3) computed the normative learning rate by combining uncertainty (step 1) and oddball probability (step 2), and 4) adjusted its prediction about cannon position according to the learning rate and prediction error.

Relative uncertainty, which reflects the fraction of uncertainty about an upcoming cannonball location that is due to imperfect knowledge of the cannon aim and is analogous to the Kalman gain, was updated on each trial according to the most recent observation (which should decrease uncertainty about cannon position) and the expected drift in the aim of the cannon occurring between trials (which should increase uncertainty about cannon position). Given that relative uncertainty is expressed as a fraction of total uncertainty, it is useful to think of the numerator of the fraction, or the estimation uncertainty over possible cannon aims ($\sigma_\mu^2$), which is the variance on a Gaussian mixture distribution and is updated as follows:

$$\sigma_\mu^2 = \Omega_t \, \frac{\sigma_N^2 \tau_t}{1 - \tau_t} + (1 - \Omega_t)\, \sigma_N^2 \tau_t + \Omega_t (1 - \Omega_t)(\delta_t \tau_t)^2 + \sigma_{drift}^2$$

where $\Omega_t$ is the probability that an oddball occurred on trial t, $\sigma_N^2$ reflects the variance on the distribution of cannonball locations around the true cannon aim (noise), $\tau_t$ reflects the relative uncertainty on trial t, $\delta_t$ is the prediction error made in predicting the outcome on trial t, and $\sigma_{drift}^2$ reflects the degree to which the cannon position drifts from one trial to the next. The first two terms in the model reflect the oddball and non-oddball contributions to the updated uncertainty, the third term reflects uncertainty resulting from the difference between predictions for trial t+1 conditioned on an oddball or non-oddball having occurred on trial t, and the last term reflects uncertainty resulting from the expected drift of the cannon position between trials. Relative uncertainty for trial t+1 is then updated as the fraction of uncertainty about the upcoming outcome that is attributable to imprecise knowledge of the true cannon position, rather than to noise in the distribution of cannonballs around that position:

$$\tau_{t+1} = \frac{\sigma_\mu^2}{\sigma_\mu^2 + \sigma_N^2}$$

The updated relative uncertainty, along with assumed knowledge of the overall noise and hazard rate, was used to calibrate the oddball probability associated with each new prediction error:

$$\Omega_{t+1} = \frac{H \, \mathcal{U}(\delta_{t+1})}{H \, \mathcal{U}(\delta_{t+1}) + (1 - H)\, \mathcal{N}\!\left(\delta_{t+1} \mid 0, \; \frac{\sigma_N^2}{1 - \tau_{t+1}}\right)}$$

where H is the average hazard of an oddball (0.14), $\delta_{t+1}$ is the new prediction error, $\mathcal{U}$ is the uniform density over possible cannonball locations, and the second term in the denominator reflects the probability density on a normal distribution centered on the predicted location and with variance derived from relative uncertainty. The model's prediction about cannon aim was then updated according to a fraction of the prediction error $\delta_{t+1}$, with the exact fraction, or learning rate ($\alpha_{t+1}$), determined according to the updated uncertainty and oddball probability:

$$\alpha_{t+1} = \tau_{t+1} - \tau_{t+1}\, \Omega_{t+1}$$

Note that relative uncertainty ($\tau_{t+1}$) contributes positively to the learning rate, whereas oddball probability ($\Omega_{t+1}$) reduces the learning that would otherwise be dictated by the current level of uncertainty.
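The four steps of the oddball model can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the estimation-uncertainty update is the four-term sum described above, that oddball outcomes are uniformly distributed over a 360-degree circle, and that the total predictive variance equals the noise variance divided by one minus relative uncertainty. The function name `oddball_step` and the default values of `noise_var` and `drift_var` are hypothetical placeholders; only the hazard rate (0.14) is taken from the text.

```python
import math


def normal_pdf(x, var):
    """Density of a zero-mean normal distribution with variance `var` at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def oddball_step(tau_t, omega_t, delta_t, delta_next, belief,
                 noise_var=400.0, drift_var=25.0, hazard=0.14, span=360.0):
    """One trial of an approximate oddball learner (illustrative only).

    tau_t, omega_t, delta_t: relative uncertainty, oddball probability and
        prediction error from the previous trial.
    delta_next: prediction error observed on the current trial.
    belief: current prediction about cannon aim (assumed to be in degrees).
    noise_var, drift_var, span: placeholder values, NOT from the paper.
    """
    # Step 1: update estimation uncertainty (oddball term, non-oddball term,
    # term for the disagreement between the two predictions, drift term).
    est_var = (omega_t * noise_var * tau_t / (1.0 - tau_t)
               + (1.0 - omega_t) * noise_var * tau_t
               + omega_t * (1.0 - omega_t) * (delta_t * tau_t) ** 2
               + drift_var)
    tau_next = est_var / (est_var + noise_var)

    # Step 2: probability that the new prediction error reflects an oddball,
    # comparing a uniform density (assumed 1/span) with a normal density.
    total_var = noise_var / (1.0 - tau_next)
    p_oddball = hazard / span
    p_regular = (1.0 - hazard) * normal_pdf(delta_next, total_var)
    omega_next = p_oddball / (p_oddball + p_regular)

    # Step 3: learning rate rises with uncertainty, falls with oddball probability.
    alpha = tau_next - tau_next * omega_next

    # Step 4: move the prediction by a fraction of the prediction error.
    new_belief = (belief + alpha * delta_next) % span
    return tau_next, omega_next, alpha, new_belief
```

Under these placeholder settings, a small prediction error yields a learning rate close to the relative uncertainty, whereas a very large prediction error is attributed to an oddball and yields a learning rate near zero.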

Behavioral analysis

Two key behavioral measures were extracted from each trial. First, the prediction error on a trial was defined as the circular distance between the cannonball location and the shield position for that trial. Second, the update on a given trial was defined as the circular distance between the shield position on that trial and the shield position on the subsequent trial (i.e., the updated shield position). In order to better understand the computational factors governing adjustments in shield position, we fit updates with a linear model that included an intercept term to model overall biases in learning along with a prediction error term to capture general tendencies to adjust the shield towards the most recent cannonball location. The model also included additional terms to model how the influence of recent cannonball locations changed dynamically according to task context. These terms included:

EEG data for individual participants were analyzed using a mass univariate approach. Specifically, the trial series EEG data for a given participant, channel, and time relative to outcome onset were regressed onto an explanatory matrix that included the following explanatory variables: 1) intercept, 2) changepoint, 3) oddball, 4) condition, 5) catch. Explanatory variables 2 and 3 were binary variables marking trials in which a surprising event occurred (i.e., changepoint or oddball), whereas 4 reflected the overall task context (i.e., whether oddballs or changepoints were present in the current statistical context), and 5 conveyed whether the participant successfully "caught" the cannonball on each trial. Surprise and learning contrasts were created as the sum and difference of the changepoint and oddball coefficients, respectively. T-statistics were computed across subjects to assess the consistency of contrasts at each electrode and timepoint.
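The mass univariate regression and the two contrasts can be illustrated with a short NumPy sketch. It is an illustration under assumptions rather than the analysis code used in the study: it assumes ordinary least squares, an EEG array of shape trials × channels × timepoints, and a learning contrast defined as changepoint minus oddball. The function names `mass_univariate_contrasts` and `group_t` are hypothetical.

```python
import numpy as np


def mass_univariate_contrasts(eeg, changepoint, oddball, condition, catch):
    """Fit the same regression at every channel and timepoint for one participant.

    eeg: array of shape (trials, channels, timepoints).
    changepoint, oddball, condition, catch: one value per trial.
    Returns (surprise, learning) maps of shape (channels, timepoints).
    """
    n_trials = eeg.shape[0]
    # Explanatory matrix: intercept, changepoint, oddball, condition, catch.
    X = np.column_stack([np.ones(n_trials), changepoint, oddball, condition, catch])
    # Flatten channels x timepoints so one least-squares call fits them all.
    Y = eeg.reshape(n_trials, -1)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    beta = beta.reshape(X.shape[1], *eeg.shape[1:])
    surprise = beta[1] + beta[2]   # sum of changepoint and oddball coefficients
    learning = beta[1] - beta[2]   # difference (assumed: changepoint minus oddball)
    return surprise, learning


def group_t(contrast_maps):
    """One-sample t-statistic across subjects at each channel and timepoint."""
    maps = np.asarray(contrast_maps)          # (subjects, channels, timepoints)
    n = maps.shape[0]
    return maps.mean(axis=0) / (maps.std(axis=0, ddof=1) / np.sqrt(n))
```

Stacking the per-participant `surprise` (or `learning`) maps and passing them to `group_t` gives one t-statistic per electrode and timepoint.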

T-statistic maps were thresholded (cluster-forming threshold of p=0.001, two-tailed) and spatiotemporal clusters were identified as temporally and/or spatially contiguous signals that shared a common sign of effect and exceeded the cluster-forming threshold. Cluster mass was computed as the average absolute t-statistic within a cluster times its size (number of electrode timepoints contained within it). Cluster mass for each spatiotemporal cluster was compared to a permutation distribution for cluster mass generated using sign flipping to correct for multiple comparisons (41).
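The logic of cluster-mass correction with sign flipping can be sketched for a simplified case. This sketch makes several assumptions that go beyond the text: it handles a single electrode, so clusters are contiguous in time only; it takes the t threshold as an argument rather than deriving it from p=0.001; and it builds the null distribution from the largest cluster mass in each permutation, a common convention that is not confirmed by the source. All function names are hypothetical.

```python
import numpy as np


def t_map(data):
    """One-sample t-statistic across subjects; data is (subjects, timepoints)."""
    n = data.shape[0]
    return data.mean(axis=0) / (data.std(axis=0, ddof=1) / np.sqrt(n))


def cluster_masses(t, thresh):
    """Masses of temporally contiguous, same-sign, supra-threshold runs.

    Mass is the sum of |t| in the run, i.e. mean |t| times cluster size.
    """
    sign = np.where(np.abs(t) > thresh, np.sign(t), 0)
    masses = []
    start = 0
    for i in range(1, len(t) + 1):
        # Close the current run at the end of the series or on a sign change.
        if i == len(t) or sign[i] != sign[start]:
            if sign[start] != 0:
                masses.append(np.abs(t[start:i]).sum())
            start = i
    return masses


def sign_flip_null(data, thresh, n_perm=1000, seed=0):
    """Null distribution of the maximum cluster mass under random sign flips."""
    rng = np.random.default_rng(seed)
    null = np.zeros(n_perm)
    for p in range(n_perm):
        # Flip the sign of each subject's whole time series at random.
        flips = rng.choice([-1.0, 1.0], size=(data.shape[0], 1))
        null[p] = max(cluster_masses(t_map(data * flips), thresh), default=0.0)
    return null


def cluster_p_values(data, thresh, n_perm=1000, seed=0):
    """Pair each observed cluster mass with its permutation p-value."""
    observed = cluster_masses(t_map(data), thresh)
    null = sign_flip_null(data, thresh, n_perm, seed)
    return [(m, (np.sum(null >= m) + 1) / (n_perm + 1)) for m in observed]
```

Extending this to spatiotemporal clusters would additionally require an electrode adjacency structure, which is not specified here.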

Trial-to-trial EEG analyses were conducted by computing the dot product of the t-statistic map for a given spatiotemporal cluster and the ERP measured on a given trial. The resulting measure of EEG signal strength was then z-scored across all trials and included in a behavioral regression model to explain trial-to-trial updating behavior. As in the behavioral analyses, trial-to-trial updates were regressed onto an explanatory matrix that included intercept and prediction error terms to capture updating biases and static tendencies to update toward recent cannonball locations. In addition, EEG-informed regression models included 1) the interaction between the EEG signal strength computed above and prediction error (direct learning), and 2) the three-way interaction between EEG signal strength, prediction error, and condition (conditional learning). Positive direct learning coefficients indicated an unconditional increase in learning for trials in which EEG signal strength was greater, whereas positive conditional learning coefficients indicated a positive relationship between EEG signal strength and learning in the changepoint condition but a negative relationship between EEG signal strength and learning in the oddball condition. In order to test the degree to which EEG-updating relationships persisted after accounting for variability in behavior that could be captured by our computational model, we also used a version of the EEG-informed regression that additionally included the predicted update from the behavioral model (y-hat) as an explanatory variable (Fig 5).
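The trial-to-trial analysis can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes the cluster t-map is zero outside the cluster, that condition is coded +1 for changepoint and -1 for oddball (a coding consistent with the sign interpretation above but not stated in the text), and it omits any further terms of the full model. The function names are hypothetical.

```python
import numpy as np


def eeg_signal_strength(t_cluster, trial_erps):
    """Dot product of a cluster t-map with each trial's ERP, z-scored over trials.

    t_cluster: (channels, timepoints), assumed zero outside the cluster.
    trial_erps: (trials, channels, timepoints).
    """
    raw = np.tensordot(trial_erps, t_cluster, axes=([1, 2], [0, 1]))
    return (raw - raw.mean()) / raw.std()


def eeg_informed_regression(update, pe, eeg_z, condition):
    """Regress updates on prediction error and its interactions with EEG.

    condition: assumed +1 for changepoint trials and -1 for oddball trials.
    Returns a dict of least-squares coefficients.
    """
    X = np.column_stack([
        np.ones_like(pe),        # intercept: overall updating bias
        pe,                      # static tendency to update toward the outcome
        eeg_z * pe,              # direct learning
        eeg_z * pe * condition,  # conditional learning
    ])
    beta, *_ = np.linalg.lstsq(X, update, rcond=None)
    names = ["intercept", "prediction_error", "direct_learning", "conditional_learning"]
    return dict(zip(names, beta))
```

With this coding, a positive `conditional_learning` coefficient together with a near-zero `direct_learning` coefficient corresponds to EEG signal strength predicting more learning in the changepoint condition and less in the oddball condition.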