Eye Movements Reflect Causal Inference During Episodic Memory Retrieval

During episodic memory retrieval, eye movements tend to distinguish between studied and unstudied items, a tendency known as “retrieval-dependent eye movements” (RDEs). However, which cognitive processes drive RDEs, and in particular whether they differ from those that drive explicit choices, remains unknown. Here we dissect the cognitive processes underlying RDEs with model-based analyses of a false memory paradigm. Participants first memorized object-location pairs on a circular array (“learning”). They then saw object-location pairings allegedly produced by another participant in the upcoming memory test, and judged their correctness (“suggestion”). Finally, participants indicated the location of each object themselves (“retrieval”). A Bayesian cue-combination model that performed causal inference to assess whether the noisy memories of the learned and suggested object-location pairs (the two “cues”) came from the same source, and combined the memories accordingly, fit participants’ explicit responses well. We also found that eye movements reflected the learned and the suggested stimulus even after controlling for the effects of explicit responses. Thus, RDEs contain information beyond that present in explicit responses, and they reflect the dynamics of the causal inference process underlying memory retrieval.


Introduction
It is well known that episodic memory retrieval involves "retrieval-dependent eye movements" (RDEs; Johansson, Holsanova, Dewhurst, & Holmqvist, 2012; Johansson & Johansson, 2013; Richardson & Spivey, 2000; Staudte & Altmann, 2017): in a recognition task, participants' eye movement patterns distinguish between studied and unstudied items. However, it has been debated whether RDEs show such differentiation when participants fail to make correct explicit responses and, relatedly, whether RDEs are driven by explicit or implicit forms of memory (Hannula, Baym, Warren, & Cohen, 2011; Smith, Hopkins, & Squire, 2006; Nickel, Henke, & Hannula, 2015; Smith & Squire, 2017; Urgolites, Smith, & Squire, 2018). Here we used a location memory task to compare how RDEs are affected by a memorized location when it is recognized versus forgotten, separately from the effect of mere exposure. We did so by asking participants to indicate a studied (or "learned") location as well as the correctness of an unreliable "suggestion": we expected participants to ignore the suggested location when they deemed it wrong, although they were exposed to it. We then cast memory retrieval as Bayesian causal inference, whereby participants try to infer whether the suggestion on a given trial was correct and, based on that, what the learned location could have been, from unreliable representations of both the learned and the suggested locations. We show (1) that such a model fits participants' responses well, (2) that we can decode the learned and the suggested stimulus from participants' responses, and (3) that participants' gaze is attracted to the learned and suggested locations, even when they differ from each other and from the responded location, and that these effects depend on whether the suggestion is deemed correct.

Methods

Participants
A total of 17 participants performed the task (9 females; age 19-27). The experimental protocol was approved by the Institutional Review Board of Central European University. Participants gave written informed consent before starting the experiment.

Task
Participants were presented with a series of object-location pairs on a circular array and were asked to remember the location of each object for a later memory test ("learning phase"; Figure 1). Next, participants were presented with (50% correct) object-location pairs allegedly produced by another participant in the upcoming memory test and were asked to judge the correctness of each of these pairs in light of the learning phase ("suggestion phase"). Finally, participants were asked to indicate the location of each object themselves ("retrieval phase"). They were asked to wait while a target image (2 s), a blank screen (0.5 s), and an empty array of locations (2 s) were presented, after which a mouse cursor appeared at the center of the screen, prompting them to respond by clicking one of the 12 locations that they thought was paired with the target image in the learning phase.

This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0

Analysis of Explicit Responses
Our ultimate goal is to examine how much information gaze locations carry about the learned and suggested locations (L and S), over and beyond what the explicit responses carry (remembered location R and subjective, or "deemed", correctness of the suggestion D). As a baseline, we examined how much information the explicit responses carry about the learned and suggested locations using a Bayesian decoding approach (the performance of a Bayesian decoder measured by cross-entropy is equivalent to the mutual information up to a change in sign and an additive constant). We will include gaze locations in the decoding as a next step. We constructed a Bayesian ideal observer model (the "encoder"), P(R, D | z_L, z_S; θ_en), that generated the explicit responses about the remembered location (R) and the subjective correctness of the suggestion (D) based on noisy memory representations (z_L and z_S) of the learned and suggested locations (L and S; Figure 2A). We first fit the parameters of this encoder model (θ_en) by maximizing the likelihood ∏_j P(R_j, D_j | L_j, S_j; θ_en), where j indexes trials, and the predictive distribution for each trial is obtained by integrating out the noisy memory representations of the ideal observer, which are unknown to the experimenter: P(R, D | L, S; θ_en) = ∫ P(R, D | z_L, z_S; θ_en) P(z_L, z_S | L, S; θ_en) dz_L dz_S. We then fixed the parameters θ_en and examined how much information the explicit responses had about the stimuli by decoding L_j or S_j from R_j and D_j. We measured decoding performance by the cross-entropy CE(L | R, D; θ_en) = −(1/N) ∑_j log P(L_j | R_j, D_j; θ_en) (and analogously for S), where P(L | R, D; θ_en) is the "decoding" distribution obtained by the Bayesian inversion of the predictive distribution of the encoding model (see above), and N is the number of trials.
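The Bayesian inversion and cross-entropy above can be sketched numerically. The following is a minimal illustration, not the fitted model: `predictive` is a hypothetical stand-in for the encoder's predictive distribution P(R, D | L, S; θ_en), and uniform priors over the 12 array locations are assumed.

```python
import numpy as np

N_LOC = 12  # number of locations on the circular array

def circ_dist(a, b):
    """Distance between two of the 12 locations along the circle."""
    return min((a - b) % N_LOC, (b - a) % N_LOC)

def predictive(r, d, l, s):
    """Hypothetical stand-in for P(R = r, D = d | L = l, S = s; theta_en):
    the response clusters around the learned location, and the suggestion
    is deemed correct more often when S is close to L."""
    w = np.array([np.exp(-circ_dist(k, l)) for k in range(N_LOC)])
    p_r = w[r] / w.sum()
    p_d1 = np.exp(-circ_dist(s, l) / 2.0)
    return p_r * (p_d1 if d == 1 else 1.0 - p_d1)

def decode_L(r, d):
    """P(L | R, D): Bayesian inversion of the predictive distribution,
    marginalizing over S, with uniform priors over L and S."""
    post = np.array([sum(predictive(r, d, l, s) for s in range(N_LOC))
                     for l in range(N_LOC)])
    return post / post.sum()

def cross_entropy(trials):
    """CE(L | R, D) = -(1/N) * sum_j log P(L_j | R_j, D_j)."""
    return -np.mean([np.log(decode_L(r, d)[l]) for (l, r, d) in trials])
```

With the fitted encoder in place of `predictive`, the same inversion yields the decoding distribution used for the CE comparisons reported in the Results.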
To ensure that we do not lose information by using the Bayesian ideal observer model, we also fit (1) parametric decoders whose parameters (θ_de) are directly optimized to maximize decoding performance, and (2) nonparametric decoders that do not assume Bayesian inference. To prevent overfitting, we evaluated every model with 10-fold cross-validation.
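The cross-validation scheme can be sketched generically; `fit` and `score` below are placeholders for fitting a decoder on training trials and evaluating its cross-entropy on held-out trials.

```python
import numpy as np

def k_fold_score(trials, fit, score, k=10, seed=0):
    """Average held-out score over k folds.

    fit(train_trials)        -> fitted parameters
    score(params, test_set)  -> e.g., cross-entropy on held-out trials
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(trials))
    scores = []
    for fold in np.array_split(idx, k):
        held_out = set(int(i) for i in fold)
        train = [trials[i] for i in range(len(trials)) if i not in held_out]
        test = [trials[i] for i in sorted(held_out)]
        scores.append(score(fit(train), test))
    return float(np.mean(scores))
```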

Analysis of Eye Movements
We analyzed eye movements during the retrieval phase from the target image onset until the onset of the mouse cursor (4.5 seconds after the target image onset), in order to determine how much information about the learned and suggested object-location pairings can be recovered from retrieval-dependent eye movements. To dissociate the effects of the learned, suggested, and responded locations, we constructed a multiple regression model for the (two-dimensional) gaze location y_t in a given time bin indexed by t, regressing y_t on the learned, suggested, and responded locations (x_L, x_S, and x_R), with separate coefficients depending on whether the suggestion was deemed correct (D = 1) or wrong (D = 0), and with dedicated regressors for trials on which two of these locations coincided (β_S=R,D,t and β_L=R,D,t), where β_0,t is an overall bias and x_R enters as a separate regressor only on trials when the response differed from the learned and suggested locations. For visualization, we used every 100-ms time bin in the first 4.5 s from the target image onset in the retrieval phase as t (Figure 3). For statistical tests, we used the difference of the average gaze position between 0.5-1 s after the array onset (the period of interest: Figure 3, gray bar on the time axis) and the 0.5-second period prior to the target image onset (the baseline).
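A simplified version of such a regression, with 2-D location regressors and scalar attraction weights, can be sketched as follows (the condition-specific splitting by deemed correctness and by coinciding locations is omitted here; variable names are illustrative):

```python
import numpy as np

def fit_location_weights(gaze, locs):
    """Scalar attraction weights with 2-D location regressors:
    gaze_j ≈ bias + sum_k beta_k * locs[k][j]  (all in screen coordinates).

    gaze : (n_trials, 2) array of gaze positions in one time bin
    locs : list of (n_trials, 2) arrays, e.g. learned / suggested /
           responded locations (without the paper's condition splitting)
    Returns [bias_x, bias_y, beta_1, ..., beta_K].
    """
    n = gaze.shape[0]
    bias_x = np.tile([1.0, 0.0], n)   # intercept for the x rows
    bias_y = np.tile([0.0, 1.0], n)   # intercept for the y rows
    # stack x/y rows of every trial; one scalar weight per location regressor
    X = np.stack([bias_x, bias_y] + [x.ravel() for x in locs], axis=1)
    beta, *_ = np.linalg.lstsq(X, gaze.ravel(), rcond=None)
    return beta
```

Fitting this per 100-ms time bin yields a time course of attraction weights, which is the quantity plotted and tested in the Results.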

Task Performance
Participants' performance in responding with the learned location was higher when it matched the suggested location (i.e., P(R = L | L = S) > P(R = L | L ≠ S), 65% vs. 47%; p < 0.001, sign test across participants; for reference, chance performance would be ∼8.3%), indicating that they used the suggestion adaptively. They also deemed the suggested location correct more often when it matched the learned location (i.e., P(D = 1 | S = L) > P(D = 1 | S ≠ L), 82% vs. 22%; p < 0.001, sign test across participants), and responded with the suggested location more often when it was deemed correct (i.e., P(R = S | D = 1) > P(R = S | D = 0), 70% vs. 9%; p < 0.001, sign test across participants).
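The sign tests across participants can be reproduced in a few lines; this is a generic two-sided sign test on per-participant differences of the conditional proportions, not the authors' exact code:

```python
from math import comb

def sign_test_p(diffs):
    """Two-sided sign test: diffs holds one per-participant difference,
    e.g. P(R = L | L = S) - P(R = L | L != S) for each participant."""
    pos = sum(d > 0 for d in diffs)
    n = sum(d != 0 for d in diffs)           # zero differences are dropped
    k = min(pos, n - pos)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)                # two-sided binomial tail
```

With 17 participants all showing a difference in the same direction, this yields p = 2 × 2⁻¹⁷ ≈ 1.5 × 10⁻⁵, consistent with the reported p < 0.001.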

Causal Inference Explains Explicit Responses
The Bayesian ideal observer model performing causal inference was able to fit participants' response patterns.
(1) The response (R) followed the suggestion (S) more when the suggestion was deemed correct (Figure 2B, left, red curve and markers) than when it was deemed wrong (blue).
(2) The response interpolated between the learned and suggested locations when the suggestion was deemed correct (the red curve and markers lie between the horizontal line and the diagonal line; Shams & Beierholm, 2010).
(3) The suggestion was deemed correct more often when it was close to the learned location (Figure 2B, right).
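The qualitative pattern in (1)-(3) falls out of a standard causal-inference cue-combination scheme (cf. Shams & Beierholm, 2010). Below is a deliberately simplified linear-Gaussian sketch with made-up parameters; the actual model operates on the circular array, and the flat alternative likelihood is an assumption of this sketch:

```python
import numpy as np

def causal_inference_estimate(z_l, z_s, sigma_l=1.0, sigma_s=1.0,
                              sigma_prior=10.0, p_same=0.5):
    """Posterior probability that the suggestion was correct (common
    source), and the resulting estimate of the learned location.

    z_l, z_s : noisy memory readouts of the learned and suggested locations
    """
    vl, vs = sigma_l ** 2, sigma_s ** 2
    # likelihood of the two readouts sharing one source: depends only on
    # their discrepancy
    like_same = (np.exp(-(z_l - z_s) ** 2 / (2 * (vl + vs)))
                 / np.sqrt(2 * np.pi * (vl + vs)))
    # crude flat alternative for independent sources (sketch assumption)
    like_diff = 1.0 / (2 * sigma_prior)
    p_correct = (p_same * like_same
                 / (p_same * like_same + (1 - p_same) * like_diff))
    fused = (z_l / vl + z_s / vs) / (1 / vl + 1 / vs)  # precision-weighted
    # model averaging: interpolate between fused and learned-only estimates
    return p_correct, p_correct * fused + (1 - p_correct) * z_l
```

Close readouts yield a high p_correct and an estimate pulled toward the suggestion (the interpolation in point 2); distant readouts yield a low p_correct and an estimate that stays near z_l (points 1 and 3).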

Decoding Stimuli From Explicit Responses
The decoder using the Bayesian ideal observer model (whose parameters were optimized to predict R and D given L and S) predicted L successfully from responses R and D.
Its performance was on par with that of the decoder directly optimized to predict the stimuli, and with that of the nonparametric decoder, suggesting that using the Bayesian ideal observer model did not lose information (Figure 2C, left: CE(L|R, D; θ_en) − CE(L|R, D) = −0.005 ± 0.003 (p = 0.13) and CE(L|R, D; θ_en) − CE(L|R, D; θ_de) = −0.002 ± 0.001 (p = 0.14)). The same held for decoding S from R and D (Figure 2C, right: CE(S|R, D; θ_en) − CE(S|R, D) = −0.005 ± 0.003 (p = 0.14) and CE(S|R, D; θ_en) − CE(S|R, D; θ_de) = −0.002 ± 0.001 (p = 0.14)). Note that in these comparisons, which were performed on the held-out test set (but, reassuringly, not on the training set; data not shown), the Bayesian ideal observer model performed slightly, although not significantly, better than the models directly optimized for decoding, indicating that it was a faithful model of participants' behavior.

Analysis of Eye Movements
As expected, we found that the learned location attracted gaze during the retrieval phase. This was the case even when it was different from the suggested and the responded locations, when the suggested location was deemed wrong (β_L,D=0,t = 0.069 ± 0.024, p = 0.01; Figure 3, top, shaded interval). Note that any contribution to gaze locations from the suggested or responded locations is regressed out by their regressors (β_S,D=0,t, β_R,D=0,t, and β_S=R,D=0,t), so β_L,D=0,t represents the "pure" effect of the learned location. Surprisingly, we found that the suggested location, too, attracted gaze during the retrieval phase. This was the case even when the suggested location differed from the learned or responded locations, but only when it was deemed correct (β_S,D=1,t = 0.20 ± 0.08, p = 0.01; Figure 3, bottom, shaded interval). Again, note that any contribution to gaze locations from the learned or responded locations is regressed out by their regressors (β_L,D=1,t, β_R,D=1,t, and β_L=R,D=1,t).

Discussion
We found that the learned location attracted gaze independently of the suggested or responded locations, and that its effect depended on whether the suggestion was deemed correct. Conversely, we also found that the suggested location attracted gaze independently of the learned or responded locations, but only when it was deemed correct. These results suggest that RDEs are not merely a reflection of exposure or of rehearsal of a response to be made; instead, they suggest that RDEs reflect a causal inference process in which the relevance of the exposure (the learned and suggested locations) is inferred based on the similarity between the unreliable memories of the learned and suggested locations (cf. Shams & Beierholm, 2010).
To clarify how memories affect RDEs, we plan to decode the learned and suggested locations from gaze, in order to compare the information contained in gaze on an equal footing with that contained in explicit responses. While we do not have direct access to participants' internal memory representations, we can infer them using the Bayesian ideal observer model (which fits participants' inference process well, as it could decode the stimuli from explicit responses). This will help reconcile the debate about whether RDEs depend on explicit or implicit forms of memory, by telling us whether RDEs are generated after explicit responses are determined (and hence contain no further information about the stimuli), or whether RDEs derive from an internal memory representation that is at least partly separate from the explicit responses. Our regression analyses already suggest that eye movements indeed contain information separate from the explicit responses, although this information has not yet been measured in the same units.