Working with Episodic Memory: The N-back Task

We present a model of how working memory (WM) and episodic memory (EM) interact in the n-back task. In contrast to previous models, in which information is actively maintained in WM, our model posits that information about previous stimuli is retained exclusively in EM. Unlike WM-based active maintenance, which has limited capacity, EM-based storage has unlimited capacity but is subject to proactive interference. Using the model, we show that benchmark phenomena ordinarily attributed to the use of a limited-capacity WM system (the set size effect and the lure interference effect) can also arise in a model with no such maintenance constraints.


Introduction
The n-back task is one of the most commonly used tasks for indexing WM function. In this task, a stream of stimuli is presented and, for every stimulus, the participant must indicate whether the current stimulus matches the stimulus that occurred n positions back in the sequence. The two most robust phenomena observed in the context of this task are (i) the set size effect: the larger the value of n, the lower the accuracy and the longer the reaction times; and (ii) the lure interference effect: the presence of a matching stimulus in the recent nontarget past (i.e., when the current stimulus matches the one n-1 back) increases the false alarm rate (see Oberauer et al., 2018). Note that performing this task requires the coordination of several cognitive operations: the identity and ordinal position of the last n items must be retained, the current item must be compared to the one n positions back, and the collection of items held in memory and their positions must then be updated prior to the next comparison operation. Despite the rich empirical literature on this task, we are aware of only two mechanistic models that address how it is performed (Juvina & Taatgen, 2007; Chatham et al., 2011). This relative dearth of modeling attempts means that the mechanistic underpinnings of the above-mentioned phenomena remain unclear.
Juvina and Taatgen (2007) suggested that participants can use two different strategies to perform the n-back task, which they implemented in two corresponding ACT-R models. In the high-control model, a window of size n is actively maintained by a rehearsal process, and the ordinal position of each item is encoded by the item's position in this actively maintained window. However, because this model was developed within ACT-R, a symbolic framework, it does not address the neural mechanisms by which information about ordinal position and information about item identity are bound together in WM. To address this, Chatham et al. (2011) showed that a biologically plausible connectionist model of the prefrontal cortex (Frank, Loughry, & O'Reilly, 2001) successfully learned to perform the n-back task and exhibited key features of behavioral and neural observations. Critically, like the high-control model of Juvina and Taatgen (2007), the prefrontal-cortex-based model of Chatham et al. (2011) relied on mechanisms that learned to actively maintain representations of stimuli and ordinal position in WM slots that served to bind these pieces of information together for each stimulus.
Both of the models discussed above align with the common assumption that the n-back task relies on the active maintenance of information in WM (Oberauer et al., 2018), and therefore that constraints on performance in the n-back task reflect the capacity limitation of WM. Consider, for example, the set size effect introduced above (Jonides, Schumacher, Smith, & Lauber, 1997). This phenomenon, replicated across materials and tasks, is widely assumed to reflect the limitation in WM capacity (see Oberauer et al., 2018). This intuition is included in both the ACT-R high-control model (Juvina & Taatgen, 2007) and the prefrontal-network-based model of Chatham et al. (2011) as a limit on the number of slots available to WM. However, as noted above, the n-back task engages several constituent operations (e.g., encoding, maintenance, updating, and matching), any or all of which could be responsible for the observed effects.
One alternative to the capacity limitation account is the ACT-R low-control model proposed in Juvina and Taatgen (2007). In contrast to the active maintenance accounts, stimuli in this model are not actively maintained in a rehearsal window where ordinal position is encoded by slots. Instead, along similar lines to the time-tag account of Yntema and Trask (1963), each item is stored along with a time-tag that specifies the moment of encoding. To the extent that the memory component of this model does not interfere with ongoing processing, this account aligns with the idea that memories are being stored in EM instead of actively maintained in WM. However, unlike the high-control strategy, which was implemented in Chatham et al. (2011), it remains unclear whether and how time-tags could be neurally implemented in a manner consistent with EM, and how such a model would give rise to behavioral phenomena such as the set-size effect discussed above. Here we sought to address this theoretical gap.
This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0

Model Overview
The input to the model consists of two subfields, a stimulus subfield and a context subfield (together referred to as a percept). The model consists of two parts: an EM module that stores percepts experienced in the past, and a WM module that is trained to judge whether the current stimulus matches the stimulus that occurred n trials ago. On each trial the model observes a percept, which triggers a similarity-based retrieval process from EM. The output of this process is provided along with the current percept to WM. WM then sequentially processes the retrieved memories to assess whether or not the current stimulus matches the one occurring n trials back. After WM issues a response, the current percept is stored in EM and a new trial begins.

Model Details
Episodic Memory Overview. The EM module aligns with a rich tradition of context-based models that address the temporal organization of free recall (Howard & Kahana, 2002; Polyn, Norman, & Kahana, 2009). In these models the retrieval process is mediated through associations between the item being retrieved and the context in which that item was encoded. Because this contextual representation drifts gradually over time, items that were observed together during encoding are also more likely to be retrieved together. Along similar lines, in the current model items are encoded into EM along with a drifting context representation. As will be elaborated below, this context representation serves to constrain recall to the most recent items, and to allow the model to recover serial order information.
Model Input. On each trial, the model observes an input vector composed of (i) a stimulus subfield and (ii) a context subfield. (i) The stimulus subfield of the percept is a normal random vector drawn from a finite set generated at the beginning of the experiment, and it corresponds to the stimulus (e.g., a picture of a face or a string of letters) presented to participants performing the n-back task. (ii) To implement the context representation, the context subfield is initialized to the zero vector and, on each trial, a delta drawn from N(1,0.5) is added. This slowly drifting context subfield implements the intuition that the brain has representations that drift slowly over time, and that these representations serve to contextualize and organize retrieval processes (Howard & Kahana, 2002; Polyn et al., 2009). In the present model, this slowly drifting context representation also serves the additional role of encoding temporal information from which serial order can be recovered in the service of performing the n-back task.
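The construction of percepts described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the subfield dimensions, the number of stimuli, the reading of 0.5 in N(1,0.5) as a standard deviation, and the choice to draw an independent delta for each context dimension are all assumptions made for the example.

```python
import numpy as np

def make_percepts(n_trials, n_stimuli=8, stim_dim=16, ctx_dim=16,
                  drift_mean=1.0, drift_sd=0.5, seed=0):
    """Build one percept (stimulus subfield + context subfield) per trial.

    All sizes are illustrative assumptions; drift_sd treats the 0.5 in
    N(1, 0.5) as a standard deviation, which the text does not specify.
    """
    rng = np.random.default_rng(seed)
    # Finite stimulus set, generated once at the start of the experiment.
    stimulus_set = rng.standard_normal((n_stimuli, stim_dim))
    # Which stimulus is shown on each trial.
    stim_ids = rng.integers(0, n_stimuli, size=n_trials)
    # Context starts at zero and accumulates one random delta per trial,
    # so the context on the first trial equals the first delta.
    deltas = rng.normal(drift_mean, drift_sd, size=(n_trials, ctx_dim))
    context = np.cumsum(deltas, axis=0)
    percepts = np.concatenate([stimulus_set[stim_ids], context], axis=1)
    return percepts, stim_ids

percepts, stim_ids = make_percepts(n_trials=20)
```

Repeated presentations of the same stimulus share an identical stimulus subfield but differ in their context subfield, which is what allows temporal distance to be estimated later.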
Retrieval. Upon the presentation of a percept, the model queries EM for similar percepts that occurred on past trials. Retrieval from EM occurs through similarity-based look-up, which approximates the process of pattern completion assumed to occur in the hippocampus (Marr, Willshaw, & McNaughton, 1991). First, the similarity between the current percept and each percept stored in EM is computed. Second, the stored percepts are sorted from highest to lowest similarity. Finally, the EM retrieval process returns the percepts whose similarity exceeds an established threshold.
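The three retrieval steps can be sketched as below. The text does not name the similarity measure or the threshold value, so cosine similarity and a threshold of 0.5 are assumptions chosen purely for illustration, as are the function names.

```python
import numpy as np

def cosine_similarity(query, memory):
    """Cosine similarity between one query vector and each row of memory."""
    memory = np.atleast_2d(memory)
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(query)
    return memory @ query / np.maximum(norms, 1e-12)

def retrieve(query, memory, threshold=0.5):
    """Return stored percepts above threshold, most similar first.

    The similarity measure and the threshold value are assumptions.
    """
    if len(memory) == 0:
        # Nothing stored yet: retrieval returns an empty set.
        return np.empty((0, query.shape[0])), np.empty(0, dtype=int)
    memory = np.asarray(memory)
    sims = cosine_similarity(query, memory)   # step 1: compute similarity
    order = np.argsort(-sims)                 # step 2: sort high to low
    keep = order[sims[order] > threshold]     # step 3: apply the threshold
    return memory[keep], keep
```

Because the number of percepts that pass the threshold varies from trial to trial, the sequence handed to WM has variable length, and it can be empty when EM holds nothing sufficiently similar.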

Working Memory
The output of EM's retrieval is then sequentially processed, along with the current percept, by WM. Notice that, in this model, WM serves no active maintenance function. Instead, the function of WM in this model is to (i) use temporal information contained in the context representation to compute ordinal information, and (ii) identify matches between the current stimulus and stimuli retrieved from EM. To implement these functions, we used a long short-term memory (LSTM) network (Hochreiter & Schmidhuber, 1997). On each trial, the LSTM is sequentially provided with the percept of the current trial followed by a variable-length sequence of percepts retrieved from EM. After observing this input sequence, the LSTM is optimized to respond using one of two softmax output units: yes if the stimulus subfield of the n-back percept matches the stimulus subfield of the current input, and no otherwise.

Model training and evaluation
An important desideratum for our model was the ability, like humans, to perform the n-back task with arbitrary stimuli. Toward this end, the stimulus subfield of each input was a random vector drawn from a standard normal distribution. To keep the model general with respect to stimulus input, the set of random vectors from which stimuli were drawn was re-randomized every 10 trials, both during training and at test. Similarly, although the context drift was always drawn from N(1,0.5), after every 10 trials this process was reset and a new draw was generated. After being trained with this regime, the model is evaluated on new stimulus sets and context drifts.
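One way to generate data under this regime is sketched below. It is an illustration under stated assumptions rather than the procedure actually used: the block structure (a fresh stimulus set and a context drift restarted from zero every 10 trials), the dimensions, the stimulus set size, and the treatment of the first n trials of a block as non-matches are all choices made for the example.

```python
import numpy as np

def make_block(n, rng, block_len=10, n_stimuli=4, stim_dim=16, ctx_dim=16):
    """Generate one block of trials with n-back targets.

    Assumptions: each block gets a new stimulus set, the context drift
    restarts from zero, and 0.5 is the standard deviation of the drift.
    """
    stimulus_set = rng.standard_normal((n_stimuli, stim_dim))
    stim_ids = rng.integers(0, n_stimuli, size=block_len)
    deltas = rng.normal(1.0, 0.5, size=(block_len, ctx_dim))
    context = np.cumsum(deltas, axis=0)
    percepts = np.concatenate([stimulus_set[stim_ids], context], axis=1)
    # Target is 1 ("yes") when the stimulus equals the one n trials back.
    # The first n trials have no n-back item and are labeled 0 ("no").
    targets = np.zeros(block_len, dtype=int)
    targets[n:] = (stim_ids[n:] == stim_ids[:-n]).astype(int)
    return percepts, targets, stim_ids

rng = np.random.default_rng(0)
blocks = [make_block(n=2, rng=rng) for _ in range(3)]
```

Because every block uses stimulus vectors the network has not seen before, good performance cannot come from memorizing particular stimuli and must instead reflect a general matching and order-estimation procedure.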

Set size effect
The set size effect refers to the degradation in performance as n increases. It is widely assumed that this reflects reliance on the active maintenance of stimulus information in capacity-limited WM (Oberauer et al., 2018). However, in the current model, stimulus information is not actively maintained in WM; rather, it is stored in EM, which has no such capacity constraint. Nevertheless, the model exhibits a qualitatively similar effect of set size. When comparing the model on the 1-, 2- and 3-back versions of the task, accuracy is highest for the 1-back and lowest for the 3-back (Figure 1A). This occurs because the ability of the model to estimate ordinal position degrades for longer temporal distances: because our model relies on a noisily drifting context representation to estimate temporal distance, this noise accumulates over time and thereby limits the model's ability to differentiate temporal distances occurring further in the past. Therefore, the present model proposes the alternative explanation that the set size effect arises in the context of the n-back task because estimates of longer temporal distances suffer from lower resolution. Furthermore, as has been empirically observed, the degradation in performance between the 1-back and 2-back is larger than between the 2-back and 3-back (Cohen et al., 1997).
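The noise-accumulation argument can be illustrated with a small simulation of a single context dimension. This is a simplified sketch, not the model itself: it ignores the LSTM, treats the context as a scalar, and assumes that 0.5 is the standard deviation of the per-trial drift. Under those assumptions the context change across a lag of k trials is a sum of k independent deltas, so its mean grows as k while its standard deviation grows as 0.5 times the square root of k.

```python
import numpy as np

def lag_distance_stats(max_lag=3, drift_mean=1.0, drift_sd=0.5,
                       n_samples=20000, seed=0):
    """Mean and spread of the context change across lags 1..max_lag.

    Scalar-context simplification; drift_sd as a standard deviation is
    an assumption about the N(1, 0.5) notation.
    """
    rng = np.random.default_rng(seed)
    deltas = rng.normal(drift_mean, drift_sd, size=(n_samples, max_lag))
    # Column k-1 holds the accumulated context change over a lag of k trials.
    dists = np.cumsum(deltas, axis=1)
    return dists.mean(axis=0), dists.std(axis=0)

means, sds = lag_distance_stats()
for lag, (m, s) in enumerate(zip(means, sds), start=1):
    print(f"lag {lag}: mean change {m:.2f}, sd {s:.2f}")
```

Adjacent lags remain separated by the same expected amount while the spread around each lag widens, so the distributions for neighboring lags overlap more at longer distances. That is the sense in which temporal resolution is lower further in the past.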

Proactive interference
One of the characteristic signatures of EM usage is proactive interference (PI): information lingering in memory from the past interferes with ongoing processing. One indication of PI is degraded performance on later trials relative to earlier trials (Figure 1C). In the model, this happens because EM is empty at the beginning of the experiment. When nothing is retrieved from EM, WM can rely on the heuristic that the current trial is a trivial case of "no match". However, as the experiment proceeds and more items are stored in EM, there will be more potential candidates to be retrieved, and so WM will have to sift through a longer list, thereby increasing the chance of a false alarm. Note that, in the real world, EM is not empty at the start of an experiment; however, one can reasonably expect that the number of task-relevant memories stored in EM (i.e., from a similar context, with similar stimuli) will be relatively small, so the general point holds.
To test the model's susceptibility to PI, we compared performance on trials with high versus low PI. Since EM retrieval relies on similarity-based look-up, increasing the similarity of the content that is stored in EM will increase the number of items that are returned by the retrieval process. Therefore, to manipulate PI, we compared high-PI trials, in which a small number of stimulus items recurred (e.g., A A B C B A), with low-PI trials, in which nontarget stimulus items were trial-unique (e.g., A B A C D E). Confirming the prediction that the model is sensitive to PI, degradation in performance was more pronounced for the high-PI than for the low-PI condition (Figure 1C).
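The difference between the two example sequences can be made concrete by counting, for each trial, how many earlier trials showed the same stimulus. This count is only a rough proxy for retrieval load, introduced here for illustration: it ignores the context subfield and the similarity threshold, and the function name is hypothetical.

```python
from collections import Counter

def prior_match_counts(sequence):
    """For each trial, count earlier trials that showed the same stimulus."""
    seen = Counter()
    counts = []
    for item in sequence:
        counts.append(seen[item])  # stored items sharing this stimulus
        seen[item] += 1            # the current item is stored afterwards
    return counts

high_pi = list("AABCBA")
low_pi = list("ABACDE")
print(prior_match_counts(high_pi))  # [0, 1, 0, 0, 1, 2]
print(prior_match_counts(low_pi))   # [0, 0, 1, 0, 0, 0]
```

In the high-PI sequence, stored items sharing the current stimulus build up across trials, whereas in the low-PI sequence the only such item is the 2-back target on the third trial.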

Lure interference
Apart from the set-size effect, another highly robust phenomenon observed in the context of the n-back task is the lure interference effect. This effect refers to a decrease in accuracy when the current stimulus matches one of the stimuli occurring in the recent non-target past. This effect is especially pronounced when the current stimulus matches the one in the n-1 position. To investigate whether our model was sensitive to this effect, we looked at the model's performance on the 3-back task for trials in which the probe matched the stimulus n-1 back. As observed empirically, the model's accuracy was worse for these lure trials compared to positive (match) and negative (no lures) controls (Figure 1B). This also arises in the model as a consequence of PI. Because EM retrieval is based on the similarity between the current percept and stored percepts, lure probes elicit more EM retrievals of percepts from the non-target past, thereby increasing the chance that WM will false alarm.
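The three trial types compared in this analysis can be identified from a stimulus sequence as sketched below. The function name and the labels are hypothetical, and two choices are assumptions not stated in the text: a trial that matches both the n-back and the n-1-back item is counted as a match, and trials with fewer than n predecessors receive a separate "warmup" label.

```python
def trial_type(stim_ids, t, n=3):
    """Classify trial t as 'match', 'lure', 'control', or 'warmup'."""
    if t < n:
        return "warmup"   # no n-back item exists yet (assumed label)
    if stim_ids[t] == stim_ids[t - n]:
        return "match"    # target: same stimulus as n trials back
    if n > 1 and stim_ids[t] == stim_ids[t - (n - 1)]:
        return "lure"     # same stimulus as n-1 trials back
    return "control"      # neither target nor n-1 lure

sequence = list("ABCADAE")
print([trial_type(sequence, t) for t in range(len(sequence))])
# ['warmup', 'warmup', 'warmup', 'match', 'control', 'lure', 'control']
```

On a lure trial, the stimulus subfield of the probe is identical to that of a recently stored percept, so that percept is likely to be retrieved, and WM must then use the context subfield alone to reject it.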

Discussion
In sum, there are two key features that distinguish our model from standard accounts. First, the information held in memory is not maintained in an active state and as such does not consume limited WM resources. Rather, consistent with EM-based storage, there are no capacity limitations and information is held in a latent state until retrieved. However, since memory traces in EM persist indefinitely, the growth of information stored from the past has a propensity to proactively interfere with the present. Second, the present model differs from slot-based accounts with regard to how order information is encoded and processed. Instead of specifying order information by the slot in which the stimulus is encoded, order information is computed at retrieval from the temporal information contained in the slowly drifting context representation. This combination of mechanisms obviates the need for an operation that explicitly updates the maintenance window; rather, updating emerges from the encoding and retrieval of time-stamped information in EM, and from the ability of the LSTM to learn to estimate and evaluate temporal information from the drifting context.
Although it is often acknowledged that WM interacts with a long-term store largely analogous to the EM component in our model (for a review, see Nelson, 2017), previous accounts have not implicated this component in producing the set-size effect. Instead, this behavioral benchmark is usually taken as evidence that this task relies on active maintenance of stimuli in WM (Oberauer et al., 2018; Juvina & Taatgen, 2007; Chatham et al., 2011). However, even though past items are not actively maintained in WM in the current model, it still produces the set-size effect. Here, this effect arises not as a consequence of WM's capacity for active maintenance, but as a consequence of the characteristics of the long-term storage (EM) and its interactions with WM. As such, the present model shows that, under certain assumptions about the operating characteristics of the long-term storage (gradually drifting context and susceptibility to proactive interference, both of which are well justified by the episodic memory literature; Norman, Detre, & Polyn, 2008), qualitatively similar behaviors can arise in the absence of constraints on active maintenance. Importantly, this EM-based account is not mutually exclusive with a WM active maintenance or slot-based account: people may use either strategy or a mixture of both. Going forward, it will be important to refine the differential predictions of the two accounts, thereby allowing us to diagnose which strategy is being used in a particular situation. One promising route relates to the PI phenomena discussed earlier: while models that rely on WM will show some PI due to lingering recurrent activity (e.g., Chatham et al., 2011), an EM-based model will show a greater rise in PI over time, due to the persistent nature of EM traces.
Another promising route relates to manipulations of temporal context: The EM-based model predicts greater sensitivity to disruption by manipulations that affect the context drift (e.g., semantic category changes; Polyn et al., 2009) than a slot-based WM model.