SEAM: An Integrated Activation-Coupled Model of Sentence Processing and Eye Movements in Reading

Models of eye-movement control during reading, developed largely within psychology, usually focus on visual, attentional, lexical, and motor processes but neglect post-lexical language processing; by contrast, models of sentence comprehension processes, developed largely within psycholinguistics, generally focus only on post-lexical language processes. We present a model that combines these two research threads by integrating eye-movement control and sentence processing. Developing such an integrated model is extremely challenging and computationally demanding, but such an integration is an important step toward complete mathematical models of natural language comprehension in reading. We combine the SWIFT model of eye-movement control (Seelig et al., 2020, doi:10.1016/j.jmp.2019.102313) with key components of the Lewis and Vasishth sentence processing model (Lewis & Vasishth, 2005, doi:10.1207/s15516709cog0000_25). This integration becomes possible, for the first time, due in part to recent advances in successful parameter identification in dynamical models, which allow us to investigate profile log-likelihoods for individual model parameters. We present a fully implemented proof-of-concept model demonstrating how such an integrated model can be achieved; our approach includes Bayesian model inference with Markov Chain Monte Carlo (MCMC) sampling as a key computational tool. The integrated Sentence-Processing and Eye-Movement Activation-Coupled Model (SEAM) can successfully reproduce eye movement patterns that arise due to similarity-based interference in reading. To our knowledge, this is the first-ever integration of a complete process model of eye-movement control with linguistic dependency completion processes in sentence comprehension. In future work, this proof-of-concept model will need to be evaluated using a comprehensive set of benchmark data.


Introduction
What is the relationship between sentence processing and eye movements during reading? As an answer to this question, Just and Carpenter (1980, pp. 330-331) famously coined the eye-mind assumption, which states that "the eye remains fixated on a word as long as the word is being processed", and that "there is no appreciable lag between what is being fixated and what is being processed". But what does it mean for a word to be "processed"? Just and Carpenter's model of reading has three stages: encoding of the word form and lexical access, identification of relationships between the words in a sentence (such as agent-action-object), and integration with information from previous sentences. Once these three stages are finished, the eyes proceed to the next word.¹ Just and Carpenter's processing model is highly serial, which matches most readers' subjective experience that sentences are processed in an incremental, left-to-right fashion (Snell & Grainger, 2019). However, while readers do tend to make fixations incrementally in the reading direction, fixation sequences are not always in serial order: Instead of systematically shifting the gaze from one word to the next (something that only happens in about 50% of fixations), readers also skip words, refixate the same word, or regress to previous words (Kliegl et al., 2004; Rayner, 1998).
This more complicated picture of reading aligns with the fact that the structure of many sentences in natural language does not correspond to simple agent-action-object sequences. Consider a sentence like (1), taken from Mertzen et al. (2023):
(1) It turned out that the attorney whose secretary had forgotten that the visitor was important frequently complained about the salary at the firm.
In this sentence, there are several dependencies between non-adjacent words, most strikingly the long-distance dependency between the noun attorney and the verb complained. It is difficult to argue that the processing of the word attorney is finished once the preamble It turned out that the attorney ... has been read: A verb of which attorney is the subject must still arrive at some point. Complete integration of attorney can thus only be achieved when complained is read, after ten intervening words have been processed. The eyes therefore have to move forward even if the current word has not been completely integrated into the sentence structure.
A well-established assumption in sentence processing is that a noun like attorney is held in working memory until the dependency is completed, and needs to be retrieved when the verb is reached (Gibson, 1998, 2000; Lewis et al., 2006). A strong interpretation of the eye-mind assumption would predict that, given that the processing of attorney is finalized at complained, readers should refixate attorney once lexical access of complained is complete. However, this is not what usually happens: While readers do make more regressions in more complex sentences that involve memory retrievals (e.g., Gordon et al., 2006; Jäger et al., 2015; Lee et al., 2007; Mertzen et al., 2023), regressive eye movements nevertheless occur only in a minority of trials. Furthermore, even in difficult sentences that may require multiple passes to parse correctly, readers do not necessarily regress to the most syntactically informative words in the sentence (e.g., Christianson et al., 2017; Engelmann et al., 2013; von der Malsburg & Vasishth, 2011, 2013). Thus, while there is undoubtedly a connection between sentence processing and eye movements (Clifton et al., 2007; Frazier & Rayner, 1982; Rayner, 1998), it is much less direct than posited by the strong version of the eye-mind assumption, as Reichle et al. (2009) have pointed out. On the other hand, there is evidence that readers can and do move their eyes into the vicinity of critical words (Inhoff & Weger, 2005; Meseguer et al., 2002; Mitchell et al., 2008; Schotter et al., 2014; Weger & Inhoff, 2007), which suggests the need for a model with some linguistically mediated guidance of regressive eye movements.
Psycholinguistic studies of sentence processing typically rely on aggregated reading measures such as total fixation times, and models of language processing during reading, such as the classic Just and Carpenter (1980) model, usually ignore the complexity of eye-movement control. However, highly detailed models of eye-movement control do exist. An important line of work in cognitive psychology seeks to explain reading processes at the level of individual fixations and saccades by unpacking the underlying dynamics of the latent sub-processes involved. Several influential mathematical models of eye-movement control exist; a prominent example is the E-Z Reader model (Reichle et al., 2003). These models have historically focused on the effects of word-level properties such as word length, frequency, and predictability, and do not take into account higher-level processes such as linguistic dependency completion. However, there have been several attempts at integrating models of sentence processing difficulty with eye-movement control, including E-Z Reader (Reichle et al., 2009), the model of Engelmann et al. (2013), and Über-Reader (Reichle, 2021; Veldre et al., 2020). These models focus on different aspects of sentence processing, and have been evaluated against corpus data, such as the Schilling corpus (Schilling et al., 1998). Two models that investigate the interaction between eye-movement control and sentence comprehension using data from planned experiments are reported in Vasishth and Engelmann (2022) and Dotlačil (2021); both of these investigations use a highly simplified version of E-Z Reader, that is, the Eye Movements and Movement of Attention (EMMA) model embedded within the ACT-R architecture (Salvucci, 2001). The simplified EMMA model has important limitations; for example, as discussed in Engelmann et al. (2013), the model only allows regressive eye movements to the preceding word.
All of these existing models do capture a range of selected empirical phenomena and furnish important insights into the interaction between eye-movement control and sentence parsing processes. However, to our knowledge, no model exists that uses a fully specified mainstream model of eye-movement control that is integrated with a model of dependency completion in language comprehension; furthermore, as far as we are aware, such a detailed process model has never been evaluated using data from a planned psycholinguistic experiment.
A major difficulty in developing a more complex integrated model is that a considerable number of model parameters will need to be estimated using empirical data. For models of such complexity, conventional methods like grid search become intractable. For such a complex model, Bayesian parameter estimation using the model's likelihood function (or an approximation) provides a rigorous approach to statistical inference (Rabe et al., 2021; Schütt et al., 2017). Two major advantages of the Bayesian approach are that parameters can be regularized or constrained a priori, which makes computation more efficient compared to the traditional grid search method, and that the uncertainty of the parameter estimates can be taken into account when evaluating model fit. Regularization makes parameter estimation more tractable, and incorporating the uncertainty of parameter estimates gives a more realistic picture of model fit (Nicenboim et al., 2023). Although Bayesian model fitting has been implemented for a basic reading model (Dotlačil, 2018), this line of work currently still neglects many low-level physiological and higher-level cognitive aspects of reading.
In this context, a major recent advance in Bayesian parameter inference for process-based models was proposed by Rabe et al. (2021) and Seelig et al. (2020) (for an overview, see Engbert et al., 2022). This line of work relies on the dynamical SWIFT model of eye-movement control developed by Engbert et al. (2005), and demonstrates how the Bayesian approach can be deployed in highly complex process models. Compared to other models of eye-movement control in reading such as E-Z Reader (Reichle et al., 2003), SWIFT has several advantages that make it a potentially better candidate for the purpose of integrating higher-level processing: It (1) is available for Bayesian parameter inference due to the likelihood implementation (Rabe et al., 2021; Seelig et al., 2020), (2) has a time-dependent word-activation field that can serve as the basis for memory encodings, and (3) has mechanisms that allow for long-range regressions, which are of particular interest when investigating dependencies that span several words. SWIFT and E-Z Reader also differ with regard to theoretical assumptions such as serial vs. parallel processing of words, but these are not our primary focus.
Based on the methodological advances by Rabe et al. (2021) and Seelig et al. (2020), we are able to find an objective answer to the question: Can the complex lower-level cognitive and physiological principles of eye movements be integrated with a computational model of higher-level linguistic processing, taking into account the cost of long-distance dependency completion? Below, we present the Sentence-Processing and Eye-Movement Activation-Coupled Model (SEAM), a novel integrated model of sentence processing and eye-movement control in reading. By combining the Saccade-Generation With Inhibition by Foveal Targets (SWIFT) model with the cue-based memory retrieval model proposed by Lewis and Vasishth (2005), we can integrate spatially distributed processing in eye-movement control with rule-based dependency completion in a Bayesian model-fitting framework. We carry out model simulation using a principled Bayesian workflow (Schad et al., 2020) to demonstrate the activation-based coupling between SWIFT and the Lewis and Vasishth (2005) model. We show that our model yields reliable Bayesian parameter estimates by generating simulated data with known parameters and then recovering these parameters using the Bayesian parameter estimation approach.
We also fit SEAM to recently published empirical data from an eye-tracking experiment investigating similarity-based interference (Mertzen et al., 2023), providing model-driven explanations for the observed eye movement patterns. Given that SEAM simulates time-ordered fixation sequences, the model makes predictions for all spatial and temporal summary statistics that are relevant in the reading research literature (e.g., fixation probabilities, landing positions/saccade amplitudes, and fixation durations/reading times). This capability of the SEAM architecture makes it an important candidate model for theory development in psycholinguistics.
We will first introduce the Lewis and Vasishth (2005) model of sentence processing, then introduce the basic workings of SWIFT, and finally proceed to our integrated model SEAM.
The Activation-Based Model of Sentence Processing (Lewis & Vasishth, 2005)
During sentence reading, the human sentence processor has to incrementally integrate individual words into a syntactic structure, based on which sentence meaning can be derived. Lewis and Vasishth (2005) proposed a model of sentence processing (hereafter, we refer to this model as LV05) that is based on the cognitive architecture ACT-R (Anderson & Lebiere, 1998; Anderson, 2005). In the LV05 model, incoming words are incrementally integrated into syntactic constituents that are stored in memory as chunks. Memory chunks in LV05 carry information in the form of features, which can be used to access them in memory later on. Chunks also have fluctuating activation values that are determined by recency and by cue match during retrieval events. For instance, in a sentence like (2), as the sentence is read word-by-word, the noun phrases the robber and the policeman are stored as memory chunks as soon as they are read. The verbs chased and escaped then each trigger retrievals of their respective arguments from memory.
(2) The robber that the policeman in the patrol car chased escaped.
Taking the retrieval at the verb escaped as an example, the dependency needs to be completed by searching working memory for a suitable memory chunk to serve as a syntactic subject. The search process is cue-based, that is, the verb specifies a set of linguistic features such as ±noun or ±animate to identify the correct dependent, and existing memory chunks are reactivated based on their feature specifications. The best-matching candidate is usually retrieved, but because memory activation is noisy, misretrievals occasionally occur. In addition, processing is slowed when multiple memory chunks, such as the robber and the policeman in (2), match the retrieval cues and compete for activation, which is called the fan effect (e.g., Anderson, 1990).
In LV05, the latency of a given retrieval is governed by a set of equations taken from the ACT-R architecture (Anderson et al., 2004), which determine each chunk's activation at a given point in time. Suppose that a noun phrase, say the robber in (2), has been stored in memory as memory chunk $k$. When a retrieval is triggered while processing word $n$ (escaped) later on, chunk $k$'s activation value at word $n$ is calculated as
$$A_{k,n}(t) = S_k(t) + P_k(t) + B_k(t),$$
where $S_k$ is the memory association strength, $P_k$ is the mismatch penalty, and $B_k$ is the chunk-specific base-level activation. The fan effects $\varphi_{kl}(t)$ of competing retrieval candidates of all $l$ features of memory chunk $k$ decrease the chunk's activation strength, which also depends on the $S_{\max}$ (maximum activation strength) parameter. The fan effect variable $\varphi_{kl}(t)$ is defined as the number of memory chunks with feature $l$ at time $t$, including memory chunk $k$ itself, so that $\varphi_{kl}(t) \geq 1$. The mismatch penalty decreases activation for all retrieval cues $l$ that do not match the corresponding feature of memory chunk $k$; here, $p \geq 0$ is a free parameter specifying the mismatch penalty incurred by each unmatched feature. Chunks become active when words are encoded or when retrievals are performed, and then start to decay. The resulting base-level activation at time $t$ is given by
$$B_k(t) = \log\left(\sum_i (t - t_{ik})^{-d}\right),$$
where $d$ is a decay parameter and $t_{ik}$ is the time of the $i$-th memory access (encoding or retrieval) of memory chunk $k$.
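The activation components described above can be illustrated with a minimal Python sketch. Only the base-level term follows the standard ACT-R definition directly; the forms used here for the association strength (an unweighted sum of $S_{\max} - \log \varphi_{kl}$ over matched cues) and for the mismatch penalty ($-p$ per unmatched cue) are assumptions based on common ACT-R conventions, not a statement of the model's exact equations. All function names and numerical values are hypothetical.

```python
import math


def base_level(t, access_times, d=0.5):
    """Base-level activation at time t: log of summed power-law decay
    over all earlier accesses (encodings or retrievals) of the chunk."""
    past = [t_i for t_i in access_times if t_i < t]
    if not past:
        return float("-inf")  # chunk has not been created yet
    return math.log(sum((t - t_i) ** (-d) for t_i in past))


def association_strength(cues, s_max=1.5):
    """ASSUMED form: unweighted sum over matched cues of s_max - log(fan).
    `cues` is a list of (matched, fan) pairs; fan counts the chunks that
    carry the cued feature (>= 1)."""
    return sum(s_max - math.log(fan) for matched, fan in cues if matched)


def mismatch_penalty(cues, p=1.0):
    """ASSUMED form: -p for every retrieval cue the chunk fails to match."""
    return -p * sum(1 for matched, _ in cues if not matched)


def activation(t, access_times, cues, d=0.5, s_max=1.5, p=1.0):
    """Total activation as the sum of the three components."""
    return (association_strength(cues, s_max)
            + mismatch_penalty(cues, p)
            + base_level(t, access_times, d))


# Hypothetical retrieval at t = 3.0 with two cues. Both chunks match the
# first cue (fan of 2); only "robber" matches the second cue.
robber = activation(3.0, [0.5], cues=[(True, 2), (True, 1)])
policeman = activation(3.0, [1.5], cues=[(True, 2), (False, 1)])
print("robber:", round(robber, 3), "policeman:", round(policeman, 3))
```

In this toy configuration the fully matching chunk ends up with the higher activation even though the partially matching chunk was encoded more recently, which is the qualitative pattern the cue-based account relies on.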
Note that in our implementation, in contrast to the original LV05 model, $S_k$, $\varphi_{kl}$, and $P_k$ are functions of time. This is because the memory schedule, that is, the set of words encoded in memory chunks, changes dynamically each time a word is encoded in memory. As encodings can happen at any time $t$, the memory schedule, and therefore the predicted fan effects and penalties, may change even while a retrieval is ongoing. This assumption is necessary to allow for dependency resolution in the case that a retrieval trigger is processed before a potential target has been stored in memory.
Activation values are subject to stochastic noise controlled by the ans (activation noise) parameter, which yields the noisy activation $A'_{k,n}$. The memory chunk $k^\star_n$ with the highest noisy activation $A'_{k,n}$ is matched for the retrieval at word $n$, and the retrieval latency is computed as $F \exp(-A'_{k^\star_n,n})$ (Equation 7), where $F$ is the latency factor, a free linear scaling parameter. Equation (7) can be used to make quantitative predictions for reading times, and the LV05 model has been used to model a variety of phenomena in the sentence-processing literature (for a review, see Engelmann et al., 2019; Vasishth & Engelmann, 2022). However, the LV05 model can only be straightforwardly applied to paradigms in which sentences are read strictly incrementally, such as self-paced reading: The model can create chunks, track their activations, and integrate them with each other via retrievals, but it does not account for eye fixations, and cannot capture cases in which the order of fixations mismatches the serial word order due to skippings and regressions. To fully capture "natural" sentence reading, the LV05 model thus needs to be interactively integrated with a model that accounts for spatial and temporal aspects of eye movements.
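As a rough illustration of how noisy activations lead to occasional misretrievals and to activation-dependent latencies, consider the following sketch. It assumes logistic noise with scale ans, which is the usual ACT-R convention but is an assumption here; the function names, activation values, and the latency factor of 0.2 are hypothetical.

```python
import math
import random


def logistic_noise(rng, scale):
    """Draw zero-centered logistic noise (ASSUMED noise family)."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # avoid log(0)
    return scale * math.log(u / (1.0 - u))


def retrieve(activations, latency_factor, ans, rng):
    """Add noise to each chunk's activation, pick the most active chunk,
    and return it with its latency F * exp(-noisy activation)."""
    noisy = {k: a + logistic_noise(rng, ans) for k, a in activations.items()}
    winner = max(noisy, key=noisy.get)
    latency = latency_factor * math.exp(-noisy[winner])
    return winner, latency


rng = random.Random(1)
chunks = {"robber": 1.0, "policeman": 0.2}  # hypothetical activations

# Without noise, the best-matching chunk always wins.
print(retrieve(chunks, latency_factor=0.2, ans=0.0, rng=rng))

# With noise, the competitor is sometimes retrieved instead.
trials = [retrieve(chunks, 0.2, 0.5, rng)[0] for _ in range(2000)]
print("misretrieval proportion:", trials.count("policeman") / len(trials))
```

Because the latency is an exponential function of the winning chunk's noisy activation, lower activation (for example, due to fan effects or decay) translates directly into slower retrievals.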
The dynamical SWIFT model (Engbert et al., 2002, 2005) is a good candidate for integration with the LV05 model. Its main advantages are that it (a) has recently been implemented for Bayesian parameter inference (Rabe et al., 2021; Seelig et al., 2020), (b) predicts and explains all empirically observable saccades in sentence reading, and (c) allows for (but does not enforce) parallel processing of words. Even though SWIFT itself does not follow an ACT-R-based architecture like EMMA (Engelmann et al., 2013; Salvucci, 2001; Vasishth & Engelmann, 2022), an integration with ACT-R-based models such as LV05 is possible via activation-based coupling, as we will detail below after a brief introduction of SWIFT.
The SWIFT Model of Eye-Movement Control (Engbert et al., 2005)
SWIFT is a model of eye-movement control in reading implemented in a dynamical cognitive modeling framework (Beer, 2000; Engbert, 2021). At its core, its internal timing processes and word activations govern the temporal control and target selection for saccadic eye movements. Words with high activation values are more likely to be selected as saccade targets. SWIFT assumes that all words that fall within a processing span around the current fixation location are processed in parallel (Engbert et al., 2002).² The processing rate $\Lambda_j(t)$ of any given word $j$ at time $t$ depends on a number of factors such as gaze eccentricity, that is, the distance between word $j$ and the currently fixated word, such that words that are further away from the visual focus are processed more slowly.
In SWIFT, each word in the sentence passes through a lexical and a post-lexical processing stage. During lexical processing, word recognition and identification take place. As word recognition is ongoing, the discrete activation associated with the processed word $j$, $n_j(t)$, rises up to a maximum threshold, $N_j$. The threshold is modulated by word predictability and by the word's corpus frequency, as frequent words generally require less processing than less frequent words. Note, however, that we did not include predictability effects in our model implementation. SWIFT also largely ignores low-level sensory perception and letter-level processing, which can have effects on the further (post-lexical) processing of a word and the sentence as a whole. In future work, processes such as bigram identification (Snell et al., 2018) and surprisal (Huang et al., 2023) are worth considering as extensions to SWIFT (or derivative models) to account for more aspects of lexical processing.
Once the word is identified, post-lexical processing begins and word activation decreases again. Post-lexical processing, however, is not explicitly modeled in SWIFT. Although SWIFT keeps track of the processing stage of words in the sentence, it has no higher-level representation of its constituents or of the entire word sequence. Adjacent words may have an influence on processing difficulty, but there is no mechanism to account for difficulty due to dependency completion processes at the sentence level.
While the relative word activations at the time of programming a saccade determine the relative probability of each word to be selected as the upcoming target, the timing of saccades is relatively independent (Findlay & Walker, 1999) and involves a cascade of several processes. The cascade starts with a global timer, which triggers the labile and subsequent non-labile saccade stages, a distinction motivated by oculomotor performance in the double-step paradigm (Becker & Jürgens, 1979). During the labile stage, saccades can be canceled and a new target can be selected. During the non-labile stage, cancellation is no longer possible. The execution of the saccade itself is a noisy process subject to systematic (range) and random error (McConkie et al., 1988), where the systematic error component can be explained by a Bayesian-optimal estimation of the saccade target position (Engbert & Krügel, 2010). [...] for saccade amplitudes based on significantly better model fits in previous work (Rabe et al., 2021).
Target selection in SWIFT is inherently stochastic, as it depends on the dynamic, relative word activations at any given point in time. Words with high activation values are more likely to be selected as targets than words with lower activation. The probability $\pi_j(t)$ to select word $j$ at time $t$ as the next saccade target is given as
$$\pi_j(t) = \frac{a_j(t)^{\gamma}}{\sum_{k=1}^{N_W} a_k(t)^{\gamma}},$$
where $N_W$ is the number of words in the sentence and
$$a_j(t) = \frac{n_j(t)}{N_a}$$
is the normalized activation of word $j$ at time $t$, which is the processing state of the word, normalized by parameter $N_a$, the highest possible threshold of a word in a given corpus.
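The role of the exponent γ in target selection can be made concrete with a short sketch. It assumes a Luce-type choice rule in which each word's selection probability is proportional to its normalized activation raised to the power γ, with words at zero activation never selected. The function name and the activation values are hypothetical, and raising an error when no word is active is a choice of this sketch rather than a property of SWIFT.

```python
def selection_probabilities(activations, gamma):
    """Selection probabilities proportional to activation ** gamma.
    Words with zero activation get probability zero."""
    weights = [a ** gamma if a > 0 else 0.0 for a in activations]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one word needs positive activation")
    return [w / total for w in weights]


activations = [0.2, 0.6, 0.2]  # hypothetical normalized activations

for gamma in (0.0, 1.0, 50.0):
    probs = selection_probabilities(activations, gamma)
    print(gamma, [round(p, 3) for p in probs])
```

With γ = 0 all active words are equally likely targets, with γ = 1 the probabilities equal the relative activations, and with a large γ nearly all probability mass falls on the most active word.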
The relation between the activation $a_j(t)$ of a word and its selection probability $\pi_j(t)$ also entails that words requiring little processing (i.e., "easy-to-process" words) pass through lexical and post-lexical processing faster than less frequent (i.e., "difficult-to-process") words. The former words are therefore in a state of higher activation for a shorter time period, consequently less likely to be fixated, and thus often skipped. The free parameter $\gamma$ modulates the relationship between word activations and selection probabilities. For $\gamma \to 0$, words are selected randomly with equal probability, regardless of their actual activation values (if greater than zero). For $\gamma \to 1$, there is a perfect linear relationship between activations and selection probabilities (Luce's choice rule). For $\gamma \to \infty$, a winner-takes-all principle is enforced so that the word with the highest activation always "wins."
The evolution of word activations in the original version of SWIFT (Engbert et al., 2002, 2005) was governed by ordinary differential equations (ODEs). In the more recent versions by Rabe et al. (2021) and Seelig et al. (2020), the dynamics of SWIFT changed toward a model with discrete internal states that evolve stochastically over continuous time. Word activations and saccade timers are random walks that increase/decrease over time with different transition rates for different timers and individual word activations. The state of the model at time $t$ is given by a vector $n = (n_1, n_2, \ldots, n_{4+N_W})$, where the components $n_j$ represent the states of the subprocesses. Components $1$ to $N_W$ keep track of the (post-)lexical processing of words, while components $N_W + 1$ to $N_W + 4$ are saccade-related and additional stochastic variables (Table 1). In each of the possible transitions from state $n = (n_1, n_2, \ldots)$ to $n' = (n'_1, n'_2, \ldots)$, only one of the sub-processes $n_i$ is changed by one unit. The discrete stochastic variables $\{n_j\}$ at time $t$ map to the activation variables $\{a_j(t)\}$.
For the numerical simulation of the model, an algorithm can be derived from the master equation (see Seelig et al., 2020, for details), which describes the temporal evolution of the model's internal states (Gardiner, 1985; Van Kampen, 1992). It is specified by the transition rates $W_{n'n}$, which in turn govern the transitions between state vectors $n \to n'$.
Implementation of more detailed assumptions on the post-lexical stage can be achieved by changing the transition rates $\{w_j(t)\}$ that control the stochastic transitions for the internal states $\{n_j(t)\}$ and thus for activations $\{a_j(t)\}$. Transition rates are a measure of the expected number of transitions in a given time unit (milliseconds in SWIFT) and are the inverse of the expected time between two consecutive transitions. Transition rates, in combination with thresholds $N_j$, are therefore directly related to processing speed. While the rates for the saccade timers are either constant or determined according to an invariant rule (see Table 1), the transition rates for word processing components vary between processing stages (Equation 11). They depend on $\alpha$, the baseline processing difficulty; $\Lambda$, the processing rate; proc, the relative processing speed for post-lexical processing; and $\omega$, a minimum decay parameter.³ In the integrated SEAM model, word activations in SWIFT are coupled with memory activations in LV05 in a Bayesian modeling framework by adapting the formula in Equation (11).
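The link between transition rates, thresholds, and processing time can be illustrated with a generic Gillespie-type simulation step for a discrete-state process in continuous time. This is a minimal sketch of the general technique, not the algorithm used in SWIFT (for which see Seelig et al., 2020): it only increments counters, uses constant rates, and all names and values are hypothetical.

```python
import random


def gillespie_step(state, rates, rng):
    """One stochastic transition: draw an exponential waiting time from the
    total rate, then change exactly one component by one unit, chosen with
    probability proportional to its rate."""
    total = sum(rates)
    if total <= 0:
        raise ValueError("at least one rate must be positive")
    wait = rng.expovariate(total)
    r = rng.random() * total
    cumulative = 0.0
    for j, w in enumerate(rates):
        cumulative += w
        if r < cumulative:
            break
    new_state = list(state)
    new_state[j] += 1
    return new_state, wait


def time_to_threshold(threshold, rate, rng):
    """Simulate a single counter until it reaches its threshold."""
    state, elapsed = [0], 0.0
    while state[0] < threshold:
        state, wait = gillespie_step(state, [rate], rng)
        elapsed += wait
    return elapsed


rng = random.Random(42)
# Hypothetical word with threshold 20 and rate 0.1 transitions per ms:
# the expected time to reach threshold is 20 / 0.1 = 200 ms.
times = [time_to_threshold(20, 0.1, rng) for _ in range(2000)]
print("mean time to threshold (ms):", round(sum(times) / len(times), 1))
```

Since each transition takes on average the inverse of the rate, a counter with threshold N and constant rate w needs N / w time units on average, which is why rates and thresholds jointly determine processing speed.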
The fact that SWIFT implements detailed mechanisms of word processing and saccade preparation is reflected in its number of parameters. Early approaches to fitting eye-movement models to experimental data relied on hand-picking plausible parameter values, grid search (Reichle et al., 1998), or genetic algorithms (Engbert et al., 2002), optimizing the fit between empirical and simulated summary statistics. Based on the development of a likelihood approximation (Seelig et al., 2020), a fully Bayesian framework is now available for parameter inference (Rabe et al., 2021). The likelihood framework permits objective parameter fitting independent of a set of selected summary statistics, since the likelihood is computed from the fixation sequences themselves. Using large-scale numerical simulations, it has been shown that SWIFT can reliably reproduce fixation durations, fixation probabilities, and saccade amplitudes at the level of global and by-participant summary statistics, without using those summary statistics for the purpose of parameter fitting.

Figure 1
Word Activation in SWIFT
Note. Theoretical activation history of three words (A, B, and C). Colors of line segments correspond to the processing stage active at that given time (lexical, post-lexical, or complete). Activation maxima are $N_A$, $N_B$, and $N_C$, respectively. Activations are displayed as continuous but are actually implemented as discrete counters.

SEAM: Activation-Based Coupling of SWIFT and LV05
In baseline SWIFT, processing a word always starts out in the lexical processing stage. Once the word activation $n_j(t)$ has reached its threshold $N_j$ at time $t$, it begins post-lexical processing, and activation starts to decrease. When the activation has returned to zero, the word is completely processed.
Figure 1 schematically shows the activation histories of three hypothetical words. The figure assumes that the eyes move sequentially from word A to B to C, leading to a somewhat sequential onset of their first processing ($t_1$, $t_2$, and $t_5$). The first stage of processing is the lexical stage. During this stage, activations rise until they reach their respective maxima ($N_A$, $N_B$, and $N_C$), which depend on printed word frequency. Given that saccade targeting depends on activation, the words in question are most likely to be selected as a saccade target if the upcoming saccade is programmed at times $t_3$, $t_4$, and $t_6$. These are also the times at which the words enter the post-lexical processing stage. During post-lexical processing, activations decrease again, making it in turn less likely for the respective word to be selected as a target. Once the activation returns to zero ($t_5$, $t_8$, and $t_9$), the word is assumed to have completed processing.
A feature common to SWIFT and LV05 is that both models use activation values to guide processing. SWIFT uses word activations to select words as saccade targets, while LV05 uses memory activations to select memory chunks as retrieval targets. Our integrated model SEAM keeps these activations separate, but implements an interaction, so that memory activations in LV05 modulate word activations in the SWIFT model. Therefore, rather than assuming that the sentence processor has direct control of the eye-movement targeting system, we propose an indirect, stochastic influence on saccade targeting via memory activations. This is in good agreement with eye-tracking studies carried out with larger-than-usual sample sizes that show that the effects of sentence processing cost due to memory interference on fixation and other measures have relatively small magnitudes (e.g., Jäger et al., 2020); larger effect sizes are generally driven by lower-level factors such as frequency and word length (Boston et al., 2008).
In SEAM, activations in the LV05 component reflect the construction of a sentence representation; they affect word activations and thereby stochastically influence target selection in the eye-movement component. As in SWIFT, the activation gradient of a word in SEAM is mainly determined by the transition rate, which varies between processing stages. Compared to SWIFT, the sequence of processing stages in SEAM is extended by stages that reflect the cost of memory retrieval, which can account for additional processing difficulty. Possible interactions of memory retrieval and the word activations during dependency resolution include: (a) the retrieval process delays post-lexical processing of the currently fixated region that caused the retrieval (that is, the retrieval trigger); and (b) retrieval candidates are reactivated so that they attract regressions from the retrieval trigger.
In Figure 2, activation histories of the same three words from the SWIFT example in Figure 1 are shown. Like the baseline SWIFT model, words in SEAM go through a lexical and post-lexical processing stage before they are considered completely processed. However, SEAM additionally accounts for the resolution of a linguistic dependency during post-lexical processing of word C. Once the words are lexically accessed ($t_3$, $t_4$, and $t_6$), they are encoded as chunks in SEAM's memory module, along with their features, as in the LV05 model. Words A and B are assumed to not trigger a dependency completion process; this is the case for most nouns. However, when word C, which could be a verb, is processed and the associated chunk is stored in memory, a subject-verb dependency must be resolved. A retrieval is thus triggered. The assumption that nouns do not trigger a dependency completion process is obviously an oversimplification, but this simplification is reasonable for the data being modeled in this paper, as in the experimental design of Mertzen et al. (2023), the theoretically interesting dependency completion occurs at the verb.
During retrieval, all words that are fully processed before the processing of word C completes are counted as retrieval candidates. Candidate words enter into a retrieval stage in which activation increases until the retrieval process finishes.⁴ The activation increase differs by the degree to which the retrieval candidate features match the retrieval cues, implementing a core assumption of the LV05 model.
The effect of the memory activations on word activations is mainly modulated by the new parameters $\mu_2$ and $\mu_3$. The retrieval stage ends when one candidate reaches a threshold value, which is a fraction $\mu_3$ of the maximum activation of the retrieval trigger, $N_C$. Because post-lexical processing in SEAM is only finished after all dependencies have been resolved, the post-lexical activation of the retrieval trigger is guaranteed not to fall below a fraction $\mu_2$ of its maximum activation during retrieval. This is why the post-lexical activation of word C does not change between $t_7$ and $t_{10}$. In this example, despite entering the retrieval phase at a later time, word B reaches the retrieval threshold at time $t_{10}$ before word A, thereby concluding the retrieval process. Consequently, the post-lexical processing of word C continues and all retrieval candidates, that is, word A and word B, enter a post-retrieval stage, which is equivalent to an additional post-lexical processing stage. This also entails that the retrieval phase of word A is aborted, which would otherwise have reached threshold at time $t_{11}$.

Figure 2
Word Activation in SEAM
The transition rates of the baseline SWIFT model for word $j$, Equation (11), are replaced by modified rates $w'(t)$, where $m$ is the current retrieval trigger that needs to form a dependency. The transition rate for the retrieval candidate $j$, triggered by dependency resolution for word $m$, is chosen to ensure that the total duration of reaching threshold (i.e., the time for $j$ to be matched as a dependent of $m$) matches the retrieval latency predicted by LV05. Therefore, it is computed as the threshold value $\mu_3 N_m$ divided by the expected total duration of $j$ in that stage, $F \exp(-A'_{j,m}(t))$.

Altogether, SEAM extends the baseline SWIFT model parameters (Rabe et al., 2021; Seelig et al., 2020) with seven additional model parameters. The parameters $d$ (decay), $S_{\max}$ (maximum memory activation strength), $F$ (retrieval latency scaling factor), and $p$ (mismatch penalty), which modulate $w'(t)$ through $A'_{j,m}(t)$, are based directly on their LV05 implementations (Lewis & Vasishth, 2005). Moreover, the link between word activations in LV05 and processing rate in SWIFT is complemented by the three new model parameters $\mu_1$, $\mu_2$, and $\mu_3$, as detailed above.

Some parameters of the LV05 model, in particular for goal activation and noise (G and ans), are ignored in the present implementation. Variation in the goal activation parameter is usually used to model individual-level capacity differences (e.g., Daily et al., 2001; Mätzig et al., 2018; Vasishth & Engelmann, 2022), which are not of interest in the present work. The goal activation is fixed at 1.0, which gives equal weight to all retrieval cues. The noise parameter ans is replaced by the built-in stochasticity of SWIFT. Moreover, the parameters $S_{\max}$ and $F$ are not independent in terms of the resulting retrieval latency and transition rate, which is why we estimate only $F$ as a free parameter and keep $S_{\max}$ at a fixed default value of 1.5. In the present study, we also exclude $\mu_1$, the fixed time needed to execute a production rule, by setting it to 0, because we assume this time to overlap with some of the oculomotor processes already present in the model. Since $S_{\max}$ is fixed, we also decided to fix the mismatch penalty $p$ at its default value, as the relation of the two parameters is critical. Thus, the only parameters that were fit to the Mertzen et al. (2023) data were $F$, $d$, $\mu_2$, and $\mu_3$. For a complete list of model parameters and default values in SEAM, see Appendix A.
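The retrieval-stage transition rate described above can be sketched in a few lines of Python. This is only an illustration of the stated relation (threshold $\mu_3 N_m$ divided by the expected duration $F \exp(-A'_{j,m}(t))$), not the SEAM implementation; the function name and all numeric values are assumptions chosen for the example.

```python
import math

def retrieval_rate(mu3: float, n_max_trigger: float, F: float, activation: float) -> float:
    """Transition rate of a retrieval candidate: threshold / expected latency."""
    threshold = mu3 * n_max_trigger        # fraction mu3 of the trigger's maximum activation
    latency = F * math.exp(-activation)    # LV05-style retrieval latency, F * exp(-A)
    return threshold / latency

# Illustrative (assumed) values, not fitted or default parameters
mu3, n_m, F = 0.5, 2.0, 0.2
rate_good = retrieval_rate(mu3, n_m, F, activation=1.5)  # candidate with good cue match
rate_poor = retrieval_rate(mu3, n_m, F, activation=0.8)  # candidate with poorer cue match

# Growing at a constant rate, the candidate needs threshold / rate to reach threshold,
# which by construction equals the retrieval latency F * exp(-A).
time_to_threshold = (mu3 * n_m) / rate_good
```

Under these assumptions, a candidate with higher memory activation receives a higher rate and therefore reaches the retrieval threshold sooner, which is how better cue match translates into faster retrieval in the activation coupling.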
For our implementation of SEAM, we opted for a simplified version of the LV05 model (Engelmann, 2015) and the latest version of SWIFT (Rabe et al., 2021).⁵ SEAM connects the baseline eye-movement control architecture of SWIFT with the interactive working memory module of LV05 via activation-based coupling: reading words in SWIFT leads to the creation of memory chunks and can trigger retrievals in LV05, whereas chunk activations computed by LV05 modulate word activations in SWIFT.
(3) a. It turned out that the attorney[+subj, +anim] whose secretary had forgotten about the important meeting[−subj, −anim] frequently complained[subj, anim] about the salary at the firm.
    b. It turned out that the attorney[+subj, +anim] whose secretary had forgotten about the important visitor[−subj, +anim] frequently complained[subj, anim] about the salary at the firm.
    c. It turned out that the attorney[+subj, +anim] whose secretary had forgotten that the meeting[+subj, −anim] was important frequently complained[subj, anim] about the salary at the firm.
    d. It turned out that the attorney[+subj, +anim] whose secretary had forgotten that the visitor[+subj, +anim] was important frequently complained[subj, anim] about the salary at the firm.
In the example above, processing the verb complained is expected to trigger a retrieval for an animate subject noun phrase. In all sentences, attorney is the grammatically correct subject of complained, and should thus be retrieved. However, the distractor noun phrase (meeting or visitor) may interfere with the retrieval of attorney. The distractor is visitor in the +animate or meeting in the −animate condition, and it is either a subject (+subject) or an object (−subject) of the embedded clause.
According to cue-based retrieval theory, both subjecthood and animacy of the distractor should lead to additional difficulty for resolving the critical dependency. This is due to the fan effect (e.g., Anderson, 1990), which is also known as similarity-based interference (Jäger et al., 2017): When the feature specification of a distractor overlaps with that of the retrieval target, it diverts some of the retrieval activation from the target to itself. The activations of both the target and the distractor are reduced, leading to longer retrieval time; what ends up being retrieved in a particular simulation run (target or distractor) depends on which chunk happens to have higher activation (this can vary across simulation runs due to stochastic noise in the activation). It is therefore possible that the distractor is sometimes erroneously retrieved. As indices of increased processing difficulty, we expect additive effects of animacy and subjecthood of the distractor on regression path duration and outgoing regression probabilities on the critical verb (complained). The primary region of interest where the effect of the subjecthood and animacy manipulation should manifest is the verb; however, because similarity-based interference effects have been shown to occur in the region just before the verb (Lago et al., 2021; Van Dyke, 2007), Mertzen et al. (2023) also investigated the effect at the adverb (frequently) that preceded the critical verb. For this reason, in our investigations we also report model fits for this pre-critical region.
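The fan effect described above can be illustrated with a deliberately simplified Python sketch for the four conditions in (3). It assumes that each retrieval cue contributes an associative strength of $S_{\max} - \ln(\text{fan})$, that the fan is simply the number of chunks matching the cue (target plus distractor), that the goal activation of 1.0 is split equally between the two cues, and that base-level activation, decay, noise, and the mismatch penalty are ignored. The value of F and all function names are assumptions for illustration only.

```python
import math

S_MAX = 1.5   # fixed default value stated in the text
G = 1.0       # goal activation, shared equally among the retrieval cues
CUES = ("subj", "anim")

# Whether the distractor matches each retrieval cue, by condition in (3)
DISTRACTORS = {
    "a": {"subj": False, "anim": False},  # meeting, object
    "b": {"subj": False, "anim": True},   # visitor, object
    "c": {"subj": True, "anim": False},   # meeting, subject
    "d": {"subj": True, "anim": True},    # visitor, subject
}

def target_activation(distractor: dict) -> float:
    """Spreading activation received by the target (attorney), which matches both cues."""
    weight = G / len(CUES)
    total = 0.0
    for cue in CUES:
        fan = 1 + int(distractor[cue])    # target always matches; distractor may too
        total += weight * (S_MAX - math.log(fan))
    return total

def latency(activation: float, F: float = 0.2) -> float:
    """Retrieval latency F * exp(-A); F = 0.2 is an assumed value."""
    return F * math.exp(-activation)

latencies = {cond: latency(target_activation(d)) for cond, d in DISTRACTORS.items()}
```

In this sketch, each cue shared with the distractor lowers the target's activation by the same amount, so the two manipulations are additive on the activation scale and the predicted latency increases from (3a) through (3b) and (3c) to (3d).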
In summary, similarity-based interference accounts predict that conditions (3b,d) should be more difficult to process than conditions (3a,c) due to the animacy of visitor, and conditions (3c,d) should be more difficult to process than conditions (3a,b) due to the distractor being in subject position.
As indices of increased processing difficulty, additive effects of distractor animacy and distractor subjecthood were expected in reading times and outgoing regression probabilities.An interaction of distractor subjecthood and animacy was not predicted but is reported in Mertzen et al. (2023) for completeness; in the Mertzen et al. (2023) analysis, there was no evidence for an interaction.
In this summary of the Mertzen et al. (2023) results, we report only regression path duration and first-pass regressions out (FPR) from the pre-critical adverb and the critical verb; for full details of all experimental results, please see the original paper.

Note. Plotted violins are the estimated posterior distributions of experimental effects of subjecthood (subj) and animacy (anim) on regression path duration (RPD) and first-pass outgoing regression probability (FPR) from Bayesian mixed-effects regressions, as analyzed and reported by Mertzen et al. (2023). Posteriors are backtransformed linear effects in ms (for RPD) or % (for FPR).
The effects of animacy and subjecthood (coded as sum contrasts) were analyzed using Bayesian mixed-effects models. Subject and item were specified as random effects in the models, with a full variance-covariance matrix for subject and item random effects. The models were implemented with brms (Bürkner, 2017, 2018, 2021), an interface to Stan (Carpenter et al., 2017). Priors were mildly informative Gaussian distributions for the linear model coefficients (intercept and slopes) and mildly informative regularizing Lewandowski-Kurowicka-Joe (LKJ) priors (Lewandowski et al., 2009) for random effects correlation matrices; setting the LKJ prior's parameter ν to 2 downweights extreme correlations like ±1. For a detailed tutorial on linear mixed models in the Bayesian setting, see chapter 5 of Nicenboim et al. (2023), or Sorensen et al. (2016).
The results in Mertzen et al. (2023) showed reading time patterns consistent with effects of subjecthood (syntactic interference) and effects of animacy (semantic interference). Figure 3 shows that on the pre-critical adverb, the effect of subjecthood shows longer regression path duration (RPD) and more first-pass regressions out for conditions that have a +subject distractor (95% credible intervals (CrIs): RPD [17, 63] ms, FPR [3, 11]%). Similarly, the effect of animacy shows longer regression path duration and an increase in first-pass regressions out for conditions with animate distractors compared to conditions with inanimate distractors (95% CrIs: RPD [8, 57] ms, FPR [2, 8]%). The subjecthood × animacy interaction in regression path duration is centered on zero; for first-pass regressions, the interaction has a negative sign ([−7, 0]%).
On the critical verb, the effects of subjecthood and animacy show a similar pattern of longer regression path duration and an increase in first-pass regressions out (Subjecthood 95% CrIs: RPD [3, 52] ms, FPR [1, 8]%; Animacy 95% CrIs: RPD [0, 39] ms, FPR [−1, 5]%). The interaction is centered around zero for regression path duration and regressions out. The increased reading times and regressions for conditions that have subject or animate distractors indicate that syntactically and semantically similar distractors can interfere during long-distance dependency formation.

Simulation Study
The reliability of computational cognitive models critically depends on the availability of appropriate methods for statistical inference (Engbert et al., 2022; Schütt et al., 2017). We previously applied a broader principled Bayesian workflow (Schad et al., 2020) for the baseline SWIFT model in Rabe et al. (2021), which is used as the eye-movement platform in SEAM.
In Figures 4 and 5, we visualized the word activation field and eye trajectory for a simulated trial in SWIFT and SEAM, respectively. As can be seen, SEAM behaves similarly to SWIFT throughout most of the trial. However, the models' behaviors start to diverge when the verb complained is processed and triggers a retrieval in SEAM. During the retrieval phase, word activations of previous words that have been encoded as memory chunks increase. Words with better cue match for the retrieval approach the activation threshold faster than those with lower cue match. If a saccade is triggered during the retrieval phase, the reactivated words can attract regressions.
Without proper checks, it is not self-evident that Bayesian model fitting of SEAM can be carried out in the same way as for SWIFT. However, we expect that our implementation of SEAM will exhibit correct inference because it meets the following three critical conditions: First, for all observables that were taken into account (i.e., fixation positions and durations), a model likelihood has already been implemented in SWIFT (Seelig et al., 2020). Second, both SWIFT and LV05 are dynamic in the sense that they describe activation values as a function of time, which allows us to let them interact dynamically without a significant modification of their initial conceptualization. Third, the dynamics of eye movements and sentence processing interact in the integrated SEAM model and will thus affect the observable temporal and spatial aspects of fixation sequences due to the activation coupling of the constituent SWIFT and LV05 components. The coupling via word activations permits indirect fitting of model parameters related to memory retrieval, as long as they have some probabilistic effect on the outcome variables captured by SWIFT.
After confirming the computational faithfulness of the model, we fitted the model to a training subset of the experimental data and compared predictions for a withheld test portion using relevant global summary statistics and the predicted experimental effects of similarity-based interference⁶ described in the previous section.

[Figure 4. Horizontal axis: words of the sentence "It turned out that the attorney whose secretary had forgotten that the visitor was important frequently complained about the salary at the firm."; vertical axis: Time [ms].]
Note. SWIFT simulation for Example (1). The bold black line is the simulated fixation location (x-axis) as a function of time (y-axis). Saccades are horizontal displacements of the black line. Word activations are depicted by gradients in the background, with darker shades referring to higher activation. The target selection preceding each executed saccade is depicted by a red cross, marking both the time and intended saccade target. Target selection is based on the relative word activations at the respective time point of saccade programming. Saccade timers, which are also components of the internal states, are omitted for brevity. For more details, see Rabe et al. (2021) and Seelig et al. (2020).

Data Assimilation
In eye-movement research, the experimental (observed) data are fixation sequences consisting of time-ordered sequential observations. In such a case, the identification of model parameters is possible within the field of data assimilation (Engbert et al., 2022; Reich & Cotter, 2015). Data assimilation refers to the integration of complex mathematical models with time-series data (see Morzfeld & Reich, 2018, for an introduction). In this framework, the SWIFT model has previously been implemented for Bayesian model fitting (Seelig et al., 2020). Rabe et al. (2021) showed that, in a principled Bayesian workflow (Schad et al., 2020), SWIFT can be reliably fitted to simulated and experimental data even with many free parameters and sparse data that resulted from splitting by participant and experimental condition.

[Figure 5. Horizontal axis: words of the sentence "It turned out that the attorney whose secretary had forgotten that the visitor was important frequently complained about the salary at the firm."; vertical axis: Time [ms].]
Note. SEAM simulation for Example (1). As in Figure 4, the black line is the simulated fixation location (x-axis) as a function of time (y-axis), gradients in the background are word activations, and red crosses are selected targets. Note that, in comparison to Figure 4, processing of forgotten and complained triggers retrievals, which prolongs processing of the trigger and reactivates potential retrieval candidates. In this simulation, during both retrievals, regressive saccades are programmed and executed.

Sequential Likelihood
The time-ordered nature of fixational eye movements makes them a suitable target for data assimilation (Engbert et al., 2022). To exploit the sequential information of the data, some of those models use sequential likelihoods for parameters $\theta \in \Theta$ such that

$$L_M(\theta \mid X_n) = P_M(x_1 \mid \theta) \prod_{i=2}^{n} P_M(x_i \mid X_{i-1}, \theta),$$

where $X_n = (x_1, \ldots, x_n)$ is the entire sequence of $n$ events and $P_M(x_i \mid X_{i-1}, \theta)$ is the likelihood of the $i$-th event of the sequence given all previous events $X_{i-1} = (x_1, \ldots, x_{i-1})$. Successful applications of data assimilation to visual tasks include SceneWalk (Schwetlick et al., 2020, 2022) for scene viewing and SWIFT (Rabe et al., 2021; Seelig et al., 2020) for reading. There, each event of the sequence, $x_i$, is a fixation. Since the location and temporal onset of the first fixation are typically known due to the experimental paradigm, e.g., sequences always starting at a fixation cross, the likelihood for $x_1$ is given by $P_M(x_1 \mid \theta) = 1$. SceneWalk and SWIFT further decompose the likelihood into spatial and temporal components, since each fixation has a spatial location on the screen and a duration.
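To make the structure of the sequential likelihood concrete, the following Python sketch accumulates the log-likelihood of a fixation sequence as a sum of spatial and temporal terms, each conditioned on the preceding fixations. It is not the SWIFT or SEAM likelihood: the two component functions are toy stand-ins (a uniform choice among words and an exponential duration density), and all names and values are assumptions for illustration.

```python
import math

def sequential_loglik(fixations, spatial_prob, temporal_density, theta):
    """Sum of log P(x_i | X_{i-1}, theta) over fixations i >= 2.

    The first fixation is treated as known, so it contributes log(1) = 0.
    """
    loglik = 0.0
    for i in range(1, len(fixations)):
        history = fixations[:i]          # all previous events X_{i-1}
        word, duration = fixations[i]    # current event x_i
        loglik += math.log(spatial_prob(word, history, theta))
        loglik += math.log(temporal_density(duration, history, theta))
    return loglik

# Toy components (assumptions): they ignore the history entirely.
def toy_spatial(word, history, theta):
    return 1.0 / theta["n_words"]        # every word equally likely as a target

def toy_temporal(duration, history, theta):
    rate = 1.0 / theta["mean_duration"]
    return rate * math.exp(-rate * duration)   # exponential duration density

theta = {"n_words": 10, "mean_duration": 200.0}
sequence = [(0, 180.0), (1, 220.0), (3, 150.0)]   # (word index, duration in ms)
ll = sequential_loglik(sequence, toy_spatial, toy_temporal, theta)
```

In a dynamical model, the component functions would depend on the internal state reached after the previous fixations, which is what makes the factorization over the history informative.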
As SEAM is based on SWIFT and we only changed the latent transition rates rather than the saccade execution itself, we can easily use the data assimilation methods implemented for SWIFT.This is especially useful because we fit the model on a by-participant basis and hence only have little data for parameter estimation.The decomposition of temporal and spatial likelihood components is also theoretically interesting since we can expect the modification of the transition rates to affect both the temporal control and target selection of the (simulated) saccadic eye movements.

Profile Likelihoods
As SEAM modifies model dynamics and thus the likelihood function of SWIFT, a reevaluation of the profile log-likelihoods is crucial. Those are generated by first simulating data with known parameter values, and then systematically varying parameter values and inspecting the likelihood of the data for each value. Ideally, the likelihood of the data should be highest for the true parameter values. In order to assess whether the modifications introduced in SEAM are appropriately captured in its likelihood, it should be ensured that the newly introduced free parameters affect the outcome likelihood. Thus, the behavior of the likelihood as a function of each of the new parameters represents a necessary condition for identifiability and statistical inference of the full model (Rabe et al., 2021; Seelig et al., 2020).
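The procedure described above can be sketched with a toy one-parameter model in Python. The example simulates data with a known parameter value, evaluates the log-likelihood on a grid of 50 equidistant candidate values, and checks where the maximum falls. The exponential model, the grid bounds, the sample size, and all names are assumptions for illustration and have no relation to the actual SEAM likelihood.

```python
import math
import random

def simulate(mean_duration, n, rng):
    """Draw n durations from an exponential distribution with the given mean."""
    return [rng.expovariate(1.0 / mean_duration) for _ in range(n)]

def loglik(mean_duration, data):
    """Log-likelihood of the data under an exponential model with the given mean."""
    rate = 1.0 / mean_duration
    return sum(math.log(rate) - rate * x for x in data)

rng = random.Random(1)
true_mean = 200.0
data = simulate(true_mean, 2000, rng)

# 50 equidistant candidate values between 100 and 300
grid = [100.0 + k * (300.0 - 100.0) / 49 for k in range(50)]
profile = [(value, loglik(value, data)) for value in grid]

best_value, best_ll = max(profile, key=lambda pair: pair[1])
```

With several parameters, the same loop would vary one parameter at a time while holding the others at their true values; a parameter with little influence on the data would produce a flat or noisy profile.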
Parameters were inspected if they were going to be fitted later on and/or were added in this model implementation compared to the reference SWIFT implementation (Rabe et al., 2021). This was the case for a total of 11 parameters (see Figure 6). Parameters $\mu_1$ and $S_{\max}$ were also inspected even though they were not selected to be fitted to the recovery and experimental data. This is because the parameters themselves are identifiable, as can be seen in Figure 6, but they are not independent from other model parameters in terms of an effect on model behavior. All other shown model parameters are also fitted to simulated data for parameter recovery as well as to experimental data.

Parameter Estimation and Recovery
As a last step for the verification of the computational faithfulness of the approach, we applied a sampling algorithm to simulated data with known true parameter values in order to ensure the validity of the computational approach. We generated 100 unique data sets with different sets of true parameters $\theta^\star$ randomly sampled from the prior distribution later used for parameter estimation. Parameters would be considered successfully recovered if the correlation between true and recovered parameters was sufficiently high and the normalized root mean squared error (NRMSE) was sufficiently low.
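The two recovery criteria named above can be computed as in the following Python sketch. The normalization of the RMSE by the range of the true values is an assumption made here for illustration (other normalizations exist), and the example values are invented rather than taken from the reported recovery study.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def nrmse(true_values, estimates):
    """RMSE divided by the range of the true values (assumed normalization)."""
    n = len(true_values)
    mse = sum((t - e) ** 2 for t, e in zip(true_values, estimates)) / n
    return math.sqrt(mse) / (max(true_values) - min(true_values))

# Invented example: true values and point estimates for one parameter
true_values = [0.1, 0.2, 0.3, 0.4, 0.5]
recovered = [0.12, 0.18, 0.33, 0.39, 0.52]

r = pearson(true_values, recovered)
error = nrmse(true_values, recovered)
```

A high correlation together with a low NRMSE indicates that estimates track the true values closely; either criterion alone can be misleading, for example when estimates are highly correlated with the truth but systematically shifted.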

Summary Statistics and Experimental Effects
Even though we are using an objective likelihood-based approach for model fitting, it is important that simulated and empirical data are in good agreement at the level of relevant summary statistics, especially with regard to comparability with competitor models and theory testing (Roberts & Pashler, 2000).Because the goal for SEAM is to explain both spatial and temporal aspects of eye movements in reading, we consider a number of different spatial and temporal summary statistics frequently used in reading research.For the spatial dimension, we are looking at several fixation probabilities, that is, probabilities to fixate (or skip) specific words under different conditions.For the quantification of the temporal aspects of the model fit, we evaluate different fixation durations, that is, average reading times under different conditions.
A subset of the experimental test data set is withheld from parameter estimation, and this held-out set will then be compared on the basis of summary statistics against predicted data from SEAM and SWIFT using estimated parameters. Specifically, we first split the experimental data into a training and test subset, fitting the model to 70% of the data (training set) of each participant and condition, subsequently predicting eye trajectories for the other 30% (test set). For each withheld trial, we generated a fixation sequence using the HPDI (highest posterior density interval) midpoint of the sampled posterior distribution of a given participant and parameter (Rabe et al., 2021). We also present the predictions of SEAM and SWIFT for the experimental memory interference effects, which can be similarly derived from the simulated and experimental data alike.
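As an illustration of the point estimate mentioned above, the following Python sketch computes the midpoint of a highest posterior density interval from a set of posterior samples by searching for the narrowest window that contains the requested share of samples. The interval mass is left as a parameter because the value used in the study is not stated here; the function names and example samples are assumptions.

```python
import math

def hpdi(samples, mass):
    """Narrowest interval containing at least `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = math.ceil(mass * n)   # number of samples the interval must contain
    # Slide a window of k consecutive samples and keep the narrowest one
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]

def hpdi_midpoint(samples, mass):
    """Midpoint of the highest posterior density interval."""
    lower, upper = hpdi(samples, mass)
    return (lower + upper) / 2.0

# Invented, skewed example: one extreme sample does not pull the estimate upward
samples = [0, 1, 2, 3, 4, 5, 6, 7, 8, 100]
interval = hpdi(samples, mass=0.5)
midpoint = hpdi_midpoint(samples, mass=0.5)
```

Compared with the posterior mean, such a midpoint is less sensitive to long tails in the sampled posterior, which may matter when posteriors are wide and skewed.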

Profile Likelihoods
We evaluated the likelihood for a typically sized simulated data set where all parameters had been set to default values⁷ (see Appendix A). For each parameter, the respective true value, that is, the value used for simulating the data set, is shown with a vertical dashed red line. Then, for each parameter, for 50 equidistant parameter values in the intervals shown, the likelihood for the data given the model was evaluated. Ideally, the likelihood should be maximal around the true value.
In Figure 6 we observe that the likelihood peaks, as expected, around the true value for most of the parameters. This means that (i) the parameters affect the likelihood and (ii) the likelihood may be used to recover their values. Individual likelihood evaluations are represented by dots. The plotted line smooths are just for guidance and do not represent the true likelihoods. The important observation here is that the highest evaluated likelihoods are always relatively close to the true value, even for the case of $\mu_2$, where the smoothed lines falsely suggest a flat likelihood.
Since not every fixation involves a retrieval, the new SEAM parameters can only have a very limited effect on the likelihood. Therefore, effects observed in the likelihood function are less pronounced than for the established SWIFT parameters such as processing span $\delta_0$. The fact that higher likelihood evaluations nevertheless cluster around the true values is an indication that the parameters are identifiable, but their fitted values should be interpreted with caution.
For one of the parameters, $\mu_2$, the likelihood does not peak at all, which is probably because $\mu_2$ only affects the model's behavior in rare instances. As $\mu_2$ only determines the threshold value of a retrieval trigger, the likelihood is only affected for the small subset of words that trigger retrievals. By contrast, $\mu_3$ affects the threshold of multiple words at the same time, i.e., all words previously processed. Also note that the profile likelihoods as well as the parameter recovery reported below are based on simulated data sets comparable in size to the experimental data of Mertzen et al. (2023). We would expect $\mu_2$ to exhibit a more pronounced effect on the likelihood for larger data sets with more retrieval events. Despite the noise level of the profile likelihood of $\mu_2$, we decided to fit $\mu_2$ as a free parameter. This means that different plausible values from the prior are considered throughout the sampling procedure instead of keeping $\mu_2$ fixed at a (possibly implausible) default value.

Parameter Recovery
Analogous to the inspection of the profile log-likelihoods, we simulated data from the known model but generated 50 data sets, each with a unique combination of random parameter values within the bounds of the previously inspected intervals, effectively sampling from the prior distribution. Then, we fitted the model to each of the data sets, using uninformative uniform priors over the bounds shown in Figure 6. Each fit is represented with one point per panel in Figure 7, showing 95% credible intervals (CrIs) on the y-axis and the true parameter value on the x-axis. Ideally, CrIs would be narrow intervals spanning around the identity diagonal.
We can see that the 95% CrIs almost always include the true value but are relatively wide, especially for the added parameters $F$, $d$, $\mu_2$, and $\mu_3$. Nevertheless, the agreement is generally good, as can be seen in the low normalized root mean square error (NRMSE) values⁸ and high correlations between true parameter values and CrI midpoints. This suggests that in general, true parameter values of simulated data sets can be recovered sufficiently well or at least with an acceptable level of uncertainty. As before, we note that parameter values, especially point estimates, should be interpreted with caution.
The reason for the high uncertainty for the new parameters is very similar to that for the profile log-likelihoods: Over the course of the entire fixation sequence, there are only very few retrieval events where these parameters could possibly have an effect on model behavior. Additionally, even when there is a retrieval, it is not guaranteed that it actually affects the activation of the currently fixated word, as the eyes may, for instance, already have continued past the retrieval trigger. Given these limitations, the recovery performance is surprisingly good, and the high correlations between true and recovered parameters appear very promising.

Summary Statistics
So far, we have demonstrated that SEAM, like SWIFT in its most current version (Rabe et al., 2021), can be successfully fitted to simulated data: The true parameter values are in the vicinity of profile log-likelihood peaks and are contained within parameter recovery CrIs. This means that if we assume the true underlying cognitive architecture to be similar to SEAM, we can reliably use fitted parameters (or their credible intervals) to make inferences about it. However, as the true underlying cognitive architecture is unknown, such checks are per se impossible on experimental data. Instead, we compare simulated and experimental behavior on the basis of relevant summary statistics. For this, as explained earlier, we first split the experimental data into a training and test subset, fitting the model to 70% of the data of each participant and condition (training set), subsequently predicting eye trajectories for the other 30% (test set).

Rabe et al. (2021) had previously noted that SWIFT, with the cross-validation method described above, is unable to make reliable predictions for regressive eye movements. However, given that SEAM now incorporates processes for cue-based memory encoding and retrieval, and given that memory retrieval processes are specifically hypothesized to trigger regressions by modulating the activation of retrieval candidates, in SEAM we should see an improvement in regression-related statistics such as incoming/outgoing regression probabilities, as well as regression path durations. These are also two important dependent measures in which effects were found in the experimental data set (see Experimental Study, for a short summary; see Mertzen et al., 2023, for details).
In Figure 8, we show the comparison of summary statistics between experimental data and simulated data from the baseline SWIFT model (without memory retrieval) and SEAM (with memory retrieval). In all cases, SEAM predicts regression-related fixation probabilities and fixation durations more reliably than SWIFT. It is also noteworthy that not only the average across all word frequency bins but even word-frequency effects on summary statistics are reliably predicted.

Experimental Effects of Memory Interference
Arguably the most critical test for the SEAM architecture is to evaluate whether the model can predict differences in summary statistics between experimental conditions in the design of Mertzen et al. (2023), which manipulates effects of memory retrieval on reading.

First-pass regressions (percentage), pre-critical
Note. Shown are the 95% credible intervals of the estimated effects from the data and from the two models. The empirical estimates are from the held-out data (30% of the data). subj = Effect of subjecthood, anim = Effect of animacy.
Based on a different experimental design, Rabe et al. (2021) were previously successful in demonstrating that SWIFT can be used to predict and explain differences in reading behavior when fitted to each participant and experimental condition separately. In our study presented here, however, we are only fitting one model at a time to each participant's data across all conditions, thereby considerably reducing the degrees of freedom. If the model is able to predict differences between experimental conditions, these do not originate from different parameter values for each condition but from the model dynamics, which are affected by the different feature specifications of the memory chunks across conditions. Therefore, capturing differences between conditions is a direct test of SEAM's added memory module. To illustrate the gain in empirical fit over baseline SWIFT, we also report predictions from SWIFT for reference. In SWIFT, no differences between experimental conditions are expected, because SWIFT has no parameters that could account for the processing cost of memory retrievals.

In order to evaluate the empirical fit of SEAM and baseline SWIFT, we conducted the same set of analyses for the observed experimental data and for the data predicted by SEAM and by SWIFT, after fitting each of the models to the training data sets. For both sets of data, we conducted a Bayesian mixed-effects regression for regression path durations and outgoing regression probabilities as predicted by region and experimental condition (syntactic/semantic interference).
Table 2 and Figures 9 and 10 summarize the comparisons between the held-out empirical data and the predictions of SEAM and SWIFT. In order to interpret these comparisons, we compare SEAM and SWIFT against the empirical estimates from the held-out data using a region of practical equivalence (ROPE) approach (Freedman et al., 1984; Kruschke, 2014; Spiegelhalter et al., 1994) rather than formal model comparison methods such as k-fold cross-validation, Bayes factors, or the like (for tutorial introductions to these topics, see Nicenboim et al., 2023). The ROPE approach is a graphical model comparison method that involves comparing model predictions against observed estimates from data; overlap in the posterior distributions of the estimates provides an informal basis for deciding whether a model approximately matches the observed estimates. In this approach, there is no notion of statistical significance; rather, the focus is on whether the model predictions are approximately consistent with the data. One important reason for taking this informal model comparison approach is the fact that the held-out data are relatively sparse. For this reason, the present evaluation should be seen as a proof of concept rather than a comprehensive evaluation. A comprehensive evaluation would require significant amounts of benchmark data (for examples of such extensive evaluations, see Engelmann et al., 2020; Nicenboim et al., 2020; Yadav et al., 2023) and must be left for future work.
Table 2 and Figures 9 and 10 show that the estimates of the experimental effects of animacy (semantic interference) and subjecthood (syntactic interference) in the experimental data are generally more in agreement with SEAM than with SWIFT: the violin plots in Figure 9 for SEAM overlap better with the observed data than those for SWIFT. This is true in both the pre-critical and critical regions, in both the first-pass regression and regression path duration measures. One exception is the subjecthood effect at the critical verb (see the bottom right panel in Figure 9); SEAM predicts essentially no effect of subjecthood, just like SWIFT. This is mainly because the regression paths predicted by SEAM are somewhat too short, i.e., return too early, in the +subject conditions (see Figure 11). We return to this in the Discussion section. Given that SWIFT does not have any mechanism that accounts for cue-based memory retrieval, it is expected that the model predicts no effects of memory interference. Notice that the violin plots for the data as well as the SEAM and SWIFT predictions shown in Figures 9 and 10 are relatively wide; this is due to the fact that only 30% of the test portion of experimental data (the held-out data) are compared to the model predictions.
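The overlap that this graphical comparison assesses by eye can also be summarized numerically. The following minimal Python sketch is an illustration only, not the analysis reported here: it draws hypothetical posterior samples for one effect (all means and standard deviations are invented), takes the 95% equal-tailed interval of the empirical posterior as a reference region, and reports the share of each model's samples that falls inside it. The function names and the choice of this particular overlap summary are assumptions.

```python
import random

def credible_interval(samples, mass=0.95):
    """Equal-tailed interval that contains roughly `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    lower = ordered[int((1 - mass) / 2 * (n - 1))]
    upper = ordered[int((1 + mass) / 2 * (n - 1))]
    return lower, upper

def share_inside(samples, interval):
    """Proportion of samples that lie inside the closed interval."""
    lower, upper = interval
    return sum(lower <= s <= upper for s in samples) / len(samples)

rng = random.Random(1)
# Hypothetical posterior samples of one effect (e.g., in ms); values are invented.
empirical = [rng.gauss(30.0, 10.0) for _ in range(4000)]
model_with_effect = [rng.gauss(25.0, 10.0) for _ in range(4000)]
model_without_effect = [rng.gauss(0.0, 3.0) for _ in range(4000)]

reference = credible_interval(empirical)
overlap_with_effect = share_inside(model_with_effect, reference)
overlap_without_effect = share_inside(model_without_effect, reference)
print(reference, overlap_with_effect, overlap_without_effect)
```

In this toy setting, a model that predicts an effect of roughly the observed size places most of its posterior mass inside the reference region, whereas a model that predicts no effect places almost none there.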
A main motivation of SEAM was to develop a model in which low-level psychological and high-level linguistic processes interact. The integration of the LV05-based memory module is expected to affect eye movements especially in cases of demanding dependency resolution, and particularly strongly when there is high ambiguity between the correct dependents and distractors. Even though we already know that the Mertzen et al. (2023) data do not provide unequivocal evidence in support of this hypothesis, we can look at the distribution of regressions across trials conditional on launch and landing sites in order to investigate where regressions from the (pre-)critical region tend to land in the experimental data and in the simulations. Figure 12 and Appendix B show that regressions in general have a tendency to land on the preceding word.⁹ In these cases, SEAM is in better agreement with the experimental data than SWIFT. For regressions launched from the verb, however, SEAM currently predicts too many regressions on average, although the experimental effects (i.e., differences between conditions, see Figure B) are still in agreement with the experimental data. As there are generally very few regressions, both in the experimental and in the simulated data, analysis of regression durations is problematic, but Figure 13 shows that they are also generally in good agreement with each other.
As SEAM and SWIFT are nested models,¹⁰ the fact that SEAM but not SWIFT can predict different summary statistics is a first indicator that the differences in predictive power between the models may be due to the added memory retrieval submodule. To verify this and to attempt an explanation of the differences in observed behavior, we look at the differences in the internal model dynamics under the different experimental conditions.
In particular, we can examine the word activation field, which is the main driver for target selection probabilities in SEAM and SWIFT (Equation 8), including regressive saccades. In Figure 14, we show word activations in SEAM, averaged across 500 independent simulations, using the mean estimated model parameters across all model fits. Before averaging across simulations, all word activations are centered on the temporal dimension so that t = 0 is the time when the activation of the critical verb reaches its maximum, that is, when post-lexical processing of the critical verb starts and triggers the memory retrieval. First, it is important to note that the activations of the critical verb, when normalized in time, do not vary substantially between experimental conditions. Although some conditions seem to have a slower decrease than others, overall the curves are very similar in all conditions. When the retrieval starts at t = 0, retrieval candidates are reactivated, with their memory activation $A'_{j,n}(t)$ modulating the transition rate $w'_j(t)$ (see Equation 12) of word/memory chunk j. While the activation for the target word seems to be very similar over time between conditions as well, there is some variability in the time course of the activations of the distractor noun and of the adverb around the retrieval.
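The centering step described above can be sketched as follows. This is a minimal illustration under simplifying assumptions (equally spaced samples, toy activation values, hypothetical function names), not the SEAM implementation: each simulated trace is shifted so that offset 0 is the sample at which the critical verb's activation peaks, and the traces are then averaged offset by offset.

```python
def center_on_peak(trace, verb_trace):
    """Map sample offsets to values, with offset 0 at the peak of `verb_trace`."""
    peak = max(range(len(verb_trace)), key=verb_trace.__getitem__)
    return {index - peak: value for index, value in enumerate(trace)}

def average_centered(traces, verb_traces):
    """Average traces per offset after centering each one on its own verb peak."""
    sums, counts = {}, {}
    for trace, verb_trace in zip(traces, verb_traces):
        for offset, value in center_on_peak(trace, verb_trace).items():
            sums[offset] = sums.get(offset, 0.0) + value
            counts[offset] = counts.get(offset, 0) + 1
    return {offset: sums[offset] / counts[offset] for offset in sorted(sums)}

# Two toy simulations in which the verb peaks at different sample indices.
verb_runs = [[0, 1, 3, 2], [0, 0, 1, 4, 2]]
distractor_runs = [[5, 5, 6, 5], [4, 4, 4, 8, 4]]

mean_distractor = average_centered(distractor_runs, verb_runs)
print(mean_distractor)
```

Offsets that are covered by only some simulations are averaged over the simulations that reach them, which is one of several reasonable ways to handle traces of unequal length.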
Regarding the adverb, the main reason it is reactivated during retrieval is that it has the highest base-level activation B(t), as it was most recently encoded/accessed in memory before the retrieval started. The later processing of object noun distractors also attenuates the processing that the adverb receives, which leads to weaker reactivation of the adverb during the retrieval.
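The recency argument can be made concrete with the standard ACT-R base-level learning equation, B(t) = ln(sum_k (t - t_k)^(-d)), on which LV05 builds. The sketch below assumes this textbook form, times in seconds, and the conventional ACT-R decay value of 0.5; the access times are invented, and SEAM's own equations and fitted parameter values may differ.

```python
import math

def base_level_activation(now, access_times, decay=0.5):
    """Textbook ACT-R base-level activation from past access times (seconds).

    Only accesses strictly before `now` contribute; at least one is required.
    """
    past = [t for t in access_times if t < now]
    return math.log(sum((now - t) ** -decay for t in past))

retrieval_start = 2.5  # hypothetical time at which the verb triggers retrieval
adverb = base_level_activation(retrieval_start, [2.0])      # encoded 0.5 s earlier
distractor = base_level_activation(retrieval_start, [0.5])  # encoded 2.0 s earlier
print(adverb, distractor)
```

Because activation decays as a power function of the time since the last access, the chunk that was encoded most recently (here, the adverb) has the higher base-level activation when the retrieval starts.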
We can also observe that the distractor word activations prior to the retrieval peak earlier for the two conditions where the distractor is a subject noun, that is, in the conditions where there is syntactic interference. This effect is not related to the retrieval at the critical verb (which has not started at this time), but is due to the distractor appearing earlier in the sentence when it is a syntactic subject. Interestingly, the distractor noun only significantly peaks during the retrieval in the +animate/+subject condition, that is, when both features match the retrieval cues. The distractor thus only attracts regressions when both the animacy and subjecthood features match, i.e., when there is both syntactic and semantic interference. Despite this difference in word activations, there is no significant difference between the proportions of observable targeted regressions from the critical verb to the distractor noun between any of the experimental conditions. This is true for the experimental data as well as for the data simulated by SEAM and SWIFT, as shown in Figure 12 and Appendix B.
Note. SEAM word activation of the target, distractor, pre-critical adverb, critical verb, and post-critical region of a sentence grouped by experimental condition. Activations are averaged across 500 independent simulations of the same item in all four conditions. For each simulation, t = 0 is adjusted to the time of the start of post-lexical processing of the critical verb, that is, the start of the retrieval.
As the estimates show, there is no indication in the experimental data that the distractor is regressed to more often in the +animate/+subject condition. The distractor's activation pattern in Figure 14 is simply a consequence of the hard-coded assumption in LV05 that it has the highest feature match in this condition. Interestingly, however, the predicted data from SEAM do not show an increase in incoming regressions to the distractor either. An increase in word activation thus does not necessarily translate into a change in observed eye movements. The lack of a direct effect on distractor refixations is likely due to oculomotor error, which is more influential for long-range saccades, and due to upcoming words having even higher activations than the distractor.
Based on results from preliminary post-hoc analyses, the overestimation of the average regression probability from the critical verb to preceding regions (see Figure 12) is also probably due to the hard-coded retrieval schedules. Even though the times of memory encoding, and therefore the base-level activation, are stochastic and completely governed by the eye's trajectory, the feature match is deterministic. Future work could investigate alternative links between memory activation $A'_{j,n}(t)$ and word activation $a_j(t)$ or transition rates $w'_j(t)$, as they currently implement a very strong linking hypothesis.
In this context, we also note that, as far as we are aware, the only study that has previously looked at word-level rereading as a function of similarity-based interference is Lee et al. (2007). The authors report longer rereading times for a sentence-initial region containing both the retrieval target and the distractor in their high-interference conditions, but the Korean sentences used in their study were relatively short compared to those used by Mertzen et al. (2023). In future work, shorter sentences should be a fruitful testing ground for SEAM. If SEAM generates more linguistically mediated targeted regressions in shorter sentences, this would be in line with human data (Inhoff & Weger, 2005).

Summary
We showed that both SEAM and SWIFT can be fitted to the Mertzen et al. (2023) experimental data set. In contrast to SWIFT, however, SEAM's predictions are in good agreement with the overall and by-frequency regression probabilities and regression-path durations. SEAM shows the more specific memory interference effects, that is, differences in regression probabilities and regression-path durations due to differences in the animacy and subjecthood of a distractor noun.
Given that the compared models SEAM and SWIFT only differ in the supplemental cue-based memory retrieval processes contributed by the LV05 component, we can attribute the better performance of SEAM in these metrics to LV05 principles with the four additional parameters that were fit to the training data from Mertzen et al. (2023) ($F$, $d$, $\mu_2$, and $\mu_3$). It is also noteworthy that these parameters were estimated based on a restricted training data set for each participant, and that the model can make reasonable predictions on the held-out test data for all experimental conditions with a single model fit for each participant.
Furthermore, even though the models are compared to each other and to the experimental data using summary statistics and predicted experimental effects, neither SWIFT nor SEAM was directly optimized to reproduce these measures. Instead, both models were fitted directly to the raw, unbiased fixation sequences of each participant. Therefore, the models can make reasonably accurate predictions for summary statistics and experimental effects although they are not specifically fitted to them.

Discussion
We showed that adding a memory interference mechanism to the SWIFT architecture, resulting in the SEAM model, allows us to bring together eye-movement control theory and a psycholinguistic account of dependency completion. We demonstrated that the key regressive eye-movement related patterns in an experimental psycholinguistic data set can be accounted for by the SEAM architecture. Specifically, we showed that first-pass regressions and regression path duration patterns that occur due to the interference manipulation in the Mertzen et al. (2023) data can be accounted for by SEAM, but not by SWIFT; in SEAM, as in the data, both syntactic and semantic interference have an impact on the two dependent measures at the pre-critical region and the critical verb.
The main results of our simulations are summarized in Table 2 and Figures 9, 10, and 14. There were three interesting patterns in the SEAM fit that deserve discussion. First, as shown in Figure 9, at the critical verb, regression path durations (RPDs) from SEAM show essentially no effect of subjecthood; this is surprising because the data do show such an effect. At the same time, in SEAM, first-pass regressions at the verb show a clear subjecthood effect. This is because even though regressions were triggered at the verb, which should itself increase the mean RPD, regression paths predicted by SEAM return too early in the +subject conditions, thereby masking the effect on RPD.¹¹
The second interesting pattern relates to the effects observed at the pre-critical adverb region (the attorney whose secretary had forgotten [...] frequently complained, see Example 3). Recall that in the original LV05 model, sentences are processed in strictly serial order. Effects of similarity-based interference at the pre-critical adverb are thus unexpected under this model: Given the assumption that the verb is the retrieval trigger, there should be no retrieval-related effects before it is read. Nevertheless, Mertzen et al. (2023) did observe interference effects at the pre-critical adverb (others have found similar patterns in the pre-critical region; see Lago et al., 2021; Van Dyke, 2007). Mertzen and colleagues discuss several possible reasons for these effects: differential processing spillover from previous regions due to differences in sentence complexity between conditions, lingering memory interference during encoding of the noun phrases, and predictive processing of the verb. A final important possibility considered by Mertzen et al. (2023) is parafoveal preview of the verb while the adverb is being processed, so that the verb can trigger the retrieval prior to being fixated. Our SEAM simulations are partly consistent with this last account: In 25% of our simulations, the verb reaches the retrieval stage while the adverb is being fixated. However, there is also processing spillover in the form of residual word activation in SEAM. This holds especially in the +subject conditions, where there is an additional retrieval in the embedded sentence at "was important", and the activation of the retrieval target may not have fully decayed when the adverb is read, leading to more regressions. Based solely on the Mertzen et al. (2023) data and the small sample size of the held-out data, it is difficult to quantify the relative contributions of preview and spillover, and we leave this issue to future research. Nevertheless, SEAM provides a promising starting point for tackling possible pre-critical retrieval effects.
A third noteworthy pattern occurs in Figure 14: the +subject/+animate condition causes a large increase in the distractor's word activation after the critical verb is encoded. This suggests that the probability that the distractor attracts regressions should be much higher in that condition than the sum of the probabilities in the +subject/-animate and -subject/+animate conditions. Even though the combination of the two retrieval cues is additive at the level of the LV05 memory activation (see Equation 1), the exponential transformation of A(t) in Equation 11 significantly amplifies it. Nevertheless, the superadditive effect on the distractor's activation when it matches both retrieval cues does not generate any detectable overadditive effects in the analyzed regression-related dependent measures (regression path duration and first-pass regressions). As discussed in the previous section, the spike in activation does not necessarily translate into observed regressions, partly because the large distance between the verb and the distractor amplifies the influence of oculomotor error. With less complex sentences, it is thus possible that SEAM would show effects on the observed regression probabilities.
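Why an additive cue match can become superadditive after exponentiation can be seen from a short calculation. Assuming, purely for illustration, that word activation scales with exp(A) (the exact form of Equation 11 is not reproduced here), a match bonus a raises the activation by exp(a) - 1 relative to no match, and the joint effect of two bonuses a and b exceeds the sum of the single effects by (exp(a) - 1)(exp(b) - 1), which is positive whenever both bonuses are positive. The numbers below are arbitrary.

```python
import math

def activation_gain(match_bonus):
    """Increase of exp(A) over the no-match baseline exp(0) = 1 (assumed link)."""
    return math.exp(match_bonus) - 1.0

animacy_bonus = 1.0      # arbitrary illustrative value
subjecthood_bonus = 1.0  # arbitrary illustrative value

gain_animacy = activation_gain(animacy_bonus)
gain_subject = activation_gain(subjecthood_bonus)
gain_both = activation_gain(animacy_bonus + subjecthood_bonus)

# Amount by which the joint gain exceeds the sum of the two single gains.
excess = gain_both - (gain_animacy + gain_subject)
print(gain_animacy, gain_subject, gain_both, excess)
```

The excess grows with the size of the match bonuses, so the stronger the individual cue matches, the more pronounced the superadditivity at the level of word activation.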

General Discussion
From the very beginning of eye-movement research in reading, a dominant idea has been that the eye and mind are tightly coupled (e.g., Just & Carpenter, 1980). After psycholinguists started looking at fixation patterns in reading as a function of language comprehension difficulty, an important idea that was expressed in a now-classic paper by Frazier and Rayner (1982) was the selective reanalysis hypothesis: this was the idea that increased comprehension difficulty (e.g., due to garden-pathing) leads to targeted regressions to a preceding region that caused the processing difficulty. Although the strongest version of selective reanalysis, and thus of the eye-mind assumption, is difficult to uphold given subsequent investigations (e.g., Mitchell et al., 2008; von der Malsburg & Vasishth, 2011), it is nevertheless well-established that increased regressions are triggered when language processing difficulty occurs (e.g., Clifton et al., 2007), and that rereading can aid comprehension (Schotter et al., 2014). We assume that the mixed evidence in the psycholinguistic literature regarding selective rereading (see Paape et al., 2022 for a review) may be the result of a more indirect linkage between higher-level sentence processing and saccade targeting: In our model, retrieval events during dependency completion affect the activation values of previous words in the sentence. Words with higher activation will tend to attract saccades, but due to the inherent stochasticity of the eye-movement control system and oculomotor error, subtle linguistic manipulations do not necessarily engender measurable effects at typical sample sizes.
Most of the psycholinguistic work carried out on reading until now has side-stepped the underlying complex latent processes involved in reading, and instead focused only on key events involved in linguistic dependency completion. Abstracting away from these underlying latent reading processes has had many advantages, a major one being that it allows us to focus exclusively on the psycholinguistically interesting aspects of processing at the level of the sentence representation. On the other hand, the simplification comes at a cost, because interactions between constraints on eye-movement control and language comprehension end up being ignored.
Interestingly, cognitive psychology has gone in a completely different direction than psycholinguistics: there, the focus has been on spelling out detailed process models of eye-movement control that rely primarily on relatively low-level drivers of eye movements, such as frequency and word length. Models of eye-movement control such as E-Z Reader (Reichle et al., 1998) and SWIFT (Engbert et al., 2005) have shown excellent performance in explaining benchmark data in reading, without modeling the higher-level cognitive processes such as linguistic dependency completion in any great detail.
One major gap in the literature is that these two threads, psycholinguistic explanations of reading difficulty versus cognitive psychology models of reading, have only rarely been considered to be joint actors in explaining key effects observed in experimental data from psycholinguistics. Our paper makes an attempt to fill this gap: using data from a classic similarity-based interference design, we demonstrate one way in which an eye-movement control model, SWIFT, can be extended to include dependency completion processes. We show that such an extended model (SEAM) can produce regressive eye movements triggered by retrieval that occurs during linguistic dependency completion. Developing such models is the only way to unpack the latent processes involved in reading and to investigate how low- and high-level cognitive processes interface dynamically. To our knowledge, SEAM is the only model to date that extends a complete model of eye-movement control with a detailed model of linguistic dependency completion, using data from a planned experiment in psycholinguistics and rigorous statistical inference.
Apart from using SWIFT as the eye-movement module, SEAM differs in important ways from previous integrative models of eye movement control and higher-level sentence processing. For instance, Über-Reader (Reichle, 2021), whose eye movement module is highly similar to that of E-Z Reader (Reichle et al., 1998), has a parsing module that builds syntactic structure, but each parsing step is assumed to take the same amount of time. In SEAM, by contrast, completing syntactic dependencies takes a variable amount of time that is determined by the LV05 equations (which originally come from ACT-R). Furthermore, regressive saccades are not captured by Über-Reader, but are modeled dynamically in SEAM.
Another integrative model proposed by Dotlačil (2021), whose eye movement module is also based on E-Z Reader, makes use of ACT-R equations, but in a different way from SEAM: In Dotlačil's model, the latency with which a given dependent word is integrated into the sentence's syntactic representation depends on the retrieval time for the dependent words and additionally on the retrieval time for the relevant parsing rule from declarative memory. SEAM does not assume retrieval of parsing rules, which are assumed to be represented as procedural knowledge, as in the LV05 model. Another salient difference between the models is that regressions in Dotlačil's (2021) model are only triggered when parsing failure occurs, while regressions in SEAM are driven by the dynamic target selection processes taken over from SWIFT. As a final comparison, the model of Engelmann et al. (2013) and Vasishth and Engelmann (2022) combines an LV05 sentence processing module with eye movement control based on EMMA (Salvucci, 2001), but also does not provide a detailed model of saccade targeting, unlike SEAM.
There are of course several limitations to the present work. First and foremost, the current implementation of SEAM and its evaluation are only a proof of concept. Because of the absence of large-scale data sets with psycholinguistically interesting manipulations, it is difficult to present a comprehensive evaluation of the proposed SEAM architecture. However, such an investigation is in principle possible to carry out, given (i) the progress on Bayesian inference for process-based models and (ii) the fact that more and more researchers are releasing data and code associated with their published papers. We expect that in future work, more comprehensive evaluations of architectures like ours can be carried out, using large-scale data from a broad range of phenomena in psycholinguistics. At a minimum, such an investigation would need to include cross-linguistic data from garden-path sentences of different types (e.g., Frazier, 1979), predictability manipulations (e.g., Levy, 2008), the full spectrum of similarity-based interference effects (e.g., Jäger et al., 2017), underspecification effects (e.g., Swets et al., 2008), etc. This would be a sizable project, but one which would significantly advance our understanding of how eye-movement control and parsing interface during reading.
A second limitation is that, due to the computational complexity of investigating such a detailed model of reading, formal model comparison between the baseline SWIFT model and the SEAM model is difficult to carry out. We avoided overfitting the models to data by separating the empirical data into a training set and a held-out set, and evaluating the model fit only on the held-out set. This is already a significant advance over conventional approaches to model evaluation; in both cognitive psychology and psycholinguistics, it is common to evaluate a model on the same data that it is trained on. In principle, it is possible to go even further than we did in this paper, and to evaluate predictive performance by using k-fold cross-validation. This would involve creating k (usually, in machine learning, k = 10) subsets of the data to train on, and then using the k corresponding held-out data sets for evaluation; this would allow us to compute a quantitative measure of average fit, such as the expected log pointwise predictive density (e.g., Gelman et al., 2014). We did not carry out such a quantitative evaluation because it would have been computationally extremely costly. For example, just the pure SWIFT model discussed in Rabe et al. (2021) required a high-performance computing environment, and the total computing time was approximately 10,000 core hours, amounting to 3.5 hours run time on 72 independent parallel nodes with 40 cores per node. Our goal in the present work was to get as close as possible to the underlying processes involved in reading, but obviously this comes with an unavoidable computational cost.
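The k-fold scheme and the compute budget mentioned above can be sketched in a few lines. This is a minimal illustration, not the pipeline used for SEAM or SWIFT: the function name, the interleaved fold assignment, and the trial count are hypothetical, and the 10-fold cost estimate rests on the strong assumption that every fold costs as much as the single SWIFT fit reported by Rabe et al. (2021).

```python
def k_fold_splits(n_trials, k):
    """Yield (train, held_out) index lists; each trial is held out exactly once."""
    folds = [list(range(start, n_trials, k)) for start in range(k)]
    for held_out in folds:
        excluded = set(held_out)
        train = [i for i in range(n_trials) if i not in excluded]
        yield train, held_out

# Compute budget of one fit, from the figures quoted in the text.
nodes, cores_per_node, run_hours = 72, 40, 3.5
core_hours_per_fit = nodes * cores_per_node * run_hours  # 10080.0

k = 10
# Assumption: each of the k refits costs as much as one full fit.
core_hours_k_fold = k * core_hours_per_fit

splits = list(k_fold_splits(n_trials=25, k=k))
print(core_hours_per_fit, core_hours_k_fold, len(splits))
```

Under this assumption, a 10-fold evaluation of a single model would already require on the order of 100,000 core hours, before accounting for the additional cost of the memory module in SEAM.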

Conclusion
We present an integrated model of eye-movement control and linguistic dependency completion while reading. The model, called SEAM, is an integration of the SWIFT model of eye-movement control and the Lewis-Vasishth model of sentence processing. SEAM is evaluated using experimental data from a similarity-based interference experiment. We show that the SEAM model can account for empirically observed regressive eye movements; in the model, regressive eye movements are shown to be triggered by retrieval processes that result from higher-level dependency completion during sentence parsing. To our knowledge, this is the first demonstration of how eye-movement control and sentence comprehension processes can interact in explaining data from a psycholinguistically controlled experiment.

Figure 3 Experimental
Figures 3 to 8
Displayed are relevant regression-related fixation probabilities and durations as estimated grand means from a linear mixed-effects model. Ribbons are 95% CrIs around the point estimate. Fixation targets (words) are grouped by log corpus frequency bins. FPR = First-pass outgoing regression probability, FPRI = First-pass incoming regression probability, FPS = First-pass skipping probability, FPRT = First-pass reading time (gaze duration), RPD = Regression path duration (go-past time), TVT = Total viewing/reading time. Words were not grouped into regions and all words of the sentences were considered.

Table 2
Summary of Empirical vs. Model Estimates From SEAM and SWIFT of the Subjecthood and Animacy Effects on Regression Path Durations and First-Pass Regressions.