Introduction

For over 4 decades, substantial research work has been devoted to eye movement analyses of real time processing of words and sentences during reading (Radach & Kennedy, 2013; Rayner, 2009). Within this field, a central line of research has focused on the spatially distributed nature of linguistic processing. There is a rich literature on parafoveal processing of words within sentences, but within this literature, very little attention has been paid to syntactic aspects of reading so far. This is somewhat surprising, as the processing of syntax is one of the central topics of psycholinguistic research in general, including quite a bit of eye movement work (see Clifton et al., 2007, for a seminal review on eye movement research on the sentence level).

A central theoretical question is whether syntactic properties words within the context of a sentence are processed prelexically at an early stage or only postlexically. Answers given to this question are intimately related to the wider issue of modularity vs. interactivity in language processing. Proponents of interactive approaches suggest that the interaction of semantics and syntax pre-activates lexical candidates for the following word (Friederici & Jacobsen, 1999). Context (both semantic and syntactic) is thought to reduce the search effort for matching items in the lexicon (Bates et al., 1996). Accordingly, information about a word category or grammatical gender would lead to the pre-activation of a lexical subgroup, facilitating the processing of congruent and inhibiting incongruent words.

An alternative theoretical position is taken in modular models of lexical access (see, e.g., Tanenhaus & Lucas, 1987, for a detailed discussion). Here, the language processing system is assumed to consist of functionally autonomous modules. For each particular module, the relation between input and output is seen as independent of the information processed by other modules. This view, referred to as the autonomy hypothesis, makes two predictions with respect to lexical processing: (1) the information acquired at each stage is invariant with respect to different processing contexts and (2) the rate at which this information becomes available is also not dependent on context. Modules are assumed to communicate only at the ends of input and output, so that the output of one module serves as input to the subsequent one. Consequently, the internal processes of one module do not have access to the process status of other. With respect to syntax in written language, this would mean that it is processed postlexically and has no influence on the availability or pre-activation of words as reading progresses through a sentence. One theoretical argument for this view is that that pre-activation of all matching words available in the lexicon would involve far too much computational effort (Tanenhaus & Lucas, 1987).

Different hypotheses with regard to the processing of syntactic information during reading can be derived on the basis of the two classes of models. Interactive models suggest early processing of syntactic information and therefore predict both inhibitory effects in the presence of incongruent syntax and facilitatory or pre-activating effects in the presence of congruent syntax. The modular models, on the other hand, postulate post-lexical processing of syntax and therefore predict only inhibitory, but no facilitatory (i.e., priming) effects.

Behavioral and neurophysiological studies in which subjects were exposed to semantic and syntactic errors provide data relevant to the issue. Commonly observed event-related potentials (ERPs) during the reading of rule-violating sentences are the N400, which is observed in the processing of mismatching semantic information (Kutas & Federmeier, 2007) and the P600, which is related to syntactic information processing (Friederici, 1999; Hagoort et al., 1993; Hagoort, 2003, for a review, see Molinaro et al., 2011). Early left anterior negativity (< 200 ms) is observed during the processing of an incongruent sentence structure, word category (Friederici, 2002) or genus incongruence (Guajardo & Wicha, 2014). To make more specific statements about the temporal course of syntax processing, Deutsch and Bentin (2001) conducted experiments on syntactic priming and concluded on the basis of ERPs that syntactic information of words is processed at an early stage, even if interaction with semantics only occurs postlexically.

Only a few recent studies have examined early processing of syntactic information during reading using eye movement methodology. Brothers and Traxler (2016) tested the assumption that each word within a sentence generates an expectation that acts to constrain the syntactic category of subsequent words. According to these authors, early processing of word class constraint may take place in an anticipatory fashion, therefore facilitating lexical access. They hypothesized that word class congruency effects should become manifest on early eye-tracking measures such as the likelihood for fixating a word, usually referred to as skipping rate (1—fixation likelihood). To test their assumptions, they conducted three experiments in which syntactic congruence of verbs or nouns was manipulated using the boundary technique of saccade contingent display changes (see our method section below for a description). As one key result, they observed a lower fixation frequency (higher skipping rate) for syntactically valid (The admiral would not confess…) compared to invalid previews (The admiral would not surgeon…). It is well established that word repetition facilitates early lexical processing and leads to higher skip rates (Traxler et al., 2000). In a third condition (the admiral would not admiral…), this effect was shown to be inferior to syntactic fit in Brothers and Traxler (2016). The authors take this finding to imply that word class constraints may take precedence over lexical repetition effects. Overall, they conclude that their data are most consistent with an anticipatory account, in which readers preactivate syntactic constraints on upcoming words and immediately use these constraints to facilitate early stages of lexical identification.

Similar findings were obtained in a study by Snell et al. (2017). In one of their experiments using the boundary method, they presented sentences in which the target in the parafovea was either identical, congruent or incongruent with respect to part of speech. They reported higher fixation probabilities and longer first-pass fixation viewing times on the target when it was incongruent. These authors also employed a flanker paradigm in which they presented target words in the fovea that were combined with words on the left and right, resulting in syntactically congruent vs. incongruent combinations. In this task, response times and error rates were substantially lower in the congruent flanker condition. This was taken to indicate that lexical information may be gathered and integrated in parallel across multiple words, allowing for detection of a syntactic congruence vs. incongruence.

According to Veldre and Andrews (2018), however, the findings of the two studies discussed above might be attributed to plausibility rather than syntactic effects. They conducted two further experiments in which syntactic fit and plausibility were manipulated separately. Both experiments showed unique significant benefits for syntactically correct previews for early oculomotor parameters such as first fixation duration but also gaze duration and go-past time. Significant effects for fixation probability were found in one of the two experiments. The authors conclude that syntactic word class information is indeed processed relatively early during the course of sentence reading.

Cutter et al. (2020) conducted a study to investigate fixation probabilities while reading contextually predictable or unpredictable words in both syntactically legal and illegal positions. They observed that unpredictable words in a syntactically legal sentence context were associated with higher fixation probabilities, but there were no clear effects for the illegal condition. These results suggest that syntactic information is processed parafoveally and may be more influential than predictability information, or at least that the two types of information interact.

In addition to word class, a second important type of syntactic information is grammatical gender. In many languages like Spanish, French, and German, explicit grammatical gender is a ubiquitous part of syntax. In German, a minimal noun phrase consists of an article and a noun with explicitly marked gender correspondence. For example, “the house”, “the tree”, and “the bench” are expressed in German as “das Haus” (neuter), “der Baum” (male), and “die Bank” (female). The article specifies the noun in terms of singling it out as one specific object from the set of possible objects that can be denoted by a noun—e.g., a chair from the set of all chairs (Schmuck, 2020; Vater, 1984). Articles as part of a noun phrase provide information on case, number, and gender of the corresponding noun. Collectively, these features are referred to as phi-features (Adger & Harbour, 2008).

Some evidence for the use of gender correspondence comes from gender priming in word recognition tasks (Friederici & Jacobsen, 1999). In experiments where a prime provides a useful vs. misleading syntactic cue to the target, robust effects of syntactic inhibition were found, whereas effects of facilitation appear inconsistent. Facilitation effects occur especially when auditory sentence material is involved and more so in Lexical Decision Tasks compared to Naming Tasks. In addition, the effect appears more frequently in phonologically transparent languages, since in these, it is possible to infer sound properties of the noun on the basis of grammatical gender (see Friederici & Jacobsen, 1999, for a review).

We are aware of the fact that results like this can only be transferred to continuous reading with caution. According to Kuperman et al. (2013), only a small proportion of the variance in viewing times in natural reading can be explained by reaction times in the lexical decision task. Similar doubt may be in place vis-à-vis results of studies in which overt violations of gender matching are presented during the reading of sentences, as was done in electroencephalogram (EEG) studies, showing N400 effects of syntactic mismatch (e.g., Gunter et al., 2000). Effects in such studies occur in response to massive disruption of sentence processing and might not be fully representative of natural reading.

Our work is the first to study gender effects within noun phrases during normal reading for meaning. Our intention was to examine whether gender information provided by the article in a phrase like “die Musik” (the music, female) or “der Klang” (the sound, male) is utilized when this information becomes available in the parafovea during the reading of sentences. The alternative would be that fluent readers ignore this syntactic cue and focus processing on the noun as soon as its letters become parafoveally visible. The example of English provides ample evidence that the processing of noun phrases can proceed smoothly without the presence of any gender-specific cues.

We used the boundary method of saccade contingent display changes (Schotter et al., 2012) to provide either a gender match or a mismatch between article and noun as long as the eyes were fixating positions to the left of an invisible boundary. The boundary was located immediately to the right of a verb preceding the noun phrase. When the eyes crossed this invisible line, any mismatching parafoveal preview was restored, so that the noun phrase appeared completely normal as soon as it was fixated. The gender match between the first part of the sentence (up to the verb) and the subsequent article was always preserved, so that both versions of the article formed a grammatically and semantically suitable continuation of the sentence (see Appendix for all materials). Therefore, any effect in terms of increased viewing times would be based entirely on the parafoveal processing of the syntactic relationship between article and noun. To examine both parts of the nominal phrase, the article and the noun were manipulated in separate experiments. We suspected that early utilization of gender information might only take place when concurrent lexical processing allows for sufficient cognitive resources. Therefore, the frequency of the noun within the critical phrase was manipulated as a second factor, based on the seminal work of Inhoff and Rayner (1986) who first showed that frequent words are more effectively processed in the parafovea.

Experiment 1

In the experiment, participants were asked to read a series of sentences in which a sequence of verb, article and noun was placed at a central position within the sentence. The sentences were presented in one of four conditions, with either a preview of a high vs. low-frequency noun, combined with a fitting or non-fitting article. In the fitting condition, the article and the noun shared the same grammatical gender, resulting in a grammatically correct sentence. In contrast, in the non-fitting condition, an incorrect article was presented before the noun, resulting in a grammatically incorrect noun phrase in the parafovea (e.g., the noun is feminine and the article is masculine). An invisible boundary was placed at the left edge of the empty space before the article, so that when the eye had crossed the boundary, the correct article was displayed in foveal vision.

Importantly, the article was always consistent with the preceding verb, so that a parafoveal syntactic mismatch could only be noticed after the noun had been processed. If grammatical gender information can already be processed in the parafovea, longer viewing times and higher fixation probabilities on the article are to be expected. However, as the syntactic fit can only be verified after the noun has been processed (N + 2), higher viewing times should also occur on the noun in the case of a syntactically mismatched preview. Since the processing effort for high-frequency words in the parafovea is substantially lower (Inhoff & Rayner, 1986), this effect should be more pronounced in high-frequency nouns.

Method

Participants

36 subjects took part in the experiment. 33 of these were undergraduates at the University of Wuppertal. All were native German speakers with normal or corrected-to-normal vision and were unaware of the purpose of the experiment. Participants received partial course credit as compensation.

Power analyses were conducted in R (Version 3.5.2; R Core Team, 2017) using the mixedpower package (Version 0.1.0; Kumle, et al., 2021) following approaches recommended by Kumle et al. (2021) for estimating power for linear mixed models. The sample size was selected to ensure that simulations demonstrated a detection power of over 80% for even small effect sizes.

Materials and design

Stimuli consisted of 96 single-line sentences, 77–83 letters in length. At a central position within the sentence, a common sequence of verb, article, and noun was presented (see Appendix for a list of all the sentences used). The noun was five-to-six letters long and either high or low in frequency. High-frequency words were chosen as having a frequency count of at least 30 occurrences per million (opm) and low-frequency words had a frequency count of less than 10 per million based on the speech norms in the SUBTLEX corpus (Brysbaert et al., 2012). The mean word frequency for the low-frequency group was 3.8 opm with a standard deviation of 2.87 opm and a range between 0.04 and 9.76 opm. For the high-frequency condition, the mean was 93 opm with a standard deviation of 119.49 opm and the values ranked between 10.35 and 569.96 opm.

The boundary paradigm was used to manipulate parafoveal preview of the article. Four conditions were created: (1) Identical, HF: identical article, high-frequency noun, (2) Identical, LF: identical article, low-frequency noun (3) Mismatch, HF: syntactically mismatching article, high-frequency noun (4) Mismatch, LF: syntactically mismatching article, low-frequency noun. A sample of sentence pairs is shown in Fig. 1. Sentences appeared in all conditions across four counterbalanced lists.

Fig. 1
figure 1

An example sentence in each of the four conditions in Experiment 1, along with a literal translation into English 1: a high frequency, identical, b high frequency, syntactic mismatch, c low frequency, identical, d low frequency, syntactic mismatch

Stimulus validation

A separate set of 30 subjects participated in a cloze norming task to evaluate the predictability of the noun. This norming task indicated that sentence contexts were neutral, with on average the low-frequency noun only being produced in 0.94% (SD = 3.9%) of the time and the high-frequency noun in 1.7% (SD = 3.8) of the time. There was no difference in the predictability of the high- and low-frequency nouns, F(1, 190) = 2.22, p = 0.138.

Apparatus

Eye movements were recorded using an SR Research Eyelink 2 k eye tracker running at a sampling rate of 2,000 Hz. Sentences appeared in single lines in black monospaced font on a light gray background. Participants were seated approximately 70 cm away from a 21-inch CRT monitor with a screen resolution of 1,680 × 1,050 pixels and a refresh rate of 120 Hz. A chin and forehead rest were used to minimize movements of the head. At this distance three characters subtend approximately 1° of visual angle.

Procedure

Participants were asked to read each sentence at their normal pace to understand its basic meaning. At random intervals (on average after four sentences), a comprehension question appeared that required a verbal response. Comprehension questions targeted sematic relations within sentences, ensuring reading for meaning (Radach et al., 2008). The experiment started with a three-point calibration procedure followed by four practice trials. The 96 experimental sentences were then presented intermixed with 95 filler sentences, which varied in their grammatical structure. At the beginning of each trial, the subjects were presented with a fixation cross in the same position as the first letter of the sentence. Tracker accuracy was monitored throughout the experiment, and re-calibrations were performed every five trials, after each comprehension question or when calibration error exceeded 0.3 degrees of visual angle. When the current trial was completed, the participant pressed a button to advance to the next sentence.

Results

Fixations below 70 ms or above 600 ms were eliminated (0.8% of total fixations). Gaze durations above 1,000 ms (0.7% of trials), and go-past times above 2000 ms were also excluded (0.5% of trials). Trials in which the subject blinked immediately before or after fixation were eliminated (3.05% of trials). These exclusions left 3272 trials (96.91% of the data) available for analysis. On average, 96.3% of comprehension questions were answered correctly, indicating successful reading for meaning.

The following oculomotor measures were analyzed for the article and the target noun (see Inhoff & Radach, 1998, for a general discussion of eye movement measures): the fixation probability (probability of a word to be fixated during first-pass reading); first fixation duration (duration of the first fixation on the target word), gaze duration (sum of all fixations on the target word before moving to another word), go-past duration (sum of all fixations made from the moment the eyes land on the target word until the first fixation to the right of the target word), and the probability for a regression out (probability of the text fixation to land on a word left to the current fixation). Average subject means for the two mask conditions on each of these measures are presented in Table 1.

Table 1 Mean (and Standard Deviation) eye movement measures (based on means per participant and cell of the design) for Target Words (Article and Noun) across Preview Conditions in Experiment 1

Data were analyzed using (generalized) linear mixes models (LMM) with the lme4 package (Version 1.1-18-1; Bates et al., 2015) in R (Version 3.5.2; R Core Team, 2017). Unless otherwise noted, log-transformed data yielded the same pattern of significance as the analysis based on the raw data. To ease communication, we therefore report analyses of the untransformed data. The random effect structure of the models included slopes for all the fixed effects across subjects and items, including interactions (Barr et al., 2013), and were only trimmed down if the model did not converge. Absolute t values equal to or greater than 1.96 were interpreted as significant, given the number of observations, the t statistic in LMMs approximates the z statistic.

For all analyses, we included as fixed factors mask condition and noun word frequency. High-frequency nouns were coded as “1” and low-frequency nouns were coded as “2”. Originally, prior gaze duration and launch distance relative to the target word had also been included in the model, but both showed no significant interactions with the mask condition and were therefore removed from the models. The fixed effect estimates for the article are shown in Table 2 and for the noun in Table 3.

Table 2 Results of the (generalized) linear mixed-effects models for fixation duration and probability measures for the article in Experiment 1
Table 3 Results of the (generalized) linear mixed-effects models for fixation duration and probability measures for the noun in Experiment 1

Article viewing times

On the article, fixation probability was higher for the syntactically mismatching mask [z = 4.17]. There was a significant syntactical preview effect across all three viewing time measures, as readers spent less time fixating the article when the preview was syntactically matching the noun (all |t|s > 2.65; see Fig. 2). The mask condition had no influence on the probability of an outgoing regressive saccade [z = − 0.71].

Fig. 2
figure 2

Viewing times on the article as a function of preview condition (Experiment 1). Note. Error bars represent ± 1 standard error of the mean, calculated within-subjects

Noun viewing times

On the noun, the fixation probability was higher for the syntactically mismatching mask [z = 2.21]. In addition, all viewing times measures showed a reliable effect of the mask condition. First fixation duration, gaze duration, and go-past duration were substantially increased for the syntactically mismatching mask of the article (all |t|s > 2.25, see Fig. 3). The analysis of regressions out revealed that readers were more likely to start a regression out of the noun, if the article mask was syntactically incorrect [z = − 4.25]. Further analysis of the regression target revealed that with 77.8% most of the regressions landed on the article, 19.8% on the verb, and 2.3% on other words at the beginning of the current sentence (see Fig. 4 for absolute values). Numerically, the article appeared to be viewed again most frequently when a syntactically inappropriate mask was presented. However, there was no significant difference between the mask conditions for the regression target [b = 0.03, SE = 0.23, z = 0.12].

Fig. 3
figure 3

Viewing times on the noun as a function of preview condition (Experiment 1). Error bars represent ± 1 standard error of the mean, calculated within-subjects

Fig. 4
figure 4

Regressions out of the noun as a function of preview condition and target word (Experiment 1)

Discussion

In Experiment 1, subjects read sentences for comprehension that included a parafoveally matched or mismatched nominal phrase. The aim of the study was to determine whether and at what stage gender information within phrases is utilized during reading in German.

Our results indicate a reliable effect of gender marking. The fixation probability of the article decreased when an identical word mask was presented. Accordingly, when a gender mismatch was presented in the parafovea the article was skipped less frequently. First fixation duration, gaze duration, and go-past time were also increased when a syntactic mismatch was presented. Equally pronounced was the effect of parafoveal gender mismatch on the associated noun. A non-matching article in the parafovea led again to longer viewing times for all parameters included. Based on these results, it can be concluded that the entire nominal phrase was processed in the parafovea, enabling the detection of a gender mismatch. Particularly the clear effects on the noun, even though this word was not manipulated between parafoveal and foveal viewing, provide evidence for a joint processing of the nominal phrase in terms of a “word group” (Radach, 1996; see Drieghe et al., 2008 for studies on word group effects in English).

Interestingly, we found no significant interaction involving the preview condition. The gender preview effect appears to be independent of noun frequency, long vs. short prior gaze duration, and far vs. near launch distance. It might be the case that the replacement of the extremely frequent article has directed attention to its location, as evident in the large number of regressions back to this position. This may have obscured an effect of noun frequency. The missing interaction with noun frequency may also raise the question if the longer viewing times on the noun were the result of a spillover effect rather than evidence for parafoveal processing of the nominal phrase (e.g., Findelsberger et al., 2019; Schroyens et al., 1999). Furthermore, a shortcoming of Experiment 1 was the lack of an appropriate baseline condition. In the two conditions with gaze-contingent display changes, the changed stimuli were both syntactically and orthographically different (e.g., “den Fisch” vs. “die Fisch”). It could therefore be argued that the longer viewing times are to some extent a result of the orthographic changes.

To exclude the possibility that a spillover or an orthographic preview effect might be the source for the observed longer viewing times on the noun, this word itself was masked in a second experiment. In contrast to Experiment 1, word N + 2 was manipulated instead of word N + 1, allowing for a neutral baseline (cf. Veldre & Andrews, 2018). In addition to the no change and the syntactically invalid mask, a syntactically valid, gender matching mask was presented as a third condition in the parafovea.

Experiment 2

Method

Participants

For Experiment 2, the sample comprised 36 participants with a mean age of 26.4 years. Most of them (n = 30) were undergraduate students from the University of Wuppertal. None of the participants had completed Experiment 1 or any of the validation studies. All were native German speakers with normal or corrected-to-normal vision and were naïve to the purpose of the experiment. Participants received partial course credit as compensation. Power analysis was performed using the same procedure as in Experiment 1.

Material and design

The stimuli were 84 single-line sentences, 77–83 letters in length. In a central position within the sentence, a sequence of verb, article, and noun was presented (see Appendix for a list of all sentences used). The verb and the noun were both five-to-six letters long and high in frequency with frequency counts of at least 30 occurrences per million, according to the part-of-speech norms in the SUBTLEX corpus (Brysbaert et al., 2012). There was no difference in lexical frequency between the three preview conditions (F(2) = 0.35, p = 0.7).

This time the boundary paradigm was used to manipulate the parafoveal preview of the noun. The invisible boundary was placed right behind the verb resulting in an N + 2 manipulation. Three different preview conditions were created: (1) identical noun, (2) syntactically matching noun, and (3) syntactically mismatching noun. Each noun was presented in a sentence context that was designed to be neutral and not predict one of the three preview nouns (all cloze scores < 0.05). A sample of sentence pairs is shown in Fig. 5. All sentences appeared in all conditions across four counterbalanced lists.

Fig. 5
figure 5

Example sentence in each of the three conditions from Experiment 2: a identical, b syntactic match, and c syntactic mismatch

Apparatus

The apparatus and display parameters were identical to Experiment 1.

Procedure

The procedure was identical to Experiment 1.

Results

Data were trimmed identically to Experiment 1. Very short or long fixation durations (1.3%), gaze durations (0.8%), and go-past durations (0.9%) were excluded. In addition, trials in which the subject blinked immediately before or after fixation were eliminated (2.53% of trials). These exclusions left 2,620 trials (95.53% of the data) available for analysis. Participants answered an average of 94.36% of the comprehension questions correctly.

The same eye movement measures were analyzed as in Experiment 1. Mean values for the different preview conditions are presented in Table 4. Contrasts were set to compare the differences between (1) the identical preview with the syntactically matching preview and (2) the syntactically matching with the non-matching preview. Through this procedure, evidence can be obtained on whether there are differences between a purely orthographic and an additional syntactic change, therefore providing an adequate control condition. In Tables 5 and 6, the (G)LMM estimates for coefficients, standard errors, and t/z values for the fixed effects are reported.

Table 4 Mean (and standard deviation) eye movement measures for the target words (article and noun) across preview conditions in Experiment 2
Table 5 Results of the (generalized) linear mixed-effects models for fixation duration and probability measures for the article in Experiment 2
Table 6 Results of the (generalized) linear mixed-effects models for fixation duration and probability measures for the noun in Experiment 2

Article viewing times

Readers were equally likely to fixate the article in the identical, syntactic matching, and syntactic mismatching preview condition [both |z|s < 1]. There were significant differences between the identical and the syntactically matching preview condition with shorter first fixation duration on the article after an identical preview [t = − 2.3]. The go-past time was also increased when a syntactically matching compared to an identical preview was presented [t = − 2.28].Footnote 1 There was no difference between the two conditions regarding gaze duration [t = − 1.64]. Likewise, the probability for an outgoing regression was not affected by a syntactically matching in comparison with an identical mask [z = 1.14].

The first fixation duration, gaze duration, and go-past time showed no syntactic validity preview effect [all |t|s < 1.8, see Fig. 6]. Similarly, the effect did not show up in different probabilities for an outgoing regression [z = 0.1].

Fig. 6
figure 6

Viewing times on the article as a function of preview condition (Experiment 2). Error bars represent ± 1 standard error of the mean, calculated within-subjects

Noun viewing times

The probability that the noun was fixated was higher in the syntactically matching mask condition than in the identical mask condition [z = − 3.74]. But readers were equally likely to fixate the noun in the syntactically matching and mismatching condition [z = 0.75].

Across all reading measures, there was a significant difference between the identical and the syntactically matching preview condition. The viewing durations were longer when a syntactically matching noun preview was presented compared to an identical preview [all |t|s 0 > 4.48]. In addition, a greater proportion of outgoing saccades were progressive after an identical preview was presented compared to the syntactically matched preview [z = 2.74].

There was a significant preview effect of the syntactic match across all viewing time measures. The first fixation duration, gaze duration, and go-past duration were increased when a syntactically mismatched preview was presented compared to a syntactically matching preview [all |t|s > 2.61, see Fig.7]. This effect was also evident in a higher number of regressions out of the noun [z = − 2.52] when the preview was syntactically mismatching.

Further analysis of the regression target revealed that with 77.84% most of the regressions landed on the article, 19.81% on the verb, and 2.34% on other words at the beginning of the sentences (see Fig. 8  for absolute values). Numerically, the article was fixated again especially for syntactically inappropriate masks. However, there were no significant differences for the regression target between the identical and syntactic matching preview [b = 0.1, SE = 0.16, z = 0.66] and the syntactic matching compared to the mismatching preview [b = − 0.5, SE = 0.15, z = − 0.34].

Fig. 7
figure 7

Viewing times on the noun as a function of preview condition (Experiment 2). Error bars represent ± 1 standard error of the mean, calculated within-subjects

Fig. 8
figure 8

Regressions out of the noun as a function of preview condition and target word (Experiment 1)

Discussion

In Experiment 2, as in Experiment 1, subjects read sentences for comprehension in which the syntactic fit of a nominal phrase was manipulated. This time, however, the noun and not the article was varied. There were three conditions in which the noun was presented, either identically, syntactically matching or syntactically mismatching in the parafovea. In this design, effects of orthographic and syntactic mismatch could be dissociated. The results clearly indicated that the effect of a combined manipulation on both levels was substantially larger compared to the isolated orthographic manipulation. This provided solid evidence for the utilization of extrafoveal syntactic information during reading in German. When the article was fixated, the orthographic mismatch in both preview conditions led to an increase in fixation durations. For fixations on the noun, effects on the orthographic level were evident in all reading measures, as could be expected from any disruption of basic word processing. More interesting was the fact that the additional syntactic effect was also evident in all reading parameters except fixation probability. The benefit from a gender matching preview relative to a gender mismatching preview was apparent in both shorter viewing times and less outgoing regressions.

These results confirm that syntactic information with regard to grammatical gender is accessible at an early stage of processing, when the source if the critical information is still located in the parafovea. The increased processing load is carried to later stages, and the increased frequency of regressions to words earlier in the sentence suggests that the readers attempt to deal with the perceived syntactic incongruity.

General discussion

The main aim of this study was to illuminate the spatially distributed processing of grammatical gender information during the reading of sentences in German. Previous research had shown that various factors can influence parafoveal word processing. There is evidence that orthography, phonology, morphology, and, to some extent, semantics can be processed in the parafovea (see Schotter et al., 2012, for a detailed discussion). Evidence on parafoveal preprocessing of the subsequent word (N + 1) is quite robust after several decades of research. To what extent these effects can be extended to the next word (N + 2) is the subject of ongoing debate (e.g., Vasilev & Angele, 2017).

So far, relatively little attention has been paid in the eye-tracking literature to the time line of syntax processing during sentence reading. Initial studies pointed to the possibility that syntax in the form of word type may be processed early, with effects even on the probability of fixating a parafoveally visible word (Brothers & Traxler, 2016; Veldre & Andrews, 2018). To our knowledge, syntax in the form of grammatical gender has not been part of eye movement research so far. The results of the present study, therefore, add a new angle to the debate, suggesting that grammatical gender can have a substantial role in online reading comprehension and may influence early stages of processing.

In our present work, evidence for parafoveal processing of gender-related syntactic information was present in both experiments. In Experiment 1, these effects materialized both for fixations on the article and the noun. The article had a higher probability of being fixated and the gaze duration and go-past time were increased when a syntactical mismatch had been presented. For the noun, both experiments show a strong effect of syntactic fit, which is reflected in all viewing time measures. These effects are most pronounced in late processing and regressions occur more frequently with syntactically mismatched parafoveal masks.

Before looking further into the central issue of parafoveal syntax processing, we will first consider the fact that our study adds to the accumulating body of literature on orthographic preview effect for word N + 1 (Experiment 1) and, more importantly, word N + 2 (Experiment 2). Our findings are clearly in line with earlier evidence on N + 2 preview effects, especially for a short word N + 1 (see also Kliegl, et al., 2007; Angele & Rayner, 2011; Radach et al., 2013; Cutter et al., 2017; Risse & Kliegl, 2012).

Taking the orthographic preview effect as a baseline, our Experiment 2 provides the first direct evidence that syntactic (gender) information can be extracted from a parafoveal word N + 2 during normal reading. Both parafoveal nouns, the one visible before and the one visible after the display change, were equally appropriate in the general context of the sentence. It is therefore the joint processing (either parallel or sequential) of both article and noun that must have created the viewing time effects we observed. Considering the dynamics of sentence reading, the article may provide a cue to the gender of the subsequent noun, serving as a constraint in the activation of word candidates. Ideally, suitable lexical candidates would be limited to syntactically fitting nouns, either in terms of facilitation for matching candidates or inhibition for non-matching words (see Balota et al., 1985, for a seminal discussion on how parafoveal processing and contextual constraint may interact during sentence reading).

Looking at this problem from the viewpoint of the E–Z reader model (Reichle et al., 1998), our results appear problematic on two levels. First, within the eye movement control machinery of this model, N + 2 preview effects should usually not occur on the lexical level. More specifically, the extraction of linguistic information from word N + 2 during the fixation of word N would require several steps: first, the completion of the L2 processing of word N, then, the completion of L1 and L2 processing of word N + 1, and finally, some amount of L1 processing of word N + 2. For this to happen, the L2 processing of word N and the full recognition of N + 1 would have to be exceedingly efficient (Radach et al., 2013). This assumption appears not implausible for our sentence materials, as the articles serving as N + 1 words are short and extremely familiar. Indeed, Rayner et al. (2007) argued that N + 2 preview effects may be possible when N + 1 words are short and common. Under such favorable linguistic processing conditions, the time line of L1 and L2 processing can approximate zero in serial attention shift models, and processing conditions might be established within which N + 2 preview effects can be accommodated.

However, evidence for N + 2 effects on the level of syntactic processing adds another layer of difficulty for sequential processing accounts. Staub (2011) performed a series of simulations using the E–Z Reader 10 model. He tested, among other aspects, whether and under which conditions post-lexical integration difficulties can have an influence on fixation probability. Similarly, Schotter and Leinenger (2012) discussed ways to accommodate higher-level parafoveal preview effects within a serial processing framework that preserves the assumption that such effects are due to failures in post-lexical integration.

We believe that this assumption appears hard to reconcile with our data. In our Experiment 1, the article was skipped more often when there was no syntactic fit with the noun. The decision to send a saccade to the article vs. the noun must be made no later than 70 ms before the end of the current fixation on word N (Deubel et al., 2000). Acquisition and utilization of syntactic information from word N + 2 to an extent that modifies the targeting decision for the next saccade at this early point (about 150 ms into an average fixation on word N) appear quite incompatible with the idea of post-lexical integration of syntactic information from two adjacent words.

Alternatively, article and noun may be processed simultaneously (at least with substantial temporal overlap), so that letter information accumulates in parallel from both words within the perceptual span (e.g., Inhoff et al., 2006) Reilly & Radach, 2006; Snell et al., 2018). The syntactic relationship of the two words would be gradually processed, as the transition from visual to orthographic and lexical information proceeds. Similarly, Snell et al. (2017) proposed that multiple words can be processed in parallel beyond the sub-lexical level, leading to higher order integration of information from several words’ positions within the current perceptual span.

Within such a parallel processing scenario, article and noun might be processed as one functional unit, initially forming a joint perceptual entity and subsequently a cognitive unit of meaning. The possibility that noun phrases may be processed in terms of an integrated “word group” has been initially proposed by Radach (1996); see Drieghe et al. (2008) for more recent empirical work and a detailed discussion. As the combination of articles and nouns in German is largely arbitrary (despite heroic attempts of teachers to provide rules, e.g., Vayenas, 2017), it appears possible that such two-word units are often explicitly represented in the mental lexicon, allowing for fast processing as a quasi-lexical unit. At this point, this idea has to remain speculative and open for subsequent research.

The idea that frequent minimal noun phases are processed as in integrated functional unit also offers an answer to the seemingly puzzling question why gender-specific articles matter at all in written German. Indeed, if all articles would be the same, as it is the case in English, or if all articles were omitted, German text would look and sound quite awkward, but it could still be read without any major loss in the understanding of sentence meaning. Apparently, the processing system takes the language structure as it is and utilizes each source of information within the perceptual span pragmatically for an optimal flow of information acquisition (see, e.g., Schmuck, 2020, for a linguistic treatise on the grammar of definite articles in German, Dutch and English).

In addition to the effects of parafoveal syntactic information on early oculomotor measures such as fixation probability for the two elements within the noun phrase, we also found substantial effects on late measures up to an increased frequency of regressive saccades (see Inhoff et al., 2019, for a recent review). The mismatch of gender information perceived in the parafovea leads to allocation of substantial resources into attempts to reconcile the seemingly conflicting syntactic information. As the German language allows a very flexible sentence structure, part of this re-analyses could be a check on whether the article actually refers to the noun. In addition, the grammatical gender of the word can be flexible due the productive nature of compounding nouns in German (Inhoff et al., 2000). If these potential problems were the main issue in re-analysis, regression should go to locations to the left of the noun phrase. This is clearly not the case, as in both experiments, most regressions target the article. The large number of regressions landing on the article is particularly striking in Experiment 2, where the noun was changed in the parafovea. The fact that immediate re-analyses is so strongly localized within the noun phrase underscores its importance as an integrated functional unit.

Our results are in harmony with a series of gender priming experiments using EEG methodology in German, indicating that grammatical gender can be processed early during the time-course of linguistic processing (Friederici & Jacobsen, 1999). This conclusion is also reinforced by results of a further ERP work where a syntactic mismatch was found to be associated with the occurrence of an early left anterior negativity (ELAN; Deutsch & Bentin, 2001; Gunter et al., 2000). In these experiments, however, there was always an overt violation of syntax, which could have led to expectancy effects and specific strategies. In contrast, our work demonstrated early effects of syntactic processing in an ecologically valid, natural reading situation.

In conclusion, the presents work contributes to the growing evidence in favor of spatially distributed processing of syntactic information in different written alphabetic languages, such as English (Brothers & Traxler, 2016; Veldre & Andrews, 2018), Dutch (Snell et al., 2017), and now German. Converging results have also been obtained for Korean, a non-alphabetic, predominantly syllabic writing system, by Kim et al. (2012). These authors reported that readers of Korean parafoveally process specific characters located at the rightmost position within a word, which serve to indicate the syntactic class (and semantic role) of nouns within a sentence.

The fact that in these studies evidence for parafoveal processing of syntactic information has emerged in early oculomotor measures provides evidence in favor of immediate and interactive rather than late and modular linguistic processing during reading (see Tanenhaus & Lucas, 1987, for a detailed account of the modular view). It has recently been argued quite convincingly, that there is still good evidence for modularity during sentence processing (Ferreira & Nye, 2018), so that it may to some extent depend on the particular construction and local processing constraints whether early distributed syntactic processing can indeed take place.