Investigating the relationship between individual differences and island sensitivity

Catherine Pham¹, Lauren Covey², Alison Gabriele³, Saad Aldosari⁴ and Robert Fiorentino⁵
¹ Department of Psychology, The Pennsylvania State University, University Park, PA, US
² Department of Linguistics, Montclair State University, Montclair, NJ, US
³ Second Language Acquisition Laboratory, University of Kansas, Lawrence, KS, US
⁴ College of Languages and Translation, Al Imam Mohammad Ibn Saud Islamic University, Riyadh, SA
⁵ Neurolinguistics and Language Processing Laboratory, Department of Linguistics, University of Kansas, Lawrence, KS, US
Corresponding author: Lauren Covey (coveyl@montclair.edu)


Introduction
The current study addresses a theoretical debate regarding the source of syntactic island effects. Languages such as English allow for wh-movement, in which the wh-phrase (e.g. who) originates in one position and moves to a different position in the sentence, as in (1) (e.g. Chomsky 1981; 1986).
(1) Who did you see ___ yesterday?
The word who in (1) originates in the object position of the verb see and is then moved to the front of the sentence, leaving a gap at its original site. Wh-movement is argued to be subject to syntactic constraints, such that wh-phrases cannot be extracted out of certain syntactic structures called islands (Ross 1967). Example (2a) contains an embedded question, one type of island domain, and (2b) illustrates that extracting a wh-item out of the embedded question renders the sentence ungrammatical.
Pham, Catherine, et al. 2020. Investigating the relationship between individual differences and island sensitivity. Glossa: a journal of general linguistics 5(1): 94.
Native speakers have been shown to give low acceptability ratings to sentences containing island violations (Sprouse 2007; Sprouse et al. 2011; Sprouse et al. 2012a), but there is little consensus regarding the source of these island effects (for further discussion see Sprouse & Villata in press). Proponents of the grammatical view postulate that syntactic constraints prevent extraction from islands (e.g. Phillips 2006; Wagers & Phillips 2009; Sprouse et al. 2012a; b; Yoshida et al. 2014; Sprouse et al. 2016; Kush et al. 2018), although related proposals have argued that island effects may be accounted for by semantic and pragmatic factors (e.g. Erteschik-Shir 1973; Kuno & Takami 1993; Szabolcsi & Zwarts 1993; Goldberg 2007; Truswell 2007; Abrusán 2014; Kush et al. 2019). In contrast, proponents of the resource-limitation view argue that island effects arise when the processing costs associated with a sentence are too high, exceeding the individual's processing resources (e.g. Kluender & Kutas 1993; Kluender 1998; 2004; Hofmeister & Sag 2010; Hofmeister et al. 2012a; b). For example, Kluender and Kutas (1993) argued that island effects result from an "overload" of the limited processing resources available to the parser. This study builds directly on Sprouse and colleagues' (2012a) work on the debate between grammatical and resource-limitation accounts: it examines the source of island effects by investigating the relationship between island sensitivity and individual differences in processing abilities, since Sprouse et al. argue that the two views make distinct predictions about whether such a relationship should hold.
The findings of the current study are poised to inform our understanding of the nature of island effects, as well as the extent to which individual differences affect language processing.

Grammatical vs resource-limitation view
Under one version of the grammatical view, island effects emerge from violations of syntactic constraints that prohibit wh-extraction out of island structures (e.g. Ross 1967; Chomsky 1973; 1986; Huang 1982). These island constraints are assumed to be an innate part of a native speaker's mental grammar and cannot be reduced to processing-based explanations.
In contrast to grammatical approaches, resource-limitation accounts (Kluender & Kutas 1993; Kluender 2004; Hofmeister & Sag 2010; Hofmeister et al. 2012a; b) claim that island effects arise not from the violation of syntactic constraints but from processing difficulties. Under the resource-limitation theory first proposed by Kluender and Kutas (1993), two costs must be incurred simultaneously for island effects to emerge: the cost of processing a long-distance wh-dependency, which involves maintaining the wh-filler while searching for the gap and retrieving the wh-filler at the gap site, and the cost of processing an island structure. The theory further claims that the resources available for sentence processing are limited, and that unacceptability emerges when the total processing cost of parsing a sentence exceeds these resources. In short, islands are rejected because they are too difficult for most native speakers to process.
To investigate whether island effects arise from processing difficulties, Hofmeister and Sag (2010) examined how linguistic properties of the wh-phrase affect native English speakers' processing of sentences containing island violations. The stimuli included complex wh-fillers (e.g. which employee), which are argued to facilitate the processing of sentences containing island violations: more semantically and syntactically complex wh-fillers have stronger mental representations than bare wh-fillers (e.g. who) and are thus expected to be easier to retrieve from working memory at the gap position.
Sentences containing complex wh-fillers were therefore expected to elicit faster reading times after the verb than sentences containing bare wh-fillers, and this prediction was confirmed in a self-paced reading task. Hofmeister and Sag (2010) argued that their results support the resource-limitation view of islands because non-structural factors (i.e. the complexity of the wh-filler) affected the processing and acceptability of ungrammatical sentences containing island violations. Studies on Danish (Christensen et al. 2013a; b; Christensen & Nyvad 2014) have also argued for a processing-based explanation of islands. In related work, Keshev and Meltzer-Asscher (2019) showed that the decreased acceptability of ungrammatical sentences containing wh-islands in Hebrew is at least partially induced by processing costs.
To further investigate the source of island effects and to tease apart grammatical and resource-limitation approaches, Sprouse et al. (2012a) examined the relationship between judgments of island violations and working memory in native English speakers. Sprouse et al. (2012a) argued that resource-limitation approaches should predict a relationship between an individual's processing resources and the size of island effects: under the resource-limitation view, those with greater working memory are expected to have more processing resources available and should find sentences containing island violations easier to process and more acceptable. In contrast, the grammatical view does not predict this relationship, as island violations are not permitted by the grammar and should thus be unacceptable regardless of processing resources. Sprouse et al. (2012a) tested working memory using a serial-recall task and an n-back task (Kirchner 1958; Kane & Engle 2002; Jaeggi et al. 2008). To measure island sensitivity, participants rated the acceptability of English sentences on a 7-point scale (Experiment 1) or using magnitude estimation (Experiment 2). Four island types were tested: whether, complex NP, subject, and adjunct. The stimuli were created using a 2 × 2 factorial design, manipulating the presence/absence of an island structure and the wh-dependency length. An example set of stimuli for an adjunct island is given in (3).
(3) Non-island/Matrix
a. Who ___ suspects that the boss left her keys in the car?
Sprouse et al. (2012a) calculated a differences-in-differences (DD) score for each island type. To calculate DD scores, the first difference score (D1) is obtained by subtracting the mean acceptability rating of the island/embedded condition (3d) from that of the non-island/embedded condition (3b). This D1 score quantifies the effect of an island structure in sentences with a long-distance dependency.
The second difference score (D2) is calculated by subtracting the mean acceptability rating of the island/matrix condition (3c) from that of the non-island/matrix condition (3a). Finally, the DD score is calculated by subtracting D2 from D1. This score quantifies the strength of the island effect on a long-distance dependency compared to a short-distance dependency. High DD scores indicate strong sensitivity to island effects and less acceptance of ungrammatical island violations; low DD scores indicate weak sensitivity to island effects and greater acceptance of ungrammatical island violations.
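As a compact restatement, DD = D1 − D2 = (3b − 3d) − (3a − 3c), where each term is a condition mean. The Python sketch below computes this for one participant and one island type. It is illustrative only: the function name `dd_score` and the condition keys are hypothetical labels, and the ratings in the usage example are invented, not data from either study.

```python
def dd_score(means: dict[str, float]) -> float:
    """Differences-in-differences (DD) score for one participant and one island type.

    `means` maps each cell of the 2 x 2 design to a mean rating. The keys are
    hypothetical labels for conditions (3a)-(3d):
      "nonisland_matrix" (3a), "nonisland_embedded" (3b),
      "island_matrix" (3c), "island_embedded" (3d)
    """
    # D1: effect of the island structure with a long (embedded) dependency
    d1 = means["nonisland_embedded"] - means["island_embedded"]
    # D2: effect of the island structure with a short (matrix) dependency
    d2 = means["nonisland_matrix"] - means["island_matrix"]
    # DD > 0: superadditive island effect; DD < 0: subadditive
    return d1 - d2


# Invented z-scored condition means, for illustration only
example = {
    "nonisland_matrix": 1.0,
    "nonisland_embedded": 0.4,
    "island_matrix": 0.6,
    "island_embedded": -1.2,
}
# D1 = 1.6 and D2 = 0.4, so DD = 1.2
print(round(dd_score(example), 2))
```

A large positive DD, as in this invented example, would indicate strong island sensitivity. If the island structure lowered ratings by the same amount with matrix and embedded dependencies, D1 and D2 would cancel and DD would be 0.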
As outlined by Sprouse et al., the resource-limitation approach predicts that individuals who perform better on the working memory tasks should show lower DD scores (less rejection of island violations), whereas grammatical approaches predict no such relationship. Sprouse et al. argued that their results revealed virtually no relationship between working memory and DD scores. In many cases, the relationships were not statistically significant, and where significant effects emerged, the amount of variance explained was very small (R² values between 0.00 and 0.06). Sprouse et al. therefore concluded that the perceived unacceptability of island violations cannot be reduced to processing difficulties and is likely due to grammatical constraints.
However, Hofmeister et al. (2012a; b) raised a number of criticisms of Sprouse et al. (2012a), several of which the present study addresses. One criticism was that the stimuli may have been too complex to process, given that the test questions contained bare wh-words like who and were presented in isolation, without context. This could have masked a relationship between working memory and DD scores: even individuals with greater working memory resources may experience a processing breakdown given such extreme complexity. To alleviate some of the processing burden and increase the likelihood of finding a relationship between working memory and DD scores, the current study includes a background sentence before each target wh-question to establish context. We additionally use complex wh-fillers (e.g. which worker) instead of bare wh-fillers (e.g. who), since complex wh-fillers have been argued to facilitate the processing of wh-dependencies (Hofmeister & Sag 2010; Goodall 2015). Another criticism was that the serial-recall and n-back tasks used by Sprouse et al. (2012a) were not sufficient measures of working memory capacity because they are simple span tasks that do not include both storage and processing components. As Hofmeister and colleagues point out, the validity of Sprouse et al.'s (2012a) argument that there is no relationship between working memory and island sensitivity depends on the validity of their choice of working memory measure. To address this concern directly, we use a complex memory span measure employed by Hofmeister and colleagues themselves (Hofmeister et al. 2014), which has been shown to predict language comprehension skills broadly (e.g. Daneman & Carpenter 1980; King & Just 1991; Just & Carpenter 1992).

The current study
We further investigate the role of individual differences in the processing of islands, building directly on Sprouse et al. (2012a) while addressing the concerns raised by Hofmeister et al. (2012a; b).

Participants
We tested 102 native English speakers (32 males) from the University of Kansas. Participants ranged in age from 18 to 34 (M = 20.7).

Materials
The current study used sentences¹ from Aldosari (2015), who made two important modifications to Sprouse et al.'s sentences in order to address concerns raised by Hofmeister et al. (2012a). First, a declarative background sentence preceded each test sentence to provide a context for the wh-question. Second, a complex wh-filler (e.g. which worker) was used in place of a bare wh-filler (e.g. who), because it has been argued that complex wh-fillers facilitate the processing of wh-dependencies (Hofmeister & Sag 2010; Goodall 2015). Following Sprouse et al. (2012a), four island types were tested: whether, complex NP, subject, and adjunct islands. We created four conditions for each island type using a 2 × 2 factorial design manipulating the presence of an island structure and the wh-dependency length. An example of the four conditions for the adjunct island type is shown in (4).
(4) Background sentence: The helpful worker thinks that the boss left her keys in the car.
Sixteen sets of sentences were created for each of the four island types, with each set appearing in all four conditions. In total, the 64 sets were distributed among four lists using a Latin-square design; no filler sentences were included.² The sentences were divided into four blocks, with the experimental sentences randomized within each block.
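A Latin-square assignment of this kind can be sketched as a cyclic rotation: each list contains every item set exactly once, in a different condition on each list, so no participant sees the same set in two conditions. The Python below is a minimal sketch under that assumption; the source does not specify the exact rotation scheme, and the function name `latin_square_lists` and the condition labels are hypothetical.

```python
def latin_square_lists(n_sets: int, conditions: list[str]) -> list[list[tuple[int, str]]]:
    """Build one presentation list per condition by cyclic rotation.

    List k shows set i in condition (i + k) mod len(conditions). Each list
    therefore contains every set exactly once, and each set appears in every
    condition across the lists. Conditions are balanced within a list when
    n_sets is a multiple of len(conditions).
    """
    n_cond = len(conditions)
    return [
        [(i, conditions[(i + k) % n_cond]) for i in range(n_sets)]
        for k in range(n_cond)
    ]


# Hypothetical condition labels for the 2 x 2 design
conditions = ["nonisland_matrix", "nonisland_embedded", "island_matrix", "island_embedded"]
lists = latin_square_lists(64, conditions)
# Each of the 4 lists holds 64 trials, 16 per condition
print([len(lst) for lst in lists])
```

If the 16 sets of each island type are numbered consecutively, rotating on the overall set index also gives every list four sets per condition within each island type.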

Tasks
During the acceptability judgment task, participants were first presented with the declarative background sentence and were instructed to press the space bar to advance to the next screen after reading it. The next screen presented only the test sentence, which participants rated on a 7-point scale ranging from "totally unnatural" to "perfectly natural". There was no time limit for this task, and participants were also given an "I do not know" option.
Because Hofmeister et al. (2012a; b) argued that the serial-recall and n-back tasks used by Sprouse et al. (2012a) were not true measures of working memory capacity, we used the counting span task (Case et al. 1982) and the reading span task (Daneman & Carpenter 1980), both of which contain a memory component and a processing component. Crucially, the reading span task has been used to investigate the relationship between working memory capacity and the processing of wh-dependencies by Hofmeister and colleagues (2014) as well as in other research (Johnson et al. 2016). In the counting span task, following Conway et al. (2005), participants were presented with a screen depicting a random arrangement of target objects (dark blue circles) and distractor objects (dark blue squares and light green circles). They were asked to count the target objects aloud and remember the total. After 2 to 6 screens, participants were prompted to enter, in order, the totals they had counted for the arrays in the preceding set. In the reading span task, participants read sentences aloud, made a semantic judgment about each sentence, said aloud the letter presented on the screen, and were asked to remember the letters. After 2 to 5 sentences, participants were prompted to enter the letters from the preceding set of sentences (Conway et al. 2005). Accuracy on both working memory tasks was measured as the percentage of numbers/letters that participants correctly recalled in order.
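The accuracy measure described above can be sketched in Python as follows. The source states only that accuracy was the percentage of items correctly recalled in order, so the exact rule here is an assumption: each item earns credit if it is reported in its correct serial position, pooled over all sets. The function name `span_accuracy` and the example responses are hypothetical.

```python
def span_accuracy(trials: list[tuple[list[str], list[str]]]) -> float:
    """Percentage of items recalled in their correct serial position.

    Each trial is a (presented, recalled) pair for one set. Items are letters
    (reading span) or counts (counting span) stored as strings. Credit is
    given per item, pooled over all sets (an assumed partial-credit scheme).
    """
    presented_total = 0
    correct = 0
    for presented, recalled in trials:
        presented_total += len(presented)
        for pos, item in enumerate(presented):
            # An item counts only if it is reported in the same position
            if pos < len(recalled) and recalled[pos] == item:
                correct += 1
    return 100.0 * correct / presented_total


# Invented reading span responses: one perfect set, one with a transposition
trials = [
    (["F", "K"], ["F", "K"]),
    (["P", "R", "T"], ["P", "T", "R"]),
]
print(span_accuracy(trials))  # 3 of 5 items in position -> 60.0
```

Under this rule a transposition costs both swapped items, which is one reasonable reading of "recalled in order"; other span scoring schemes (e.g. crediting only perfectly recalled sets) would give different numbers.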
Additionally, we included a measure of attentional control, since attentional control has been shown to capture individual variability in language processing (e.g. Hutchison 2007; Boudewyn et al. 2012; Zirnstein et al. 2018). Relevant to the current study, Johnson (2015) found that individuals with greater attentional control resources were more likely to engage in gap prediction during the processing of long-distance wh-dependencies. Following Johnson (2015), we included the number Stroop task to examine whether individuals with greater attentional control resources find sentences with island violations (which contain a long-distance dependency) easier to process, and therefore accept them more readily. The number Stroop task measures participants' ability to attend to the target task despite interfering visual information. Following Bush et al. (2006), participants counted the words presented on the screen, which ranged from 1 to 4, and pushed the corresponding button on a button box. On congruent trials, the words on the screen were monosyllabic animal words (e.g. cat cat), and participants were instructed to press the corresponding button (e.g. 2). On incongruent trials, the words were monosyllabic number words (e.g. one one one), and participants were instructed to press the button corresponding to the number of words on the screen (e.g. 3) while inhibiting the meaning of the words. Reaction times and accuracy were recorded.
During the experiment, the three cognitive tasks were administered before the acceptability judgment task in counterbalanced order. The presentation software Paradigm (Tagliaferri 2005) was utilized to administer all tasks.

Predictions
The current study tests the predictions of the grammatical view and the resource-limitation view regarding the source of island effects. The resource-limitation view hypothesizes that the unacceptability of island violations results from an overload of processing resources (Kluender & Kutas 1993; Kluender 1998; 2004). Under this view, a negative correlation may emerge between individual differences in cognitive abilities and island sensitivity: as performance on the cognitive tasks increases, DD scores, which index island sensitivity, should decrease. In other words, individuals with greater working memory and/or attentional control resources may show lower DD scores. The grammatical view of islands predicts no such correlation between individual differences in working memory or attentional control and island sensitivity.

Acceptability judgment task analyses
The mean acceptability ratings for each condition are provided in Table 1, with higher scores reflecting greater acceptability.³
³ A reviewer raised a concern about possible floor effects in the ungrammatical island/embedded conditions. An analysis of the individual responses to the four island/embedded conditions showed that participants used the entire 7-point scale, with responses ranging from 1 to 7 across these conditions. Furthermore, across the four conditions, between 79% and 96% of participants had a mean rating above the lowest rating of 1. The standard deviation values in Table 1 reflect this variability in mean ratings.
Prior to statistical analysis, each participant's acceptability ratings were z-score transformed. We used linear mixed-effects models to investigate whether participants were sensitive to island effects in the acceptability judgment task. Data were analyzed using the lme4 and lmerTest packages in R (Bates et al. 2015; Kuznetsova et al. 2017; R Core Team 2019). For each island type, a full model was constructed with the fixed effects Island Structure (non-island, island), Dependency Length (matrix, embedded), and their interaction (Island Structure × Dependency Length), random intercepts for item and participant, and by-item and by-participant random slopes for each factor and the interaction term. The full model was simplified stepwise, and likelihood ratio tests determined whether the inclusion of each random and fixed effect improved model fit.⁴ The best-fitting model for each island type revealed a significant main effect of wh-dependency length (whether: est = 0.92, SE = 0.07, t = 12.52, p < .001; complex NP: est = 1.83, SE = 0.10, t = 17.88, p < .001; subject: est = 1.93, SE = 0.10, t = 19.35, p < .001; adjunct: est = 1.72, SE = 0.09, t = 18.86, p < .001). The main effect of island structure was also significant for each island type (whether: est = 0.77, SE = 0.07, t = 10.77, p < .001; complex NP: est = 1.28, SE = 0.11, t = 11.70, p < .001; subject: est = 2.03, SE = 0.09, t = 21.49, p < .001; adjunct: est = 1.61, SE = 0.08, t = 19.11, p < .001).
These main effects reflect the fact that sentences with longer wh-dependencies were rated lower than those with shorter (matrix) wh-dependencies, and sentences with islands were rated lower than non-island sentences.
Crucially, a significant interaction between wh-dependency length and island structure was also found for each island type (whether: est = -0.61, SE = 0.09, t = -6.968, p < .001; complex NP: est = -1.14, SE = 0.14, t = -7.87, p < .001; subject: est = -1.96, SE = 0.13, t = -15.20, p < .001; adjunct: est = -1.51, SE = 0.11, t = -13.96, p < .001). This interaction reflects the low acceptability ratings for the ungrammatical island-violation condition relative to the three grammatical conditions for each island type. In other words, the effect of the island structure was greater in sentences with a long wh-dependency than in sentences with a short wh-dependency, indicating that native English speakers were sensitive to island effects for all four island types. Interaction plots for each island type are shown in Figure 1.
In sum, superadditive effects were observed for all island types: the combination of a long wh-dependency and an island structure reduced acceptability more than the sum of the reductions caused by a long wh-dependency alone and by an island structure alone. Under the grammatical view, this superadditivity would be taken to reflect the violation of an island constraint, while under the resource-limitation view, it would reflect a processing overload due to the simultaneous burdens of processing a long-distance dependency and an island structure.
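The per-participant z-score transformation applied to the ratings before model fitting can be sketched in Python as follows. This is a minimal sketch: the source does not state whether the sample or population SD was used or how "I do not know" responses were treated, so both choices here (sample SD; such responses encoded as None and skipped) are assumptions, and the function name is hypothetical.

```python
from statistics import mean, stdev
from typing import Optional


def zscore_by_participant(
    ratings: dict[str, list[Optional[float]]],
) -> dict[str, list[Optional[float]]]:
    """z-transform each participant's ratings using that participant's own mean and SD.

    None marks an "I do not know" response (a hypothetical encoding); such
    responses are skipped when computing the mean and SD and stay None.
    """
    out = {}
    for pid, values in ratings.items():
        valid = [v for v in values if v is not None]
        m = mean(valid)
        sd = stdev(valid)  # sample SD; needs >= 2 ratings that are not all identical
        out[pid] = [None if v is None else (v - m) / sd for v in values]
    return out


# Invented 7-point ratings for two participants
raw = {"p01": [1, 4, 7, None], "p02": [2, 3, 3, 6]}
z = zscore_by_participant(raw)
print(z["p01"])  # [-1.0, 0.0, 1.0, None]
```

Because each participant's ratings are centered on that participant's own mean and scaled by their own SD, individual differences in scale use (e.g. a participant who never rates above 5) are removed before the mixed-effects models are fit.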

Individual differences analyses
To investigate the source of the island effects, we next examined whether island sensitivity (quantified by DD scores) was modulated by individual differences in cognitive abilities (quantified by performance on the cognitive tasks). Recall that a relationship is expected under the resource-limitation view, such that better cognitive abilities should lead to lower island sensitivity (i.e. greater acceptability of island violations), whereas no such relationship is expected under the grammatical view.
Following Sprouse et al. (2012a), two sets of linear regressions were conducted for each of the four island types. The first was run on the complete set of DD scores for each island type; the second was run only on DD scores ≥ 0. DD scores below zero indicate subadditive effects, meaning that the effect of the island structure in a sentence with a long wh-dependency was smaller than its effect in a sentence with a short wh-dependency. As Sprouse et al. (2012a) note, because neither theory predicts subadditive effects, including negative DD scores might mask a relationship between the individual difference measures and DD scores. Negative DD scores were therefore excluded from the second analysis, which resulted in the exclusion of two participants for whether islands, twelve for complex NP islands, three for subject islands, and five for adjunct islands. Sprouse et al. also note that negative DD scores could represent individuals who are not sensitive to the typical superadditive effects, in which case including these scores might increase the likelihood of finding a negative correlation between individual difference measures and DD scores. Following Sprouse et al., both analyses are reported here.
In addition to the linear regressions, Bayes factors (BF) were used to assess the strength of the evidence for each hypothesis (Dienes 2014). A BF < .33 is considered substantial evidence for the null hypothesis over the alternative hypothesis; this outcome would be in line with the grammatical account of islands, which predicts no relationship between DD scores and working memory/attentional control scores. A BF > 3 is considered substantial evidence for the alternative hypothesis over the null hypothesis, as would be expected under a resource-limitation view. A BF between .33 and 3 indicates that the data do not provide substantial evidence to distinguish the null and alternative hypotheses. For each island type, a Bayes factor analysis was conducted using the JZS prior with the R package BayesFactor (Morey & Rouder 2018).
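To make the Bayes factor step more concrete, the Python sketch below numerically integrates a JZS (Zellner-Siow) Bayes factor for a regression with p predictors against an intercept-only null, using the standard expression in terms of R² and sample size N. It is an illustration under assumptions, not a replacement for BayesFactor: we assume the prior g ~ Inverse-Gamma(1/2, N·r²/2) with r = √2/4, which we take to correspond to BayesFactor's default "medium" scale for continuous predictors, and results are not guaranteed to match the package exactly. The function names are hypothetical.

```python
import math


def jzs_bf10(r2: float, n: int, p: int = 1, rscale: float = math.sqrt(2) / 4,
             lo: float = -15.0, hi: float = 25.0, steps: int = 4000) -> float:
    """JZS Bayes factor (H1 over H0) for a linear regression with p predictors.

    Integrates over g with the substitution t = log(g), using the trapezoid
    rule on [lo, hi]. Assumed prior: g ~ Inverse-Gamma(1/2, n * rscale**2 / 2).
    Intended for moderate n (e.g. ~100); very large n or r2 may overflow.
    """
    c = n * rscale ** 2 / 2.0

    def log_integrand(t: float) -> float:
        g = math.exp(t)
        # Log of the marginal likelihood ratio given g
        log_lik = ((n - p - 1) / 2) * math.log1p(g) - ((n - 1) / 2) * math.log1p((1 - r2) * g)
        # Log density of g under the inverse-gamma(1/2, c) prior
        log_prior = 0.5 * math.log(c) - math.lgamma(0.5) - 1.5 * t - c / g
        return log_lik + log_prior + t  # + t is the Jacobian dg/dt = g

    h = (hi - lo) / steps
    logs = [log_integrand(lo + i * h) for i in range(steps + 1)]
    m = max(logs)
    # Trapezoid rule on values rescaled by exp(-m) to avoid overflow
    weights = [0.5 if i in (0, steps) else 1.0 for i in range(steps + 1)]
    total = sum(w * math.exp(v - m) for w, v in zip(weights, logs))
    return math.exp(m) * total * h


def evidence(bf10: float) -> str:
    """Classify a Bayes factor with the 1/3 (.33 in the text) and 3 thresholds."""
    if bf10 < 1 / 3:
        return "substantial evidence for the null"
    if bf10 > 3:
        return "substantial evidence for the alternative"
    return "inconclusive"


# Invented example: R^2 = .01 with N = 100 participants
bf = jzs_bf10(0.01, 100)
print(round(bf, 3), evidence(bf))
```

Working on the log scale keeps the integrand finite, and the substitution g = exp(t) lets a uniform grid cover both very small and very large values of g.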

Working memory
On the counting span task, participants' scores ranged from 22.22 to 94.44 (M = 61.37; SD = 13.56). For the reading span task, participants' scores ranged from 26.25 to 95 (M = 63.15; SD = 14.29). Because scores on the two tasks were significantly correlated (r = .36, p < .001), participants' scores on each task were z-score transformed and added to create a composite working memory variable. DD scores are plotted as a function of working memory scores in Figure 2.
Results from the linear regressions are reported in Table 2. For each island type, the line of best fit, goodness of fit, and significance of the slope are provided. In both sets of linear regressions, none of the best-fit slopes differed significantly from zero for any island type. Additionally, the R² value, which measures how much of the variance in DD scores is explained by working memory scores, was very low for each island type.
Bayes factors for each island type, also provided in Table 2, largely favored the null hypothesis, falling below or around .33 for most of the linear regressions. The one exception was the whether island in the overall linear regression (BF = 1.018), which provided conclusive evidence for neither the null nor the alternative hypothesis. Together, these results indicate that there is no robust relationship between DD scores and working memory, contrary to the expectations of the resource-limitation view.

Attentional control
Participants' reaction times and accuracy were used to calculate Stroop interference effects, with larger positive scores indicating greater attentional control. Participants' Stroop reaction time interference scores ranged from -239.97 ms to 110.70 ms (M = -86.81 ms; SD = 60.80 ms). Stroop accuracy interference scores ranged from -21.25 to 5.00 (M = -3.55; SD = 4.12). These two variables were marginally correlated (r = .18, p = .07), and because both derive from the same task, a composite variable was created for the linear regression analysis.⁵ DD scores are plotted as a function of attentional control composite scores in Figure 3.
Results from the linear regressions are reported in Table 3. In both sets of linear regressions, none of the best-fit slopes differed significantly from zero for any island type, and the R² value for each model was very low; together, the goodness-of-fit results and the significance tests of the slopes indicate no significant relationship between DD scores and attentional control. Bayes factors similarly favored the null hypothesis for most of the linear regressions, falling below or around .33 for most island types. The exception was the second whether island model (BF = 0.415), which provided substantial evidence neither for the null hypothesis nor for the alternative. Together, these analyses provide no evidence of a robust relationship between attentional control and island sensitivity.
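The interference scores and the attentional control composite can be sketched in Python as follows. The source does not give the exact formulas, so the sign convention here is an assumption: congruent minus incongruent reaction time, and incongruent minus congruent accuracy, so that larger positive scores indicate greater attentional control, as described above; slower and less accurate incongruent trials would then yield the negative group means reported. The composite is likewise assumed to be a sum of z-scores, mirroring the working memory composite. All function names and example values are hypothetical.

```python
from statistics import mean, stdev


def stroop_interference(cong_rt, incong_rt, cong_acc, incong_acc):
    """Return (RT interference in ms, accuracy interference in percentage points).

    Assumed sign convention: congruent minus incongruent RT, and incongruent
    minus congruent accuracy, so larger (less negative) values mean less
    interference, i.e. greater attentional control. Accuracy lists hold 1/0 per trial.
    """
    rt_score = mean(cong_rt) - mean(incong_rt)
    acc_score = 100 * (mean(incong_acc) - mean(cong_acc))
    return rt_score, acc_score


def zscores(xs):
    """Standardize scores across participants (sample SD)."""
    m, sd = mean(xs), stdev(xs)
    return [(x - m) / sd for x in xs]


def attention_composite(rt_scores, acc_scores):
    """Assumed composite: sum of the two z-scored interference measures."""
    return [a + b for a, b in zip(zscores(rt_scores), zscores(acc_scores))]


# Invented trials for one participant: incongruent trials slower and less accurate
rt, acc = stroop_interference([500, 520], [600, 620], [1, 1, 1, 1], [1, 1, 1, 0])
print(rt, acc)
```

With the invented trials above, the RT score is negative because incongruent trials were slower, and the accuracy score is negative because one incongruent trial was missed.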

Discussion
Our findings do not suggest a strong relationship between island sensitivity and individual differences in either working memory, as assessed via the counting and reading span tasks, or attentional control, as assessed via the number Stroop task, contrary to the predictions of the resource-limitation view (Kluender & Kutas 1993; Kluender 1998; 2004). Whether negative DD scores were included in or excluded from the linear regressions, the results overall showed no relationship between these cognitive measures and DD scores for the four island types (p > .05). No more than 3.5% of the variance in DD scores was accounted for by any of the cognitive measures. Furthermore, Bayes factors for these linear regression analyses largely supported the null hypothesis. In light of these results, we argue that island effects are not reducible to processing costs related to working memory or attentional control. The current study thus provides further evidence in favor of the grammatical view of island effects, in line with Sprouse et al. (2012a) as well as recent work on Italian (Sprouse et al. 2016), Norwegian (Kush et al. 2018; 2019), and Akan (Goodluck et al. 2017) (see also Michel 2014; Yoshida et al. 2014; Aldosari 2015).
⁵ We conducted additional linear regression analyses examining each Stroop measure separately (reaction time, accuracy). These analyses yielded the same results as the models using the composite Stroop score, with very low R² values and non-significant slopes for each island type.
Although our results show significant island effects for all four island types tested, sentences in the whether island violation condition were rated higher on average (4.22) than the other island violation types (mean of 2.28 for complex NP, subject, and adjunct). Smaller, more variable island effects have been observed for whether islands, which are considered weak islands, i.e. islands that selectively allow extraction (Chomsky 1986; Cinque 1990; Szabolcsi 2006). Native English speakers have shown variation in their willingness to reject sentences containing extraction from whether islands (Johnson & Newport 1991; Martohardjono 1993; Aldosari 2015), a finding echoed in recent work on whether islands in Norwegian (Kush et al. 2018). In the individual differences analyses, the R² and p-values for the linear regressions indicated no relationship between working memory scores and DD scores for whether islands; however, the Bayes factor for this regression revealed insufficient evidence to support the null hypothesis. Thus, in line with previous studies, the results suggest that island effects differ across constructions, and that the null hypothesis, which is consistent with the grammatical account of islands, is most strongly supported by the results for complex NP, subject, and adjunct islands.
We believe the results provide strong support for Sprouse et al.'s proposal, given that our methods addressed several of the concerns outlined by Hofmeister et al. (2012a). First, in the present study, a declarative background sentence preceded each target sentence to eliminate the pragmatic oddity of being presented with a question in isolation. In addition, the stimuli used complex wh-fillers, which may facilitate the processing of wh-dependencies (Hofmeister & Sag 2010; Goodall 2015) because they have been argued to be more discourse-linked (D-linked) than bare wh-fillers, meaning that the noun phrase refers to a previously introduced entity. Although D-linking has been argued to ameliorate island effects, recent work by Sprouse et al. (2016) found that this amelioration was unable to overcome superadditive island effects. This is in line with our results, given that island effects emerged for all four island types.
An additional criticism was that the serial-recall and n-back tasks used by Sprouse et al. (2012a) were not sufficient measures of working memory capacity because they do not include both a processing component and a storage component. Hofmeister et al. (2012a; b) argued that this could account for Sprouse et al.'s (2012a) failure to find a robust relationship between working memory and island sensitivity. We therefore used the counting span and reading span tasks, both of which include a processing component and a storage component (Conway et al. 2005). We also included an additional cognitive task, the number Stroop task, to measure attentional control, which has been shown to capture individual variability in the processing of wh-dependencies (e.g. Johnson 2015). Despite the use of different working memory tasks and an additional attentional control measure, we still found no robust relationship between individual differences and island sensitivity, which suggests that island effects do not stem from processing difficulties.
Note that the reading and counting span tasks we used measure working memory capacity as described by capacity models of memory in language comprehension (Just & Carpenter 1992; Gibson 2000). As Pañeda et al. (2020) discuss in their recent work on island effects in Spanish, working memory capacity may be an inadequate measure under cue-based retrieval theory, which holds that the critical mechanism in language comprehension is how accurately a comprehender retrieves, rather than stores, information (e.g. Lewis et al. 2006; McElree et al. 2003; for a review see Parker et al. 2017). It is therefore possible that we did not observe a relationship between working memory capacity and island sensitivity because of the memory tasks we used; future research could employ working memory tasks that do not test serial-order recall (Gieselman et al. 2013).

Conclusion
We found no robust relationship between individual differences in working memory and attentional control and island sensitivity. The results of our study, which took several criticisms of Sprouse et al.'s (2012a) approach into account, provide further evidence in line with grammatical theories regarding the source of island effects. Although the extent to which different island phenomena may result from processing pressures remains an intriguing open question, the findings from the current study do not provide evidence supporting an attempt to recast grammatical island constraints for the four island types we tested as due to capacity-based limitations in cognitive resources such as working memory.

Ethics and Consent
The human subjects research was approved by the Institutional Review Board at the University of Kansas, Lawrence (Study No. 00003708).