Verbal short term memory contribution to sentence comprehension decreases with increasing syntactic complexity in people with aphasia

Sentence comprehension requires the integration of linguistic units presented in a temporal sequence based on a non-linear underlying syntactic structure. While it is uncontroversial that storage is mandatory for this process


Introduction
Recovering meaning from a sentence requires comprehension of the words and extraction of the correct dependencies between them. The process is incremental in that words constitute phrases (e.g. noun phrase NP "the dog") to be combined in clauses (e.g. "[the dog] NP bites VP"), which in turn may be combined to construct theoretically infinitely complex utterances, although natural language use may have universal complexity constraints on syntactic structure (Karlsson, 2007). Evidently, comprehension becomes more challenging when the sentence is longer, contains more propositions or is semantically less plausible. However, even if these factors are kept constant, syntactic structure modulates processing demands. As an example, the sentence [1] "The dog bites the man, who eats a sausage." is less challenging than [2] "The man, whom the dog bites, eats a sausage.", despite identical number of words, semantic content and plausibility. In the latter sentence [2] the longer distance between the elements constituting the proposition 'man eats sausage' increases storage demands, while reordering of arguments is necessary to recover the meaning of the proposition 'dog bites man'. The storage challenges posed by centre-embedded clauses suggest that verbal short term memory (STM) is a major modulator of correct comprehension of complex sentences (Papagno and Cecchetto, 2019). Regarding the proposition 'dog bites man' in [2], the ability to perform the syntactic operation 'movement' is mandatory to correctly restore the order of arguments. Theoretical accounts propose that the gap (__) in the object-first relative clause must be filled to restore canonical S-V-O structure ("…whom the dog bites __ [the man]") (Grodzinsky and Santi, 2008). Such manipulations of items stored in STM are traditionally assumed to be afforded by a 'working memory' (WM), governed by a 'central executive' (Baddeley, 2010).

Which information is stored during online sentence processing?
Importantly, both linking elements separated by a long distance and movement of constituents require keeping the elements 'active' until the phrasal or sentential meaning is recovered. Intuitively, therefore, the comprehension of sentences consisting of a linear sequence of elements (words/phrases) governed by an underlying non-linear grammatical structure relies on efficient storage of intermediate results during linguistic processing. If storage is mandatory for sentence comprehension, such storage can pertain to different levels. The 'classical' working memory model proceeds from a phonological loop and buffer (Baddeley and Hitch, 2019), which could store the verbatim representation of elements (i.e. word-forms). However, for sentence comprehension, most current theories additionally assume the relevance of a semantic STM, which interfaces with long-term lexico-semantic representations, holding active the semantic information when accessible (Martin and Schnur, 2019; Martin et al., 2021). As an example, studies in participants with acquired brain lesions document greater difficulties in identifying the implausible item in a sentence like "The rugs, mirrors, and vases cracked during the move." when compared to "The movers cracked the vases, mirrors, and rugs.". Because the former sentence provides the relevant semantic category ('crackability') only after the enumeration of the items, semantic STM is taxed more heavily, suggesting a relevant role in sentence comprehension (Martin and Romani, 1994; Martin and He, 2004). At a higher level, interim storage of the integrated conceptual meaning of phrases or propositions might be afforded by a conceptual STM (e.g. 'an elderly man' or 'elderly man eats sausage' as one storage unit) (Fiebach, Friederici et al., 2007). Indeed, a conceptual short term memory account posits an unconscious, rapid and short-lived conceptual representation of verbal but also pictorial material (Potter, 2012). Similar to the 'episodic buffer' introduced in the revised Baddeley model (Baddeley, 2000), it implies immediate interfacing of incoming information with concepts stored in long-term or semantic memory. This would allow for a given proposition in a multi-proposition sentence to be economically stored at a conceptual level rather than verbatim. Such a view implies that the ease of comprehension of sentences with multiple propositions partially depends on how fast a given phrase/proposition can be 'wrapped up' to be stored in a conceptual form. Not least, beyond storage of phonological, semantic, and/or conceptual information, syntactic features need to be kept 'alive' until a phrase or sentence is completed (Fiebach et al., 2001). This comes into play whenever unintegrated syntactic information has to be kept activated during ongoing sentence processing (e.g. sentences with embeddings).
Despite controversies regarding the exact nature of the short-lived memory traces, all levels may contribute to sentence comprehension. Different parts of a highly distributed short-term memory network may be active during sentence comprehension, with their respective weight depending on task requirements and context (Christophel, Klink et al., 2017). In support of this view, semantic rather than phonological STM was shown to be pivotal for overall sentence comprehension in a study on people with aphasia (PWA); however, in some sentence types (reversible active-passive) the phonological STM measure was a better predictor of sentence comprehension. The proposal of a 'back-up' function of phonological STM when overall sentence comprehension capacities are impaired highlights that recruitment weight may be adjusted if necessary (Horne, Zahn et al., 2022).

Short-term memory impairment impacts on sentence comprehension in people with aphasia (PWA)
Although the relevance of verbal STM/WM is intuitive, the correlation between STM/WM and sentence comprehension has been discussed controversially. Of special interest to the current study is evidence from people with an acquired language disorder. One position is that verbal STM is of importance for processing complex, non-canonical sentences. This is supported by a study on stroke patients showing a correlation between STM (digit and word spans) and sentence comprehension (picture matching) only for syntactically challenging sentence types (semantically reversible, object-cleft, passives). The correlation did not hold for irreversible sentences, which allow for comprehension relying on the semantic content (Pettigrew and Hillis, 2014). More specifically, a study including people with agrammatic and fluent types of aphasia concludes that centre-embedded sentences and object relatives strongly tax STM capacities (assessed by digit-span forward), while deficits of passive-sentence comprehension largely correlate with the core syntactic impairment defining agrammatism (Gilardone et al., 2023). Conversely, a very general impairment of information maintenance has been suggested to drive correlations between sentence comprehension and STM/WM deficits. Using verbal and spatial memory measures (forward and backward digit-/block-span) and an aphasia score including verbal command execution, correlations held for both short-term and working memory and notably also for the spatial domain (Potagas et al., 2011). Similarly, in an attempt to specify the level at which STM/WM supports sentence comprehension, an elegant EEG study in neurotypicals used isochronous (4 Hz) presentation of Chinese syllables. Four syllables constituted a sequence of (syntactically simple) sentences (1 Hz) composed of two phrases (2 Hz). These were presented during the delay period of a task imposing a verbal (digits) or spatial (square position) WM load. The EEG indicated that phrasal (2 Hz) and sentential (1 Hz) processing decreased with increasing verbal and spatial WM load, while low-level syllable processing (4 Hz) increased, suggesting that domain-general WM capacities are central to sentence as opposed to single-word processing (Liang et al., 2022). In stark contrast to the above, some studies suggest that there is no correlation between overall STM capacities and sentence comprehension. In a study assessing visual STM (picture recognition), verbal STM (digit span), sentence repetition (repeating sentences with an increasing number of words) and sentence comprehension (sentence-to-picture matching), no correlation between these measures was seen in 12 PWA and 84 neurotypical controls (Salmons et al., 2022).
A detailed review including studies in people with an acquired language deficit (Papagno and Cecchetto, 2019) arrives at the conclusion that verbal STM is highly relevant for the comprehension of syntactically complex sentences. To demonstrate this correlation, the relevance of testing centre-embedded sentences is highlighted. Another review of the literature (between 1980 and 2017) points out the high divergence of operationalisations of STM/WM and manipulations of sentence complexity used in different studies. While the meta-analysis confirms that overall measures of STM/WM do correlate with measures of sentence comprehension, the authors favour a view in which STM/WM does not assist online processing of the sentence (parsing and interpretation) but rather post-interpretative operations, such as choosing the correct picture to match the sentence (Varkanitsa and Caplan, 2018). This view posits that the standard online processing of a sentence is served by a "separate language interpretation resource" (SLIR), unless ambiguities require reanalysis after sentence completion or highly complex/unnatural syntactic structure requires conscious re-ordering of words/phrases. Such a language-devoted STM system is assumed to be independent from the typically tested STM/WM capacities (e.g. digit spans), as suggested by dissociations between severe STM deficits for spans without corresponding sentence comprehension deficits in clinical populations (e.g. degenerative disease; Caplan and Waters, 1999). Whether or not studies find correlations between measures of STM and sentence comprehension thereby depends on the overall task including post-interpretative judgment.
H. Obrig et al.
Other authors have proposed a 'language-specific' STM/WM, converging on the idea that beyond general STM/WM capacities complex sentences require the use of a storage function for the evolving syntactic structure (Fiebach et al., 2005; Makuuchi et al., 2009; Matchin and Hickok, 2020). A review including the potential neuronal correlates of the multifaceted storage dimensions during sentence comprehension proposes that pars orbitalis and opercularis of the IFG house language-specific, syntactic WM functions, while pars triangularis is relevant for the more general phonological and semantic short term storage (Matchin, 2018).
Proceeding from these different views, we here investigate whether measures of overall sentence comprehension will correlate with STM measures. However, we additionally address the question whether language-specific aspects of the task, such as the increase in the sentence's syntactic complexity, do or do not lead to increased STM relevance, tapping into the controversy of domain-general versus domain-specific processing demands during sentence comprehension (Fedorenko and Thompson-Schill, 2014; Campbell and Tyler, 2018).

Are neuronal correlates of verbal STM and sentence comprehension overlapping?
From a slightly different angle, lesion-behaviour analyses have been used to test whether sentence processing relies on verbal STM capacity. The rationale is that if lesion patterns correlating with either skill overlap, this would speak for a joint processing resource. The strongest claim is based on a large sample of stroke survivors unselected regarding their deficit profile, in whom a measure of auditory STM (digit-span forward) correlated with integrity of a small region in the left posterior superior temporal gyrus (pSTG; Leff et al., 2009). The fact that this lesion also correlated with measures of sentence comprehension is taken as evidence for a joint neural substrate for STM and sentence comprehension. Another study suggests a more complex interaction. In a sample of 58 participants with mixed-aetiology degenerative disease, specific atrophy patterns correlated with poorer performance on more complex sentence types. While overall sentence comprehension correlated with volume reduction in posterior temporal and inferior parietal areas, digit span backward and performance on sentences with multi-proposition relatives showed overlapping correlations in the left inferior frontal gyrus (pars triangularis, IFG triangularis) and left middle frontal gyrus (Amici et al., 2007). In a lesion study using the Token Test as a measure of sentence comprehension, damage to both posterior superior/middle temporal lobe and IFG triangularis was identified as key to performance. Without specific testing, this is interpreted to signal that working memory (besides semantic control) is a key function for performance on the Token Test, which relies on increasingly complex sentential commands (Adezati et al., 2022).
These studies illustrate potentially shared hubs of a large-scale network affording both sentence comprehension and STM/WM capacity. However, either capacity relies on distributed networks, whose respective functional anatomy is controversial. A review of lesion studies highlights that inferior frontal, inferior parietal, superior temporal and anterior temporal cortices have all been considered relevant for sentence comprehension (Wilson, 2017). Regarding verbal STM (operationalized by the digit span), a lesion-behaviour analysis in participants with acquired brain lesions suggests that temporo-parietal cortex and basal ganglia in the left hemisphere are most consistently associated with deficits in digit span. Notably, however, none of the five regions identified selectively predicted a persistent STM deficit, suggesting a mutually compensatory network (Geva et al., 2021). In sum, the overlap of functional anatomy between two functions (STM/WM and sentence comprehension) only partially supports the assumption that overall STM/WM capacities are predictive of complex sentence comprehension.

The current study
Here we address the correlation between verbal STM/WM and the comprehension of complex sentences in a cohort of people with acquired chronic lesions to the (extended) left hemispheric language network (Fedorenko and Thompson-Schill, 2014; Campbell and Tyler, 2018). Participants underwent two experimental tasks. A sentence-comprehension task required participants to choose the picture (out of four) that correctly represented a sentence with three propositions. While length, content and plausibility were kept constant, the sentences were systematically varied along two syntactic dimensions (number of embeddings and argument order). The experimental verbal STM task required participants to judge whether the last (pseudo)word of a sequence of three or five items was identical to any of the preceding ones. Moreover, clinical measures of verbal and non-verbal STM/WM (digit-span/block-span, forward and backward) and other clinical measures were used. We hypothesised that overall performance on the sentence-to-picture matching task should correlate with measures of verbal STM/WM. The key question, however, was whether increasing syntactic complexity would modulate this correlation. If overall verbal STM/WM were central to the processing of syntactically complex structure, an increase in the correlation strength would be expected. Conversely, a decrease would speak for an additional, separable, potentially domain-specific resource affording different aspects of syntactic analysis including storage of interim processing steps. A complementary lesion-behaviour analysis was performed addressing two related questions. The first analysis targets whether lesion patterns for overall performance on both experimental tasks overlap, which would speak for joint resources. A second analysis aims at isolating the potentially independent resource additionally taxed by syntax-related challenges. To this end, the STM capacity was factored out from the difference scores between sentences of high versus low syntactic complexity. In other words, the analysis aimed at isolating specific hubs in the network relevant for coping with increasingly complex syntactic structure when content and overall storage capacities are factored out.

Participants
Forty-three participants with a chronic, acquired left hemispheric brain lesion took part in the study (age: mean ± SD = 50.5 ± 11.09 years [range: 24-74], 20 females). The aetiologies comprised mostly vascular but also other CNS diseases leading to a circumscribed chronic brain lesion (25 ischemic and 6 haemorrhagic strokes, 4 subarachnoid haemorrhages mostly with consecutive vasospastic stroke, and 8 participants who suffered from a lesion due to other aetiologies, see supplemental material SM-1 for more details). Time post onset of the lesion was variable (months post onset, MPO: 28.3 ± 11.09 months [range 3-139]). Participants were selected from a data bank of the Clinic for Cognitive Neurology, University Hospital Leipzig, and the Max Planck Institute for Human Cognitive and Brain Sciences (MPI-CBS). All participants had been treated at the clinic with a focus on cognitive rehabilitation. Exclusion criteria were severe overall cognitive impairment, or an aphasia severity interfering with consent and/or comprehension of the instruction. Presence and profile of aphasia at the time of testing were assessed by the standard German battery (Aachen Aphasia Test, AAT; Huber et al., 1984) and a range of clinical tests. Patients either had AAT-derived syndromes (amnestic n = 7, Broca's n = 3, Wernicke's n = 2) or showed residual aphasia¹ (n = 18). Of those with no manifest aphasia at the time of inclusion, n = 9 had suffered from an aphasia during the acute stage and n = 4 had not shown aphasic symptoms. Details are listed in supplemental material SM-1, including demographic and clinical information and the aphasia type. For the aetiologies other than typical strokes some additional clinical information is supplied there.
¹ Residual aphasia denotes that the Token Test, which the AAT uses to quantify aphasia severity, was within the normal range, but subtests of the AAT including spontaneous speech rating showed clear aphasic symptoms.
In all patients brain imaging allowed for lesion delineation. In 39 patients a high-resolution structural MRI acquired at the MPI-CBS was available (3T scanner; T1 MP-RAGE/MDEFT with 1 mm³ isovoxel; FLAIR image as reference). In three participants clinically motivated MRIs and in one patient a clinically motivated CT were used. All participants gave informed consent according to the Declaration of Helsinki. The experiment was approved by the local ethics committee of the University of Leipzig (Nr.: 144/18-ek, 13.4.2018).

Experimental material and procedure
In an auditory sentence-to-picture matching task, accuracy (ACC, correct/incorrect) and reaction times (RT, s) were recorded. All RTs were log10-transformed. The paradigm targeting the comprehension of differentially complex sentences (SYN, see 2.1.1. and 2.1.2. below) was complemented by results of another experimental paradigm targeting 'phonological working memory' (PhoM, 2.1.3. below) and results from the clinical assessment (see 2.1.4.).

Stimulus material: comprehension of complex sentences (SYN)
Syntactic complexity was systematically manipulated in a stimulus set consisting of 132 German sentences. The sentence material comprises six animate agents (animals: frog, dog, hedgehog, bird, tiger, beetle), six colours (red, green, grey, yellow, blue, brown), six different actions (pull, push, wash, comb, draw, carry), and two emotions (laughing, crying). Only animals with grammatically masculine gender were chosen, since case marking of the article is unambiguous in German only for masculine nouns (e.g. "der [subj] Käfer" / "den [obj] Käfer"). Based on this set, all sentences contained three propositions in a scene with two participants (see Fig. 1A). The propositions were (i) colour of one participant, (ii) emotion of one participant, and (iii) action between the two participants. This allows for the variation along two well-documented factors of syntactic complexity: embedding depth and argument order. Regarding embedding depth (factor EMB; levels: E0 = no, E1 = single, and E2 = double embedding), the three propositions were either presented in canonical sentences linked by the conjunction 'and' (E0) or contained relative clauses allowing for single and double embedding (E1, E2). For the factor argument order, the proposition 'action' was presented either in a subject- or object-first order (factor ARG; levels: S 1st, O 1st). Object-first sentences represent a non-canonical word order and thus enhance syntactic complexity. This results in six conditions: EMB [E0, E1, E2] x ARG [S 1st, O 1st], examples of which are provided below.
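The crossing of the two factors can be made explicit with a short sketch. The level names follow the text; the label format and the function names are illustrative assumptions and not part of the original stimulus scripts.

```python
from itertools import product

# Factor levels as named in the text
EMB_LEVELS = ["E0", "E1", "E2"]   # no, single, double embedding
ARG_LEVELS = ["S1st", "O1st"]     # subject-first, object-first

def design_cells():
    """Return the crossed EMB x ARG conditions as (emb, arg) tuples."""
    return list(product(EMB_LEVELS, ARG_LEVELS))

def condition_label(emb, arg):
    """Hypothetical label format, e.g. 'E2/O1st' for the hardest condition."""
    return f"{emb}/{arg}"

cells = design_cells()
labels = [condition_label(emb, arg) for emb, arg in cells]
print(labels)
```

Listing the cells this way makes it easy to check that every response is assigned to exactly one of the six conditions before means or difference scores are computed.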
The material is identical to that of a study investigating the effect of anodal transcranial DC-stimulation in young neurotypical participants (Krause et al., 2022), and was developed based on a simpler version of the material previously used in children and elderly participants (Antonenko et al., 2013; Fengler et al., 2016). The material used here requires the choice of the correct picture, out of four, corresponding to an auditorily presented sentence. Note that for each proposition one of the incorrect pictures contained a distractor. The pictures appeared only after the full sentence was presented to prevent step-wise exclusion of the distractors.

Procedure (SYN)
A fixation cross was displayed during the full length of the auditory presentation of the sentence (average length: 5500 ms). Thereafter, four pictures appeared on the touch screen and participants had to choose the image matching the content of the sentence. After choice of the picture, visual feedback was given for 1000 ms (green checkmark for correct, red cross for incorrect responses). Feedback was included to increase task adherence. Time-out for a response was 15 s. For an illustration, see Fig. 1B. Reaction times were measured from the onset of the picture array.
Stimuli were presented using the software package Presentation® (Neurobehavioural Systems, Inc., Albany, CA, USA). Pictures were generated using Adobe Illustrator (Adobe Systems Inc., San Jose, USA) and displayed on a 17″ touch screen (ProLite T1732MSC-B1; resolution: 1280×1024, i.e. 1.3 megapixel). The 132 sentences were single-channel recorded by a trained male speaker of German in a sound-proof chamber using the free open source digital audio editor and recording software Audacity (www.audacityteam.org/download/). The audio track was normalized offline to remove any DC offset by centering it on the 0.0 amplitude level. This is important as DC offset may cause clicks in the recording or distortion after running effects on the track. Afterwards, the sentences were edited into separate audio files, remaining clicks were manually cut out or covered with generated silence, and prolonged speech pauses were scaled down.
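The principle behind DC-offset removal can be illustrated in a few lines. This is a minimal sketch of the idea (subtracting the mean amplitude so the waveform is centred on 0.0), not a reproduction of Audacity's internal processing; the sample values are invented for illustration.

```python
def remove_dc_offset(samples):
    """Centre a mono signal on 0.0 by subtracting its mean amplitude."""
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]

# Hypothetical signal: a small oscillation riding on a constant offset of 0.25
signal = [0.25 + x for x in (0.1, -0.1, 0.2, -0.2)]
centred = remove_dc_offset(signal)
print(centred)
```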

Measure of verbal STM / phonological working memory (PhoM)
An experimentally developed battery for verbal STM termed 'Phonological Working Memory battery'² (PhoM) was performed in all participants. All subtests of the battery require a simple forced binary choice by button press. Of relevance when testing people with aphasia, the paradigm does not require verbal production, which may distort results of e.g. digit span measures in people with more severe production deficits. For PhoM, participants listen to a set of words or non-words and then indicate whether a probe word or non-word was part of the set. The material systematically varies lexicality (word / non-word), length (mono-/bi-syllabic), set size (3 / 5 items), similarity (only 1 phoneme different / phonetically dissimilar), and phonotactic complexity (no / complex onset-clusters). Details are being published elsewhere; examples are provided in supplemental material SM-2. In the current study we use averaged performance across all sub-tests, which comprises 216 trials in total (accuracy in % correct and mean of log10-transformed RTs in s). The measure is referred to as PhoM MEAN henceforth.

Clinical tests
We additionally included three test results from the clinically motivated test batteries assessed in the participants, targeting (i) alertness, (ii) verbal and (iii) spatial STM / working memory. As an estimate of alertness, subtests of the standard battery for the assessment of attention in Germany were used (Testbatterie zur Aufmerksamkeitsprüfung, TAP; Pflueger and Gschwandtner, 2003). The two subtests assess simple stimulus response latencies with / without a warning tone (phasic / tonic alertness). Here we use the log10-transformed latencies (in ms) averaged across the two subtests. The measure is henceforth referred to as log ALERT. As a clinical estimate of verbal short term memory the Digit Span, and for spatial short term memory the Block Span, both from the German version of the Wechsler Memory Scale, were used (Härting et al., 2004). For both spans the mean chain lengths of two test blocks were used (e.g. 4.5 could mean recall of 4 digits in the 1st and 5 in the 2nd test run). Only forward spans were used and are henceforth referred to as DS fw and BS fw³.

Statistical analyses of behavioural measures
We calculated means per participant and condition, resulting in % correct for ACC and mean log10 RT. Moreover, individual responses were used for (generalized) mixed models. After a survey of the descriptive statistical properties, the analyses of the data were performed stepwise with increasing complexity.

Correlation analyses
Based on the means, we analysed correlations between (i) the results of the sentence-comprehension task (SYN), (ii) PhoM MEAN, (iii) the three clinical tests: log ALERT, DS fw, BS fw, and (iv) three epidemiological parameters: age, log-transformed months post onset (log MPO), and lesion size. For lesion size the diameter of a sphere corresponding to the overall lesion volume was calculated (DiaSph). Pearson's R or Spearman's ρ were used depending on the statistical properties of the respective measures. The results allow for a ranking of the mutual correlation strengths between the performance on the sentence-to-picture matching and the other parameters.
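For illustration, the conversion from lesion volume to the diameter of a volume-equivalent sphere follows from V = (4/3)πr³, i.e. d = 2·(3V / 4π)^(1/3). The sketch below assumes the volume is given in cm³, which yields a diameter in cm; the function name is hypothetical.

```python
import math

def sphere_diameter(volume_cm3):
    """Diameter (cm) of a sphere whose volume equals the lesion volume (cm^3)."""
    if volume_cm3 < 0:
        raise ValueError("volume must be non-negative")
    return 2.0 * (3.0 * volume_cm3 / (4.0 * math.pi)) ** (1.0 / 3.0)

# A sphere of radius 1 cm has volume 4/3 * pi cm^3 and therefore diameter 2 cm
print(sphere_diameter(4.0 / 3.0 * math.pi))
```

Because the diameter grows with the cube root of the volume, this transformation compresses the influence of very large lesions relative to using raw volume.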

(Generalized) linear mixed models
Next, we constructed (generalized) mixed models based on all individual responses. For ACC the logit function (log[odds(correct/incorrect)]) in a generalized linear mixed model was used. For log10 RT linear mixed models were evaluated. All models comprised the two factors of syntactic complexity (EMB: E0, E1, E2 and ARG: S 1st, O 1st) and their interaction (EMB*ARG). Participant and Stimulus were always modelled as random intercepts. In a stepwise fashion we then added covariates to the models including all other measures (PhoM MEAN, three clinical tests, three epidemiological parameters). The order of the stepwise addition was decided by the strength of correlation in the initial correlation analysis. To evaluate which of the resulting eight models (basic + 1-7 covariates) best fits the data we used Akaike's information criterion (AIC) and fixed-effects statistics. The (G)LMM analyses will yield two important results. Firstly, they indicate whether there is an effect of the two factors of syntactic complexity. Additionally, they identify which of the seven additional factors, including measures of short term memory, significantly contribute to the variance in performance on the sentence-to-picture matching.
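The two quantities named above can be written out as a small sketch: the logit link, logit(p) = log(p / (1 − p)), and the AIC, AIC = 2k − 2·ln(L), where k is the number of estimated parameters and L the maximised likelihood. The model names, log-likelihoods and parameter counts below are invented for illustration only and are not results of the study; fitting the mixed models themselves is outside the scope of this sketch.

```python
import math

def logit(p):
    """Log-odds of a probability p with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def aic(log_likelihood, n_params):
    """Akaike's information criterion; lower values indicate better fit."""
    return 2.0 * n_params - 2.0 * log_likelihood

def best_by_aic(models):
    """models maps a name to (log_likelihood, n_params); return the lowest-AIC name."""
    return min(models, key=lambda name: aic(*models[name]))

# Hypothetical values, NOT taken from the study
candidates = {
    "basic": (-1500.0, 8),
    "basic+PhoM": (-1480.0, 9),
    "basic+PhoM+DSfw": (-1479.8, 10),
}
print(best_by_aic(candidates))
```

In this toy case the third model barely improves the likelihood, so the penalty for the extra parameter outweighs the gain and the two-term model is preferred.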

Linear regression
Finally, we performed linear regression analyses. As opposed to the (G)LMM analyses, linear regression allowed for separate assessment of each level of sentence complexity and also for difference scores. The 7 parameters were added in a step-wise fashion in analogy to the (G)LMM. F-statistics were used for model comparison (i.e. whether the addition of another covariate increases model fit in a statistically significant way). Akaike's information criterion (AIC) was also calculated as an additional measure of the model fit.
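The F-statistic for comparing two nested regression models can be sketched as follows, using the standard formula F = [(RSS_reduced − RSS_full) / (p_full − p_reduced)] / [RSS_full / (n − p_full)], where RSS is the residual sum of squares and p the number of estimated coefficients including the intercept. The numbers in the example are hypothetical; obtaining the p-value additionally requires the F distribution, which is not part of the Python standard library.

```python
def nested_f_statistic(rss_reduced, rss_full, p_reduced, p_full, n_obs):
    """F-statistic testing whether the extra covariates of the full model improve fit."""
    if p_full <= p_reduced:
        raise ValueError("full model must have more parameters than reduced model")
    df_num = p_full - p_reduced
    df_den = n_obs - p_full
    if df_den <= 0:
        raise ValueError("not enough observations for the full model")
    return ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)

# Hypothetical example: 43 observations, one covariate added to an intercept + 1 predictor model
f_value = nested_f_statistic(rss_reduced=10.0, rss_full=8.0,
                             p_reduced=2, p_full=3, n_obs=43)
print(f_value)
```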

Lesion-Behaviour correlations
We assessed correlations between lesion site and performance in a voxel-based lesion-behaviour analysis. To this end, lesions were manually delineated on each slice of the T1-images using MRIcron (Rorden and Brett, 2000) with FLAIR-images as a reference. For normalization and transformation of the lesion masks into standard stereotactic space (MNI) the 'clinical toolbox' (www.nitrc.org/projects/clinicaltbx/) in SPM12 (fil.ion.ucl.ac.uk/spm) was used, applying the unified segmentation approach (Ashburner and Friston, 2005) and restricting estimation of normalization parameters to healthy tissue (Brett et al., 2001). Clinical MRI-images (n = 4 participants) and the CT (n = 1 participant) with a resolution other than the in-house MRIs were interpolated to 1 mm³ images. The lesion overlay map of the 43 participants is shown in Fig. 2A. Fig. 2B illustrates the coverage in the most relevant areas for syntax processing, as will be discussed below. Note that for all regions at least one third of the participants showed lesions, and that ≥ 10 participants had lesions affecting ≥ 10 % of the region. The atlas used for the percentages is the AAL (automated anatomical labelling; Tzourio-Mazoyer et al., 2002).
Correlation analysis between behaviour and lesion pattern was performed with the voxel-based lesion-symptom mapping software 'vlsm2', developed for the first VLSM study and regularly revised by Stephen Wilson (https://langneurosci.org/vlsm/; Bates et al., 2003). Based on the binary lesion maps, t-statistics determine whether a voxel correlates in a statistically significant way with performance in the behavioural measure of interest. To tackle the issue of multiple comparisons, the permutation method (2000 permutations) was used to correct for false positives. We report cluster-based correction, meaning that after a statistical base-map is generated, clusters surviving the threshold are corrected for multiple comparisons, yielding a corrected p at the cluster level. An advantage of the vlsm2 software is that it provides results based on different p-value levels of the 'base' map (level of p < 0.001, p < 0.005 or p < 0.010 for the voxel-wise t-test before cluster-wise correction). In Table 3 the lowest significant threshold is indicated in the column 'p @ base'. Clusters are reported if they survived correction at p < 0.05; trends (p < 0.1) are reported if relevant for the central research question.
The results of this analysis were checked with a multivariate approach based on support-vector regression (Zhang, Kimberg et al., 2014) as implemented in the SVR-LSM toolbox (DeMarco and Turkeltaub, 2018). The multivariate approach takes inter-voxel correlation into account and estimates the lesion-symptom map at all voxels simultaneously in a single model. We deliberately used both approaches, since type I and type II errors differ, especially in relatively small samples. Convergence of results may augment confidence in the respective lesion-behaviour correlation.
In all lesion-behaviour analyses age and time since lesion onset were entered as covariates to factor out these unspecific effects of no interest. Since plasticity-related changes are more likely in the early chronic stage and become less likely with increasing chronicity, the log-transformed months post onset (log MPO) were used. Moreover, lesion size was included as a covariate using the diameter of a sphere (in cm) corresponding to the lesion volume. Factoring out lesion volume is proposed for most conventional and also for multivariate approaches, since larger lesions increase the risk of falsely ascribing behaviour variation to a specific voxel. Only voxels lesioned in at least 10 % of participants (four participants) were included in the analyses.
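The voxel inclusion criterion can be sketched as a simple overlap count across binary lesion maps. The toy data below (five participants, four voxels, flattened to lists) and the function names are illustrative assumptions; the actual analysis operates on 3D images within the vlsm2 software.

```python
def lesion_overlap(lesion_maps):
    """Count, per voxel, how many participants have a lesion at that voxel."""
    return [sum(voxel) for voxel in zip(*lesion_maps)]

def inclusion_mask(lesion_maps, min_count):
    """True for voxels lesioned in at least min_count participants."""
    return [count >= min_count for count in lesion_overlap(lesion_maps)]

# Toy example: one row per participant, one column per voxel (1 = lesioned)
maps = [
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
overlap = lesion_overlap(maps)
mask = inclusion_mask(maps, min_count=2)
print(overlap, mask)
```

With the study's criterion, `min_count` would be set to four participants; voxels below that count are excluded because too few lesioned cases are available for a meaningful comparison.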
To identify differential regions relevant for overall performance on the sentence-to-picture matching task (SYN MEAN ) and on the task assessing verbal short term memory (PhoM MEAN ), the respective other result was factored out (i.e. SYN MEAN with the additional covariate PhoM MEAN , and vice versa). Next, we conducted lesion-behaviour correlations for individual difference scores. Three difference scores were calculated: E1-E2, S 1st -O 1st , and E0/S 1st -E2/O 1st . These were intended to capture both factors (EMB and ARG) and their interaction. For these analyses PhoM MEAN was also factored out. Note that for the difference scores, numerically larger (i.e. more positive) values for accuracy denote a larger decrease in performance for the more challenging condition. In other words, lesion sites correlating with larger values lead to a more pronounced decrease in performance with increasing syntactic complexity.
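The sign convention of the difference scores can be made concrete with a short sketch. The dictionary keys and the accuracy values below are invented for illustration and are not data from the study.

```python
def difference_scores(acc):
    """Return the three difference scores (simpler minus more complex) from accuracies.

    `acc` maps a label to an accuracy (proportion correct). The label strings
    are hypothetical names chosen for this sketch.
    """
    return {
        "E1-E2": acc["E1"] - acc["E2"],
        "S1st-O1st": acc["S1st"] - acc["O1st"],
        "E0/S1st-E2/O1st": acc["E0/S1st"] - acc["E2/O1st"],
    }

# Invented accuracies for one hypothetical participant (not study data).
example_acc = {
    "E1": 0.80, "E2": 0.60,
    "S1st": 0.85, "O1st": 0.70,
    "E0/S1st": 0.95, "E2/O1st": 0.55,
}
scores = difference_scores(example_acc)
```

With these invented values every score is positive, i.e. accuracy drops for the more complex structure; a lesion site correlating with larger scores would therefore be associated with a stronger complexity-related decline.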
If all participants had performed at chance level for a given condition (e.g. E2/O 1st ) or factor level (E0, E1, E2 and S 1st , O 1st ), correlations might be uninformative. Therefore, we calculated confidence intervals for the individual performance for conditions and factors. This was done by assessing the probability P(k) of observing a hit rate H of k hits in n trials purely by chance (Steffens et al., 2020).
Averaged across participants, PhoM performance was 89 % ± 6.4 (SD). At the individual level, none of the participants performed below chance level (confidence interval 48-52 % with 216 trials, chance level 50 %). Reaction times were 1.3 s ± 1.63 (SD). RTs were log10-transformed for the ensuing analyses.
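The chance-level check can be sketched with the binomial distribution. This is one plausible reading of the procedure, assuming independent trials with a fixed guessing probability; it is not necessarily the exact computation of Steffens et al. (2020), and the function names are hypothetical.

```python
from math import comb

def prob_k_hits(k: int, n: int, p_chance: float = 0.5) -> float:
    """Binomial probability P(k) of exactly k hits in n trials when guessing."""
    return comb(n, k) * p_chance**k * (1.0 - p_chance) ** (n - k)

def prob_at_least(k: int, n: int, p_chance: float = 0.5) -> float:
    """Probability of k or more hits in n trials when guessing (upper tail)."""
    return sum(prob_k_hits(i, n, p_chance) for i in range(k, n + 1))
```

A participant whose number of hits has a very small upper-tail probability under guessing can be regarded as performing above chance; `p_chance` has to be set to the chance level of the respective task.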

Correlation between experimental, clinical and epidemiological parameters
Next, we performed a correlation analysis between all parameters of the experimental paradigm (SYN), the phonological short term memory task (PhoM MEAN ), the three clinical tests (DS fw , BS fw , ALERT) and the three epidemiological measures (age, MPO, DiaSph). Supplemental material SM-3.3 provides those results that are of interest to our research question, that is, which factors predict performance on the comprehension of complex sentences. For accuracy, performance on PhoM MEAN correlated strongly with all conditions except the most challenging one (E2/O 1st ). Moreover, PhoM MEAN also correlated with most of the difference scores. The positive values of r/ρ indicate that better performance on PhoM MEAN predicts a larger decrease in performance with increasing syntactic complexity. The clinical measure for verbal working memory (DS fw ) also correlated with the conditions and factor levels but not with the difference scores. Age, lesion size (DiaSph) and time since disease onset (log MPO) showed weak correlations with some conditions. BS fw and ALERT showed no correlations. Reaction times in the sentence-to-picture matching task correlated with age and lesion size (DiaSph), largely indicating that older participants and those with larger lesions were overall slower. Notably, performance in PhoM MEAN and DS fw did not consistently correlate with the speed of correct sentence comprehension.

(Generalized) mixed model approach, (G)LMM
(G)LMMs were constructed to test (i) the effect of syntactic complexity on performance and (ii) whether the addition of PhoM MEAN and/or other parameters would increase the model fit. The incremental addition of covariates was based on the correlation analysis. For accuracy the stepwise addition was: PhoM MEAN → DS fw → age → DiaSph → MPO → ALERT → BS fw . The resulting models (M1-M8) are listed in Table 1A. Based on Akaike's information criterion (AIC), the addition of PhoM and DS fw led to relevant increases in model fit, while none of the additional factors improved model fit. Because addition of BS fw substantially decreased AIC, an additional model (M9) with PhoM MEAN , DS fw and BS fw was evaluated. This model had a lower R 2 -conditional, moreover, DS fw […] The […] of the two best fitting models (M3 for both parameters) are listed in Table 2. For accuracy, all main effects and post hoc tests were significant. The significant interaction EMB*ARG was driven by a lesser influence of ARG for E2 (for illustration see graph with all conditions in supplemental material SM-3.1). As expected, the covariates PhoM MEAN and DS fw also showed significant effects. For reaction times, main effects for EMB and ARG were significant, but not their interaction. Post hoc testing showed significant differences for all comparisons except for E0 - E1. Statistical results are indicated in Fig. 3.
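The AIC-based comparison of incrementally extended models can be sketched as follows. The log-likelihoods and parameter counts are invented placeholders, not values from Table 1, and the helper names are hypothetical; only the definition AIC = 2k - 2 ln(L) is standard.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's information criterion: 2k - 2 ln(L); lower values indicate better fit."""
    return 2 * n_params - 2 * log_likelihood

def best_by_aic(models: dict) -> str:
    """Return the name of the model with the lowest AIC.

    `models` maps a model name to a (log_likelihood, n_params) tuple.
    """
    return min(models, key=lambda name: aic(*models[name]))

# Invented placeholder fits (NOT the values reported in Table 1).
example_models = {
    "M1": (-120.0, 4),  # syntactic factors only
    "M2": (-110.0, 5),  # one additional covariate
    "M3": (-109.8, 6),  # two additional covariates
}
winner = best_by_aic(example_models)
```

In this invented example the second covariate raises the log-likelihood by too little to offset the penalty for the extra parameter, so the smaller model is preferred; this is the trade-off that AIC formalises.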

Linear regression analysis
In the next step, we performed a linear regression analysis to find out whether the levels of syntactic complexity differed in the degree to which PhoM MEAN , clinical and epidemiological measures predict performance. The stepwise procedure was the same as for the (G)LMM, and comparison between models used F statistics. The full results are provided in supplemental material SM-3.4. The analysis confirmed the results for the (G)LMMs, in that PhoM MEAN and DS fw were the only factors contributing significantly to the regression fit for accuracy, while for reaction times age and lesion size (DiaSph) were the only relevant factors significantly improving the model. In addition, the regressions showed that the explained variance (adjusted R 2 ) differed between conditions, factor levels and difference scores, as is illustrated in Fig. 4.

[…] 2). Numerical information on means and a graphical representation of all six conditions are provided in supplemental material SM-3.2. EMB: embedding depths (E0, E1, E2); ARG: argument structure, i.e. subject-/object-first (S 1st /O 1st ).

Table 1
(A) Results of the GLMMs based on accuracy (correct/incorrect) with increasing number of covariates; (B) LMMs based on reaction times (log10 RT) with increasing number of covariates. Best fitting models are framed in solid lines.

For accuracy, the measures of verbal short term memory (PhoM MEAN and DS fw ) explained ~40 % of the variance for E0- and E1-sentences but did not explain significant variance for the most challenging E2-sentences. Moreover, adjusted R 2 was lower for O 1st when compared to S 1st sentences, and neither PhoM MEAN nor DS fw contributed to the explanation of the difference between the two argument structures (S 1st -O 1st ). Interestingly, the prediction of O 1st -sentences was improved by the addition of the factor DS fw , while performance for S 1st -sentences was best predicted by PhoM MEAN alone. For reaction times, age and lesion size (age / DiaSph) also differentially predicted the speed of deciding on the sentences' meaning. The two epidemiological factors predicted less variance for the E2-sentences and showed a lower R 2 for O 1st versus S 1st sentences. Note that PhoM MEAN and DS fw did not consistently predict reaction times for any sentence type (see supplemental material SM-3.4 for details).

Lesion symptom analysis
The lesion-behaviour analysis targeted the question of whether specific lesion sites modulate performance in aspects of the experimental paradigms. We only report results for accuracy, since analyses for RT did not result in consistent clusters. For overall performance on the SYN-task (SYN MEAN ), a lesion cluster in the anterior portion of the left middle temporal gyrus (a-MTG), extending to the superior temporal gyrus and the temporal pole, correlated with worse performance. Worse performance on the PhoM-task (PhoM MEAN ) projected to a cluster in the left inferior parietal lobe which extends into the underlying white matter. Fig. 5 visualizes the respective clusters.
For the SYN-task we additionally analysed three contrasts between simple and complex syntactic structure (E1-E2, S 1st -O 1st and E0/S 1st -E2/O 1st ). For the contrast E1-E2, lesions in a cluster in the mesial anterior temporal lobe (with a very marginal extension to the frontal operculum) led to a more pronounced decrease in accuracy with the additional embedding. For argument order (S 1st -O 1st ), lesions in a more lateral cluster in anterior temporal cortex decreased performance more strongly for non-canonical argument order. The contrast between the easiest and the most challenging condition (E0/S 1st -E2/O 1st ) yielded a cluster in between and partially overlapping with these clusters. The clusters for the difference scores are illustrated in Fig. 6, and statistical results and MNI-coordinates are provided in Table 3. Our confirmatory analysis using the multivariate approach (SVR-LSM toolbox, DeMarco and Turkeltaub, 2018) qualitatively yielded the same results. The figure in supplemental material SM-4 shows that results from this approach largely overlap with the primary VLSM-analysis.

Fig. 4. Results of the linear regression analysis for accuracy (ACC, upper graphs) and reaction times (RT, lower graphs). Adjusted R 2 are plotted for all factor levels and difference scores (see supplemental material SM-3.4 for details). For both ACC and RT, models with one (M2) or two (M3) covariates are shown; note that these covariates differ between ACC and RT. n.s. indicates that none of the models was significant. Arrows (↓) indicate the 'winning' model (i.e. either M2 or M3). M3 models for the difference scores are not plotted, because none of these was significantly better than the respective M2 models.

Discussion
The correlation between measures of overall verbal STM/WM and the comprehension of (complex) sentences is controversial, ranging from the view that general STM/WM capacities are mandatory for sentence processing (Pettigrew and Hillis, 2014) to the alternative notion that linguistic processing during sentence comprehension is largely independent from the less domain-specific STM/WM capacities (Varkanitsa and Caplan, 2018; Salmons et al., 2022). The results of the current study speak for an intermediate position. While the two measures of verbal STM were the strongest predictors of overall performance on the sentence-to-picture matching task, this was driven by the syntactically less complex versions of the sentences. For multiple embedding (E2) and non-canonical object-first (O 1st ) sentences the correlation was weaker or even absent, suggesting that additional domain-specific linguistic capacities come into play with increasing syntactic complexity. Note that sentence length, semantic plausibility and post-interpretative factors (e.g. picture matching) were kept constant; therefore the relevance of more general STM/WM capacities does not decrease with syntactic complexity, but additional language-specific resources need to be recruited. The complementary lesion-behaviour analyses yielded results partially supporting and extending these findings. Correlating lesion patterns with overall performance on the sentence-to-picture matching (SYN) and the verbal STM-task (PhoM), respectively, resulted in non-overlapping clusters in the temporal lobe (sentence-task) and in the parietal lobe (STM-task). Since both tasks rely on large-scale networks rather than single brain areas, this does not exclude partially overlapping neuronal resources, but suggests that the recruitment strength of the respective network hubs dissociates. Tapping into the question of which language-specific resources are recruited during sentence comprehension, another lesion-behaviour analysis provided lesion clusters which lead to a larger decrease in accuracy with increasing syntactic complexity of the sentences. The resulting clusters in the anterior temporal lobe, marginally extending into the frontal operculum, do not coincide with the 'classical' inferior frontal and posterior temporal syntax-hubs; however, the ATL has been discussed in the context of speech comprehension. The cluster found in the present study (ATL with marginal extension into frontal operculum) is situated at the interface between the target regions of the ventral stream of language processing (Hickok and Poeppel, 2007; Mesulam, 2023) and inferior frontal areas which may afford the most basic syntactic operations (Zaccarella and Friederici, 2015).

[…] 3. Coronal slices illustrate that the cluster for PhoM MEAN extends substantially into the underlying white matter.

Overall performance on sentence-to-picture matching correlates with measures of verbal STM
The experimental verbal STM-task and digit-span forward (PhoM + DS fw ) were the only two parameters significantly improving the model fit for accuracy (Table 1A), explaining 42 % (PhoM) and 22 % (DS fw ) of the variance according to the simple correlation analyses (supplemental table SM-3.3). The finding indicates that recall of phonological and lexical information (PhoM) and the repetition of meaningless digit sequences (DS fw ) are predictors of correct sentence comprehension, thus supporting the view that general STM/WM capacities do play a role in complex sentence comprehension (Papagno and Cecchetto, 2019). Notably, the correlation between STM and connected speech comprehension may be bidirectional. In a study using German adjective-noun and noun-adjective pairs for which the inflection of the adjective was either incongruent or congruent with the noun, only the latter improved recall. This is taken as evidence that (morpho)syntactic features are relevant for storage in the STM (Schweppe et al., 2022). Regarding the interaction between core-linguistic (syntactic) operations and STM, this speaks for partially separate but interacting resources.
A pointed view is that STM/WM capacities are part of the "domain-general machinery" during language processing (p. 125 in Fedorenko and Thompson-Schill, 2014). It should be noted, however, that the correlation demonstrated in the present study pertains to the accuracy of the overall tasks. It does not allow for a differentiation between recruitment of phonological, lexico-semantic or conceptual STM-storage (Martin and He, 2004; Fiebach et al., 2007). Moreover, post-interpretative steps of the task, including correct visual analysis and choice of the four pictures, may contribute to the correlation with overall task performance. For reaction times, only age and size of the lesion (DiaSph) improved the model fit. This is in line with age-related slowing of sentence processing in neurotypicals (Caplan et al., 2011; Malyutina et al., 2018) and overall slowing of language tasks in PWA (Faroqi-Shah and Gehman, 2021). In sum, participants whose STM-capacities were less affected performed more accurately on the challenging sentence comprehension task, while their speed largely depended on their age and the severity of the lesion.

Fig. 6. (A) illustrates the clusters correlating with a stronger decrease in accuracy for double when compared to single embedded sentences (E1-E2, red) and a stronger decrease in accuracy for object- compared to subject-first relative clauses (S 1st -O 1st , blue). (B) additionally provides the cluster correlating with the difference between the easiest and most challenging condition (E0/S 1st -E2/O 1st , green). Clusters were corrected for multiple comparison by permutation; for the numerical results please refer to Table 3. Axial slices additionally illustrate overlaps between the clusters.

Table 3
Results of the VLSM analyses. The arrows in the column 'lesion →' indicate: ↓ = lesion leads to lower numerical value indicating lesser performance on the PhoM or SYN tasks. ↑ = lesion leads to larger difference between the easier and more complex syntactic structure of the sentences, indicating a larger comprehension deficit for the syntactically complex sentences after factoring out the influence of PhoM. p @cluster = p-value at the cluster-level, p corr = p for the cluster after correction (permutation analysis).
In people with chronic lesions, correlations may additionally be driven by compensatory mechanisms. In this vein, a recent study tested 20 stroke survivors (14 PWA) on a sentence-to-picture matching task with/without background noise. Besides the unsurprising finding of an increase in RTs for the noise conditions, working memory resources were the most relevant predictor of how well the additional cognitive load was compensated for (Fitzhugh et al., 2021). Another study on 134 neurotypical participants demonstrated that age correlates negatively with STM/WM-measures and sentence comprehension. Importantly, however, with increasing age the correlation between STM/WM and sentence comprehension became stronger, suggesting that good STM/WM capacities allow for compensation of difficulties understanding complex sentences (Sung et al., 2017).

An increase in syntactic complexity reduces the correlation between sentence-comprehension-task performance and verbal STM measures
Stepwise linear regression for all six syntactic complexity levels confirmed the results of the (generalized) mixed model approach in that only verbal STM measures (PhoM, DS fw ) contributed to the prediction of accuracy, while only epidemiological factors (age, lesion size) improved regression models for reaction times. Importantly, however, the predictive strength of verbal STM-measures for correct sentence-to-picture matching decreased with increasing syntactic complexity of the sentences. Accuracy for sentences with two embeddings (E2) was not significantly predicted by verbal STM measures. For object-first versus subject-first relative clauses (O 1st vs. S 1st ) the explained variation dropped by ~20 % or ~10 %, depending on whether only PhoM or both STM measures (PhoM + DS fw ) were included in the model. These results imply that resolution of syntactic complexity does not rely on overall STM measures alone but additionally taxes a "separate language interpretation resource" (Varkanitsa and Caplan, 2018). Such a dissociable resource may afford syntactic computations including increased storage demands caused by longer distance between constituents of a proposition (embedding) or gap-filler relations in non-canonical argument order. This view is supported by the finding of dissociable neuronal correlates of such a 'syntax-devoted' STM (Makuuchi et al., 2009). Contrary to our finding, Pettigrew and Hillis (2014) report that STM measures were relevant for more complex sentences. The difference may stem from a less parametric syntactic manipulation including semantic aspects (largely reversible versus irreversible sentences), and from the fact that only digit- and word-spans were used to assess STM.

Deficits of the sentence-picture matching task and the STM-task correlate with differential lesion patterns in the extended language network
Regarding overall mean performance on either the STM-task or the sentence-to-picture matching task, lesion-behaviour correlations yielded non-overlapping clusters. The cluster for the STM-task comprised parietal cortical areas and parts of the underlying white matter including the large parieto-frontal tracts. A review of lesion studies indeed highlights the role of the inferior parietal and prefrontal cortex and their interaction for STM/WM functions (Muller et al., 2002; Muller and Knight, 2006). However, a more general view doubts the quest for a specialized "STM/WM-area", arguing that the neuronal correlates of maintaining and manipulating information to perform a specific task are highly distributed, depending on the specifics of the sensory information and the task (Christophel et al., 2017). Similarly, lesion correlates of impaired sentence comprehension comprise areas in the inferior frontal, superior temporal and inferior parietal cortices, and other regions (Wilson, 2017), in line with the notion that language comprehension is a function of a large-scale left-lateralized circuit (Friederici, 2012). Therefore, we judge our results on the mean performance of either task to illustrate differential recruitment in partially overlapping networks affording the two tasks. This is in line with the interpretation of the correlational analyses of the behavioural variables, which indicates that verbal STM-capacity is but one modulator of complex sentence comprehension.

Syntactic complexity resolution involves neuronal resources in the anterior temporal lobe
Extending our behavioural results, the second lesion-behaviour analysis shows that lesions in the anterior temporal lobe (ATL) lead to a larger decline in task performance with increasing syntactic complexity. The ATL is most prominently discussed as the target area of the ventral ('what') stream, which unifies percepts from different modalities into increasingly amodal representations (Ralph et al., 2017). Gradients for auditory, visual and limbic-evaluative connections have been described (Mesulam, 2023). Regarding language processing, Mesulam states that the "… TPR [temporal-polar region] is interconnected with the other two major epicentres on the language network in the inferior frontal gyrus (Broca's area) and temporoparietal junction (Wernicke's area)…" (p. 31, in Mesulam, 2023), suggesting a relevant role in language comprehension. Beyond such general relevance in mapping linguistic input to meaning, the role of the ATL in syntax processing is controversial. For overall sentence comprehension it has been attested in neurotypicals, mostly by increased recruitment for processing syntactically structured sentences compared to random-ordered word lists (Humphries et al., 2006; Udden et al., 2022). More specifically, MEG data in neurotypicals suggest that the ATL supports sentence comprehension by incremental combinatoric operations for parsing (Brennan and Pylkkanen, 2017) and during story-listening, when syntactic structure building demands are used as a predictor for fMRI data (Brennan et al., 2012). In an attempt to disentangle syntactic from compositional semantic processes, an elegant study using attentional modulation showed that the largest part of the ATL recruited during sentence-specific processing affords both syntactic and compositional semantic processing, while a smaller part is more specifically involved in semantic composition (Rogalsky and Hickok, 2009). In another fMRI study on a large neurotypical sample, a widely distributed left-lateralized network was identified. Notably, left IFG and middle temporal areas were central to processing of syntactic complexity resulting from left-branching structure (i.e. elements of a syntactic structure preceding the head / verb of the sentence). Conversely, ATL activation was correlated with the measure of the less challenging right-branching dependencies (Udden et al., 2022). This only partially converges with the current results, but stresses the relevance of 'unification' of parts of a phrase or clause in concert with mnemonic and control mechanisms during the comprehension of (syntactically) complex sentences (Hagoort, 2013). In sum, the 'classical' frontal and temporal pivots of the language network but also the anterior temporal lobe may contribute to syntactic processing. While their respective roles are controversially discussed, the ATL may be especially relevant for local / phrasal structure building (Friederici, 2017).
In people with acquired brain lesions, support for a role of the ATL in syntax processing comes from a VLSM study in 50 stroke survivors who performed a sentence-to-picture matching task. PWA showed strongly augmented difficulties with non-canonical when compared to canonical sentences after damage to the left ATL including the temporal pole, while overall task performance relied on the integrity of a large network in temporal, parietal and occipital areas (Magnusdottir et al., 2012). Similarly, contrasting chronic-phase stroke survivors (posterior lesions) and people after left anterior temporal lobectomy, ATL lesions strongly impaired object- when compared to subject-first relative clauses in a sentence-to-picture matching task (Rogalsky et al., 2018). However, the most robust lesion-correlation with overall agrammatic comprehension projected to posterior temporal and inferior parietal areas. A specific role of the ATL for language comprehension in aphasia is also proposed by a lesion-behaviour and connectivity study. Spared connections between the ATL and inferior frontal and posterior temporal areas, as well as inter-hemispheric connectivity between ATL-homologues, predicted better comprehension of sentences and narratives (Warren et al., 2009). This may stem from the ATL's eminent role in binding meaningful lexical information together when interfacing semantic and syntactic information (Vandenberghe et al., 2002). Another lesion study suggests basic morphosyntactic operations to be afforded by the ATL (Dronkers et al., 2004). However, some studies showed no relevant contribution of the ATL to syntax processing. This includes work in PWA showing Token-Test performance to correlate with lesions in the posterior superior temporal lobe and IFG triangularis , while ATL and inferior parietal lesions did not (Adezati et al., 2022). Similarly, an fMRI study comparing people suffering from the semantic variant of primary progressive aphasia (svPPA) and neurotypical controls provided no evidence of relevant modulation of ATL activation with increasing syntactic complexity (sentence-to-picture matching for variably complex sentences). Conversely, IFG opercularis , posterior STS/MTG/ITG and bordering inferior parietal cortex showed recruitment in both groups. The authors moreover highlight the rather well preserved syntactic abilities in the svPPA group, in whom up to 40 % ATL-volume reduction was shown by VBM (Wilson et al., 2014). Interestingly, a follow-up on the above-mentioned lesion study suggesting ATL involvement (Magnusdottir et al., 2012) did not reproduce the finding. Instead, lesions in pSTG correlated with lesser performance on non-canonical sentences after factoring out the performance on canonical sentences (Kristinsson et al., 2020). Since IFG-lesions had an additional but weaker effect, the authors conclude that posterior temporal and inferior parietal areas are relevant for complex sentence comprehension, while the IFG plays a complementary role.
Our results suggest a relevant role of the ATL for the resolution of syntactic complexity. Tentatively, this indicates that the ATL affords local structure building and may afford combining multi-word phrases into a single conceptual representation (e.g. '.. the frog, who is brown, …' → [brown frog]). Such compositional semantic processing may be relevant to incrementally recover propositional content, especially in syntactically complex sentences. This role is dissociable from the general verbal STM as assessed by clinical and experimental testing (Thothathiri, Kimberg et al. 2012).

The role of 'classical' language areas for understanding (syntactically) complex sentences
As reviewed above, the ATL has been discussed in the context of sentence comprehension and syntactic operations; however, frontal (especially IFG with subdivisions), posterior temporal (Wernicke's area plus surrounding cortex) and inferior parietal areas are much more prominent candidates in this context. Regarding comprehension of sentences whose meaning can only be recovered from their syntactic structure, a seminal paper extended the role of Broca's area, previously assumed to lie exclusively in linguistic production, to comprehension, with a focus on "algorithmic", "syntactic-like" operations (Caramazza and Zurif, 1976). Notably, besides very small numbers of participants, the ascription to Broca's area resulted from the no longer tenable assumption that the patholinguistic profile of Broca's aphasia correlates with damage to the respective area (Mohr et al., 1978; Dronkers et al., 2007). The paper shifted the 'division of labour' to a syntax (IFG) versus meaning/lexico-semantics (pSTG) dichotomy. Recent models informed by the cornucopia of functional imaging and lesion studies propose a radically different functional-anatomical map in an extended language network. An excellent position-review paper (Matchin and Hickok, 2020) posits comprehension of (syntactically) complex sentences to be largely afforded by posterior STG and inferior parietal areas. Hierarchical lexico-syntactic processing in these areas allows for recovering syntactic structure in an incremental way, profiting from a close interaction with lexico-semantic representations. Conversely, frontal areas are considered pivotal for constructing a linear sequence to match the underlying (non-linear) hierarchical structure during syntactic production. The fact that frontal areas are variably reported in functional imaging studies may signal that 're-winding' of the input is required for re-analysis during comprehension of highly demanding sentential syntax (e.g. ambiguous/garden-path sentences). The latter partially converges with the notion that syntax-specific working memory function is housed in the frontal language-hub (Fiebach et al., 2005; Makuuchi et al., 2009). Thus, the absence of lesion correlations within the anterior language hub in the current study may indicate that comprehension of sentences with admittedly somewhat artificial, but unambiguous and repeatedly presented structure, varied across two commonly used syntactic dimensions, does not require 're-winding' or other production-related processes. The recurrent presentation and the limited number of surface structures in otherwise stereotyped sentences may also limit syntactic STM/WM-load, potentially housed in the IFG. The null-finding converges with our findings using the exact same material in a facilitatory transcranial-DC-stimulation study in neurotypicals (Krause et al., 2022). We found no evidence for an effect when stimulating over IFG, while pSTG stimulation enhanced overall accuracy.
In this vein, our finding that posterior lesions did not show statistically robust correlations with difficulties in syntactic complexity is largely unexpected. The ample evidence for an eminent role of posterior temporal and bordering inferior parietal areas during comprehension encompasses overall sentence comprehension (Just et al., 1996; Cooke et al., 2002), semantic-syntactic integration (Friederici et al., 2009), thematic role assignment (Bornkessel et al., 2005), argument structure (Thompson et al., 2010) and even aspects of syntactic movement (Ben-Shachar et al., 2004), although the latter has also been claimed to rely on IFG recruitment (Grodzinsky and Santi, 2008).
Moreover, evidence from brain lesion studies converges on the central role of posterior language areas. A recent lesion-connectivity study on four aphasia-profile groups suggests the integrity of overlapping posterior and middle temporal areas to be central to non-canonical sentence and word comprehension (Matchin et al., 2022). A follow-up study comparing PWA with agrammatic versus paragrammatic production showed that the latter correlates with syntactic comprehension deficits, while agrammatic production does not. Regarding corresponding lesion patterns, the role of posterior temporal / inferior parietal areas for hierarchical structure building (impaired in paragrammatic production) is confirmed (Matchin et al., 2023). The finding was confirmed by a comparison of patients with apraxia of speech and patients with non-fluent production due to degenerative language impairment (Lorca-Puls et al., 2024) and by a recent study in chronic-stage stroke using acceptability judgement on word order and agreement violations (Fahey et al., 2024). The relevance of factoring out word-level deficits is highlighted by a lesion study in 51 chronic-stage stroke survivors which investigated auditory description naming and sentence comprehension. While lesions in large parts of the language network decreased performance in both tasks, the mid to posterior MTG survived for sentence comprehension when picture naming abilities were factored out (Pillay et al., 2017). Notably, recovery of sentence comprehension was also very modest in the subgroup of a large cohort of stroke survivors with temporo-parietal lesions (Wilson et al., 2023).
In sum, it remains unclear why, in the current cohort, lesions in the posterior part of the language network (pSTG / inferior parietal) did not correlate with the syntactic processing impairment. It may be argued that factoring out our measure of verbal STM/WM (PhoM), which projected to posterior parietal areas, may have attenuated the effect. However, omitting this covariate did not yield statistically significant correlations in pSTG/IP. Besides specifics of the cohort and the notorious issue of relatively small sample size, a very speculative explanation might be that the highly repetitive presentation of highly controlled syntactic structure rendered structure-building processes less critical than building interim representations of single propositions while the sentence was presented (e.g. '.. the frog, who is brown,…' → [brown frog]).

Conclusion and perspectives
Our results speak for a role of verbal short term memory in the comprehension of complex sentences. However, the impact of general STM capacity, assessed by the most widely used clinical measure (digit span) and an experimental task not requiring production (PhoM), decreases with syntactic complexity. This means that syntactic (i.e. language-specific) processing is partially independent of the STM/WM capacity routinely assessed in patients. Our lesion-behaviour analysis extends this finding, suggesting a relevant role of the anterior temporal lobe for this language-specific capacity. Tentatively, the ATL may afford semantic composition and integration processes to allow for stepwise recovery of the sentence's meaning. Our results in people with an acquired lesion in the left-hemispheric language network are of no direct clinical value. However, partially overlapping but separable resources for verbal STM/WM and syntactic aspects of speech comprehension suggest that differential training schemes may be relevant depending on the individual deficit profile (Salis et al., 2017). Moreover, our study demonstrates the feasibility and versatility of testing performance on syntactically quite complex sentences as an option to tease apart aspects of more general STM and core-linguistic, syntactic abilities, also in people with aphasia. Since the task requires a simple forced choice, a shortened version might be used diagnostically and add to the inventory of fine-grained diagnostics of the language deficit.

Declaration of competing interest
None of the authors has any conflict of interest to disclose. The work was supported by the International Max Planck Research School (IMPRS NeuroCom), in that CDK held a PhD grant from the IMPRS. https://imprs-neurocom.mpg.de/home

Fig. 1. Experimental paradigm and procedure. (A) provides an example of the paradigm: The example sentence belongs to the most challenging type with double embedding and object-first relative clause (E2/O 1st). For examples of all conditions, please see above. The forced choice required a choice from four pictures; apart from the correct picture (bottom right) the other pictures contained a distractor for each of the three propositions. (B) illustrates the procedure: The full sentence was presented via headphones while the participant fixated on a fixation cross. The four pictures only appeared after the full sentence had been presented. Forced choice was required on the touch screen (time-out 15 s). Participants received feedback thereafter.

Fig. 2. Coverage of lesions in the 43 participants. (A) Colored areas indicate that at least one of the participants had a lesion in this area. In the upper three graphs the orange areas indicate that ≥ 4 participants showed an overlap. This is the area covered by the lesion-behavior analysis. Maximal overlap (n = 15) projects to the insula as illustrated in the lower left sketch. (B) Percent damage in 4 left hemispheric regions: inferior frontal (IFG), superior temporal (STG), angular gyrus (AG), and temporal pole (T pole). The upper part shows templates of the respective areas. Note the logarithmic scale, and that for all regions ≥ 10 participants had a lesion ≥ 10% of the respective region. Participants with no lesion in the respective area are not shown.

Fig. 3. Behavioural results based on individual means for each condition. Left graphs provide accuracy results (ACC, [% correct]); note that chance level was 25 % as indicated by the grey shading. Right graphs show the results for reaction times; these are displayed as log10-transformed values and the corresponding seconds (RT, [log10/s]). The statistical results of the (G)LMM approach with post-hoc tests are indicated: **p < 0.001 (for details of (G)LMMs: see Section 3.1.3, Table 2). Numerical information on means and a graphical representation of all six conditions are provided in supplemental material SM-3.2. EMB: embedding depth (E0, E1, E2); ARG: argument structure, i.e. subject-/object-first (S 1st/O 1st).

Fig. 5. Lesion patterns correlating with overall lower performance on the syntax paradigm (SYN MEAN) and the average performance in the phonological memory task (PhoM MEAN). Clusters were corrected for multiple comparisons by permutation; for the numerical results please refer to Table 3. Coronal slices illustrate that the cluster for PhoM MEAN extends substantially into the underlying white matter.

Table 2
Best-fitting GLMM for accuracy and best-fitting LMM for reaction times. Note that for both parameters the best model included two covariates, which were measures of working memory for ACC, but age and lesion size for RT. Bold rows indicate significant results.