Sentence-level embeddings reveal dissociable word- and sentence-level cortical representation across coarse- and fine-grained levels of meaning



Introduction
While the cortical representation of the semantic content of single words has been extensively studied in past decades, it is the capacity to combine these single concepts in increasingly complex combinations that underlies many of the uniquely human aspects of knowledge and thought. One way to probe the cortical representation of such combinatorial meaning is by assessing sensitivity to meaning at the level of the sentence. A promising avenue to study sentence-level semantic sensitivity is offered by recent advances in computational language models. It is the ability of these models to capture meaning across the whole sentence that allows both the formulation of combinatorial representational spaces and the consideration of whether word- and sentence-level meaning are represented differentially across the cortex. In the current work we use OpenAI's sentence-level language models to address these questions and additionally utilise the properties of our stimulus set to assess whether the brain encodes fine-grained differences in meaning between similar sentences and coarse-grained differences between highly dissimilar sentences in similar or different ways.
Sensitivity to semantic content at the level of the single word has been investigated in terms of local regional selectivity for semantic classes. These localised cortical increases in fMRI response have been observed for classes such as tool-, person- or place-related concepts (Chao, Haxby, & Martin, 1999; Fairhall & Caramazza, 2013a, 2013b; Fairhall, 2020; Fairhall, Anzellotti, Ubaldi, & Caramazza, 2014; Noppeney, Price, Penny, & Friston, 2006; for a review, see Bi, Wang, & Caramazza, 2016), as well as specific semantic features (Fernandino et al., 2015; Liuzzi, Aglinskas, & Fairhall, 2020). At the same time, Multivariate Pattern Analysis (MVPA) has identified more subtle sensitivity to semantic content in the distributed pattern of activation across voxels (Bruffaerts et al., 2013; Devereux, Clarke, Marouchos, & Tyler, 2013; Fairhall & Caramazza, 2013a; Leonardelli & Fairhall, 2022; Liuzzi et al., 2015; Simanova, Hagoort, Oostenveld, & Van Gerven, 2014). MVPA can be further extended with Representational Similarity Analysis (RSA) to assess where the distance between neural patterns produced by specific words aligns with the semantic distance between those words. This extension allows the researcher to know not only that some form of information about the different classes of words is present, but that this information conforms to a particular property (in this case, semantic meaning). Such neuroconceptual similarity has been quantified using word similarity derived from co-occurrence with other words (Fu et al., 2023), hierarchically defined, WordNet-derived distances (Fairhall & Caramazza, 2013a, 2013b) or word embeddings (Anderson et al., 2019; Fu et al., 2023; Liuzzi et al., 2020). This latter class consists of numerical vectors derived from linguistic neural networks that capture meaning through the localised context within which words appear in text (e.g., word2vec; Mikolov, Chen, Corrado, & Dean, 2013); it forms the initial input into most extant large language models and provides a powerful tool for mapping meaning in the brain. Collectively, what emerges from these studies of category-general cortical sensitivity to word meaning is a distributed network of left-hemisphere-biased regions centred on the precuneus and medial prefrontal cortex (mPFC), the angular gyri (AG), ventral temporal cortex (VTC), posterior middle temporal gyri (pMTG), the inferior frontal gyri (IFG) and the left lateral orbitofrontal cortex (latOFC).
Similar results have been observed when participants are presented with sentence stimuli. Sentences describing particular thematic categories, such as those referring to people, places, objects or animals, show pronounced domain-selective responses for these classes (Rabini, Ubaldi, & Fairhall, 2021, 2023; Ubaldi, Rabini, & Fairhall, 2022). When single-word embeddings are averaged across a sentence (treating a sentence merely as the sum of its individual words) to create representational spaces, RSA results also closely converge with single-word studies (Acunzo, Low, & Fairhall, 2022; see also Pereira et al., 2018). Recent advances in artificial intelligence (AI) models of sentence-level meaning have opened investigation of the neural representation of the higher-order meaning conveyed by the structured integration of words into the overall meaning of the sentence. Models of sentence meaning have been seen to capture neural representations of meaning generally across the language network and have been used to investigate the architectural properties, training goals and neural-network layers that lead to the most neural-like representations, with varying results (Caucheteux & King, 2022; Schrimpf et al., 2021; Sun, Wang, Zhang, & Zong, 2021). However, it is important to note that neither the word-level nor the sentence-level approach in isolation can distinguish the cortical response to the individual constituent words from the combinatorial meaning of sentences.
To identify the neural processes underlying the higher-order meaning conveyed by the structured integration of words into sentence-level meaning, it is necessary to compare word-level to sentence-level representations. Region-of-interest analysis comparing deep neural network representations of sentence meaning to the averaged meaning of single words uncovered significantly greater representation of compositional, sentence-level meaning in the left temporal cortex and bilateral inferior temporal gyrus (Anderson et al., 2021). In addition, recent work using sentence-level representations derived from a deep neural network trained to identify the topic of the sentence showed an enhanced ability to capture neural representation compared to averaged single-word embeddings, with greater responses additionally found in the precuneus and medial prefrontal cortex (Acunzo et al., 2022). However, in this latter case it is not possible to fully distinguish whether this arises from the contribution of training goals (topic differentiation), from the relationship between the sentence stimuli used in the fMRI study and these training goals (which may have led to bias), or from the sentence-level processing capacity conveyed by the deep neural network's architecture.
Sentence-level embeddings offer another promising way to address how semantic representation changes from single words to sentences. Like their word-embedding counterparts, these sentence-level embeddings represent meaning, now of the entire sentence, including elements of the structure that it conveys. In the present work, we use OpenAI's text-embedding-ada-002 to estimate the relatedness of 240 sentences drawn from four broad thematic categories. We use this in conjunction with representational similarity analysis (RSA) and fMRI to identify maximal convergence between sentence and neural relatedness across the brain. The contribution of sentence-level meaning is isolated by comparison to a model where word order within the sentences has been scrambled. We further inform this analysis by utilising the categorical structure of our stimulus set to examine both coarse-grained (across thematic category) and fine-grained (within thematic category) sentence representation.

Participants
Data are taken from 64 right-handed participants (26 male; mean age, 24.9) who underwent functional magnetic resonance imaging while reading general-knowledge questions presented in Italian. Data were derived from two studies. The first (N = 24) was a study of transient blocks in semantic access, in which participants read questions and were asked whether the target piece of knowledge was accessible, thought to be known but presently inaccessible, or unknown (Ubaldi et al., 2022). In the second study (N = 40), an investigation of individual differences, participants read questions and reported only whether the target piece of knowledge was accessible or not (Rabini et al., 2023). In both cases, the task is orthogonal to the present study, where all sentences are analysed irrespective of response. All participants gave informed consent, and procedures were approved by the Ethics Committee at the University of Trento and were conducted in line with the Declaration of Helsinki (1964, amended in 2013).

Stimuli
Stimuli were composed of 240 general-knowledge questions written in Italian. The questions were equally divided into four thematic categories that described: people (e.g., "Which philosopher uttered the phrase 'I know that I don't know'?"); places (e.g., "In which Spanish city is the Alhambra complex located?"); objects (e.g., "What is the name of the brick that supports the weight of an arch?"); or a final 'scholastic' knowledge domain that was designed to capture general knowledge unrelated to direct experience with the environment (e.g., "What is the name of the transition of matter from the solid to the gaseous state?"). Stimuli were similar but not identical between the two studies (94 % were either the same or differed minimally in wording). Sentences were on average 10 words in length (study 1: mean: 9.95, range 6-15; study 2: mean: 10.0, range 8-12) and were matched across knowledge domain by number of words and number of letters. In both study 1 and 2, sentences were only presented once per participant. The full list of stimuli is available for study 1 here: https://figshare.com/s/2f4be0ba0278ea79a7d5; and for study 2, here: https://figshare.com/s/de4c9df3a39fc16fe0d3.

Procedure
Stimuli were presented using Matlab (https://www.mathworks.com) and Psychtoolbox Version 3 (https://www.psychtoolbox.org). In both studies, 15 questions from each of the four knowledge domains were presented per run in a pseudorandomised event-related design. In study 1, sentence stimuli were presented for 3 s followed by a 3 s fixation cross. In study 2, questions were presented one word at a time for 250 msec each, in black with the left-of-centre letter printed in red (≈17 % left-of-centre), in order to facilitate fixation. For this study, presentation time was 2-3 s (depending on sentence length) and was followed by a fixation cross for the remaining duration of the six-second trial. In both studies, participants responded whether the targeted piece of knowledge was fully accessible in that moment or not. In study 1, if the question's answer was not accessible, participants were subsequently asked their confidence that they possessed the knowledge. Thus, the sentences analysed in the present study consisted of trials where the targeted answer was available to the participant in the moment ('accessible' response) or situations where the targeted piece of knowledge was either unknown, transiently inaccessible, or only partially accessible.

MRI scanning parameters
Functional and structural data were collected at the Center for Mind/Brain Sciences (CIMeC), University of Trento, with a Prisma 3 T scanner (Siemens), using a 64-channel head coil. Participants lay in the scanner and viewed the visual stimuli through a mirror system connected to a 42 in., MR-compatible LCD monitor (NordicNeuroLab) positioned at the back of the magnet bore. Functional images were acquired over four runs using echoplanar T2*-weighted scans. Run duration was on average 10.5 min in study 1 (depending on responses) and 7 min in study 2. Acquisition parameters were as follows: repetition time (TR), 2 s; echo time (TE), 28 ms; flip angle, 75°; field of view (FoV), 100 mm; matrix size, 100 × 100. Each volume consisted of 78 axial slices (which covered the whole brain) with a thickness of 2 mm, anterior commissure/posterior commissure aligned.

fMRI data analysis
Data were preprocessed and analysed with SPM12 (https://www.fil.ion.ucl.ac.uk/spm/). The first four volumes of each run were dummy scans. All images were slice-time corrected, realigned to correct for head movement, normalized to MNI space, and smoothed using a 6 mm FWHM isotropic kernel. The data were then temporally high-pass filtered using custom code and a FIR filter of order 80 with cut-off = 0.0156 Hz (64 s). The fMRI response for each trial was estimated by averaging the 3 EPI volumes between 6 and 10 s post stimulus onset (in 0.5 % of trials only two volumes were used, as the third scan was unavailable). Analyses were performed within an a priori defined grey matter mask.
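The single-trial estimate described above can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function name `trial_response`, the array layout, and the assumption that volume `k` is acquired at `k * TR` seconds and that onsets are expressed on the same clock are all hypothetical choices made for illustration.

```python
import numpy as np

TR = 2.0  # repetition time in seconds, as reported in the scanning parameters


def trial_response(run_data, onset_s, window=(6.0, 10.0), tr=TR):
    """Average the EPI volumes acquired 6-10 s after stimulus onset.

    run_data : array of shape (n_volumes, n_voxels), one preprocessed run
    onset_s  : stimulus onset in seconds, relative to the first retained volume
    Assumption: volume k is acquired at k * tr seconds.
    """
    vol_times = np.arange(run_data.shape[0]) * tr
    lag = vol_times - onset_s
    eps = 1e-6  # tolerance for floating-point comparison at the window edges
    in_window = (lag >= window[0] - eps) & (lag <= window[1] + eps)
    if not in_window.any():
        raise ValueError("no volumes available in the requested window")
    # Normally three volumes; fewer near the end of a run.
    return run_data[in_window].mean(axis=0)
```

With a 2 s TR and onsets aligned to volume acquisition, the window normally contains three volumes; near the end of a run it may contain only two, which is the situation the text reports for 0.5 % of trials.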

Language models
Sentence embeddings were extracted for each Italian sentence from OpenAI's second-generation text-embedding-ada-002 model (Neelakantan et al., 2022; https://api.openai.com/). Sentence-level embeddings are trained to learn a low-dimensional representation that efficiently and numerically captures semantic meaning. The ada-002 embeddings are derived from an unsupervised contrastive learning approach (Neelakantan et al., 2022), where the transformer encoder learns to produce similar embeddings for sentences that occur next to one another in a passage of text and dissimilar embeddings for sentences that do not. The embeddings were used to determine sentence similarity via correlation between the 1536-element embedding vectors of each sentence. The efficacy of this approach as applied to our stimulus sets can be seen in the tight conformation of embedding-similarity based clustering and experimenter-defined categories shown in Fig. 1A.
Six different sentence-embedding derived template models were employed to construct representational dissimilarity matrices (RDMs; see Fig. 1B). To isolate the effects of sentence-level meaning from that of the constituent words alone, models were derived from a) the sentences presented in their original order and b) sentences where the word order had been randomly scrambled. These ordered and scrambled-order models were used with a full model, which included the distance calculated for each sentence pair. Additionally, we utilised the categorical structure of our stimulus set to examine both coarse-grained and fine-grained sentence representations. For the coarse-grained model, the embedding for each sentence within a category was replaced by the average embedding of that category. This permitted the isolation of the contribution of the broad distance between sentence knowledge domains by removing fine-grained differences between similar sentences. We additionally included an 'uninformed' coarse-grained model (not shown) where categories were simply assigned a distance of same ('0') or different ('1'). For the fine-grained model, to isolate brain regions most sensitive to subtle differences between more similar sentences, RDMs were formed separately for each category. Collectively, this resulted in seven separate template RDMs.

Fig. 1 (caption, partial). To isolate the combinatorial meaning contained within sentence structure from the meaning conveyed by the constituent words in isolation, embeddings were obtained for each sentence either in its original form ('ordered') or in a word-order shuffled ('scrambled') form to create separate models. Models were further separated into the full model (all sentences), the coarse-grained model (where each sentence's embedding was replaced by the average of that domain) and the fine-grained model (where RSA was performed separately within each knowledge domain and the results averaged). This resulted in six RDMs (in addition to an 'uninformed' binary category RDM, see text). C. Searchlight RSA. Separately for each subject and each template RDM, searchlight RSA was performed by correlating the neural RDMs of the 240 sentences extracted from a 4-voxel radius sphere with the template RDM. This process was repeated iteratively with a searchlight sphere centred at each voxel, with the resulting template-neural RDM correlation summarised at the central voxel for that sphere.
Word-order scrambled model RDMs were seen as the tightest control for the ordered model RDMs, as the sentences are processed by the same model in the same way, with the only exception being that word order has been rendered uninformative through scrambling. However, it may be the case that scrambling introduces unanticipated effects that make this a poor model. To control for this possibility, the RDMs derived from the scrambled-order model were compared to the averaged embeddings of the single words contained in each sentence using both the same embeddings (ada-002) and GloVe embeddings (Pennington, Socher, & Manning, 2014; see Acunzo et al., 2022, for training details). In both cases, the word-order scrambled model was superior to the averaged-embeddings models (see supplementary Fig. S1), confirming that the former provides an appropriate control condition for this study.
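The template RDM variants described in this section can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' pipeline: the function names are hypothetical, and the use of correlation distance (1 - r) for the template RDMs is an assumption based on the statement that sentence similarity was computed by correlating embedding vectors.

```python
import numpy as np


def full_rdm(emb):
    """Full model: correlation distance (1 - Pearson r) for every sentence pair.

    emb : array of shape (n_sentences, n_dims), e.g. (240, 1536)
    """
    return 1.0 - np.corrcoef(emb)


def coarse_rdm(emb, labels):
    """Coarse-grained model: replace each embedding by its category mean."""
    labels = np.asarray(labels)
    averaged = np.empty_like(emb, dtype=float)
    for cat in np.unique(labels):
        averaged[labels == cat] = emb[labels == cat].mean(axis=0)
    return full_rdm(averaged)


def uninformed_rdm(labels):
    """Uninformed model: 0 for same-category pairs, 1 for different-category pairs."""
    labels = np.asarray(labels)
    return (labels[:, None] != labels[None, :]).astype(float)


def fine_rdms(emb, labels):
    """Fine-grained model: one RDM per category, built from its sentences only."""
    labels = np.asarray(labels)
    return {cat: full_rdm(emb[labels == cat]) for cat in np.unique(labels)}
```

Calling the first, second and fourth function once on the ordered-sentence embeddings and once on the scrambled-sentence embeddings yields the six embedding-derived models; `uninformed_rdm` supplies the seventh.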

Representational similarity analysis
RSA was performed by comparing neural representational dissimilarity matrices (RDMs; see Fig. 1C) to language-model derived template RDMs utilising CoSMoMVPA (https://www.cosmomvpa.org/; Oosterhof, Connolly, & Haxby, 2016). Neural RDMs were extracted via searchlight (Kriegeskorte, Goebel, & Bandettini, 2006), where the local pattern of voxel activation was extracted from a 4-voxel radius sphere for each of the 240 sentences. The (dis)similarity between the neural representations of the 240 sentences was then determined by Pearson correlation (1-r), which formed the neural RDM at that location. The concordance between this neural RDM and template RDMs was calculated via Pearson's correlation and the resulting value was recorded at the voxel at the centre of this searchlight. This process was repeated for a sphere centred at every location within the brain volume and for each template RDM, resulting in seven searchlight maps for each subject. The brain maps reflected the results of the ordered and scrambled versions of the full, coarse-grained and fine-grained models as well as the uninformed binary category model. For analysis of fine-grained differences, the RSA was performed separately within each category and the results of the four resulting analyses averaged.
To perform inferential group-level analysis, searchlight maps for each participant were entered into separate random-effects group-level analyses to assess the word-ordered and word-order scrambled variants of the full, coarse-grained and fine-grained models, and to compare the informed and uninformed coarse-grained models. By performing three separate analyses for the full, coarse-grained and fine-grained models, the degree of within-RDM averaging (which may potentially affect the amount of noise present in the RDM) was balanced at each contrasted level.
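The computation carried out at a single searchlight location can be sketched as follows. The study used CoSMoMVPA; the NumPy version below is only an illustration of the logic, with hypothetical function names, and it assumes that only the upper triangle of each RDM enters the correlation.

```python
import numpy as np


def upper(rdm):
    """Vectorise the upper triangle of a square RDM, excluding the diagonal."""
    i, j = np.triu_indices(rdm.shape[0], k=1)
    return rdm[i, j]


def rsa_score(patterns, template_rdm):
    """Pearson correlation between the neural RDM and a template RDM.

    patterns : array of shape (n_sentences, n_voxels_in_sphere)
    """
    neural_rdm = 1.0 - np.corrcoef(patterns)  # 1 - r between sentence patterns
    return np.corrcoef(upper(neural_rdm), upper(template_rdm))[0, 1]


def fine_grained_score(patterns, embeddings, labels):
    """Run RSA separately within each category and average the results."""
    labels = np.asarray(labels)
    scores = []
    for cat in np.unique(labels):
        keep = labels == cat
        template = 1.0 - np.corrcoef(embeddings[keep])
        scores.append(rsa_score(patterns[keep], template))
    return float(np.mean(scores))
```

A searchlight analysis would call `rsa_score` once per sphere and write the result to the sphere's central voxel, producing one map per template RDM and participant.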

Region of Interest (ROI) Analysis
ROI analysis was performed to assess differentiated regional contributions to sentence-level meaning across the cortex. Orthogonal ROIs were defined for the full model using the average of the ordered- and scrambled-sentence RSA results. While direct comparisons to baseline would not be valid, the resulting ROI definition is unbiased in terms of differences between the two models and differences across regions. This is because the defining contrast contains no information about the difference between the two models and therefore cannot bias voxel selection towards either model (for discussion, see Friston, Rotshtein, Geng, Sterzer, & Henson, 2006). To form the ROIs, firstly the 16 most significant peaks in the group-level analysis of the average of the ordered-sentence RSA and the scrambled-sentence RSA were identified. Then the ROIs were defined as the conjunction between a 5 mm spherical ROI centred at each location and voxels that showed a significant effect (p < .001) in this contrast.
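The conjunction step can be illustrated with a short NumPy sketch. Several details are assumptions made for illustration only: the 5 mm value is treated as a radius, voxels are taken to be 2 mm isotropic, peaks are given in voxel indices rather than MNI coordinates, and the function names are hypothetical.

```python
import numpy as np


def sphere_mask(shape, centre_vox, radius_mm=5.0, voxel_mm=2.0):
    """Boolean mask of voxels whose centres lie within radius_mm of the peak."""
    ii, jj, kk = np.indices(shape)
    dist2_mm = (
        (ii - centre_vox[0]) ** 2
        + (jj - centre_vox[1]) ** 2
        + (kk - centre_vox[2]) ** 2
    ) * voxel_mm ** 2
    return dist2_mm <= radius_mm ** 2


def define_roi(p_map, peak_vox, radius_mm=5.0, voxel_mm=2.0, alpha=0.001):
    """ROI = sphere around the peak AND voxels significant in the defining contrast."""
    sphere = sphere_mask(p_map.shape, peak_vox, radius_mm, voxel_mm)
    return sphere & (p_map < alpha)
```

Because the p-value map comes from the average of the ordered and scrambled results, the selected voxels carry no preference for either model.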

Full model
Sentence-level neuroconceptual similarity was extensive, with a single significant cluster encompassing much of the cortex (Fig. 2A). Within this cluster, the correspondence between modelled sentence similarity and neural similarity spaces was more pronounced in the left hemisphere. Peaks can be broadly grouped into two subtypes. The first includes the retrosplenial complex (RSC), the posterior parahippocampal gyrus (pPG) and an anterior section of the transverse occipital sulcus abutting the posterior angular gyrus (TOS/AG). These regions have previously been shown to have a univariate preference for place-referent sentences (Ubaldi et al., 2022; Rabini et al., 2023) or pictures of places (Epstein & Kanwisher, 1998), to respond when participants have to navigate spatial locations within memory (Epstein, Parker, & Feiler, 2007) or retrieve knowledge about specific buildings (Fairhall, Anzellotti, Ubaldi, & Caramazza, 2014) or types of buildings (Fairhall & Caramazza, 2013a, 2013b), and to show increased activity when individuals have to recall the geographic provenance of specific items such as famous food dishes or people (Fairhall, 2020). Conversely, these regions are not commonly reported in studies investigating the multivariate representation of word meaning (Acunzo et al., 2022; Leonardelli & Fairhall, 2022; Liuzzi et al., 2020). It is of note that the conventional designation of this place-selective region as the 'retrosplenial complex' is somewhat of a misnomer, as this region does not encompass the retrosplenial cortex and is rather located in the medial parietal occipital sulcus (Silson, Steel, & Baker, 2016), a region not commonly associated with general semantic processing. Likewise, the region of TOS identified in this study is approximately 1.5 cm posterior to AG regions previously shown to be sensitive to semantic content (Acunzo et al., 2022; Leonardelli & Fairhall, 2022; Liuzzi et al., 2020), and the bilateral pPG seen in the present study is approximately 1 cm posterior to the left-lateralised section of the VTC that exhibits a robust multivariate representation of semantic meaning. The second subtype of peaks includes the pMTG, IFG extending into the middle frontal gyrus (MFG), anterior superior temporal sulcus (aSTS), lateral OFC and supramarginal gyrus (SMG), which have previously been seen to exhibit strong multivariate representations of word/sentence meaning (Acunzo et al., 2022; Leonardelli & Fairhall, 2022; Liuzzi et al., 2020). This pattern was largely preserved when considering representational models based on word-order scrambled sentences (Fig. 2B), indicating that much of the capacity of sentence embeddings to capture neural representational spaces is contingent on the presence of the specific words themselves, rather than on how they are constructed into a meaningful sentence.
Examination of the difference between the word-ordered and word-order scrambled models allows differentiation of the meaning endowed by the higher-order structure of sentences from the collective response of the composite words. The ordered model significantly outperformed the scrambled model across the brain, indicating distributed contributions of sentence-level information to cortical representation (Fig. 2C). Notably, peaks showing the greatest difference between ordered and scrambled models correspond closely to those regions showing the maximal overall effect for ordered and scrambled models (compare peaks in Fig. 2A-C), suggesting a relative lack of cortical differentiation.
To assess whether subtle regional variations in sentence-level representation were present, unbiased ROIs were defined through the contrast of the average of the ordered and scrambled models (see Methods). An initial ANOVA considering the difference in information captured by the ordered and word-scrambled models indicated strong regional variations (F(14,63) = 41.36, p < .0001). However, this absolute difference may reflect a difference in the underlying information present in each region (cf. Fig. 2D; Henson, 2006). To assess whether sentence-level information confers the same proportional increase in neural information capture, we normalized each region by the average amount of information present for the ordered and scrambled models (Fig. 2E). Regional differences were seen to persist (F(14,63) = 3.88, p < .0001), with the most notable difference being the smaller increase in putative place-selective regions (pPG, RSC, TOS/AG; mean increase 20.9 %) compared to non-selective regions more closely associated with semantic processing (IFG, aSTG, pMTG, SMG, latOFC; mean increase 25.7 %; t(63) = 5.36, p < .00001). However, no differences were seen within this non-selective set of regions (F(9,63) = 1.15, p = .33). Thus, sentence-level meaning captures more information across the cortex; while this effect was less pronounced in regions that have been associated with place-information selectivity, it was largely undifferentiated in regions associated with general semantic processing.
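The distinction between absolute and proportional gains can be made concrete with a small sketch. The exact formula is an assumption (difference divided by the mean of the two model fits), the function name is hypothetical, and the numbers are invented for illustration and are not values from the study.

```python
import numpy as np


def proportional_gain(ordered, scrambled):
    """Ordered-minus-scrambled difference, normalised by the mean of the two.

    Assumed formula: (ordered - scrambled) / ((ordered + scrambled) / 2)
    """
    ordered = np.asarray(ordered, dtype=float)
    scrambled = np.asarray(scrambled, dtype=float)
    return (ordered - scrambled) / ((ordered + scrambled) / 2.0)


# Two hypothetical regions: the absolute gain differs by a factor of two,
# yet the proportional gain is identical.
ordered = np.array([0.060, 0.030])
scrambled = np.array([0.050, 0.025])
absolute = ordered - scrambled
relative = proportional_gain(ordered, scrambled)
```

In this toy case a region with twice as much overall information shows twice the absolute gain while both regions share the same proportional gain, which is why the normalised measure is the more informative test of regional differentiation.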

Coarse-grained semantic representation
Our sentence stimuli were drawn from four thematic categories (see Methods). This allowed us to examine coarse-grained differences between thematically distinct sentences, in contradistinction to the more fine-grained differences that may distinguish more similar sentences. To accomplish this, RSA was performed using a model where the embedding of each sentence within a category was replaced by the average embedding of that category (Fig. 3A). This model was then compared to an uninformed model that assumed the categories were equidistant from one another (Fig. 3B). Notably, the loci of peak neuroconceptual similarity were consistent with the full model (cf. Fig. 2A/B), indicating the relative importance of these coarse-grained semantic differences in capturing neural representational spaces.
To isolate the regions where the semantic distance between categories is consistent with that of the sentence embeddings, the difference between the brain maps resulting from the informed and uninformed models was determined (Fig. 3C). The primary loci of these coarse-grained representations are distinct from those seen for the informed and uninformed model, with foci in the anterior superior temporal gyrus (aSTG), vmPFC and the precuneus, posterior parietal lobe, in addition to the lateral PFC (see Table 2). These effects were more pronounced in the right hemisphere, with coarse category-level semantic distances more fully capturing continuous inter-category differences, compared to the left hemisphere, where category representation appears more absolute in nature.

Fig. 2. Results of RSA using the full model that was derived from each sentence's individual embedding for all sentences. A. Model created from sentences with words in canonical order. B. Model created from word-order scrambled sentences. C. Differences between ordered and scrambled brain maps. D. Informational convergence for ordered and scrambled models within ROIs. E. Difference between ordered and scrambled models normalised by average regional informational content (see text). All brain maps are shown with an initial voxel threshold of p < .001, FWE-corrected for multiple comparisons at the cluster level (p < .05). Abbreviations: IFGope: pars opercularis; IFGtri: pars triangularis. See Table 1 for peak locations and significance.

Table 1
Peak significance for word-ordered and word-order scrambled models, and the difference between them. Peak locations in MNI coordinates are shown for these contrasts and for the localiser contrast used to define the ROIs. Peaks for each contrast are taken from a single cortex-spanning cluster (extent in voxels: 175992, ordered; 163448, scrambled; 138861, difference).

To isolate the specific contribution of sentence-level meaning, the coarse-grained model of the ordered sentences was compared to the coarse-grained model derived from order-scrambled sentences. Fig. 3D shows the clear dissociation of word-order effects across the cortex. Contrary to expectation, we saw that the more nuanced sentence-level information actually impaired the model's ability to capture brain activity in the pPG, TOS/AG, RSC and right MFG (Table 2). This indicates that, at the coarse level, neural representations in these regions are best captured by the composite words within the sentence, rather than by the full meaning of the sentence. In contrast, sentence-level meaning contributed to coarse-grained category-level representations in the anterior temporal poles, ventral temporal cortex, the amygdalae and ventromedial PFC (Table 2).

Fine-grained model
Fine-grained representation of sentence meaning was isolated by repeating the RSA procedure separately within each of the four categories and averaging the results. Brain maps are shown in Fig. 4. Both ordered (Fig. 4A) and scrambled (Fig. 4B) sentence models show maximal information correspondence in regions distinct from those of both the full and coarse-grained models, with maximal information correspondence in left MTG and left IPS. The added meaning found in sentences was seen to strongly dissociate from that seen at the coarse-grained level, with foci in the inferior and posterior parietal lobe, the posterior middle temporal gyrus extending into the mid MTG in the left hemisphere, the precuneus and elements of the medial and lateral prefrontal cortex (see Fig. 4C and Table 2), with only mPFC expressing neuroconceptual convergence with both fine- and coarse-grained models.

Relationship between model grain and information capture
Between-category, coarse-grained differences are expected to dominate the capture of neural representations, as RSA weights large differences between representations more than subtle ones. Put another way, informational measures are much better at distinguishing between categories, say between apples and oranges, than they are at distinguishing the subtle differences within sets of apples or within sets of oranges. There is simply more information distinguishing sentences drawn from different categories and, for this reason, coarse-grained between-category differences can be expected to overwhelm fine-grained within-category differences.
This has a counterintuitive influence on our data. The full model, which contains both coarse-grained (apples versus oranges) and fine-grained information (differences between apples and between oranges), actually underperformed compared to coarse-grained models (this is true both for informed and uninformed coarse-grained models; see supplementary Fig. S3). This is counterintuitive as the full model contains more information. However, the result becomes less mysterious when one considers that a) coarse-grained differences dominate the informational space and b) the template model is imperfect. To continue the earlier analogy, if the occasional description of an apple or an orange was wrong for some reason, then one would expect that, when distinguishing between apples and oranges, replacing the fine descriptions with the generalised coarse description of that set would result in a performance gain. In this way, the extent of within-RDM averaging in this comparison (unlike the main analyses, see Methods) is unbalanced and can produce extraneous increases in information capture. This performance gain offsets the lost fine-grained information. This is the most parsimonious, if mundane, explanation for increased information capture for the coarse-grained models compared to the full model.

Reaction time effects
To characterise the influence of reaction time (RT) on the main analyses, a control analysis was performed for the full model. The analysis was the same as the main analysis except that, for each subject, an additional RDM was constructed based on RT differences between sentences. This RDM was then included as a regressor of no interest in the semantic searchlight RSA to partial out RT effects. For the a) ordered and b) scrambled models, and c) the difference between ordered and scrambled models, results were highly consistent with the main analysis in terms of distribution, peak location and significance (see supplementary Fig. S2 A-C). A quantitative analysis of the influence of RT differences was accomplished by subtracting the RT-controlled RSA from the RSA of the main analysis for the ordered full model. RT effects were seen to be present in the precentral sulcus, frontal operculum, early visual cortex and the supplementary motor area (see supplementary Fig. S2 D). These regions were distinct from those showing maximal semantic representation. Collectively, these results suggest that RT effects do not contribute meaningfully to the results reported elsewhere in this study.
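One way to partial out RT effects, sketched here under stated assumptions, is to build an RT RDM from absolute pairwise RT differences and correlate the neural and template RDMs after regressing the RT RDM out of both. The source does not specify the exact procedure, so the partial-correlation formulation and all function names below are hypothetical illustrations.

```python
import numpy as np


def upper(rdm):
    """Vectorise the upper triangle of a square RDM, excluding the diagonal."""
    i, j = np.triu_indices(rdm.shape[0], k=1)
    return rdm[i, j]


def rt_rdm(rts):
    """Nuisance RDM: absolute RT difference for every sentence pair (assumed form)."""
    rts = np.asarray(rts, dtype=float)
    return np.abs(rts[:, None] - rts[None, :])


def residualise(y, x):
    """Remove the linear contribution of x (plus an intercept) from y."""
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


def partial_rsa(neural_rdm, template_rdm, nuisance_rdm):
    """Correlation of neural and template RDMs with the nuisance RDM partialled out."""
    n, t, z = upper(neural_rdm), upper(template_rdm), upper(nuisance_rdm)
    return np.corrcoef(residualise(n, z), residualise(t, z))[0, 1]
```

Subtracting the map produced by `partial_rsa` from the uncontrolled RSA map gives a per-voxel estimate of how much of the original correspondence was attributable to RT, analogous to the comparison described above.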

Table 2
Peak significance and MNI locations for model comparisons for the coarse-grained (Fig. 3) and fine-grained models (Fig. 4).


Discussion
In this work, we used sentence-level embeddings to determine whether the combinatorial meaning contained within sentences relies on neural substrates dissociable from those underlying single-word meaning. When considering the full model, we observed that sentence-level meaning boosted the model's ability to capture information, but did so in a relatively uniform manner across the cortex. In contrast, when broad, coarse-grained representations and fine-grained representations were considered separately, dissociations became evident between the regions that represented word-level and sentence-level meaning, and the regions involved differed between coarse- and fine-grained representations of meaning.

Full-model
Examination with the full model revealed widespread neuroconceptual convergence for sentence-level embeddings. At the same time, this capacity was largely mirrored by the scrambled model, suggesting that words alone can capture representational spaces across those same brain regions. Indeed, for both ordered and scrambled-order models, representations were seen to peak in pMTG, IFG, anterior STG and left lateral OFC, with a left-hemispheric bias. This pattern is consistent with previous studies of single-word embeddings, both when considered for single-word presentation (Liuzzi et al., 2020) and when average word embeddings are computed for words presented within a sentence (Acunzo et al., 2022). A notable exception to this pattern in the present study was the high correspondence between the full-model and neural representational spaces observed in RSC, TOS/AG and pPG. As noted, these regions are strongly implicated in the representation of place-related words (Fairhall & Caramazza, 2013a, 2013b; Fairhall et al., 2014) and, with regard to this specific stimulus set, they are known to exhibit univariate increases in activity for sentences about places (Rabini et al., 2023; Ubaldi et al., 2022). One possibility is that this pattern reflects a response to the presence of place information (see Section 4.2 for further discussion).
The primary question of this study is whether the representation of the additional information associated with combinatorial meaning relies on the same or different neural substrates compared to individual word meaning. There was no robust evidence for such divergence with the full model. When the difference between ordered and scrambled models was considered, the locations of the maximal information difference were highly consistent with those exhibiting the maximal information for both ordered and scrambled models. An ROI analysis was performed to further assess the apparent homogeneity of informational capture associated with sentence-level meaning. Here the addition of sentence-level meaning was seen to produce a 20-25% increase in the capacity of the model to capture neural representation. While this differed between the set of regions previously associated with place selectivity (RSC, pPG, TOS/AG) and those more generally associated with semantic processing (IFG, aSTS, pMTG, SMG, latOFC), it did not differ within this latter subdivision. Thus, at the level of the full model, there do not seem to be specific elements of the semantic system that have a particular importance in combinatorial sentence meaning.

Coarse-grained model
To examine whether broad differences in meaning, such as those seen between sentences drawn from different thematic categories, are encoded distinctly in the brain, we replaced each sentence's embedding with the average embedding of that category, then compared this model to an uninformed category model that considered the categories to be equidistant. Results demonstrated a distinct change in cortical topography (compare Figs. 2C and 3C). The informed model led to higher neuroconceptual similarity in the lateral aSTSs, medial PFC and the precuneus, while there was no difference in pMTG. Notably, this network showed a pronounced rightward bias, indicating that coarse-grained differences, as captured by the sentence embeddings, are mirrored more in the representations of the right hemisphere than of the left. A potential mechanistic explanation for this effect is that the left hemisphere treats categories as discrete entities, with the relative difference between categories being of little relevance to computations occurring in these regions. This relative agnosis to coarse-grained differences may result from a focus on fine-grained detail associated with the left hemisphere's documented specialisation for semantic meaning (Binder, Desai, Graves, & Conant, 2009). On the other hand, the distinctiveness of categories may be less pronounced in relatively less specialised right-hemisphere regions, resulting in a more continuous relationship between categories, with more similar categories being represented in a more similar way. This principle may extend to regions like the pMTG, which shows distinct representations of category, and the lateral aSTS, which encodes broad relations between them.
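The contrast between the informed and uninformed coarse-grained models can be made concrete with a short sketch. It assumes that sentence embeddings are already available as rows of an array and that dissimilarity is measured as cosine distance; the distance metric and all function names are illustrative assumptions rather than a description of the analysis code used here.

```python
import numpy as np


def cosine_distance_rdm(vectors):
    """Pairwise cosine distance (1 - cosine similarity) between row vectors."""
    v = np.asarray(vectors, dtype=float)
    unit = v / np.linalg.norm(v, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T


def coarse_embeddings(embeddings, labels):
    """Replace each sentence's embedding with the mean embedding of its category."""
    emb = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    out = np.empty_like(emb)
    for category in np.unique(labels):
        members = labels == category
        out[members] = emb[members].mean(axis=0)
    return out


def uninformed_rdm(labels):
    """Binary category model: 0 within a category, 1 between any two categories."""
    labels = np.asarray(labels)
    return (labels[:, None] != labels[None, :]).astype(float)


def informed_rdm(embeddings, labels):
    """Coarse-grained model: distances between category-average embeddings."""
    return cosine_distance_rdm(coarse_embeddings(embeddings, labels))
```

In the informed RDM, between-category distances vary with how similar the category averages are, whereas the uninformed RDM treats every pair of categories as equidistant; the difference between the two brain maps therefore isolates graded between-category structure.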
The representations discussed in the previous paragraph could reflect the information contained in the constituent words of the sentence or in the sentence's combinatorial meaning. To isolate the added contribution of combinatorial meaning, we again compared ordered to scrambled models. Representations were seen to clearly dissociate across the cortex, with coarse-grained sentence-level information being present in the temporal poles, the left lateral occipitotemporal cortex, extending anteriorly along the ventral temporal cortex, and in the mPFC. The nature of this effect is complex. It implies that the types of relational information contained within sentences are of particular importance for the representation of broad-scale differences between these categories of sentences. Dorsal elements of the lateral anterior temporal lobe have recently been associated with the language-derived representation of conceptual knowledge (Bi, 2021; Wang et al., 2023; Wang, Men, Gao, Caramazza, & Bi, 2020), and the relational structures within sentences may be more relevant to this form of representation. However, future research will be needed to better understand the exact nature of the relationship of these regions to the coarse-grained meaning afforded by the relational information within sentence structure.
Counter to expectation, representations in place-selective RSC, pPG, TOS/AG and right MFG were captured better by the word-scrambled than the word-ordered model. This indicates that, at the coarse-grained level, the subtle relational meaning conveyed by the sentence structure was irrelevant to (and actually impaired) informational representations in these regions. This provides a further indication that effects in these regions are driven by category-selective properties that may not generalise beyond the stimulus set used in this study. It is notable that we see a different pattern regarding the role of these regions when we consider the effects of word order in the full model (which contains information of both a coarse- and a fine-grained nature). In the full model, ordered models capture more information in the classically place-selective regions RSC, pPG and TOS, while in the coarse model the reverse is true. This suggests a complex interplay between the contribution of word order to coarse- and fine-grained representations, which cannot be fully resolved by the present study.

Fine-Grained model
Considering overall information captured by both the ordered and scrambled models, the information captured by fine-grained models was more left-lateralised than that captured by either the full or coarse-grained models. Fine-grained representations of meaning peaked in left pMTG and TOS/AG, a pattern consistent with the specialisation of these left-lateralised regions for semantics (e.g., Binder et al., 2009) and the consequent ability to make fine-grained distinctions between more similar sentences. Additional sentence-level information (ordered > scrambled) again showed a more pronounced leftward bias and was maximal in IPS, posterior/mid MTG and lateral PFC in both hemispheres, along with the precuneus and mPFC. While most of these regions are reliably associated with semantic processing, the contribution of the IPS to sentence-level meaning is uncertain. Meta-analysis has implicated the left IPS in semantic control processes (Noonan, Jefferies, Visser, & Lambon Ralph, 2013), although a more recent meta-analysis failed to replicate this result (Jackson, 2021).
A central finding of this study is that, with the exception of the mPFC, all regions showing sensitivity to sentence-level information for fine-grained differences in sentence meaning were distinct from those showing sensitivity at the coarse-grained level. This indicates that fine-grained specialisation may be associated with a reduced sensitivity to coarse-grained information.

Further considerations
Embeddings are primarily developed to index the meaning of and relatedness between sentences. However, neural networks are black boxes and the underlying mechanisms by which they perform their operations tend to be opaque. For this reason, the nature of the representations captured in this study remains uncertain. For instance, embeddings are sensitive to grammar and syntax, and these may potentially play some role in the present result. Future work will benefit from additional models that consider other differences between stimuli that may account for RSA results, such as word frequency or syntax.
At the same time, this study employed naturalistic sentences that conveyed a diversity in meaning. While sentences were matched across knowledge domains on number of words and letters, they were not controlled for all factors (e.g., age of acquisition, familiarity, frequency, orthographic neighbourhood density, ratio of content to function words, etc.), which may have affected the results. While one might expect these factors to be balanced within ordered and scrambled models, the influence of these extraneous factors cannot be excluded, especially in so far as they converge with properties to which the embedding model is sensitive. Future studies may benefit from more tightly controlled sentences that are closely matched in syntactic structure while differing systematically in terms of semantic content. Likewise, future work may benefit from stimulus sets that manipulate the influence of sentence order on meaning ('dog bites man' versus 'man bites dog') to assess both the impact of this factor on sentence-level meaning and the capacity of these large language models to capture such meaning in the brain.
A final consideration is that the four thematic knowledge domains used in this study may represent an under-sampling of the category space, and thus the coarse-grained results reported here may be specific to elements of the four categories used. Future work is needed to ascertain whether these findings generalise to coarse-grained representation in general.
While the opaque nature of AI models can be considered a challenge in studies like the present one, it is also a strength. As AI begins to play a larger role in our day-to-day lives, Explainable/Interpretable AI, and the right to a clear explanation as to why an AI system made the decision it did, is becoming increasingly important to society (White House Office of Science and Technology Policy, 2022). For instance, the sentence embeddings used in this study are an integral part of ChatGPT. Here the brain can provide a resource to probe the underlying mechanisms of the AI: findings such as the way in which the coarse- and fine-grained sentence meaning that feeds into this large language model maps onto largely distinct human cognitive systems can contribute to our understanding of how these models perform their tasks.

Summary
In this work, we used sentence-level embeddings to gain insight into the representation of combinatorial meaning in the brain. When considered in the context of the undifferentiated full model, which collapsed across coarse and fine grains, the capacity of sentence embeddings to capture neural representational spaces was largely contingent on the specific words, irrespective of sentence structure; it nonetheless increased by ~20-25% when information about sentence structure was conserved, in a manner that was relatively uniform across brain regions. However, when divided into coarse- and fine-grained distinctions in sentence meaning, differences were observed in dissociable brain regions. Collectively, these results indicate that differing neural systems are biased towards single-word and combinatorial meaning and additionally that the brain is organised into cortical

Fig. 1. Semantic representational spaces were created for the 240 sentences using the ada-002 text embeddings. A. Stimuli and Semantic Space. Separate hierarchical clusterings of the embeddings of stimuli from study 1 and study 2, with the sentence topic (indicated by the question's answer) colour-coded by the original experimenter-defined thematic category (red: person; blue: place; green: object; yellow: scholastic). B. Template RDMs. To isolate the combinatorial meaning contained within sentence structure from the meaning conveyed by the constituent words in isolation, embeddings were obtained for each sentence either in its original form ('ordered') or in a word-order shuffled ('scrambled') form to create separate models. Models were further separated into the full model (all sentences), the coarse-grained model (where each sentence's embedding was replaced by the average of that domain) and the fine-grained model (where RSA was performed separately within each knowledge domain and the results averaged). This resulted in six RDMs (in addition to an 'uninformed' binary category RDM, see text). C. Searchlight RSA. Separately for each subject and each template RDM, searchlight RSA was performed by correlating the neural RDM of the 240 sentences, extracted from a 4-voxel radius sphere, with the template RDM. This process was repeated iteratively with a searchlight sphere centred at each voxel, with the resulting template-neural RDM correlation summarised at the central voxel of that sphere.
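The two ingredients summarised in panels B and C, word-order scrambling and the template-neural RDM correlation for a single searchlight sphere, can be sketched as follows. This is an illustrative sketch under stated assumptions: whitespace tokenisation, a seeded shuffle, correlation distance for the neural RDM and Pearson correlation between RDMs are all assumptions, the function names are hypothetical, and the step that obtains embeddings from the language model is deliberately omitted.

```python
import random

import numpy as np


def scramble_words(sentence, seed=0):
    """Shuffle word order while keeping the same words (whitespace tokens)."""
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


def correlation_distance_rdm(patterns):
    """RDM of 1 - Pearson r between rows (items x voxels within one sphere)."""
    return 1.0 - np.corrcoef(np.asarray(patterns, dtype=float))


def rsa_score(template_rdm, neural_rdm):
    """Pearson correlation between the upper triangles of two RDMs."""
    rows, cols = np.triu_indices(template_rdm.shape[0], k=1)
    return np.corrcoef(template_rdm[rows, cols], neural_rdm[rows, cols])[0, 1]
```

A full searchlight would call `rsa_score` once per sphere and write the result to the sphere's central voxel; the ordered and scrambled template RDMs differ only in which version of each sentence was embedded.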

Fig. 3. Results of RSA using the coarse-grained model, where RDMs were created by replacing each sentence's individual embedding with the average embedding of the 60 sentences of the thematic category from which that sentence was drawn. A. Informed model, created from sentences with canonical word order. B. Binary theoretical model, created by coding sentence pairs as belonging to either the same or different categories. C. Difference between informed and uninformed brain maps. D. Difference in brain maps between informed models created from canonical word-order and word-order scrambled sentences. All brain maps are shown with an initial voxel threshold of p < .001, FWE-corrected for multiple comparisons at the cluster level (p < .05).

Fig. 4. Results of RSA using the fine-grained model, where RSA was conducted separately for each of the four thematic categories and the results averaged. A. Model created from sentences with words in canonical order. B. Model created from word-order scrambled sentences. C. Differences between ordered and scrambled-order brain maps. All brain maps are shown with an initial voxel threshold of p < .001, FWE-corrected for multiple comparisons at the cluster level (p < .05).
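The fine-grained procedure, in which RSA is run within each thematic category and the results are averaged, can be sketched as below. The sketch assumes precomputed template and neural RDMs, a vector of category labels and Pearson correlation between RDMs; these choices and the function name are illustrative assumptions, not the study's implementation.

```python
import numpy as np


def fine_grained_rsa(template_rdm, neural_rdm, labels):
    """Average of within-category template-neural RDM correlations.

    Only sentence pairs drawn from the same category contribute, so
    between-category (coarse-grained) structure cannot drive the score.
    """
    labels = np.asarray(labels)
    scores = []
    for category in np.unique(labels):
        idx = np.flatnonzero(labels == category)
        # Extract the within-category block of each RDM.
        sub_template = template_rdm[np.ix_(idx, idx)]
        sub_neural = neural_rdm[np.ix_(idx, idx)]
        rows, cols = np.triu_indices(len(idx), k=1)
        r = np.corrcoef(sub_template[rows, cols], sub_neural[rows, cols])[0, 1]
        scores.append(r)
    return float(np.mean(scores))
```

Because each category is scored on its own block, this measure is unaffected by any change to the between-category entries of either RDM.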