Mapping semantic space: Exploring the higher-order structure of word meaning

Multiple representation theories posit that concepts are represented via a combination of properties derived from sensorimotor, affective, and linguistic experiences. Recently, it has been proposed that information derived from social experience, or socialness, represents another key aspect of conceptual representation. How these various dimensions interact to form a coherent conceptual space has yet to be fully explored. To address this, we capitalized on openly available word property norms for 6339 words and conducted a large-scale investigation into the relationships between 18 dimensions. An exploratory factor analysis reduced the dimensions to six higher-order factors: sub-lexical, distributional, visuotactile, body action, affective and social interaction. All these factors explained unique variance in performance on lexical and semantic tasks, demonstrating that they make important contributions to the representation of word meaning. An important and novel finding was that the socialness dimension clustered with the auditory modality and with mouth and head actions. We suggest this reflects experiential learning from verbal interpersonal interactions. Moreover, formally modelling the network structure of semantic space revealed pairwise partial correlations between most dimensions and highlighted the centrality of the interoception dimension. Altogether, these findings provide new insights into the architecture of conceptual space, including the importance of inner and social experience, and highlight promising avenues for future research.


Introduction
Conceptual knowledge underpins our ability to extract meaning from and interact with our environment, including the objects, people, and words within it. Strongly embodied theories argue that retrieving concept knowledge involves re-enacting sensorimotor states associated with the first-hand experience of a concept's referent (e.g., Glenberg, 2015). In contrast, amodal theories proffer symbolic conceptual representations that are independent from sensorimotor states (e.g., Collins and Loftus, 1975; Pylyshyn, 1984). For example, it has been proposed that word meanings can be represented in amodal format via distributional linguistic information derived from patterns of word co-occurrence in natural language (e.g., Grand, Blank, Pereira, and Fedorenko, 2022; Griffiths, Steyvers, and Tenenbaum, 2007; Jones and Mewhort, 2007). Providing a middle ground between these extreme positions, contemporary multiple representation theories argue that multiple sources of modal information, like perception, action, and affect, and of amodal information, like language, contribute to semantic representation (Binder and Desai, 2011; Borghi et al., 2019; Connell, 2019; Lambon Ralph, Jefferies, Patterson, and Rogers, 2017; Martin, 2016; Reilly, Peelle, Garcia, and Crutch, 2016). The degree to which each source of information contributes is thought to be dependent on both concept type and context. For instance, sensorimotor features are essential for the representation of concrete (i.e., material) concepts (e.g., APPLE), whereas abstract meanings (e.g., LOYALTY) rely more on features derived from linguistic and affective experience, because their referents lack those direct sensorimotor attributes (Dove, 2018; Kousta, Vigliocco, Vinson, Andrews, and Del Campo, 2011). Moreover, during retrieval, context/task-relevant conceptual features are prioritized, leading to observable context effects on both behaviour (e.g., Tousignant and Pexman, 2012; Van Dam, Rueschemeyer, Lindemann, and Bekkering, 2010) and neural activity (e.g., Kuhnke, Kiefer, and Hartwigsen, 2020; Muraki, Doyle, Protzner, and Pexman, 2023).
There is growing empirical evidence in favour of multiple representation accounts (for related reviews and computational modelling, see Lambon Ralph et al., 2017; Meteyard, Cuadrado, Bahrami, and Vigliocco, 2012; Muraki, Speed, and Pexman, 2023). Research efforts have demonstrated that distributional and embodied information make complementary and equally important contributions to conceptual representation (Meteyard et al., 2012; Muraki, Speed, and Pexman, 2023). For example, computational models trained on both linguistic and sensorimotor information outperform models trained solely on linguistic or sensorimotor information in explaining human lexical-semantic performance (Andrews, Frank, and Vigliocco, 2014; Banks, Wingfield, and Connell, 2021). A growing number of distributional and embodied, or experience-based, properties of word meaning have been quantified (for a comprehensive list, see Gao, Shinkareva, and Desai, 2022), but how they interact within a unified semantic space has yet to be fully understood. Only a small number of studies (Binder et al., 2016; Muraki, Sidhu, and Pexman, 2020; Troche, Crutch, and Reilly, 2014, 2017; Villani, Lugli, Liuzza, and Borghi, 2019) have formally explored this issue, often taking data reduction approaches towards distilling down to key organizational principles. While these studies have shown promise for furthering our understanding of conceptual representation, a richer set of measures has since emerged, including new dimensions that have yet to be incorporated and accounted for (e.g., Diveica, Pexman, and Binney, 2023; Lynott, Connell, Brysbaert, Brand, and Carney, 2020).
It has recently been suggested that socialness, which refers to the relation of a concept to social experience, could have an important role in the representation of some concepts (Diveica et al., 2023; Pexman, Diveica, and Binney, 2023). Indeed, according to some accounts, social interaction is a mechanism for grounding, or linking a concept's mental representation to its real-world referent (Barsalou, 2016). For example, Borghi et al. (2019) suggested that social and linguistic experience jointly facilitate the acquisition of abstract word meanings. Similarly, Barsalou (2020) has argued that conceptual knowledge is grounded in perceptual and motor experiences that are situated both in the social and the physical environment. Indeed, property listing and rating studies have found that socialness can distinguish between concrete and abstract concepts, and between distinct types of abstract concepts, while neuroimaging investigations have found that social, compared to nonsocial information, recruits additional brain regions (for a review, see Pexman et al., 2023, also Conca, Borsa, Cappa, and Catricalà, 2021). Furthermore, individuals with autistic-like traits, who have atypical social experiences, show selective deficits in social concept processing (Birba, López-Pigüi, León Santana, and García, 2023). The availability of new socialness norms for thousands of English words (Diveica et al., 2023) has paved the way to larger-scale investigations into the effects of socialness on lexical-semantic processing. Diveica, Pexman and Binney (2023) quantified socialness as the degree to which words' referents have social relevance by referring to social roles, social behaviours, social institutions, social values and other social constructs. This work has demonstrated that for socially-relevant words, like FAMILY, SOCIABLE and TO TRUST, performance is facilitated on various types of lexical, semantic and memory tasks (Diveica et al., 2023; Diveica, Muraki, Binney, and Pexman, 2024). Moreover, it demonstrated that socialness captures unique aspects of meaning, which are distinguishable from those indexed by other established semantic dimensions, like concreteness and emotional valence. Together, this body of work suggests that socialness should be incorporated into models of concept knowledge. However, it is unclear where socialness could fit within theories that map out multidimensional semantic space (e.g., Binder and Desai, 2011).
Multiple possibilities about the relationships between socialness and other semantic dimensions have emerged. Functional neuroimaging studies have found that social concepts rely on additional brain regions, some of which could reflect greater demand on affective processing (Binney, Hoffman, and Lambon Ralph, 2016; Rice, Lambon Ralph, and Hoffman, 2015; Rijpma et al., 2023), suggesting that social and affective concept attributes might be closely related. Exploring the clustering of 14 semantic dimensions among 750 English nouns, Troche et al. (2017) found that social semantic content was closely related to emotion, as well as to ratings of associations with thought, morality, and self-generated motion (also see Troche et al., 2014). This emerged within a latent factor that was interpreted by the authors as reflecting endogenous cognitive and affective experience. An alternative hypothesis is that social and linguistic experience are intrinsically intertwined, and therefore socialness ratings might covary with measures of embodied aspects of language, such as auditory experiences and mouth action (Borghi et al., 2019). Consistent with this possibility, Binder et al. (2016) explored the clustering of 65 experiential attributes among 434 English nouns and 62 verbs, and found that social interaction was part of a communication factor together with a dimension quantifying communicative tools/behaviours and head action. Partly in line with both possibilities, Villani et al. (2019) explored the clustering of 15 dimensions among 425 abstract Italian nouns and found that socialness clustered with both mouth action and emotionality, as well as interoception and metacognition. The auditory modality, on the other hand, was part of a separate latent factor. The discrepancies between these exploratory investigations into the organization of semantic space can be attributed to several factors, including small word samples restricted to specific word types (e.g., only nouns or abstract concepts) and the consideration of different sets of semantic dimensions. This highlights the need for larger-scale explorations of semantic space over thousands of words across a range of concreteness values and parts of speech.
The main aim of the current study was to explore the relationships between various embodied and distributional properties of conceptual meaning to clarify (1) the main organizational principles of semantic space, and (2) the relationships between the newly characterized socialness dimension and other established properties of word meaning. To this end, we conducted a large-scale exploration by capitalizing on openly available word property norms. We adopted a data-driven analytic approach consisting of two main steps: (1) an exploratory factor analysis to uncover the higher-order factors characterizing semantic space, followed by item-level regression analyses to assess their behavioural relevance, and (2) network analysis to reveal the pairwise relationships among dimensions.

Dataset
We selected word properties for inclusion in our analyses based on two main theoretical considerations. First, we only included measures that have been shown to influence lexical-semantic performance and/or been validated as capturing some unique aspects of meaning. Second, in line with multiple representation theories of conceptual knowledge, we included both distributional and embodied dimensions and ensured that the latter covered multiple sources of experiential information (i.e., sensorimotor, affective, social).
Sensorimotor information has traditionally been quantified via concreteness, often conceptualized as the degree to which a word's referent can be experienced through one of the senses (e.g., Brysbaert, Warriner, and Kuperman, 2014), or imageability, an index of the ease with which a word elicits a mental picture of its referent (e.g., Schock, Cortese, and Khanna, 2012). However, research has shown that modality-specific measures predict lexical-semantic performance better than concreteness and imageability (Connell and Lynott, 2012), and that the semantic information pertaining to the different modalities has distinctive effects on task performance (Connell and Lynott, 2010, 2014). Moreover, action-related measures explain unique variance in lexical-semantic performance, beyond what can be explained by sensory measures (Lynott et al., 2020). Therefore, we used both modality-specific sensory measures, and body effector-specific motor dimensions from the Lancaster Sensorimotor Norms (Lynott et al., 2020). These included four sensory dimensions that index the degree to which a concept's referent is experienced through the visual, auditory, haptic, and interoceptive modalities, and five motor measures quantifying the extent to which a concept's referent is experienced through hand/arm, mouth/throat, head, torso and foot/leg actions.
We incorporated three additional embodied dimensions related to emotional and social experience, all of which have been shown to explain unique variance in lexical-semantic tasks, beyond what can be explained by sensorimotor information (e.g., Diveica et al., 2023; Kousta et al., 2011; Kuperman, Estes, Brysbaert, and Warriner, 2014; Lund, Sidhu, and Pexman, 2019; Moffat, Siakaluk, Sidhu, and Pexman, 2015; Zdrazilova and Pexman, 2013). The two affective dimensions included valence extremity, an index of the degree to which the word evokes positive/negative feelings (measured as the absolute difference between the valence rating and the neutral point of the original valence scale by Warriner, Kuperman, and Brysbaert, 2013), and arousal, a measure of the degree to which the word evokes feelings of arousal as opposed to calm (Warriner et al., 2013). The socialness norms collected by Diveica et al. (2023) were used as an index of the extent to which a concept's referent has social relevance. These norms employed a broad and inclusive conceptualization of socialness to capture a variety of social concepts, like social roles (e.g., MOTHER), behaviours (e.g., COOPERATE), traits (e.g., LOYAL), places (e.g., FESTIVAL), and social institutions/ideologies (e.g., MARRIAGE) (for examples of contrasting approaches, see Pexman et al., 2023).
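The valence extremity transform described above can be illustrated with a minimal sketch. It assumes a 1-9 valence scale with a neutral point of 5; the word labels and ratings below are hypothetical and serve only to show the arithmetic.

```python
# Assumption: valence is rated on a 1-9 scale, so the neutral point is 5.
NEUTRAL_POINT = 5.0


def valence_extremity(valence_rating: float, neutral: float = NEUTRAL_POINT) -> float:
    """Absolute distance of a mean valence rating from the scale's neutral point."""
    return abs(valence_rating - neutral)


# Hypothetical mean ratings, for illustration only (not taken from any norms).
example_ratings = {"pleasant_word": 8.0, "neutral_word": 5.0, "unpleasant_word": 2.0}
extremity = {word: valence_extremity(v) for word, v in example_ratings.items()}
```

Under this transform, a strongly positive and a strongly negative word receive the same extremity score, which is why the measure indexes emotional intensity rather than polarity.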
Linguistic experience was captured via six distributional measures. First, we included two properties which quantify distribution across time and are central to word processing: word frequency (log subtitle frequency; Brysbaert and New, 2009) and age of acquisition (AoA; Juhasz, 2005). We used a test-based AoA measure derived from Dale and O'Rourke (1981) and updated by Brysbaert and Biemiller (2017). We also included a measure of the average semantic distance between a word and its semantic neighbors (henceforth average neighbourhood similarity; ANS) (Shaoul and Westbury, 2010) because this property influences lexical processing (for a review, see Farsi, 2018). In addition, we included a measure of the extent to which words appear in more semantically diverse contexts, termed semantic diversity (SemD; Hoffman, Lambon Ralph, and Rogers, 2013), which captures aspects of semantic ambiguity and affects lexical-semantic performance (Hoffman and Woollams, 2015). Finally, we included two measures of word form similarity, orthographic Levenshtein distance (OLD) and phonologic Levenshtein distance (PLD; Yarkoni, Balota, and Yap, 2008). These are often used as control variables in the literature because they influence word perception, and they may also be related to words' semantic content; for example, phonologically/orthographically similar word pairs tend to have more similar meanings (Dautriche, Mahowald, Gibson, and Piantadosi, 2017).
There were 6339 words for which we had values for all 18 lexical and semantic properties of interest. These included 3822 nouns, 1060 verbs, 1438 adjectives and 19 words belonging to some other part of speech. The word sample covered the entire abstract-concrete continuum as illustrated in Fig. S1 in the Supplementary Materials. Descriptive statistics for all dimensions investigated in our word sample are reported in Supplementary Table S1 and their distributions are depicted in Fig. S2.

Analytic approach
All analyses were conducted using the open source software R (version 4.1.1; R Core Team, 2022). The scripts and software details can be accessed via the Open Science Framework project page at https://osf.io/apnyt/.

Exploratory factor analysis
We first assessed the appropriateness of the data for factor analysis. Bartlett's (1954) test of sphericity was used to test whether the correlation matrix was significantly different from an identity matrix, thus ensuring the presence of correlations in the data. In addition, the Kaiser-Meyer-Olkin statistic (Kaiser, 1974) was computed as an index of sampling adequacy. Then, given our aim of identifying latent constructs responsible for the variation of the measured variables, we modelled the data using common factor analysis (Watkins, 2018) as implemented in the R package 'psych' (Revelle, 2022). We employed an iterated principal axis estimation method with squared multiple correlations as the initial communality estimate because this approach makes no distributional assumptions, is robust to having few indicators per factor (de Winter and Dodou, 2012) and is able to recover weak factors (Briggs and MacCallum, 2010). Because univariate skewness and kurtosis were not extreme (see Supplementary Table S1; Curran, West, and Finch, 1996), we computed the correlations using the product moment correlation coefficient. Oblimin oblique rotation was employed to allow for factor intercorrelations (Watkins, 2018). We determined the optimal number of factors for extraction using parallel analysis (Horn, 1965).
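The logic of parallel analysis can be sketched as follows. This is an illustrative NumPy version, not the 'psych' implementation: it assumes the simpler variant that compares eigenvalues of the full correlation matrix against those of random normal data of the same shape, and the function name, the 95th-percentile threshold, and the toy two-factor data are all assumptions made for the example.

```python
import numpy as np


def parallel_analysis(data, n_sims=100, percentile=95, seed=0):
    """Horn-style parallel analysis on correlation-matrix eigenvalues.

    Returns the number of leading eigenvalues that exceed the chosen
    percentile of eigenvalues obtained from random normal data of the
    same shape, plus the observed eigenvalues and the thresholds.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

    simulated = np.empty((n_sims, p))
    for s in range(n_sims):
        noise = rng.standard_normal((n, p))
        eig = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
        simulated[s] = np.sort(eig)[::-1]
    threshold = np.percentile(simulated, percentile, axis=0)

    exceeds = observed > threshold
    # Count only the leading run of eigenvalues above their thresholds.
    n_retain = p if exceeds.all() else int(np.argmin(exceeds))
    return n_retain, observed, threshold


# Toy data with two latent factors and six indicators (hypothetical values).
rng = np.random.default_rng(42)
n_obs = 2000
factors = rng.standard_normal((n_obs, 2))
loadings = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
                     [0.0, 0.8], [0.0, 0.8], [0.0, 0.8]])
toy = factors @ loadings.T + 0.6 * rng.standard_normal((n_obs, 6))

n_factors, observed_eig, random_eig = parallel_analysis(toy)
```

On the toy data the procedure should recover the two simulated factors; with real norms the retained number depends on the eigenvalue variant and threshold chosen, so the sketch is meant only to convey the comparison against random data.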

Regression analyses
We conducted a series of item-level regression analyses to evaluate whether the latent semantic constructs make unique contributions to lexical-semantic processing. In these analyses, we used the factor scores of the six latent variables as the predictors of interest and behavioural indices of lexical-semantic processing as outcome variables.
The outcome variables were obtained from three behavioural megastudies and included response times (RTs) and error rates from the English Lexicon Project visual lexical decision task (LDT) (Balota et al., 2007), the Auditory English Lexicon Project auditory LDT (Goh, Yap, and Chee, 2020), and the Calgary Semantic Decision Project semantic decision task (SDT) (Pexman, Heard, Lloyd, and Yap, 2017). The full methods for each mega-study are provided in their respective papers, thus only brief descriptions are provided below. The LDT outcome variables quantify the speed and accuracy with which participants could distinguish between words and non-word letter strings that were presented visually (LDT visual) and auditorily in either American or British accents (LDT auditory). In the case of the auditory LDT, we additionally investigated RT minus stimulus duration (henceforth RT-Duration) because this outcome variable controls for the high variation in the duration of the auditorily-presented word stimuli. In LDT, words that have richer semantic representations are expected to be associated with more efficient processing due to stronger feedback from semantic to orthographic representations (Hino and Lupker, 1996; Hino, Lupker, and Pexman, 2002; Pexman, Lupker, and Hino, 2002). The SDT outcome variables quantify the speed and accuracy with which participants could classify visually presented words as being concrete or abstract. The responses to concrete and abstract words were analysed separately because previous findings suggest that semantic richness effects differ for concrete and abstract decisions (Newcombe, Campbell, Siakaluk, and Pexman, 2012; Pexman et al., 2017; Pexman and Yap, 2018; also see Connell and Lynott, 2012). However, for completeness, we also conducted the analysis on the full SDT dataset, collapsing across concreteness decisions. In SDT, words associated with richer semantic representations are expected to be associated with more efficient processing due to increased semantic activation and/or faster semantic settling (Pexman, Holyk, and Monfils, 2003). Semantic variables tend to explain more variance in the SDT than in LDT (e.g., Taikh, Hargreaves, Yap, and Pexman, 2015; Yap, Pexman, Wellsby, Hargreaves, and Huff, 2012). We used RTs standardized as z-scores to control for individual differences in overall processing speed (Faust, Ferraro, Balota, and Spieler, 1999).
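The rationale for z-scored RTs can be shown with a small sketch. It assumes standardization within each participant (using that participant's own mean and standard deviation), which is one common way such scores are derived; the megastudies' exact procedures are described in their respective papers, and the participant labels and RT values below are hypothetical.

```python
import numpy as np


def zscore_within_participant(rts, participant_ids):
    """Standardize each RT against its own participant's mean and SD."""
    rts = np.asarray(rts, dtype=float)
    ids = np.asarray(participant_ids)
    z = np.empty_like(rts)
    for pid in np.unique(ids):
        mask = ids == pid
        z[mask] = (rts[mask] - rts[mask].mean()) / rts[mask].std(ddof=1)
    return z


# Hypothetical data: a fast and a slow responder with the same relative pattern.
raw_rts = [500, 600, 700, 900, 1000, 1100]
participants = ["p1", "p1", "p1", "p2", "p2", "p2"]
z_rts = zscore_within_participant(raw_rts, participants)
```

After standardization the two hypothetical participants yield identical z-scores for corresponding trials, so differences in overall speed no longer contribute to item-level means.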
For the predictors of interest, we used the factor scores of the six latent variables extracted in the exploratory factor analysis. To account for potentially confounding effects, we additionally included letter length as a control predictor in the visual LDT and SDT. For the analyses on auditory LDT responses, the number of phonemes was used as control predictor instead of letter length and we additionally controlled for the uniqueness point (the point at which enough phonetic information has been heard to leave only one word-form as a possibility). In the analyses on SDT conducted on the whole word sample, we additionally controlled for concreteness (the extent to which the words' referents can be experienced through one of the five senses; Brysbaert et al., 2014) because it was the decision criterion.
To facilitate direct comparisons between task types, each analysis used the same word sample. Because the SDT dataset included only concrete and abstract words (Pexman et al., 2017), the word sample did not include words with intermediate concreteness scores (i.e., 2.04-3.78 on a 5-point Likert scale according to the norms collected by Brysbaert et al., 2014). All behavioural outcomes were available for 2431 of the words in our dataset. Of these, 1161 were included in the concrete decision SDT analyses, and 1270 in the abstract decision analyses. To ensure that the exclusion of words with intermediate concreteness ratings did not affect the overall results patterns, we repeated the analyses on the maximum sample of words for which visual and auditory LDT behavioural outcomes were available (the full dataset, N = 6339, in the case of visual LDT, and n = 4126 in the case of auditory LDT).
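An item-level regression yielding standardized coefficients can be sketched as below. This is a generic ordinary least squares illustration rather than the analysis scripts themselves: the two simulated predictors stand in for per-word factor-derived predictors, and the effect sizes are invented for the example.

```python
import numpy as np


def standardized_betas(X, y):
    """OLS coefficients after z-scoring every predictor and the outcome."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    design = np.column_stack([np.ones(len(yz)), Xz])  # intercept + predictors
    coef, *_ = np.linalg.lstsq(design, yz, rcond=None)
    return coef[1:]  # drop the intercept


# Simulated items: predictor 0 speeds responses, predictor 1 has no effect.
rng = np.random.default_rng(1)
n_items = 2000
predictors = rng.standard_normal((n_items, 2))
z_rt = -0.5 * predictors[:, 0] + rng.standard_normal(n_items)

betas = standardized_betas(predictors, z_rt)
```

A negative standardized coefficient on an RT outcome corresponds to a facilitatory effect (higher predictor values, faster responses), which is how the direction of effects is read in the results that follow.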

Network analysis
We conducted a network analysis to further investigate the relationships between the lexical and semantic dimensions comprising the semantic space (for a primer on network analysis, see Borsboom et al., 2021; Epskamp and Fried, 2018). In contrast to the factor analysis, in which measured variables are modelled as a function of an unobserved common cause (i.e., latent construct), the network approach conceptualizes the observed variables as forming a network of directly related causal entities (Schaafsma, Pfaff, Spunt, and Adolphs, 2015). Network analysis models the measured variables as nodes that are connected by edges representing pairwise statistical relationships estimated after controlling for all other variables (i.e., nodes) in the dataset (Borsboom et al., 2021). In other words, the network analysis estimates the partial correlations between all variable pairs. The edges linked to an individual node provide the researcher with the anticipated outcome of a multiple regression analysis in which the respective node is the outcome variable, and all other nodes are predictor variables: edge strength is proportional to the magnitude of the regression coefficient, and an edge within the network would not be expected in cases where a predictor variable is not associated with the outcome variable (Epskamp and Fried, 2018). The network analysis can further reveal predictive mediation: in the absence of a direct connection, an indirect path between nodes X and Z via node Y suggests that, although X and Z may be correlated, any predictive effect between X and Z is mediated by Y (Epskamp and Fried, 2018).
We estimated a partial correlation network using the R package 'bootnet' (Epskamp, Borsboom, and Fried, 2018). We used the regularized EBICglasso algorithm because this is the algorithm of choice when the aims of the analysis are to (i) visualize the network structure, and (ii) assess the relative importance of nodes via centrality metrics (Isvoranu and Epskamp, 2021), as is the case in the current study. Non-paranormal transformation was used to handle non-normal data (Isvoranu and Epskamp, 2021). The resulting network structure was visualized using the R package 'qgraph' (Epskamp, Cramer, Waldorp, Schmittmann, and Borsboom, 2012). Then, we estimated node strength, a centrality index that quantifies how well a node is connected to the other nodes in the network by computing the sum of absolute edge weights (i.e., partial correlations) connected to each node.
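The two quantities at the heart of this approach, partial correlations and node strength, can be illustrated with a short sketch. It is deliberately simplified: partial correlations are read off the inverse of the sample correlation matrix without any regularization, so it does not reproduce EBICglasso or the non-paranormal transformation. The simulated X, Y, Z chain is a hypothetical example of the predictive mediation pattern.

```python
import numpy as np


def partial_correlations(data):
    """Unregularized partial correlations from the inverse correlation matrix."""
    precision = np.linalg.inv(np.corrcoef(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 0.0)  # no self-edges
    return pcor


def node_strength(pcor):
    """Sum of absolute edge weights connected to each node."""
    return np.abs(pcor).sum(axis=0)


# Simulated chain X -> Y -> Z: X and Z correlate only through Y.
rng = np.random.default_rng(7)
n_obs = 20000
x = rng.standard_normal(n_obs)
y = x + rng.standard_normal(n_obs)
z = y + rng.standard_normal(n_obs)
chain = np.column_stack([x, y, z])

zero_order_xz = np.corrcoef(x, z)[0, 1]   # clearly positive
edges = partial_correlations(chain)       # X-Z edge close to zero
strengths = node_strength(edges)          # Y is the most central node
```

In this toy network the zero-order correlation between X and Z is substantial, yet the X-Z edge vanishes once Y is controlled for, and Y obtains the highest node strength. Regularized estimation would additionally shrink small edges to exactly zero.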
In a final step, we evaluated the accuracy and stability of the network structure and node strengths (following recommendations by Epskamp, Borsboom and Fried, 2018). To assess the accuracy of the estimated edge weights, a nonparametric bootstrap using resampled data with replacement was conducted. This analysis estimates 95% confidence intervals (CIs) around the edge weights; narrow bootstrapped CIs suggest that the strengths of the edge weights are reliable. To assess the stability of the centrality metrics, a case-dropping bootstrap (subsampling without replacement) was performed. The resulting correlation stability (CS) coefficient quantifies the proportion of data that can be dropped to retain with 95% certainty a correlation of at least 0.7 between the original and re-estimated node strengths. A CS coefficient above 0.5 indicates stable node strengths.
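The case-dropping procedure can be approximated in a few lines. The sketch below is a simplified stand-in for the 'bootnet' routine, assuming unregularized partial correlations, a fixed grid of drop proportions, and 200 subsamples per proportion; the function names and the hub-shaped toy data are hypothetical.

```python
import numpy as np


def node_strength(data):
    """Node strength from unregularized partial correlations."""
    precision = np.linalg.inv(np.corrcoef(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 0.0)
    return np.abs(pcor).sum(axis=0)


def cs_coefficient(data, drop_props=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9),
                   n_boot=200, min_cor=0.7, certainty=0.95, seed=0):
    """Largest drop proportion at which the correlation between original and
    subsample node strengths is >= min_cor in at least `certainty` of subsamples."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    full = node_strength(data)
    cs = 0.0
    for prop in sorted(drop_props):
        keep = int(round(n * (1 - prop)))
        cors = np.empty(n_boot)
        for b in range(n_boot):
            idx = rng.choice(n, size=keep, replace=False)  # drop cases
            cors[b] = np.corrcoef(full, node_strength(data[idx]))[0, 1]
        if np.mean(cors >= min_cor) >= certainty:
            cs = prop
    return cs


# Toy data: one hub variable driving four others with different strengths.
rng = np.random.default_rng(3)
hub = rng.standard_normal(1000)
others = [a * hub + rng.standard_normal(1000) for a in (0.9, 0.7, 0.5, 0.3)]
toy = np.column_stack([hub] + others)

cs = cs_coefficient(toy)
```

Because the toy network has a clear hub, the ordering of node strengths survives heavy subsampling and the resulting coefficient is high; weaker or flatter strength profiles would yield lower values.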

The higher-order structure of the semantic space
Exploratory factor analysis was performed to identify latent constructs among the 18 lexical and semantic variables. The results of Bartlett's test of sphericity confirmed the presence of correlations in the correlation matrix, χ²(153) = 42,638.9, p < .001, and the Kaiser-Meyer-Olkin statistic of 0.66 suggested that the data were suitable for factor analysis (Kaiser, 1974). The zero-order correlations between the 18 variables are summarized in Fig. 1, and scatterplots of the relationships between socialness and the other dimensions are provided in Supplementary Fig. S3.
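For reference, the degrees of freedom of Bartlett's test follow directly from the number of variables. The expression below is the standard textbook form of the statistic rather than something taken from the analysis scripts; here n denotes the number of words, p the number of variables, and R the sample correlation matrix.

```latex
\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln\lvert R\rvert,
\qquad
df = \frac{p(p-1)}{2} = \frac{18 \times 17}{2} = 153 .
```

With p = 18 variables, the test therefore has 153 degrees of freedom, one for each unique off-diagonal correlation.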
The parallel analysis indicated that six factors should be retained (see Fig. 2B). This six-factor solution accounted for 54.13% of the total variance after rotation. The factor loadings of the 18 variables are illustrated in Fig. 2A (also see Supplementary Table S2), and the distributions of factor scores are displayed in Fig. 2D. To aid interpretation, the 10 words with the lowest and highest scores on each factor are presented in Table 1. The first factor (explaining 10.29% of variance) captures Sub-lexical properties, with high loadings from OLD and PLD. The second factor (9.96%) relates to Body Action, having high loadings from the torso and foot/leg motor dimensions. The third factor (9.48%) reflects Distributional language properties, including word frequency, ANS, AoA and semantic diversity. The fourth factor (8.90%) appears to reflect a Social Interaction construct, with high loadings from the auditory perceptual dimension, socialness, as well as mouth and head actions. The fifth factor (8.11%) relates to Visuotactile experience, having high loadings on the visual and haptic perceptual dimensions, as well as the hand/arm motor measure. The sixth and last factor (7.4%) is related to Affective experience, with high loadings from valence extremity, interoception and arousal. We acknowledge that this final factor solution did not reach simplicity, as two cases of complex loadings were identified: AoA loaded strongly on the Distributional and Visuotactile factors, whereas hand/arm action loaded strongly on the Visuotactile and Body Action factors. The Social Interaction factor correlated positively with the Affective and Distributional factors, and negatively with the Visuotactile factor. All inter-factor correlations are displayed in Fig. 2C.

The behavioural relevance of the latent factors
The standardized coefficients estimated in the regression analyses are illustrated in Fig. 3, and the associated statistics are summarized in Supplementary Table S3. In the analyses predicting LDT outcome variables, in general, the Distributional factor and the four embodied factors (i.e., Visuotactile, Body Action, Affective and Social Interaction) had facilitatory effects on behaviour, such that more semantic information (e.g., increased Visuotactile scores) was associated with faster and more accurate responses (except for the Visuotactile factor in Auditory LDT Error Rates and RT-Duration). In contrast, the Sub-lexical factor had inhibitory effects, such that higher word form similarity was associated with slower and less accurate responses (except for Auditory LDT RT-Duration). The same pattern of results was found in the analyses that included words with intermediate concreteness ratings (Supplementary Fig. S4, Table S4).
In the analyses predicting SDT, there were important differences in the way the factors were related to abstract and concrete decisions. While the Visuotactile and Sub-lexical dimensions had facilitatory effects on concrete decisions, they had inhibitory effects on abstract decisions. The Affective factor showed the opposite pattern, facilitating abstract decisions and inhibiting concrete decisions. The Social Interaction factor had facilitatory effects on abstract decisions but was not significantly related to concrete decisions. Body Action was not significantly related to any of the SDT outcome variables.

Semantic space as a network of interconnected dimensions
The semantic space modelled as a network of 18 lexical and semantic dimensions is illustrated in Fig. 4A, and the pairwise partial correlations are also summarized in Fig. 1. Post-hoc bootstrapping analyses confirmed that the strengths of the estimated edge-weights (i.e., the partial correlations) are reliable (see Supplementary Fig. S5). Visual inspection of the network structure suggests that the dimensions comprising the Sub-lexical factor are sparsely connected to the rest of the semantic space, and mainly via Distributional dimensions. The rest of the dimensions were relatively densely interconnected. With respect to the dimensions contributing to the Social Interaction factor, socialness is most strongly related to auditory information, which largely mediated its relationship with mouth action. Mouth action, in turn, mediated the relationship between the auditory modality and head action. Within the Visuotactile factor, the positive association between the visual modality and hand/arm action was mediated by the haptic dimension. All three Affective variables were directly and strongly interconnected. The strongest relationships within the Distributional dimensions were found between Frequency and ANS (positive), and Frequency and AoA (negative).
We computed node strength as an index of the importance of nodes in the estimated network, with higher values indicating variables that are more strongly directly related to the other lexical-semantic dimensions. The node strengths showed a correlation-stability coefficient of 0.75, suggesting high stability. Interoception was by far the most central dimension within the network, with a node strength significantly larger than all other dimensions (see Fig. 4B). It was followed by the haptic modality, which had significantly higher strength than 16 of the dimensions. All statistically significant differences between nodes' strengths are highlighted in Supplementary Fig. S6. Fig. 4C highlights the average strength between each variable and each of the six factors.

Discussion
Multiple representation accounts of conceptual knowledge have emphasized the crucial importance of properties derived from multiple sources, such as social experience, but it is not yet clear how these fit together into a single conceptual space. Therefore, we explored the organization of the semantic space underpinning concepts of all concreteness levels in a data-driven fashion in order to (1) uncover latent factors among its multiple dimensions, and (2) reveal where socialness fits within this space. We found that the 18 lexical and semantic properties of interest can be reduced to six higher-order factors reflecting Sub-lexical, Distributional, Visuotactile, Body Action, Affective and Social Interaction attributes. These higher-order factors were related to performance on lexical and semantic tasks, confirming that they capture important aspects of lexical-semantic processing. We further mapped out the complex web of pairwise relationships among the dimensions of interest, which highlighted the central role of interoceptive information. Moreover, within this space, socialness occupied a position closest to the auditory modality, as well as to mouth and head actions, as part of a higher-order factor that may reflect experiential learning from verbal social interactions. Altogether, these findings elucidate the structure of semantic space and point to new directions for future research, which we discuss in detail below.
Socialness, our main dimension of interest, was most related to variables reflecting embodied experience. Specifically, socialness clustered with the auditory modality, and with mouth and head actions. These latter three dimensions were also found to form a Communication component in the exploration by Dymarska, Connell, and Banks (2023), even though their analysis did not include a social dimension. This finding confirms that socialness, as defined here, can be classified as an embodied dimension of meaning, consistent with theories proposing a role for the social environment in the grounding of abstract concepts (Barsalou, 2020; Borghi et al., 2019). Indeed, certain low-level social abilities, such as understanding others' movements, might rely on specialized 'mirror neuron' mechanisms, wherein the same brain areas are engaged when performing an action and observing someone else perform that action (Bonini, Rotunno, Arcuri, and Gallese, 2022; Heyes and Catmur, 2022; but see Goldman and de Vignemont, 2009). This, however, does not preclude the possibility that non-embodied distributional aspects of the social world might also contribute to conceptual knowledge (see Johns, 2021a, 2021b).
Words with high scores on this Social Interaction factor seem to refer to social interactions of a verbal nature, like DISCUSSION and TALKATIVE. This finding aligns with the clustering of socialness with a dimension quantifying communicative tools/behaviours in the study by Binder et al. (2016). Moreover, it is in line with the proposal that language is itself a source of embodiment for concept knowledge (Dove, 2022; also see Davis and Yee, 2021). In this perspective, the production (e.g., mouth action) and perception (e.g., auditory stimulation) of language grant access to embodied representations that become indirectly linked to the words' meaning. For instance, the concept SCHOOL might not be learned only through multimodal experiences of schools, but also through sensorimotor experiences of talking or listening to others talk about schools. Given that language is often embedded in a social context (e.g., face-to-face communication), social information might become intrinsically linked to other embodied aspects of language experience. Borghi et al. (2019) proposed that this intertwined nature of linguistic and social interactions might manifest as a close link between words' social and mouth action properties. Our results confirm this prediction, and further show that this relationship is largely mediated by auditory properties, which represent another embodied aspect of language. The clustering of head action in this factor could be explained by the multimodal nature of verbal social interactions; during face-to-face communication, individuals exchange not only verbal information but also visual cues such as facial expressions and gestures (Murgiano, Motamedi, and Vigliocco, 2021). It has been proposed that the mouth action and auditory dimensions might also be related to inner speech, that is, covert communication with oneself (Dove, Barca, Tummolini, and Borghi, 2022). Inner speech often takes the form of dialogue rather than monologue (Alderson-Day et al. Fernyhough, 2011), which can perhaps also explain the clustering of socialness in this factor. Indeed, Borghi and Fernyhough (2022) proposed that inner speech could be an important mechanism for acquiring and understanding abstract concepts. Interestingly, the words with higher scores on this factor tended to have lower Visuotactile scores, perhaps suggesting that verbal social interactions are less important sources of visual and haptic conceptual properties. In sum, the resulting factor structure supports the idea that socialness contributes to meaning representation as an embodied aspect of verbal interactions.
It is important to note that the socialness norms used here do not distinguish between different types of social features (Diveica et al., 2023). It is likely that socialness is itself a multidimensional construct and that different types of social information make dissociable contributions to concept knowledge. The consideration of more specific social dimensions could potentially result in a different factor structure. Indeed, using a much smaller item set, Binder et al. (2016) quantified four fine-grained socially-relevant dimensions, namely Social (defined as an activity or event that involves an interaction between people), Communication (a thing or action that people use to communicate), Human (having human-like intentions, plans, or goals) and Self (related to one's own view of oneself), and found differences in the way each of these dimensions related to other experiential dimensions. While Social and Communication clustered together, the Human and Self dimensions reduced to separate latent factors. The Human dimension clustered with the dimensions face, body, speech, and biomotion, suggesting that concepts referring to social agents might represent a sub-type of social concepts. The Self dimension also clustered separately, with the dimensions needs, near (meaning often physically near to oneself in everyday life) and practice (a physical object one has personal experience using), suggesting that self-relevant concepts might be distinct from other-related social concepts. In addition to these and other narrow socialness definitions previously proposed (for more examples, see Pexman et al., 2023), the current results suggest a novel potential distinction between concepts referring to verbal and non-verbal social interactions. Future research that distinguishes between information derived from verbal and non-verbal social experiences would be useful in determining to what extent embodied aspects of language contribute to the relationships we have observed between socialness and other lexical and semantic variables.
Unlike in previous smaller-scale studies (Troche et al., 2014, 2017; Villani et al., 2019), the social and affective dimensions did not reduce to a common latent factor. This is consistent with the finding that, when independently manipulated, social and valenced words are associated with partially dissociable neural correlates (Wang, Wang, and Bi, 2019; also see Arioli, Gianelli, and Canessa, 2021). Moreover, impaired social word processing but preserved emotional word processing has been reported in case studies of patients with neurodegenerative disorders (Catricalà, Della Rosa, Plebani, Vigliocco, and Cappa, 2014; also see Catricalà et al., 2021) and localized brain lesions (Wang et al., 2021). Nevertheless, although socialness and the affective dimensions clustered separately, the words with higher Affective scores tended to have higher Social Interaction scores, suggesting a link between social and emotional experience. It has been suggested that this might be explained by their similar reliance on brain regions involved in hedonic evaluation (Rijpma et al., 2023). Future research that explores the hedonic value of concepts' referents could thus provide further insights into the relationship between social and affective dimensions of meaning.
The affective dimensions, valence extremity and arousal, clustered with the interoceptive modality. This is perhaps unsurprising given proposals that interoception, which refers to the processing of sensory signals from within the body (e.g., heartbeat), is the basis of emotional experience (Critchley and Garfinkel, 2017; Quigley, Kanoski, Grill, Barrett, and Tsakiris, 2021). Compared to the exteroceptive perceptual modalities, the role of interoception in conceptual representation has received little attention. Nevertheless, recent evidence suggests that, just like exteroception, interoception contributes to the perceptual grounding of concepts (Connell, Lynott, and Banks, 2018; Villani, Lugli, Liuzza, Nicoletti, and Borghi, 2021; also see Borghi et al., 2019). We found that interoception was the most central dimension in the semantic space, having the strongest total direct connections and showing associations with all latent factors except for the Sub-lexical factor. This finding is consistent with the proposal that interoception plays an important role in concept knowledge. The clustering of the interoceptive and exteroceptive dimensions onto different factors, as well as the negative relationship between Affective and Visuotactile scores, might reflect a distinction between experiences that are internal vs external to the self. Such an 'internality/externality' dimension has been found to explain variation in the neural patterns associated with individual words (Vargas and Just, 2020). The potential importance of this distinction could be elucidated by future research that investigates how other inner experiences that have been proposed to play a role in conceptual representation, like non-emotional mental states and metacognition (Barsalou, 2020; Borghi et al., 2019; Shea, 2018), fit within the semantic space.
The modality- and effector-specific sensorimotor dimensions clustered onto four separate higher-order factors. Two of these factors were composed purely of sensorimotor properties and did not reflect a perceptual vs motor distinction: the visual and haptic modalities clustered with hand/arm action into a Visuotactile factor, whereas torso and foot/leg action clustered into a Body Action factor. This finding suggests that there are important distinctions even within sensorimotor representations (also see Dymarska et al., 2023; Muraki et al., 2020). Indeed, these two factors related to behaviour differently: while the Visuotactile factor showed a significant relationship with both LDT and SDT responses, Body Action was related to LDT responses but showed no significant relationships with abstract and concrete SDT decisions. This might suggest that Body Action information is not helpful when deciding whether a word is abstract or concrete. In contrast, the pattern of associations between the Visuotactile factor and SDT decisions might reflect the tendency for concreteness ratings to be highly influenced by the degree of visual information associated with a word's referent (Brysbaert et al., 2014; Connell and Lynott, 2012) and, therefore, visual sensory information may be more diagnostic of whether a word is abstract or concrete. The Visuotactile factor may also reflect the experience of grasping words' referents, which seems to be more strongly related to word processing than other motor aspects of body-object interactions (Heard, Madan, Protzner, and Pexman, 2019). Together with previous reports of modality-specific effects on the behavioural and neural correlates of word processing (e.g., Connell and Lynott, 2014; Kuhnke et al., 2020) and effector-specific simulation mechanisms (Muraki, Dahm, and Pexman, 2023), our results emphasise the need to investigate the fine-grained sensorimotor dimensions of concept knowledge more thoroughly. Nevertheless, the Visuotactile and Body Action factors were correlated, perhaps reflecting the fact that seeing and touching a concept's referent with one's body are often intertwined experiences.
A general pattern that arose was the separation of the distributional dimensions and the embodied dimensions into different higher-order factors. This is consistent with previous findings (Dymarska et al., 2023; Muraki et al., 2020) and suggests that distributional and embodied dimensions capture qualitatively different aspects of meaning. Notably, the distinction between the Distributional and Social Interaction factors might suggest that language experience makes two dissociable contributions to conceptual representation: purely linguistic distributional information and embodied information derived from verbal social exchanges. Indeed, both distributional and embodied higher-order factors, including Social Interaction, independently contributed to LDT and SDT responses, indicative of complementary roles in lexical-semantic processing. Therefore, together with prior research (Andrews, Vigliocco, and Vinson, 2009; Banks et al., 2021; Louwerse and Jeuniaux, 2010; Muraki et al., 2020), our study provides evidence for weak embodiment, or hybrid, theories of concept knowledge, which include multiple representation theories and posit that semantic knowledge is derived from both embodied experience and distributional linguistic properties (Andrews et al., 2014; Dove, 2011; Louwerse, 2018; Meteyard et al., 2012). The distributional dimensions separated into two factors that displayed higher correlations with each other than with the embodied dimensions: OLD and PLD clustered onto a Sub-lexical factor, while Frequency, AoA, SemD and ANS clustered into a separate Distributional factor. The Sub-lexical factor capturing wordform properties was found on the outskirts of semantic space, perhaps suggesting that it primarily contributes to word perception rather than word meaning, as proposed in models of word recognition (Rastle, 2016). In contrast, the Distributional factor was more tightly coupled with other semantic dimensions, indicative of greater contribution to word meaning.
The higher-order factors identified are behaviourally relevant, as demonstrated by the finding that they accounted for unique variance in lexical-semantic performance across three different tasks. Their relationship to LDT responses was mainly facilitatory in nature (except for the Sub-lexical factor), in line with semantic richness effects whereby words with richer meanings (e.g., more sensorimotor features) are processed more efficiently (for a review, see Pexman, 2012). This facilitation is thought to arise from stronger feedback from semantic to orthographic/phonologic representations (Pexman et al., 2002). Importantly, the higher-order factors were related to concrete and abstract SDT decisions in different ways, highlighting fundamental differences in how concrete and abstract concepts are processed. As expected (e.g., Banks and Connell, 2022; Newcombe et al., 2012), higher Visuotactile scores facilitated concrete decisions, but inhibited abstract decisions, suggesting that visual and haptic properties are diagnostic of concrete concepts. In contrast, higher Affective scores facilitated abstract decisions, but inhibited concrete decisions, suggesting that affective information is diagnostic of abstract concepts. This is in line with the Affective Embodiment Account, which proposes that, while concrete concepts are grounded through sensorimotor experience, abstract concepts are grounded through emotional experience (Kousta et al., 2011; Vigliocco et al., 2014), and with findings that interoception contributes more to abstract concepts, and to emotion concepts in particular, compared to concrete concepts (Connell et al., 2018). Higher Social Interaction scores facilitated abstract decisions but were unrelated to concrete decisions. This finding is consistent with the proposal by Borghi et al. (2019) that linguistic and social interactions are the primary means of acquiring abstract words, and, hence, abstract words, compared to concrete words, are associated with more linguistic and social attributes that can facilitate their processing (for a discussion on the relationship between social interaction and abstractness, see Borghi, 2023). However, a recent study investigating the effect of socialness, by itself, on lexical-semantic performance found the opposite pattern and mixed evidence in favour of greater contribution of socialness to abstract concepts (Diveica et al., 2024). This discrepancy highlights the need to further examine how social and embodied aspects of language experience jointly, rather than independently, support conceptual representations. Given differences in the relevance of these higher-order factors for concrete and abstract SDT decisions, future research should explore whether the organization of semantic space differs between these two concept types. Considering more specific aspects of social experience (see Pexman et al., 2023) could prove particularly beneficial in this endeavour as it is possible that concrete and abstract concepts are related to different aspects of the social environment. Overall, the finding that multiple latent factors simultaneously contributed to performance on each task supports the proposal that conceptual knowledge is derived from a combination of sensorimotor, linguistic, affective and social experiences (e.g., Barsalou, 2008; Borghi et al., 2019).

Conclusion
We have conducted a large-scale exploration of semantic space, encompassing 18 variables and over 6000 words, and found clusters of related semantic dimensions that correspond to different types of concept knowledge: distributional, visuotactile, body action, social interaction, and affective. The concurrent contribution of all these types of information to word meaning can be explained by theories of conceptual knowledge that assume multiple interdependent representational systems (e.g., Andrews et al., 2014; Borghi and Binkofski, 2014; Connell, 2019; Lambon Ralph et al., 2017). The novel relationships observed between the recently collected socialness norms (Diveica et al., 2023) and established aspects of word meaning suggest that socialness contributes to concept knowledge as an embodied aspect of language experience. Overall, the current results inform multiple representation theories by elucidating how different semantic dimensions might be both related and distinct, and highlight promising directions for future research.

Fig. 1. Correlations between the 18 lexical and semantic dimensions among 6339 words. The strength and direction of the product-moment correlation coefficients are indicated by the colour and the numerical values. This correlation matrix is asymmetric. The bottom right corner, highlighted in blue, displays zero-order correlations. Only correlations significant at p < .01 are shown. The top left corner, highlighted in red, displays partial correlations estimated via network analysis, which quantifies pairwise correlations while controlling for all other variables; these values correspond to the line thickness and colour of the edge weights in Fig. 4A. All non-zero correlation coefficients are shown (note that the network analysis does not compute p-values). AoA = age of acquisition; ANS = average neighbourhood similarity; SemD = semantic diversity; PLD = phonologic Levenshtein distance; OLD = orthographic Levenshtein distance. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 2. Results of exploratory factor analysis of 18 lexical and semantic variables based on a sample of 6339 words. Panel A: the loadings (pattern coefficients) of the 18 variables onto six latent factors. Only factor loadings greater than 0.3 are displayed (for all loadings, see Supplementary Table S2). Bar colour and length indicate the strength of the loading. Panel B: parallel analysis scree plot showing eigenvalues by number of factors based on actual lexical and semantic variable data, simulated data, and resampled data. Panel C: pair-wise zero-order correlations of factor scores. Panel D: kernel density plots of factor scores. AoA = age of acquisition; ANS = average neighbourhood similarity; SemD = semantic diversity; PLD = phonologic Levenshtein distance; OLD = orthographic Levenshtein distance. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 3. The relationship between the six latent factors and performance in lexical and semantic tasks. The magnitude and signs of the standardized regression coefficients are indicated by the colour of the squares and the numerical values. The adjusted R², quantifying the proportion of variance explained altogether by the predictors, is provided for each behavioural outcome. Note that these analyses included additional control predictors; see Section 2.2.2. Panel A. Results of analyses conducted on all overlapping words (n = 2431). In the case of the Auditory LDT, only the results of the analyses on stimuli pronounced with a US accent are displayed. For the results of stimuli pronounced with a UK accent, see Supplementary Table S2. Panel B. Results of the separate analyses of SDT performance on concrete (n = 1161) and abstract (n = 1270) decisions/words. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. Network model describing the relationships between 18 lexical and semantic variables over 6339 words. Panel A. Network structure. Line thickness is proportional to the edge strength, which quantifies the magnitude of the partial correlations between node pairs. Line colours indicate the direction of the correlation: purple lines correspond to a positive correlation and orange lines to a negative correlation. The nodes are coloured according to factors identified in the exploratory factor analysis (see Section 3.1). Panel B. Node centrality as indexed by strength (i.e., the sum of the absolute edge-weights of each node's direct connections); significant differences between node pairs are highlighted in Supplementary Fig. S6. Panel C. The mean absolute strength of the estimated partial correlations between each of the 18 variables and the dimensions comprising each latent factor. The magnitude of the mean connection strength is indicated by the colour of the squares, with darker colours indicating stronger mean connections. The black boxes highlight mean intra-factor connections. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 1
The ten words with the highest and lowest scores on each of the six latent dimensions.