Abstract
Semantic property listing tasks require participants to generate short propositions (e.g., <barks>, <has fur>) for a specific concept (e.g., DOG). This task is the cornerstone of the creation of semantic property norms, which are essential for modeling, stimuli creation, and understanding similarity between concepts. Despite the wide applicability of semantic property norms for a large variety of concepts across different groups of people, the methodological aspects of the property listing task have received less attention, even though the procedure and processing of the data can substantially affect the nature and quality of the measures derived from them. The goal of this paper is to provide a practical primer on how to collect and process semantic property norms. We will discuss the key methods to elicit semantic properties and compare different methods to derive meaningful representations from them. This will cover the role of instructions and test context, property preprocessing (e.g., lemmatization), property weighting, and relationship encoding using ontologies. With these choices in mind, we propose and demonstrate a processing pipeline that transparently documents these steps, resulting in improved comparability across different studies. The impact of these choices will be demonstrated using intrinsic (e.g., reliability, number of properties) and extrinsic measures (e.g., categorization, semantic similarity, lexical processing). This practical primer will offer potential solutions to several long-standing problems and allow researchers to develop new property listing norms overcoming the constraints of previous studies.
Notes
Throughout this article, <features> will be distinguished from CUES using angular brackets and italic font.
For transparency, the updated csv file should be renamed; renaming also prevents the adjustments from being overwritten if the code is rerun. The renamed csv should then be loaded as spelling.dict to continue with the code below.
We mainly focus on lemmatization and do not stem the words, because stemming introduces additional ambiguity. More specifically, stemming processes words using heuristics that remove affixes or inflections, such as ing or s. The resulting stem or root may not be an actual word in the language, as simply removing an affix does not necessarily produce the lemma. For example, in response to AIRPLANE, <flying> can be easily converted to <fly> by removing the ing inflection. However, the same heuristic converts the feature <wings> into <w> after removing both the s as a plural marker and the ing as a participle marker.
These results were lemmatized by creating a lookup dictionary from the features listed in the Buchanan et al. (2019) norms.
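The contrast drawn in the notes above between heuristic stemming and lookup-based lemmatization can be illustrated with a minimal Python sketch. The suffix list used by `naive_stem` and the entries in `LEMMA_LOOKUP` are hypothetical and were chosen only to reproduce the AIRPLANE example; this is not the authors' code, and a real lookup dictionary would be built from the features listed in a norm set such as Buchanan et al. (2019).

```python
# Toy suffix list (an illustrative assumption, not a real stemming algorithm).
SUFFIXES = ("ing", "s")

# Hypothetical lookup entries; a real dictionary would come from a norm set.
LEMMA_LOOKUP = {
    "flying": "fly",
    "wings": "wing",
    "barks": "bark",
    "has": "have",
}


def naive_stem(word: str) -> str:
    """Repeatedly strip any listed suffix from the end of the word."""
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            # Only strip when something would remain of the word.
            if word.endswith(suffix) and len(word) > len(suffix):
                word = word[: -len(suffix)]
                stripped = True
    return word


def lemmatize(feature: str, lookup: dict) -> str:
    """Replace each token by its lemma; unknown tokens are kept unchanged."""
    tokens = feature.lower().split()
    return " ".join(lookup.get(token, token) for token in tokens)


print(naive_stem("flying"))               # fly
print(naive_stem("wings"))                # w
print(lemmatize("wings", LEMMA_LOOKUP))   # wing
```

The heuristic happens to work for <flying> but reduces <wings> to <w>, whereas the lookup returns an actual word for every entry it contains and leaves unlisted tokens untouched, so any remaining gaps are visible and can be corrected by extending the dictionary.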
References
Aust F, Barth M (2017) papaja: create APA manuscripts with R Markdown. https://github.com/crsh/papaja. Accessed 15 Oct 2019
Baroni M, Murphy B, Barbu E, Poesio M (2010) Strudel: a corpus-based semantic model based on properties and types. Cognit Sci 34(2):222–254. https://doi.org/10.1111/j.1551-6709.2009.01068.x
Benoit K, Muhr D, Watanabe K (2017) stopwords: multilingual stopword lists. https://cran.r-project.org/web/packages/stopwords/index.html. Accessed 15 Oct 2019
Bruni E, Tran NK, Baroni M (2014) Multimodal distributional semantics. J Artif Intell Res 49:1–47. https://doi.org/10.1613/jair.4135
Brysbaert M, Warriner AB, Kuperman V (2014) Concreteness ratings for 40 thousand generally known English word lemmas. Behav Res Methods 46(3):904–911. https://doi.org/10.3758/s13428-013-0403-5
Buchanan EM, Holmes JL, Teasley ML, Hutchison KA (2013) English semantic word-pair norms and a searchable web portal for experimental stimulus creation. Behav Res Methods 45(3):746–757. https://doi.org/10.3758/s13428-012-0284-z
Buchanan EM, Valentine KD, Maxwell NP (2019) English semantic feature production norms: an extended database of 4436 concepts. Behav Res Methods 51(4):1849–1863. https://doi.org/10.3758/s13428-019-01243-z
Caramazza A, Laudanna A, Romani C (1988) Lexical access and inflectional morphology. Cognition 28(3):297–332. https://doi.org/10.1016/0010-0277(88)90017-0
Catricalà E, Della Rosa PA, Plebani V, Perani D, Garrard P, Cappa SF (2015) Semantic feature degradation and naming performance. Evidence from neurodegenerative disorders. Brain Lang 147:58–65. https://doi.org/10.1016/J.BANDL.2015.05.007
Collins AM, Quillian MR (1969) Retrieval time from semantic memory. J Verbal Learn Verbal Behav 8(2):240–247. https://doi.org/10.1016/S0022-5371(69)80069-1
Cree GS, McRae K (2003) Analyzing the factors underlying the structure and computation of the meaning of chipmunk, cherry, chisel, cheese, and cello (and many other such concrete nouns). J Exp Psychol Gen 132(2):163–201. https://doi.org/10.1037/0096-3445.132.2.163
De Deyne S, Verheyen S, Ameel E, Vanpaemel W, Dry MJ, Voorspoels W, Storms G (2008) Exemplar by feature applicability matrices and other Dutch normative data for semantic concepts. Behav Res Methods 40(4):1030–1048. https://doi.org/10.3758/BRM.40.4.1030
De Queiroz G, Hvitfeldt E, Keyes O, Misra K, Mastny T, Erickson J et al (2019) tidytext: text mining using ’dplyr’, ’ggplot2’, and other tidy tools. https://cran.r-project.org/web/packages/tidytext/index.html. Accessed 15 Oct 2019
Devereux BJ, Tyler LK, Geertzen J, Randall B (2014) The centre for speech, language and the brain (CSLB) concept property norms. Behav Res Methods 46(4):1119–1127. https://doi.org/10.3758/s13428-013-0420-4
Duarte LR, Marquié L, Marquié JC, Terrier P, Ousset PJ (2009) Analyzing feature distinctiveness in the processing of living and non-living concepts in Alzheimer’s disease. Brain Cognit 71(2):108–117. https://doi.org/10.1016/j.bandc.2009.04.007
Fairhall SL, Caramazza A (2013) Category-selective neural substrates for person- and place-related concepts. Cortex 49(10):2748–2757. https://doi.org/10.1016/j.cortex.2013.05.010
Farah MJ, McClelland JL (1991) A computational model of semantic memory impairment: modality specificity and emergent category specificity. J Exp Psychol Gen 120(4):339–357. https://doi.org/10.1037/0096-3445.120.4.339
Gagolewski M, Tartanus B (2019) stringi: character string processing facilities. https://cran.r-project.org/web/packages/stringi/index.html. Accessed 15 Oct 2019
Garrard P, Lambon Ralph MA, Hodges JR, Patterson K (2001) Prototypicality, distinctiveness, and intercorrelation: analyses of the semantic attributes of living and nonliving concepts. Cognit Neuropsychol 18(2):125–174. https://doi.org/10.1080/02643290125857
Humphreys GW, Forde EM (2001) Hierarchies, similarity, and interactivity in object recognition: "category-specific" neuropsychological deficits. Behav Brain Sci 24(3):453–476
Jackendoff R (1992) Semantic structures. MIT Press, Boston
Jackendoff R (2002) Foundations of language (brain, meaning, grammar, evolution). Oxford University Press, Oxford
Jones LL, Golonka S (2012) Different influences on lexical priming for integrative, thematic, and taxonomic relations. Front Hum Neurosci 6:205. https://doi.org/10.3389/fnhum.2012.00205
Kremer G, Baroni M (2011) A set of semantic norms for German and Italian. Behav Res Methods 43(1):97–109. https://doi.org/10.3758/s13428-010-0028-x
Lebani GE, Lenci A, Bondielli A (2016) You are what you do: an empirical characterization of the semantic content of the thematic roles for a group of Italian verbs. J Cognit Sci 16(4):401–430. https://doi.org/10.17791/jcs.2015.16.4.401
Lenci A, Baroni M, Cazzolli G, Marotta G (2013) BLIND: a set of semantic feature norms from the congenitally blind. Behav Res Methods 45(4):1218–1233. https://doi.org/10.3758/s13428-013-0323-4
Marques JF, Fonseca FL, Morais S, Pinto IA (2007) Estimated age of acquisition norms for 834 Portuguese nouns and their relation with other psycholinguistic variables. Behav Res Methods 39(3):439–444. https://doi.org/10.3758/BF03193013
McRae K, Cree GS, Seidenberg MS, McNorgan C (2005) Semantic feature production norms for a large set of living and nonliving things. Behav Res Methods 37(4):547–559. https://doi.org/10.3758/BF03192726
Michalke M (2018) koRpus: an R package for text analysis. https://cran.r-project.org/web/packages/koRpus/index.html. Accessed 15 Oct 2019
Minsky M (1975) A framework for representing knowledge. In: Winston PH (ed) The psychology of computer vision. McGraw-Hill, New York, pp 211–277
Montefinese M, Ambrosini E, Fairfield B, Mammarella N (2013) Semantic memory: a feature-based analysis and new norms for Italian. Behav Res Methods 45(2):440–461. https://doi.org/10.3758/s13428-012-0263-4
Montefinese M, Ambrosini E, Fairfield B, Mammarella N (2014) Semantic significance: a new measure of feature salience. Mem Cognit 42(3):355–369. https://doi.org/10.3758/s13421-013-0365-y
Montefinese M, Zannino GD, Ambrosini E (2015) Semantic similarity between old and new items produces false alarms in recognition memory. Psychol Res 79(5):785–794. https://doi.org/10.1007/s00426-014-0615-z
Montefinese M, Vinson D, Ambrosini E (2018) Recognition memory and featural similarity between concepts: the pupil’s point of view. Biol Psychol 135:159–169. https://doi.org/10.1016/J.BIOPSYCHO.2018.04.004
Norman DA, Rumelhart DE (1975) Explorations in cognition. Freeman, San Francisco
Nosek BA, Alter G, Banks GC, Borsboom D, Bowman SD, Breckler SJ, Yarkoni T (2015) Promoting an open research culture. Science 348(6242):1422–1425. https://doi.org/10.1126/science.aab2374
Ooms J (2018) The hunspell package: high-performance stemmer, tokenizer, and spell checker for R. https://cran.r-project.org/web/packages/hunspell/. Accessed 15 Oct 2019
Peng RD (2011) Reproducible research in computational science. Science 334(6060):1226–1227. https://doi.org/10.1126/science.1213847
Pexman PM, Hargreaves IS, Siakaluk PD, Bodner GE, Pope J (2008) There are many ways to be rich: effects of three measures of semantic richness on visual word recognition. Psychon Bull Rev 15(1):161–167. https://doi.org/10.3758/PBR.15.1.161
Plaut DC (2002) Graded modality-specific specialisation in semantics: a computational account of optic aphasia. Cognit Neuropsychol 19(7):603–639. https://doi.org/10.1080/02643290244000112
Recchia G, Jones MN (2012) The semantic richness of abstract concepts. Front Hum Neurosci 6:315. https://doi.org/10.3389/fnhum.2012.00315
Rogers TT, Lambon Ralph MA, Garrard P, Bozeat S, McClelland JL, Hodges JR, Patterson K (2004) Structure and deterioration of semantic memory: a neuropsychological and computational investigation. Psychol Rev 111(1):205–235. https://doi.org/10.1037/0033-295X.111.1.205
Rosch E, Mervis CB (1975) Family resemblances: studies in the internal structure of categories. Cognit Psychol 7(4):573–605. https://doi.org/10.1016/0010-0285(75)90024-9
Ruts W, De Deyne S, Ameel E, Vanpaemel W, Verbeemen T, Storms G (2004) Dutch norm data for 13 semantic categories and 338 exemplars. Behav Res Methods Instrum Comput 36(3):506–515. https://doi.org/10.3758/BF03195597
Saffran E, Sholl A (1999) Clues to the function and neural architecture of word meaning. In: Hagoort P, Brown C (eds) The neurocognition of language. Oxford University Press, Oxford
Santos A, Chaigneau SE, Simmons WK, Barsalou LW (2011) Property generation reflects word association and situated simulation. Lang Cognit 3(1):83–119. https://doi.org/10.1515/langcog.2011.004
Sartori G, Lombardi L (2004) Semantic relevance and semantic disorders. J Cognit Neurosci 16(3):439–452. https://doi.org/10.1162/089892904322926773
Schmid H (1994) Probabilistic part of speech tagging using decision trees. https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/tree-tagger1.pdf
Smith E, Medin DL (1981) Categories and concepts, vol 9. Harvard University Press, Cambridge
Smith EE, Shoben EJ, Rips LJ (1974) Structure and process in semantic memory: a featural model for semantic decisions. Psychol Rev 81(3):214–241. https://doi.org/10.1037/h0036351
Ushey K, McPherson J, Cheng J, Atkins A, Allaire J (2018) packrat: a dependency management system for projects and their R package dependencies. https://cran.r-project.org/web/packages/packrat/index.html. Accessed 15 Oct 2019
Vigliocco G, Vinson DP, Lewis W, Garrett MF (2004) Representing the meanings of object and action words: the featural and unitary semantic space hypothesis. Cognit Psychol 48(4):422–488. https://doi.org/10.1016/j.cogpsych.2003.09.001
Vinson DP, Vigliocco G (2008) Semantic feature production norms for a large set of objects and events. Behav Res Methods 40(1):183–190. https://doi.org/10.3758/BRM.40.1.183
Vivas J, Vivas L, Comesaña A, Coni AG, Vorano A (2017) Spanish semantic feature production norms for 400 concrete concepts. Behav Res Methods 49(3):1095–1106. https://doi.org/10.3758/s13428-016-0777-2
Wickham H, François R, Henry L, Müller K, RStudio (2019) dplyr: a grammar of data manipulation. https://cloud.r-project.org/web/packages/dplyr/index.html. Accessed 15 Oct 2019
Wiemer-Hastings K, Xu X (2005) Content differences for abstract and concrete concepts. Cognit Sci 29(5):719–736. https://doi.org/10.1207/s15516709cog0000_33
Wu L-L, Barsalou LW (2009) Perceptual simulation in conceptual combination: evidence from property generation. Acta Psychol 132(2):173–189. https://doi.org/10.1016/j.actpsy.2009.02.002
Zannino GD, Perri R, Pasqualetti P, Caltagirone C, Carlesimo GA (2006a) Analysis of the semantic representations of living and nonliving concepts: a normative study. Cognit Neuropsychol 23(4):515–540. https://doi.org/10.1080/02643290542000067
Zannino GD, Perri R, Pasqualetti P, Caltagirone C, Carlesimo GA (2006b) (Category-specific) semantic deficit in Alzheimer’s patients: the role of semantic distance. Neuropsychologia 44(1):52–61. https://doi.org/10.1016/j.neuropsychologia.2005.04.008
Acknowledgements
We would like to thank the editor and two anonymous reviewers for their helpful comments in shaping this manuscript.
Funding
This work was supported by the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Grant Agreement No. 702655 and by the University of Padua (SID 2018) to MM.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee (include name of committee + reference number) and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
Additional information
Guest-editor: Barry Devereux (Queen’s University Belfast); Reviewers: Anna Rogers (University of Massachusetts Lowell) and a second researcher who prefers to remain anonymous.
This manuscript is part of the special topic on ‘Eliciting Semantic Properties: Methods and Applications’ guest-edited by Enrico Canessa, Sergio Chaigneau, Barry Devereux, and Alessandro Lenci.
Cite this article
Buchanan, E.M., De Deyne, S. & Montefinese, M. A practical primer on processing semantic property norm data. Cogn Process 21, 587–599 (2020). https://doi.org/10.1007/s10339-019-00939-6