A practical primer on processing semantic property norm data

Buchanan, Erin M.; De Deyne, Simon; Montefinese, Maria

doi:10.1007/s10339-019-00939-6

Research Article
Published: 25 November 2019

Volume 21, pages 587–599, (2020)
Cognitive Processing

Abstract

Semantic property listing tasks require participants to generate short propositions (e.g., <barks>, <has fur>) for a specific concept (e.g., DOG). This task is the cornerstone of the creation of semantic property norms which are essential for modeling, stimuli creation, and understanding similarity between concepts. Despite the wide applicability of semantic property norms for a large variety of concepts across different groups of people, the methodological aspects of the property listing task have received less attention, even though the procedure and processing of the data can substantially affect the nature and quality of the measures derived from them. The goal of this paper is to provide a practical primer on how to collect and process semantic property norms. We will discuss the key methods to elicit semantic properties and compare different methods to derive meaningful representations from them. This will cover the role of instructions and test context, property preprocessing (e.g., lemmatization), property weighting, and relationship encoding using ontologies. With these choices in mind, we propose and demonstrate a processing pipeline that transparently documents these steps, resulting in improved comparability across different studies. The impact of these choices will be demonstrated using intrinsic (e.g., reliability, number of properties) and extrinsic measures (e.g., categorization, semantic similarity, lexical processing). This practical primer will offer potential solutions to several long-standing problems and allow researchers to develop new property listing norms overcoming the constraints of previous studies.

Notes

Throughout this article, <features> will be distinguished from CUES using angular brackets and italic font.
A packrat project compilation is available on GitHub for reproducibility (Ushey et al. 2018), and this manuscript was written in Rmarkdown with papaja (Aust and Barth 2017).
For transparency, the updated csv file should be renamed, which also practically keeps one from overwriting their adjustments if they rerun their code. The csv should be loaded as spelling.dict to continue with the code below.
We mainly focus on lemmatization and do not stem words, because stemming introduces additional ambiguity. More specifically, stemming involves processing words using heuristics to remove affixes or inflections, such as ing or s. The stem or root word may not reflect an actual word in the language, as simply removing an affix does not necessarily produce the lemma. For example, in response to AIRPLANE, <flying> can be easily converted to <fly> by removing the ing inflection. However, this same heuristic converts the feature <wings> into <w> after removing both the s for a plural marker and the ing for a participle marker.
These results were lemmatized by creating a lookup dictionary from the features listed in the Buchanan et al. (2019) norms.
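The contrast between stemming and lookup-based lemmatization described in the notes above can be sketched in a few lines. This is an illustration only, not the authors' pipeline: it is written in Python rather than the R tooling the article cites, and the function names, the two-rule stemmer, and the toy lookup entries are all assumptions made for the example.

```python
# Illustrative sketch: a deliberately naive suffix-stripping stemmer compared
# with a lookup-based lemmatizer. All names and entries are hypothetical.

def naive_stem(word: str) -> str:
    """Strip a trailing plural 's', then a trailing 'ing', with no checks."""
    if word.endswith("s"):
        word = word[:-1]
    if word.endswith("ing"):
        word = word[:-3]
    return word


# Toy lookup dictionary mapping listed features to lemmas (assumed entries).
LEMMA_LOOKUP = {"flying": "fly", "wings": "wing"}


def lookup_lemma(word: str) -> str:
    """Return the lemma from the lookup table, or the word unchanged if absent."""
    return LEMMA_LOOKUP.get(word, word)


for feature in ["flying", "wings"]:
    print(feature, naive_stem(feature), lookup_lemma(feature))
```

With these two rules the stemmer reduces <flying> to <fly> but collapses <wings> to <w>, whereas the lookup table returns <wing>. A lookup leaves any feature it does not contain unchanged, so its coverage depends entirely on the dictionary it is built from.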

References

Aust F, Barth M (2017) papaja: create APA manuscripts with R Markdown. https://github.com/crsh/papaja. Accessed 15 Oct 2019
Baroni M, Murphy B, Barbu E, Poesio M (2010) Strudel: a corpus-based semantic model based on properties and types. Cognit Sci 34(2):222–254. https://doi.org/10.1111/j.1551-6709.2009.01068.x
Benoit K, Muhr D, Watanabe K (2017) stopwords: multilingual stopword lists. https://cran.r-project.org/web/packages/stopwords/index.html. Accessed 15 Oct 2019
Bruni E, Tran NK, Baroni M (2014) Multimodal distributional semantics. J Artif Intell Res 49:1–47. https://doi.org/10.1613/jair.4135
Brysbaert M, Warriner AB, Kuperman V (2014) Concreteness ratings for 40 thousand generally known English word lemmas. Behav Res Methods 46(3):904–911. https://doi.org/10.3758/s13428-013-0403-5
Buchanan EM, Holmes JL, Teasley ML, Hutchison KA (2013) English semantic word-pair norms and a searchable web portal for experimental stimulus creation. Behav Res Methods 45(3):746–757. https://doi.org/10.3758/s13428-012-0284-z
Buchanan EM, Valentine KD, Maxwell NP (2019) English semantic feature production norms: an extended database of 4436 concepts. Behav Res Methods 51(4):1849–1863. https://doi.org/10.3758/s13428-019-01243-z
Caramazza A, Laudanna A, Romani C (1988) Lexical access and inflectional morphology. Cognition 28(3):297–332. https://doi.org/10.1016/0010-0277(88)90017-0
Catricalà E, Della Rosa PA, Plebani V, Perani D, Garrard P, Cappa SF (2015) Semantic feature degradation and naming performance. Evidence from neurodegenerative disorders. Brain Lang 147:58–65. https://doi.org/10.1016/J.BANDL.2015.05.007
Collins AM, Quillian MR (1969) Retrieval time from semantic memory. J Verbal Learn Verbal Behav 8(2):240–247. https://doi.org/10.1016/S0022-5371(69)80069-1
Cree GS, McRae K (2003) Analyzing the factors underlying the structure and computation of the meaning of chipmunk, cherry, chisel, cheese, and cello (and many other such concrete nouns). J Exp Psychol Gen 132(2):163–201. https://doi.org/10.1037/0096-3445.132.2.163
De Deyne S, Verheyen S, Ameel E, Vanpaemel W, Dry MJ, Voorspoels W, Storms G (2008) Exemplar by feature applicability matrices and other Dutch normative data for semantic concepts. Behav Res Methods 40(4):1030–1048. https://doi.org/10.3758/BRM.40.4.1030
De Queiroz G, Hvitfeldt E, Keyes O, Misra K, Mastny T, Erickson J et al (2019) tidytext: text mining using ’dplyr’, ’ggplot2’, and other tidy tools. https://cran.r-project.org/web/packages/tidytext/index.html. Accessed 15 Oct 2019
Devereux BJ, Tyler LK, Geertzen J, Randall B (2014) The centre for speech, language and the brain (CSLB) concept property norms. Behav Res Methods 46(4):1119–1127. https://doi.org/10.3758/s13428-013-0420-4
Duarte LR, Marquié L, Marquié JC, Terrier P, Ousset PJ (2009) Analyzing feature distinctiveness in the processing of living and non-living concepts in Alzheimer’s disease. Brain Cognit 71(2):108–117. https://doi.org/10.1016/j.bandc.2009.04.007
Fairhall SL, Caramazza A (2013) Category-selective neural substrates for person- and place-related concepts. Cortex 49(10):2748–2757. https://doi.org/10.1016/j.cortex.2013.05.010
Farah MJ, McClelland JL (1991) A computational model of semantic memory impairment: modality specificity and emergent category specificity. J Exp Psychol Gen 120(4):339–357. https://doi.org/10.1037/0096-3445.120.4.339
Gagolewski M, Tartanus B (2019) stringi: character string processing facilities. https://cran.r-project.org/web/packages/stringi/index.html. Accessed 15 Oct 2019
Garrard P, Lambon Ralph MA, Hodges JR, Patterson K (2001) Prototypicality, distinctiveness, and intercorrelation: analyses of the semantic attributes of living and nonliving concepts. Cognit Neuropsychol 18(2):125–174. https://doi.org/10.1080/02643290125857
Humphreys GW, Forde EM (2001) Hierarchies, similarity, and interactivity in object recognition: "category-specific" neuropsychological deficits. Behav Brain Sci 24(3):453–476
Jackendoff R (1992) Semantic structures. MIT Press, Boston
Jackendoff R (2002) Foundations of language (brain, meaning, grammar, evolution). Oxford University Press, Oxford
Jones LL, Golonka S (2012) Different influences on lexical priming for integrative, thematic, and taxonomic relations. Front Hum Neurosci 6:205. https://doi.org/10.3389/fnhum.2012.00205
Kremer G, Baroni M (2011) A set of semantic norms for German and Italian. Behav Res Methods 43(1):97–109. https://doi.org/10.3758/s13428-010-0028-x
Lebani GE, Lenci A, Bondielli A (2016) You are what you do: an empirical characterization of the semantic content of the thematic roles for a group of Italian verbs. J Cognit Sci 16(4):401–430. https://doi.org/10.17791/jcs.2015.16.4.401
Lenci A, Baroni M, Cazzolli G, Marotta G (2013) BLIND: a set of semantic feature norms from the congenitally blind. Behav Res Methods 45(4):1218–1233. https://doi.org/10.3758/s13428-013-0323-4
Marques JF, Fonseca FL, Morais S, Pinto IA (2007) Estimated age of acquisition norms for 834 Portuguese nouns and their relation with other psycholinguistic variables. Behav Res Methods 39(3):439–444. https://doi.org/10.3758/BF03193013
McRae K, Cree GS, Seidenberg MS, McNorgan C (2005) Semantic feature production norms for a large set of living and nonliving things. Behav Res Methods 37(4):547–559. https://doi.org/10.3758/BF03192726
Michalke M (2018) koRpus: an R package for text analysis. https://cran.r-project.org/web/packages/koRpus/index.html. Accessed 15 Oct 2019
Minsky M (1975) A framework for representing knowledge. In: Winston PH (ed) The psychology of computer vision. McGraw Hill, Winston, pp 211–277
Montefinese M, Ambrosini E, Fairfield B, Mammarella N (2013) Semantic memory: a feature-based analysis and new norms for Italian. Behav Res Methods 45(2):440–461. https://doi.org/10.3758/s13428-012-0263-4
Montefinese M, Ambrosini E, Fairfield B, Mammarella N (2014) Semantic significance: a new measure of feature salience. Mem Cognit 42(3):355–369. https://doi.org/10.3758/s13421-013-0365-y
Montefinese M, Zannino GD, Ambrosini E (2015) Semantic similarity between old and new items produces false alarms in recognition memory. Psychol Res 79(5):785–794. https://doi.org/10.1007/s00426-014-0615-z
Montefinese M, Vinson D, Ambrosini E (2018) Recognition memory and featural similarity between concepts: the pupil’s point of view. Biol Psychol 135:159–169. https://doi.org/10.1016/J.BIOPSYCHO.2018.04.004
Norman DA, Rumelhart DE (1975) Explorations in cognition. Freeman, San Francisco
Nosek BA, Alter G, Banks GC, Borsboom D, Bowman SD, Breckler SJ, Yarkoni T (2015) Promoting an open research culture. Science 348(6242):1422–1425. https://doi.org/10.1126/science.aab2374
Ooms J (2018) The hunspell package: high-performance stemmer, tokenizer, and spell checker for R. https://cran.r-project.org/web/packages/hunspell/. Accessed 15 Oct 2019
Peng RD (2011) Reproducible research in computational science. Science 334(6060):1226–1227. https://doi.org/10.1126/science.1213847
Pexman PM, Hargreaves IS, Siakaluk PD, Bodner GE, Pope J (2008) There are many ways to be rich: effects of three measures of semantic richness on visual word recognition. Psychon Bull Rev 15(1):161–167. https://doi.org/10.3758/PBR.15.1.161
Plaut DC (2002) Graded modality-specific specialisation in semantics: a computational account of optic aphasia. Cognit Neuropsychol 19(7):603–639. https://doi.org/10.1080/02643290244000112
Recchia G, Jones MN (2012) The semantic richness of abstract concepts. Front Hum Neurosci 6:315. https://doi.org/10.3389/fnhum.2012.00315
Rogers TT, Lambon Ralph MA, Garrard P, Bozeat S, McClelland JL, Hodges JR, Patterson K (2004) Structure and deterioration of semantic memory: a neuropsychological and computational investigation. Psychol Rev 111(1):205–235. https://doi.org/10.1037/0033-295X.111.1.205
Rosch E, Mervis CB (1975) Family resemblances: studies in the internal structure of categories. Cognit Psychol 7(4):573–605. https://doi.org/10.1016/0010-0285(75)90024-9
Ruts W, De Deyne S, Ameel E, Vanpaemel W, Verbeemen T, Storms G (2004) Dutch norm data for 13 semantic categories and 338 exemplars. Behav Res Methods Instrum Comput 36(3):506–515. https://doi.org/10.3758/BF03195597
Saffran E, Sholl A (1999) Clues to the function and neural architecture of word meaning. In: Hagoort P, Brown C (eds) The neurocognition of language. Oxford University Press, Oxford
Santos A, Chaigneau SE, Simmons WK, Barsalou LW (2011) Property generation reflects word association and situated simulation. Lang Cognit 3(1):83–119. https://doi.org/10.1515/langcog.2011.004
Sartori G, Lombardi L (2004) Semantic relevance and semantic disorders. J Cognit Neurosci 16(3):439–452. https://doi.org/10.1162/089892904322926773
Schmid H (1994) Probabilistic part of speech tagging using decision trees. https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/tree-tagger1.pdf
Smith E, Medin DL (1981) Categories and concepts, vol 9. Harvard University Press, Cambridge
Smith EE, Shoben EJ, Rips LJ (1974) Structure and process in semantic memory: a featural model for semantic decisions. Psychol Rev 81(3):214–241. https://doi.org/10.1037/h0036351
Ushey K, McPherson J, Cheng J, Atkins A, Allaire J (2018) packrat: a dependency management system for projects and their R package dependencies. https://cran.r-project.org/web/packages/packrat/index.html. Accessed 15 Oct 2019
Vigliocco G, Vinson DP, Lewis W, Garrett MF (2004) Representing the meanings of object and action words: the featural and unitary semantic space hypothesis. Cognit Psychol 48(4):422–488. https://doi.org/10.1016/j.cogpsych.2003.09.001
Vinson DP, Vigliocco G (2008) Semantic feature production norms for a large set of objects and events. Behav Res Methods 40(1):183–190. https://doi.org/10.3758/BRM.40.1.183
Vivas J, Vivas L, Comesaña A, Coni AG, Vorano A (2017) Spanish semantic feature production norms for 400 concrete concepts. Behav Res Methods 49(3):1095–1106. https://doi.org/10.3758/s13428-016-0777-2
Wickham H, François R, Henry L, Müller K, RStudio (2019) dplyr: a grammar of data manipulation. https://cloud.r-project.org/web/packages/dplyr/index.html. Accessed 15 Oct 2019
Wiemer-Hastings K, Xu X (2005) Content differences for abstract and concrete concepts. Cognit Sci 29(5):719–736. https://doi.org/10.1207/s15516709cog0000_33
Wu L-L, Barsalou LW (2009) Perceptual simulation in conceptual combination: evidence from property generation. Acta Psychol 132(2):173–189. https://doi.org/10.1016/j.actpsy.2009.02.002
Zannino GD, Perri R, Pasqualetti P, Caltagirone C, Carlesimo GA (2006a) Analysis of the semantic representations of living and nonliving concepts: a normative study. Cognit Neuropsychol 23(4):515–540. https://doi.org/10.1080/02643290542000067
Zannino GD, Perri R, Pasqualetti P, Caltagirone C, Carlesimo GA (2006b) (Category-specific) semantic deficit in Alzheimer’s patients: the role of semantic distance. Neuropsychologia 44(1):52–61. https://doi.org/10.1016/j.neuropsychologia.2005.04.008

Acknowledgements

We would like to thank the editor and two anonymous reviewers for their helpful comments in shaping this manuscript.

Funding

This work was supported by the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Grant Agreement No. 702655 and by the University of Padua (SID 2018) to MM.

Author information

Authors and Affiliations

Harrisburg University of Science and Technology, 326 Market St., Harrisburg, PA, 17101, USA
Erin M. Buchanan
The University of Melbourne, Melbourne, Australia
Simon De Deyne
University of Padova, Padua, Italy
Maria Montefinese
University College London, London, UK
Maria Montefinese

Corresponding author

Correspondence to Erin M. Buchanan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee (include name of committee + reference number) and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Guest-editor: Barry Devereux (Queen’s University Belfast); Reviewers: Anna Rogers (University of Massachusetts Lowell) and a second researcher who prefers to remain anonymous.

This manuscript is part of the special topic on ‘Eliciting Semantic Properties: Methods and Applications’ guest-edited by Enrico Canessa, Sergio Chaigneau, Barry Devereux, and Alessandro Lenci.

About this article

Cite this article

Buchanan, E.M., De Deyne, S. & Montefinese, M. A practical primer on processing semantic property norm data. Cogn Process 21, 587–599 (2020). https://doi.org/10.1007/s10339-019-00939-6

Received: 24 July 2019
Accepted: 24 October 2019
Published: 25 November 2019
Issue Date: November 2020
DOI: https://doi.org/10.1007/s10339-019-00939-6
