Abstract
The HAL (hyperspace analog to language) model of lexical semantics uses global word co-occurrence from a large corpus of text to calculate the distance between words in co-occurrence space. We have implemented a system called HiDEx (High Dimensional Explorer) that extends HAL in two ways: It removes unwanted influence of orthographic frequency from the measures of distance, and it finds the number of words within a certain distance of the word of interest (NCount, the number of neighbors). These two changes to the HAL model produce
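The pipeline the abstract describes (co-occurrence vectors, a frequency correction, distances, and a neighbor count) can be sketched roughly as follows. This is a minimal illustration, not the HiDEx implementation. The function names, the linearly decreasing window weights, the division by target-word frequency, and the fixed `radius` threshold are all assumptions made for this sketch; the abstract does not say how HiDEx normalizes for frequency or how it sets the neighborhood distance.

```python
from collections import Counter
from math import sqrt


def cooccurrence_vectors(tokens, window=5):
    """Build HAL-style vectors from weighted counts of nearby words.

    Each vector has 2 * V entries (V = vocabulary size): the first V count
    words seen before the target, the last V count words seen after it.
    The window size and linear weighting are illustrative assumptions.
    """
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    vectors = {w: [0.0] * (2 * n) for w in vocab}
    for pos, target in enumerate(tokens):
        for dist in range(1, window + 1):
            weight = window - dist + 1  # closer words contribute more
            if pos - dist >= 0:
                vectors[target][index[tokens[pos - dist]]] += weight
            if pos + dist < len(tokens):
                vectors[target][n + index[tokens[pos + dist]]] += weight
    return vectors


def normalize_by_frequency(vectors, tokens):
    """Divide each vector by its target word's corpus frequency.

    Assumption: one simple way to reduce frequency effects; not
    necessarily the correction used by HiDEx.
    """
    freq = Counter(tokens)
    return {w: [x / freq[w] for x in v] for w, v in vectors.items()}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ncount(word, vectors, radius):
    """Count the other words whose vectors lie within `radius` of `word`.

    `radius` is a hypothetical fixed threshold chosen by the caller.
    """
    target = vectors[word]
    return sum(
        1
        for other, vec in vectors.items()
        if other != word and euclidean(target, vec) <= radius
    )
```

To try it, tokenize a corpus into a list of words, pass it through `cooccurrence_vectors` and `normalize_by_frequency`, and call `ncount` on the result with a word of interest and a chosen radius.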
Cite this article
Shaoul, C., Westbury, C. Word frequency effects in high-dimensional co-occurrence models: A new approach. Behavior Research Methods 38, 190–195 (2006). https://doi.org/10.3758/BF03192768
DOI: https://doi.org/10.3758/BF03192768