In recent years, the publication of large datasets of behavioral responses to linguistic stimuli has been an important development for language researchers. The most influential of these has been the English Lexicon Project, which provides naming and lexical decision task (LDT) latencies for over 40,000 words (Balota et al., 2007). As evidence of its impact, consider that more than 1,000 citations now reference the article that describes the English Lexicon Project database. Additional LDT datasets have since been made available through the British Lexicon Project (Keuleers, Lacey, Rastle, & Brysbaert, 2012), as well as for other languages including French (Ferrand et al., 2010), Dutch (Keuleers, Diependaele, & Brysbaert, 2010), Malay (Yap, Liow, Jalil, & Faizal, 2010), and Chinese (Sze, Rickard Liow, & Yap, 2014). Since these datasets each involve responses to thousands of items, they allow researchers to evaluate effects of different (and often correlated) psycholinguistic variables and to do so with considerable statistical power (for discussions, see Balota, Yap, Hutchison, & Cortese, 2012; Brysbaert, Stevens, Mandera, & Keuleers, 2016; Keuleers & Balota, 2015).

Using these datasets, researchers have learned a great deal about the lexical characteristics that influence LDT responses and have been able to test the effects of new variables as they emerge. For instance, the effects of word frequency (typically, faster responses to more frequent words) have been compared to those of contextual diversity (the number of unique passages/documents in which a word appears; Adelman, Brown, & Quesada, 2006), with results suggesting that contextual diversity is the better predictor. Similarly, the effects of orthographic neighborhood size (Coltheart, Davelaar, Jonasson, & Besner, 1977) have been compared to those of orthographic Levenshtein distance (a measure of words’ orthographic similarity; Yarkoni, Balota, & Yap, 2008), with results showing that orthographic Levenshtein distance was the better predictor of LDT performance; responses were faster for words that were orthographically less distinct. These and other lexical characteristics explain considerable variance in LDT performance.

In contrast, much less variance in LDT performance is explained by words’ semantic characteristics (see Pexman, 2012, for a review). While seminal studies have shown that semantic information does play a role (e.g., Balota, Cortese, Sergent-Marshall, Spieler, & Yap, 2004; Buchanan, Westbury, & Burgess, 2001), it is assumed that LDT responses are primarily based on orthographic familiarity (Balota, Ferraro, & Connor, 1991). For those researchers interested in semantic processing, this constraint limits the utility of the LDT datasets. In one example, Yap, Tan, Pexman, and Hargreaves (2011) examined the influence of four measures of semantic richness on lexical decision latencies: number of features (e.g., cheese [high] vs. basket [low]; McRae, Cree, Seidenberg, & McNorgan, 2005), average radius of co-occurrence (e.g., prison [high] vs. tweezers [low]; Shaoul & Westbury, 2010), contextual dispersion (e.g., whistle [high] vs. parsnip [low]; Brysbaert & New, 2009), and number of senses (e.g., book [high] vs. axe [low]; Miller, 1990). After first controlling for a number of lexical characteristics such as frequency, orthographic neighborhood size, and orthographic Levenshtein distance, they found that these four semantic variables explained only 2 % of additional variance in lexical decision latencies. As Yap et al. (2011) showed, meaning influences in LDTs tend to be quite modest. Accordingly, researchers who are interested in questions of semantic representation and processing often use other tasks, in particular those that require more extensive consideration of word meaning by participants. In the present study, we chose a concrete/abstract semantic decision task (SDT) for this purpose. In this task, words are presented one at a time and participants are asked to decide whether each presented word refers to something concrete or abstract.
The purpose of the present study was to generate a large SDT dataset to facilitate future research in ways that cannot be accomplished using existing LDT datasets.

To our knowledge, the only other SDT dataset currently available is from a recent study by Taikh, Hargreaves, Yap, and Pexman (2015). The authors collected behavioral responses for 288 pictures and, separately, their corresponding word labels. In that study, the decision was living/nonliving. Taikh et al. conducted regression analyses of behavioral responses to a subset of these items (i.e., those for which a full set of lexical and semantic predictor variables was available). The analysis of SDT responses to word stimuli is of particular relevance to the present study. Their analysis examined living/nonliving SDT latencies to 196 words, with lexical and task-specific variables entered on the first step and semantic richness variables on the second step. The results showed that the semantic richness variables explained 11 % of variance in living/nonliving SDT responses, over and above the 24 % explained by the lexical and task variables. In contrast, a parallel analysis for the same items using English Lexicon Project LDT latencies as the outcome variable showed different results. When LDT (rather than SDT) latencies were regressed on the same predictor variables, lexical and task variables now explained 61 % of the variance, while the semantic richness variables explained 3 %. This provides some evidence that meaning variables can play a stronger role in an SDT than in the LDT, with the caveat that the item set used by Taikh et al. was quite limited: the number of items is relatively small, and the set is restricted to concrete concepts. In addition, the results of that study included several effects that seemed particular to the living/nonliving decision. Taikh et al. speculated that the living/nonliving decision encouraged participants to focus on certain aspects of meaning, such as animacy, which may have contributed to the particular pattern of semantic effects that was observed.

Indeed, there is now strong evidence that the decision chosen in a semantic task can influence the effects observed. For instance, Hino, Pexman, and Lupker (2006) compared responses to words having many unrelated meanings with responses to unambiguous words. This type of semantic ambiguity effect was inhibitory in a living/nonliving SDT and also in a human/nonhuman SDT, but was null when the decision was vegetable/nonvegetable. Similarly, Pexman, Holyk, and Monfils (2003) examined number-of-features effects in three different semantic decisions, and found that number-of-features effects were large and facilitatory in a bird/nonbird SDT, larger with a living/nonliving SDT, and largest with a concrete/abstract SDT. In addition, evidence from fMRI data suggests differences in the brain regions that are associated with living/nonliving versus concrete/abstract semantic decision-making (Hargreaves, White, Pexman, Pittman, & Goodyear, 2012).

In perhaps the most fine-grained manipulation of decision category to date, Tousignant and Pexman (2012) examined the body–object interaction effect (faster responses to words that refer to objects with which the body can easily interact—e.g., mask [high] vs. ship [low]; Siakaluk, Pexman, Aguilera, Owen, & Sears, 2008; Tillotson, Siakaluk, & Pexman, 2008) in four different versions of an SDT. All four versions of the SDT presented the same word lists and varied only in how the decision was framed to participants: as action/nonaction, action/entity, entity/action, or entity/nonentity. The body–object interaction effect was null when the decision was action/nonaction, large and facilitatory when the decision was action/entity or entity/action, and largest when the decision was entity/nonentity. The authors took these findings as evidence that semantic processing is highly context-dependent, and that participants make adjustments to semantic processing in response to the task context to optimize performance. Similarly, Jared and Seidenberg (1991) found differences in the semantic effects observed in narrow (e.g., flower/nonflower) versus broad (e.g., living/nonliving) decisions, and recommended that researchers avoid specific categories in decision tasks. In the present study, we chose the broadest decision that we could (concrete/abstract) so that a large number of items could be presented under the same task demands. In the sections of this article that follow, we describe our data collection procedure and offer some preliminary description and analyses of the dataset. This includes comparisons of the present results to those of previous smaller-scale SDT studies, because there can sometimes be differences in the results of small-scale and megastudies (Sibley, Kello, & Seidenberg, 2009). By making this dataset available to other researchers, we hope to facilitate future studies on the semantic processing of concrete and abstract words.

Method

Participants

The participants were 321 undergraduate students at the University of Calgary who participated for partial course credit. Nine of these participants had SDT accuracy below 70 %, and so their data were removed from the final dataset and from all analyses; the data for the remaining 312 participants (225 female, 87 male) were analyzed. All subsequent descriptions correspond to this final set. Participants were asked to provide their age, and 296 did so (age range = 17–66 years; mean age = 21.75 years, SD = 5.82). Prior to taking part in the study, all participants completed a prescreen questionnaire on which they reported their level of English fluency. Only participants who self-reported as being “completely fluent” were eligible to participate. All participants had normal or corrected-to-normal vision. Participants were randomly assigned to one of ten versions of the study, comprising unique word lists (further description follows). The numbers and characteristics of the participants who completed the different versions of the study are presented in Table 1.

Table 1 Participant characteristics and mean Calgary semantic decision task (SDT) response latencies and accuracies, by version

Apparatus

Words were presented via a widescreen 24-in. ASUS monitor (VG248QE), which was controlled by a Dell OptiPlex 9020 PC. The monitor has a rapid refresh rate of 144 Hz and a 1-ms response time.

Stimuli

The word stimuli were selected from Brysbaert et al.’s (2013) comprehensive list of concreteness ratings for English lemmas. This list contains the concreteness ratings of 40,000 known English lemmas, rated on a scale of 1 (abstract) to 5 (concrete). From this list we selected 18,000 words consisting of nouns, verbs, adjectives, and adverbs. These included 9,000 of the words rated as most concrete and 9,000 of the words rated as most abstract. The concreteness ratings ranged from 3.78 to 5 for concrete words, and 1.04 to 2.08 for abstract words. Slang or obscenities, one-letter words, and words with spaces or dashes were eliminated. Next, we selected 10,000 items that could be divided into ten lists of 1,000 words and matched (using the Match program; van Casteren & Davis, 2007), such that in each list the abstract and concrete words did not differ significantly on word length or frequency (measured as the log of the SUBTLEXus frequency values; Brysbaert & New, 2009). Each resulting list of 1,000 words was assigned to a different version of the experiment. The mean lexical characteristics for each of the ten versions are presented in Table 2. Thus, across versions, participants collectively gave responses to 5,000 concrete words and 5,000 abstract words. All of these words are also present in the ELP database.

Table 2 Mean lengths, frequencies and concreteness ratings for concrete and abstract words by version (standard deviations in parentheses)

Procedure

PC-compatible computers running E-Prime software (Schneider, Eschman, & Zuccolotto, 2001) were used for stimulus presentation and data collection. Participants were individually tested in our university laboratory. Before beginning the SDT, each participant first completed the Modified Edinburgh Handedness Survey. We used the results of this survey to ensure that all participants responded to concrete words during the SDT using their dominant hand and abstract words using their nondominant hand. For participants whose score on the handedness survey was zero (fully ambidextrous, n = 2), the hand they preferred to write with was designated as the dominant hand. Next, each participant was administered the shortened version of the North American Adult Reading Test (NAART35; Uttl, 2002) in order to assess vocabulary skill.

To align with the definitions used in the Brysbaert et al. (2013) study, participants were provided with the following onscreen instructions for the SDT:

Concrete words are defined as things or actions in reality, which you can experience directly through your senses. These words are experience-based.

Night, bridle, and lynx are examples of concrete words.

Abstract words are defined as something you cannot experience directly through your senses or actions. These words are language-based, as their meaning depends on other words.

Have, limitation, and outspokenness are examples of abstract words.

The researcher verbally repeated these instructions and confirmed that the participant understood the distinction between the concrete and abstract word categories. The researcher reminded each participant that words in the study were not restricted to nouns and that even verbs and adjectives could fall into the concrete or abstract categories.

Participants were next provided with 24 practice items before beginning the experimental trials. Stimuli were presented one at a time in the center of the screen in white lowercase letters against a black background (Courier New, font size 18). Each trial began with the presentation of a fixation screen depicting two horizontal lines positioned above and below a gap where a word would appear. Participants were asked to focus on the gap between the lines. After 500 ms, the stimulus word was presented in the gap; the horizontal lines remained on the screen. Individual stimuli remained on the screen until the participant made a response or for a maximum of 3,000 ms. Using an external response box connected to the serial port, participants responded using their dominant hand for concrete words and their nondominant hand for abstract words. The interstimulus interval was 500 ms. A feedback screen was presented for 1,000 ms following any incorrect responses (“incorrect”) or when no response was detected (“no response detected”).

Following completion of the practice trials, the researcher invited each participant to ask any additional questions. For example, the researcher might explain the correct categorization of a given word if a participant indicated they were unsure why a response had been incorrect. The researcher then left the participant alone in the testing room to complete the procedure independently. Throughout the experimental trials, each participant made semantic decisions for 500 concrete and 500 abstract words. Each word list was divided into four blocks of 250 words. Trials were randomized within blocks, and block order was fixed. Breaks were provided between blocks. The first and third breaks did not have a set time limit; participants were simply told to press a button when they were ready to continue. To manage participant fatigue, the duration of the second break was a mandatory 3 min. After the mandatory break, a warning screen (black lettering on a white background) appeared for 1,000 ms, signaling to the participant that the trials were about to resume. On average, participants took 80 min to complete the entire procedure.

Full item-level and trial-level datasets are available as supplements to this article, and descriptions of the variables in each file are provided in the Appendix.

Results

Trials with incorrect responses (12.49 %) were excluded from the latency analyses. Responses faster than 250 ms (0.02 %) were likewise excluded before computing the latency means and standard deviations for each participant in each block. Note that responses slower than 3,000 ms (0.49 %) were also automatically excluded, because all trials timed out after 3,000 ms. Next, latencies more than 3 SDs from each participant’s mean in each block were eliminated, removing a further 1.37 % of the responses.
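The exclusion steps just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used to prepare the dataset: the field names (participant, block, rt, correct) are hypothetical, timed-out trials are assumed to carry no latency (rt is None), and the sample standard deviation is assumed for the 3-SD criterion.

```python
from collections import defaultdict
from statistics import mean, stdev


def trim_trials(trials, fast_ms=250, timeout_ms=3000, sd_cutoff=3.0):
    """Return the trials that survive the latency exclusion steps.

    Each trial is a dict with the hypothetical keys
    'participant', 'block', 'rt' (ms, or None for a timeout), and 'correct'.
    """
    # Steps 1-3: drop incorrect responses, timeouts, and responses
    # faster than fast_ms or slower than timeout_ms.
    kept = [
        t for t in trials
        if t["correct"]
        and t["rt"] is not None
        and fast_ms <= t["rt"] <= timeout_ms
    ]

    # Step 4: group the remaining trials by participant and block.
    cells = defaultdict(list)
    for t in kept:
        cells[(t["participant"], t["block"])].append(t)

    # Within each cell, drop latencies beyond sd_cutoff SDs of the cell mean.
    trimmed = []
    for cell in cells.values():
        rts = [t["rt"] for t in cell]
        if len(rts) < 2:  # an SD cannot be computed from one trial
            trimmed.extend(cell)
            continue
        m, sd = mean(rts), stdev(rts)
        trimmed.extend(t for t in cell if abs(t["rt"] - m) <= sd_cutoff * sd)
    return trimmed
```

Because the cell means and SDs are computed only after the first three exclusions, very fast responses and timeouts do not distort the 3-SD criterion.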

The correlation between participant age and North American Adult Reading Test score was significant, r(296) = .30, p < .001, such that older participants tended to have higher vocabulary scores. We used partial correlations to investigate the relationship between vocabulary score and SDT performance, independent of age (Table 3). Even with age controlled, participant vocabulary scores were still correlated with the speed and accuracy of SDT responses, such that participants with higher vocabulary scores had faster and more accurate SDT responses for both concrete and abstract words.

Table 3 Partial correlations between NAART scores and Calgary SDT response latency and accuracy, with participant age controlled
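A first-order partial correlation of the kind summarized in Table 3 can be computed from the three pairwise Pearson correlations. The sketch below is illustrative only: the function and argument names are hypothetical, and it is not the software used for the reported analyses.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x, y, control):
    """Correlation between x and y with one control variable partialled out.

    For example, x = vocabulary score, y = mean SDT latency, control = age.
    """
    r_xy = pearson_r(x, y)
    r_xc = pearson_r(x, control)
    r_yc = pearson_r(y, control)
    return (r_xy - r_xc * r_yc) / sqrt((1 - r_xc ** 2) * (1 - r_yc ** 2))
```

When the control variable is unrelated to both measures, the partial correlation equals the zero-order correlation; when the two measures are related only through the control variable, it falls to zero.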

Response latencies were standardized as z scores, since these minimize the influence of a participant’s overall processing speed and variability (Faust, Balota, Spieler, & Ferraro, 1999). Using these scores, we ran a series of hierarchical linear regression analyses to compare the semantic effects on Calgary SDT latencies to those found in previous studies. In the first two sets of regression analyses reported next, we made direct comparisons to previous (smaller-scale) studies, so we used the same predictors as in the original studies. In the final set of regression analyses reported below, we used the semantic predictors that allowed us to include as many items as possible in the present dataset, to examine the contributions of lexical and semantic variables across the dataset.
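The standardization step can be illustrated with a short Python sketch. It assumes that each latency is standardized against the mean and sample SD of that participant's retained trials, and that item-level values are then obtained by averaging z scores across participants; the field names (participant, word, rt) are hypothetical, and the grouping used for the published dataset may differ.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_by_participant(trials):
    """Add a 'z' field: each latency standardized within its participant."""
    by_participant = defaultdict(list)
    for t in trials:
        by_participant[t["participant"]].append(t["rt"])

    # Mean and sample SD of the retained latencies for each participant.
    stats = {p: (mean(rts), stdev(rts)) for p, rts in by_participant.items()}

    z_trials = []
    for t in trials:
        m, sd = stats[t["participant"]]
        z_trials.append(dict(t, z=(t["rt"] - m) / sd))
    return z_trials


def item_means(z_trials):
    """Average the z scores for each word across participants."""
    by_item = defaultdict(list)
    for t in z_trials:
        by_item[t["word"]].append(t["z"])
    return {word: mean(zs) for word, zs in by_item.items()}
```

With this transformation, a slow participant and a fast participant who show the same relative ordering of items contribute identical z scores, which is the sense in which overall speed and variability are factored out.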

Pexman, Hargreaves, Siakaluk, Bodner, and Pope (2008) examined SDT latencies based on the concrete/abstract decision for 514 concrete words from McRae et al.’s (2005) feature-listing norms. Using hierarchical regression, they entered log word frequency (HAL; Lund & Burgess, 1996), orthographic neighborhood size, and word length as control variables in Step 1, and three semantic richness variables in Step 2: number of semantic neighbors (Durda & Buchanan, 2006), contextual dispersion (Zeno, Ivens, Millard, & Duvvuri, 1995), and number of features. Using the same variables, we ran the same analysis on latencies to the 192 items that are common to the Pexman et al. study and the present SDT. The results are presented in Table 4. The patterns of results for the two data sets are the same; frequency is the only significant control variable, and both contextual dispersion and number of features are significant semantic predictors, while number of semantic neighbors is not.

Table 4 Hierarchical regression results for 192 concrete words from Pexman et al. (2008), and the corresponding Calgary SDT results for the same items

Similarly, SDT latencies to 202 abstract words were examined in an earlier study by Zdrazilova and Pexman (2013). The task used by Zdrazilova and Pexman was a go/no-go SDT; participants decided whether each item referred to something abstract, pressing a button to respond “yes” to abstract words and withholding a response for concrete words. Using hierarchical regression, they entered log word frequency (SUBTL; Brysbaert & New, 2009), orthographic Levenshtein distance, and age of acquisition ratings in Step 1, and six semantic richness variables in Step 2: context availability, sensory experience ratings (Juhasz, Yap, Dicke, Taylor, & Gullick, 2011), valence, arousal, number of semantic neighbors, and number of associates (Nelson, McEvoy, & Schreiber, 1998). Using the same variables, we ran the same analysis on latencies to the 125 items that are common to the Zdrazilova and Pexman study and the present SDT. The results are presented in Table 5. Despite the fact that the Zdrazilova and Pexman SDT used a go/no-go procedure and the present SDT did not, the patterns of results for the two data sets are the same; frequency was the only significant lexical variable, and sensory experience rating was the only significant semantic predictor. The values in Table 5 also suggest that more variance was explained overall in the Zdrazilova and Pexman dataset. This may be due to the go/no-go procedure, which Zdrazilova and Pexman speculated would have encouraged participants to focus on factors that are diagnostic of abstractness, rather than simply on the absence of concreteness.

Table 5 Hierarchical regression results for 125 abstract words from Zdrazilova and Pexman (2013), and the corresponding Calgary SDT results for the same items

Using a similar analytic approach, we next assessed the variance explained by lexical and semantic predictors in the present SDT for a much larger set of items, and compared the results with those for English Lexicon Project LDT latencies for the same items. We expected that lexical variables might explain more variance in LDT responses, while semantic variables might explain more variance in SDT responses. We analyzed the responses to concrete and abstract words separately, since these represent different responses in the SDT. Using hierarchical regression, we entered log contextual dispersion (Brysbaert & New, 2009), orthographic Levenshtein distance, orthographic neighborhood size, and word length as lexical variables in Step 1, and in Step 2 we entered three semantic richness variables for which we had a large number of values, to make the analysis inclusive of responses to most of the items: concreteness, average radius of co-occurrence (Shaoul & Westbury, 2010), and semantic diversity (Hoffman, Lambon Ralph, & Rogers, 2013). As is illustrated in Table 6, the lexical variables did tend to explain more of the variance in LDT than in SDT latencies. Furthermore, the semantic richness variables tended to explain more variance in the SDT than in the LDT, particularly for concrete words. We discuss the observed patterns of effects for semantic richness variables in more detail below.

Table 6 Hierarchical regression results for Calgary SDT and English Lexicon Project LDT
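The logic of the two-step hierarchical regressions can be sketched as follows: fit the Step 1 (lexical) model, fit the model with the Step 2 (semantic) predictors added, and take the difference in R² as the variance uniquely attributable to Step 2. The code below is a minimal, dependency-free illustration with hypothetical function names; it solves the normal equations directly, which is adequate for a handful of predictors, whereas a dedicated statistics package would normally be used in practice. It is not the software used for the analyses in Table 6.

```python
def r_squared(y, predictors):
    """R^2 from an OLS regression of y on the predictor columns plus intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(X[0])

    # Augmented normal equations: (X'X | X'y).
    A = [
        [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
        + [sum(X[i][r] * y[i] for i in range(n))]
        for r in range(k)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    coefs = [A[r][k] / A[r][r] for r in range(k)]

    fitted = [sum(b * x for b, x in zip(coefs, row)) for row in X]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def hierarchical_r2(y, step1, step2):
    """Return (R^2 for Step 1, change in R^2 when Step 2 predictors are added)."""
    r2_step1 = r_squared(y, step1)
    r2_full = r_squared(y, step1 + step2)
    return r2_step1, r2_full - r2_step1
```

Here `y` would hold the item-level standardized latencies, `step1` a list of lexical predictor columns, and `step2` a list of semantic predictor columns, all restricted to items with complete data.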

Finally, we checked for practice effects across experiment blocks. Given the length of the experimental sessions, it is not surprising that participants did tend to get faster across blocks. A one-way analysis of variance on the response latencies revealed a main effect of block [F(3, 933) = 58.94, p < .001, partial η² = .159]. Participants tended to speed up from the first block (M = 1,048.33, SD = 393.74) to the second (M = 1,002.10, SD = 382.97) [t(311) = 8.35, p < .001], but had similar latencies for the second and third blocks (M = 1,000.00, SD = 383.23) [t(311) = 0.54, p = .59]. Participants then got faster from the third to the fourth blocks (M = 970.28, SD = 366.70) [t(311) = 6.61, p < .001]. In addition, we evaluated the possibility that participants were relying on different types of information to make their semantic decisions in the first block and the last block. We did this by running the regression analyses presented in Table 6 separately for Block 1 data and Block 4 data. The results showed, for both blocks, the same patterns of effects as in the overall analysis. As such, we assume that participants did not shift their reliance on different types of lexical or semantic information across the experimental blocks. Users who wish to control for variability in latencies across blocks can do so with the Block variable in the item-level data file, or in a more fine-grained way using the FullRunOrder variable in the trial-level data file.

Discussion

The overarching purpose of the present study was to generate a relatively large dataset of SDT responses. We chose a decision category that was sufficiently broad to allow inclusion of a large number of items but that still required meaning retrieval for each item presented. We capitalized on the existing concreteness norms generated by Brysbaert et al. (2013) to select concrete and abstract word stimuli. As we described in the introduction, the decision chosen for an SDT will necessarily shape the responses generated. Participants tend to focus on dimensions of meaning that are diagnostic of the decision (Tousignant & Pexman, 2012). Certainly, the breadth of the concrete/abstract decision would likely make it less susceptible to these effects than a narrower decision might be (e.g., vegetable/nonvegetable, bird/nonbird), but the decision will, nonetheless, have influenced responses. As evidence, consider the large proportion of variance explained by the concreteness dimension in the regression analyses in Table 6; concreteness was facilitatory for concrete words and inhibitory for abstract words. Researchers wishing to control for these effects of typicality should include the concreteness dimension in analyses of this dataset. Since we chose our stimuli from Brysbaert et al.’s (2013) comprehensive concreteness ratings norms, those values are available for every item in the present dataset, and we have included them in the item-wise dataset in order to make it relatively straightforward for users to perform this type of adjustment for typicality.

Previous lexical–semantic studies have tended to focus on concrete words, and have identified a number of dimensions that are important to concrete meaning, including sensorimotor dimensions such as imageability and body–object interaction (e.g., Amsel, 2011; Amsel & Cree, 2013; Amsel, Urbach, & Kutas, 2012; Cortese & Fugett, 2004; Siakaluk et al., 2008; Yap, Pexman, Wellsby, Hargreaves, & Huff, 2012; Yap et al., 2011). Some of these dimensions are likely not relevant for abstract words; for example, body–object interaction by definition applies only to words that refer to objects or entities. Less research attention has been given to dimensions of abstract meaning, however, so there is much we do not yet understand about the semantic representation of abstract concepts. Indeed, the regression results in Table 6 show that the variables tested explained more variance for concrete words than for abstract words in the SDT. With responses to 5,000 abstract items, the present dataset offers the opportunity to examine new questions about abstract word meaning. Our results, preliminary though they are, provide some intriguing evidence that semantic effects may differ for concrete and abstract words.

In particular, the patterns of semantic diversity effects for concrete and abstract words show some interesting differences. While we found that this semantic variable was facilitatory in the LDT for both concrete and abstract words, in the SDT it was facilitatory for abstract words but inhibitory for concrete words (Table 6). Hoffman et al. (2013) devised the construct of semantic diversity and assumed that words that appear in more diverse contexts have more varied meanings. In a previous study, Hoffman and Woollams (2015) showed that semantic diversity effects vary across tasks; in the LDT, responses were faster to high-semantic-diversity words than to low-semantic-diversity words, but in a semantic relatedness judgment task the effect was reversed, with faster responses to low-semantic-diversity words than to high-semantic-diversity words. This pattern is consistent with some of the previous literature on semantic ambiguity effects, but that literature is quite mixed (e.g., Hargreaves, Pexman, Pittman, & Goodyear, 2011; Piercey & Joordens, 2000; Rodd, Gaskell, & Marslen-Wilson, 2002; Yap et al., 2011). Our results point to one potential explanation for some of the mixed results—concreteness. That is, our results suggest different effects of ambiguity in the SDT for concrete and abstract words. Using the present dataset, the reasons for these differences could be explored in future studies.

Similarly, the effects of the average radius of co-occurrence differed in this study for concrete and abstract words. The average radius of co-occurrence indexes a word’s closeness or similarity to its neighbors in lexical co-occurrence space (Shaoul & Westbury, 2010). Previous studies have reported facilitatory effects of the average radius of co-occurrence in the LDT, but null effects in the SDT for both concrete (Yap et al., 2012; Yap et al., 2011) and abstract (Zdrazilova & Pexman, 2013) stimuli. In the present analyses, controlling for several other lexical and semantic variables, the facilitatory effect of average radius of co-occurrence was present in the LDT only for abstract words. Furthermore, for abstract words there was an inhibitory effect of average radius of co-occurrence on SDT latencies. This may be consistent with Mirman and Magnuson’s (2006) findings of trade-offs between close and distant semantic neighbors. That is, Mirman and Magnuson noted that while greater semantic neighborhood density is typically facilitatory, close neighbors can sometimes exert an inhibitory effect on semantic processing. To explain the different pattern of results that we observed across tasks, we would need to further assume that task demands interact with semantic neighborhood structure to exert different effects of average radius of co-occurrence in the LDT and SDT. These and other differences between concrete and abstract meaning could be explored in future studies utilizing this dataset along with more fine-grained measures of semantic neighborhood characteristics.

The limited set of analyses in the present study is intended merely to assess the potential of our dataset for testing the effects of semantic variables. Preliminary results from these analyses suggest that the database has promise, but certainly there are many more semantic variables we have not tested here, as well as the promise of novel variables not yet characterized. Indeed, many researchers now assume that semantic representation is multidimensional, that is, composed of several different types of information, including both linguistic or language-based information and experiential or object-based information (e.g., Barsalou, Santos, Simmons, & Wilson, 2008; Binder & Desai, 2011; Dove, 2009; Louwerse, 2010; Vigliocco, Meteyard, Andrews, & Kousta, 2009). The Calgary SDT dataset offers researchers the opportunity to test the independent and joint effects of these variables on the processing of concrete and abstract word meanings.

For instance, it has been argued that emotion information may play a particularly important role in the representation of abstract meaning (Vigliocco et al., 2009), but the literature on emotion variables in lexical–semantic processing is quite mixed. Some studies have shown that valence has a linear effect on lexical processing, with faster LDT latencies to positive than to negative words (Estes & Adelman, 2008; Kuperman, Estes, Brysbaert, & Warriner, 2014; Larsen, Mercer, Balota, & Strube, 2008). Other studies have shown that the effect of valence is better described as an inverted U-shape, with faster LDT latencies for both positive and negative words as compared with neutral words (Kousta, Vinson, & Vigliocco, 2009; Vinson, Ponari, & Vigliocco, 2014; Yap & Seow, 2014). Finally, other ways of measuring emotional information have been characterized, and these need to be compared to the more traditional constructs of valence and arousal (Moffat, Siakaluk, Sidhu, & Pexman, 2015; Newcombe, Campbell, Siakaluk, & Pexman, 2012). These issues could be pursued with the present dataset.

Similarly, it has been argued that contextual and situational information is particularly important to abstract meaning (Wilson-Mendenhall, Simmons, Martin, & Barsalou, 2013), but we need to better characterize this type of information, and there are ongoing efforts to do so (e.g., Moffat et al., 2015; Recchia & Jones, 2012). The Calgary SDT dataset offers researchers the unprecedented opportunity to explore each of these issues using a task for which substantial variance is explained by semantic variables. As such, use of the present dataset holds strong potential for allowing new insights and progress.