Creating Implicit Measure Stimulus Sets Using a Multi-Step Piloting Method

The effect of arbitrary stimulus selection is a persistent concern when employing implicit measures. The current study tests a data-driven multi-step procedure to create stimulus items using a combination of free-recall and survey data. Six sets of stimulus items were created, representing healthy food and high sugar items in children, adolescents, and adults. Selected items were highly representative of the target concepts, in frequent use, and of near equal length. Tests of the piloted items in two samples showed slightly higher implicit measure–behavior relations compared to a previously used measure, providing preliminary support for the value in empirically based stimulus selection. Further, the items reported as being the most associated with their target concepts differed notably from what one may expect from the guidelines or population consumption patterns, highlighting the importance of informed stimulus selection.


Over the past three decades, there has been an increasing interest in the use of experimental tasks as measures to assess non-conscious constructs such as implicit attitudes, implicit self-concepts, and approach biases. Such measures have, in general, been successful in augmenting traditional self-reported scales to improve the prediction of outcomes and have coincided with renewed interest in dual-process models to conceptualize the role of automatic processes in human behavior [1][2][3][4].
However, despite the increasing popularity and interest in dual-process research, questions remain about the validity of measures used to assess implicit constructs like implicit attitude and implicit beliefs [5][6][7][8][9]. One persistent concern raised by dual-process researchers is that the characteristics of stimulus items used may introduce extraneous variability into measures [10][11][12][13], as all currently available implicit measures require target objects and concepts to be represented by a set of stimulus items [14][15][16]. Most commonly, these stimulus sets are made up of pictures or words of prototypical exemplars that are thought to represent that category (e.g., the target concept of alcohol might be represented by names of common alcoholic beverages). Implicit measures then infer nonconscious constructs like implicit attitudes from patterns of responding to these stimuli. The ability of researchers to create stimulus sets to represent a wide range of categories has contributed to the growing use of implicit measures in a variety of fields, including social psychology [17], health promotion [18][19][20][21], and business [22]. However, the methodology and process of stimulus selection are often treated as a minor or trivial part of the research design, as pilot testing and data-driven stimulus selection processes are rarely reported in the current dual-process literature. This dearth of research in piloting and robust stimulus selection occurs in spite of evidence that stimulus characteristics can influence results from implicit measures. For example, several studies have produced notable changes in mean effects and correlations with outcome measures by altering stimuli in terms of context [23], modality [24], or task-irrelevant stimulus-valence associations [25].
Of the potential effects of stimulus selection, one of the more subtle yet broad findings has been evidence that seemingly minor variations in the correspondence between a stimulus item and its target concept can alter the effects found on implicit measures [12,13]. This proposition is supported by experimental evidence [26], showing that, when administering multiple versions of the racial bias IAT across four experiments, IATs produced stronger effects when stimuli were rated as more representative of their target concepts. Findings such as these reaffirm the recommendation that researchers employing implicit measures should be cautious in their stimulus selection, and seek to use highly relevant stimulus items, rather than many stimuli with a range of conceptual correspondences [11].
Further, even when highly relevant stimulus items are selected, characteristics of how exemplars are used in everyday life may also affect results [27]. For example, experimental evidence has found that implicit measures are more sensitive when target stimuli comprise words which are frequently used and familiar to the target sample [28]. While this effect is minor when all categories are equally familiar or unfamiliar to participants, the effects of stimulus familiarity and use frequency can notably change effects on implicit measures when one category has a higher use frequency than another [29].
Another common consideration in implicit measure research is the length of stimulus words in each category, as there is evidence of slower word processing when words are particularly short (three or fewer letters) or long (10 or more letters) [30]. As a result, some researchers opt to match the mean length of stimulus words between categories [29][31][32][33][34][35]. Research has demonstrated, however, that there is little evidence of variation in implicit measures as a function of word length [36]. One potential reason this effect has not been found in implicit measure studies is that, as reading skills advance, known words are generally processed as a whole, contributing to reduced variability in processing speed as a function of word length [37][38][39]. However, in younger samples where reading skills are less advanced, the same studies found significantly slower processing speeds for long compared to short words. Thus, it is a plausible hypothesis that word length should remain a consideration, but its relevance may be minimal outside of younger samples or those with a reading impairment.

The Current Study
Despite the potential value of good stimulus selection flagged in the current literature, research comparing stimulus sets selected based on theoretically relevant empirical data and stimulus sets selected by researchers is lacking. Thus, the current study aimed to address this research gap by outlining and testing a multi-step process to create stimulus sets for implicit measures, which accounts for several theoretically important sources of heterogeneity as a function of stimulus selection: correspondence to the construct, use frequency and familiarity, and word length. In an attempt to minimize researcher bias in selecting stimulus items, a pool of potential stimulus words will first be extracted from the target sample in a free-recall format. Then, separate samples will explicitly rate the extent to which they believe these words correspond to the target construct. Lastly, each of the words will be assessed for frequency of use and word length. Ideally, from each set of words, the researchers can select a small set of exemplars to act as stimulus items that correspond highly to the target concept, are at low risk of being highly unfamiliar to the target sample, and are of near equal average length. In the current study, we test this process in three populations (children, adolescents, and young adults), creating a stimulus set for the target behaviors of healthy eating and sugar intake in each population. Following piloting, we assessed how the stimulus sets created compared to previously employed but unpiloted measures in terms of their reliability and convergent validity. A visual summary of the research project is presented in Figure 1.
Methods Protoc. 2023, 6, 47

Study 1

Pilot
A sample of 10 children aged 6-10 years old were interviewed at a convenient location, in order to provide a pool of potential stimulus words in a free-recall format. After obtaining parental and child consent, children were asked to provide the first five things they thought of when they think of healthy foods. The children were then asked to provide the first five things they think of when they think of sugary or sweet food and drinks.

Participants and Procedure
A sample of 51 children aged 6-10 years old (male = 22, female = 29, M Age = 7.55, SD Age = 1.53) were recruited to explicitly rate the extent to which they associated the words provided in the pilot with the target concepts. Participants were recruited through advertisements targeted at parents posted in parenting forums, online community noticeboards, Facebook advertisements, and an email broadcast to staff members of an Australian university.
Before beginning the survey, parents were presented with a detailed information sheet, asked to provide consent, and instructed that they may assist their child in reading and understanding, but should not attempt to influence children's survey responses. Then, a simplified information and consent form was presented for children, which parents were asked to read and explain to their child. Once they began the survey, children were asked to rate how much they associated each of the exemplars provided in the pilot study with healthy food and sugary or sweet food and drinks (e.g., How much do apples make you think about healthy food? How much do doughnuts make you think about sweet and sugary things?). To increase readability and accessibility for children, simple language was used with a large font size, and all Likert items included a label on each possible response (0 = I do not know what that is, 1 = Not at all, 2 = A Little, 3 = Kind of, 4 = Very Much, 5 = Extremely). Researchers also recorded the length of each word in syllables and letters and its frequency of use in Australian children's writing using the Oxford Word List [40]. Note that, compared to other corpus data, the Oxford Word List presents use frequency as a rank of sampled words. Thus, lower numbers correspond to more frequent use.

Results and Discussion
Study 1 aimed to create sets of potential stimulus items representing healthy eating and high sugar items in children which were rated as highly corresponding to their respective target concept, familiar to the target sample, and of near equal word length. The children's levels of association between exemplars and healthy foods are provided in Table 1, and associations between exemplars and sweet and sugary foods and drinks are provided in Table 2. Use frequency rank refers to the ranking in the Oxford Word List, with lower numbers indicating a higher rank and more frequent use. Regarding sweet and sugary foods and drinks in the child sample, the current data suggest the category is best represented by three exemplars: candy, ice cream, and lollipop. Each of these exemplar items was rated as strongly associated with the target concept (mean correspondence ≥ 4.00), was less than three syllables and/or ten letters, had no respondents report not knowing the word, and was in frequent use according to corpus data. It is important to note that, while the category sweet and sugary foods and drinks included both foods and drinks, the highest ranked drink, soft drink, was only the seventh most associated exemplar. In the healthy eating category, however, results are not as clear cut. Although nine stimulus items were rated as highly corresponding to the target concept, the length of watermelon and strawberry may cause slower reactions, and some respondents reported not knowing the word cucumber. Thus, out of caution, these words were excluded. As a result, six exemplars were chosen for the healthy eating in children category: broccoli, fruit, carrots, apples, grape, and banana.
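The screening rules applied in Study 1 can be sketched as a short filter. This is an illustration only, not the procedure as implemented by the authors: the `Exemplar` record, the `screen_child_items` helper, and every numeric value in the example pool are hypothetical placeholders rather than the data in Tables 1 and 2. The sketch assumes a word is retained when its mean correspondence is at least 4.00, no respondent selected "I do not know what that is", and it has fewer than ten letters. Use frequency is left out because no numeric cutoff on the Oxford Word List rank is reported.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    word: str
    mean_correspondence: float  # mean association rating for the target concept
    n_unknown: int              # respondents choosing "I do not know what that is"


def letter_count(word: str) -> int:
    """Count letters only, so that 'ice cream' counts as 8 letters."""
    return sum(ch.isalpha() for ch in word)


def screen_child_items(items, min_mean=4.00, max_letters=9):
    """Keep words that are highly rated, known to every respondent, and short."""
    return [
        it.word
        for it in items
        if it.mean_correspondence >= min_mean
        and it.n_unknown == 0
        and letter_count(it.word) <= max_letters
    ]


# Hypothetical values for illustration only (not the study data).
pool = [
    Exemplar("broccoli", 4.5, 0),
    Exemplar("watermelon", 4.4, 0),  # 10 letters: excluded on length
    Exemplar("cucumber", 4.2, 2),    # unknown to some respondents: excluded
    Exemplar("bread", 3.1, 0),       # below the correspondence cutoff: excluded
]
selected = screen_child_items(pool)
```

Keeping the thresholds as keyword arguments makes it easy to see how sensitive the final set is to each cutoff, for example by relaxing `max_letters` for samples of proficient readers.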

Study 2

Pilot
A pool of potential stimulus words was extracted from a sample of 24 young adults at an Australian university (M Age = 22.08, SD Age = 7.45; 16 = female, 8 = male). This process replicated the Study 1 pilot, albeit with two changes. First, data were collected in an online survey format, rather than as an interview. Second, owing to the greater literacy of the young adult sample, the target concept of sweet and sugary foods and drinks was replaced with foods and drinks high in free sugar. After providing consent to participate in the study, participants were presented with a definition of free sugar. Then, participants were asked to list the first five examples which came to mind when they thought about healthy food, and then the first five examples that came to mind when they thought about foods and drinks high in free sugar.

Participants and Procedure
Participants for Study 2a consisted of 94 young adults aged 18 to 25 years old recruited from an Australian university (M Age = 19.57, SD Age = 1.99; 71 = female, 23 = male). The procedure for Study 2a mirrored that of Study 1. After completing consent and demographic information, participants were presented with the same definition of free sugar presented to the Study 2 pilot sample. Participants then rated the extent to which they associated each of the exemplars extracted from the pilot study with the concept of either foods and drinks high in free sugar or healthy foods. All exemplars were rated on a 5-point Likert scale anchored 1 = "Not at all" to 5 = "Extremely". As the Oxford Word List used in Study 1 was designed for children only, the CORE Corpus was used as an indicator of word frequency in Study 2 [41].
After collecting Study 2a data, we concluded that word familiarity statistics extracted from corpus data alone were likely an insufficient indicator of word use in the target sample. That is, the corpus data used are specific to the frequency of words in prose, rather than how frequently the target sample likely uses each term. As such, we recruited an additional sample of 73 young adults aged 18-25 years old from the same Australian university (M Age = 20.17, SD Age = 1.97, 50 female, 22 male). Participants in Study 2b were asked to rate each of the stimulus items drawn from the Study 2 pilot for how frequently they used each word ("How often do you use each of the following words?"), scored from 1 = "Never" to 6 = "Extremely often".

Results and Discussion
The young adult sample's levels of association (Study 2a) between exemplars and use frequency (Study 2b) for healthy foods and foods and drinks high in free sugar are provided in Tables 3 and 4, respectively. As in Study 1, we sought to create a set of stimulus items which were strongly associated with the target concepts and in frequent use, conceptualized in the current study as a mean correspondence ≥4.00, and mean use frequency rating ≥2.50 or corpus frequency ≥3. In the young adult sample, 29 exemplars were rated as highly associated with the target concept of healthy food. In the interest of maintaining a small and highly relevant stimulus set [11], the 10 highest ranked suitable items were chosen: fruit, vegetables, broccoli, lettuce, bananas, oranges, spinach, salad, apples, and tomato. The exemplar vegetables was included despite being 10 characters, as evidence indicates little effect of word length in proficient readers [39]. For foods and drinks high in free sugar in the young adult sample, 16 words were found to be highly corresponding to the target construct. However, lollies, energy drinks, cordial, Sunkist, lemonade, jellybeans, maple syrup, and doughnuts were all found to be infrequently used words on both participant rating and corpus data, and were therefore excluded. The remaining stimulus items consisted of soft drink, Coke, candy, sweets, chocolate, cake, ice cream, and fast food.
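The selection rule stated above (mean correspondence ≥4.00, and either mean use frequency rating ≥2.50 or corpus frequency ≥3, followed by taking the highest ranked items) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the dictionary keys, and all example values are hypothetical, the unit of the corpus frequency is taken as given in the text, and ranking is assumed to be by mean correspondence.

```python
def is_eligible(mean_corr, mean_use, corpus_freq):
    """Apply the stated cutoffs: high correspondence AND (frequent by rating OR by corpus)."""
    return mean_corr >= 4.00 and (mean_use >= 2.50 or corpus_freq >= 3)


def select_top(items, k=10):
    """Return the k eligible words with the highest mean correspondence."""
    eligible = [
        it for it in items
        if is_eligible(it["mean_corr"], it["mean_use"], it["corpus_freq"])
    ]
    eligible.sort(key=lambda it: it["mean_corr"], reverse=True)
    return [it["word"] for it in eligible[:k]]


# Hypothetical values for illustration only (not the data in Tables 3 and 4).
candidates = [
    {"word": "lettuce", "mean_corr": 4.4, "mean_use": 2.2, "corpus_freq": 5},
    {"word": "fruit",   "mean_corr": 4.8, "mean_use": 4.9, "corpus_freq": 50},
    {"word": "cordial", "mean_corr": 4.3, "mean_use": 2.0, "corpus_freq": 1},
    {"word": "rice",    "mean_corr": 3.5, "mean_use": 4.0, "corpus_freq": 40},
]
chosen = select_top(candidates, k=2)
```

Under this reading, a word is dropped for infrequent use only when it falls below both frequency cutoffs, which matches the description of the excluded free sugar items as infrequently used on both participant ratings and corpus data.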

Study 3

Pilot
A sample of 12 adolescents aged 11-14 years old (M Age = 12.50, SD Age = 1.17, 7 female, 5 male) were asked to provide the five words they most associated with food and drinks high in free sugar and with healthy foods in a free recall format identical to the Study 2 pilot. Participants were recruited via advertisements targeting parents on social media and using the staff email broadcast at an Australian university. After parental and adolescent consent, participants were able to access the study via a hyperlink.

Participants and Procedure
Participants for Study 3 were 20 adolescents aged 11-14 years old (M Age = 12.64, SD Age = 0.90, 9 female, 11 male). Participants were recruited via advertisements targeting parents on social media and in online community groups, and using the staff email broadcast at an Australian university.
Upon clicking the advertisement, parents were presented with a detailed information sheet and a short link through which their adolescent could access the study. The procedure of Study 3 mirrored that of Study 2a, with the exception that, following the rating of each item's association with the target concepts, participants were asked to rate how frequently they used each of the stimulus words, as in Study 2b. The CORE Corpus was used as an indicator of word frequency alongside self-reported use frequency [41].

Results and Discussion
The adolescent sample's ratings of the association between potential stimulus words and target concepts, the self-reported frequency of use, word characteristics, and corpus statistics for word use for healthy eating and foods and drinks high in free sugar are presented in Tables 5 and 6, respectively. Mirroring Studies 1 and 2, we again aimed to create stimulus sets which were highly representative of their target concepts and in frequent use by the target sample (mean correspondence ≥4.00, and mean use frequency rating ≥2.50 or corpus frequency ≥3). Thirteen exemplars were rated as highly associated with the concept of healthy food, and of these 13 all but Brussels sprouts were in relatively frequent use. As in the young adult sample, the 10 highest ranked suitable items were selected: vegetables, carrots, salad, broccoli, apples, lettuce, banana, fruit, mango, and avocado. Of the 15 potential free sugar stimulus items extracted from the pilot sample, 10 were highly associated with the concept of free sugar. However, the word energy drink was rated as infrequently used on both rating and corpus data. The remaining nine words were: soft drink, Coke, lollies, chocolate, ice cream, cake, lemonade, and Slurpee.

Study 4
After piloting the materials, we tested how the piloted measures compared with previously used, unpiloted measures in terms of reliability and predictive validity.

Participants and Procedure
Participants in Study 4 consisted of two samples: undergraduates from an Australian university, who completed measures relating to free sugar; and a community sample of Australian individuals recruited from the general public by a panel company, who completed measures relating to healthy eating. After providing informed consent, participants completed an implicit attitude measure using piloted stimuli and an implicit attitude measure using stimuli from previous research, before completing self-reported measures of behavior. In the university student free sugar sample, 67 participants completed measures; however, six participants met the implicit measure scoring exclusion criteria (e.g., excessive errors; Greenwald et al., 2003), resulting in a final sample of 61 (M Age = 21.54, SD Age = 6.24, 35 female, 25 male, 1 other). In the general population sample, 162 participants completed measures; however, four were flagged for exclusion based on the implicit measure scoring criteria [42], resulting in a final sample of 158 (M Age = 59.03, SD Age = 15.26, 83 female, 73 male, 2 other).

Single Target Implicit Association Tests
Implicit attitude was assessed using the Single Target Implicit Association Test [43], administered using the IATGEN package in the Qualtrics online data collection platform [44]. The ST-IAT is a reaction time-based task used to infer participant implicit attitude towards an attitude target by comparing response times from trials when positive words share a response key with the target concept with trials where negative words share a response key with the target concept. For the ST-IATs in both samples, positive stimuli consisted of five words (good, tasty, enjoyable, nice, fun), as did negative stimuli (bad, nasty, dull, awful, boring). In the general population sample, healthy eating words for one ST-IAT were drawn from the piloting data (fruit, vegetables, broccoli, lettuce, bananas, oranges, spinach, salad), while another ST-IAT used the stimuli used in a previously published study (strawberries, rice, fruit salad, turkey filet, cucumber, apples, grapes, chicken) [45]. Similarly, in the university student population, one ST-IAT used stimuli extracted from the piloting process (soft drink, Coke, candy, sweets, chocolate, cake, ice cream, fast food), and the other used stimuli from previous research (syrup, sucrose, glucose, honey, caramel, chocolate, icing, lolly) [46,47]. All ST-IATs were scored following recommended conventions [42,43], with scores normalized for potential order effects.
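The comparison at the heart of the ST-IAT can be illustrated with a simplified, D-style score. The sketch below is not the scoring procedure used in the study: it omits the error handling, latency trimming, and participant exclusion rules of the recommended conventions [42,43], and the function name and the example latencies are hypothetical. It assumes the score is the difference between mean latencies in the two pairing conditions divided by the standard deviation of all latencies pooled across both conditions.

```python
from statistics import mean, stdev


def simple_d_score(target_with_positive_ms, target_with_negative_ms):
    """Simplified D-style score from two lists of response latencies (ms).

    Positive values mean faster responding when the target concept shares
    a response key with positive words, i.e., a more positive implicit attitude.
    """
    pooled = list(target_with_positive_ms) + list(target_with_negative_ms)
    pooled_sd = stdev(pooled)
    if pooled_sd == 0:
        raise ValueError("latencies have zero variance; score is undefined")
    return (mean(target_with_negative_ms) - mean(target_with_positive_ms)) / pooled_sd


# Hypothetical latencies for one participant (illustration only).
positive_pairing = [600, 650, 700, 620]
negative_pairing = [750, 800, 780, 820]
d = simple_d_score(positive_pairing, negative_pairing)
```

Dividing by the pooled standard deviation expresses the latency difference relative to each participant's own response variability, which is why task-irrelevant variation in response times (for instance from unfamiliar or very long stimulus words) can weaken the resulting score.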

Behavior
In each sample, behavior was assessed using brief food frequency questionnaires drawn from previously validated measures of dietary consumption [48,49]. These items asked participants to respond how often they consumed common healthy or free sugar items (e.g., "Leafy green vegetables", "Chocolate"), on an 8-point scale anchored [1] never to [8] 4+ times per day.

Results
In both samples, the piloted and previously used ST-IATs presented with acceptable reliability coefficients (healthy eating piloted ST-IAT α = 0.62; healthy eating previously used ST-IAT α = 0.66; free sugar piloted ST-IAT α = 0.68; free sugar previously used ST-IAT α = 0.62), with only minor differences between ST-IATs. In the healthy eating sample, both the piloted ST-IAT (r = 0.28, p < 0.001) and the previously used ST-IAT (r = 0.18, p = 0.020) were associated with behavior, although the piloted ST-IAT had a stronger effect. In the free sugar sample, neither ST-IAT was associated with behavior, as the piloted ST-IAT had a small effect which did not meet the significance threshold (r = 0.09, p = 0.489), while the previously used ST-IAT had a negligible relationship with behavior (r = 0.02, p = 0.898).
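As a rough check on how sample size shapes these results, the significance of a correlation can be approximated from r and n alone. The sketch below uses the Fisher z approximation, which is not the test reported in the study; the function name is hypothetical, and the sample sizes are assumed to be the final samples of 158 (healthy eating) and 61 (free sugar). Because the reported correlations are rounded and the method is approximate, the resulting p-values need not match the reported ones exactly.

```python
from math import atanh, sqrt
from statistics import NormalDist


def approx_p_from_r(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: rho = 0 using the Fisher z transform."""
    if n <= 3 or not -1 < r < 1:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = atanh(r) * sqrt(n - 3)  # standardized Fisher-transformed correlation
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Correlations as reported in the text; n values are assumed final sample sizes.
reported = {
    "healthy eating, piloted": (0.28, 158),
    "healthy eating, previously used": (0.18, 158),
    "free sugar, piloted": (0.09, 61),
    "free sugar, previously used": (0.02, 61),
}
approx_p = {label: approx_p_from_r(r, n) for label, (r, n) in reported.items()}
```

The same correlation yields a much larger p-value in the smaller student sample, which is worth bearing in mind when comparing the healthy eating and free sugar results.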

General Discussion
The aim of the current research was to test a multi-step process to extract sets of stimulus items for implicit measures. Through this process we aimed to make informed choices on stimulus selection, accounting for the correspondence of exemplars with the target concept, the frequency of exemplar uses in common language, and the length of the exemplars. We tested this process on two prominent health behaviors: healthy eating and the consumption of products high in sugar. Testing the piloted stimulus sets in a sample of university students and a general population sample, we found little difference between piloted and previously used implicit measures in terms of the reliability coefficients, but a slightly stronger implicit measure-behavior relationship in the piloted measures.
Previous evidence has indicated variability in response times to stimulus items as a function of a stimulus's level of correspondence to its target concept [26] and level of familiarity to its target sample [27,31]. Similarly, word length has also been theorized to affect response times [30], especially in those with less developed reading skills [37][38][39]. Thus, by using empirical data to minimize variability in stimulus items' levels of correspondence, familiarity, and word length, the current research may represent a method of increasing the validity of research which utilizes implicit measures. Specifically, controlling for these features may serve as a pathway to increasing the accuracy of response time-based scoring metrics by reducing several potential sources of task-irrelevant variation in response times. However, in spite of evidence of the effects of stimulus characteristics on implicit measure responses, there is a dearth of research systematically investigating piloting methods and how they may affect metrics inferred from implicit measures.
In terms of the testing of implicit measures based on the extracted items, the data provide preliminary support for the value in piloting stimulus sets. Although we observed the small-sized implicit measure-behavior relationships typical of current dual-process research [46][50][51][52], the implicit measure-behavior relationship was stronger when using the implicit measure with piloted stimulus items in both the university student and general population samples. This is consistent with previous evidence and theory that implicit measures using stimuli highly representative of their target constructs are liable to produce stronger effect sizes [11,26]. However, in contrast to our expectations, there were no systematic differences in reliability scores, as the piloted IAT had slightly higher reliability in the free sugar sample, while the previously used IAT had slightly higher reliability in the healthy eating sample. Given that we expected homogenizing stimulus sets to consist of only highly relevant, familiar items to reduce the likelihood of extraneous variation in reaction times, this is somewhat surprising. However, it is also important to note that implicit measure internal consistency statistics should be interpreted with a degree of caution, given that they are subject to influences beyond the reliability of measurement, such as attitude polarity or personal importance [53]. Thus, it is difficult to assess which characteristics affected reliability in the current study, if any. Future research may seek to address this issue using investigations in large data sets, or through improvements to the mathematical basis for estimating implicit measure reliability statistics.
Beyond evidence for the efficacy of piloted stimulus sets, the current findings also have several notable implications for broader implicit measure research. The extracted stimulus sets present a data-driven, empirically grounded representation of their target concepts in three populations. In an immediate sense, these findings have inherent value in elucidating laypeople's understanding of free sugar and healthy food items, and thus informing stimulus selection for implicit measures focused on these key health behaviors. However, it is also important to consider that the highest rated of the extracted exemplars differ notably from published statistics and guidelines regarding dietary choices. For example, despite high sugar beverages accounting for a disproportionately high amount of sugar in children's diets [54,55], out of the 16 exemplars provided by the child sample, the highest ranked exemplar drink, soft drink, was ranked by children as seventh, and the highest contributor to children's sugar intake [56], juice, was ranked 14th. However, despite soft drinks making up a notably smaller portion of overall sugar intake in adults [55,56], Coke and soft drink were ranked the most associated with food and drinks high in free sugar in the adolescent and young adult samples, respectively. Similarly, with respect to healthy foods, fruits and vegetables ranked highly in all samples. While this is somewhat expected, it is important to consider that the highest rated exemplars do not match published definitions of a healthy and balanced diet [57], as grains and lean proteins were ranked as being less associated with the concept of healthy foods.
This inconsistency poses an interesting question for research design, as to whether stimulus sets should consist of those exemplars the target population rates as most corresponding with the target concept, or exemplars which cover the breadth of a concept as defined by professionals. As the aim of the current study was to test a method of piloting stimulus items which minimized any potential for extraneous input from researchers, a purely data-driven approach was employed, prioritizing the inclusion of stimulus words by their ratings of correspondence. As a result, the stimulus sets chosen may not reflect what experts define as the key exemplars for each category. An alternative approach might be to combine data with the professional opinion of experts. For example, for healthy eating exemplars in the young adult sample, the exemplars beans, fish, and eggs met the criteria of being highly corresponding to the target concept and in regular use, but were excluded as they were less highly corresponding than the selected items. Using guidelines [57] in combination with our data, a stimulus set which better covers the breadth of the concept of healthy eating in young adults might be vegetables, lettuce, fruit, broccoli, salad, eggs, tomatoes, bananas, fish, and beans, which includes a range of fruits, vegetables, and proteins. Such a set is arguably superior to one produced by a purely data-driven approach. For example, behavior measures designed by experts to assess healthy eating often cover multiple facets of a good diet [49], rather than simply fruit and vegetable consumption. Thus, it is plausible that a stimulus set which is based on both published guidelines and empirical data may produce stronger findings through a closer correspondence to validated behavior measures that have also been designed with guidelines in mind. However, even in the current data, such a mixed approach is not always possible.
In both the child and young adult samples, no grains were rated as highly associated with healthy food. Additionally, in the child sample, few proteins were provided in the free-recall experiment, and those provided were ranked as poorly associated with healthy eating in the rating task. Such a finding leaves researchers in a difficult position with regard to decisions about stimulus selection, as providing full coverage of the target concept may mean introducing extraneous variance into implicit measure results.
Such a predicament is not unique to methodology utilizing implicit measures, and the current findings are also likely to have implications for research using the more traditional self-report methods of data collection. For example, a researcher asking a simple Likert scale question on healthy eating (e.g., eating a healthy diet in the next two weeks would be [1] boring to [7] enjoyable) will receive responses based upon the participants' salient definition of healthy eating. However, according to the current research, this salient definition may be incongruent with conceptualizations from researchers and experts. To an extent, these findings highlight the value of providing clear definitions of key constructs in survey and self-report research. Yet, even with a clear definition provided, the extent to which findings are affected by discrepancies between provided definitions of key constructs and participants' salient understanding is unclear.

Strengths, Limitations, and Future Directions
The current study had several notable strengths, including the use of multiple samples and an empirically grounded design. However, the current research is not without its limitations. Firstly, examining the mean level of correspondence and familiarity in samples does not account for individual differences within each sample. The current literature suggests this could be addressed through the use of individualized stimulus items, rather than creating stimulus sets via piloting. While some studies have found this individualization technique to be useful [58][59][60], it may be impractical for many researchers in terms of creating implicit measures using currently available experimental software packages, and may in itself introduce additional variance to scores inferred from implicit measures. Future research may seek to compare the relative value of addressing stimulus variation issues through piloting or measure individualization to inform best practice in implicit measure research. Further, while the stimulus sets created in the current study have natural value to their target populations, it is possible that cultural variations may inhibit their usefulness outside the Australian context. For example, some extracted items such as lollies or soft drink may not be familiar in US samples, where candy or soda may be more likely items. Thus, replication of the current process in alternative samples may serve as a useful avenue for future research.

Conclusions
The current study presented a multi-step method of creating stimulus sets for use in implicit measures, including extracting potential exemplars using free-recall piloting, testing the level of correspondence between potential stimulus items and target constructs, and investigating word characteristics, such as length and use frequency, which may impact responses on implicit measures. From this analysis, six sets of stimulus items were created, representing healthy eating and sugar consumption in children, adolescents, and young adults. By controlling for potentially extraneous sources of variance on reaction times, this process may represent an avenue for increasing the validity and precision of implicit measures. This was reflected in two samples where an implicit measure based upon the piloted stimuli had a slightly stronger relationship with behavior than previously used implicit measures drawn from the published literature. Thus, the current research has notable practical implications for research employing implicit measures like the implicit association test or affect misattribution procedure, providing preliminary support for the value in empirically grounded stimulus selection to ensure more accurate and valid findings, and reaffirming concerns that arbitrary stimulus selection may be an avenue of introducing biases to results. There are also practical implications for wider research design beyond the stimulus sets created, as the exemplars rated as the most corresponding to the target concepts in each sample did not accurately reflect published guidelines and known patterns of consumption. These deviations from expected patterns pose difficult questions for implicit research overall, and reaffirm the need for clear conceptualizations of key constructs.