Tablet assessment of word comprehension reveals coarse word representations in 18–20-month-old toddlers

The present study explores the viability of using tablets in assessing early word comprehension by means of a two-alternative forced-choice task. Forty-nine 18–20-month-old Norwegian toddlers performed a touch-based word recognition task, in which they were prompted to identify the labeled target out of two displayed items on a touchscreen tablet. In each trial, the distractor item was either semantically related (e.g., dog–cat) or unrelated (e.g., […]). […] remote data collection in 18–20-month-old toddlers is viable, as comparable results were observed from both in-laboratory and online administration of the touchscreen recognition task.


| INTRODUCTION
Historically, studies of early language development involved observations of children's spontaneous speech as they interacted with their parents or an experimenter/clinician (Clark, 1974). Despite the undeniable appeal of this method's ecological validity, the process of collecting, transcribing, and analyzing spontaneous language samples is labor-intensive and time-consuming.
To go beyond these limitations, researchers have turned to indirect assessment methods, namely parental reports, that provide insights into children's communicative-linguistic development. Parental reports systematically draw on parents' extensive experience with their children, and thus allow for the collection of data that are not only more extensive than what can typically be collected during a brief laboratory or clinical session, but that might also be more representative of children's abilities (Fenson et al., 2000). Furthermore, the application of parental reports, such as the widely used MacArthur-Bates Communicative Development Inventories (CDIs), in cross-linguistic studies has provided invaluable insight into infants' and toddlers' early lexical development (Bleses et al., 2008; Braginsky et al., 2019; Frank et al., 2021), while other studies have evinced predictive relationships between early vocabulary and subsequent academic outcomes (e.g., Bleses et al., 2016; Duff et al., 2015; Morgan et al., 2015).
Yet, concerns have been raised regarding the sole use of parental reports, in particular for the assessment of comprehension (more than for production), since parents can at best infer comprehension from infants' and toddlers' non-verbal responses to language (Feldman et al., 2000; Houston-Price et al., 2007; Tomasello & Mervis, 1994). For instance, while at a "general level" previous studies have found moderate to strong correlations between average parental reports on the CDI and direct measures of infants' and toddlers' word knowledge (Fernald & Marchman, 2012; Fernald et al., 2006; Friend et al., 2012; Hurtado et al., 2008), at the "item level" the evidence is mixed. Studies using indirect, eye-tracking measures revealed both underestimation (Houston-Price et al., 2007) and alignment (Styles & Plunkett, 2009; Syrnyk & Meints, 2017) between parental reports and child comprehension operationalized as visual gaze preference, whereas studies using direct measures (i.e., the child's overt answer, such as a touch response), for example Friend et al. (2012) and Friend and Zesiger (2011), reported moderate item-level agreement.
Inconsistencies between parental reports and direct measures of child word comprehension might reflect the immaturity of children's early lexical-semantic representations, which makes it challenging for parents to pin down whether a child knows a given word. Previous research has shown that early word representations are (semantically) coarse and that infants and toddlers use a number of cues to disambiguate words, rather than relying on a one-to-one word-object mapping. For instance, at 6 months of age, infants fail to disambiguate semantically/functionally related items (Bergelson & Aslin, 2017a), and at 8 months, they struggle to disambiguate items matched for frequency in child-directed speech (Kartushina & Mayor, 2019). Although word-object mappings undergo progressive development through learning, and semantic specificity sharpens by 18-20 months of age (Bergelson & Aslin, 2017b), early word representations remain fragile at the end of the second year (Arias-Trejo & Plunkett, 2010). Arias-Trejo and Plunkett showed that 18-24-month-olds failed to disambiguate items that were both perceptually and semantically related (e.g., an apple and an orange), as compared to items that were semantically related but perceptually dissimilar (e.g., an apple and a banana), indicating that the presence of a perceptually similar distractor increases the burden of visual discrimination and feature overlap for semantically related objects.
Imprecision of parental reports may have implications when such instruments are used as measures in research or as a basis for decisions in clinical settings (Yoder et al., 1997). For these reasons, the use of measures that supplement parental reports is encouraged (Dale et al., 2003; Fenson et al., 1993), and further assessment of the validity of parental reports is needed.
A direct language measure can serve both as a convergent and as a supplemental measure to parental reports. While many structured tests, such as the Peabody Picture Vocabulary Test (Dunn, 2018) and the Expressive Vocabulary Test (Williams, 2018), are available to assess young children's vocabulary knowledge, direct measures that are appropriate for children below two years of age remain scarce, owing to the inherent difficulty of maintaining infants' and toddlers' interest and attention (Friend & Keplinger, 2003) as well as to behavioral non-compliance (Kaler & Kopp, 1990). Whereas looking-based measures, such as the Intermodal Preferential Looking Paradigm (Golinkoff et al., 1987; Hirsh-Pasek & Golinkoff, 1996) and the Looking-while-listening task (Fernald et al., 1998, 2006), have been successfully used with infants as young as 4 months old because they eliminate the need for a volitional response (Golinkoff et al., 2013), the passive and repetitive nature of such measures may quickly lead to boredom among older toddlers, making an extensive assessment impracticable. The Computerized Comprehension Task (CCT; Friend & Keplinger, 2003), on the other hand, is a reliable and valid touchscreen-based measure designed specifically for assessing comprehension among toddlers between 16 and 24 months of age, and has been shown to be effective in maintaining children's attention as well as improving compliance (Friend & Keplinger, 2003; Friend et al., 2012; Friend & Zesiger, 2011; Hendrickson et al., 2015; Poulin-Dubois et al., 2013).
Following the approach of the CCT in providing an engaging direct language assessment, the present study explores the viability of tablets in assessing toddlers' word comprehension by means of a word recognition task, with the following three objectives. First, although tablets and apps are increasingly commonplace among children of all ages, the use of tablet-based assessments has been primarily limited to adults and older children. Given that tablets are easy to operate even for the youngest children, and given children's increasing proficiency with tablets (Abdul Aziz et al., 2014; Marsh et al., 2015), there is a need to examine how such devices can be used most effectively to collect child language data. Neumann et al. (2019), for instance, demonstrated that a tablet-based assessment could provide a valid and reliable measure of early literacy skills, at least among the older children (n = 45, M age = 4.65) tested in their study. Twomey et al. (2018) further showed that children as young as 24 months were able to complete a tablet-based assessment of early cognitive functions.
Second, compared to traditional paper-and-pencil tests, tablet-based assessments provide a more engaging and motivating testing situation. While the CCT offers the same advantage, it is typically administered in laboratories, where screens are often mounted on a wall or placed on a desk and thus require full arm movements, which may, in turn, lead to fatigue in longer sessions (Frank et al., 2016). In contrast, tablet-based assessments require only minimal motor movements and are much more portable owing to the small form factor of tablets.
Third, there is a need to further evaluate the alignment between parental reports and children's word comprehension and, in particular, to assess whether parental evaluations best fit their toddlers' word recognition in coarse-grained (the semantically unrelated condition) or finer-grained contexts (the semantically related condition). Children vary in the strength of their word knowledge at the item level, and capturing this variability is important for a robust understanding of a child's lexical development.
In order to explore the viability of using a tablet-based measure in assessing early word comprehension and to examine the role of semantic relatedness in early word recognition, the present study employed a two-alternative forced-choice (2AFC) word recognition paradigm (similar to the CCT) with Norwegian toddlers aged between 18 and 20 months. As the CCT is only available in three languages (i.e., English, Spanish, and French), lexical items were selected from the Norwegian adaptation of the CDI-Words and Gestures (CDI-WG; Simonsen et al., 2014), with varying levels of difficulty (defined based on the normative data). In each trial, toddlers saw two images on a screen: one representing the lexical target and the other representing the distractor. In contrast to the CCT, in which only semantically related item pairs were used, the current design examined the role of semantic relatedness in toddlers' performance in the word recognition task by pairing each lexical target once with a distractor belonging to a different semantic category (e.g., a car and a cat) and once with a distractor belonging to the same semantic category (e.g., a car and an airplane). It was expected that toddlers in the current study would be more accurate in semantically unrelated than related trials. Based on previous work using the CCT (Friend & Keplinger, 2003), accuracy was also expected to mirror the a priori difficulty levels, decreasing with increasing difficulty. Finally, if parental reports are an accurate predictor of toddlers' word knowledge, a positive relationship between parent-reported comprehension and toddlers' accuracy in word recognition was expected.

| Participants
Parents of 49 monolingual (>75% exposure) Norwegian toddlers (aged between 18 and 20 months) from the Greater Oslo Region, Norway, were contacted to participate in the current study through social media, leaflets distributed in a kindergarten, postal mailing lists, and email lists. After consenting to participate in the study, parents completed the Norwegian adaptation of the CDI-WG (Simonsen et al., 2014) online within one week prior to the study to obtain a current estimate of their child's vocabulary size. Parents' socioeconomic status (SES), indexed by the mother's highest education level, ranged from 0 (primary school) to 5 (doctoral degree), with a mean score of 3.57 (SD = 0.82).
All recruited toddlers were full-term at birth, had no hearing or visual impairments, and had Norwegian as their native language. Toddlers participated in the study in one of three settings: the BabyLing laboratory, a municipal kindergarten, or online (i.e., at toddlers' own homes). 1 In both the laboratory and the kindergarten settings, toddlers were tested by an experimenter, whereas online, toddlers were tested by their parents. 2 Thus, for simplicity, the laboratory and kindergarten samples (n = 21; 16 females, 5 males) were grouped under the laboratory setting, and the online sample (n = 28; 15 females, 13 males) under the online setting. An additional 11 participants were excluded for failing to complete the task (n = 7; 2 laboratory and 5 online) or for attempting the task more than once (n = 4; all online). Mean age, age range, and standard deviation for each setting are detailed in Table 1.
The present study was conducted according to guidelines laid down in the Declaration of Helsinki, with written informed consent obtained from a parent or a guardian for each child before any assessment or data collection. The study was approved by the ethics committee at the Department of Psychology, University of Oslo and by the Norwegian Centre for Research Data (NSD, ref. 807456).

|
LO et aL.

| Design
The present study used a within-subjects design. Toddlers' comprehension of 24 lexical items of three levels of difficulty (easy, moderately difficult, and difficult; see Lexical Items section, below) was assessed using a tablet-based 2AFC word recognition task. Lexical targets were assessed under two conditions: semantically related (i.e., the lexical target was presented with a distractor from the same semantic category) and semantically unrelated (i.e., the lexical target was presented with a distractor from a different semantic category).

| Apparatus and materials
The study was conducted via a custom-built online experimental platform developed by Lo et al. (2021). In the laboratory setting, a Samsung Galaxy Tab S4 was used to run the study, whereas in the online setting, parents' own touchscreen devices were used. The Norwegian adaptation of the CDI-WG (Simonsen et al., 2014) was used as a measure of vocabulary size.

| Lexical items
Four highly familiar lexical items were selected for the familiarization phase: "ball" [ball], "hus" [house], "sko" [shoe], and "tre" [tree]. For the test phase, a total of 24 lexical items were selected. Each lexical target was assessed twice, by pairing its referent once with a semantically related and once with a semantically unrelated referent as the distractor. Item pairs varied in difficulty (defined a priori on the basis of the Norwegian CDI-WG normative data for 20-month-olds; Frank et al., 2017; Simonsen et al., 2014) and comprised an equal number of easy (comprehended by more than 80% of the normative sample), moderately difficult (comprehended by 40%-80% of the normative sample), and difficult (comprehended by less than 40% of the normative sample) items. Within each level of difficulty, animate and inanimate referents were equally represented. The list of item pairs is provided in Table 2.
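The a priori difficulty bins amount to a simple classification rule on the normative comprehension percentage. The sketch below is a hypothetical Python helper (`difficulty_level` is not part of the study's materials), and it assumes that an item comprehended by exactly 40% or 80% of the normative sample falls in the moderately difficult bin, since the text does not say how boundary values were handled.

```python
def difficulty_level(pct_comprehending: float) -> str:
    """Map the percentage of the CDI-WG normative sample reported to
    comprehend an item onto the three a priori difficulty levels.

    Hypothetical helper: boundary values (exactly 40% or 80%) are
    assigned to 'moderately difficult', which is an assumption.
    """
    if not 0.0 <= pct_comprehending <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    if pct_comprehending > 80.0:
        return "easy"  # comprehended by more than 80% of the sample
    if pct_comprehending >= 40.0:
        return "moderately difficult"  # 40%-80%
    return "difficult"  # less than 40%


if __name__ == "__main__":
    # Illustrative percentages only; these are not values from the norms.
    for pct in (92.0, 80.0, 55.0, 40.0, 12.5):
        print(f"{pct:5.1f}% -> {difficulty_level(pct)}")
```

With 24 test items split equally across the three bins, each bin holds eight items, four animate and four inanimate.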

| Visual and auditory stimuli
To remove potential biases due to familiarity effects (from assessing the same item twice), the visual stimuli for the test phase included 48 images of prototypical referents (as reported by two adults in a separate stimulus assessment) for the 24 lexical items assessed (i.e., two images per item). The set of images used can be found in Appendix 1 (see also Appendix 2 for the images used in the familiarization phase). Within each item pair, the side (left or right) on which a referent appeared was counterbalanced. All auditory stimuli were recorded by a female native speaker of Norwegian in child-directed speech and then processed in Praat (Boersma & Weenink, 2020) to remove noise and equalize intensity across the 24 prompts.
TABLE 1 Age mean, standard deviation, and range for laboratory and online settings

| Procedure
The study began with an introductory phase, followed by a familiarization phase and a test phase.

| Introductory phase
Before the familiarization phase began, a smiley face was presented at the center of the screen with an introductory audio "Hei! Har du lyst til å spille?" [Hi! Do you want to play?] to attract participants' attention. In order to proceed to the familiarization phase, the experimenter/parent had to tap on the "Next" button at the bottom-right corner of the screen.

| Familiarization phase
The familiarization phase consisted of four 2AFC trials to (a) ensure that participants understood the context of the task and (b) familiarize them with the tapping paradigm. In each trial, participants were presented with a pair of highly familiar objects (placed on the left and right sides of the screen, respectively) and prompted to tap on the referent for the heard lexical target X embedded in the carrier phrase "Kan du trykke på X?" [Can you touch the X?]. Tapping was disabled for the first 2000 ms from the onset of the trial to prevent impulsive responses during the audio prompt, which lasted between 1500 and 2000 ms. The timeout was 8000 ms (comparable to Friend et al., 2012), to accommodate considerable individual variation in response times (see Ackermann et al., 2020). As soon as a (touch) response was provided, the next trial was presented.
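The timing rule can be made concrete with a short sketch. It is hypothetical (the platform's actual code is not described here) and assumes, following the test-phase description, that the 8000 ms response window starts once the 2000 ms lockout ends, so a trial times out 10000 ms after onset; taps made during the lockout are simply ignored.

```python
from typing import Iterable, Optional

LOCKOUT_MS = 2000   # tapping disabled while the 1500-2000 ms prompt plays
RESPONSE_MS = 8000  # response window once tapping is enabled (assumed to follow the lockout)


def first_valid_tap(tap_times_ms: Iterable[float]) -> Optional[float]:
    """Return the onset-relative time of the first tap inside the response
    window, or None if the trial times out without a valid tap."""
    window_end = LOCKOUT_MS + RESPONSE_MS
    for t in sorted(tap_times_ms):
        if t < LOCKOUT_MS:
            continue  # ignored: tapping is still disabled
        if t < window_end:
            return t  # the first valid tap ends the trial
        break  # every remaining tap falls after the timeout
    return None


if __name__ == "__main__":
    print(first_valid_tap([600.0, 2450.0]))  # early tap ignored, later tap counts
    print(first_valid_tap([1999.0]))          # only a tap during the lockout
    print(first_valid_tap([]))                # no tap at all
```

Returning None for a timeout keeps "no response" distinct from a tap on the distractor, which matters for the scoring described in the Results.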

| Test phase
Before the test phase began, a smiley face was again presented at the center of the screen, accompanied by an audio with the encouraging phrase "Da fortsetter vi!" [Let's continue!]. The experimenter/parent had to tap on the "Next" button to begin the test phase. The test phase consisted of 48 2AFC trials, in which each lexical target was assessed twice (paired with either a semantically related or a semantically unrelated distractor). In each trial (see Figure 1 for a screenshot), participants were presented with an item pair (see Table 2) and prompted to tap on the referent for the heard lexical target X (using the same carrier phrase as in the familiarization phase). Each item pair was presented twice so that each item within the pair served as target and as distractor in an equal number of trials. As in the familiarization trials, tapping was disabled for the first 2000 ms of the trial (to prevent participants from responding during the audio prompt, which lasted between 1500 and 2000 ms), after which participants were given 8000 ms to respond before the next trial was presented. Trials were presented in a random order, with three breaks interspersed throughout the test phase. During each break, a smiley face was presented in the same manner as before, accompanied by one of the following encouraging phrases: (a) "Da fortsetter vi!" [Let's continue!], (b) "Nå går vi videre!" [Now, we move on!], (c) "Da har vi den neste!" [Then, we have the next (one)!], and (d) "Da er du nesten ferdig! Bra!" [You're almost done! Good!]. To continue with the test, the experimenter/parent also had to tap on the "Next" button at the bottom-right corner of the screen. Upon completion of the test phase, the smiley face was presented once more, accompanied by an audio with the phrase "Nå er du ferdig! Kjempebra!" [Now you're done! Great!].
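The structure of the test phase (each pair shown twice with the target and distractor roles swapped, random order, three breaks) can be sketched as follows. This is an illustrative reconstruction, not the platform's implementation: the item labels are placeholders for the pairs in Table 2, breaks are assumed to fall at equal intervals (after trials 12, 24, and 36), and left/right placement of referents is not modeled.

```python
import random
from typing import List, Optional, Tuple, Union

Pair = Tuple[str, str, str]  # (item_a, item_b, condition)
BREAK = "BREAK"              # stands for a smiley-face break screen


def build_test_sequence(pairs: List[Pair], n_breaks: int = 3,
                        seed: Optional[int] = None) -> List[Union[dict, str]]:
    """Expand each item pair into two 2AFC trials (each member serves once as
    target and once as distractor), shuffle, and insert evenly spaced breaks."""
    trials = []
    for a, b, condition in pairs:
        trials.append({"target": a, "distractor": b, "condition": condition})
        trials.append({"target": b, "distractor": a, "condition": condition})
    random.Random(seed).shuffle(trials)
    # Assumed equal spacing: with 48 trials and 3 breaks, after trials 12, 24, 36.
    n = len(trials)
    break_after = {n * k // (n_breaks + 1) for k in range(1, n_breaks + 1)}
    sequence: List[Union[dict, str]] = []
    for i, trial in enumerate(trials, start=1):
        sequence.append(trial)
        if i in break_after:
            sequence.append(BREAK)
    return sequence


def placeholder_pairs() -> List[Pair]:
    """Hypothetical stand-in for Table 2: 24 items, where items 2k and 2k+1
    share a category. Each item occurs in one related and one unrelated pair."""
    items = [f"item{i:02d}" for i in range(24)]
    related = [(items[2 * k], items[2 * k + 1], "related") for k in range(12)]
    unrelated = [(items[2 * k], items[(2 * k + 3) % 24], "unrelated")
                 for k in range(12)]
    return related + unrelated


if __name__ == "__main__":
    seq = build_test_sequence(placeholder_pairs(), seed=1)
    trials = [s for s in seq if s != BREAK]
    print(len(trials), "trials,", seq.count(BREAK), "breaks")
```

Twelve related and twelve unrelated pairs, each presented twice, give the 48 test trials, and every item is the target exactly once per semantic condition.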

| RESULTS
The results are organized around three central questions. First, potential differences between data collected online and in the laboratory were considered. Second, the influence of the semantic relatedness and difficulty of item pairs on toddlers' motivation to produce a response, as well as on their performance in the word recognition task, was examined. Finally, the convergent relation between toddlers' performance and parental report (CDI-WG) was assessed. In accordance with previous work using the CCT (Friend & Keplinger, 2003; Friend et al., 2012), missing responses (i.e., trials in which the child did not produce a response) were treated as errors of comprehension. 3
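Treating missing responses as errors means that a timed-out trial counts against accuracy but not toward attempts. A minimal sketch of that scoring rule, with a hypothetical function name and toy data, is shown below; it also yields the attempted and correct counts used in the analyses that follow.

```python
from typing import Optional, Sequence


def score_trials(responses: Sequence[Optional[str]],
                 targets: Sequence[str]) -> dict:
    """Score one toddler's trials.

    responses[i] is the item tapped on trial i, or None if no tap was made
    before the timeout; targets[i] is that trial's lexical target.
    """
    if len(responses) != len(targets) or not targets:
        raise ValueError("need one response slot per trial and at least one trial")
    attempted = sum(r is not None for r in responses)
    correct = sum(r == t for r, t in zip(responses, targets))
    return {
        "attempted": attempted,            # taps on target or distractor
        "correct": correct,                # taps on the target
        "incorrect": attempted - correct,  # taps on the distractor
        "missing": len(targets) - attempted,
        # Missing trials stay in the denominator, i.e., they count as errors.
        "accuracy": correct / len(targets),
    }


if __name__ == "__main__":
    # Toy data: four trials, one of them without a response.
    print(score_trials(["dog", None, "cat", "car"], ["dog", "cat", "owl", "car"]))
```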

| Attempted trials
The number of trials in which a tap response was produced, regardless of whether the response was correct (i.e., tap on target) or incorrect (i.e., tap on distractor), was used as a measure of toddlers' motivation to produce a response during the word recognition task. Results from a Welch's t-test indicated that toddlers who were tested online (M = 44.286, SD = 6.359) and those who were tested in the laboratory (M = 40.810, SD = 7.061) did not differ significantly in the number of attempted trials; t(40.601) = −1.779, p = .083 (see Figure 2).
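Welch's test does not assume equal variances, and its degrees of freedom come from the Welch-Satterthwaite approximation, which is why the reported df is non-integer. As a check, the sketch below recomputes t and df from the rounded summary statistics reported for attempted trials (above) and for correct trials (in the next subsection), with laboratory n = 21 and online n = 28; the function name is hypothetical, and small discrepancies in the third decimal of df are expected from rounding. On raw data, `scipy.stats.ttest_ind(lab, online, equal_var=False)` runs the same test.

```python
import math
from typing import Tuple


def welch_t(m1: float, sd1: float, n1: int,
            m2: float, sd2: float, n2: int) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite df for group 1 minus group 2."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2  # squared standard error of mean 2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # Laboratory (n = 21) minus online (n = 28), using the reported M and SD.
    t_att, df_att = welch_t(40.810, 7.061, 21, 44.286, 6.359, 28)
    t_cor, df_cor = welch_t(34.095, 8.717, 21, 38.286, 7.262, 28)
    print(f"attempted: t({df_att:.3f}) = {t_att:.3f}")
    print(f"correct:   t({df_cor:.3f}) = {t_cor:.3f}")
```

Subtracting the online mean from the laboratory mean reproduces the negative sign of the reported t values.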
To assess whether toddlers' motivation (as indexed by whether an attempt to provide a tap response was made) differed across the semantic relatedness and difficulty of the trials, a binomial generalized linear mixed-effects model (GLMM) with a logit link function was fitted and analyzed using the mixed() function from the afex package (Singmann et al., 2020), which relies on the lme4 package (Bates et al., 2015) for model fitting. The model included semantic relatedness (related, unrelated), difficulty (easy, moderately difficult, and difficult), toddlers' age (in months), and the interaction between semantic relatedness and difficulty as fixed effects, as well as participant and selected object as random intercepts. 4 Both semantic relatedness (−1: unrelated; 1: related) and difficulty (−1: easy; 1: moderately difficult, difficult) were sum-coded, whereas age was centered on the mean. To determine a model with a parsimonious random-effects structure (Matuschek et al., 2017), the forward "best-path" approach, with α = .20 as the inclusion criterion, was used to test random slopes for inclusion (Barr et al., 2013). As none of the random slopes fell below the inclusion criterion, the random-intercepts-only model was retained:
Attempted ∼ Relatedness * Difficulty + Age + (1|Participant) + (1|Object)
The results are detailed in Table 3, with chi-square statistics and p-values obtained using likelihood ratio tests. Follow-up pairwise comparisons, with p-values adjusted using the Tukey method, were conducted using the pairs() function in the emmeans package (Lenth, 2020).
4 […] thus omitted.

FIGURE 1 Screenshot of a trial in the test phase
FIGURE 2 Attempted, correct, and incorrect trials across different settings
As shown in Table 3, there were significant main effects of trial difficulty and age, with the number of attempted trials increasing with age. No significant main effect of semantic relatedness was found, nor did semantic relatedness interact with difficulty. Results from the follow-up tests indicated that toddlers attempted significantly more easy than difficult trials (β = 0.556, SE = 0.186, z = 2.995, p = .008), whereas no such difference was found between easy and moderately difficult trials (β = 0.363, SE = 0.189, z = 1.917, p = .134) or between moderately difficult and difficult trials (β = 0.193, SE = 0.176, z = 1.096, p = .517; see also Figure 3).
FIGURE 3 Proportion of attempted trials across settings by semantic relatedness and difficulty

| Correct trials
Results from a Welch's t-test indicated that there was no statistically significant difference between toddlers who were tested online (M = 38.286, SD = 7.262) and those who were tested in the laboratory (M = 34.095, SD = 8.717) in the number of trials in which they correctly identified the target referent, t(38.508) = −1.787, p = .082 (see Figure 2). To assess whether toddlers' accuracy differed across the semantic relatedness and difficulty of the trials, a binomial GLMM with a logit link function was again fitted and analyzed. The model included the same fixed effects as the previous model (i.e., semantic relatedness, difficulty, age, and the interaction between semantic relatedness and difficulty) and the same random intercepts (i.e., participant and selected object), with by-participant adjustments to the slope of difficulty: 5
Accuracy ∼ Relatedness * Difficulty + Age + (1 + Difficulty|Participant) + (1|Object)
The results are detailed in Table 4, with chi-square statistics and p-values obtained using likelihood ratio tests. Follow-up pairwise comparisons were conducted with p-values adjusted using the Tukey method.
5 Including setting (i.e., online vs. laboratory) and sex as fixed effects in the model did not change the conclusions; these predictors were thus omitted.
As shown in Table 4, there were significant main effects of semantic relatedness, difficulty, and age. Specifically, toddlers responded more accurately in semantically unrelated than related trials, and their accuracy increased significantly with age. No interaction was found between semantic relatedness and difficulty, however. Results from the follow-up tests indicated that toddlers were significantly more accurate in easy trials than in both moderately difficult (β = 0.523, SE = 0.183, z = 2.861, p = .012) and difficult trials (β = 1.113, SE = 0.164, z = 6.799, p < .001). Toddlers were also significantly more accurate in moderately difficult than difficult trials (β = 0.590, SE = 0.150, z = 3.924, p < .001; see also Figure 4). 6
6 A Spearman correlation between toddlers' overall word recognition accuracy and SES revealed no relationship, rho = 0.1, p = .46.

| Convergent validity
At the general level, toddlers' receptive vocabulary size, as measured by the CDI-WG, and their overall accuracy in the word recognition task were significantly correlated in both unrelated, r(47) = .631, p < .001, and related trials, r(47) = .603, p < .001. Partialling out the effect of age further revealed that toddlers' receptive vocabulary size accounted for a significant proportion of unique variance in their recognition accuracy, beyond that accounted for by their age, in both unrelated, r(46) = .593, p < .001, R² = .352, and related trials, r(46) = .538, p < .001, R² = .289.
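Partialling out age amounts to correlating what remains of vocabulary size and of accuracy after each has been linearly regressed on age; the squared partial r is then the share of the age-adjusted variance in accuracy explained by vocabulary (e.g., .593² ≈ .352 and .538² ≈ .289). The df drop from n − 2 = 47 to n − 3 = 46 because one additional variable (age) is controlled. The sketch below, using made-up data and hypothetical function names, checks that the residual route and the closed-form first-order partial correlation give the same value.

```python
import math
import random
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residualize(y: Sequence[float], z: Sequence[float]) -> List[float]:
    """Residuals of y after a simple linear regression on z."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return [b - my - slope * (a - mz) for a, b in zip(z, y)]


def partial_r_formula(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up data for 49 toddlers: age (months), vocabulary, accuracy.
    age = [rng.uniform(18, 20) for _ in range(49)]
    vocab = [20 * a + rng.gauss(0, 30) for a in age]
    acc = [0.02 * v + 0.5 * a + rng.gauss(0, 3) for v, a in zip(vocab, age)]
    r_resid = pearson(residualize(vocab, age), residualize(acc, age))
    r_closed = partial_r_formula(pearson(vocab, acc), pearson(vocab, age),
                                 pearson(acc, age))
    print(f"partial r = {r_resid:.3f} (closed form {r_closed:.3f}), "
          f"R^2 = {r_resid ** 2:.3f}, df = {len(age) - 3}")
```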
To explore the consistency between toddlers' responses and parent-reported comprehension of the test items (i.e., parent-child agreement), item-level agreement was calculated (see Table 5) and a binomial GLMM with a logit link function was fitted. The model included semantic relatedness, difficulty, age, and the interaction between semantic relatedness and difficulty as fixed effects. Both semantic relatedness (−1: unrelated; 1: related) and difficulty (−1: easy; 1: moderately difficult, difficult) were sum-coded, whereas age was centered on the mean. Random intercepts included participant and selected object, with by-participant adjustments to the slopes of semantic relatedness, difficulty, and their interaction: 7
Accuracy ∼ Relatedness*Difficulty + Age + (1 + Relatedness*Difficulty|Participant) + (1|Object)
The GLMM results are detailed in Table 6, with chi-square statistics and p-values obtained using likelihood ratio tests. Follow-up pairwise comparisons were conducted with p-values adjusted using the Tukey method.
7 Including setting (i.e., online vs. laboratory) and sex as fixed effects in the model did not change the conclusions; these predictors were thus omitted.
Overall, as shown in Table 6, there was good item-level agreement between parental reports and toddlers' responses, although it attenuated with increasing item difficulty. Results from the GLMM indicated that semantic relatedness, difficulty, and the interaction between semantic relatedness and difficulty (but not age) significantly predicted parent-child agreement (see also Figure 5). The follow-up tests revealed that parent-child agreement was significantly higher in semantically unrelated than related easy trials (β = 0.795, SE = 0.299, z = 2.662, p = .008), but no significant differences were found between the semantic conditions in the moderately difficult (β = 0.253, SE = 0.169, z = 1.495, p = .135) and difficult trials (β = −0.166, SE = 0.164, z = −1.014, p = .311).
To further examine whether item-pair comprehension status (i.e., whether the target or the distractor label was known or not known by the toddler, as indicated by parental responses on the CDI-WG) was an accurate predictor of toddlers' performance in the word recognition task, another binomial GLMM with a logit link function was fitted, with semantic relatedness, difficulty, item-pair comprehension status, age, and the interaction between semantic relatedness and difficulty as fixed effects. Semantic relatedness (−1: unrelated; 1: related), difficulty (−1: easy; 1: moderately difficult, difficult), and item-pair comprehension status (−1: both unknown; 1: both known, target known only, distractor known only) were sum-coded, whereas age was centered on the mean. Random intercepts included participant and selected object, with by-participant adjustments to the slope of difficulty: 8
Accuracy ∼ Relatedness*Difficulty + Pair Comprehension + Age + (1 + Difficulty|Participant) + (1|Object)
The results are detailed in Table 7, with chi-square statistics and p-values obtained using likelihood ratio tests. Follow-up pairwise comparisons were conducted with p-values adjusted using the Tukey method.
8 Including setting (i.e., online vs. laboratory) and sex as fixed effects in the model did not change the conclusions; these predictors were thus omitted.
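The two parent-report-derived variables used in these models can be sketched as below. The definitions are assumptions made for illustration: agreement is taken to mean that the CDI-WG "understands" judgment for the target matches whether the toddler tapped the target (with missing responses scored as incorrect, as above), and item-pair status combines the parent's judgments for the target and the distractor into the four categories named in the text. All function names are hypothetical.

```python
from typing import Iterable, Tuple


def agrees(parent_reports_known: bool, child_correct: bool) -> bool:
    """Parent-child agreement on one trial: both indicate comprehension,
    or both indicate non-comprehension (assumed definition)."""
    return parent_reports_known == child_correct


def pair_status(target_known: bool, distractor_known: bool) -> str:
    """Item-pair comprehension status from CDI-WG judgments."""
    if target_known and distractor_known:
        return "both known"
    if target_known:
        return "target known only"
    if distractor_known:
        return "distractor known only"
    return "both unknown"


def agreement_rate(trials: Iterable[Tuple[bool, bool]]) -> float:
    """Proportion of (parent_reports_known, child_correct) trials that agree."""
    flags = [agrees(p, c) for p, c in trials]
    if not flags:
        raise ValueError("no trials")
    return sum(flags) / len(flags)


if __name__ == "__main__":
    # Toy data for four trials: (parent says target known, child tapped target).
    toy = [(True, True), (True, False), (False, False), (True, True)]
    print(agreement_rate(toy))  # three of the four trials agree
    print(pair_status(True, False), "|", pair_status(False, False))
```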
As shown in Table 7, parent-reported item-pair comprehension was a significant predictor of toddlers' performance, along with semantic relatedness, difficulty, and age. No significant interaction effect between semantic relatedness and difficulty was found. Results from the follow-up tests indicated that toddlers were significantly less accurate when both target and distractor were reported as unknown compared to when both were known (β = −0.628, SE = 0.190, z = −3.300, p = .005) and when only the target was known

| DISCUSSION
In the interest of developing a performance-based measure of comprehension during the second year of life that addresses the need for a convergent and supplemental measure to parental reports, while taking into account young children's non-compliance and limited attention capabilities (as in Friend & Keplinger, 2003), the present study explored the viability of using a tablet-based 2AFC word recognition task in assessing early word comprehension. Toddlers aged between 18 and 20 months were tested, either in the laboratory by an experimenter or online (i.e., at home) by their parents, on their comprehension of 24 lexical items selected from the Norwegian CDI-WG (Simonsen et al., 2014). During the task, toddlers were asked to identify the referent for the lexical target presented alongside a distractor. Target-distractor pairs were manipulated such that each lexical target was paired once with a semantically related distractor and once with a semantically unrelated distractor. Item pairs also varied across three levels of difficulty (defined based on the Norwegian CDI-WG normative data for age-matched children).
Analyses of both the number of attempted trials (regardless of whether the response was correct or incorrect) and the number of trials in which toddlers provided a correct response revealed no significant differences between the online and laboratory samples, indicating that toddlers were equally motivated to produce a response in the task and that neither setting led to better or poorer performance. This suggests that remote infant data collection with fully automated tasks can be as efficient and reliable as in situ laboratory assessments. Obtaining high-quality data through remote administration is not only an important enabler during the global COVID-19 pandemic but also provides a promising avenue for developmental research, with increased speed, lowered cost, and the potential to improve sample diversity by reaching participants from a wider range of sociodemographic backgrounds than traditional laboratory-based research (Sheskin et al., 2020).
Overall, in line with Friend and Keplinger (2008), toddlers attempted significantly more easy than difficult trials. Older toddlers also attempted significantly more trials than younger toddlers. Together, these findings suggest that toddlers were responding non-randomly and lend further support to the notion that non-responses reflect toddlers' true inability to map the lexical target to its referent, rather than non-compliance or a lack of motivation, whereas incorrect responses might be taken as evidence of partial word knowledge, and correct responses as evidence of robust word knowledge (Hendrickson et al., 2015). With regard to the accuracy measure, toddlers demonstrated above-chance performance throughout the task. Congruent with previous work (Friend & Keplinger, 2003), toddlers' performance was consistent with the a priori "cohort-level" difficulty categorization, as their best performance was obtained for easy trials and their worst performance for difficult trials. As would be expected, older toddlers also performed with greater accuracy than younger toddlers. Examining the role of semantic relatedness, toddlers displayed more robust recognition in semantically unrelated than related trials, suggesting that, as in research with younger infants (Bergelson & Aslin, 2017a), semantic relatedness between the target and the distractor triggered competition effects in referent selection. Although there is evidence that early word representations are semantically more specified by 18-20 months of age (Bergelson & Aslin, 2017b), they may still lack representational specificity (Arias-Trejo & Plunkett, 2010).
TABLE 7 GLMM results for accuracy (with parent-reported item-pair comprehension as predictor)
In the current study, in addition to semantic relatedness, lower recognition on some related trials could be attributed to the increased burden of visual discrimination and feature overlap (e.g., both goose and owl are birds with wings, feathers, and a beak), as shown with 18–24-month-olds by Arias-Trejo and Plunkett (2010). It is likely that toddlers, upon hearing the lexical target, co-activated related (and thus competing) word referents, which subsequently interfered with their decision about the target. Such interference has been reported even in older children aged 3 to 9 years. In a visual search task, these children took longer to respond correctly when a related rather than an unrelated distractor was present (Vales & Fisher, 2019).
Toddlers' recognition accuracy showed a significant, moderate correlation with their receptive vocabulary size as measured by the CDI-WG, comparable to that obtained with the CCT (Friend & Keplinger, 2008). This indicates acceptable convergent validity of the word recognition task employed in the present study and supports the use of the CDI-WG as a general proxy for receptive vocabulary. Consistent with findings for the CCT (Friend et al., 2012; Friend & Zesiger, 2011), there was also good (albeit imperfect) item-level agreement between toddlers' responses and parental reports in both semantic conditions. Agreement was highest for easy items and lowest for difficult items. Parent-child agreement was also significantly higher in semantically unrelated than related trials, although this difference was limited to easy items. This discrepancy suggests that parents' inferences about their child's word comprehension are not based solely on evidence of robust word knowledge (i.e., the child's true ability to comprehend the word). Instead, they rest on a confluence of that evidence and evidence of partial word knowledge (i.e., the child's ability to respond appropriately when cued by the rich context in which the word is heard, or upon recognizing the sound of the word; Friend et al., 2018; Houston-Price et al., 2007; Tomasello & Mervis, 1994). Given that toddlers were less accurate in semantically related than unrelated trials, a performance-based measure using semantically related target-distractor pairs could tap children's strong, rather than weak, word knowledge and thus supplement parental reports. Nevertheless, parent-reported item-pair comprehension (i.e., whether each of the target and distractor labels was reported as understood or not understood by the child) was a significant predictor of toddlers' recognition accuracy.
Specifically, compared to trials where both the target and distractor were reported by parents as "not understood" on the CDI-WG, toddlers were more likely to respond correctly in trials where either the target or both the target and distractor were reported as "understood," indicating that parents are adequate informants of their child's language abilities.
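Item-level agreement can be illustrated as the proportion of child-item observations on which a parent's CDI-WG report ("understood" vs. "not understood") matches the child's response (correct vs. incorrect), broken down by difficulty level. The sketch below assumes this simple matching rule and considers attempted trials only. The procedure used in the CCT literature may differ, and the records, item labels, and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (item, difficulty, parent_reports_understood, child_correct).
# Illustrative values only; these are not data from the present study.
records = [
    ("dog", "easy", True, True),
    ("ball", "easy", True, True),
    ("goose", "difficult", True, False),
    ("owl", "difficult", False, False),
    ("cup", "moderate", True, True),
    ("key", "moderate", False, True),
]


def agreement_by_difficulty(rows):
    """Proportion of observations per difficulty level where parent report matches the child's response."""
    matches = defaultdict(int)
    totals = defaultdict(int)
    for _item, difficulty, understood, correct in rows:
        totals[difficulty] += 1
        # Agreement: "understood" with a correct response, or "not understood" with an incorrect one.
        matches[difficulty] += int(understood == correct)
    return {level: matches[level] / totals[level] for level in totals}


agreement = agreement_by_difficulty(records)
print(agreement)
```

Treating every incorrect response as "not understood" is itself an assumption. Given the distinction drawn above between partial and robust word knowledge, a finer-grained coding could separate these cases.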
It is important to note that while the CCT uses a carefully selected set of test items with equal representation of nouns, verbs, and adjectives, the present study focused on nouns only. Nevertheless, the good item-level agreement between parental reports and children's performance is encouraging. Tablet-based word recognition tasks could be supplemented with a principled selection of test items (Chai et al., 2020; Makransky et al., 2016). They could also be combined with statistical methods for estimating full CDI scores (Mayor & Mani, 2019) and total vocabulary sizes (Mayor & Plunkett, 2011). With these additions, such tasks may provide a useful measure of receptive vocabulary in the second year of life and could serve as a supplemental, convergent measure alongside parental reports.