Comparing cross-situational word learning, retention, and generalisation in children with autism and typical development

Word learning is complicated by referential ambiguity - there are often multiple potential targets for a newly-heard word. While typically developing (TD) children can accurately infer word meanings from cross-situational statistics, specific difficulties tracking word-object co-occurrences may contribute to language impairments in autism spectrum disorder (ASD). Here, we investigate cross-situational word learning as an integrated system including mapping, retention, and generalisation in both typical development and autism. In Study 1, children with ASD were as accurate at disambiguating the meanings of novel words from statistical correspondences as TD controls matched on receptive vocabulary. In Study 2, both populations spontaneously utilised social and non-social attentional cues to facilitate and accelerate their mapping of word-referent relationships. Across Studies 1 and 2, both groups retrieved and generalised word-referent representations with impressive and comparable accuracy. Although children with ASD performed very similarly to TD children on measures of learning accuracy, they were significantly slower to identify correct referents under both cued and non-cued learning conditions. These findings indicate that mechanisms supporting cross-situational word learning, and the relationships between them, are not qualitatively atypical in language-delayed children with ASD. However, the increased time required to generate correct responses suggests that these mechanisms may be less efficient, potentially impacting learning in natural environments where visual and auditory stimuli are presented rapidly. Our data support claims that word learning in the longer term is driven by the gradual accumulation of word-object associations over multiple learning instances and could potentially inform the development of interventions designed to scaffold word learning.


Introduction
Word learning is of crucial importance to children's language acquisition (Patael & Diesendruck, 2008). When an unfamiliar word is segmented from speech, mapping the phonological form to its meaning involves: (a) identification of intended meaning (referent selection), (b) storage of the word-referent association enabling later retrieval (retention), and (c) appropriate extension to new members of the same semantic category (generalisation). In natural learning environments, these processes are complicated by referential ambiguity. Parents direct 300-400 words at their children per hour (Hart & Risley, 1995) and each unfamiliar word could refer to innumerable unknown referents (Cartmill et al., 2013; Quine, 1960). One way that children resolve this ambiguity is by learning across situations, aggregating data across naming events to form stable word-referent relationships (MacDonald, Yurovsky, & Frank, 2017; McMurray, Horst, & Samuelson, 2012; Saffran, Aslin, & Newport, 1996; Smith & Yu, 2008; Vlach & DeBrock, 2019).
Unlike typically developing (TD) children, many children with autism spectrum disorder (ASD) have difficulty learning words. Children with ASD are often significantly delayed in their production of first words (Howlin, 2003), and approximately 25-30% have minimal-to-no language during childhood (Anderson et al., 2007; Norrelgen et al., 2015; Rose, Trembath, Keen, & Paynter, 2016). Here, we investigate whether cross-situational word learning is atypical in children with ASD, and how differences in this ability may affect lexical retention and generalisation.
Fast mapping, evident in early stages of word learning, occurs when a child hears a novel word and uses the linguistic and non-linguistic context to rapidly acquire information about its meaning (e.g. mapping it to an unfamiliar object; Carey, 1978). However, in the face of referential uncertainty, accurate fast mapping may be impossible. Under such circumstances, correct word-referent mappings can be reliably acquired over multiple ambiguous naming events (Yurovsky & Frank, 2015). If a child hears an unfamiliar word ("cat") while viewing two unfamiliar objects (a cat and a mouse), they have no way of identifying the correct referent without external input. But, if the child hears the word again while viewing a different set of objects (a cat and a dog), they could combine word-object co-occurrence frequencies across exposures and accurately infer the meaning of "cat" (Smith & Yu, 2008). Empirical evidence demonstrates that TD adults and children spontaneously utilise word-object co-occurrence frequencies to identify the meanings of unknown words (Monaghan & Mattock, 2012; Roembke & McMurray, 2016; Smith & Yu, 2008; Vlach & Johnson, 2013; Vouloumanos & Werker, 2009; Yu & Smith, 2007; Yurovsky, Fricker, Yu, & Smith, 2014). For example, Smith and Yu (2008) presented TD 12- and 14-month-olds with pairs of novel words in conjunction with pairs of unfamiliar referents. Although these trials were individually ambiguous, each word reliably co-occurred with a single referent. Immediately after training, children were presented with test trials in which a single word was presented in conjunction with the target referent and a distracter. The results revealed that infants looked significantly longer at target referents, indicating their ability to infer correct meanings for multiple words based on cross-situational statistics alone.
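The "cat" example above can be made concrete with a minimal sketch of a learner that simply tallies word-object co-occurrences. This is an illustration only, not the procedure or model used in any of the cited studies; the function names and the object labels (`cat_object`, `mouse_object`, `dog_object`) are hypothetical.

```python
from collections import Counter, defaultdict


def learn_cooccurrences(trials):
    """Count how often each word is heard alongside each visible object."""
    counts = defaultdict(Counter)
    for words, objects in trials:
        for word in words:
            counts[word].update(objects)
    return counts


def best_referent(counts, word):
    """Return the object that co-occurred most often with the word."""
    return counts[word].most_common(1)[0][0]


# Two individually ambiguous naming events for "cat" (hypothetical data).
trials = [
    (["cat"], ["cat_object", "mouse_object"]),
    (["cat"], ["cat_object", "dog_object"]),
]
counts = learn_cooccurrences(trials)
print(best_referent(counts, "cat"))  # prints "cat_object"
```

Neither trial identifies the referent on its own, but the cat is the only object present on both, so it accumulates the highest count.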
Although there is substantial evidence that TD children's sensitivity to statistical associations between words and objects supports the acquisition of word-referent mappings, the underlying processes have been subject to debate (MacDonald et al., 2017). McMurray et al. (2012) propose that cross-situational word learning results from dynamically strengthening and weakening inter-connections between words and multiple potential referents within a system. By contrast, Trueswell and colleagues propose that learners map words to single referents, and then verify or update their hypotheses based on new evidence (Trueswell, Medina, Hafri, & Gleitman, 2013). Alternatively, both mechanisms may be employed depending on learning and environmental constraints (e.g. number of words, targets, and competitors; Yurovsky & Frank, 2015).
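To make the contrast with associative accounts concrete, the following is a deliberately simplified sketch of a single-hypothesis learner in the spirit of the hypothesis-testing account described above. It is an assumption-laden illustration, not the model specified by Trueswell et al. (2013): the function name, the `choose` parameter, and the object labels are hypothetical.

```python
import random


def propose_but_verify(trials, choose=random.choice):
    """Track one hypothesised referent per word across ambiguous trials."""
    hypothesis = {}
    for word, objects in trials:
        guess = hypothesis.get(word)
        if guess not in objects:
            # No hypothesis yet, or the current one is disconfirmed because
            # the guessed object is absent: propose one of the objects in view.
            hypothesis[word] = choose(objects)
        # Otherwise the hypothesis is consistent with this trial and is kept.
    return hypothesis


# Deterministic demonstration: always propose the first object in view.
trials = [
    ("cat", ["mouse_object", "cat_object"]),
    ("cat", ["dog_object", "cat_object"]),
    ("cat", ["cat_object", "mouse_object"]),
    ("cat", ["dog_object", "cat_object"]),
]
print(propose_but_verify(trials, choose=lambda objects: objects[0]))
```

Unlike a learner that tallies co-occurrences, this one stores nothing about the alternatives it did not select, so an early wrong guess is only corrected when a later trial disconfirms it.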
When faced with ambiguous naming events, TD children also infer meaning from non-linguistic social cues including gaze, gesture, and affect (Baldwin, 1993; Nappa, Wessel, McEldoon, Gleitman, & Trueswell, 2009; Tomasello, Strosberg, & Akhtar, 1996). As words often refer to whatever referent is currently the focus of a speaker's attention (Bloom, 2002), following social cues can facilitate statistical learning by increasing the frequency of exposure to correct word-referent co-occurrences, thereby increasing children's uptake from input (McMurray et al., 2012; Monaghan, 2017). Although research has yet to directly investigate how children utilise social cues in the service of cross-situational word learning, MacDonald et al. (2017) explored this relationship in neurotypical adults. They found that when referential ambiguity was reduced by eye gaze towards targets, participants were quicker and more accurate in their learning, and reduced their tracking of alternative possible mappings between words and objects.
To date, studies of cross-situational learning have focused predominantly on how TD children and adults identify referential meaning. As such, participants' understanding of word-referent mappings is conventionally tested immediately after training. However, accurate referent selection does not constitute word learning per se.
Horst and colleagues have repeatedly demonstrated that TD children forget new words just 5 min after performing at ceiling on fast mapping tasks (Axelsson, Churchley, & Horst, 2012; Horst & Samuelson, 2008; Horst, Scott, & Pollard, 2010). Under those learning circumstances, though children can select the novel referent accurately, their retention may be poor due to the paucity of associative information about the mapping. Thus, it is necessary to investigate how novel word-referent relationships that are acquired through cross-situational learning are retained after a delay. As theoretical accounts of lexical retention emphasise the role of associative mechanisms in gradually accumulating knowledge over multiple exposures (e.g. McMurray et al., 2012), we may expect cross-situational mappings to yield more robust and long-lasting encoding than fast mapping as word-referent associations are accumulated across several naming events. Indeed, Vlach and DeBrock (2019) showed that TD children aged 47 months and older could retain words learned via cross-situational statistics at above-chance accuracy following a 5-minute delay.
Furthermore, studies have yet to explore how cross-situational learning supports extension of labels to unseen referents. As basic nouns refer to categories, children must learn to appropriately generalise labels to exemplars that have not been directly named. By approximately 24-months, TD children infer that noun-referent relationships are constrained by shape, and thereby generalise newly-learned words based on this feature rather than other perceptual properties (e.g. colour, size, texture; Landau, Smith, & Jones, 1988). In a recent study by Chen, Zhang, and Yu (2018), neurotypical adults were trained on associations between words and objects, but also words and a category name that extended over several related instances for an object. Participants were able to learn both individual labels and category labels. However, this study did not test whether use of a word extended to additional, unseen members of the category. Therefore, a novel goal of this study is to examine cross-situational word learning as a system that includes a dynamic mapping process, retention of meaning, and generalisation to novel referents.
However, understanding of how ASD affects children's ability to infer word-referent relationships in the absence of informative cues is extremely limited. Only two studies, to our knowledge, have assessed cross-situational word mapping in ASD. In McGregor, Rost, Arenas, Farris-Trimble, and Stiles (2013), children with ASD and TD controls matched on age (TD M = 10.9 years; ASD M = 11.2 years), nonverbal IQ (TD M = 106; ASD M = 108), and receptive vocabulary (TD M standard score = 113; ASD M = 109) mapped novel words to unfamiliar objects presented in sets of three. An actor gazed at the target object on half of the trials for each word ('facilitative trials') and gazed forward for the other half ('neutral trials'). Hence, on neutral trials, children had to rely on cross-trial information to accurately map the novel words (although, for some words, referents may have already been unambiguously identified in preceding facilitative trials). Both groups achieved above-chance accuracy on neutral trials, and significantly benefited from the ostensive social cues provided in facilitative trials. In a recent eye-tracking study, Venker (2019) assessed children with ASD aged 4-7.9 years and TD children aged 2.6-7.9 years matched on receptive vocabulary (TD M = 6.3 years; ASD M = 5.8 years) on a cross-situational mapping task modelled on Smith and Yu's (2008) paradigm. During training, children were exposed to four novel words, presented in pairs in conjunction with pairs of unfamiliar objects. Children's understanding of the novel word-object mappings was tested immediately after training. Both children with ASD and TD children looked longer at correct targets than expected by chance, and no between-population differences in accuracy were identified.
The lack of population differences in these studies aligns with recent evidence that ASD does not impair statistical learning when processing visual stimuli (Foti, De Crescenzo, Vivanti, Menghini, & Vicari, 2015; Roser, Aslin, McKenzie, Zahra, & Fiser, 2015) or segmenting speech (Haebig, Saffran, & Ellis Weismer, 2017; Mayo & Eigsti, 2012; Obeid, Brooks, Powers, Gillespie-Lynch, & Lum, 2016). However, a potentially more sensitive measure of the processing involved in cross-situational word learning is the speed at which children can access meaning (Kohnert, Bates, & Hernandez, 1999). In cross-situational word learning paradigms, words and objects tend to be presented at a regular pace during training and testing. However, in natural language learning, words and their referents tend to co-occur at a substantially faster rate with more noise and distractions. Thus, overall accuracy may not be a true reflection of children's ability to associate words with meanings given the processing bottleneck that is fast, continuous, online speech (Christiansen & Chater, 2016). Here, we examine the impact of autism on both cross-situational learning accuracy and response times, while also exploring the influence of individual differences in chronological age, receptive vocabulary, and non-verbal intelligence.
It is also important to recognise that identification of meaning is just one component of word learning (McMurray et al., 2012) and it is currently unknown how effectively children with ASD are able to retain and generalise new words that are acquired through tracking cross-situational statistics. Unlike referent selection, relatively little is known about lexical retention in ASD. Norbury, Griffiths, and Nation (2010) found that high-functioning children with ASD were unimpaired in their ability to retain word-object associations, but they remembered significantly less semantic information about referents over time (see also Henderson, Powell, Gareth Gaskell, & Norbury, 2014). Bedford et al. (2013) discovered that ostensive social feedback enhanced retention of novel word meanings for TD 2-year-olds, but not those at high-risk of developing ASD. Together, these studies suggest that consolidation of word-referent relationships may be atypical in ASD. Regarding generalisation, many children with ASD struggle to extend information and behaviours across different contexts (Happé & Frith, 2006), and this difficulty can manifest in the extension of verbal labels to novel referents. Tek, Jaffery, Fein, and Naigles (2008) and Potrzeba, Fein, and Naigles (2015) found that preschoolers with ASD did not reliably extend novel labels on the basis of shape. Hartley and Allen (2014) reported that language-impaired children with ASD frequently extended labels to novel referents on the basis of shape (a category-defining cue) or colour (a category-irrelevant cue). These findings suggest that early lexical development in ASD could be hindered by the overgeneralisation of labels based on category-irrelevant cues.
Only a single study to date has explored how children with ASD identify, retain, and generalise new words within the same experimental task. In Hartley et al. (2019), language-delayed children with ASD and TD children (ASD M age ≈ 8 years; TD M age ≈ 5 years) matched on receptive vocabulary (TD M ≈ 5 years; ASD M ≈ 5 years) identified the names of novel objects via mutual exclusivity in a standard fast mapping task, before completing tests of retention and generalisation after a 5-min delay. Participants in Study 1 received no feedback while participants in Study 2 received either social feedback (the experimenter looked and pointed towards the target object, while repeating its label) or non-social feedback (the experimenter activated a flashing light underneath the target object and repeated its label while looking towards the participant) after selecting referents in the fast mapping task. Across studies, both children with ASD (84-97%) and TD children (98-100%) fast mapped novel word-referent pairings at above-chance (33%) accuracy (although the between-group difference was statistically significant), but were relatively less accurate on measures of delayed retention (ASD: 41-57%; TD: 42-44%) and generalisation (ASD: 41-59%; TD: 34-46%). Surprisingly, children with ASD who received social feedback during fast mapping achieved the most accurate retention and generalisation, outperforming TD controls in the same condition and children with ASD who received non-social feedback or no feedback. The disparity between referent selection and delayed test performance in both populations emphasises the importance of studying word learning as a system of integrated processes that occur over short and long timescales. Based on these results, Hartley et al. (2019) propose that fundamental word learning mechanisms, and the relationships between them, are not qualitatively impaired by ASD (when expectations are based on language development rather than chronological age).
If Hartley et al.'s (2019) proposal is correct, then we may anticipate that children with ASD would perform as accurately as vocabulary-matched TD controls on measures of cross-situational word learning and subsequent retention/generalisation. Moreover, we may expect that children with ASD could benefit from the provision of social cues that reduce referential ambiguity. However, potential deficits in retention may be obscured in fast mapping studies because the likelihood of TD children forgetting labels is relatively high. Between-population differences may be more likely to emerge following cross-situational learning where the increased provision of word-object statistical associations could provide a more robust scaffold for retention in typical development. Hartley et al. (2019) hypothesise that children with ASD benefitted from social feedback because it was provided after they had mapped word-referent relationships. Having already figured out what the novel words represented, children did not need to infer communicative intent from the experimenter's cues. By contrast, ASD may impact cross-situational word learning by reducing the likelihood that children utilise available social cues to overcome referential ambiguity, diminishing the accuracy and speed of encoding correct word-referent relationships. Indeed, it is possible that children with ASD may follow social cues but fail to use them as a basis for establishing referential meaning (Gliga et al., 2012). Should this be the case, it might be possible to increase uptake of cross-situational linguistic input by providing highly-salient non-social attentional cues that direct children's attention to target referents during training.
This study is the first to systematically investigate whether children with ASD and concomitant language delays are capable of mapping, retaining, and generalising new words based on cross-situational statistics. By examining cross-situational word learning as a multi-stage process we identify how mechanisms are related in typical development and highlight the nature, and location(s), of potential weaknesses in ASD. To assess how individual differences influence these processes, we recruited children with ASD with varying degrees of language delay (relative to their chronological age). Our samples had receptive vocabulary age equivalents around 5-6 years on average. We targeted a similar age group as Suanda, Mugwanya, and Namy's (2014) study of cross-situational word learning in typical development, as this enables us to understand the processes involved at the point where children's vocabulary is beginning to rapidly increase. Furthermore, we know that children with ASD with these language skills are able to identify and retain words through fast mapping, but we do not know if they are successful when learning under ambiguous circumstances via cross-situational statistics.
In Study 1, children with ASD and TD controls matched on receptive vocabulary were presented with pairs of unfamiliar objects (via a touchscreen tablet) and instructed to identify the referent of a novel word. These exposures were intentionally ambiguous (i.e., there was no cue to which object was the correct referent), but children could disambiguate word-object pairings based on cross-trial statistics. After a 5-minute delay, children's retention and generalisation of the novel words were assessed. Based on previous cross-situational learning studies (e.g. Venker, 2019), we predicted that children with ASD would map novel word-referent pairings as accurately as vocabulary-matched controls during training. Based on Hartley et al. (2019), we tentatively predicted that the two populations would respond similarly on delayed tests of retention and generalisation. As theoretical accounts of retention emphasise the importance of accumulating word-object correspondences over multiple exposures (e.g. McMurray et al., 2012), we anticipated that cross-situational learning would elicit more accurate retention and generalisation than fast mapping. We also measured participants' response times as there is evidence that reaction times for linguistic stimuli may be slowed for children with ASD in comparison to TD children, despite similar mapping accuracy (e.g. Bavin, Kidd, Prendergast, & Baker, 2016). Consequently, we predicted that children with ASD may be slower to identify correct referents than TD controls.
In Study 2 we tested whether cross-situational word learning improves when referential ambiguity is potentially diminished by social and non-social attentional cues. Importantly, this research will advance theoretical understanding of language acquisition by showing how cross-situational word mapping relates to retention and generalisation in both typical development and ASD.

Participants
Participants were 15 children with ASD (13 males, 2 females; M age = 8.78 years; SD = 2.92 years) recruited from specialist schools, and 16 TD children (7 males, 9 females; M age = 5.52 years; SD = 1.12 years) recruited from mainstream schools and nurseries (see Table 1). Groups were matched on receptive vocabulary as measured by the British Picture Vocabulary Scale (BPVS; ASD: M age equivalent = 5.35 years, SD = 2.17; TD: M age equivalent = 5.84 years, SD = 1.20; Dunn, Dunn, Whetton, & Burley, 1997), t(29) = 0.78, p = .44. Receptive vocabulary was selected as the primary matching criterion because it reflects children's ability to learn word-referent relationships - the linguistic ability at the core of this research (Bion, Borovsky, & Fernald, 2013; Kalashnikova, Mattock, & Monaghan, 2016). All children had normal or corrected-to-normal colour vision. Children with ASD were previously diagnosed by a qualified educational or clinical psychologist, using standardised instruments (i.e. Autism Diagnostic Observation Schedule and Autism Diagnostic Interview-Revised; Lord, Rutter, & Le Couteur, 1994; Lord, Rutter, DiLavore, & Risi, 2002) and expert judgement. Diagnoses were confirmed via the Childhood Autism Rating Scale 2 (CARS; Schopler, Van Bourgondien, Wellman, & Love, 2010), which was completed by each participant's class teacher (ASD M score = 36.43, SD = 6.42; TD M score = 15.28, SD = 0.41). Children with ASD were significantly older (t(29) = 4.16, p < .001, d = 1.48), and had significantly higher CARS scores (t(29) = 13.09, p < .001, d = 4.65) than the TD children. Children's non-verbal intellectual abilities were measured using the Leiter-3 (Roid, Miller, Pomplun, & Koch, 2013). The mean IQ for the ASD group was 83.73 (SD = 19.54), and the mean IQ of the TD group was significantly higher at 102.38 (SD = 6.16), t(29) = 3.63, p = .001, d = 1.27. However, the groups' raw scores on the Leiter-3 did not significantly differ (ASD M score = 70.93, SD = 18.89; TD M score = 69.69, SD = 13.51), t(29) = 0.21, p = .83, indicating that their non-verbal cognitive abilities were similar at time of testing (when age was not considered). In comparison to most prior word learning research with TD children, our samples were older and more advanced in terms of receptive language development. Thus, we can be confident that any between-population differences in word learning are not the consequence of insufficient language skills in the ASD samples. All procedures performed in this study (Study 1 and Study 2) involving human participants were in accordance with the ethical standards of institutional and national research committees. Informed consent was obtained from parents/caregivers prior to children's participation.
Fifteen TD children and 13 children with ASD who participated in this study also participated in Hartley et al. (2019). The experimental tasks for that study and this study were administered one week apart (in a random order), so participants were not re-tested on the battery of standardised assessments. The two studies differed in terms of the novel words and objects used, how the experimental tasks were delivered (presenting real objects vs. a tablet computer), and the learning mechanisms being studied (fast mapping vs. cross-situational learning). Given these differences, and the time delay between administration of the two studies, we were confident that children's performance would not be influenced by interference effects.
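As a worked check on the group comparisons reported above, the t statistics can be approximately recovered from the published means, standard deviations, and sample sizes. The sketch below assumes a Student's (pooled-variance) independent-samples t-test, which is consistent with the reported 29 degrees of freedom but is not stated explicitly in the text; the function name is hypothetical, and small discrepancies are expected because the inputs are rounded.

```python
from math import sqrt


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t statistic from summary statistics (pooled variance)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / standard_error, df


# Chronological age: ASD (n = 15) vs. TD (n = 16).
t_age, df = pooled_t(8.78, 2.92, 15, 5.52, 1.12, 16)
# Non-verbal IQ: TD (n = 16) vs. ASD (n = 15).
t_iq, _ = pooled_t(102.38, 6.16, 16, 83.73, 19.54, 15)

print(f"age: t({df}) = {t_age:.2f}")
print(f"IQ:  t({df}) = {t_iq:.2f}")
```

Under these assumptions the values come out close to the reported t(29) = 4.16 for age and t(29) = 3.63 for IQ.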

Materials
Stimuli included six novel words selected from the NOUN database (Horst & Hout, 2016) and colour photographs of twelve unfamiliar objects. The novel words (virdex, nelby, gasser, blicket, fiffin, teebu) were recorded by a female speaker. The pictures of unfamiliar objects were divided into six sets. Each set included a 'named object' (which was paired with a novel word during training) and a 'shape match' (a differently-coloured variant of the named object that was presented in generalisation trials). Each shape match was the same colour as a named object from a different set, and therefore served as a 'colour match' in generalisation trials. The relationship between shape and colour of object stimuli within and across sets is illustrated in Fig. 1. Colour photographs of 12 familiar objects (cat, shoe, bed, tree, cow, banana, duck, dog, apple, flower, car, ball) were used in warm-up trials. Familiar objects were selected on the basis that most children understand their linguistic labels by 15-months (Fenson, Marchman, Thal, Reznick, & Bates, 2006). All stimuli were presented via a Microsoft Surface Pro 4 tablet computer. Pictures of objects were approximately 4 cm × 4 cm in size and separated on screen by 10 cm. As similar stimuli were used in Hartley et al. (2019), we were confident that our materials had the potential to be learnable, and generalisable, by the participants in this study.

Procedure
Participants were tested individually in their own schools and were accompanied by a familiar adult when required. Children were administered the BPVS and Leiter-3 over a series of sessions before completing the word learning task after a one- or two-week break. The word learning task consisted of the following stages delivered in a fixed order: 1. Warm-up trials, 2. Training trials, 3. Delay, 4. Retention and generalisation trials. The word learning task was delivered on the tablet computer, which was placed in front of the child on a table. The experimenter sat quietly while the participant was engaged in the task and offered verbal praise for attention and good behaviour.

Warm-up trials.
To familiarise children with the format of the task and its response method, children completed four warm-up trials. Children were presented with two pictures of familiar objects positioned to the left and right of the tablet's screen, and heard a female voice (presented through the tablet's speakers) say "Look!" in an excited tone. After viewing the pictures for 3 s, the same voice asked them to identify one of the objects ("Which is the cat? Touch the cat"). Following a correct response, the experimenter issued praise and reinforced the identity of the object (e.g. "Great job, that is the cat!"). Following an incorrect response, the experimenter provided corrective feedback (e.g. "Actually, this is the cat. Can you touch the cat? Well done, you touched the cat!"). The requested object was positioned to the left on two trials, and the right on two trials. Trial order was randomised for each participant. Participants immediately progressed to the first training trial after completing the final warm-up trial.

Training trials.
This stage provided a context in which children could actively learn six novel word-object pairings. Children completed 48 training trials, administered in two blocks of 24 trials (order counterbalanced across participants). The length of this training stage was based on Smith and Yu's (2008) successful use of 30 trials with TD 12-month-olds. To minimise the likelihood of fatigue, children received a 30-second break after completing the first block of training trials.
On each training trial, children were presented with photographs of two unfamiliar objects positioned to the left and right of the tablet's screen accompanied by a female voice directing them to "Look!" (see Fig. 2). After viewing the pictures for 3 s, the same voice asked them to identify one of the objects (e.g. "Which is the blicket? Touch the blicket"). Following the child's response, the pictures disappeared and the next trial was presented. Individually, these trials are ambiguous (either object could be the word's referent), but cross-situational statistics enabled children to disambiguate the correct word-referent pairings. For example, on the first trial the child might hear the word "blicket" while viewing an X and a Y. At this point, either X or Y could be the target referent. Then, on a subsequent trial, the child experiences "blicket" paired with a Z and an X. If the child is sensitive to cross-trial co-occurrence frequencies, they should correctly map "blicket" onto the X - the object it is experienced with consistently. If a child showed reluctance to respond due to uncertainty, the experimenter encouraged them to guess.
Words occurred an equal number of times in each block and word-object pairings were randomised for each participant. In each block, every object occurred as a target four times and as a foil four times (thus, each target word-object pairing was experienced eight times in total). The target object appeared equally often on the left and right of the screen. The order of trials within blocks was randomised. Children's active learning was tracked by recording the accuracy of their selections. If a child refused to respond on a particular trial, the experimenter could progress to the next trial without selecting a referent object on their behalf by pressing a discreet asterisk button in the upper right-hand corner of the screen. However, the experimenter did not need to use this function in either Study 1 or Study 2.
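The counterbalancing constraints described above can be sketched as follows. This is one possible construction under stated assumptions, not the authors' actual trial-generation procedure: the cyclic-shift scheme for assigning foils, the rule for assigning screen sides, the function name, and the use of each word as a stand-in identifier for its named object are all hypothetical.

```python
import random
from collections import Counter

WORDS = ["virdex", "nelby", "gasser", "blicket", "fiffin", "teebu"]


def build_training_block(words, rng):
    """Build one 24-trial block: every object is a target four times and a foil four times."""
    n = len(words)
    trials = []
    for shift in range(1, 5):
        for i, word in enumerate(words):
            # A cyclic shift of 1-4 positions never pairs an object with itself
            # and uses every object as a foil exactly once per shift.
            foil = words[(i + shift) % n]
            # Assumed rule: alternate sides across shifts so that each target
            # appears twice on the left and twice on the right.
            target_side = "left" if shift % 2 else "right"
            trials.append({"word": word, "target": word, "foil": foil,
                           "target_side": target_side})
    rng.shuffle(trials)  # randomise trial order within the block
    return trials


block = build_training_block(WORDS, random.Random(0))
print(len(block), Counter(trial["foil"] for trial in block))
```

Two such blocks give the 48 training trials, with each word heard alongside its target eight times and alongside any single foil far less often, which is what makes the pairings recoverable from cross-trial statistics.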

Delay.
After completing the last training trial, the child played with the experimenter for 5 min. The tablet was not used during this period.

Retention and generalisation trials.
To re-engage children's attention to the task, three new warm-up trials with pictures of familiar objects were administered. Here children were presented with three objects per trial positioned to the left, middle, and right of the screen. The requested object appeared in each of the three possible locations on one trial, and trial order was randomised. Thus, children were familiarised with the new "middle" location and selected targets in each location before the retention and generalisation trials.
C. Hartley, et al. Cognition 200 (2020) 104265
Children then completed 12 retention trials and 12 generalisation trials (see Fig. 3). Each word was tested on two retention trials and two generalisation trials. For retention trials, three of the named objects were presented on the tablet's screen (positioned to the left, middle, and right). After viewing the pictures for 3 s, children heard a female voice asking them to identify one of the objects (e.g. "Which is the blicket? Touch the blicket"). The purpose of these trials was to assess children's memory of the exact word-referent pairings that were experienced during the training trials. For generalisation trials, children were presented with differently-coloured variants of named objects (i.e. shape/colour matches). After viewing the pictures for 3 s, children heard a female voice asking them to identify one of the objects (e.g. "Which is the blicket? Touch the blicket"). Crucially, one object was the same shape as the label's original referent, but different in colour (shape match), while another object matched the original referent on colour, but not shape (colour match). The third object, a shape match for a different named object, served as a distracter. The purpose of these trials was to assess whether children's extension of labels to novel referents is systematically influenced by shape. Importantly, all choice objects were of equal familiarity in both retention trials (all had been named in the training trials) and generalisation trials (all were novel objects, although their shapes and colours were introduced in the training trials). Each object was used as a foil on two retention trials and two generalisation trials. To provide the necessary level of control when presenting stimuli, object groupings were fixed. The order of trials within blocks was pseudo-randomised such that the same word was not tested on consecutive trials (e.g. "blicket" retention trial followed by "blicket" generalisation trial), and no more than two trials of the same type occurred consecutively. The target object appeared equally often in each screen location, and never occurred in the same location on more than two consecutive trials. Pictures of objects were exactly the same size as those presented during the training trials. The accuracy and speed of children's responses were recorded.

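The ordering constraints described for the retention and generalisation trials (no word tested on consecutive trials, and no more than two trials of the same type in a row) can be illustrated with a minimal Python sketch based on rejection sampling. This is not the authors' procedure: the word labels other than "blicket", the function names, and the choice of rejection sampling are all assumptions made for illustration, and the constraints on target location are omitted.

```python
import random

# Hypothetical labels: the text names only "blicket". Six words are implied by
# 12 trials per type with each word tested twice per type.
WORDS = ["blicket", "word2", "word3", "word4", "word5", "word6"]
TRIAL_TYPES = ["retention", "generalisation"]


def build_trials():
    """Return the 24 test trials: each word appears twice per trial type."""
    return [(word, ttype) for word in WORDS for ttype in TRIAL_TYPES for _ in range(2)]


def is_valid(order):
    """Check the two ordering constraints described in the text."""
    for i in range(1, len(order)):
        if order[i][0] == order[i - 1][0]:
            return False  # same word tested on consecutive trials
    for i in range(2, len(order)):
        if order[i][1] == order[i - 1][1] == order[i - 2][1]:
            return False  # three trials of the same type in a row
    return True


def pseudo_randomise(seed=None, max_attempts=100_000):
    """Shuffle repeatedly until an order satisfies both constraints."""
    rng = random.Random(seed)
    trials = build_trials()
    for _ in range(max_attempts):
        rng.shuffle(trials)
        if is_valid(trials):
            return list(trials)
    raise RuntimeError("no valid order found within max_attempts")
```

Calling `pseudo_randomise(seed=1)` returns one admissible order of (word, trial type) pairs; a fixed seed makes the order reproducible across participants if that is desired.
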
Training accuracy
Children's responses on individual training trials were scored as correct (1) or incorrect (0). The likelihood of responding correctly by chance was 50%. Descriptive statistics for training accuracy are presented in Fig. 4. We investigated the influence of Population and Block on children's training trial accuracy by conducting a series of generalised linear mixed-effects models (GLMM). The analysis modelled the probability (log odds) of children's response (correct/incorrect) and all models were conducted using the glmer function from the lme4 package in R (Bates, Maechler, Bolker, & Walker, 2015). Block was contrast coded as −0.5 (block 1) and +0.5 (block 2). Population was coded as 0 (TD) and 1 (ASD).
We started with a baseline model containing by-participant random intercepts and slopes for Block and by-word random intercepts and slopes for Population x Block (note that Population was a between-subjects variable and so cannot be entered as a random slope by-participant; see Barr, Levy, Scheepers, & Tily, 2013). Adding Population as a fixed effect did not improve fit in comparison to the baseline model (p = 1.00). The addition of Block yielded a significant improvement in fit (χ² = 7.08, p = .008). The inclusion of both Block and Population (p = .77) or the Block x Population interaction (p = .96) did not further improve fit. Therefore, a model containing only Block as a fixed effect provided the best fit to the observed data (see Table 2). These results suggest that both children with ASD and TD children responded with significantly greater accuracy on training trials in Block 2 than Block 1. The populations did not differ significantly in their ability to map novel word-referent relationships based purely on cross-situational statistics.
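The χ² values reported for these nested model comparisons are presumably likelihood-ratio statistics (twice the gain in log-likelihood). The following Python sketch shows how such a statistic maps onto a p-value, under the assumption that the compared models differ by a single parameter (one degree of freedom); the text does not state the degrees of freedom, and the function names are hypothetical.

```python
import math


def lrt_statistic(loglik_reduced, loglik_full):
    """Likelihood-ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (loglik_full - loglik_reduced)


def lrt_p_value_df1(chi_square):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    For df = 1 the survival function has the closed form erfc(sqrt(x / 2)).
    """
    if chi_square < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(chi_square / 2.0))
```

Under the one-parameter assumption, `lrt_p_value_df1(7.08)` is approximately .008, in line with the value reported for adding Block.
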
In order to test whether there was evidence for no difference between the children with ASD and TD children on training accuracy, we determined the Bayes Factor (BF) from the difference in the Bayesian Information Criterion values for models including and excluding Population as a fixed effect (Wagenmakers, 2007). Using this calculation, BF values > 3 are conventionally taken to provide evidence for the hypothesis that populations are similar rather than distinct in their performance (Raftery, 1995). BF values lower than 0.3 are taken to provide evidence for a difference between the populations. BF values between 0.3 and 3 are indeterminate regarding evidence for similarities or differences between populations. For the effect of Population on training accuracy, BF = 34.8, providing strong evidence that the training accuracy of children with ASD and TD children was similar. For the interaction between Block and Population, BF = 38.5, indicating strong evidence for no interaction between these variables.
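The BIC-based approximation attributed to Wagenmakers (2007) can be written as BF = exp((BIC with the effect − BIC without the effect) / 2), so that values above 1 favour the simpler model. A minimal Python sketch follows, with hypothetical function names and with the interpretation thresholds taken from the paragraph above; the BIC values themselves are not reported in this section, so none are reproduced here.

```python
import math


def bic_bayes_factor_null(bic_with_effect, bic_without_effect):
    """Approximate Bayes Factor in favour of the model WITHOUT the effect."""
    return math.exp((bic_with_effect - bic_without_effect) / 2.0)


def interpret(bf):
    """Apply the thresholds described in the text (> 3, < 0.3, otherwise unclear)."""
    if bf > 3:
        return "evidence for similarity"
    if bf < 0.3:
        return "evidence for a difference"
    return "indeterminate"
```

For example, a BIC difference of about 7.1 in favour of the simpler model corresponds to a Bayes Factor of roughly 35, which `interpret` classifies as evidence for similarity.
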
We also examined whether individual differences in receptive vocabulary, nonverbal intelligence, and chronological age within each population predicted additional variability in training accuracy. Although the populations did not differ on accuracy, it is possible that different factors contributed to their successful performance (individuals with ASD can attain "normal" performance on psychological tasks via atypical routes and compensatory strategies; Happé, 1995; Norbury et al., 2010). As such, the following analyses were conducted on data from the ASD and TD groups separately. We started with models that mirrored the final model detailed above containing Block as a fixed effect, with by-participant and by-word random intercepts and slopes for Block. For children with ASD, the addition of receptive vocabulary (age equivalent in months) significantly improved fit (χ² = 6.55, p = .01). Adding nonverbal intelligence (raw score) afforded a borderline significant improvement in fit (χ² = 3.65, p = .056). Adding chronological age did not significantly improve fit (p = .49). Including both receptive vocabulary and nonverbal intelligence as fixed effects yielded a borderline significant improvement over the model containing only nonverbal intelligence (χ² = 3.70, p = .055), but did not differ from the model containing only receptive vocabulary (p = .37). These comparisons indicate that the training accuracy of children with ASD was predicted by receptive vocabulary in addition to Block - children with larger vocabularies were more likely to respond correctly (see Table 3).
Fig. 3. Example retention and generalisation trials in Studies 1 and 2; (a) target object, (b) non-target objects, (c) shape match for target object, (d) colour match for target object, (e) shape/colour match for non-target object.
For TD children, the addition of receptive vocabulary (p = .83), nonverbal intelligence (p = .49), or chronological age (p = 1.00) did not improve predictive fit when compared with the model containing only Block (which was a significant effect; Z = 1.99, p = .047). Thus, the accuracy of TD children on training trials was not reliably influenced by variability in the measured participant traits. Additional models confirmed that population x receptive vocabulary (p = .15, BF = 12.8), population x nonverbal intelligence (p = .13, BF = 14.2), and population x chronological age (p = .91, BF = 36.6) interactions were non-significant, suggesting that individual differences in these abilities did not predict different patterns of performance in training accuracy across groups, with the Bayes Factor values highlighting evidence for no interaction.
For additional analyses investigating relations between response times and accuracy (for Study 1 and Study 2), please see Supplementary materials.

Training response times
Children's response times for correctly-answered training trials were analysed using linear mixed-effects models. We calculated the average correct response time for each population, in each block. We removed outliers that were over three standard deviations above the mean for the sub-group (e.g. children with ASD in block 1) to rule out trials on which participants were likely distracted. For children with ASD, 479 of 490 (97.76%) correct responses were included in our analyses. For TD children, 529 of 541 (97.78%) correct responses were included. With outliers excluded, the mean correct response times for each population are reported in Fig. 5.
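The outlier rule described above (dropping correct responses more than three standard deviations above the mean of their population-by-block sub-group) can be sketched in Python as follows. The dictionary keys and function name are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify which estimator was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def trim_slow_responses(trials, n_sd=3.0):
    """Drop response times more than n_sd SDs above their sub-group mean.

    `trials` is a list of dicts with the hypothetical keys
    "population", "block", and "rt" (correct-response time).
    """
    # Collect response times per sub-group (population x block).
    groups = defaultdict(list)
    for trial in trials:
        groups[(trial["population"], trial["block"])].append(trial["rt"])

    # Upper cutoff per sub-group; a single observation cannot be an outlier.
    cutoffs = {}
    for key, rts in groups.items():
        sd = stdev(rts) if len(rts) > 1 else 0.0
        cutoffs[key] = mean(rts) + n_sd * sd

    return [t for t in trials if t["rt"] <= cutoffs[(t["population"], t["block"])]]
```

Only the upper tail is trimmed, matching the stated aim of excluding trials on which children were likely distracted; unusually fast responses are retained.
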
We started with a baseline model containing by-participant random intercepts and slopes for Block and by-word random intercepts and slopes for Population x Block; however, it failed to converge. The baseline model, and all subsequent models, converged when the random effects structure was simplified to by-participant intercepts and slopes for Block and by-word intercepts for Block and Population as main effects (rather than their interaction). Adding Block as a fixed effect did not improve fit in comparison to the baseline model (p = .65). Adding Population yielded a borderline significant improvement in fit (χ² = 3.74, p = .053). Models including both Block and Population (p = .14, BF = 134.3) as main effects or the Block x Population interaction (p = .25, BF = 3827.6) did not differ from the baseline model. These comparisons indicate that a model containing only Population as a fixed effect provides the best fit to the data, suggesting that children with ASD were slower to indicate the correct referent on training trials than TD children (see Table 4).
Our next series of models investigated whether individual differences in receptive vocabulary, nonverbal intelligence, and chronological age predicted additional variability in children's correct response times. Mirroring our analyses for individual differences in training accuracy, these models were conducted on data from the ASD and TD groups separately (so Population, the only fixed effect in the preceding model, was not included as a random slope). Please refer to Supplementary materials for full details of the model building processes for these analyses.
For children with ASD, a model containing chronological age, receptive vocabulary and nonverbal intelligence as fixed effects provided the best fit to the observed data (see Table 5). These results indicate that children with ASD who were older and had higher receptive vocabulary scores were quicker to select the correct referent on training trials. However, higher nonverbal intelligence was associated with slower correct responses.
For TD children, a model containing only chronological age as a fixed effect provided the best fit to the observed data (see Table 6). These results indicate that TD children who were older were quicker to select the correct referent on training trials. However, it is noteworthy that chronological age was correlated with receptive vocabulary (R = 0.87, p < .001) and raw nonverbal intelligence score (R = 0.82, p < .001), so effects of age should be interpreted cautiously and with these variables in mind.

Retention & generalisation accuracy
Children's responses on individual retention and generalisation trials were scored as correct (1) or incorrect (0). The likelihood of responding correctly by chance was 33%. Descriptive statistics for retention and generalisation accuracy are presented in Fig. 4. We conducted a series of GLMMs to explore the effects of Population, Trial Type, and Training Accuracy. Population was coded as 0 (TD) and 1 (ASD), Trial Type was coded as 0 (retention) and 1 (generalisation), and Training Accuracy was a numerical value between 0 and 24 corresponding to the number of correct responses made by the child in the second block of training trials. Please refer to Supplementary materials for full details of the model building process for this analysis.
A model containing only Training Accuracy as a fixed effect provided the best fit to the observed data (see Table 7). These results indicate that children who achieved greater accuracy in Block 2 of the training trials tended to respond correctly on more retention and generalisation trials. No differences between populations or trial types were detected.
As for training accuracy, we explored whether individual differences within each population predicted additional variability in retention and generalisation trial accuracy. The following analyses were conducted on data from the ASD and TD groups separately. We started with models that mirrored the final model detailed above containing Training Accuracy as a fixed effect, with by-participant and by-word intercepts and slopes for Training Accuracy x Trial Type.
For children with ASD, including receptive vocabulary (p = .30), nonverbal intelligence (p = .57), or chronological age (p = 1.00) did not significantly improve predictive fit when added to Training Accuracy (which was a significant effect; Z = 5.17, p < .001). The same was true for TD children - adding receptive vocabulary (p = 1.00), nonverbal intelligence (p = .24), or chronological age (p = 1.00) did not significantly improve fit when added to Training Accuracy (a significant effect; Z = 3.01, p = .003). This finding is consistent with Vlach and DeBrock's (2019) study on receptive vocabulary and retention of learning in TD children. These results suggest that accuracy on retention and generalisation trials was not reliably influenced by receptive vocabulary, nonverbal intelligence, or chronological age in either group. Additional models confirmed that population x receptive vocabulary (p = .78, BF = 5.2), population x nonverbal intelligence (p = .41, BF = 10.0), and population x chronological age (p = .95, BF = 66.7) interactions were not significant.

Retention & generalisation response times
Children's response times for correctly-answered retention and generalisation trials were examined using linear mixed-effects models. Outliers were identified and removed in the same way as for the training trials, and the mean correct response times for each population are reported in Fig. 5. For children with ASD, 228 of 232 (98.28%) correct responses were included in our analyses. For TD children, 238 of 245 (97.14%) correct responses were included.
We started with a baseline model containing by-participant random intercepts and slopes for Trial Type and by-word random intercepts and slopes for Population x Trial Type; however, it failed to converge. The baseline model, and all subsequent models, converged when the random effects structure was simplified to by-participant intercepts and slopes for Trial Type and by-word intercepts for Population. The addition of Population (t = 2.28, p = .03) significantly improved fit (χ² = 4.80, p = .029). Adding Trial Type did not improve fit (p = .84). The inclusion of both Population and Trial Type (p = .81, BF = 21.1) or the Population x Trial Type interaction (p = .97, BF = 1200.6) did not further improve fit. Thus, a model containing only Population as a fixed effect provided the best fit to the observed data (see Table 8), indicating that children with ASD took longer to generate correct responses on retention and generalisation trials than TD children.
Finally, we investigated whether individual differences in receptive vocabulary, nonverbal intelligence, and chronological age predicted additional variability in children's correct response times for retention and generalisation trials. These models were conducted on data from the ASD and TD groups separately. Please refer to Supplementary materials for full details of the model building processes for these analyses.
For children with ASD, the observed data were best predicted by a baseline model with a significant intercept (t = 7.32, p < .001). The inclusion of fixed effects did not improve fit. These results suggest that correct response times for children with ASD on retention and generalisation trials were not significantly influenced by receptive vocabulary, nonverbal intelligence, or chronological age.
For TD children, a model containing only chronological age as a fixed effect provided the best fit to the observed data (see Table 9). These results indicate that TD children who were older were quicker to select the correct referent on retention and generalisation trials.

Discussion
Study 1 investigated cross-situational word learning as an integrated system, exploring the relationship between mapping, retention, and generalisation of novel labels in TD children and children with ASD. Both populations achieved greater accuracy on training trials in Block 2 than in Block 1, demonstrating their ability to overcome referential ambiguity based purely on statistical co-occurrences between words and objects. Participants were clearly sensitive to the gradual accrual of associative information as training proceeded. Training trial accuracy was predicted by receptive vocabulary in children with ASD, but not TD children. Impressively, both populations achieved 60-65% accuracy on delayed retention and generalisation trials, significantly exceeding the chance rate of 33%. Retention and generalisation accuracy was predicted by children's accuracy in Block 2 of the training trials, but was not predicted by individual differences in either population. While the populations did not differ on training or test accuracy, children with ASD were slower to indicate correct referents.
The strong performance of children with ASD on the training trials suggests that cross-situational word mapping accuracy is unimpaired (at a group level) when samples are controlled for language comprehension. This finding is congruent with prior evidence that children with ASD can employ statistical learning mechanisms to correctly derive word meanings (McGregor et al., 2013; Venker, 2019), segment speech (e.g. Haebig et al., 2017), and process visual stimuli (e.g. Foti et al., 2015). As in McGregor et al. (2013), we observed that cross-situational mapping in the ASD group was predicted by receptive vocabulary, but not nonverbal intelligence (once receptive vocabulary was accounted for). This finding may indicate that the ability to disambiguate word meanings using statistical information is strongly related to vocabulary development in ASD. It also suggests that children with ASD who have profoundly impaired language acquisition may have deficits in cross-situational word learning. Consistent with previous studies, TD children's cross-situational word learning accuracy was not significantly related to receptive vocabulary (e.g. Vlach & DeBrock, 2019).
An important feature of Study 1 is that we included tests of retention and generalisation to explore how cross-situational mapping supports longer-term word learning as compared to immediate tests of recognition. In line with our predictions, TD children and children with ASD performed very similarly and with good accuracy on both trial types. Children's accurate recollection of word-referent pairings after a 5-minute delay indicates a link between cross-situational mapping and longer-term learning. It is well-documented that TD children often forget word-referent pairings that are acquired through fast mapping after 5 min. For example, after achieving 91-100% accuracy on Hartley et al.'s (2019) fast mapping task (study 1), children with ASD responded correctly on ~46% of retention and generalisation trials while TD children responded correctly on ~39% of retention and generalisation trials. This phenomenon indicates that correct identification of meaning in the context of a naming event is not sufficient to guarantee retention, suggesting that these two aspects of word learning are supported by distinct mechanisms (Horst & Samuelson, 2008). McMurray et al.'s (2012) dynamic associative account posits that word learning requires both 'fast' processes that facilitate disambiguation of referential meaning and 'slow' processes that enable storage and retrieval of lexical representations. According to this theory, slow learning involves children gradually accumulating knowledge of how words map onto objects or actions in their environment over multiple learning instances. Our data are consistent with this theory of slow learning driven by associative learning from cross-situational statistics - both groups responded correctly on 62-65% of retention and generalisation trials, with learning being predicted by cross-situational training accuracy.
However, response times for referent selection were marginally different between the groups during training and were significantly different for the retention and generalisation test trials. Children with ASD were slower than TD children for both tasks. Slow learning, in McMurray et al.'s (2012) model, refers to the gradual accumulation of information about word-referent pairings over multiple trials. However, for this to occur in natural language learning, children must quickly access words and process the potential referents in the environment across fast-moving situations (Christiansen & Chater, 2016). If the child's processing of this information is slowed (as reflected in response times) then the child may be disadvantaged in slow learning of words. We return to this issue in the General discussion.
Still, the ability of both groups to acquire, retain, and generalise word-referent mappings suggests that repeatedly overcoming the challenge of referential ambiguity elicited the formation of robust relationships that may be less susceptible to decay than representations established through fast mapping (cf. Axelsson et al., 2012; Horst & Samuelson, 2008). However, it is unclear whether our participants' retrieval was facilitated by (a) increased opportunities to pair words with referents during training (i.e. greater statistical input compared to fast mapping tasks), or (b) the increased cognitive effort required to track cross-situational associations between words and objects in the absence of ostensive cues or conditions that enable the use of mutual exclusivity (at least from the outset).
Despite substantial delays in language development (relative to chronological age), accuracy of cross-situational word-referent mapping and its relationship to slow learning was not qualitatively atypical in the ASD group (mirroring the results of Hartley et al.'s (2019) fast mapping study). At a group level, their accuracies on training, retention, and generalisation trials closely resembled those of TD children matched on receptive vocabulary and raw nonverbal cognitive ability. However, between-population differences may emerge when comparing how children with, and without, ASD utilise attentional cues in the service of cross-situational word learning.
In Study 2, we investigate whether cross-situational word learning improves when referential ambiguity is potentially reduced. Children with ASD and TD children completed variations of our cross-situational word learning task that incorporated social or non-social attentional cues during training. Half of each population received social cues (head turn and gaze shift towards the target) and half received non-social cues (the target was highlighted by a brightly coloured border) that focussed children's attention on the correct referent for each word.
In accordance with MacDonald et al. (2017), we expected that TD children and children with ASD in these conditions would attain greater training accuracy than children in Study 1 (where no cues were provided). However, we expected ASD to have differing impacts on children's performance in the two cue conditions. Several studies have shown that children with ASD and concomitant language impairments are less likely to follow an adult's head turn in comparison to TD controls (Leekam, Baron-Cohen, Perrett, Milders, & Brown, 1997). Leekam et al. (1998) reported that 59% of children with ASD (aged 8 years on average) followed a head turn cue in comparison to 88% of TD 4-year-olds matched on verbal mental age. Thus, we predicted that the training accuracy of children with ASD may be lower than that of TD controls in the social cue condition. However, we anticipated no between-population differences in training accuracy in the non-social cue condition as we had no reason to believe that children with ASD would be any less likely to notice the appearance of the brightly coloured border than TD children.
Table 9. Summary of the fixed effects in the final linear mixed-effects model of individual differences in response times on correctly-answered retention and generalisation trials for typically developing children in Study 1, predicted by chronological age.
Regarding retention accuracy, we reasoned that reducing referential ambiguity during training could have different consequences depending on why learning was strong in Study 1. If retention is facilitated by frequently pairing words with their correct referents, children should respond more accurately when attentional cues are provided during training. Alternatively, if retention is facilitated by tracking and disambiguating meaning across multiple exposures, children may respond less accurately on retention and generalisation trials in the cue conditions. The attentional cues serve to reduce referential ambiguity which, in turn, reduces the cognitive effort required to establish correct word-referent relationships. However, this reduction in effort may result in encoding word-referent representations that are less robust and increasingly susceptible to decay. Importantly, this study will reveal whether the provision of attentional cues enhances cross-situational word learning and will highlight whether the benefits of cue types differ across populations.

Participants
Participants were 33 children with ASD (26 males, 7 females; M age = 8.66 years; SD = 2.39) recruited from specialist schools, and 32 TD children (21 males, 11 females; M age = 5.40 years; SD = 1.30) recruited from mainstream schools and nurseries. All children had normal or corrected-to-normal colour vision. None of these children participated in Study 1. Children with ASD were previously diagnosed by a qualified educational or clinical psychologist, using standardised instruments (i.e. Autism Diagnostic Observation Schedule and Autism Diagnostic Interview - Revised; Lord et al., 1994; Lord et al., 2002) and expert judgement. Diagnoses were confirmed via the Childhood Autism Rating Scale 2 (Schopler et al., 2010), which was completed by each participant's class teacher.
Importantly, these samples of children with ASD and TD controls had very similar characteristics to the groups in Study 1 (see Table 1), enabling direct comparison across the two studies. Full comparisons between populations and conditions are provided in Appendix A. There were no significant differences in BPVS age equivalent or Leiter-3 raw score between any groups. Samples of children with ASD did not differ on chronological age or CARS score. Samples of TD children did not differ on chronological age, but the mean CARS score for TD children in Study 1 (M = 15.38) was significantly higher than that of TD children in the Non-social Cue condition of Study 2 (M = 15.03), t(30) = 2.35, p = .026. However, this difference is unlikely to be clinically meaningful given that the lowest possible score on the CARS is 15 and the samples do not differ on any other metric.
Thirty TD children and all of the children with ASD who participated in this study also participated in Hartley et al. (2019). The experimental tasks for that study and this study were administered one week apart (in a random order). The two studies differed in terms of the novel words and objects used, how the experimental tasks were delivered (presenting real objects vs. a tablet computer), and the learning mechanisms being studied (fast mapping vs. cross-situational learning). As such, we were confident that children's responding would not be influenced by interference effects.

Materials
Stimuli were exactly the same novel words and colour photographs of objects (familiar and unfamiliar) as used in Study 1. Two short videos were created depicting a female face looking forward before turning to look left or right. All stimuli were presented via a Microsoft Surface Pro 4 tablet computer.

Procedure
The paradigm was exactly as described for Study 1 with some adjustments to the warm-up trials and training trials due to introducing social or non-social cues. All other stages, and the order of administration, remained the same.

Social cue condition.
Children completed four warm-up trials followed by 48 training trials, administered in two blocks of 24 trials. The stimuli and format were as previously described, with one exception: a female face appeared in between each pair of objects. On appearance, the face was looking directly forwards, and children heard a female voice directing them to "Look!". After 3 s, the face turned to one of the objects (the target named object) and the same voice asked them to identify one of the objects (e.g. "Which is the blicket? Touch the blicket"; see Fig. 6). Following the child's response, the pictures disappeared and the next trial was presented. Thus, the face's direction of gaze and corresponding head turn provided cues to the referent of each word, effectively reducing the ambiguity of each individual training trial (see Wu, Gopnik, Richardson, & Kirkham, 2010).

Non-social cue condition.
As for the social cue condition, children completed four warm-up trials followed by 48 training trials (administered in two blocks). On each trial, children were presented with photographs of two objects positioned to the left and right of the tablet's screen accompanied by a female voice saying "Look!". A star shape also appeared between the two objects. We reasoned that children in the social cue condition would look at the face before the head turn, dividing their attention between the three elements on screen. The star appeared in the initial portion of training trials in the non-social condition to divide children's attention in a similar way before experiencing the directional cue. After viewing the pictures for 3 s, the star disappeared, and one of the objects (the target named object) was "highlighted" by a green border and the same voice asked them to identify one of the objects (e.g. "Which is the blicket? Touch the blicket"; see Fig. 6). Following the child's response, the pictures disappeared and the next trial was presented. Here, the green border provided a non-social attentional cue to the referent of each novel word.
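The parallel structure of the two cue conditions can be summarised in a small Python data structure. This is only an illustrative sketch: the class, field, and function names are hypothetical, and the exact timing of the spoken request relative to the cue is an assumption, as the text states only that both followed the initial 3 s presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingTrialPlan:
    condition: str        # "social" or "non-social"
    central_element: str  # what appears between the two objects at onset
    cue: str              # how the target is indicated after the initial view
    initial_view_s: float = 3.0
    onset_prompt: str = "Look!"


PLANS = {
    "social": TrainingTrialPlan(
        "social", "forward-facing female face",
        "face turns (head turn and gaze shift) towards the target"),
    "non-social": TrainingTrialPlan(
        "non-social", "star shape",
        "star disappears and a green border highlights the target"),
}


def describe(condition, word="blicket"):
    """Return the ordered events of one training trial as readable strings."""
    plan = PLANS[condition]
    return [
        f'0.0 s: two objects and a {plan.central_element} appear; '
        f'voice says "{plan.onset_prompt}"',
        f"{plan.initial_view_s:.1f} s: {plan.cue}",
        f'{plan.initial_view_s:.1f} s: voice asks "Which is the {word}? Touch the {word}"',
        "on touch: pictures disappear and the next trial is presented",
    ]
```

Laying the two conditions side by side in this way makes explicit that they differ only in the central element and in the nature of the cue, not in timing or in the spoken prompts.
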

Training accuracy
Children's responses on individual training trials were scored as correct (1) or incorrect (0). The likelihood of responding correctly by chance was 50%. Descriptive statistics for training accuracy are presented in Fig. 7.
Please refer to Supplementary materials for full details of the model building processes for all Study 2 analyses. We first explored the effects of Population, Cue Condition, and Block on children's training trial accuracy via a series of GLMMs. Cue Condition was contrast coded as −0.5 (Non-social Condition) and +0.5 (Social Condition). Block was contrast coded as −0.5 (block 1) and +0.5 (block 2). Population was coded as 0 (TD) and 1 (ASD). The observed data were best predicted by a baseline model with a significant intercept (Z = 11.30, p < .001). The inclusion of fixed effects did not improve fit. These results suggest that children's response accuracy on training trials was not reliably influenced by diagnostic group, cue type, or training block. The Bayes Factor results for diagnostic group provided evidence that these groups were similar in accuracy for this task (see Supplementary materials).
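The coding scheme described above can be made concrete with a short Python sketch. The dictionary and function names are hypothetical and the snippet only recodes a trial's labels into the numeric predictors; it does not fit any model.

```python
# Numeric codes as described in the text.
BLOCK_CODES = {1: -0.5, 2: 0.5}                            # contrast coded
CUE_CONDITION_CODES = {"non-social": -0.5, "social": 0.5}  # contrast coded
POPULATION_CODES = {"TD": 0, "ASD": 1}                     # dummy coded


def code_trial(population, cue_condition, block):
    """Map a trial's labels onto the numeric predictors used in the models."""
    return {
        "population": POPULATION_CODES[population],
        "cue_condition": CUE_CONDITION_CODES[cue_condition],
        "block": BLOCK_CODES[block],
    }
```

With ±0.5 contrast coding, a predictor's coefficient equals the difference between its two levels on the log-odds scale, and the intercept sits midway between them. With 0/1 coding for Population, the TD group acts as the reference level.
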
Next, we examined whether children's cross-situational mapping was significantly enhanced by the provision of cues by comparing training results from Study 2 and Study 1. Population and Block were coded as described above. Cue Provision was coded 0 (No cue; these data were from children who participated in Study 1) and 1 (Cue provided; these data were from children allocated to the Social and Non-social conditions of Study 2). A model containing fixed effects of Cue Provision and Block provided the best fit to the observed data (see Table 10). These analyses show that both children with ASD and TD children responded with significantly greater accuracy on training trials when cues were provided. They also achieved greater accuracy in Block 2 than Block 1. The Bayes Factor values provided evidence that the diagnostic groups were similar in their accuracies.
As in Study 1, we assessed whether individual differences within each population predicted additional variability in training accuracy in the cue conditions. These models were conducted on data from the ASD and TD groups separately, with the cue conditions collapsed.
For children with ASD, a model containing only nonverbal intelligence as a fixed effect provided the best fit to the observed data (see Table 11). These results indicate that children with ASD with higher nonverbal intelligence were more likely to respond correctly on training trials in the cue conditions. For TD children, receptive vocabulary, nonverbal intelligence, and chronological age did not significantly improve predictive fit over the baseline model.

Training response times
Children's response times for correctly-answered training trials in the Social Cue and Non-social Cue Conditions were examined via linear mixed-effects models. Outliers were identified and removed as described for Study 1. For children with ASD, the analyses included 701 of 714 (98.18%) correct responses in the Social Cue condition and 704 of 712 (98.88%) correct responses in the Non-social Cue condition. For TD children, the analyses included 700 of 718 (97.49%) correct responses in the Social Cue condition and 718 of 731 (98.22%) correct responses in the Non-social Cue condition. With outliers excluded, the mean correct response times for each population are reported in Fig. 8.
A model containing fixed effects of Population and Block provided the best fit to the observed data (see Table 12). These results indicate that children with ASD were slower to respond correctly on training trials (as in Study 1), and both populations were slower to respond correctly in Block 2 than Block 1.

[Figure: trial sequence in the Social Cue condition. Panels: Initial presentation (3 s); Referent request ("Which is the blicket? Touch the blicket.").]

Hartley, et al. Cognition 200 (2020) 104265

We also examined whether children were quicker to generate correct responses on training trials when attentional cues were provided than when no cues were provided in Study 1. A model containing fixed effects of Population, Block, and Cue Provision provided the best fit to the observed data (see Table 13). These results show that Cue Provision significantly predicted variability in children's response times in addition to Population and Block; children in both groups identified correct referents on training trials significantly more quickly in Study 2 than in Study 1.
As in Study 1, we investigated whether individual differences in receptive vocabulary, nonverbal intelligence, and chronological age predicted additional variability in children's correct response times on training trials in the Social and Non-social Cue conditions. These models were conducted on data from the ASD and TD groups separately.
For children with ASD, a model containing Block and receptive vocabulary as fixed effects provided the best fit to the observed data (see Table 14). These results suggest higher receptive vocabulary was associated with faster correct responding on training trials in the Social and Non-social Cue conditions. For TD children, receptive vocabulary, nonverbal intelligence, and chronological age did not significantly improve predictive fit when added as fixed effects alongside Block (a significant effect, t = 2.81, p = .009).

Retention & generalisation accuracy
Children's responses on individual retention and generalisation trials were scored as correct (1) or incorrect (0). The likelihood of responding correctly by chance was 33%. Descriptive statistics for retention and generalisation accuracy are presented in Fig. 9. We analysed the effects of Population, Cue Condition, Trial Type, and Training Accuracy on children's retention and generalisation trial accuracy via a series of GLMMs (all variables coded as described previously).
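To give a sense of what a 33% chance level implies for an individual child, the sketch below computes an exact binomial upper-tail probability. It assumes that the 33% figure reflects three equally likely options per trial, and the trial counts are hypothetical. This is not the analysis reported here, which used GLMMs.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p_chance: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p_chance)."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Assumed chance level: one correct option out of three (consistent with 33%).
P_CHANCE = 1 / 3

# Hypothetical child: 8 correct out of 12 test trials (illustrative numbers only).
p_value = binomial_upper_tail(8, 12, P_CHANCE)
print(f"P(at least 8 of 12 correct by guessing) = {p_value:.4f}")
```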
A model containing Training Accuracy as a fixed effect provided the best fit to the observed data (see Table 15). As in Study 1, children who achieved greater accuracy in Block 2 of the training trials tended to respond correctly on more retention and generalisation trials. No differences between populations, conditions, or trial types were detected (for Population, BF = 6.8 × 10^10).
Fig. 8. Mean response times on correctly answered training trials for typically developing (TD) children and children with autism spectrum disorder (ASD) in the Social Cue and Non-social Cue conditions of Study 2. Error bars show ± 1 SE.

Table 12
Summary of the fixed effects in the final general linear mixed-effects model of children's response times on correctly-answered training trials in Study 2, predicted by population and block.

We then explored how children's retention and generalisation accuracy was affected by the provision of attentional cues during mapping in a combined analysis across Studies 1 and 2. A model containing Training Accuracy as a fixed effect provided the best fit to the observed data (see Table 16). Provision of attentional cues during training did not reliably influence retention or generalisation accuracy and did not interact with Population or Trial Type.
As for training accuracy, we assessed whether individual differences within each population predicted additional variability in retention and generalisation trial accuracy for children in the cue conditions. These models were conducted on data from the ASD and TD groups separately, with the cue conditions collapsed.
For children with ASD, a model containing Training Accuracy and chronological age as fixed effects provided the best fit to the observed data (see Table 17). Children with ASD were more likely to respond correctly on retention and generalisation trials in the Social and Nonsocial Cue conditions as their chronological age increased. However, it is worth noting that chronological age was correlated with receptive vocabulary (R = 0.59, p < .001) and raw nonverbal intelligence (R = 0.69, p < .001), so effects of age should be interpreted with these variables in mind.
For TD children, a model containing Training Accuracy and nonverbal intelligence as fixed effects provided the best fit to the observed data (see Table 18). TD children with higher nonverbal intelligence were more likely to respond correctly on retention and generalisation trials in the Social and Non-social Cue conditions.

Retention & generalisation response times
Children's response times for correctly-answered retention and generalisation trials were examined. Outliers were identified and removed as described for Study 1. For children with ASD, the analyses included 251 of 257 (97.67%) correct responses in the Social Cue condition and 235 of 241 (97.51%) correct responses in the Non-social Cue condition. For TD children, the analyses included 241 of 245 (98.37%) correct responses in the Social Cue condition and 237 of 244 (97.13%) correct responses in the Non-social Cue condition. With outliers excluded, the mean correct response times for each population are reported in Fig. 10.
The observed data were best predicted by a baseline model; the inclusion of fixed effects did not improve predictive power. These results suggest that children's response times on correctly-answered trials were not reliably influenced by diagnostic group, type of attentional cue presented during training, or type of trial.
We also tested whether children were faster to generate correct responses on retention and generalisation trials when attentional cues were provided during training than when no cues were provided in Study 1. A model containing Population as a fixed effect provided the best fit to the observed data (see Table 19), suggesting that children with ASD were slower to respond correctly on retention and generalisation trials when data sets for Study 1 and Study 2 were combined.
Finally, we examined whether individual differences in receptive vocabulary, nonverbal intelligence, and chronological age predicted additional variability in children's correct response times on retention and generalisation trials in the Social and Non-social Cue conditions. These models were conducted on data from the ASD and TD groups separately.
For children with ASD, a model containing only receptive vocabulary as a fixed effect provided the best fit to the observed data (see Table 20). These results suggest that children with higher receptive vocabulary were quicker to identify correct referents on retention and generalisation trials in the Social and Non-social Cue conditions. For TD children, a model containing only chronological age as a fixed effect provided the best fit to the observed data (see Table 21). These results suggest that older TD children were quicker to identify correct referents on retention and generalisation trials in the Social and Non-social Cue conditions. However, chronological age was correlated with receptive vocabulary (R = 0.87, p < .001) and nonverbal intelligence raw score (R = 0.82, p < .001), so it is possible that these variables contributed to the predictive effect.

Fig. 10. Mean response times on correctly answered retention and generalisation trials for typically developing (TD) children and children with autism spectrum disorder (ASD) in Study 2. Error bars show ± 1 SE.

Table 19
Summary of the fixed effects in the final general linear mixed-effects model of children's response times on correctly-answered retention and generalisation trials in Study 1 and Study 2, predicted by population.

Discussion
Study 2 investigated whether cross-situational word learning is enhanced by reducing referential ambiguity and whether the benefits of social and non-social attentional cues differ for children with ASD and TD children. Both populations responded with high accuracy on training trials in Block 1 and Block 2 of each cue condition. Between-condition comparisons indicated that training accuracy was not reliably influenced by cue type, population, or block, although children with ASD took significantly longer to identify correct referents. Training accuracy significantly increased with the provision of cues (versus no cues in Study 1) and there was a significant relationship with nonverbal intelligence in children with ASD. Similar to Study 1, all groups achieved 61-66% accuracy on delayed retention and generalisation trials, with performance in both populations predicted by training accuracy, but not cue condition or provision of cues. Additional variability in retention and generalisation accuracy was predicted by chronological age for children with ASD and nonverbal intelligence for TD children.
Unlike children in Study 1, participants in Study 2 could deduce word meanings by simply attending to cues that reliably highlighted the referent for each word. Indeed, our data suggest that both populations may have employed this strategy. TD children responded with near-ceiling accuracy (91-96%) across conditions and blocks, while children with ASD were marginally less accurate (87-94%), although this difference was not significant. Both populations appeared to recognise that both social and non-social cues were indicative of referential meaning, despite the lack of instruction explaining this to be the case. Consequently, children reliably directed their attention to correct referents on the vast majority of trials, thus experiencing the almost-maximal frequency of correct word-object pairings during training. When data from Studies 1 and 2 were collapsed to explore the effect of cue provision, Block also emerged as a main effect across conditions. While the difference between blocks was larger in Study 1, three out of four groups in Study 2 also demonstrated higher accuracy in Block 2 than in Block 1. This implies that some participants may have taken a short while to realise the usefulness of attentional cues during training.
Perhaps surprisingly, the training accuracy of TD children and children with ASD in the Social Cue condition did not significantly differ. This demonstrates that our participants with ASD were sensitive to the actor's change in gaze direction and considered this to be a reliable cue to referential meaning. These findings are incongruent with classic evidence that children with ASD do not reliably map word-referent relationships based on gaze (e.g. Baron-Cohen et al., 1997; Preissler & Carey, 2005). However, a growing collection of studies show that children with ASD who develop functional language skills can utilise social-communicative information when identifying the meanings of new words (Bean Ellawadi & McGregor, 2016; Hani, Gonzalez-Barrero, & Nadig, 2013; Luyster & Lord, 2009; McGregor et al., 2013; Norbury et al., 2010). As predicted, the two populations did not differ in their sensitivity to non-social attentional cues. Taken together, data across the two conditions show that our samples of children with ASD utilised attentional cues to overcome the challenge of referential ambiguity when mapping novel words over multiple exposures as effectively as vocabulary-matched TD controls.
Accuracy during training was significantly higher in Study 2 than in Study 1, suggesting that children identified correct referents more reliably when attentional cues were provided. There was a risk that providing highly-accurate attentional cues would trivialise the task of selecting correct referents: if participants were sensitive to the cues then they could correctly select target referents based purely on visual information, without actually learning word-object correspondences. However, the similarity of retention and generalisation accuracy observed in Study 1 and Study 2 suggests that children encoded the novel word-referent correspondences despite the easier learning context. On one hand, this finding suggests that children's exposure to higher frequencies of correct word-referent pairings during training does not necessarily elicit superior "longer term" learning. The fact that TD children's retention accuracy was not promoted by the Social Cue condition also suggests that longer term learning is not necessarily facilitated by opportunities to infer referential intent from social-communicative behaviour. However, on the other hand, our response time analyses show that providing attentional cues can accelerate children's identification of novel word meanings without having a detrimental effect on their subsequent retention or generalisation accuracy. From a practical perspective, this finding has interesting implications for interventions supporting word learning in both typical development and autism.
The similarity of retention and generalisation accuracy across the two populations replicates the results from Study 1, providing further evidence that core relationships between cross-situational mapping and slow learning mechanisms are not atypical in ASD. Both groups established word-referent representations that privileged similarity of shape as the key determinant of referential meaning, enabling accurate extension to unlabelled category members. These findings support recent studies showing that ASD is not characterised by a pervasive deficit in shape-based generalisation (e.g. Field, Allen, & Lewis, 2016; Hartley et al., 2019). However, children with ASD were slower to identify correct referents during training, despite the provision of attentional cues, and when tested after a delay. Whereas the mechanisms of word learning may be intact in children with ASD, the detriment may instead be apparent when learning words under more naturalistic conditions, where the time available for the child to process the input is substantially reduced. Our models incorporating individual differences also suggest that the relative contributions of chronological age, receptive vocabulary, and nonverbal intelligence to mapping and retention may differ for children with ASD and typical development.

General discussion
These studies are the first to systematically explore cross-situational word learning as an integrated system of mapping, retention, and generalisation in both typical development and autism. In comparison to TD controls matched on receptive vocabulary and nonverbal intelligence (raw score), children with ASD were as accurate at disambiguating the meaning of novel words based purely on statistical correspondences. Both groups spontaneously utilised social and nonsocial attentional cues to facilitate and accelerate their mapping of novel word meanings. Moreover, both groups retained and generalised word-referent representations established through cued and non-cued learning with impressive and comparable accuracy. While children with typical development and autism performed very similarly on measures of learning accuracy, between-group differences emerged in analyses of response times. Children with ASD were significantly slower to correctly identify word meanings than TD children under both cued and non-cued learning conditions. These findings advance understanding of how ASD may affect children's language acquisition and have interesting implications for interventions targeting vocabulary development.
The results of Studies 1 and 2 demonstrate that, under the right conditions, children with ASD can learn novel words as accurately as TD children when expectations are based on current receptive language ability. Our data indicate that ASD does not impair the fundamental mechanisms that underpin cross-situational word learning, or the relationships between them. These findings align with evidence that language acquisition in ASD is not qualitatively atypical (e.g. Boucher, 2012; Ellis Weismer et al., 2011; Gernsbacher, Morson, & Grace, 2015; Gernsbacher & Pripas-Kapit, 2012; Goodwin, Fein, & Naigles, 2012; Naigles, Kelty, Jaffery, & Fein, 2011). Hartley et al. (2019) recently reported that children with ASD (several of whom also participated in the present study) were accurate in their use of mutual exclusivity and retained fast-mapped words with comparable accuracy to TD controls. Moreover, children with ASD who received social feedback after independently mapping word-referent relationships achieved the most accurate retention, outperforming TD controls in the same condition and autistic peers who received non-social feedback or no feedback. Viewed in conjunction with the training data from our Social Cue condition, it appears that, like TD children, children with ASD can benefit from social input both in terms of identifying the meaning of new words and reinforcing longer-term learning.
However, effective word learning performance of children with ASD plus concomitant language delay may be contingent on the quality and pace of input. Our data show that children with ASD were significantly slower to indicate correspondences between words and referents during training and delayed test trials. Typical speech occurs at a rate of approximately 150 words per minute (Studdert-Kennedy, 1986), approximately ten times more rapidly than in the current experimental settings. Slower processing of information in these natural language conditions would result in a reduction of associative strength between words and referents, even if the mechanisms enabling those associations to be formed are unimpaired (McMurray et al., 2012; Yu & Smith, 2012). In the present research, such difficulties that may impact natural language learning were likely mitigated by the highly controlled presentation of visual and auditory stimuli. Firstly, the experiment was delivered via a touchscreen tablet, a platform that children with ASD find extremely engaging (Allen, Hartley, & Cain, 2016). Secondly, the possible referents in our experimental learning environments were substantially constrained: just two options during the training trials, compared to a multitude of possibilities in natural learning situations (Yu & Ballard, 2007; Yurovsky, Smith, & Yu, 2013). Thirdly, children's progression through training and retention trials was regulated by the speed of their responding, meaning their processing time was unrestricted.
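The pacing comparison above can be made concrete with a short calculation. The sketch below derives the implied time per word from the cited speech rate and the stated "approximately ten times" difference; the task rate it produces is an inference from those two figures, not a value reported in the paper, and all names are illustrative.

```python
# Speech rate cited in the text (Studdert-Kennedy, 1986).
NATURAL_WORDS_PER_MINUTE = 150
# The text describes natural speech as roughly ten times faster than the task.
SLOWDOWN_FACTOR = 10


def seconds_per_word(words_per_minute: float) -> float:
    """Average time available per word at a given presentation rate."""
    return 60 / words_per_minute


natural_interval = seconds_per_word(NATURAL_WORDS_PER_MINUTE)

# Implied (not reported) presentation rate in the experimental task.
task_words_per_minute = NATURAL_WORDS_PER_MINUTE / SLOWDOWN_FACTOR
task_interval = seconds_per_word(task_words_per_minute)

print(f"Natural speech: about {natural_interval:.1f} s per word")
print(f"Experimental task (implied): about {task_interval:.1f} s per word")
```

On these figures, a child would have roughly a tenth of the per-word processing time in natural speech that was available in the task, which is the gap the paragraph above points to.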
The observed profile of good accuracy coupled with slower response times supports recent claims that ASD may delay children's language acquisition by disrupting intake of information, rather than impairing specific word learning processes or strategies per se (Arunachalam & Luyster, 2016). That is, skills which appear to be intact in experimental settings may operate less efficiently in unconstrained natural environments. Evidence from typical development suggests that social cues facilitate word learning in so far as they direct children's attention to referent objects and away from competitors, thereby increasing the accuracy of associative learning mechanisms (Axelsson et al., 2012). However, many children with ASD have diminished social motivation, and may not spontaneously or reliably attend to gaze or gestural cues when deciphering word meanings in real world contexts (Chevallier, Kohls, Troiani, Brodkin, & Schultz, 2012). Children with ASD may also be overwhelmed by sensory input or distracted by non-relevant aspects of the learning environment (Akechi et al., 2011; Tenenbaum et al., 2017). For example, Akechi et al. (2011) found that children reliably mapped a novel word to a speaker's object (the intended referent) only when it was more perceptually salient than an object they themselves were holding. Thus, advancing understanding of how children with ASD extend experimentally-demonstrated word learning mechanisms to naturalistic settings is an important objective for future research.
For both TD children and children with ASD, word-referent relationships encoded through cross-situational mapping were flexible, enabling extension to novel category members. Our results contrast with previous evidence that shape-based label generalisation is a relative weakness for children with ASD (Hartley & Allen, 2014; Potrzeba et al., 2015; Tek et al., 2008). This discrepancy may be attributable to differences in sample composition and/or methodology.
In terms of population differences, children with ASD may need to acquire a more extensive vocabulary than TD children before they privilege shape as a basis for generalising labels. Previous studies report the absence of a preference for shape-based generalisation in ASD samples with verbal mental ages between 23 and 42 months, whereas the average receptive vocabulary age equivalent of our ASD sample was approximately 64 months. Hartley et al. (2019) also observed a preference for shape-based label extension in children with ASD whose verbal mental ages exceeded 60 months. Viewed alongside those findings, our data support the theory that development of a tendency to generalise labels based on shape is delayed in ASD, but not deviant (Field et al., 2016). In terms of methodology, studies reporting poorer generalisation in ASD usually employ tasks that allow children to select multiple referents for target words (either through explicit gestural responses or through gaze shifting), whereas the present study employed a forced-choice task. It may be that children with ASD prioritise shape when required to select a single referent from an array, but are willing to extend labels based on non-shape properties when several objects can be identified as referents for a target word. Furthermore, unlike studies that present novel word-referent relationships and immediately probe extension (e.g. Hartley & Allen, 2014), we assessed generalisation of already-encoded word-referent relationships after a delay. This approach required children to generalise on the basis of mental representations rather than online categorisation in relation to a visible, or recently visible, target exemplar. The accuracy of our ASD group suggests that their ability to abstract and store mental representations of referents' shapes was not atypical. 
This, in turn, may suggest that previously reported deficits are linked to categorisation impairments resulting from differences in visual processing and increased attention to an exemplar's category-irrelevant details (e.g. colour; Happé & Frith, 2006). Future research that teases apart online categorisation from prototype formation is required to fully understand the impact of ASD on shape-based generalisation.
Our study also enabled us to test the influence of background individual difference variables in predicting children's responses, and to determine whether these background variables had contrasting effects according to group. Though there were effects of age, receptive vocabulary, and nonverbal intelligence relating to accuracy and response times for the children with ASD and TD children across various measures, there was no consistent influence of any of these variables. Furthermore, the intercorrelations between these individual difference variables mean we cannot be sure that they individually predict variance in performance. Future studies that focus on these individual difference measures, and tease them apart in controlled studies, would help to determine their roles in cross-situational word learning for children with ASD.
From a practical perspective, our findings can potentially inform the development of interventions designed to scaffold word learning in children with ASD. Our study shows that it is possible to teach children new words using a tablet computer. While children with ASD are often highly-motivated to interact with touchscreen technology, evidence of effective learning via this platform is mixed (see Allen et al., 2016). Here, we demonstrate that touchscreen tablets can be an effective learning platform for children with delayed development, providing that stimuli are presented in a way that appeals to their strengths.
Also, unlike many studies that report word learning deficits in ASD (e.g. Baron-Cohen et al., 1997; Gliga et al., 2012), our tasks required children to actively disambiguate meaning rather than observe naming events. In Study 1, children's mapping of word-referent relationships was completely unguided and, despite the provision of attentional cues in Study 2, children still had to decipher the referent for each word. It may be that children with ASD are more likely to retain word-referent relationships that they actively and independently decode and benefit less when they are passive recipients of ostensive naming. This may also be true for TD children, as Zosh and colleagues showed that TD pre-schoolers retained new words with greater accuracy following independent inference of meaning rather than ostensive labelling (Zosh, Brinster, & Halberda, 2013). Thus, in terms of scaffolding longer-term learning, the same hierarchy of effectiveness could apply to both typical development and autism: cross-situational learning may be more effective than mutual-exclusivity-based fast mapping, which in turn may be more effective than passive ostensive naming. However, this claim is speculative and warrants empirical validation.
Our data also show that children's independent identification of word meanings can be accelerated by providing highly-salient attentional cues, without having a negative impact on their retention accuracy. We observed no differences between cue types, suggesting that any means of guiding children's attention towards a target may be helpful. Although the increased frequency of informative statistical input afforded by attentional cues did not yield a significant improvement in retention after 5 min, the increased speed and accuracy of word-referent mapping may benefit vocabulary development over extended timeframes. It is conceivable that the strategies discussed here could be employed by interventions to optimise language-learning conditions for children with ASD.
Of course, we must address the limitations of this research. It is vital to acknowledge that language development is extremely heterogeneous across the ASD spectrum, and deficits in cross-situational word learning accuracy may be observed in minimally verbal samples. It is also important to recognise that our participants with ASD were matched to TD controls on receptive vocabulary, not chronological age. It is well-documented that children with ASD tend to be developmentally delayed on various facets of language development when compared against age norms for TD children (e.g. Charman, Drew, Baird, & Baird, 2003). Hence, we realise that our ASD groups may have attained weaker performance than similarly-aged TD controls. However, in line with other recent work, the purpose of this research was to examine cross-situational word learning across typical and autistic development while controlling for vocabulary. Our samples of TD children and children with ASD were not systematically matched on gender, which could have potentially contributed to between-group differences (although prior work has shown that group differences in gender have little bearing on word learning performance in these populations; e.g. Hartley et al., 2019). In the Social Cue condition of Study 2, we cannot be certain that our participants with ASD would have responded as accurately as TD controls if the actor's shift in gaze was not accompanied by a head turn. It is possible that children's attention was cued towards the target referent by the head movement, rather than by following the actor's gaze. Future research exploring the use of gaze in the service of cross-situational word learning by children with ASD ought to provide a more stringent test by providing gaze cues without head turns.
Finally, our data only allow us to speculate on potential causes of the slower response times produced by children with ASD. While it is possible that these results are due to differences in language-specific processes, they could alternatively be explained by differences in the generation of motor behaviours. Children with ASD may have processed stimuli at the same speed as TD controls, but generated slower touch responses due to deficits in motor planning and execution (for a review, see Gowen & Hamilton, 2013). Another alternative is that differences in response times indicate that children with ASD were more reflective and less impulsive when selecting referents. However, it is noteworthy that response times tended to increase from Block 1 to Block 2 during training in Study 2, particularly for children with ASD, which shows that greater practice with the task did not result in faster responding. If children with ASD require more time for reflection to accurately learn words, then the bottleneck of natural communicative situations may reduce their performance in comparison to controlled experimental settings, resulting in delayed acquisition.
In summary, our study has elucidated how cross-situational word-referent mapping inter-relates with retention and generalisation in children with ASD and typical development. Despite significant delays in their global language development (relative to chronological age), our ASD samples mapped, retained, and generalised novel words as accurately as TD controls. However, they were significantly slower to generate correct responses across trial types and conditions. These findings imply that fundamental word-learning mechanisms are not atypical in ASD. It may be that ASD affects the efficiency of these mechanisms by disrupting children's intake of language input, particularly during early stages of linguistic development (Arunachalam & Luyster, 2016). Promisingly, we found that providing attentional cues during cross-situational learning can increase both the accuracy and speed of word-referent mapping, without disadvantaging retention in either population. Overall, this research informs understanding of word learning in ASD and spotlights strategies for scaffolding vocabulary acquisition that can be evaluated in future research.

Funding
This research was funded by an Economic and Social Research Council (UK) grant awarded to the first author (ES/N016955/1).

Declaration of competing interest
Hartley declares that he has no conflict of interest. Bird declares that she has no conflict of interest. Monaghan declares that he has no conflict of interest.