Receptive and expressive language ability differentially support symbolic understanding over time: picture comprehension in late talking and typically developing children 2021

236 words. Word count (text + references): 10,484

PICTURE COMPREHENSION IN LATE TALKING CHILDREN

Abstract

Symbols are a hallmark of human communication, and a key question is how children’s emerging language skills relate to their ability to comprehend symbols. In particular, receptive and expressive vocabulary may have related, but distinct, roles across early development.
In a longitudinal study of late talking (LT) and typically developing (TD) children, we differentiated the extent to which expressive and receptive language skills predicted symbolic understanding as reflected in picture comprehension, and how language skills inter-related with social skills. LT and TD children were tested on a picture comprehension task that manipulated the availability of verbal labels at 2.0 – 2.4 years and 3.5 – 3.9 years. While all children improved in accuracy over time as expected, TD children exhibited an advantage over LT children, despite both groups utilising verbal labels to inform their mapping of picture-object relationships. Receptive and expressive vocabulary also differed in their contribution at different ages: receptive vocabulary predicted performance at ~2-years-old, and expressive vocabulary predicted performance at ~3.5-years-old. Task performance at 3.5-years-old was predicted by earlier receptive vocabulary, but this effect was largely mediated by concurrent expressive vocabulary. Social ability across the whole sample at ~2-years-old also predicted and mediated the effect of receptive vocabulary on concurrent task performance. These findings suggest that LT children may have delays in developing picture comprehension over time, and also that social ability and language skills may differentially relate to symbolic understanding at key moments across development.

Introduction

The use of symbols is a uniquely human cognitive hallmark and is vital to communication (DeLoache, 1995; Tomasello et al., 2005). A symbol is something that someone intends to represent something else, and can take many forms including gestures, graphics, text, words, and maps (DeLoache, 2004). Children are immersed in a symbolic world from infancy, and the types of symbols children understand are subject to both cultural context and social scaffolding (Callaghan et al., 2011; Rakoczy et al., 2005).
Children in Western societies are exposed to pictures from an early age. Children use linguistic labels to scaffold their understanding of pictures (Callaghan, 2000), and the development of language and other symbolic domains, such as symbolic play, are closely related (Quinn et al., 2018). This means that early language impairments have the potential to also affect children’s understanding of non-linguistic symbol systems. Although the literature has established that symbolic understanding, language ability, and social context interact in typical development (Callaghan & Corbit, 2015), we do not fully understand how these domains affect each other over time. Furthermore, their trajectory in atypical development is not well defined, and the effect of language delay on how children understand pictures remains under-investigated. Examining the effect of language delay on picture comprehension is crucial to understanding whether children with these difficulties have functional impairments in additional symbolic domains, and also offers an opportunity to elucidate how language scaffolds symbolic understanding during development.

Language and picture comprehension in typical development

In order for children to understand pictures as symbols, they need to acquire dual representation: the understanding that a symbol is not just an object, but also a representation of something else (DeLoache, 2004). At 9-months-old, infants manually investigate pictures as if they were real objects, grasping and plucking at depicted items (DeLoache et al., 1998). By 18-months-old, they begin pointing and talking about pictures rather than handling them, suggesting that they have begun to treat pictures as symbols, rather than as objects in themselves (Pierroutsakos & DeLoache, 2003).
Language can aid children in understanding the representative nature of pictures, as verbal labels provide clues about how 2-D visual symbols relate to referents in the world (Callaghan, 2000; Ganea et al., 2008). When testing 2-year-olds, Preissler and Bloom (2007) demonstrated that labelling a picture of an unfamiliar object (‘this is a wug. Can you show me another one?’) directed children to identify the symbolised object 90% of the time, whereas children only identified symbolised referents 30% of the time when pictures were not labelled (‘look at this. Can you show me another one?’). As children quickly learn that verbal labels refer to objects in the world, the act of labelling cues children to view pictures as symbolic representations rather than objects. Children aged 15, 18, and 24 months will spontaneously extend a novel label (e.g. ‘whisk’) taught using a picture to its corresponding 3-dimensional referent (e.g. an actual whisk; Ganea et al., 2009; Preissler & Carey, 2004). These findings show that young children understand that verbal labels paired with pictures refer to independently existing referents, and also that the pictures themselves are representational and not the exclusive referents for their associated labels. However, language itself is a symbol system that caregivers heavily invest in, going to considerable lengths to teach their children words. Children may thus learn verbal representations for concepts (e.g., understanding how the label ‘dog’ relates to the world) before they learn how pictures or other symbols relate to the same concept. Callaghan (2000) explicitly demonstrated that children use verbal labels to scaffold their understanding of pictures and objects, but also that this differs according to age. Children were shown a series of line drawings and were asked to identify the referent of each drawing from a pair of 3-dimensional objects (with the picture removed from view).
In some trials, linguistic scaffolding was unavailable, as the two paired objects had the same category label (e.g. two types of dog). In other trials, linguistic scaffolding could be used, as the two paired objects had distinct category labels (e.g. dog and cat). The study demonstrated that 2.5-year-olds only performed above chance when pictures could be unambiguously matched to objects using distinct verbal labels, while 3-year-olds performed above chance even without linguistic scaffolding. For younger children, whose understanding of the pictorial symbol system was relatively fragile, verbal labels were valuable in bridging the gap between images and their depicted referents. Older children, however, were able to rely on the perceptual similarities between images and their referents to accurately identify picture-object relationships in the absence of linguistic scaffolding.

More broadly, language may provide a basis for other symbol systems during development (Callaghan, 2020; Nelson, 2007; Tomasello, 2003). A meta-analysis of symbolic play studies found a significant interaction for symbolic play between age and whether expressive or receptive language measures were used (35 studies; p = .006; Quinn et al., 2018). This demonstrated that symbolic play was related to concurrent receptive measures in children under 3-years-old (r = .41), whereas concurrent expressive measures better predicted symbolic play in studies of children over 3-years-old (r = .36). However, this interaction was driven by a difference in effect sizes for receptive, rather than expressive vocabulary, as the expressive effect size remained stable across ages, making any differential effects at different ages hard to clearly identify.
As picture comprehension and symbolic play skills appear to be closely related (Rochat & Callaghan, 2005), any differential effects of emerging language ability on symbolic play may also affect picture comprehension at different ages.

Few studies have assessed how picture comprehension and language skills inter-relate during early development. Of these studies, some have found different effects of receptive and expressive language ability on pictorial understanding and wider symbolic ability. Callaghan and Rankin (2002) assessed picture comprehension and production at 28, 36, and 42 months. The authors found that picture comprehension scores positively correlated with receptive language and picture production scores positively correlated with expressive language. Kirkham et al. (2013) also assessed the relationship between language, pictorial understanding, and symbolic play. They found that the mean length of five longest utterances (MLU5) at 4 years predicted symbolic play and representational drawing ability at 5 years, and that receptive and expressive language scores combined at age 4 predicted symbolic play at age 5. Receptive vocabulary has also been found to correlate with performance on scale model and picture search tasks (finding a hidden object in a room based on the location of a miniature object positioned in a scale model of the same room or searching a location represented by a picture; Hartley & Allen, 2015a; Homer & Nelson, 2009).

In summary, cross-sectional and longitudinal evidence shows that linguistic and non-linguistic symbolic domains are developmentally inter-related. Verbal labelling scaffolds symbolic understanding of picture-object relationships (Callaghan, 2000; Ganea et al., 2009) and expressive and receptive language abilities correlate with pictorial tasks, but may exhibit different effects at different ages (Callaghan & Rankin, 2002; Kirkham et al., 2013).
However, we do not know whether early language delays cause deficits in picture comprehension over time. Furthermore, in typical development, differential effects of expressive and receptive language on symbolic ability have proved difficult to identify (Quinn et al., 2018). Studying productive language impairments provides a unique opportunity to explore how receptive and expressive language skills interact differentially with pictorial understanding over time.

Language and picture comprehension in atypical development

Late talking (LT) children are defined as 18–30-month-old children at or below the 10th percentile of expressive vocabulary compared to other children their age, without neurodevelopmental or sensory deficits (Fisher, 2017). Although the majority of LT children recover by approximately 5 years (Rescorla, 2011), a minority, between ~12% and 25% (Collisson et al., 2016; Henrichs et al., 2011; Reilly et al., 2010; Roulstone et al., 2002; Zubrick et al., 2007), develop Developmental Language Disorder (DLD). Although many LT children reach the neurotypical range for expressive vocabulary by school age, they consistently score on the lower end of this range across a variety of language measures (Domsch et al., 2012; Rescorla, 2002, 2005; Rescorla et al., 2000; Rice et al., 2008). LT children are characterised by expressive vocabulary deficits, yet can have varying receptive vocabulary skills (Fisher, 2017), whereas expressive and receptive vocabulary are more tightly intertwined in typically developing (TD) children.

Evidence that expressive and receptive vocabulary might exert differential effects on pictorial understanding can be found in autism spectrum disorder (ASD) studies, as children with ASD often exhibit a range of language difficulties (Eigsti et al., 2011).
Studies that test the extension of words from pictures to symbolised referents in minimally verbal children with ASD who are matched with TD children on receptive vocabulary have found deficits in the ASD sample (mean receptive age ~ 3.5-years-old; Hartley & Allen, 2015b; Preissler, 2008). However, when adapting Callaghan’s (2000) linguistic scaffolding task for TD and ASD samples that were matched on both expressive and receptive language (mean ~ 4.5-years-old), Hartley et al. (2019) found that children with ASD and TD children performed identically across all trial types. Both samples showed lower accuracy on trials where they could not use verbal labels relative to trials where they could. In the ASD sample, both receptive and expressive language predicted task performance; in the TD sample, only receptive language was predictive. These studies suggest that children with expressive, but not receptive, deficits might struggle with utilising verbal scaffolding in pictorial understanding tasks.

One possible explanation for differential effects of receptive and expressive language on pictorial understanding is simply that children who say less experience fewer opportunities to participate in social situations where pictures are utilised. Many accounts of symbolic understanding rely on a foundation of socio-cognitive skills, such as imitation and intention reading (Nelson, 2007; Rakoczy et al., 2005; Rochat & Callaghan, 2005; Tomasello, 2003, 2010; Vygotsky, 1980). For example, Rochat and Callaghan (2005) argue that pictures are inherently communicative, and understanding them is driven by a ‘basic affiliative need’ to communicate and identify with other humans.
They describe pictorial understanding development in stages that are built on social factors, beginning from infants (12-months-old) who imitate the actions of adults when given pictorial symbols, to toddlers (2 – 4-years-old) who use social scaffolding through language and imitation to understand pictures, and finally to school-aged children (4 – 5-years-old) who begin to understand not only symbol-referent relations, but also the intentions of the symbol-creator. Differences in socio-cognitive ability may contribute towards some of the differences in pictorial understanding found in ASD (Hartley & Allen, 2014) and may also be affected by a delay in expressive vocabulary (although directionality in LT is difficult to specify).

Expressive delay could potentially reduce opportunities to learn from caregivers that verbal labels are used to scaffold picture comprehension, and result in LT children having less practice in applying a linguistic strategy. Caregivers of children with expressive language delay have been found to provide less complex recasts (Conti-Ramsden, 1990), less lexical and prosodic information (D’Odorico & Jacob, 2006), and produce fewer expansions, less self-directed speech, and fewer general responses (Vigil et al., 2005). Others have found no difference in maternal input, but rather found that as LT children simply say less, caregivers have less to expand upon (Paul & Elwood, 1991). Outcome studies also suggest that there may be social impairments associated with expressive language delay, with some finding lower social competency in LT children (Horwitz et al., 2003; Longobardi et al., 2016). Overall, despite socio-cognitive skills forming the basis of theoretical accounts of symbolic development, we do not know how individual differences in social ability in LT children might interact with pictorial understanding.
It is possible that the impact of expressive language delay on the availability of social scaffolding, or vice versa, may affect pictorial understanding. Equally, social ability may well compensate for deficits in expressive vocabulary; LT children who are more socially orientated may invite more social scaffolding behaviour than those who are not.

The current study

In sum, there are three distinct areas in which further research is necessary. Firstly, although symbols form a key part of communication throughout life, and TD children use language before 3 – 4-years-old to scaffold their understanding of pictorial symbols, we do not know how early language delay affects picture comprehension in the absence of ASD. No studies to date have investigated how linguistic scaffolding of pictorial understanding might be affected in LT children, and other research suggests that language delay might be related to differences in symbolic play (Lyytinen et al., 2001; Rescorla & Goossens, 1992). As symbolic play, pictorial understanding, and language are developmentally inter-related (Callaghan & Rankin, 2002; Kirkham et al., 2013), LT children may also exhibit deficits in pictorial understanding.

Secondly, despite evidence from typical and atypical populations that expressive and receptive vocabulary might have different effects as pictorial understanding develops, very few studies have probed this relation directly. This means we do not know how emerging language skills interact with pictorial understanding at different ages.

Thirdly, regardless of theoretical literature maintaining that social scaffolding and language are crucial to pictorial understanding, the relationships between individual social ability, language delay, and pictorial understanding have not been directly investigated in populations with isolated expressive language delay.
We address these issues by adapting Callaghan’s (2000) verbal scaffolding picture comprehension task in a longitudinal study of LT and TD children. We manipulated the availability of verbal labels when asking children to match pictures to real objects and assessed their concurrent language skills at 2 – 2.4-years-old (timepoint 1; T1) and 3.5 – 3.9-years-old (timepoint 2; T2). We also considered the effect of social ability measured at 2 – 2.4-years-old. We hypothesised that LT children would respond less accurately than TD children when linguistic scaffolding is available, and on par with TD children in conditions when linguistic scaffolding is inaccessible. We also hypothesised that expressive vocabulary at both T1 and T2 would positively predict picture comprehension accuracy, and that receptive vocabulary would be positively correlated with expressive vocabulary. We originally hypothesised that LT children who did not recover (i.e. they failed to reach the TD expressive vocabulary range by the second timepoint at 3.5 – 3.9-years-old) would perform less accurately with linguistic scaffolding, and that LT children who did recover (i.e. they reached the TD expressive range by T2) would perform on par with TD children (see preregistrations). However, as we could only test half of the original sample due to COVID-19, resulting in small subgroups, we utilised concurrent expressive vocabulary as a continuous variable across all participants at T2. As an exploratory analysis, we also hypothesised that children with less sophisticated social ability would score lower on picture comprehension accuracy.

Material and Methods

Participants

Participants were part of a longitudinal project intended to capture differences between LT and TD children for 18 months, between the ages of 2 – 2.4-years-old to 3.5 – 3.9-years-old. The picture comprehension task was administered at the first and last time points.
Participants were recruited using flyers from Lancaster Babylab, via health visitors in the Lancashire local authority, and from nurseries in the local area. Once consent to contact was obtained, parents completed the Oxford-CDI (Hamilton et al., 2000), a parent-reported checklist of words the child says and understands. Children were included in the study if they met one of the following criteria: TD with productive vocabulary score ≥ 25th percentile, or LT with productive vocabulary score ≤ 10th percentile. These criteria were chosen to ensure two distinct groups, with the LT criterion consistent with prior literature (Fisher, 2017). Inclusion criteria also included monolingual British English, with no history of developmental or sensory delays or disorders. A total of 85 families completed the CDI; of these, 24 were excluded due to the aforementioned criteria. A total of 61 children (40 TD and 21 LT) took part in the study at the first time point aged 2.0 – 2.4-years-old (T1); however, 2 TD children did not complete the pictorial understanding task due to fussiness and so were excluded from the final sample of 59 children (38 TD and 21 LT). At 3.5 – 3.9-years-old (18 months from baseline; T2) a total of 29 children (20 TD and 9 LT) were tested before the COVID-19 pandemic halted all face-to-face testing.

Questionnaires

Participants completed the Oxford-CDI (Hamilton et al., 2000) at consent-to-contact. Caregivers completed a demographics questionnaire and the Preschool Social Responsiveness Scale-2 (SRS-2; Constantino & Gruber, 2012) at the T1 test visit. The SRS-2 is a caregiver-reported list of behaviours rated by frequency of occurrence that is often used to identify children at risk of autism spectrum disorder (ASD).
However, as it is designed to discriminate typical and atypical behaviours at a young age and was normed on a well-distributed sample of children aged 30 – 54-months-old (albeit in the US), it was used as a measure of individual social proficiency (raw scores) within our sample. The experimenter conducted the Receptive and Expressive One-Word Picture Vocabulary Tests (ROWPVT-4 and EOWPVT-4 respectively; Martin & Brownell, 2011) and the Leiter-3 Non-Verbal IQ 4-subscore scale (Roid et al., 2013) at the T2 test visit. The ROWPVT-4 and EOWPVT-4 are picture-based vocabulary tests; the ROWPVT-4 is administered by showing four pictures to a child and asking them to identify a named picture (e.g. ‘which is the rabbit?’), whereas the EOWPVT-4 is administered by showing a single picture to a child and asking them to name the picture (e.g. ‘what’s this?’). The Leiter-3 Non-Verbal IQ 4-subscale involves a series of core subtests that measure reasoning, visualisation, and problem solving without the need to utilise verbal instructions, allowing assessment of non-verbal IQ in children with low language ability.

Picture comprehension task

Objects: We adapted Callaghan’s (2000) picture comprehension task, using the same criteria for selecting relevant stimuli. There were 32 different objects in total, split into 16 pairs. For each condition, there were four trials, one from each of four groups: animals, natural, household/indoor artifacts, and vehicles (16 trials in total). For the Familiar-Matched Label condition, pairs of familiar objects had the same basic label (e.g. dog) but different subordinate labels (e.g. German Shepherd and Burmese Mountain). For the Unfamiliar-Matched Label condition, pairs of unfamiliar objects had the same basic label (e.g. coral) but different subordinate labels (e.g. elkhorn coral and encrusting coral). For the Familiar-Distinct Label condition, pairs of familiar objects had the same global label (e.g.
animal) but different basic labels (e.g. cat and rabbit). For the Unfamiliar-Distinct Label condition, pairs of unfamiliar objects had the same global label (e.g. vehicle) but different basic labels (e.g. quadbike and jet ski). We ensured that perceptual discriminability of paired objects was similar across trial types and stimuli groups. For sets of animals, different fur colours and poses were chosen (sitting vs. standing dogs); for artifacts, different colours, materials and shapes were chosen, and so on. All objects were roughly the same size. Caregivers were consulted prior to participation on their children’s familiarity with the test objects, and the age-norms for objects were checked using Fenson et al. (1994; familiar objects: M age = 13.92 months-old, range = 10–16-months-old). Example stimuli can be seen in Figure 1 (see Appendix A for all stimuli used).

Pictures: Sixteen black and white laminated cards were used that had a simple black pen drawing of one familiar or unfamiliar object.

Display: Objects were placed on a tray with a deep lid that had a handle and a cut out at the back that allowed the experimenter to rearrange objects out of sight of the child. The objects remained hidden until the experimenter lifted the lid to reveal the two objects sitting on the tray.

Procedure: We adapted Callaghan's (2000) picture comprehension task that manipulates the availability of linguistic scaffolding. We manipulated the label for choice objects across conditions where it could not be used (Matched Label trials; two objects with the same basic label, e.g. two types of dog) and conditions where it could be (Distinct Label trials; two objects with different basic labels, e.g. rabbit and cat). We also manipulated the familiarity of objects depending on the child’s knowledge of the labels and objects (Familiar and Unfamiliar) within the Matched Label and Distinct Label trials.
The order of trial types was randomised per participant, with no more than two trials of the same type presented consecutively. In Familiar-Distinct Label trials, linguistic scaffolding is possible: if the participant generates a label when the target is cued, they can achieve a correct response by identifying the target object based on its matching label, rather than responding based on perceptual similarity alone. However, this strategy is unavailable in the other conditions when both referent objects share the same known label as the picture (Familiar-Matched Label), or labels are unknown (Unfamiliar trials).

The task was administered at T1 and T2. Participants were tested with the same mobile set-up for the task, either at the participant’s home or in a designated room at the Babylab depending on the family’s preference. Where visits took place at home, care was taken to ensure a clear space and a quiet environment with just the experimenter, child, and caregiver present. During the task, the child and experimenter were sitting on opposite sides of a 1-metre wide, low fold-out table. The experimenter held up the relevant trial picture card (e.g. cat) and said “Look!”. The picture was presented for 4 seconds before being removed from view. The experimenter then lifted the lid of the box to reveal the two relevant trial objects, one of which resembled the picture (e.g. cat), and the other, a paired foil object (e.g. rabbit). On displaying the objects, the experimenter asked “Which one is the same as the picture?” The trial ended when the child made a response (either by pointing with fingers or palm, or picking up the relevant object).

Figure 1. Example of stimuli and trial types used: a) Familiar-Matched Label; b) Unfamiliar-Matched Label; c) Familiar-Distinct Label; d) Unfamiliar-Distinct Label.
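The trial structure and the constraint on trial order described in the Procedure can be illustrated with a short sketch. It is written in Python for illustration only and is not the authors' code. The use of rejection sampling (reshuffling until the constraint holds) is an assumption, since the text does not say how the constrained order was generated, and the names `CONDITIONS`, `GROUPS`, `max_run`, and `randomised_order` are hypothetical.

```python
import random
from itertools import product

# 2 x 2 design from the Procedure: object familiarity x label type.
CONDITIONS = [
    f"{familiarity}-{label}"
    for familiarity, label in product(
        ("Familiar", "Unfamiliar"), ("Matched Label", "Distinct Label")
    )
]
# One trial per stimulus group in each condition (16 trials in total).
GROUPS = ("animals", "natural", "household/indoor artifacts", "vehicles")


def max_run(trials):
    """Length of the longest run of consecutive trials sharing a trial type."""
    longest = current = 1
    for previous, following in zip(trials, trials[1:]):
        current = current + 1 if previous[0] == following[0] else 1
        longest = max(longest, current)
    return longest


def randomised_order(seed=None, limit=2):
    """Shuffle the 16 trials until no trial type occurs more than `limit` times in a row.

    Rejection sampling is an ASSUMPTION made for this sketch; the source does
    not describe how the constrained order was produced.
    """
    rng = random.Random(seed)
    trials = list(product(CONDITIONS, GROUPS))  # (trial type, stimulus group)
    while True:
        rng.shuffle(trials)
        if max_run(trials) <= limit:
            return trials


order = randomised_order(seed=1)
```

Each call returns a list of 16 (trial type, stimulus group) pairs; passing a different `seed` per participant would give each child their own order.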
Results

All data and code can be found at: https://osf.io/ywmx5/?view_only=14f51c730c4c47758893bc684d7cebf5, alongside preregistrations with a document that explains deviations due to the COVID-19 pandemic.

Overview of analyses

We first describe the sample and report descriptive analyses of the data using Welch one-sample t-tests, identifying how TD and LT children performed against chance. We then conducted three analyses to assess our research hypotheses. The first tested the longitudinal predictive effect of LT status over time using generalised linear mixed effects modelling (GLME). The second tested whether receptive and expressive vocabulary measures could predict performance cross-sectionally at different ages (T1 and T2) using GLME analyses and a post-hoc mediation analysis. The third assessed whether social ability at T1 had any additive predictive value on accuracy at T1 or T2 by comparing GLME model fits to the data, with and without social ability, and by using a post-hoc mediation analysis.

Sample

Table 1 contains T1 and T2 final sample demographics, questionnaire, and vocabulary scores. Almost all families identified as White British and the majority (92%) had at least one parent with an education level corresponding to an undergraduate University degree or above. Due to the COVID-19 pandemic halting all face-to-face testing, only 29 of the original 59 children were tested at T2. TD and LT children did not differ in SRS-2 (t(46.39) = 1.35, p = .183) or Leiter-3 scores (t(10.02) = -1.45, p = .178). In order to be included in the study, children had to exceed a Leiter-3 cut-off score of 79, which corresponds to the upper boundary of “low” performance associated with the test (Roid et al., 2013); no children were excluded on this basis.

Table 1.
Mean and standard deviation for demographic, questionnaire, and vocabulary scores for samples at first timepoint (T1) and second timepoint (T2).

Timepoint                     T1: 2.0 – 2.4-years-old (N = 59)      T2: 3.5 – 3.9-years-old (N = 29)
                              TD (n = 38)       LT (n = 21)         TD (n = 20)       LT (n = 9)
Age (years)                   2.2 (0.12)        2.2 (0.12)          3.7 (0.12)        3.8 (0.15)
Gender (ratio, m : f)         16 : 22           14 : 7              7 : 13            7 : 2
Receptive vocabᵃ              384.00 (38.00)    258.00 (93.40)      119.45 (5.35)     111.11 (7.54)
Expressive vocabᵃ             331.00 (73.20)    60.00 (49.50)       122.75 (8.43)     108.11 (12.90)
Social ability (SRS-2)ᵇ       27.90 (12.40)     32.10 (10.80)       -                 -
Non-verbal IQ (Leiter-3)ᶜ     -                 -                   98.55 (6.68)      92.00 (12.80)

LT = late talker; SRS-2 = Social Responsiveness Scale-2; TD = typically developing
ᵃ T1: Communicative-Development Inventories (raw scores); T2: Receptive/Expressive One-Word Picture Vocabulary Tests (standardised scores).
ᵇ Higher scores indicate lower responsiveness/ability. Raw scores were used.
ᶜ Standardised scores were used.

Descriptive analyses

We used Welch’s one-sample t-tests to compare each population’s overall picture comprehension accuracy, and accuracy on each trial type, against chance (50%). The data were checked for outliers using the rstatix package in R, and were confirmed to be normally distributed using the Shapiro-Wilk test.

At the first timepoint (T1), when participants were aged 2.0 – 2.4-years-old, TD children performed significantly above chance overall (M = 0.60; t(37) = 4.64, p < .001). TD children performed below chance on the Familiar-Matched Label trials, but above chance on all other trial types: Familiar-Distinct Label (p < .001), Unfamiliar-Matched Label (p = .007), and Unfamiliar-Distinct Label (p = .004; Figure 2). The difference between Familiar-Matched Label (M = 0.42) and Familiar-Distinct Label (M = 0.72) trials demonstrated that TD children were able to use verbal labels to scaffold their understanding of pictures and objects.
In line with Callaghan (2000), children responded accurately when objects were familiar and had different basic labels, but responded inaccurately when familiar objects shared the same basic label. Performance on the Unfamiliar trial types indicated that when objects were unfamiliar, children were also able to utilise perceptual similarities between pictures and objects to select the correct object. Not knowing the basic or subordinate label in these conditions was thus advantageous, as it enabled them to utilise perceptual similarity.

LT children did not perform significantly above chance overall (M = 0.53; t(20) = 1.44, p = .083). They performed below or at chance in Familiar-Matched Label, Unfamiliar-Matched Label and Unfamiliar-Distinct Label trials (Figure 2). They performed above chance on Familiar-Distinct Label trials (M = 0.59; p = .021), and at a level similar to TD children in Familiar-Matched Label trials (M = 0.42). This suggests that LT children were able to use linguistic labels to support their picture-object matching when they were available. Scoring below chance when objects were unfamiliar suggested that LT children struggled to match pictures to unfamiliar objects based upon perceptual similarities alone.

At the second timepoint (T2), when participants were aged 3.5 – 3.9-years-old, both TD and LT children performed above chance overall (Figure 2; TD: M = 0.80; t(19) = 10.93, p < .001; LT: M = 0.73; t(8) = 4.80, p < .001). Both TD and LT children performed at chance in Familiar-Matched Label trials, but significantly above chance in all other trial types. These results indicate that LT children were largely able to utilise both perceptual information and linguistic labels in the task at T2.

* p < .05, ** p < .01, *** p < .001; p-values to 3 decimal places; within-group one-sample Welch t-tests against chance (50%)
Figure 2.
Mean accuracy and standard error at test across trial types per group over time. Trial types: Familiar = known objects to the child; Unfamiliar = unknown objects to the child; Matched Label = object pairs with the same global label and same basic label, inhibiting verbal scaffolding; Distinct Label = object pairs with the same global label and different basic labels, allowing verbal scaffolding.

General linear mixed effects model analyses
All GLME analyses were undertaken with the same procedure. All models predicted child task accuracy as the dependent variable, and were built in R [version 1.1.463] using the glmer function in the package lme4 (Bates et al., 2015). Models were built up sequentially, adding in one fixed effect at a time and comparing each model to the previous best-fitting model using log likelihood tests. Each model was built up from a null model containing random effects of participant and target. Random slopes of participant per target failed to converge. Where longitudinal data was analysed, we also attempted to fit a random slope of timepoint per participant, but this failed to converge. To analyse fixed effects of trial type, we coded them as follows: object familiarity: Unfamiliar coded as 0, and Familiar coded as 1; and language scaffolding: Matched Label coded as 0, and Distinct Label coded as 1. Due to the number of analyses conducted, only results from best-fitting models that found significant effects of variables of interest are reported here. All best-fitting models were tested for normality and overdispersion¹ and can be viewed on the Open Science Framework (https://osf.io/ywmx5/?view_only=14f51c730c4c47758893bc684d7cebf5).

All post-hoc mediation analyses were undertaken using the mediation package in R [version 1.1.463] (Tingley et al., 2014). For each analysis, 1000 simulations were used to estimate model effects using the quasi-Bayesian Monte Carlo method (Imai et al., 2010).
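The sequential model comparison described above rests on a likelihood-ratio test: twice the difference in log-likelihood between nested models is referred to a chi-square distribution whose degrees of freedom equal the number of added parameters. The sketch below illustrates that calculation with the Python standard library only. It is not the authors' R code; the function names and the log-likelihood values are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        terms = (half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * sum(terms)
    # Odd df: complementary error function plus a finite sum.
    terms = (half ** (i + 0.5) / math.gamma(i + 1.5)
             for i in range((df - 1) // 2))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * sum(terms)


def likelihood_ratio_test(loglik_reduced, loglik_full, df_diff):
    """Return (chi-square statistic, p-value) for two nested models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)


# Hypothetical log-likelihoods for a reduced model and a model with
# two additional parameters (NOT values from the study).
stat, p_value = likelihood_ratio_test(-410.0, -405.43, df_diff=2)
print(f"chi2(2) = {stat:.2f}, p = {p_value:.3f}")
```

A small p-value indicates that the larger model fits the data better than would be expected from the extra parameters alone, which is the criterion used to retain an added fixed effect.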
¹ Due to the disparate scales utilised for each measure (i.e. accuracy as 0 or 1, and vocabulary as 0 – 416), in some cases convergence warnings were issued when fitting GLME analyses. Where this occurred, vocabulary measures were scaled by dividing the vocabulary score by 100, so they were on a closer scale to accuracy. This is indicated in the Tables reporting GLME result estimates. Please see R code on OSF for more details.

Does late talking status predict symbolic picture comprehension over time?
We conducted a GLME analysis in which fixed effects of population and timepoint were added to those of trial type. The best-fitting model to the data contained fixed effects of timepoint, population, language scaffolding and object familiarity, with an interaction between language scaffolding and object familiarity, and random effects of participant and target (Table 2; χ²(3) = 13.51, p = .004).

The pattern of accuracy for each trial type was consistent over both timepoints: relative to Unfamiliar-Matched Label trials where objects were unfamiliar and had the same unknown category label, children performed significantly less accurately in the Familiar-Matched Label condition where the objects were familiar and shared the same known category label (p < .001). The significant two-way interaction was caused by a significant difference between trial types involving familiar, but not unfamiliar, objects: participants performed significantly more accurately in the Familiar-Distinct Label condition where verbal scaffolding could assist children’s mapping of pictures to familiar objects (p < .001). Performance was highest in Familiar-Distinct Label trials and least accurate in Familiar-Matched Label trials (Figure 2), consistent with Callaghan (2000). Children performed similarly to Unfamiliar-Matched Label trials in the Unfamiliar-Distinct Label trials (when objects were unfamiliar but had different basic category labels; p = .603).
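To make the log-odds estimates easier to interpret, the sketch below converts the fixed-effect estimates reported in Table 2 into predicted probabilities of a correct response for each trial type. It is an illustration under stated assumptions: random effects of participant and target are ignored, so the values are for a notional average child and item rather than fitted values from the model, and the names `COEF` and `predicted_accuracy` are hypothetical.

```python
import math

# Fixed-effect estimates (log-odds) as reported in Table 2.
COEF = {
    "intercept": 0.34,            # Unfamiliar, Matched Label, LT, T1
    "familiar": -0.97,
    "distinct": -0.13,
    "familiar_x_distinct": 1.34,
    "timepoint_t2": 0.99,
    "population_td": 0.34,
}


def predicted_accuracy(familiar, distinct, t2, td):
    """Inverse-logit of the linear predictor; all arguments are 0/1 codes."""
    eta = (COEF["intercept"]
           + COEF["familiar"] * familiar
           + COEF["distinct"] * distinct
           + COEF["familiar_x_distinct"] * familiar * distinct
           + COEF["timepoint_t2"] * t2
           + COEF["population_td"] * td)
    return 1.0 / (1.0 + math.exp(-eta))


# Predicted accuracy per trial type for an LT child at T1 (random effects ignored).
for familiar in (0, 1):
    for distinct in (0, 1):
        label = (("Familiar" if familiar else "Unfamiliar") + "-"
                 + ("Distinct" if distinct else "Matched") + " Label")
        print(f"{label}: {predicted_accuracy(familiar, distinct, 0, 0):.2f}")
```

Because the interaction term is positive and larger than the negative Familiar effect, the predicted accuracy is highest for Familiar-Distinct Label trials and lowest for Familiar-Matched Label trials, mirroring the pattern described in the text.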
The added effect of timepoint indicated that participants performed significantly more accurately at age 3.5 – 3.9-years-old as compared to 2.0 – 2.4-years-old (p < .001), and the effect of population indicated that TD children performed significantly more accurately than LT children when data from both timepoints were combined (p = .022).

Thus, the longitudinal analysis indicated that there was a predictive effect of late-talking status on performance across time, with LT children attaining lower accuracy scores overall when total performance was assessed across both timepoints. However, as there were no interactions between trial type and population, the results also suggested that the facilitative effect of linguistic scaffolding was stable across both populations.

Table 2. Longitudinal analysis of task accuracy over time: summary table of best-fitting model from general linear mixed effect model analyses predicting accuracy over time, using fixed effects of trial type (object familiarity and language scaffolding), population (TD or LT) and timepoint.

| Fixed effect | Estimate | SE | z-value | p-value |
| --- | --- | --- | --- | --- |
| (Intercept)ᵃ | 0.34 | 0.20 | 1.67 | .094 |
| Familiar | -0.97 | 0.24 | -3.96 | <.001 |
| Distinct Label | -0.13 | 0.25 | -0.52 | .603 |
| Familiar * Distinct Label | 1.34 | 0.35 | 3.82 | <.001 |
| Timepoint (T2: 3.5 – 3.9-years-old) | 0.99 | 0.14 | 7.04 | <.001 |
| Population (TD) | 0.34 | 0.15 | 2.28 | .022 |

LT = late talker; TD = typically developing
ᵃ Intercept corresponds to Matched Label (no language scaffolding; 0) and Unfamiliar (object unknown to child; 0), population LT, and timepoint T1 (2.0 – 2.4-years-old).

How do concurrent receptive and expressive vocabulary contribute to picture comprehension at different ages?
Receptive vocabulary: We conducted three separate GLME analyses to identify the effects of receptive vocabulary on cross-sectional task performance at T1 and T2, collapsing across LT and TD data.
For all analyses, fixed effects of trial type were used; only fixed effects of receptive vocabulary differed. When predicting T1 task performance, T1 receptive vocabulary (CDI) was used. When predicting T2 task performance, one model tested the effect of prior T1 receptive vocabulary (CDI), and the other tested the effect of T2 receptive vocabulary (ROWPVT-4).

At T1, there was an added effect of concurrent receptive vocabulary to that of trial type predicting task performance (Table 3; model comparison: χ²(2) = 9.14, p = .010). This indicated that children with higher concurrent receptive vocabularies performed significantly more accurately at 2 – 2.4-years-old (p = .038). At T2, there was no added predictive effect of concurrent receptive vocabulary (ROWPVT-4) to that of trial type, and no interactions were found. However, prior receptive vocabulary at T1 did predict accuracy in addition to the effect of trial type (Table 3; model comparison: χ²(3) = 10.11, p = .018), showing that children with higher receptive vocabularies at 2.0 – 2.4-years-old performed more accurately on the picture comprehension task when they were 3.5 – 3.9-years-old (p = 0.18).

Table 3. Cross-sectional analyses of predictive effect of receptive vocabulary on task accuracy: summary of best-fitting model from general linear mixed effect model analyses predicting T1 and T2 accuracy with fixed effects of trial type (object familiarity and language scaffolding) and T1 receptive vocabulary.

T1: age 2.0 – 2.4-years-old

| Fixed effect | Estimate | SE | z-value | p-value |
| --- | --- | --- | --- | --- |
| (Intercept)ᵃ | -0.13 | 0.31 | -0.42 | .674 |
| Familiar | -0.75 | 0.21 | -3.51 | <.001 |
| Distinct Label | 0.06 | 0.21 | 0.28 | |
| Familiar * Distinct Label | 0.97 | 0.30 | 3.24 | |
| T1 receptive vocabulary (CDI)ᵇ | 0.16 | 0.08 | 2.07 | |


Introduction
The use of symbols is a uniquely human cognitive hallmark and is vital to communication (DeLoache, 1995; Tomasello et al., 2005). A symbol is something that someone intends to represent something else, and can take many forms including gestures, graphics, text, words, and maps (DeLoache, 2004). Children are immersed in a symbolic world from infancy, and the types of symbols children understand are subject to both cultural context and social scaffolding (Callaghan et al., 2011; Rakoczy et al., 2005).
Children in Western societies are exposed to pictures from an early age. Children use linguistic labels to scaffold their understanding of pictures (Callaghan, 2000), and the development of language and other symbolic domains, such as symbolic play, are closely related (Quinn et al., 2018). This means that early language impairments have the potential to also affect children's understanding of non-linguistic symbol systems. Although the literature has established that symbolic understanding, language ability, and social context interact in typical development (Callaghan & Corbit, 2015), we do not fully understand how these domains affect each other over time. Furthermore, their trajectory in atypical development is not well defined, and the effect of language delay on how children understand pictures remains under-investigated. Examining the effect of language delay on picture comprehension is crucial to understanding whether children with these difficulties have functional impairments in additional symbolic domains, and also offers an opportunity to elucidate how language scaffolds symbolic understanding during development.

Language and picture comprehension in typical development
In order for children to understand pictures as symbols, they need to acquire dual representation; the understanding that a symbol is not just an object, but also a representation of something else (DeLoache, 2004). At 9-months-old, infants manually investigate pictures as if they were real objects, grasping and plucking at depicted items. By 18-months-old, they begin pointing and talking about pictures rather than handling them, suggesting that they have begun to treat pictures as symbols, rather than as objects in themselves (Pierroutsakos & DeLoache, 2003).
Language can aid children in understanding the representative nature of pictures, as verbal labels provide clues about how 2-D visual symbols relate to referents in the world (Callaghan, 2000; Ganea et al., 2008). When testing 2-year-olds, Preissler and Bloom (2007) demonstrated that labelling a picture of an unfamiliar object ('this is a wug. Can you show me another one?') directed children to identify the symbolised object 90% of the time, whereas children only identified symbolised referents 30% of the time when pictures were not labelled ('look at this. Can you show me another one?'). As children quickly learn that verbal labels refer to objects in the world, the act of labelling cues children to view pictures as symbolic representations rather than objects. Children aged 15, 18, and 24 months will spontaneously extend a novel label (e.g. 'whisk') taught using a picture to its corresponding 3-dimensional referent (e.g. an actual whisk; Ganea et al., 2009; Preissler & Carey, 2004). These findings show that young children understand that verbal labels paired with pictures refer to independently existing referents, and also that the pictures themselves are representational and not the exclusive referents for their associated labels.
However, language itself is a symbol system that caregivers heavily invest in, going to considerable lengths to teach their children words. Children may thus learn verbal representations for concepts (e.g., understanding how the label 'dog' relates to the world) before they learn how pictures or other symbols relate to the same concept. Callaghan (2000) explicitly demonstrated that children use verbal labels to scaffold their understanding of pictures and objects, but also that this differs according to age. Children were shown a series of line drawings and were asked to identify the referent of each drawing from a pair of 3-dimensional objects (with the picture removed from view). In some trials, linguistic scaffolding was unavailable, as the two paired objects had the same category label (e.g. two types of dog). In other trials, linguistic scaffolding could be used, as the two paired objects had distinct category labels (e.g. dog and cat). The study demonstrated that 2.5-year-olds only performed above chance when pictures could be unambiguously matched to objects using distinct verbal labels, while 3-year-olds performed above chance even without linguistic scaffolding. For younger children, whose understanding of the pictorial symbol system was relatively fragile, verbal labels were valuable in bridging the gap between images and their depicted referents. Older children, however, were able to rely on the perceptual similarities between images and their referents to accurately identify picture-object relationships in the absence of linguistic scaffolding.
More broadly, language may provide a basis for other symbol systems during development (Callaghan, 2020; Nelson, 2007; Tomasello, 2003). A meta-analysis of symbolic play studies found a significant interaction for symbolic play between age and whether expressive or receptive language measures were used (35 studies; p = .006; Quinn et al., 2018). This demonstrated that symbolic play was related to concurrent receptive measures in children under 3-years-old (r = .41), whereas concurrent expressive measures better predicted symbolic play in studies of children over 3-years-old (r = .36). However, this interaction was driven by a difference in effect sizes for receptive, rather than expressive vocabulary, as the expressive effect size remained stable across ages, making any differential effects at different ages hard to clearly identify. As picture comprehension and symbolic play skills appear to be closely related (Rochat & Callaghan, 2005), any differential effects of emerging language ability on symbolic play may also affect picture comprehension at different ages.
Few studies have assessed how picture comprehension and language skills inter-relate during early development. Of these studies, some have found different effects of receptive and expressive language ability on pictorial understanding and wider symbolic ability. Callaghan and Rankin (2002) assessed picture comprehension and production at 28, 36, and 42 months. The authors found that picture comprehension scores positively correlated with receptive language and picture production scores positively correlated with expressive language. Kirkham et al. (2013) also assessed the relationship between language, pictorial understanding, and symbolic play. They found that the mean length of five longest utterances (MLU5) at 4 years predicted symbolic play and representational drawing ability at 5 years, and that receptive and expressive language score combined at age 4 predicted symbolic play at age 5. Receptive vocabulary has also been found to correlate with performance on scale model and picture search tasks (finding a hidden object in a room based on the location of a miniature object positioned in a scale model of the same room or searching a location represented by a picture; Hartley & Allen, 2015a; Homer & Nelson, 2009).
In summary, cross-sectional and longitudinal evidence shows that linguistic and nonlinguistic symbolic domains are developmentally inter-related. Verbal labelling scaffolds symbolic understanding of picture-object relationships (Callaghan, 2000; Ganea et al., 2009) and expressive and receptive language abilities correlate with pictorial tasks, but may exhibit different effects at different ages (Callaghan & Rankin, 2002; Kirkham et al., 2013). However, we do not know whether early language delays cause deficits in picture comprehension over time. Furthermore, in typical development, differential effects of expressive and receptive language on symbolic ability have proved difficult to identify (Quinn et al., 2018). Studying productive language impairments provides a unique opportunity to explore how receptive and expressive language skills interact differentially with pictorial understanding over time.

Language and picture comprehension in atypical development
Late talking (LT) children are defined as 18-30-month-old children at or below the 10th percentile of expressive vocabulary compared to other children their age, without neurodevelopmental or sensory deficits (Fisher, 2017). Although the majority of LT children recover by approximately 5 years (Rescorla, 2011), a minority, between ~12% and 25% (Collisson et al., 2016; Henrichs et al., 2011; Reilly et al., 2010; Roulstone et al., 2002; Zubrick et al., 2007), develop Developmental Language Disorder (DLD). Although many LT children reach the neurotypical range for expressive vocabulary by school age, they consistently score on the lower end of this range across a variety of language measures (Domsch et al., 2012; Rescorla, 2002, 2005; Rescorla et al., 2000; Rice et al., 2008).
LT children are characterised by expressive vocabulary deficits, yet can have varying receptive vocabulary skills (Fisher, 2017), whereas expressive and receptive vocabulary are more tightly intertwined in typically developing (TD) children. Evidence that expressive and receptive vocabulary might exert differential effects on pictorial understanding can be found in autism spectrum disorder (ASD) studies, as children with ASD often exhibit a range of language difficulties (Eigsti et al., 2011). Studies that test the extension of words from pictures to symbolised referents in minimally verbal children with ASD who are matched with TD children on receptive vocabulary have found deficits in the ASD sample (mean receptive age ~ 3.5-years-old; Hartley & Allen, 2015b; Preissler, 2008). However, when adapting Callaghan's (2000) linguistic scaffolding task for TD and ASD samples that were matched on both expressive and receptive language (mean ~ 4.5-years-old), Hartley et al. (2019) found that children with ASD and TD children performed identically across all trial types. Both samples showed lower accuracy on trials where they could not use verbal labels relative to trials where they could. In the ASD sample, both receptive and expressive language predicted task performance; in the TD sample, only receptive language was predictive. These studies suggest that children with expressive, but not receptive, deficits might struggle with utilising verbal scaffolding in pictorial understanding tasks.
One possible explanation for differential effects of receptive and expressive language on pictorial understanding is simply that children who say less experience fewer opportunities to participate in social situations where pictures are utilised. Many accounts of symbolic understanding rely on a foundation of socio-cognitive skills, such as imitation and intention reading (Nelson, 2007; Rakoczy et al., 2005; Rochat & Callaghan, 2005; Tomasello, 2003, 2010; Vygotsky, 1980). For example, Rochat and Callaghan (2005) argue that pictures are inherently communicative, and understanding them is driven by a 'basic affiliative need' to communicate and identify with other humans. They describe pictorial understanding development in stages that are built on social factors, beginning from infants (12-months-old) who imitate the actions of adults when given pictorial symbols, to toddlers (2 – 4-years-old) who use social scaffolding through language and imitation to understand pictures, and finally to school-aged children (4 – 5-years-old) who begin to understand not only symbol-referent relations, but also the intentions of the symbol-creator.
Differences in socio-cognitive ability may contribute towards some of the differences in pictorial understanding found in ASD (Hartley & Allen, 2014) and may also be affected by a delay in expressive vocabulary (although directionality in LT is difficult to specify).
Expressive delay could potentially reduce opportunities to learn from caregivers that verbal labels are used to scaffold picture comprehension, and result in LT children having less practice in applying a linguistic strategy. Caregivers of children with expressive language delay have been found to provide less complex recasts (Conti-Ramsden, 1990), less lexical and prosodic information (D'Odorico & Jacob, 2006), and produce fewer expansions, less self-directed speech, and less general responses (Vigil et al., 2005). Others have found no difference in maternal input, but rather found that as LT children simply say less, caregivers have less to expand upon (Paul & Elwood, 1991). Outcome studies also suggest that there may be social impairments associated with expressive language delay, with some finding lower social competency in LT children (Horwitz et al., 2003; Longobardi et al., 2016).
Overall, despite socio-cognitive skills forming the basis of theoretical accounts of symbolic development, we do not know how individual differences in social ability in LT children might interact with pictorial understanding. It is possible that the impact of expressive language delay on the availability of social scaffolding, or vice versa, may affect pictorial understanding. Equally, social ability may well compensate for deficits in expressive vocabulary; LT children who are more socially orientated may invite more social scaffolding behaviour than those who are not.

The current study
In sum, there are three distinct areas in which further research is necessary. Firstly, although symbols form a key part of communication throughout life, and TD children use language before 3 – 4-years-old to scaffold their understanding of pictorial symbols, we do not know how early language delay affects picture comprehension in the absence of ASD. No studies to date have investigated how linguistic scaffolding of pictorial understanding might be affected in LT children, and other research suggests that language delay might be related to differences in symbolic play (Lyytinen et al., 2001; Rescorla & Goossens, 1992). As symbolic play, pictorial understanding, and language are developmentally inter-related (Callaghan & Rankin, 2002; Kirkham et al., 2013), LT children may also exhibit deficits in pictorial understanding.
Secondly, despite evidence from typical and atypical populations that expressive and receptive vocabulary might have different effects as pictorial understanding develops, very few studies have probed this relation directly. This means we do not know how emerging language skills interact with pictorial understanding at different ages.
Thirdly, regardless of theoretical literature maintaining that social scaffolding and language are crucial to pictorial understanding, the relationships between individual social ability, language delay, and pictorial understanding have not been directly investigated in populations with isolated expressive language delay.
We address these issues by adapting Callaghan's (2000) verbal scaffolding picture comprehension task in a longitudinal study of LT and TD children. We manipulated the availability of verbal labels when asking children to match pictures to real objects and assessed their concurrent language skills at 2 – 2.4-years-old (timepoint 1; T1) and 3.5 – 3.9-years-old (timepoint 2; T2). We also considered the effect of social ability measured at 2 – 2.4-years-old.
We hypothesised that LT children would respond less accurately than TD children when linguistic scaffolding is available, and on par with TD children in conditions when linguistic scaffolding is inaccessible. We also hypothesised that expressive vocabulary at both T1 and T2 would positively predict picture comprehension accuracy, and that receptive vocabulary would be positively correlated with expressive vocabulary. We originally hypothesised that LT children who did not recover (i.e. they failed to reach the TD expressive vocabulary range by the second timepoint at 3.5 – 3.9-years-old) would perform less accurately with linguistic scaffolding, and that LT children who did recover (i.e. they reached the TD expressive range by T2) would perform on par with TD children (see preregistrations). However, as we could only test half of the original sample due to COVID-19, resulting in small subgroups, we utilised concurrent expressive vocabulary as a continuous variable across all participants at T2.
As an exploratory analysis, we also hypothesised that children with less sophisticated social ability would score lower on picture comprehension accuracy.

Participants
Participants were part of a longitudinal project intended to capture differences between LT and TD children over 18 months, from the ages of 2 – 2.4-years-old to 3.5 – 3.9-years-old. The picture comprehension task was administered at the first and last time points.
Participants were recruited using flyers from Lancaster Babylab, via health visitors in the Lancashire local authority, and from nurseries in the local area. Once consent to contact was obtained, parents completed the Oxford-CDI (Hamilton et al., 2000), a parent-reported checklist of words the child says and understands. Children were included in the study if they met one of the following criteria: TD with productive vocabulary score ≥ 25th percentile, or LT with productive vocabulary score ≤ 10th percentile. These criteria were chosen to ensure two distinct groups, with the LT criterion consistent with prior literature (Fisher, 2017).
Inclusion criteria also included monolingual British English, with no history of developmental or sensory delays or disorders.
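The vocabulary-based grouping rule can be written out explicitly. The sketch below covers only the percentile criterion; the further requirements (monolingual British English, no developmental or sensory delays or disorders) are not modelled, and the function name is hypothetical rather than taken from the study's materials.

```python
def classify_by_expressive_percentile(percentile):
    """Return 'LT', 'TD', or None (excluded) from an Oxford-CDI
    productive vocabulary percentile."""
    if percentile <= 10:
        return "LT"   # late talker: at or below the 10th percentile
    if percentile >= 25:
        return "TD"   # typically developing: at or above the 25th percentile
    return None       # between the cut-offs: not eligible for either group


# Hypothetical percentiles, for illustration only.
for pct in (5, 17, 60):
    print(pct, classify_by_expressive_percentile(pct))
```

Leaving the band between the 10th and 25th percentiles unassigned is what keeps the two groups distinct.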
A total of 85 families completed the CDI; of these, 24 were excluded due to the aforementioned criteria. A total of 61 children (40 TD and 21 LT) took part in the study at the first time point aged 2.0 – 2.4-years-old (T1); however, 2 TDs did not complete the pictorial understanding task due to fussiness and so were excluded from the final sample of 59 children (38 TD and 21 LT). At 3.5 – 3.9-years-old (18 months from baseline; T2) a total of 29 children (20 TD and 9 LT) were tested before the COVID-19 pandemic halted all face-to-face testing.

Questionnaires
Participants completed the Oxford-CDI (Hamilton et al., 2000) at consent-to-contact.
Caregivers completed a demographics questionnaire and the Preschool Social-Responsiveness Scale-2 (SRS-2; Constantino & Gruber, 2012) at the T1 test visit. The SRS-2 is a caregiver-reported list of behaviours rated by frequency of occurrence that is often used to identify children at risk of autism spectrum disorder (ASD). However, as it is designed to discriminate typical and atypical behaviours at a young age and was normed on a well-distributed sample of children aged 30 – 54-months-old (albeit in the US), it was used as a measure of individual social proficiency (raw scores) within our sample. The experimenter conducted the Receptive and Expressive One-Word Picture Vocabulary Tests (ROWPVT-4 and EOWPVT-4 respectively; Martin & Brownell, 2011) and the Leiter-3 Non-Verbal IQ 4-subscore scale (Roid et al., 2013) at the T2 test visit. The ROWPVT-4 and EOWPVT-4 are picture-based vocabulary tests; the ROWPVT-4 is administered by showing four pictures to a child and asking them to identify a named picture (e.g. 'which is the rabbit?'), whereas the EOWPVT-4 is administered by showing a single picture to a child and asking them to name the picture (e.g. 'what's this?'). The Leiter-3 Non-Verbal IQ 4-subscale involves a series of core subtests that measure reasoning, visualisation, and problem solving without the need to utilise verbal instructions, allowing assessment of non-verbal IQ in children with low language ability.

Picture comprehension task
Objects: We adapted Callaghan's (2000) picture comprehension task, using the same criteria for selecting relevant stimuli. There were 32 different objects in total, split into 16 pairs. For each condition, there were four trials, one from each of four groups: animals, natural, household/indoor artifacts, and vehicles (16 trials in total). For the Familiar-Matched Label condition, pairs of familiar objects had the same basic label (e.g. dog) but different subordinate labels (e.g. German Shepherd and Burmese Mountain). For the Unfamiliar-Matched Label condition, pairs of unfamiliar objects had the same basic label (e.g. coral) but different subordinate labels (e.g. elkhorn coral and encrusting coral). For the Familiar-Distinct Label condition, pairs of familiar objects had the same global label (e.g. animal) but different basic labels (e.g. cat and rabbit). For the Unfamiliar-Distinct Label condition, pairs of unfamiliar objects had the same global label (e.g. vehicle) but different basic labels (e.g. quadbike and jet ski).
We ensured that perceptual discriminability of paired objects was similar across trial types and stimuli groups. For sets of animals, different fur colours and poses were chosen (sitting vs. standing dogs); for artifacts, different colours, materials and shapes were chosen, and so on. All objects were roughly the same size. Caregivers were consulted prior to participation on their children's familiarity with the test objects, and the age-norms for objects were checked using Fenson et al. (1994; familiar objects: M age = 13.92 months-old, range = 10-16-months-old). Example stimuli can be seen in Figure 1 (see Appendix A for all stimuli used).
Pictures: Sixteen black and white laminated cards were used that had a simple black pen drawing of one familiar or unfamiliar object.
Display: Objects were placed on a tray with a deep lid that had a handle and a cut out at the back that allowed the experimenter to rearrange objects out of sight of the child. The objects remained hidden until the experimenter lifted the lid to reveal the two objects sitting on the tray.

Procedure:
We adapted Callaghan's (2000) picture comprehension task that manipulates the availability of linguistic scaffolding. We manipulated the label for choice objects across conditions where it could not be used (Matched Label trials; two objects with the same basic label, e.g. two types of dog) and conditions where it could be (Distinct Label trials; two objects with different basic labels, e.g. rabbit and cat). We also manipulated the familiarity of objects depending on the child's knowledge of the labels and objects (Familiar and Unfamiliar) within the Matched Label and Distinct Label trials. The order of trial types was randomised per participant, with no more than two trial types of the same time presented consecutively.
In Familiar-Distinct Label trials, linguistic scaffolding is possible: if the participant generates a label when the target is cued, they can achieve a correct response by identifying the target object based on its matching label, rather than responding based on perceptual similarity alone. However, this strategy is unavailable in the other conditions, when both referent objects share the same known label as the picture (Familiar-Matched Label) or labels are unknown (Unfamiliar trials).
The task was administered at T1 and T2. Participants were tested with the same mobile set-up for the task, either at the participant's home or in a designated room at the Babylab depending on the family's preference. Where visits took place at home, care was taken to ensure a clear space and a quiet environment with just the experimenter, child, and caregiver present.
During the task, the child and experimenter were sitting on opposite sides of a 1-metre wide, low fold-out table. The experimenter held up the relevant trial picture card (e.g. cat) and said "Look!". The picture was presented for 4 seconds before being removed from view.
The experimenter then lifted the lid of the box to reveal the two relevant trial objects, one of which resembled the picture (e.g. cat), and the other, a paired foil object (e.g. rabbit). On displaying the objects, the experimenter asked "Which one is the same as the picture?" The trial ended when the child made a response (either by pointing with fingers or palm, or picking up the relevant object).
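The trial-order constraint described in this Procedure can be illustrated with a short sketch. It is not the authors' randomisation script; it assumes 4 trial types with 4 trials each, reads the constraint as "no more than two trials of the same type in a row", and uses simple rejection sampling. The function names and the seed are hypothetical.

```python
import random

TRIAL_TYPES = [
    "Familiar-Matched Label",
    "Unfamiliar-Matched Label",
    "Familiar-Distinct Label",
    "Unfamiliar-Distinct Label",
]
TRIALS_PER_TYPE = 4  # 4 trial types x 4 trials = 16 trials per child


def longest_run(sequence):
    """Return the length of the longest run of identical adjacent items."""
    longest = current = 1
    for previous, item in zip(sequence, sequence[1:]):
        current = current + 1 if item == previous else 1
        longest = max(longest, current)
    return longest


def randomise_trial_order(rng, max_run=2):
    """Reshuffle the 16 trials until no type appears more than max_run times in a row."""
    trials = [t for t in TRIAL_TYPES for _ in range(TRIALS_PER_TYPE)]
    while True:
        rng.shuffle(trials)
        if longest_run(trials) <= max_run:
            return list(trials)


# One order per participant; the seed here is arbitrary (assumption).
order = randomise_trial_order(random.Random(1))
```

Rejection sampling keeps every admissible order equally likely, which a greedy "swap when a run appears" repair would not guarantee.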

Results
All data and code can be found at: https://osf.io/ywmx5/?view_only=14f51c730c4c47758893bc684d7cebf5, alongside preregistrations with a document that explains deviations due to the COVID-19 pandemic.

Overview of analyses
We first describe the sample and report descriptive analyses of the data using Welch one-sample t-tests, identifying how TD and LT children performed against chance. We then conducted three analyses to assess our research hypotheses. The first tested the longitudinal predictive effect of LT status over time using generalised linear mixed effects modelling (GLME). The second tested whether receptive and expressive vocabulary measures could predict performance cross-sectionally at different ages (T1 and T2) using GLME analyses and a post-hoc mediation analysis. The third assessed whether social ability at T1 had any additive predictive value on accuracy at T1 or T2 by comparing GLME model fits to the data, with and without social ability, and by using a post-hoc mediation analysis. Table 1 contains T1 and T2 final sample demographics, questionnaire, and vocabulary scores. Almost all families identified as White British and the majority (92%) had at least one parent with an education level corresponding to an undergraduate University degree or above.

Sample
Due to the COVID-19 pandemic halting all face-to-face testing, only 29 of the original 59 children were tested at T2. TD and LT children did not differ in SRS-2 (t(46.39) = 1.35, p = .183) or Leiter-3 scores (t(10.02) = -1.45, p = .178). In order to be included in the study, children had to exceed a Leiter-3 cut-off score of 79, which corresponds to the upper boundary of "low" performance associated with the test (Roid et al., 2013); no children were excluded on this basis.

Descriptive analyses
We used Welch's one-sample t-tests to compare each population's overall picture comprehension accuracy, and accuracy on each trial type, against chance (50%). The data were checked for outliers using the rstatix package in R, and normality was confirmed using the Shapiro-Wilk test.
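As a rough illustration of comparing accuracy against chance, the sketch below computes a one-sample t statistic by hand. The scores are hypothetical and are not data from this study; the p-value is omitted because the Python standard library has no t distribution, and the function name is an assumption.

```python
import math
import statistics


def one_sample_t(scores, chance=0.5):
    """Return (t, df) for a one-sample t-test of mean accuracy against chance."""
    n = len(scores)
    mean = statistics.mean(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(n)  # sample SD, n - 1
    return (mean - chance) / standard_error, n - 1


# Hypothetical proportions correct for five children (not real data).
example_scores = [0.5, 0.6, 0.7, 0.8, 0.9]
t_stat, df = one_sample_t(example_scores)
```

A positive t indicates mean accuracy above 50%; the statistic would then be referred to a t distribution with n - 1 degrees of freedom.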

Generalised linear mixed effects model analyses
All GLME analyses were undertaken with the same procedure. All models predicted child task accuracy as the dependent variable, and were built in R [version 1.1.463] using the glmer function in the package lme4 (Bates et al., 2015). All post-hoc mediation analyses were undertaken using the mediation package in R [version 1.1.463] (Tingley et al., 2014). For each analysis, 1000 simulations were used to estimate model effects using the quasi-Bayesian Monte Carlo method (Imai et al., 2010).

Does late talking status predict symbolic picture comprehension over time?
We conducted a GLME analysis that added fixed effects of population and timepoint to those of trial type. The best-fitting model to the data contained fixed effects of timepoint, population, language scaffolding and object familiarity, with an interaction between language scaffolding and object familiarity, and random effects of participant and target (Table 2; χ²(3) = 13.51, p = .004).
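For readers who prefer notation, the best-fitting model described above could be written roughly as follows. This is a sketch under two assumptions not stated in the text: a binomial model with a logit link for trial-level accuracy, and random intercepts only for participant and target. Here i indexes participants, j targets, and t timepoints.

```latex
\operatorname{logit} P(y_{ijt} = 1)
  = \beta_0
  + \beta_1\,\mathrm{Timepoint}_t
  + \beta_2\,\mathrm{Population}_i
  + \beta_3\,\mathrm{Scaffolding}_j
  + \beta_4\,\mathrm{Familiarity}_j
  + \beta_5\,\mathrm{Scaffolding}_j \times \mathrm{Familiarity}_j
  + u_i + v_j,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),\quad v_j \sim \mathcal{N}(0, \sigma_v^2)
```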
The pattern of accuracy for each trial type was consistent over both timepoints: relative to Unfamiliar-Matched Label trials where objects were unfamiliar and had the same unknown category label, children performed significantly less accurately in the Familiar-Matched Label condition where the objects were familiar and shared the same known category label (p < .001). The significant two-way interaction was caused by a significant difference between trial types involving familiar, but not unfamiliar, objects: participants performed significantly more accurately in the Familiar-Distinct Label condition where verbal scaffolding could assist children's mapping of pictures to familiar objects (p < .001).

Performance was highest in Familiar-Distinct Label trials and least accurate in Familiar-Matched Label trials (Figure 2), consistent with Callaghan (2000). Children performed similarly to Unfamiliar-Matched Label trials in the Unfamiliar-Distinct Label trials (when objects were unfamiliar but had different basic category labels; p = .603).
The added effect of timepoint indicated that participants performed significantly more accurately at 3.5-3.9 years old than at 2.0-2.4 years old (p < .001), and the effect of population indicated that TD children performed significantly more accurately than LT children when data from both timepoints were combined (p = .022).
Thus, the longitudinal analysis indicated that there was a predictive effect of late-talking status on performance across time, with LT children attaining lower accuracy scores overall when total performance was assessed across both timepoints. However, as there were no interactions between trial type and population, the results also suggested that the facilitative effect of linguistic scaffolding was stable across both populations.

How do concurrent receptive and expressive vocabulary contribute to picture comprehension at different ages?
Receptive vocabulary: We conducted three separate GLME analyses to identify the effects of receptive vocabulary on cross-sectional task performance at T1 and T2, collapsing across LT and TD data. For all analyses, fixed effects of trial type were used; only fixed effects of receptive vocabulary differed. When predicting T1 task performance, T1 receptive vocabulary (CDI) was used. When predicting T2 task performance, one model tested the effect of prior T1 receptive vocabulary (CDI), and the other tested the effect of T2 receptive vocabulary (ROWPVT-4).
At T1, there was an added effect of concurrent receptive vocabulary to that of trial type in predicting task performance (Table 3; model comparison: χ²(2) = 9.14, p = .010). This indicated that children with higher concurrent receptive vocabularies performed significantly more accurately at 2-2.4 years old (p = .038).
At T2, there was no added predictive effect of concurrent receptive vocabulary (ROWPVT-4) to that of trial type, and no interactions were found. However, prior receptive vocabulary at T1 did predict accuracy in addition to the effect of trial type (Table 3).
Expressive vocabulary: At T1, the GLME analysis did not find a predictive effect of population above that of trial type, and no interactions were found. The lack of a population effect suggested that at 2.0-2.4 years old, expressive vocabulary was not predictive of pictorial understanding performance. At T2, population at T1 did not predict accuracy. However, T2 expressive vocabulary (EOWPVT-4) did predict accuracy in addition to the effect of trial type (Table 4; model comparison: χ²(3) = 10.12, p = .018). The best-fitting model to the data demonstrated that as children's concurrent expressive vocabulary at 3.5-3.9 years old increased, so did their picture comprehension accuracy (p < .001).

Relationship between receptive and expressive vocabulary in predicting task accuracy:
The cross-sectional analyses indicated that early receptive vocabulary at ~2-years-old predicted both concurrent and later task accuracy at ~3.5-years-old, and later expressive vocabulary at ~3.5-years-old predicted concurrent task accuracy at ~3.5-years-old.
To tease apart the relative contribution of T1 receptive vocabulary and T2 expressive vocabulary to T2 task accuracy, we conducted a further post-hoc mediation analysis (Figure 3). The effect of T1 receptive vocabulary on T2 picture task accuracy was significantly mediated through T2 expressive vocabulary (Average Causal Mediation Effect: 0.07; 95% CI: [0.01, 0.12]; p = .016). The results indicated that of the estimated increase in probability of task accuracy at ~3.5 years old (total effect: 0.10) due to earlier receptive vocabulary at 2-2.4 years old, 0.07 was estimated to be mediated through later expressive vocabulary at 3.5-3.9 years old, and 0.03 was estimated to be a direct effect of earlier receptive vocabulary at 2-2.4 years old.
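The decomposition reported above follows the usual identity that the total effect is the sum of the mediated (indirect) and direct effects. The sketch below simply reproduces that arithmetic from the rounded values in the text; the function name is hypothetical, and a proportion mediated estimated by simulation in the mediation package may differ slightly from this simple ratio.

```python
def decompose_mediation(total_effect, acme):
    """Split a total effect into its direct part and the proportion mediated.

    total_effect: estimated total effect on the probability of a correct response
    acme: average causal mediation effect (the indirect part)
    """
    direct_effect = total_effect - acme
    proportion_mediated = acme / total_effect
    return round(direct_effect, 2), round(proportion_mediated, 2)


# Rounded values reported in the text: total effect 0.10, mediated effect 0.07.
direct_effect, proportion_mediated = decompose_mediation(0.10, 0.07)
```

On these rounded figures, roughly 70% of the effect of T1 receptive vocabulary on T2 accuracy runs through T2 expressive vocabulary.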

Is the differential effect of expressive and receptive language in picture comprehension tasks mediated by social ability?
To test whether there was any effect of T1 social ability on task accuracy, we fitted an additional GLME model with SRS-2 as an additional fixed effect, and compared it to the original best-fitting model for each time point.
For T1, adding SRS-2 to the best-fitting model (fixed effects of trial type and T1 receptive vocabulary) yielded a better fit to the data than the model without SRS-2 (Table 5; model comparison: χ²(1) = 5.40, p = .020), suggesting that children with lower social responsiveness were less accurate at matching pictures to symbolised objects (p = .023) regardless of language ability.
For T2, a GLME model with SRS-2 score as an additional fixed effect was not a better fit to the data when compared to the original models.
We conducted a post-hoc mediation analysis to assess whether the effect of T1 receptive vocabulary on T1 task accuracy was mediated through concurrent T1 social ability (Figure 4). This demonstrated a significant mediating effect of social ability (Average Causal Mediation Effect: 0.02; 95% CI: [0.002, 0.03]; p = .020). The results indicated that of the estimated increase in probability of task accuracy at 2-2.4 years old (total effect: 0.04) due to concurrent receptive vocabulary, 0.02 was estimated to be mediated through concurrent social responsiveness, and 0.02 was estimated to be a direct effect of concurrent receptive vocabulary.
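The model comparisons reported throughout the Results are likelihood-ratio tests referred to a chi-square distribution. As a consistency check, the sketch below recomputes each p-value from the reported statistic and degrees of freedom, using closed-form upper-tail probabilities that hold only for 1, 2, or 3 degrees of freedom. The function name is hypothetical and this is not the authors' analysis code.

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail probability of a chi-square variable, closed forms for df = 1, 2, 3."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    raise ValueError("this sketch only supports df = 1, 2, or 3")


# (chi-square statistic, df, p-value as reported in the text)
reported = [(13.51, 3, 0.004), (9.14, 2, 0.010), (10.12, 3, 0.018), (5.40, 1, 0.020)]
recomputed = [round(chi2_upper_tail(x, df), 3) for x, df, _ in reported]
```

Each recomputed value matches the reported p-value to three decimal places.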

Discussion
Developmental theories propose that language scaffolds children's acquisition and understanding of the pictorial symbol system (Callaghan, 2000; Tomasello, 2003, 2010). Our results indicate not only that language ability affects the developmental trajectory of picture comprehension, but also that receptive and expressive skills may differ in their contribution at different ages, subject to mediating effects of social ability. The use of linguistic scaffolding in the picture comprehension task requires children to generate labels (albeit subvocally). When viewing the picture, children can either generate a label for the depicted object internally or store its visual features if the label is unknown, and then use that information to match the picture to the referent object. There are two opportunities to generate a label: when the target is cued (e.g. a picture of a cat) and when the target object is selected (e.g. a plastic cat and a plastic rabbit on the tray). At an earlier age, receptive vocabulary skills might enable children to understand the task and, to some extent, use linguistic information to activate associated concepts that can be used to help scaffold picture comprehension. However, being more proficient in expressive vocabulary may facilitate children's ability to explicitly generate the label internally and activate associated concepts both when the target is cued and when the object is selected, and thus directly utilise that linguistic information to select the correct object.
More generally, our results suggest that at an earlier age, children may rely more on understanding linguistic information and concurrent social ability, but at a later age, maturing expressive vocabulary skills may become increasingly important in scaffolding picture comprehension. However, as our mediation analyses highlighted, receptive and expressive vocabulary skills and their influence on developing symbolic skills are tightly interwoven.
We now outline the implications of these results for LT children, children's development more generally, and future considerations.

Implications for late-talking children
At both time points, LT children scored lower than TD children on the picture comprehension task. This was reflected in the longitudinal analysis that showed a general effect of population on task accuracy across both timepoints. One possibility is that the smaller expressive vocabularies of LT children might have meant they were less able to retrieve the words (e.g. 'cat') and subsequent representations of the real object (e.g. the concept of a cat) when seeing the picture (i.e. a picture of a cat), resulting in more errors when identifying the depicted object. Similarly, Rescorla and Goossens (1992) suggested that reduced symbolic play in LT toddlers might be secondary to less fluent and less spontaneous retrieval and encoding of lexical entries for semantic representations across both referents and play scripts. However, as no significant effects of population were found cross-sectionally, any differences between the populations in our study were subtle. Furthermore, as there was no significant interaction between population or vocabulary measures with trial type in either the longitudinal or cross-sectional analyses, this suggests that the developmental trajectory for picture comprehension in LT children is not atypical, but rather, is delayed. The results also indicated that the effect of language in scaffolding pictures is stable, even in early expressive language delay.
These findings are in line with outcome studies in LT children showing that the majority of children reach the same range as TD children in language skills by school-age, but fall on the lower end of this range (Rescorla, 2002, 2005). The predictive effect of receptive vocabulary at age 2-2.4 years on picture comprehension in our study was also consistent with early receptive vocabulary being a better predictor of later outcomes than early expressive vocabulary in LT children (Fisher, 2017). Overall, although expressive language mediates linguistic scaffolding of picture comprehension at an older age, categorising our participants using a dichotomous variable at an earlier age did not accurately represent the fine-grained detail contained in our sample as they grew older. This is also consistent with prominent theories which suggest that language ability, in LT children and DLD, falls upon a spectrum (Bishop, 2017; Leonard, 2014).
We did not find any differences in SRS-2 scores in LT and TD children, indicating that early expressive language delay in our sample did not appear to coincide with reduced social proficiency. However, we did find that social ability at age 2-2.4 years predicted task performance at the same age across the whole sample. The implications of this in typical development are discussed below, but of note is that social ability may actually help mitigate delays that occur alongside, or as a result of, expressive language deficits. This adds to the evidence base for interventions for LT children that make use of social scaffolding to improve language outcomes (e.g. Alt et al., 2014; Cable & Domsch, 2010; Robertson & Weismer, 1999). More pro-social toddlers may benefit from social scaffolding during interactions involving pictures at an early age, even if their expressive vocabulary is less well developed. Children with higher social skills may also receive more exposure to pictures, and thus more exposure to adults labelling pictures, accelerating their acquisition of a linguistic strategy in pictorial understanding. Equally, it is possible that this cycle may occur in the reverse: higher levels of parental input around pictures may encourage prosocial behaviour in toddlers, leading to more opportunities for picture exposure, and thus more opportunities for toddlers to utilise social scaffolding. A third possibility is that both children's social ability and parental input influence each other in a positive feedback loop. In any case, social scaffolding may be a beneficial strategy for future interventions targeting picture comprehension.

General implications for linguistic scaffolding and picture comprehension
Our findings also have several implications for how children comprehend pictures in both LT and TD populations. Across both timepoints, children struggled most with the Familiar-Matched Label trials where access to linguistic scaffolding was blocked. In theory, linguistic scaffolding was only available in Familiar-Distinct Label trials, and selecting the correct referent object in all other trial types required children to attend to the pictures' perceptual features. While children were able to apply this strategy successfully in trials involving unfamiliar objects, the consistently lower performance in Familiar-Matched Label trials in both TD and LT children suggests that generating labels for pictures may not always be a beneficial strategy: the familiar linguistic label in these trials (e.g. 'dog' when there are two types of dog to choose from) seemingly impeded comprehension of the picture based on perceptual resemblance. When faced with unfamiliar objects with unknown labels (e.g. jet ski vs. quadbike), Callaghan (2000) suggests that TD children may use an attribute word (e.g. 'wheels') to help scaffold performance alongside perceptual similarity. If the use of attribute words such as 'wheels' for the distinction between 'jet ski' and 'quadbike' is a strategy applied by TD children, then LT children may also have greater difficulty comprehending unfamiliar pictures due to their comparatively limited vocabulary.
In both samples, children's accuracy on Unfamiliar trial types improved over time, indicating their developing ability to quickly encode mental representations of perceptual features when determining picture-object relationships. This improved accuracy over time may reflect more flexible cognitive strategies towards picture comprehension as children age; whereas linguistic scaffolding may be the default strategy, children showed the ability to adapt over time by using perceptual features as well. It is also possible that the lower performance of LT children overall in picture comprehension across all trials reflects a lack of flexibility or competency in applying different strategies. This would be consistent with word learning literature suggesting that LT children show less flexibility in adopting new strategies for learning novel words than TD children (e.g. Rice et al., 1994; Stokes et al., 2012).
The function that language plays in aiding pictorial understanding may be in creating 'cognitive distance' (Homer & Nelson, 2009, p. 132). By enabling children to treat pictures as distinct from real objects through labels, the salience of the picture itself as an object is reduced, and its status as a symbolic representation is increased. This abstraction afforded by language is also found in category learning (Waxman & Markow, 1995). Children's language ability predicted performance across all trial types in our study, including those that relied on perceptual discrimination, indicating a robust relationship between pictorial understanding and language domains.
The ability to use language in this manner may depend on where in the trajectory of symbolic understanding children are located. At an earlier age, performance in the picture comprehension task was not dependent on being able to talk about pictures, but rather on language comprehension ability and social ability. Social ability both predicted task performance at age 2.0-2.4 years and mediated the effect of receptive vocabulary on task performance. The lack of interaction between condition and receptive vocabulary also suggests not only that language scaffolds picture comprehension (as evidenced by the highest accuracy scores being in Familiar-Distinct Label trials), but also that receptive vocabulary alongside social ability may mediate pictorial understanding more generally.
These results are consistent with a socio-cognitive framework of symbolic understanding, where children at an earlier age rely more heavily on social scaffolding to interact with the world than children at later ages (Callaghan et al., 2004). Striano et al. (2001) found that when given uninteresting or ambiguous objects (e.g., a stapler), 2-3-year-olds did not perform symbolic actions spontaneously and largely declined to play at all without an experimenter modelling symbolic actions or actively engaging the child.
However, 4-year-olds were better able to play with the items independently. In a longitudinal study, Callaghan and Rankin (2002) also found that cultural scaffolding, consisting of explicitly highlighting the relationship between objects and pictures, improved children's picture comprehension and production in 28-month-olds. Our results also indicate that children may be more vulnerable to interference of pictorial understanding when faced with more social difficulties early on, although none of our sample reached clinical levels of impairment using the SRS-2. Rather, the results reflected individual differences in social proficiency. Future studies that examine clinical levels of social impairment and dual representation tasks in populations that are otherwise typical, or manipulate social cues directly within the task, will help to elucidate these mechanisms.
At an older age, expressive, rather than receptive, vocabulary predicted children's picture comprehension. This may reflect the shifting role of expressive vocabulary in facilitating symbolic understanding more generally at an older age. At ~3.5 years old, the ability to actively talk about symbols and partake in social discourse, for example, by asking questions and explicitly inviting caregiver expansions about pictures, may help children to understand how symbols depict real items, but are not the same as the real item itself (i.e. aiding children to understand dual representation; DeLoache, 2004).
Similarly, Tomasello and colleagues (Rakoczy et al., 2005; Tomasello et al., 2005) describe language as a means through which children are able to develop other symbolic functions, such as pretend play. With advancing linguistic ability from 3-4 years of age, children are able to engage in meta-representational discourse, and it is this use of expressive discourse that affords them an appropriate vehicle to interpret mental states and broader symbols as referring to real-world concepts and objects. Nelson (2007) also describes an approach where children's external representations of meaning advance from nonintentional imitation of meaning as infants (such as copying gestures or early words), to intentional representation and sharing of meaning as school-aged children (such as using conventional symbolic systems like discourse). This process is facilitated by externalisation of meaning within a social system, such as by using words and gestures with caregivers.
Overall, our results indicate not only that pictorial understanding and language ability are developmentally inter-related, but also that the importance of receptive and expressive vocabulary ability may be weighted differently as children develop symbolic understanding within a social context.

Limitations and future directions
There are a number of considerations that limit our findings. Our study was restricted by smaller sample sizes at T2 as a result of the COVID-19 pandemic. As our task was designed to test children's understanding of symbolic relations between pictures and 3-D objects, data collection could not be completed online, as perceiving all stimuli via a 2-D screen would fundamentally change the nature of the task (Troseth & DeLoache, 1998).
When face-to-face testing resumes, future directions thus include testing a larger sample.
We also did not have IQ data for the whole sample due to the interruption of testing: the Leiter-3 was to be collected at the oldest timepoint due to the increasing stability of IQ constructs with age (Gottfried et al., 2009; Schneider et al., 2014). However, the data we do have indicated no significant differences between populations. Furthermore, a mismatch between verbal and non-verbal ability is not sufficient evidence for diagnosis of DLD (Bishop, 2017), and non-verbal IQ may not predict symbolic or pictorial understanding (Kirkham, 2013). However, it is possible that individual differences in attention and executive functioning were not fully accounted for.
Our sample also consisted of families with similarly high parental education levels and high interest in participating in developmental research. Consequently, although we can be confident that any differences between children in our samples were unlikely to be due to socioeconomic or environmental causes, we cannot extend these findings to populations with contrasting demographic characteristics without further testing. Furthermore, the use of pictures and symbols is subject to cultural differences; for example, Western cultures adopt a different pedagogical approach that entails more social scaffolding around pictorial understanding than non-Western cultures (Callaghan et al., 2011). Thus, our findings are applicable to a specific population where pictures and language have a privileged position in dual representation and broader symbolic understanding.
We also utilised a parent-report measure for vocabulary at T1 rather than an experimenter-administered measure. This may limit the comparison of time points as different measures were used to test vocabulary. However, as we used two distinct cut-offs for the two groups, it is unlikely that parent-report measures were so inaccurate as to incorrectly characterise group status at T1. Furthermore, CDIs can capture a broader assessment of how children utilise language in their everyday lives during the earlier stages of language development.
Of further note is that the ROWPVT-4 and EOWPVT-4 require picture recognition, whereas the CDI does not. Although both the picture comprehension task and the picture vocabulary tasks involve the use of pictures, it is important to note that picture vocabulary tests utilise picture recognition (i.e. identification of what a picture looks like without understanding the picture-referent representational relationship), a skill acquired very early in development that relies upon perceptual ability (DeLoache, 2004), whereas our task tested the rather more advanced ability of picture comprehension (i.e. understanding the representational function of a picture and relating it to a specific real-world referent) as a type of symbolic understanding. It is therefore unlikely that picture vocabulary tests are truly tests of symbolic understanding that result in significant overlap with our picture comprehension task.
A further limitation is that the ROWPVT-4 and EOWPVT-4 were standardised on a US population, whereas the Oxford-CDI was normed on a UK population. This may mean that there are some cultural discrepancies between the two measures (Hamilton et al., 2000).

Conclusions
Our study has implications for both TD and LT children. Through a longitudinal study, we demonstrate firstly that LT children show evidence of less accurate picture comprehension skills over time when compared to a TD sample and, secondly, that these differences are subtle and subject to effects of participant heterogeneity. Late talking (in line with DLD) and its effects on pictorial understanding may thus be best considered on a dimensional scale, rather than a categorical one. Crucially, as the trajectory of development for LT children resembled that of earlier typical development, albeit developmentally delayed, this suggests that a significant early deficit in expressive language does not appear to cause any qualitative differences between domains: language still appears to be an important mediating factor across groups and ages. Thus, language appears to scaffold pictorial understanding not only in typical development, but also in early expressive language delay.
We also demonstrate that the relationship between language and picture comprehension may be partly explained by differences in how receptive and expressive language ability help scaffold picture comprehension over time, with receptive vocabulary predicting picture comprehension at 2-years-old, and expressive vocabulary predicting picture comprehension at 3.5-years-old. This differential weighting may be secondary to the interplay of symbolic understanding and language with social ability and social scaffolding.
At an earlier age, children may rely on social scaffolding as well as language comprehension skills to understand pictures, but at an older age, this may be superseded by the ability to talk about pictures to others. Overall, these findings advance understanding of both atypical and typical development, and demonstrate how language ability, social ability, and pictorial understanding may inter-relate over time.