The sound of silence: Reconsidering infants' object categorization in silence, with labels, and with nonlinguistic sounds

A large body of research based on a specific stimulus set (dinosaur/fish) has argued that auditory labels and novel communicative signals (such as beeps used in a communicative context) facilitate category formation in infants, that such effects can be attributed to the auditory signals' communicative nature, and that other auditory stimuli have no effect on categorization. A contrasting view, the auditory overshadowing hypothesis, maintains that auditory signals disrupt processing of visual information and, therefore, interfere with categorization, with more unfamiliar sounds having a more disruptive effect than familiar ones. Here, we used the dinosaur/fish stimulus set to test these contrasting theories in two experiments. In Experiment 1 (N = 17), we found that 6-month-old infants were able to form categories of these stimuli in silence, weakening the claim that labels facilitated their categorization in infants. These results imply that prior findings of no categorization of these stimuli in the presence of nonlinguistic sounds must be due to disruptive effects of such sounds. In Experiment 2 (N = 17), we showed that familiarity modulated the disruptive effect of nonlinguistic sounds on infants' categorization of these stimuli. Together, these results support the auditory overshadowing hypothesis and provide new insights into the interaction between visual and auditory information in infants' category formation.

The way in which infants form categories has been an area of active research over the past 30 years. This is because categorization, the grouping together of perceptually distinct objects into categories, is a fundamental cognitive process that is essential for understanding the world in meaningful ways, and that forms the basis for many cognitive functions such as semantic processing and inference making (Cohen & Lefebvre, 2005; Quinn, 2002; Rakison & Yermolayeva, 2010; Westermann & Mareschal, 2014). Studying this process in young infants allows us to explore the origins of knowledge and mental representations.
Much of the research on categorization in infants during their first year of life has made use of the familiarization/novelty-preference paradigm, which is based on infants' preference for looking at novel over familiar stimuli (Fantz, 1964) and measures the looking times of infants as they gaze at stimuli presented on a screen. Studies employing this paradigm typically consist of two phases. Infants are first familiarized with a series of objects from one category, and are then tested with two new objects, one from the familiarized category and the other from a novel category. If, during test, infants look longer at the object from the novel category, it can be concluded that they have formed a category that contains the within-category test object but excludes the out-of-category one. To illustrate, Quinn, Eimas, and Rosenkrantz (1993) familiarized 3- to 4-month-old infants with a series of photographs of cats and found that the infants, at test, looked longer at photographs of birds and dogs than at those of novel cats. A large body of work using this paradigm has shown that even very young infants from 2 to 3 months of age can form categories at different levels of specificity (e.g., Behl-Chadha, 1996; Bomba & Siqueland, 1983; Mareschal, Powell, Westermann, & Volein, 2005; Mareschal & Quinn, 2001; Oakes, Madole, & Cohen, 1991; Quinn, 2002; Rakison, 2010; Younger & Cohen, 1983). While these early categories are based on the visual properties of objects, they can form the basis for enriched, conceptual representations when infants learn about more abstract information, such as object functions, sounds, hidden properties, and, importantly, the names for objects (Balaban & Waxman, 1997; Madole & Oakes, 1999; Waxman & Gelman, 2010; Westermann & Mareschal, 2014).
Specifically, the role of language in infants' formation of object categories has received considerable attention (e.g., Althaus & Mareschal, 2014; Althaus & Plunkett, 2015; Althaus & Westermann, 2016; Balaban & Waxman, 1997; Ferguson & Waxman, 2016; Ferry, Hespos, & Waxman, 2010; Plunkett, Hu, & Cohen, 2008). Here, researchers have asked if language (labels for objects) and other sounds can shape preverbal infants' formation of categories. In adults and older children, different labels for perceptually similar objects, for example 'bird' and 'bat', indicate that the objects belong to different categories, and they allow inferences about the hidden properties of category members (e.g., a never-before-seen dog will bark; e.g., Gelman & Meyer, 2011).
The most extensive body of work investigating the role of labels in preverbal infants' category formation comes from Waxman and colleagues (e.g., Balaban & Waxman, 1997; Waxman & Markow, 1995). In many studies, these researchers tested infants in categorization tasks under a variety of auditory stimuli, such as labels, tone sequences (Ferguson & Waxman, 2016; Fulkerson & Waxman, 2007), lemur vocalizations (Ferry et al., 2010), backward speech (Ferry et al., 2013), non-native languages that matched or differed from the infants' native language in rhythmic and prosodic properties (Perszyk & Waxman, 2019), and bird song (Woodruff Carr, Perszyk, & Waxman, 2021). In these studies, infants between 3 and 12 months of age were typically familiarized with a sequence of stimuli from a single category (commonly, colored line drawings of dinosaurs or fish) presented on a computer screen, and most or all of these presentations were accompanied by a label (e.g., "Look, it's a toma!") or a non-labeling sound (e.g., pure tones matched in length and volume to the labels). As in the category learning studies described above, at test, infants were then shown (in silence) a novel exemplar from the familiarized category and an exemplar from a different category side-by-side, and looking preference to either stimulus was measured. In these studies, it was consistently found that when infants were familiarized with a consistent label accompanying the visual stimuli, they displayed a preference for the out-of-category item at test, suggesting that they had formed a category for the familiarized objects. In contrast, the effect of a range of other, nonlinguistic sounds varied. While for most nonlinguistic sounds, there was no evidence of category formation (i.e., a lack of preference towards either visual stimulus), some sounds (e.g., pre-familiarized lemur vocalizations) also led to categorization (see Table 1).
Based on these results, Waxman and colleagues argued that infants as young as 3 months showed a link between labels and object categorization (Ferry et al., 2010) and that labels (and language-like sounds) take a special role that is not shared by other auditory signals: labels were claimed to selectively promote object categorization (Fulkerson & Waxman, 2007;Perszyk & Waxman, 2018), and more broadly, to link language to core cognitive capacities even in young infants (Perszyk & Waxman, 2019).
To further delineate which auditory signals are used by infants in this way, Ferguson and Waxman (2016) asked if nonlinguistic stimuli could assume a category-promoting function if they were used in a communicative context. The authors pre-familiarized infants with video clips in which two actors communicated, with one actor speaking and the other replying in sine-wave tones. In a control condition, infants heard the same auditory sequence containing the tones, but, here, it was uncoupled from the dialogue. In the subsequent categorization task where visual stimuli were accompanied by tones during familiarization, only infants in the 'communicative' condition showed evidence of category formation. Based on these results, the authors argued that infants had sophisticated abilities to detect which auditory signals had a communicative function and then used these signals to guide category formation.
Together, then, the conclusion that Waxman and colleagues drew from their body of work was that linguistic, but not other, auditory signals selectively promote category formation even in preverbal infants, and that infants have sophisticated abilities enabling them to determine which auditory signals can serve this linguistic function and which do not.
Nevertheless, as discussed above, it is well-established that young infants are able to form perceptual categories even in the absence of any auditory input. Therefore, it is important to consider what additional role, if any, labels play in this process. Plunkett et al. (2008) and Althaus and Westermann (2016) argued that a constructive role for labels in category formation can only be assumed when, for a specific set of stimuli, categorization under labels is different from categorization in the absence of auditory input. That is, if infants form a category for a set of objects in silence, and also when accompanied by labels, then the labels cannot be said to have a facilitative or constructive role in the formation of this category.
Even for categories that infants can form in silence, however, it is interesting to compare the effect that different auditory stimuli might have on this process: in this case, not to see which auditory stimuli facilitate category formation, but instead, which stimuli disrupt it. According to the auditory overshadowing hypothesis (Robinson & Sloutsky, 2004), simultaneous presentation of auditory and visual information can interfere with the processing of the visual information, especially when the auditory stimuli are unfamiliar to the infant (Robinson & Sloutsky, 2004, 2007a, 2007b, 2010; Sloutsky & Robinson, 2008). For example, Robinson and Sloutsky (2007a) employed the familiarization/novelty-preference paradigm and familiarized 8- and 12-month-old infants with pictures either in silence, with a label, or with a laser-sound sequence. The authors found that, consistent with their auditory overshadowing hypothesis, auditory stimuli (both the label and the laser-sound sequence) hindered infants' ability to form categories which they were able to form in silence. In another series of studies with 8- to 16-month-old infants, Robinson and Sloutsky (2004; Sloutsky & Robinson, 2008) found that non-speech sounds interfered with infants' processing of corresponding visual information, but, at least in 16-month-olds, that pre-familiarizing infants with these sounds attenuated this disruptive effect. Together, these results suggest that concurrent auditory input (including labels) interferes with, rather than supports, visual processing in infants, and that the familiarity of such auditory input mediates auditory overshadowing.
Considering this evidence, a finding that infants form categories for a set of visual stimuli under labeling but not when accompanied by other sounds can, therefore, be interpreted in two sharply contrasting ways. First, as argued by Waxman and colleagues, it could indicate that labels have a constructive role and promote category formation, but that nonlinguistic sounds do not share this facilitative role. A second possibility, in line with the auditory overshadowing hypothesis, is that certain auditory signals (e.g., labels) have no effect on category formation for that specific set of stimuli and that infants would form the same categories in silence, but that certain nonlinguistic sounds disrupt the formation of these categories. The way to distinguish between these alternatives is clear: we need to examine if infants are able to form categories for a specific stimulus set in the absence of any auditory input, establishing a baseline against which constructive or disruptive effects of auditory stimuli can then be evaluated. Although the work by Waxman and colleagues tested a wide range of auditory stimuli, unfortunately, none of these studies included such a critical 'silent' condition. Therefore, it remains unclear whether, in their studies, labels and related auditory stimuli indeed facilitated object categorization, or whether they merely did not disrupt categorization of the visual stimuli that infants would have achieved also in silence. Likewise, it remains unclear whether other, nonlinguistic auditory signals had no effect on categorization (as claimed by Waxman and colleagues), or whether they disrupted categorization of the visual stimuli.
The latter possibility would be in accord with the auditory overshadowing hypothesis, which argues that the disruptive effect of certain auditory signals is due to their unfamiliarity.

Table 1. Infants' ability to form object categories in the presence of different sounds during familiarization in studies using the dinosaur/fish stimulus set.

In order to disambiguate these contrasting possibilities, here, we extended the body of work by Waxman and colleagues by, in a first experiment, running the missing silent control condition with the dinosaur/fish stimulus set used in the majority of their studies. In a second experiment, we then tested contrasting explanations for the role of auditory signals in categorization: whether, as claimed by Waxman and colleagues, infants recognize certain auditory signals as communicative and, therefore, use them for category formation, or whether, in line with the auditory overshadowing hypothesis, the greater familiarity of such stimuli merely prevents the disruption of category formation.

Experiment 1
This experiment tested whether young infants were able to form categories for the dinosaur/fish stimuli used by Waxman and colleagues in the absence of auditory input. This experiment, therefore, contributed a crucial control condition to Waxman and colleagues' work to distinguish whether labels and label-like sounds facilitate category formation for these stimuli, or merely do not disrupt infants' categorization that is based purely on the visual appearance of the stimuli. Based on findings of previous work (e.g., Behl-Chadha, 1996; Quinn & Eimas, 1996) that infants from 3 months of age can form visual categories for drawings and pictures of animals and other objects in the absence of any auditory input, we hypothesized that the 6-month-olds in this experiment would successfully form categories.

Participants
Seventeen healthy, full-term 6-month-olds (M = 5.9 months, range = 5.6-6.5 months, 9 females) participated in this experiment. Infants were recruited from a database where parents had previously indicated interest in participating in research studies. An additional 14 infants were tested but excluded from all analyses due to failure to meet the looking time criterion (a minimum of 20% of looking during the whole visual presentation or the familiarization phase, N = 9), looking behavior during familiarization being identified as an outlier (2 SDs above mean looking time; N = 1), fussiness (N = 3), or parent-reported facial muscle disorder (N = 1). Informed consent was received from the parents, and all infants were reported to be primarily exposed to English at home. The number of infants was determined by a power analysis based on Ferguson and  which indicated that a sample of 17 participants would yield a power of 0.81 to detect a t-value of 2.12 in a one-sample t-test. The dropout rate was comparable to those in the studies by Waxman and colleagues using the same stimulus set (e.g., Ferguson & Waxman, 2016:

Materials
The dinosaur/fish stimulus set used in various studies by Waxman and colleagues (see Table 1) was used. The first set of pictures consisted of nine colored line-drawn dinosaurs, and the other set consisted of nine colored line-drawn fish. Each picture in a set depicted a different exemplar of the same category. All exemplars were about 15 cm × 15 cm in size, outlined in black, filled with a unique solid color, and depicted against a white background. Of the nine pictures in each set, eight were used in the familiarization phase, and the remaining one in the test phase (following Waxman and colleagues' studies, all infants saw the same exemplar of fish and dinosaur during the test phase). The two sets of pictures were matched for color, and, as in Waxman and colleagues' work, both exemplars (fish and dinosaur) used in the test phase were of the same color.

Procedure
The procedure closely followed that employed by Waxman and colleagues in the described studies. After explaining the procedure to the parents and obtaining written informed consent, infants were seated on their parent's lap at a distance of approximately 65 cm from a 22-in. computer screen. Parents were blind to the purpose of the study and were instructed not to speak during the experiment or influence their infant's attention in any way during the visual presentation. The lights in the room were then dimmed for the infants to concentrate on the visual presentation. A 5-point calibration sequence was used to calibrate a Tobii X120 remote eye tracker (sampling frequency 60 Hz, system accuracy = 0.5 degrees). During calibration, a looming circle with accompanying sound was displayed at five locations (the four corners and the center) on the screen. Calibration was repeated until all five points had been calibrated successfully.
Following calibration, a colorful spinning wheel was shown at the center of the screen to focus infants' attention. Infants were then familiarized with eight colored pictures from the same category, either all dinosaurs or all fish, counterbalanced across infants. The pictures were presented one at a time, in random order, for 20 s each, at either the left or right side of the screen. The lateral position of the picture in the first familiarization trial was counterbalanced across infants and alternated thereafter. Pictures were presented in silence.
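The presentation rules in this paragraph can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' presentation script: the function name, the picture labels, and the seed are hypothetical, and only the constraints stated above (eight pictures, random order, 20 s each, alternating sides with a counterbalanced first side) are taken from the text.

```python
import random

TRIAL_DURATION_S = 20  # each picture is shown for 20 s (from the text)


def familiarization_schedule(pictures, first_side, seed=None):
    """Return a list of (picture, side, duration_s) familiarization trials.

    pictures   -- the eight familiarization pictures of one category
    first_side -- "left" or "right"; counterbalanced across infants
    seed       -- optional seed so that a schedule can be reproduced
    """
    if first_side not in ("left", "right"):
        raise ValueError("first_side must be 'left' or 'right'")
    rng = random.Random(seed)
    order = list(pictures)
    rng.shuffle(order)  # random presentation order
    # The side of the first trial is given; sides alternate thereafter.
    sides = ("left", "right") if first_side == "left" else ("right", "left")
    return [(pic, sides[i % 2], TRIAL_DURATION_S) for i, pic in enumerate(order)]


# Hypothetical picture labels, for illustration only.
pictures = [f"dinosaur_{i}" for i in range(1, 9)]
schedule = familiarization_schedule(pictures, first_side="left", seed=1)
for trial in schedule:
    print(trial)
```

Passing an explicit seed is a design choice made here so that a given infant's order could be regenerated later; the paper does not say how randomization was implemented.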
The familiarization phase was followed by a test phase consisting of two silent test trials. A colorful spinning wheel was shown at the center of the screen to direct infants' attention to this location prior to each test trial. In the first test trial, infants were shown two new pictures side-by-side approximately 10 cm apart from each other for 20 s: a new member of the familiarized category (e.g., another dinosaur) and a member of the other category (e.g., a fish). The second test trial was identical to the first one, apart from the lateral positioning of the test stimuli being reversed. The lateral position of the picture from the novel category in the first test trial was counterbalanced across infants. It should be noted that, typically, there was only a single test trial in all studies by Waxman and colleagues; the additional test trial in the present study was to ensure that the infants did not have a side bias.
After the test phase, parents were fully debriefed and received reimbursement for their transport costs. In addition, a storybook was given to the infants as a token of thanks.

Results
The raw data of this experiment were the infants' looking times towards the visual stimuli during familiarization and test. All data were processed and, where necessary, transformed, and their distributions were checked prior to analysis (see details below for each phase).

Looking during familiarization
Average looking times in the first three trials (Block 1), in the last three trials (Block 2), and across all eight trials were calculated for all infants (see Table 2 for Ms and SDs). The data were checked, by means of visual inspection of Q-Q plots, to be normally distributed and fit for analysis.
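The block averages described here can be computed as in the following minimal sketch. The function name and the looking times are hypothetical and serve only to illustrate the arithmetic; they are not data from the study.

```python
from statistics import mean

N_FAMILIARIZATION_TRIALS = 8  # eight familiarization trials (from the text)


def block_means(looking_times):
    """Average looking time (s) for Block 1, Block 2, and all trials.

    Block 1 is the first three trials and Block 2 the last three trials,
    as described in the text. `looking_times` holds one value per trial,
    in presentation order.
    """
    if len(looking_times) != N_FAMILIARIZATION_TRIALS:
        raise ValueError("expected one looking time per familiarization trial")
    return {
        "block1": mean(looking_times[:3]),
        "block2": mean(looking_times[-3:]),
        "overall": mean(looking_times),
    }


# Hypothetical looking times (s) for one infant, for illustration only.
example = [12.0, 10.5, 9.0, 8.0, 7.5, 6.0, 5.5, 5.0]
result = block_means(example)
print(result)
```

A lower Block 2 than Block 1 mean is the pattern that a decrease in looking across familiarization would produce for an individual infant.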
In order to examine whether familiarization had occurred by the end of the familiarization phase, the average looking times by block were submitted to a 2 within (Block: 1 vs. 2) × 2 between (Category: dinosaur vs. fish) factorial mixed ANOVA. The mixed ANOVA revealed a significant main effect of block (F(1, 15) = 5.90, p = .028, ηp² = .28); all other main effects and interactions were non-significant (all Fs ≤ 4.24, ps ≥ .057). These results suggest that the infants were familiarized with the visual stimuli by the end of the familiarization phase, irrespective of the category they were familiarized with.

Preferential looking during test
For each infant, a novelty preference score, the amount of time that they spent looking at the stimulus from the novel category divided by the amount of time they spent looking towards either stimulus, was calculated for each test trial (see Table 3 for Ms and SDs). 2 The novelty preference scores were then transformed using an arcsine-root transformation to allow the bounded proportional data to be analyzed using linear models, which assume dependent variables to be unbounded (for clarity, we report untransformed scores). The transformed data were checked to be fit for analysis by visual examination of Q-Q plots.
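The score and its transformation can be written out as a short sketch. The function names and the example looking times are hypothetical; the formulas follow the definitions in the paragraph above (proportion of looking to the novel-category stimulus, then an arcsine-root transformation).

```python
import math


def novelty_preference(novel_s, familiar_s):
    """Proportion of total looking time directed at the novel-category stimulus."""
    total = novel_s + familiar_s
    if total <= 0:
        raise ValueError("infant did not look at either stimulus")
    return novel_s / total


def arcsine_root(p):
    """Arcsine-root transform of a proportion: asin(sqrt(p))."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.asin(math.sqrt(p))


# Hypothetical looking times (s) on one test trial, for illustration only.
score = novelty_preference(novel_s=6.0, familiar_s=4.0)
transformed = arcsine_root(score)
print(score, transformed)
```

Under this transform a chance-level score of .50 maps to asin(√.5) = π/4 (about 0.785), so a test carried out on the transformed scale would compare against that value. Whether the analysis was set up this way is not stated in the text, so this is an assumption of the sketch.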
A preliminary paired t-test was conducted to compare the novelty preference scores on test trials 1 and 2. The difference between the novelty preference scores on the two trials was not significant (t(16) = 1.14, p = .273). Hence, subsequent analyses were conducted on the average novelty preference scores. A 2 (Category: dinosaur vs. fish) × 2 (Position of Stimulus from Novel Category in First Test Trial: left vs. right) between-subjects ANOVA was conducted on the average novelty preference scores. No significant main effects or interactions were found (all Fs ≤ 2.09, ps ≥ .172). These results indicate that the infants' novelty preference did not differ reliably by the category they were familiarized with or by the lateral position of the stimulus from the novel category in the first trial. Subsequent analyses were therefore collapsed across category and position of novel stimulus in the first test trial.
The average novelty preference scores were compared against chance (.50). As in previous studies, a looking preference towards the stimulus from the novel category that is significantly above chance indicates that infants have formed a category of the familiarization stimuli. A one-sample t-test was conducted on the average novelty preference scores. The infants reliably showed a preference for the stimulus from the novel category (t(16) = 2.13, p = .049, d = 0.52; see Fig. 1). For the results to be more comparable to those of Waxman and colleagues' studies, which typically only used one test trial, we also conducted separate one-sample t-tests on the data for each test trial. The infants showed a novelty preference in the first test trial (t(16) = 3.28, p = .005, d = 0.77), but not in the second test trial (t(16) = −0.71, p = .490).
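For readers who want to trace the arithmetic of a comparison against chance, the following sketch computes a one-sample t statistic and a standardized effect size. It assumes the common definition d = (M − μ) / SD, which gives d = t / √n; the text does not state how d was computed, and the scores below are hypothetical. With the averaged scores, 2.13 / √17 ≈ 0.52 under this definition.

```python
import math
from statistics import mean, stdev


def one_sample_t(scores, mu):
    """One-sample t statistic, degrees of freedom, and Cohen's d against mu.

    Assumes d = (mean - mu) / SD, so that t = d * sqrt(n).
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    sd = stdev(scores)  # sample standard deviation (n - 1 in the denominator)
    d = (mean(scores) - mu) / sd
    t = d * math.sqrt(n)
    return t, n - 1, d


# Hypothetical novelty preference scores, for illustration only.
scores = [0.6, 0.5, 0.7, 0.4, 0.8]
t_value, df, d_value = one_sample_t(scores, mu=0.5)
print(t_value, df, d_value)
```

The p value is omitted because the Python standard library has no t distribution; a statistics package would be needed for that step.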

Discussion
The results of this experiment, consistent with the hypothesis and findings from previous research (e.g., Behl-Chadha, 1996; Quinn & Eimas, 1996), show that 6-month-old infants were able to form categories of the presented stimuli in the absence of any auditory stimuli. Critically, this experiment used the same visual stimuli as those in the studies by Waxman and colleagues (Ferguson & Waxman, 2016; Ferry et al., 2010, 2013; Fulkerson & Waxman, 2007; Perszyk & Waxman, 2016, 2019), who argued that in their studies, labels and other speech-like auditory signals supported, promoted, facilitated, or boosted categorization of these stimuli by infants. The results of the current experiment therefore warrant an important re-interpretation of these claims. Our results suggest that in Waxman and colleagues' studies, labels and other communicative auditory signals had no (supportive or detrimental) effect on infants' formation of categories of these stimuli, but that other sounds interfered with categorization. Therefore, in sharp contrast to Waxman and colleagues' interpretation, the effect of labels and other communicative auditory signals in these studies should be seen as non-disruptive, rather than facilitative.
Although, here, we have shown successful category formation of the dinosaur/fish stimuli in silence only for 6-month-old infants, we can assume that, in line with the existing literature on visual categorization in silence, slightly younger and older infants would likewise be able to categorize these stimuli in silence. Thus, assuming that the infants in the studies by Waxman and colleagues were also able to categorize these stimuli in silence, our results can be combined with their work to provide a rich dataset of how sounds and visual category formation interact. Re-interpreting the results presented in Table 1, we can assume that these studies provided no evidence that word labels, lemur vocalizations in 3- to 4-month-olds, pre-familiarized 'communicative' tone sequences, and non-native language with similar rhythmic patterns as the infants' native language support categorization of this stimulus set in infants. Likewise, our results suggest that, in these studies, tone sequences, backward speech, lemur vocalizations in older infants, zebra finch song, pre-familiarized 'non-communicative' tone sequences, and non-native language with different rhythmic patterns from the infants' native language interfered with category formation. This re-interpretation is in line with the auditory overshadowing hypothesis, which argues that it is the familiarity of auditory stimuli that determines whether or not they disrupt the processing of visual stimuli. We next turn to a more detailed investigation of this assumption. While the auditory overshadowing hypothesis attributes a central role to the familiarity of auditory stimuli in preventing disruption of categorization, familiarity is not the only explanation that has been put forward to account for this effect. An alternative account is that infants have sophisticated abilities to detect the communicative nature of auditory stimuli and distinguish communicative from noncommunicative sounds, using the former but not the latter to guide category formation. Following this alternative account, Ferguson and Waxman (2016) built on previous results which had shown that when 6-month-old infants heard sine-wave tone sequences accompanying visual stimuli, they did not show evidence of category formation (Fulkerson & Waxman, 2007). Ferguson and Waxman, assuming a special role for linguistic (communicative) signals, asked whether providing such tones with a communicative function would then promote categorization. They found that at test, only infants in a 'communicative' condition (having been pre-familiarized with beeps used in a communicative way within a dialogue) looked longer at the out-of-category item, whereas infants in a 'non-communicative' condition (having been pre-familiarized with the same beeps, but uncoupled from the dialogue) showed no preference for either stimulus. They interpreted these results as showing that 6-month-olds had detected the communicative function of tone sequences in the communicative condition and thus accepted them as label-like category markers, promoting categorization of the visual stimuli.
Footnote 2. While previous studies (e.g., Ferry et al., 2010, 2013) based their analyses on the first 10 s of accumulated looks for each infant, we note that only eight (Experiment 1) and four (Experiment 2) of our infants accumulated >10 s of looking during each test trial. Here, we argue that our approach is more conservative than that in Waxman and colleagues' studies, as any looking beyond the first 10 s of accumulated looks would likely attenuate the looking difference between the two stimuli, as shown in our results on Trial 2. We also conducted the same analyses on the first 10 s of each test trial, and the same result patterns were obtained.
With the results from our Experiment 1 suggesting the reinterpretation that in Ferguson and Waxman's (2016) study the beeps in the communicative condition had no effect on categorization, but those in the non-communicative condition had a disruptive effect, we can ask what caused the difference between these conditions. One possible explanation is that apart from the synchrony of auditory and visual information, the videos in the pre-familiarization phase of the communicative and non-communicative conditions differed in other ways. In the communicative condition, the two actresses sat still next to each other and engaged in a conversation, with occasional hand gestures. In contrast, the actresses in the non-communicative condition were engaged in more dynamic activity with more body movements. This more dynamic visual component may have led the infants to allocate more attention to the visual information, interfering with their encoding of the auditory information, so that the beeps remained less familiar in this condition. Another explanation is that the asynchrony between visual and auditory signals in the non-communicative condition disrupted infants' familiarization to the auditory signals. Infants process synchronous and asynchronous audiovisual information differently, attending more readily to the auditory information in a synchronous audiovisual presentation, while allocating more attention to the visual information in an asynchronous audiovisual presentation (Bahrick, Lickliter, & Castellanos, 2013). These explanations are, again, compatible with the auditory overshadowing hypothesis, because they suggest that in the 'non-communicative' condition, the auditory information was encoded less deeply and thus, was less familiar. 
We put this explanation to the test in our second experiment by pre-exposing infants to the beep sequence without accompanying potentially disruptive visual information, but crucially, therefore, also without the beeps taking a communicative role. According to the 'beeps-as-communicative-signals' hypothesis, under this condition, infants should not form a category because the non-communicative auditory signals do not promote category formation. In contrast, according to the auditory overshadowing hypothesis, infants should form a category (as in silence) because the pre-familiarized auditory signals would lose their disruptive function.

Experiment 2
This experiment closely followed Ferguson and Waxman's (2016) study, with the difference that during the exposure (pre-familiarization) phase, infants listened to the beep sequences without accompanying visual stimuli. We predicted, based on the auditory overshadowing hypothesis, particularly the results of Sloutsky and Robinson (2008), that infants in this experiment would also successfully form categories.

Fig. 1. Infants' average novelty preference scores in both experiments.
Note. Error bars represent standard errors. Chance level (.50) is indicated by dotted line. Significant difference between preference score and chance performance is marked by a single asterisk (p < .05).

Participants
Seventeen healthy, full-term 6-month-olds (M = 5.9 months, range = 5.5-6.4, 10 females) who did not take part in Experiment 1 were recruited from the same database to take part in this experiment. An additional 11 infants were tested but excluded from all analyses due to failure to meet the looking time criterion (a minimum of 20% of looking during the whole visual presentation or the familiarization phase, N = 6), fussiness (N = 4), or technical problems (N = 1). Caregivers provided informed consent and reported that their infants were primarily exposed to English at home.

Auditory stimuli.
An audio recording of sine-wave tone sequences (approximately 2 min in duration, played at 54-76 dB), extracted from the video recordings used in the exposure phase in Ferguson and Waxman's (2016) study, was used in the exposure phase. A tone sequence (2.2 s in duration, played at approximately 65 dB) that differed in both rhythm and pitch (400 Hz) from the tone sequences presented in the exposure phase, also adopted from Ferguson and Waxman's study, was paired with the presentation of each picture in the familiarization phase.

Procedure
The procedure of this experiment was the same as Experiment 1, with the following exceptions: (1) Two loudspeakers, concealed by black fabric, were placed below and to the immediate left and right of the computer screen for the presentation of auditory stimuli. Prior to dimming the lights and calibration, the infants participated in an exposure phase, during which they listened to the 2-min audio recording of sine-wave tone sequences while the screen remained blank. (2) In the familiarization phase, the presentation of each picture was paired with the 400-Hz tone sequence. As in Ferguson and Waxman (2016), the tone sequence was played at picture onset and repeated 10 s later.

Results
Analysis of the data followed that of Experiment 1. The raw data of this experiment were the infants' looking times towards the visual stimuli during familiarization and test. All data were processed and, where necessary, transformed, and their distributions were checked prior to analysis (see details below for each phase).

Looking during familiarization
Average looking times in Block 1, Block 2, and across all eight trials were calculated for all infants (see Table 2 for Ms and SDs). The data were checked, by means of visual inspection of Q-Q plots, to be normally distributed and fit for analysis.
To examine whether familiarization had occurred by the end of the familiarization phase, the average looking times by block were submitted to a 2 within (Block: 1 vs. 2) × 2 between (Category: dinosaur vs. fish) factorial mixed ANOVA, which revealed a significant main effect of block (F(1, 15) = 17.65, p < .001, ηp² = .54); all other main effects and interactions were non-significant (all Fs ≤ 0.15, ps ≥ .709). These results suggest that the infants in this experiment were familiarized with the visual stimuli by the end of the familiarization phase, irrespective of the category they were familiarized with.
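As a quick consistency check on the reported effect size, partial eta squared can be recovered from an F ratio and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below is illustrative only; the function name is our own and is not taken from the authors' analysis scripts.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F(1, 15) = 17.65 is the main effect of block reported for Experiment 2.
eta_block = partial_eta_squared(17.65, 1, 15)
print(round(eta_block, 2))  # 0.54
```

The recovered value agrees with the effect size reported for the main effect of block.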

Preferential looking during test
A novelty preference score was computed for all infants (see Table 3 for Ms and SDs). These scores were then arcsine-root transformed and checked to be fit for analysis. A preliminary paired t-test was conducted to compare the novelty preference scores on test trials 1 and 2 of this experiment. The difference between the novelty preference scores on the two trials was not significant (t(16) = −1.78, p = .094). Thus, subsequent analyses were conducted on the average novelty preference scores. Next, a 2 (Category: dinosaur vs. fish) × 2 (Position of Stimulus from Novel Category in First Test Trial: left vs. right) between-subjects ANOVA was conducted on the average novelty preference scores. No significant main effects and interactions were found (all Fs ≤ 0.55, ps ≥ .472), suggesting that the infants performed similarly irrespective of the category they were familiarized with and the lateral position of the stimulus from the novel category in the first trial. Subsequent analyses were, therefore, collapsed across category and position of novel stimulus in the first test trial.
A one-sample t-test was conducted on the average novelty preference scores against chance-level performance. As in Experiment 1, the infants reliably showed a preference for the stimulus from the novel category (t(16) = 2.70, p = .016, d = 0.65), indicating that the infants had formed a category of the familiarization stimuli. To make our results more comparable to those of Ferguson and Waxman (2016), which only included one test trial, we, again, conducted separate one-sample t-tests on the novelty preference scores of each test trial. The infants showed a novelty preference in the first test trial (t(16) = 2.29, p = .036, d = 0.56), but not in the second test trial (t(13) = 0.63, p = .542).
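The scoring and test against chance described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the novelty preference score is assumed to be the proportion of looking directed at the novel-category stimulus, the chance level (.50) is assumed to be passed through the same arcsine-root transform as the scores, Cohen's d is assumed to be the mean difference divided by the sample standard deviation, and the looking times are invented for demonstration.

```python
import math
from statistics import mean, stdev


def novelty_preference(look_novel, look_familiar):
    """Proportion of looking time directed at the novel-category stimulus."""
    total = look_novel + look_familiar
    if total <= 0:
        raise ValueError("no looking recorded on this trial")
    return look_novel / total


def arcsine_root(p):
    """Arcsine-square-root transform of a proportion in [0, 1]."""
    return math.asin(math.sqrt(p))


def one_sample_t(values, mu):
    """Return (t, df, Cohen's d) for a one-sample t-test against mu."""
    n = len(values)
    sd = stdev(values)  # sample SD (n - 1 denominator)
    d = (mean(values) - mu) / sd
    return d * math.sqrt(n), n - 1, d


# Hypothetical looking times in seconds (novel, familiar); NOT the study's data.
trials = [(6.1, 3.9), (5.2, 4.8), (7.0, 3.0), (4.5, 5.5), (6.4, 3.6)]
scores = [arcsine_root(novelty_preference(n, f)) for n, f in trials]
chance = arcsine_root(0.5)  # assumption: chance is transformed like the scores
t, df, d = one_sample_t(scores, chance)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

Under this definition, d equals t divided by the square root of the sample size. For the reported t(16) = 2.70 with N = 17, this gives approximately 0.65, which is consistent with the reported effect size.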

Discussion
The results of this experiment were in line with the prediction and converge with those from Sloutsky and Robinson (2008) in showing that increasing the familiarity of novel auditory signals can remove or attenuate their disruptive effects on infants' visual categorization where visual and auditory information are presented simultaneously. This finding, together with Experiment 1, provides a simple perceptual explanation of Ferguson and Waxman's (2016) results. Ferguson and Waxman argued that it was infants' understanding of the communicative function of beeps that promoted their category formation. In contrast, our results suggest that stimulus familiarity alone can account for their findings (while acknowledging that we cannot provide direct evidence that the difference between Ferguson and Waxman's (2016) two conditions was indeed familiarity per se). As such, these results are in keeping with the auditory overshadowing hypothesis which argues that unfamiliar auditory stimuli can disrupt visual categorization, but that familiar auditory stimuli do not.
Notwithstanding our results, brief pre-familiarization with auditory stimuli is not always sufficient to prevent interference with visual categorization. In one study using the same visual stimuli as here, Perszyk and Waxman (2016) pre-familiarized 6-month-old infants with backward speech, which had been shown to disrupt categorization in 6-month-olds (Ferry et al., 2013), but found that even after pre-familiarization, infants did not form a category. Perszyk and Waxman concluded that not all familiar auditory signals support categorization in infants (but given our results here, we would interpret these results as showing that even pre-familiarized backward speech disrupted categorization). It is likely that the effect of pre-familiarization on auditory overshadowing is shaped by interactions between the complexity of the auditory stimuli and the nature of the pre-familiarization exposure.

General discussion
Building on a substantial body of work by Waxman and colleagues, we used the same visual stimuli that, in several studies, had been argued to show that labels and other communicative signals exert an enabling role on preverbal infants' category formation, and examined if infants could form a category of these stimuli also in the absence of any auditory input. We found that 6-month-old infants were able to form categories of these visual stimuli in silence, suggesting a re-interpretation of the results from that body of work: as infants are able to categorize these stimuli in silence, it cannot be argued that labels and similar auditory signals have an enabling effect on their categorization of these stimuli, but merely that they did not disrupt categorization. In contrast, tone sequences and other unfamiliar auditory stimuli disrupted visual categorization. In a second experiment, we asked whether the nondisruptive function of some auditory signals is best explained by their communicative relevance, as argued by Waxman and colleagues, or by infants' familiarity with these stimuli, as argued in the auditory overshadowing hypothesis. We found that pre-familiarization on auditory stimuli without accompanying visual input, and without a communicative function, prevented disruption of infants' visual category formation.
We would like to stress that the aim of the present work was not to argue against the central claim that auditory labels can shape object categorization in infants already during the first year of life. This claim has been verified in other studies (Althaus & Westermann, 2016; Plunkett et al., 2008) that included silent control conditions. We do, however, reiterate here the simple argument put forward by Plunkett et al. (2008) and Althaus and Westermann (2016): to conclusively show that labels promote object categorization, it is necessary to show that infants do not form the same categories without labels. If they do, we cannot argue that labels facilitate or enable formation of these categories; instead, the natural conclusion is that labels have no effect on their formation. In Plunkett et al.'s (2008) study, it was found that infants in silence formed two categories when familiarized on a specific set of stimuli, but when all stimuli were paired with the same label, infants formed a single category. Similarly, Althaus and Westermann (2016) showed that infants formed a single category of a stimulus set in silence, but when half of the stimuli were paired with one label and the other half with another, infants formed two categories that aligned with the labels.
We also do not argue against the notion that when labels do shape categorization, they do so by highlighting the commonalities among category members. Indeed, Althaus and Mareschal (2014) showed that labels led 12-month-old infants to look more at features that were similar among familiarized objects, and this, in turn, helped infants to discriminate which of two perceptually similar objects was a new category member of the familiarized category.
Our results do, however, speak to the debate about the mechanisms underlying the interactions between labels and visual information in infant categorization. In the absence of a silent control condition, the body of work by Waxman and colleagues was originally interpreted as strong and pervasive evidence of a constructive role of labels and other communicative signals in the formation of categories, and of no role, either constructive or disruptive, for other auditory stimuli. The conclusions drawn from these results were that infants early in life have the capacity not only to distinguish communicative from noncommunicative signals, but also to utilize communicative signals in specific ways (Perszyk & Waxman, 2018). Since only communicative sounds were assumed to have an enabling role in category formation, these results were also argued to be strong evidence against an associative mechanism in linking objects and sounds.
In light of our results, this interpretation needs to be re-evaluated. First, we can conclude that, at present, it is an open question whether infants privilege communicative sounds to enable their category formation. To our knowledge, only two studies have shown that labels do support category formation above and beyond forming the same categories in silence (Althaus & Westermann, 2016; Plunkett et al., 2008), and no such enabling role has yet been demonstrated for other sounds. Thus, although the question of whether infants have a special sensitivity to communicative sounds for this purpose, or whether enabling effects are due to the greater familiarity of language over other sounds, remains open, our results in Experiment 2 support the latter assumption (see also Sloutsky & Robinson, 2008). We also find that, surprisingly, evidence for a constructive role for labels in categorization in infancy is limited. The studies by Althaus and Westermann (2016) and Plunkett et al. (2008) showed this for 10-month-olds, but at present, we do not know if younger infants have the same ability.
In conclusion, the present study has shown that the results reported in the series of studies by Waxman and colleagues cannot be taken as supportive evidence for their claim that labels and novel communicative signals can facilitate categorization in infants. Instead, combined with our results, they support the auditory overshadowing hypothesis and provide a rich dataset of which types of auditory stimuli interfere with infants' visual categorization. Further research can address the question under which circumstances nonlinguistic sounds do not disrupt, or perhaps even facilitate, infants' category formation, and the developmental changes in such effects.

Data availability
Data will be made available on request.