Neuroscience and Biobehavioral Reviews
Vision dominates audition in adults but not children: A meta-analysis of the Colavita effect

The Colavita effect occurs when participants respond only to the visual element of an audio-visual stimulus. This visual dominance effect is proposed to arise from asymmetric facilitation and inhibition between modalities. It has also been proposed that, unlike adults, children appear predisposed to auditory information. We provide the first quantitative synthesis of studies exploring the Colavita effect, combining data from 70 experiments across 14 studies. A mixed-meta-regression model was applied to assess whether the Colavita effect is influenced by methodological factors and age group tested. Studies reporting response time data were used to test for the presence of asymmetrical facilitation between modalities. Studies with adult participants yielded a medium, approaching large, effect size. Studies exploring the Colavita effect in children yielded no Colavita effect. Across adult and child studies, no methodological factors influenced the effect. Contrary to asymmetrical facilitation, response time data suggested a general slowing under bimodal conditions. These findings suggest that whilst vision dominates in adults, this effect is absent in childhood.


Introduction
Our world is perceived through multiple senses, but it is unclear whether information from all senses is treated equally. Whilst reading this paper, are you more likely to be distracted by the sight of an email pop-up on your screen, or the sound of your phone ringing? Furthermore, if your phone rings and an email pops up simultaneously, which do you respond to first? The answer to these questions may lie with sensory dominance. Colavita (1974) and Colavita et al. (1976) reported that when participants were presented with an auditory and a visual stimulus simultaneously they responded as though only the visual stimulus had occurred, and frequently reported having not perceived the auditory stimulus at all. This Colavita effect was found even when the auditory stimulus (a tone) was presented at twice the subjective intensity of the visual stimulus (a light), ruling out a simple explanation of physical inequality between the two modalities (Colavita, 1974). A Colavita error is defined as occurring when participants respond only to the visual element of a bimodal, in this case audio-visual, target. This effect has been used to imply a hierarchy of sensory processing in which visual information is given precedence.
Multiple studies have since replicated the Colavita effect, although the extent of the effect does appear to depend on the specific instructions given to participants. Studies conducted in the decade following the original study used two response keys and instructed participants to "make a response appropriate to the signal recognised first" (Colavita, 1982; Colavita and Weisberg, 1979; Johnson and Shapiro, 1989; Shapiro et al., 1984). These studies found Colavita "errors" to occur on a relatively large number of bimodal trials, ranging from 38 to 98%. In these studies, however, participants were instructed to make only one response (to that which was recognised first) but it is possible that the participants still perceived both auditory and visual signals. More recent studies (Koppen and Spence, 2007a,b,c,d) instructed participants to press both keys on bimodal trials. Although the number of visual-only responses was smaller in these studies (0.9-12.1%), these error rates remained significantly higher than auditory-only responses, thus demonstrating the Colavita effect.
In contrast, variations in other task manipulations do not appear to influence the Colavita effect. Qualitative reviews of the literature exploring visual precedence in adults (Spence, 2009; Spence et al., 2012) have concluded the Colavita effect to be relatively insensitive to manipulations of stimulus intensity (Colavita, 1974; Shapiro and Johnson, 1987), attention bias to one or other modality created by the experiment (Egeth and Sager, 1977; Koppen and Spence, 2007a,c; Sinnett et al., 2007), response demands (Egeth and Sager, 1977; Hecht and Reiner, 2009; Koppen and Spence, 2007c; Sinnett et al., 2007) and stimulus complexity (Koppen et al., 2008; Sinnett et al., 2007). This suggests that visual precedence may have an origin beyond simply response bias. However, since the previous review was descriptive, and over ten large studies have been published since, a quantitative update of the review is essential. Therefore, the primary aim of the current study was to quantify how robust the Colavita effect is, and, furthermore, whether it can be manipulated by task demands or age group tested.
https://doi.org/10.1016/j.neubiorev.2018.07.012 Received 18 October 2017; Received in revised form 18 July 2018; Accepted 22 July 2018
The additional factor of age may be of particular importance to the sensory dominance literature. Robinson and Sloutsky (2004) and Barnhart et al. (2018) assessed sensory dominance in 4-year-olds and 5-12-year-olds respectively. Findings from these studies suggested that visual dominance may develop across the lifespan and that children may be auditory dominant. Wille and Ebersbach (2016) suggest a shift occurring around 9 years of age, as they found 9-year-olds showed Colavita effects, albeit weaker than the effects seen in adults. Indeed, the auditory system undergoes substantial development in utero (Graven and Browne, 2008a) whereas the visual cortex undergoes lengthy, protracted development throughout childhood (Graven and Browne, 2008b). Consequently, children may rely less upon vision, and more upon audition, early in life. In line with this it has been shown that young children struggle to ignore auditory information when focusing upon visual stimuli (Hanauer and Brooks, 2003) and children manifest smaller, sometimes reverse, Colavita effects (Nava and Pavani, 2013; Wille and Ebersbach, 2016). Given this, a comparison of the Colavita effect across studies using different age groups is of great theoretical interest.
A further aim of the current study was to explore the mechanisms underpinning the Colavita effect. Sinnett et al. (2008) proposed that the appearance of visual precedence is due to an asymmetrical inhibitory-facilitatory relationship between vision and audition. They report that, in simple detection tasks (using a single key), presenting auditory and visual stimuli together facilitated response times. Conversely, in discrimination tasks (using multiple keys), presenting auditory and visual stimuli together impeded response times. In a second experiment, using a simple detection task, they found that auditory stimuli facilitated response times to visual targets whilst visual stimuli impaired response times to auditory targets. These opposing effects have been used to infer an asymmetrical inhibitory-facilitatory relationship between audition and vision. Sinnett et al. (2008) propose that this asymmetrical relationship might result in Colavita errors, as when participants are presented with bimodal targets the 'internal threshold' for responding to visual targets is reached sooner than that for auditory targets (Spence, 2009). Thus visual processing interferes with, and delays, auditory target detection, and speeded responses are most likely to be visual-only responses (Spence, 2009). This hypothesis is supported by event-related potential (ERP) data showing that ERPs to audio-visual stimuli occur at an increased latency relative to auditory-only ERPs and a decreased latency relative to visual-only ERPs (Molholm et al., 2002).
On the other hand, previous literature has suggested vision facilitates audition and vice versa. In simple response time tasks (using one response key) response times to bimodal targets are typically faster than to unimodal targets (the redundant target effect; Diederich and Colonius, 2004; Forster et al., 2002; Gondan et al., 2005; Sinnett et al., 2008). Furthermore, detection thresholds for luminance appear lower (Frassinetti et al., 2002), and the saliency (Noesselt et al., 2008) and perceived brightness (Odgaard et al., 2003) of visual events increase with simultaneous sound. Similarly, irrelevant visual stimuli can enhance auditory detection (Lovelace et al., 2003) and increase the perceived loudness of simultaneously presented sounds (Odgaard et al., 2004). However, Odgaard et al. (2004) suggest different processes may underpin facilitation between modalities, as the effect of audition upon vision might arise from decisional processes, whilst the effect of vision upon audition may hold sensory origin.
A general, symmetrical, model of multisensory facilitation is consistent with additivity, whereby neural responses elicited from bimodal targets are greater than responses to unimodal elements (Meredith and Stein, 1986). However, asymmetrical effects upon response times are not necessarily incompatible with additivity. For example, although visual and auditory evoked ERPs are asymmetrically influenced by one another with respect to latency, the amplitude of ERPs to audio-visual stimuli is greater than the sum of both unimodal auditory and unimodal visual responses (Molholm et al., 2002). However, it has yet to be established how physiological models of multisensory integration can accommodate asymmetries in cross-modal influences.
Given the mixed literature regarding symmetrical versus asymmetrical inhibition and facilitation between vision and audition, we aimed to test this within the existing Colavita literature. The hypothesis of Sinnett et al. (2008) is based upon findings from a simple detection task (using one response key). In contrast to this, many Colavita studies have utilised multiple response keys. Sinnett et al. note that with multiple response keys slowing can be observed. As such, we assessed whether asymmetrical response time effects are observed within the wider Colavita literature, in which multiple response keys were sometimes used.
The current paper provides the first quantitative synthesis of literature exploring the Colavita effect. The primary objectives of this analysis were to a) quantify how robust the Colavita effect is (i.e. making a unimodal visual response when bimodal stimuli are presented), b) test whether the Colavita effect is sensitive to experimental manipulations and age, and c) use available response time data to assess the presence of symmetrical versus asymmetrical facilitation between audition and vision. Given the specific predictions provided by Sinnett et al. with regards to auditory versus visual modalities, and the audiovisual nature of the Colavita effect in original reports (Colavita, 1974), we focus on studies comparing auditory versus visual modalities. Nevertheless, it should be noted that the Colavita effect has since been extended to the visual-tactile domain (Hecht and Reiner, 2009; Occelli et al., 2010). By including data from multiple studies we can overcome some of the limitations of individual studies. Small sample sizes have been used in many cases and effect sizes vary. For instance, Colavita's early (1974; 1976; 1979) experiments contained very few participants (n = 10) and trials (35 trials per participant, 5 bimodal).
To allow comparison between the present quantitative review and the qualitative review by Spence (2009) we included variables highlighted by Spence (2009) as potential moderator variables. Specifically, we predicted that the Colavita effect would be insensitive to manipulations of:
• Number of response keys (2 or 3). Note that studies including only a single response key were considered for the response time analysis only as Colavita errors cannot be made with a single response key.
• Ratio of visual, auditory and bimodal targets (and in one case no target present¹).
• Attentional manipulation: was attention biased towards the visual or auditory modality either through arousal, cueing, perceptual biasing (if the light was twice the subjective intensity of the sound), or via instructional manipulation (participants asked to attend to or respond only to auditory information).
• Whether auditory and visual stimuli were perceptually matched in intensity (either subjectively or based upon thresholds).
• Stimulus congruency: A stimulus could be "congruent" semantically, e.g. a picture of a cat and the sound of a cat, or spatially, e.g. a visual stimulus on the left and a sound on the left.
• Asymmetric facilitation and inhibition. We included studies using Colavita tasks that also reported response times to test the prediction of Sinnett et al. (2008) that response times to visual stimuli are faster under bimodal conditions, whilst response times to auditory stimuli are slower under bimodal conditions.

Search and inclusion criteria
Studies were retrieved and selected using the guidelines outlined in PRISMA (Moher et al., 2009). Fig. 1 outlines the search strategy used. Studies were found by searching the electronic databases Scopus, PubMed and Web of Science (July 2016-August 2017) and reviewing the references of studies sourced. Initial search terms included: Colavita effect (64 hits across all databases), Colavita (362 hits across all databases) and sensory dominance (256 hits across all databases). The following inclusion criteria were then applied:
• Studies using a choice response time task to compare responses to unimodal and bimodal stimuli in humans (Fig. 1; box b).
• Studies comparing responses to auditory, visual and audio-visual targets (Fig. 1; box c).
• Studies available to the author in English (Fig. 1; box c).
• Sources in which full text could be sourced (i.e. meeting abstracts and posters excluded - Fig. 1; box c).
• Studies where error data and/or response time data for bimodal (audio-visual) stimuli could be sourced (either within the paper or via personal communication with the author - Fig. 1; box d). Notably, because response time analyses were performed to examine the effect of vision on audition and vice versa, response time data needed to be available for unimodal visual targets and visual targets in the presence of auditory stimuli and/or unimodal auditory targets, and auditory targets in the presence of visual stimuli.
• Studies conducted upon healthy participants (children and adults). For example, in two cases data were sought from the healthy control group of larger studies (Moro and Steeves, 2012, 2013).
Many of the studies sourced included multiple experiments, each containing its own conditions/comparisons. For example, Wille and Ebersbach (2016) conducted three experiments each containing three age groups, in which three levels of congruency were explored, thus providing 27 experiments for the purposes of our analysis. By breaking down each study into its component experiments a total of 125 experiments were available for analysis. Details of these studies can be found in Table 1.
Of the studies and experiments available, only those that provided sufficient information for the calculation of effect size data were included to explore the following dependent variables:
1. The overall Colavita effect as defined in Eq. (1), where Vb refers to the percentage of visual-only responses made on bimodal trials and Ab refers to the percentage of auditory-only responses made on bimodal trials (15 studies, 71 experiments). Note that we use ratio scores in order to place the effects observed in all studies on the same scale (i.e. a study yielding 60% "visual only" responses and 20% "auditory only" responses shows the same level of visual dominance over audition as a study with 6% "visual only" versus 2% "auditory only").
3. Response times to unimodal auditory targets vs. auditory targets paired with a visual stimulus (11 studies, 25 experiments).

Statistical analyses
Table 1. Details of experiments considered for analysis broken down by experiment and condition. Tick boxes indicate whether details necessary for the calculation of Cohen's $d_{av}$ were available (i.e. sample size, mean and standard deviation or standard error). Abbreviations within the "Attentional manipulation" and "Congruency" columns are as follows: C = Congruent, I = Incongruent, V = Visual, A = Auditory. If nothing is stated then this was either not manipulated or not reported within the obtained article. * Value indicates n for healthy control condition. For example, Moro and Steeves (2012; 2013) both included 11 participants who had undergone monocular enucleation; these participants were not included.
Effect sizes were calculated for the percentage visual-only vs. auditory-only errors on bimodal trials (Colavita and reverse Colavita effects) as well as response times under unimodal visual vs. bimodal visual and unimodal auditory vs. bimodal auditory conditions. Calculation² of weighted effect sizes (see below) and model fitting were conducted using the metafor package in R (Viechtbauer, 2010). Cohen's guidelines of 0.2, 0.5, and 0.8 were used to define small, medium and large effect sizes for descriptive purposes. Given the wide range of contexts under which the Colavita effect has been explored, a random effects rather than a fixed effects meta-regression model was applied (Thompson and Higgins, 2002). Furthermore, the majority of studies included reported a range of differences in experimental procedure. As such these factors were held as moderator variables to explore whether they could account for the variance of effect size between studies.
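To make the random-effects idea concrete, the following is a minimal Python sketch of inverse-variance pooling with a between-study variance term. It is an illustration only: it uses the DerSimonian-Laird moment estimator, which is a swapped-in simplification and not necessarily the estimator used in the metafor analysis reported here, and the function name `random_effects_pool` is hypothetical.

```python
import math

def random_effects_pool(effects, variances):
    """Pool effect sizes under a random-effects model.

    ASSUMPTION: tau^2 is estimated with the DerSimonian-Laird moment
    estimator; the paper's own model fitting was done in metafor and may
    have used a different estimator.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, floored at 0
    # Random-effects weights add tau^2 to each sampling variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

# Hypothetical example: two equally precise experiments that disagree
pooled, se, tau2 = random_effects_pool([0.0, 1.0], [0.01, 0.01])
```

When experiments disagree by more than their sampling error allows, tau^2 becomes positive and the standard error of the pooled estimate widens, which is the behaviour that motivates a random-effects rather than fixed-effects model for heterogeneous studies.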

Outliers
In line with the guidelines outlined by Viechtbauer (2010), outliers and influential cases were identified and examined if:
a) The absolute DFFITS value was larger than $3\sqrt{p/(k-p)}$, where p is the number of model coefficients and k the number of studies, suggesting the average effect size to be influenced by inclusion of the ith study.
b) Cook's distance exceeded $\chi^2_{p,\,0.5}$, indicating the Mahalanobis distance between studies to be decreased following the deletion of the ith study.
c) The study was shown to have considerable leverage upon the fit of the model based upon a hat value larger than $3(p/k)$.
For further information on these parameters see Viechtbauer (2010). Combined effect sizes are shown including and excluding influential studies. These studies were not included within the modelling of moderator variables.
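As a rough illustration of how two of these cut-offs scale with model size, the sketch below computes the DFFITS and hat-value thresholds, assuming they read as 3√(p/(k − p)) and 3(p/k). The function name `influence_thresholds` and the example values of p and k are hypothetical; the Cook's distance criterion is omitted because it needs a chi-square quantile, which the Python standard library does not provide.

```python
import math

def influence_thresholds(p, k):
    """Return (DFFITS cut-off, hat-value cut-off).

    p: number of model coefficients, k: number of studies.
    ASSUMPTION: cut-offs are 3*sqrt(p / (k - p)) and 3 * (p / k).
    """
    if k <= p:
        raise ValueError("need more studies than model coefficients")
    dffits_cutoff = 3 * math.sqrt(p / (k - p))
    hat_cutoff = 3 * (p / k)
    return dffits_cutoff, hat_cutoff

# Hypothetical example: intercept-only model (p = 1) with 10 experiments
dffits_cutoff, hat_cutoff = influence_thresholds(p=1, k=10)
```

Both thresholds grow as moderators are added (larger p) and shrink as more experiments are included (larger k), so the same experiment can be flagged in one model and not in another.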

Calculation of effect sizes
Measures of effect size were calculated using Hedges' $g_{av}$, derived using Cohen's $d_{av}$, where the average standard deviation of both sets of observations ($S_{av}$) is used as a standardizer (Cumming, 2012; Cumming and Calin-Jagerman, 2017; Lakens, 2013) (Eq. (2)). We acknowledge that this is not the optimal measure of effect size for studying within-subject phenomena. Alternative effect size measures, such as Cohen's $d_{rm}$ (see Lakens, 2013), take into account the correlation (r) between measures. However, although r is typically reported for clinical pre-post test designs, r is not always reported in experimental designs where trials are intermixed and correlation is not of primary interest (Dunlap et al., 1996). Thus unless raw data can be obtained, r is not always available. Few solutions to this problem have been suggested. Borenstein et al. (2009) suggested estimating the correlation based upon related studies and performing sensitivity analyses with a range of plausible correlations. Alternatively, r can be estimated from available t and F statistics (Hullett and Levine, 2003). However, if these exact statistics are also unavailable one may need to estimate effect size directly from the means and standard deviations (Dunlap et al., 1996). Cohen's $d_{av}$ provides a convenient solution to this problem.
A further issue occurs, however, when calculating the variance around Cohen's $d_{av}$. Cumming (2012) describes an approximate method for this variance (Eqs. (3) and (4)) that depends on r. Thus if the researcher is unable to derive r from the available information, similar problems are faced when calculating the variance of Cohen's $d_{av}$.
To resolve this problem we utilised a method adapted from the calculation of variance for Cohen's d for independent samples (Eq. (5)). Note this is a conservative method yielding marginally wider confidence intervals, relative to Algina and Keselman's (2003) approximate method (Eqs. (3) and (4)), and thus assuming slightly greater variance. Where possible, we also calculated $V_{d_{av}}$ using Eq. (4) to estimate the true extent of the effect. For experiments studying the Colavita effect only 26 of the 71 experiments to be included contained sufficient information for calculation of r. In all of these cases our method proved to be more conservative; the mean variance was 0.114 (SD = 0.05) when calculated using Eq. (5) vs. 0.073 (SD = 0.03) when calculated using the approximate method outlined in Eq. (4) with knowledge of r. Whilst Cohen's $d_{av}$ is the most appropriate method for sample estimates, it may be positively biased for population estimates. For this reason a corrected Cohen's $d_{av}$, Hedges' $g_{av}$, was calculated using Eq. (6). Whilst the differences between $d_{av}$ and $g_{av}$ are very small, $g_{av}$ provides an unbiased estimate of effect size (see Cumming, 2012).
To summarise, Hedges' $g_{av}$ (Eq. (6)) was used as the effect size measure within our analysis. The variance of $g_{av}$ was calculated using Eq. (5), in which $d_{av}$ was substituted with $g_{av}$.⁵
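The pipeline summarised above can be sketched in Python as follows. Equations (2), (5) and (6) are not reproduced in this text, so every formula in the sketch is an assumption based on commonly cited textbook forms rather than a confirmed copy of the authors' equations: the standardizer is taken as the arithmetic mean of the two standard deviations, the small-sample correction as 1 − 3/(4df − 1) with df = n − 1, and the variance as the independent-samples formula with both group sizes set to n. All function names and example numbers are hypothetical.

```python
def cohens_d_av(mean_1, sd_1, mean_2, sd_2):
    """Mean difference standardized by the average SD.

    ASSUMPTION: S_av is the arithmetic mean of the two SDs.
    """
    s_av = (sd_1 + sd_2) / 2.0
    return (mean_1 - mean_2) / s_av

def hedges_g_av(d_av, n):
    """Small-sample bias correction applied to d_av.

    ASSUMPTION: correction factor 1 - 3 / (4 * df - 1) with df = n - 1
    for n participants measured in both conditions.
    """
    df = n - 1
    return d_av * (1.0 - 3.0 / (4.0 * df - 1.0))

def variance_independent_style(effect, n):
    """Variance borrowed from the independent-samples formula.

    ASSUMPTION: (n1 + n2) / (n1 * n2) + effect^2 / (2 * (n1 + n2)),
    with n1 = n2 = n, so no within-subject correlation r is required.
    """
    return (n + n) / (n * n) + effect ** 2 / (2.0 * (n + n))

# Hypothetical experiment: 12% visual-only vs 4% auditory-only errors
d = cohens_d_av(mean_1=12.0, sd_1=10.0, mean_2=4.0, sd_2=6.0)
g = hedges_g_av(d, n=20)
v = variance_independent_style(g, n=20)
```

The design point this illustrates is that none of the three steps needs the correlation between the paired measures, which is why the approach remains usable when only means, standard deviations and sample sizes are reported.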

Moderator variables
Given the range of contexts in which the Colavita effect has been explored, the studies included in our meta-analyses were heterogeneous in terms of the methods used. As such we explored the following 8 factors by including them as moderator variables within a mixed-effects model of the data:
• Number of response keys (2 or 3). Note that studies including only a single response key were considered for the response time analysis only, as Colavita errors cannot be made with a single response key.
• Ratio of visual, auditory and bimodal targets (and in one case no target present).
• Whether auditory and visual stimuli were perceptually matched in intensity (either subjectively or based on thresholds).
• Stimulus congruency: stimuli could be "congruent" semantically, e.g. picture of a cat and the sound of a cat, or spatially, e.g. a visual stimulus on the left and a sound on the left. Likewise stimuli could be "incongruent" semantically, e.g. a picture of a cat and sound of a dog, or spatially, e.g. visual stimulus on the left, auditory stimulus on the right.
• Attentional manipulation: was attention biased towards the visual or auditory modality either through arousal, cueing, perceptual biasing (e.g. if the light was twice the subjective intensity of the sound) or via instructional manipulation (e.g. participants asked to attend to or respond only to auditory information).
² Script available at https://osf.io/d7b3d/.
³ The equation used here is taken from Cumming and Calin-Jagerman (2017) but is also referred to as the common language effect size (Z) (Lakens, 2013; McGraw and Wong, 1992).
R.J. Hirst et al. Neuroscience and Biobehavioral Reviews 94 (2018) 286-301
Fig. 2. Effect sizes and 95% confidence intervals of studies reporting "visual only" responses on bimodal trials (the Colavita effect) and "auditory only" responses on bimodal trials. Symbol size reflects sample size. Weighted effect sizes are shown for all studies, all studies excluding outliers (asterisked experiments) and studies examining children and adults separately. Positive effect sizes indicate more "visual only" responses on bimodal trials. Negative effect sizes indicate more "auditory only" responses on bimodal trials.

Error data analyses: the Colavita effect
Fig. 2 illustrates the effect size of the Colavita effect in each experiment within each study. Positive effect sizes indicate more "visual only" responses on bimodal trials. Conversely, experiments with negative effect sizes found more "auditory only" responses on bimodal trials. The combined effect size estimate reached Cohen's standard for a small effect size, 0.44 (SE = 0.1), but was significant (p < .001). This suggests that participants made more visual-only responses under bimodal stimulus presentation than auditory-only responses. One experiment (Monem and Filmore, 2016, experiment 1.2.1) was identified as an influential case. Removal of this experiment decreased the overall effect size to 0.4 (SE = 0.09); however, this was still significant (p < .001).
To explore the effects of moderator variables a mixed meta-regression model was conducted in which the intercept was set to reflect the effect size of studies using the most frequently used experimental parameters (adult participants, simple stimuli that were neutral in congruency and attentional manipulation, a trial ratio of 40 (visual): 40 (auditory): 20 (bimodal), 2 response keys). All studies included in this analysis presented stimuli at fixed intensities.
The estimated amount of residual heterogeneity in this meta-regression model ($\tau^2$ = 0.23, SE = 0.06) suggested that the included moderator variables accounted for 42.54% of the variability. This was significant based upon an omnibus test (QM(12) = 47.46, p < .001). The intercept significantly differed from 0 (p < .001) with an effect size estimate of 0.79 (SE = 0.15). Only one factor, age group, significantly influenced this effect size estimate (p < .001), suggesting that experiments with child participants (aged 6-12 years) decreased this effect size by 0.89 (SE = 0.18). Six separate ANOVAs were then conducted to clarify the effect of each factor upon the intercept. These ANOVAs supported the mixed model, indicating that only age group influenced the effect size of the Colavita effect (see Table 2). It should be noted, however, that a test for residual heterogeneity was also significant (QE(56) = 211.66, p < .001), suggesting other factors not accounted for in this model are also likely to be important.
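The percentage of variability accounted for can be related to the residual heterogeneity through the usual pseudo-$R^2$ for mixed-effects meta-regression. The worked equation below assumes that the 42.54% figure is this pseudo-$R^2$ and that the labels RE (model without moderators) and ME (model with moderators) apply; the implied total heterogeneity of about 0.40 is a back-calculation from rounded values, not a number reported in the text.

```latex
R^2 = \frac{\hat\tau^2_{\mathrm{RE}} - \hat\tau^2_{\mathrm{ME}}}{\hat\tau^2_{\mathrm{RE}}}
\quad\Longrightarrow\quad
\hat\tau^2_{\mathrm{RE}} = \frac{\hat\tau^2_{\mathrm{ME}}}{1 - R^2}
= \frac{0.23}{1 - 0.4254} \approx 0.40
```

Read this way, the moderators reduce the estimated between-experiment variance from roughly 0.40 to 0.23, while the significant QE test indicates that the remaining 0.23 is still more than sampling error alone would produce.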

Effect of age group
A further model was fitted to directly compare the effect sizes of studies using adult and child participants (regardless of other factors).
For details of studies included in this comparison see Table 1, column 4 labelled Age group. Unlike the model described above, here we included studies using all types of ratio and stimuli (rather than only "typical" parameters). This model indicated that the effect size significantly differed from zero in adults (0.76, SE = 0.09; p < .001) but not children (−0.26, SE = 0.13; ns). Thus, although children appeared to show a small reverse Colavita effect, this did not reach significance. The effect size seen in experiments with children was significantly smaller than the effect size found in adults (p < .001).

Publication bias
To evaluate the presence of publication bias, data from studies included in model 1 (analysing the Colavita effect) were plotted as a funnel plot (Fig. 3). The amount of scatter around the true effect should decrease with decreased sampling variance/increased sample size, thus producing a classic "funnel" shape (Macaskill et al., 2001). Publication bias is associated with funnel plot asymmetry (Egger et al., 1997), whereby studies with large sampling variance/smaller sample size cluster to the left or right of the true effect. To quantify asymmetry a meta-analytic mixed effects regression analysis was performed, holding sample size as a predictor variable. This test indicated no significant asymmetry (z = 1.04, p = .3, Fig. 3), suggesting the reported findings were not influenced by publication bias.
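A regression test of this kind can be sketched in Python as below. This is a simplified stand-in, not the analysis that was run: it uses inverse-variance weights without a between-study variance component (a fixed-effect simplification of the mixed-effects model described above), and the function names and example data are hypothetical. The helper `two_sided_p` converts a z statistic into a two-sided p-value under a standard normal reference distribution.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def asymmetry_slope_test(effects, variances, predictor):
    """Weighted regression of effect size on a predictor (e.g. sample size).

    ASSUMPTION: weights are 1 / sampling variance, with no tau^2 term.
    Returns (slope, z, p); a slope far from 0 suggests funnel asymmetry.
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, predictor))
    swy = sum(wi * yi for wi, yi in zip(w, effects))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, predictor))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, predictor, effects))
    denom = sw * swxx - swx ** 2
    slope = (sw * swxy - swx * swy) / denom
    se_slope = math.sqrt(sw / denom)  # from the inverse of X'WX
    z = slope / se_slope
    return slope, z, two_sided_p(z)

# Hypothetical data: effect sizes unrelated to sample size
slope, z, p = asymmetry_slope_test(
    effects=[0.5, 0.5, 0.5],
    variances=[0.10, 0.05, 0.03],
    predictor=[10, 20, 30],
)
```

Under the normal reference distribution, `two_sided_p(1.04)` evaluates to roughly .30, in line with the p = .3 reported for z = 1.04.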

Asymmetrical facilitation: response time analyses
We used studies that had reported response times to auditory and visual stimuli under unimodal and bimodal conditions to investigate whether the Colavita effect occurs due to asymmetrical facilitation and inhibition (Sinnett et al., 2008). Our first analysis compared response times to visual stimuli presented with an auditory stimulus (i.e. bimodal) to response times to unimodal visual targets. This asks if auditory stimuli facilitate response times to visual targets. Our second analysis compared response times to auditory stimuli presented with a visual stimulus (i.e. bimodal) to response times to unimodal auditory targets. This asks if visual stimuli impede response times to auditory targets. Across both sets of analyses positive effect size values would indicate response times were faster to the target under bimodal conditions. Conversely, negative effect sizes would indicate response times were faster to the target in unimodal conditions. As in our analysis of Colavita errors, we also test the effect of moderators in both sets of analyses to investigate if response time effects were modulated by: ratio, response keys (1 versus 2, as response time data were not available for any study using three keys), stimulus category, congruency, attentional manipulation, age group and whether stimuli were matched in intensity. This latter factor could only be included for the effect of audition on response times to visual targets, as all studies comparing unimodal and bimodal visual response times matched stimulus intensity.
Table 2. Statistics resulting from additional analyses of variance (ANOVAs) exploring the effect of each factor upon the intercept of the mixed model (i.e. the overall effect size of the Colavita effect). One factor, age group, significantly influenced the effect size of the Colavita effect. df = degrees of freedom, QM = omnibus test statistic.

Comparing response times to visual stimuli presented unimodally and bimodally
The combined effect size resulting from comparing response times to visual stimuli under unimodal vs. bimodal conditions was −0.26 (SE = 0.17) and non-significant (Fig. 4). Two experiments (Egeth and Sager, 1977, experiment 4.2; Koppen and Spence, 2007b, experiment 2.1) were identified as influential outliers. Removal of these studies resulted in an effect size of −0.43 (SE = 0.13), which significantly differed from 0 (p < .001). Contrary to Sinnett et al.'s (2008) predictions of asymmetrical facilitation, response times were slower for visual stimuli accompanied by auditory stimuli compared to when they were presented alone.
To explore the effects of moderator variables a mixed meta-regression model was conducted in which the intercept (reference) was set to reflect the effect size of studies using the most frequently used experimental parameters, as above. This model indicated that 96.74% of the residual heterogeneity ($\tau^2$ = 0.01, SE = 0.04) was accounted for by the inclusion of moderator variables (QM(12) = 75.25, p < .001). The effect size estimate of the intercept was large (−0.95, SE = 0.12), and decreased in studies using ratios in which bimodal stimuli were more frequent (20:20:60, 25:25:50 and 33:33:33; yielding estimated changes of 1.67 (SE = 0.38, p < .001), 1.13 (SE = 0.44, p < .01) and 0.39 (SE = 0.18, p = .0277) respectively). Thus when bimodal trials were infrequent (20%) response times were slower to visual targets under bimodal conditions. However, when bimodal targets were more frequent (33%, 50% or 60%) this effect was decreased. The effect size was also decreased by 1.53 (SE = 0.36, p < .001) in studies using complex stimuli and increased by 1.34 (SE = 0.55, p = .0148) in experiments using congruent stimuli. In line with this, post-hoc ANOVAs showed a significant overall effect of ratio, stimulus category, and congruency upon the intercept whilst other factors did not yield a significant overall effect (Table 3). A test of residual heterogeneity was non-significant (QE(11) = 11.95, p = .37), suggesting there was no further heterogeneity not accounted for within the model.
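One way to read the ratio coefficients is to add each estimated change to the intercept, giving the model-implied effect size for an otherwise "typical" experiment run with that trial ratio. The short Python sketch below does this arithmetic. It assumes the moderators are dummy-coded against the 40:40:20 reference level and that the reported changes are added to the intercept; the variable names are hypothetical, and the implied values are illustrative rather than results reported in the text.

```python
# Intercept: typical parameters, including the 40:40:20 reference ratio
intercept = -0.95

# Reported changes relative to the reference ratio (visual:auditory:bimodal)
# ASSUMPTION: each change is added to the intercept (dummy coding)
ratio_change = {
    "40:40:20": 0.0,
    "20:20:60": 1.67,
    "25:25:50": 1.13,
    "33:33:33": 0.39,
}

# Model-implied effect size per ratio, other moderators at reference levels
implied_effect = {
    ratio: round(intercept + change, 2)
    for ratio, change in ratio_change.items()
}

for ratio, effect in implied_effect.items():
    print(f"{ratio}: {effect:+.2f}")
```

Under these assumptions the implied effect moves from −0.95 at 40:40:20 towards zero or above as bimodal trials become more frequent, which matches the description that the slowing to visual targets was reduced when bimodal targets were more common.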
Given the significant effect of ratio (i.e. the balance of audio-visual, unimodal visual and unimodal auditory trials) and stimulus category (i.e. simple stimuli such as flashes and tones versus complex stimuli such as images and naturalistic sounds) found above, two further models were fitted to directly compare the effect size of multisensory facilitation/interference of studies using different ratios and stimulus categories regardless of other factors. A further model was not fitted to explore the effect of congruency as this had only been manipulated in one study.
The model for ratio indicated that only studies using the ratio 40:40:20 yielded effect sizes that significantly differed from 0 (p < .001). This suggested that when bimodal trials were infrequent (20%), response times to visual stimuli were slower under bimodal conditions. However, when bimodal trials were more frequent (33%, 50% or 60%), response times were not significantly affected by auditory stimuli.
The model addressing stimulus category (simple vs. complex) revealed that only experiments using simple stimuli yielded an effect size that significantly differed from 0 (p < .001). This suggested that participants were slower to respond to visual stimuli paired with auditory stimuli but only when simple stimuli were used.
Overall, these findings were not consistent with the hypothesis that response times to visual targets would be faster under bimodal vs. unimodal conditions. Rather, they suggested that response times were slower to visual targets paired with auditory stimuli, particularly when the frequency of bimodal targets was low and when simple stimuli were used.

Comparing response times to auditory stimuli presented unimodally and bimodally
The combined effect size for unimodal auditory vs. bimodal auditory response times was medium (−0.57, SE = 0.08) and significant (p < .001). No experiments were identified as outliers.
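As a rough check on how an estimate and its standard error translate into a significance test and a confidence interval, the sketch below applies a normal (Wald) approximation to the reported values. This is an assumption: the text does not state which test statistic or adjustment was used, so the output is illustrative rather than a reproduction of the original analysis. The function name `wald_summary` is hypothetical.

```python
from statistics import NormalDist


def wald_summary(estimate, se, level=0.95):
    """Return the z statistic, two-sided p-value and confidence interval
    for an estimate, assuming a normal sampling distribution."""
    z = estimate / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z, p, (estimate - crit * se, estimate + crit * se)


# Combined effect size and SE as reported in the text.
z, p, (lower, upper) = wald_summary(-0.57, 0.08)
print(f"z = {z:.2f}, p = {p:.2g}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Under this approximation the interval lies entirely below zero, in keeping with the reported p < .001.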
A mixed meta-regression model was fitted for this effect in which studies using the parameters outlined as standard (see above) were used as the intercept. This model revealed no significant remaining heterogeneity (τ² = 0, SE = 0.04, QE(14) = 8.35, p = .8701) and a significant effect of moderators (QM(9) = 20.5, p = .0248). However, post-hoc ANOVAs did not indicate that any of the moderator variables significantly influenced the intercept (Table 4). From this we conclude that participants were slower for auditory targets paired with visual stimuli compared with unimodal targets, as can be seen in Fig. 5, and this was not modulated by experimental parameters.
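The residual heterogeneity tests can also be expressed as a percentage. The sketch below computes the I² statistic (Higgins and Thompson) from a Q statistic and its degrees of freedom, using the QE values reported for the two response time models. I² is not reported in the text, so these figures are an illustrative restatement of the reported Q tests, not additional results. The function name `i_squared` is hypothetical.

```python
def i_squared(q, df):
    """Percentage of variability attributed to heterogeneity rather than
    sampling error: I^2 = max(0, (Q - df) / Q) * 100."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Residual heterogeneity tests reported in the text: (QE, degrees of freedom).
residual_tests = {
    "visual targets model": (11.95, 11),
    "auditory targets model": (8.35, 14),
}

residual_i2 = {name: i_squared(q, df) for name, (q, df) in residual_tests.items()}

for name, value in residual_i2.items():
    print(f"{name}: residual I^2 = {value:.2f}%")
```

When Q falls below its degrees of freedom, as in the auditory model, the statistic is truncated at zero, which fits a τ² estimate of 0.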

Is the bimodal slowing effect between vision and audition symmetrical?
Contrary to the prediction based on the hypothesis of Sinnett et al. (2008), we found that vision slowed response times to auditory targets and vice versa. Robinson et al. (2016) noted that this might occur when multiple response keys are used, and conceptualised sensory dominance via the relative extent to which one sense slows another. They found that, when a single response key was used, visual stimuli slowed auditory response times more than auditory stimuli slowed visual response times. Moreover, when separate response options were available, auditory stimuli also slowed response times to visual stimuli. To test whether vision slowed response times to auditory targets more than audition slowed response times to visual targets, a final model was fitted to directly compare the effect sizes yielded in our former two comparisons. No significant difference was found, suggesting visual and auditory stimuli slowed response times to the opposing modality to a similar extent.
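The logic of comparing the two slowing effects can be sketched with a simple Wald-type test on the difference between two pooled estimates. This is not the model fitted in the analysis, which compared effect sizes within a single model; it is a simplified stand-in that also assumes the two estimates are independent, which is unlikely to hold when the same experiments contribute to both. The auditory-target estimate is taken from the text, whereas the visual-target estimate below is a hypothetical placeholder chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def compare_estimates(est_a, se_a, est_b, se_b):
    """Wald-type z test for the difference between two estimates,
    assuming independence and normal sampling distributions."""
    diff = est_a - est_b
    se_diff = sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p


# Slowing of auditory responses by vision: reported in the text.
vision_on_audition = (-0.57, 0.08)
# Slowing of visual responses by audition: HYPOTHETICAL placeholder values.
audition_on_vision = (-0.45, 0.10)

diff, z, p = compare_estimates(*vision_on_audition, *audition_on_vision)
print(f"difference = {diff:.2f}, z = {z:.2f}, p = {p:.3f}")
```

With these placeholder inputs the difference does not reach significance, which mirrors the form of the reported conclusion without reproducing its numbers.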

Discussion
The current study quantitatively demonstrates that Colavita errors, whereby participants report only the visual element of an audio-visual target, are a robust experimental phenomenon. Mixed-effects analyses also corroborated the suggestion that Colavita errors are relatively insensitive to response demands, attentional manipulation, stimulus ratio, stimulus complexity, and congruency. However, residual heterogeneity did remain within the model; it should therefore be noted that other factors not accounted for in our model are likely to influence the size of the Colavita effect.
Furthermore, we showed that the Colavita effect may be modulated by age, in that it is smaller, perhaps even reversed, in childhood. Although the current analysis includes only two childhood studies, these studies include data from a relatively large sample of 187 children aged between 6 and 12 years (Nava and Pavani, 2013, n = 51; Wille and Ebersbach, 2016, n = 136). If the tentative finding of a reversed Colavita effect in children appears in further studies this would be in line with evidence suggesting an auditory preference in childhood (Napolitano and Sloutsky, 2004; Robinson et al., 2016; Sloutsky, 2004, 2010; Sloutsky and Napolitano, 2003) and difficulty ignoring auditory distractions in childhood (Hanauer and Brooks, 2003). It should be noted that we used a binary categorisation of age group in our analysis ("adult" or "child"). Wille and Ebersbach (2016), however, reported a transition towards visual dominance around 9 years of age. As such, it must be considered that the size of the Colavita effect we report in children here likely differs between younger and older children. These previous findings together with the current data make an interesting case for the fluctuation of sensory dominance across the lifespan and highlight this as a field warranting further investigation.
Our response time analyses suggested that response times were slower for both visual and auditory stimuli when participants responded under bimodal rather than unimodal conditions and the effects of vision on audition and vice versa were not significantly different. The current study therefore does not suggest an asymmetrical relationship between vision and audition as proposed by Sinnett et al. (2008). They hypothesised a co-occurrence of multisensory facilitation and inhibition whereby auditory stimuli facilitate visual detection whilst visual stimuli inhibit auditory detection. This asymmetry was proposed to lead to the Colavita effect, since a visual response would be more likely to occur first on bimodal trials. An alternative, symmetrical, prediction is that response times are always faster under bimodal conditions. This would be expected based upon the known principles of multisensory integration, whereby neural responses elicited from bimodal targets are greater than unimodal targets (i.e. additive; see Stanford et al., 2004). However, our findings indicated that response times were in fact slower under bimodal conditions. This finding appears contrary to both asymmetric and symmetric models of multisensory facilitation. One likely explanation for our findings of slowing on bimodal trials is that most studies used at least two response keys, whereas previous literature finding multisensory facilitation (faster responses on bimodal trials) has used one response key (Forster et al., 2002; Gondan et al., 2005; Sinnett et al., 2008). Moreover, most Colavita studies traditionally present response time data only for correct trials. If multisensory facilitation does contribute to the Colavita errors, the beneficial effects of audition upon visual response times might be more evident within incorrect trials. For example, in order to respond to a bimodal target correctly (i.e. with both buttons) it may be that participants must first suppress the automatic tendency to respond towards only the visual target and then make the correct, bimodal, response. Thus, response times on correct trials would be slower due to the need to suppress automatic responses. This explanation is at present tentative.
Our analysis indicated that slowing of responses to visual targets by auditory stimuli was decreased in studies in which bimodal trials were more frequent. This contradicts previous findings by Sinnett et al. (2007, experiment 3), who found that the frequency of bimodal targets did not influence reaction times. Thus, although the influence of stimulus ratio on response times was not revealed at the single study level, combining across several studies did yield this effect. It is possible that a more equal distribution of unimodal and bimodal target types (33% visual, 33% auditory and 33% audio-visual) produces equivalent response times across targets by limiting effects such as novelty.
Only one adult study included in our analysis of Colavita errors yielded a clear reverse Colavita effect (Ngo et al., 2011). This study utilised a repetition detection variant of the Colavita paradigm. Participants were required to detect (n-1) repetitions in auditory, visual and audio-visual information. The temporal demands of this task, however, were predicted to introduce auditory dominance (Welch and Warren, 1980). Ngo and colleagues also predicted that this would be exaggerated by the longer lasting nature of echoic vs. iconic short-term memory. The reversal of the Colavita effect in this study is therefore attributed to a greater visual masking of targets by intervening irrelevant items under visual vs. auditory conditions. In line with this, if the intervening item was semantically meaningless (a pattern mask or a burst of white noise), neither auditory nor visual dominance was observed.
Finally, it is notable that Colavita errors are not the only method by which sensory dominance has been operationalised, and other methods have not consistently indicated visual dominance in adults. As outlined in our final analysis of response times, Robinson et al. (2016) propose that, in adults, when a single response key is used, auditory stimuli slow response times to visual targets more than vice versa (suggesting auditory dominance). Conversely, when multiple separate responses are required to visual, auditory, and bimodal targets (as in many of the included Colavita studies) visual dominance is seen. Interestingly, Barnhart et al. (2018) recently demonstrated that although auditory dominance effects (operationalised via response times) occurred in children and young adults, the reverse occurred in older adults. This indicates a shift in sensory dominance across the lifespan and enhanced visual dominance in later life. In the current analysis we found the extent to which vision slowed audition and vice versa did not differ, and this did not differ between adults and children. Nevertheless, this may have also been influenced by response times being based on correct trials (if slower responses were needed to make a correct response) and the limited number of child experiments included for analysis.

Conclusions
Fig. 5. Effect sizes and 95% confidence intervals for studies/experiments reporting response times (RT) for auditory targets under unimodal and bimodal conditions. Symbol size reflects sample size. Positive effect sizes indicate RT was faster under bimodal versus unimodal conditions. Negative effect sizes indicate RT was faster under unimodal versus bimodal conditions.

The current study provides an updated synthesis of literature surrounding the Colavita effect. The Colavita effect appears to be a robust phenomenon with medium effect size in adults, although not in children. The Colavita effect also appears insensitive to many experimental manipulations although it may be reversed under some designs (Ngo et al., 2011). This study highlights a need to examine the Colavita effect across the lifespan and suggests that visual dominance over audition may be weaker, or even reversed, in childhood.
Following this, and in answer to our original question, if you are an adult reading this paper you may be more distracted by an email pop-up than by your phone ringing. Furthermore, if your phone rings at the same time you see an email pop-up you may not answer (or hear) the phone at all. For this, you can blame sensory dominance.

Ethical conduct
The methodology included here was approved by the ethical review board of the School of Psychology, University of Nottingham, and conducted in accordance with the Declaration of Helsinki.

Funding
This work was supported by the Economic and Social Research Council [grant number ES/J500100/1].