The association of brightness with number/duration in human newborns

Human neonates spontaneously associate changes in magnitude across the dimensions of number, length, and duration. Do these particular associations generalize to other pairs of magnitudes in the same way at birth, or do they reflect an early predisposition to expect specific relations between spatial, temporal, and numerical representations? To begin to answer this question, we investigated how strongly newborns associated auditory sequences changing in number/duration with visual objects changing in levels of brightness. We tested forty-eight newborn infants in one of three bimodal stimulus conditions in which auditory numbers/durations increased or decreased from a familiarization trial to the two test trials. Auditory numbers/durations were paired with visual objects in familiarization that remained the same on one test trial but changed in luminance/contrast or shape on the other. On average, results indicated that newborns looked longer when changes in brightness accompanied the number/duration change as compared to no change, a preference that was most consistent when the brightness change was congruent with the number/duration change. For incongruent changes, this preference depended on trial order. Critically, infants showed no preference for a shape change over no shape change, indicating that infants likely treated brightness differently than a generic feature. Though this performance pattern is somewhat similar to previously documented associations, these findings suggest that cross-magnitude associations among number, length, and duration may be more specialized at birth, rather than emerging gradually from postnatal experience or maturation.


A Assumption Checks
We first checked the form of the raw distribution of looking times for test trials (Figure 1). The distribution appears skewed to the right, consistent with the observation by Csibra et al. (2016) that infant looking times are well described by a log-normal distribution. In addition, a large proportion of observations occurred at 60 s, indicating that a number of infants might have continued to look at the monitor had trials not ended at that time. In other words, some of the looking times were only partially observed (right-censored); if this is ignored in statistical analysis, some cell means and their corresponding error variances may be underestimated, and the size of any interactions may be reduced. See the Appendix for additional assumption checks.
In addition to the histogram calculated over raw looking times, Figure 1 shows the best fitting normal and log-normal distributions, calculated with and without accounting for censoring using the fitdistrplus package in R (Delignette-Muller & Dutang, 2015). Without assuming censoring, the AIC value for the best-fitting normal distribution was 835.02 and the AIC value for the best-fitting log-normal distribution was 843.36. Assuming censoring, the AIC value for the best-fitting normal distribution was 767.04 and the AIC value for the best-fitting log-normal distribution was 750.48. The model with the lowest AIC, and thus the best fit, is the censored log-normal. We thus expect that the model that best accounts for the raw data, ignoring any effects of experimental manipulation, will be one fitted with log-normally distributed looking times and that additionally accounts for censoring.
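The comparison above can be illustrated with a minimal Python sketch of a right-censored log-likelihood and the corresponding AIC. This is not the fitdistrplus implementation: the function names, the fixed 60 s ceiling argument, and the choice to pass the parameters in directly (rather than estimating them by maximum likelihood) are assumptions made for illustration only.

```python
import math

def norm_logpdf(x, mu, sigma):
    # Log density of a normal distribution at x.
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def norm_logsf(x, mu, sigma):
    # Log of P(X > x) for a normal variable, via the complementary error function.
    return math.log(0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2))))

def censored_loglik(times, mu, sigma, ceiling=60.0, lognormal=False):
    # Observations at the ceiling contribute P(T >= ceiling);
    # all other observations contribute their density.
    # For the log-normal case, mu and sigma describe log(T).
    total = 0.0
    for t in times:
        x = math.log(t) if lognormal else t
        c = math.log(ceiling) if lognormal else ceiling
        if t >= ceiling:
            total += norm_logsf(c, mu, sigma)
        else:
            total += norm_logpdf(x, mu, sigma)
            if lognormal:
                total -= math.log(t)  # Jacobian of the log transform
    return total

def aic(loglik, n_params=2):
    # Akaike information criterion; lower values indicate better fit.
    return 2 * n_params - 2 * loglik
```

Because the log-normal density includes the Jacobian term, AIC values for the normal and log-normal fits are on the same scale (raw looking times) and can be compared directly, provided both are computed with the same treatment of censoring.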
More important for the validity of statistical models than the distribution of the raw looking times are the normality and homogeneity of variance of the residuals, which underlie the interpretation of standard errors (and thus statistical significance tests via F tests). After fitting an analysis of variance of raw looking times as a function of First Test, Trial Type, and Condition, we examined the model residuals as a function of the fitted values to detect any violations of homogeneity of variance, and a quantile-quantile (Q-Q) plot to assess potential non-normality of the residuals.
The Residuals-vs-Fitted plot in Figure 2A shows an important effect of censoring in which the maximum value of the residuals declines as the fitted values increase. In addition, the lower bound for looking times at 0 causes the minimum residual value to increase as the fitted values decrease. The Q-Q plot in Figure 2B shows deviations from normality at the tails of the distribution that are likely the result of the lower bound at 0 and the upper bound at 60 s. Taken together, these suggest that adjustments to the classical ANOVA model may be necessary for accurate assessment of uncertainty in the model's estimates.
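The dependence of the residual range on the fitted value follows directly from the bounds on looking times. A small Python sketch makes this concrete; the function name and the example fitted values are hypothetical and are not taken from the data.

```python
def residual_bounds(fitted, floor=0.0, ceiling=60.0):
    # A residual is (observed - fitted). With observed looking times
    # confined to [floor, ceiling], the residual is confined to
    # [floor - fitted, ceiling - fitted].
    return floor - fitted, ceiling - fitted

# Illustrative fitted values in seconds (not from the study).
for fitted in (20.0, 35.0, 50.0):
    lowest, highest = residual_bounds(fitted)
    print(f"fitted={fitted:5.1f}  residuals in [{lowest:6.1f}, {highest:5.1f}]")
```

As the fitted value rises, the largest possible residual shrinks and the smallest possible residual becomes more negative, which produces the sloped edges of the residual cloud described above.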

B Notes on Regression Model for the Current Experiment
We first explain the basic model structure and how it expands on the classical ANOVA model. We then summarize 6 models that examine the effects of different modeling choices: Bayesian vs. maximum-likelihood versions of the log-linear and linear models without censoring, and Bayesian models with censoring.
Each factor in an ANOVA with number of levels n is coded as n − 1 separate predictors in a regression. Each subcondition within each factor corresponds to a unique numerical score given by a standard, centered coding scheme for each predictor. (Centering, that is, subtracting the mean, is essential for the interpretation of the coefficients for main effects in the presence of interactions. Since all predictors were centered, the intercept term corresponds to the grand mean and each coefficient corresponds to the effect size of its predictor, averaging over the others.) We contrast-coded Trial Type (1-Change = -0.5; 2-Change = 0.5) and First Test (1-Change First = -0.5; 2-Change First = 0.5). Thus, the coefficient for the Trial Type predictor can be interpreted as the average difference in (raw or log) looking times between the two trial types (2-Change minus 1-Change). The coefficient for the First Test predictor can be interpreted as the average difference in looking time at test between the two trial orders (average looking time for 2-Change First newborns minus average looking time for 1-Change First newborns).
We simple-coded Condition with the Shape-Change group as the reference group. Simple codes are equivalent to centered dummy codes. Since there were three conditions, this required 2 simple codes: one for which the coefficient reflects the average difference between Congruent and Shape-Change looking times, and one for which the coefficient reflects the average difference between Incongruent and Shape-Change looking times. For the first code, rows from the Congruent condition had a score of 2/3 while the other two conditions had a score of −1/3. For the second code, rows from the Incongruent condition had a score of 2/3 while the other two conditions had a score of −1/3.
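The coding schemes described above can be written out as a short Python sketch. The helper name simple_codes and the dictionary names are hypothetical; the numerical scores are those given in the text. Exact fractions are used so that the centering can be checked without rounding error.

```python
from fractions import Fraction

# Contrast codes for the two-level factors, as given in the text.
TRIAL_TYPE = {"1-Change": Fraction(-1, 2), "2-Change": Fraction(1, 2)}
FIRST_TEST = {"1-Change First": Fraction(-1, 2), "2-Change First": Fraction(1, 2)}

def simple_codes(levels, reference):
    # Simple codes are dummy codes (1 for the target level, 0 otherwise)
    # with 1/k subtracted, where k is the number of levels.
    # One code is produced per non-reference level.
    k = len(levels)
    targets = [lv for lv in levels if lv != reference]
    return {
        lv: tuple(
            (Fraction(1) if lv == target else Fraction(0)) - Fraction(1, k)
            for target in targets
        )
        for lv in levels
    }

CONDITION = simple_codes(
    ["Congruent", "Incongruent", "Shape-Change"], reference="Shape-Change"
)

# Each code sums to zero across the three (equally sized) conditions,
# which is what makes the scheme centered.
column_sums = [sum(codes[i] for codes in CONDITION.values()) for i in range(2)]
```

With equal group sizes, every code sums to zero across conditions, which is why the intercept can be read as the grand mean.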
Interaction terms are generated by multiplying the codes of different factors: each possible pair of codes for 2-way interactions and each possible triple of codes for 3-way interactions. For example, the 2-way interaction between Trial Type and Condition is represented by two terms: one representing how the effect of Trial Type may differ between the Congruent and Shape-Change conditions and one representing how the effect of Trial Type may differ between the Incongruent and Shape-Change conditions.
We also included z-scored familiarization looking time as a predictor.
In all, the regression model includes 13 coefficients of interest: 1 representing the intercept, 1 representing familiarization time as a covariate, 1 representing Trial Type, 1 representing First Test, 2 representing Condition, 1 representing the interaction between Trial Type and First Test, 2 representing the interaction between Condition and First Test, 2 representing the interaction between Condition and Trial Type, and 2 representing the 3-way interaction. In addition, the model included an intercept term for each newborn; these were constrained to be deviations from the main intercept and were modeled as being generated by a normal distribution.
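As a check on the coefficient count, the following Python sketch builds one row of the fixed-effects design matrix from the codes described above. The column names and the function design_row are hypothetical. The Trial Type by First Test product is included on the assumption that the model is fully factorial, as the description of interaction terms as all pairwise and triple products implies.

```python
# Codes as described in the text (floats for brevity).
TRIAL_TYPE = {"1-Change": -0.5, "2-Change": 0.5}
FIRST_TEST = {"1-Change First": -0.5, "2-Change First": 0.5}
CONDITION = {
    "Congruent": (2 / 3, -1 / 3),
    "Incongruent": (-1 / 3, 2 / 3),
    "Shape-Change": (-1 / 3, -1 / 3),
}

def design_row(trial_type, first_test, condition, fam_z):
    # fam_z is the z-scored familiarization looking time for this newborn.
    tt = TRIAL_TYPE[trial_type]
    ft = FIRST_TEST[first_test]
    c1, c2 = CONDITION[condition]
    return {
        "Intercept": 1.0,
        "FamTime_z": fam_z,
        "TrialType": tt,
        "FirstTest": ft,
        "Cond_Congruent": c1,
        "Cond_Incongruent": c2,
        # 2-way interactions: products of pairs of codes.
        "TrialType:FirstTest": tt * ft,  # assumed (full factorial)
        "FirstTest:Cond_Congruent": ft * c1,
        "FirstTest:Cond_Incongruent": ft * c2,
        "TrialType:Cond_Congruent": tt * c1,
        "TrialType:Cond_Incongruent": tt * c2,
        # 3-way interactions: products of triples of codes.
        "TrialType:FirstTest:Cond_Congruent": tt * ft * c1,
        "TrialType:FirstTest:Cond_Incongruent": tt * ft * c2,
    }

row = design_row("2-Change", "1-Change First", "Congruent", fam_z=0.0)
n_coefficients = len(row)
```

The per-newborn intercepts are not part of this row; they enter the model as additional deviations from the main intercept.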
In constructing the regression models, we incorporated the spirit of AN(C)OVA by assigning one prior to each factor, that is, to the batch of coefficients belonging to that factor. Each batch was assigned a normal prior centered at zero with a standard deviation 3 times the residual SD from the data reported in de Hevia et al. (2014), with the following exceptions: (1) since it was not of primary interest, the Intercept was assigned a prior with a mean at the empirical mean and a standard deviation of 1, and (2) the subjects' individual intercepts were drawn from a normal distribution with mean 0, and the standard deviation was given a Cauchy(0, 1) hyperprior. In addition, to fix issues with divergent transitions in the Hamiltonian Monte Carlo chain due to data sparsity, the standard deviation was multiplied by a scaling factor with a standard normal prior (for details on non-centered parameterization, see Betancourt & Girolami, 2013).
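The non-centered parameterization of the subject intercepts can be sketched as a prior draw in plain Python. This is an illustration of the idea only, not the sampler code used for the reported models. The function name is hypothetical, and restricting the Cauchy(0, 1) draw to positive values (a half-Cauchy) is an assumption made here because a standard deviation cannot be negative.

```python
import math
import random

def draw_subject_intercepts(n_subjects, rng):
    # Standard deviation of the subject intercepts: half-Cauchy(0, 1),
    # drawn by the inverse-CDF method (assumption: positive half only).
    u = rng.random()
    tau = abs(math.tan(math.pi * (u - 0.5)))

    # Non-centered form: sample standard normal scaling factors z,
    # then multiply by tau, instead of sampling intercepts from
    # Normal(0, tau) directly.
    z = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
    intercepts = [tau * zi for zi in z]
    return tau, z, intercepts

rng = random.Random(1)  # fixed seed so the draw is reproducible
tau, z, intercepts = draw_subject_intercepts(48, rng)
```

Both forms imply the same prior on the intercepts; the non-centered form separates the scale from the standardized deviations, which tends to give the sampler an easier geometry when each subject contributes few observations.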

D Notes on Aggregate Model
To build an analogous regression model for the aggregated data across the current and published work, in addition to contrast-coded Trial Type (1-Change = -0.5 and 2-Change = 0.5), First Test (1-Change First = -0.5 and 2-Change First = 0.5), and z-scored Familiarization Time, we constructed a new coding scheme for Condition to account for the following contrasts across the 5 aggregated groups of 16 infants each with different familiarization types: