Atypical processing of voice sounds in infants at risk for autism spectrum disorder

Adults diagnosed with autism spectrum disorder (ASD) show a reduced sensitivity (degree of selective response) to social stimuli such as human voices. In order to determine whether this reduced sensitivity is a consequence of years of poor social interaction and communication or is present prior to significant experience, we used functional MRI to examine cortical sensitivity to auditory stimuli in infants at high familial risk for later emerging ASD (HR group, N = 15), and compared this to infants with no family history of ASD (LR group, N = 18). The infants (aged between 4 and 7 months) were presented with voice and environmental sounds while asleep in the scanner, and their behaviour was also examined in the context of observed parent–infant interaction. Whereas LR infants showed early specialisation for human voice processing in right temporal and medial frontal regions, the HR infants did not. Similarly, LR infants showed stronger sensitivity than HR infants to sad vocalisations in the right fusiform gyrus and left hippocampus. Also, in the HR group only, there was an association between each infant's degree of engagement during social interaction and the degree of voice sensitivity in key cortical regions. These results suggest that at least some infants at high risk for ASD have atypical neural responses to human voice with and without emotional valence. Further exploration of the relationship between behaviour during social interaction and voice processing may help us better understand the mechanisms that lead to different outcomes in at-risk populations.


Introduction
One of the basic foundations for social communication is the human voice, which is arguably the most important acoustic stimulus in an individual's social environment as it carries important cues such as speaker identity and emotional state. Further, research with adults has revealed that cortical regions along the superior temporal sulcus (STS) show stronger activation when participants listen to human vocalisations (speech, laughter, crying, coughing, etc.) as compared to nonvocal environmental sounds and acoustically matched stimuli (Belin, Zatorre, Lafaille, Ahad, & Pike, 2000). Activation of these temporal voice-selective areas can also be modulated by emotional information carried on the voice (Grandjean et al., 2005), as can activation in other areas such as inferior prefrontal cortex (Fecteau, 2005), premotor cortical regions (Warren et al., 2006), the amygdala (Fecteau, Belin, Joanette, & Armony, 2007), and the insula and orbitofrontal cortex (Chikazoe, Lee, Kriegeskorte, & Anderson, 2014). Hence, there is compelling evidence that specific regions of the human brain respond to voice and emotional voice sounds. One important question, however, is how the network of specialized regions tuned to social information emerges in the developing human brain. Addressing this question is crucial not only to better understand typical development, but also to increase our understanding of disorders that involve impaired development of social cognition, such as autism spectrum disorders (ASD). Functional neuroimaging studies by our group and others have revealed that from early infancy the typically developing brain is tuned to perceive and process information carried by the voice (Dehaene-Lambertz, Dehaene, & Hertz-Pannier, 2002; Lloyd-Fox, Blasi, Mercure, Elwell, & Johnson, 2012), and that its responses can be modulated by emotions (Grossmann, 2010).
In a previous study we addressed the issue of the emergence of specialized brain regions for processing the human voice (Blasi et al., 2011) by investigating the brain responses to adult non-speech vocalisations (emotionally neutral, emotionally positive, and emotionally negative) and non vocal sounds in a group of typically developing infants (aged between 3 and 7 months) asleep in the MRI scanner. Our results showed an early functional specialisation for processing the human voice, with significant differential activation to vocal sounds (compared to non-vocal sounds) in the anterior portion of the temporal cortex [similarly to the findings in adults (Belin et al., 2000)], and also in the medial frontal gyri. In addition, we compared the brain responses to vocal sounds with positive (laughter) and negative (crying) valence to neutral vocal sounds and we found that sad vocalisations modulated the activity of brain regions involved in processing affective stimuli such as the orbitofrontal cortex (Kringelbach, 2005) and insula (Morris, Scott, & Dolan, 1999), whereas there was no differential response between happy and neutral vocalisations. These results point toward an emergence of specialisation of brain regions for processing stimuli that enable communication and learning of social behaviour. The data collected in our previous study has contributed to the LR group in the current study with the exception of three participants who had to be excluded from the current analysis (see the Methods section).
As ASD are characterised by deficits in social communication and behaviour, it is of paramount interest to investigate further when these deficits emerge in the process of development. Based on the possibility that one cause of the deficit in communication in ASD is an underlying atypical perception of sensory stimuli (C.R.G. Jones et al., 2009), we hypothesised that infants at risk of later ASD may not show the early specialisation for processing the human voice. Auditory processing in the context of ASD has been extensively investigated with neurophysiological techniques such as event-related potentials (ERPs) which, thanks to their high temporal resolution, can reveal stimulus-specific neural responsiveness (see the reviews by O'Connor, 2012 and Kujala, Lepistö, & Näätänen, 2013). These studies have shown that both children and adults with ASD present an enhanced proficiency in processing low-level auditory stimuli (such as tones); however, this advantage is lost when the complexity of the stimuli increases (O'Connor, 2012), affecting their ability to learn and understand language (Lepistö et al., 2008). These effects are reflected in the anatomical distribution of the responses to speech stimuli across age ranges in the context of ASD, with reduced activation in the left temporal and frontal regions (regions typically associated with language processing). Further, it has also been reported that these deficits in the left hemisphere may be compensated for by enhanced dominance of the right hemisphere (O'Connor, 2012). Right hemisphere dominance in ASD may be associated with enhanced proficiency in processing spectral characteristics of auditory stimuli, whereas left hemisphere deficiencies may be associated with diminished performance in processing temporal aspects of auditory stimuli with direct effect on speech perception (Haesen, Boets, & Wagemans, 2011).
In the present work we focus on information about the human voice without the complexities of speech and language.
One particular area of interest for the analysis of voice stimuli is the extraction of information regarding emotions. Although many studies of brain function have addressed processing emotional facial expressions in the context of ASD (e.g., see Stewart, McAdam, Ota, Peppe, & Cleland, 2013), relatively few have examined the processing of socially relevant auditory information. Those which are available (e.g., Gervais et al., 2004) have reported that when presented with voice and non voice sounds, neurotypical adults showed stronger activation to voice compared to non voice stimuli whereas those with ASD did not. Further, no significant differences were reported between the groups in the responses to non voice sounds. In addition, and related to the evidence for atypical voice processing, adults and children diagnosed with ASD show difficulty in recognising the emotions of others when the information is conveyed by acoustic-prosodic stimuli (Golan, Baron-Cohen, Hill, & Rutherford, 2006; Stewart et al., 2013). In summary, there is strong evidence showing that individuals with ASD have atypical processing of social and emotional stimuli. However, the developmental time course of these atypicalities is unclear.

Cortex 71 (2015) 122–133
In order to better understand how, when and where developmental trajectories that result in ASD deviate from the typical, several research groups have studied infant siblings of older children diagnosed with ASD (E.J.H. , as around 20% of these infants will go on to a later diagnosis themselves (Ozonoff et al., 2011). With the first overt behavioural symptoms appearing only toward the end of the first year of life, affected infants will typically not be routinely diagnosed before their third birthday (E.J.H. Jones et al., 2014). However, results from infant sibling studies suggest that the underlying differences in brain function that later on give rise to the behavioural symptoms may already be evident during the first year of life (Elsabbagh et al., 2012;Lloyd-Fox et al., 2013). Despite this, to our knowledge, no studies have directly investigated brain response to human vocal sounds in infants at high risk of ASD. Yet such studies could provide crucial evidence on the onset of the disorder as, according to the Interactive Specialisation perspective on typical development (M.H. Johnson, 2000), cortical areas that become tuned to social stimuli develop through a process of reinforcement by differential patterns of experience. Disruption of this process may arise due to an atypical developmental trajectory compounded by later atypical interactions with the environment, which may ultimately lead to the well-established profile of ASD symptoms by the age of diagnosis. Moreover, for the developing infant, an important canalisation of environmental experience is through the interaction with their primary caregiver. Therefore, an increasing number of studies have suggested that the nature of this interpersonal interaction may be a sensitive early indicator of later problems (for a review, see E.J.H. Jones et al., 2014), and could provide an important context for a more complete understanding of the disorder (Elsabbagh & Johnson, 2010). 
However, relatively little is known about how atypical interaction patterns influence children's neurobiological development (Taylor, Eisenberger, Saxbe, Lehman, & Lieberman, 2006), and to our knowledge, no studies have investigated the moderating role of infant and parent behaviours on the association between risk status and brain activation.
Given the current lack of evidence, this study used fMRI to examine differences in brain response to human vocalisations in sleeping 4–7-month-old infants. Infants with no family (first degree relative) history of ASD (LR group) were compared with infants with at least one full sibling with a community clinical diagnosis of ASD (HR group). Three specific questions were asked: first, are there differences in voice processing between HR and LR infants?; second, is there a group difference in the infant's sensitivity to affect (i.e., sad emotions) in vocal sounds?; and third, is variation in parent–child interaction associated with differences in infant brain responsivity?

2. Material and methods

2.1. Participants
fMRI data were acquired from a group of 33 infants at the Centre for Neuroimaging Sciences of the Institute of Psychiatry, King's College London. Fifteen of the infants had at least one full sibling with a community clinical diagnosis of ASD (HR group, 147 ± 25 days of age, 10 male). These participants were within the average range of functioning (mean 96.8, standard deviation 9.86) as measured by the Early Learning Composite (ELC) standard scores of the Mullen Scales of Early Learning (Mullen, 1995). HR infants were recruited via the British ASD Study of Infant Siblings (BASIS), a UK collaborative network facilitating research with infants at risk for ASD that also provided ethical approval and informed consent, as well as background data on participating families. The remaining 18 participants had no family (first degree relative) history of ASD and had all been included in our previous work (Blasi et al., 2011; LR group, 154 ± 26 days of age, 7 male). Three infants from the original LR group of 21 had to be excluded from the current study: one received an ASD diagnosis after the first publication, and two had incomplete fMRI data sets, with 4 and 6 trials missing (out of a total of 32) at the end of the run. Exclusion was necessary on this second ground because the transformation step required for the group comparisons needs complete experimental data sets for the calculations. As a result, the data of the remaining 18 LR participants were reanalysed after smoothing, at the end of the pre-processing sequence (see detailed description of the data analysis in the Supplemental Experimental Procedures section of the Supplementary Material). Infants in the low- and high-risk groups were of similar age (independent samples t-test, p = .464, t = .753), and Mullen ELC standard scores (only available from 6 of the LR infants: mean 105, standard deviation 7.34) were also similar (p = .076, t = 1.869).
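As a rough plausibility check on the age comparison above, the t statistic can be approximated from the rounded group summaries given in the text. This is a minimal sketch that assumes a pooled-variance (Student's) independent-samples t-test; the text does not say which variant was used, and the function name `pooled_t` is hypothetical. Because the inputs are rounded means and standard deviations rather than the raw ages, the result (about 0.78) is only expected to be close to the reported t = .753, not identical.

```python
import math

def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Pooled-variance t statistic and degrees of freedom from summary statistics."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df

# Age in days, as reported: LR 154 ± 26 (n = 18), HR 147 ± 25 (n = 15).
t_age, df_age = pooled_t(154, 26, 18, 147, 25, 15)
print(f"t({df_age}) = {t_age:.2f}")
```

A t value this small on 31 degrees of freedom is consistent with the reported conclusion that the two groups did not differ in age.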
As part of a multi-centre project, this research was also approved by the Institute of Psychiatry

fMRI data acquisition: outcome
Details of the experimental design are described in our previous publication (Blasi et al., 2011). In brief, while naturally asleep in the scanner (without sedation) the infants were presented with three categories of adult non-speech vocalisations by different male and female speakers: emotionally neutral (yawning, sneezing or coughing), emotionally positive (laughter), and emotionally negative (crying) sounds. The infants were also presented with non-vocal environmental sounds with which they were likely to be familiar (toys and running water, hereafter referred to as non voice). The stimuli were organized in a block design, in which 21 sec of auditory stimuli were alternated with 9 sec of rest. A complete fMRI session comprised 32 blocks (8 in each stimulus category) lasting a total of 16 min.
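The timing of the block design described above can be checked with a short sketch. It assumes that each block starts with the 21 sec of sound followed by the 9 sec of rest; the text does not state whether the run began with sound or rest, nor the order of the stimulus categories, so only the onsets and the total duration are computed here. All variable names are illustrative.

```python
STIMULUS_S = 21          # seconds of auditory stimulation per block
REST_S = 9               # seconds of rest per block
N_CATEGORIES = 4         # neutral, positive, negative voice, and non voice
BLOCKS_PER_CATEGORY = 8

n_blocks = N_CATEGORIES * BLOCKS_PER_CATEGORY
block_length_s = STIMULUS_S + REST_S

# Assumption: every block begins with sound, so stimulus onsets are evenly spaced.
stimulus_onsets_s = [i * block_length_s for i in range(n_blocks)]
total_minutes = n_blocks * block_length_s / 60

print(n_blocks, "blocks,", total_minutes, "minutes")
```

The 32 blocks of 30 sec each account for the stated 16 min session.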
The MRI data were acquired on a clinical GE 1.5 T Twinspeed MRI scanner (General Electric, Milwaukee, WI, USA) equipped with an 8-channel head radiofrequency (RF) coil array. Details of the scanning sequences can be found in Blasi et al. (2011) and in the Supplemental Information.

Measures of maternal and infant behaviour in the context of mother–infant interaction: moderator
Mothers participated in a laboratory based face-to-face play session for 5 min, within two weeks of the MRI session. Mother–infant interactions were video-recorded using a standard assessment protocol (Murray, Fiori-Cowley, Hooper, & Cooper, 1996). Mothers were asked to play with and talk to their infant (seated facing the mother) as they would normally, without the use of toys. Using the Global Rating Scales (Murray et al., 1996), four maternal (i.e., sensitivity, intrusiveness, remoteness and depressive affect) and three infant behavioural dimensions (i.e., attentiveness, active-engagement, and fretfulness) were coded (Supplementary Table 1 of the supplemental information) by two trained coders, blind to infant risk status. Inter-rater intraclass correlations (ICC) on a randomly selected 20% of the interactions ranged from .75 to .90, indicating acceptable inter-rater reliability. Measures of maternal and infant behaviours in the context of mother–infant interaction were available for the 18 LR infants with fMRI data and for 13 of the 15 HR infants with fMRI data.

fMRI data analysis
We analysed the MRI data with XBAM (www.brainmap.co.uk/xbam.htm) using a data-driven approach based on the standard general linear model (GLM), adjusted to incorporate the potential differences between the adult and infant haemodynamic response function (HRF) (Richter & Richter, 2003). Instead of the standard adult HRF, for each participant we used the mean HRF estimated from all the other participants (regardless of group), thus producing the best estimate of the HRF unbiased by the participant being analysed (see details in Blasi et al., 2011), assuming that there are no significant differences in the HRF between the groups (Feczko et al., 2012). We then analysed the data for each individual infant using standard GLM analysis and the estimated unbiased HRF. The selection of the condition contrasts used for group comparisons was based on the results reported in our previous publication on a group of typically developing infants (Blasi et al., 2011). This narrowed down the contrasts to the following: (1) neutral voice versus non voice contrasts (neutral voice > non voice; non voice > neutral voice); and (2) sad voice versus neutral voice contrasts (sad voice > neutral voice; neutral voice > sad voice).
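The leave-one-out averaging of the HRF described above can be illustrated as follows. This is only a sketch of the averaging step under simple assumptions: each participant's HRF estimate is represented as a list of samples on a common time grid, and the function name and toy values are hypothetical. It does not reproduce how XBAM estimates the individual HRFs.

```python
def leave_one_out_mean_hrfs(hrfs):
    """For each participant, average the HRF estimates of all the other participants."""
    n_participants = len(hrfs)
    n_samples = len(hrfs[0])
    # Sum across participants at each time point, then remove one participant at a time.
    totals = [sum(h[t] for h in hrfs) for t in range(n_samples)]
    return [
        [(totals[t] - h[t]) / (n_participants - 1) for t in range(n_samples)]
        for h in hrfs
    ]

# Toy HRF estimates for three participants, sampled at three time points.
toy_hrfs = [
    [0.0, 1.0, 0.5],
    [0.0, 2.0, 1.0],
    [0.0, 3.0, 1.5],
]
loo_hrfs = leave_one_out_mean_hrfs(toy_hrfs)
print(loo_hrfs[0])
```

The HRF used for the first toy participant is the mean of the second and third, so that participant's own data do not influence the model applied to them.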
Between group comparisons of the condition contrasts of interest revealed the clusters where group differences in voice processing were significant. However, this analysis did not provide information regarding the origin of the group differences, i.e., whether it was one group showing a stronger preference for one type of sound, or whether one group had a stronger preference for one type of sound whereas the other group showed preference for the other type of sound. In order to find out the origin of the group differences we extracted the betas (averaged across all voxels in a cluster of interest defined in the whole brain analysis) for each contrast (voice > non voice, non voice > voice, etc…) per participant. Then, the betas of the condition contrasts averaged across participants within each group were used as estimates of the group effect size in that cluster and, therefore, allowed us to identify the origin of the group difference.
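The beta extraction described above amounts to two averaging steps: across voxels within a cluster for each participant, then across participants within each group. The sketch below uses invented toy values and hypothetical participant labels purely to show how the sign of the group mean indicates the direction of the preference; it is not the authors' analysis code.

```python
from statistics import mean

def group_effect_sizes(voxel_betas, group_of):
    """Average betas over voxels per participant, then over participants per group."""
    per_participant = {pid: mean(betas) for pid, betas in voxel_betas.items()}
    groups = {}
    for pid, beta in per_participant.items():
        groups.setdefault(group_of[pid], []).append(beta)
    return {group: mean(values) for group, values in groups.items()}

# Toy betas for the contrast voice > non voice in one cluster (two voxels each).
toy_betas = {
    "LR01": [0.4, 0.6],
    "LR02": [0.2, 0.4],
    "HR01": [-0.3, -0.1],
    "HR02": [0.0, -0.2],
}
toy_groups = {"LR01": "LR", "LR02": "LR", "HR01": "HR", "HR02": "HR"}

effects = group_effect_sizes(toy_betas, toy_groups)
print(effects)
```

In this toy example a positive LR mean and a negative HR mean would correspond to the case where one group prefers voice and the other prefers non voice, rather than both preferring voice to different degrees.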

Moderation of behaviour during mother–infant interaction on the associations between risk-status and infant processing of vocal sounds
Moderation analyses were conducted on the contrasts that are related to social communication: neutral voice stronger than non voice (voice selectivity) and sad voice stronger than neutral voice (modulation of sad valence on the response to vocal sounds). Further, the regions of interest were selected from the list defined by the clusters with significant group differences (as the results of the fMRI data analysis indicated, see Table 1). We hypothesized that moderation of behaviour would occur in the regions typically reported in association with processing voice (Belin et al., 2000;Blasi et al., 2011), emotions (Peelen, Atkinson, & Vuilleumier, 2010) and forming part of the social brain network (Adolphs, 2003). Therefore, the following regions were selected from the voice selectivity contrast: left middle temporal gyri (clusters 2 and 16), left temporal lobe (cluster 17), left superior and medial frontal gyri (clusters 27 and 37) and right medial frontal gyri (clusters 32 and 39); and for the sad voice modulation contrast: right fusiform gyrus (cluster 4), and the left hippocampus (cluster 10). For each cluster, multiple linear regression models were constructed which included each participant's averaged beta value (as outcome) and an interaction term between group status and each behavioural dimension. FDR correction for multiple comparisons was applied to the results (Benjamini & Yekutieli, 2001).
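The multiple-comparison step mentioned above can be sketched as follows, assuming that the correction applied was the step-up procedure of the cited Benjamini and Yekutieli (2001) paper, which controls the false discovery rate under arbitrary dependence. The p-values are invented for illustration and are not results from the study; the function name is hypothetical.

```python
def benjamini_yekutieli(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR level q."""
    m = len(p_values)
    # Harmonic-sum penalty that distinguishes this procedure from Benjamini-Hochberg.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * q / (m * c_m):
            largest_rank = rank
    rejected = [False] * m
    for index in order[:largest_rank]:
        rejected[index] = True
    return rejected

# Toy p-values for four interaction terms (illustrative only).
toy_p = [0.001, 0.04, 0.03, 0.20]
survives = benjamini_yekutieli(toy_p)
print(survives)
```

In this toy case three of the four uncorrected p-values are below .05, but only the smallest survives correction, which illustrates how nominally significant interactions can fail to survive FDR control.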

3.1. Voice processing in HR and LR infants (neutral voice vs non voice contrast)

Within group activations
Infants in the low-risk group showed significantly stronger responses to the neutral voice condition as compared to the non voice condition (voice selectivity), bilaterally in the superior and middle temporal gyrus, in the superior and middle frontal gyrus, and also in the right cingulate gyrus. By contrast, infants in the high-risk group showed preferential activation to neutral voice over the non voice condition only in the right inferior parietal lobule and (similarly to the low-risk group) in a region of the right cingulate gyrus (Fig. 1a and Supplementary Table 2).
In both groups, brain functional response for non-vocal sounds over vocal sounds was significant in the left temporal gyrus. Additionally, infants in the high-risk group showed significant preference for non-vocal over neutral vocal sounds in the left cerebellum and the right pre-central gyrus (Fig. 1b and Supplementary Table 3).

Between group differences
There were significant differences in voice selectivity in the left middle temporal gyrus and, bilaterally, in the superior/medial frontal gyri. Additionally, there were group differences in the left thalamus and caudate and right cerebellum (Fig. 1c and Table 1). In the clusters with significant group differences, the differences were mainly due to opposite preferences for voice versus non voice conditions: whereas the low-risk infants showed stronger preference for voice (positive sign of the averaged beta values for the contrast voice vs non voice), the high-risk infants showed a tendency to respond more to non voice compared to voice (negative sign of the averaged beta values for the contrast voice vs non voice).
There were no significant group differences in brain response to the non voice over neutral voice conditions.

3.2. Sensitivity to sad affect in voice in HR and LR infants (sad voice vs neutral voice contrasts)

Differences within group
In the analyses of sensitivity to sad affect in voice (sad voice > neutral voice), the low-risk infants showed significantly stronger responses to sad compared to neutral voice in the left superior frontal gyrus and the right inferior frontal gyrus, whereas the high-risk infants showed activation to sad affect in a small cluster within the right cingulate gyrus (8 voxels) (Supplementary Table 4). With reference to the neutral voice greater than sad voice contrast, low-risk infants showed a stronger activation to neutral vocal sounds in the left middle frontal gyrus, right superior temporal gyrus and the right uncus, whereas high-risk infants showed greater activation to neutral vocal sounds bilaterally in the fusiform gyrus (with more clusters in the right hemisphere), the right lingual gyrus, middle frontal gyrus and left precentral gyrus (Supplementary Table 5).

Differences between groups
In the analyses of sensitivity to sad affect in voice (sad voice > neutral voice) the low-risk infants showed stronger activation than high-risk infants to sad vocal sounds in the right fusiform gyrus and left hippocampus (Table 1 and Fig. 2). High-risk infants did not activate significantly more than the low-risk in any brain region. With reference to the neutral voice greater than sad voice contrast, group differences (mostly low-risk infants showing stronger activation than high-risk infants) were found bilaterally in the caudate, and the right superior frontal gyrus (Table 1 and Fig. 2).

Moderation by maternal and infant interactive behaviours on the associations between risk-status and infants' processing of vocal sounds
For the contrast neutral voice > non voice, there were significant interactions between maternal and infant behaviours with risk status to predict infant processing of vocal sounds in a number of brain regions. Maternal intrusiveness × risk status predicted activation in the left middle temporal gyrus (cluster 16, BA 21), whereas infant behaviours (attentiveness, fretfulness and active-engagement) interacted with risk status to predict activation in the medial frontal gyrus (clusters 32, 37 and 39, as summarised in Table 2). However, the only effect that survived FDR correction for multiple comparisons was infant active-engagement × risk status in cluster 32 (voice selectivity contrast), in the right medial frontal gyrus (BA 9). Similar trends in the interaction between infant active-engagement and risk status were observed in the other two clusters in the medial frontal gyrus (as shown in Fig. 3). In these three clusters, infants in the HR group show negative correlation between active-engagement and voice selectivity: in cluster 32, Pearson correlation = −.  (Fig. 3). There were no significant differences in measures of active-engagement between the two groups (LR, mean = 3.64, SD = .76; HR,
In contrast, neither maternal nor infant behaviours moderated the group differences found for the sad voice versus neutral voice fMRI contrasts.

Table 1 – Group differences in brain activation. Clusters with significant group differences in voice-sensitivity (neutral voice > non voice), and sensitivity to sad affect (sad voice > neutral voice, and neutral voice > sad voice). In the last column, '+' represents within group neutral voice > non voice; '−' represents within group neutral voice < non voice. BA = Brodmann area, Num voxels = number of voxels in each cluster.

Voice-processing in HR and LR infants
In this fMRI study, infants in the high-risk group show a striking atypicality in human voice selectivity. Whereas low-risk infants show a clear pattern of stronger activation to voice sounds compared to non-voice sounds in the middle and superior temporal regions, as well as the medial frontal gyrus, infants in the high-risk group show significantly less voice selectivity in these regions. Importantly, however, the two groups did not differ in non voice sound selectivity. The results in the low-risk group are consistent with previously published research with adults (Belin et al., 2000) and with infants of similar age (Grossmann, Oberecker, Koch, & Friederici, 2010; Lloyd-Fox et al., 2012). Adding to these previous studies, we have established that between 4 and 7 months there is already voice specialisation along the STS (similarly to that described in adults), but also in other brain regions such as the inferior frontal and fusiform cortex. As infants develop, the network of regions specialised in voice processing becomes more efficient: it narrows and consolidates in the temporal cortex (M.H. Johnson, 2011; Leppänen & Nelson, 2008), possibly freeing the frontal areas to be involved in higher level processing and expanding to the posterior part of the STS. Moreover, the diminished voice selectivity we found in 4–7-month-old infants in the high-risk group (compared to the low-risk group) is very similar to the responses found in adults with ASD: for instance, Gervais et al. (2004) report that adults with an ASD diagnosis show deficits in voice selectivity in similar cortical areas. Therefore, our results are in line with those that suggest that an atypical cortical processing of socially relevant auditory information is already present in at-risk infants from 4 to 7 months (Lloyd-Fox et al., 2013).
In the present study, the use of fMRI has allowed us to explore the specialisation for voice processing in the whole brain, while previous studies were restricted to responses in the surface cortical regions covered by the fNIRS sensor. In both fNIRS and fMRI studies, there is a clear reduction in voice selectivity in the group of high-risk infants, but a similar pattern of non voice selectivity in both groups of infants. This compelling consistency across sessions and imaging modalities further supports the hypothesis of an atypical processing of auditory stimuli in infants at risk for later emerging ASD.
The group differences in voice selectivity we observed were mainly located in the left hemisphere in a region often associated with language processing (Hickok & Poeppel, 2007). Previous fMRI research has also found reduced activation of the frontal-temporal regions to speech-related stimulation in ASD, sometimes coupled with increased activation in the right frontal regions to facilitate processing of auditory stimulation (O'Connor, 2012). These findings have been reported from very early in development [at 2–3 years of age (Redcay & Courchesne, 2008)], and they have been shown to increase with age, becoming more pronounced in 3–4-year-olds with autism (Eyler, Pierce, & Courchesne, 2012). Although we did not find the compensatory hyper-responsivity in the right frontal region in our HR group (O'Connor, 2012; Redcay & Courchesne, 2008), possibly due to the young age of our participants and/or to the non-speech nature of our stimuli, our current findings raise the possibility that atypical voice processing from early infancy may be one of the contributing factors influencing disruption of the typical developmental trajectory of language acquisition (Lepistö et al., 2008). Our results are also in line with the Interactive Specialisation framework discussed earlier (M.H. Johnson, 2011), and a resulting lack of emerging specialisation of social brain regions in ASD. The Interactive Specialisation perspective on brain development views the process of emergence of the adult pattern of cortical specialisation as a progressive tuning of responses in certain cortical areas to social stimuli. According to this view, biases in attention and processing in early infancy are reinforced by differential patterns of subsequent experience, with the end result being the patterns of cortical specialisation associated with the social functions observed in adults. Therefore, the disruption of the mechanisms that bias infants to attend to socially relevant mechanisms may, in turn, disrupt the typical trajectory that leads to the adult social brain network (Dawson et al., 2005; M.H. Johnson, 2011; Lloyd-Fox et al., 2012; Schultz, 2005).

Fig. 2 – Neutral voice versus sad voice group differences. Representation on an age-appropriate infant template (Sanchez et al., 2012) of the between group differences in neutral voice versus sad voice contrast. Significant clusters with responses to sad voices stronger than to neutral voices are represented in cyan; significant clusters where response to neutral voice > sad voice are represented in blue. (a) Three-dimensional rendering of the group differences. (b) Results on slices of the same template. See also Table 2.
In addition to the temporal regions, the low-risk infants also showed increased voice selectivity bilaterally in the medial frontal gyrus as compared to the high-risk infants. It has been suggested (Mundy, 2003) that impairment of this region and in the anterior cingulate may constitute a substrate for socio-cognitive deficits in ASD, as they both play a role in joint attention and other higher complex behaviours involving interaction with others. Regions of the frontal cortex have been reported to have an atypical overgrowth (Carper & Courchesne, 2005) and, possibly, an abnormal connectivity (Courchesne & Pierce, 2005) in children diagnosed with ASD. Hence, it is possible that atypical function of these regions may result in difficulties in the integration of information that gives relevance to vocal sounds that are then processed in the voice temporal regions. If correct, this disruption may also contribute to the diminished voice selectivity observed in our work (Haesen et al., 2011;O'Connor, 2012).

Sensitivity to sad affect in voice in HR and LR infants
In the analyses of possible differences between groups in the modulation of emotion on the brain responses to voice sounds, we found [similar to our previous publication (Blasi et al., 2011)] that this modulation was limited in both groups. This may be explained, in part, by our participants being asleep, as cortical activation in the response to auditory stimuli is reduced during sleep (Czisch, 2002). Therefore, it is possible that the differential brain activation between two vocal conditions in our sleeping participants may have been too subtle to detect. Nevertheless, significant group differences in sad voice modulation were found in the right fusiform gyrus and in the left hippocampus, with the low-risk participants showing stronger sad voice over neutral voice responses than the high-risk infants. Deficits in the amygdala-fusiform network, which supports the development of face perception and social cognitive skills, may be instrumental in emerging ASD, as the development of social perceptual skills during childhood provides important scaffolding for social skill development (Schultz, 2005). Atypical brain processing of socially relevant information may be linked with differences in behaviour during a highly social task such as mother–infant interaction. Therefore, we also investigated potential moderation effects of the interaction between group status and mother or infant behaviours in the context of mother–infant interactions observed within two weeks of the MRI scan.

Moderation by maternal and infant interactive behaviours
We found that the association between risk status and infant processing of voice in the right medial frontal gyrus was moderated by infant behaviour, characterised by active-engagement during observed mother–infant interactions. This finding suggests that group differences in brain responsivity can be accounted for, in part, by differences in social experience, which in turn are possibly created by the infants themselves. Moreover, we found a marginally significant group effect on one of the measures of maternal behaviour during mother–infant interaction: mothers of HR infants tended to display sad affect whilst interacting with their infant (M = 3.91, SD = .54), compared to mothers of LR infants (M = 4.27, SD = .46), although the difference was at trend level only (t = −2.0, p = .058). Therefore, it is possible that the infant's behaviour is driving the interaction in a way that the mother tends to modify her contribution to it in turn. Further, individual differences in infant behaviour require consideration, as these can potentially reflect different developmental pathways to outcome (Elsabbagh & Johnson, 2010). For instance, differences in temperament (characterised by lower activity levels and disengagement of visual attention) in some infants who later go on to develop ASD have been reported in prospective studies (Zwaigenbaum et al., 2005). However, how differences in infant behaviour interact with risk to influence brain responsivity remains unknown, and the directionality in the mutual influences between mother and infant cannot be fully resolved from the current study.
Fig. 3 – Association between behaviour in the context of mother–infant interaction and fMRI activation. Representation of the interaction between the infant behavioural measure Active-Engagement and group status on the voice sensitivity contrast in clusters (a) 32 (left medial frontal gyrus, BA 9); (b) 37 (left medial frontal gyrus, BA 6); and (c) 39 (right medial frontal gyrus, BA 6). Pearson correlation coefficients between Infant Active-Engagement and fMRI activation were calculated within group at each cluster; * and ** indicate significant Pearson correlation (2-tailed, at p < .05 level and p < .01 level, respectively).
In this study, infant active-engagement was not independently associated with risk status or with brain activation. Yet it did interact with risk status to predict brain response to nonvocal sounds in HR infants. That is, HR infants who are more engaged in their early interactions show a tendency to respond more strongly in the region of the medial frontal gyrus to nonvocal sounds compared to human voices. It is possible that this counterintuitive result is a manifestation of a protective trait of the infants who grow up and do not develop ASD, showing that their stronger responses to non-voice sounds may counteract the deficit in processing human voices that we have found to be associated with HR status. Future work is required, however, to determine if modulating the early behaviours of infant HR siblings alters their developmental brain trajectories.
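To make the logic of a moderation effect concrete, the following minimal Python sketch fits a linear model with a group-by-engagement interaction term and then computes within-group Pearson correlations, in the spirit of the analysis summarised in Fig. 3. It is an illustration only and not the analysis pipeline used in this study: the data are simulated, the group coding (0 = LR, 1 = HR), the function names `fit_moderation` and `within_group_r`, and all numerical values are assumptions introduced for the example.

```python
import numpy as np

def fit_moderation(group, engagement, activation):
    """Ordinary least squares fit of
    activation ~ group + engagement + group * engagement.

    group: array of 0/1 (hypothetical coding: 0 = LR, 1 = HR).
    Returns coefficients [intercept, group, engagement, interaction].
    """
    group = np.asarray(group, dtype=float)
    engagement = np.asarray(engagement, dtype=float)
    activation = np.asarray(activation, dtype=float)
    # Centre the continuous predictor so the group effect is evaluated
    # at the mean engagement score.
    eng_c = engagement - engagement.mean()
    X = np.column_stack([np.ones_like(eng_c), group, eng_c, group * eng_c])
    beta, *_ = np.linalg.lstsq(X, activation, rcond=None)
    return beta

def within_group_r(group, engagement, activation, level):
    """Pearson correlation between engagement and activation in one group."""
    mask = np.asarray(group) == level
    return np.corrcoef(np.asarray(engagement)[mask],
                       np.asarray(activation)[mask])[0, 1]

# Simulated data (assumed values, for illustration only).
rng = np.random.default_rng(0)
n_lr, n_hr = 18, 15
group = np.concatenate([np.zeros(n_lr), np.ones(n_hr)])
engagement = rng.normal(4.0, 0.5, size=n_lr + n_hr)
# Simulated pattern: no engagement slope in LR, a negative slope in HR.
activation = (-0.8 * group * (engagement - 4.0)
              + rng.normal(0.0, 0.1, size=n_lr + n_hr))

beta = fit_moderation(group, engagement, activation)
r_lr = within_group_r(group, engagement, activation, 0)
r_hr = within_group_r(group, engagement, activation, 1)
print("interaction coefficient:", beta[3])
print("within-group r (LR, HR):", r_lr, r_hr)
```

In such a model, a non-zero interaction coefficient indicates that the relationship between engagement and activation differs between groups, which is what a moderation effect means; the within-group correlations then describe the direction of that relationship in each group separately.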
A weakness of our study is that the high-risk infants have not yet been assessed for ASD at three years of age. While only a minority of our at-risk infants will go on to a later diagnosis of ASD, the unaffected siblings of children with ASD often share common patterns of atypical activation ("trait activity") in cortical regions engaged in social processing, including the right inferior temporal gyrus, as reported in Kaiser et al. (2010). Thus, it is possible that our current results reflect "trait" activity in our high-risk infant group that will result in a later diagnosis of ASD only when combined with other genetic, neural, or environmental factors.