Temporal Context Invariance Reveals Neural Processing Timescales in Human Auditory Cortex

Natural stimuli like speech and music are structured at many timescales. But it remains unclear how these diverse timescales are neurally coded. Do neural processing timescales increase along the cortical hierarchy? Are there distinct timescales for particular stimulus categories? What information is coded at each timescale? Answering these questions has been challenging because there is no general method for estimating sensory integration periods: the temporal window within which stimulus features alter the neural response. Here, we introduce a simple experimental paradigm for inferring the integration period of any time-varying response. We present segments of natural stimuli in a sequence, such that the same segment occurs in two different contexts (different surrounding segments). We then measure how long the segments need to be for the response to become invariant to the context. We apply this paradigm to map temporal integration periods in human auditory cortex using electrocorticography data from epilepsy patients. Our map reveals a clear gradient in which integration periods grow as one moves away from primary auditory cortex, providing support for hierarchical models. We also show that selectivity for sound categories first emerges at timescales of ~200 ms, approximately the duration of speech syllables and musical notes.

Keywords: temporal integration; auditory cortex; electrocorticography; ECoG; natural stimuli

Extended abstract
Natural stimuli are structured at timescales from milliseconds (e.g. phonemes) to seconds (e.g. words) and minutes (e.g. narrative structure). Understanding and modeling how these diverse timescales are neurally coded is a central goal of sensory and computational neuroscience (Honey et al., 2012).
This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0
Sensory timescales are often defined in terms of their temporal integration period (Theunissen and Miller, 1995): the time window within which stimuli alter the response. Integration periods are central to many theories and models of sensory coding. Hierarchical models often posit that integration periods grow as one ascends the sensory hierarchy (Honey et al., 2012; Overath et al., 2015). Other theories posit that different hemispheres (Zatorre et al., 2002) or cortical regions (Overath et al., 2015) have distinct integration periods, or that certain stimulus classes might require dedicated processing timescales.
There are two well-known approaches for estimating sensory integration periods. One approach is to derive an explicit model relating the stimulus to the response. In the auditory system, it is common to estimate a "spectrotemporal receptive field" (STRF): a linear mapping between a spectrogram and the neural response. This approach is effective if the response is linear with respect to the spectrogram. But cortical responses are known to be highly nonlinear (Sahani and Linden, 2003), particularly in non-primary regions (Norman-Haignere and McDermott, 2018), and STRFs could thus yield misleading results. Estimating nonlinear models of neural responses remains a challenging task, particularly in higher-order sensory regions.
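As an illustration of the first approach, the following is a minimal sketch of fitting an STRF by ridge regression on a time-lagged spectrogram. It is not the procedure of any cited study: the function names (`lagged_design`, `fit_strf`), the ridge penalty, and the array layout (time by frequency) are assumptions made for the example. The temporal extent of the fitted weights is what would be read off as the integration period, which is only meaningful if the response really is linear in the spectrogram.

```python
import numpy as np


def lagged_design(spec, n_lags):
    """Stack the current and previous n_lags - 1 frames of a (time, freq) spectrogram."""
    n_t, n_f = spec.shape
    X = np.zeros((n_t, n_lags * n_f))
    for lag in range(n_lags):
        # Columns for this lag hold the spectrogram delayed by `lag` frames.
        X[lag:, lag * n_f:(lag + 1) * n_f] = spec[: n_t - lag]
    return X


def fit_strf(spec, resp, n_lags, ridge=1.0):
    """Ridge-regression STRF (hypothetical helper); returns an (n_lags, n_freq) array."""
    X = lagged_design(spec, n_lags)
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    weights = np.linalg.solve(gram, X.T @ resp)
    return weights.reshape(n_lags, spec.shape[1])
```

On a simulated response that is exactly linear in the lagged spectrogram, this recovers the generating weights; on a nonlinear response the fit can still return a smooth-looking STRF, which is the failure mode described above.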
A second approach is to temporally scramble natural sounds and measure the power or reliability of the neural signal as a function of the scrambling window (Honey et al., 2012). A common finding is that putatively higher-order brain regions respond more strongly to stimuli that have intact temporal structure at longer timescales. However, because scrambling paradigms simply measure the power in response to more or less scrambled stimuli, they cannot detect selectivity for stimulus features that are similar on average between intact and scrambled stimuli. This fact helps explain why primary auditory regions often show no effect of scrambling (Overath et al., 2015), since they plausibly respond to features such as frequency and modulation that are not greatly altered by scrambling.
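To make the scrambling manipulation concrete, here is a minimal sketch that shuffles a signal in fixed-length windows. The function name `scramble` and the choice to drop trailing samples that do not fill a whole window are assumptions for illustration; published scrambling studies typically use more careful stimulus construction.

```python
import numpy as np


def scramble(signal, window, seed=0):
    """Shuffle non-overlapping windows of `window` samples (hypothetical helper).

    Structure shorter than `window` is preserved inside each chunk;
    structure longer than `window` is destroyed by the reordering.
    """
    rng = np.random.default_rng(seed)
    n_windows = len(signal) // window
    chunks = np.asarray(signal[: n_windows * window]).reshape(n_windows, window)
    return chunks[rng.permutation(n_windows)].ravel()
```

Because the same samples are simply reordered, quantities such as total energy are identical for the intact and scrambled signals. A response that tracks such features would therefore have the same average power in both conditions, which is the limitation noted above.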
To address these limitations, we introduce a novel paradigm, which we term "temporal context invariance" or "TCI", for estimating the integration period of any time-varying response. The TCI paradigm is effective because it directly tests the core idea of an integration period: that the response should be invariant to any stimuli falling outside the integration period. We measure responses to stimulus segments of different sizes (similar to scrambling paradigms) and test whether the response to a given segment is affected by what came before and after (our innovation) (Figure 1). If the integration period is less than the segment size, then there should be a moment at which the response to the current segment is unaffected by the surrounding segments. We can thus estimate the integration period by varying the segment size and measuring at what point the response becomes context invariant.
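The logic of the paradigm can be sketched on simulated data. The example below is a toy illustration under stated assumptions, not the analysis used in this study: the "neural" response is a causal moving average with a known integration window, the stimulus is white noise, and all names (`simulate_response`, `cross_context_correlation`) are hypothetical. The same segments are presented in two different random orders, and the responses to each shared segment are correlated across the two contexts at every lag within the segment.

```python
import numpy as np


def simulate_response(stimulus, window):
    """Toy causal response: the mean of the most recent `window` samples."""
    kernel = np.ones(window) / window
    return np.convolve(stimulus, kernel)[: len(stimulus)]


def cross_context_correlation(segment_len, window, n_segments=200, seed=0):
    """Correlate responses to the same segments across two random contexts.

    Returns one correlation per lag (sample index within the segment).
    """
    rng = np.random.default_rng(seed)
    segments = rng.standard_normal((n_segments, segment_len))
    responses = []
    for _ in range(2):
        order = rng.permutation(n_segments)  # a different context each time
        stimulus = segments[order].ravel()
        resp = simulate_response(stimulus, window).reshape(n_segments, segment_len)
        aligned = np.empty_like(resp)
        aligned[order] = resp  # row i now holds the response to segment i
        responses.append(aligned)
    a = responses[0] - responses[0].mean(axis=0)
    b = responses[1] - responses[1].mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))


if __name__ == "__main__":
    window = 20  # assumed integration window, in samples
    for segment_len in (5, 10, 20, 40, 80):
        peak = cross_context_correlation(segment_len, window).max()
        print(f"segment length {segment_len:3d}: peak correlation {peak:.2f}")
```

In this toy model the peak cross-context correlation stays well below 1 while the segment is shorter than the integration window and reaches 1 once the segment is at least as long as the window, so the segment length at which the correlation saturates gives an estimate of the integration period. Real recordings are noisy, so in practice the correlation would need to be compared against a noise ceiling rather than against 1.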
Our study makes three contributions: (1) We show that our TCI method is effective at measuring integration periods throughout auditory cortex, unlike standard paradigms. (2) We show that integration periods increase substantially from primary to nonprimary regions. (3) We show that short-timescale electrodes are best predicted by spectrotemporal features of sound, while long-timescale electrodes are best predicted by a sound's category (e.g. whether it is speech or music). Using decoding analyses, we show that selective responses to speech and music first emerge at timescales of ~200 ms, suggesting selectivity for syllables or musical notes.
Figure 1: Segments of natural stimuli (here sounds) are presented in a random order such that the same segment occurs in two different contexts (different surrounding segments). If the neural integration period is less than the segment duration, there should be a moment when the response is the same across the two contexts.