Contrast Gain Control in Auditory Cortex

Summary
The auditory system must represent sounds with a wide range of statistical properties. One important property is the spectrotemporal contrast in the acoustic environment: the variation in sound pressure in each frequency band, relative to the mean pressure. We show that neurons in ferret auditory cortex rescale their gain to partially compensate for the spectrotemporal contrast of recent stimulation. When contrast is low, neurons increase their gain, becoming more sensitive to small changes in the stimulus, although the effectiveness of contrast gain control is reduced at low mean levels. Gain is primarily determined by contrast near each neuron's preferred frequency, but there is also a contribution from contrast in more distant frequency bands. Neural responses are modulated by contrast over timescales of ∼100 ms. By using contrast gain control to expand or compress the representation of its inputs, the auditory system may be seeking an efficient coding of natural sounds.


Neil C. Rabinowitz, Ben D.B. Willmore, Jan W.H. Schnupp, and Andrew J. King

Inventory:
Table S1 is related to Figure 1. Table S2 is related to Figure 1.

Inventory of models and their cross-validation:

− Linear STRF model. Cross-validated: Yes. STRFs trained on 9/10 of available data, and used to predict a PSTH on the remaining 1/10.

− Linear scaled model. g(σ_L) = gain for each contrast condition; v_ft = normalized receptive field. Use: Figure 2F. Cross-validated: Yes. STRFs from one stimulus condition (together with the gain factor g(σ_L)) predicted responses in the other conditions as well as the within-condition STRFs did (median difference in prediction scores of 3.5%, P > 0.4; Figure 2F).

− Nonlinear (LN) model. Cross-validated: Yes. Nonlinearities trained on 9/10 of available data, and used to predict a PSTH on the remaining 1/10.

− Nonlinear transformed model. For σ_L = σ_ref = 8.7 dB: g(σ_L) = 1, Δx(σ_L) = 0, and Δy(σ_L) = 0. Use: Figure 4A-C. Cross-validated: Yes. Across units, nonlinearities from the reference stimulus condition, together with the fitted curve transformation, predicted responses in the other conditions as well as Model #3 above did (median improvement in prediction scores of 0.2%; P > 0.3 by sign-rank test).

− Gain model (Eqn. (2)). Cross-validated: Yes. This model was fit on a unit-by-unit basis to all data for each unit (where μ_L = 40 dB SPL), fixing n = 1.31. Predictions of units' responses in each contrast condition were marginally better than those of Model #3 above (median improvement in prediction scores of 0.3%, P < 0.01 by sign-rank test). For units where gain did not increase monotonically as contrast was reduced to its lowest value ("threshold"), small improvements in predictions were obtained by limiting this model to responses above threshold.

Figure S1, related to Figure 1: Relationship between the statistics of the tone-level and sound-pressure distributions.
(A) As in Figure 1D. In the (logarithmic) level domain, the blue (low σ_L) distribution is uniform, with a mean μ_L = 40 dB SPL and a width of ±5 dB; the green (medium σ_L) and red (high σ_L) distributions have the same mean level but widths of ±10 and ±15 dB, respectively. In the (linear) sound pressure domain, as shown here, these are no longer uniform distributions, and their means are no longer identical.

(B) The contrast (σ_P/μ_P) of the sound pressure distribution is highly correlated with the standard deviation (σ_L) of the sound level distribution. Colored symbols correspond to the range of stimulus statistics presented during this study. The thick blue line shows the analytic solution for the relationship between these variables (see Experimental Procedures). The thin black line shows a linear fit between the sound pressure contrasts and level standard deviations of the presented distributions. This provides an excellent fit (r² = 0.997), indicating that these measures are interchangeable, up to a scale factor and constant offset, i.e. Eqn. (1) in the main text.
It is worth noting that this approximation does not necessarily hold under all sets of stimulus statistics, and appears to break down as the kurtosis of the sound level distribution is increased.

(A) As contrast was reduced, fewer units yielded predictive STRFs (P << 0.001; Kruskal-Wallis test). In general, units were less reliably driven by low contrast stimuli. Signal power under low contrast stimulation was a median 53% less than that measured under high contrast; similarly, the intermediate contrast stimulus yielded a signal power 17% less than that measured under high contrast. Noise power, however, remained constant across conditions. Thus, units were more reliably driven under high contrast stimulation, and hence their responses were more predictable.

(B) The derivative of the nonlinearity with respect to X_v relates how a small change in the stimulus impacts the firing rate of the unit. By examining this derivative as a function of X_v, we can find the intensity region to which the cell is most sensitive: this is precisely the value of X_v where the derivative is greatest.

(C) Although the range of X_v values presented in each condition was different (approximately ±7.4 dB for the low contrast stimulus, ±14.9 dB for the intermediate contrast stimulus, and ±22.3 dB for the high contrast stimulus), we extrapolated from the sigmoids that had been fitted over these limited regions to estimate which stimulus each unit would be most sensitive to under each contrast condition. By this criterion, the maximal slope occurred at a median position of X_v = …

Figure S3, related to Figure 4: STRFs fitted to individual contrast conditions yield the same gain effects.
In the main text, we describe how the data from all conditions were pooled to fit a single linear STRF, and separate nonlinearities were fitted for each contrast condition (Figures 3-4). We validated this procedure by checking that there were no systematic differences between STRF shape in different conditions.
To check that the differences in gain were not a by-product of using a STRF derived from the pooled dataset, we ran the same analysis where the STRF was fitted from only the responses of units to either the low contrast stimuli (A), medium contrast stimuli (B), or high contrast stimuli (C). In each case, the point nonlinearities were then estimated for all three conditions, and the input gain calculated. Panels (A-C) are equivalent to Figure 4A. The same qualitative observations were made in every case, regardless of which STRFs were used: lower stimulus contrast is compensated for by an increase in response gain.
(D) The gain effect was strongest among units with the most robust, repeatable spike trains: units with low noise levels showed a greater modulation of their gain as a function of contrast. The measured gain (ordinate) depends on the between-trial variability of the data recorded from each unit across multiple presentations of the same stimulus (noise level; abscissa). The more consistent a unit's responses across trials, the greater the observed gain rescaling. Colors as in Figure 2C-E; fitted lines as in Figure 4D. The red dashed line indicates G = 1 across conditions, i.e. no gain modulation. If we extrapolate from the population data to a hypothetical zero-noise neuron (Sahani and Linden, 2003b; Ahrens et al., 2008), we obtain relative gain values of 2.21 ± 0.16 and 1.40 ± 0.06, respectively (median ± 99% confidence interval) for the same two contrast comparisons. This implies that neurons do not completely compensate for stimulus contrast. There was no corresponding dependence of x-offset or y-offset on noise level (data not shown).
(E, F) Dependence of x-offset and y-offset values on a broad range of stimulus contrasts. Panels E and F show the relationship between stimulus contrast (σ_L) and x-/y-offset in an equivalent manner to Figure 4E. The magenta line shows the predicted x-offset calculated from the (within-contrast) relationship between gain and x-offset in Figure 4D, and the dependence of gain on σ_L in Figure 4E. The increase in x-offset as contrast is reduced can therefore be explained primarily by the pronounced increase in gain at low contrast. There is no systematic effect of stimulus contrast on y-offset.

(H) Relationship between contrast (abscissa) and gain (ordinate) for 80 units. Contrast (σ_L) values as in Figure 4E; the thin red horizontal line denotes G = 1. As contrast was reduced, 46/80 units increased their gain down to the lowest contrast measured, σ_L = 1.4 dB (c = 17%); 26/80 responded with maximum gain at the second lowest contrast measured (σ_L = 2.8 dB, c = 33%); four with maximum gain at σ_L = 4.3 dB (c = 49%); and the remaining four showed no consistent dependence of gain on σ_L. Within the first three of these groups, the input gain could be well described by Eqn. (2) down to their respective maximum-gain contrast values (cyan, red, and green fits, respectively; yellow fits for the remaining group). This demonstrates that gain normalization is evident on a unit-by-unit basis, although individual units may have limits on how much they can compensate for arbitrarily low stimulus contrasts.

Figure S4, related to Figure 5: Output nonlinearities for two example units show that contrast gain control is weaker at low mean levels; gain effects are identical when expressed in terms of σ_P/μ_P rather than σ_L.
(A) Output nonlinearities for a given unit for three contrast values (left, middle and right blocks), and three stimulus mean levels (small sub-blocks; green μ_L = 30 dB, orange μ_L = 40 dB, red μ_L = 50 dB SPL). Larger panels show all fitted sigmoids for a given contrast but different μ_L values. For this unit, gain rescaling is strong at μ_L = 50 dB and 40 dB, but relatively weak at μ_L = 30 dB.
(B) As in (A), for a second example unit.
(C) As explained in the main text and Figure S1, we used the standard deviation of the distribution of tone levels (σ_L) as a measure of contrast, as this correlates very strongly with the contrast in sound pressure, c = σ_P/μ_P. For completeness, we provide here a version of Figure 5A where the abscissa is expressed directly in units of c = σ_P/μ_P (%), rather than in units of standard deviation. All analyses and model fitting produced almost identical results when contrast was expressed as a percentage rather than in dB.

Figure S5, related to Figure 6: The time course of the response to the test sound is similar in low and high contrast contexts.
(A) Average response to the test sound for an example unit, as in Figure 6C. Blue circles and blue line show the response when the test sound was embedded in a low contrast context; red circles and red line show the response when the test sound was embedded in a high contrast context. Responses have been averaged within each contrast context over all post-switch delays from 150-800 ms. As only the time course of the response is of interest here, rather than the amplitude, responses in both conditions have been normalized to have a background rate of 0 and a peak response of 1. The duration of the excitatory portion of the response is measured as the width at half-maximum (i.e. along the dotted line), shown for each condition as the colored thick bars. For this unit, the durations were 17.5 ms in the low contrast context, and 18 ms in the high contrast context.
(B) As in (A), for a second example unit.
(C) Relationship between the duration of the response to the test stimulus in high and low contrast contexts, measured by the width at half-maximum as in (A, B). To ensure that durations could be accurately measured, only the n = 25 units where the standard deviation of the background rate was less than a third of the half-maximum firing rate were included. There was no significant difference between these durations (sign-rank, P > 0.3).
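The width-at-half-maximum measurement described in (A) above can be sketched as follows. This is a minimal sketch: taking the background rate from the first PSTH sample and locating the half-maximum crossings by linear interpolation are assumptions of this illustration, not details given in the text.

```python
def width_at_half_max(times, rates):
    """Width of the excitatory response at half-maximum, after
    normalizing the PSTH to background rate 0 and peak 1.
    Background is taken as the first sample (an assumption of this
    sketch); crossings are located by linear interpolation."""
    bg, peak = rates[0], max(rates)
    norm = [(r - bg) / (peak - bg) for r in rates]
    # first upward half-crossing
    i = next(k for k in range(len(norm)) if norm[k] >= 0.5)
    if i == 0:
        t_left = times[0]
    else:
        t_left = times[i - 1] + (times[i] - times[i - 1]) * \
            (0.5 - norm[i - 1]) / (norm[i] - norm[i - 1])
    # last downward half-crossing
    j = max(k for k in range(len(norm)) if norm[k] >= 0.5)
    if j == len(norm) - 1:
        t_right = times[-1]
    else:
        t_right = times[j] + (times[j + 1] - times[j]) * \
            (norm[j] - 0.5) / (norm[j] - norm[j + 1])
    return t_right - t_left

# Symmetric triangular response peaking at t = 5 ms: width should be 5 ms.
w = width_at_half_max(list(range(11)), [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0])
```

For real PSTHs the background would instead be estimated from a pre-stimulus window, but the crossing logic is unchanged.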
For the test/mask analysis, we fit two classes of models to a training dataset. This dataset consisted of the relative gain values for the 9 test/mask stimulus conditions, averaged (median) across 24 units where the test completely covered the responsive frequency range. The models included contributions to the gain from a combination of either one or two of:
− the test contrast, σ_test;
− the mask contrast, σ_mask;
− a measure of local contrast, σ_local, defined via the average level variance within the frequency bands constituting a unit's responsive frequency range;
− a measure of non-local ("remote") contrast, σ_remote, defined via the average level variance within the frequency bands outside the unit's responsive frequency range;
− a measure of the global contrast, σ_global, defined via the average level variance across all frequency bands.
The best fits were obtained when these contrast terms (singly, or as weighted sums) entered the denominator of the gain equation, as in Eqn. (2) in the main text. For fitting the model to the population median data, it was assumed that σ_local = σ_test and σ_remote = σ_mask. The value of n = 1.31 was fixed.
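The local, remote, and global contrast measures defined above can be sketched as follows. Taking each measure as the square root of the average level variance over the relevant bands follows the definitions in the text, but the exact normalization used in the study is an assumption of this sketch.

```python
def band_contrasts(level_vars, rf_mask):
    """Local, remote, and global contrast measures from per-band level
    variances. `level_vars[i]` is the variance of levels in frequency
    band i; `rf_mask[i]` is True if band i lies within the unit's
    responsive frequency range. Each measure is the square root of the
    average level variance over the relevant bands (a sketch)."""
    in_rf = [v for v, m in zip(level_vars, rf_mask) if m]
    out_rf = [v for v, m in zip(level_vars, rf_mask) if not m]
    sigma_local = (sum(in_rf) / len(in_rf)) ** 0.5
    sigma_remote = (sum(out_rf) / len(out_rf)) ** 0.5
    sigma_global = (sum(level_vars) / len(level_vars)) ** 0.5
    return sigma_local, sigma_remote, sigma_global

# Four bands, the first two inside the unit's responsive frequency range.
loc, rem, glob = band_contrasts([4.0, 4.0, 9.0, 9.0],
                                [True, True, False, False])
```
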
Each of the fitted models documented in the table was tested via a validation and a prediction. Firstly, each model, whose parameters were fit on the population median, was used to calculate the expected relative input gain of the 24 individual units where the test completely covered the responsive frequency range. This used the unit-by-unit measurement of the five contrast measures listed above. The median percentage of variance explained across these units (%VE) is reported in the middle column of the table. Secondly, the model was used to predict the relative input gain for 42 individual units whose responsive frequency range lay entirely outside the test (rightmost column of the table). Again, this used the unit-by-unit measurement of the five contrast measures listed above.
Two major categories of models were evaluated. In the first category, models could depend on only one measure of stimulus contrast. Here, a dependence on σ_local provided the best validation and prediction scores (bold text in highlighted row), indicating that neural gain is most powerfully modulated by stimulus variance within the frequency bands closest to a unit's BF.
In the second category, models could depend on both the stimulus contrast in local frequency bands and that in remote bands (or all bands), via a weighted sum in the denominator of the gain equation. Including σ_global as well as σ_local provided better validation and prediction scores than the model with σ_local alone. This demonstrates that neuronal gain is not determined solely by the statistics within the responsive frequency range of a neuron. Notably, the weighting in the denominator on σ_local is 2.4x greater than that on σ_global, showing that local effects remain stronger than global effects.

Spike sorting
Off-line spike sorting was performed using spikemonger, an in-house software package for Matlab (Mathworks Inc, Natick MA). Candidate spikes were identified as voltage-threshold crossing events. An automated expectation-maximization algorithm sorted spikes by shape across up to five channels. Clusters were chosen for further analysis; 42% (427/1020) of these were deemed to be single units based on spike shape and the presence of a refractory period in the autocorrelation histogram. We only included units that displayed acoustically-responsive activity.
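The candidate-spike detection step can be sketched as a simple threshold-crossing scan. This is an illustration only: the crossing polarity and any dead-time handling used by spikemonger are not specified in the text, so the negative-going convention below is an assumption.

```python
def threshold_crossings(voltage, threshold):
    """Indices where the trace crosses `threshold` in the negative
    direction, as a sketch of candidate-spike detection by
    voltage-threshold crossing. The polarity (downward spikes) is an
    assumption of this sketch."""
    return [i for i in range(1, len(voltage))
            if voltage[i - 1] > threshold >= voltage[i]]

# Toy trace with two downward deflections crossing -1.0.
trace = [0.0, -0.2, -1.5, -0.4, 0.1, 0.0, -1.2, -0.3, 0.0]
events = threshold_crossings(trace, -1.0)
```

In practice the detected events would then be windowed and passed to the clustering stage.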

Analytic relationship between sound pressure statistics and level statistics
The distributions of tone levels in this study were defined as uniform distributions in the level domain (Figure 1D), i.e.:

P(L) = 1/(2w) for μ − w ≤ L ≤ μ + w (and 0 otherwise),

where μ is the mean level, and w is the half-width of the level distribution. In the pressure domain, where p = p_0 · 10^(L/20), these distributions become:

P(p) = 1/(2αp) for βe^(−α) ≤ p ≤ βe^(α),

with the transformed variables α = w ln(10)/20 and β = p_0 · 10^(μ/20), and the standard reference RMS sound pressure, p_0 = 20 μPa. Computing expectations gives μ_P = E[p] = β sinh(α)/α and E[p²] = β² sinh(2α)/(2α). The square of the contrast can then be written

c² = σ_P²/μ_P² = E[p²]/E[p]² − 1 = α coth(α) − 1.
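As a numerical check of this relationship, one can draw levels from the uniform distribution, transform them to pressures, and compare the empirical contrast with the closed form implied by the expectations above. The closed form c² = α·coth(α) − 1 used below is our reconstruction from the stated transformed variables, not a formula quoted from the text; note that it depends only on the level width w, not on the mean level.

```python
import math
import random

def pressure_contrast_mc(mu_L, w, n=200000, seed=0):
    """Monte Carlo estimate of c = sigma_P / mu_P when tone levels are
    drawn uniformly from [mu_L - w, mu_L + w] dB SPL (p0 = 20 uPa)."""
    rng = random.Random(seed)
    p0 = 20e-6
    ps = [p0 * 10 ** ((mu_L + w * (2.0 * rng.random() - 1.0)) / 20.0)
          for _ in range(n)]
    m = sum(ps) / n
    var = sum((p - m) ** 2 for p in ps) / n
    return math.sqrt(var) / m

def pressure_contrast_analytic(w):
    """Closed form c^2 = alpha*coth(alpha) - 1 with alpha = w*ln(10)/20,
    which follows from the log-uniform expectations (our derivation).
    The mean level only scales beta, so c is independent of mu_L."""
    alpha = w * math.log(10.0) / 20.0
    return math.sqrt(alpha / math.tanh(alpha) - 1.0)

c_mc = pressure_contrast_mc(mu_L=40.0, w=15.0)
c_an = pressure_contrast_analytic(15.0)
```
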

Statistics on STRFs
Statistics on the STRF were calculated as follows. The best frequency (BF) was calculated as the center of mass (CoM) of the (element-wise) squared frequency kernel, w f 2 . Given that a finite range of frequencies was used, noise in the estimates of w f 2 exerted a bias in the CoM values towards the center of this range. To compensate for this bias, we calculated the CoM on a circular basis. We manually checked that there were no wrapping artifacts when the true BF was at either the lower or upper extreme of the frequency range. Bandwidth was estimated as the (circular) standard deviation of w f 2 . Circular statistics were performed using CircStat in Matlab (Berens, 2009). As we found that noise in the STRF had a considerable impact on the estimate of bandwidth, we calculated a second bandwidth measure after excluding all coefficients of w f 2 that were not significantly non-zero. This was established via a bootstrap procedure, described below. We confirmed that all results held when BFs and bandwidths were estimated on non-circular (i.e. standard) bases.
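The circular center-of-mass computation can be sketched as follows, in the spirit of CircStat: frequency bins are mapped onto angles around a circle, a weighted circular mean is taken, and the result is mapped back to a (fractional) bin index. The fractional-bin output convention is ours; mapping back to Hz would use the experiment's frequency axis.

```python
import math

def circular_center_of_mass(freq_kernel):
    """Center of mass of the element-wise squared frequency kernel
    w_f^2, computed on a circular basis to reduce the bias that a
    finite frequency range induces in ordinary CoM estimates
    (a CircStat-style sketch). Returns a fractional bin index."""
    F = len(freq_kernel)
    w2 = [w * w for w in freq_kernel]
    total = sum(w2)
    C = sum(w * math.cos(2 * math.pi * i / F) for i, w in enumerate(w2)) / total
    S = sum(w * math.sin(2 * math.pi * i / F) for i, w in enumerate(w2)) / total
    return (math.atan2(S, C) % (2 * math.pi)) * F / (2 * math.pi)

# A kernel concentrated at bin 3 of 10 gives BF index 3; weights that
# straddle the wrap point (bins 9 and 0) average to the circular
# midpoint 9.5 rather than the misleading linear mean of 4.5.
bf_a = circular_center_of_mass([0, 0, 0, 1, 0, 0, 0, 0, 0, 0])
bf_b = circular_center_of_mass([1, 0, 0, 0, 0, 0, 0, 0, 0, 1])
```

The second example is exactly the wrapping situation that the manual checks described above were guarding against.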

Bootstrapped estimates of non-zero components of the STRFs
To determine what proportion of each STRF lay within the test, we first constructed confidence intervals for the components of w f via bootstrapping. We drew (with replacement) 1000 subsamples of 90% of the stimulus/response data from the pooled dataset, and estimated a STRF from each. Those components of w f not significantly nonzero across the ensemble (via studentization, P < 0.01) were set to zero. Equation (4) as given in the main text was calculated using only the non-zero components of w f .
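The studentization step can be sketched as follows, assuming the ensemble of bootstrap STRFs has already been estimated. The 2.576 threshold approximates a two-sided P < 0.01 criterion under a normal approximation; the exact studentization used in the study may differ.

```python
def significant_components(boot_weights, t_crit=2.576):
    """Given bootstrapped STRF weight vectors (one list per resample),
    keep each component only if its mean is significantly non-zero,
    judged by |mean| / sd_bootstrap >= t_crit, where the bootstrap
    standard deviation approximates the standard error (a sketch of
    the studentized P < 0.01 criterion)."""
    n = len(boot_weights)
    out = []
    for j in range(len(boot_weights[0])):
        vals = [bw[j] for bw in boot_weights]
        m = sum(vals) / n
        sd = (sum((v - m) ** 2 for v in vals) / (n - 1)) ** 0.5
        if sd == 0.0:
            out.append(m)  # perfectly stable across resamples
        else:
            out.append(m if abs(m) / sd >= t_crit else 0.0)
    return out

# Component 0 is consistently ~1; component 1 flips sign across
# resamples and is therefore zeroed.
w_sig = significant_components([[0.9, -1.0], [1.1, 1.0],
                                [0.9, -1.0], [1.1, 1.0]])
```
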

Fitting of curve transformations
We fit Eqn. (6) in the Experimental Procedures by minimizing a quadratic loss function. This measured the squared difference between the transformed curve and the target curve, y, over the domain of the target curve only. We used gradient descent to find the optimal parameters of the gain, g, the x-offset, Δx, and the y-offset, Δy, using initial conditions of g_0 = 1, Δx_0 = 0, Δy_0 = 0. Gradient descent was repeated a further ten times from random initial conditions (with g_0 ~ Exp(1), and Δx_0, Δy_0 ~ N(0, 10²)), and the best of all these fits chosen. Results were cross-checked by minimizing the mean squared error between the transformed curve and the original data to which the original sigmoid was fit.
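The fitting procedure above can be sketched as follows. This is a minimal sketch: the transformation y_ref(g·(x − Δx)) + Δy is our assumed parameterization of Eqn. (6) (the exact form is in the main text), and the sigmoid, grid, and iteration settings are illustrative.

```python
import math
import random

def fit_transform(y_ref, xs, y_target, n_restarts=5, iters=800, lr=0.01, seed=0):
    """Fit gain g, x-offset dx, and y-offset dy so that
    y_ref(g * (x - dx)) + dy matches y_target on the grid xs, by
    numerical-gradient descent with random restarts
    (g0 ~ Exp(1); dx0, dy0 ~ N(0, 10^2)), as described above."""
    def loss(p):
        g, dx, dy = p
        return sum((y_ref(g * (x - dx)) + dy - t) ** 2
                   for x, t in zip(xs, y_target)) / len(xs)
    rng = random.Random(seed)
    starts = [(1.0, 0.0, 0.0)]
    starts += [(rng.expovariate(1.0), rng.gauss(0.0, 10.0), rng.gauss(0.0, 10.0))
               for _ in range(n_restarts)]
    best = None
    for start in starts:
        p = list(start)
        for _ in range(iters):
            base = loss(p)
            grad = []
            for k in range(3):  # forward-difference gradient
                q = list(p)
                q[k] += 1e-4
                grad.append((loss(q) - base) / 1e-4)
            p = [pi - lr * gi for pi, gi in zip(p, grad)]
        if best is None or loss(p) < loss(best):
            best = p
    return best

# Recover a pure vertical shift of a reference sigmoid (clipped to
# avoid overflow at extreme arguments).
y_ref = lambda x: 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, x))))
xs = [0.5 * k for k in range(-10, 11)]
target = [y_ref(x) + 0.5 for x in xs]
g_fit, dx_fit, dy_fit = fit_transform(y_ref, xs, target)
```
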
Though curve transformations could, in principle, include scaling of curves along both the horizontal and vertical axes, the extent of vertical scaling could not be accurately estimated alongside horizontal scaling, as none of the units measured operated near their saturation point. Nevertheless, we considered whether changes in the nonlinearities could be explained by vertical scaling (together with x- and y-offsets), rather than by horizontal scaling. This amounted to fitting an equation with a vertical scale factor in place of the horizontal gain, in the same manner as Eqn. (5) in the main text (see details above). This provided considerably poorer fits than Eqn. (5) (data not shown).

Gain effects on units without predictive LN models
In the main text, we measure gain changes by measuring the horizontal scaling of output nonlinearities between different conditions. To perform this analysis for a unit, we need an LN model that predicts the unit's responses in all conditions. For many units in our sample, we were unable to estimate such a model, and so we could not measure gain changes in this way. To assess gain changes across the whole sample, we performed the following alternative analysis.
Consider a sample of noiseless neurons, with linear stimulus-response relationships. If the neurons do not have contrast gain control, then we should expect the standard deviation of neural responses, σ(R), to be directly proportional to the standard deviation of the stimulus, σ(S). On the other hand, if neurons have complete contrast gain control, then any reduction in stimulus contrast should be compensated for by an increase in gain, so that σ(R) is unaffected by contrast. Thus, we can measure the strength of gain control by comparing σ(R) for two stimuli of high and low contrast. By measuring the ratio of response standard deviations in the two conditions, σ_lo(R)/σ_hi(R), and comparing it to the corresponding ratio of stimulus standard deviations, σ_lo(S)/σ_hi(S), we derive a measure of gain, G*:

G* = [σ_lo(R)/σ_hi(R)] / [σ_lo(S)/σ_hi(S)].

In principle, we can then measure G* for σ_lo(S) = 2.9 dB and σ_hi(S) = 8.7 dB. If G* = 1, this indicates that there is no gain control (σ(R) is proportional to σ(S)). If G* = 3, this indicates complete compensation for changes in stimulus standard deviation.
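The G* measure can be sketched directly from its definition; the two example calls below reproduce the limiting cases described above (the stimulus standard deviations 2.9 and 8.7 dB are taken from the text).

```python
def g_star(resp_sd_lo, resp_sd_hi, stim_sd_lo=2.9, stim_sd_hi=8.7):
    """G* = (sigma_lo(R) / sigma_hi(R)) / (sigma_lo(S) / sigma_hi(S)).
    G* = 1 -> no gain control (response SD tracks stimulus SD);
    G* = stim_sd_hi / stim_sd_lo (= 3 for the 2.9 and 8.7 dB
    conditions) -> complete compensation."""
    return (resp_sd_lo / resp_sd_hi) / (stim_sd_lo / stim_sd_hi)

# Identical response SDs across conditions: complete compensation.
g_complete = g_star(2.0, 2.0)
# Response SD proportional to stimulus SD: no gain control.
g_none = g_star(1.0, 3.0)
```
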
Real neurons are not noiseless, and so σ(R) is not a good estimate of the stimulus-driven variability (signal power) in the responses. However, Sahani & Linden (2003b) present a simple method for estimating the signal power and the noise power of a set of neural responses. We used this technique to refine the equation above, replacing σ(R) (as derived from the average response) with s = √(signal power):

G_s* = [s_lo/s_hi] / [σ_lo(S)/σ_hi(S)].

To verify that this indeed produced a useful measure of gain, we compared G_s* with relative gain (as measured through curve transformations in the main text) for the 458 units where the LN models were predictive in the low and high contrast conditions. Amongst these units, there was a moderate correlation of r = 0.44 between these two measures. This suggests that G_s* can provide a fair picture of gain control amongst the remaining units.
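A Sahani & Linden-style signal-power estimator, and the corresponding signal-power-based gain measure, can be sketched as follows. The estimator form below (an unbiased combination of the PSTH variance and the single-trial variances) matches the description in the text, but implementation details in the original may differ; note that for very noisy data the estimate can come out negative, in which case the square root is undefined.

```python
import statistics

def signal_power(trials):
    """Estimate stimulus-driven signal power from N repeated trials
    (equal-length rate vectors), in the style of Sahani & Linden
    (2003b): P_sig = (N * P(mean response) - mean_n P(trial n)) / (N - 1),
    where P(x) is the variance of x across time bins."""
    N = len(trials)
    T = len(trials[0])
    mean_resp = [sum(tr[t] for tr in trials) / N for t in range(T)]
    p_mean = statistics.pvariance(mean_resp)
    p_each = sum(statistics.pvariance(tr) for tr in trials) / N
    return (N * p_mean - p_each) / (N - 1)

def g_star_signal(trials_lo, trials_hi, stim_sd_lo=2.9, stim_sd_hi=8.7):
    """G_s*: as G* above, but with response SDs replaced by the square
    roots of the estimated signal powers in each contrast condition."""
    s_lo = signal_power(trials_lo) ** 0.5
    s_hi = signal_power(trials_hi) ** 0.5
    return (s_lo / s_hi) / (stim_sd_lo / stim_sd_hi)

# Noiseless check: identical trials -> signal power equals PSTH variance.
sp = signal_power([[0.0, 1.0, 0.0, 1.0]] * 4)
```
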

Test sound reliability
To ensure that reliable estimates of time constants were obtained, we limited our analysis of adaptation time course to units that satisfied two criteria. Firstly, it was necessary that the mean peak response was different in the high and low contrast context conditions: fitting exponentials of the form r(t) = a + b·exp(−t/τ) to units where there was no difference between peak responses in the high and low conditions would only capture noise. Secondly, we calculated the noise ratio (NR) (defined in Sahani and Linden, 2003b; see Experimental Procedures) of the set of peak responses to the test sound, across all conditions for that unit. As for the PSTHs, NR = 0 indicates that the peak responses were identical for each repeated stimulus presentation, while higher NR indicates that responses were less reliable. As for the STRFs, we restricted our analysis to units whose NR for peak responses was < 10.
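The exponential fits r(t) = a + b·exp(−t/τ) above can be sketched with a simple grid-plus-least-squares routine: for each candidate τ the model is linear in (a, b), so those two parameters have a closed-form solution. The millisecond units and the 1-500 ms τ grid are assumptions of this sketch.

```python
import math

def fit_exponential(times, rates, taus=None):
    """Fit r(t) = a + b*exp(-t/tau) by scanning candidate time
    constants and solving for (a, b) by linear least squares at each
    tau; returns the (a, b, tau) with the smallest squared error.
    Times are assumed to be in ms (default tau grid: 1-500 ms)."""
    if taus is None:
        taus = [float(k) for k in range(1, 501)]
    best = None
    for tau in taus:
        xs = [math.exp(-t / tau) for t in times]
        n = len(xs)
        mx, my = sum(xs) / n, sum(rates) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        if sxx == 0.0:
            continue  # regressor is constant; (a, b) not identifiable
        b = sum((x - mx) * (r - my) for x, r in zip(xs, rates)) / sxx
        a = my - b * mx
        err = sum((a + b * x - r) ** 2 for x, r in zip(xs, rates))
        if best is None or err < best[0]:
            best = (err, a, b, tau)
    return best[1], best[2], best[3]

# Noiseless example with a = 5 Hz, b = 10 Hz, tau = 100 ms.
ts = list(range(0, 400, 10))
rs = [5.0 + 10.0 * math.exp(-t / 100.0) for t in ts]
a_fit, b_fit, tau_fit = fit_exponential(ts, rs)
```
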