
Reconstructing Speech from Human Auditory Cortex

Figure 6

Schematic of nonlinear modulation model.

(A) The input spectrogram (top left) is transformed by a linear modulation filter bank (right) followed by a nonlinear magnitude operation (not shown). This nonlinear operation extracts the modulation energy of the incoming spectrogram and provides invariance to the phase of local fluctuations in the spectrogram envelope. The input representation is the two-dimensional spectrogram S(f,t) across frequency f and time t. The output (bottom left) is the four-dimensional modulation energy representation M(s,r,f,t) across spectral modulation scale s, temporal modulation rate r, frequency f, and time t. In the full modulation representation [18], negative rates by convention correspond to upward frequency sweeps, while positive rates correspond to downward frequency sweeps. Reported accuracy was averaged over positive and negative rates unless otherwise shown. See Materials and Methods.

(B) Schematic of linear (spectrogram envelope) and nonlinear (modulation energy) temporal coding. Left: acoustic waveform (black curve) and spectrogram of a temporally modulated tone. The linear spectrogram model (top) assumes that neural responses are a linear function of the spectrogram envelope (plotted for the tone center frequency channel, top right). In this case, the instantaneous output may be high or low and therefore does not directly indicate the modulation rate of the envelope. The nonlinear modulation model (bottom) assumes that neural responses are a linear function of modulation energy, an amplitude-based coding scheme (plotted for the peak modulation channel, bottom right). The nonlinear modulation model explicitly estimates the modulation rate: its output takes a constant value for a constant rate, so the rate is encoded in the response amplitude [32].
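To make the transform in panel (A) concrete, the sketch below illustrates the two stages in Python: linear filtering of the spectrogram by a bank of complex 2D Gabor filters tuned to spectral modulation scale s and temporal modulation rate r, followed by the nonlinear magnitude operation that yields M(s,r,f,t). This is an illustrative sketch, not the authors' implementation (the full model follows [18]); the filter shape, parameter values, and function names are all assumptions made here for clarity.

```python
# Minimal sketch of the modulation energy transform: filter the
# spectrogram S(f, t) with complex 2D Gabor filters tuned to
# (scale s [cyc/oct], rate r [Hz]), then take the magnitude to get
# the phase-invariant modulation energy M(s, r, f, t).
import numpy as np
from scipy.signal import fftconvolve

def gabor_filter(scale, rate, sigma_f=0.5, sigma_t=0.1,
                 f_step=0.125, t_step=0.01, n_f=16, n_t=50):
    """Complex 2D Gabor tuned to (scale [cyc/oct], rate [Hz]).

    The sign of `rate` sets the sweep direction (by the paper's
    convention, negative rates correspond to upward sweeps).
    f_step is the channel spacing in octaves, t_step the frame step
    in seconds; all parameter values are illustrative assumptions.
    """
    f = (np.arange(n_f) - n_f // 2) * f_step   # octaves
    t = (np.arange(n_t) - n_t // 2) * t_step   # seconds
    ff, tt = np.meshgrid(f, t, indexing="ij")
    envelope = np.exp(-0.5 * ((ff / sigma_f) ** 2 + (tt / sigma_t) ** 2))
    carrier = np.exp(2j * np.pi * (scale * ff + rate * tt))
    return envelope * carrier

def modulation_energy(S, scales, rates):
    """Map a spectrogram S(f, t) to modulation energy M(s, r, f, t)."""
    M = np.empty((len(scales), len(rates)) + S.shape)
    for i, s in enumerate(scales):
        for j, r in enumerate(rates):
            h = gabor_filter(s, r)
            # Linear modulation filtering followed by the nonlinear
            # magnitude operation described in panel (A).
            M[i, j] = np.abs(fftconvolve(S, h, mode="same"))
    return M
```

The magnitude step is what produces the behavior contrasted in panel (B): for a tone amplitude-modulated at, say, 8 Hz, the output of the matched r = 8 Hz channel stays roughly constant over time, encoding the rate in its amplitude, whereas the raw spectrogram envelope oscillates between high and low values.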

doi: https://doi.org/10.1371/journal.pbio.1001251.g006