Many but not all deep neural network audio models capture brain responses and exhibit correspondence between model stages and brain regions

Fig 4

Example model predictions for 6 components of fMRI responses to natural sounds.

(A) Predictions of the 6 components by a trained DNN model (CochResNet50-MultiTask). Each data point corresponds to a single sound from the set of 165 natural sounds. Data point color denotes the sound’s semantic category. Model predictions were made from the model stage that best predicted a component’s response. The predicted response is the average of the predictions for a sound across the test half of 10 different train-test splits (including each of the splits for which the sound was present in the test half). (B) Predictions of the 6 components by the same model used in (A) but with permuted weights. Predictions are substantially worse than for the trained model, indicating that task optimization is important for obtaining good predictions, especially for components 4–6. (C) Predictions of the 6 components by the SpectroTemporal model. Predictions are substantially worse than for the trained model, particularly for components 4–6. Data and code with which to reproduce results are available at https://github.com/gretatuckute/auditory_brain_dnn.
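The averaging procedure described in (A) can be sketched as follows. This is a minimal illustration under assumed conventions, not the authors' code (their actual pipeline is in the linked repository); it assumes predictions are stored per split, with entries undefined for splits where a sound fell in the training half.

```python
import numpy as np

# Hypothetical layout: predictions[i, s] = model prediction for sound i
# under train-test split s; NaN where sound i was in the training half
# of split s. Names and split assignment here are illustrative only.
n_sounds, n_splits = 165, 10
rng = np.random.default_rng(0)
predictions = rng.normal(size=(n_sounds, n_splits))

# Assumed deterministic assignment so every sound is held out in
# exactly 5 of the 10 splits (the real splits differ).
in_test_half = (np.arange(n_sounds)[:, None] + np.arange(n_splits)[None, :]) % 2 == 0
predictions[~in_test_half] = np.nan

# A sound's final predicted response is the mean of its predictions
# over the splits in which it appeared in the test half.
predicted_response = np.nanmean(predictions, axis=1)
```

The resulting `predicted_response` vector contains one averaged prediction per sound, which is what each data point in panel (A) would plot against the measured component response.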

doi: https://doi.org/10.1371/journal.pbio.3002366.g004