Increasing neural network robustness improves match to macaque V1 eigenspectrum, spatial frequency preference and predictivity.
PLoS computational biology
2022; 18 (1): e1009739
Task-optimized convolutional neural networks (CNNs) show striking similarities to the ventral visual stream. However, human-imperceptible image perturbations can cause a CNN to make incorrect predictions. Here we provide insight into this brittleness by investigating the representations of models that are either robust or not robust to image perturbations. Theory suggests that the robustness of a system to these perturbations could be related to the power law exponent of the eigenspectrum of its set of neural responses, where power law exponents closer to and larger than one would indicate a system that is less susceptible to input perturbations. We show that neural responses in mouse and macaque primary visual cortex (V1) obey the predictions of this theory, where their eigenspectra have power law exponents of at least one. We also find that the eigenspectra of model representations decay slowly relative to those observed in neurophysiology and that robust models have eigenspectra that decay slightly faster and have higher power law exponents than those of non-robust models. The slow decay of the eigenspectra suggests that substantial variance in the model responses is related to the encoding of fine stimulus features. We therefore investigated the spatial frequency tuning of artificial neurons and found that a large proportion of them preferred high spatial frequencies and that robust models had preferred spatial frequency distributions more aligned with the measured spatial frequency distribution of macaque V1 cells. Furthermore, robust models were quantitatively better models of V1 than non-robust models. Our results are consistent with other findings that there is a misalignment between human and machine perception. They also suggest that it may be useful to penalize slow-decaying eigenspectra or to bias models to extract features of lower spatial frequencies during task-optimization in order to improve robustness and V1 neural response predictivity.
DOI: 10.1371/journal.pcbi.1009739
PubMed ID: 34995280
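The eigenspectrum analysis described in the abstract above can be illustrated with a minimal sketch. It assumes a response matrix with one row per stimulus and one column per unit, takes the eigenvalues of its covariance via plain PCA, and estimates the power law exponent by a least-squares line fit in log-log space. The function names, the fitting range, and the use of ordinary (not cross-validated) PCA are assumptions made for illustration; they are not taken from the paper, whose exact estimation procedure is not stated in the abstract.

```python
import numpy as np


def eigenspectrum(responses):
    """Covariance eigenvalues of a (stimuli x units) matrix, sorted descending.

    Assumption: plain PCA on mean-centered responses, not a
    cross-validated estimator.
    """
    centered = responses - responses.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data, divided by (n - 1),
    # equal the eigenvalues of the sample covariance matrix.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return singular_values ** 2 / (responses.shape[0] - 1)


def power_law_exponent(eigvals, lo=1, hi=None):
    """Fit eigvals[n] ~ n**(-alpha) over ranks lo..hi (1-indexed, inclusive).

    Returns alpha, the negated slope of a straight line fitted to
    log(eigenvalue) against log(rank).
    """
    hi = len(eigvals) if hi is None else hi
    ranks = np.arange(lo, hi + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(eigvals[lo - 1:hi]), 1)
    return -slope


if __name__ == "__main__":
    # Hypothetical data: random responses standing in for model activations.
    rng = np.random.default_rng(0)
    fake_responses = rng.normal(size=(200, 50))
    spectrum = eigenspectrum(fake_responses)
    print("estimated exponent:", power_law_exponent(spectrum, lo=1, hi=30))
```

Under this reading, a slowly decaying eigenspectrum corresponds to a small fitted exponent, and the abstract's claim is that robust models yield larger exponents than non-robust ones. The fitted value depends on the chosen rank range, so the range should be reported alongside any estimate.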
Time-resolved correspondences between deep neural network layers and EEG measurements in object processing.
Vision research
2020; 172: 27–45
The ventral visual stream is known to be organized hierarchically, where early visual areas processing simplistic features feed into higher visual areas processing more complex features. Hierarchical convolutional neural networks (CNNs) were largely inspired by this type of brain organization and have been successfully used to model neural responses in different areas of the visual system. In this work, we aim to understand how an instance of these models corresponds to temporal dynamics of human object processing. Using representational similarity analysis (RSA) and various similarity metrics, we compare the model representations with two electroencephalography (EEG) data sets containing responses to a shared set of 72 images. We find that there is a hierarchical relationship between the depth of a layer and the time at which peak correlation with the brain response occurs for certain similarity metrics in both data sets. However, when comparing across layers in the neural network, the correlation onset time did not appear in a strictly hierarchical fashion. We present two additional methods that improve upon the achieved correlations by optimally weighting features from the CNN and show that depending on the similarity metric, deeper layers of the CNN provide a better correspondence than shallow layers to later time points in the EEG responses. However, we do not find that shallow layers provide better correspondences than those of deeper layers to early time points, an observation that violates the hierarchy and is in agreement with the finding from the onset-time analysis. This work makes a first comparison of various response features, including multiple similarity metrics and data sets, with respect to a neural network.
DOI: 10.1016/j.visres.2020.04.005
PubMed ID: 32388211
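The layer-by-time comparison described in the abstract above can be sketched as follows. The sketch builds a representational dissimilarity matrix (RDM) for each CNN layer and each EEG time point, correlates their upper triangles, and records the time point of peak correlation per layer. The function names, the array layouts, and the choice of 1 minus Pearson correlation as the dissimilarity and Pearson correlation as the RDM comparison are assumptions made for illustration; the abstract mentions several similarity metrics without naming them, and this is only one plausible instance.

```python
import numpy as np


def rdm(patterns):
    """RDM for a (stimuli x features) matrix: 1 - Pearson r between rows."""
    return 1.0 - np.corrcoef(patterns)


def rdm_similarity(rdm_a, rdm_b):
    """Pearson correlation between the upper triangles of two RDMs."""
    upper = np.triu_indices_from(rdm_a, k=1)  # skip the zero diagonal
    return np.corrcoef(rdm_a[upper], rdm_b[upper])[0, 1]


def peak_time_per_layer(layer_features, eeg):
    """Index of the best-matching EEG time point for each layer.

    layer_features: list of (stimuli x units) arrays, one per layer.
    eeg: (time x stimuli x channels) array of stimulus-averaged responses.
    """
    eeg_rdms = [rdm(eeg[t]) for t in range(eeg.shape[0])]
    peaks = []
    for features in layer_features:
        layer_rdm = rdm(features)
        correlations = [rdm_similarity(layer_rdm, e) for e in eeg_rdms]
        peaks.append(int(np.argmax(correlations)))
    return peaks


if __name__ == "__main__":
    # Hypothetical data: 72 stimuli (as in the abstract), 3 layers,
    # 40 time points, 64 channels. All values are random placeholders.
    rng = np.random.default_rng(0)
    layers = [rng.normal(size=(72, 128)) for _ in range(3)]
    eeg = rng.normal(size=(40, 72, 64))
    print(peak_time_per_layer(layers, eeg))
```

A hierarchical correspondence of the kind the abstract reports for peak times would show up here as peak indices that increase with layer depth. Onset time would need a separate significance threshold on the correlation time course, which this sketch does not attempt.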