Orientation representations in convolutional neural networks are more discriminable around the cardinal axes

Convolutional neural networks (CNNs) share some similarity in representational structure with the primate ventral visual stream. However, less is known about whether low-level visual features are represented in the same way by CNNs and the brain. Here, we focus on orientation perception, a well-understood aspect of the primate visual system. We asked whether CNNs trained to perform object recognition on a natural image database would exhibit an “oblique effect”, in which cardinal (vertical and horizontal) orientations are represented with higher precision than oblique (diagonal) orientations, as has been measured in the primate brain. We obtained activation patterns from two networks (NASNet and Inception-V3) presented with oriented grating stimuli, and used a Euclidean distance metric to measure the discriminability between patterns corresponding to different pairs of orientations. In agreement with human perception, we find that the discriminability of representations generally peaks around the cardinal axes. This finding suggests that cardinal biases in human visual perception need not depend on a hard-wired anatomical bias, but can instead emerge through experience with the statistics of natural images.


Introduction
Hierarchically organized neural network models trained to perform object categorization have been shown to provide a reasonable approximation of the features represented by neurons in the ventral visual cortex of primates (Kubilius et al., 2018; Yamins et al., 2014). This general correspondence is present even at the earliest layers of convolutional neural networks (CNNs), which often learn Gabor-wavelet-like filters (Yamins & DiCarlo, 2016). However, the organization of low-level feature representations in CNNs has not been extensively characterized. Understanding whether CNNs develop idiosyncrasies that mimic the properties of the primate visual system is important for developing models that can inform our understanding of the brain. Additionally, because most properties of a neural network are acquired through training, examining the feature representations of CNNs is a useful tool for determining which properties of the primate brain might be innate and which are likely to be acquired through experience.
In this paper we focus on the well-known "oblique effect", in which human and non-human primate observers tend to show higher acuity around cardinal orientations (horizontal and vertical) than around oblique orientations (Bauer, Owens, Thomas, & Held, 1979; Higgins & Stultz, 1950). This effect is thought to originate from an over-representation of neurons tuned to horizontal and vertical orientations, which has been measured in the primary visual cortex of mice and cats, as well as in primate V2 (Li, Peterson, & Freeman, 2003; Salinas, Velez, Zeitoun, Kim, & Gandhi, 2017; Shen et al., 2014). According to an efficient coding framework, this anisotropy is adaptive because it allows for optimal processing of natural scenes, in which horizontal and vertical edges are more prevalent than oblique ones (Girshick, Landy, & Simoncelli, 2011).
Based on this framework, we hypothesized that a CNN trained on a dataset of natural images may develop similar properties. To test this, we presented neural networks pre-trained on a database of natural images with circular grating stimuli of varying orientations and recorded the resulting activations. We then measured, at each layer, the discriminability of the activation patterns corresponding to neighboring orientations, and evaluated how this discriminability changed as a function of position in orientation space. Our results suggest that, similar to the primate brain, CNNs exhibit an anisotropic representation of orientation.

This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0

Visual stimuli
Each CNN was presented with grating stimuli (square images, 140 × 140 pixels) at a range of orientations, spatial frequencies, and noise levels. Stimuli were circular sinusoidal gratings with smoothed edges (kernel size = 10 pixels, SD = 5 pixels), presented against a mid-gray background. After smoothing, each grating had a radius of 65 pixels. Orientations ranged from 1° to 180° in 1° steps, and spatial frequencies ranged from 0.04 to 0.22 cycles per pixel in 4 logarithmically spaced steps (0.04, 0.07, 0.12, 0.22). We also superimposed three levels of Gaussian noise onto the gratings: the first level had no noise, the second had Gaussian noise with a standard deviation equal to the grating amplitude/8, and the highest had a standard deviation equal to the grating amplitude/4. We generated 4 gratings at each combination of orientation, noise level, and spatial frequency, for a total of 180 × 4 × 3 × 4 = 8640 images. The phase of each grating was randomly selected within the range of 1-180 degrees.
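The stimulus description above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code, and it rests on several assumptions the text does not specify: pixel values are centered on a gray level of 0 with unit grating amplitude, an orientation of 0° produces horizontal stripes, the aperture is a hard disk of radius 65 pixels blurred by a separable 10-tap Gaussian with SD 5, and noise is added to the whole image. The helper names (`make_grating`, `stimulus_conditions`, `gaussian_kernel_1d`) are hypothetical.

```python
import itertools
import numpy as np

ORIENTATIONS = np.arange(1, 181)             # 1..180 deg in 1-deg steps
SPATIAL_FREQS = np.geomspace(0.04, 0.22, 4)  # ~0.04, 0.07, 0.12, 0.22 cycles/pixel
NOISE_FRACS = (0.0, 1 / 8, 1 / 4)            # noise SD as a fraction of grating amplitude
N_REPS = 4                                   # gratings per condition (random phase)


def gaussian_kernel_1d(size=10, sd=5.0):
    """Normalized 1-D Gaussian; applied along rows and then columns it gives a 2-D blur."""
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-0.5 * (x / sd) ** 2)
    return k / k.sum()


def make_grating(orientation_deg, sf, phase_deg, noise_frac, rng,
                 size=140, radius=65, amplitude=1.0, gray=0.0):
    """Circular sinusoidal grating with a blurred edge on a uniform gray background."""
    c = np.arange(size) - (size - 1) / 2.0
    x, y = np.meshgrid(c, c)
    theta = np.deg2rad(orientation_deg)
    # Coordinate perpendicular to the stripes (assumption: theta = 0 -> horizontal stripes).
    u = -x * np.sin(theta) + y * np.cos(theta)
    carrier = amplitude * np.sin(2 * np.pi * sf * u + np.deg2rad(phase_deg))
    # Hard circular aperture, then a separable Gaussian blur to smooth its edge.
    mask = (np.hypot(x, y) <= radius).astype(float)
    k = gaussian_kernel_1d()
    mask = np.apply_along_axis(np.convolve, 0, mask, k, mode="same")
    mask = np.apply_along_axis(np.convolve, 1, mask, k, mode="same")
    img = gray + mask * carrier
    if noise_frac > 0:
        # Assumption: white Gaussian noise is added over the entire image.
        img = img + rng.normal(0.0, noise_frac * amplitude, size=img.shape)
    return img


def stimulus_conditions(rng):
    """One (orientation, sf, phase, noise_frac) tuple per image: 180 * 4 * 3 * 4 = 8640."""
    return [(ori, sf, int(rng.integers(1, 181)), nf)  # phase drawn from 1..180 deg
            for ori, sf, nf, _ in itertools.product(
                ORIENTATIONS, SPATIAL_FREQS, NOISE_FRACS, range(N_REPS))]
```

Images can then be generated lazily, e.g. `make_grating(*conds[i], rng=rng)`, which avoids holding all 8640 float arrays in memory at once.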
The full image set was passed through each CNN in 96 batches of 90 images each, and the resulting activation patterns at each layer were recorded. To reduce the dimensionality of the activations, we performed principal components analysis (PCA) at each layer across all 8640 images and retained at most 500 components per layer.
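The batching and per-layer PCA step can be sketched with an economy SVD of the centered activation matrix. This is an illustrative assumption rather than the authors' implementation: it treats one layer's recorded activations as an array of shape (n_images, ...) whose trailing axes are flattened into units, and the names `batch_indices` and `pca_reduce` are hypothetical. For large convolutional layers a full SVD is memory-intensive; scikit-learn's `PCA(n_components=500, svd_solver="randomized")` or `IncrementalPCA` would be more practical substitutes.

```python
import numpy as np


def batch_indices(n_images=8640, n_batches=96):
    """Split image indices into equal consecutive batches (96 x 90 for this stimulus set)."""
    return np.array_split(np.arange(n_images), n_batches)


def pca_reduce(activations, max_components=500):
    """Project one layer's activations onto at most `max_components` principal components.

    activations: array of shape (n_images, ...); trailing axes are flattened into units.
    Returns (scores, explained_variance_ratio).
    """
    X = activations.reshape(activations.shape[0], -1).astype(np.float64)
    X -= X.mean(axis=0)                  # center each unit across images
    # Economy SVD: rows of Vt are principal axes, sorted by decreasing variance.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(max_components, S.size)
    scores = U[:, :k] * S[:k]            # same as X @ Vt[:k].T
    explained = S[:k] ** 2 / np.sum(S ** 2)
    return scores, explained
```

Capping at 500 components keeps the later distance computations cheap while, for most layers, retaining the large majority of the variance; `explained` makes it easy to check how much is discarded.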

Measuring discriminability
To evaluate how orientation discriminability varied across orientation space, we calculated the Euclidean distance between the activation patterns corresponding to each pair of neighboring orientations (1° apart). We performed this calculation separately within each spatial frequency and noise level. Because 4 gratings (with randomized phase) were presented at each orientation, each orientation could be paired with its leftward neighbor in 4 × 4 = 16 ways and with its rightward neighbor in another 16, giving a total of 32 comparisons. We report the mean and standard deviation across these 32 comparisons. Note that although the absolute values of the Euclidean distances reported here are not particularly meaningful, their relative values are interpretable.
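A minimal NumPy sketch of this neighbor-distance computation is shown below. It assumes the PCA scores for a single spatial frequency and noise level are stored row-wise alongside integer orientations (1 to 180), and it treats orientation as circular so that 1° and 180° count as neighbors; the wraparound is our assumption, since the text does not say how the endpoints were handled. The function name `neighbor_discriminability` is hypothetical.

```python
import numpy as np


def neighbor_discriminability(scores, ori, period=180):
    """Mean and SD of Euclidean distances between each orientation and its two neighbors.

    scores: (n_images, k) PCA scores for ONE spatial frequency and noise level.
    ori:    (n_images,) integer orientations in degrees, 1..period.
    """
    oris = np.unique(ori)
    means, sds = np.empty(len(oris)), np.empty(len(oris))
    for i, o in enumerate(oris):
        here = scores[ori == o]
        dists = []
        for step in (-1, +1):
            nb = (o - 1 + step) % period + 1   # neighbor orientation; wraps 1 <-> 180
            there = scores[ori == nb]
            # All pairings: 4 x 4 = 16 distances per neighbor, 32 in total.
            d = np.linalg.norm(here[:, None, :] - there[None, :, :], axis=-1)
            dists.append(d.ravel())
        dists = np.concatenate(dists)
        means[i], sds[i] = dists.mean(), dists.std()
    return oris, means, sds
```

In practice one would call this once per spatial-frequency and noise-level subset, e.g. `neighbor_discriminability(scores[mask], ori[mask])` with a boolean `mask` selecting that condition, and plot `means` against `oris` to obtain a discriminability curve.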

Results and Discussion
To visualize the organization of orientation representations at each layer of each CNN, we first plotted the first two principal components for each stimulus (example shown in Figure 1). As orientation varied, representations tended to follow either a circular or a linear trajectory. Similar patterns were found in both networks, with some variation across layers, and clustering by spatial frequency was also apparent. More importantly, the spacing between points in these plots reveals that pairs of stimuli close to the cardinal axes tended to be more dissimilar than pairs spaced the same number of degrees apart but located near an oblique.

Figure 1. Orientation representations around cardinals are more spaced out than those around obliques. The first two principal components corresponding to each stimulus are plotted for an example layer, with colors indicating orientation and shapes indicating spatial frequency.
Next, we quantified this differential spacing by calculating the discriminability at each point in orientation space, as described above. We focused first on the noise-free gratings. As shown in Figures 2 and 3, the discriminability between neighboring orientations varied substantially with position in orientation space. Across the middle and late layers of both networks, discriminability was highest at the cardinals and lowest at the obliques. Interestingly, many layers also showed an additional, smaller peak centered on the oblique orientations (45 and 135 degrees). This secondary peak is consistent with human psychophysics studies that have found a small boost in performance for orientations centered directly on an oblique. It also raises the possibility that the ImageNet dataset might not precisely match the orientation distribution of the natural environment, but may instead over-represent obliques as well as cardinals. Measuring the empirical distribution of orientations in this image database will be an important avenue for future work. The discriminability effect was least pronounced at the earliest layers of each network and became markedly more robust at higher layers. This may suggest that the effect was enhanced by feedforward connections between the earliest layers, as has been suggested to occur between macaque V1 and V2 (Shen et al., 2014). Furthermore, the effect was most pronounced at the highest spatial frequencies (Figure 2), consistent with previous findings that orientation anisotropy in the tuning of single neurons is most robust for neurons preferring higher spatial frequencies (Li et al., 2003; Salinas et al., 2017; Shen et al., 2014).
Finally, we evaluated whether the changes in discriminability across orientation space varied as Gaussian noise was added to the stimuli. The overall discriminability between pairs of stimuli increased with noise (seen as an additive shift of the curves in Figure 3), but increasing noise also decreased the magnitude of the cardinal bias. One interpretation is that adding noise masked the oblique effect, similar to the effect of decreasing spatial frequency. Interestingly, recent work has suggested that different types of noise may have opposing effects on cardinal biases in orientation perception (Wei & Stocker, 2015). Future work may compare the effects of different types of noise, such as bandpass-filtered noise, with the effect of the Gaussian noise used here.
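The cardinal bias is described here qualitatively from the discriminability curves. One hypothetical way to summarize its magnitude as a single number, for instance to compare noise levels or layers, is the ratio of mean neighbor discriminability within a window around the cardinals (90° and 180°) to that around the obliques (45° and 135°). This index, the ±10° window, and the names `circ_dist` and `cardinal_oblique_ratio` are our assumptions for illustration; they are not part of the reported analysis.

```python
import numpy as np


def circ_dist(a, b, period=180):
    """Smallest absolute difference between orientations on a circle of `period` degrees."""
    d = np.abs(np.asarray(a) - b) % period
    return np.minimum(d, period - d)


def cardinal_oblique_ratio(oris, disc, window=10):
    """Hypothetical index: mean discriminability within +/- `window` deg of the cardinals
    (90, 180) divided by that within +/- `window` deg of the obliques (45, 135).
    Values > 1 indicate a cardinal bias; 1 indicates an isotropic representation."""
    oris, disc = np.asarray(oris), np.asarray(disc)
    near_card = np.minimum(circ_dist(oris, 90), circ_dist(oris, 180)) <= window
    near_obl = np.minimum(circ_dist(oris, 45), circ_dist(oris, 135)) <= window
    return disc[near_card].mean() / disc[near_obl].mean()
```

Because many layers also show a smaller peak exactly at the obliques, a narrow oblique window would pull this ratio toward 1; the window width is therefore a design choice worth reporting alongside any such index.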

Conclusion
Our results suggest that CNNs, like biological observers, represent stimulus orientation anisotropically, such that cardinal orientations are more discriminable than obliques. Because this bias was not built into the architecture of the networks, our results suggest that cardinal biases can emerge solely as a consequence of experience with natural image statistics. This finding contrasts with results from mice and ferrets, in which cardinal over-representation decreases with experience (Coppola & White, 2004; Hoy & Niell, 2015), but is consistent with findings from primate V2, in which cardinal biases become stronger with age (Shen et al., 2014). More generally, these findings highlight an example of convergence between CNNs and primate brains, and may inform the future development of more biologically plausible computer vision models.