fMRI evidence for areas that process surface gloss in the human visual cortex

Highlights
• Glossiness information is mainly processed along ventral visual pathway.
• The posterior fusiform sulcus (pFs) is especially selective to surface gloss.
• V3B/KO responds to gloss, but differentially from the pFs.


Introduction
Surface gloss provides an important cue to an object's physical material and its microstructure (Nishio, Goda, & Komatsu, 2012). From a perceptual perspective, it has particularly intriguing properties because there are cases where glossiness is specified only by small image areas containing highlights (Beck, 1972). Unlike other aspects of material, a slight change in an object (e.g. minor change of material or smoothness) can cause huge differences in the perceptual impression of gloss (Fleming, 2012). While a number of image cues have been proposed to modulate gloss perception, it is an open challenge to understand how this information is processed to infer surface material.
Recent studies have suggested candidate areas in the macaque brain that may play an important role in processing gloss (Nishio et al., 2012; Okazawa et al., 2012). For instance, specular objects elicited more fMRI activation along the ventral visual pathway, from V1, V2, V3 and V4 to inferior temporal (IT) cortex, compared to matte objects and phase-scrambled images of the objects (Okazawa et al., 2012). Single-unit recordings from the superior temporal sulcus (STS) within IT cortex identified neurons that were selective for gloss, uninfluenced by changes in the 3D structure of the viewed object or by changes to the illumination (Nishio et al., 2012). Further, these gloss-selective responses reflect combinations of reflectance parameters that align with the perceptual dimensions that guide judgments of surface properties (Nishio, Shimokawa, Goda, & Komatsu, 2014). These results from the macaque indicate that specular reflectance properties are likely to be encoded in ventral visual areas.
Despite this recent progress in the macaque model, we still have rather little insight into how the human brain processes gloss. Human brain imaging work examining the (more general) representation of material properties (e.g. wood vs. metal) implicated a role of ventral visual areas, especially in fusiform gyrus (FG), inferior occipital gyrus (IOG) and collateral sulcus (CoS) (Cant, Arnott, & Goodale, 2009; Cant & Goodale, 2007; Hiramatsu, Goda, & Komatsu, 2011). This work employed stimulus changes in multiple image dimensions (e.g. colour, texture and gloss), meaning that activity related to gloss per se could not be determined. This is likely to be an important distinction, as tests of a neuropsychological patient who had deficits in colour and texture discrimination showed that the patient was unimpaired on gloss judgments (Kentridge, Thomson, & Heywood, 2012). This suggests that the cortical processing of gloss is (at least partially) independent from the processing of other material properties. Recently, Wada and colleagues (Wada, Sakano, & Ando, 2014) reported that fMRI activity related to surface gloss is evident in V2, V3, V4, VO-1, VO-2, CoS, LO-1 and V3A/B. In particular, they contrasted glossy and matte objects under bright and dim illumination to exclude the confound of luminance. Here we use a different approach, perturbing global image arrangement while preserving local image features, to target mechanisms of the global synthesis of image cues when judging gloss. Our approach also differs from that of Okazawa et al. (2012), who contrasted glossy objects with phase-scrambled versions of these objects. We presented observers with stimuli from four experimental conditions: Glossy, Scrambled Glossy, Matte and Scrambled Matte. Thereby we sought to discriminate Glossy vs. Matte renderings of objects while dissociating the role played by local image features.

Participants
Fifteen participants who had normal or corrected-to-normal vision were recruited for the experiment. Two were authors (H.-C. S. and H. B.) and the remainder were naïve participants. All were screened for normal stereoacuity and MRI safety before being invited to participate. All participants had previously participated in other fMRI studies in which fMRI localiser data (see 'ROI definition') and T1-weighted anatomical scans (see 'MRI data acquisition') were acquired. The age range was 19-35 years old, and 13 of the 15 were male. All participants gave written informed consent before taking part in the experiment. The study was approved by the STEM Ethical Review Committee of the University of Birmingham. The work was carried out in accordance with The Code of Ethics of the World Medical Association (Declaration of Helsinki). After completing the experiment, non-lab member participants received monetary compensation.

Stimuli
The stimuli comprised 32 2-D renderings of 3-D objects generated in Blender 2.67a (The Blender project: http://www.blender.org/). The objects were spheres and tori whose surfaces were perturbed by random radial distortions to produce slightly irregular shapes. The diameter of the stimuli was 12° on average and they were presented on a mid-gray background. We illuminated the objects using a square light source located in front of and above the objects. We chose this simple light source to be able to increase the influence of our scrambling manipulation. We created versions of the stimuli for each object that made up the four conditions of the experiment: Glossy, Scrambled Glossy, Matte and Scrambled Matte (Fig. 1). In the Glossy condition, objects were rendered using a mixed shader with 90% diffuse and 10% glossy components. We rendered objects in the Matte condition by setting the reflectance function to Lambertian (100% diffuse component). We controlled the luminance of the stimuli so that the mean luminance of the stimuli was 60.54 cd/m² and the absolute maximum was 103.92 cd/m², which corresponded to 57.55% and 98.78% of the display maximum luminance, respectively. All the objects were rendered without a background; we then set the background colour to gray before further manipulations, as described below.
To produce spatial scrambling, we superimposed a 22 × 22, 1-pixel black grid over the images and then randomly relocated squares (0.55° per side) within the grid (Kourtzi & Kanwisher, 2000; Malach et al., 1995). This approach differs from phase scrambling (Okazawa et al., 2012) as blur, contrast, and luminance are only marginally affected. Moreover, the mosaic spatial scrambling approach we used interrupts object shape, shading, and specular highlights while all the local information (e.g., luminance, contrast, luminance histogram skew) is unchanged. Previous work indicates that highlight congruence with surface geometry and shading is crucial for perceived glossiness (Anderson & Kim, 2009; Kim et al., 2011; Marlow et al., 2011). Thus our stimuli strongly attenuate the impression of gloss by disrupting the relationship between highlights and global object structure.
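The scrambling procedure described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' stimulus code: the function names `draw_grid` and `mosaic_scramble` are hypothetical, the image is assumed to be a 2-D luminance array whose sides are divisible by 22, and the grid is assumed to be drawn along the top and left edge of every cell.

```python
import numpy as np

def draw_grid(image, n_cells=22, value=0.0):
    """Overlay 1-pixel lines (assumed black = 0.0) on the top/left edge of each cell."""
    out = image.copy()
    h, w = out.shape
    out[::h // n_cells, :] = value
    out[:, ::w // n_cells] = value
    return out

def mosaic_scramble(image, n_cells=22, rng=None):
    """Randomly relocate the cells of an n_cells x n_cells grid laid over a 2-D image."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    if h % n_cells or w % n_cells:
        raise ValueError("image sides must be divisible by n_cells")
    ch, cw = h // n_cells, w // n_cells
    # Cut the image into a stack of cells with shape (n_cells**2, ch, cw).
    cells = (image.reshape(n_cells, ch, n_cells, cw)
                  .swapaxes(1, 2)
                  .reshape(n_cells * n_cells, ch, cw))
    shuffled = cells[rng.permutation(len(cells))]
    # Reassemble the shuffled cells into a full image.
    return (shuffled.reshape(n_cells, n_cells, ch, cw)
                    .swapaxes(1, 2)
                    .reshape(h, w))

# Toy example: a random 88 x 88 "image" (4 x 4 pixel cells).
rng = np.random.default_rng(0)
intact = draw_grid(rng.random((88, 88)))
scrambled = mosaic_scramble(intact, rng=rng)
```

Because cells are only moved, never altered, the scrambled image contains exactly the same pixel values as the intact one, and a grid drawn before scrambling stays in the same place afterwards.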
[Fig. 1. Panel labels: Glossy, Scrambled Glossy, Matte, Scrambled Matte.]

Note that the superimposed grid was presented for both intact and scrambled versions of the stimuli. This greatly attenuates the amount of additional edge information that results from the spatial scrambling manipulation. Formally, we assessed differences in image structure by computing possible image cues that might drive the fMRI response. In particular, we found that the image statistics of mean luminance, luminance root-mean-square contrast, and luminance histogram skew were matched across the four conditions (Fig. 2), indicating that there was more variation within the same class of stimuli than there was between classes. This is trivial for the scrambled versions of the stimuli (they must have the same values of skew, contrast and luminance, by definition); however, it is important that matte and glossy stimuli were well matched. In such a case, although the addition of a grid affects all these measures, it did not create any consistent difference across the four conditions, thus the interpretation of the results should not be affected. Furthermore, the power spectra of the stimuli in the different conditions (Fig. 2D) indicate that the grid is effective in equalizing the spatial frequency content of the images, particularly when contrasted with scrambled images without a superimposed grid. The grid adds high frequency components to intact images creating a pattern that is very similar to the one due to the scrambling procedure. In this way, frequency spectra are made more similar across conditions.
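The three luminance statistics compared across conditions can be computed per image as in the sketch below. It follows the definitions given in the Fig. 2 caption (contrast as standard deviation divided by mean, skew as the third standardized moment); the function name `image_statistics` and the use of the population standard deviation are assumptions made for illustration.

```python
import numpy as np

def image_statistics(luminance):
    """Return (mean luminance, RMS contrast, histogram skew) for one image."""
    lum = np.asarray(luminance, dtype=float).ravel()
    mean = lum.mean()
    sd = lum.std()                              # population SD (assumption)
    rms_contrast = sd / mean                    # SD divided by mean
    skew = np.mean(((lum - mean) / sd) ** 3)    # third standardized moment
    return mean, rms_contrast, skew

# Toy pixel values (hypothetical, not stimulus data).
mean_sym, contrast_sym, skew_sym = image_statistics([1.0, 2.0, 3.0])
mean_pos, contrast_pos, skew_pos = image_statistics([0.0, 0.0, 0.0, 4.0])
```

All three measures depend only on the set of pixel values, not on their positions, which is why a scrambled image necessarily shares them with its intact counterpart.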

Apparatus
Stimulus presentation was controlled using MATLAB (Mathworks Inc.) and Psychtoolbox (Brainard, 1997; Pelli, 1997). The stimuli were back projected from a JVC DILA SX21 projector onto a translucent screen inside the bore of the magnet. Participants viewed the stimuli binocularly via a mirror fixed on the head coil with a viewing distance of 64 cm. Luminance outputs were linearized and equated for the RGB channels separately with colorimeter measurements. A five-button optic fiber button box was provided to allow responses during the 1-back task.
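As a rough guide to physical scale, the visual angles quoted for the stimuli can be converted to on-screen sizes at the 64 cm viewing distance. The sketch below uses the standard relation size = 2·d·tan(θ/2); the function name is hypothetical and the resulting sizes are derived here, not reported by the authors.

```python
import math

def visual_angle_to_size(angle_deg, distance_cm):
    """On-screen extent (cm) subtending angle_deg at distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

VIEWING_DISTANCE_CM = 64.0
stimulus_cm = visual_angle_to_size(12.0, VIEWING_DISTANCE_CM)   # average stimulus diameter
square_cm = visual_angle_to_size(0.55, VIEWING_DISTANCE_CM)     # one grid square
grid_extent_deg = 22 * 0.55                                     # 22 squares per side
```

Under these values the stimuli span roughly 13.5 cm on the screen, and the 22-square grid covers about 12.1°, close to the average stimulus diameter of 12°.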

MRI data acquisition
A 3-Tesla Philips scanner and a 32-channel phased-array head coil were used to obtain all MRI images at the Birmingham University Imaging Centre (BUIC). Functional whole brain scans with an echo-planar imaging (EPI) sequence (32 slices, TR 2000 ms, TE 35 ms, voxel size 2.5 × 2.5 × 3 mm, flip angle 80°, matrix size 96 × 94) were obtained for each participant. The EPI images were acquired in an ascending interleaved order for all participants. T1-weighted high-resolution anatomical scans (sagittal 175 slices, TR 8.4 ms, TE 3.8 ms, flip angle 8°, voxel size 1 mm³) were obtained from previous studies.

Fig. 2. Image statistics of (A) pixelwise luminance, (B) contrast, (C) histogram skew, and (D) difference in power spectra across the 32 images with and without the superimposed grid. Luminance was calculated by averaging the luminance of all pixels in each image and then averaging across images. Contrast was calculated as the standard deviation of pixelwise luminance divided by its mean for each image, averaged across images. Skew was calculated as the third standardized moment of the luminance histogram of each image, averaged across images (Motoyoshi et al., 2007). The absolute difference in power spectra was calculated for each image pair and then averaged across images.

Design and procedure
A block design was used. Each participant took part in 8-10 runs, each lasting 368 s, in a 1.5 h session. Each run started with four dummy scans to prevent startup magnetization transients and consisted of 16 experimental blocks each lasting 16 s. There were 4 block types (i.e., one for each condition), repeated four times in a run. During each block, eight objects were presented twice in a pseudo-random order. Stimuli were presented for 500 ms with a 500 ms interstimulus interval (ISI). Participants were instructed to maintain fixation and perform a 1-back matching task, whereby they pressed a button if the same image was presented twice in a row. They were able to perform this task very well (mean d′ > 3). Five 16 s fixation blocks were interposed after the third, fifth, eighth, eleventh and thirteenth stimulus blocks to measure fMRI signal baseline. In addition, 16 s fixation blocks were interposed at the beginning and at the end of the scan, making a total of seven fixation blocks during one experimental run. An illustration of the scan procedure is provided in Fig. 3.
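The run structure described above can be laid out programmatically to check that the block counts and timings are mutually consistent. The sketch below is illustrative only: names such as `build_run` are hypothetical, the assignment of conditions to stimulus blocks is omitted, and the four dummy scans are not counted.

```python
BLOCK_S = 16.0          # duration of every block (s)
TR_S = 2.0              # repetition time (s)
STIM_S, ISI_S = 0.5, 0.5
N_STIM_BLOCKS = 16
FIXATION_AFTER = {3, 5, 8, 11, 13}   # fixation follows these stimulus blocks

def build_run():
    """Return the ordered list of block types for one run."""
    sequence = ["fixation"]                      # fixation at the start
    for i in range(1, N_STIM_BLOCKS + 1):
        sequence.append("stimulus")
        if i in FIXATION_AFTER:
            sequence.append("fixation")
    sequence.append("fixation")                  # fixation at the end
    return sequence

run = build_run()
run_seconds = len(run) * BLOCK_S
volumes_per_run = int(run_seconds / TR_S)
trials_per_block = int(BLOCK_S / (STIM_S + ISI_S))
```

With these settings the run contains 23 blocks (7 fixation, 16 stimulus) lasting 368 s, and each stimulus block holds 16 trials, matching eight objects shown twice.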

Data analysis
Functional MRI data processing
BrainVoyager QX version 2.6 (Brain Innovation, Maastricht, The Netherlands) was used for MRI data processing. Each participant's left/right cortical surfaces were reconstructed by segmenting gray and white matter, reconstructing the surfaces, inflating, cutting and then unfolding. All functional images were pre-processed with slice scan timing correction, 3D head motion correction, high-pass filtering (2 cycles per run cut-off) and linear trend removal. Functional images were co-registered with anatomical images and then transformed to Talairach coordinate space and aligned with each other. We computed the global signal variance of the blood oxygenation level dependent (BOLD) signal for each run using the whole-brain average of activity across volumes. If this exceeded 0.16% the scan run was excluded from further analysis to avoid the influence of scanner drifts, physiological noise or other artifacts (Junghöfer, Schupp, Stark, & Vaitl, 2005). On this basis, 17/146 runs across 15 participants were excluded from further analysis. A 3D Gaussian spatial smoothing kernel with 5 mm full-width-half-maximum (FWHM) was applied before analysing the data using a group-level random effects (RFX) general linear model (GLM).
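Two of the quantities in this paragraph can be made concrete with a short calculation: the standard deviation of a Gaussian kernel with a 5 mm FWHM, and the number of runs retained after exclusion. The sketch uses the standard Gaussian relation FWHM = 2·sqrt(2·ln 2)·σ; the function name is hypothetical and nothing here reflects BrainVoyager's internal implementation.

```python
import math

def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given full width at half maximum."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

sigma_mm = fwhm_to_sigma(5.0)          # kernel SD in mm
sigma_in_plane_vox = sigma_mm / 2.5    # in units of the 2.5 mm in-plane voxel size
runs_retained = 146 - 17               # runs left after exclusion
```

The kernel therefore has a standard deviation of about 2.1 mm, a little under one in-plane voxel, and 129 runs enter the group analysis.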

Additional fMRI analysis
Fig. 3. The fMRI procedure for one scan run. In each run there were 23 blocks (16 s each), including 7 fixation blocks and 16 experimental blocks. During each experimental block, stimuli were presented for 500 ms with a 500 ms interstimulus interval (ISI). Participants were instructed to perform a 1-back matching task.

We computed percent signal change (PSC) by subtracting the BOLD signal baseline (the average signal in fixation blocks) from the signal in each experimental condition and then dividing by the baseline. In addition, voxels used in the PSC analysis were masked with the t-value maps obtained by contrasting all stimulus conditions vs. fixation blocks for each individual participant. PSCs were examined within independently identified ROIs under each experimental condition. We then computed the difference in PSC between intact and scrambled versions of Glossy and Matte objects, which we term DPSC.
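The PSC and DPSC definitions above amount to the following arithmetic. This is a sketch with hypothetical function names and made-up signal values chosen only to show the calculation; it is not the authors' analysis code.

```python
def percent_signal_change(condition_signal, baseline_signal):
    """PSC: (condition - baseline) / baseline, expressed in percent."""
    return 100.0 * (condition_signal - baseline_signal) / baseline_signal

def delta_psc(psc_intact, psc_scrambled):
    """DPSC: PSC for intact objects minus PSC for their scrambled versions."""
    return psc_intact - psc_scrambled

# Hypothetical mean ROI signals (arbitrary scanner units), not real data.
baseline = 1000.0
psc_glossy = percent_signal_change(1012.0, baseline)
psc_scrambled_glossy = percent_signal_change(1008.0, baseline)
dpsc_glossy = delta_psc(psc_glossy, psc_scrambled_glossy)
```

In this toy case the Glossy and Scrambled Glossy conditions give PSCs of 1.2% and 0.8%, so DPSC for Glossy is 0.4 percentage points. The same subtraction applied to Matte and Scrambled Matte gives the value that DPSC for Glossy is compared against.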
Finally, we used random effects Granger causality mapping (RFX GCM) to probe the information flow between ROIs. Granger causality uses temporal precedence to identify the direction of influence from a reference region to all other brain voxels (Roebroeck, Formisano, & Goebel, 2005). The GCMs for each participant were calculated first and were then combined with a simple t-test (t > 0) and cluster-size thresholding (25 mm²).
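The idea of temporal precedence behind Granger causality can be illustrated with a pair of time courses. The sketch below is a simplified bivariate, lag-1 version using ordinary least squares and a log variance ratio; it is an assumption-laden teaching example, not the RFX GCM procedure used in BrainVoyager, and the function names and simulated signals are hypothetical.

```python
import numpy as np

def residual_variance(target, predictors):
    """Variance of OLS residuals when regressing target on predictors plus an intercept."""
    X = np.column_stack([np.ones(len(target))] + predictors)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return np.var(target - X @ beta)

def granger_influence(x, y, lag=1):
    """Influence x -> y as ln(var_restricted / var_full).

    The restricted model predicts y from its own past; the full model adds
    the past of x. Larger values mean the past of x improves the prediction.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    target = y[lag:]
    y_past = [y[lag - k: len(y) - k] for k in range(1, lag + 1)]
    x_past = [x[lag - k: len(x) - k] for k in range(1, lag + 1)]
    restricted = residual_variance(target, y_past)
    full = residual_variance(target, y_past + x_past)
    return np.log(restricted / full)

# Simulated example: y follows x with a one-sample delay.
rng = np.random.default_rng(1)
n = 500
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.standard_normal()

influence_x_to_y = granger_influence(x, y)
influence_y_to_x = granger_influence(y, x)
```

In the simulation the influence from x to y is large while the reverse influence is close to zero. Mapping approaches compute such directed measures between a reference region and every other voxel.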

Results
To identify brain areas that preferentially responded to glossy objects, we used a conjunction analysis to find voxels that were activated more strongly in the Glossy condition than in any of the other three conditions across the 15 participants. In particular, Fig. 4 shows the results of a random-effects GLM with statistically significant voxels (p < .05) and cluster-size thresholding (25 mm²). The orange areas mark significantly higher activation in the Glossy condition under each of the three contrasts: Glossy vs. Scrambled Glossy, Glossy vs. Matte, and Glossy vs. Scrambled Matte. In general, these areas were distributed along the ventral visual pathway in both hemispheres, including the ventral occipitotemporal cortex. In addition, we found responses in the area around V3B/KO, which is traditionally thought to belong to the dorsal visual stream.
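One common way to implement a conjunction of several contrasts is the minimum-statistic approach, in which a voxel survives only if every contrast exceeds the threshold. The sketch below assumes that approach and a two-tailed critical value for 14 degrees of freedom; the source does not specify either detail, the function name is hypothetical, the t-values are invented, and cluster-size thresholding is omitted.

```python
import numpy as np

def conjunction_map(t_maps, t_threshold):
    """Minimum-statistic conjunction: keep the smallest t where all contrasts pass."""
    min_t = np.minimum.reduce([np.asarray(t, dtype=float) for t in t_maps])
    return np.where(min_t > t_threshold, min_t, 0.0)

# Invented t-values for four voxels under the three contrasts.
glossy_vs_scrambled_glossy = [3.0, 2.5, 1.0, 4.0]
glossy_vs_matte = [2.8, 1.9, 3.5, 2.2]
glossy_vs_scrambled_matte = [3.2, 2.6, 3.0, 5.0]

T_CRIT = 2.145   # approx. two-tailed p < .05 for df = 14 (assumed thresholding)
conjunction = conjunction_map(
    [glossy_vs_scrambled_glossy, glossy_vs_matte, glossy_vs_scrambled_matte],
    T_CRIT,
)
```

Only the first and last voxels survive here, because the second fails the Glossy vs. Matte contrast and the third fails the Glossy vs. Scrambled Glossy contrast.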
Fig. 4. The result of conjunction analysis across the 15 participants, (Glossy − Scrambled Glossy) ∩ (Glossy − Matte) ∩ (Glossy − Scrambled Matte), for the right and left hemispheres. The Glossy condition was compared with the other three conditions. Significant conjunctions are presented on representative flat maps. Sulci are shown in dark gray and gyri are in light gray. The colour scale indicates t-values. The significance level was p < .05 with cluster-size thresholding 25 mm². The orange areas represent activation in Glossy condition that is significantly higher than any of the other three conditions, respectively.

We then compared DPSC for Glossy (light bars) against Matte (dark bars) conditions (Fig. 5) in all the ROIs. A two-way repeated measures ANOVA showed a significant difference between Glossy and Matte conditions (F(1,14) = 10.7, p = .006), an effect of ROI (F(10,140) = 102.5, p < .001), and a significant interaction (F(10,140) = 12.9, p < .001). Thereafter we tested for the differences between conditions in each ROI. Asterisks in Fig. 5 represent significant differences in activation between the two conditions (Tukey's HSD post-hoc test at p < 0.01). We found that responses were significantly higher for objects with glossy than with matte surfaces in areas V3B/KO and pFs. Note that to compute DPSC we subtracted the activation in scrambled versions of the stimuli, so the glossy selectivity observed in V3B/KO and pFs is unlikely to be explained by low-level differences in the images of the objects. Moreover, we found no significant difference in the percent signal change (PSC) between Scrambled Glossy and Scrambled Matte conditions (see Supplementary Fig. 1), suggesting that the significant differences in DPSC between glossy and matte stimuli were mainly due to the PSC difference between Glossy and Matte conditions rather than between their scrambled counterparts.
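The degrees of freedom reported for the ANOVA can be checked against the design. The sketch below uses the textbook formulas for a two-way fully within-subject design without sphericity correction; the function name is hypothetical, and the figure of 11 ROIs is inferred here from the reported degrees of freedom, not stated in the text.

```python
def rm_anova_dfs(n_subjects, levels_a, levels_b):
    """(effect df, error df) for each term of a two-way repeated measures ANOVA."""
    a, b, s = levels_a - 1, levels_b - 1, n_subjects - 1
    return {
        "A": (a, a * s),
        "B": (b, b * s),
        "AxB": (a * b, a * b * s),
    }

# 15 participants, 2 surface conditions (Glossy, Matte), 11 ROIs (inferred).
dfs = rm_anova_dfs(n_subjects=15, levels_a=2, levels_b=11)
```

These values reproduce the reported F(1,14) for condition and F(10,140) for both the ROI effect and the interaction.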
DPSC in early visual areas (V1, V2, V3v, V3d, V4) was also significant; however, response modulation in these areas was higher for scrambled stimuli than for intact ones. Since the PSCs in the Scrambled Glossy and Scrambled Matte conditions were similar (see Supplementary Fig. 1), we can conclude that the difference is mainly due to the intact conditions. It is possible that some neurons in these areas selectively respond to glossy objects (Okazawa et al., 2012 and Wada et al., 2014 also found V1-V4 to be important in gloss processing); however, unlike V3B/KO and pFs, these areas respond predominantly to scrambled images rather than intact ones. This suggests that these areas primarily deal with low-level image features and do not account for overall glossy appearance. As reviewed above, responses in STS were very low and not significantly different across conditions.
The preceding analysis indicates two brain areas (pFs and V3B/KO) that appear to be important in processing information about gloss. To quantify how these areas communicate with other parts of the visual cortex, we used a random effects Granger causality mapping analysis (RFX GCM) to assess how these areas influence and depend on activity elsewhere. Fig. 6 shows the results using either pFs (Fig. 6A) or V3B/KO (Fig. 6B) as the reference region, respectively. Blue areas indicate brain areas that are significantly influenced by the reference region, while the green colour map identifies locations that have a significant influence on the reference region. We found that activity in pFs had a strong influence on both dorsal and ventral areas. This may reveal that gloss-related activity is used for the processes of object processing (in ventral cortex) in addition to affecting depth estimates (estimated in dorsal areas). By contrast, the estimated connectivity in V3B/KO was quite different. V3B/KO mainly received information from ventral areas rather than having influence on them, perhaps indicating that gloss information in V3B/KO is inherited from a primary locus in ventral areas. In addition, we observed that V3B/KO also received some information from an area near the STS. Although our other analyses did not suggest the involvement of the STS, this analysis appears consistent with the role of the STS in gloss indicated by electrophysiological recordings (Nishio et al., 2012).

Fig. 6. RFX GCMs with (A) pFs and (B) V3B/KO as reference regions (yellow areas). Blue areas received significant influence from the reference region and green areas sent significant influence to the reference region (p < .05 for t-test on GCMs). Note that since the group GCMs were averaged across participants and then presented on representative flat maps, individual ROI boundaries may not perfectly fit the group level.
We should note that we could not determine whether the information flow captured by the Granger Causality Mapping is specific to gloss signals. Nevertheless, as the preceding conjunction analysis and PSC results showed the importance of pFs and V3B/KO in processing gloss, it is quite possible that the GCMs show different information flows between pFs and V3B/KO for gloss processing.

General discussion
The aim of this study was to localize the brain areas preferentially responding to glossy objects in the human brain. We did this by rendering glossy and matte versions of three-dimensional objects, and using scrambled images to control for low-level image cues. Our results point to a role for the posterior fusiform sulcus (pFs) and area V3B/KO in the processing of surface gloss: we found stronger responses to glossy objects than their matte counterparts, and this could not be explained by low-level stimulus differences. By assessing connectivity between brain areas while viewing glossy and matte stimuli, we observed that pFs exerted influence on ventral and dorsal brain areas, while V3B/KO was influenced by activity in midlevel ventral areas, which may indicate a difference between areas in their use of information from gloss as a cue to material (pFs) vs. object shape (V3B/KO).
Recent imaging studies in macaques suggest that glossy objects elicit more activation along the ventral visual pathway from V1 to IT cortex (Okazawa et al., 2012). We also found higher activation in the ventral stream, in particular in the pFs. Our results are reassuringly consistent with a very recent fMRI study that used a different image control approach (Wada et al., 2014). In particular, that study indicated the role of ventral areas and the combined areas V3A/B (which is very near to the V3B/KO that we identify). Since the ROIs in our study were mapped using independent localisers before the experiment whereas Wada et al. considered only one area (V3A/B), our results pinpoint gloss-related activity more precisely, suggesting that the more lateral V3B/KO region is more important in gloss processing than V3A. The involvement of early visual areas (V1 to V4) is not clear. Although DPSC in earlier areas is significant due to higher activation for Glossy than for Matte objects (see Okazawa et al., 2012; Wada et al., 2014), response modulation in these early areas, unlike in V3B/KO and pFs, is higher for scrambled stimuli. This suggests that these areas primarily deal with low-level features such as the area of the visual field occupied by the stimulus, discontinuous borders and high spatial frequency information, which is more abundant in scrambled than in intact conditions. Note that some low-level features might be affected by our scrambling technique. For example, there are more highlight boundaries (line segments and edges) on Glossy objects and scrambling decreases the number of these segments and edges. Thus, the PSC difference in V1 to V4 might be caused by such low-level image properties rather than glossiness.
Previous human fMRI studies found that fMRI responses in the fusiform gyrus (FG) and collateral sulcus (CoS) were modulated by the perception of different object materials (Cant & Goodale, 2007; Cant et al., 2009; Cavina-Pratesi, Kentridge, Heywood, & Milner, 2010a,b; Hiramatsu et al., 2011). This work employed a wide variety of object materials (e.g., metal, wood, stone, glass), thus creating differences in surface gloss as well as differences in texture and colour. Here we focused on gloss, manipulating surface reflectance of untextured and homogeneously coloured objects. Despite this important difference between the studies, the surface-property-specific region they found (which they denoted as CoS) is located very close to the area we denote as pFs based on a comparison of Talairach coordinates. Consistent with this, other work showed that a patient with a colour and texture discrimination deficit could judge glossiness correctly, indicating that glossiness information does not exclusively depend on colour or texture processing (Kentridge et al., 2012). Taken together, this evidence suggests a dissociation between the areas underlying material/texture processing and those underlying gloss processing. Nevertheless, the proximity of these areas may suggest a close interrelation and connection between material and gloss processing centres.

The role of V3B/KO in gloss processing
An important finding here is that the brain area V3B/KO seems to be involved in gloss processing. V3B/KO, located in the dorsal visual stream, is well known to selectively respond to kinetic boundaries (Van Oostende, Sunaert, Van Hecke, Marchal, & Orban, 1997). It was also found to be involved in integrating different depth cues (Ban et al., 2012; Dövencioglu et al., 2013; Murphy et al., 2013; Tyler, Likova, Kontsevich, & Wade, 2006). Our study, together with the recent results by Wada et al. (2014), indicates that the activity in V3B/KO is modulated by surface gloss, although previous work has not highlighted the involvement of this area in processing material information. One possibility is that V3B/KO does not actually process gloss information per se. The causality mapping suggests quite a different pattern of causal relationships in V3B/KO than in pFs, with V3B/KO primarily being influenced by signals from elsewhere, while pFs influences responses in other areas. It is possible that the effect we found in V3B/KO was due to the effect of adding internal boundaries to the shapes corresponding to the locations with highlights. Alternatively, because specular highlights are known to influence the perception of 3D shape (Blake & Bülthoff, 1990; Fleming, Torralba, & Adelson, 2004; Muryy, Welchman, Blake, & Fleming, 2013), it is possible that differences in activity in V3B/KO for glossy vs. matte objects relate to differences in the estimated 3D shape. This appears consistent with the recent work that indicates that V3B/KO integrates different cues to 3D structure (Ban et al., 2012; Dövencioglu et al., 2013; Murphy et al., 2013).

Human STS in gloss processing
The superior temporal sulcus (STS) of the macaque was found to show specific responses to glossy objects based on both fMRI (Okazawa et al., 2012) and single-unit recordings (Nishio et al., 2012). However, in our study we did not find strong evidence for the involvement of human STS in glossiness processing: changes in signals in this area were low, although the causality mapping did indicate some modulation of activity near the STS. It is possible that there are functional differences between the human and monkey brains. For example, studies found functional differences between the two species in V3A and the intraparietal cortex for three-dimensional structure-from-motion (3D-SFM) processing (Orban, 2011; Vanduffel et al., 2002). It is also possible that the reasonably large voxel sizes used in our study limited our ability to detect responses to glossy stimuli in the human STS, and/or that the underlying population is spatially limited such that it did not survive the cluster threshold we applied.

The advantages of using mosaic spatial scrambling
In our study we chose to generate control stimuli using a scrambling technique applied to a visible grid. The presence of a grid reduces changes in low-level image properties due to scrambling (e.g., luminance, contrast, luminance histogram skew) while disrupting global properties of the shapes that are known to modulate the impression of gloss (Anderson & Kim, 2009; Kim et al., 2011; Marlow et al., 2011). The use of a superimposed grid over the stimuli was conceived to ensure that the amount of edge information in the stimuli was broadly similar between intact and scrambled conditions (Fig. 2). This expedient overcomes the large difference in spatial frequency content that would be produced by scrambling alone (Fig. 2D). Although there are slight differences in spatial frequency between intact objects and their scrambled counterparts (see Fig. 2D), scrambling had similar effects for Glossy and Matte conditions. Therefore differences in the spatial frequency spectra could not be the only cause for the pattern of results found. Furthermore, image statistics (luminance, contrast, skew and spatial frequency) did not differ substantially between Glossy and Matte conditions, ensuring that the results are not due to these properties either. One could also argue that images with an overlaid grid could be amodally completed behind the occlusions. Such completion would be present for intact objects in both Glossy and Matte conditions. Therefore the completion-related activity would not bias the results. Similarly, even though scrambling clearly makes the stimuli occupy a larger portion of the visual field (Fig. 1), our analysis procedures make it unlikely that such differences contributed to the findings we report in the study. This is because our conjunction analyses were not based only on [Glossy vs. Glossy Scrambled] and on [Matte vs. Matte Scrambled] comparisons, but also on the contrast [Glossy vs. Matte].
Overall, the results we presented cannot be explained by local edges, contrast, or configuration changes as these factors were the same for Glossy and Matte conditions.
We should also note that during our experiments our participants were not making active perceptual judgments of gloss. It is possible that activations would have been stronger had we asked for concurrent perceptual judgments. However, this would likely have introduced attention-based differences between the intact and scrambled conditions, which we deliberately sought to avoid using a task at the fixation point.

Conclusion
This study reveals that V3B/KO and pFs are selectively active when processing images of glossy objects. This finding is consistent with other recent human fMRI studies and it suggests close but dissociated networks for gloss and material processing in the ventral stream. Our results point to a different role of V3B/KO and pFs, suggesting that V3B/KO may be tuned to processing highlight boundaries or 3D shape properties rather than to glossiness processing. Overall, our study highlights a small network in the fusiform sulcus that may be important in supporting our perception of surface gloss.