Training Facilitates Object Recognition in Cubist Paintings

To the naïve observer, cubist paintings contain geometrical forms in which familiar objects are hardly recognizable, even in the presence of a meaningful title. We used fMRI to test whether a short training session about Cubism would facilitate object recognition in paintings by Picasso, Braque and Gris. Subjects, who had no formal art education, were presented with titled or untitled cubist paintings and scrambled images, and performed object recognition tasks. Relative to the control group, trained subjects recognized more objects in the paintings, their response latencies were significantly shorter, and they showed enhanced activation in the parahippocampal cortex, with a parametric increase in the amplitude of the fMRI signal as a function of the number of recognized objects. Moreover, trained subjects were slower to report not recognizing any familiar objects in the paintings and these longer response latencies were correlated with activation in a fronto-parietal network. These findings suggest that trained subjects adopted a visual search strategy and used contextual associations to perform the tasks. Our study supports the proactive brain framework, according to which the brain uses associations to generate predictions.


INTRODUCTION
Object recognition is a highly developed visual skill in primates. Behavioral and electrophysiological studies in humans and monkeys have suggested that object recognition is a rapid process that can be achieved within a few hundred milliseconds (Rousselet et al., 2002). Moreover, it has been shown that identification of objects within natural scenes is facilitated when the context is meaningful (Biederman, 1972; Bar, 2004). The process of parsing the world into meaningful objects is mediated by activation in ventral occipitotemporal cortex, the so-called "what" pathway (Ungerleider and Mishkin, 1982; Goodale and Milner, 1992; Haxby et al., 1994). Recent functional brain imaging studies in humans have shown that objects elicit neural responses in a distributed cortical network that encompasses a wide expanse of extrastriate cortex (Ishai et al., 1999, 2000a; Haxby et al., 2001), where various object categories such as faces, animals, houses, tools, and body parts elicit distinct patterns of activation (Kanwisher et al., 1997; Aguirre et al., 1998; Epstein and Kanwisher, 1998; Ishai et al., 2000a; Downing et al., 2001; Yago and Ishai, 2006). Furthermore, ambiguous figures (Kleinschmidt et al., 1998), illusory contours (Stanley and Rubin, 2003), binocular rivalry (Tong et al., 1998), and visual imagery (Ishai et al., 2000b, 2002; Mechelli et al., 2004) evoke activation in these object-responsive regions, suggesting that the visual system imposes top-down interpretations on ambiguous bottom-up retinal input.
Art compositions comprise a special class of visual stimuli with which one can investigate the mechanisms of various cognitive processes (e.g., Chatterjee, 2004). Specifically, abstract and indeterminate paintings, which resist identification, can be used to investigate the neural correlates of object recognition (Ishai et al., 2007). Indeterminate artworks (Pepperell, 2006) present the viewer with an apparently meaningful yet persistently meaningless scene, or a

Institute of Neuroradiology, University of Zurich, Zurich, Switzerland
knowledge on bottom-up processing. The aim of the current study was to test the extent to which a short training session about Cubism would facilitate object recognition in paintings by Picasso, Braque and Gris. We assumed that providing naïve subjects with some information about Cubism would aid task performance and hypothesized that subjects who received training would recognize familiar objects faster than control subjects, and would exhibit stronger activation in object-responsive and attention-related regions.

SUBJECTS
Twenty-four healthy, right-handed subjects (13 males, 11 females, mean age 24 years) with normal vision participated in the study. All subjects gave informed written consent for the procedure in accordance with protocols approved by the University Hospital of Zurich. The subjects, students from the University of Zurich, had no formal art education and reported visiting art museums once a year or less. Post-scan questionnaires revealed that all subjects were unfamiliar with the paintings and had not seen them prior to the experiment.

STIMULI AND TASKS
Stimuli were displayed using Presentation (www.neurobs.com, version 12.2) and were projected with a magnetically shielded LCD video projector onto a translucent screen placed at the feet of the subject. Stimuli consisted of 42 color and 42 monochrome Cubist paintings by Picasso, Braque and Gris. In half the trials, scrambled images, which were created by phase scrambling luminance and color information from these paintings, were used as a visual baseline. We used an event-related design: in each trial, a meaningful title (e.g., "Vase with flowers") or the word "Untitled" was presented for 1.5 s, followed by a painting or a scrambled image, which was presented for 3.5 s. While the picture was on the screen, subjects had to answer the question "Do you recognize any familiar objects?" by pressing one of two buttons (Yes/No). A screen then appeared for 3 s with the question "How many objects did you recognize?" and subjects had to press one of four buttons to indicate "0", "1", "2" or "3 or more" objects. A blank screen (inter-stimulus interval) was then presented for 8 s; thus, the duration of each trial was 16 s. Trial types (painting/scrambled; title/untitled) were randomized and for each subject 7 time series of 16 trials each were collected. Thirty minutes before scanning, half the subjects (six males, six females) received a short training session, during which they were presented with information about Cubism, viewed examples of Cubist paintings, and practiced recognizing familiar objects in these paintings.
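The trial structure described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' presentation script: the event labels and the helper name trial_onsets are hypothetical, and only the durations and counts are taken from the text.

```python
# Event durations in seconds, as stated in the text; the labels are illustrative.
TRIAL_EVENTS = [
    ("title", 1.5),
    ("picture and Yes/No response", 3.5),
    ("number-of-objects question", 3.0),
    ("blank screen", 8.0),
]
TRIALS_PER_RUN = 16      # trials per time series
RUNS_PER_SUBJECT = 7     # time series per subject


def trial_onsets(events):
    """Return ([(name, onset_s), ...], total_duration_s) for one trial."""
    onsets = []
    elapsed = 0.0
    for name, duration in events:
        onsets.append((name, elapsed))
        elapsed += duration
    return onsets, elapsed


onsets, trial_duration = trial_onsets(TRIAL_EVENTS)
run_duration = trial_duration * TRIALS_PER_RUN
total_trials = TRIALS_PER_RUN * RUNS_PER_SUBJECT

print(f"trial: {trial_duration} s, run: {run_duration} s, trials per subject: {total_trials}")
```

Under these numbers the picture appears 1.5 s into each 16-s trial, each time series lasts 256 s, and each subject completes 112 trials.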
High-resolution spoiled gradient recalled echo structural images were collected in the same session for all the subjects (160 sagittal slices, TR = 8.21 ms, TE = 3.8 ms, field of view = 240 mm, acquisition matrix = 256 × 256, reconstructed voxel size = 1 × 0.9 × 0.9 mm). These high-resolution structural images provided detailed anatomical information for the region-of-interest (ROI) analysis and for 3D normalization to the Talairach and Tournoux atlas (1988).

DATA ANALYSIS
For each subject, responses and reaction times were computed for stimulus type (painting/scrambled), title (meaningful title/untitled), object recognition (Yes/No) and number of objects (0, 1, 2, 3 or more) tasks. ANOVA was used to compare the various conditions.
Functional MRI data were analyzed in BrainVoyager QX Version 1.10 (Brain Innovation, Maastricht, The Netherlands). All volumes were realigned to the first volume, corrected for motion artefacts and spatially smoothed using a 5-mm full-width-at-half-maximum Gaussian filter. Stimulus events were modeled using a delta function, which was convolved with a canonical hemodynamic response function to yield a regressor for each condition. The main effects of interest (paintings vs. scrambled images; Yes vs. No objects; number of recognized objects; and titled vs. untitled paintings) were analyzed using the General Linear Model (Friston et al., 1995). Based on the main effect (paintings vs. scrambled images, p < 0.001, uncorrected) a set of ROIs was defined, which included the dorsal occipital cortex (DOC), fusiform gyrus (FG), parahippocampal cortex (PHC), intraparietal sulcus (IPS), inferior frontal gyrus (IFG), putamen and the anterior cingulate cortex (ACC). Note that the specification of ROIs was orthogonal to the subsequent tests that were addressed at the second level analysis. For each subject and in each ROI, the mean parameter estimates were calculated separately for each experimental condition (title, training, objects and number of objects) and were used for between-subjects random-effects analyses.
Finally, we tested whether reaction times were correlated with brain activation by including the response latencies as a covariate in the GLM analysis. The reaction times of each subject and each trial were normalized by z-transformation, and the standard hemodynamic response function (HRF) was then multiplied with the new z-values for each trial, thus creating a latency-correlated design-matrix.
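The latency covariate can be sketched as follows: reaction times are z-transformed within a subject and each trial's stick is weighted by its z-value before convolution with the HRF. This is a minimal illustration under stated assumptions. The reaction times and onsets are made-up example values, not data from the study, the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def zscore(values):
    """Z-transform a list of values (sample standard deviation; an assumption)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def latency_weighted_sticks(onsets_s, reaction_times_ms, n_samples, dt):
    """Place one stick per trial at its onset, scaled by the z-scored reaction time."""
    weights = zscore(reaction_times_ms)
    sticks = [0.0] * n_samples
    for onset, weight in zip(onsets_s, weights):
        sticks[int(round(onset / dt))] = weight
    return sticks, weights


# Illustrative values only (hypothetical, not measured).
example_rts_ms = [820.0, 950.0, 780.0, 1100.0]
example_onsets_s = [1.5, 17.5, 33.5, 49.5]
sticks, z_values = latency_weighted_sticks(example_onsets_s, example_rts_ms,
                                           n_samples=640, dt=0.1)
```

Convolving these weighted sticks with the HRF gives a regressor whose amplitude on each trial follows that trial's relative response latency, so regions whose activation scales with latency load on it.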

BEHAVIORAL DATA
The behavioral data collected while subjects performed the tasks in the scanner are shown in Figure 1. During the first task ("did you recognize any familiar objects?") trained subjects had a significantly higher proportion of Yes responses than control subjects [t(22) = 2.35, p < 0.05]. In terms of response latencies, it took control subjects the same time to respond "Yes, I recognized familiar objects" and "No, I did not recognize familiar objects". In contrast, trained subjects took significantly longer to report "No, I did not recognize familiar objects", both relative to their own Yes responses [t(22) = 3.85, p < 0.001], and to the Yes responses made by the control subjects [t(22) = 3.25, p < 0.01].
We then compared the response to titled and untitled paintings. During the object recognition task, trained subjects had a significantly higher proportion of Yes responses (0.8 ± 0.04, mean ± SE) than control subjects (0.6 ± 0.06) for paintings that were preceded by meaningful titles [t(22) = 2.77, p < 0.05]. During the number of objects task, trained subjects reported not recognizing any objects ("0") significantly less than control subjects [t(22) = 2.69, p < 0.05] for paintings that were preceded by meaningful titles, and a 2-way ANOVA revealed a significant interaction between the two groups and the reported number of objects [F(3,95) = 6.06, p < 0.001]. Moreover, trained subjects reported recognizing two objects in titled paintings significantly more than control subjects [t(22) = 3.64, p < 0.01].
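As a consistency check, the t value for the proportion of Yes responses to titled paintings can be recomputed from the reported means and standard errors. The sketch below assumes an independent-samples Student's t-test with pooled variance and 12 subjects per group (half of the 24 subjects were trained); the text does not name the test, but t(22) is consistent with it. The function name is hypothetical, and because the reported means and SEs are rounded, the agreement is only approximate.

```python
import math


def t_from_summary(mean_a, se_a, n_a, mean_b, se_b, n_b):
    """Pooled-variance Student's t and degrees of freedom from group means and SEs."""
    var_a = se_a ** 2 * n_a  # recover each group's variance from its standard error
    var_b = se_b ** 2 * n_b
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    se_diff = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / se_diff, df


# Reported values: trained 0.8 ± 0.04, control 0.6 ± 0.06 (mean ± SE), 12 subjects each.
t_value, df = t_from_summary(0.8, 0.04, 12, 0.6, 0.06, 12)
print(f"t({df}) = {t_value:.2f}")
```

With equal group sizes the standard error of the difference reduces to the square root of the sum of the squared SEs, about 0.072, which reproduces the reported t(22) = 2.77.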
Finally, we tested whether there were any differences between control and trained subjects in terms of their responses to the scrambled images. During the object recognition task, control subjects did not recognize familiar objects in 91% ± 4% of the scrambled paintings, whereas trained subjects did not recognize familiar objects in 96% ± 2%. Moreover, response latencies were virtually identical (793 ± 42 and 780 ± 47 ms for control and trained subjects, respectively). In terms of the number of recognized objects, control subjects recognized one object in 7% ± 3% of the scrambled paintings and their mean reaction time was 1089 ± 113 ms, whereas trained subjects recognized one object in 4% ± 2% and their mean response latency was 1101 ± 98 ms. The differences were not statistically significant.

FIGURE 1 | Behavioral data. (A) Mean responses and reaction times recorded during the first task ("did you recognize any familiar objects?"). (B) Mean responses and reaction times recorded during the second task ("how many familiar objects did you recognize?"). In this and subsequent graphs, error bars indicate standard error of the mean (SEM).

IMAGING DATA
The main effect, namely responses evoked by all paintings as compared with the scrambled paintings baseline, revealed activation within a distributed cortical network that included multiple, bilateral regions (Figure 2). Significant activation was found in DOC, FG, IPS, PHC, IFG, and ACC (see Table 1 for mean Talairach coordinates and cluster size). Comparing color with monochrome paintings revealed activation in extrastriate cortex (mean Talairach coordinates: 30, −70, −13; −26, −70, −13), consistent with previous findings of activation in human V4 (e.g., McKeefry and Zeki, 1997). We then conducted an ROI analysis to test for differences between Yes and No responses during the object recognition task, number of recognized objects, and titled as compared with untitled paintings. We found that within the IPS, recognizing familiar objects evoked stronger activation than not recognizing any objects. In both hemispheres, the difference between Yes and No responses was statistically significant in both control [mean parameter estimates ± SE were 1.61 ± 0.09 and 1.26 ± 0.12, respectively, t(22) = 4.92, p < 0.0001] and trained subjects [1.79 ± 0.06 and 1.42 ± 0.07, respectively, t(34) = 7.52, p < 0.0001].
We also found an effect of title on the number of recognized objects. Thus, within the PHC, trained subjects showed higher activation than control subjects for "3 or more objects" responses [1.95 ± 0.06 and 1.71 ± 0.08, respectively, t(39) = 2.28, p < 0.05]. Furthermore, within the FG, trained subjects showed higher activation than control subjects for recognizing 3 or more objects [2.11 ± 0.05 and 1.85 ± 0.07, respectively, t(40) = 2.83, p < 0.01].
Significant differences between trained and control subjects were found in the parahippocampal cortex, in terms of the evoked response associated with the number of recognized objects (Figure 3). Trained subjects showed an increase in the amplitude of the fMRI signal as a function of the number of objects they recognized. Thus, recognizing 3 or more objects evoked higher activation than not recognizing any objects [t(40) = 6.03, p < 0.0001]. The difference between trained and control subjects in terms of activation in the FG and PHC was statistically significant for 3 or more objects [t(40) = 2.83, p < 0.01 and t(39) = 2.28, p < 0.05, respectively]. Finally, the interaction within the FG between group and number of objects was significant [F(3,335) = 4.16, p < 0.01].

DISCUSSION
In this study we tested whether a short training session would facilitate object recognition in cubist paintings. We found that training resulted in significant behavioral and neural changes. Trained subjects were faster and recognized significantly more familiar objects in the paintings, and exhibited enhanced activation in the parahippocampal cortex. Furthermore, trained subjects were significantly slower to report not recognizing any familiar objects in the paintings and these longer response latencies were correlated with activation in a fronto-parietal network that mediates spatial attention (e.g., Kastner and Ungerleider, 2000).
The most surprising and intriguing finding in our study is the enhanced activation in the parahippocampal cortex of trained subjects. The PHC, a region implicated in the representation and processing of spatial navigation information (Epstein and Kanwisher, 1998), episodic memory (e.g., Gabrieli et al., 1997) and remote spatial memories (Spiers and Maguire, 2007), is a major node in the cortical network for contextual associations (Bar et al., 2008). Associations are formed over time, when repeated patterns and statistical regularities are extracted from the environment and stored in memory. It has been recently suggested that the role of associations is to generate predictions about the immediate future in order to guide behavior (Bar, 2007). It is highly likely that due to the short training session, our subjects used contextual associations in order to perform the tasks. For example, a meaningful title such as "Woman Reading" likely activated an existing "script" of a living room, a familiar scene which was previously encountered and stored in memory (see Bar, 2009). Subjects were therefore able to anticipate a woman sitting, a chair or a sofa, hands holding a book, etc. Thus, prior experience and stored representations facilitated the comprehension of visual scenes represented in indeterminate cubist paintings.
On a more speculative note, our findings could also provide empirical evidence for Bayesian analysis, which was proposed as a model for object perception (Kersten et al., 2004) and evoked cortical responses (Friston, 2003, 2005). According to the Bayesian perspective, the short training session enabled our subjects to successfully match the indeterminate visual input with their top-down predictions. It is reasonable to assume that trained subjects were more likely than control subjects to suppress errors and establish a consensus between the actual bottom-up input and the top-down prediction. Thus, minimizing prediction error resulted in faster recognition of more familiar objects in cubist paintings. In contradistinction with perceptual learning, which requires repeated sessions, or the long-term acquisition of expertise (Poldrack, 2002; Bukach et al., 2006), our subjects underwent a short training session, 30 min before their brains were scanned. During this training session, subjects were presented with examples of cubist paintings and learned how to recognize familiar objects depicted in these paintings. The behavioral and neural changes observed in our trained subjects are therefore likely due to a strategy they adopted during training. Based on their responses during the object recognition and number of objects tasks, and given the observed patterns of brain activation, it is reasonable to assume that trained subjects used contextual associations and a visual search strategy in order to perform the tasks.
The extent to which titles do or should influence the perception of meaning and the aesthetic impression of art compositions is contentious. In art theoretical terms, critics of a formalist persuasion claim that titles are merely "identification tags" that should not affect the viewer's reading of the work. Others, however, claim titles function as guides to interpretation and provide important contextual cues to engage the attention of the viewer (Fisher, 1984). Empirical evidence suggests that titles influence both the understanding and the appreciation of art paintings (e.g., Leder et al., 2006). In a compelling example of the top-down effects of titles on art perception, viewers' description of the content of paintings varied according to the title (e.g., "Agony" vs. "Carnival") they were presented with (Franklin et al., 1993). In our experiment, cubist paintings were preceded by their meaningful title or by the word "Untitled". We found that meaningful titles facilitated object recognition, but only in trained subjects. Thus, relative to control subjects, trained subjects reported recognizing more familiar objects in paintings with meaningful titles. Moreover, recognition of two objects in titled paintings resulted in enhanced activation in the IPS. These findings suggest that meaningful titles can provide the top-down solution for ambiguous visual input, but only when prior knowledge or experience exists. Our findings are consistent with previous findings which showed that presenting a topic before a prose passage facilitates its subsequent comprehension and recall (Bransford and Johnson, 1972), indicating that relevant contextual information is required for understanding. Recent studies, in which eye-movement recordings were compared, have shown that artists view pictures differently from laymen: artists spent more time scanning structural and abstract features, whereas artistically untrained subjects viewed human features and objects (Vogt and Magnussen, 2007). Taken together, these observations suggest that recognition of familiar content in art works is a skill acquired through training.