Experimental verification for perceptual and cognitive processing of visual aesthetic experiences

Theoretical studies suggest that several mechanisms underlie human visual aesthetic experiences: perceptual processing, which has small variability among individuals (shared properties) and strong correlation with image statistics (e.g. color statistics); cognitive processing, which is idiosyncratic and has weak correlation with image statistics; and emotional processing, which determines the affective part of the aesthetic experience. Furthermore, several experimental studies have reported that the visual aesthetic experience can be largely explained by only a few latent factors. However, it is unclear whether the idiosyncrasy and sensitivity to the image statistics of the latent factors from empirical studies are consistent with the multi-stage processing hypothesis from theoretical studies. In the present study, using exploratory factor analysis, we derived three latent factors of visual aesthetic experiences from participants observing landscape paintings and photographs. Then we examined the difference in the idiosyncrasy of the factors and the relationship between the factors and color statistics. We found significant correlations between the color statistics and Factors 1 and 3, which had a small variance of factor scores (low idiosyncrasy), and no or weak correlation between the color statistics and Factor 2, which had a large variance of factor scores (high idiosyncrasy). Our results provide experimental evidence for the perceptual and cognitive processing of visual aesthetic experiences.


Introduction
Visual images often stimulate complex aesthetic experiences. Explaining the mechanisms underlying visual aesthetic experiences is one of the oldest issues in the fields of philosophy and psychology (Leder, 2014;Shimamura, 2011). Recent studies have suggested at least three major processes generate a visual aesthetic experience (Leder, Belke, Oeberst, & Augustin, 2004;Pelowski, Markey, Lauring, & Leder, 2016;Redies, 2015).
One major process is perceptual processing, in which the visual aesthetic experience is based on several formal properties of the visual stimuli (Bell, 1914;Di Dio, Macaluso, & Rizzolatti, 2007;Jacobsen & Höfel, 2002;Leder et al., 2004;Redies, 2015). In perceptual processing, it is assumed that a visual aesthetic experience is mainly processed nonverbally and that the processing does not have to reach consciousness. Bell (1914) hypothesized that the properties of perceptual processing are universal and elicit the aesthetic experience independent of the background of the observer, e.g. culture, education, or personality, and of knowledge about the artwork itself or the context in which the artwork was made.
The second major process is cognitive processing, our understanding of which has been advanced by both contemporary philosophical aesthetics and experimental psychology (Cupchik, Vartanian, Crawley, & Mikulis, 2009;Kirk, Skov, Christensen, & Nygaard, 2009;Leder et al., 2004;Leder, Carbon, & Ripsas, 2006;Russell, 2003). It was suggested that cognitive processing is related to one's value judgement, which depends on the circumstances under which the image was created or displayed (Leder et al., 2004). Therefore, unlike perceptual processing, in cognitive processing, it is likely that the visual aesthetic experience depends on factors relating to the individual and thus shows wide variability (Vessel, 2010).
The above suggests that a visual aesthetic experience depends on at least two apparently contradictory processes. However, both are contributors to the experience gained when observing visual images (Leder et al., 2004;Pelowski et al., 2016;Redies, 2015). Leder et al. (2004) proposed a multi-stage processing model of five stages, in which perceptual analysis is followed by multiple cognitive mastering processes. More recently, Redies (2015) proposed a model of the visual aesthetic experience that combined perceptual and cognitive processing in parallel pathways. In both these models, the visual aesthetic experience is accomplished not by a single process, but rather by several subprocesses. Supporting this theory, it was reported that pictures of a real human body and sculptures of young athletes elicited similar global brain activation patterns, but the right antero-dorsal insula was activated only during the aesthetic judgement task (Di Dio, Canessa, Cappa, & Rizzolatti, 2011). These results are consistent with the possibility that a visual aesthetic experience is divided into outputs of cognitive processing and perceptual processing.
The third major process is emotional processing (Chatterjee & Vartanian, 2014;Jacobsen, 2004;Leder et al., 2004;Leder & Nadal, 2014;Pelowski et al., 2016;Redies, 2015;Silvia, 2014). Several studies have emphasized the willful expression of a specific emotion by the artist (Ducasse, 1964;Fellous, 2006;Silvia, 2005). Others have suggested the role of the observer in problem solving while analyzing and understanding a visual image, that is, perceptual and cognitive processing will lead to emotional responses (Leder, Gerger, Dressler, & Schabmann, 2012;Mills, 2001;Silvia, 2010). Accordingly, Redies (2015) proposed that emotional processing can have a modulatory effect on a visual aesthetic experience, while perceptual and cognitive processing induce emotions. Additionally, Redies (2015) and Leder & Nadal (2014) suggested that emotional processing has a close and dynamic interaction with perceptual and/or cognitive processing. Together, it is reasonable to assume that these three major processes (perceptual, cognitive, and emotional) make the basis of the multi-stage processing hypothesis.
In line with this hypothesis, previous studies using exploratory factor analysis (EFA) or principal component analysis (PCA) have reported that the impression of visual images consists of only a few latent factors (latent factor model) (Marković & Radonjić, 2008). However, it is unclear whether the latent factors for an aesthetic experience have any relationship with the individual processes in the multi-stage processing model.
In this study, to clarify whether each latent factor for a visual aesthetic experience is similar to perceptual (small idiosyncrasy and correlation with image statistics) or cognitive processing (large idiosyncrasy and weak or no correlation with image statistics), we investigated the magnitude of the idiosyncrasy of the latent factors for visual aesthetic experiences and the sensitivity of the latent factors to the color statistics of the images. We evaluated the variation of the latent factor scores among individuals as an index of the magnitude of the idiosyncrasy of the factors. We also examined the relationship between the color statistics of the images and the latent factors. Our results show that latent factor scores with relatively low idiosyncrasy exhibited significant correlation with the color statistics, while those with high idiosyncrasy did not. These results suggested a correspondence of the latent factor model for visual aesthetic experiences with the multi-stage processing model.

Participants
Eighty naïve Japanese volunteers without any formal art education participated in the experiments (31 males and 49 females, mean age = 19.6 ± 0.5). All had normal or corrected-to-normal visual acuity (greater than 1.0) and normal color vision based on the Ishihara color vision test (Ishihara, 1972). Before the experiment, all participants were asked to score their familiarity with the visual images used in the main experiment on a 7-point scale (1 = no knowledge and 7 = very familiar). Of the 80 participants, the 76 who exhibited a mean familiarity score ranging from 2.0 to 4.0 were invited to participate in the main experiment (29 males and 47 females, mean age = 19.5 ± 0.5). Each participant gave written informed consent prior to the experiments, which were performed in accordance with the Declaration of Helsinki. All procedures were performed in accordance with the regulations of the Osaka University Medical School Ethics Committee (approval number 13099).

Apparatus and stimuli
For visual stimuli, thirty landscape images (15 paintings and 15 photographs) were selected from a previously reported image database (Kitaguchi, Wakabayashi, Sato, & Naito, 2014). We used both photographs and paintings as the visual stimuli to examine whether the latent-factor model structure is stable regardless of the art expression style. A list of the 15 paintings used is shown in Table 1. The photographs were originally selected from the McGill Calibrated Color Image Database (http://tabby.vision.mcgill.ca/) (Fig. 1). In this study, we used HSV (Hue, Saturation, and Value) color space for the color statistics calculation, as it is regarded as intuitive and perceptually relevant (Harada, Itoh, & Nakatani, 1999). Hue represents colors perceived by humans, such as red, green, and blue. Saturation represents the pureness and intensity of the color. Value represents the lightness of the color. Fig. 2B and C show representative examples of the paintings used in this study. For each image, the color statistics (mean, standard deviation, skewness, kurtosis and entropy) of the Hue, Saturation, and Value in the HSV color coordinate system were measured. Skewness is an index of the asymmetry of the distribution, and kurtosis is an index of the sharpness of the peak of the distribution. Because Hue is an angular value, circular statistics were calculated for the analysis (Hanbury, 2003); given n Hue values, the mean and higher order statistics (j-th statistics, j ≥ 2) of Hue were derived from the corresponding circular-statistics equations.
We chose images so that the color statistics of the entire set of images did not concentrate on a particular value, that is, the color statistics of images were distributed as uniformly as possible. We also avoided landscape images in which non-typical colors were used for the objects.
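The circular treatment of Hue mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and the choice to define the j-th statistic as a central moment of signed angular deviations from the circular mean is an assumption made for illustration only.

```python
import math

def circular_mean_deg(hues_deg):
    """Circular mean of hue angles given in degrees, returned in degrees."""
    s = sum(math.sin(math.radians(h)) for h in hues_deg)
    c = sum(math.cos(math.radians(h)) for h in hues_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

def angular_deviation_deg(h, center):
    """Signed shortest angular difference h - center, in [-180, 180)."""
    return (h - center + 180.0) % 360.0 - 180.0

def circular_central_moment(hues_deg, j):
    """Assumed j-th statistic: mean of the j-th power of the signed
    angular deviations from the circular mean (j >= 2)."""
    mu = circular_mean_deg(hues_deg)
    return sum(angular_deviation_deg(h, mu) ** j for h in hues_deg) / len(hues_deg)

# Hypothetical reddish hues straddling the 0/360 degree wrap-around.
hues = [350.0, 10.0, 5.0, 355.0]
mean_hue = circular_mean_deg(hues)
hue_sd = math.sqrt(circular_central_moment(hues, 2))
```

For these hypothetical values the arithmetic mean would be 180 degrees (cyan), whereas the circular mean lies at the red end of the hue circle, which is why an angular variable such as Hue needs this treatment.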
For a comparison with previous studies, we calculated the color statistics of the stimuli in CIELAB and HCL (Hue, Chroma, Lightness) color space. Lightness is the same metric as Value. Chroma represents the colorfulness of an area. In addition to the value of Lightness (L*), two parameters, a* and b*, were obtained in CIELAB color space. a* and b* are the color axes red-green and yellow-blue, respectively. Hue and Chroma were derived from a* and b*.
To evaluate the visual aesthetic experience, we used the semantic differential method with 12 adjective pairs commonly used in previous studies (Marković & Radonjić, 2008;Osgood, 1964;Tanaka, Oyama, & Osgood, 1963) (Table 2). Each image was printed on A4-size white paper. The long side of each image was normalized to 8 cm on paper while keeping the original aspect ratio. Twelve bipolar adjective pairs with 7-point scale bars from 1 to 7 were printed just below the image. Each participant was instructed that "1" is for the left-hand side adjective, "4" is neutral, and "7" is for the right-hand side adjective. For printing, an RGB profile with the perceptual rendering method was used to keep the color impression of the images in the printed version. The perceptual rendering method is a color coordinate conversion method that allows us to preserve the perceptual relationship between the CMYK color coordinate space used in printers and the RGB color coordinate space used in monitors (Homman, 2009). All images were printed by the same color-matched printer (OKI C511dn, OKI, Japan), whose color profile was calibrated by CM-2600 (KONICA MINOLTA, Japan). For the color statistics analysis and color management of the printed images, we used Photoshop CC (Adobe, USA) and the Python matplotlib library.
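As an illustration of how Hue and Chroma can be obtained from the CIELAB a* and b* axes described above, the sketch below uses the standard cylindrical (polar) form of CIELAB. The function name is hypothetical, and whether the original analysis used exactly this form is an assumption.

```python
import math

def lab_to_chroma_hue(a_star, b_star):
    """Polar form of the CIELAB a*-b* plane.

    Chroma is the distance from the neutral axis; hue is the angle of the
    (a*, b*) vector, returned in degrees in [0, 360).
    """
    chroma = math.hypot(a_star, b_star)
    hue_deg = math.degrees(math.atan2(b_star, a_star)) % 360.0
    return chroma, hue_deg

# Hypothetical a*, b* values for a single pixel.
chroma, hue = lab_to_chroma_hue(3.0, 4.0)
```

Because the resulting hue is again an angle, its statistics would need the same circular treatment as Hue in HSV.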

Procedure
In real scenes, a painting or photograph is normally seen on paper, not on a computer display. Therefore, we used printed versions of the images rather than virtual displays on a monitor. The experiments were conducted in an experiment room. Bright illumination was kept constant by using blackout curtains and a lighting system. An A4-size booklet, in which the visual images and scale bars with adjectives were printed, was distributed to each participant. All participants were required to rate all adjective pairs for all images. The image order was randomized for each participant. Participants freely took brief rests during the experiment if necessary. A typical experimental duration was approximately 45-60 min (including rest).

Data analysis
We obtained a three-dimensional (3D) tensor structure using the image impression evaluation task (30 images × 12 adjectives × 76 participants). The 3D data were transformed into a 2D matrix (participants-images × adjectives) by the stringing out method (Marković & Radonjić, 2008;Osgood, May, & Miron, 1975). In this study, we constructed the stringing out matrix by arranging the single participant matrices (images × adjectives) one under another. To determine the number of factors, we conducted PCA and employed the factors for which eigenvalues were larger than 1 (Fig. 2A). Then we adopted a 3-latent-factor model. EFA with the 3-latent-factor model was applied to the data with Varimax rotation and the maximum likelihood method. For the factor analysis, we used the 'factoran' command of the Statistics and Machine Learning Toolbox in Matlab (Mathworks, USA).
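The stringing out step described above can be sketched as follows. This is a minimal illustration with a hypothetical function name and toy ratings, not the original analysis code: each participant's images × adjectives matrix is stacked under the previous one, so the full data set would give 76 × 30 = 2280 rows and 12 columns.

```python
def string_out(ratings):
    """Stack per-participant (images x adjectives) matrices one under another.

    ratings[p][i][a] is the rating given by participant p to image i on
    adjective pair a. The result has one row per participant-image pair.
    """
    return [list(image_row) for participant in ratings for image_row in participant]

# Toy tensor: 2 participants x 3 images x 2 adjective pairs (hypothetical ratings).
ratings = [
    [[1, 2], [3, 4], [5, 6]],  # participant 0
    [[7, 6], [5, 4], [3, 2]],  # participant 1
]
matrix = string_out(ratings)  # 6 rows x 2 columns
```

The stacked matrix treats every participant-image pair as a separate observation of the 12 adjective ratings, which is the form a factor analysis routine typically expects.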

Reliability of the latent factor model
First, we performed two separate EFAs for the two image categories (paintings and photographs). Both EFAs yielded 3-factor models and similar factor structures (correlation coefficients, r, between the two factor loadings were 0.95, 0.87 and 0.87 for Factor 1, Factor 2 and Factor 3, respectively). We did not find any systematic difference in the color statistics between the two categories, consistent with a previous study (Montagner, Linhares, Vilarigues, & Nascimento, 2016). Thus, we pooled the stimulus categories together and ran another EFA. Again, we obtained a 3-factor model for the image impression. Table 2 shows the factor loading matrix. The cumulative contribution of the three factors was approximately 55% (the contributions of Factor 1, Factor 2 and Factor 3 were 21%, 20% and 13%, respectively). Next, we examined the reliability of the EFA result by split-half analysis (Revelle & Zinbarg, 2009). In this analysis, participants were randomly divided into two groups (N = 38 each). EFA with the 3-factor model was conducted on the two groups separately, and a factor loading matrix was obtained for each group. Then the correlation coefficients between factor loading matrices were calculated, and the reliabilities were estimated by the Spearman-Brown prophecy formula. All corresponding latent factors exhibited extremely high reliabilities (ρ of Factor 1 = 0.99, Factor 2 = 0.99, and Factor 3 = 0.98). The results show that the numbers of images and participants in this study were sufficient to obtain reliable EFA results.
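The reliability estimate in the split-half analysis above rests on the Spearman-Brown correction, which steps a correlation between two halves up to the reliability of the full sample. A minimal sketch, with a hypothetical function name and an illustrative input value that is not taken from the study:

```python
def spearman_brown(r_half):
    """Full-sample reliability predicted from a split-half correlation."""
    return 2.0 * r_half / (1.0 + r_half)

# Hypothetical correlation between the factor loadings of the two half-samples.
r_between_halves = 0.5
reliability = spearman_brown(r_between_halves)  # 2 * 0.5 / 1.5 = 2/3
```

The correction is monotonic and leaves a perfect split-half correlation of 1 unchanged, so a high agreement between the two groups' loadings always yields an equally high or higher estimated reliability.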

Idiosyncrasy of latent factors
We defined a 3D 'Sensitivity Space' in order to visualize each participant's aesthetic experience. Each axis of the Sensitivity Space corresponds to one latent factor. Fig. 2B and C, bottom, show the factor scores in the Sensitivity Space for two images: B, Vincent van Gogh (1888), 'The Langlois Bridge at Arles' (Wallraf-Richartz Museum); and C, Andrew Wyeth (1978), 'Mill in Winter' (©Andrew Wyeth/ Fukushima Prefectural Museum of Art). Each dot in the space represents a factor score from each participant for the image. As seen in Fig. 2B and C, the factor scores clustered more densely in the left image (B). That is, the dispersion of the factor score in all axes was smaller in the left image (B) than in the right one (C), which corresponds to a difference in the idiosyncrasy of the visual aesthetic experience between the two images.
To evaluate the difference in idiosyncrasy of the latent factors, we compared the variance of the factor scores for each image. To this end, Gaussian fitting was applied. Before fitting, Kolmogorov-Smirnov tests were performed for each factor of each stimulus. The results of the tests revealed that the factor scores for all stimuli followed a normal distribution. Thus, Gaussian fitting was applied to the factor scores (Fig. 3). The means and standard deviations of the factor scores were captured by the center and width (σ) of the Gaussian function, respectively.
Fig. 3. Gaussian fittings to factor score histograms. The idiosyncrasy of each latent factor to the painting stated at the top was evaluated by the width of the fitted Gaussian function.
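The fitting step above can be illustrated with a small sketch. It assumes a maximum-likelihood Gaussian fit (center = sample mean, width = population standard deviation) rather than a curve fit to the histogram, and it computes only the Kolmogorov-Smirnov distance, not its p-value; the function names and the scores are hypothetical.

```python
import math
import statistics

def fit_gaussian(scores):
    """Maximum-likelihood Gaussian fit: returns (center, width sigma)."""
    return statistics.fmean(scores), statistics.pstdev(scores)

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(scores, mu, sigma):
    """Largest gap between the empirical CDF and the fitted normal CDF."""
    xs = sorted(scores)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Hypothetical factor scores of seven participants for one image and one factor.
scores = [-0.9, -0.4, -0.1, 0.0, 0.2, 0.5, 0.7]
mu, sigma = fit_gaussian(scores)
d = ks_statistic(scores, mu, sigma)
```

Under this reading, σ is simply the across-participant spread of the factor score for one image, so a larger σ means less agreement among observers.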
The mean values of σ for Factors 1, 2 and 3 were 0.57 ± 0.12, 0.76 ± 0.13 and 0.61 ± 0.10, respectively (Fig. 4). The differences between Factors 1 and 2 and between Factors 2 and 3 were statistically significant (Holm multiple comparisons after repeated measures one-way ANOVA; Factor 1 vs. Factor 2, p < 0.01; Factor 2 vs. Factor 3, p < 0.01; Factor 1 vs. Factor 3, p = 0.18). The results suggested that the idiosyncrasies of Factors 1 and 3 were smaller than that of Factor 2. We conducted the same analysis using the variance of raw scores and of normalized scores (z-scores) and obtained very similar results (data not shown).
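The Holm procedure used for the pairwise comparisons above can be sketched as follows. The function name and the raw p-values are hypothetical and chosen only to show the step-down logic; they are not the values from this study.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, k in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[k])
        # Enforce monotonicity so that adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[k] = running_max
    return adjusted

# Hypothetical raw p-values for three pairwise factor comparisons.
raw = [0.01, 0.04, 0.03]
adj = holm_adjust(raw)  # [0.03, 0.06, 0.06]
```

Compared with a plain Bonferroni correction, which would multiply every p-value by three, the step-down scheme is less conservative while still controlling the family-wise error rate.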

Relationship between latent factors and image color statistics
Next, we investigated the relationship between the color statistics of the images and the factor scores. For this analysis, we adopted an HSV color-coordinate space. For each image, we calculated four color statistics: mean, standard deviation, skewness and kurtosis for each of Hue, Saturation and Value (Table 3; see Methods for the calculation). We also measured entropy, which is an index of the color non-uniformity of the images. We found significant correlations with Factor 1 and the mean (r = 0.36, p < 0.05), skewness (r = -0.54, p < 0.01) and kurtosis (r = -0.53, p < 0.01) of Saturation; with Factor 3 and the entropy of Hue (r = 0.49, p < 0.01), the mean (r = 0.56, p < 0.001), skewness (r = -0.43, p < 0.05), kurtosis (r = -0.43, p < 0.05) and entropy of Saturation (r = 0.52, p < 0.01), and the skewness of Value (r = 0.40, p < 0.05); and no significant correlation with Factor 2 and any color statistic. Based on these results, we concluded that factors with relatively low idiosyncrasy (Factors 1 and 3) exhibited significant correlations with the color statistics, while the factor with high idiosyncrasy (Factor 2) did not.
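The color statistics and the correlation analysis described above can be sketched in a few lines. This is an illustration under stated assumptions, not the original pipeline: kurtosis is taken as the non-excess fourth standardized moment, entropy is computed from a fixed-width histogram whose bin count is an arbitrary choice, the correlation is assumed to be computed across images between one color statistic and one image-level factor score, and all names and numbers are hypothetical.

```python
import math

def moment_statistics(values):
    """Mean, SD, skewness and (non-excess) kurtosis from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2

def histogram_entropy(values, n_bins, lo=0.0, hi=1.0):
    """Shannon entropy (bits) of a fixed-width histogram over [lo, hi]."""
    counts = [0] * n_bins
    for v in values:
        k = min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[k] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical Saturation values of the pixels of one tiny image.
saturation = [0.2, 0.4, 0.6, 0.8]
mean_s, sd_s, skew_s, kurt_s = moment_statistics(saturation)
entropy_s = histogram_entropy(saturation, n_bins=4)

# Hypothetical per-image values: one color statistic and one mean factor score.
stat_per_image = [0.1, 0.3, 0.5, 0.7]
score_per_image = [-0.2, 0.0, 0.1, 0.4]
r = pearson_r(stat_per_image, score_per_image)
```

In the study's setting, each of the 30 images would contribute one value of a given color statistic and one factor score, so every reported r would be computed over 30 pairs.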
To compare our results with previous studies, we conducted correlation analyses by using HCL and CIELAB color space (see Methods). Table 4 describes the correlations between the latent factors and color statistics in the HCL color space. Factor 3 showed a significant correlation with the kurtosis of Lightness (r = -0.40, p < 0.05). Table 5 describes correlations between the latent factors and color statistics in the CIELAB color space. Factor 3 showed a significant correlation with the standard deviation and entropy of a*.
Taken together, Factor 1 showed a significant correlation with the color statistics in HSV color space, and Factor 3 showed a significant correlation with the color statistics in all three color spaces. Factor 2 showed no significant correlation in any of the three color spaces.
However, consistency between the latent factor model and the multi-stage processing model has not been systematically evaluated. If a latent factor corresponds to perceptual processing, it should show significant correlation with several basic visual properties or image statistics, e.g. color statistics (Hue, Saturation, and Lightness) (Mallon, Redies, & Hayn-Leichsenring, 2014), and relatively small variability among individuals compared with other factors. In contrast, if a factor corresponds to cognitive processing, it will show weak or no correlation with basic perceptual properties and image statistics and instead show high idiosyncrasy.

Correspondence between the multi-stage processing model and latent factor model
Our results suggest that Factor 1 exhibited relatively low idiosyncrasy and had sensitivity to the color statistics in the HSV color system, which is consistent with the properties of perceptual processing in theoretical models. The adjectives that showed strong correlation with Factor 1 seemed to relate to perceptual judgement.
Factor 3 also showed low idiosyncrasy and significant sensitivity to the color statistics in all color coordinate systems we used (HSV, HCL, and CIELAB). The adjectives that showed strong correlation with Factor 3 seemed to relate to perceptual judgement (see Table 2 for details). Therefore, our findings suggested the possibility that Factor 3 also reflects the properties of perceptual processing. Taken together, we concluded that perceptual processing can be divided into two subcomponents. The adjectives in Factor 1 seem to be mainly related to brightness and contrast, while those in Factor 3 seem to be mainly related to motion information. Therefore, we inferred that Factor 1 mainly reflects the properties of brightness and luminance contrast perception, while Factor 3 mainly reflects the properties of motion perception. This hypothesis can be explored by investigating different adjective sets.
In contrast, Factor 2 might correspond to cognitive processing, which exhibits high idiosyncrasy and is relatively insensitive to the color statistics of the image. The adjectives that showed strong correlation with Factor 2 seemed to relate to value judgement (e.g. Beautiful-Ugly and Artistic-Inartistic; for details see Table 2), which depends on cognitive processing in the theoretical model (Leder et al., 2004;Leder & Nadal, 2014).
Regarding emotional processing, there are no clear predictions for idiosyncrasy or known correlations with the color statistics in theoretical models (Leder et al., 2004;Leder & Nadal, 2014). It has been proposed that emotional processing can have a modulatory effect on a visual aesthetic experience and that it has a close and dynamic interaction with perceptual and/or cognitive processing. In the current study, some adjectives obtained in the latent factors seem to be related to emotional processing (e.g. Stable-Unstable in Factor 1, Likable-Unlikable in Factor 2, and Lively-Lifeless in Factor 3). Therefore, we concluded that emotional processing could not be obtained as an independent latent factor, but rather it acted as a modulator of factors corresponding to perceptual and cognitive processing.
Overall, our results suggest that Factors 1 and 3 are consistent with the properties of perceptual processing and that Factor 2 likely corresponds to cognitive processing. Therefore, each factor reflects the characteristics of one of the processes of the theoretical multi-stage processing model, at least to some extent.

Functional significance of the image color statistics of natural images for a visual aesthetic experience
The results of the correlation analysis between the factor scores and color statistics of the images revealed that Factors 1 and 3 had significant correlation with several color statistics (Table 3). In contrast, there was no significant correlation between Factor 2 and any color statistic. Taken together, the results suggest that color statistics, especially the skewness and kurtosis of Saturation, play a role in shaping the shared visual aesthetic among observers.
The psychological meanings of skewness and kurtosis in the color statistics are, to our knowledge, unknown. Our results point to a functional significance of higher order color statistics of visual images, that is, higher order color statistics may play a role in generating commonly shared visual aesthetic experiences. The artist could therefore adjust the higher order color statistics to elicit an intended visual aesthetic experience from the observers. Gao et al. (2007) investigated the relationship between color and emotional impression using a method similar to that of the present study. They used single color patches as the stimuli and derived the factors 'Potency', 'Activity' and 'Temperature'. They used color properties in HCL color space (Hue, Chroma and Lightness) and reported that Activity and Potency exhibit significant correlation with Chroma and Lightness, respectively. In our study, correlation coefficients of the adjectives to the latent factors suggested that Factor 1 and Factor 3 partially corresponded to the Activity and Potency in Gao et al. (2007), respectively, even though the adjectives of the two studies were slightly different. However, the correlation analyses of the two studies showed slightly different results. The difference can be attributed to differences in the stimuli: single color patches that had no higher order color statistics (Gao et al., 2007) versus natural images (ours). Furthermore, it has been suggested that spatial interactions in adjacent colors, so-called color contrast or color assimilation, may affect the impression of the visual images (Ikeda, Matsuyoshi, Sawamoto, Fukuyama, & Osaka, 2015). If so, the spatial interactions between the different color regions of an image make it difficult to extrapolate the color impression of a single-color patch stimulus to a natural image.
The list of adjectives in the present study contains more emotional adjectives (e.g. Beautiful-Ugly, Likable-Unlikable) than that of Gao et al. (2007). As mentioned above, our results suggested that emotional factors were not dissociated as an independent factor but acted as a modulator of both perceptual and cognitive factors. This emotional modulation could also have contributed to the differences seen in the two studies.

Factors affecting cognitive processing
In this study, we only evaluated the relationship between the color statistics of visual images and latent factor scores of the visual aesthetic experience. Redies (2015) argued that the circumstances under which visual images are created or displayed and the intention of the artist who created the artwork affect cognitive processing. Therefore, if Factor 2 corresponds to cognitive processing, these circumstances may influence the Factor 2 score. Brieber, Nadal, & Leder (2015) reported that artworks were rated more arousing, positive, interesting and liked in a museum than in a laboratory. This finding suggests that the experience of art relies on the environment. However, in the present study, all visual images were presented under the same conditions to all participants, such that the display condition would not have an impact on Factor 2. Future studies should consider how the environment or presentation affects the factor scores, especially those related to cognitive processing. Another limitation of our study is the strong homogeneity of the participants, e.g. same nationality, similar educational history and no expertise in art. Factor 2 showed low agreement between participants in this study, suggesting individual factors may contribute to Factor 2. The theoretical multi-stage processing model suggests that cognitive processing is strongly affected by individual factors such as culture and knowledge of art. Regarding cultural background, previous studies reported that Westerners and Asians show distinctive aesthetic preferences for the same visual stimulus (Masuda, Gonzalez, Kwan, & Nisbett, 2008;Nisbett & Masuda, 2003). Additionally, a previous study reported that culture has an effect on color preference (Yokosawa, Schloss, Asano, & Palmer, 2016). For example, East Asians have a higher preference for reddish tones than Europeans, but no preference for yellowish or bluish tones (Gao et al., 2007;Ou et al., 2004).
The difference in color preference among cultures may be attributed to differences in cognitive processing or emotional processing. However, in this study, we used Japanese participants of the same native language, which is why it is unlikely that cultural differences account for the observed properties.
Ecological valence theory (EVT) argues that the degree to which people like/dislike a specific color is associated with the degree to which they like/dislike environmental objects that are associated with that color (Palmer & Schloss, 2010;Palmer, Schloss, & Sammartino, 2013). Factor 2 exhibited a tight correlation with the adjective pair of Likable-Unlikable. In this study we used landscape paintings and photographs as visual stimuli. According to EVT, the preference for each scene is probably different among observers depending on what the image reminds the participant of. In line with EVT, future studies using abstract paintings as visual stimuli may decrease the idiosyncrasy of Factor 2.
Regarding education, it is reported that art expertise affects the aesthetic judgement (Hekkert & van Wieringen, 1996;Paasschen, Bacci, & Melcher, 2015). From the results of those studies, it is plausible that art expertise may affect cognitive processing, but not perceptual processing. Because none of our participants had art expertise, we could not make any definitive conclusions about the correspondence between Factor 2 and cognitive processing.
Finally, extrapolation of our results to other geographical populations should be done cautiously, because we only used Japanese participants. From the viewpoint of cultural differences affecting color preference, we cannot deny that participants belonging to different cultures exhibit different latent factor sensitivity for color statistics. To analyze these correspondences in future studies, it is necessary to control the nationality, personality, educational history, age, and art expertise of the participants.

Conclusion
In the current study, we derived three latent factors for visual aesthetic judgement. We found that Factors 1 and 3 exhibited small idiosyncrasies and significant correlations with the color statistics. These properties are consistent with perceptual processing. We also found that Factor 2 showed significantly larger idiosyncrasy and no or weak correlation with the color statistics, which is consistent with cognitive processing in the theoretical model. Based on these findings, we concluded that the latent factor model for aesthetic experience was partially consistent with theoretical multi-stage processing models (Leder et al., 2004;Redies, 2015). Our results suggest that the color statistics of visual images contribute to a shared perceptual visual aesthetic experience among observers.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.