Subjective Ratings of Beauty and Aesthetics: Correlations With Statistical Image Properties in Western Oil Paintings

For centuries, oil paintings have been a major segment of the visual arts. The JenAesthetics data set consists of a large number of high-quality images of oil paintings of Western provenance from different art periods. With this database, we studied the relationship between objective image measures and subjective evaluations of the images, especially evaluations of aesthetics (defined as artistic value) and beauty (defined as individual liking). The objective measures represented low-level statistical image properties that have been associated with aesthetic value in previous research. Subjective rating scores on aesthetics and beauty correlated not only with each other but also with different combinations of the objective measures. Furthermore, we found that paintings from different art periods vary with regard to the objective measures, that is, they exhibit specific patterns of statistical image properties. In addition, clusters of participants preferred different combinations of these properties. In conclusion, the results of the present study provide evidence that statistical image properties vary between art periods and subject matters and that they correlate with the participants' subjective evaluation of the paintings.


Introduction
In his book Vorschule der Ästhetik, Gustav Theodor Fechner (1876) laid the foundations for a new scientific discipline, namely, the systematic search for stimulus properties that are associated with beauty (experimental aesthetics). He was one of the first to directly measure such properties in aesthetic stimuli. Today, a large number of well-established empirical methods are applied in this field. A particular focus has been on statistical image properties (SIPs) that are relevant to human visual perception. For many years, researchers have explored whether SIPs provide objective criteria for assessing the aesthetic quality of artworks and photographs. Their goal was (and is) to identify universal features that are positively correlated with beauty and aesthetic experience. So far, no such universals have been found. The main reason for this might be that experiences of beauty are, at least partly, domain specific (Marković, 2014). For example, the claim that the Golden Section, as proposed by Gustav Theodor Fechner, is universally perceived as beautiful across domains has been questioned and eventually rebutted (McManus, 1980; Russell, 2000). Over the past few years, more sophisticated analysis methods have been used to search for properties related to beauty in a specific domain, namely visual art (Graham & Redies, 2010; Hoenig, 2005). In the field of computational aesthetics, computer-assisted algorithms have been used to extract statistical features that characterize aesthetic images (Datta, Joshi, Li, & Wang, 2006; Graham & Field, 2007; Redies, Hasenstein, & Denzler, 2007). This approach has also been employed to predict emotional responses to paintings (Yanulevskaya et al., 2012) and to categorize painting styles (Wallraven et al., 2009).
Large subsets of Western and East Asian visual artworks share with images of complex natural scenes (Burton & Moorhead, 1987; Field, 1987; Geisler, 2008; Ruderman & Bialek, 1994) the property of a nearly scale-invariant (fractal-like) Fourier spectrum (Alvarez-Ramirez, Ibarra-Valdez, Rodriguez, & Dagdug, 2008; Graham & Field, 2007, 2008; Graham & Redies, 2010; Redies, 2007; Redies et al., 2007). This finding led to the hypothesis that many artists apply natural scene statistics when they create artworks (Graham & Redies, 2010; Redies, 2007). Another computational method for analyzing artworks is based on histograms of oriented luminance gradients (HOGs; Bosch, Zisserman, & Munoz, 2008). This method allows calculating image properties such as self-similarity, complexity, and anisotropy. Results indicate that images of artworks exhibited in museums and images of natural scenes are highly self-similar, that is, subparts of the images have HOGs similar to that of the entire image. Indeed, artworks possess a relatively high degree of self-similarity compared with other image categories (Braun, Amirshahi, Denzler, & Redies, 2014). In addition, large subsets of visual artworks have complexity values in an intermediate range (Braun et al., 2014). This finding is in line with the proposition by Berlyne (1974) that, on average, an intermediate level of complexity is associated with higher aesthetic appeal than low or high complexity, a proposition that has been supported experimentally by several studies (Forsythe, Nadal, Sheehy, Cela-Conde, & Sawey, 2011; Nadal, 2007). Furthermore, colored artworks are, in general, highly isotropic, that is, they contain luminance gradients of similar strength across all orientations (Braun et al., 2014; Koch, Denzler, & Redies, 2010; Melmer, Amirshahi, Koch, Denzler, & Redies, 2013). In our study, we measured the following SIPs: PHOG (Pyramid of HOGs) Self-Similarity, HOG Complexity, HOG Anisotropy, Aspect Ratio, Rule of Thirds, and various color measures.
For exact definitions of these measures, see the Methods section. To measure these properties in images of artworks, we used the JenAesthetics database, which was introduced by . The data set consists of over 1,600 high-quality images of oil paintings of Western provenance. We chose this type of artwork because it has been popular over several centuries and thereby offers the opportunity to compare different art periods. For the rating experiment, we excluded more recent oil paintings (i.e., paintings with a year of origin later than 1935) and thereby eliminated many paintings that were not intended to be visually pleasing. Furthermore, a previous study showed that representational paintings can be associated with more positive judgments on the dimensions of form, complexity, and regularity than abstract paintings (Marković, 2011).
A major concern regarding the analysis of SIPs in this database was the age of some of the paintings, which can result in conservation artifacts, for example, a brownish film of varnish that may partially obscure color and luminance detail (Bonaduce et al., 2012). We analyzed the paintings in their present condition, as made available by the museums to the Google Art Project.
The subjective rating scores of the paintings were obtained and analyzed in a previous study (Amirshahi, Hayn-Leichsenring, Denzler, & Redies, 2014a). Each image in this database was rated for its aesthetics and its beauty. Aesthetics reflected the ("more objective") artistic value of the respective image, whereas beauty stood for the "subjective" liking by the individual participant. Given that there is some uncertainty with regard to the terminology, the terms were explicitly defined for the participants of the experiment (see section "Gaining of Subjective Rating Scores"). Hedonic value is used in this manuscript as a superordinate term for aesthetics and beauty. The present work extends the study by Amirshahi et al. (2014a) (a) by performing a deeper analysis of the subjective rating scores, (b) by analyzing the SIPs of the paintings, and (c) by investigating the relation between SIPs and the rating scores of the paintings.
Besides the search for universal aesthetic features, another interesting research topic is the identification of statistical properties that are characteristic of particular art periods. Marchenko, Tat-Seng, and Irina (2005) used computer algorithms based on the measurement of color temperature (warm or cold), color palette (primary or complementary), and color contrasts (light or dark) to distinguish modern art from medieval art. Thus, low-level statistics can provide informative cues about art periods. In the so-called ontology-based disambiguation method, Leslie, Chua, and Ramesh (2007) combined analyses of color measures and brush strokes with semantic high-level concepts to distinguish art periods; they achieved a higher performance than the approach by Marchenko et al. The influence of higher-level features in categorization strategies has also been demonstrated by Wallraven et al. (2009). Hence, we did not focus only on possible universal features of art images but also investigated whether SIPs can differentiate between art periods and subject matters.
In previous studies, ratings of paintings have been linked to interindividual differences between participants, such as personality traits (Furnham & Walker, 2001; Lyssenko, Redies, & Hayn-Leichsenring, 2016), expertise (Aluja, Garcia, & Garcia, 2004; Leder, Ring, & Dressler, 2013), demographic variables (Furnham & Walker, 2001), and other personal characteristics. For example, Mallon, Redies, and Hayn-Leichsenring (2014) found that preferences for abstract paintings differed between subgroups of participants depending on the SIPs of the paintings. Although the participants were clustered exclusively on the basis of their subjective evaluations, 46% of the clustering outcome was predicted by SIPs of the evaluated paintings. Güçlütürk, Jacobs, and van Lier (2016) reported that two clusters of participants differed in their liking of complexity in digital images: one group liked images less the more complex they were, whereas the other group showed the opposite pattern of preference. Bies, Blanc-Goldhammer, Boydston, Taylor, and Sereno (2016) found different preferences for clusters of participants who rated fractal patterns; these preferences correlated with specific patterns of fractal dimension, symmetry, and recursion in the stimuli. Together, these results indicate that different groups of people have different preferences for specific SIPs. Here, we investigated whether clusters of participants preferred certain SIPs over others in the images of the JenAesthetics database.
In summary, we provide a detailed statistical analysis of the subjective evaluations of the JenAesthetics database. Specifically, we connected the subjective rating scores with the SIPs. In addition, we reanalyzed the data set for individual art periods and subject matters to find out whether specific subsets of the JenAesthetics database differ in their SIPs or in the correlation between rating scores and the SIPs.

Data Set
We used the JenAesthetics data set. This data set consists of 1,628 images of colored oil paintings of Western provenance by over 400 artists. For technical reasons, we used only 1,614 of the images. The JenAesthetics data set is a subset of the Google Art Project database and therefore royalty free. All images were of high resolution (image file size generally more than 3 MB; see Figure 1 for example images).

Categorization of Paintings
As described in a technical report, each image of the JenAesthetics database was categorized according to its art period and its subject matter. The categorization of the art periods followed standard art textbooks and information available on the Wikipedia website. The data set contains works from 11 major art periods (Renaissance, Mannerism, Baroque, Rococo, Classicism, Romanticism, Realism, Impressionism, Symbolism, Post-Impressionism, and Expressionism). The subject matter of each painting was classified subjectively by two independent observers. The subject matters are abstract, nearly abstract, landscapes, scenes with person(s), still life, flowers or vegetation, animals, seascape, port or coast, sky, portrait (one person), portrait (many persons), nudes, urban scenes, buildings, interior scenes, and other subject matters. Up to three subject matters were assigned to each painting. In our analysis, however, we used only the first (predominant) subject matter of each painting.

Gaining of Subjective Rating Scores
In the present study, we used the rating scores for the JenAesthetics database that were obtained in a previous study (Amirshahi et al., 2014a). In that study, participants rated the paintings on aesthetic value, beauty, liking of color, liking of content, liking of composition, familiarity with the artist, and familiarity with the painting. Before the experiment, participants were instructed that ratings of aesthetics should reflect the ("more objective") artistic value of the respective image, whereas beauty ratings should reflect "subjective" liking. In brief, the 1,614 paintings were rated in blocks of 163 randomly chosen images. Every painting was rated by 19 to 20 participants (131 participants in total). The rating experiment was performed with images reduced to 800 pixels on the longest side (maximum size of 205 mm on the screen). Images were presented on a black background on a BenQ T221W widescreen monitor that had been color calibrated with a colorimeter (X-Rite EODIS3 i1Display Pro). For a more detailed description of the experimental procedure, see Amirshahi et al. (2014a).
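The downscaling of the stimuli to 800 pixels on the longest side can be sketched as follows. This is a minimal illustration assuming proportional scaling with rounding to whole pixels; the resampling method used in the original experiment is not specified here, and the helper name `display_size` is hypothetical.

```python
def display_size(width, height, longest=800):
    """Return (width, height) scaled so that the longer side equals
    `longest` pixels while preserving the aspect ratio.

    Rounding to the nearest whole pixel is an assumption of this sketch.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = longest / max(width, height)
    return round(width * scale), round(height * scale)


# A landscape-format and a portrait-format painting.
print(display_size(4000, 3000))  # (800, 600)
print(display_size(3000, 4500))  # (533, 800)
```

Scaling by the longer side guarantees that every stimulus fits the same maximum on-screen extent regardless of whether it is in landscape or portrait format.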

Statistical Image Properties
For every painting, we calculated the following statistical properties using MATLAB R2008a: (1) PHOG Self-Similarity. Self-similarity implies that an object as a whole has a structure similar to that of its parts. In the present study, we calculated Self-Similarity with the PHOG method, as originally introduced by Bosch et al. (2008). In brief, the method follows a pyramid approach: In a first step, the HOG feature (Dalal et al., 2005) of the entire image is calculated at the ground level (Level 0). The HOG feature represents the mean strength of the luminance gradients binned into 16 equally sized orientation bins that together cover all orientations in the image. Second, the image is divided into four rectangles of equal size (Level 1), and the HOG features are calculated for each subimage. Then, each of the four rectangles is again divided into four equal rectangles (Level 2), and the HOG features are calculated for the resulting 16 subimages as well. We continued this subdivision up to Level 3 (64 subimages). Finally, we compared the HOG feature of the entire image at Level 0 with the HOG features of the rectangles at Level 3 using the Histogram Intersection Kernel (Barla et al., 2002). A detailed description of the method can be found in the Appendix to Braun et al. (2014). This measure ranges from 0 (no self-similarity) to 1 (maximal self-similarity).
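The pyramid comparison described above can be sketched in Python/NumPy. This is a simplified illustration, not the MATLAB implementation used in the study: it assumes signed gradient orientations (0 to 360 degrees) split into 16 bins, gradients from `np.gradient`, and a plain average of the histogram intersections over the 64 Level-3 subimages. The function names `orientation_bins`, `hog`, and `phog_self_similarity` are hypothetical.

```python
import numpy as np


def orientation_bins(gray, n_bins=16):
    """Per-pixel gradient magnitude and orientation-bin index.

    Assumption: signed orientations (0 to 360 degrees) split into
    n_bins equally sized bins; gradients come from np.gradient.
    """
    gy, gx = np.gradient(np.asarray(gray, dtype=float))
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    bins = np.minimum((angle / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    return magnitude, bins


def hog(magnitude, bins, n_bins=16):
    """Gradient strength per orientation bin, normalized to sum to 1."""
    h = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    total = h.sum()
    return h / total if total > 0 else h


def phog_self_similarity(gray, level=3, n_bins=16):
    """Mean histogram intersection between the Level-0 HOG of the whole
    image and the HOGs of the 4**level subimages at the given level."""
    magnitude, bins = orientation_bins(gray, n_bins)
    h0 = hog(magnitude, bins, n_bins)
    if h0.sum() == 0:
        raise ValueError("image has no luminance gradients")
    n = 2 ** level  # n x n grid, i.e. 64 subimages at Level 3
    row_blocks = np.array_split(np.arange(magnitude.shape[0]), n)
    col_blocks = np.array_split(np.arange(magnitude.shape[1]), n)
    scores = []
    for rows in row_blocks:
        for cols in col_blocks:
            cell = np.ix_(rows, cols)
            h = hog(magnitude[cell], bins[cell], n_bins)
            # Histogram intersection kernel: sum of bin-wise minima
            # (a subimage without gradients contributes 0).
            scores.append(np.minimum(h0, h).sum())
    return float(np.mean(scores))
```

Because gradients are computed once on the whole image and only the histograms are pooled per subimage, border effects at subimage boundaries are avoided. Random noise, whose local orientation statistics match the global ones, scores close to 1, whereas an image whose halves contain differently oriented structures scores markedly lower.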
(2) HOG Complexity. Several recent studies have confirmed the importance of complexity in aesthetic perception (Bies et al., 2016; Forsythe et al., 2011; Jacobsen & Höfel, 2002; Rigau, Feixas, & Sbert, 2008). Here, we defined Complexity as the total strength of all oriented gradients in a painting. (3) HOG Anisotropy. Anisotropy is a measure of the heterogeneity (variance) of luminance gradient strength across orientations in a particular image. High values indicate that some gradient orientations are represented more strongly than others in the HOGs, whereas low values imply that the luminance gradients are uniformly distributed across all orientations. Schweinhart and Essock (2013) showed that art paintings of different subject matters (e.g., landscapes and faces) tend to have similar patterns of anisotropy, even though, in real-world photographs, the pattern of anisotropy differs considerably between landscapes and faces. Possibly, artists over-regularize the structure in their paintings by imposing the natural-scene horizontal effect on other types of subject matter. (4) Aspect Ratio. Although there is no evidence for an overall hedonic preference for a certain format of paintings (McManus, 1980; Russell, 2000), aspect ratio has previously been linked to the aesthetic preference for paintings depicting specific subject matters, for example, landscapes or portraits (Palmer, Schloss, & Sammartino, 2013). We therefore included this measure because it might be related to preferences for images from specific art periods or of specific subject matters. (5) Rule of Thirds. We measured the degree to which the image structure followed the Rule of Thirds (see Amirshahi, Hayn-Leichsenring, Denzler, & Redies, 2014b). The Rule of Thirds states that the focus point of a painting should be placed along one of the lines that divide the image into thirds horizontally or vertically in order to yield aesthetically pleasing results.
In the present study, the focus point was determined with the graph-based visual saliency method, as described by Mai, Le, Niu, and Liu (2011). (6) Color measures. In addition, we calculated the three color measures of the HSV color space (color hue, color saturation, and color value), which have regularly been used in aesthetic quality assessment (e.g., see Datta et al., 2006; Palmer et al., 2013). All color measures range from 0 to 1. Color hue ranges from red (value = 0) through yellow, green, cyan, and blue to magenta (value = 1). For color saturation, higher values stand for more saturated images, and for color value, higher values stand for brighter images.
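The remaining measures can be approximated with a few NumPy helpers. This is a hedged sketch, not the study's code: Complexity is taken as the mean gradient magnitude per pixel, Anisotropy as the variance of the normalized global orientation histogram (the study computes it on HOGs), Aspect Ratio is assumed to be width divided by height, and the Rule-of-Thirds score is a hypothetical normalized distance from a given focus point to the nearest third line. The saliency step that locates the focus point is not implemented; the point is simply passed in. All function names are hypothetical.

```python
import numpy as np


def gradient_field(gray):
    """Per-pixel gradient magnitude and signed orientation (radians)."""
    gy, gx = np.gradient(np.asarray(gray, dtype=float))
    return np.hypot(gx, gy), np.mod(np.arctan2(gy, gx), 2 * np.pi)


def complexity(gray):
    """Mean gradient magnitude per pixel (the total gradient strength,
    expressed independently of image size)."""
    magnitude, _ = gradient_field(gray)
    return float(magnitude.mean())


def anisotropy(gray, n_bins=16):
    """Variance of the normalized orientation histogram: 0 when gradient
    strength is spread evenly over all orientations."""
    magnitude, angle = gradient_field(gray)
    bins = np.minimum((angle / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    h = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    return float((h / h.sum()).var()) if h.sum() > 0 else 0.0


def aspect_ratio(image):
    """Width divided by height (the direction of the ratio is assumed)."""
    height, width = np.shape(image)[:2]
    return width / height


def thirds_distance(focus, shape):
    """Normalized distance of a focus point (row, col) to the nearest
    third line; 0 means the point lies exactly on a third line."""
    row, col = focus
    height, width = shape[:2]
    d_row = min(abs(row - height / 3), abs(row - 2 * height / 3)) / height
    d_col = min(abs(col - width / 3), abs(col - 2 * width / 3)) / width
    return min(d_row, d_col)


def hsv_means(rgb):
    """Mean hue, saturation, and value of an RGB image with values in [0, 1].

    Hue uses the usual HSV convention (red = 0, green = 1/3, blue = 2/3)
    and is averaged linearly, ignoring its circular nature.
    """
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx, mn = rgb.max(axis=-1), rgb.min(axis=-1)
    delta = mx - mn
    safe_delta = np.where(delta > 0, delta, 1.0)  # avoid division by zero
    hue = np.where(mx == r, np.mod((g - b) / safe_delta, 6),
                   np.where(mx == g, (b - r) / safe_delta + 2,
                            (r - g) / safe_delta + 4)) / 6
    hue = np.where(delta > 0, hue, 0.0)  # grey pixels get hue 0
    sat = np.where(mx > 0, delta / np.where(mx > 0, mx, 1.0), 0.0)
    return float(hue.mean()), float(sat.mean()), float(mx.mean())
```

A striped image concentrates its gradient strength in two orientation bins and therefore scores much higher on this Anisotropy measure than random noise does.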

Clustering of the Paintings and Participants
We clustered the paintings with the k-means method according to their mean subjective rating scores on aesthetics and beauty. Clustering allocated the paintings to three subgroups whose members resembled each other in their rating scores. This number of clusters was judged to be close to optimal based on the elbow criterion, a heuristic that locates the point in a plot of the sum of squared errors against the number of clusters beyond which additional clusters reduce the error only marginally. We used the same method (but based on correlations of subjective ratings with SIPs) to cluster the participants.
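The clustering step can be sketched with a plain NumPy implementation of k-means. This is an illustration under stated assumptions, not the software used in the study: it uses k-means++ seeding, keeps the best of several restarts, and reports the within-cluster sum of squared errors (SSE) that an elbow plot is built from. The names `kmeans` and `sse_curve` are hypothetical; for the paintings, each row of `X` would hold a painting's mean aesthetics and beauty ratings.

```python
import numpy as np


def _kmeans_pp_init(X, k, rng):
    """k-means++ seeding: spread the initial centers apart."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        p = d2 / d2.sum() if d2.sum() > 0 else None  # None -> uniform choice
        centers.append(X[rng.choice(len(X), p=p)])
    return np.array(centers)


def kmeans(X, k, n_init=5, n_iter=100, seed=0):
    """Lloyd's k-means with restarts; returns (labels, centers, sse)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _run in range(n_init):
        centers = _kmeans_pp_init(X, k, rng)
        for _step in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                    else centers[j] for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # Final assignment to the final centers.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        sse = float(dist[np.arange(len(X)), labels].sum())
        if best is None or sse < best[2]:
            best = (labels, centers, sse)
    return best


def sse_curve(X, ks, seed=0):
    """Within-cluster SSE for each k (the input to the elbow criterion)."""
    return {k: kmeans(X, k, seed=seed)[2] for k in ks}
```

Restarting from several seedings and keeping the lowest SSE reduces the risk that a single unlucky initialization inflates the error for one value of k and distorts the elbow plot.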

Analysis of SIPs
Correlation between SIPs. The measured SIPs correlate with each other (see Supplementary Table 1 for a detailed analysis of correlations between SIPs within the JenAesthetics database). For example, there is a substantial negative correlation between Self-Similarity and Anisotropy (r = −.483, p < .001). Bearing this in mind, we performed detailed statistical analyses of the SIPs over paintings and over participants.

Analysis Over Paintings
Correlation of SIPs with year of origin. First, we investigated whether the SIPs of the paintings changed over the years. To this end, we correlated the year of origin of the paintings with the SIPs. We found significant correlations for most of the SIPs, namely for Self-Similarity (Pearson's r = .204, p < .001), Complexity (r = .127, p < .001), Anisotropy (r = .090, p < .001), Aspect Ratio (r = −.182, p < .001), Rule of Thirds (r = −.107, p < .001), Color Saturation (r = −.287, p < .001), and Color Value (r = .425, p < .001). Therefore, image statistics changed over time. More recent oil paintings are more self-similar and more complex. They also possess a higher degree of Anisotropy and differ in color, being less saturated and brighter.
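The correlation analysis above can be illustrated with a small standard-library sketch. It computes Pearson's r and the t statistic (df = n − 2) used to test whether r differs from 0. The function names are hypothetical, and n = 1,614 assumes that all paintings entered each correlation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic (df = n - 2) for testing H0: rho = 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Weakest reported year-of-origin correlation (Anisotropy), n = 1,614 paintings.
print(round(t_from_r(0.090, 1614), 2))  # 3.63
```

Even this weakest correlation gives t ≈ 3.63, which exceeds the two-sided critical value for p = .001 at such large degrees of freedom (about 3.29). With more than 1,600 paintings, small correlations therefore reach significance.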
SIPs in paintings from different art periods. We then analyzed whether paintings from the various art periods differed in their SIPs. The data were entered into a one-way analysis of variance (ANOVA) with art period as between-subject factor. Results revealed a significant effect of art period on all measured SIPs: Self-Similarity, F(10, 1603) = 11.344, p < .001; Complexity, F(10, 1603) = 17.685, p < .001; Anisotropy, F(10, 1603) = 4.596, p < .001; Aspect Ratio, F(10, 1603) = 11.742, p < .001; Rule of Thirds, F(10, 1603) = 7.040, p < .001; Color Hue, F(10, 1603) = 16.319, p < .001; Color Saturation, F(10, 1603) = 20.184, p < .001; and Color Value, F(10, 1603) = 51.913, p < .001. Therefore, all measured SIPs differed across the art periods analyzed (see Supplementary Table 2 for descriptive statistics, Supplementary Table 3 for the results of a multivariable linear regression analysis that related all SIPs, in one model, to art period affiliation, and Supplementary Figure 1 for a plot of the LOESS curve fittings for aesthetics and beauty). In addition, we performed a multinomial multivariable logistic regression analysis with art period as outcome and all SIPs as predictors. Results showed that all SIPs are significant predictors of art period affiliation. For example, an increase in saturation of 0.1 increases the odds of belonging to the Renaissance, relative to the reference art period Impressionism, by 79.8% (see Table 1 for complete results).
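The one-way ANOVA and the interpretation of a multinomial logistic coefficient can be sketched with the standard library. With 11 art periods and 1,614 paintings, the degrees of freedom are 11 − 1 = 10 and 1,614 − 11 = 1,603, matching the reported F(10, 1603). The second helper assumes that the reported 79.8% refers to a change in odds relative to the reference category; under that assumption it corresponds to a coefficient of about 5.87 per unit of saturation (ln 1.798 / 0.1). Both function names are hypothetical, and the coefficient itself is not reported in the text.

```python
import math


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def odds_change(beta, delta):
    """Relative change in the odds (vs. the reference category) implied by a
    multinomial logistic coefficient beta when a predictor increases by delta."""
    return math.exp(beta * delta) - 1


print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```

Because the odds change multiplicatively, exp(β·Δ), a fixed increase in saturation does not translate into a fixed change in probability.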
SIPs in paintings with different subject matters over art periods. Next, we compared the SIPs of landscape and portrait paintings across art periods. We chose landscape and portrait paintings because these subject matters were the most numerous in the JenAesthetics database and were, additionally, common to all art periods.
For both subject matters, all SIPs changed across the art periods, except for Self-Similarity, which did not change in portrait paintings, and Anisotropy and Aspect Ratio, which remained unaltered in landscape paintings (see Supplementary Table 4 for descriptive statistics on landscape and portrait paintings for every art period). To assess the influence of SIPs and subject matter (independent variables), including their interaction, on aesthetics (dependent variable), we fitted a multivariable linear regression model (see Supplementary Table 5 for results and Supplementary Table 6 for descriptive statistics on all subject matters).
Subjective rating scores and year of origin of the paintings. The year of origin of the paintings correlated positively with beauty (r = .168, p < .001), liking of color (r = .054, p < .05), liking of content (r = .306, p < .001), liking of composition (r = .092, p < .001), familiarity with the artist (r = .280, p < .001), and familiarity with the painting (r = .050, p < .05). Interestingly, ratings of aesthetics showed no correlation (r = −.011, p = .673). We used a Fisher transformation to test whether the correlation between aesthetics and year of origin differed significantly from the correlation between beauty and year of origin. To this end, we converted Pearson's r to Fisher's z and computed the 99% confidence interval. Results showed a significant difference between the two correlations. This is particularly striking because ratings of beauty and aesthetics (see Introduction for definitions of the terms) are highly correlated (r = .772, p < .001). Therefore, participants personally preferred newer paintings, whereas the ascribed ("objective") artistic value remained stable over time.
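The Fisher-transformation step can be sketched as follows. This minimal example builds a 99% confidence interval for each correlation on the z scale (z = atanh r, standard error 1/√(n − 3)) and transforms it back to r. It assumes n = 1,614 paintings for both correlations, and the name `fisher_ci` is hypothetical. Note that it treats the two correlations separately. Because both share the year-of-origin variable and were computed on the same paintings, a test for dependent correlations (e.g., Steiger's test) would take that dependence into account. The text does not specify which variant was used.

```python
import math


def fisher_ci(r, n, z_crit=2.5758):
    """Confidence interval for Pearson's r via Fisher's z = atanh(r).
    z_crit = 2.5758 gives a two-sided 99% interval."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


n = 1614  # assumption: all paintings entered both correlations
beauty_ci = fisher_ci(0.168, n)
aesthetics_ci = fisher_ci(-0.011, n)
print(beauty_ci, aesthetics_ci)
print("intervals overlap:", beauty_ci[0] <= aesthetics_ci[1])  # False
```

The lower bound for beauty (about .105) lies above the upper bound for aesthetics (about .053), which is consistent with the reported difference between the two correlations.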
General analysis of subjective rating scores in different art periods. Next, we investigated general differences in subjective rating scores between art periods. Although art period is obviously related to year of origin, participants might be less familiar with certain art periods than with others and might therefore systematically prefer paintings from particular art periods, irrespective of their year of origin. The subjective rating scores were entered into a one-way ANOVA with art period as between-subject factor. Results revealed a significant effect of art period on scores for aesthetics, F(11, 1605) = 18.249, p < .001, as well as on scores for beauty, F(11, 1605) = 17.155, p < .001. Thus, although we found no general effect of year of origin on aesthetics, appreciation still differed between art periods (see Figure 2 for mean scores for the respective art periods).
Subjective rating scores on preferences for color, composition, and content correlated with several SIPs (see Table 2 for details).
Subjective rating scores and SIPs in paintings from different art periods. Next, we investigated correlations between subjective rating scores and SIPs for different art periods. Here, we focused on the rating scores for aesthetics and beauty and investigated only art periods with more than 30 paintings in the database (see Table 3). Interestingly, the number of significant correlations between rating scores and SIPs differed between art periods. There were no correlations with the SIPs for Mannerism, Classicism, Romanticism, Symbolism, and Expressionism, and only a few correlations for Renaissance, Realism, Impressionism, and Post-Impressionism. In contrast, Baroque and Rococo showed correlations of subjective rating scores with several SIPs.
These results provide evidence that our observers' evaluation of artworks from certain art periods (Baroque, Rococo) correlates with particular SIP patterns, whereas this is not the case for artworks from other art periods (Mannerism, Classicism, Romanticism, Symbolism, and Expressionism).
Subjective rating scores and SIPs in paintings with different subject matters. We investigated correlations between subjective rating scores and SIPs for subject matters comprising more than 30 paintings. Again, we focused on rating scores for aesthetics and beauty. For paintings of landscapes, flowers or vegetation, seascapes, and portraits with many persons, there were no correlations of subjective rating scores with the SIPs, whereas paintings with other subject matters showed at least some correlations. In particular, the hedonic evaluation of paintings of buildings seems to be connected with SIPs (especially with Self-Similarity, Color Saturation, and Color Value; see Table 4 for detailed results). We conclude that hedonic ratings correlate with SIPs only for artworks of certain subject matters.
Clustering based on subjective rating scores on aesthetics and beauty. To gain insight into the evaluations of artistic value (aesthetics) and subjective liking (beauty) and their relation to SIPs, we clustered the paintings (for a description of the method, see Methods section). We calculated the standard deviation of the mean values of each image from the mean of each cluster for two to six clusters. The elbow criterion indicated that three is the optimal number of clusters. The clusters consisted of 750, 481, and 383 paintings, respectively. Whereas Cluster 1 (aesthetics: M = .56, beauty: M = .50) and Cluster 3 (aesthetics: M = .44, beauty: M = .37) had significantly higher ratings on aesthetics than on beauty, Cluster 2 (aesthetics: M = .67, beauty: M = .66) consisted of paintings with similar mean ratings on aesthetics and beauty. Next, we asked whether the three clusters had different mean values of SIPs (see Table 5 for results). We found significant differences of mean values between Clusters 1 and 2 for Anisotropy (two-samples t test:  paintings in Cluster 2 (for which subjective rating scores of aesthetics and beauty had a similar mean value) differ in their mean value for several SIPs. In addition, we performed a stepwise (backward elimination) multivariable linear regression analysis over all paintings (from all clusters). The difference aesthetics minus beauty was the dependent variable, and standardized (z-transformed) SIP values were the independent variables. The p values were corrected with the Holm-Bonferroni method. Four of the SIPs predicted the difference between the subjective ratings, F(4, 1609) = 67.410, p < .001: Color Value (β = −.238, t = −9.812, p < .001), Aspect Ratio (β = .191, t = 7.958, p < .001), Color Hue (β = −.092, t = −3.931, p < .001), and Anisotropy (β = −.072, t = −3.080, p = .01).
We conclude that differences between the evaluation of aesthetics and beauty, that is, between the ascribed artistic value and the subjective liking, might depend on SIPs.
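The Holm-Bonferroni correction mentioned above can be sketched as a generic step-down procedure. This helper (the name `holm_reject` is hypothetical) only returns which hypotheses are rejected at a family-wise level alpha. It does not reproduce the study's stepwise regression.

```python
def holm_reject(pvals, alpha=0.05):
    """Holm's step-down procedure: return a list of booleans,
    True where the corresponding null hypothesis is rejected."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the (rank+1)-th smallest p value with alpha / (m - rank).
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p values are retained
    return reject


print(holm_reject([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```

Holm's procedure controls the family-wise error rate like the plain Bonferroni correction. It is uniformly more powerful, however, because only the smallest p value faces the full α/m threshold.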
After clustering, 29.8% of the paintings were assigned to Cluster 2. However, this percentage differed between art periods. For Renaissance (C2 = 7.2%), Mannerism (C2 = 5.5%), and Expressionism (C2 = 3.1%), there were only few paintings in Cluster 2, whereas for Romanticism (C2 = 43.7%) and Impressionism (C2 = 47.1%), Cluster 2 comprised numerous paintings. Thus, participants ascribed similar artistic value and subjective liking to romantic and impressionist paintings. In contrast, participants ascribed artistic value to paintings from Renaissance, Mannerism, and Expressionism although they did not like these paintings subjectively to the same degree (see Figure 3(a) for detailed results).

Analysis Over Participants
General analysis. For the analysis over participants, we focused solely on the subjective rating scores on beauty (reflecting the individual liking of the images), because we were interested in the participants' personal taste and not in what they considered to be generally aesthetic. Overall, the mean subjective rating score on beauty was M = .416 (SD = .338). Not surprisingly, participants who reported interest in the arts gave higher ratings than participants who did not (interested: M = .431, SD = .081; noninterested: M = .386, SD = .015; two-samples t test: t(116) = 2.447, p < .05).
Clustering over SIPs. For every single participant, we calculated Pearson's correlation coefficients between the subjective rating scores on beauty and the SIPs of the rated paintings (see Supplementary Table 7 for a complete analysis). Because the correlations were heterogeneous among the participants, that is, groups of participants showed particular patterns of correlations with certain SIPs, we divided the participants with the k-means clustering method according to their correlation patterns. For the clustering, we calculated the standard deviation of the mean values of each participant from the mean of each cluster for two to seven clusters. Then, we calculated the sum of squares for two to seven clusters (SS2 = .066, SS3 = .054, SS4 = .052, SS5 = .046, SS6 = .042, SS7 = .039). The elbow criterion indicated that three clusters are optimal. These three clusters consisted of 29, 37, and 65 participants, respectively.
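The elbow criterion can be made explicit with the reported sums of squares. The text does not state how the elbow was identified. This sketch assumes one common formalization: choose the k at which the decrease in SS slows down the most (the largest second difference). The helper name `elbow` is hypothetical.

```python
def elbow(ks, ss):
    """Return the k at which the decrease in SS slows down the most
    (largest second difference of the SS curve)."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(ss) - 1):
        bend = (ss[i - 1] - ss[i]) - (ss[i] - ss[i + 1])
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k


ks = [2, 3, 4, 5, 6, 7]
ss = [.066, .054, .052, .046, .042, .039]  # values reported above
print(elbow(ks, ss))  # 3
```

Going from two to three clusters reduces SS by .012, whereas going from three to four reduces it by only .002. The resulting bend of .010 is the largest along the curve, which agrees with the choice of three participant clusters.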
To further justify our clustering, we calculated the mean correlations of subjective rating scores on beauty with SIPs for each cluster. Participants in two of the three clusters (Clusters 1 and 2) showed a strong relation between subjective rating scores and SIPs, whereas SIPs had only a small effect on the subjective rating scores of participants in Cluster 3 (see Figure 4 for results).

Statistical Image Properties
One of the central questions in experimental aesthetics is whether there is a "universal beauty" in artworks, natural scenes, and faces. In recent years, this question has been studied with novel computational methods that allow specific image properties to be measured. In this context, the comparison between aesthetic and ordinary (nonaesthetic) images is of particular interest. For example, Braun et al. (2014) investigated SIPs in different categories of images. They showed that artworks, a category of images created to be aesthetic, exhibit a relatively high degree of Self-Similarity, a low degree of Anisotropy, and an intermediate degree of Complexity. In these statistical measures, art paintings differ on average from other image categories such as photographs of natural scenes, urban scenes, faces, and simple objects, as well as advertisements. Here, we investigated the SIPs of oil paintings. We asked whether SIPs differed between art periods and between depicted subject matters in this set of paintings and therefore investigated these subgroups of paintings separately. Our results show that, for each SIP, there are significant differences between art periods (see SIPs in paintings from different art periods section). Thus, we did not obtain evidence that any SIP is stable over all art periods investigated. However, in a more detailed analysis, we found that Anisotropy did not differ significantly across art periods in landscape paintings, whereas it did differ across art periods in portrait paintings. Conceivably, Anisotropy is uniformly high in landscape paintings because horizontal orientations (e.g., the horizon) and vertical orientations (e.g., trees) predominate in real natural scenes. The opposite pattern was observed for PHOG Self-Similarity, which differed across art periods in landscape paintings but not in portrait paintings (see SIPs in paintings with different subject matters over art periods section).
Our results are in line with findings by Redies et al. (2007), who showed that the Fourier slopes of grayscale art portraits resemble those of natural scenes rather than those of face photographs. They concluded that artists do not portray faces by merely copying their real-world counterparts but use specific, divergent image statistics. In the present study, we provide another example of the use of particular image properties (i.e., PHOG Self-Similarity) in artistic renderings of faces.
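The Fourier slope mentioned above is, in essence, the slope of a straight-line fit to the radially averaged power spectrum in log-log coordinates. The sketch below illustrates that idea on a tiny grayscale image using a naive discrete Fourier transform, so it needs only the standard library. Practical analyses typically use larger images, a fast Fourier transform, and a restricted fitting range; the exact procedure of Redies et al. (2007) is not reproduced here, and the function names are hypothetical.

```python
import cmath
import math


def power_spectrum(image):
    """Power |F(u, v)|^2 of a small square grayscale image via a naive 2D DFT.
    O(N^4): fine for toy inputs; use an FFT for real images."""
    n = len(image)
    spec = [[0.0] * n for _ in range(n)]
    for v in range(n):          # vertical frequency index
        for u in range(n):      # horizontal frequency index
            acc = 0j
            for y in range(n):
                for x in range(n):
                    acc += image[y][x] * cmath.exp(-2j * math.pi * (u * x + v * y) / n)
            spec[v][u] = abs(acc) ** 2
    return spec


def radial_average(spec):
    """Mean power on integer-radius rings, skipping DC, up to the Nyquist radius."""
    n = len(spec)
    sums, counts = {}, {}
    for v in range(n):
        for u in range(n):
            fx = u if u <= n // 2 else u - n  # map index to signed frequency
            fy = v if v <= n // 2 else v - n
            r = round(math.hypot(fx, fy))
            if 1 <= r <= n // 2:
                sums[r] = sums.get(r, 0.0) + spec[v][u]
                counts[r] = counts.get(r, 0) + 1
    radii = sorted(sums)
    return radii, [sums[r] / counts[r] for r in radii]


def loglog_slope(radii, powers):
    """Least-squares slope of log(power) against log(radius)."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(p) for p in powers]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


def fourier_slope(image):
    radii, powers = radial_average(power_spectrum(image))
    return loglog_slope(radii, powers)


ramp = [[x + y for x in range(8)] for y in range(8)]  # hypothetical 8x8 test image
print(round(fourier_slope(ramp), 2))
```

Natural scenes are often reported to have power-spectrum slopes near -2 in this representation; the value obtained for a given image depends on image size, preprocessing, and the frequency range used for the fit.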
Because Fourier slope and Self-Similarity are correlated in artworks (Braun et al., 2014), our findings extend those of Redies et al. (2007): artists not only portray faces with statistics that diverge from those of real-world faces but also do so with statistics that are relatively stable across art periods. Overall, our results suggest that, for some image properties, artists from all art periods endowed oil paintings of particular subject matters with similar values.
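In the PHOG approach, Self-Similarity is, roughly, a measure of how closely the orientation histograms of image sub-regions match the histogram of the whole image. The sketch below shows only that comparison step, using histogram intersection, and assumes the orientation histograms have already been computed. The single subdivision level, the use of the median as the aggregate, and the function names are assumptions rather than the published implementation.

```python
from statistics import median


def normalize(hist):
    """Scale a histogram to unit sum (an all-zero histogram stays all zero)."""
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * len(hist)


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima of two normalized histograms; 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(normalize(h1), normalize(h2)))


def self_similarity(whole_hist, sub_hists):
    """Median intersection between each sub-region histogram and the whole-image one."""
    return median([histogram_intersection(whole_hist, h) for h in sub_hists])


# Hypothetical 4-bin orientation histograms.
whole = [4, 4, 4, 4]
similar_quadrants = [[1, 1, 1, 1], [1, 1, 1, 1], [2, 2, 2, 2], [4, 0, 0, 0]]
dissimilar_quadrants = [[4, 0, 0, 0], [0, 4, 0, 0], [0, 0, 4, 0], [0, 0, 0, 4]]
print(self_similarity(whole, similar_quadrants))     # 1.0
print(self_similarity(whole, dissimilar_quadrants))  # 0.25
```

Normalizing each histogram before the intersection makes the score depend on the distribution of orientations rather than on overall contrast, which differs between sub-regions.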

Subjective Rating Scores
The terminology related to aesthetic experience has been widely discussed in aesthetics research. Here, we distinguish between aesthetics and beauty ratings. Participants were instructed that aesthetics ratings should reflect the ("more objective") artistic value of the respective image, whereas beauty ratings should reflect their subjective liking (see Introduction and Gaining of Subjective Rating Scores sections). Rating scores differed between art periods, with classicist paintings rated as most aesthetic and impressionist paintings rated as most beautiful (see Figure 2). Overall, participants subjectively preferred more recent paintings, whereas their ratings of ascribed artistic value (i.e., aesthetics) were relatively stable over the centuries (see Subjective rating scores and year of origins of the paintings section). We therefore propose that, on the one hand, contemporary observers prefer more recent oil paintings, possibly because they are more familiar with them. On the other hand, the observers appreciated the artistic expertise of painters from the different periods to a similar degree. This is in line with the notion that, on average, the skills of artists have remained more or less stable over the centuries. Unlike artistic skill, the taste of observers changes over time; contemporary participants therefore tend to prefer more modern paintings. This preference might be based on a mere exposure effect for more recent paintings or, alternatively, on a shared preference for similar semantic content. It has previously been shown that visual preferences can be based on the semantic content of stimuli and that shared semantic interpretations can lead to shared preferences (Vessel & Rubin, 2010).
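One way to quantify a trend such as "more recent paintings are liked more" is a rank correlation between the year of origin and the mean rating of each painting. This section does not state which statistic was used, so the following Spearman sketch is an assumption, as are the function names and the toy data; it is not a reproduction of the reported analysis.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean rank of this tie group
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical years of origin and mean beauty ratings for six paintings.
years = [1520, 1610, 1790, 1830, 1875, 1890]
beauty = [3.1, 3.4, 3.3, 3.9, 4.6, 4.4]
print(round(spearman_rho(years, beauty), 2))  # 0.89
```

A rank-based coefficient only assumes a monotonic relation, which suits a loose "later is liked more" trend better than a strictly linear model would.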
Interestingly, ratings on aesthetics and beauty were quite similar for impressionist paintings; that is, impressionist paintings were valued for their artistic (more objective) merit to the same degree as they were liked subjectively. Again, this result may be explained by the observers' greater familiarity with impressionist paintings compared with paintings from other art periods.
With regard to subject matter, ratings on aesthetics and beauty were similar for paintings of large-vista scenes (e.g., landscapes, seascapes, urban scenes, and buildings). For some subject matters, however, aesthetics scores were higher than subjective liking (beauty) scores, especially for portraits and animal paintings. This difference might be explained by the content of the images. In portrait paintings, for example, participants might appreciate the artistic value but not like the image subjectively, possibly because their liking or disliking of the depicted person influences this rating. Last but not least, no subset of images was rated as highly beautiful but not as aesthetic.

Subjective Rating Scores and SIPs
It has been shown that specific SIPs are related to the hedonic value of abstract art paintings (Mallon et al., 2014). In the present study of oil paintings, we show that ratings of artistic value correlate slightly more strongly with SIPs than ratings of subjective liking (see Table 1). It is not surprising that the more objective aesthetics ratings correlate more strongly with objective image properties such as SIPs.
In addition, we clustered paintings according to their ratings on aesthetics and beauty. We found that differences in rating scores correlated with specific SIPs, especially with Anisotropy, Rule of Thirds, Color Saturation, and Color Value (see Clustering based on subjective rating scores on aesthetics and beauty section). This result points to a relationship between objective properties (SIPs) and the subjective evaluation of the images. However, these differences might also be explained by other factors (e.g., preferences for specific contents or styles that coincide with certain SIPs in the paintings). Moreover, this finding does not hold for all art periods: for Mannerism, Romanticism, and Symbolism, hedonic evaluations were not correlated with SIPs. Hence, the subjective liking of paintings from these periods is presumably driven by other factors.
We observed similar differences for subject matters (see Table 3). Here, we found that the ratings of each subject matter correlated, at least weakly, with specific SIPs. Ratings of buildings in particular showed relatively high correlations with Self-Similarity, Color Saturation, and Color Value.
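The clustering method is not restated in this section, so the following is only an illustration of how paintings could be grouped by their mean aesthetics and beauty scores: plain k-means (Lloyd's algorithm) on two-dimensional points. The choice of k-means, the deterministic initialization with the first k points, the function name, and the toy ratings are assumptions; the study's own procedure may differ.

```python
import math


def kmeans(points, k, n_iter=100, init=None):
    """Lloyd's k-means on 2D points such as (mean aesthetics, mean beauty).
    init: optional list of k starting centroids; defaults to the first k points."""
    centroids = [tuple(c) for c in (init if init is not None else points[:k])]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members)
                                           for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return labels, centroids


# Hypothetical (aesthetics, beauty) means for five paintings.
paintings = [(6.1, 5.8), (5.9, 6.2), (3.0, 2.5), (3.2, 2.9), (5.5, 3.1)]
labels, centroids = kmeans(paintings, k=3)
print(labels)
```

Once paintings carry cluster labels, the SIP values of each cluster can be compared, which is the kind of comparison the paragraph above reports.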

Analysis Over Participants
In addition to our analysis of paintings, we also searched for similarities in the evaluations by the participants. In previous research, ratings of art have been linked to expertise (Leder et al., 2013), personality traits (Lyssenko et al., 2016), and other characteristics of participants. Furthermore, it has been demonstrated that individuals exhibit stable patterns of preference for fractal-like characteristics across different image types (Spehar, Walker, & Taylor, 2016). In the present study, we focused on preferences for SIPs in groups of participants. We analyzed three clusters of participants. Membership in a given cluster reflects a specific rating pattern that, in turn, correlates with preferences for images with specific SIPs. Two of the clusters showed multiple correlations of beauty ratings with particular SIPs (Figure 4). In Clusters 1 and 2, the Self-Similarity, Complexity, Anisotropy, Color Saturation, and Color Value of the paintings were related to subjective preference, whereas in Cluster 3, the SIPs were largely uncorrelated with preference (see Clustering over SIPs section). Hence, about two thirds of the participants were (perhaps unconsciously) sensitive to image statistics. A possible reason for this finding is that paintings of similar content or art style have similar image statistics, so a coherent taste may coincide with a preference for similar image statistics. The third group of participants, by contrast, showed very few correlations of subjective ratings with SIPs. These participants may have had a rather incoherent taste, or a taste for image features or statistical properties that were not measured in the present study. Alternatively, their preferences might be driven more by cognitive than by sensory factors; that is, these participants may focus more on image content than on artistic composition.
It will be of interest to study the differences between such groups of participants in more detail in future research.

Limitations
In the present study, we used images of oil paintings as stimuli; that is, we did not show real (original) artworks but reproductions of them. This difference may affect hedonic ratings (Brieber, Leder, & Nadal, 2015). Furthermore, the JenAesthetics database consists of a preselected set of high-quality oil paintings. Hence, the database includes a large proportion of images of rather similar quality. Differences in the aesthetic ratings of these images may therefore be relatively small, and the aesthetic ratings may be rather stable across art styles and subject matters. In addition, the analysis of ratings on aesthetics and beauty depends strongly on the participants' proper understanding of these terms. If participants misunderstood the terms, the validity of our conclusions would be compromised.

Conclusion
The analysis did not reveal evidence for universal image properties that are systematically linked to higher aesthetic value in our sample of high-quality paintings. Instead, paintings from each art period show specific patterns of SIPs. One exception is that art portraits have similar values of Self-Similarity across art periods. In an analysis of subjective rating scores, we found differences between ratings of artistic value (aesthetics) and individual liking (beauty). These differences in ratings were linked to SIPs, to the art period, and to the time of origin of the paintings. Last but not least, we showed that groups of participants varied systematically in their hedonic preferences.
In summary, our study provides evidence that, to some extent, SIPs vary between art periods and subject matters and, in addition, that they correlate with the subjective evaluation of paintings in a majority of the participants.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.