Cross-cultural variation of memory colors of familiar objects

The effect of cross-regional or cross-cultural differences on color appearance ratings and memory colors of familiar objects was investigated in seven different countries/regions: Belgium, Hungary, Brazil, Colombia, Taiwan, China and Iran. In each region the familiar objects were presented on a calibrated monitor in over 100 different colors to a test panel of observers who were asked to rate the similarity of the presented object color with respect to what they thought the object looks like in reality (memory color). For each object and region the mean observer ratings were modeled by a bivariate Gaussian function. A statistical analysis showed significant (p < 0.001) differences between the region average observers and the global average observer obtained by pooling the data from all regions. However, the effect size of geographical region or culture was found to be small. In fact, the differences between the region average observers and the global average observer were found to be of the same magnitude as, or smaller than, the typical within-region inter-observer variability. Thus, although statistical differences in color appearance ratings and memory colors between regions were found, regional impact is not likely to be of practical importance.

©2014 Optical Society of America
OCIS codes: (330.1690) Color; (330.1720) Color vision; (330.5510) Psychophysics; (330.4060) Vision modeling; (330.5020) Perception psychology.

#221779 $15.00 USD Received 14 Oct 2014; revised 3 Dec 2014; accepted 3 Dec 2014; published 23 Dec 2014 (C) 2014 OSA 29 Dec 2014 | Vol. 22, No. 26 | DOI:10.1364/OE.22.032308 | OPTICS EXPRESS 32308

References and links
1. E. Hering, Grundzüge der Lehre vom Lichtsinn (Springer-Verlag, 1920).
2. C. J. Bartleson, “Memory colors of familiar objects,” J. Opt. Soc. Am. 50(1), 73–77 (1960).
3. S. M. Newhall, R. W. Burnham, and J. R. Clark, “Comparison of successive with simultaneous color matching,” J. Opt. Soc. Am. 47(1), 43–54 (1957).
4. K. A. G. Smet, W. R. Ryckaert, M. R. Pointer, G. Deconinck, and P. Hanselaer, “Color appearance rating of familiar real objects,” Color Res. Appl. 36(3), 192–200 (2011).
5. M. Vurro, Y. Z. Ling, and A. C. Hurlbert, “Memory color of natural familiar objects: Effects of surface texture and 3-D shape,” J. Vis. 13(7), 20 (2013).
6. P. Siple and R. M. Springer, “Memory and preference for the colors of objects,” Percept. Psychophys. 34(4), 363–370 (1983).
7. J. Pérez-Carpinell, M. D. de Fez, R. Baldoví, and J. C. Soriano, “Familiar objects and memory color,” Color Res. Appl. 23, 416–427 (1998).
8. S. N. Yendrikhovskij, F. J. J. Blommaert, and H. de Ridder, “Representation of memory prototype for an object color,” Color Res. Appl. 24(6), 393–410 (1999).
9. C. J. Bartleson, “Color in memory in relation to photographic reproduction,” Photon. Sci. Eng. 5, 327–331 (1961).
10. C. L. Sanders, “Color preferences for natural objects,” J. Illum. Eng. 54, 452–456 (1959).
11. D. B. Judd, “A flattery index for artificial illuminants,” J. Illum. Eng. 62, 593–598 (1967).
12. W. A. Thornton, “A Validation of the Color-Preference Index,” J. Illum. Eng. 4(1), 48–52 (1974).
13. C. J. Bartleson and C. P. Bray, “On the preferred reproduction of flesh, blue-sky, and green-grass colors,” Photon. Sci. Eng. 6, 19–25 (1962).
14. J. J. M. Granzier and K. R. Gegenfurtner, “Effects of memory color on color constancy for unknown colored objects,” J. Illum. Eng. 3, 190–215 (2012).
15. Y. Ling, “The color perception of natural objects: familiarity, constancy and memory,” in School of Biology and Psychology (University of Newcastle, Newcastle upon Tyne: 2005), p. 173.
16. E. Kanematsu and D. H. Brainard, “No measured effect of a familiar contextual object on color constancy,” Color Res. Appl. 39, 347–359 (2013).
17. A. C. Hurlbert and Y. Ling, “If it's a banana, it must be yellow: The role of memory colors in color constancy,” J. Vis. 5(8), 787 (2005).
18. T. Hansen, M. Olkkonen, S. Walter, and K. R. Gegenfurtner, “Memory modulates color appearance,” Nat. Neurosci. 9(11), 1367–1368 (2006).
19. J. S. Bruner, L. Postman, and J. Rodrigues, “Expectation and the perception of color,” Am. J. Psychol. 64(2), 216–227 (1951).
20. K. Duncker, “The influence of past experience upon perceptual properties,” Am. J. Psychol. 52(2), 255–265 (1939).
21. C. J. Bartleson, “Color in memory in relation to photographic reproduction,” Photograph. Sci. Eng. 5, 327–331 (1961).
22. S. N. Yendrikhovskij, F. J. J. Blommaert, and H. de Ridder, “Color reproduction and the naturalness constraint,” Color Res. Appl. 24(1), 52–67 (1999).
23. P. Bodrogi and T. Tarczali, “Color memory for various sky, skin, and plant colors: Effect of the image context,” Color Res. Appl. 26(4), 278–289 (2001).
24. P. Bodrogi and T. Tarczali, “Investigation of Color Memory,” in Color Image Science: Exploiting Digital Media, L. W. MacDonald and M. R. Luo, eds. (John Wiley & Sons Limited, 2002), pp. 23–48.
25. H. Zeng and R. Luo, “Modeling memory color region for preference color reproduction,” in SPIE: Color Imaging XV: Displaying, Processing, Hardcopy, and Applications (2010), pp. 752808–752808–752811.
26. T. Yano and K. Hashimoto, “Preference Index for Japanese Complexion Color under Illumination,” J. Light Vis. Env. 22, 54 (1998).
27. C. Boust, H. Brettel, F. Viénot, G. Alquié, and S. Berche, “Color enhancement of digital images by experts and preference judgments by observers,” J. Imaging Sci. 50(1), 1–11 (2006).
28. S. Xue, M. Tan, A. McNamara, J. Dorsey, and H. Rushmeier, “Exploring the use of memory colors for image enhancement,” in SPIE: Human Vision and Electronic Imaging XIX (2014), pp. 901411–901411–901410.
29. C. L. Sanders, “Assessment of color rendition under an illuminant using color tolerances for natural objects,” J. Illum. Eng. 54, 640–646 (1959).
30. K. A. G. Smet, W. R. Ryckaert, M. R. Pointer, G. Deconinck, and P. Hanselaer, “A memory color quality metric for white light sources,” Energy Build. 49, 216–225 (2012).
31. T. Tarczali, D.-S. Park, P. Bodrogi, and C. Y. Kim, “Long-term memory colors of Korean and Hungarian observers,” Color Res. Appl. 31(3), 176–183 (2006).
32. S. Fernandez, M. D. Fairchild, and K. Braun, “Analysis of observer and cultural variability while generating “preferred” color reproductions of pictorial images,” J. Imaging Sci. Technol. 49, 96–104 (2005).
33. C. Sik-Lányi, “Styles or cultural background does influence the colors of virtual reality games?” Acta Polytechnica Hungarica 11, 97–119 (2014).
34. L.-C. Ou, M. Ronnier Luo, P.-L. Sun, N.-C. Hu, H.-S. Chen, S.-S. Guan, A. Woodcock, J. L. Caivano, R. Huertas, A. Treméau, M. Billger, H. Izadan, and K. Richter, “A cross-cultural comparison of color emotion for two-color combinations,” Color Res. Appl. 37(1), 23–43 (2012).
35. X.-P. Gao, J. H. Xin, T. Sato, A. Hansuebsai, M. Scalzo, K. Kajiwara, S.-S. Guan, J. Valldeperas, M. J. Lis, and M. Billger, “Analysis of cross-cultural color emotion,” Color Res. Appl. 32(3), 223–229 (2007).
36. M. Saito, “Comparative studies on color preference in Japan and other Asian regions, with special emphasis on the preference for white,” Color Res. Appl. 21(1), 35–49 (1996).
37. A. Choungourian, “Color preferences and cultural variation,” Percept. Mot. Skills 26(3), 1203–1206 (1968).
38. S. Shoyama, Y. Tochihara, and J. Kim, “Japanese and Korean ideas about clothing colors for elderly people: Intercountry and intergenerational differences,” Color Res. Appl. 28(2), 139–150 (2003).
39. CIE16x-2004, “A review of chromatic adaptation transforms,” (CIE, Vienna, 2004).
40. M. Melgosa, P. A. García, L. Gómez-Robledo, R. Shamey, D. Hinks, G. Cui, and M. R. Luo, “Notes on the application of the standardized residual sum of squares index for the assessment of intra- and inter-observer variability in color-difference experiments,” J. Opt. Soc. Am. A 28(5), 949–953 (2011).
41. H. Wang, G. Cui, M. R. Luo, and H. Xu, “Evaluation of color-difference formulae for different color-difference magnitudes,” Color Res. Appl. 37(5), 316–325 (2012).
42. P. E. Shrout and J. L. Fleiss, “Intraclass correlations: Uses in assessing rater reliability,” Psychol. Bull. 86(2), 420–428 (1979).
43. H. Motulsky and A. Christopoulos, Fitting Models to Biological Data using Linear and Nonlinear Regression: A Practical Guide to Curve Fitting (Oxford University, 2004).


Introduction
The concept of memory color refers to the color associated with a familiar object in (long-term) memory; or, as Hering stated in the late 19th century, "the color in which we have most consistently seen the external object" and which is "impressed indelibly on our memory" [1]. It should be distinguished from color memory, which is the ability to recollect colors in general.
Memory colors are, however, not perfect mental representations of the original object colors. Bartleson [2] and Newhall, Burnham and Clark [3] found that saturation and brightness tended to increase in memory colors and that, for some objects, hue tended to shift towards the dominant hue within the object. Smet, Ryckaert, Pointer, Deconinck and Hanselaer [4], while investigating color appearance tolerances for familiar objects, reported similar increases in saturation for memory colors and shifts towards the dominant hue for most familiar objects. Vurro, Ling and Hurlbert [5] also found hue shifts in memory colors of natural objects, but these were not systematically towards the dominant hue of the object. They also found that hue shifts were reduced by increasing the naturalness of the stimuli. Siple and Springer [6] confirmed the tendency for saturation increase, but reported quite accurate agreement for brightness and hue. In a study by Pérez-Carpinell, de Fez, Baldoví and Soriano [7], memory saturation only increased for high purity objects, while it decreased or remained the same for midrange or low purity objects. They also reported unsystematic hue shifts specific to the familiar object investigated. Memory color saturation was also higher for the familiar object (a yellow banana) in the study by Yendrikhovskij, Blommaert and de Ridder [8].
Studies on preferred object colors have also reported saturation increases with respect to the actual object colors [6,9–12]. The preferred color of an object is not necessarily identical to its memory color [13], although Siple and Springer [6] could not identify any significant differences between memory and preferred colors for food objects.
Memory and preferred colors have long been of interest to many different areas of color research. They have been investigated as a possible mechanism to improve the color constancy of other objects, as they provide cues to the visual system to help estimate the illumination. Although Granzier and Gegenfurtner [14] reported a small improvement, neither Ling [15] nor Kanematsu and Brainard [16] identified such an effect. Hurlbert and Ling [17], as well as others [18–20], did however report an influence of the memory color on the perceived color of the familiar object itself, consistent with Hering's statement that "All objects that are already known to us from experience, or that we regard as familiar by their color, we see through the spectacles of memory color" [1].
When using memory colors as a reference, an important question to answer is: "Are memory colors geographically dependent?" Although quite a few studies have investigated cross-regional or cross-cultural influences on color perception or color preference [31–38], the number directly related to memory colors is rather limited. In fact, only one [31] was found in the literature, and it investigated memory colors for only two regions: Central-Europe (Hungary) and South-East Asia (Korea). Statistically significant differences were found for many of the memory colors.
The current study is an attempt to contribute to the limited literature on the subject of cross-regional variation of memory colors. The memory colors of a set of 11 familiar objects, and the observer response to any color deviation from them, were determined and analyzed for test subjects from seven different countries/regions.

Methods
An international collaboration was set up to study cross-regional (or cross-cultural) variation of memory colors. Seven different laboratories, located in Belgium, Hungary, Brazil (State of São Paulo), Colombia (Cundinamarca), China (Shanghai), Taiwan and Iran respectively and covering a large portion of the globe, participated in this study. Eleven familiar objects covering the entire hue circle were selected. Each laboratory determined the memory colors, and the response to a deviation therefrom, using an identical experimental setup and procedure. The experiments were performed in each laboratory on a carefully calibrated monitor.
The following sections describe the choice of familiar objects, the observer panels and the experimental setup and procedure in more detail.
The familiar objects were specifically chosen to cover the entire hue circle and because of their familiarity across cultures. It should be noted, however, that a 'smurf®' and 'dried lavender' were found to be unfamiliar objects by the test subjects in Iran (the lab in Iran only joined the collaboration at a later stage); hence no Iranian memory colors could be determined for these objects.

Test panels
Only color normal test subjects were included in this study. Color deficiency was examined using either Ishihara plates, the Farnsworth-Munsell 100 Hue test or the Farnsworth D15 test, following the guidelines specific to each test. In addition to the color normality requirement, test subjects had to be familiar with the object (color) they would be presented with. The goal was to gather visual data from a test panel of 15 observers or more and with an approximate 50/50 male-to-female ratio. Note that the latter varied substantially across laboratories. A total of 280 unique observers participated globally: 28 in Belgium, 14 in Hungary, 20 in Brazil, 99 in Colombia, 19 in Taiwan, 42 in China and 71 in Iran.
The average and minimum number of unique participants per object, the number of total unique observers, the average male-to-female ratio and average age of the test panels for each lab are summarized in Table 1.
The R(u'10, v'10) function provides a convenient description of the observer response to a deviation from the memory color.
The subscript '10' denotes the use of the CIE 1964 (10°) observer in the calculation of the chromaticity coordinates.

Experimental setup
Smet, Ryckaert, Pointer, Deconinck and Hanselaer [4] performed color appearance rating experiments using real objects presented in a special viewing booth. The color of a familiar object was changed by illuminating it with various settings of RGBY LEDs. The design of the viewing booth masked all cues to the color of the illumination, providing the illusion that the familiar object itself changed color.
However, the construction of such a viewing booth at several locations around the world would be costly and impractical. Therefore, in this study the familiar objects were presented on a carefully calibrated monitor.
Experiments were performed in a fully darkened room with the monitor as the only source of light. Observers were seated approximately 80 cm from the monitor. Stimuli were presented using a software package written especially for this study, composed of a monitor calibration program and a stimulus presentation program. The stimuli were displayed at the center of the monitor and surrounded by a white background to ensure a constant adaptation state to the monitor white point. The experiment software was sent to each laboratory along with specific instructions to set up the experiment using their own monitor. Care was taken to keep the experimental conditions across the different laboratories as identical as possible.

Monitor calibration
The monitor white point was set as close as possible to a D65 chromaticity ((u'10, v'10) = (0.1979, 0.4695)) at a luminance of Y10 = 200 cd/m². During monitor calibration, a set of RGB stimuli of approximately the same size as the familiar object stimuli was presented in the center of the screen, at the location of the familiar object stimuli. After spectral measurement and calculation of the XYZ10 tristimulus values of the RGB stimuli, the monitor calibration software generated a set of calibration parameters. These parameters included black point, white point, tone response curves and 3 × 3 matrices to go from RGB to XYZ and back. The R, G and B tone response curves were obtained with respect to the CIE 1964 L, M and S cone responses, as preliminary tests with several LCD monitors had shown this to give better color accuracy than the usual luminance or principal components approach. The stimulus presentation software then used those parameters to present on each monitor, within its calibration accuracy and monitor gamut, the same set of colored stimuli for a familiar object. The accuracy of the calibration was assessed by generating 40 random test colors within the monitor gamut at three distinct, uniformly spaced luminance levels and calculating the average and maximum ΔE*ab color difference between the target stimuli and the (spectrally) measured stimuli. Some of the details of the calibration, such as monitor type, white point setting, luminance, and mean and maximum color difference, are summarized in Table 2. The monitor gamuts and white points of the monitors used by the laboratories, plotted in the CIE 1964 chromaticity diagram, are shown in Fig. 2.
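The two colorimetric quantities used in this calibration check, the u'v' chromaticity of a measured white point and the ΔE*ab difference between a target and a measured stimulus, can be sketched as follows. This is only an illustration based on the standard CIE 1976 formulas, not the calibration software used in the study; the function names and the D65 10° tristimulus values (94.811, 100, 107.304) are assumptions supplied for the example.

```python
import math

def xyz_to_uv_prime(x, y, z):
    """CIE 1976 u'v' chromaticity from tristimulus values."""
    denom = x + 15.0 * y + 3.0 * z
    return 4.0 * x / denom, 9.0 * y / denom

def _lab_f(t):
    """Piecewise cube-root function of the CIELAB definition."""
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    return t / (3.0 * delta ** 2) + 4.0 / 29.0

def xyz_to_lab(xyz, white):
    """CIELAB coordinates of xyz relative to the given white point."""
    fx, fy, fz = (_lab_f(c / w) for c, w in zip(xyz, white))
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)

def delta_e_ab(xyz_target, xyz_measured, white):
    """Euclidean CIELAB difference between target and measured stimuli."""
    lab_t = xyz_to_lab(xyz_target, white)
    lab_m = xyz_to_lab(xyz_measured, white)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab_t, lab_m)))

# Assumed D65 white for the CIE 1964 (10 degree) observer, Y normalized to 100.
D65_10 = (94.811, 100.0, 107.304)
u_prime, v_prime = xyz_to_uv_prime(*D65_10)  # close to the target quoted in the text
```

Averaging `delta_e_ab` over the 40 test colors, and taking its maximum, would give the two accuracy figures of the kind reported in Table 2.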

Stimulus presentation and rating
At each laboratory, color appearance ratings were collected for each familiar object using the same stimulus presentation program. Familiar objects were presented in the center of the screen in a large number of different colors approximately uniformly spaced in the CIE 1976 u'v' chromaticity diagram, while keeping the luminance nearly constant. Based on preliminary experiments, the extent of the grid of test points was chosen such that object colors rated 'acceptable-to-very good' were maximally surrounded by those rated 'very bad', as this minimizes possible bias when fitting the bivariate Gaussian models to the observer ratings [4]. During stimulus presentation, only those areas (pixels) associated with the prototypical color of the familiar objects were changed, by using predetermined template images that identified the pixels to be altered. The original luminance values of the familiar object images were kept intact and only the chromaticity (of the target pixels) was changed. Two examples of a typical screen as seen by a test subject during the experiment are shown in Fig. 3. The large white area surrounding the familiar object was to ensure adequate and constant adaptation to a D65 chromaticity by minimizing adaptation to the stimulus itself. To further avoid the latter, test subjects were also instructed not to stare at the object. The average luminance values of the experimental stimulus grids are given in Table 3. The stimuli coordinates were also corrected for the slight deviation of the monitor white point from the target D65 chromaticity by the CAT02 chromatic adaptation transformation [39]. The exact number of presented stimuli varied from object to object and from laboratory to laboratory, as only stimuli within the laboratory's monitor gamut were selected. These numbers are listed in Table 4. The average number of ratings per object, whereby each observer rated each object color only once per session, was 165 ± 24. Taking into account the number of test subjects that participated, a total of over 210 000 ratings were made during the course of this study, with an average of about 30 000 per region. For each stimulus the test subject rated the color appearance with respect to what he/she thought the familiar object looks like in reality by clicking on a continuous graphical rating scale presented below the familiar object (see Fig. 3).
The assessment of one object took about 15–20 minutes, not including instructions. Test subjects were allowed to rate more than one object a day. However, repeats (to assess intra-observer variability) were performed on separate days.

Observer variability
Intra-observer variability was assessed with STRESS, the Standardized-Residual-Sum-of-Squares [40]. Higher values indicate higher variability (less agreement). The intra-observer variability was obtained by having 2 or more observers repeat a color appearance rating experiment one or more times on separate days. First, for each observer (and object) a STRESS value was calculated between the individual repeated rating sets and their mean. Secondly, a general STRESS value was obtained by averaging the former across observers and objects. Note that not all laboratories had intra-observer data for all objects: BE, BR, TW and CN had 0 missing, CO and HU had respectively 1 and 5 missing, while IR had only one intra-observer data set. The general average intra-observer variability values and their standard deviations are given in Table 5. The values ranged from 0.17 to 0.26, with an average of 0.22 ± 0.03. These STRESS values, which are typical of those found in color discrimination studies, indicate a satisfactory agreement between individual observer results obtained on separate days. The degree of intra-observer variability was mostly very similar for all familiar objects tested, as is shown by the generally small standard deviations in Table 5. IR reported data for only one object, a ripe apple. Therefore no standard deviation could be calculated.
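The two-step intra-observer procedure described above can be sketched as follows. This is a minimal illustration, assuming the usual STRESS definition with a least-squares scaling factor and a 0 to 1 scale (some publications multiply by 100); the function names are hypothetical and this is not the analysis code used in the study.

```python
import math

def stress(a, b):
    """STRESS between two equally long rating sets, on a 0-1 scale.

    A scaling factor f = sum(a^2) / sum(a*b) is applied to b first, so the
    measure is insensitive to a constant scale factor between the two sets.
    """
    if len(a) != len(b) or not a:
        raise ValueError("rating sets must be non-empty and of equal length")
    f = sum(x * x for x in a) / sum(x * y for x, y in zip(a, b))
    num = sum((x - f * y) ** 2 for x, y in zip(a, b))
    den = sum((f * y) ** 2 for y in b)
    return math.sqrt(num / den)

def intra_observer_stress(repeats):
    """Mean STRESS between each repeated rating set and the mean of the repeats."""
    n = len(repeats)
    mean_set = [sum(vals) / n for vals in zip(*repeats)]
    return sum(stress(r, mean_set) for r in repeats) / n

# Hypothetical ratings of five stimuli by one observer on two separate days.
day1 = [0.9, 0.7, 0.4, 0.2, 0.1]
day2 = [0.8, 0.8, 0.5, 0.1, 0.2]
value = intra_observer_stress([day1, day2])
```

Averaging `intra_observer_stress` across observers and objects would then give a general value of the kind reported per laboratory in Table 5.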
Inter-observer variability was also assessed with the STRESS measure. For each object, it was calculated between the mean (across repeats) individual observer ratings and the ratings of the average observer (mean across individual observers). The mean (across objects) STRESS values for the different regions are shown in Table 5. The values range from 0.27 to 0.41, with an average of 0.36 ± 0.04. These STRESS values are typical for inter-observer variation in color difference studies [40,41], which is remarkably good considering that the test subjects did not rate with respect to a single reference stimulus, but each with respect to their own nonphysical memory color. The degree of inter-observer variability was quite similar for the different objects, as indicated by the relatively small standard deviations. The STRESS results also show the inter-observer variability to be typically about 1.7 times larger than the intra-observer variability.
In addition to the STRESS value, inter-observer variability was also evaluated by calculating the ICC(2,n) Intraclass Correlation Coefficient [42], which expresses the reliability of the concept of an average observer based on the ratings of a limited number of individual observers.
While the good inter-observer agreement supports the validity of calculating an average observer to represent the panel of individual observers, the "excellent" (≥ 0.90) ICC(2,n) values shown in Table 5 validate the extension to a more general average observer representing the population from which the test panel was randomly drawn. In other words, the average observer obtained by taking the mean of the individual observer ratings should not only be representative of the test panel, but also of the entire population (region/culture).
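As an illustration of the reliability measure discussed above, the sketch below computes the average-measure, two-way random-effects intraclass correlation of Shrout and Fleiss [42] from a stimuli-by-observers table of ratings. It assumes complete data (every observer rated every stimulus) and that the "n" in ICC(2,n) denotes the number of observers; the function name and the small data set are hypothetical.

```python
def icc_2k(ratings):
    """Average-measure ICC, two-way random effects (Shrout & Fleiss case 2).

    ratings[i][j] is the rating given to stimulus i by observer j.
    """
    n = len(ratings)        # number of rated stimuli (targets)
    k = len(ratings[0])     # number of observers (raters)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: stimuli, observers, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)               # between-stimuli mean square
    jms = ss_cols / (k - 1)               # between-observer mean square
    ems = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (bms - ems) / (bms + (jms - ems) / n)

# Hypothetical ratings: four stimuli (rows) rated by three observers (columns).
example = [[0.9, 0.8, 1.0],
           [0.6, 0.5, 0.7],
           [0.3, 0.4, 0.3],
           [0.1, 0.1, 0.2]]
reliability = icc_2k(example)
```

Values close to 1 indicate that the mean rating across observers is a reliable estimate of the population-average rating.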

Modeling the color appearance ratings
First, for each region and each object an average observer was calculated by taking the mean of the individual observer ratings. An example of the distribution in the CIE u'v' chromaticity diagram of the average observer ratings for each region for the familiar object "Asian Skin" is shown in Fig. 4. The distribution corresponding to the set of pooled ratings (of all regions) is also plotted. It is clear that the rating distributions of the regions are quite similar, especially in terms of their overall orientation. The centroids of the distributions, corresponding to the most likely location of the memory color of the object, appear to be closely grouped, and the distributions can be approximated by an elliptical shape. Even their size is comparable, except for Taiwan.
Following Smet, Ryckaert, Pointer, Deconinck and Hanselaer [4], the mean rating scores were therefore modeled by a bivariate Gaussian function R(u',v') (see Eq. (1)). The fitting parameters a1–a7 for each of the rating functions R(u',v') are given in Appendix A. For Iran no rating data for smurf® (SM) and dried lavender (DL) are available, as these objects are not familiar to Iranian people. As an example, a 3D plot of the fitted models is shown for "Asian Skin" in Fig. 5. From Fig. 4, it can be observed that the 1d-elliptical contours are a good first order approximation for the chromaticity area associated with acceptable (positive) observer ratings. The 1d-elliptical contours for all objects and regions are illustrated in Fig. 6.
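To make the role of the seven fitting parameters concrete, the sketch below evaluates one common seven-parameter form of a bivariate Gaussian rating function. The parameterization is an assumption made for illustration (a1, a2: centroid, i.e. the memory color; a3, a4, a5: shape and orientation, with a5 the cross term; a6: amplitude; a7: offset) and may differ from the ordering used in Eq. (1) and Appendix A. It also assumes that the "1d" contour is the ellipse at a Mahalanobis-type distance of one from the centroid.

```python
import math

def mahalanobis(u, v, a):
    """Distance of (u, v) from the centroid (a1, a2) in the metric set by a3, a4, a5.

    Requires a3 > 0, a4 > 0 and a3 * a4 > a5**2 (a proper ellipse).
    """
    a1, a2, a3, a4, a5 = a[:5]
    du, dv = u - a1, v - a2
    return math.sqrt(a3 * du * du + a4 * dv * dv + 2.0 * a5 * du * dv)

def rating(u, v, a):
    """Assumed form: R(u', v') = a7 + a6 * exp(-0.5 * d**2)."""
    a6, a7 = a[5], a[6]
    return a7 + a6 * math.exp(-0.5 * mahalanobis(u, v, a) ** 2)

# Hypothetical parameters, chosen only so that the example runs.
params = (0.21, 0.49, 4.0e4, 6.0e4, 1.0e4, 0.8, 0.1)
peak = rating(0.21, 0.49, params)        # rating at the memory color itself
on_contour = mahalanobis(0.215, 0.49, params)  # equals 1: on the assumed 1d contour
```

In practice the seven parameters would be estimated by nonlinear least squares against the mean observer ratings; that step is not shown here.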
From Fig. 5, it is clear that there is good agreement between the experimentally determined mean observer ratings and the model fits. The goodness-of-fit was assessed more quantitatively by calculating the STRESS between the mean observer ratings and the modeled ratings. For "Asian Skin" the STRESS values ranged between 0.10 and 0.17, with an average of 0.14 ± 0.03. Compared to the inter-observer variability these values were much smaller, indicating a very satisfactory fit to the rating data. In fact, the agreement of the fit with the average observer is approximately 3 times better than the agreement of a random observer with the average observer.
The other objects had similarly excellent STRESS values, as can be seen from the data (shown in percent) in Table 6, where the minimum and maximum STRESS values are 0.08 and 0.17 respectively.
Table 6 also shows the results for the fit to the pooled region rating data. In contrast to the previous results, these STRESS values are much larger. They range from 0.25 to 0.56, with an average of 0.37 ± 0.09. The pooled STRESS values are thus comparable to the ones found for the inter-observer variability within a single region. In other words, the average observer for a single region deviates approximately the same amount from the global average observer as an individual observer within a region deviates from its average observer.
Fig. 5. A 3D plot of the fitted models for "Asian skin" for all regions. The fit to the pooled region data is also shown.

Cross-regional/cultural differences in color appearance rating
Unlike intra-region variability, which is equivalent to the inter-observer variability within a single region, inter-region variability could not be directly evaluated by calculating the STRESS between the mean observer ratings within a single region and the mean observer ratings averaged across all regions, because different regions had slightly different stimuli sets. The sets differed both in size and in chromaticity of the test stimuli. The former was due to differences in monitor gamut and the latter to slightly different white points, which resulted in slightly different corresponding chromaticities after correction to a D65 adaptation white. For this reason, cross-regional (or cross-cultural) differences have been analyzed by comparing, for each object, the region average observers with the global average observer. The former were modeled by the bivariate Gaussians fitted to the mean observer ratings for each region, while the latter was modeled by the bivariate Gaussian fitted to the pooled set of mean observer ratings of all regions.
Statistical significance of cross-regional effects on color appearance rating and memory colors was evaluated by the extra-sum-of-squares F-test [43]. This F-test compares the goodness-of-fit of two alternative models, one being a simpler "nested" version of the other. In the analysis at hand, the simple model was the average global observer fit. This model assumes that the variance in the entire rating data set can be explained by a single bivariate Gaussian function with 7 parameters. The other, more complex model assumes that a separate bivariate Gaussian function for each region is required to explain the variance. This model has 49 parameters, 7 for each separate region fit. The null hypothesis is that the simple model is correct. The F-test compares the improvement in the residual Sum-of-Squares (SS) for the more complicated model with the loss in degrees of freedom (DF) associated with the increase in the number of model parameters:

$$F = \frac{(SS_{null} - SS_{alt}) / (DF_{null} - DF_{alt})}{SS_{alt} / DF_{alt}}$$

with SS_null and SS_alt the residual sum-of-squares between the model fit and the visual data for the simple and complex model respectively. DF refers to the degrees of freedom of the simple and complex models.
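The extra-sum-of-squares comparison described above reduces to a short calculation once both models have been fitted. The sketch below computes the F ratio in its standard form for nested models; the function name is hypothetical, and the residual degrees of freedom are assumed to be the number of data points minus the number of fitted parameters. The numbers in the example are invented purely to show the arithmetic.

```python
def extra_sum_of_squares_f(ss_null, df_null, ss_alt, df_alt):
    """F ratio comparing a simple (null) model with a more complex nested one.

    ss_* are residual sums of squares and df_* the residual degrees of freedom
    (number of data points minus number of fitted parameters) of each model.
    """
    if df_null <= df_alt:
        raise ValueError("the simple model must have more residual degrees of freedom")
    relative_gain = (ss_null - ss_alt) / (df_null - df_alt)
    residual_variance = ss_alt / df_alt
    return relative_gain / residual_variance

# Invented example: 1000 pooled mean ratings, one global fit (7 parameters)
# versus seven separate region fits (49 parameters).
n_points = 1000
f_ratio = extra_sum_of_squares_f(ss_null=120.0, df_null=n_points - 7,
                                 ss_alt=100.0, df_alt=n_points - 49)
```

The p-value then follows from the F distribution with (DF_null − DF_alt, DF_alt) degrees of freedom, which is available in most statistics libraries.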
The results of the extra-sum-of-squares F-test (see Table 7) showed a statistically significant effect for all the familiar objects, meaning at least one region average observer differed significantly from the global average observer. Post hoc cross-comparison F-tests showed that all regions differed significantly (p < 0.001) from one another for all objects, even after Bonferroni correction of the significance level. However, the effect size was small, as can be seen from the eta-squared value (η²) in Table 7.
As the male-to-female ratio of the test panels varied widely across laboratories, gender is a possible confounding factor, i.e. the differences observed could be solely due to differences in rating between men and women. To investigate the gender effect, another extra-sum-of-squares F-test was performed, but this time between the global average observer and the global male and female observers. The latter were obtained by pooling respectively all male and female data from all geographical regions. The F-test results, given in Table 8, show a statistically significant effect for six objects, indicating that the global male observer, the global female observer or both differed significantly from the global average observer for about half of the objects. However, the effect sizes are extremely small compared to those obtained for the effect of geographical region. In fact, on average they were about 30 times smaller. It can therefore be concluded, in agreement with the earlier analysis results, that geographical region did indeed have a significant effect, but that the size of the effect is small. That the impact of region or culture on color appearance rating and memory color, although significant, is small is also clear from Fig. 6, where the 1d-elliptical contours of the fitted bivariate Gaussian functions show very similar location, size and orientation.
In fact, as already mentioned during the discussion on the goodness-of-fit of the bivariate Gaussian models and as can be seen from a comparison of Figs. 6 and 7, the observed variability between different regions is of the same order of magnitude or smaller than the typical inter-observer variability within a single region.
Statistically significant but small effects of culture (smaller than the intra-culture variation, and hence practically irrelevant) were also reported by Fernandez, Fairchild and Braun [32] in a study on observer and cultural variability of preferred colors of pictorial images.
Fig. 7. The 1d-elliptical contours of the bivariate Gaussian models fitted to the individual observer data for 'Asian Skin' (colored solid lines), fitted to the average ratings (across observers) of each region (dotted black line) and fitted to the pooled (across regions) ratings (dashed black line).
Finally, to illustrate the size of the regional effect on memory color in a practical way, the region-average memory colors are displayed for "Asian Skin" in Fig. 8 and for "Green Apple" in Fig. 9 as an example. The memory color for the global observer, obtained from the pooled region ratings, is also displayed. In addition, the chromaticities of the displayed memory colors are shown. The displayed images were calculated using the sRGB color space. Although the actual displayed colors will depend on the medium on which they are presented, sRGB does give a good first approximation on a typical monitor. For "Asian Skin" (Fig. 8), it is clear that the differences between the displayed memory colors, although visible, are small indeed. For "Green Apple" (Fig. 9), the cross-regional differences in memory colors were more striking. A possible explanation might be the object's large natural variation in color, due to, among other factors, different types of green apples and varying stages of ripeness. Note that all region-average memory colors are still located within the 1d-elliptical contour, an approximate tolerance boundary for color acceptability, of the global average observer (see last graph of Fig. 9). Similar graphs for the region-average memory colors of the other familiar objects are plotted in Fig. 10.
The color differences in the CIE 1976 u'v' chromaticity diagram between the region-average memory colors and the global average memory color are given in Table 9. The color differences were calculated as normalized Mahalanobis distances to the global average memory color; these reduce to regular Euclidean distances when the normalized a5 parameters are equal to zero. By taking the shape and orientation of the global average observer rating function into account, the normalized Mahalanobis color difference represents the actual perceptual difference with the global average memory color more accurately than a regular Euclidean distance. The distance in u'v' corresponding to one Mahalanobis unit of the global average rating function is also given for comparison. It is clear that the mean color difference is substantially smaller than the unit Mahalanobis distance, indicating, as noted earlier, that all region-average memory colors fall well within the acceptability region of the global average observer.

It may also be noted that larger variability with respect to the global average (as assessed by the STRESS of the fit with the pooled data in Table 6 or by the 2 in Table 7) does not necessarily correspond to larger perceptual differences, as variability and perceptibility are expressed in relative and absolute measures, respectively. For example, consider the "green apple" and the "strawberry". While the variability for the green apple is larger than for the strawberry (see Tables 6 and 7), Table 9 shows that the perceptibility for the green apple is substantially smaller, due to the much smaller unit Mahalanobis distance (MD in Table 9). In addition, variability is assessed with respect to observer ratings for any chromaticity (cf. the Gaussian models), while perceptibility is evaluated only with respect to the memory color chromaticity. Overall, the differences in memory color chromaticity between the region-average observers and the global average observer are larger than (or approaching) the perceptual discrimination limit (1 JND ≈ ΔEu'v' ≈ 0.003). However, as indicated when discussing the observer and region variability, similar or larger differences also occur between individual observers and the region-average observer.
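The relation between a Mahalanobis-type color difference and the corresponding distance in u'v' can be illustrated with a minimal sketch. It assumes a quadratic form a1·du² + a2·dv² + 2·a5·du·dv around the global average memory color; the parameter names, the function names and the numbers in the example are hypothetical, and the normalization applied to the values in Table 9 is not reproduced here.

```python
import math


def mahalanobis_uv(color, center, a1, a2, a5):
    """Mahalanobis-type distance between two u'v' chromaticities.

    Assumed quadratic form: a1*du^2 + a2*dv^2 + 2*a5*du*dv,
    where a5 is the cross term describing the ellipse orientation.
    """
    if a1 <= 0 or a2 <= 0 or a1 * a2 <= a5 * a5:
        raise ValueError("quadratic form must be positive definite")
    du = color[0] - center[0]
    dv = color[1] - center[1]
    return math.sqrt(a1 * du * du + a2 * dv * dv + 2.0 * a5 * du * dv)


def unit_distance_uv(direction, a1, a2, a5):
    """Length in u'v' of one Mahalanobis unit along a given direction."""
    norm = math.hypot(direction[0], direction[1])
    du, dv = direction[0] / norm, direction[1] / norm
    return 1.0 / math.sqrt(a1 * du * du + a2 * dv * dv + 2.0 * a5 * du * dv)


# With a1 = a2 = 1 and a5 = 0 the distance is the plain Euclidean distance.
d_euclid = mahalanobis_uv((3.0, 4.0), (0.0, 0.0), 1.0, 1.0, 0.0)

# Hypothetical rating function whose 1d contour is a circle of radius 0.01
# in u'v': a shift of 0.003 (about 1 JND) is then 0.3 Mahalanobis units.
d_units = mahalanobis_uv((0.203, 0.470), (0.200, 0.470), 1e4, 1e4, 0.0)
```

In the hypothetical second case a difference that is just perceptible in absolute terms still lies well inside the unit contour, which is the distinction between perceptibility and acceptability drawn above.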

Conclusions
The effect of cross-regional or cross-cultural differences on color appearance ratings and memory colors of eleven familiar objects was investigated in seven different regions: Belgium, Hungary, Brazil, Colombia, Taiwan, China and Iran. In each of the corresponding laboratories, the familiar objects were presented on a calibrated monitor in over 100 different colors to a test panel of observers, who were asked to rate the similarity of the presented object color to what they thought the object looks like in reality. For each object and region the mean observer ratings were modeled by a bivariate Gaussian function. The goodness-of-fit, as evaluated by the Standardized-Residual-Sum-of-Squares (STRESS), was much smaller than the inter-observer variability STRESS value. Remarkably, considering the rather virtual and subjective nature of the reference memory color (each observer has his or her own), the variability was comparable to that found in color difference studies that employ a fixed reference. A statistical analysis showed significant (p < 0.001) differences between the region-average observers and the global average observer obtained by pooling the data from all regions. However, the effect size of region or culture was found to be small. In fact, the differences between the region-average observers and the global average observer were found to be of the same magnitude as, or even smaller than, the typical inter-observer variability within one region. Thus, although statistically significant differences in color appearance ratings and memory colors between regions were found, they are not likely to be of practical importance.

Fig. 1. The eleven familiar 'objects' for which memory colors were determined. Note that images CS and AS are blurred for publication purposes only; no blurring was present during the experiments.

Fig. 3. Top left and top right: two examples of a typical screen as seen by a test subject during the experiment. Bottom: a close-up of the continuous graphical rating scale.

Fig. 4. Average color appearance rating distributions in the CIE 1976 u'v' chromaticity diagram for "Asian Skin" for all regions and for all regions pooled. The black dots are the test chromaticity points.

Fig. 8. Region-average memory colors for "Asian Skin". The memory color for the global observer (all regions pooled) is also shown. The last subplot displays the CIE u'v' chromaticity coordinates of the different displayed memory colors and the 1d-elliptical contour of the global observer. The display colors were calculated using the sRGB color space. Note that the images are blurred for publication purposes only; no blurring was present during the experiments.

Fig. 9. Region-average memory colors for a "Green Apple". The memory color for the global observer (all regions pooled) is also shown. The last subplot displays the CIE u'v' chromaticity coordinates of the different displayed memory colors and the 1d-elliptical contour of the global observer. The display colors were calculated using the sRGB color space.

Fig. 10. Region-average memory color chromaticities. The 1d-elliptical contour and memory color of the global average observer are also plotted.

Table 2. Monitor Details: Type, Brand, Gamut, the 10° CIE 1976 u'v' Chromaticity Coordinates and Y10 Luminance of the Monitor White Point; Mean, Standard Deviation (SD) and Maximum Calibration Error ΔE*Lab, for a Set of 40 Random Test Chromaticities within the Monitor Gamut
Columns: Region | Monitor type, brand, gamut | White point: u'10, v'10, Y10 (cd/m²) | Calibration error ΔE*Lab: Mean ± 1 SD, Max

Fig. 2. Monitor gamut and white point in the CIE 1964 chromaticity diagram of the participating laboratories. The common gamut is also plotted.

Table 5. Average Intra- and Inter-observer Variability a
a The standard deviation is also given. b No