Estimating the Illumination Direction From Three-Dimensional Texture of Brownian Surfaces

We studied whether human observers can estimate the illumination direction from 3D textures of random Brownian surfaces, containing undulations over a range of scales. The locally Lambertian surfaces were illuminated with a collimated beam from random directions. The surfaces had a uniform albedo and thus texture appeared only through shading and shadowing. The data confirm earlier results with Gaussian surfaces, containing undulations of a single scale. Observers were able to accurately estimate the source azimuth. If shading dominated the images, the observers committed 180° errors. If cast shadows were present, they resolved this convex-concave-ambiguity almost completely. Thus, observers relied on second-order statistics in the shading regime and used an unidentified first-order cue in the shadow regime. The source elevations could also be estimated, which can be explained by the observers’ exploitation of the statistical homogeneity of the stimulus set. The fraction of the surface that is in shadow and the median intensity are likely cues for these elevation estimates.

Illumination direction estimation is an important prerequisite for estimates of the light field, shape from shading, and material judgments. In this article, we investigate how well human observers are able to estimate the illumination direction from 3D textures, in connection with our interest in light field perception. The light field (Gershun, 1939), or plenoptic function (Adelson & Bergen, 1991), is defined as the irradiance as a function of position and direction and might serve as a radiometric framework for perception. Texture provides us with cues that are additional to shading. Note that Lambertian shading (Horn & Brooks, 1989) depends on the normal component of the local light vector, while texture due to surface roughness also depends on the tangential component of the local light vector. The ensembles of local illumination orientation estimates over rough 3D objects form patterns, the illuminance flow. The illuminance flow depends systematically on the light field and on the shape of the object (Karlsson et al., 2008, 2009; van Doorn, Wijntjes, & Koenderink, 2015) and provides cues about the light field and object shape (Koenderink, 2012). In a previous study, we derived how second-order statistics based on the squared gradient and on the Hessian relate to the illumination direction. This relation suggests that the illumination orientation can be derived from 3D textures via responses of so-called edge and line detectors. In another study, we tested whether human observers were actually able to carry out this task for frontally viewed real 3D textures from the CURET database (see, for example, Figure 1, top), and we found that the experimental results were surprisingly similar to the theoretical predictions.
The similarity was especially surprising because real textures do not comply at all with the strict assumptions of the theory, namely, that the geometry is Gaussian, that the material is locally perfectly matte (Lambertian shading), and that the relief is shallow such that no shadows and no interreflections occur.
In still another study, we carried out a similar experiment for frontally viewed computer-generated Gaussian surfaces (see, for example, Figure 1, bottom) and again found that human observers' estimates were close to the fiducial orientation values (interquartile intervals of the deviations of the azimuthal estimates were below 14°). However, for the Gaussian textures in the shadowing regime, we found that observers were able to resolve the convex-concave-ambiguity. The main difference between the textures in the shadowing and shading regimes was the presence of cast shadows. This suggests that observers make use of the difference between the boundaries of the cast shadows and the body shadows, the latter being much more gradual than the former.
Figure 1. Textures of real materials from the CURET database at the top and rendered textures of Gaussian surfaces at the bottom. All textures were viewed frontally. The Gaussian textures were rendered for a single scale of the roughness, for illumination polar angles of about 0°, 30°, 50°, and 70° (from left to right), varying azimuth (depicted by the red arrows), and for three reliefs, increasing from top to bottom.
Next, we tested illumination direction estimation for textures of frontally viewed Gaussian anisotropic rough surfaces. For such textures, one expects systematic errors of the settings as a function of the anisotropy (Karlsson et al., 2008, 2009). Our expectations were fully borne out, in that the observers committed the predicted systematic errors. The results were precise enough to allow the inference that illumination direction detection is based on second-order statistics, that is, of edge detector (rather than line detector) activity. Figure 1 shows examples of real materials and of rendered Gaussian surfaces. The Gaussian textures look somewhat artificial and as if photographed out of focus. The main reason might be that the surface roughness of these textures is restricted to a single scale, while in natural materials one typically finds undulations over a range of scales (Green, Padilla, Drbohlav, & Chantler, 2007; Kube & Pentland, 1988; Padilla, Drbohlav, Green, Spence, & Chantler, 2008; Wainwright & Simoncelli, 2000). Therefore, in this article, we tested whether a deviation from the theoretical assumption of Gaussian geometry (while the theoretical assumptions of Lambertian reflectance and uniform albedo were fulfilled) systematically affects the estimates of human observers. We rendered images of height profiles resulting from linear superpositions of a range of Gaussian surfaces of different scales. Because larger bumps might put smaller ones in cast shadow, such a "Brownian image texture" is not simply a linear superposition of the image textures of the composing Gaussian surfaces. Can human observers estimate the illumination orientation for these more realistic surface profiles, containing roughness at a range of scales?

Methods

Stimuli
We generated 250 images of frontally viewed Brownian surfaces; see Figure 2 for examples. To create these images, we first constructed a statistically independent random surface (i.e., a surface height profile) for each of them. The surfaces were generated by linear superposition of seven random Gaussian reliefs of different scales. Each Gaussian relief component was generated with normally distributed heights and an isotropic Gaussian correlation function (Longuet-Higgins, 1957). Hereafter, we define the scale as the half-width of the autocorrelation function. The scales of the seven Gaussian relief components increased exponentially: 1, 2.2, 5.0, 11.3, 25.4, 57.0, and 128 pixels, in other words 0.2%, 0.4%, 1.0%, 2.2%, 5.0%, 11.1%, and 25% of the image width. The root mean square spreads in the heights were taken proportional to the scales. After superposition of the seven Gaussian random relief components, we subtracted the linear trends of the generated height profiles in order to avoid global slants of the surfaces with respect to the fronto-parallel plane. Finally, the variance of the surfaces was normalized, and the height was scaled with a constant of 128. Thus, the surface structures had a fractal-like character; see the power spectrum in Figure 3. Because of this fractal-like character, we called them Brownian surfaces.
The stimuli were prepared through a Mathematica program and saved as grayscale TIFF-formatted image files of 512 × 512 pixels, 8 bits per pixel, linearly mapping the luminance values. The stimuli were rendered assuming locally Lambertian or perfectly diffuse scattering (Lambert, 1760), a collimated beam (similar to direct sunlight), with pixels in body and cast shadows being set to black (no ambient term). These stimuli represent physically realistic renderings, except for the fact that multiple scattering is not present. Note that, although the surface structure or height profile is simply a linear superposition of the surface structures of the composing Gaussian surfaces, the resulting image structure or 3D texture is not a simple combination. This is because the shadows of larger bumps might overcast smaller bumps and in effect make smaller bumps invisible.
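A rendering of this kind (Lambertian shading under a collimated beam, with body and cast shadows set to black and no interreflections) might be sketched as follows. The sketch rests on several assumptions that are not in the text: heights are taken to be in pixel units, cast shadows are found by stepping one pixel at a time toward the source with nearest-pixel rounding, boundaries wrap around, and x and y are array column and row indices (so the image has to be flipped vertically to obtain the counter-clockwise azimuth convention of the article). The names `render_lambertian` and `cast_shadow_mask` are hypothetical.

```python
import numpy as np

def cast_shadow_mask(height, theta, phi):
    """True where the ray toward the source is blocked by the surface."""
    shadow = np.zeros(height.shape, dtype=bool)
    horizontal = np.sin(theta)
    if horizontal < 1e-9:                 # normal incidence: no cast shadows
        return shadow
    rise = np.cos(theta) / horizontal     # height gained by the ray per pixel
    relief = float(height.max() - height.min())
    # Beyond this distance the ray is above the highest point of the surface.
    max_steps = int(min(max(height.shape), np.ceil(relief / max(rise, 1e-12))))
    for step in range(1, max_steps + 1):
        dx = int(round(step * np.cos(phi)))
        dy = int(round(step * np.sin(phi)))
        # occluder[r, c] = height[r + dy, c + dx] (periodic wrap-around).
        occluder = np.roll(height, shift=(-dy, -dx), axis=(0, 1))
        shadow |= occluder > height + step * rise
    return shadow

def render_lambertian(height, polar_deg, azimuth_deg):
    """Uniform-albedo Lambertian image; body and cast shadows are black."""
    theta, phi = np.radians(polar_deg), np.radians(azimuth_deg)
    # Unit vector toward the source in (column, row, height) coordinates.
    light = np.array([np.sin(theta) * np.cos(phi),
                      np.sin(theta) * np.sin(phi),
                      np.cos(theta)])
    d_row, d_col = np.gradient(height)
    norm = np.sqrt(1.0 + d_col ** 2 + d_row ** 2)
    shading = (-d_col * light[0] - d_row * light[1] + light[2]) / norm
    image = np.clip(shading, 0.0, None)   # negative values are body shadow
    image[cast_shadow_mask(height, theta, phi)] = 0.0
    return image
```

For a flat surface the image is uniform with intensity equal to the cosine of the polar angle, which is a quick sanity check on the shading term.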
The illumination directions of the 250 images were distributed randomly over the hemisphere of potential illumination directions (see the polar plot in Figure 4). All stimuli were presented in randomized order. The rendered images were shown in a circular mask in order to avoid a possible bias due to the square shape of the images. The test was conducted using a linearized monitor (unit gamma; implemented via software and checked with a gray scale and a Konica Minolta luminance meter).
Figure 2. Examples of the stimuli, with arrows depicting the true illumination direction. The cut-out was circular in order to prevent biased responses due to oriented contours.
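As an illustration of how such a set of directions could be drawn and plotted, the sketch below samples directions uniformly in solid angle over the hemisphere and maps them to an equal-area polar plot. The text only states that the directions were distributed randomly, so uniformity in solid angle is an assumption here, and the function names are hypothetical. The mapping is the standard Lambert azimuthal equal-area projection, in which the plot radius is 2 sin(θ/2) for polar angle θ.

```python
import numpy as np

def sample_hemisphere(n, seed=None):
    """Polar angle and azimuth (degrees), uniform in solid angle (assumed)."""
    rng = np.random.default_rng(seed)
    # cos(polar) uniform on [0, 1) gives equal probability per unit solid angle.
    polar = np.degrees(np.arccos(rng.uniform(0.0, 1.0, n)))
    azimuth = rng.uniform(0.0, 360.0, n)   # zero toward the right, counter-clockwise
    return polar, azimuth

def equal_area_xy(polar_deg, azimuth_deg):
    """Lambert azimuthal equal-area plot coordinates (normal incidence at origin)."""
    r = 2.0 * np.sin(np.radians(polar_deg) / 2.0)
    phi = np.radians(azimuth_deg)
    return r * np.cos(phi), r * np.sin(phi)
```

With this projection, equal areas in the plot correspond to equal solid angles, so a uniform set of directions looks uniformly scattered in the disk.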

Observers
Six observers, the authors and three naive observers, participated in the experiment. Authors S. P. and A. D. were naive with respect to the stimulus parameters. All six observers had normal or corrected-to-normal vision. The experiment was done in accordance with local ethical guidelines, Dutch Law, and with the Declaration of Helsinki.

Experimental Setup
The setup consisted of an Apple Macintosh G4 and a luminance-linearized 22-inch LaCie Blue Electron monitor, at 75 Hz and 1600 × 1200 resolution. Participants were seated 83 cm from the screen, with their heads fixed in a chinrest. Vision was binocular. The stimulus and probe (see next subsection) images each subtended a visual angle of 8.6° × 8.6°. The room was dark during the course of the experiment.

Design and Procedure
We defined a "natural" interface in the form of a monochrome rendering of an illuminated hemispherical boss on a plane (see Figure 5). The boss and plane were rendered using (Lambertian) shading, with body and cast shadows, without reflexes. The observer could use the mouse in order to adjust the direction of the (simulated) source. The task was to make the illumination of the hemispherical boss appear the same as the illumination of the texture. This indeed proved to be an intuitive interface for all observers in our former and present studies. The median time for a judgment was less than 8 seconds.
Figure 4. The grid specifies 15° increments in azimuth and elevation, using equal-area projection. The convention for the specification of the azimuth (zero direction toward the right, increase in counter-clockwise direction) is used throughout the article. Elevation is measured by the polar angle, that is, the angular distance to the direction of normal incidence (at the center of the graph). The elevation and azimuth specify the direction toward the light source.

Results
Figure 6 shows the settings (dots) of the azimuthal angles against the stimulus azimuths, per observer. The drawn lines represent the veridical values modulo 180°. Surprisingly, most settings seem to lie close to the true values, while one would expect about half of them to be 180° off due to the convex-concave-ambiguity. Figure 7 shows the polar histograms of the deviations of the azimuthal settings from the actual illumination azimuths, for all six observers. The number of deviations near 0° clearly exceeds the number of deviations that are 180° off, confirming that most responses were clustered around the fiducial illumination direction. Since this result contradicts naive expectations, we analyzed these data further. Figure 8 splits the data into three groups for three separate elevation ranges: polar angles of 0° to 30° (in which range shading dominates) in the left plot, 30° to 60° (in which range neither shading nor shadowing dominates) in the middle plot, and 60° to 90° (in which range shadowing dominates) in the right plot. It is clear from this figure that in the shading regime (following our expectations) about half of the data are indeed 180° off. However, in the intermediate and in the shadowing regimes, almost all settings are close to the veridical illumination directions. We calculated the ratios of the number of settings that deviated by more than 90° from the stimulus azimuth (second plus third quadrant with respect to the stimulus values, that is, roughly 180° off) to the number that deviated by less than 90° (first plus fourth quadrant). The ratios in the shading, mixed, and shadow regimes were 185/217 = 0.85, 156/594 = 0.26, and 41/307 = 0.13, respectively. Thus, the presence of shadows in the image seems to resolve the convex-concave-ambiguity.
Figure 9 shows the settings (dots) of the polar angles against the stimulus values, per observer. Theoretically, the elevation cannot be estimated due to the bas-relief ambiguity (Belhumeur et al., 1997), so here we expected no clear relation of the settings with the veridical values. Because the data seem to show some correlation with the stimulus values, we performed a regression on the data. The lines represent linear fits as a function of the veridical polar angle θ, for which we found 37° + 0.28θ with R² = 0.35 for A. D., 24° + 0.51θ with R² = 0.54 for J. K., 44° + 0.30θ with R² = 0.22 for S. P., 44° + 0.32θ with R² = 0.19 for J. J., 32° + 0.40θ with R² = 0.24 for X. C., and 17° + 0.48θ with R² = 0.36 for X. W.
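The quadrant counts for the azimuthal settings can be illustrated with a short sketch. The reported counts (for example 185/217 in the shading regime, where about half of the settings were 180° off) are consistent with dividing the number of settings more than 90° away from the stimulus azimuth by the number within 90°; the sketch assumes that reading. The function names are hypothetical and the example values in the usage note are made up for illustration only.

```python
import numpy as np

def wrap_deviation(setting_deg, truth_deg):
    """Signed deviation setting - truth, wrapped to [-180, 180) degrees."""
    diff = np.asarray(setting_deg, dtype=float) - np.asarray(truth_deg, dtype=float)
    return (diff + 180.0) % 360.0 - 180.0

def flip_ratio(setting_deg, truth_deg):
    """Counts of flipped (> 90 deg off) and near-veridical settings, and their ratio."""
    deviation = np.abs(wrap_deviation(setting_deg, truth_deg))
    flipped = int(np.sum(deviation > 90.0))
    near = int(np.sum(deviation <= 90.0))
    return flipped, near, flipped / near
```

For instance, `flip_ratio([10, 350, 190, 100, 5], [0, 0, 0, 0, 0])` counts two flipped and three near-veridical settings. A ratio near 1 indicates that directions are confused with their opposites, while a ratio near 0 indicates that the 180° ambiguity is resolved.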
To understand this slight but significant correlation, we looked at the correlation between the data and a few possible effective cues that the observers might have used. Since the stimulus set is homogeneous in terms of statistics, observers might have used cues such as the average gray level, the contrast, or the shadowed fraction of the total area. Figure 10 shows the average polar angle settings (horizontal axis) of the observers against the shadow fraction, median intensity, and Michelson contrast (computed from the 5th and 95th percentiles instead of the absolute minimum and maximum). It is clear that the settings correlate nicely with the shadow fraction. The median intensity also acts as a cue in the shading regime, not just in the shadow regime. For the contrast, we find a different picture. The contrast explodes at about 60° due to the dominance of cast shadows of big bumps, which put very large parts of the image in shadow (including small bumps). Even in this regime dominated by large-scale shadows, observers were able to perform our task well.
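The three candidate cues named above can be computed from an image as sketched below. This is an illustrative sketch, not the analysis code of the study: it assumes a linear-luminance image in which shadowed pixels are exactly zero (as in the stimuli, where shadows were set to black), it should be applied only to the pixels inside the circular mask, and the function name `elevation_cues` is hypothetical.

```python
import numpy as np

def elevation_cues(image):
    """Return (shadow fraction, median intensity, percentile Michelson contrast)."""
    image = np.asarray(image, dtype=float)
    shadow_fraction = float(np.mean(image <= 0.0))   # black pixels are shadow
    median_intensity = float(np.median(image))
    # Michelson contrast from the 5th and 95th percentiles rather than min and max.
    low, high = np.percentile(image, [5, 95])
    contrast = float((high - low) / (high + low)) if (high + low) > 0 else 0.0
    return shadow_fraction, median_intensity, contrast
```

With this percentile-based definition, the contrast in the sketch reaches its maximum of 1 once the 5th percentile is black, that is, once somewhat more than 5% of the pixels are in shadow.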
Figure 9. Scatter plots of the polar angle settings (vertical axis) against the ground truth (horizontal axis) for each observer. We fitted the data linearly (drawn lines).
Figure 10. Scatterplots of the average polar angle settings of all observers and the percentage of shadow-filled area (left), the average intensity (center), and the root mean square contrast (right). Notice that shadowing sets in at a polar angle of about 30°, and that the settings correlate nicely with the shadow fraction. The median intensity also acts as a cue in the shading regime, not just in the shadow regime.

Table 1. Correlation Coefficients for Comparison Between the Illumination Orientation Estimates and the True Azimuths (E-T), Between the Illumination Orientation Estimates and the Observers' Azimuthal Settings (E-O), and Between the Observers' Azimuthal Settings and the True Azimuths (O-T).
The illumination orientation estimates were computed at three different scales (second column). These correlations were computed separately for the shading, intermediate, and shadowing regimes (Columns 3-5).

In addition to the former analysis, we studied (straight) correlations between the observers' azimuthal data (O) and illumination orientation estimates (E) that were calculated from second-order statistics on the basis of the squared gradient, that is, of edge detector (rather than line detector) activity. Our former studies suggested that such a mechanism might underlie illumination orientation detection (Koenderink et al., 2004). The estimates were calculated for three differentiating scales of our algorithm (1, 8, and 64 pixels) and averaged over the inner square of the stimuli of 344 × 344 pixels, in which there was no coverage by the circular mask. Correlations were computed for the orientations rather than the directions; in other words, we corrected for 180° flips. We also computed the correlations between these illumination orientation estimates (E) and the true azimuths (T), for comparison with the correlations between the illumination orientation estimates and the observers' azimuthal settings (E-O). Finally, as a sort of baseline, we computed the correlations of the observers' azimuthal settings with the true azimuths (O-T). These correlations were computed separately for the shading, intermediate, and shadowing regimes. The results are presented in Table 1. These numbers confirm that observers could estimate the illumination orientations rather well (O-T correlations are quite high). The correlations of the observers' settings and of the illumination orientation estimates with the true azimuths were consistently higher for the intermediate regime than for the shading or shadowing regimes. Moreover, the correlations for the intermediate regime were most robust under variation of the differentiating scale. The correlations for the shading regime show the largest decrease with increasing scale. The correlations of the estimated illumination orientations at the largest scale (E-O and E-T) were clearly lower than those of the observers (O-T). Summarizing, we find that the second-order statistics correlated well with the observers' settings, especially at lower scales, and especially in the intermediate regime.
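An orientation estimate based on the squared gradient can be sketched as the dominant orientation of the averaged outer product of the image gradient, taken at a chosen differentiating scale. The code below is a sketch in that spirit and should not be read as the algorithm used for Table 1: the Gaussian pre-smoothing with periodic boundaries, the use of finite differences, and the names `gaussian_blur` and `illumination_orientation` are all assumptions. The y axis is taken to point up in the displayed image, so that the azimuth increases counter-clockwise from the rightward direction.

```python
import numpy as np

def gaussian_blur(image, sigma):
    """Gaussian smoothing (standard deviation sigma, in pixels) via the FFT."""
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    transfer = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * transfer))

def illumination_orientation(image, sigma=1.0, inner=None):
    """Orientation in [0, 180) degrees from the averaged squared gradient.

    sigma is the differentiating scale; inner optionally restricts the
    average to a centered square of inner x inner pixels.
    """
    smooth = gaussian_blur(np.asarray(image, dtype=float), sigma)
    d_row, d_col = np.gradient(smooth)
    gx, gy = d_col, -d_row                 # y up: rows increase downward
    if inner is not None:
        r0 = (image.shape[0] - inner) // 2
        c0 = (image.shape[1] - inner) // 2
        gx = gx[r0:r0 + inner, c0:c0 + inner]
        gy = gy[r0:r0 + inner, c0:c0 + inner]
    jxx, jyy, jxy = np.mean(gx * gx), np.mean(gy * gy), np.mean(gx * gy)
    # Orientation of the eigenvector with the largest eigenvalue.
    angle = 0.5 * np.degrees(np.arctan2(2.0 * jxy, jxx - jyy))
    return angle % 180.0
```

For the stimuli described in the text, one might call `illumination_orientation(image, sigma=8.0, inner=344)`. Because the result is an orientation, comparisons with settings or true azimuths should be made modulo 180°, as in the text.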

Conclusions and Discussion
The main conclusion from this study is that a deviation from the theoretical assumption of Gaussian geometry does not affect the estimates of human observers systematically. Human observers can estimate the illumination orientation for our more realistic surface profiles containing roughness at a range of scales. Moreover, the presence of a range of scales, instead of a single scale of the undulations, prevented complaints by the observers. In our previous work on random Gaussian surfaces, we found that observers "did not 'like' the samples because they appear somewhat ambiguous." The stimuli in the current study are probably more pleasant to view because they do look sharp and they do offer a "hold" to the eye (as distinct from the Gaussian surfaces). The results from the current study confirm earlier results using texture images from the CURET database (Curet, 1997). Moreover, as in the case of our study using rendered random Gaussian surfaces, we found that observers were quite capable of estimating the elevation and the azimuthal direction (instead of orientation).
The observers' sensitivity to light source elevation cannot be interpreted as an absolute sensitivity to the height of the light source. Such sensitivity is impossible in view of the bas-relief ambiguity (Belhumeur et al., 1997), which is a basic image ambiguity concerning light source elevation and relief height. The reason must be the statistical homogeneity of the stimulus set. Observers might have used, for instance, the average brightness and shadow fraction to grade the samples and relate them to some equivalent elevation scale for the experiment. In contradistinction to our study on Gaussian textures, we did not find a monotonic relation of the settings with contrast, so the contrast cannot serve as a direct cue in the current study.
Figure 13. Surface illuminance flow estimates for a flat plaster surface, a mountainous area viewed from above, and a mountain viewed from the side. The ellipsoids' semimajor axes represent the estimated illuminance flow orientations. The eccentricity represents the confidence level.
The azimuthal settings in the intermediate and shadowing regimes did not show a 180° ambiguity. Thus, observers were able to estimate the illumination direction, not just the orientation, if cast shadows were present. Probably the difference between cast and body shadows was used as a cue to the illumination direction, resolving the convexity-concavity-illumination-direction ambiguity (see Figure 11). Cast shadows have a sharp boundary, while body shadow boundaries are generally more gradual. Noncollimated lighting might thus influence how well the convexity-concavity-illumination-direction ambiguity can be resolved, because the difference in sharpness of the cast and body shadows might become less clear. The transitions of light-to-dark and dark-to-light in the direction of the tangential component of the light vector are cast shadow edges and body shadow edges. The asymmetric shapes of the shadow and light patches might be another cue for this resolution. In the case of our more natural Brownian surfaces, these differences between cast and body shadows can be much less salient due to the nonlinear combination of such image effects on a range of scales. Figure 12 shows a visualization of how small-scale shadows may mask the large-scale differences between cast and body shadows in the sharpness of the gradients (sharp vs. more gradual boundaries) and in the asymmetric shapes of the shadow patches. This masking effect is striking in the intermediate regime of the center images in Figure 12. In addition, in our stimuli, very large-scale cast shadows may put smaller ones in shadow and decrease this masking effect; this effect is very clear in the right images of Figure 12. However, despite these complicating factors, which are omnipresent in most natural materials, observers proved quite capable of using the shadowing cues to resolve the 180° ambiguity.
It remains to be answered whether the same resolution due to cast shadows occurs in scenes with generic content, that is, objects instead of 3D texture.
In all regimes, we found that the observers' estimates were accurate in terms of orientation, which suggests that they used shading as well as shadowing cues. The second-order statistics correlated well with the observers' settings for stimuli in all three regimes, especially at lower scales. The decrease of the correlations for increasing scale suggests that shading and shadowing cues at smaller scales are needed to arrive at the observed accuracy of the observers' settings and may be combined with large-scale cues, especially in the intermediate and shadowing regimes. It would be interesting to study this point in more detail in combination with eye tracking, to see whether observers look at specific locations in the image. A mechanism combining shading and shadowing cues at a range of scales is of course very convenient with regard to light field estimates in natural scenes.
Our results show that it is very plausible that ensembles of illuminance flow estimates are an important cue to the light field in natural scenes. Figure 13 shows the computational gradient-based illuminance flow estimates for three photographs: a flat piece of plaster, a mountain area seen from above, and a mountain seen from the ground. The flow estimates are represented by ellipses, with the orientation of the major axis representing the estimated irradiation direction and the eccentricity representing the confidence. On the basis of our findings, we hypothesize that weighted combinations of such flow estimate ensembles at multiple scales are probably an important cue for the visual light field. Recent findings show that the visual light field is simplified in comparison to the physical light field and that observers are sensitive to converging, diverging, and uniform fields (Kartashova, Sekulovski, de Ridder, te Pas, & Pont, 2016; van Doorn, Koenderink, Todd, & Wagemans, 2012), which suggests that such relatively simple topologies might represent templates for (often more complicated) natural light fields. Also, it was shown that scene layout and object properties can influence illumination estimates (Schutt, Baier, & Fleming, 2016; Xia, Pont, & Heynderickx, 2016), which is to be expected if the visual light field is inferred from shading and shadowing patterns. In future studies, we will further investigate the extrapolation from textures to (perception of) illuminance flow over 3D objects and natural scenes, which is far from trivial due to, for instance, foreshortening and local occlusion effects.
Finally, these observations are of course related to material and shape perception, and not just light. Since we simultaneously infer higher dimensional material, shape, and illumination properties from two-dimensional images, it is to be expected that such inferences interact. Many studies have shown that material, shape, and illumination perception are indeed confounded (Gerhard & Maloney, 2010; Ho et al., 2008; Kim et al., 2014; Motoyoshi et al., 2007; Pont & te Pas, 2006; te Pas & Pont, 2005; Wijntjes, Doerschner, Kucukoglu, & Pont, 2012). Looking at our stimuli in the current study (see, for example, Figure 2), we saw that many of them did not look as if they were made of matte material. Many of them tend to look quite glossy or shiny. We studied this illusory gloss in depth in another article (Wijntjes & Pont, 2010), in which we tested gloss perception for Brownian surfaces as a function of the depth range and illumination direction. We found that an interpretation in the context of the bas-relief ambiguity (Belhumeur et al., 1997) could explain our gloss perception data; on average, perceived gloss increased with increasing relief and decreased with decreasing source elevation.
Interreflections were ignored in that experiment, as well as in the current experiment. We expect that the addition of interreflections (or an ambient term) will not influence our data. We do expect, however, that rendering interreflections will influence the perception of relief; a famous observation in this area of study concerns the perceptual overestimation of the relief of the moon surface by astronauts (Philips, 2006). Unfortunately, we are still lacking experimental methods to probe surface shape. We believe this is currently one of the biggest challenges on the way to a more holistic approach in natural material perception. Simultaneous testing of surface relief height, material reflectance, and illumination perception may well be the only way to fully understand the underlying processes.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by Delft University of Technology.