Spatiotemporal frequency characteristics of the visual unpleasantness of dynamic bandpass noise

Recent psychophysical evidence shows that visual discomfort and unpleasantness are related to particular image features such as the spatial frequency and orientation spectrum. We also have a strong unpleasant feeling toward moving objects such as swarming worms, but it is poorly understood how motion information relates to a feeling of unpleasantness. The present study investigated spatiotemporal frequency characteristics that cause visual unpleasantness using bandpass noise with variable spatial frequencies, temporal frequencies, temporal frequency bandwidths, and orientation bandwidths. Results show that dynamic noise with relatively low temporal frequencies (0.5-2 Hz) was markedly more unpleasant than static noise, including that judged as the most unpleasant in a previous study. Remarkably, translational motion of the noise did not increase the feeling of unpleasantness. A subsequent experiment using a dynamic texture in which elements moved in a variable range of random directions showed that the variegated motion direction plays a critical role in promoting visual unpleasantness. Natural scenes have regularity in that features inside an object usually move in the same direction and rarely at random, and the present results further support the notion that deviation from the statistical regularity of natural scenes in images and movies induces negative emotions.


Introduction
Images of particular objects and surfaces give rise to unpleasant feelings in humans. Whereas such affective responses seem to be evoked by the recognition of an object or material (e.g., rotten food) learned through daily experience, recent psychophysical evidence indicates that they can be partially induced by features and statistics of the image itself. As an example, trypophobia, which is a fear of or feeling of unpleasantness toward clusters of small holes and granules, is known to be related to a specific type of spatial frequency spectrum (Cole & Wilkins, 2013; Le et al., 2015). The Fourier amplitude spectra of trypophobic images tend to peak at a midrange spatial frequency and deviate from the linearity of the so-called 1/f^α spectral falloff, which is prevalent in natural scenes (Geisler, 2008; Simoncelli & Olshausen, 2001; Tolhurst et al., 1992). A similar law is relevant for a wide range of natural surfaces (Motoyoshi & Mori, 2016). Controlled analyses further indicate that excess energy in a specific spatial frequency band evokes a feeling of discomfort or unpleasantness toward artificial images such as images of Gaussian white noise and toward manipulated natural images (Fernandez & Wilkins, 2008; O'Hare & Hibbard, 2011). A different line of studies suggested that the reverse is also true by showing that many paintings with a high aesthetic value tend to follow the linearity and slope in the amplitude spectrum of natural scenes (Graham & Field, 2007; Hagerhall et al., 2004; Juricevic et al., 2010; Redies et al., 2007; Spehar et al., 2003; Taylor et al., 2005; Viengkham & Spehar, 2018). A recent study further suggested that the same principle is relevant to the orientation domain, by showing that a feeling of unpleasantness caused by bandpass noise is more profound when the stimulus has an isotropic orientation spectrum, which is rare in natural scenes (Ogawa & Motoyoshi, 2020).
These findings consistently suggest that humans tend to feel discomfort or unpleasantness when images deviate from the spectral regularity of natural scenes.
In addition to the case of static images, we have a strong feeling of unpleasantness toward dynamic movies of, for example, swarming worms and snakes. Such negative emotions evoked by movies may also be related to specific dynamic characteristics of the movie itself. However, little research has been done regarding the relationship between visual unpleasantness and dynamic aspects of visual stimuli. An exceptional study (Yoshimoto et al., 2017) showed that visual discomfort resulting from the flicker of a homogeneous stimulus depends on how similar the amplitude spectrum of temporal modulation is to the amplitude spectra of natural retinal inputs (Dong & Atick, 1995). However, it is unclear how visual unpleasantness depends on the temporal property of spatial patterns that are usual in natural environments, and how spatial information and temporal information interact with each other. To investigate the effect of spatiotemporal characteristics on visual unpleasantness, we here examine unpleasantness ratings for dynamic noise stimuli having a variety of spatial frequency spectra, orientation spectra, and temporal frequencies (Expt. 1). The results show that dynamic noise is consistently judged more unpleasant than static noise, particularly when the dynamic noise has a relatively low temporal frequency. Replicating the findings of previous studies, noise with a narrow spatial frequency bandwidth and broad orientation bandwidth is judged as unpleasant. We also find that translational motion has no effect on the unpleasantness. Investigating why translational motion has no effect, we find in a subsequent experiment (Expt. 2) that variation in the motion direction plays a critical role in determining visual unpleasantness. Given that local motion signals are usually coherent within a moving object, these results are consistent with the notion that deviations from statistical regularities in dynamic natural scenes can induce a strong feeling of unpleasantness.

Observers
Eleven naïve paid volunteers and one of the authors (NO) took part in the experiment (six females and six males, aged 19-34 years). All observers had normal or corrected-to-normal vision. None had a history of migraines. All the experiments followed the Declaration of Helsinki guidelines and were conducted with permission from the Ethics Committee of the University of Tokyo. All observers provided written informed consent.

Apparatus
For nine observers, visual stimuli were displayed on an LCD monitor (BenQ ZOWIE XL2730) in a dark laboratory room. The background mean luminance was 94.8 cd/m². Owing to the COVID-19 situation, three observers used LCD monitors (BenQ XL2730Z, BENQ XL 2430 T) or an OEL monitor (SONY PVM-A250) set up in a darkened room in their own homes. The background mean luminance for these three observers was 56.5, 53.2, and 71.3 cd/m². All monitors had gamma-corrected luminance as calibrated with a colorimeter (ColorCal II CRS) and a frame rate of 60 Hz. For each observer, the viewing distance was adjusted so that the pixel resolution was 1.4 min/pixel. As a result, the size of the uniform background varied among monitors (from 46.6(W) × 25.6(H) to 60.6(W) × 34.1(H) deg) but was much larger than the target stimuli (6.1 deg in diameter).
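The viewing-distance adjustment can be illustrated with a short calculation. The following is a minimal sketch, not the authors' procedure: the function names and the pixel pitch of 0.233 mm are hypothetical values chosen for illustration, and only the target resolution of 1.4 min/pixel and the stimulus diameter of 6.1 deg come from the text.

```python
import math

def viewing_distance_cm(pixel_pitch_mm: float, arcmin_per_pixel: float = 1.4) -> float:
    """Distance at which one pixel subtends `arcmin_per_pixel` minutes of arc."""
    angle_rad = math.radians(arcmin_per_pixel / 60.0)
    # A pixel of width p viewed from distance d subtends 2 * atan(p / (2 * d)).
    distance_mm = pixel_pitch_mm / (2.0 * math.tan(angle_rad / 2.0))
    return distance_mm / 10.0

def stimulus_diameter_px(diameter_deg: float = 6.1, arcmin_per_pixel: float = 1.4) -> int:
    """Stimulus diameter in pixels at the given angular resolution."""
    return round(diameter_deg * 60.0 / arcmin_per_pixel)

# Hypothetical pixel pitch (mm); the actual value depends on the monitor.
distance = viewing_distance_cm(0.233)
diameter = stimulus_diameter_px()
```

Because the angular resolution is fixed, monitors with a different pixel pitch require a different viewing distance, while the stimulus occupies the same number of pixels on every display.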

Stimuli
The visual stimuli were dynamic bandpass noise with various spatiotemporal properties (Fig. 1). The noise was presented within a circular window having a diameter of 6.1 deg and the edge of each stimulus was tapered using a cosine wave with a wavelength of 1.5 deg. The root-mean-square contrast was fixed to 0.3, and the mean luminance was equated to that of the uniform background.
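The windowing and contrast normalization can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the edge taper is assumed to be a half-cycle raised cosine (0.75 deg for a 1.5-deg wavelength), RMS contrast is taken as the standard deviation of luminance divided by the mean luminance before windowing, and the function names `circular_window` and `to_luminance` are hypothetical.

```python
import numpy as np

def circular_window(size_px: int, diameter_px: float, ramp_px: float) -> np.ndarray:
    """Disc that is 1 inside, 0 outside, with a half-cosine ramp at the edge."""
    half = (size_px - 1) / 2.0
    y, x = np.mgrid[0:size_px, 0:size_px]
    r = np.hypot(x - half, y - half)          # distance from the image center
    outer = diameter_px / 2.0
    inner = outer - ramp_px                   # the ramp spans inner..outer
    t = np.clip((r - inner) / ramp_px, 0.0, 1.0)
    return 0.5 * (1.0 + np.cos(np.pi * t))

def to_luminance(noise: np.ndarray, window: np.ndarray,
                 mean_lum: float, rms_contrast: float = 0.3) -> np.ndarray:
    """Scale zero-mean noise to the target RMS contrast, window it, add the mean."""
    noise = noise - noise.mean()
    noise = noise / noise.std() * rms_contrast   # std relative to mean luminance
    # Values outside the displayable range would still need clipping on a real display.
    return mean_lum * (1.0 + noise * window)
```

Outside the window the output equals the mean luminance, so the stimulus blends into the uniform background as described above.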
At a default setting, the bandpass noise had a center spatial frequency of 1.3 c/deg, a spatial frequency bandwidth of 1 octave, a temporal frequency bandwidth of 1 octave, and an infinite orientation bandwidth (i.e., unfiltered). The center orientation was randomly decided for each stimulus. On the basis of these default settings, one of the four parameters was varied as follows. The center spatial frequency was varied from 0.3 to 5.3 c/deg (Fig. 1a); the spatial frequency was filtered with a bandwidth of 1 or 2 octaves, or unfiltered (Fig. 1b); the orientation was filtered with a bandwidth of 30 or 90 deg, or unfiltered (Fig. 1c); the temporal frequency was filtered with a bandwidth of 1 or 2 octaves, or unfiltered (Fig. 1e). For all four conditions, the center temporal frequency of the noise was varied from 0 to 15 Hz (Fig. 1d). Additionally, we used noise that had standard spatial parameters and translationally moved with a constant velocity of 0-12 deg/s (i.e., a temporal frequency of 0-15 Hz). This drifting noise had a spatiotemporal frequency profile close to that of the dynamic bandpass noise with a temporal frequency bandwidth of 2 octaves except for variation of the motion direction.
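One way to synthesize such noise is to filter white noise in the three-dimensional Fourier domain. The sketch below is a hedged illustration, not the authors' implementation: the filter shape is not specified in the text, so an ideal (box-shaped) band of plus or minus half the bandwidth in octaves is assumed, orientation is left unfiltered, and the names `octave_band` and `bandpass_noise` are hypothetical.

```python
import numpy as np

def octave_band(freq: np.ndarray, center: float, bandwidth_oct: float) -> np.ndarray:
    """Boolean mask: |freq| lies within +/- bandwidth/2 octaves of `center`."""
    lo = center * 2.0 ** (-bandwidth_oct / 2.0)
    hi = center * 2.0 ** (bandwidth_oct / 2.0)
    f = np.abs(freq)
    return (f >= lo) & (f <= hi)

def bandpass_noise(n_frames: int, size_px: int, px_per_deg: float, frame_rate: float,
                   sf_center: float, tf_center: float,
                   sf_bw: float = 1.0, tf_bw: float = 1.0, seed: int = 0) -> np.ndarray:
    """Spatiotemporally band-limited noise movie of shape (frames, y, x)."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((n_frames, size_px, size_px))
    ft = np.fft.fftfreq(n_frames, d=1.0 / frame_rate)[:, None, None]   # Hz
    fy = np.fft.fftfreq(size_px, d=1.0 / px_per_deg)[None, :, None]    # c/deg
    fx = np.fft.fftfreq(size_px, d=1.0 / px_per_deg)[None, None, :]    # c/deg
    mask = octave_band(np.hypot(fx, fy), sf_center, sf_bw)  # isotropic in orientation
    if tf_center > 0:
        mask = mask & octave_band(ft, tf_center, tf_bw)
    else:
        mask = mask & (ft == 0)                              # static noise (0 Hz)
    # The mask is symmetric in frequency, so the inverse transform is real.
    return np.fft.ifftn(np.fft.fftn(white) * mask).real

# Assumed display parameters: 1.4 min/pixel and 60 Hz, as in the Apparatus section.
movie = bandpass_noise(n_frames=32, size_px=64, px_per_deg=60.0 / 1.4,
                       frame_rate=60.0, sf_center=1.3, tf_center=2.0)
```

Because the temporal band is applied to the absolute temporal frequency, energy is distributed over all motion directions, which is the property that distinguishes this noise from the translating noise.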

Procedure
Using a rating scale method, we measured the degree of unpleasantness for each stimulus. In each trial, the stimulus was presented for a duration of 2100 ms, during which the onset and offset were tapered using a cosine wave with a wavelength of 267 ms. The observer's task was the same as that in our previous study (Ogawa & Motoyoshi, 2020). Observers freely viewed the stimulus and answered the question "Was the noise unpleasant?" on a nine-point scale that varied from "Not at all" (0) to "Very" (8). All instructions were given in Japanese. Each observer rated a total of 55 stimuli in random order in a single block, and the block was presented five times.
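The temporal envelope of a trial can be sketched as follows. This is an illustration under assumptions, not the authors' code: the onset and offset ramps are each taken to be a half cycle of the 267-ms cosine (about 8 frames at 60 Hz), and the function name `temporal_envelope` is hypothetical.

```python
import numpy as np

def temporal_envelope(duration_ms: float = 2100.0, ramp_wavelength_ms: float = 267.0,
                      frame_rate: float = 60.0) -> np.ndarray:
    """Per-frame contrast multiplier with raised-cosine onset and offset ramps."""
    n_frames = int(round(duration_ms * frame_rate / 1000.0))
    # Assumption: each ramp lasts half a cosine cycle.
    ramp_frames = int(round(ramp_wavelength_ms / 2.0 * frame_rate / 1000.0))
    env = np.ones(n_frames)
    phase = (np.arange(ramp_frames) + 0.5) / ramp_frames   # sample at frame centers
    ramp = 0.5 * (1.0 - np.cos(np.pi * phase))
    env[:ramp_frames] = ramp          # fade in
    env[-ramp_frames:] = ramp[::-1]   # fade out
    return env

envelope = temporal_envelope()
```

Multiplying each frame's contrast by the corresponding envelope value avoids abrupt luminance transients at stimulus onset and offset.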

Results
The left panel in Fig. 2a shows the mean unpleasantness rating of the bandpass noise as a function of temporal frequency. Different colors represent the results for different center spatial frequencies. The right panel replots the same data against the center spatial frequency. For all spatial frequencies, the unpleasantness rating increases with temporal frequency, peaks at around 0.5-2 Hz, and then decreases at higher temporal frequencies. The unpleasantness ratings for dynamic stimuli (>0 Hz) are higher than the unpleasantness ratings for static ones (0 Hz), and some of the dynamic stimuli in fact had spatial parameters identical to those of the stimuli that received the most unpleasant ratings in a previous study (Ogawa & Motoyoshi, 2020). This suggests that dynamic information robustly enhances the feeling of unpleasantness. The unpleasantness rating seems to be high at around a spatial frequency of 0.5-2 c/deg, but this is not a clear trend. A two-way repeated measures analysis of variance (ANOVA) performed with the spatial frequency and temporal frequency as factors showed a significant main effect of the temporal frequency (F(4, 275) = 15.7; p < 0.0001; η_G^2 = 0.25), a significant interaction between the two factors (F(16, 275) = 3.57; p < 0.0001; η_G^2 = 0.03), and a significant but weak main effect of the spatial frequency (F(4, 275) = 3.15; p = 0.023; η_G^2 = 0.078). Fig. 2b shows that the rating for stimuli having a narrow spatial frequency bandwidth (1 octave) is similar to that in Fig. 2a. However, the rating is generally low for wider spatial frequency bandwidths, which is consistent with the results of a previous study examining only static stimuli (Ogawa & Motoyoshi, 2020).
A two-way repeated measures ANOVA performed with the spatial frequency bandwidth and temporal frequency as factors showed significant main effects of both the spatial frequency bandwidth (F(2, 165) = 75.8; p < 0.0001; η_G^2 = 0.63) and temporal frequency (F(4, 165) = 10.5; p < 0.0001; η_G^2 = 0.21) and a significant interaction between the two factors (F(8, 165) = 8.14).
The opposite pattern of results is obtained when the orientation bandwidth is varied (Fig. 1c). The rating is higher for wider orientation bandwidths (90 deg and unfiltered) and increases at temporal frequencies around 0.5-2 Hz. The rating is generally low for a narrow orientation bandwidth (30 deg). This effect of the orientation bandwidth agrees with previous results for static stimuli (Ogawa & Motoyoshi, 2020). A two-way repeated measures ANOVA performed with the orientation bandwidth and temporal frequency as factors showed significant main effects of both the orientation bandwidth (F(2, 165) = 9.41; p < 0.005; η_G^2 = 0.23) and temporal frequency (F(4, 165) = 15.4; p < 0.0001; η_G^2 = 0.23) and a significant interaction with a small effect size between the two factors (F(8, 165) = 4.31; p < 0.001).
Fig. 2d shows a clear tendency for the unpleasantness rating to be higher for dynamic noise with a narrow temporal frequency bandwidth than for temporal broadband noise. (As the broadband noise has no definition for the center temporal frequency, the same rating value is plotted for all temporal frequencies.) A two-way repeated measures ANOVA performed with the temporal frequency bandwidth and temporal frequency as factors showed a significant main effect of the temporal frequency (F(4, 165) = 14.3; p < 0.0001; η_G^2 = 0.18), a significant interaction of the two factors (F(8, 165) = 11.6; p < 0.0001; η_G^2 = 0.11), and a small but significant main effect of the temporal frequency bandwidth (F(2, 165) = 3.5; p = 0.049; η_G^2 = 0.05).
Fig. 2e shows the unpleasantness rating obtained for bandpass noise that translationally moved at various temporal frequencies. Although the temporal frequency property of this stimulus was similar to that of the dynamic bandpass noise in Fig. 2d except that the bandpass noise involved various motion directions, the unpleasantness rating did not increase with temporal frequency at all. This demonstrates that, in contrast to the temporally filtered random noise, the movement of spatial noise in the same direction has no effect on visual unpleasantness. A one-way ANOVA performed for the temporal frequency did not show a significant main effect (F(3, 47) = 1.03; p = 0.39; η_G^2 = 0.015).
In summary, we found that dynamic bandpass noise is more unpleasant than static bandpass noise, particularly when the dynamic bandpass noise has a concentrated temporal frequency at around 0.5-2 Hz (Fig. 2a-d). Remarkably, the dynamic information of translational motion has no effect on visual unpleasantness.

Experiment 2
The results of Expt. 1 indicate that dynamic bandpass noise with a narrow bandwidth in the spatiotemporal frequency domain produces more profound visual unpleasantness than static bandpass noise. However, translational motion in a single direction had no effect on visual unpleasantness at all, even though the drifting noise had a temporal frequency property similar to that of band-passed random noise. This leads us to the notion that the movement of image features in variegated directions plays a critical role in enhancing visual unpleasantness. In an additional experiment, we directly examined this possibility using a dynamic texture stimulus for which we can control variation of the motion direction.

Method
The visual stimulus was a dynamic texture with a diameter of 6.1 deg, composed of isotropic Gabor patches (Fig. 3). Each isotropic Gabor patch was a radial cosine grating of 1.3 c/deg windowed using a Gaussian with a standard deviation of 0.4 deg. During the presentation of 2100 ms, 256 Gabor elements appeared at random initial positions and moved with a specific direction and speed. The speed of all elements was constant at 0, 0.2, 0.8, 3, or 12 deg/s, which was equal to 0, 0.3, 0.9, 3.8, or 15 Hz, respectively, in terms of the temporal frequency. The motion direction of each element was determined in accordance with a normal distribution having a specific mean and standard deviation (motion direction SD) of 0 deg, 30 deg, or infinity (Fig. 3). The mean motion direction was randomly decided in each trial. The stimulus with an infinite motion direction SD, in which elements moved in completely random directions, was perceptually similar to the isotropic and spatial-frequency narrow-band noise in Expt. 1 except that it appeared to have slightly lower density.
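The manipulation of motion-direction variability can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: an infinite direction SD is implemented as a uniform distribution over 0-360 deg, element positions are advanced once per frame at 60 Hz, and the names `sample_directions` and `step_positions` are hypothetical.

```python
import numpy as np

def sample_directions(n_elements: int, mean_deg: float, sd_deg: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Motion directions in degrees (0-360) drawn around a mean direction."""
    if np.isinf(sd_deg):
        # Assumption: an infinite SD is treated as completely random directions.
        return rng.uniform(0.0, 360.0, n_elements)
    return (mean_deg + sd_deg * rng.standard_normal(n_elements)) % 360.0

def step_positions(xy_deg: np.ndarray, directions_deg: np.ndarray,
                   speed_deg_s: float, frame_rate: float = 60.0) -> np.ndarray:
    """Advance element positions (n, 2) by one frame at a constant speed."""
    theta = np.deg2rad(directions_deg)
    step = speed_deg_s / frame_rate   # displacement per frame in deg
    return xy_deg + step * np.stack([np.cos(theta), np.sin(theta)], axis=1)

rng = np.random.default_rng(0)
mean_direction = rng.uniform(0.0, 360.0)            # random mean direction per trial
directions = sample_directions(256, mean_direction, 30.0, rng)
positions = rng.uniform(-3.05, 3.05, size=(256, 2))  # hypothetical initial layout
positions = step_positions(positions, directions, speed_deg_s=3.0)
```

With an SD of 0 deg all elements share one direction (coherent translation), whereas larger SDs produce the variegated motion that the experiment contrasts with it.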
Eleven naïve paid volunteers and one of the authors (NO) took part in the experiment (four females and eight males, aged 21-34 years). None had a history of migraines. For five observers, visual stimuli were displayed on the same apparatus as used in Expt. 1 in the laboratory. Seven observers used LCD monitors (BenQ XL2730Z, BENQ XL 2430 T, BENQ XL2720B, BenQ ZOWIE XL2735) or an OEL monitor (SONY PVM-A250, SONY PVM 2541A) set up in their own homes. For the seven observers, the mean luminance of the background was 56.5, 53.2, 71.3, 48.8, 79.5, 97.1, and 53.0 cd/m². All monitors were calibrated and gamma-corrected, and the viewing distance was adjusted so that the pixel resolution was 1.4 min/pixel. The uniform backgrounds (from 45.5(W) × 25.6(H) to 60.7(W) × 34.2(H) deg) were much larger than the target (6.1 deg in diameter). Experimental procedures other than the abovementioned were the same as those in Expt. 1. Data for one observer, which had a remarkably different tendency from data of the other 11 observers and from the basic tendency found for all observers in Expt. 1, were not used in the analysis.

Results
Fig. 4 shows the mean unpleasantness rating as a function of the temporal frequency of element motion. Different colors represent results for stimuli with different motion direction SDs. Element motion in the texture substantially increases the unpleasantness rating at temporal frequencies around 1-4 Hz only for stimuli with variegated motion (i.e., direction SDs of 30 deg and infinity). In contrast, little or no increase was found for stimuli with coherent translational motion (i.e., a direction SD of 0 deg).
A two-way repeated measures ANOVA performed with the direction SD and temporal frequency as factors showed significant main effects of the direction SD (F(2, 150) = 387; p < 0.0001; η_G^2 = 0.70) and the temporal frequency (F(4, 150) = 46.7; p < 0.0001; η_G^2 = 0.72) and a significant interaction between the two factors (F(8, 150) = 52.3; p < 0.0001; η_G^2 = 0.53). A one-way repeated measures ANOVA for the case that the direction SD was 0 deg showed only a weak, marginally significant main effect of the temporal frequency (F(4, 50) = 2.54; p = 0.055; η_G^2 = 0.121).

Discussion
To investigate spatiotemporal characteristics of visual unpleasantness in movies, the present study examined the unpleasantness rating of static noise and dynamic bandpass noise for a wide range of spatiotemporal frequencies and various orientational properties. Results showed that dynamic noise with a relatively low temporal frequency (0.5-2 Hz) was consistently rated more unpleasant than static noise. It was also found, however, that the drifting of noise in a fixed direction did not affect visual unpleasantness. A subsequent experiment using a dynamic texture pattern in which elements moved with a particular directional variance confirmed that the visual unpleasantness of a dynamic stimulus readily increases with variation of the motion direction. These results demonstrate that visual unpleasantness of an image is robustly enhanced by dynamic information and particularly by relatively slow motions in random or unpredictable directions.
The relationship between visual unpleasantness and spatial characteristics obtained in the present experiments is largely consistent with the results of previous studies. Experiments employing synthetic stimuli or paintings have demonstrated that stimuli deviating from the 1/f^α spectral falloff, which prevails in natural scenes (Geisler, 2008; Simoncelli & Olshausen, 2001; Tolhurst et al., 1992), and especially those with excess energy at mid-range spatial frequencies within two octaves of 3 c/deg evoke a feeling of unpleasantness (Fernandez & Wilkins, 2008; O'Hare & Hibbard, 2011; Wilkins, 1995; Wilkins et al., 1984). This general tendency was also found in our Expt. 1, in which static stimuli with a center spatial frequency of 1.3, 2.6, or 5.3 c/deg were rated as being more unpleasant than other static stimuli (Fig. 1a). The results for static stimuli in Fig. 1b and 1c more directly replicated a previous finding that the unpleasantness of bandpass noise decreases with spatial frequency bandwidth and increases with orientation bandwidth (Ogawa & Motoyoshi, 2020).
The present data do not indicate that static noises induce little unpleasantness. In fact, static stimuli with a narrow spatial frequency bandwidth and an infinite orientation bandwidth used in the present experiments had almost the same spatial parameters as bandpass noise that received the most unpleasant rating in a previous study (Ogawa & Motoyoshi, 2020). Moreover, this bandpass noise was rated as significantly more unpleasant than the 1/f^α noise that had been the least preferred in other previous studies (Juricevic et al., 2010; Ogawa & Motoyoshi, 2020; Spehar et al., 2016; Spehar & Taylor, 2013; Viengkham et al., 2019; Viengkham & Spehar, 2018). Given these comparisons, some of the static noise stimuli employed in our experiments can be considered among the most unpleasant static noise stimuli ever reported. The present results demonstrate that dynamic stimuli induce an unpleasant impression even more profoundly than these very unpleasant static stimuli. Indeed, the dynamic band-passed noise stimuli used in the present study might be the most unpleasant synthetic stimuli created to date.
While random noise with a specific temporal frequency gave rise to a very unpleasant impression, the translational movement of bandpass noise with similar temporal frequency characteristics did not give rise to unpleasant feeling at all (Fig. 2e). The results of Expt. 2 further suggest that variegated and/or random motion is crucial for enhanced unpleasantness in dynamic stimuli.
Why is the variation of motion so important to visual unpleasantness? A number of previous studies on static stimuli commonly suggest that a feeling of unpleasantness, or discomfort, is essentially brought about by visual stimuli that deviate from the statistical regularity of a natural image in terms of the spatial frequency spectrum (Cole & Wilkins, 2013; Fernandez & Wilkins, 2008; Juricevic et al., 2010; O'Hare & Hibbard, 2011; Spehar & Taylor, 2013; Spehar et al., 2016; Viengkham et al., 2019; Viengkham & Spehar, 2018; Wilkins, 1995; Wilkins et al., 1984) and the orientation spectrum (Ogawa & Motoyoshi, 2020). Such deviation can reduce the efficiency of visual processing and increase the neural workload, as supported by evidence showing that an enhanced hemodynamic response is correlated with the subjective discomfort of a visual stimulus (Bargary et al., 2015; Le et al., 2017).
It is possible that a similar ecological principle is relevant for the variation of the motion direction. Objects in a natural environment are usually rigid. When a rigid object moves, image features belonging to that object move in a common direction. As noted in many computational vision studies, the visual system uses this rigidity constraint to estimate the motion and depth of an object (Ullman, 1979). In other words, image features within an object rarely move in random directions. The dynamic noise found to be unpleasant in the present study had motion characteristics that deviate from this regularity. In the real world, a swarm of wriggling worms or snakes or a fluid are examples of such deviant visual stimuli. Indeed, the example of worms or snakes evokes terribly unpleasant feelings. Some high-viscosity liquids, such as mud or oil, are likely to give rise to a negative impression; this is consistent with the present result that relatively slow movement at 0.5-2 Hz causes an especially strong feeling of unpleasantness.
However, it should be noted that the above idea is not yet supported by a rich body of evidence. At present, the statistical regularities of motion in movies of natural scenes are poorly understood (Dong & Atick, 1995), in contrast to the good quantification of the regularity of a static natural scene (Geisler, 2008; Simoncelli & Olshausen, 2001; Tolhurst et al., 1992). Specifically, motion statistics of unpleasant natural movies, such as those of wriggling worms, are not as well known as statistics of static unpleasant images (e.g., Cole & Wilkins, 2013). It is also unclear if and how our observers rated the unpleasantness of a noise pattern by associating it with a particular unpleasant movie. Whereas the rating data of our observers showed little in the way of individual differences, it is not impossible that some people are even pleased with a movie of writhing worms. It has been reported that many people do not judge images of food as unpleasant even though food images often involve image statistics related to unpleasantness (Mori & Motoyoshi, 2017, in an abstract form). To gain further insight, future investigations may analyze more directly the relationships among unpleasantness, natural motion statistics, and dynamic scene categories.
It is unclear why the feeling of unpleasantness peaked at relatively low temporal frequencies under many conditions; i.e., around 0.5-2 Hz in Expt. 1 and 1-4 Hz in Expt. 2. Several previous studies suggested that visual discomfort is related to the concentration of contrast energy at spatial frequencies around 1-5 c/deg (Fernandez & Wilkins, 2008; O'Hare & Hibbard, 2011), at which the early visual system has the highest contrast sensitivity (Shepherd, 2001; Wilkins et al., 1980). Many studies also show that temporally changing patterns (e.g., flicker and gratings) cause illusions and epileptic responses in participants (Harding & Jeavons, 1994; Wilkins, 1995) at temporal frequencies of around 15 Hz, at which the early visual system is most sensitive (Kelly, 1979; Robson, 1966). According to these findings, one may expect that the visual unpleasantness of our dynamic noise (having spatial frequencies of 0.3-5.3 c/deg) should have peaked at temporal frequencies of 8-20 Hz, which is much higher than the frequencies of 0.5-4 Hz that we found. It is difficult to ascribe this discrepancy to the spatiotemporal contrast sensitivity of the early visual system as measured with the detection threshold. One possible reconciliation of the discrepancy is that the unpleasantness referred to in the present study was an emotional dimension different from discomfort, which could even involve pain and epileptic responses. If this is the case, future studies should carefully distinguish such measures as unpleasantness, preference, and discomfort.
Meanwhile, the importance of the variegated motion direction seems to support the idea that the global statistics of motion signals, but not the temporal modulation of the luminance input, are critical. The unpleasantness of stimuli with random motion might be related not to local motion energy in the primary visual cortex but to spatial integration of motion signals in higher-order visual motion areas including the middle temporal visual area (Britten et al., 1993) and the superior temporal sulcus, which is sensitive to biological motion stimuli (Grossman et al., 2000; Frith et al., 2003). However, this hypothesis is not consistent with the fact that the unpleasantness peaked in a particular temporal frequency range regardless of the center spatial frequency (Fig. 2a): if the high-level motion representation were exclusively responsible, the unpleasantness would have depended on the velocity rather than on the temporal frequency. The data as a whole may therefore reflect a mixture of contributions from different levels of motion representation and from flicker-related representations.

Author contributions
NO and IM designed the experiment. NO collected and analyzed the data. NO and IM wrote the manuscript.