Introduction

The visual system faces a continuous challenge of deciding which parts of the world belong together and which do not. Visual stimulation from different parts of the retinal image must be segregated and grouped into figure and ground. Moving stimuli are especially tricky because of the inherent ambiguity of local motion signals, known as the aperture problem (Adelson & Movshon, 1982; Caplovitz, Paymer, & Tse, 2008; Nakayama & Silverman, 1988a, b). Form and perceptual organization operations provide important information that helps the motion system resolve the aperture problem (McDermott & Adelson, 2004). The inverse is also true; motion processes can influence image segmentation and form perception. For example, in structure-from-motion displays, motion can transform a flat, featureless pattern of dots so that it can give rise to a rich experience of object form and 3-D structure (Treue, Husain, & Andersen, 1991; Wallach & O’Connell, 1953). Similarly, in biological motion point-light walker displays, motion greatly facilitates the rapid grouping of dots into a global percept of a moving person (Johansson, 1973). In both cases, motion causes independent image elements to be grouped together, which then gives rise to form interpretations that were not possible when the stimulus was stationary.

Another example can be found in motion-based grouping (Anstis & Kim, 2011). A simple demonstration of motion-based grouping can be derived using a stimulus composed of four pairs of rotating dots. When viewing this stimulus, observers can perceive either a “local” percept of four individually rotating dot pairs or a “global” percept of two translating squares (see Supplemental Demo 1, as well as Anstis & Kim, 2011). In both cases, the dots are grouped according to Gestalt principles of common fate (Köhler, 1969). However, different groupings have different types of motion in common (Anstis & Kim, 2011) and, thus, correspond to distinct perceptual outcomes. Interestingly, the outcome of the grouping process does not just influence the perceived forms of the moving objects, but can also influence the perceived speed of the motion in the display. Specifically, when asked to indicate a general impression of how fast things are moving in the display, observers report that the global percept appears to move more slowly than the local percept (Kohler et al., 2009). It remains unclear, however, exactly why this is the case, and several empirically testable hypotheses can be proposed that might explain why seeing global motion leads to a slowdown in perceived speed.

One possibility is that the global slowdown effect occurs because the grouped percept of the translating squares is perceived to be a larger object than the L-pairs. Brown (1931) found that the perceived velocity of a moving object decreases with increasing size. This could occur if a just noticeable displacement of an object were proportional to its size, as in Weber’s law. Consistent with this possibility, it has been shown that increasing the size of a high-contrast drifting Gabor patch (a sine wave grating presented within a stationary two-dimensional Gaussian window) makes it more difficult to perceive the direction of motion (Tadin, Lappin, Gilroy, & Blake, 2003). The authors suggested that this effect could potentially arise because of center–surround antagonism in the middle temporal area (MT). Although the mechanism remains unclear, these results indicate that motion perception can be critically influenced by stimulus size. Because the two illusory squares in the global configuration are larger perceptual objects than the four L-pairs, the decrease in perceived speed during grouping could be due to increased perceived stimulus size. This would mean that the size of an illusory moving object can affect how its motion is perceived. We are unaware of any study that has shown this to be the case.

Another potential explanation for the slowdown effect is that the local and global percepts may differentially activate rotation-sensitive neurons in the visual cortex. Emergent rotation is perceived in the local, but not in the global, percept. Several experiments have shown that different types of complex global motion yield different speed percepts, although the local motions in the image sequence remain identical. Geesaman and Qian (1996, 1998) found that an array of dots moving in an expanding pattern was seen to move faster than identical dots moving in a rotating pattern. Moreover, a radially moving grating was perceived to move faster than a translating grating, even when the local motion components were made identical (Bex & Makous, 1997). In another experiment, the perceived speeds of identical gratings arranged to form translating and rotating global motion patterns were the same, while radiating (expanding and contracting) patterns appeared to move approximately one third faster (Bex, Metha, & Makous, 1998). In macaques, neurons in the dorsal part of the medial superior temporal area (MSTd) have been found to be specialized for different types of emergent motion signals, such as expansion and rotation (Graziano, Andersen, & Snowden, 1994; Tanaka & Saito, 1989). It has been suggested that differences in perceived speed occur because MSTd neurons tuned to expansion outnumber those tuned to rotation (Geesaman & Qian, 1996). These interpretations are inconclusive, and various alternatives have been suggested (Bex & Makous, 1997; Clifford, Beardsley, & Vaina, 1999), but it is clear that different types of emergent motion signals can lead to differences in perceived speed that presumably reflect differences in underlying processing. Because emergent rotation is perceived in the local, but not in the global, percept, these processing differences could potentially explain our effects.

A third possibility is that the slowdown effect is caused by a difference in perceived image elements between the local and global conditions. Several results have suggested that speed discrimination is, in part, dependent on the parsing of an image into discrete entities and the pooling of information across them over space and time. Spatially separated drifting gratings, when perceptually grouped and perceived as one large, partially occluded grating, have increased speed discrimination thresholds, as compared with when they are perceived as individual image elements (Verghese & Stone, 1995, 1996). These effects are highly dependent on spatial layout (Verghese & Stone, 1997) and are thought to occur because increasing the number of discrete entities from which to estimate speed increases sensitivity to speed (Verghese & Stone, 1995). Further work has shown that perceptual grouping can influence speed perception, potentially because local motion signals are hard to access when they are seen as part of a global motion path (Verghese & McKee, 2006). In a related set of experiments, the motion aftereffect was used to show that there is stronger surround suppression when local motion signals are seen as part of a larger moving object (Tadin, Paffen, Blake, & Lappin, 2008). It is important to note that these studies found effects on speed discrimination and the motion aftereffect, respectively, whereas the global slowdown effect is a reduction in perceived speed. However, if the effects on speed discrimination and the motion aftereffect are caused by the same mechanism, such as surround suppression, that mechanism could also potentially lead to a decrease in perceived speed and explain the global slowdown effect: When four rotating L-pairs are grouped into a percept of two large translating squares, the number of discrete elements that are perceived in the image goes from four to two. With fewer samples from which to estimate speed, the speed sensitivity may decrease, leading to the perceived slowdown when the L-pairs group together.

In this article, we describe the results of three experiments designed to test the predictions made by these three hypotheses. The results of the experiments suggest that none of the above hypotheses is sufficient to explain why the global percept is perceived to move more slowly than the local one. As such, an empirically derived answer to the source of the global slowdown effect remains to be found. However, by ruling out these three plausible explanations, the results presented here place useful constraints on both future research and modeling. In the discussion, we consider some other possible sources of the difference in perceived motion magnitude between the local and global percept cases, including emergent motion signals off of illusory corners or terminators and Bayesian priors on slowness of motion under conditions of uncertainty.

Experiment 1: Slowdown is not due to difference in perceived size

To test the hypothesis that the global slowdown effect is driven by differences in perceived size between the local and global percepts, we varied the distance between the L-pairs and, hence, the size of the illusory squares in the global percept, while keeping everything else constant. This modulates the size difference between the local and the global percepts, because bigger squares mean bigger size differences, while smaller squares mean smaller size differences. If the global slowdown effect arises because of differences in size between the global and local percepts, we would expect that the bigger illusory squares would lead to bigger effect sizes.

Method

Subjects

Nine students at Dartmouth College participated for course credit or cash payment. All subjects were naïve as to the purpose of the experiment, reported normal or corrected-to-normal vision, and gave informed written consent according to the guidelines of the Institutional Review Board of Dartmouth College prior to participating. This was true for all subjects in the experiments reported here, unless otherwise noted.

Apparatus and stimuli

The stimuli used were very similar to the ones used in our previous study (Kohler et al., 2009)—namely, a version of a stimulus designed by Stuart Anstis (Anstis, 2003; Anstis & Kim, 2011), where four rotating dot pairs group together into a percept of two translating squares. Instead of dot pairs, we used L-pairs that could be either in alignment, so that they consistently induced a percept of two squares translating (global configuration, shown on the left side of Fig. 1), or out of alignment, so that a local percept of four independently rotating L-pairs was consistently perceived (local configuration, right side of Fig. 1; see Supplemental Demo 2). In the local configuration, each L was pseudorandomly assigned one of the four possible upright orientations, such that Ls always consisted of a horizontal and a vertical line but could not combine to form the corners of a square, and the starting position of each pair was randomized on every trial. The two versions of the stimulus were thus nearly identical at the local level (observing the motion of only a single L would not reveal whether one was observing the local or the global configuration) but radically different in their tendency to induce the global percept.

Fig. 1

Stimulus configuration used in Experiments 1 and 2. a The central area of the screen around fixation, as seen by the subjects in Experiments 1 and 2. On the left side of the fixation spot, the L-pairs move in-phase, and Ls are oriented so that they form corners of virtual squares, to facilitate a global percept of two overlapping translating squares. On the right side, the L-pairs move out-of-phase, and Ls are randomly oriented, to facilitate a local percept of four independently rotating L-pairs. All distances are reported in degrees of visual angle. Note that L-pairs were white on a gray background. In Experiment 1, the horizontal and vertical distances between L-pairs (indicated with *) were varied to produce four different global square sizes. b The rotational motion used for the L-pairs in Experiment 1. c The nonrotational “bumping” motion used for the L-pairs in Experiment 2. Individual Ls in a pair moved toward each other along a circular trajectory and reversed direction just before overlapping

L-pairs were white (67.02 cd/m²) on a gray (4.09 cd/m²) background, with a white (67.02 cd/m²) vertical bar separating the two sides of the display. The two line segments making up each L measured 0.4° of visual angle in length. Within a pair, the distance between Ls measured 0.8° of visual angle (see Fig. 1). Each stimulus configuration was centered 6.0° of visual angle from the fixation spot. While maintaining central fixation, subjects reported that they could easily see all the moving elements and consistently reported global motion percepts in the aligned configuration and local percepts in the unaligned configuration (see Supplemental Demo 2).

These stimulus parameters were identical to the ones used in the previous study (Kohler et al., 2009) in all regards, except for one critical detail: The horizontal and vertical distances between the L-pairs were varied according to the following four illusory square sizes: size 1 (2 × 1.6° of visual angle), size 2 (3 × 2.4° of visual angle), size 3 (4 × 3.2° of visual angle, shown in Fig. 1), and size 4 (5 × 4° of visual angle). All other distances in the experiment were identical to those used previously.

One global stimulus configuration (reference) was always presented with the L-pairs rotating at the same angular velocity (57.6° of rotation per second) on every trial. This corresponds to a linear velocity of 0.80° of visual angle per second; in the following, we will report angular velocities for all experiments except Experiment 3. This reference speed was chosen because it permitted the perception of both local and global percepts well, while being slow enough that the number of rotations could not bias subjects’ speed judgments. The reference was randomly presented to either the left or the right side of the fixation spot. Another stimulus (test) was simultaneously presented to the opposite side, with L-pairs rotating at one of seven different possible angular velocities, of which one was the reference angular velocity (17.3°/s, 28.8°/s, 51.9°/s, 57.6°/s, 63.4°/s, 69.2°/s, or 98.0°/s). On each trial, the test could be arranged in either the global or the local configuration, and the distance between L-pairs for both reference and test could correspond to size 1, size 2, size 3, or size 4. Stimuli were presented for 1.5 s and then removed from the screen, which meant that the L-pairs in the reference rotated 86.4° per presentation. Subjects were instructed to indicate, via a two-alternative forced choice (2AFC) buttonpress, whether the stimulus on the left or that on the right appeared to be moving faster. The test speed on each trial varied in a pseudorandom order, so that each speed was presented 20 times for both the local and global test conditions, for each of the four illusory square sizes. This resulted in a total of 1,120 trials (20 × 7 × 2 × 4). All four L-pairs rotated in the same direction on a given trial, but rotation direction was randomized across trials.
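The factorial design described above can be sketched in a few lines of Python. This is only an illustration of how the trial count and the reference rotation per presentation follow from the stated parameters; the dictionary layout, the function name build_trials, and the use of a seeded random generator are assumptions made for the example and are not taken from the original experiment code.

```python
import itertools
import random

REFERENCE_SPEED = 57.6   # deg of rotation per second (from the text)
TEST_SPEEDS = [17.3, 28.8, 51.9, 57.6, 63.4, 69.2, 98.0]  # deg/s (from the text)
CONFIGURATIONS = ["global", "local"]  # configuration of the test stimulus
SQUARE_SIZES = [1, 2, 3, 4]           # illusory square size index
REPETITIONS = 20                      # presentations per condition
DURATION_S = 1.5                      # presentation time in seconds


def build_trials(seed=0):
    """Return a shuffled trial list (hypothetical structure, for illustration)."""
    rng = random.Random(seed)
    trials = []
    for speed, config, size in itertools.product(
        TEST_SPEEDS, CONFIGURATIONS, SQUARE_SIZES
    ):
        for _ in range(REPETITIONS):
            trials.append(
                {
                    "test_speed": speed,
                    "test_config": config,
                    "square_size": size,
                    # Reference side and rotation direction vary randomly per trial.
                    "reference_side": rng.choice(["left", "right"]),
                    "rotation_direction": rng.choice(["cw", "ccw"]),
                }
            )
    rng.shuffle(trials)  # pseudorandom presentation order
    return trials


trials = build_trials()
reference_rotation_deg = REFERENCE_SPEED * DURATION_S
print(len(trials))                       # 1120
print(round(reference_rotation_deg, 1))  # 86.4
```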

In our previous study (Kohler et al., 2009), as well as in the other experiments described here, we included a control experiment in which a single L-pair from the reference and test configuration was presented alone, to rule out any differences arising from local features of the stimuli. Because the only difference between this experiment and the ones previously reported was the distance between the L-pairs, such a control experiment would be identical to the one already reported (Kohler et al., 2009) and was, therefore, not included.

For each of the 9 subjects, and for each of the four illusory square sizes, we calculated the proportion of trials on which the test stimulus appeared to move faster than the reference, for both the global and local test stimuli. The data from each observer were then fit using a logit function in MATLAB, from which a point of subjective equality (PSE) was computed, corresponding to the speed at which the test needed to move in order to be perceived as having the same speed as the reference.
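The PSE computation can be illustrated with a minimal Python sketch. The text states only that a logit function was fit in MATLAB, so the code below is a stand-in under stated assumptions rather than the procedure actually used: it fits a two-parameter logistic function by Newton-Raphson maximum likelihood and reads off the PSE as the speed at which the fitted probability of a "test faster" response is 0.5. The function names and the noiseless response counts (generated from a hypothetical observer with a PSE of 45°/s) are invented for the example.

```python
import math


def fit_logistic(speeds, n_faster, n_trials, iterations=50):
    """Fit p = 1 / (1 + exp(-(intercept + slope * speed))) by Newton-Raphson.

    Speeds are standardized internally to keep the 2 x 2 Newton step well
    conditioned; the returned parameters are in the original speed units.
    """
    mean = sum(speeds) / len(speeds)
    scale = math.sqrt(sum((s - mean) ** 2 for s in speeds) / len(speeds))
    z = [(s - mean) / scale for s in speeds]
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, k, n in zip(z, n_faster, n_trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * zi)))
            w = n * p * (1.0 - p)
            g0 += k - n * p          # gradient of the binomial log-likelihood
            g1 += (k - n * p) * zi
            h00 += w                 # negative Hessian entries
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    slope = b1 / scale
    intercept = b0 - b1 * mean / scale
    return intercept, slope


def pse(intercept, slope):
    """Speed at which the fitted probability of a 'test faster' response is 0.5."""
    return -intercept / slope


# Hypothetical, noiseless data: expected counts for an observer whose true
# PSE is 45 deg/s, with 20 presentations per test speed.
speeds = [17.3, 28.8, 51.9, 57.6, 63.4, 69.2, 98.0]
TRUE_PSE, TRUE_SLOPE = 45.0, 0.08
n_trials = [20] * len(speeds)
n_faster = [20 / (1 + math.exp(-TRUE_SLOPE * (s - TRUE_PSE))) for s in speeds]

intercept, slope = fit_logistic(speeds, n_faster, n_trials)
estimated_pse = pse(intercept, slope)
print(round(estimated_pse, 2))
```

A PSE below the reference speed of 57.6°/s, as in this hypothetical case, would mean that the test had to move more slowly than the reference to appear equally fast.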

Results

As was expected, the results of Experiment 1 replicated the global slowdown effect. There was an interaction between the size of the global slowdown effect and global square size, but bigger global square sizes did not lead to bigger effect sizes. We did a repeated measures ANOVA on the PSEs with local or global test configuration as one factor and illusory square size as the other factor, which revealed a main effect of configuration, F(1, 8) = 64.39, p ≤ .005 (effect size: partial η² = .889), a main effect of square size, F(3, 24) = 3.51, p = .031 (effect size: partial η² = .305), and an interaction between configuration and square size, F(3, 24) = 4.93, p = .00827 (effect size: partial η² = .381). Mauchly’s test of sphericity was not significant for any of the main effects or interactions, indicating that sphericity could be assumed. We did a linear within-subjects contrast on the interaction and found a significant effect, F(1, 8) = 13.93, p = .0058 (effect size: partial η² = .635). To further explore this effect, we did paired t-tests for each size, comparing the PSEs when the test stimulus configuration was local and global. We found a significant effect for all four square sizes [size 1, t(8) = 8.54, p < .001; size 2, t(8) = 5.42, p < .001; size 3, t(8) = 6.01, p < .001; size 4, t(8) = 6.68, p < .001]. We calculated the effect size as the PSE decrease in percent between global and local motion for each subject by subtracting the local test PSE from the global test PSE, dividing by the global test PSE, and multiplying by 100. We did this for each of the four sizes and then averaged across subjects. The mean effect sizes for sizes 1–4 were 34.09%, 37.54%, 17.48%, and 27.78%, respectively (see Fig. 2).
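The two effect size measures used here can be written out explicitly. The sketch below implements the percent PSE decrease exactly as it is described in the text, together with the standard identity that recovers partial eta squared from an F ratio and its degrees of freedom. The function names and the example PSE values passed to percent_effect are hypothetical; the F values are those reported above.

```python
def percent_effect(global_pse, local_pse):
    """Percent PSE decrease as described in the text:
    (global PSE - local PSE) / global PSE * 100."""
    return (global_pse - local_pse) / global_pse * 100.0


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Hypothetical PSEs (deg/s) for one subject, for illustration only.
example_effect = percent_effect(global_pse=57.6, local_pse=40.32)

# Reported F ratios from the repeated measures ANOVA and the linear contrast.
eta_configuration = partial_eta_squared(64.39, 1, 8)
eta_interaction = partial_eta_squared(4.93, 3, 24)
eta_linear_contrast = partial_eta_squared(13.93, 1, 8)

print(round(eta_configuration, 3))    # 0.889
print(round(eta_interaction, 3))      # 0.381
print(round(eta_linear_contrast, 3))  # 0.635
```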

Fig. 2

Results of Experiment 1 show that the global slowdown effect does not increase with increasing global square sizes. a Four plots indicating the data and psychometric functions for all four global square sizes. The filled circles and squares represent the mean response rate across subjects, for each test stimulus speed; the curves are the mean of the fit across subjects (dark gray, global test; light gray, local test). Black lines indicate the test speed necessary for the test and reference to be perceived as moving at the same speed (point of subjective equality [PSE]). b PSE measures derived from the psychometric functions, for local and global test stimuli and each of the four global square sizes, presented as degrees of rotation per second. The solid black line indicates the reference speed used in these experiments. The local test PSEs are smaller than the reference speed for all four global square sizes, indicating that the local test stimulus had to move slower in order to be perceived as having the same speed as the global reference. The size of this effect does not increase with the size of the global square. In fact, we see a small effect in the opposite direction, such that the effect is bigger for smaller global square sizes

Discussion

The results of Experiment 1 show that for all square sizes, perceptually grouped global configurations were perceived to move more slowly than local configurations. The significant linear within-subjects contrast indicates that when L-pairs formed bigger illusory squares, the effect size was smaller (smaller decrease in the PSEs between global and local configurations). This is the opposite effect of what we would predict, given that bigger objects are perceived to move more slowly (Brown, 1931). Moreover, although the linear contrast reached significance, the relationship between size and slowdown was not monotonic. This means that although there is an influence of the size of the illusory squares on the slowdown effect, the relationship between illusory square size and effect size cannot account for the global slowdown effect. We conclude that the slowdown effect is not due to differences in perceived size.

Experiment 2: Slowdown occurs in absence of rotation signals

To test the hypothesis that the global slowdown effect was caused by the local and global percepts differentially activating rotation-specific neurons in the visual cortex, we adjusted our original stimulus (Kohler et al., 2009) so that emergent rotation signals were eliminated in the image and examined the effect under those conditions. If the slowdown effect relies on rotation-specific neurons, completely removing rotational motion from the stimulus should eliminate the effect.

Method

Subjects

Eight students at Dartmouth College participated for course credit or cash payment.

Apparatus and stimuli

Everything in our experimental design was identical to that of Experiment 1, except for two things. (1) Here, we kept the distance between L-pairs constant, so that the vertical distance between pairs was 3.2° of visual angle, and the horizontal distance between pairs was 4.0° of visual angle. (2) Instead of each L-pair rotating around a common axis, here, each L translated along a circular trajectory around the shared axis, but in opposite directions, so that when one L moved clockwise, the other moved counterclockwise (see Supplemental Demo 3). Whenever the centers of the two Ls in a pair got within 90° of one another, the Ls reversed and continued along the trajectory in the opposite direction, so that the Ls never actually overlapped. Note that this “bumping” motion of the Ls meant that the stimuli produced no emergent rotation signal, although it was otherwise identical to the stimulus used previously. As with the previous stimuli, we created two versions of the stimuli, a global configuration where the L-pairs were aligned to induce the global percept of two illusory squares and a local configuration where the orientations of each L were chosen pseudorandomly, and the starting position of each pair was randomized on every trial, to induce the local percept of “bumping” L-pairs.
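The “bumping” motion can be made concrete with a short simulation. The sketch below is one reading of the description, not the original stimulus code: it assumes that the two Ls of a pair start on opposite sides of the shared axis (180° apart), that both move at the reference angular velocity of 57.6°/s, and that the reversal is applied on the time step where the angular separation would otherwise drop below 90°. The time step, starting angles, and function names are assumptions made for the example.

```python
def separation(a, b):
    """Smallest angular distance (deg) between two positions on a circle."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def simulate_bumping(angular_speed=57.6, duration=1.5, dt=0.001,
                     min_separation=90.0):
    """Return the angular positions (deg) of both Ls in a pair at every step."""
    a, b = 0.0, 180.0  # assumed starting angles on opposite sides of the axis
    direction = 1.0    # +1: a moves counterclockwise while b moves clockwise
    history = []
    for _ in range(int(round(duration / dt))):
        step = direction * angular_speed * dt
        next_a, next_b = a + step, b - step
        if separation(next_a, next_b) < min_separation:
            # The Ls would come too close: reverse both and step the other way.
            direction = -direction
            step = direction * angular_speed * dt
            next_a, next_b = a + step, b - step
        a, b = next_a, next_b
        history.append((a, b))
    return history


history = simulate_bumping()
closest = min(separation(a, b) for a, b in history)
largest_excursion = max(abs(a) for a, _ in history)
print(round(closest, 1), round(largest_excursion, 1))
```

Under these assumptions each L sweeps back and forth over at most 45° on either side of its starting position, and the two Ls always move in opposite angular directions, so the pair as a whole never rotates.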

For the main experiment (Experiment 2A), one global stimulus configuration (reference) was always presented to either the left or the right side of fixation, with the pairs of Ls translating at the same angular velocity (57.6°/s) on every trial. Another stimulus (test) was simultaneously presented to the opposite side, with L-pairs translating at one of 11 different possible angular velocities, 1 of which was the reference angular velocity (17.3°/s, 28.8°/s, 40.3°/s, 46.1°/s, 51.9°/s, 57.6°/s, 63.4°/s, 69.2°/s, 74.9°/s, 86.4°/s, or 98.0°/s). On each trial, the test could be either the global or the local configuration. Stimuli were presented for 1.5 s and then removed from the screen. Subjects were instructed to indicate, via a 2AFC buttonpress, whether the stimulus on the left or the right was moving faster. Each possible test speed was presented in a pseudorandom order, 20 times for both the local and global test conditions.

As in our original study (Kohler et al., 2009), we ran a second experiment (Experiment 2B) using the same subjects, to control for potential local differences in motion signals produced because Ls were oriented differently across the two configurations. This experiment was the same as Experiment 2A, except that only a single L-pair from each of the configurations used in Experiment 2A was shown, presented equidistant from fixation at mirror-opposite positions around the vertical midline. This experiment serves as a necessary control, because no global percept of squares is possible when there is only one L-pair. Any differences found between the two local and global configurations in control Experiment 2B must be due to differences arising from the orientation of the Ls alone. If these two control stimuli are perceived to be moving at the same speed, any differences observed in Experiment 2A must be due to differences between global and local perceptual organization, and not local differences in the orientations of the Ls.

For each of the 8 subjects who participated in the two experiments, we calculated the proportion of trials on which the test stimulus appeared to move faster than the reference, for both the global and local test stimuli. The data from each observer were then fit using a logit function in MATLAB, from which a PSE was computed, corresponding to the speed at which the test needed to move in order to be perceived as having the same speed as the reference. None of the subjects had PSEs that were more than two standard deviations away from the mean across subjects.

Results

In Experiment 2, we found the same global slowdown effect as in previous experiments: When L-pairs were perceptually grouped into two translating squares, they appeared to move more slowly. We performed a repeated measures ANOVA on the PSEs from the two experiments, with number of Ls as one factor and local or global configuration as the other factor. We found a main effect of number of Ls, F(1, 7) = 16.34, p < .005 (effect size: partial η² = .70), a main effect of configuration, F(1, 7) = 14.24, p < .007 (effect size: partial η² = .67), and a significant interaction between number of Ls and configuration, F(1, 7) = 19.08, p < .003 (effect size: partial η² = .732). To further explore this effect, we did paired t-tests comparing the PSEs when the test stimulus configuration was local and global, for both Experiments 2A and 2B (see Fig. 3). We found a significant difference between the local and global test stimulus PSEs in Experiment 2A, t(7) = 4.31, p = .0035, but not in Experiment 2B, t(7) = 1.88, p = .10. Although we refer here to the configurations used in Experiment 2B as “local” and “global,” it is important to note that in neither case was a global percept perceived. We calculated the effect size in Experiment 2A as described in the results for Experiment 1. The mean effect size between global and local configurations in Experiment 2A was quite large (69.33%), but after removing 2 subjects who had very large effect sizes (245.52% and 133.15%, respectively), the mean effect size was more similar to those found in Experiment 1 (29.33%). This suggests that these two potential outlier subjects may have artificially inflated the effect size. Repeating the entire analysis without these two potential outliers did not change the directionality or significance of any of the results reported above.
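The relation between the two reported mean effect sizes can be checked with simple arithmetic: the mean over the 6 remaining subjects follows from the mean over all 8 subjects and the two excluded values. The variable names in this short Python check are illustrative only.

```python
n_subjects = 8
mean_all = 69.33             # mean effect size over all subjects, in percent
outliers = [245.52, 133.15]  # effect sizes of the two excluded subjects

# Sum over all subjects, minus the excluded values, averaged over the rest.
sum_all = n_subjects * mean_all
mean_remaining = (sum_all - sum(outliers)) / (n_subjects - len(outliers))

print(round(mean_remaining, 2))  # 29.33
```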

Fig. 3

The global slowdown effect without emergent rotation in the image. a Results of Experiment 2A: Percentage of responses where the subjects reported the speed of the test stimulus as faster than the reference stimulus, plotted as a function of the physical speed of the test stimulus. The filled circles and squares represent the mean response rate across subjects, for each test stimulus speed; the curves are the mean of the fit across subjects (dark gray, global test; light gray, local test). The black lines indicate the test speed necessary for the test and reference to be perceived as moving at the same speed (point of subjective equality [PSE]). The right side of the figure shows means of the individually computed PSEs across subjects for both test conditions; the black line indicates the actual speed of the reference (57.6°/s). b Results of Experiment 2B, where only a single L-pair was used. (Error bars represent the standard error of the difference between test conditions in each experiment.) The PSE for the local test condition is significantly smaller than the PSE for the global test condition in Experiment 2A, but not in Experiment 2B

Discussion

The results of Experiment 2 indicate that even in the absence of emergent local rotation signals, the global motion percept appears to move more slowly than the local motion percept and that this effect cannot be explained by local differences in the motion signals arising from the different L orientations. We conclude that the global slowdown effect is not due to emergent rotational motion signals arising in the local, but not the global, condition.

Experiment 3: Slowdown is not due to number of image elements

In order to test the hypothesis that the global slowdown effect arises from differences in the number of perceived image elements between the local and the global percepts, we performed the same experiment with a different stimulus set for which grouping does not decrease the number of image elements but, rather, increases it. If the effect depends on the number of image elements, we would expect this stimulus to yield no effect or, perhaps, even the opposite effect of what we saw in the previous experiments.

Method

Subjects

Fourteen students at Dartmouth College participated for course credit or cash payment.

Apparatus and stimuli

Here, we abandoned the rotating L-pairs and, instead, used “pacmen” created using a thin circular annulus (radius, 0.80° of visual angle; line thickness, 0.20° of visual angle) occluded by a square that had the same color as the background (see Supplemental Demo 4 for the horizontal motion case and Supplemental Demo 5 for the vertical motion case). Each configuration consisted of four pacmen, each with an associated square occluder with an edge length equal to the annulus diameter. The occluders oscillated along a 0.64° of visual angle horizontal or vertical path centered on a position where exactly one quarter of the annulus was occluded and reversed direction whenever they reached either end of the path (see Fig. 4). The motion of the occluders determined grouping: When all four occluders moved in phase either horizontally or vertically, a grouped percept of a large square moving in front of four annuli was perceived (global configuration, five image elements). When the starting position was selected randomly for each occluder so that they moved out of phase, four oscillating pacmen were perceived (local configuration, four image elements). On each trial, all occluders in both the reference and test stimuli always traveled along the same dimension (vertical or horizontal). Unlike the L-pairs, this stimulus resulted in a global configuration that had more image elements (five) than the local configuration (four).

Fig. 4

Stimulus configuration used in Experiment 3. Main figure shows the center section of the display seen by subjects in Experiment 3. On the left side, the occluders move in-phase in order to facilitate a global percept of a square occluding four annuli. On the right side, the occluders move out-of-phase in order to facilitate a local percept of four independently oscillating pacmen. Occluders (outline shown on the upper right pacman on the right side) moved either horizontally or vertically a set distance before returning, and annuli did the same in 1 of 24 evenly spaced directions (illustrated on the upper left pacman on the right side), changing direction every time they returned to their starting position. The maximal distance traveled by occluders and annuli in any direction is indicated on the inset. For illustration purposes, occluders and annuli on the left side are shown in the mid-point of their trajectory, whereas elements on the right side are shown at various points of their trajectory. However, occluders and annuli on both sides moved continuously throughout each presentation. All distances are reported in degrees of visual angle. Note that pacmen were white on a gray background

In order to strengthen the percept of each pacman as a single image element, we added lines (line thickness: 0.20° of visual angle) along the occluding edges so that the occluded annuli literally looked like line-drawing pacmen. Regardless of the configuration, we also made each annulus undergo translational motion in 1 of 24 directions, 15° apart. The motion direction of each annulus was randomly determined, but the speed was always constant and identical to the reference speed (2.40° of visual angle per second; see below). Like the occluders, the annuli moved a certain distance (0.24° of visual angle) before returning to their starting position, at which point a new random direction was selected. In the local configuration, this resulted in a strong percept of four pacmen undergoing jittery motion while, at the same time, undergoing transformations along the “mouth” edges. In the global configuration, the percept was of a square translating in front of four jittering annuli (see Fig. 4 and Supplemental Demos 4 and 5).
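The jitter of the annuli can be sketched as an out-and-back excursion along one of the 24 directions. The Python below is a minimal illustration under stated assumptions, not the original stimulus code: direction 0 is taken to point along the positive horizontal axis, the motion is treated as a triangle wave in distance, and the function and constant names are invented for the example. At 2.40° of visual angle per second, an excursion of 0.24° takes 0.1 s each way.

```python
import math

N_DIRECTIONS = 24
ANNULUS_SPEED = 2.40      # deg of visual angle per second (from the text)
ANNULUS_EXCURSION = 0.24  # deg of visual angle traveled before turning back

# 24 evenly spaced directions, 15 deg apart (0 deg assumed to be rightward).
directions_deg = [i * 360.0 / N_DIRECTIONS for i in range(N_DIRECTIONS)]


def annulus_offset(t, direction_deg):
    """Offset (x, y) from the starting position at time t (in seconds),
    for a single out-and-back excursion cycle along direction_deg."""
    half_cycle = ANNULUS_EXCURSION / ANNULUS_SPEED  # time to reach the far point
    t = t % (2 * half_cycle)
    # Distance grows linearly on the way out and shrinks on the way back.
    distance = ANNULUS_SPEED * (t if t <= half_cycle else 2 * half_cycle - t)
    angle = math.radians(direction_deg)
    return distance * math.cos(angle), distance * math.sin(angle)


x_mid, y_mid = annulus_offset(0.05, 90.0)  # halfway out, moving upward
x_far, y_far = annulus_offset(0.10, 0.0)   # far point, moving rightward
print(round(y_mid, 2), round(x_far, 2))    # 0.12 0.24
```

In the experiment a new random direction was selected each time an annulus returned to its starting position; in this sketch that would correspond to drawing a new entry from directions_deg at the end of every cycle.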

In the main experiment (3A), similar to the previous experiments, one global stimulus configuration (reference) was always presented to either the left or the right side of fixation, with the occluders translating in phase at the same velocity (2.40° of visual angle per second) on every trial. Another stimulus (test) was simultaneously presented to the opposite side, with occluders translating at 1 of 7 different possible linear velocities, centered on the reference linear velocity (0.72, 1.92, 2.16, 2.40, 2.64, 2.88, and 4.08° of visual angle per second). The test stimulus could be either the global configuration or the local configuration and was presented randomly on either the left or the right side. As in previous experiments, the stimulus was presented for 1.5 s, after which subjects were instructed to indicate, via a 2AFC buttonpress, whether the stimulus on the left or that on the right was moving faster. Each possible test speed was presented in a pseudorandom order, 20 times for both the local and global test conditions, resulting in a total of 280 trials (2 × 7 × 20).

As in Experiment 2, we also did a second experiment (Experiment 3B), with the same subjects, to control for potential local differences between single annulus–occluder pairs. This experiment was the same as the one described above, except that only a single pacman (annulus + occluder) was presented in each configuration, equidistant from fixation at mirror-opposite positions around the vertical midline, one drawn from the local and the other from the global configuration. Any differences in perceived speed found between those two conditions must arise from differences in the motion of single annulus–occluder pairs alone. Conversely, if the single annulus–occluder pairs from the two configurations are perceived to move at the same speed, any differences observed in Experiment 3A must be due to global versus local perceptual organization, and not local differences between the different annulus–occluder pairs.

For each of the 14 subjects who participated in the two experiments, we calculated the proportion of trials on which the test stimulus appeared to move faster than the reference, for both the global and local test stimuli. The data from each observer were then fit using a logit function in MATLAB, from which a PSE was computed, corresponding to the speed at which the test needed to move in order to be perceived as having the same speed as the reference. Two of our subjects had PSEs that were more than two standard deviations away from the mean across all subjects and were, therefore, excluded from further analysis.
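The PSE computation described above can be sketched in a few lines. The following is a minimal illustration in Python, not the MATLAB routine used for the reported analysis. The function names (`fit_logistic`, `pse`), the Newton-Raphson fitting procedure, and the example response counts are all assumptions made for illustration and are not taken from the experiment.

```python
import math

def _sigmoid(z):
    # Clamp the argument so math.exp cannot overflow on extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(speeds, n_faster, n_trials, max_iter=100, tol=1e-10):
    """Fit p(x) = sigmoid(b0 + b1 * x) by maximum likelihood (Newton-Raphson)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Gradient (g) and negated Hessian (h) of the binomial log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, k, n in zip(speeds, n_faster, n_trials):
            p = _sigmoid(b0 + b1 * x)
            w = n * p * (1.0 - p)
            r = k - n * p
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break
        # Newton step: solve the 2 x 2 system h * d = g.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def pse(b0, b1):
    """Test speed at which the fitted curve crosses 50% 'test faster' responses."""
    return -b0 / b1

if __name__ == "__main__":
    # Hypothetical counts of "test faster" responses out of 20 trials per speed.
    speeds = [0.72, 1.92, 2.16, 2.40, 2.64, 2.88, 4.08]
    n_faster = [0, 6, 9, 13, 16, 18, 20]
    b0, b1 = fit_logistic(speeds, n_faster, [20] * len(speeds))
    print(pse(b0, b1))
```

A PSE below the reference speed of 2.40° per second would indicate that the test appeared faster than the reference at equal physical speeds.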

Results

In Experiment 3, we found the same global slowdown effect as in previous experiments: When pacmen were perceptually grouped into a square moving in front of four annuli, the stimulus appeared to move more slowly. Our statistical analysis was equivalent to that used in Experiment 2. We performed a repeated measures ANOVA on the PSEs from the two experiments, with number of pacmen as one factor and local or global configuration as the other factor. We found a significant interaction, F(1, 11) = 7.06, p = .022 (effect size: ηp² = .391), between number of pacmen and configuration, with none of the main effects reaching significance. This indicates that the global motion percept appears to move more slowly than the local motion percept and that this effect cannot be explained by local differences in the motion signals arising from individual pacmen. To further explore this effect, we did paired t-tests comparing the PSEs when the test stimulus configuration was local and global, for both Experiment 3A and 3B (see Fig. 5). We found a significant difference between the local and global test stimulus PSEs in Experiment 3A, t(11) = 2.42, p = .034, but no difference in Experiment 3B, t(11) = −0.45, p = .66. Although we refer here to the configurations used in Experiment 3B as “local” and “global,” it is important to note that since only one pacman was presented, in neither case was a global percept possible. We calculated the decrease in PSE in Experiment 3A as described in the results for Experiment 1 and found that the mean decrease in the PSE between global and local configurations in Experiment 3A was 11.51%.
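As a consistency check on the reported effect size, partial eta squared can be recovered from an F statistic and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below applies this identity to the interaction reported above; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Interaction between number of pacmen and configuration: F(1, 11) = 7.06
eta_p2 = partial_eta_squared(7.06, 1, 11)
print(round(eta_p2, 3))  # 0.391
```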

Fig. 5

Slowdown effect without decrease in image elements during grouping. a Results of Experiment 3A, same conventions as in Fig. 3. Plot shows the mean response rate across subjects, for each test stimulus speed, while the curves are the mean of the fit across subjects (dark gray, global test; light gray, local test). The black lines indicate the test speed necessary for the test and reference to be perceived as moving at the same speed (point of subjective equality [PSE]). The right side of the figures shows means of the individually computed PSEs across subjects for both test conditions; the black line indicates the actual speed of the reference (2.40° of visual angle per second). b Results of Experiment 3B, where only a single pacman was used. (Error bars represent the standard error of the difference between test conditions in each experiment.) The PSE for the local test condition is significantly smaller than the PSE for the global test condition in Experiment 3A, but not in Experiment 3B

Discussion

These results show that even when grouping leads to an increase in discrete image elements, rather than a decrease as in the previous experiments, the global motion percept appears to move more slowly than the local motion percept. This effect cannot be explained by local differences in the motion signals arising from the pacmen associated with each stimulus condition. The fact that we found a somewhat smaller decrease (11.51%) in this experiment, as compared with the other experiments reported here, may suggest that those effects were influenced to some degree by other factors, including, perhaps, the number of perceived image elements. When we calculate the decrease in the same way on the analogous data from our previously reported experiment (Kohler et al., 2009), we find a mean decrease of 14.45%, which is closer to the mean decrease in Experiment 3A, as is the mean decrease for size 3 in Experiment 1 (16.98%). However, even these two “reduced” effect sizes, the smallest that we have seen for L-pairs across three experiments, are ~3%–5% larger than the one found in Experiment 3A. Even if these effect size differences are, in fact, due to real differences between Experiment 3 and the other experiments, the fact that we find the global slowdown effect at all in Experiment 3 shows that there must be some other factor beyond the number of perceived image elements that is driving this effect.

General discussion

Local motion is inherently ambiguous. Cells in V1 have small receptive fields and can measure motion only in the direction orthogonal to the locally sampled moving contour. This means that the local motion measurements are compatible with an infinite number of possible real-world motions (Adelson & Movshon, 1982). This ambiguity, known as the “aperture problem” (Nakayama & Silverman, 1988a, b), necessitates that further analysis in areas beyond V1 takes place before an unambiguous motion percept can be computed (McDermott & Adelson, 2004). The visual system must solve the ambiguity by integrating information across space; one potential strategy is to find the intersection of the constraints given by ambiguous signals from different parts of the same object (Adelson & Movshon, 1982). Alternatively, solutions can rely on nonambiguous “trackable features,” such as corners, line terminators (Hildreth & Ullman, 1982), T-junctions (McDermott, Weiss, & Adelson, 2001), and contour curvature (Blair, Goold, Killebrew, & Caplovitz, 2013; Caplovitz, Hsieh & Tse, 2006; Caplovitz & Tse, 2007b). The challenge for the visual system in either case is to determine what parts of the image to integrate and what parts to segregate (Braddick, 1993; Burr & Thompson, 2011): Do two moving contours belong to the same or two different moving objects? Is the terminator motion signal coming from the moving figure, or is it a spurious signal arising due to occlusion? It is critical to combine information across space, but it is also critical to do so correctly (McDermott & Adelson, 2004). Reconciling segregation and integration may involve combining information available at different spatial scales, and across different domains and can potentially rely on the figure–ground organization of the image (Braddick, 1993). 
In fact, the motion system relies critically on information from a number of sources, including nonlocal information arising from operations that carry out form processing, grouping, and perceptual organization (McDermott et al., 2001).

In the present report, we investigate a case where motion-based grouping influences perceived motion. Locally rotating dot pairs can group together to form translating squares (Anstis & Kim, 2011). We have previously used a version of this stimulus to demonstrate that global percepts can be perceived to move more slowly than local percepts (Kohler et al., 2009). Because the local motion in the image remains the same, the output of global form processing associated with the grouped percept of translating large squares must, somehow, influence the perceived speed. Using different stimuli, we have previously shown that perceptual grouping can affect perceived speed (Caplovitz & Tse, 2007a) and motion fading (Hsieh & Tse, 2007) by giving rise to virtual trackable features and emergent motion signals. Unlike the stimuli used in those previous experiments, however, the separation between grouped image elements (dots or L-pairs) can be very large when using Anstis’s stimulus (up to 4.00° of visual angle in the experiments reported here). In this set of experiments, we test and rule out three hypotheses for why the global slowdown effect might occur, relating to perceptual size, emergent rotational motion signals, and the number of perceived discrete image elements.

In Experiment 1, we varied the size of the global motion percept by changing the distance between the L-pairs and found that this manipulation had no influence on the size of the effect. We used four different illusory square sizes, the largest (5° × 4° of visual angle) being 2.5 times bigger than the smallest (2° × 1.6° of visual angle). This is comparable to the range of Gabor patch sizes (0.7°–5° of visual angle) used by Tadin and colleagues (Tadin et al., 2003), and although they did find that the size effect was diminished when gratings were presented in the periphery, there was still a robust effect at eccentricities far greater than the 6.0° of visual angle at which we presented our L-configurations. Brown (1931) did not report the size of his stimuli in degrees of visual angle, but there was a twofold increase between his smallest and largest stimuli. Overall, we believe our stimulus sizes were appropriate for uncovering potential size-related differences in effect size. We find a slight decrease in effect size with increased size, rather than the increase that would be predicted if the global slowdown effect is a result of differences in perceived size between local and global motion percepts. Therefore, we conclude that the global slowdown effect is not related to the perceived size of the stimuli.

In addition, we note that the results of Experiment 1 also rule out normalized displacement as a potential explanation for the effect. Normalized displacement relative to stimulus size is smaller for the percept of global squares, as compared with the percept of individually rotating L-pairs, which could then be another potential explanation for why the effect occurs. Because the distance between the Ls in each pair was the same for all four global square sizes, the normalized displacement would have been smaller for larger squares, so normalized displacement, like perceived size, would predict a bigger effect with larger global squares. This was not what we found, so normalized displacement cannot explain our results.

In Experiment 2, we asked whether the L-pairs had to rotate in order for the global slowdown effect to occur. We found that rotation was not necessary for the effect: When the Ls in each pair were made to translate toward one another along the partial arc of a circular trajectory, the global slowdown effect persisted. This shows that our effects do not depend on specialized rotation-sensitive neurons that code the local percept of rotating L-pairs, but not the global percept of translating squares. While it is clear that neurons exist that are specialized for coding different emergent motion percepts such as rotation and expansion (Graziano et al., 1994; Tanaka & Saito, 1989) and that these differences in coding can have perceptual consequences (Geesaman & Qian, 1996; Thornton & Gilden, 2001), the global slowdown does not appear to be driven by rotational motion signals per se.

In Experiment 3, we replicated the global slowdown effect using a Kanizsa-square-like stimulus where the local configuration has fewer perceived image elements than the global configuration. The global slowdown effect persists with this stimulus, and therefore we conclude that a difference in the number of perceived image entities is an unlikely explanation for the effect. The fact that we can replicate the effect using a stimulus where both the local and the global motion is purely translational also lends further support to our findings from Experiment 2 that emergent rotational motion signals are not necessary for the effect. Furthermore, the results of Experiment 3 extend the generalizability of our results, by showing that our effect is not stimulus specific. Grouping of both our original L-pairs and the pacmen is likely to rely on common fate (Uttal, Spillmann, Stürzel, & Sekuler, 2000), but the pacmen stimulus may also rely on modal completion (Kanizsa, 1979) mechanisms that are largely absent from the L-pairs used in the first two experiments. This suggests that the global slowdown effect may, in fact, be pervasive in situations where local moving elements are grouped together to form a global percept. There is reason to expect that this may be the case, as we will discuss below.

Now that we have ruled out a number of potential explanations for the global slowdown effect, we will consider other hypotheses that may be capable of accounting for our results. Attention has been shown to modulate the appearance of a number of basic stimulus properties (e.g., Carrasco, Ling, & Read, 2004; Liu, Abrams, & Carrasco, 2009; Reavis, Kohler, Caplovitz, Wheatley, & Tse, 2013), among them perceived speed, which was increased by attention (Turatto, Vescovi, & Valsecchi, 2007). Given these results, one would expect that percepts that engage attention more would be perceived to move faster. There is no reason to expect, however, that the percept of rotating L-pairs would engage attention more than the global percept of translating squares. In fact, there is evidence that perceptual grouping requires focused attention (Ben-Av, Sagi, & Braun, 1992; Fang, Kersten & Murray, 2008), although the issue has not been resolved (Moore & Egeth, 1997). If the global percept of translating squares is dependent on attention or simply engages attention more than does the local percept, we would expect the opposite effect of what we find. Because of this, we consider attention an unlikely explanation for the global slowdown effect, although direct empirical tests will be necessary to fully resolve the potential contribution of attention.

One simple model that can account surprisingly well for the data in the first two experiments explains the effect in terms of differences in the magnitudes of emergent motion vectors generated off the moving stimuli. In the global case, the emergent motion vectors arise under the assumption that they are the translating corners of the illusory squares formed by the moving dots in Anstis’s initial demonstration of the effect or the Ls in Experiments 1 and 2. In the local case, the emergent motion vectors arise under the assumption that the dots or L-pairs travel along a circular trajectory. This implies that in the local case, the maxima of the motion vectors for each element (dot or L) are, at every moment, pointing in the direction tangent to the circular trajectory, that is, along the instantaneous motion of each element. Let us set the magnitude of these “local” emergent motion vectors, which are constant over time, to 1 unit. For the global case, the magnitudes of emergent motion vectors normal to the sides of illusory squares at the corners vary sinusoidally or cosinusoidally over time, with a maximum value of 1 and a minimum value of 0 on each edge. The mean maximum motion magnitude over one cycle will, therefore, be the area under the maximum of the absolute value of the sine or cosine of the angle (see Fig. 6). This is the gray shaded area under the curves shown in the bottom half of Fig. 6. The area under the curve can be computed very easily by noting the symmetry under the curve between 0 and π/4, between π/4 and π/2, and so on. It is therefore only necessary to compute the integral for cosine between 0 and π/4. Since the integral of cosine is sine, evaluating it between the limits 0 and π/4 yields sin(π/4) − sin(0) = 0.707. The area under the curve between 0 and 2π is 8 times this area because there are eight quarters of π between 0 and 2π. Thus, for the global case, the integral of motion maxima at a corner of the square is 8 × 0.707 = 5.66. The integral of the local case maxima is simply the integral of the constant value 1 from 0 to 2π, which is just 2π. According to this simple “maxima of emergent motion vectors” model, the degree to which the global case should appear slower is 5.66/2π = 0.90, corresponding to a ~10% decrease. This is slightly less than the effect size that we observe in Experiments 1 and 2.
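The model's prediction can also be checked numerically. The Python sketch below integrates the larger of the absolute sine and cosine over one full cycle with a midpoint rule and compares the result with the constant local magnitude of 1. The function name and the number of integration steps are illustrative assumptions.

```python
import math

def emergent_motion_areas(steps=100_000):
    """Midpoint-rule integrals over one cycle of the global and local motion maxima."""
    dtheta = 2.0 * math.pi / steps
    global_area = 0.0
    for i in range(steps):
        theta = (i + 0.5) * dtheta
        # Global case: the larger of |sin| and |cos| at each angle.
        global_area += max(abs(math.sin(theta)), abs(math.cos(theta))) * dtheta
    local_area = 2.0 * math.pi  # local case: constant magnitude of 1 over the cycle
    return global_area, local_area

global_area, local_area = emergent_motion_areas()
print(round(global_area, 2))               # 5.66
print(round(global_area / local_area, 2))  # 0.9
```

The ratio of roughly 0.90 reproduces the ~10% predicted slowdown for the global case.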

Fig. 6

Emergent motion signal model. The schematic in the upper half of the figure shows the emergent motion vectors, indicated with arrows, that arise when local motion (left side of figure) and global motion are perceived. The bottom half of the figure shows how the emergent motion vectors vary as the stimulus elements travel along the circular trajectory. The local emergent motion vectors are constant at 1, while the global emergent motion vectors vary sinusoidally (light gray lines) or cosinusoidally (dark gray lines) over time, with a maximum absolute value of 1 and a minimum absolute value of 0. Because we are interested in vector magnitude regardless of sign, we are plotting the absolute values of the sine and cosine curve. The gray area indicates the area under the curve for the maximum absolute value of the two curves

While this simple model can capture the direction and approximate magnitude of the global slowdown effect in Experiments 1 and 2, the results of Experiment 3 cannot be explained by this type of model, because there is no difference in the emergent motion vectors produced by the local and global conditions, only in the phase relationship between emergent motion vectors. This means that this simple model would predict no effect in Experiment 3, which then serves as a powerful test of this simple model. This does not falsify the model. It merely shows that there must be other factors at play besides emergent motion signals. We can speculate that the smaller effect size we see in Experiment 3 is actually because of the absence of emergent motion signal differences between local and global motion percepts. Future experiments will have to determine whether this is the case, but for now we can conclude, on the basis of Experiment 3, that a simple emergent motion signal model is not sufficient to account for the global slowdown effect in all cases.

Another intriguing possibility is that motion-based grouping introduces additional uncertainty to the display. A Bayesian model of motion perception has been proposed, which assumes that local image measurements are noisy and that observers have a prior preference for perceiving slower, rather than faster, motion (Weiss, Simoncelli & Adelson, 2002). Within this framework, more uncertainty in the locally measured signals will cause the “prior of preferring slow motion” to have a stronger influence on the motion estimate, which can explain illusions where low-contrast stimuli are perceived as moving slower than high-contrast stimuli. Such a model could also explain our results, if motion-based grouping increases uncertainty of motion estimates. Why would this be the case?
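The logic of this account can be illustrated with a deliberately simplified, one-dimensional sketch in Python. It assumes a Gaussian likelihood centered on the measured speed and a zero-mean Gaussian prior favoring slow speeds, in which case the posterior mean is the measured speed scaled by prior variance / (prior variance + measurement variance). The function name and all parameter values are hypothetical and are not fitted to any data reported here.

```python
def posterior_speed(measured_speed, measurement_sd, prior_sd):
    """Posterior mean (also the MAP) for a Gaussian likelihood and a zero-mean Gaussian prior."""
    weight = prior_sd ** 2 / (prior_sd ** 2 + measurement_sd ** 2)
    return weight * measured_speed

# Same physical speed (2.40 deg/s), two hypothetical levels of measurement noise.
reliable = posterior_speed(2.40, measurement_sd=0.5, prior_sd=2.0)
uncertain = posterior_speed(2.40, measurement_sd=1.0, prior_sd=2.0)
print(reliable > uncertain)  # True
```

With the prior held fixed, noisier measurements pull the estimate further toward zero, which is the sense in which added uncertainty would produce a lower perceived speed.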

We have previously shown that motion-based grouping can slow down perceived speed and motion fading, by giving rise to virtual trackable features not present in the image (Hsieh & Tse, 2007; Kohler, Caplovitz, Hsieh, Sun, & Tse, 2010). When the Ls group together into translating squares, the opposite might occur: The unambiguous terminator motion at the line ends of each rotating L may be discarded as the visual system estimates the perceived velocity of the display, because the Ls now form the corners of an illusory square. Similarly, in the Kanizsa-square-style display used in Experiment 3, the unambiguous motion coming from the corners of each pacman may be discarded during grouping because the corners are now seen to arise from a square occluding the four circles. The edges of the illusory squares perceived in the global configuration of both stimuli provide noisier input to the motion system, and therefore the overall uncertainty is increased.

A series of fMRI experiments conducted by Murray and colleagues lends some support to the idea that there might be suppression of local signals during motion-based grouping (Fang et al., 2008; Murray, Kersten, Olshausen, Schrater, & Woods, 2002). They found that when subjects shift from a percept of ungrouped, locally moving image segments to a percept of globally moving shapes, the BOLD signal in the primary visual cortex (V1) decreases, while the signal in object-sensitive lateral occipital cortex increases relative to baseline. This finding is consistent with local motion signals in V1 being suppressed or discarded as grouping occurs and higher-level visual areas begin to code the global motion percept on the basis of emergent motion signals. This loss of low-level local motion signals as a result of grouping may lead to greater uncertainty, which in turn leads to a stronger influence of the prior toward slow motion and a concomitant decrease in perceived speed. If this is the case, it gives rise to some interesting predictions: First, the global slowdown effect should generalize to other stimulus configurations that undergo motion-based grouping. This would include the stimuli used to demonstrate V1 suppression following grouping (Murray et al., 2002), such as the well-known bi-stable diamond created by Lorenceau and Shiffrar (1992). We have made some progress toward demonstrating this in Experiment 3, where we show that the global slowdown effect can occur with a stimulus that is very different from the original dot and L-pairs. Second, the stimuli we used in these experiments and any other stimuli for which slowdown occurs during motion-based grouping should also exhibit a decrease in V1 activity during perceptual grouping. 
Finally, if V1 suppression is indeed related to increased uncertainty of motion estimates and if increased uncertainty leads to the global slowdown effect, we might expect that the subject-wise variability in effect size for the slowdown should be correlated with the subject-wise variability in primary visual cortex deactivation.

In order to fully assess the generalizability of the global slowdown effect, shape also needs to be considered. In all of our experiments on the global slowdown effect, we have used squares, a convex global shape that human observers are highly familiar with. It has been shown that perceiving global motion is more difficult with concave shapes than with convex shapes and that this effect is somewhat independent of familiarity with the shapes (Lorenceau & Alais, 2001). Although both familiarity and concavity could potentially influence perceived speed, it is important to distinguish any such effects from effects on perceptual grouping itself. Anstis and Kim (2011) have demonstrated how rotating dot pairs like the ones used in the original illustration can be made to group readily into other shapes, such as expanding and contracting circles. In our pilot work for these experiments, we have found that dot pairs rotating out-of-phase will frequently group into nonrigid oscillating squares, especially as observers become more familiar with the stimulus. Making to-be-grouped stimuli the same color can also strengthen grouping of less natural stimuli (Anstis & Kim, 2011). Future experiments might take inspiration from these observations, as well as the broad range of stimuli used by Lorenceau and Alais, to construct unfamiliar and/or concave shapes that will readily group, to investigate how the shape of the grouped stimulus influences perceived speed.

We have described two models here, one based on emergent motion signals and another based on a prior toward slow motion. These models are not mutually exclusive. As we have already acknowledged, the emergent motion signal model cannot explain the results of Experiment 3. It could be that both mechanisms can influence perceived global motion speed to various extents, depending on the specific stimulus.

In conclusion, we have shown that the global slowdown effect does not depend on the size of the global percept, the presence of rotational motion signals in the stimulus, or the number of image elements. Our data cannot resolve the question of why the global motion slowdown effect occurs, but they do constrain the range of possible models considerably. We have described two models that do reasonably well at describing our data and the predictions that follow from them. One is very simple and makes very few assumptions beyond the stimulus and its associated percepts but cannot explain the global slowdown effect found in Experiment 3. The other model appears to be more generalizable but forces us to make additional assumptions that are, to some extent, supported in the literature but may not apply to our stimuli. Future experiments will be necessary to test these models and the predictions they make.