The accurate perception of object motion is critical to survival. Although we have made major advances in our understanding of motion perception based on the processing of lateral (2D) motion signals on computer displays, the vast majority of motion in the real (3D) world occurs outside of the plane of fixation, and motion directly towards or away from the observer tends to have particular behavioral relevance.

One insight gained from the study of 2D motion perception is that when two objects with physically identical speeds but different contrasts translate on a computer screen, observers tend to report that the lower contrast object moves more slowly (Thompson, 1982; Stone & Thompson, 1992). This phenomenon has been explained in terms of perceptual inference, whereby the reduction in contrast, and the associated increase in sensory uncertainty, results in a proportionally larger contribution of prior expectations. Because prior experience tells us that most objects in our environment tend to be stationary or move slowly, poorly visible stimuli appear to move more slowly (Yuille & Grzywacz, 1988; Weiss, Simoncelli, & Adelson, 2002; Stocker & Simoncelli, 2006).

Since misperception of the motion of an approaching object can have serious consequences, we would like to know if similar effects occur for perception of 3D motion. Indeed, previous work has reported systematic biases in the estimation of both real and virtual object motion in depth, such that objects appear to move more laterally than they actually do (Harris & Dean, 2003; Welchman, Tuck, & Harris, 2004; Harris & Drga, 2005; Gray, Regan, Castaneda, & Sieffert, 2006; Poljac, Neggers, & van den Berg, 2006; Lages, 2006; Rushton & Duke, 2007; Welchman, Lam, & Bülthoff, 2008; Duke & Rushton, 2012). This lateral bias is thought to arise from the geometry of 3D motion perception and the mechanism for 2D speed perception described above (Welchman et al., 2008).

However, such bias is somewhat puzzling. Everyday behavior does not seem to be routinely affected by laterally biased estimates of object motion. The goal of the current study, then, is to revisit this bias and systematically investigate the accuracy of 3D motion perception.

We first assessed performance under relatively optimal conditions, and subsequently investigated changes in performance under two manipulations of sensory noise. In the first manipulation, we increased sensory noise through reductions in stimulus contrast, similar to manipulations employed in the study of biases in 2D motion perception. In the second manipulation, we varied the target’s position in depth relative to fixation. The shift of target position in depth does not affect the sensory uncertainty associated with the two retinal images, but we reasoned that it should impact sensory uncertainty due to reduced sensitivity to stimulus properties, such as binocular disparity away from the plane of fixation (Westheimer & Tanzman, 1956; Blakemore, 1970; Schumer & Julesz, 1984; Landers & Cormack, 1997). We further reasoned that if both sensory noise manipulations have similar consequences from a perceptual inference perspective, they should have similar impacts on behavioral performance.

To anticipate, we find little evidence for the previously reported bias in 3D motion perception. Instead, we find relatively accurate performance under optimal conditions. In addition, we find a surprising novel phenomenon, whereby observers systematically confuse the direction of a target’s motion in depth (i.e., the observer will report approaching motion when the object is in fact receding and vice versa). Because these confusions become more prevalent with manipulations of both contrast and target position, we conclude that sensory uncertainty in general leads to these systematic confusions of the direction of motion in depth.

Experiment 1: Behavioral performance in a 3D motion extrapolation task (“3D Pong”)

The goal of Experiment 1 was to establish performance under relatively optimal, low-sensory noise conditions. We measured performance with high-contrast targets moving near the fixation plane. Because our initial observations were inconsistent with previously reported biases in the perception of 3D motion, we verified the robustness of this tendency across five additional levels of relatively high target contrast.

Methods

Participants

Five experienced observers, including one author (JMF), and four inexperienced observers participated in the experiment. All had normal or corrected-to-normal vision. The experiments were performed in accordance with the guidelines of The University of Wisconsin - Madison Institutional Review Board, and all observers gave informed written consent.

Apparatus

The experiments were performed using Matlab and the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) on a Windows 7 computer with an Nvidia Quadro 4000 video card. All stimuli were presented on a 29-cm × 51-cm 3D LCD display (Planar, 120 Hz, 1920 × 1080 pixels) at a viewing distance of 90 cm (29.54° × 16.62° of visual angle) such that 1 pixel subtended 0.015° of visual angle. Stimuli were anti-aliased to achieve subpixel resolution. Observers viewed the display through Nvidia 3D shutter glasses, which were synched with the refresh rate of the display (60 Hz per eye). The experiment was conducted in a dark room, with the display being the only source of illumination. The luminance of the display was linearized using standard gamma-correction procedures with mean luminance = 3.85 cd/m2 when viewed through the shutter glasses. Observers used a keyboard to make responses and completed the trials at their own pace.

Stimuli

Observers essentially played a 3D version of the video game Pong. All stimuli were rendered according to the laws of projective geometry (a visual scene was rendered for the left- and right-eye separately using an asymmetric camera frustum in OpenGL). This meant that the visual stimuli contained correct monocular cues (size, looming, perspective) as well as binocular cues (disparity, interocular velocity). However, given that we presented all stimuli on a computer screen, the display lacked correct accommodative/blur cues when stimulus elements moved outside of the fixation plane. We used a chin-rest to maintain head position. To maintain fixation and aid vergence, stimuli were presented within a circular aperture (7.5° in radius), surrounded by a 1/f noise pattern that was identical in both eyes. The background seen through the aperture was black (0.013 cd/m2; Fig. 1a). In addition, a set of Nonius lines was presented within a small 1/f noise patch at the center of the display to further aid vergence and fixation.
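The asymmetric (off-axis) frustum mentioned above can be illustrated with a minimal sketch. It assumes that the 51-cm dimension of the display is its width, that the eyes are offset horizontally from the screen's center line, and that the interocular distance is 6.5 cm; the interocular distance and the function name are hypothetical and do not come from the text. The returned bounds are of the kind that would be passed to OpenGL's `glFrustum`.

```python
def off_axis_frustum(eye_x, screen_w, screen_h, screen_dist, near):
    """Return (left, right, bottom, top) frustum bounds at the near plane
    for an eye displaced horizontally by eye_x from the screen center line.
    All lengths must share the same unit (cm here)."""
    scale = near / screen_dist  # similar triangles: screen plane -> near plane
    left = (-screen_w / 2.0 - eye_x) * scale
    right = (screen_w / 2.0 - eye_x) * scale
    bottom = (-screen_h / 2.0) * scale
    top = (screen_h / 2.0) * scale
    return left, right, bottom, top


IOD_CM = 6.5  # assumed interocular distance; not reported in the text

# Display geometry from the Apparatus section: 51 cm x 29 cm viewed at 90 cm.
left_eye = off_axis_frustum(-IOD_CM / 2.0, 51.0, 29.0, 90.0, near=1.0)
right_eye = off_axis_frustum(+IOD_CM / 2.0, 51.0, 29.0, 90.0, near=1.0)
```

Because each eye's frustum is skewed toward the screen center, the two views share the same projection plane (the display surface), which is what keeps zero disparity at the fixation plane.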

Fig. 1

Details of all three experiments. a Screenshot of the visual stimulus. Observers wore shutter glasses, so that the left- and right-eye images could be fused. The display consisted of a central fixation patch and Nonius lines, a surround 1/f noise texture, a white target (not shown), and an adjustable textured “paddle” (depicted). As the paddle’s position was adjusted by the observer, it rotated about the target’s start position, in the x-z plane. Note that in Experiments 2 & 3, the aperture was mid-gray in color (see those sections below). b Schematic of the experimental task. Observers fixated the center of the display. A white target of variable contrast (depicted black for demonstration purposes) appeared and moved in a random direction in 360° (“Observation”). After moving for 1 s, the target disappeared and a textured paddle appeared. Observers were asked to adjust the location of the paddle around an invisible circle that “orbited” the target’s origin of motion so that it would intercept the target if it continued along its trajectory (“Estimation”). Feedback was not provided. c Computation of setting judgment error. The judgment error on each trial was computed as the circular distance between the midpoint of the paddle setting (“reported direction”) and the true target trajectory endpoint (“presented direction”). d Classification of judgment errors. When observers’ reported directions are plotted against the presented directions, the positive diagonal corresponds to accurate performance. Data points that fall directly on the negative diagonal correspond to reports where the lateral component of the target’s motion is judged accurately, but the motion in depth component is confused. Left plot: Data points that fall within the red regions correspond to incorrect reports of the direction (i.e., approaching (downward facing arrows) vs. receding (upward facing arrows)) of the target’s motion in depth. 
Right plot: Data points that fall within the red squares correspond to incorrect reports of the target’s lateral motion direction (left- vs. rightward)

On each trial, a 0.43-cm diameter dot (“target”), 0.25° at the 90-cm viewing distance, appeared at fixation (i.e., at the midpoint of the screen plane). The target was rendered with one of six contrast levels (Weber fractions: 434.34, 62.16, 44.87, 38.85, 18.55, and 10.28), which corresponded to luminance values of 5.79 cd/m2, 0.84 cd/m2, 0.61 cd/m2, 0.53 cd/m2, 0.26 cd/m2, and 0.15 cd/m2 when viewed through the shutter glasses. (A fully white object was 5.79 cd/m2 when viewed through the shutter glasses.) Note that these Weber fractions are all >1, meaning that all contrast levels in this experiment were significantly larger than the corresponding Michelson contrast (100 %) typically used in 2D motion experiments. We will turn to the effect of Weber fractions <1 in Experiment 2.
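The relation between the two contrast measures can be made concrete with a short sketch. It assumes the standard definitions, Weber contrast as (target − background) / background and Michelson contrast as (max − min) / (max + min), and uses the rounded luminance values reported in the text; the function names are illustrative. Because the inputs are rounded, the computed Weber fractions only approximate the reported ones.

```python
def weber_contrast(target, background):
    """Weber fraction: luminance increment relative to the background."""
    return (target - background) / background


def michelson_contrast(l_max, l_min):
    """Michelson contrast: bounded between 0 and 1 for non-negative luminances."""
    return (l_max - l_min) / (l_max + l_min)


BACKGROUND = 0.013  # cd/m2, black aperture background (rounded value from the text)
TARGETS = [5.79, 0.84, 0.61, 0.53, 0.26, 0.15]  # cd/m2, rounded values from the text

weber = [weber_contrast(t, BACKGROUND) for t in TARGETS]
michelson = [michelson_contrast(t, BACKGROUND) for t in TARGETS]

for t, w, m in zip(TARGETS, weber, michelson):
    print(f"target {t:5.2f} cd/m2  Weber {w:8.2f}  Michelson {m:5.3f}")
```

On a near-black background the Weber fraction is unbounded and grows very large, whereas the Michelson contrast of the same stimulus saturates just below 1.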

The target followed a random trajectory defined by independently chosen random speeds in the x (lateral) direction and the z (motion in depth) direction, with no change in y (vertical direction), before disappearing. Velocities in x and z were independently chosen from uniform distributions with magnitudes ranging from 0.02 to 6.2 cm/s. The velocities were then given a random sign, so that the target approached on approximately 50 % of the trials and receded on the rest. Given the stimulus presentation time of 1 second, the target’s motion produced an average maximum binocular disparity of 0.27°. The independently chosen x- and z-velocity components were critical for preventing observers from adopting strategies that bypass the need for accurate estimation of both x- and z-velocity components from the retinal signals.
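The sampling scheme described above can be sketched as follows. The angle convention is an assumption pieced together from the rest of the text: 0° is rightward (the paddle's initial position) and 270° is head-on approaching motion, so +z is taken to point away from the observer. Function names and the seed are illustrative.

```python
import math
import random


def sample_velocity(rng, lo=0.02, hi=6.2):
    """Draw independent x and z velocities (cm/s): uniform magnitude, random sign."""
    vx = rng.choice((-1, 1)) * rng.uniform(lo, hi)
    vz = rng.choice((-1, 1)) * rng.uniform(lo, hi)
    return vx, vz


def direction_deg(vx, vz):
    """Trajectory direction in the x-z plane, in [0, 360).
    Assumed convention: 0 = rightward, 90 = receding, 270 = approaching."""
    return math.degrees(math.atan2(vz, vx)) % 360.0


rng = random.Random(1)  # arbitrary seed, for reproducibility only
trials = [sample_velocity(rng) for _ in range(200)]
directions = [direction_deg(vx, vz) for vx, vz in trials]
```

Note that sampling the two speed components independently does not make the resulting directions uniform over 360°; it only guarantees that neither component alone predicts the trajectory.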

A 3D rectangular block (paddle) whose faces also consisted of a 1/f noise pattern was positioned within the display at a simulated radial distance of 12.4 cm from the target’s initial start position. During adjustment, the paddle moved along a circular path around the target’s start position in the x-z plane, with the same side of the paddle facing the start position at all times. No translation occurred in y. The 12.4-cm distance of the 1-cm wide paddle was chosen to keep the target’s motion within the fuseable area while providing appreciable depth to the displays. A movie demonstrating the stimulus trial sequence can be viewed in the Supplementary Material.

Procedure

Observers were first provided with written and oral instructions from the experimenter. They then performed 10-15 practice trials in the presence of the experimenter to become familiar with the apparatus. During this time, the observer also was asked to note whether the target was approaching or receding. If observers were unable to make these judgments, reported difficulty themselves, or were judged to have difficulty viewing the stimulus in any other way, they were excluded from further participation (this amounted to <10 % of potential observers for the entire study). All remaining participants completed the experimental trials in two sessions (Experiment 1) and one session (Experiments 2 & 3), and all data collected were included in the subsequent analyses. No feedback was provided for either the practice or experimental trials.

On each trial, the observer fixated the region between the Nonius lines at the center of the screen. The target appeared at fixation and followed a linear trajectory defined by the random independent velocities in x and z chosen for that trial, sometimes appearing to come out of the screen (“approaching”), sometimes appearing to move back into the screen (“receding”). After 1 second, the target disappeared and the paddle immediately appeared. On the first trial, the paddle appeared to the far right of fixation (i.e., at 0°). On subsequent trials, the paddle appeared at the last location set by the observer. The observer was instructed to extrapolate the visible portion of the target’s trajectory and adjust the paddle’s position so that it would have intercepted the target if the target had continued along its trajectory. We asked observers to extrapolate the trajectory so that responses would be based on perceived motion direction rather than some heuristic, such as the location of target disappearance. Explicit visual and verbal instructions, as well as the task familiarization phase, validated that observers understood the task as an extrapolation task. During the setting phase of the trial, the observer’s eyes were free to move about the display. When the observer was satisfied with the paddle setting, they resumed fixation and pressed the spacebar to initiate a new trial (Fig. 1b).

Analysis

The angle corresponding to the midpoint of the paddle setting (reported direction) on each trial provided the observer’s estimate of the target motion direction (presented direction) on that trial. To quantify overall performance, we computed the error for each trial as the circular distance between the presented motion trajectory and the reported paddle position (Fig. 1c) using the CircStats toolbox for MATLAB (Berens & Velasco, 2009). To draw comparisons to the existing literature that has consistently reported lateral biases in observer reports, we took the direction (sign) of the observer’s report relative to the physical trajectory into account (Fig. 1c). Settings that were laterally biased as in previous studies, i.e., biased towards the fixation plane, were assigned positive values. Conversely, settings that were medially biased, i.e., biased towards the vertical plane through the viewing direction (the midsagittal plane), were assigned negative values. Thus, the circular mean of these signed errors provides not only a measure of each observer’s overall accuracy for the particular condition, but also a measure of any systematic biases (lateral or medial) in their estimates: negative values correspond to medial bias and positive values correspond to lateral bias.
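The signed-error computation can be sketched in plain Python. This is not the CircStats implementation; it is one possible reading of the sign rule, in which a report counts as laterally biased when it lies closer to the fixation plane (the x-axis, i.e., 0° or 180°) than the presented direction does. All function names are illustrative.

```python
import math


def circ_dist_deg(a, b):
    """Smallest absolute angular difference between two directions, in [0, 180]."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def angle_from_fixation_plane(theta_deg):
    """Angle between a direction and the lateral (x) axis, in [0, 90]."""
    c = min(1.0, abs(math.cos(math.radians(theta_deg))))
    return math.degrees(math.acos(c))


def signed_error_deg(presented, reported):
    """Positive = lateral bias (report closer to the fixation plane),
    negative = medial bias (report closer to the midsagittal plane)."""
    magnitude = circ_dist_deg(reported, presented)
    lateral = angle_from_fixation_plane(reported) < angle_from_fixation_plane(presented)
    return magnitude if lateral else -magnitude


def circular_mean_deg(angles):
    """Circular mean of angles in degrees, returned in (-180, 180]."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))
```

For example, with a presented direction of 300°, a report at 310° lies closer to the fixation plane and yields a positive (lateral) error, whereas a report at 290° lies closer to the midline and yields a negative (medial) error.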

To summarize performance as a function of sensory uncertainty, we used two basic summary measures of performance: 1) confusions in the motion in depth direction of the target’s motion (i.e., approaching vs. receding); 2) confusions in the lateral direction of the target’s motion (Fig. 1d).
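The two summary measures can be sketched as sign comparisons on the direction components. The sketch assumes the angle convention implied elsewhere in the text (0° rightward, 270° head-on approaching), so that the sine of the angle carries the depth direction and the cosine carries the lateral direction; function names are illustrative.

```python
import math


def depth_confused(presented, reported):
    """True if the reported approaching/receding direction is opposite to the presented one."""
    return math.sin(math.radians(presented)) * math.sin(math.radians(reported)) < 0


def lateral_confused(presented, reported):
    """True if the reported leftward/rightward direction is opposite to the presented one."""
    return math.cos(math.radians(presented)) * math.cos(math.radians(reported)) < 0


def confusion_rates(trials):
    """Percentage of depth and lateral confusions over (presented, reported) pairs."""
    n = len(trials)
    depth = sum(depth_confused(p, r) for p, r in trials)
    lateral = sum(lateral_confused(p, r) for p, r in trials)
    return 100.0 * depth / n, 100.0 * lateral / n
```

A report that falls on the negative diagonal of Fig. 1d, such as 60° for a presented direction of 300°, is classified as a depth confusion with an accurate lateral component.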

Results

We begin by considering performance in the high target contrast condition (Weber fraction = 434.34). Each of the five individual observers’ reported motion direction is plotted as a function of the presented directions in Fig. 2a (O2 is JMF). The between-subject mean signed error in the reports (see Methods section) was 3.39° [bootstrapped 95 % confidence interval (CI) 1.79, 4.91°], revealing a small, but significant, lateral bias. The paddle itself covered a width of 4.63° around the circular path, so that the observers’ paddle intercepted the target on average.
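The text does not specify how the bootstrapped confidence intervals were obtained. As an illustration only, a percentile bootstrap over per-observer mean signed errors might look like the sketch below; the resampling unit, the number of resamples, the use of an arithmetic rather than circular mean (reasonable only for small errors), and the example values are all assumptions rather than the authors' procedure or data.

```python
import random


def bootstrap_ci(values, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_boot):
        resample = rng.choices(values, k=n)  # sample with replacement
        means.append(sum(resample) / n)
    means.sort()
    lo = means[int((alpha / 2.0) * n_boot)]
    hi = means[int((1.0 - alpha / 2.0) * n_boot) - 1]
    return lo, hi


# Hypothetical per-observer mean signed errors in degrees (not the study's data).
example_errors = [1.0, 2.0, 3.0, 4.0, 5.0]
ci_low, ci_high = bootstrap_ci(example_errors)
```

A bias would be called significant under this scheme when the resulting interval excludes zero.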

Fig. 2

Experiment 1 results: Reported 3D motion direction as a function of presented direction for all five observers for the high contrast target condition (Weber Fraction = 434.34). O2 = JMF. a The reported direction based on the midpoint of the paddle setting is plotted against the trajectory direction the target followed on each trial across the full 360° stimulus space (n = 200 per plot). The box inset represents a range of approaching stimuli [−64:64° relative to straight, head-on motion] used in several previous studies. b A closer look at performance within the inset from a, with the data plotted in the same format

To compare performance to previous reports (Welchman et al., 2004; Welchman et al., 2008), we also briefly consider performance in response to target motion within the range of −64:64° relative to head-on motion (270°), denoted by the inset in Fig. 2a and the data in Fig. 2b. The between-subject mean signed error was −4.25° [bootstrapped 95 % CI −5.30, −3.19°], indicating a small medial bias within this subset of presented target motion. That is, the settings were closer to the midline than the true target trajectories, which is inconsistent with the pattern reported in previous studies.

Next, we investigated performance across five additional, reduced but still relatively high, contrast levels. The reported directions for all five observers in each of the six total target contrast conditions are plotted as a function of the presented directions in Fig. 3a. Reduced target contrast is associated with a significant increase in mean signed error (repeated-measures ANOVA, F(5,20) = 6.8, p < 0.001), with bias occurring in the medial direction on average for the higher contrast targets and in the lateral direction on average for the lower contrast targets. However, this effect is driven largely by performance in the lowest contrast condition: removing this condition from the analysis reveals no significant effect of reduced target contrast on mean signed error (F(4,16) = 1.72, p = 0.19). When the lowest target contrast condition is excluded and the between-subject mean signed error is computed for the remaining five target contrast conditions combined, the result is a mean error of 0.10° [bootstrapped 95 % CI −0.62, 0.81°], which does not reveal a significant bias. Thus, the small bias reported for the highest target contrast condition above may simply be due to random variability in performance.

Fig. 3

Combined observer performance in 3D motion in depth estimation for six target contrast conditions. a Reported direction plotted as a function of presented direction in the same format as Fig. 2 for the six target contrast conditions expressed in Weber fractions as viewed through the shutter glasses (n = 1,000 per plot, combined data for 5 observers). The overlaid pale red regions correspond to the regions of confusion when reporting motion in depth direction (left column) and the regions of confusion when reporting lateral motion direction (right column). See Methods and Fig. 1 for more detail. b Mean percentage of trials in which observers confused the target’s direction of motion in depth (i.e., approaching vs. receding). c Mean percentage of trials in which observers confused the target’s lateral motion direction. Error bars correspond to bootstrapped 95 % CIs

Due to the decline in performance for the lowest target contrast condition, reduced target contrast is associated with a significant increase in the tendency to confuse whether the target is approaching or receding (Fig. 3b; repeated-measures ANOVA, F(5,20) = 10.2, p < 0.001). On the other hand, there is no effect of reduction in target contrast on the proportion of trials in which observers confuse the lateral direction of motion (Fig. 3c; repeated-measures ANOVA, F(5,20) = 0.73, p = 0.61).

In contrast to previous work, these results reveal highly accurate performance under high-contrast conditions, with little evidence for systematic biases in the judgment of 3D motion direction. When contrast is reduced, the pattern of responses indicates that observers begin to confuse motion in depth, but not lateral motion, revealing a previously undocumented illusion in the perception of 3D motion. Rather than showing a lateral bias, observers seem to confuse approaching and receding motion. Given the potentially problematic nature of such confusions in the real world, we further explore this phenomenon in the next two experiments. As an aside, O3 and O4 seem to confuse the direction of motion in depth for receding, but not approaching, motion even under optimal conditions. We believe this is due to a separate bias in judging the position of stimuli as a function of contrast. We will return to this point in the discussion.

Experiment 2: Manipulating target contrast

Experiment 1 demonstrated that, over a broad range of target contrasts, 3D motion in depth trajectory extrapolation performance is relatively accurate. However, for the lowest contrast level, larger errors did emerge, so that all observers became more likely to confuse the direction of the target’s motion in depth (i.e., reporting that the target was approaching when it was in fact receding) even when observers’ ability to identify the lateral direction of the target’s motion was not impacted. Having established the type of impact target contrast has on 3D trajectory extrapolation, we designed Experiment 2 to explore this impact at still lower target contrasts. This was achieved by changing the stimulus aperture to mid-gray. We then used three target Weber contrast levels <1.

Methods

Participants

Five observers from the UW-Madison community with intact stereovision as determined by the task acclimation phase (General Methods) participated in the experiment. Two were authors (J.F. and M.R.) and three were new and naïve to the purpose of the experiment. The authors’ performance did not differ from the naïve observers and thus their data are combined in the analyses.

Stimuli

The stimuli were identical to those of Experiment 1 with the exception that the aperture was set to mid-gray with luminance = 3.66 cd/m2 when viewed through the shutter glasses. The target was rendered with one of three Weber fraction contrast levels (0.58, 0.33, 0.17), which corresponded to luminance values of 5.79 cd/m2, 4.87 cd/m2, and 4.24 cd/m2 when viewed through the shutter glasses. The three target contrast levels were counterbalanced and presented in pseudorandom order. Each contrast level was tested 200 times for a total of 600 trials per observer.

Results

The reported trajectory directions for the three target contrast conditions for all observers as a group are plotted in Fig. 4a. With reductions in target contrast, there is a significant increase in the proportion of trials in which observers confuse the direction of motion in depth (Fig. 4b; repeated-measures ANOVA; F(2,8) = 64.08, p < 0.01). Although there also appears to be a slight increase in the proportion of trials in which observers confuse the lateral direction of motion with reduction in target contrast, the effect is not significant (Fig. 4c; repeated-measures ANOVA; F(2,8) = 2.62, p = 0.13). We will discuss this apparent pattern in more detail after the next experiment.

Fig. 4

Combined observer performance in 3D motion in depth estimation for three contrast conditions. a Reported direction plotted as a function of presented direction in the same format as Fig. 2 for the three target contrast conditions expressed in Weber fractions as viewed through the shutter glasses (n = 1,000 per plot, combined data for 5 observers). Left: 0.58; Middle: 0.33; Right: 0.17. b Mean percentage of trials in which observers confused the target’s direction of motion in depth (i.e., approaching vs. receding) as a function of target contrast following the order of a. c Mean percentage of trials in which observers confused the target’s lateral motion direction in the same format as b. Error bars correspond to ±1 SEM

Experiment 3: Manipulating position-in-depth

In the preceding experiments, we demonstrated that reductions in target contrast were associated with (i) an increased tendency to confuse the direction of the target’s motion in depth, and (ii) no significant impact on the ability to estimate the target’s lateral motion direction. We argued that these impacts resulted from increased target motion uncertainty due to increased sensory noise, rather than the contrast of the target per se. If our reasoning is correct, the specific source of the sensory uncertainty should be irrelevant, and other manipulations that increase sensory noise should be associated with a similar impact on observer performance. To test this assertion, we fixed the target contrast to the highest level used in Experiment 2 but varied the reliability of binocular information through shifts in the target’s position in depth relative to the plane of fixation.

Methods

Participants

Ten new, naïve observers from the UW-Madison community with intact stereovision as determined by the task acclimation phase (General Methods) participated in this experiment along with two of the authors (J.F. and M.R.).

Stimuli

The stimuli were identical to those of Experiment 2, with the exception that the target’s start position occurred at one of two new locations (Fig. 5a). For the “in front of fixation” (Near) condition, the target’s start position was shifted 6.2 cm “out of” the display. For the “behind fixation” (Far) condition, the depth component of the target’s start position was shifted 6.2 cm “into” the display. There was no change in the horizontal component of the target’s start position. The fixation point remained at the midpoint of the display. The paddle’s position was shifted in depth so that it continued to circle the origin of the target’s motion. The target was always presented at the maximum Weber contrast fraction value of 0.58 used in Experiment 2.

Fig. 5

Combined observer performance in 3D motion in depth estimation for three position-in-depth configurations. a Reported direction plotted as a function of presented direction in the same format as Fig. 3 for the three start positions: near, at fixation, and far (n = 1,400 per plot for Near and Far; n = 1,000 for At Fixation). Note that the At Fixation data are the data from the high contrast condition of Experiment 2 for comparison. b Mean percentage of trials in which observers confused the target’s direction of motion in depth (i.e., approaching vs. receding). c Mean percentage of trials in which observers confused the target’s lateral motion direction. Error bars correspond to ±1 SEM

Each of the ten naïve observers was assigned randomly to one of the two start position conditions (5 per group). Each completed 200 trials. The two authors participated in both conditions. The authors’ performance did not differ from that of the naïve observers, and thus their data are included in the analyses, yielding a total of seven observers per group. All other experimental details are identical to Experiment 2.

Results

Figure 5a shows the reported direction as a function of the presented direction for all seven observers in each of the two target start position conditions. For comparison, the nonshifted (“At Fixation” start position) data for the five observers from Experiment 2 also are shown. As expected, we find a significant increase in the proportion of trials in which the motion in depth direction of the target was confused for the two shifted target position conditions (F(2,16) = 8.243, p < 0.01; Fig. 5b). The shift in the target position relative to fixation did not significantly alter the proportion of trials in which observers confuse the lateral motion component (F(2,16) = 0.464, p = 0.64; Fig. 5c).

Quantifying the impact of uncertainty on 3D motion estimation

In three experiments, we have provided new insight into the impact of sensory uncertainty on 3D motion estimation. One of the consequences of increased uncertainty, as revealed by performance in our task, is a tendency for observers to misreport the direction of the motion-in-depth component. Although reported lateral motion did not differ significantly across levels of uncertainty, this may simply have been due to the low proportion of lateral motion confusions in our data (<3 %). We therefore tested whether the 3D motion estimation process breaks down more generally with increased sensory uncertainty, such that an effect of uncertainty on reported lateral motion can nonetheless be revealed.

Towards this end, we compared the relative reliability of observers’ estimates of the two motion components (x and z) in our task. Observers’ responses reflect the combination of the perceived lateral and motion in depth speed components of the target’s motion on each trial, so we cannot test the reliability of those estimates separately. However, we can index the uncertainty in the motion-in-depth component on a trial-by-trial basis according to whether or not the observer reported the correct direction of motion in depth. We assume that sensory uncertainty was lower for trials in which the observer reported the correct direction of motion and that sensory uncertainty was higher for trials in which the observer reported the incorrect direction of motion. This binary measure is simply a categorization of a continuum of uncertainty. The lateral judgment error variance conditioned on the accuracy of the corresponding depth judgment provides a measure of uncertainty in lateral motion component estimation.

The extent to which motion-in-depth and lateral measures of sensory uncertainty vary together is indicative of the nature of the breakdown in the 3D motion estimation process. The common source of sensory information (i.e., the retinal motion) used to derive these estimates predicts a correlation between the measures such that uncertainty in the motion in depth direction should be associated with greater variability in lateral judgments. Previous work has reported that observers do not rely on the z-component of motion when reporting trajectory direction (Harris & Dean, 2003; Harris & Drga, 2005). This would result in the two measures not being correlated. Such an outcome would suggest that estimation of motion in depth and lateral motion for the targets in our task share no common mechanisms (which could be the case, for example, if lateral motion is velocity-derived and motion in depth is disparity-derived).

Lateral judgment errors were computed as the horizontal distance between the location at which the true target trajectory would intercept the circular “orbit” on each trial projected directly on the fixation plane, denoted \( x_i \), and the midpoint of the paddle setting projected on the fixation plane, denoted \( {\widehat{x}}_i \) (Fig. 6a). The lateral judgment error variance, conditioned on the accuracy of the corresponding depth judgments, was then computed as:

Fig. 6

Uncertainty in z-estimation is associated with increased uncertainty in x-estimation. a Lateral judgment errors. Top left: Schematic showing the projection of the observer’s paddle setting (pink arrow) and the intersection point of the true trajectory (red arrow) to the fixation plane. Lateral judgment error is defined as the difference in the positions on the fixation plane. Top right: If uncertainty in the estimates of the two motion components is independent, then the variance in lateral judgment errors should be equivalent on trials in which the observer correctly reports the target’s depth direction (red arrow; low uncertainty in motion in depth) with trials in which the observer incorrectly reports the target’s depth direction (pink arrow; high uncertainty in motion in depth). Bottom left: If there is an increase in uncertainty in the depth component estimation associated with an increase in uncertainty in lateral component estimation, the variance of lateral judgment errors will be larger on trials in which the observer incorrectly reports the target’s direction in depth (pink arrow) than on trials when the direction in depth is correctly reported (red arrow). Bottom right: If the observer relies on an estimate of target motion direction on trials in which uncertainty in motion in depth component estimation is low but abandons such estimation and relies on a predictable default location when uncertainty in motion in depth component estimation is high, lower variance of judgment errors is expected for trials in which the observer misreports the depth direction (pink arrow). b Results of the analysis. The lateral judgment error variance for incorrectly reported depth trials is plotted against the lateral judgment error variance for correctly reported depth trials for each observer in each of the conditions for the experiments described above. The symbol colors correspond to the Weber contrast fraction. 
Note that the Near and Far conditions of Experiment 3 have Weber fraction = 0.58, the largest used in Experiment 2 (denoted by the ‘x’ symbols). See the individual experimental sections for the corresponding luminance values. The majority of the data points fall above the identity line, including the group mean (±1 SEM), which is consistent with the predicted results if the uncertainty in the estimates of the two velocity components is correlated and both factor into the variance in judgment errors.

$$ \mathrm{RMS}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left({\widehat{x}}_i-{x}_i\right)}^2} \tag{1} $$
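As a rough illustration of how Eq. 1 could be applied separately to the two categories of trials, the following minimal sketch computes the RMS lateral judgment error for trials with correct and incorrect depth direction reports. The function names and the trial values are hypothetical and chosen only for illustration; they are not data or analysis code from the study.

```python
import math

def rms_error(x_hat, x):
    """Root-mean-square lateral judgment error (Eq. 1).

    x_hat: paddle-setting midpoints projected on the fixation plane
    x:     true intercept locations projected on the fixation plane
    """
    n = len(x)
    return math.sqrt(sum((xh - xt) ** 2 for xh, xt in zip(x_hat, x)) / n)

def rms_by_depth_report(x_hat, x, depth_correct):
    """Apply Eq. 1 separately to trials with correct and incorrect
    motion-in-depth direction reports (hypothetical helper)."""
    hit = [(xh, xt) for xh, xt, ok in zip(x_hat, x, depth_correct) if ok]
    miss = [(xh, xt) for xh, xt, ok in zip(x_hat, x, depth_correct) if not ok]
    rms_hit = rms_error([h for h, _ in hit], [t for _, t in hit])
    rms_miss = rms_error([h for h, _ in miss], [t for _, t in miss])
    return rms_hit, rms_miss

# Made-up values for illustration only (arbitrary units, not study data).
x_true = [0.0, 1.0, -1.0, 2.0, -2.0, 0.5]
x_set = [0.1, 0.9, -1.1, 2.6, -2.8, 1.1]
depth_ok = [True, True, True, False, False, False]

rms_correct, rms_incorrect = rms_by_depth_report(x_set, x_true, depth_ok)
print(rms_correct, rms_incorrect)
```

In this toy example the error is larger on the trials with incorrect depth reports, which corresponds to a data point above the identity line in a plot of incorrect-trial error against correct-trial error.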

The lateral judgment error variances for the two categories of trials (i.e., correct/incorrect motion-in-depth direction reports) for all observers and experimental conditions are plotted in Fig. 6b. The majority of the data points fall above the identity line, including the group mean, indicating that the lateral judgment error variance was larger when the motion-in-depth direction was judged incorrectly than when it was judged correctly under all levels of uncertainty. These results suggest that estimates of an object’s lateral motion component and its motion-in-depth component are subject to a common source of sensory uncertainty. We previously did not find significant differences in lateral motion confusions with manipulations of sensory uncertainty. However, when considered alongside the stereotypical pattern of motion-in-depth direction confusions that emerged across these experiments, these results can be interpreted as providing direct evidence that trial-to-trial sensory uncertainty, as revealed by motion-in-depth confusions, has a similar, albeit much smaller, effect on lateral motion estimates.

We can rule out that the motion in depth direction confusions are the result of a random approaching versus receding response choice, independent of the lateral setting, as would be revealed by the data points falling along the identity line. Lastly, we can further rule out the use of a simple response heuristic in which observers provide a default response on trials in which they are uncertain about the motion in depth direction, as would be revealed by the data points falling below the identity line.

To conclude, the results indicate that the lateral and motion in depth components of the observer’s report are affected by a common source of sensory uncertainty. The effect on the motion in depth component is simply much more apparent than the effect on the lateral component.

Discussion

In the current study, we examined observer reports of the direction of targets moving in depth under manipulations of sensory noise. We uncovered a novel and surprising tendency to confuse the direction of target motion (i.e., confusing approaching and receding motion). These behavioral effects were consistent and systematic across conditions of increased sensory noise, suggesting that confusion in the direction of motion in depth is due to fundamental limitations of the sensory inference of 3D motion, rather than specific properties of the stimuli. These results shed new light on failures of human motion perception and provide a criterion to assess the quality of 3D visualization environments, such as 3D movies or virtual reality devices.

Although initial investigation indicated that the tendency to confuse the lateral component of the target’s motion did not significantly increase with sensory noise, further investigation revealed that these effects are in fact related, such that an increase in uncertainty in motion-in-depth component estimation is associated with an increase in uncertainty in lateral motion component estimation on a trial-by-trial basis. Thus, uncertainty in the two motion components appears to share a common source. It simply seems to be the case that the impact of sensory uncertainty is much more apparent in the observer’s estimation of the z (depth) motion component.

These results contribute to the understanding of the types of errors that occur in 3D motion processing. It has been shown previously that 2D motion appears slower under high-noise conditions, but that even under low-noise conditions 3D motion appears to be systematically laterally biased—i.e., observers report motion as moving more sideways (Harris & Dean, 2003; Welchman et al., 2004; Harris & Drga, 2005; Gray et al., 2006; Poljac et al., 2006; Lages, 2006; Rushton & Duke, 2007; Welchman et al., 2008; Duke & Rushton, 2012). We do not find evidence for such bias in our data. The systematic decline of observer performance under the noisy conditions in our study supports the overall notion that uncertainty in 3D target motion gives rise to errors in the reported direction of motion (Welchman et al., 2008; Lages, 2006), but this seems to have altogether different effects than an increase in lateral bias.

Why might we observe results different from those reported previously? One distinction between our task and those used in most previous studies is that the full 360° space was utilized for both the stimuli and responses (but see Lages, 2006). In previous work, observers were often provided with a response space that was markedly inconsistent with the true set of stimuli. Specifically, the expected stimulus space comprised a wider (i.e., more lateral) range than the actual stimulus space. Observers were typically told that the stimulus could approach at any angle across the 180° range, whereas the stimuli in actuality were constrained within smaller ranges around the midsagittal plane. We hypothesize that invoking prior beliefs about the distribution of possible target motion trajectories is a strategy observers may use in overcoming the uncertainty in estimating motion in depth. We further hypothesize that such a mismatch between one’s prior beliefs and the true distribution of possible target motion trajectories may have been a factor in the lateral estimation biases reported previously. In the context of our study, the range of presented trajectory directions did span the full 360° space; however, we note that the use of uniformly distributed x- and z-velocities establishes anisotropies in the range such that presented trajectories near the 45/135/225/315° directions were more likely than those closer to the cardinal directions. Thus, although we do not see strong evidence in observers’ settings, our observers may nevertheless have developed a prior for such a distribution. Future work focusing specifically on the range of presented trajectory directions, including work that utilizes a truly isotropic distribution of presented trajectory directions, might provide greater insight into the role and impact of prior beliefs on 3D motion direction estimation.
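The anisotropy noted above can be illustrated with a short simulation. The sketch below assumes, purely for illustration, that the x- and z-velocities are drawn independently from the same zero-centered uniform range (here −1 to 1 in arbitrary units); the actual velocity ranges are given in the individual experimental sections. Under this assumption the density of directions is proportional to 1/max(cos²θ, sin²θ), so it is twice as high at the diagonals as at the cardinal directions. The function names, sample size, and the ±10° counting window are hypothetical choices.

```python
import math
import random

def direction_density(theta_deg):
    """Analytic density (per radian) of trajectory direction when
    vx and vz are independent and uniform on the same symmetric range
    (an assumption made for this sketch)."""
    t = math.radians(theta_deg)
    return 1.0 / (8.0 * max(math.cos(t) ** 2, math.sin(t) ** 2))

def count_near(angles_deg, centers_deg, half_width_deg=10.0):
    """Count directions within half_width_deg of any listed center."""
    def near(a, c):
        # Smallest absolute angular difference, in degrees.
        d = abs((a - c + 180.0) % 360.0 - 180.0)
        return d <= half_width_deg
    return sum(any(near(a, c) for c in centers_deg) for a in angles_deg)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
angles = []
for _ in range(100_000):
    vx = rng.uniform(-1.0, 1.0)  # lateral velocity, arbitrary units
    vz = rng.uniform(-1.0, 1.0)  # velocity in depth, arbitrary units
    angles.append(math.degrees(math.atan2(vz, vx)) % 360.0)

n_cardinal = count_near(angles, [0, 90, 180, 270])
n_diagonal = count_near(angles, [45, 135, 225, 315])
ratio_at_peak = direction_density(45) / direction_density(0)
print(n_cardinal, n_diagonal, ratio_at_peak)
```

A truly isotropic set of directions could instead be obtained by sampling the direction angle uniformly and the speed separately, which is one way the future work mentioned above might be set up.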

The directional confusions we report may have been observed previously in motion-in-depth estimation tasks. Lages (2006) established errors in the reported quadrant of the 360° space as a criterion to screen out observers before experimental participation. That is, directional confusions previously have been treated as indications that observers are unable to perform the task. We interpreted these confusions not as failures of stereovision per se, but rather as an index of uncertainty in estimation of the depth component. All our observers easily met the Lages criterion in that they correctly reported the motion in depth direction of the target on 75 % or more of trials for the majority of the contrast levels used. That observers (inexperienced and authors alike) can perform the task and meet the criterion under the best conditions, but start to confuse the direction of motion in depth under poor conditions, indicates that the confusions are a real phenomenon arising from increased sensory uncertainty rather than a fixed characteristic of an observer’s stereomotion acuity.

Secondary to our main effects, observers seemed more likely to report receding motion as approaching under high-contrast conditions and approaching motion as receding under low-contrast conditions. This pattern of results is consistent with a “brighter is closer” heuristic. This heuristic mirrors the relationship between contrast and depth in natural scenes, sometimes known as proximity-luminance covariance (Coules, 1955; Schwartz & Sperling, 1983; Dosher, Sperling, & Wurst, 1986) and recently has been shown to bias observer reports of perceived depth in natural images (Cooper & Norcia, 2014). Thus, in the context of our study, observers may be influenced by a prior expectation of target position in addition to any prior expectations of target motion, especially in cases where sensory uncertainty is already high.

Finally, one could propose that these motion-in-depth confusions are due to observers not seeing the target on certain trials. However, if observers did in fact miss the target on a particular trial, the response on that trial should be random. Random responses would not only yield motion-in-depth direction confusions, but also lateral motion confusions (i.e., misreports of whether the target was traveling leftward or rightward). Importantly, we saw few lateral motion confusions in our data (<3 % of all trials), ruling out that potential explanation for these surprising errors.

Conclusions

This work extends our understanding of motion perception and contributes to a principled approach to understanding errors that arise in 3D motion perception. We have shown that the human ability to estimate motion in depth, and in particular the direction of motion in depth (i.e., approaching vs. receding), is limited by sensory uncertainty, which may arise from a variety of sensory noise sources.