Acta Psychologica

Volume 136, Issue 3, March 2011, Pages 300-310

Integration of disparity and velocity information for haptic and perceptual judgments of object depth

https://doi.org/10.1016/j.actpsy.2010.12.003

Abstract

Do reach-to-grasp (prehension) movements require a metric representation of three-dimensional (3D) layouts and objects? We propose a model relying only on direct sensory information to account for the planning and execution of prehension movements in the absence of haptic feedback and when the hand is not visible. In the present investigation, we isolate relative motion and binocular disparity information from other depth cues and we study their efficacy for reach-to-grasp movements and visual judgments. We show that (i) the amplitude of the grasp increases when relative motion is added to binocular disparity information, even if depth from disparity information is already veridical, and (ii) similar distortions of derived depth are found for haptic tasks and perceptual judgments. With a quantitative test, we demonstrate that our results are consistent with the Intrinsic Constraint model and do not require 3D metric inferences (Domini, Caudek, & Tassinari, 2006). By contrast, the linear cue integration model (Landy, Maloney, Johnston, & Young, 1995) cannot explain the present results, even if the flatness cues are taken into account.

Research Highlights

► Disparity and motion integration for prehension movements without haptic feedback.
► Grasp amplitude increases when motion is added to disparity information.
► We model cue integration according to Intrinsic Constraint and linear cue combination.
► The Intrinsic Constraint model accounts for grasp amplitude without haptic feedback.

Introduction

It is commonly believed that visually guided behavior relies on a three-dimensional (3D) metric representation of the environment and the objects in it (Glover, 2004, Greenwald and Knill, 2009). It is also believed that this 3D depth map is found by reversing the physics of image generation to infer the outside world from sensory data (Helmholtz, 1867, Landy et al., 1995, Landy et al., in press, Poggio et al., 1985). The solution of the so-called “inverse-optics” problem by a biological system, however, is extremely difficult because of the underdetermination of the required information. Horizontal binocular disparities, for instance, are not sufficient to recover an object's depth unless the viewing distance is known (Mayhew & Longuet-Higgins, 1982; Fantoni, 2008). Similarly, optic flow is not sufficient to recover surface slant unless additional parameters are known (i.e., the angular displacement between the observer and the surface and the amount of surface rotation) — see Fantoni, Caudek, and Domini (2010). Moreover, even sufficient constraints provided by multiple cues do not guarantee unique percepts (Todd, 2004).

For these reasons, some researchers have questioned the assumption that visuomotor processes rely on metric representations of target distances. Instead, they have hypothesized that (1) the brain relies mainly on image measurements that specify 3D properties directly, without building an explicit metric representation of the environment, and (2) appropriate body–environment interactions emerge as a consequence of adaptive mechanisms, not as the solution of the “inverse-optics” problem (Braunstein, 1994, Domini and Caudek, 2003, Robert et al., 1997, Thaler and Goodale, 2010, Todd, 2004). In prehension movements aimed at reaching and grasping visual objects, for instance, the (haptic and/or visual) feedback resulting from the contact between the hand and the target provides an error signal for calibration that improves the accuracy of subsequent reaches (e.g., Mon-Williams & Bingham, 2007). Thus, visuomotor actions (such as prehension and pointing) may not require the recovery of the full 3D metric depth map, but may instead be based on simpler mechanisms of conditional associative learning. If this is true, we should expect perceptual metric judgments and motor actions in novel stimulus situations with no haptic feedback to be systematically distorted, which indeed has been found to be the case (e.g., Cuijpers, Brenner, & Smeets, 2008).

In the current investigation, we carried out a cue combination experiment in which human performance was measured in three stimulus conditions: with disparity-only information, motion-only information, or both (see also Tittle, Norman, Perotti, & Phillips, 1998). In different blocks of trials, participants either performed a grasping task or provided a perceptual judgment.

Two models of cue integration are considered here. The first model (intrinsic constraints) uses image measurements that are diagnostic of 3D depth but insufficient for metric reconstruction. The second model (linear cue integration) instead assumes that the brain uses metric structure (i.e., distance and direction) to represent locations. In the next sections, we describe the two models and show how their predictions can be validated empirically against the results of the present experiments.


Intrinsic constraints

The intrinsic constraint (IC) model proposes that, rather than deriving the full metric depth map, it is more advantageous for an organism to derive the best estimate of the local affine structure and use haptic feedback to calibrate ordinally scaled distance estimates (Di Luca et al., 2007, Domini and Caudek, 2010, Domini and Caudek, in press, Domini et al., 2006, Tassinari et al., 2008; see also Bingham and Pagano, 1998, Thaler and Goodale, 2010).

Retinal signals like relative disparity d are
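The IC combination rule can be sketched in a few lines. This is an illustrative reconstruction based on the cited account (Domini, Caudek, & Tassinari, 2006), not the authors' own implementation: the combined depth estimate is assumed to grow with the norm of the noise-normalized disparity and velocity signals, and the function name and scaling parameter `k` are mine.

```python
import math

def ic_depth(d, v, sigma_d, sigma_v, k=1.0):
    """Illustrative IC-style combination: the depth estimate scales with
    the norm of the noise-normalized disparity (d) and velocity (v)
    signals, so adding a second signal always increases the estimate."""
    return k * math.sqrt((d / sigma_d) ** 2 + (v / sigma_v) ** 2)

# Adding motion information inflates the combined estimate even when
# the disparity-only estimate is already adequate:
disparity_only = ic_depth(d=1.0, v=0.0, sigma_d=0.5, sigma_v=0.5)
combined = ic_depth(d=1.0, v=1.0, sigma_d=0.5, sigma_v=0.5)
assert combined > disparity_only
```

Under this sketch, the increase of grasp amplitude when motion is added to disparity falls out of the combination rule itself, with no appeal to a metric reconstruction.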

Linear cue integration

The linear cue integration model assumes that our brain represents distances and locations in a metric format and that this metric representation is used to generate various kinds of responses (Ernst and Banks, 2002, Greenwald and Knill, 2009, Landy et al., in press). According to this approach, in order to obtain a metric representation, the human brain integrates information from multiple sources in order to reduce the uncertainty associated with any one of the available depth cues. If
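The standard reliability-weighted (inverse-variance) combination of Landy et al. (1995) and Ernst and Banks (2002) can be sketched as follows; the function name is mine and the sketch assumes each cue delivers an unbiased metric depth estimate with known noise.

```python
def linear_cue_combination(z_d, z_v, sigma_d, sigma_v):
    """Reliability-weighted linear combination: each cue's metric depth
    estimate is weighted by its inverse variance; the fused estimate is
    a weighted average with reduced variance."""
    w_d = 1.0 / sigma_d ** 2  # reliability of the disparity estimate
    w_v = 1.0 / sigma_v ** 2  # reliability of the velocity estimate
    z_hat = (w_d * z_d + w_v * z_v) / (w_d + w_v)
    sigma_hat = (1.0 / (w_d + w_v)) ** 0.5
    return z_hat, sigma_hat

# The fused estimate always lies between the single-cue estimates:
z_hat, sigma_hat = linear_cue_combination(z_d=2.0, z_v=4.0,
                                          sigma_d=1.0, sigma_v=1.0)
assert 2.0 <= z_hat <= 4.0 and sigma_hat < 1.0
```

Because the fused estimate is a weighted average, it can never exceed the larger of the two single-cue estimates; this interval property is why, as argued in the abstract, the linear model cannot accommodate a grasp amplitude that increases beyond a depth-from-disparity estimate that is already veridical.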

Experiment

We asked the participants to perform two tasks: (1) to reach out and grasp a target object in the absence of haptic feedback, with the stimulus always visible during the execution of the reach-to-grasp movements (the hand was never visible), and (2) to perform a Manual Size Estimation (MSE) task, in which participants indicated the depth of the target object with index finger and thumb while holding their hand away from the target. MSE is interpreted as a measure of perceptual depth information in the

General discussion

In the present investigation we study reach-to-grasp movements directed towards virtual stimuli. We test two hypotheses. H1: The same depth distortions are found in performance involving action (with no haptic feedback) and perceptual judgments. H2: The IC model can account for the limited effectiveness of disparity and motion information in conveying spatial information.

The results shown in Fig. 4 support H1: The addition of motion significantly increased the FGA, even if prehension movements

Conclusions

In the absence of haptic feedback, prehension movements toward virtual targets (hand not visible) reveal the same distortions of 3D depth as perceptual judgments. Predictable and spatially displaced intermixed haptic-feedback trials are not sufficient to calibrate no-feedback trials. Final grip aperture increases when motion is added to disparity information, even when this corresponds to an overestimation of depth. The present results are consistent with the IC model but not with linear cue

Acknowledgment

We wish to thank Joe Lappin for his constructive comments on a previous version of this manuscript.

References (74)

  • H.S. Greenwald et al. Integrating visual cues for motor control: A matter of time. Vision Research (2005).
  • M.S. Landy et al. Measurement and modeling of depth cue combination: In defense of weak fusion. Vision Research (1995).
  • I. Oruc et al. Weighted linear cue combination with possibly correlated error. Vision Research (2003).
  • F. Phillips et al. Perceptual equivalence between vision and touch is complexity dependent. Acta Psychologica (2009).
  • P. Servos et al. The role of binocular vision in prehension: A kinematic analysis. Vision Research (1992).
  • J.T. Todd. The visual perception of 3D shape. Trends in Cognitive Sciences (2004).
  • S.J. Watt et al. Binocular cues are important in controlling the grasp but not the reach in natural prehension movements. Neuropsychologia (2000).
  • W. Adams et al. Bayesian combination of ambiguous shape cues. Journal of Vision (2004).
  • M.S. Banks et al. Consequences of incorrect focus cues in stereo displays. Information Display (2008).
  • G.P. Bingham. Calibration of distance and size does not calibrate shape information: Comparison of dynamic monocular and static and dynamic binocular vision. Ecological Psychology (2005).
  • G.P. Bingham et al. Accommodation, occlusion and disparity matching are used to guide reaching: A comparison of actual versus virtual environments. Journal of Experimental Psychology: Human Perception and Performance (2001).
  • G.P. Bingham et al. Distortions of distance and shape are not produced by a single continuous transformation of reach space. Perception & Psychophysics (2004).
  • G.P. Bingham et al. The necessity of a perception/action approach to definite distance perception: Monocular distance perception to guide reaching. Journal of Experimental Psychology: Human Perception and Performance (1998).
  • G.P. Bingham et al. Distortions in definite distance and shape perception as measured by reaching without and with haptic feedback. Journal of Experimental Psychology: Human Perception and Performance (2000).
  • M.F. Bradshaw et al. Binocular cues and the control of prehension. Spatial Vision (2004).
  • M.L. Braunstein. Decoding principles, heuristics and inference in visual perception.
  • E. Brenner et al. Holding an object one is looking at: Kinesthetic information on the object's distance does not improve visual judgments of its size. Perception & Psychophysics (1997).
  • H. Bruggeman et al. Reaching movement accuracy is mainly determined by visual online control. Perception (2010).
  • R. Coats et al. Calibrating grasp size and reach distance: Interactions reveal integral organization of reaching-to-grasp movements. Experimental Brain Research (2008).
  • W.G. Cochran. Problems arising in the analysis of a series of similar experiments. Journal of the Royal Statistical Society (1937).
  • F. Domini & C. Caudek (in press). Combining image signals before 3D reconstruction: The Intrinsic Constraint Model...
  • F. Domini et al. Stereo and motion information are not independently processed by the visual system. Vision Research (2006).
  • M.O. Ernst et al. Humans integrate visual and haptic information in a statistically optimal fashion. Nature (2002).
  • C. Fantoni et al. Systematic distortions of perceived planar surface motion in active vision. Journal of Vision (2010).
  • V.H. Franz. Manual size estimation: A neuropsychological measure of perception? Experimental Brain Research (2003).
  • M. Gentilucci et al. Grasp with hand and mouth: A kinematic study on healthy subjects. Journal of Neurophysiology (2001).
  • S. Glover. Separate visual representations in the planning and control of action. The Behavioral and Brain Sciences (2004).