Cross-Modal Interactions of the Tactile System

The sensory systems responsible for touch, vision, and hearing have traditionally been regarded as mostly separate. Contrary to this dogma, recent work has shown that interactions between the senses are robust and abundant. Touch and vision are both commonly used to obtain information about a number of object properties, and they share perceptual and neural representations in many domains. Additionally, visuotactile interactions are implicated in the sense of body ownership, as revealed by powerful illusions that can be evoked by manipulating these interactions. Touch and hearing both rely in part on temporal-frequency information, which leads to a number of audiotactile interactions reflecting a good deal of perceptual and neural overlap. The focus in sensory neuroscience and psychophysics is now on characterizing the multisensory interactions that lead to humans’ panoply of perceptual experiences.


Interactions Between Vision and Touch
Although vision is a distance sense whereas touch is a contact sense, the sensory systems supporting vision and touch have much in common. Both these senses are used to assess object properties, such as orientation, shape, and surface texture; spatial relations between objects or parts of objects; and object motion. Thus, it should not be surprising that the perceptual characteristics of the visual and tactile systems resemble each other, or that the corresponding neural representations overlap. For ease of exposition, we have chosen to organize the following sections by domain, referring in each case to psychophysical studies (see Note 1) of perceptual interactions between vision and touch and then to their neural substrates.

Orientation discrimination
People perceive the orientation of objects in their environment visually, but also appreciate the orientation of small or graspable objects haptically (i.e., with the hands). Further, vision and touch interact in the perception of stimulus orientation, as exemplified in tactile modulation of a visual illusion called the tilt illusion: In vision, the perceived orientation of a central grating (see Note 2) can be affected by the orientation of a surrounding grating. When the central and surrounding gratings differ in orientation by a few degrees, the central grating is perceived as tilted further away from the orientation of the surround than it actually is. This "repulsive" effect is enhanced by a simultaneously presented tactile surround grating whose orientation is congruent with that of the visual surround (Pérez-Bellido, Pappal, & Yau, 2018). Another example comes from the study of binocular rivalry, in which presenting two stimuli independently to the left and right eyes results in corresponding percepts that switch unpredictably back and forth. For instance, when the two stimuli are gratings with orientations perpendicular to each other, the perceptual experience oscillates between the two gratings. When a tactile grating is then added in an orientation matching that of one of the visual gratings, this visual grating tends to become dominant over its competitor in the rivalry (Lunghi & Alais, 2015).
Such interactions suggest that the representation of object orientation is common to vision and touch. This idea fits with neuroimaging studies using positron emission tomography (PET), which have shown that the same part of visual cortex is active during visual (Sergent et al., 1992) and tactile (Sathian et al., 1997) discrimination of grating orientation. This visual cortical region appears to be the human counterpart of the monkey visual cortical area known as the sixth visual area, or V6 (Fig. 1a; for a review, see Sathian, 2016). Transient disruption of this visual cortical area by focal transcranial magnetic stimulation (TMS; application of brief magnetic pulses to the head) interferes with tactile discrimination of grating orientation (Zangaladze et al., 1999), which demonstrates the functional role of this visual area in touch. These neuroimaging and neurostimulation findings, together with the psychophysical studies summarized above, converge on the conclusion that object orientation is encoded in a neuronal pool accessible to both vision and touch. Collectively, these observations illustrate that so-called visual cortical areas subserve not only visual but also corresponding nonvisual tasks, a point we return to repeatedly in this review. Accordingly, one could argue that referring to these areas as "visual" is incorrect, but in the present review we use this term, not only for the sake of simplicity, but also in recognition of the fact that these areas were originally identified as stations along the pathways of visual processing.

Shape perception
Although an object's shape is a defining visual property, shape is also often assessed haptically, especially during grasping. Recognition of unfamiliar objects is view dependent in both the visual and the haptic modalities: Rotating a previously seen object impairs later recognition of it, although familiar objects can often be recognized despite being rotated, presumably because the observer has acquired multiple representations from different viewpoints (for a review, see Peissig & Tarr, 2007). In contrast, cross-modal object recognition (haptic recognition of an object initially studied visually, or vice versa), although not quite as good as within-modal object recognition, is view independent; that is, it is unaffected by rotation of the object (Lacey, Peters, & Sathian, 2007). We were able to induce within-modal view independence through either visual or haptic perceptual learning, by presenting repeated trials in which participants first studied objects and then were asked to recognize them in both rotated and unrotated orientations. In this study, the view independence acquired in the trained modality transferred cross-modally without further training (Lacey et al., 2009). Moreover, cross-modal training with unrotated and rotated objects by itself was sufficient to produce both visual and haptic view independence (Lacey et al., 2009). These studies point to the existence of a multisensory representation of object shape that incorporates view independence.
Haptic shape is encoded not only in somatosensory cortical areas, the parts of the cerebral cortex classically associated with touch, but also in a region of visual cortex called the lateral occipital complex (LOC; Fig. 1b; for a review, see Sathian, 2016), which mediates visual shape perception. The similarity of spatial patterns of LOC activity in response to different shapes, whether encountered visually or haptically, correlates with the perceptual similarity of the same shapes (Masson et al., 2016). Even sound cues can activate the LOC during shape recognition in both sighted and blind people, which suggests that the LOC computes geometric representations of shape regardless of the input modality (for a review, see Sathian, 2016). Posterior parietal cortex, in and around the intraparietal sulcus (IPS; Fig. 1b), is also involved in the perception of shape through both vision and touch; however, its function appears to be to reconstruct representations of objects from their component parts. To compare the involvement of areas such as the LOC and IPS in haptic and visual shape perception, we undertook a series of functional MRI (fMRI) studies of activity and connectivity during multiple tasks (Lacey et al., 2014). On the basis of these studies, we proposed a conceptual model of haptic shape perception (Fig. 1d). An important feature of our model is its inclusion of the ability for mental imagery, which appears to be quite important for haptic shape perception. Indeed, mental imagery is also valuable for visual perception, especially under suboptimal viewing conditions, in which imagery is used to construct hypotheses against which incoming visual input is compared (Kosslyn, 1994, Chapter 5).

Figure 1. Locations of the human brain regions referenced in the text (a-c) and a conceptual model of haptic shape perception (d). In all three brain images, the front (anterior) of the brain is on the left, and the back (posterior) is on the right (adapted from Sathian, 2016, Fig. 1). The images in (a) and (c) are sagittal slices (i.e., in planes oriented along the anterior-to-posterior axis of the head); the x coordinates give the distance (in millimeters) from the midline dividing the head into equal right and left halves; positive and negative numbers indicate the right and left hemispheres, respectively. The image in (b) is a partially inflated side view of the left hemisphere, in which the sulci (irregular grooves) are darker gray than the adjacent gyri (protuberances). The location of visual area V6 is shown in (a). The image in (b) shows the location of prefrontal cortex (PFC), the parietal operculum, the central sulcus (CS; shown as a landmark), primary somatosensory cortex (S1) in the postcentral gyrus just posterior to the CS, the intraparietal sulcus (IPS), visual area MT (so named for the homologous area in macaque monkeys, known as the middle temporal visual area), and the lateral occipital complex (LOC). The location of medial occipital cortex (MOC) is shown in (c). In our conceptual model of haptic shape perception (d), object familiarity modulates recruitment of appropriate neural networks. For familiar objects, the LOC is driven top down from the PFC, which allows facilitation by object imagery, whereas for unfamiliar objects, the LOC is driven bottom up from S1, aided by spatial imagery. The IPS integrates representations of an object from its component parts, whereas the LOC houses modality-independent object representations. Adapted from "Spatial Imagery in Haptic Shape Perception," by S. Lacey.

Visual imagery can be subdivided into object imagery, characterized by rich pictorial representations that integrate surface characteristics such as texture, and spatial imagery, which emphasizes spatial relationships between parts of (or entire) objects. Individuals vary in their relative preference for spatial or object imagery (Kozhevnikov et al., 2005).
The object-spatial distinction in imagery also applies to haptics (Lacey et al., 2011), and the accuracy of cross-modal object recognition correlates with the ability for spatial, but not object, imagery (Lacey, Peters, & Sathian, 2007). In our model (Lacey et al., 2014), we proposed that visual object imagery is particularly relevant for haptic perception of the shapes of familiar objects, during which the LOC is activated via top-down pathways (see Note 3) from prefrontal cortex (PFC; Fig. 1b) that presumably drive such imagery of previously encountered objects. In contrast, we argued that haptic perception of unfamiliar objects (for which object imagery is not readily available) is mediated instead by bottom-up pathways from primary somatosensory cortex (S1; Fig. 1b) and facilitated by spatial imagery processes in the IPS, which are thought to allow whole-from-part reconstruction of object representations; such reconstruction is critical for unfamiliar objects but may also be useful for familiar ones (Lacey et al., 2014).
The commonalities between visual and haptic shape perception resonate with the question famously posed by the Irish philosopher William Molyneux to his English colleague John Locke: Would restoration of sight to someone blind from birth allow visual recognition of objects previously experienced only through touch (Locke, 1706/1997)? It turns out that the empirical answer to this question is nuanced: Five congenitally blind individuals who underwent surgical treatment to restore their vision were unable to match a haptically presented sample object with the correct visually presented object within 48 hr after surgery. However, three of the five participants were tested days to weeks later and showed substantial improvement on the cross-modal test (Held et al., 2011). Thus, it appears that the answer to Molyneux's question is negative immediately after sight restoration, but turns positive after a short period of time, presumably a reflection of the (rapid) effect of multimodal experience. Given the small sample size and the preliminary nature of this study, further work on this topic is desirable.

Texture perception
The texture of object surfaces is a property that is primarily sensed via touch, and touch is superior to vision in this domain (for a review, see Sathian, 2016). This is not surprising when one considers the multiple dimensions of texture, which include rough-smooth, hard-soft, and sticky-slippery (for a review, see Bensmaia, 2009), judgments for which people tend to rely on touch. The rough-smooth dimension has been studied extensively from psychophysical and neurophysiological perspectives, and this work indicates that spatial patterns are particularly important (as they are in vision) for coarse tactile textures. Because perception of haptic texture depends on moving one's fingers over surfaces, temporal cues (arising from the timing with which bumps on the surface are encountered) also contribute; temporal frequency (an important property of auditory as well as tactile stimuli) becomes increasingly important as tactile textures get finer (for a review, see Bensmaia, 2009). Tactile textures bias judgments of simultaneously encountered visual textures, but not vice versa (Guest & Spence, 2003), which is consistent with the dominance of touch in texture perception.
The parietal operculum (Fig. 1b) is a key somatosensory cortical locus where haptic texture is represented. Sun et al. (2016) trained a multivariate classifier (see Note 4) to distinguish visual presentations of glossy versus rough objects using spatial patterns of fMRI activity not only in visual cortical areas, but also in the parietal operculum. The classifier's success implies that the visual stimuli evoked corresponding haptic representations. Reciprocally, haptic assessment of texture activates texture-selective visual cortical areas, especially in medial occipital cortex (MOC; Fig. 1c; for a review, see Sathian, 2016). Remarkably, the texture-selective parietal operculum is also active when people listen to sentences containing textural metaphors, such as "she had a rough day" (Lacey et al., 2012), which suggests that metaphorical roughness is understood by reference to its physical counterpart. This underscores the grounding of abstract concepts in relevant sensorimotor processes, an idea originally proposed by Aristotle, developed in modern cognitive psychology (see Barsalou, 2008, for a review), and applied to the subject of metaphors by Lakoff and Johnson (1980).
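The decoding logic behind such classifier analyses (see Note 4) can be illustrated with a toy sketch. The data below are synthetic, and the voxel counts, condition labels, and choice of a linear support-vector classifier are illustrative assumptions, not details of Sun et al.'s (2016) analysis.

```python
# Toy sketch of multivariate pattern classification on synthetic "voxel" data.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_trials, n_voxels = 40, 50

# Two conditions (e.g., "glossy" vs. "rough") whose mean activity patterns
# differ slightly across voxels, buried in trial-to-trial noise.
pattern_a = rng.normal(0, 1, n_voxels)
pattern_b = pattern_a + rng.normal(0, 0.5, n_voxels)
X = np.vstack([pattern_a + rng.normal(0, 1, (n_trials, n_voxels)),
               pattern_b + rng.normal(0, 1, (n_trials, n_voxels))])
y = np.array([0] * n_trials + [1] * n_trials)

# Train and test on independent folds (cross-validation); accuracy above
# chance (0.5) implies the region's spatial pattern carries condition
# information, even if no single voxel is reliably selective on its own.
scores = cross_val_score(LinearSVC(dual=False), X, y, cv=5)
print(scores.mean())
```

Testing on held-out folds is the crucial design choice: it guards against the classifier merely memorizing noise in the training trials.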

Motion perception
The motion aftereffect is well known in vision: After exposure to visual motion in one direction for about 10 s, a static visual stimulus seems to move in the opposite direction. This aftereffect is thought to be due to adaptation of the motion detectors. It is also manifest cross-modally: Adaptation to visual motion induces a tactile motion aftereffect, and vice versa, which indicates a shared visuotactile representation of motion (Konkle et al., 2009). Consistent with this hypothesis are numerous neuroimaging studies demonstrating that motion-selective visual cortical areas referred to as the MT complex (Fig. 1b) are also recruited by tactile or auditory motion, in both sighted and blind people, although some studies have failed to find cross-modal activation. TMS over the MT complex interferes with discriminating both the speed and the direction of tactile motion (Amemiya et al., 2017; this article also reviews previous neuroimaging and TMS studies relating to motion), which reinforces the idea that this and other visual cortical areas underlie the performance of domain-specific tasks (motion, shape, etc.) in a modality-independent manner.

Body ownership
The sense of body ownership (knowing where one's body is in space and what belongs to one's own body rather than to someone else's) should logically be one of the strongest percepts (for a review, see Ehrsson, 2020). However, the rubber-hand illusion (Botvinick & Cohen, 1998), which demonstrates an especially intriguing visuotactile interaction, challenges this idea: This illusion arises when one arm of the participant is concealed and "replaced" with a realistic fake arm. While the participant looks at the fake arm, the experimenter synchronously brushes the fake hand and the participant's own (hidden) hand, which induces the illusory feeling of touch on the unseen hand, the sense that the fake arm belongs to the participant's body (incorporation into the body image), and the sense that the unseen, real arm is located in the same position as the fake arm (proprioceptive drift). The self-reported strength of the illusion correlates with activity in ventral premotor cortex (for a review, see Ehrsson, 2020). The rubber-hand illusion can be induced in a matter of seconds in individuals who are susceptible to it, and recent work suggests that the effects are relatively long-lasting; in particular, the sense that the fake arm belongs to one's body may persist for several minutes (Abdulkarim et al., 2021).
In a similar but even more dramatic visuotactile illusion, synchronous stroking of a mannequin and a participant's own body at the same body location, along with a virtual-reality display that shows the mannequin in place of the participant's body, induces the perception that the mannequin's body is the participant's own (for a review, see Ehrsson, 2020). These amazing observations indicate that constructing the perception of one's body and its parts depends critically on multisensory integration. There are large individual differences in susceptibility to these illusions, the reasons for which have yet to be worked out; understanding such variability is vital because incorporating prosthetics or teleoperated devices into the body schema is important for their efficient use (Cutts et al., 2019).

Visuo-haptic object processing over the life span
Infants, even neonates, are capable of visuo-haptic cross-modal matching and are therefore sensitive to object properties common to vision and touch (for a review, see Lewkowicz & Bremmer, 2020). However, the statistically optimal visuo-haptic integration that adults demonstrate, in which input from the two modalities is flexibly weighted to minimize the variance of perceptual estimates, takes some time to develop: Up to about the age of 8 years, integration is suboptimal; haptics dominates size perception, and vision dominates orientation discrimination. By the age of 8 to 10 years, however, integration is more optimal, presumably a reflection of calibration of sensory systems by cross-modal comparison during development (Gori et al., 2008). That something important happens around the age of 8 to 10 years is consistent with the observation that cross-modal object recognition is view independent for children ages 9 to 10 years and older (as in adults; see the earlier section on shape perception) but not for younger children (Jüttner et al., 2006). Although proficiency at visual or haptic within-modal memory for objects is unaffected by age, this is not so for cross-modal memory: Older adults exhibit a marked asymmetry such that performance is much worse when haptic encoding is followed by visual retrieval than when visual encoding is followed by haptic retrieval, in contrast to the situation in childhood and early adulthood, when cross-modal object recognition is unaffected by which modality is used for encoding (Lacey, Campbell, & Sathian, 2007; Norman et al., 2006).
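The notion of statistically optimal integration can be made concrete: in the standard maximum-likelihood model, each modality's estimate is weighted in inverse proportion to its variance, which minimizes the variance of the combined estimate. The sketch below uses made-up numbers purely for illustration; it is not a reanalysis of any of the studies cited here.

```python
# Minimal sketch of maximum-likelihood (inverse-variance-weighted) cue
# integration, with hypothetical visual and haptic size estimates.

def integrate(est_v, var_v, est_h, var_h):
    """Combine a visual and a haptic estimate, weighting by reliability."""
    w_v = (1 / var_v) / (1 / var_v + 1 / var_h)   # visual weight
    w_h = 1 - w_v                                  # haptic weight
    combined = w_v * est_v + w_h * est_h
    combined_var = 1 / (1 / var_v + 1 / var_h)     # always < either variance
    return combined, combined_var

# Vision judges an object's size as 10.0 units (variance 1.0), haptics as
# 11.0 units (variance 4.0); the more reliable visual cue is weighted 0.8.
size, var = integrate(10.0, 1.0, 11.0, 4.0)
print(size, var)  # combined estimate 10.2, combined variance 0.8
```

The key property, matching the behavioral findings in adults, is that the combined variance (0.8) is lower than that of either single cue, so bimodal judgments are more precise than unimodal ones.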

Audiotactile Interactions
Temporal-frequency information is perhaps most important to audition, for example, in speech and music perception (Pérez-Bellido, Barnes, et al., 2018), but the ability to perceive temporal frequency by touch also contributes to tactile texture perception, as noted above (Bensmaia, 2009). Further, manipulating the frequency of the (unattended) sounds generated by touching textured surfaces influences tactile texture judgments, which suggests that auditory frequency is perceptually integrated with tactile texture (Guest et al., 2002). Thus, a reasonable question is whether audition and touch share a common representation of temporal-frequency information and/or a common neural basis.
A number of psychophysical studies point to a shared representation of temporal frequency between audition and touch. For instance, in one study, participants were asked to report which of two successively presented stimuli had the higher temporal frequency while they attended selectively to either auditory or tactile input and attempted to ignore distractors in the other modality. Symmetric audiotactile influences were found: The frequency in the attended modality was consistently perceived as similar to that in the unattended modality (Convento et al., 2019). Also, auditory adaptation effects can transfer to the tactile domain: Exposure to frequency-specific auditory noise improves subsequent discrimination of tactile frequency but not intensity (Crommett et al., 2017).
Neuroimaging and neurostimulation studies bear out the notion of convergent temporal-frequency representations of tactile and auditory inputs. For instance, in an fMRI study, the left auditory cortex was found to respond to vibrotactile frequencies of 20 Hz and 100 Hz, both in the audible range, but not to a 3-Hz stimulus, which is below the audible range (Nordmark et al., 2012). Conversely, another fMRI study revealed that multiple regions of somatosensory cortex responded to auditory inputs in a frequency-specific manner and that the similarity of spatial patterns of activity in response to different frequencies correlated with the similarity of corresponding perceptual judgments, although the somatosensory cortical responses were less robust and noisier than their auditory cortical counterparts (Pérez-Bellido, Barnes, et al., 2018). Moreover, TMS over primary somatosensory cortex impaired auditory frequency discrimination, but only when trials comprising unimodal auditory stimuli were interleaved with trials requiring (unimodal) tactile or (cross-modal) audiotactile frequency discrimination (Convento et al., 2018).
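The pattern-similarity logic used in several of the studies above (e.g., Masson et al., 2016; Pérez-Bellido, Barnes, et al., 2018) can be sketched as a representational similarity analysis: pairwise dissimilarities of activity patterns across stimuli are compared with pairwise dissimilarities of perceptual judgments. The data below are randomly generated, and the stimulus and voxel counts are arbitrary assumptions for illustration only.

```python
# Toy representational similarity analysis (RSA) on synthetic data.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n_stimuli, n_voxels = 6, 30

# One activity pattern per stimulus; the "perceptual" dissimilarities are
# derived from the same latent structure plus noise, so the two
# dissimilarity matrices should correlate.
patterns = rng.normal(0, 1, (n_stimuli, n_voxels))
neural_rdm = 1 - np.corrcoef(patterns)               # 1 - pattern correlation
perceptual_rdm = neural_rdm + rng.normal(0, 0.05, neural_rdm.shape)
perceptual_rdm = (perceptual_rdm + perceptual_rdm.T) / 2  # keep symmetric

# Correlate only the upper triangles (unique off-diagonal stimulus pairs).
iu = np.triu_indices(n_stimuli, k=1)
rho, _ = spearmanr(neural_rdm[iu], perceptual_rdm[iu])
print(rho)
```

A reliably positive rank correlation between the two matrices is what licenses the inference that the region's spatial activity patterns mirror the perceptual similarity structure.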

Conclusions
This brief review has provided some examples of multisensory interactions involving the tactile (haptic) system. Touch and vision represent object properties similarly in a variety of domains, and the neural representations of these properties converge in brain regions that should be considered as specialized for particular tasks rather than for particular modalities of sensory input. Although work on audiotactile interactions is less well developed than work on visuotactile interactions, a similar theme of perceptual and neural commonality emerges in the domain of temporal frequency. Tactile inputs can often be integrated with corresponding visual or auditory inputs, and dramatic illusions evoked by manipulating visuotactile interactions reveal that the very sense of body ownership depends critically on multisensory integration. Thus, the various sensory systems should no longer be considered independent. Rather, the goal in sensory neuroscience and psychophysics is to find out how they interact to produce the richness of sensory experience.

Transparency
Action Editor: Robert L. Goldstone
Editor: Robert L. Goldstone

Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.

ORCID iD
K. Sathian https://orcid.org/0000-0001-9495-2499

Notes
1. "Psychophysics" refers to quantitative behavioral studies of psychological functions, such as perception.
2. A visual grating is an image whose luminance varies in a periodic manner across the image. A tactile grating is a surface with alternating ridges and grooves that is actively felt with the fingertip or applied to a passive finger.
3. "Top-down" and "bottom-up" refer to the direction of information flow in sensory processing. Sensory inputs feed forward (bottom up) into progressively higher-order brain regions that perform more complex processing, whereas these brain regions provide feedback (top down) to lower-level regions.
4. In this machine-learning approach, the fine-grained pattern of activity across multiple voxels (volumetric pixels) in a given area of the brain, as measured with fMRI, is compared between two or more experimental conditions and used to train a classifier (an algorithm that assigns a class label to data input). The classifier is then tested on data that were not used in training.