First- and second-order transformational apparent motion rely on common shape representations

When one figure is replaced with another that overlaps its spatial location, observers perceive an illusory, continuous shape change of the original object, a phenomenon known as transformational apparent motion (TAM). The current study investigated the extent to which TAM depends on a common, high-level shape representation that is independent of the shape-defining attribute. Specifically, we tested whether TAM is perceived similarly for both first- and second-order objects, defined by luminance and texture contrast, respectively. A compelling motion percept was observed in second-order TAM displays that was comparable to that seen in first-order TAM displays. Importantly, TAM for both stimulus classes showed the same pattern over a range of stimulus onset asynchronies. These results support the high-level shape account, indicating that TAM is driven by segmentation mechanisms that rely on high-level shape information rather than low-level visual characteristics.


Introduction
Transformational apparent motion (TAM) is part of a family of motion percepts in which the sudden onset of an adjacent or overlapping shape (hereafter referred to as a "bar") is seen to smoothly extend from the pre-existing, statically presented stimulus (hereafter referred to as a "square"). The phenomenon was first observed by Kanizsa (1951) who reported that a bar that appeared all at once next to a statically presented square appeared to grow away from it. He called this motion percept "polarized gamma motion" (PGM), because he thought it was a variant of Gamma movement (Harrower, 1929). Hikosaka, Miyauchi, and Shimojo (1993) rediscovered PGM and named it "illusory line motion" (ILM; Fig. 1). In addition to ILM originating from a static square, these authors reported ILM away from a square when it was flashed on and off just prior to the appearance of the line. Since both of these cases involve motion away from an attentionally salient point on the screen, the authors suggested that ILM resulted from an attentional gradient surrounding the attended object. They argued, following Wundt's and Titchener's notion of "prior entry," that attention supported speeded entry into visual processing for points close to the attended object (Titchener, 1908). Since attention and the speed advantage decreased with distance away from the salient square, the line appeared to grow outward from the square, generating the appearance of motion.
When two identical squares are initially presented and a bar appears all at once between them, participants perceive motion from both squares "colliding" in the center (Von Grünau & Faubert, 1992). On first consideration, this finding is consistent with the presence of attentional gradients at the locations of both boxes. When the squares have different visual features, however, the percept differs. Faubert and Von Grünau (1995) presented participants with squares of different luminance values (with the same contrast relative to the background), or different isoluminant colors. Under these conditions, motion was perceived starting from the square that shared a feature with the bar, showing that correspondence of feature information constrains the interpretation of ambiguous motion. The same authors also showed that even though a common feature biases the motion when there are competing squares on both sides of the central bar, the illusory motion persists when a single square and adjacent bar are defined by two different features such as color, luminance, motion, texture, and stereo-depth (von Grünau & Faubert, 1994).
With more elaborate shapes, both feature and contour continuity define the correspondence between the "bars" and "squares" (Tse, Cavanagh, & Nakayama, 1998; Tse, 2006). Motion is perceived away from objects that share smooth contours with the "bar" and toward objects that have concavities at the adjacent edge of the "bar" after its appearance. Moreover, this shape continuity predominates over low-level motion-energy in determining the direction of illusory motion (Tse, Cavanagh, & Nakayama, 1998; Tse & Logothetis, 2002). These illusions emphasize the importance of shape segmentation in driving correspondence over time, which is why they are known collectively as transformational apparent motion illusions.
According to Tse and colleagues, what counts as an object must be decided before the visual system can match an object to itself over space and time despite displacements, shape changes, and potential changes in occlusion relationships among objects. So, the process of spatially isolating and identifying unique objects (including disambiguating them from the background, other overlapping elements, confounding noise, etc.) precedes the matching process underlying the computation of apparent motion. This process of segmenting elements of a visual display into specific, discrete shapes has been referred to as figural parsing (Tse, Cavanagh, & Nakayama, 1998; Baylis, 1998). In this framework, TAM reflects two distinct processes, a stage of segmentation within and across successive images in order to determine what counts as an object, and a subsequent stage of object matching across time where the shape change is seen as a continuous motion, as an object at time 1 moves or transforms into the matching object at time 2.
While the relationship between TAM and related motion illusions is not fully resolved (Hamm, 2017), it is clear that perception of TAM depends on shape segmentation and matching over time rather than exogenous attentional gradients or spreading activation in early visual areas, although these may account for other forms of apparent motion. A previous fMRI experiment demonstrated that both hMT+ and LOC show greater activation when participants view TAM displays than when they view a similar control stimulus that does not produce illusory motion (Tse, 2006), suggesting that interactions between these visual areas mediate the influence of shape segmentation on motion perception.
Although displays with complex shapes emphasize the role of figural processing, simple square and bar displays with the same characteristics can also be constructed. For example, the bar appears to move away from a square of the same height when the other square is elongated and forms right angles (discontinuities) with the bar (Tse, Cavanagh, & Nakayama, 1998). Similarly, when two squares are the same height as the bar, but an additional, shorter bar appears from one of them at a right angle to the bar connecting them, illusory motion of the connecting bar is perceived away from the solitary square that has a smooth connection to the bar (Faubert & Von Grünau, 1995; Tse, 2006; Fig. 2). We used this style of square-and-bar TAM display (similar to Fig. 2) in the present experiment, with shape information driving the direction of perceived motion without attentional cueing (bottom-up) or instruction (top-down). This allowed us to analyze the influence of shape processing in TAM independent of attention.
Critically, we also used second-order stimuli that preserve the shape information of the square and the bar but define shape by texture rather than luminance. Broadly, first-order objects are defined by a luminance contrast against the background, while second-order objects are defined by more complex stimulus attributes (Cavanagh & Mather, 1989; Zanker, 1997). Second-order stimuli share all first-order characteristics with the display background (e.g., color, luminance) and are instead defined by second-order statistics (Cavanagh & Mather, 1989) pertaining to the frequency with which combinations of first-order characteristics occur between pairs of points in the display. In addition to texture, which we use in the present experiment, second-order stimuli have been based on binocular disparity, motion, and other attributes (Julesz, 1971; Anstis, 1980; Chubb & Sperling, 1988).
Previous studies have found that several visual illusions that depend on form or shape occur for first- and second-order stimuli comparably (e.g., Hamburger, Hansen, & Gegenfurtner, 2007; Lavrenteva & Murakami, 2018), hinting that both processing channels project to a cue-invariant shape detection mechanism. There are two exceptions. First, as might be expected, any stimulus that depends on shadows (e.g., Mooney faces) must be represented using luminance differences because shadows must be darker and cannot be captured in any way by second-order stimuli (Cavanagh & Leclerc, 1989). The second exception is for stimuli that require completion, like one bar that is partially occluded by another, or subjective contours (Cavanagh, 1991), or shape from motion when the motion is carried by random dots (Livingstone & Hubel, 1988). When a shape has only contiguous regions, it leads to the same shape perception whether presented as a first-order or second-order stimulus. In the experiment reported here, the stimulus is initially two separated squares, followed by a single, contiguous L-shape (Fig. 2), so it is not immediately evident whether the second-order version will suffer the fate of disconnected stimuli or be treated as contiguous at higher levels of shape segmentation common to both first- and second-order shapes.
To find out, we tested both first- and second-order variations of the simple version of square-and-bar stimuli (Fig. 2). Our second-order stimuli consisted of a dynamic white-noise patch against a static white-noise background, and the subsequent analyses compared TAM induced by texture- and luminance-defined stimuli in otherwise identical displays. Specifically, four squares were presented on the screen (one in each visual quadrant), and on each trial a bar connected two adjacent squares. One of the squares changed shape at the same time as the bar onset, so that the two squares were transformed into the L shape shown in Fig. 2 by the end of the trial. This paradigm allowed us to connect the squares and manipulate the shape of one of them simultaneously, resulting in only one shape-matching solution, in which, as mentioned before, motion was expected to be perceived away from the matching square (Tse, 2006). Moreover, by presenting a central fixation and one square in each quadrant of the screen, we examined the difference between bars that elicited processing of correspondence across versus within hemispheres (horizontal and vertical motion percepts, respectively). We also tested the strength of TAM for both versions across multiple stimulus onset asynchronies (SOA). High-level motion is less sensitive to the temporal offset between the first and second configuration, being visible for SOAs up to 500 ms (Boulton & Baker, 1993; Boulton & Baker, 1994; Bex & Baker, 1999), whereas low-level motion is quite sensitive to SOA (Cavanagh, Boeglin, & Favreau, 1985; Ramachandran & Anstis, 1983).
In the TAM illusion, shape correspondence determines the perceived direction of a bar presented between two boxes. The extension of the bar from the left-hand square involves straight, continuous contours (green) whereas any growth from the right-hand base involves a discontinuity (red). As a result, motion is biased to start from the left-hand square and continue to the right, as indicated by the yellow arrow. Figure adapted from von Grünau and Faubert (1995). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Materials and method
Participants (N = 12) were Dartmouth College students enrolled in an introductory psychology course. All participants provided informed consent under a study protocol approved by the Committee for the Protection of Human Subjects at the Dartmouth College Institutional Review Board and were compensated with course credit. Stimuli were presented using Psychtoolbox (Brainard & Vision, 1997; Pelli & Vision, 1997), in MATLAB (The MathWorks, Natick, MA, USA) on an LCD monitor (15-in, 40.0° × 30.0°, 60 Hz). Participants observed the display from a chin rest at a viewing distance of 57 cm and central fixation was monitored using an Eyelink II eyetracker (SR Research, Ontario, Canada). Trials on which gaze deviated more than 3° from central fixation were excluded from the analysis.
First-order stimuli were black on a medium gray background. Second-order stimuli were dynamically updating black and white textures on a statically presented black and white background (Fig. 3). The background images were created by constructing 512 × 384 arrays in which each value was chosen to be black or white with equal probability. These images were stretched to fit the screen (1024 × 768 px) such that each 4 adjacent (2 × 2) pixels shared the same black/white assignment. A new background texture was used on each trial. Square and bar stimuli were constructed similarly, but the internal texture updated at a rate of 60 Hz while the stimulus was present on the screen.
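The texture construction described above can be sketched as follows. This is a minimal Python illustration rather than the MATLAB/Psychtoolbox code used in the study; the function names (`make_noise`, `upscale_2x`) and the 0/1 encoding of black and white are assumptions introduced here for illustration.

```python
import random

def make_noise(width=512, height=384, rng=None):
    """Return a height x width grid of 0 (black) / 1 (white) values,
    each drawn independently with equal probability."""
    rng = rng or random.Random()
    return [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]

def upscale_2x(grid):
    """Stretch a grid by a factor of 2 in each dimension so that every
    source value fills a 2 x 2 block of screen pixels."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(2)]  # double each column
        out.append(wide)
        out.append(list(wide))                             # double each row
    return out

# One static background per trial: 512 x 384 noise stretched to 1024 x 768.
# The seed is fixed here only to make the sketch reproducible.
background = upscale_2x(make_noise(rng=random.Random(0)))
```

Under the same assumptions, a dynamic (second-order) stimulus region would be refilled with a fresh `make_noise` sample on every 60 Hz frame, while the background array stays fixed for the duration of the trial.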
Participants completed four 80-trial blocks for each stimulus type, and order was counterbalanced between participants. Each trial began with a central fixation dot. After a jittered delay interval, four squares appeared (3.92° side length), one in each quadrant of the display, equidistant (11.40°) from fixation (see Figs. 3 and 4). Following a variable stimulus onset asynchrony of 17 to 250 ms, a bar (16.07° length) appeared that bridged two adjacent squares, chosen randomly. A short bar the same size as the initial squares appeared adjacent to one of the squares simultaneously with the bar (Fig. 3B), and both the bars and the 4 original squares remained on the screen until the response was made. The participant was instructed to indicate the direction of motion perceived along the connecting bar using the four arrow keys on a standard keyboard. Importantly, the bar could connect squares horizontally on the top or bottom of the display or vertically to the left or to the right of fixation and the short bar could be adjacent to the square on either end of the connecting bar.
Catch trials were included to assess whether participants perceived motion in a systematic direction when the L-shape was absent or when incremental motion appeared in the display. In 'no-L' catch trials (10%), the short bar did not appear. In 'incremental motion' catch trials (10%) the bar connecting the two squares was added incrementally, extending away from the base of the L, in the opposite direction from that predicted for TAM. Incremental motion occurred over 5 frames at 60 Hz presentation, for 83.3 ms total duration. As in experimental trials, the bar remained onscreen until the participant pressed a response key.
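The timing values reported in this section follow from the 60 Hz refresh rate. The short sketch below works through the arithmetic; the helper names are hypothetical, and it assumes that durations were locked to whole display frames, which the text does not state explicitly.

```python
REFRESH_HZ = 60  # monitor refresh rate reported in the Methods

def frames_to_ms(n_frames, refresh_hz=REFRESH_HZ):
    """Duration in milliseconds of a whole number of display frames."""
    return n_frames * 1000.0 / refresh_hz

def ms_to_frames(duration_ms, refresh_hz=REFRESH_HZ):
    """Nearest whole number of frames for a nominal duration (assumed frame-locked)."""
    return round(duration_ms * refresh_hz / 1000.0)

# Incremental-motion catch trials: 5 frames at 60 Hz
incremental_ms = frames_to_ms(5)        # about 83.3 ms

# Reported SOA range of 17 to 250 ms, expressed in frames
shortest_soa_frames = ms_to_frames(17)  # 1 frame (16.7 ms)
longest_soa_frames = ms_to_frames(250)  # 15 frames
```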

Results
One participant misunderstood the instructions, so their responses were excluded without review, leaving data from 11 participants for analysis. Responses on trials in which gaze deviated more than 3 degrees of visual angle from fixation were excluded from analysis (M = 4.86% and 2.98% of trials per participant per condition for luminance and texture stimuli, respectively). Also, trials in which the response did not match the orientation of the bar (i.e., 'Left' or 'Right' responses would be wrong for a vertical bar) were excluded (M = 0.51% and 0.41% for luminance and texture conditions, respectively).
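The two trial-level exclusion rules can be expressed as a simple filter. The sketch below is illustrative only: the trial fields (`max_gaze_dev_deg`, `bar_orientation`, `response`) and the example trials are hypothetical and are not taken from the study's data files.

```python
GAZE_LIMIT_DEG = 3.0  # exclusion threshold for deviation from fixation

# Responses that are consistent with each bar orientation
VALID_RESPONSES = {
    "horizontal": {"left", "right"},
    "vertical": {"up", "down"},
}

def keep_trial(trial):
    """Return True if a trial survives both exclusion rules."""
    if trial["max_gaze_dev_deg"] > GAZE_LIMIT_DEG:
        return False  # gaze left the fixation window
    # response must lie along the axis of the connecting bar
    return trial["response"] in VALID_RESPONSES[trial["bar_orientation"]]

# Hypothetical example trials
trials = [
    {"max_gaze_dev_deg": 0.8, "bar_orientation": "horizontal", "response": "left"},
    {"max_gaze_dev_deg": 4.2, "bar_orientation": "horizontal", "response": "right"},
    {"max_gaze_dev_deg": 1.1, "bar_orientation": "vertical", "response": "left"},
]
kept = [t for t in trials if keep_trial(t)]
```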
Averaged across all SOAs, participants reported motion in the expected direction (congruent) at a level significantly above chance for both first-order (t(10) = 12.20, p < .001) and second-order (t(10) = 9.53, p < .001) stimuli, indicating a compelling, shape-driven percept of motion in both first- and second-order TAM conditions. Broken down by stimulus onset asynchrony (SOA), motion was perceived in the congruent direction at a level significantly above chance at all SOAs, for both stimulus types (all p < .001), as shown in Fig. 5, and these effects remained significant after FDR correction (q < 0.05). A two-way repeated measures ANOVA revealed no significant interaction between SOA and stimulus type (F(7,70) = 0.10, p > 0.95) and no main effect of SOA (F(7,70) = 0.07, p > 0.98), showing no evidence for the drop in motion effect at longer SOAs that we would expect if low-level motion processes were underlying second-order TAM. The only significant effect was a main effect of stimulus type, which was consistent with a slightly stronger percept of first- versus second-order TAM across all SOAs (F(1,10) = 4.97, p = 0.03; reflected also in Fig. 7 and later analyses).
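The per-condition tests against chance and the FDR step can be sketched as below. This is a hedged illustration, not the analysis code used for the reported statistics: it assumes the Benjamini-Hochberg procedure (the text says only "FDR correction"), the function names are hypothetical, and the example proportions and p-values are invented to show the mechanics.

```python
import math
import statistics

def one_sample_t(values, mu=0.5):
    """t statistic and degrees of freedom for a one-sample test against mu."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu) / standard_error, n - 1

def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    at false discovery rate q (assumed Benjamini-Hochberg step-up rule)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank  # largest rank that meets its threshold
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected

# Invented per-participant proportions of congruent responses (11 participants)
example = [0.90, 0.85, 0.95, 0.80, 0.88, 0.92, 0.78, 0.97, 0.90, 0.86, 0.91]
t_value, df = one_sample_t(example)  # df is 10, matching the reported t(10)

# Invented p-values, one per condition
decisions = benjamini_hochberg([0.001, 0.04, 0.03, 0.2])
```

Converting `t_value` to a p-value requires the t distribution, which the Python standard library does not provide; a statistics package would be needed for that step.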
To check the strength of first- and second-order TAM effects, we compared them with percepts of real motion (Fig. 6). We used a two-factor repeated measures ANOVA to compare congruent motion percepts in TAM versus real motion catch trials and found no significant difference between real motion and TAM trials (F(1,10) = 0.92, p > .35), indicating that the illusory motion percept in TAM was comparable in strength to real motion perception. Additionally, during debriefing, none of the participants in the experiment reported noticing differences between the apparent motion trials and the real-motion catch trials, further confirming the strength of the apparent motion percept.
The experiment also included catch trials in which no L-shape was presented. In such trials, a bar appeared connecting the two boxes and participants were asked to indicate the perceived direction of motion. We expected that such trials would not be systematically perceived as motion in either direction. Indeed, Von Grünau and Faubert (1992) reported that participants perceived such displays as motion colliding at the center of the bar. Nevertheless, we included these trials in order to catch any response biases participants may have acquired during the course of the experiment. Since there was no option to indicate colliding motion (participants had to respond using the arrow keys), we expected responses to be at chance (50%) for both the horizontal motion (response: left vs. right) and vertical motion (response: up vs. down) conditions with both first-order and second-order objects. One-sample t-tests did not reveal significant differences from the chance level of 50% in any of these conditions (Table 1, all p > .05).
Fig. 3. Square, fixation, and bar stimuli. At the start of each trial, participants saw four squares with side length 3.92°, positioned 11.40° away from a central fixation dot and 16.07° from each other, adjacently (panel a). Next, they saw two additional stimuli appear simultaneously: a long bar bridging adjacent squares (16.07° length), and a short bar (3.92° length) adjacent to one of the bridged squares along its non-bridged side nearest fixation (panel b). The colors in the illustration reflect what participants saw in first-order stimulus trials. In second-order stimulus trials, stimuli had the same dimensions and positions, but the appearance of the stimuli and the background were different (see Fig. 4, panels d-f).

Hemifield effect
We also examined the characteristics of within-hemifield (vertical TAM displays) and between-hemifield (horizontal TAM displays) processing on the perception of TAM. A two-factor, repeated measures ANOVA revealed a significant interaction of stimulus type by hemifield (F(1, 10) = 10.08, p < .01), which appears to be driven by a reduction in the influence of shape on second-order TAM displays when they are within a single hemifield. Second-order TAM displays crossing the vertical midline (i.e., appearing to move horizontally) were perceived toward the base of the L more often (M = 91.9%) than those crossing the horizontal midline (M = 84.6%), which moved vertically and remained within a single visual hemifield (Fig. 7). In contrast, there was little effect of within- versus between-hemifield presentation for the first-order stimuli.
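The mapping from bar orientation to hemifield condition, and the per-condition proportion of congruent responses, can be sketched as follows. The trial fields (`stimulus_type`, `bar_orientation`, `congruent`) and the example trials are hypothetical and serve only to illustrate the grouping.

```python
def hemifield_condition(bar_orientation):
    """A horizontal bar crosses the vertical midline (between hemifields);
    a vertical bar stays on one side of fixation (within a hemifield)."""
    return "between" if bar_orientation == "horizontal" else "within"

def proportion_congruent(trials):
    """Proportion of congruent responses per (stimulus type, hemifield) cell."""
    counts = {}
    for trial in trials:
        key = (trial["stimulus_type"], hemifield_condition(trial["bar_orientation"]))
        n_congruent, n_total = counts.get(key, (0, 0))
        counts[key] = (n_congruent + int(trial["congruent"]), n_total + 1)
    return {key: c / n for key, (c, n) in counts.items()}

# Hypothetical example trials
example_trials = [
    {"stimulus_type": "texture", "bar_orientation": "horizontal", "congruent": True},
    {"stimulus_type": "texture", "bar_orientation": "horizontal", "congruent": True},
    {"stimulus_type": "texture", "bar_orientation": "vertical", "congruent": True},
    {"stimulus_type": "texture", "bar_orientation": "vertical", "congruent": False},
]
rates = proportion_congruent(example_trials)
```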

Discussion
We have found comparable TAM percepts for first- and second-order displays. Unlike illusory line motion (ILM), which depends on attentional cues, TAM reflects shape segmentation processes and has recently been shown to be quantitatively different from simple ILM (Hamm, 2017), making this the first report of TAM using second-order stimuli.
Our results showed a trend toward a lower percentage of responses toward the base of the L for second-order stimuli, as is evident in Fig. 5. Such a difference is not unexpected, as we did not match our first- and second-order stimuli for visibility, and texture-defined objects are generally more difficult to detect and process. It would be worthwhile to investigate whether TAM can also occur with stimuli in which the correspondence between the box and bar is defined by distinct attributes (as in von Grünau & Faubert, 1994). Moreover, if motion energy were driving TAM here, we would expect second-order TAM to be weaker or non-existent at longer SOAs (Boulton & Baker, 1993). This did not happen, supporting the common analysis of both stimulus types at the level of higher-order motion processing.
We did observe a difference between first- and second-order TAM within-hemifield displays, with second-order displays leading to a less compelling percept of TAM on the vertical motion trials (within hemifield, Fig. 7). We are unsure of the origin of this difference and will address it directly in further studies.
In conclusion, the current results provide evidence that transformational apparent motion arises due to shape segmentation at the level of a representation that is common to both first- and second-order stimuli. The lack of difference between short and long SOAs for both types of stimuli also indicates that high-level processing of shape matching underlies TAM. Although we did not use displays with mixed first- and second-order stimuli (e.g., luminance-defined squares and a texture-defined bar), our results suggest that TAM is driven by a high-level shape segmentation mechanism that is invariant to the visual attributes of the displays. Future experiments with mixed displays can further investigate the degree of independence of TAM from low-level visual attributes.

Fig. 4. Experiment paradigm. Both first-order (left panel, a-c) and second-order (right panel, d-f) trials followed the same procedure. In experimental trials, we presented a central fixation dot (red), then a display of four squares, then a connecting bar between two squares appearing simultaneously with a short bar next to one of the two connected squares (a, d). In incremental-motion catch trials, we presented a connecting bar incrementally, extending away from the base of the L (b, e). In no-L catch trials, we presented a connecting bar without the short bar (c, f). At the end of each trial, participants pressed one of four arrow keys indicating the direction of perceived motion.

Fig. 5. Proportion of experimental trials with congruent motion percept. At each SOA, participants perceived motion toward the base of the L (congruent) in a significantly higher proportion of trials than would be expected by chance (50%). This was true for both first-order stimuli (solid line) and second-order stimuli (dashed line). Standard error of the mean for each stimulus type, at each SOA, is shown in red.

Fig. 6. Proportion of congruent motion percepts in TAM versus real motion catch trials. In incremental motion trials (left), participants perceived the veridical direction of real motion (congruent) at a level significantly above chance (50%). In TAM trials (right), participants reported the expected direction of apparent motion (congruent, toward base of L) at a level significantly above chance. This was true for first-order (solid bar) and second-order (texture bar) stimuli.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 7. Proportion of congruent trials in between- versus within-hemifield TAM displays. In TAM trials with a horizontal, between-hemifield bar (left), participants perceived congruent motion in a comparable proportion of first- and second-order trials. In TAM trials with a vertical, within-hemifield bar (right), participants perceived congruent motion in a significantly higher proportion of first-order stimulus trials than second-order stimulus trials. The standard error for each measure is shown in red.

Table 1
One-sample t-tests for 'no-L' catch trials, split by stimulus type. Participants did not show a significant response bias above the level of chance (50%) in any condition.