Novel method of extracting motion from natural movies

BACKGROUND
The visual system in primates can be segregated into motion and shape pathways. Interaction occurs at multiple stages along these pathways. Processing of shape-from-motion and biological motion is considered to be a higher-order integration process involving motion and shape information. However, relatively limited types of stimuli have been used in previous studies on these integration processes.


NEW METHOD
We propose a new algorithm to extract object motion information from natural movies and to move random dots in accordance with the information. The object motion information is extracted by estimating the dynamics of local normal vectors of the image intensity projected onto the x-y plane of the movie.


RESULTS
An electrophysiological experiment on two adult common marmoset monkeys (Callithrix jacchus) showed that the natural and random dot movies generated with this new algorithm yielded comparable neural responses in the middle temporal visual area.


COMPARISON WITH EXISTING METHODS
In principle, this algorithm provides random dot motion stimuli containing shape information for arbitrary natural movies. This new method is expected to expand the range of neurophysiological and psychophysical experimental protocols available for elucidating the integrated processing of motion and shape information in biological systems.


CONCLUSIONS
The novel algorithm proposed here was effective in extracting object motion information from natural movies and provided new motion stimuli to investigate higher-order motion information processing.


Shape-from-motion as an interaction between pathways
In primates, the motion and shape pathways have been attributed to anatomically and functionally separable cortical streams: the dorsal and ventral visual pathways emerging from the striate cortex (Ungerleider and Mishkin, 1982). The dorsal visual pathway projects to the inferior parietal lobule and computes visuospatial information (Kravitz et al., 2011). The ventral visual pathway terminates at the anterior inferotemporal cortex and plays a role in object identification (Kravitz et al., 2013). Many anatomical connections between the two neuronal pathways have been demonstrated at multiple stages (Baizer et al., 1991; Ungerleider et al., 2008; Webster et al., 1994), indicating functional interactions between motion and shape information processing (Perry and Fallah, 2014). Indeed, recent psychophysical studies have reported an influence of shape information on motion information by demonstrating that cues having a simple form (e.g., oriented lines or Glass patterns) modulate motion detection (Burr and Ross, 2002; Geisler, 1999; Krekelberg et al., 2003). Therefore, "shape-from-motion," the extraction of shape information from motion information, is one form of this interaction. In this context, biological motion (Cutting, 1978; Johansson, 1973) appears to be processed differently from other shape-from-motion stimuli in several respects. When a few light markers are attached to a human body and the marked person moves in total darkness, the time series of images of the light markers conveys rich information about the age, gender, and even the emotions of the marked person to human observers. This type of stimulus also shows an asymmetry with respect to gravity (Pavlova and Sokolov, 2000; Sumi, 1984). These properties cannot be explained by general shape-from-motion algorithms.

Fig. 1. Schematic of the algorithm to extract motion information from an original natural movie. The basic idea is to estimate the dynamics of local normal vectors by calculating a local normal vector for each pixel in each frame of the movie and taking their time derivatives. A frame of an original natural movie (top left) is treated as a curved surface (x, y, I(x, y)) in three-dimensional space, consisting of the two-dimensional pixel position (x, y) of the frame and the luminance value I(x, y) (top right). The normalized local normal vectors of the curved surface are calculated and projected onto the x-y plane (middle). The time derivatives of the projected local normal vectors are calculated as the difference between neighboring frames (bottom). Motion information is visualized with random dots that move in accordance with the dynamic vector field.

Introducing a novel stimulus to study the pathway interaction
Higher-order processing of shape-from-motion has been explored using random dot motion depicting geometrical figures, which is not sufficiently complex to cover the variety of motion embedded in natural vision. Although the light markers used in biological motion experiments enable the extraction of rich information from complex motion, the motion covered by such stimuli was generally limited to humans, a few other species such as pigeons, or a part of an animal's body such as the arm of a monkey (Hatsopoulos et al., 2007; Putrino et al., 2015), because of the difficulty of attaching the light markers. Visual stimuli that provide a greater variety of complex motion have been sought for the investigation of higher-order motion processing and the integrated processing of shape and motion in neurophysiological and psychophysical experiments. In this study, we propose a new algorithm to extract and separate the motion information from natural movies of humans, other animals, or artificial objects, and to move random dots in accordance with that information without any physical constraints. Although the computation in this algorithm is based only on local normal vectors defined in the three-dimensional space (x, y, I(x, y)), where I(x, y) is the luminance value at position (x, y) in each frame image of a movie, the global movement of a structured object emerged from the extracted image motion.

Suitability of the new algorithm in neuroscience research
In order to assess whether motion information from original natural movies extracted by the proposed algorithm is suitable for the investigation of higher-order motion processing and the integration processing of shape and motion, responses were recorded from cells in the middle temporal (MT) area of common marmosets. The responses to the random dots that moved in accordance with the extracted motion information were compared with the responses to the original movies. Cells in area MT have direction selectivity (Born and Bradley, 2005;Lui and Rosa, 2015;Solomon et al., 2015;Zavitz et al., 2016) and respond to natural movies largely consistently with predictions based on simple stimuli (Nishimoto and Gallant, 2011). Thus, a similarity in the response pattern of area MT cells between the random dot movies and the original natural movies should indicate the effectiveness of the algorithm to extract the motion information from the original movies.

Material and methods
Detailed protocol

Extraction of motion information: time derivatives of normalized local normal vectors of the image projected onto the x-y plane

Our basic idea for extracting the motion information of objects in a movie was to estimate the dynamics of local normal vectors by calculating the local normal vectors for each pixel in each frame of a movie and their time derivatives (Fig. 1). Because we are interested in the object shapes in the motion extraction method, the local normal vectors were normalized and subsequently projected onto the x-y plane, onto which the original objects are projected in the movie.
A local normal vector of the luminance surface at position (x, y, I(x, y)) is given by

\[
\mathbf{n}(x, y) = \left( -\frac{\partial I}{\partial x},\; -\frac{\partial I}{\partial y},\; 1 \right), \tag{1}
\]

where I(x, y) is the pixel value (luminance value) at position (x, y). Thus, the x and y components of the normalized local normal vector of the frame image projected onto the x-y plane were calculated as follows:

\[
n_x(x, y) = \frac{-\partial I/\partial x}{\sqrt{\left(\partial I/\partial x\right)^2 + \left(\partial I/\partial y\right)^2 + 1}}, \tag{2}
\]

\[
n_y(x, y) = \frac{-\partial I/\partial y}{\sqrt{\left(\partial I/\partial x\right)^2 + \left(\partial I/\partial y\right)^2 + 1}}. \tag{3}
\]

Although the normalized local normal vector is related to a standard image feature, the "gradient", we use the term "normal vector" instead because normal vectors on a three-dimensional object carry geometric information about the shape of the object, and we wish to emphasize the extraction of motion information of objects with a shape.
We applied a spatial low-pass filter before calculating the local normal vectors to remove discontinuities and enable the calculation of the space derivative. Subsequently, the time derivatives of the projected local normal vectors were calculated. The dynamic vector field defined by the time derivatives should reflect some aspects of the shape and motion information in the movie. It should be noted that the time derivatives of the normalized local normal vectors were calculated at each image position (x,y), but not at a specific position that is defined at a fixed point on an object.
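The pipeline above (spatial low-pass filter, projected normalized normals via Eqs. (2) and (3), then frame differences) can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation; in particular, the circular averaging filter is approximated here by a separable box filter of the same radius.

```python
import numpy as np

def box_filter(img, radius=4):
    """Separable box average; a crude numpy-only stand-in for a
    circular averaging filter of the same radius."""
    k = np.ones(2 * radius + 1) / (2 * radius + 1)
    img = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, img)

def projected_normals(frame):
    """x and y components of the normalized local normal vector of the
    luminance surface z = I(x, y), projected onto the x-y plane.
    The (unnormalized) surface normal is (-dI/dx, -dI/dy, 1)."""
    Iy, Ix = np.gradient(frame.astype(float))  # axis 0 = y (rows), axis 1 = x
    norm = np.sqrt(Ix ** 2 + Iy ** 2 + 1.0)
    return -Ix / norm, -Iy / norm

def motion_field(movie, radius=4):
    """Dynamic vector field: time derivatives of the projected normals,
    approximated by differences between neighboring frames."""
    normals = [projected_normals(box_filter(f, radius)) for f in movie]
    return [(n1[0] - n0[0], n1[1] - n0[1])
            for n0, n1 in zip(normals[:-1], normals[1:])]
```

A movie with no luminance change yields a zero field, while a moving high-contrast patch yields nonzero time derivatives around its edges, consistent with the edge-detector interpretation discussed below.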
A qualitative examination of the equations indicated several characteristics of the algorithm. First, a normalized local normal vector changes markedly when the angle between the x-y plane and the tangent plane of an object surface changes markedly (e.g., rotation in depth). Second, a local normal vector contains the spatial derivatives of the image pixel values, so the calculation of the local normal vector functions as an edge detector; the time derivative of the normalized local normal vector therefore becomes large when a high-contrast edge moves. Because the calculation is local, the exact motion of the object was not necessarily extracted. As discussed later, it was not our intention to compute the ground-truth motion of objects.
The motion information was visualized with random dot motion: each dot moves in accordance with the dynamic vector field. The gray level of the dots was varied so that individual dots could be matched easily between adjacent frames. The supplementary movies show examples of original movies in which a human actor and a tiger move (Supplementary movies 1 and 2, respectively) and of random dots that follow the dynamic vector fields extracted by the present algorithm (Supplementary movies 3 and 4). The number of square random dots was 160,000 (the side of each dot was 1 pixel). Although each frame of the random dot movies contained no object information, the global movements of the human actor and the tiger emerged when they moved.

Animals
Experiments were performed using two adult common marmoset monkeys (Callithrix jacchus; weighing 300-400 g). This study was approved by the Experimental Animal Committee of the National Institute of Neuroscience and Psychiatry and the animals were cared for in accordance with the "Guiding Principles of the Care and Use of Animals in the Field of Physiological Science" of the Japanese Physiological Society.

Stimuli
We generated 55 random dot movies from 55 natural movies containing a wide variety of moving objects: animals, human actors, and artificial objects. Motion information from the animal, human actor, and artificial-object movies (Fig. 2a, Supplementary movies 5 and 6) was extracted by the following procedure to generate random dot movies tracing the extracted motion information. First, various natural movies, e.g., a walking tiger, a jumping human actor, and a moving toy, were recorded at a resolution of 640 × 480 pixels at 30 frames/s. The movies were then trimmed to one second and converted to grayscale. After applying a spatial low-pass filter (circular averaging filter, radius: 4 pixels), the normalized local normal vectors of the frame images were computed for each pixel and projected onto the x-y plane using Eqs. (2) and (3). Subsequently, the time derivatives of the projected local normal vectors were calculated as the difference between neighboring frames. The normalization factor in Eqs. (2) and (3) was approximated by the constant value computed for the last frame. In total, 16,000 square dots were placed (the side of each dot was 3 pixels). The gray level of the dots varied randomly from 0 (black) to 255 (white), and the background gray level was 0 (black). Each dot moved in accordance with the time derivative of the projected local normal vector at the corresponding pixel. To prevent dots from clustering in specific regions at the end of the movie, half of the dots were moved forward from the first frame to the last frame in accordance with the derivative, and the other half were moved backward from the last frame to the first frame in accordance with the time reversal of the derivative. The average lifetime of the dots was set to approximately 0.5 s. An additional 8000 static random dots were added to mask shape information. The size of the stimulus movie was approximately 20°. Each stimulus was presented 12 times in a pseudorandom order. The display device was a liquid crystal display (LCD) monitor (EIZO FG2421, Ishikawa, Japan) with a refresh rate of 60 Hz (Ghodrati et al., 2015).
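The forward/backward dot placement described above can be sketched as follows. This is a simplified illustration under stated assumptions: dot lifetime, gray levels, and the 8000 static masking dots are omitted, and the field values are treated directly as pixel displacements.

```python
import numpy as np

def advect_dots(fields, n_dots=16000, shape=(480, 640), seed=0):
    """Move random dots along a dynamic vector field.

    `fields` is a list of (dx, dy) arrays, one per frame transition
    (e.g., the time derivatives of the projected normals). Half of
    the dots are seeded at the first frame and moved forward; the
    other half are seeded at the last frame and moved backward
    through the time-reversed field, so dots do not pile up in
    specific regions at the end of the movie."""
    rng = np.random.default_rng(seed)
    h, w = shape

    def run(field_seq):
        # Random (x, y) start positions for half of the dots.
        xy = rng.uniform((0, 0), (w - 1, h - 1), size=(n_dots // 2, 2))
        frames = [xy.copy()]
        for dx, dy in field_seq:
            ix = np.clip(xy[:, 0].astype(int), 0, w - 1)
            iy = np.clip(xy[:, 1].astype(int), 0, h - 1)
            xy[:, 0] = np.clip(xy[:, 0] + dx[iy, ix], 0, w - 1)
            xy[:, 1] = np.clip(xy[:, 1] + dy[iy, ix], 0, h - 1)
            frames.append(xy.copy())
        return frames

    fwd = run(fields)
    # Backward pass: reverse the frame order and negate the field.
    bwd = run([(-dx, -dy) for dx, dy in reversed(fields)])[::-1]
    return [np.vstack([f, b]) for f, b in zip(fwd, bwd)]
```

Each returned element is an (n_dots, 2) array of dot positions for one output frame; rendering them as bright squares on a black background would reproduce the style of stimulus described here.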

Electrophysiological experiment procedure
The neural data were recorded in two adult common marmosets. All methods for surgery, anesthesia, and electrophysiological recordings have been described in detail previously (Suzuki et al., 2015a,b). In brief, to investigate whether the proposed algorithm extracted the motion information of the natural movies, we recorded 64 multi-unit activities (MUAs) from area MT of the animals. Surgery and electrophysiological recordings were conducted under anesthesia induced by ketamine hydrochloride (Ketalar, 25 mg/kg, intramuscular [i.m.]) following atropine sulfate (0.15 mg/kg, i.m.). The electrocardiogram (ECG), expired CO2 level, and rectal temperature were monitored continuously. The animals were placed in a stereotactic apparatus, and the head holder and recording chamber were implanted on the skull. Following the surgery, the stereotactic apparatus was removed and the head was fixed using the head holder. For the electrophysiological recordings, the anesthetic was switched to an intravenous (i.v.) infusion of remifentanil (Ultiva, 0.1 µg/kg/min) with rocuronium bromide (Eslax, 13 µg/kg/min, i.v.) to induce muscular paralysis. The animal was artificially ventilated with a mixture of 70% N2O and 30% O2. Before the recordings, the pupil was fully dilated with topical tropicamide (0.5%) and phenylephrine hydrochloride (0.5%). A contact lens was used to focus the eye contralateral to the recorded hemisphere at a distance of 57 cm.
Electrodes were inserted with reference to the superior temporal sulcus (STS), and the retinotopic organization was revealed by intrinsic optical signal imaging (data not shown). A micromanipulator was used to lower a 32-channel multicontact linear-array electrode (NeuroNexus, Ann Arbor, Michigan, USA), which contained four shanks (400-µm shank separation). Each shank had eight electrode contacts (impedance: ∼1 MΩ at 1 kHz) with an intercontact spacing of 200 µm. We inserted the electrode perpendicular to the cortical surface of the posterior and ventral STS parts corresponding to area MT (Lui and Rosa, 2015), where strong responses to visual motion stimuli were observed under anesthesia. MUAs were simultaneously recorded from the 32 contacts. The timings of band-pass (0.3-5 kHz) filtered MUAs and task events (stimulus onset and offset) were recorded at 24 kHz using a TDT signal processing system (RZ2; Tucker-Davis Technologies, Alachua, Florida, USA).
The receptive fields (RFs) and direction selectivity of the MUAs in area MT were identified before presenting the stimulus movies. First, the position and size of the RF for the 32 MUAs were determined by presenting a small dot (4.8°) at different positions in the visual field. Second, the direction selectivity of each MUA was evaluated by presenting a traditional stimulus, i.e., random dots moving in eight directions with 100% coherence at a speed of 2.9°/s in a 3.5° window. The centers of the window were either at the fixation point or at ±4.5° from the fixation point (nine positions).

Data analysis-direction selectivity
The RF of each MUA was estimated to be the window in which the maximum response was elicited when the direction selectivity was investigated using the traditional coherently moving random dots. The preferred direction of motion for the traditional coherently moving random dots was defined as the direction that elicited the maximum response in that window. The estimated direction selectivity for each MUA was then compared between the traditional coherently moving random dots and the random dot movies generated with the new algorithm. The motion vector of each dot in each frame of the random dot movies was calculated by taking the position differences along the x- and y-axes between neighboring frames. The motion vectors were vector-summed and thresholded: if the norm of the summed motion vector exceeded 2.9°/s, i.e., the speed of the dots in the traditional coherently moving random dots, the vector was normalized to unit length; otherwise, it was set to 0. The preferred direction for each MUA estimated from the random dot movie was defined by the weighted average of the motion vector in the RF window. The weighting coefficient for each frame was the response magnitude of the MUA, defined as the average firing rate in a temporal window from 67 to 100 ms after the onset of the corresponding frame minus the spontaneous firing rate. The weighting coefficient could therefore be positive or negative depending on whether the response was excitatory or inhibitory, so the estimated preferred direction took into account both strong responses to the optimal direction and weak responses or suppression to the opposite direction. Finally, the estimated preferred directions were compared between the traditional coherently moving random dots and the random dot movies generated with the new algorithm by taking the inner-product between the two vectors, with their norms normalized to 1; an inner-product of 1 indicates that the two vectors point in the same direction.

Fig. 3. Comparison of the preferred direction of the random dot movies and example multi-unit activities in the middle temporal (MT) area of the common marmoset. a. Histogram of the inner-products for all responsive MUAs (n = 50). The inner-products are calculated from normalized preferred-direction vectors derived from random dot motion with 100% coherence and those derived from the random dot movies generated with the present algorithm. b. Rasters and spike density functions showing responses to random dot movies that follow the vector field extracted by the present algorithm and to the corresponding original natural movies. The order of the plots is the same as in Fig. 2a. Red indicates responses to the original movies and black to the random dot movies. c. Scatter plot of the responses to 55 random dot movies and the corresponding original natural movies. MUA, multi-unit activity.
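A sketch of this preferred-direction estimate: per-frame summed dot motion vectors are thresholded at the reference speed, normalized, and averaged with baseline-subtracted responses as (possibly negative) weights. The function names and array shapes here are illustrative assumptions, not the authors' code.

```python
import numpy as np

def preferred_direction(dot_vectors, responses, speed_thresh=2.9):
    """Response-weighted preferred direction from a random dot movie.

    dot_vectors : (n_frames, 2) summed dot motion vectors inside the
        RF window. Vectors whose norm exceeds the reference speed are
        normalized to unit length; slower vectors are set to zero, as
        in the thresholding step described above.
    responses : baseline-subtracted firing rate per frame; may be
        negative, so suppression pushes the estimate away from the
        corresponding direction."""
    v = np.asarray(dot_vectors, float)
    norms = np.linalg.norm(v, axis=1)
    unit = np.zeros_like(v)
    fast = norms > speed_thresh
    unit[fast] = v[fast] / norms[fast, None]
    pref = (np.asarray(responses, float)[:, None] * unit).sum(axis=0)
    n = np.linalg.norm(pref)
    return pref / n if n > 0 else pref

def direction_agreement(pref_a, pref_b):
    """Inner-product of two unit preferred-direction vectors:
    1 means identical direction, -1 opposite."""
    return float(np.dot(pref_a, pref_b))
```

Comparing the vector obtained here with the unit vector of the preferred direction measured with coherent dots gives the inner-product plotted in Fig. 3a.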

Data analysis-comparison between the random dot and natural movies
The neural responses to the random dot movies and the original natural movies were evaluated in three aspects: magnitude (quantitative), effective region of the visual field (spatiotemporal), and population representation (representational). The average response magnitude was calculated as the average firing rate of the MUAs in the window from 33 to 1033 ms after the movie onset minus the average spontaneous firing rate in the 300-ms interval immediately before the movie onset. The response of each MUA was considered significant when the average firing rate in the window was larger than the mean + 3 standard deviations (SD) of the spontaneous firing rate for one or more movies. To compare the stimulus selectivity of each MUA between the random dot movies and the original movies, the coefficient of correlation between the response magnitudes to the random dot movies and to the corresponding original natural movies was calculated. The correlation coefficient was considered significant when it was larger than the mean + 3 SD of the distribution of 1000 correlation coefficients obtained with random pairing of the random dot movies and the original natural movies.

Local spatial regions of the visual field with effective responses of MT cells (effective regions, ERs) were compared between the original natural movies and the random dot movies. First, the ER of each MUA for each of the original natural movies was derived from the weighted average of luminance change images. The luminance change images represented absolute pixel luminance changes and were generated from differences between neighboring frames of the original natural movies. The absolute values were taken because an important aim of this analysis was to localize the spatial regions contributing to the responses of MT cells and having a high magnitude of luminance change, irrespective of sign.
The luminance change was used instead of local motion because computing local motion in a natural movie is not trivial. Although local spatiotemporal gradients in luminance can be calculated, they do not necessarily correspond to local motion; for example, the same spatiotemporal luminance gradients arise from the motion of a bar in one direction at its leading edge and from the motion of another bar with the opposite contrast moving in the opposite direction at its trailing edge, so the motion estimate is ambiguous. The luminance change images were binned by 20 pixels. The weighting coefficient for each frame in the movie was the response magnitude of the MUA, defined as the average firing rate in the window from 67 to 100 ms after the onset of the corresponding frame minus the spontaneous firing rate. Similarly, the ER in the random dot movies was derived from the weighted average of dot speed images using the same weighting coefficients. The dot speed images were generated from the random dot movies by averaging the speed of the dots in 20-pixel windows, where the speed of each dot was defined as the norm of its motion vector, obtained by taking position differences along the x- and y-axes between neighboring frames. Because the absolute luminance change was used in estimating the ERs for the natural movies, the absolute velocity (i.e., speed) was used in estimating the ERs for the random dot movies. Finally, the correlation of the ERs between the original natural and random dot movies was evaluated for each MUA. The ER was obtained by averaging across the image frames weighted by the neuronal responses, delineating the integrated stimulus portions that yielded a response.
While this "spike-triggered average method" used in this study was computationally equivalent to the "reverse correlation method" to reconstruct RF and response properties (DeAngelis et al., 1993), the meaning of the result was different because the spatiotemporal frequency distribution of the image set used here was not "white". In other words, there were spatial correlations in visual attributes, e.g., dot appearance, speed, or direction of the dot motion, in the random dot movies. A response to a frame of a movie could potentially be attributed to several visual attributes. Therefore, this analysis could not unambiguously relate individual stimulus elements, i.e., motion of individual dots to isolated neural responses to reconstruct and compare individual RFs and response properties such as direction selectivity or tuning curves. With a large number of movies, the spatiotemporal correlation might be reduced. Here, we anticipated that ERs represented integrated portions of the stimulus yielding the response, and would be useful in the comparison between the original and random dot movies used in this study.
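This response-weighted averaging of per-frame feature images (absolute luminance change for the natural movies, binned dot speed for the random dot movies) might look like the following sketch; the block-average binning and the array shapes are assumptions for illustration.

```python
import numpy as np

def effective_region(feature_images, responses, bin_px=20):
    """Effective region: response-weighted average of per-frame feature
    images (e.g., absolute luminance change or dot speed).

    feature_images : (n_frames, H, W) nonnegative feature maps.
    responses : baseline-subtracted response per frame; normalization
        by the sum of absolute weights is an assumption of this sketch."""
    imgs = np.asarray(feature_images, float)
    w = np.asarray(responses, float)
    er = np.tensordot(w, imgs, axes=1) / max(np.abs(w).sum(), 1e-12)
    # Bin into bin_px x bin_px blocks by block averaging.
    h, w2 = er.shape
    er = er[: h - h % bin_px, : w2 - w2 % bin_px]
    return er.reshape(h // bin_px, bin_px, w2 // bin_px, bin_px).mean(axis=(1, 3))

def center_of_gravity(er):
    """Center of gravity (x, y) of a nonnegative effective region."""
    ys, xs = np.indices(er.shape)
    total = er.sum()
    return (xs * er).sum() / total, (ys * er).sum() / total
```

The centers of gravity of the ERs computed this way for the natural and random dot movies are what the correlation analysis in the Results compares.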
The random dot and original natural movies were represented in a 32-dimensional population response space in which each dimension was defined by the normalized response magnitude of an MUA. The response magnitude was normalized by calculating the z-score with the mean and standard deviation across all movies. Euclidean distances in the population response space were calculated for all pairs within the random dot movies and within the original natural movies. The similarity of the representations between the random dot movies and the corresponding original natural movies was assessed on the basis of the Euclidean distances for the corresponding pairs. The correlation coefficient of the Euclidean distances over all corresponding pairs was calculated and considered significant when it was larger than the mean + 3 SD of the distribution of 1000 correlation coefficients obtained with random assignments of response magnitudes to movies.
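A sketch of this population-representation analysis: z-score each unit across movies, take pairwise Euclidean distances within each movie set, correlate the corresponding distances, and compare the result against a mean + 3 SD shuffle threshold. This is an illustrative reimplementation under stated assumptions, not the authors' code.

```python
import numpy as np

def pairwise_distances(resp):
    """Upper-triangle Euclidean distances between movies in the
    population response space, after z-scoring each unit's responses
    across movies. resp: (n_movies, n_units)."""
    r = np.asarray(resp, float)
    z = (r - r.mean(axis=0)) / r.std(axis=0)
    diff = z[:, None, :] - z[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))
    return d[np.triu_indices(len(z), k=1)]

def representation_similarity(resp_dots, resp_natural,
                              n_shuffles=1000, seed=0):
    """Correlation of pairwise distances between the two movie sets,
    with a mean + 3 SD null threshold from random reassignment of
    response magnitudes to movies (row shuffling)."""
    rng = np.random.default_rng(seed)
    a = pairwise_distances(resp_dots)
    b = pairwise_distances(resp_natural)
    observed = np.corrcoef(a, b)[0, 1]
    nat = np.asarray(resp_natural, float)
    null = np.array([
        np.corrcoef(a, pairwise_distances(nat[rng.permutation(len(nat))]))[0, 1]
        for _ in range(n_shuffles)
    ])
    threshold = null.mean() + 3 * null.std()
    return observed, threshold, observed > threshold
```

Applied to the two recorded animals, this is the computation that yielded the distance correlations of 0.60 and 0.64 reported in the Results.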

Histological processing
After the experiments, the animals were sacrificed with an intraperitoneal overdose of sodium pentobarbital (Nembutal, 75 mg/kg) followed by intracardial perfusion with 0.1 M phosphate-buffered saline (PBS; pH 7.4) and, subsequently, 4% paraformaldehyde in PBS (Merck, Whitehouse Station, NY, US). The dissected brain blocks were immersed in 0.1 M PBS with 10, 20, or 30% sucrose. Coronal sections of 50-µm thickness were cut on a freezing microtome (Yamato-Koki, Saitama, Japan) and prepared in a series of three sections. For areal demarcation, one of every three consecutive sections was stained for myelin (Pistorio et al., 2006) and the second for Nissl substance with thionin. The third section was used for fluorescence imaging of DiI, which marked the electrode tracks after the electrophysiological experiments. We confirmed from the myelin-stained sections that all the electrodes penetrated area MT (Fig. 2b) (Rosa and Elston, 1998).

Fig. 5. Effective spatial region in the original natural movie and random dot movie. Effective spatial regions for the responses of four MUAs in the original natural movie (left) and the random dot movie (right). Pseudo-color (arbitrary units) indicates the response magnitude. Channel #27 corresponds to the MUA described in Fig. 3b. Channels #22, #14, and #4 belong to different shanks, located at intervals of 400, 800, and 1200 µm from the shank of channel #27, respectively. Black circles indicate the centers of gravity. MUA, multi-unit activity.

Results
Among the 64 recorded MUAs, 50 (78%) showed significant excitatory responses to one or more of the 55 original and 55 random dot movies. Fig. 3a shows the inner-products for all responsive MUAs, which were significantly greater than 0 (Mann-Whitney U test, p < 0.00001). These results indicate that the direction selectivity of the MUAs in area MT derived from the random dot movies generated with the new algorithm was largely, though not perfectly, consistent with the direction selectivity determined with traditional stimuli; possible reasons for the remaining discrepancies are considered in the Discussion section. Fig. 3b depicts example MUA responses to the random dot and the corresponding original natural movies. The random dot movies tended to elicit responses similar to those to the corresponding original natural movies, indicating a resemblance in stimulus selectivity between the two movie types. The correlation coefficient between the response magnitudes to the random dot movies and those to the original natural movies was 0.68 (Fig. 3c), larger than the mean + 3 SD (0.01 + 0.42) of the distribution of 1000 correlation coefficients with random pairing. Among the 50 MUAs, 33 (66%) showed a significant correlation between the responses to the random dot movies and the corresponding original natural movies.
The ERs of MT cells in this study would be the same between the random dot movies and the original natural movies if the extracted motion information in each local region was similar between the two. Before comparing the neural responses, we compared the movies themselves. First, we verified whether the dot motion in local regions of the random dot movies reflected the pixel value changes in the corresponding regions of the original natural movies. Fig. 4a depicts a luminance change image and a dot speed image, each averaged across all sequential frames of all the original natural movies and the random dot movies, respectively (n = 1595; 30 − 1 = 29 frame differences for each of the 55 stimuli). These revealed a specific spatial bias of luminance change and dot speed in the stimulus movies used in this study. For comparison, the center of gravity was calculated for each of the original movies and the random dot movies. Fig. 4b depicts the scatter plot of the centers of gravity for all frames of all the movies. The correlation coefficients were 0.90 and 0.92 for the x- and y-axes, respectively, and were significant (p < 0.00001; n = 1595). Thus, the spatial bias was considered similar between the pixel luminance changes and the dot speeds for the stimulus movies used in this experiment. Subsequently, we compared the ERs between the original natural movies and the random dot movies. Fig. 5 shows the ERs for the original natural movies and the random dot movies for four MUAs: one (channel #27) was described in Fig. 3b, and the other three, channels #22, #14, and #4, were selected from different shanks located at intervals of 400, 800, and 1200 µm from the shank of channel #27, respectively.
The correlation coefficients of the centers of gravity between the original movies and the random dot movies for all responsive MUAs (n = 50) were 0.35 (p = 0.013) and 0.38 (p = 0.007) for the x-and y-axes, respectively.
Representations of the random dot movies and the original natural movies in the population response space were examined by plotting all the movies in the response space spanned by the normalized responses of the recorded MT cells and by calculating the Euclidean distances for all pairs within the random dot movies and within the original natural movies, respectively. The correlation coefficients of the Euclidean distances for the random dot movies and the corresponding original natural movies were 0.60 and 0.64 for the first and second animal, respectively. These were larger than the mean + 3 SD (−0.17 + 0.21 and −0.21 + 0.23 for the first and second animal, respectively) of the distribution of 1000 correlation coefficients with random assignments.

Evaluation of the results
In this study, we developed a new algorithm to extract motion information from natural movies and showed that the random dot movies that followed the algorithm elicited neuronal responses similarly to the corresponding original natural movies in area MT of common marmosets, in terms of magnitude, spatial localization in the visual field, and population representation. Area MT is known to play an important role in visual motion processing (Born and Bradley, 2005;Lui and Rosa, 2015), since the cells in area MT are selective for direction, speed, and spatial frequency. The characteristics of the responses to natural movies in area MT are consistent with those to simple stimuli such as bars, gratings, and dots (Nishimoto and Gallant, 2011;Lui et al., 2012). Thus, the similarity between the responses to the random dot movies and those to the original natural movies shown in this study indicated that the proposed algorithm is a useful method for investigating biologically critical motion information from natural movies.
Although the correlation coefficients between the responses to the random dot movies and the original natural movies were significantly positive, the responses were not identical. Specifically, the intercept of the linear regression through the points in Fig. 3c would be far from zero: the random dot movies tended to generate much greater responses when the responses to the original movies were low. Consistent with this observation, some MUAs showed different direction selectivity between the random dot motion generated with the new algorithm and the traditional coherently moving random dots. These differences might be due to the sensitivity of cells in area MT to the spatial frequency and contrast of the stimulus image (Priebe et al., 2003; Lui et al., 2007); the random dot movies obviously consisted of spatial frequency components and image contrasts different from those of the original natural movies. Testing various types of random dot movies, e.g., contrast-reversed versions (black dots on a white background) or Gaussian-filtered versions (dots containing more low spatial frequency components), would also be required in future studies to validate the notion that the proposed algorithm indeed extracts the motion information of natural movies.
The random dot movies introduced here provide a wide variety of complex motion stimuli extracted from deformable objects without any physical constraints. Conventional stimuli used to study shape-from-motion and biological motion generally do not, because shape-from-motion stimuli are based on geometrical figures (Siegel and Andersen, 1988; Todd, 1984), and the multiple light markers used for biological motion are constrained by the moving agents (Cutting, 1978; Johansson, 1973); attaching light markers to a tiger to prepare a biological motion stimulus, for example, would be laborious. In principle, the present algorithm can be applied to any everyday movie. Thus, this new visual stimulus set of random dot movies would be useful for investigating higher-order motion processing and the integration of shape and motion information in neurophysiological and psychophysical experiments.

Limitation of the current methods
Algorithms for extracting optical flow (Barron et al., 1994) aim to estimate the ground-truth flow of motion in an image sequence. They usually assume some constraints, such as a continuity equation (Horn and Schunck, 1981), to compute the vector field of the image sequence. In contrast, we did not attempt to estimate the ground-truth flow, and the present algorithm did not require any assumptions or constraints on the image sequence; thus, an image sequence can contain nonrigid objects such as animals. In the present algorithm, the field vector at each pixel was calculated from the spatial derivatives, so it depended on the change in contrast between nearby pixels. In an extreme case, it would differ if the grayscale were inverted: the local vector direction in a movie in which a white object (e.g., a white bear) moves on a black background (e.g., soil) is opposite to that in a movie in which an identical, but black, bear moves on a white background (e.g., snow). Thus, the present algorithm differs from those used to extract optical flow.
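The vector-field computation described above can be sketched as follows. This is a minimal illustration, assuming the normal of the luminance surface (x, y, I(x, y)) is taken proportional to (-∂I/∂x, -∂I/∂y, 1) and normalized before projection onto the x-y plane; the exact normalization in the paper's Eqs. (2) and (3) may differ. Note that inverting the grayscale flips the sign of the spatial gradients while leaving the normalization unchanged, which is why the projected vectors reverse direction.

```python
import numpy as np

def projected_normals(frame):
    """Unit surface normals of the luminance surface (x, y, I(x, y)),
    projected onto the x-y plane. The (unnormalized) normal is taken
    proportional to (-dI/dx, -dI/dy, 1); after normalization, only the
    x and y components are kept."""
    # np.gradient returns derivatives along axis 0 (y) then axis 1 (x).
    Iy, Ix = np.gradient(frame.astype(float))
    norm = np.sqrt(Ix ** 2 + Iy ** 2 + 1.0)
    return -Ix / norm, -Iy / norm

def vector_field(prev_frame, next_frame):
    """Time derivative of the projected normals, approximated as the
    difference between neighboring frames of the movie."""
    nx0, ny0 = projected_normals(prev_frame)
    nx1, ny1 = projected_normals(next_frame)
    return nx1 - nx0, ny1 - ny0
```

Because the inverted frame `I_max - I` negates both spatial derivatives without changing the norm, `projected_normals` returns exactly sign-flipped fields for a grayscale-inverted movie, illustrating the white-bear/black-bear asymmetry noted above.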

Significance of the new algorithm for the visual sciences
A distinguishing feature of the present algorithm is that it consists of two independent computations: one for the vector field, derived from the spatial and time derivatives of the original movie, and the other for the visualization of the vector field with random dot motion. This implies that we can visualize the vector field using a random dot modality other than motion.
For example, Supplement movie 7 shows a film clip in which random dots change their size in accordance with the dynamic vector field extracted from Supplement movie 1 using the new algorithm: the lengths along the x- and y-axes of the random ellipse-shaped dots change in accordance with the x and y components of the vector field represented in Eqs. (2) and (3). The two independent computations for making the stimulus movie may be useful for preparing various types of stimulus sets for neurophysiological and psychophysical motion experiments, and presenting the same information in different modalities may help in designing experiments to identify response properties.
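The two visualization modalities (dot motion and dot size) can be sketched as follows, given a vector field with components `vx` and `vy` sampled on the pixel grid. Dot coordinates are stored as (y, x); the `gain` and `base` parameters, the nearest-pixel sampling, and the use of the components' absolute values for the ellipse radii are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_dots(n, height, width):
    """Scatter n random dots uniformly over the frame; rows are (y, x)."""
    return rng.uniform(0, [height, width], size=(n, 2))

def sample_field(dots, field):
    """Sample a field at each dot's nearest pixel."""
    iy = np.clip(dots[:, 0].astype(int), 0, field.shape[0] - 1)
    ix = np.clip(dots[:, 1].astype(int), 0, field.shape[1] - 1)
    return field[iy, ix]

def move_dots(dots, vy, vx, gain=5.0):
    """Motion modality: displace each dot by the local field vector."""
    step = gain * np.stack([sample_field(dots, vy),
                            sample_field(dots, vx)], axis=1)
    return dots + step

def ellipse_axes(dots, vy, vx, base=2.0, gain=10.0):
    """Size modality (cf. Supplement movie 7): stretch each dot's x and y
    radii in proportion to the local x and y field components."""
    rx = base + gain * np.abs(sample_field(dots, vx))
    ry = base + gain * np.abs(sample_field(dots, vy))
    return rx, ry
```

Because the field computation and the visualization are decoupled, the same `vy`, `vx` arrays can drive either `move_dots` or `ellipse_axes`, presenting identical motion information in two different modalities.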
In the generation of visual stimuli, the present motion extraction algorithm can be applied to natural movies and does not require parameterized geometric shapes, providing a variety of complex visual motion stimuli, including social actions by animals. Because such visual motion stimuli contain no shape information unless they move, the shape information can be dissociated from the motion factor. Thus, this algorithm is useful for investigating the integration of shape and motion information in neurophysiological and psychophysical experiments. For example, the functional roles of cortical areas other than area MT in this integration might be inferred by comparing the neuronal responses to original natural movies with those to their random dot motion obtained with the present algorithm and to static frames selected from the original movies. This might further elucidate the neural substrates for the interaction between the shape and motion pathways.

Conclusion
We have proposed a new algorithm to extract motion information from natural movies, together with new stimulus movies in which random dots move in accordance with the extracted information. The random dot movies elicited neuronal responses similar to those evoked by the corresponding original natural movies in area MT of common marmosets, indicating that the proposed algorithm is a useful method for extracting biologically critical motion information from natural movies. Because the algorithm can be applied to natural movies and does not require parameterized geometric shapes, it provides a variety of complex visual motion stimuli. Thus, the new algorithm is a useful tool for investigating motion information processing in neurophysiological and psychophysical experiments.