Mantra: an open method for object and movement tracking

Mathôt, Sebastiaan; Theeuwes, Jan

doi:10.3758/s13428-011-0105-9

Open access
Published: 10 May 2011

Volume 43, pages 1182–1193, (2011)
Behavior Research Methods

Abstract

Mantra is a free and open-source software package for object tracking. It is specifically designed to be used as a tool for response collection in psychological experiments and requires only a computer and a camera (a webcam is sufficient). Mantra is compatible with widely used software for creating psychological experiments. In Experiments 1 and 2, we validated the spatial and temporal precision of Mantra in realistic experimental settings. In Experiments 3 and 4, we validated the spatial precision and accuracy of Mantra more rigorously by tracking a computer-controlled physical stimulus and stimuli presented on a computer screen.

Object tracking is a powerful method of response collection. There are many examples of studies that have addressed important questions by using object tracking. For example, in a study by Tipper, Howard, and Jackson (1997), participants reached for a target stimulus (a wooden block) while the positions of their hands were tracked. In addition to the target, a distractor stimulus was presented. The crucial finding was that the reaching trajectory of the hand systematically veered away from the distractor. The authors interpreted this as evidence for competition between the target and the distractor, which is resolved by inhibiting the distractor location. Another example that illustrates the usefulness of object tracking is a study by Brenner and Smeets (1996). In their study, participants picked up a target stimulus (a brass disk) while their thumbs and index fingers were tracked in order to measure hand opening. The apparent size of the target was manipulated by presenting it among converging lines in various configurations. The crucial finding was that this perceptual illusion did affect the participants' judgments of the size of the target, but did not affect how wide they opened their hands to reach for the target. Brenner and Smeets interpreted this finding as evidence for separate visual streams for action and perception (Goodale & Milner, 1992). Both the study by Tipper et al. and that by Brenner and Smeets illustrate clearly that object tracking is a unique and flexible tool, which allows researchers to investigate issues that cannot be investigated otherwise.

Even in situations in which the use of a keyboard may be considered an adequate form of response collection, object tracking can provide additional information. For example, keyboard presses are often used to investigate whether responses are faster (or slower) in one condition than in another condition. This approach has a rich history and forms the basis of many classic psychological paradigms (e.g., Donders, 1969; Posner, 1980). However, some questions are difficult to answer on the basis of response time alone. For example, is there a difference in the time of movement onset, or is there a difference in the velocity of the movement? Both possibilities could lead to a decrease in response time as measured using a keyboard. This question, and many others, can easily be investigated by tracking the location of a participant's hand throughout a trial.

Despite the obvious advantages of object tracking as a method of response collection, object-tracking systems are used sparingly by experimental psychologists. The reason is that the required equipment is generally expensive and is not part of the “default set of equipment” found in most psychological laboratories. In the present article, we introduce Mantra, a system for object tracking, which has three crucial advantages. First, Mantra is released under an open-source license and is available free of charge. Second, Mantra requires only a computer and a camera (an ordinary webcam is sufficient). Therefore, Mantra allows object tracking with general purpose, widely available equipment. Third, Mantra is designed specifically as a tool for experimental psychology. Therefore, it integrates painlessly with software for creating psychological experiments, such as E-Prime (Schneider, Eschman, & Zuccolotto, 2002), PsychoPy (Peirce, 2007), PyEPL (Geller, Schleifer, Sederberg, Jacobs, & Kahana, 2007), and OpenSesame (Mathôt & Theeuwes, 2010). As compared with systems such as the Liberty tracking system (Polhemus), TrakSTAR (Ascension Technology Corporation), or the Optotrak System (Northern Digital), Mantra offers basic functionality. However, for many purposes, such as the study by Tipper and colleagues (1997) described previously, this basic functionality is precisely what is needed.

In the first section of the present article, we provide a brief, nontechnical description of Mantra. The following sections describe four experiments. Experiment 1 is a replication of the Müller-Lyer illusion (Müller-Lyer, 1889), which we have designed to validate the spatial precision of Mantra in a realistic experimental setting. Experiment 2 is a variant of the additional singleton paradigm (Theeuwes, 1994), which we have designed to validate the temporal precision of Mantra, also in a realistic experimental setting. In Experiments 3 and 4, we investigated the spatial precision and accuracy of Mantra more rigorously, by tracking a computer-controlled physical stimulus and stimuli presented on a computer display. A detailed description of Mantra, installation packages, source code, and experimental data can be downloaded from http://www.cogsci.nl/mantra.

Usage

System requirements

Mantra is available as an open-source software package for Linux and integrates directly with experiments created in E-Prime (Windows XP) and Python (cross-platform). Mantra will run on any modern computer system, including low-end systems, such as the Intel Atom-based netbook used in Experiment 3. A camera (e.g., a webcam) is required.

Defining objects

The first step in using Mantra is to define one or more objects. Object definitions are based on color, which provides a robust and computationally cheap way to track multiple objects simultaneously and unambiguously. Therefore, it is important to use distinctly colored objects. Stickers or colored pieces of paper can be attached to objects that do not have a distinct color themselves. The number of objects that can be tracked simultaneously is determined by the number of colors that are sufficiently distinct. In turn, this depends on factors such as lighting and camera settings. In practice, it is feasible to track up to five objects (Fig. 1c). In order to define an object, you simply hold it in front of the camera and select it in the object-definition window (Fig. 1b). The color of the selected pixel is taken as the object-defining color. The object now turns green, whereas the rest of the image turns red. This allows you to determine visually if the object is reliably detected and is not confused with other objects. By default, Mantra compensates for luminosity, by representing color values relative to luminosity [e.g., R_rel = R – (R + G + B)/3]. Therefore, detection remains reliable even if luminosity varies: A red object that has been defined in the light is also detected in the shade.
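The luminosity compensation described above can be illustrated with a minimal Python sketch. This is not Mantra's actual implementation: the function names, the per-channel comparison, and the tolerance value are assumptions made for illustration. Only the formula R_rel = R – (R + G + B)/3 is taken from the text.

```python
def relative_color(rgb):
    """Express each channel relative to the mean of the three channels,
    following R_rel = R - (R + G + B) / 3 from the text."""
    r, g, b = rgb
    mean = (r + g + b) / 3
    return (r - mean, g - mean, b - mean)


def matches(pixel, reference, tolerance=30):
    """Return True if the pixel's relative color lies within `tolerance`
    of the reference color on every channel.

    The per-channel rule and the tolerance value are hypothetical."""
    p = relative_color(pixel)
    q = relative_color(reference)
    return all(abs(a - b) <= tolerance for a, b in zip(p, q))


# A red object defined in the light, and the same object in the shade,
# modeled here as a uniform drop of 40 on every channel.
red_in_light = (220, 60, 60)
red_in_shade = (180, 20, 20)
green_object = (60, 220, 60)

print(matches(red_in_shade, red_in_light))  # same relative color
print(matches(green_object, red_in_light))  # different relative color
```

Note that subtracting the channel mean cancels a uniform offset in brightness. A multiplicative change in illumination is only partly compensated, which is one reason a tolerance is needed in this sketch.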

Tracking

After all objects have been defined, you can start tracking. While tracking is in progress, you can monitor the location of the objects (Fig. 1c). The average location of all matching pixels is taken as the object's location (x, y). A z-coordinate is also available, which is defined as the maximum of the width and height of the object and can be used as a (very) coarse approximation of distance. The velocity and acceleration of the object are determined as well. If the velocity exceeds a certain threshold, a movement start is signaled. If the velocity then drops below a second threshold, a movement end is signaled. All data are logged as plain text to a file.
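The tracking steps described above can be sketched as follows. This is an illustration under stated assumptions, not Mantra's code: the function names are hypothetical, width and height are counted in pixels as (max - min + 1), and velocity is estimated from the displacement between successive frames.

```python
import math


def locate(pixels):
    """Return (x, y, z) for a list of matching (x, y) pixels.

    x and y are the average location of all matching pixels; z is the
    maximum of the object's width and height."""
    xs = [p[0] for p in pixels]
    ys = [p[1] for p in pixels]
    x = sum(xs) / len(xs)
    y = sum(ys) / len(ys)
    z = max(max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
    return x, y, z


def movement_events(positions, dt, start_threshold, end_threshold):
    """Signal a movement start when speed exceeds `start_threshold` and a
    movement end when speed then drops below `end_threshold`.

    `positions` holds one (x, y) location per frame, `dt` is the time
    between frames in seconds. Returns a list of (event, frame_index)."""
    events = []
    moving = False
    for i in range(1, len(positions)):
        (x0, y0), (x1, y1) = positions[i - 1], positions[i]
        speed = math.hypot(x1 - x0, y1 - y0) / dt
        if not moving and speed > start_threshold:
            moving = True
            events.append(("start", i))
        elif moving and speed < end_threshold:
            moving = False
            events.append(("end", i))
    return events


# Five frames at 25 Hz: the object rests, moves 10 pixels per frame, rests.
track = [(0, 0), (0, 0), (10, 0), (20, 0), (20, 0)]
print(locate([(0, 0), (2, 0), (0, 2), (2, 2)]))
print(movement_events(track, dt=0.04, start_threshold=100, end_threshold=50))
```

Using two separate thresholds, as the text describes, prevents a movement from being signaled as ended and restarted when the speed hovers around a single cutoff.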

In most cases, the temporal resolution will be limited by the frame rate of the camera. Most webcams, including the webcams that we have used in our experiments, have a frame rate of 25 Hz, which is equivalent to a temporal resolution of 40 ms. On a 1.66-GHz netbook, tracking at 25 Hz, CPU consumption is around 53%, irrespective of the number of objects that are tracked (one object, 53.1%; five objects, 53.7%).

The spatial resolution depends on two factors. The first factor is the resolution of the camera. In our experiments, we have used a camera with a resolution of 640 × 480 pixels, which is a typical resolution for webcams. The second factor is the distance between the camera and the object. For obvious reasons, spatial resolution is highest for objects near the camera. There is always a small jitter due to ambiguities in the separation between object and background (Figs. 3b and 7b, c, d). Under good conditions (i.e., with proper lighting, well-defined objects, and using a camera with a resolution of 640 × 480 pixels), objects can be tracked with a spatial precision of up to 0.3° (corresponding to about 2 mm in a regular setup; see Experiment 3). Under optimal conditions (such as tracking ideal stimuli on a computer display), a measurement error of less than 0.1° is even feasible (Experiment 4; Table 1).
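The relation between angular precision and physical size can be made concrete with the standard visual-angle formula, size = 2 · d · tan(angle/2). The sketch below assumes that the angle is measured from the camera and that d is the camera-to-object distance; this section does not state the distance in a "regular setup," so the code only derives the distance implied by the two quoted figures (0.3° and about 2 mm).

```python
import math


def size_from_angle(angle_deg, distance_mm):
    """Physical extent (mm) subtended by `angle_deg` at `distance_mm`."""
    return 2 * distance_mm * math.tan(math.radians(angle_deg) / 2)


def distance_from_size(size_mm, angle_deg):
    """Distance (mm) at which `size_mm` subtends `angle_deg`."""
    return size_mm / (2 * math.tan(math.radians(angle_deg) / 2))


# Distance implied by 0.3 degrees corresponding to about 2 mm.
implied = distance_from_size(2.0, 0.3)
print(round(implied))
```

Under these assumptions, the quoted figures correspond to a distance of roughly 38 cm between camera and object. At twice that distance, the same angular precision would correspond to about 4 mm.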

Table 1 The results of Experiment 4. The distance between two stimuli as measured by Mantra, compared with the real distance

Communication

Because Mantra is primarily intended as a data-collection tool for experiments, communication between the experiment and Mantra is crucial. Example code is provided in Table 2 (E-Basic) and Table 3 (Python). The first step is to establish a connection between the experiment and Mantra. In order to do this, you need to know the IP address of the computer running Mantra, which depends on your network configuration. You must also know the port on which Mantra is listening, which is displayed in the tracking preview window (Fig. 1c). After a connection has been established, the experiment can send information to Mantra. For example, the experiment can write messages to the Mantra log file to indicate the start and end of a trial. The experiment can also retrieve information from Mantra. The coordinates of an object can be queried (Experiment 1) or the experiment can wait for the start or end of a movement (Experiment 2).
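The general pattern described above (connect to an IP address and port, send a log message, query coordinates) can be sketched in Python with the standard socket module. This is not the Mantra protocol or the Mantra Python library: the use of TCP, the command strings "LOG" and "GET", the reply format, and the stub server that stands in for the tracker are all hypothetical and serve only to make the example self-contained.

```python
import socket
import threading


def stub_tracker(server):
    """Hypothetical stand-in for the tracker: serves one client using a
    made-up line-based protocol (not Mantra's actual protocol)."""
    conn, _ = server.accept()
    with conn, conn.makefile("r") as reader, conn.makefile("w") as writer:
        while True:
            line = reader.readline()
            if not line:  # client closed the connection
                break
            command = line.strip()
            if command.startswith("LOG "):
                reply = "OK"  # pretend the message was logged
            elif command == "GET 0":
                reply = "320 240 12"  # fixed x, y, z for object 0
            else:
                reply = "ERR"
            writer.write(reply + "\n")
            writer.flush()


def exchange(reader, writer, command):
    """Send one command line and return the one-line reply."""
    writer.write(command + "\n")
    writer.flush()
    return reader.readline().strip()


def run_demo():
    # Port 0 lets the operating system pick a free port for the stub.
    server = socket.create_server(("127.0.0.1", 0))
    port = server.getsockname()[1]
    threading.Thread(target=stub_tracker, args=(server,), daemon=True).start()

    # The experiment side: connect using the tracker's address and port.
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        with sock.makefile("r") as reader, sock.makefile("w") as writer:
            ack = exchange(reader, writer, "LOG start_trial 1")
            reply = exchange(reader, writer, "GET 0")
    server.close()
    x, y, z = (int(value) for value in reply.split())
    return ack, (x, y, z)


if __name__ == "__main__":
    print(run_demo())
```

In an actual experiment, the address and port would be those of the computer running Mantra, and the commands would be those defined by the code in Tables 2 and 3.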

Table 2 Example E-Prime code. This example assumes that the E-Basic Mantra script has been included in the User Scripts section of the experiment

Table 3 Example Python code

Experiment 1

The first aim of Experiment 1 was to validate the spatial precision of Mantra in a realistic experimental setting. To this end, we set out to replicate the Müller-Lyer illusion (Müller-Lyer, 1889). The Müller-Lyer illusion refers to the fact that people tend to overestimate the length of a line segment surrounded by inward-pointing arrowheads, relative to a line segment surrounded by outward-pointing arrowheads. In our experiment, participants controlled the length of a target line segment by adjusting the distance between their thumb and index finger, which were tracked by Mantra. A replication of the Müller-Lyer illusion in this way would be a compelling demonstration of the spatial precision of the Mantra system.

The second aim of Experiment 1 was to provide a demonstration of how Mantra can be used in combination with E-Prime (Schneider et al., 2002). Because E-Prime is a widely used package for creating psychological experiments, it is crucial that Mantra integrates well with E-Prime.

Method

Participants, stimuli, and procedure

Five observers who were naive to the purpose of the experiment and one of the authors (S.M.) participated in the experiment (age range 18–27 years). All of the participants reported normal or corrected vision. The experiment was conducted in a well-lit room.

Before the start of each trial, a gray fixation dot was presented on a black background for 500 ms (Fig. 2a), followed by the presentation of two line segments that were 4.2° above and below the fixation dot. One of the line segments was surrounded by inward-pointing arrowheads; the other line segment was surrounded by outward-pointing arrowheads. One of the line segments (the match) was gray and had a fixed length (a random value between 2.5° and 4.2°). The other line segment (the target) was green, and its length was adjusted online, according to the distance between the thumb and index finger of the participant (see the following Apparatus, Software, and Response Collection section). The arrowheads consisted of two lines, 1.7° in length. The arrowhead style of the target (inward target/outward match or outward target/inward match) and the location of the target (target above/match below or target below/match above) were fully randomized. Participants were instructed to adjust the length of the target line segment and to press the spacebar when they felt that both line segments were equally long. It was emphasized that response time was not important. The experiment consisted of 16 practice trials, followed by 128 experimental trials.

Apparatus, software, and response collection

The experiment was run on a desktop computer (Intel Core Duo, 3 GHz, Windows XP) running E-Prime 1.2. Mantra 0.2 was run on a laptop running Linux (Intel Pentium T4300, 2.1 GHz, Ubuntu 9.10). The two computers were connected through an ethernet cable. For image acquisition, a Logitech webcam was used, with a frame rate of 25 Hz and a resolution of 640 × 480 pixels. The webcam was mounted on top of the experimental display and pointed downward (Fig. 2b). Participants wore a green paper “fingercap” on their thumb and an orange fingercap on their index finger. The length of the target line segment on the display (in display pixels) was adjusted online to twice the distance (in webcam pixels) between the thumb and index finger.
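The mapping from finger distance to target length can be written as a short Python sketch. The function name is hypothetical, and the use of the Euclidean distance between the two tracked locations is an assumption; the text specifies only that the length was twice the distance in webcam pixels.

```python
import math


def target_length_px(thumb, index, gain=2.0):
    """Length of the target line segment in display pixels, given the
    (x, y) webcam coordinates of the thumb and index finger.

    `gain` is the factor of two mentioned in the text."""
    return gain * math.dist(thumb, index)


# Fingercaps 100 webcam pixels apart give a 200-pixel line segment.
print(target_length_px((100, 200), (160, 280)))
```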

Results

Target length was defined as the length of the target line segment relative to the match line segment. Trials in which target length was less than 50% or more than 150% were excluded (0.1%). In total, 99.9% of the trials were included in the analysis.
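The dependent measure and the exclusion rule can be expressed as follows. The function names are hypothetical; the bounds of 50% and 150% are taken from the text, and trials exactly at a bound are kept because only values below 50% or above 150% were excluded.

```python
def relative_length(target_px, match_px):
    """Target length as a percentage of the match length."""
    return 100.0 * target_px / match_px


def include_trial(relative_pct, lower=50.0, upper=150.0):
    """Keep a trial unless target length is below `lower` or above `upper`."""
    return lower <= relative_pct <= upper


# A target of 210 pixels against a match of 200 pixels.
pct = relative_length(210, 200)
print(pct, include_trial(pct))
```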

A two-tailed paired-samples t test revealed that target length was larger in the target-outward/match-inward condition (M = 105.4%; SE = 1.3) than in the target-inward/match-outward condition [M = 96.8%, SE = 1.7; t(5) = 3.0; p < .05; Fig. 3a]. All participants showed this effect, which reflects the Müller-Lyer (1889) illusion.

Figure 3b shows target length over time for a single, representative trial. A number of things are apparent from this graph. First, the oscillations reflect the typical tendency to iteratively adjust, overshoot, and readjust the length of the target line segment. Second, and more importantly, jitter resulting from measurement error is small. For example, during the first 400 ms of this particular trial (the 10 frames before the start of the first oscillation), the standard deviation of target length is 0.4%.

Discussion

In Experiment 1, we replicated the Müller-Lyer (1889) illusion. Participants controlled the length of a target line segment by adjusting the distance between their thumb and index finger. The positions of the thumb and index finger were tracked on a computer running Mantra and communicated to a second computer running the experiment (programmed in E-Prime), which dynamically adjusted the length of the target line segment on the display.

Since it is conceivable that color affects perceived size, a potential concern is that the target line segment was always green, whereas the match line segment was always gray. However, such a color effect would lead to a systematic over- or underestimation of the size of the target line segment relative to that of the match line segment in both arrowhead conditions alike. It therefore cannot account for the difference between conditions that constitutes the Müller-Lyer (1889) illusion in the present experiment.

Two important conclusions can be drawn. First, Experiment 1 clearly shows that the position of multiple objects can be tracked reliably and precisely using Mantra. Second, Experiment 1 shows that Mantra integrates well with E-Prime.

Experiment 2

The first aim of Experiment 2 was to validate the temporal precision of Mantra. To this end, we created a variant of the additional singleton paradigm, in which participants made a speeded report of the orientation of a line segment within a uniquely shaped placeholder. On the basis of the literature, we expected that the presence of a distractor would result in increased response times, due to attention being captured by the distractor (Theeuwes, 1994). In addition, we expected that this distractor-interference effect would be largest if the distractor was presented near the target, due to increased competitive interactions between target and distractor at close spatial separations (Mathôt, Hickey, & Theeuwes, 2010; Mounts, 2000). In one condition, participants moved their index fingers, which were tracked by Mantra, to the left or to the right to make a response. In order to directly compare Mantra responses to keypress responses, we also included a condition in which participants responded using a keyboard.

The second aim of Experiment 2 was to demonstrate how Mantra can be used in combination with Python. Interoperability with Python ensures that the use of Mantra does not require access to proprietary software. A number of packages are available for creating psychological experiments in Python, such as PsychoPy (Peirce, 2007), PyEPL (Geller et al., 2007), and OpenSesame (Mathôt & Theeuwes, 2010).