anTraX: high throughput video tracking of color-tagged insects

Recent years have seen a surge in methods to track and analyze animal behavior. Nevertheless, tracking individuals in closely interacting, group-living organisms remains a challenge. Here we present anTraX, an algorithm and software package for high-throughput video tracking of color-tagged insects. anTraX combines neural network classification of animals with a novel approach for representing tracking data as a graph, enabling individual tracking even in cases where it is difficult to segment animals from one another, or where tags are obscured. The use of color tags, a well-established and robust method for marking individual insects in groups, relaxes requirements for image size and quality, and makes the software broadly applicable. anTraX is readily integrated into existing tools and methods for automated image analysis of behavior to further augment its output. anTraX can handle large-scale experiments with minimal human involvement, allowing researchers to simultaneously monitor many social groups over long time periods.


Introduction
Our understanding of behavior, together with the biological, neural and computational principles underlying it, has advanced dramatically over recent decades. Consequently, the behavioral and neural sciences have moved to study more complex forms of behavior at ever-increasing resolution. This has created a growing demand for methods to measure and quantify behavior, which has been met with a wide range of tools to measure, track and analyze behavior across a variety of species, conditions, and spatiotemporal scales [1][2][3][4][5][6][7][8]. One of the exciting frontiers of the field is the study of collective behavior in group-living organisms, and particularly the behavior of groups of insects. Insects provide an attractive and powerful model for collective and social behavior, as they exhibit a wide range in social complexity, from solitary to eusocial, while allowing for controlled, high throughput experiments in laboratory setting [9][10][11][12]. However, although complex social behavior has been the focus of extensive research for over a century, technological advances are only beginning to enable systematic and simultaneous measurements of behavior in large groups of interacting individuals. Solutions for automated video tracking of insects in social groups can be roughly divided into two categories (for reviews see [5,6]): methods for tracking unmarked individuals [13][14][15][16][17][18][19][20][21], and methods for tracking marked individuals [22,23]. The former category has the obvious advantages of reduced interference with natural behavior, less constrained experimental conditions, and an unbounded number of tracked individuals. At the same time, these approaches are limited by a more extensive computational burden, higher error rates, and stricter requirements for image quality. The most common approach for tracking unmarked individuals is to try and follow the trajectory of an individual for the duration of the video. The challenge in this approach is to resolve individuals from each other and link their locations in consecutive frames during close range interactions, when they are touching or occluding each other. Common solutions to this problem are to employ sophisticated segmentation methods [13,14,16], to use predictive modeling of the animal motion [13,20], or to use image characteristics to match individuals before and after occlusions [19]. The success of these solutions is case-specific and will usually be limited to relatively simple problems, where interactions are brief, occlusion is minimal, and image resolution is sufficient to resolve the individuals even during an interaction. One important limitation of this approach is that no matter how low the error rate is, it tends to increase rapidly with the duration of the experiment. The reason is that once identities are swapped, the error is unrecoverable, and will propagate from that moment on. A different algorithmic approach for tracking unmarked individuals is to use object recognition techniques to assign separate pieces of trajectories to the same individual [14,15]. While this approach is promising and performs well on many tracking problems, it requires high image quality to identify unique features for each individual animal. It will also generally not perform well on animals with high postural variability and is hard to validate on large datasets. On the other hand, tagging individuals with unique IDs has the advantage of having a stable reference, enabling error recovery. This approach also provides a simpler method for human validation or correction and enables following the behavior of individuals even if they leave the tracked region, or across experiments when the same animals are tested in different conditions. The use of generalpurpose libraries such as AprilTags [22,[24][25][26][27] and ArUco [28], or application-specific patterned tags [29][30][31], has become the gold standard for this approach in recent years. However, these tags are applicable only to species with body sizes sufficiently large to attach them, have adverse effects on the animals' behavior, and are often lost during experiments. They also require relatively high image resolution to correctly read the barcode pattern. Taken together, while currently available methods cover a wide range of experimental scenarios, the ability to accurately track the behavior of animals in groups remains one of the major hurdles in the field. As a result, much of the experimental work still relies on manual annotation, or on partially automated analysis pipelines that require considerable manual effort to correct computer-generated annotations (see [32][33][34][35] for recent examples). Here we present anTraX, a new software solution for tracking color tagged insects and other small animals. Color tagging is one of the best-established and widely used methods to mark insects, both in the field and in the laboratory [34][35][36][37][38][39][40][41], with long-term durability and minimal effects on behavior. anTraX works by combining traditional segmentation-based object tracking with image-based classification using Convolutional Neural Networks (CNNs). In addition, anTraX introduces a novel approach for representing tracking data on a graph, enabling the inference of identity of unidentified objects by propagating temporal and spatial information, thereby optimizing the use of partial tag information. anTraX is uniquely suited for tracking small social insects that form dense aggregates, in which individuals are unidentifiable over large parts of the experiment even for the human observer. It will also be useful in tracking and analyzing behavior in heterogenic groups of "solitary" insects, where keeping track of the individual identity for long experimental durations is important. Such experiments are of increasing interest, as the study of behavior in classical model systems like Drosophila fruit flies is shifting toward understanding more complex behavioral phenomena such as social interactions, individuality and interspecies interactions [12,[42][43][44][45][46].
While we tested anTraX and found it useful for behavioral analyses in a range of study systems, it was specifically developed for experiments with the clonal raider ant Ooceraea biroi. The clonal raider ant is an emerging social insect model system with a range of genomic and functional genetic resources [39,[47][48][49][50][51]. The unique biological features of the species enable precise control over the size, demographics and genetic composition of the colony, parameters that are essential for systematic study of collective behavior in ants. Moreover, the species is amenable to genetic manipulations [48], which open new possibilities not only for understanding the genetic and neural bases of social and collective behaviors, but also for developing and validating theoretical models by manipulating behavioral rules at the level of the individual and studying the effects on group behavior. While these ants have great promise for the study of collective behavior, they are impossible to track using available approaches, due to their small size and tendency to form dense aggregates. anTraX thus constitutes a crucial element in the clonal raider ant toolbox, enabling researchers to characterize behavior in unprecedented detail both at the individual and collective level. anTraX was designed with large-scale behavioral experiments in mind, where hundreds of colonies are recorded in parallel for periods of weeks or months, making manual tracking or even error correction impractical. Its output data can be directly imported into software packages for higher level analysis of behavior (e.g. [52]) or higher resolution postural analysis of individuals in the group [53][54][55][56]. This enables the utilization of these powerful tools and methods for the study of social insects and collective behavior. anTraX is modular and flexible, and its many parameters can be set via a graphical interface. The software is open source, and its main algorithmic components can be easily modified. Here we provide a brief description of the different steps and principles that constitute the anTraX algorithm, while a full description is given in the Appendix and the online documentation. We validate the performance of anTraX using a number of benchmark datasets that represent a variety of behavioral settings and experimental conditions.

Methods
The anTraX algorithm consists of three main steps. First, it segments the frames into background and ant-containing blobs and organizes the extracted blobs into trajectory fragments termed tracklets. The tracklets are linked together to form a directed tracklet graph. The second step of the algorithm is tracklet classification, in which identifiable single-animal tracklets are labeled with a specific ID by a pretrained CNN, while other tracklets are marked as either unidentified single-animal tracklets, or as multianimal tracklets. Third, we infer the identity of unclassifiable tracklets in the tracklet graph by using temporal, spatial and topological propagation rules.

Object tracking and construction of the tracklet graph
Each frame is subtracted from the background, and a fixed threshold is applied to segment the frame into background regions and animal-containing blobs to be tracked. When two or more ants are close together, they will often be merged into a single larger blob ( Figure 1A-C). Unlike other tracking solutions, we do not attempt to separate these larger blobs into single ant blobs at this stage, because those attempts are based on heuristic decisions that do not generalize well across species and experimental conditions. Instead, we will infer the composition of these larger blobs from the tracklet graph in a later step. Each blob in the currently processed frame is then linked to blobs in the previous frame ( Figure 1D-E). A link between a blob in frame t and a blob in frame t-1 implies that some or all of the ants that were part of the first blob, are present in the second one. A blob can be linked to one blob (the simplest case, where the two blobs have the same ant composition), to a few blobs (where ants leave or join the blob), or to none (suggesting the ants in the blob were not detected in the other frame). We use Optical Flow to decide which blobs should be linked across frames ( Figure 1D). While Optical Flow is computationally expensive, we found it to be significantly more accurate than alternatives such as simple overlap or linear assignment (based either on simple spatial distance or on distance in some feature space). To reduce the computation cost, we run the optical flow in small regions of the image that contain more than one linking option (see Appendix section 1.4 for details). Blobs are organized into tracklets, defined as a list of linked blobs in consecutive frames that are composed of the same group of individuals ( Figure 1E-F). Following linkage, tracklets are updated in the following way: (i) A blob in the current frame t that is not linked to any blob in the previous frame t- 1, will 'open' a new tracklet. (ii) A blob in the previous frame that is not linked to any blob in the current frame, will 'close' its tracklet. (iii) If a pair of blobs in the previous and current frames are exclusively linked, the current blob will be added to the tracklet that contains the previous blob. (iv) Whenever a blob in the current or previous frames is connected to more than one blob, the tracklets of the linked blobs in the previous frames will 'close', and new tracklets will 'open' with the blobs in the current frame. In these latter cases, the linking between the blobs across different tracklets will be registered as an edge in the directed tracklet graph from the earlier tracklet to the latter. The tracklet graph is constructed by running an iterative loop over all the frames in the experiment. The result of this part of the algorithm, after processing all frames in the video, is a directed acyclic graph containing references to all tracklets and blobs in the dataset ( Figure 1G).

Tracklet classification
The next step is tracklet classification, in which we label tracklets containing single ants (Appendix section 2.3) that can be reliably identified a specific ID. It is important to note that our approach does not rely on the identification of each and every tracklet, but rather on inferring the composition of tracklets based on propagation of IDs on top of the tracklet graph. Hence, we apply a conservative algorithm that classifies only reliable cases and leaves ambiguous ones as unidentified. Classification is done by training and applying a Convolutional Neural Network (CNN) on each blob image in the tracklet. The most frequent ID is then applied to the entire tracklet ( Figure 2A). In addition to the ID label, we also assign a classification confidence score to each classified tracklet, which takes into account the number of identified blobs in the tracklet, the confidence of each classification, and the prevalence of contradictory classifications across blobs in the tracklet (see Appendix section 2.4). anTraX comes with a graphical interface for training, validating, and running the CNN (see Supplementary Material and online documentation).

ID propagation
The last part of the algorithm is the propagation of ID assignments on the tracklet graph. This is implemented using a greedy iterative process. Each node in the graph (corresponding to a tracklet) is annotated with a dynamic list of assigned IDs (IDs that are assigned to the tracklet) and a list of possible IDs (IDs that might be assigned to the tracklet, i.e. that were not yet excluded). Initially, all nodes are marked as 'possible' for all IDs, and no IDs are assigned to any nodes. All the classified tracklets from the previous step are now ranked by their confidence score. Starting with the highest confidence tracklet, its ID is propagated on the graph as far as possible. Propagation is done vertically on the graph on top of edges, both positively (an ID that is assigned to a node must also be assigned to at least one of its successors and one of its predecessors) and negatively (an ID cannot be in the possible list of a node, if it is not in at least one successor and one predecessor node), horizontally (if an ID is assigned to a node, it cannot be assigned to any other time-overlapping node), and using topological constraints ( Figure 2B, Figure2-Movie1). Only non-ambiguous propagation is performed, and propagation is halted whenever an ambiguity or contradiction arises. We iterate the propagation until no more assignments can be made. Some of the propagation rules are modified in cases of tracklets that start or end in regions where individuals can enter or leave the tracked area (see Appendix section 3). Figure 2C-F visualizes an example of tracking an ant throughout a 10-minute segment from an actual experiment and depicts the path of the ant through the tracklet graph along with its spatial trajectory.

Export positional and postural results for analysis
The tracking results are saved to disk and can be accessed using supplied MATLAB and Python interface functions. For each individual ID in the experiment, a table is returned, containing its assigned spatial coordinates in each frame of the experiment, and a flag indicating the type of the location estimation (e.g., direct single-animal classification, inferred single-animal, multi-animal tracklet). For frames where the location is derived from single animal tracklets (i.e., the animal was segmented individually), the animal orientation is also returned. Locations estimated from multi-animal tracklets are necessarily less accurate than locations from single-animal tracklets, and users should be aware of this when analyzing the data. For example, calculating velocities and orientations is only meaningful for single-animal tracklet data, while spatial fidelity can be estimated based also on approximate locations. A full description of how to import and process the tracking results is provided in Appendix section 3.6 and the online documentation.

User interface and parameter tuning
anTraX has many parameters, and the optimal value for each depends on the specific nature and settings of the processed experiment, from the resolution and quality of the camera, to the details of the organisms and number of tags. anTraX comes with a graphical user interface to tune and verify the value of these parameters. anTraX also contains a user interface for creating an image database and training the CNN for tracklet classification.

Parallelization and usage on computer clusters
anTraX was specifically designed to process large-scale behavioral experiments, which can contain hundreds of video files and tens of terabytes of data. anTraX includes scripts to process such large datasets in batch mode where individual video files are tracked in parallel on multicore computers and high-performance computer clusters. Following per-video processing, anTraX will run a routine to 'stitch' the results of the individual files together (see online documentation).

Availability and dependencies
The core tracking steps of anTraX are implemented using MATLAB, while the classification parts are implemented using a TensorFlow/Python module. Compiled binaries are available for use with the freely available MATLAB Runtime Library and can be run with a command line interface. anTraX can be run on Linux/OSX systems, and large datasets benefit considerably from parallelization on computer clusters. anTraX depends on the free FFmpeg library for handling video files. The result files are readable with any programming language, and we supply a Python module for easily interfacing with output data. anTraX is distributed under the GPLv3 license, and its source code and binaries are freely available [57]. anTraX is a work in progress and will be continuously extended with new features and capabilities. Online documentation for installing and using the software is available at http://antrax.readthedocs.io. Users are welcome to subscribe, report issues, and suggest improvements using the GitHub interface.

anTraX tracks individual ants with near-human accuracy over a wide range of conditions
As any tracking algorithm, the performance of anTraX depends on many external factors, such as the image quality, the image framerate, the quality of the color tags (size, color set, number of tags per individual), and the behavior of the organisms (e.g., their tendency to aggregate, their activity level, etc.). anTraX was benchmarked using a number of datasets spanning a variety of experimental conditions (e.g., image quality and resolution, number of tracked individuals, number of tags and colors, size variability in the colony) and study organisms, including four different ant species, as well as the fruit fly Drosophila melanogaster (Table 1, Figure 3 and its supplements). All benchmark datasets, together with the raw videos, full description, configuration files, and trained classifiers are available for download [58]. The performance of the tracking algorithm can be captured using two separate measures. The first is the rate of assignment, defined as the ratio of assigned locations in the experiments to the total possible assignments (i.e., the number of IDs times the number of frames). The second measure is the assignment error, defined as the ratio of wrong assignments to the total number of assignments made. While the assignment rate can be computed directly and precisely from the tracking results, the error rate in assigning IDs for a given data set needs to be tested against human annotation of the same dataset. Because the recording duration of these datasets is typically long (many hours), it is impractical to manually annotate them in full. Instead, a validation procedure was used in which a sequence of randomly selected test points was presented to a human observer, where each test point corresponded to a location assignment made by the software to a specific ID in a specific frame. The user was then asked to classify the assignment as either 'correct' or 'incorrect'. If the user was unsure of the correctness of the assignment, they could skip to the next one. The process was repeated until the user had identified 200 points as either correct or incorrect. The accuracy of the tracking was measured as the ratio of correct test points to the sum of correct and incorrect test points, as determined by the human observer. Overall, anTraX performed at a level close to the human observer in all benchmark datasets ( Table 2).

Graph inference dramatically improves tracking performance
The main novelty of anTraX compared to other tracking solutions is the use of a tracklet graph for ID inference. This method increases the tracking performance in several ways. First, it allows identification of tracklets that are unidentifiable by the classifier, using propagation of IDs from classified tracklets. Second, it corrects classification errors by overriding low-reliability assignments made by the classifier with IDs propagated from high-reliability tracklets. Third, it assigns IDs to multi-individual blobs and tracklets. This provides an approximate location for analysis, even when an animal cannot be individually segmented. Table 2 and Figure 4A-B show the increase in assignment coverage and decrease in assignment errors following graph propagation in all benchmark datasets. To further demonstrate the utility of graph propagation, we used data from a full, large-scale experiment. We tracked the behavior of 10 clonal raider ant colonies, each consisting of 16 ants, for 14 days. The colonies were filmed at relatively low resolution using simple webcams (Logitech C910, 960x720 pixels image size, 10 frames per second), similar to that of benchmark dataset J16. This dataset represents a relatively challenging classification scenario, because the tags are small, and the colors are dull. Figure 4C-D show a comparison of assignment rate and accuracy across the 10 replicates before and after graph propagation, with both measures improving greatly.

The blob classifier generalizes well across experimental replicates
Collecting examples and training the blob classifier is the most time-consuming step in the tracking pipeline. Ideally, a universal blob classifier would be trained to identify the same tag combination across experiments, without the need to retrain a classifier for each experiment. In reality, however, this is impractical. CNN classifiers do not generalize well outside the image distribution they were trained on, so even apparently small changes in experimental conditions (e.g., the type or level of lighting used, or the color tuning of the camera) can markedly decrease classification performance. Nevertheless, when experiments are conducted using similar conditions (e.g., study organism, marking technique, experimental setup, etc.), it is possible to construct a classifier that will generalize across these experiments with minimal or no retraining. This enables construction of efficient tracking pipelines for high throughput and replicate experiments, without the need for additional manual annotations. We assessed the generalizability of blob classifiers with the 10 replicates of the experiment described in the previous section. We trained a classifier on examples from one replicate, and then used it to classify blobs sampled from the other replicates. We similarly evaluated the performance of classifiers trained with examples from two, three, and four replicates, and compared the results to the performance of a classifier trained on examples from all replicates. The comparison shows that, despite variability in animal shape and behavior, tagging process, cameras, and experimental setups across replicates, the classifier performs remarkably well ( Figure 4E). Moreover, when a classifier is trained with an example set obtained from as few as 2 replicates, it performs similarly well as a classifier trained with examples from all replicates. Obviously, the generalizability of this result will depend on how well conditions are standardized between replicates or experiments. Nevertheless, it demonstrates that robust behavioral tracking pipelines can be constructed with minimal retraining.

anTraX can be combined with JAABA for efficient behavioral annotation of large datasets
While direct analysis of the tracking output is a possibility, phenotyping high throughput experiments and extracting useful information from large-scale trajectory data beyond very simple measures are challenging and impractical. In recent years the field of computational ethology has shifted to the use of machine learning, both supervised and unsupervised, for analyzing behavioral data [7,8,59]. One of the most useful and mature tools is JAABA, a package for behavioral annotation of large datasets using supervised learning [52,60]. In short, JAABA projects trajectory data onto a high dimensional space of per-frame features. The user then provides the software with a set of examples for a certain behavior, and the software trains a classifier to find all occurrences of that behavior in a new dataset. anTraX includes functions to generate the per-frame data in a JAABA-compatible way. In addition to the original list of JAABA features, a set of anTraX-specific features is also generated (see online documentation for details). The user can then label examples and train a classifier in the JAABA interface. This classifier can then be used to analyze entire experiments using the anTraX interface. To demonstrate the power of this approach, we present two examples of using JAABA together with anTraX. In the first example, we train a classifier to detect O. biroi ants carrying a larva while walking. O. biroi ants carry their larva under their body, in a way not always obvious even to a human observer ( Figure 5A, Figure5-Movie1). By using subtle changes in the ants' appearance and kinematics, JAABA is able to classify this behavior with >93% accuracy (tested on a set of annotated examples not used for training). An example of trajectories from a 30-minute period annotated with JAABA is shown in Figure  5B.
In the second example, we used JAABA to classify the behavior of ants during periods when they are not moving. We trained a classifier to detect four distinct behaviors (Figure5-Movie2): rest, in which the ant is completely immobile; local search, in which the ant does not move but uses its antennae to explore the immediate environment; self-grooming, in which the ant stops to groom itself; and objectinteraction, in which the ant interacts with a non-ant object such as a piece of food, a larva or a trash item. JAABA was able to identify these behaviors with >92% accuracy. Figure 5C shows the spatial distribution of the classified behaviors during all periods where an ant stops walking for more than 2 seconds in a 60-minute experiment, across all ants in the colony.

anTraX can be combined with DeepLabCut to augment positional data with pose tracking
Much attention has recently been given to tracking the relative position of the animal body parts of animals, taking advantage of the fast progress in machine learning and computer vision [53,54,56]. This allows for the measurement and analysis of aspects of an animal's behavior beyond what is extractable from its trajectory. Although these tools can in principle be directly applied to videos with multiple individuals [61,62], they are still not mature enough for large scale use. A more reasonable approach it to combine an individual animal pose tracking with together with a track-and-crop step (see discussion within [56]). To track body parts of individual animals within a group or a colony, we took advantage of the fact that anTraX segments and crops the images of individual animals as part of its workflow, and included an option to run pre-trained DeepLabCut models [54] on these images, without the need to export the data in a DeepLabCut-readable format (which would have resulted in a heavy computational overhead). This way, the position of the tracked body parts relative to the animal's centroid are returned together with the spatial location of the centroid. For training such a model, anTraX enables exporting cropped single animal videos that are loadable into the DeepLabCut user interface. Of course, the ability to perform accurate and useful pose estimation depends on the resolution at which animals appear in the video. To demonstrate the potential of this approach, we trained a simple DeepLabCut model to track the main body axis and antennae positions of ants from benchmark dataset A36. Figure 5D  Ants use different antennation patterns to explore their environment [63], and the ability to track these patterns in parallel to their movement in space can contribute to our understanding of their sensory processing during free behavior. We used the pose tracking results to visualize the different modes of antennae movement used by the ants to explore their environment. Figure panels 5E and 5F show the total movement rate and the relative phase of the two antennae along the trajectory of one ant in a one-hour segment of the experiment, respectively, demonstrating the variability and richness inherent to these patterns.

Discussion
anTraX is a novel algorithm and software package that provides a solution for a range of behavioral tracking problems not well addressed by available methods. First, by using a deep neural network for image classification, it enables the tracking of insects that are individually marked with color tags. While color tags have been used successfully for behavioral analysis for decades in a wide range of social insects, and in many species they are the only practical type of marker, their use has been severely limited by the lack of automation. Second, unlike other existing approaches, it handles cases where insects tightly aggregate and are not segmentable, as well as cases where the tags are obscured. This is achieved by representing the tracking data as a directed graph, and using graph walks and logical operations to propagate information from identified to unidentified nodes. Third, anTraX handles very long experiments with many replicate colonies and minimal human oversight, and natively supports parallelization on computational clusters for particularly large datasets. Finally, anTraX can easily be integrated into the expanding ecosystem of open source software packages for behavioral analysis, making a broad range of cutting-edge ethological tools available to the social insect community.
anTraX is an open source software and conveniently modular, with each step of the algorithm (segmentation, linking, classification, and propagation) implemented as a separate module that can be easily augmented or replaced to fit experimental designs that are not well handled by the current version of the algorithm. For example, the traditional background subtracted segmentation can be replaced with a deep learning-based semantic segmentation, i.e. training and using a classifier to distinguish pixels of the image as belonging to either background or foreground [64][65][66]. This can potentially allow analysis of field experiments with natural backgrounds, or experiments with non-static backgrounds, such as videos taken with a moving camera. Another possible extension is an informed 'second pass segmentation' step, where multi-animal blobs are further segmented into single-animal blobs, taking into account the composition of the blob (number and IDs of animals). Knowing the composition of the blob provides a method to algorithmically validate the segmentation, allowing a "riskier" segmentation approach. Having a record of the composition of tracklets and blobs also paves the way to performing image-based behavioral analysis of interactions [67-69], or constructing specialized image classifiers for interaction type (e.g. allogrooming, trophallaxis, aggression etc.). As pose-estimation packages advance, this may also allow to track specific body parts inside aggregations. In summary, anTraX fills an important gap in the range of available tools for tracking social insects, and considerably expands the range of trackable species and experimental conditions. It also interfaces with established ethological analysis software, thereby elevating the study of social insects to the level of classical, solitary model organisms. anTraX therefore has the potential to greatly accelerate our understanding of the mechanisms and principles underlying complex social and collective behavior.
in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R35GM127007 to D.J.C.K. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This work was also supported by a Searle Scholar Award, a Klingenstein-Simons Fellowship Award in the Neurosciences, a Pew Biomedical Scholar Award, and a Faculty Scholars Award from the Howard Hughes Medical Institute to D.J.C.K. A.G. was supported by the Human Frontiers Science Program (LT001049/2015). J.S. was supported by a Kravis Fellowship awarded by Rockefeller University. This is Clonal Raider Ant Project paper #14.

4.
Krakauer      Figure 4A-B.           Legends for movies and source data files           A table of performance measures, as defined in the main text, for the benchmark datasets. The data in this file were used to plot Figure 4A-B, and are referenced in Table 2.   Figure 4E. The data in this file were used to plot Figure 4E. Examples of short video clips from dataset A36 in which some ants walk while carrying a larva, while other walk without a larva. Clips like these were used to train the JAABA classifier in Figure 5B.  Figure 5C.

Appendix: detailed description of the anTraX algorithm
The anTraX algorithm consists of three main steps (Appendix- figure 1). In the first step, we detect the tracked animals in each frame of the video and organize the extracted blobs into trajectory pieces we term tracklets. As we will detail below, these tracklets are in turn linked together to form an acyclic directed graph we name the tracklet graph. The second step of the algorithm is tracklet classification, in which identifiable tracklets are classified based on the color tag information present in its blobs. In the third step of the algorithm, we use the topology of the tracklet graph to propagate identity information from the classified tracklets to the entire set of tracklets.
In this document, we detail each part of the algorithm and fully describe its various computational steps and parameters. A practical tutorial for running the software and use its graphical interface can be found in the online documentation.

Creating a background image
anTraX uses background subtraction for segmentation. Although using a static background is somewhat limiting in designing and performing experiments (requiring a static environment and a static camera), and it is possible to segment images for tracking without this step if there is a decent contrast between the objects and the background, background subtraction has the advantage of giving a stable object segmentation that simplifies later steps. For creating a background image, anTraX uses random sampling of frames from the entire duration of the experiment, or from a segment defined by the user (Appendix- figure 2A-B). The Appendix- figure 1: Flow diagram of the anTraX algorithm.
number of frames ! is configurable, and the background !" is computed by applying either a per-pixel median or max operator: Where $ is a randomly drawn timepoint in the experiment, # ! is the corresponding frame, and are the image coordinates, and is the color channel index. Generally, the median operation is useful in cases where animals are active enough to have each pixel in the image free of animals for at least half the frames. Otherwise, the max operation gives better results. The anTraX GUI enables the user to test and optimize the parameters in the background image creation step.

Creating an ROI mask
Typically, tracking should be performed only in part of the image, either because the animals to be tracked are confined to a region smaller than the image, or because the user cares about behavior in a small region of interest (ROI). The ROI mask ()* (Appendix- figure 2B) is a binary image with the same dimensions as the video frames, which is 1 in regions to be tracked and 0 in regions to be ignored. The anTraX GUI includes a utility to create the mask by drawing shapes to be included or excluded on a frame.

Image segmentation
The first step in analyzing each frame is segmenting it into blobs (Appendix- figure 2C, Appendix- figure 3): contiguous regions of the frame that significantly differ from the background and correspond to individual animals or tightly clustered groups of animals. Segmentation is done by first subtracting the image from the background (using the fact that the animals are dark and tracked on a light background), then converting the difference to a grayscale image (Appendix-figure3A-B), and comparing to a user defined threshold + and the ROI mask to produce a binary image: The resulting binary image (Appendix- figure 3C) will then undergo optional morphological operations (image closing, image opening, hole-filling, convex hull filling) that, depending on the specific conditions of the experiment, are useful for noise reduction. Blobs (connected components; using the 8-connectivity criterion) are then extracted from the final binary image. For each detected blob, we register the coordinates of its centroid, its area, its maximal intensity (in the # , grayscale image) and the parameters of the best fitted ellipse (orientation, eccentricity, and major axis length). Blobs are then optionally filtered by minimal area and minimal intensity criteria (Appendix- figure 3D). The anTraX GUI allows the user to test and configure all the segmentation parameters.

Linking blobs across frames
After blobs are extracted from a frame, the next step in the algorithm is to link them to the blobs in the previous frame (Appendix- figure 4A-E): a link between a blob in frame and a blob in frame − 1 implies that some or all of the individual animals that belong to the first blob, are present in the second one. A blob can be linked to one blob (the simplest case, where the two blobs have the same composition), to a few blobs (where animals leave or join the blob), or to none (suggesting the animals in the blob were not detected in the other frame). Relying on the fact that videos were recorded in a frame rate high enough so that blobs corresponding to the same individuals will overlap in consecutive frames even when the tracked animals are moving at their maximum possible speed (for O. biroi ants, for example, 10 frames per second is sufficient), the most accurate method to link blobs is Optical Flow, which takes into account the actual pixel content of the image. It is, however, a computationally expensive algorithm, and running it on full frames is not practical for long, high-resolution videos. On the other hand, simpler and commonly used methods, such as the popular Munkres linear assignment algorithm (the Hungarian algorithm, [1]) are prone to errors in dense problems such as those we aim to solve, and often require considerable amount of manual correction after automatic tracking. In sophisticated tracking solutions, the distance-based cost function that underlies the linear assignment is corrected with predictive modeling of the animals' behavior, or with other distinguishing features of the animals such as shape, orientation, and appearance. These, however, are often problem-specific and do not generalize well across tracking problems. We chose to implement a dynamic approach, in which the linking method is chosen based on the difficulty of assignment. The linking step begins with dividing the linking problem into a few independent subproblems, by using a maximal linking distance ( /0'1 ), which by default is set to twice the maximal velocity 234 times the inter frame time interval. Practically, this is done by creating a binary image, defined as the pixel-wise logical OR of the two segmented binary frames, and dilating it using a disk with a radius that equals to /0'1 (Appendix- figure 4F). The resulting image is then divided into connected components, and all the blobs that overlap with each component are treated as an independent subproblem. For each subproblem we choose the appropriate linking method: (i) a problem with one or more blobs in one of the frames and no blobs in the other results in no links, (ii) a problem with exactly one blob in each of the frames will link the blobs with no further processing, (iii) otherwise, a small region containing only the blobs in the subproblem will be cropped from each of the frames, and an optical flow assignment will be performed (Appendix- figure 4G). For solving a subproblem using optical flow, we do the following: We first crop a region from the two frames, corresponding to the bounding box of the subproblem's connected component. This region includes all of the blobs that belong to this subproblem, but no others. We then compute the optical flow field between the two cropped frames using the Horn-Schunck method [2]. Next, we define the Flow Number, 56 ( , ), for each pair of blobs across the two frames as the number of flow field vectors pointing from blob in frame − 1 to blob in frame . The flow number is an estimate of the number of pixels in the blob that have moved to blob in the consecutive frame. For each pair, if the flow number is greater than a threshold number 56 , the blobs are linked ( Figure 4H). The threshold number defaults to a third of the minimal size of a single animal in pixels and can be configured using the anTraX graphical interface. Once again, all the parameters of the linking step can be configured and tested in the anTraX GUI.

Updating the tracklet graph
As defined above, a blob can correspond to an arbitrary number of tracked individuals. Instead of trying to break these blobs down into individual animals, our tracking approach relies on registering the transition of individuals between blobs that possibly contain multiple animals. For this purpose, we define the tracklet as a list of linked blobs in consecutive frames that have the same composition of individuals. In other words, no animal has left or entered the group between the first and last frame of the tracklet (Figure 1 in the main text). After linking the blobs in frame to this in frame − 1, the tracklets are updated in the following way:

I.
A blob in the current frame that is not linked to any blob in the previous frame − 1, will 'open' a new tracklet.

II.
A blob in the previous frame that is not linked to any blob in the current frame, will 'close' its tracklet.

III.
If two blobs in the previous and current frames are exclusively linked, the current blob will be added to the tracklet of the previous blob.

IV.
Whenever blobs in the current or previous frame are connected to more than one blob, the tracklets of the linked blobs in the previous frames will 'close', and new tracklets will 'open' for the blobs in the current frame. In these cases, the linking between the blobs across different tracklets will be registered as a link between the tracklets. In cases where a tracklet has its last blob linked to the first blob of a different tracklet, the former is defined as the parent tracklet, and the latter as its child tracklet.
Although the linking and tracklet construction processes are very conservative, errors can still occur when the assumptions of the algorithm are violated. For example, in benchmark dataset J16, in which the behavior of a 16 ant colony is recorded in an uncovered arena surrounded by Fluon-coated walls, ants sometimes climb on the arena's walls and fall down on top of another ant, hence violating the maximal linking distance assumption. In such a case, the tracklet corresponding to the climbing ant will end without parenting a child tracklet, while the tracklet of the second ant will contain one ant in its first part and two ants in its second part. In the analyzed dataset, such linking errors occur very rarely (less than 0.05% of the tracklets), and in most cases will not lead to classification errors, due to the robustness of the ID propagation step to such errors (section 3). Upon closing a tracklet, the blob orientation has a ±π ambiguity, as a result of the definition of the orientation as that of the best fit ellipse, which is not consistent along the tracklet (for each blob, the orientation is set independently of the other blobs in the tracklet by MATLAB's blob analysis algorithm) . We use a method adapted from [3] to disambiguate the orientation. In short, this method uses the fact that whenever the tracked animal is moving fast, we can reliably assign the correct orientation in the moving direction and propagate this assignment to the entire tracklet by using dynamic programing. In tracklets where the animal is not moving fast enough, the result can be incorrect, but it is at least consistent along the tracklet. Multi-animal tracklets generally do not have a meaningful orientation. The end result of this part of the algorithm, after processing all frames in the video, is an acyclic directed graph containing references to all tracklets and blobs in the experiment.

Color correction
The actual RGB values of the color tags are highly sensitive to changes in illumination and variability in camera sensors, both between experiments, and within an experiment as a function of time and location. These sources of variability can adversely affect the performance of the tracklet classifier. To overcome this problem, at least partially, we include the option of applying a color correction step on images before classification (Appendix- figure 5). To do so, we use a white reference frame , which is an image of a white or gray surface taken using the same conditions as the videos. The color corrected frame is then: Pixel values that exceed the pixel value range are truncated. In cases where the tracking background approximates a homogenous white surface, as is the case with all the benchmark datasets, the white reference can be automatically generated by anTraX by filtering the background image with a 2D Weiner filter. In other cases, a white reference image can be taken in the experimental setup before or after the experiment.

Training a blob classifier
Classifying a tracklet begins with classifying the individual blobs it contains. To do so, we train a Convolutional Neural Network (CNN) image classifier using TensorFlow [4]. To create a classifier, the user has to supply a list of possible labels. Typically, this will be the list of IDs (unique tag combinations) for the experiment, plus optional labels for non-animal objects that can be detected in the videos (e.g., larvae, food items, etc.). One of the limitations of using CNNs for classification is the high rate of false positives, i.e., blobs that are assigned an ID even though they are not identifiable. To overcome this, we add a special label for unrecognizable blobs, which are treated as a separate class (labeled as 'Unknown' or 'UK').
To train the classifier, we collect a set of example images for each classifier label (Appendix- figure  6). This can be done easily using an anTraX GUI app (see Online Documentation for details). The example set will then be augmented using various transformations (flipping, rotations, shearing, and brightness and color shifts; Appendix-figure 7). As usual with supervised classifiers, there is a tradeoff between the complexity of the classifier (the size and architecture of the network), and its performance, training time, and the optimal size of the training dataset. anTraX contains a few CNN architectures that we have found to work well with our data. However, it can also use an arbitrary, user-defined architecture (see Online Documentation for details).

Filtering tracklets for classification
Once the blob classifier is trained, it can be applied to the tracklets of the experiment. Because direct classification is only meaningful for tracklets that represent individual animals, we first filter the tracklet list to identify possible single-animal tracklets. To do so, we use the typical size range of individual animals (interactively adjustable in the anTraX GUI). A tracklet whose average blob area falls within that range is considered a possible single-animal tracklet, and is passed on to the classifier. Although this filtering method is not perfect, it rarely leads to false negatives (single-animal tracklets with average blob size outside of the specified range). If the rate of false positives is high (which is usually the case in problems with high size variability between individuals), it is useful to include a separate class for multi-animal blobs. For performance reasons, this filtering is done during the blob tracking step, and the images constituting possible single-animal tracklets are saved separately to disk, thus avoiding the need to extract them again from the videos. It is therefore important to set the single animal size range before running the tracking.
Appendix-figure 7: Dataset augmentation using TensorFlow's intrinsic mechanism for image transformation on a single example image to generate a larger training dataset.

Classifying single-animal tracklets
To classify a possible single-animal tracklet, we perform the following steps: I. The blob classifier is applied to each blob in the tracklet. The output of the classifier for each tracklet is a matrix of posterior probabilities, 1/ , i.e., the probability of blob belonging to class . We define the most probable label for a blob as: II.
If the most probable label for all of the blobs in the tracklet belongs to a non-animal label, the tracklet is classified as non-animal, and the most abundant label in the tracklet is chosen as the tracklet label.

III.
If any of the blobs in the tracklet is classified as multi-animal, the tracklet will be classified as multi-animal. This step will occur only if a multi-animal class has been included in the classifier.

IV.
If there is no ID label in the sequence of most probable labels, the tracklet is marked as unidentified.
V. Otherwise (i.e., the tracklet is single-animal and there are at least some blobs classified as a specific ID label), we define a score for each possible ID as the sum of the posteriors for that ID over all blobs: The tracklet is labeled with the ID that has the maximal score: In addition, we define and register the classification confidence score as: Where n is the number of blobs in the tracklet classified as specific animal IDs. Using this definition, the confidence score will increase as evidence for the assignment accumulates (so longer tracklets with more identifiable blobs will have a higher score).

Verification and retraining
Although this is not the final tracklet ID assignment, it is useful to be able to estimate the performance of tracklet classification.

Propagating IDs on the tracklet graph
At this stage, we have the tracklet graph, in which a subset of single-animal tracklets have been labeled with a specific ID and a confidence score for that label. The rest of the tracklets in the network are either unidentified single-animal tracklets, or multi-animal tracklets. We assume that some of these classifications can be incorrect. The next step in the algorithm is to make the actual ID assignments for the tracklets, and to propagate these assigned IDs over the tracklet graph, trying to identify the composition of all tracklets, including multi-animal tracklets. In the process, a large portion of the incorrect classifications will be identified and overridden by the algorithm.

Initializing the graph
We start the propagation algorithm by creating a dynamic list of possible IDs (initially representing all individuals in the experiment) and a dynamic list of assigned IDs (initially an empty list) for each node in the graph. These lists are continuously updated during the propagation process. For all nodes (tracklets) that have been assigned a "non-animal" label in the classification step, we initialize the possible ID list also as an empty one, effectively removing these nodes from the graph.

Propagation and assignment rules
Propagating and assigning IDs are done according to a set of rules executed in a specific order. For each node to which we want to assign an ID, we do the following (see Appendix-figure 8, Figure  I. If the ID we want to assign is not on the list of possible IDs for that node, abort.

II.
If the node represents a single-animal tracklet (i.e., is in the area range of a single animal as defined by the user AND was not classified as a multi-animal tracklet by the classifier), assign the ID and eliminate all other possible IDs. If it is not a single-animal node, assign the ID without eliminating other IDs.

III.
Horizontal propagation (negative): for all other nodes that overlap in time with the currently assigned node, eliminate the ID we just assigned.

IV.
Vertical propagation (positive): for each parent node of the current node, look if the currently assigned ID is on the list of possible IDs. If there is only one such parent, and it has not already been assigned the ID, assign the ID to that parent. Do the same for child nodes.
V. Topological propagation (positive): a pair of nodes on the graph that constitute a 2-vertex cut set (i.e., cutting the graph at both these nodes creates a disconnected subgraph) and the corresponding disconnected subgraph does not contain any other 0-indegree or 0outdegree nodes (i.e., there are no animals leaving or exiting the subgraph), are defined as tween nodes. Such a pair will have exactly the same composition of IDs (this is not true in cases where one of the tracklets in the subgraph touches a border of the ROI at a point where animals can exit and enter; these cases are flagged during tracking, and no assignment is made; the special case of open boundary ROIs is discussed below). For each assignment, we also assign the first descendent tween node and the first ancestor tween node (if they exist). For each node from which we want to eliminate a possible ID, we do the following: I. If the ID is already marked as 'assigned' for that node, abort.

II.
Vertical propagation (negative): for each parent node of the current node, if there is no other child node for which the ID that we are currently eliminating is possible, eliminate the ID for that parent. For each child of the current node, if there is no other parent for which the ID is possible, eliminate the ID for that child.

III.
Topological propagation (negative): eliminate the ID for the first ancestor tween node and the first descendent tween node (if they exist).

Propagating from classified single-animal tracklets
Before we start the propagation, we rank all the single-animal tracklets that were labeled with a specific ID by the classifier (the 'source' tracklets) according to their confidence scores. We start by assigning the tracklet with the highest score with its classified ID, and then recursively propagate according to the rules above. When no more propagations can be made, we move on to the next tracklet on the list. All nodes with assignments inherit their confidence from the confidence of the source node. Once the last source tracklet has been reached, we conduct another round of propagation, this time starting from all nodes with assigned IDs (not only the CNN-classified nodes), again sorting them according to their confidence, so that higher confidence propagations will have precedence. This process is repeated until no more propagations can be made.

Handling open boundary ROIs
The assumption that underlies the propagation rules as described above is that a tracklet indeed represents a given set of tagged animals in each of its frames, and that the tracklet graph correctly captures the flow of individual animals between tracklets. This assumption is violated if the ROI of the experiment is open (i.e., animals are free to exit and leave the tracked region), because a tracklet that touches the open boundary can have a changing set of tracked animals. To handle these cases, blobs that overlap with an open boundary are treated differently. In the blob linking step, whenever a blob that touches the open boundary is linked to a blob that does not touch the open boundary in the previous frame, the tracklet closes (even if it is a 1:1 link as defined in section 1.4), and a new tracklet opens and will be linked to the previous with a graph edge. The same happens when a blob that does not touch the boundary is linked to a blob that does. This way, the blobs touching the boundary (i.e., blobs that can 'loose' or 'gain' animals) are confined to the same tracklet. These special tracklets do not participate in the propagation process (i.e., they do not act as sources for IDs and do not accept vertical or topological propagations). Open boundaries are marked by the user as part of the ROI mask creation (see online documentation). See also benchmark dataset V25 (Figure3-supplement1, Figure3-Movie2) for an example of a tracked experiment with an open boundary.

Connected component filtering
Ideally, at this point, when all possible ID propagation options are exhausted, we have inferred the maximum information about the composition of each tracklet. If we look at the subgraph corresponding to a specific ID (defined by all the nodes that are possible for that ID, along with their connecting edges), we expect to see a single connected component (Appendix- figure 9A). This connected component will consist of nodes assigned that ID, which will not have nodes parallel to them in the subgraph, as well as nodes without ID assignments, which can in principle have ambiguities (parallel nodes that are members of the same subgraph). However, because the tracklet classification process will usually result in some errors, the ID subgraph can have several disconnected components (Appendix- figure 9B). To filter out connected components that correspond to classification errors, we assign a confidence score to each connected component, defined as the sum of the confidence scores of all the ID assignments in that component. We then go over the list of components sorted by their confidence, and accept them in order. Whenever a component contradicts one of the already accepted components (e.g., it overlaps in time, or does not contain a possible route on the graph to a previously accepted component), we discard it. To eliminate a component, we undo all assignments made of the focal ID to the nodes of that component, and all the eliminations that resulted from these assignments. This is done separately for each ID subgraph. Following the connected component filtering, we again run the ID propagation loop to close the gaps between the accepted components (Appendix- figure 9C). This procedure is repeated until no more component filtering can be made.

Finalizing assignments and exporting data
At this point, when all inference options are exhausted, each ID is represented in several types of nodes/tracklets. In order of decreasing assignment quality, these are: Single-animal tracklets that were assigned by the classifier and confirmed by the graph propagation algorithm (i.e., that were not identified as erroneous and overridden).

II.
Single-animal tracklets for which IDs were inferred by the propagation algorithm.

III.
Multi-animal tracklets for which IDs were inferred by the propagation algorithm.

IV.
Tracklets for which no ID was assigned, but which are the only possible tracklet for a particular ID.

V.
Points of ambiguity, where no assignment was made with confidence, and several temporally overlapping nodes could possibly contain the focal ID.
When exporting trajectory data for the experiment, the assignment type for each point in the trajectory is also reported.

Multi-colony experiments
anTraX enables tracking multiple colonies/groups within the same video. This feature is useful when designing and performing high-throughput experiments, where one camera records several colonies. For multi-colony experiments, the software assigns a colony ID to each tracklet during the initial tracking step, based on the spatial location of the tracklet. During the graph propagation step, the software partitions the tracklets into a number of graphs, one for each colony. Propagation is then performed on each colony-graph separately, and the final trajectories are saved separately for each colony. Dataset G6X16 (Figure3-supplement8, Figure3-Movie9) gives an example of tracking an experiment where 6 colonies of 16 ants each are recorded with a single camera.