MovingCables: Moving Cable Segmentation Method and Dataset

Manipulating cluttered cables, hoses or ropes is challenging for both robots and humans. Humans often simplify these perceptually challenging tasks by pulling or pushing tangled cables and observing the resulting motions. We propose to use a similar interactive perception principle to aid robotic cable manipulation. A fundamental building block of such an endeavor is a cable motion segmentation method that densely labels moving cable image pixels. This letter presents MovingCables, a moving cable dataset, which we hope will motivate the development and evaluation of cable motion segmentation algorithms. The dataset consists of real-world image sequences automatically annotated with ground truth segmentation masks and optical flow. In addition, we propose a cable motion segmentation method and evaluate its performance on the new dataset.


I. INTRODUCTION
Manipulating one-dimensional deformable objects such as cables, hoses or ropes (henceforth referred to as "cables" for brevity), especially when cluttered, is challenging both for humans and robots due to self-occlusions, high-dimensional state space, uniform visual appearance, and complex interaction dynamics. Imagine, for example, that a robot should replace a specific damaged cable in the scene shown in Fig. 1. There are passive computer vision methods [1], [2], [3] for segmenting individual cable instances. However, these methods struggle with occlusions or complex intersections of multiple cables. Novel cable segmentation methods are therefore needed. Our work is inspired by the way humans interactively discover the topology of cluttered cables when trying to untangle them. When a human finds it too hard to visually infer whether two cable segments are directly linked, she grasps and pulls or pushes one of them. The motion visually distinguishes the grasped cable from the clutter. This observation guides us to integrate perception and interaction to aid robotic cable manipulation.
Methods that segment moving cables are an essential building block of the eventually integrated action-perception loop. To test or train such methods, we need a suitable dataset. Creating such a dataset is challenging because we need to obtain not only the cable instance segmentation masks but also the cable motion ground truth. We created an automatically annotated moving cable dataset and a novel method able to segment moving cables.
As our robots are too large to manipulate thin cables gently, we recorded video clips featuring a garden hose being manually pushed by a poking stick. We painted UV fluorescent markers on the hose to facilitate ground truth motion estimation. The UV paint is invisible in regular white light but shines clearly in UV light, see Fig. 2. Marker tracking automatically estimated the ground truth optical flow, and chroma key techniques generated cable and poking stick segmentation masks. Finally, we generated video clips featuring multiple overlapping hoses by compositing several single-hose video clips into one.
The contributions of this letter include: 1) MovingCables, the first moving cable segmentation dataset with optical flow and instance segmentation ground truth, automatically generated by a novel data annotation method. 2) MfnProb, a novel cable motion segmentation algorithm based on an optical flow prediction neural network with probabilistic outputs. 3) An evaluation of five cable motion segmentation algorithms (including MfnProb) on the new dataset, demonstrating how the dataset can be used. The cable motion segmentation methods presented here assume that either the segmentation mask of the arm moving the cables is available or the arm is not visible in the image. In practice, one can obtain the arm mask using, e.g., the arm CAD model and forward kinematics, model-based rigid object segmentation or pose estimation/tracking [4], UV fluorescent markers, or color thresholding (our approach).
Section II discusses the related work. Section III presents the dataset creation process, the automatic data annotation method, and the nature of the resulting data. We introduce four cable motion segmentation algorithms based on optical flow prediction neural networks in Section IV. Section V suggests how such algorithms can be evaluated on the dataset, Section VI presents the evaluation results, and Section VII discusses the results and concludes the letter.
II. RELATED WORK

a) Cable perception: Cable segmentation is generally challenging because cables are often of uniform appearance without distinctive features. Several cable detection or segmentation methods in the literature thus relied on simplifying assumptions. Some assumed a single cable was present in the scene [8], [9], others relied on a good cable/background color contrast or on color thresholding to segment the cables from the background [8], [10], [11], [12], [13], [14], [15].
A DeepLabV3+ semantic segmentation neural network can segment wires in an image [5]. Ariadne+ [6] segmented individual wires by processing a superpixel region adjacency graph, taking advantage of the DeepLabV3+ semantic segmentations. An additional TripleNet network predicted the superpixel connectivity scores at wire intersections.
FASTDLO [3] is a recent state-of-the-art passive wire instance segmentation method. It skeletonized each foreground segment predicted by the DeepLabV3+ network to find cable sections, intersections, and endpoints. At each intersection, a similarity neural network paired the neighboring segments with similar color, thickness, and direction estimates. The more recent RT-DLO [2] method replaced FASTDLO's skeletonization with a sparse graph-based approach to handle degraded foreground segments. mBEST [1] found cable instances in skeletonized foreground segments by minimizing the cumulative bending energy of the cables. FASTDLO, RT-DLO, and mBEST may struggle with multiple overlapping cables and severe occlusions, see Fig. 3. We note, however, that scenes involving occlusions or more than two cables at an intersection were outside the scope of mBEST [1]. Zhaole et al. showed that the semantic segmentation networks [3], [5] trained on wire datasets do not generalize well to cables of different textures and color patterns (e.g., ropes) [16]. Their combination of the Segment Anything large vision model with a post-processing method outperformed [3], [5] in segmenting a cable from the background.
Deep networks can replace cable state estimation algorithms when task-specific human-labeled training data is available. They can propose interaction keypoints, detect endpoints, classify knots, or refine grasps. Such networks were applied to untangle a multi-cable knot [13], a non-planar knot [17] or a long cable [14]. In [14], an interactive perception algorithm preferred certain manipulation primitives over others when the perception was uncertain. Nevertheless, these approaches assumed that the cables were segmentable from the background by color thresholding. A deep network also helped a robot pick a wiring harness entangled in a pile of wiring harnesses [18]. It predicted the success probability of each available open-loop action given a grasp candidate and a depth image of the scene.
Our work exploits the motion of a cable of interest to simplify the cable perception task, even in complex scenes with multiple overlapping cables and severe occlusions.
b) Interactive segmentation: Interactive perception is the exploitation of forceful robot-environment interactions to simplify and enhance perception [19], [20]. Interactive segmentation [21], a more specific interactive perception skill, interacts with the environment and segments it into a set of movable objects based on the observed motion. It is computationally efficient and requires little prior knowledge about the environment.
Interactive segmentation processes a visual motion signal to segment the moving objects. Options to consider include intensity image differencing with 2D template tracking [21], dense optical flow [22], [23], [24], [25], sparse feature tracking [24], [26], and object trackers [26]. Compared to optical flow, intensity change detection performs poorly when the moved object and the background are of similar color [23] or when multiple objects move [25]. Change detection used together with optical flow improves robustness under strong occlusion, where never-reappearing pixels degrade optical flow [24]. One cannot apply sparse feature tracking to most cables due to their uniform visual appearance.
We have not found any motion segmentation method tested on cluttered cables. To segment cables, we started with a method based on thresholding the magnitude of optical flow predicted by an off-the-shelf neural network [27]. Next, we improved its results by extending it with probabilistic outputs [28] and by retraining it on standard optical flow datasets.
c) Cable datasets: We are not aware of any existing moving cable dataset. Zanella et al. [5] published a static cable dataset for training and evaluating segmentation methods. They took photos of wires on a monochromatic background and randomized the background using the chroma key technique. In [7], a human labeled 3D keypoints along a real-world wire using a VR tracker pen. A camera mounted on a robotic arm took images of the wire from different viewpoints. The authors trained semantic and instance segmentation networks on dataset mixtures containing different proportions of synthetic and real-world images. They showed that adding real-world training data improved accuracy at test time.
We propose MovingCables, a novel dataset utilizing UV fluorescent markers to obtain the motion ground truth. UV fluorescence provided the ground truth in datasets for optical flow [29] and the semantic segmentation of rigid and deformable objects [30], [31]. Baker et al. [29] painted fluorescent speckles onto several objects, including clothes. They switched between visible and UV light to record images with and without the speckles. The Lucas-Kanade algorithm estimated the ground truth optical flow even for low-textured objects thanks to the speckles. Instead of relying on speckles, we opt for stripe markers to obtain uninterrupted marker trajectories extending across the entire video recording.

III. MOVINGCABLES DATASET
Here we present the dataset creation process, the automatic data annotation method, and the nature of the resulting data. We started by recording the video clips of a single hose with a blue screen in the background (Section III-A). Chroma key segmentation and UV fluorescent marker tracking automatically annotated these images with optical flow and segmentation masks (Section III-B). Finally, we composited multiple recorded single-hose clips and various background images (Section III-B) to obtain the final composed dataset consisting of clips showing multiple overlapping hoses (Section III-C).

A. Raw Data Recording
A Basler ace acA640-750uc camera with a 6 mm lens recorded the moving cable scene. A frame standing in front of the camera held the two endpoints of a plain yellow garden hose. We placed a blue screen in the background. The poking stick, see Fig. 2(c), was a long thin aluminum bar with dark green cardboard attached to one of its faces. We ensured the cardboard faced the camera when recording to keep the aluminum bar hidden.
The UV fluorescent stripe markers in Fig. 2(b) are cylinder shells painted on a cable at regular intervals with transparent UV fluorescent paint (UV-elements Invisible Glow Lacquer, green).
White LED strips two meters tall lit the background blue screen from the sides, see Fig. 4(a). Another set of vertical UV LED strips (370 nm wavelength) illuminated the cables, see Fig. 4(b), (c). Solid-state relays switched the white and UV LED strips rapidly on and off. The white LED strips could also have illuminated the foreground cables with visible light; instead, we used high-power white SMD LEDs driven by a custom LED driver with a digital PWM/enable control input.
The camera recorded the scene at 640 × 480 pixels and 120 FPS. Its digital trigger output, emitted at the start of every exposure, controlled the lights. A UV-lit image followed each white-lit image taken by the camera, so the white-lit image sequence was effectively recorded at 60 FPS. We recorded one video clip per poking interaction. Each clip is 10 seconds long and contains ca. 1201 images. The raw recorded dataset consists of 177 clips and 212 581 images.

B. Post-Processing
We post-process the recorded clips in two stages. The first stage performs chroma keying, marker detection, marker tracing, and optical flow ground truth computation. Foreground-background compositing and data augmentation run separately in the second stage.
a) Chroma keying: We use chroma keying to key out the blue screen and the green poking stick, see Fig. 5.
b) Marker detection: The marker detector extracts individual marker blobs by thresholding a UV-lit image with a fixed intensity threshold. It then locates the center point of each marker blob, see Fig. 6(a).
c) Optical flow ground truth: Optical flow is an independent per-pixel estimate of motion between two images [32]. Given the current image I_j sampled at discrete pixel locations x_i ∈ R², the optical flow vectors φ_i ∈ R² estimate the location of these pixels in a reference image I_1. The optical flow minimizes the brightness or color difference between corresponding pixels, summed over all pixel locations of the current image:

min_{φ} Σ_i || I_j(x_i) − I_1(x_i + φ_i) ||.

We provide two types of flow ground truth: full optical flow and "normal flow". Sufficiently textured cables allow full optical flow estimation. "Normal flow" is relevant for textureless cables that only exhibit motion at their boundaries. It is the normal projection of the optical flow vector onto the cable boundary's unit normal vector. Both ground truths neglect motions caused by a cable rotating around its axis.
In the recordings, the cable never crosses itself and its endpoints are outside the image. Given a cable segmentation mask (a binary image) and marker traces, interpolation estimates the ground truth flow for each cable pixel.
Thresholding the background-foreground alpha matte yields the foreground mask, and thresholding the poking stick alpha matte yields the poking stick mask. We dilate the poking stick mask by two pixels to ensure that (almost) all poking stick boundary pixels are segmented. The cable segmentation mask is the foreground mask with the poking stick mask pixels removed (set to zero).
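For concreteness, this mask composition can be sketched as follows; the threshold and structuring element are illustrative assumptions, not the exact values used to build the dataset:

```python
import cv2
import numpy as np

def compose_masks(fg_alpha, stick_alpha, thresh=0.5):
    """Sketch of the mask composition, assuming alpha mattes are
    floating-point arrays in [0, 1]."""
    fg_mask = (fg_alpha > thresh).astype(np.uint8)        # foreground mask
    stick_mask = (stick_alpha > thresh).astype(np.uint8)  # poking stick mask
    # Dilate the stick mask by two pixels so (almost) all of its
    # boundary pixels end up in the stick segment.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    stick_mask = cv2.dilate(stick_mask, kernel)
    # Cable mask = foreground with the dilated stick pixels set to zero.
    cable_mask = fg_mask.copy()
    cable_mask[stick_mask > 0] = 0
    return cable_mask, stick_mask
```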
The interpolation process illustrated in Fig. 6(b) finds the longest closed contour in the cable segmentation mask, removes its points lying on the image boundary, and finds the cable backbone curve by interpolating the two remaining parallel contour lines. Fitting a spline curve to the backbone points estimates the normal vectors for computing the normal flow. Linearly interpolating the displacement of the two markers closest to a backbone point yields its motion. The remaining pixels of the cable segment obtain their flow estimate from their nearest backbone point. See Fig. 7 for a sample visualization of the ground truth optical and normal flow magnitude during a poking action.
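A minimal sketch of the marker-based flow interpolation, assuming the backbone is parametrized by arc length and the markers sit at sorted arc-length positions with tracked displacements (the names are ours, for illustration only):

```python
import numpy as np

def flow_along_backbone(s, marker_s, marker_flow, normals):
    """Interpolate flow at backbone arc lengths `s` from markers at
    sorted arc lengths `marker_s` with displacements `marker_flow` (N, 2).
    `normals` holds the unit normals of the backbone spline (M, 2)."""
    # Linear interpolation between the two nearest markers, done
    # independently for the horizontal and vertical flow coordinates.
    fx = np.interp(s, marker_s, marker_flow[:, 0])
    fy = np.interp(s, marker_s, marker_flow[:, 1])
    flow = np.stack([fx, fy], axis=-1)                 # (M, 2) backbone flow
    # "Normal flow": projection of the flow onto the boundary normals.
    normal_mag = (flow * normals).sum(axis=-1, keepdims=True)
    return flow, normal_mag * normals
```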
d) Compositing and data augmentation: We composite each final clip from a static background image, a moving cable clip, and one or more static clips or still images extracted from moving cable clips. We keep both the moving and static poking sticks in the compositions. One can generate a semi-three-dimensional scene of cables stacked on top of each other this way, see Fig. 8(a).
We manually downloaded CC0-licensed (public domain) background images from the internet. The search was biased towards textures, bushes or woods, and distractors (queries: texture, colorful texture, fractal texture, bushes, ropes, wires, pile). We divided the images into two classes: clutter and distractors. Distractors may be confused with hoses, cables, wires, or ropes. Clutter is everything else. See Fig. 8.

TABLE I THE MAIN FEATURES OF THE COMPOSED DATASET
Even though the backgrounds are often artificial textures or high-quality photographs, we wanted to reduce any JPEG artifacts and remaining sensor noise. Thus we downscaled each background image by at least a factor of two and extracted a center crop 640 × 480 pixels in size.
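A minimal sketch of this preprocessing step, assuming the input image is large enough for the crop to fit after downscaling:

```python
import cv2

def prepare_background(img, out_w=640, out_h=480):
    # Downscale by a factor of two (at least) to suppress JPEG
    # artifacts and sensor noise, then extract the center crop.
    h, w = img.shape[:2]
    small = cv2.resize(img, (w // 2, h // 2), interpolation=cv2.INTER_AREA)
    h, w = small.shape[:2]
    y0, x0 = (h - out_h) // 2, (w - out_w) // 2
    return small[y0:y0 + out_h, x0:x0 + out_w]
```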
Foreground augmentation randomly alters the color of moving and static cables.It can transform hue, contrast, saturation and brightness; invert RGB colors, shuffle RGB channels, or convert to grayscale.
A sensor noise model adds artificial noise to the static background and still cable images to ensure that all image regions exhibit similar noise distributions. If we did not add noise, methods based on temporal image differencing could "segment" the moving cable by assuming that only the moving cable pixels were affected by variable sensor noise.
e) Sensor noise model: We use sRGBNoise [33], a model originally trained on images taken by five different smartphones [34]. The model generates noise conditioned on a noise-free image, the camera name, and an ISO value. We collected a training set with the Basler camera to train its noise model. We treated the (downscaled) background images as the noise-free input to sRGBNoise at inference time. However, real sensor noise already corrupted the still cable images. Therefore we applied a bilateral filter to suppress the noise before feeding them to sRGBNoise.
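The order of operations can be sketched as below. The actual sRGBNoise interface is not reproduced here; `noise_model` is a hypothetical stand-in for the trained conditional generator, and the filter parameters are illustrative:

```python
import cv2

def add_sensor_noise(image, noise_model, iso, is_still_cable=False):
    # `noise_model` is a hypothetical stand-in for the trained
    # sRGBNoise generator conditioned on the camera name and ISO value.
    clean = image
    if is_still_cable:
        # Still cable crops already carry real sensor noise; suppress
        # it with a bilateral filter before conditioning the model.
        clean = cv2.bilateralFilter(image, 9, 75, 75)
    return noise_model(clean, camera="basler", iso=iso)  # hypothetical call
```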

C. The Composed Dataset
We composed the final dataset from the 177 recorded clips (106 200 white-lit images in total). Each recorded clip shows a cable of a single configuration (i.e., a characteristic global shape), see Fig. 9, and a single motion class.
a) Dataset features: Table I summarizes the main features of the composed dataset. The motion classes are: poking the cable, pushing/pulling an endpoint, endpoint lateral motion, or static (no motion). The cable density relates to the number of cables overlaid in a composition. We used every recorded moving cable clip to create exactly two composed clips, each with a unique background and a unique combination of cable configurations. In a subset of the compositions, we also randomly transformed the colors of the cables or the plain background.
Table III presents the numbers of images and video clips in each composed dataset split. Each video clip is ten seconds long and consists of ca. 600 white-lit images.

IV. PROBABILISTIC MASKFLOWNET MOTION SEGMENTATION METHOD
Given a sequence of color images, poking stick segmentation masks, and a motion threshold τ, a motion segmentation algorithm detects cable motion with respect to the first (reference) image I_1 of the clip. The algorithm outputs a motion mask P_m for each image. The pixels p of cable segments in image I_j shifted by more than τ pixels away from their position in the reference image I_1 should be marked as moving in the motion mask, P_m(p) = 1. Poking stick pixels and all other pixels p should be marked as static, P_m(p) = 0.
We compare five cable motion segmentation methods. The first four of them are baseline methods based on off-the-shelf optical flow predictors, namely MaskFlownet [27], GMFlow [35], FlowFormer++ [36] and the OpenCV implementation of Farnebäck's optical flow algorithm [37]. To compute the motion segmentation masks, we added optical flow magnitude (L2-norm) thresholding to these methods.
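This shared thresholding step can be sketched as follows, with `flow` being the flow predicted with respect to the reference image I_1 (a minimal sketch, not the exact implementation):

```python
import numpy as np

def motion_mask(flow, stick_mask, tau):
    """Threshold the L2 norm of the predicted flow (H, W, 2) and
    force poking stick pixels to be labeled static."""
    moving = np.linalg.norm(flow, axis=-1) > tau  # moved more than tau px
    moving[stick_mask > 0] = False                # stick pixels are static
    return moving.astype(np.uint8)                # P_m: 1 = moving, 0 = static
```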
The fifth method is our novel proposed method, MfnProb. To create MfnProb, we added probabilistic outputs [28] and thresholding to the MaskFlownet deep neural network architecture. Given a pair of noisy input images and trained (certain) network weights, MfnProb predicts noisy optical flow vectors. The probability distribution of a predicted optical flow vector is assumed to be multivariate Laplacian, parametrized by a location µ and a diagonal covariance matrix Σ, with σ² = diag(Σ). The network learns to predict the mean φ_p = µ and the standard deviation σ_p ∈ R² of each optical flow vector probability distribution given a pair of images.
The predicted standard deviation (or variance) has to be nonnegative. To ensure that, Gast and Roth [28] proposed to predict the variance in log space, i.e., σ² = exp(σ̃²), where σ̃² is the (log-space) output of the neural network. When we tried to train MfnProb with the exponential, the training diverged. Therefore we replaced the exponential with a softplus function, i.e., σ = ln(1 + exp(σ̃)) if σ̃ ≤ 20 and σ = σ̃ otherwise, to ensure nonnegative standard deviations.
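Incidentally, this thresholded softplus coincides with the default behavior of PyTorch's softplus, which also switches to the identity above a threshold of 20. A minimal sketch:

```python
import torch
import torch.nn.functional as F

raw = torch.randn(1, 2, 64, 64)  # raw per-pixel network output (example shape)
# ln(1 + exp(x)) for x <= 20, identity for x > 20 (PyTorch default threshold)
sigma = F.softplus(raw, beta=1.0, threshold=20.0)
assert (sigma >= 0).all()  # nonnegative standard deviations
```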
The training loss of a predicted optical flow vector is proportional to the negative log-likelihood of the multivariate Laplacian distribution,

L(φ_p, σ_p) = Σ_i [ |φ_gt,i − φ_p,i| / (σ_i + ε) + ln(σ_i + ε) ], with σ_i = max(σ_p,i, σ_min).

The index i runs over the two flow coordinates, horizontal and vertical; φ_gt,i is the ground truth flow and σ_p,i is the standard deviation predicted by the network for the flow coordinate φ_p,i. We set ε = 10⁻⁸ and σ_min = 10⁻² to stabilize the training. We trained with the same training schedule on the same optical flow datasets as [27], namely FlyingChairs [38], FlyingThings3D [39], Sintel [40], KITTI [41], and HD1K [42], [43].
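A minimal sketch of a per-pixel Laplacian negative log-likelihood consistent with this description; the exact placement of ε and σ_min in our implementation may differ:

```python
import torch

def laplacian_nll(flow_pred, sigma_pred, flow_gt, eps=1e-8, sigma_min=1e-2):
    """Sketch of the loss. Tensors are (B, 2, H, W); the channel axis
    holds the horizontal and vertical flow coordinates."""
    sigma = sigma_pred.clamp(min=sigma_min)
    nll = (flow_pred - flow_gt).abs() / (sigma + eps) + torch.log(sigma + eps)
    return nll.sum(dim=1).mean()  # sum over coordinates, mean over pixels
```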
In addition to thresholding the optical flow magnitude, MfnProb can utilize the predicted uncertainty to reduce false positives. The segmentation labels any pixel with uncertainty magnitude ||σ_p||_2 > σ_t as static. We empirically set the uncertainty threshold σ_t on the validation set to maximize the mean segmentation intersection over union (IoU). In practice, we argue it is safer to predict a static scene when too uncertain because reliable robot actions, such as grasping, depend on precise true positive segmentation. When a segmentation algorithm has high precision but low recall, the robot can compensate for the low recall by trying multiple different motions until the segmentation succeeds. On the other hand, compensating for low precision is harder.
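MfnProb's resulting decision rule can be sketched by adding the uncertainty gate to the baseline thresholding above (variable names are illustrative):

```python
import numpy as np

def mfnprob_mask(flow, sigma, stick_mask, tau, sigma_t):
    # A pixel is moving only if its flow magnitude exceeds tau AND its
    # predicted uncertainty ||sigma_p||_2 stays within sigma_t.
    mag = np.linalg.norm(flow, axis=-1)
    unc = np.linalg.norm(sigma, axis=-1)
    moving = (mag > tau) & (unc <= sigma_t)
    moving[stick_mask > 0] = False
    return moving.astype(np.uint8)
```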

V. ALGORITHM EVALUATION PROCESS
Given a τ value, a predicted motion mask P_m, and the ground truth optical flow, the evaluation process computes standard segmentation quality metrics, namely the mean intersection over union (IoU), precision, and recall. Our experiments show that increasing the τ threshold above 10 pixels (up to 20) leads to significant decreases in both IoU and recall on the validation set of our dataset. On the other hand, the maximum noise level of the marker detector is 0.528 pixels for static markers. Therefore the evaluation varies τ from 1 to 20 pixels on the validation set and chooses the optimal value τ* yielding the highest validation IoU. The evaluation reports the test set results given τ*. In practice, a robot should try to move a cable as little as possible to preserve the cable topology and avoid hitting other cables by accident.
The evaluation also reports the mean endpoint error of the predicted optical flow (EPE) in pixels.
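For reference, a minimal sketch of the per-image segmentation metrics; the dataset-level numbers average such per-image values:

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """IoU, precision, and recall for binary motion masks (sketch)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.count_nonzero(pred & gt)    # true positives
    fp = np.count_nonzero(pred & ~gt)   # false positives
    fn = np.count_nonzero(~pred & gt)   # false negatives
    iou = tp / max(tp + fp + fn, 1)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    return iou, precision, recall
```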

VI. EVALUATION RESULTS
Table IV shows the evaluation results of the cable motion segmentation methods on the test set of our dataset. Methods MaskFlownet FT and MfnProb FT are MaskFlownet and MfnProb fine-tuned on a mixture of Sintel, KITTI, HD1K, and the MovingCables training set. We evaluated the methods on the normal flow ground truth as the hoses in the clips have almost no texture, see Fig. 8(a). The optimal motion threshold τ values on the validation set were 2.5 pixels for MaskFlownet, 2.0 pixels for MfnProb, 1.0 pixel for Farnebäck, 1.0 pixel for GMFlow, and 1.5 pixels for FlowFormer++. The optimal uncertainty threshold of MfnProb was positive infinity, i.e., no high-uncertainty predictions had to be suppressed to maximize the validation IoU.
MfnProb outperforms MaskFlownet in all the evaluation metrics. The probabilistic training scheme reduced the overall mean EPE by almost half. Mean segmentation recall improved by 68%, precision by 25%, and IoU by 42% simultaneously. MfnProb outperforms GMFlow in terms of IoU and recall but not precision. FlowFormer++ reaches the highest IoU among the methods not fine-tuned on MovingCables. MfnProb FT achieves the highest IoU overall. Sample segmentations are shown in Fig. 10. Our additional qualitative experiments on real-world videos without chroma keying or compositing indicate that all the motion segmentation methods generalize well to different cable textures (hoses, ropes, cables) and real backgrounds.

VII. DISCUSSION AND CONCLUSIONS
We have proposed a method to automatically annotate a real-world moving cable segmentation dataset with optical flow and segmentation masks thanks to UV fluorescent markers, controlled lighting, and chroma keying. Using the method, we have created the MovingCables dataset consisting of 312 video clips. The clips differ in their backgrounds, cable colors, numbers of overlaid cables, motion interaction types, or distinct combinations of cable configurations.
As an alternative to a real-world dataset, one could build a synthetic dataset in a simulator. For example, the Blender software can simulate chain-like rope dynamics. It would likely require less manual work as one would not need to design and build any hardware setup. A simulator could simulate a cable in many different positions, such as lying on a desk or hanging freely. However, the cable appearance and the scene lighting would be synthetic. Furthermore, simulating realistic hose or cable dynamics may be more challenging than simulating a chain-like rope. Nevertheless, a synthetic moving cable dataset could complement the real-world dataset presented in this letter.
We have tested the MaskFlownet, GMFlow, and FlowFormer++ off-the-shelf optical flow neural networks on our dataset and found that they can segment moving cables from a static background. We added uncertainty outputs to the MaskFlownet architecture and retrained it with a probabilistic loss function on standard optical flow datasets. This retrained MfnProb network significantly improved the cable motion segmentation performance over MaskFlownet on our dataset. Fine-tuning MaskFlownet and MfnProb on MovingCables further improved the accuracy. Nevertheless, we believe that optical flow estimators should work reliably on any realistic visual input without fine-tuning.
Limitations: We have found that all the neural networks struggle with texture-free backgrounds. Furthermore, manipulating a cable in a cluttered environment can perturb neighboring cables, causing multiple moving cables. As our methods segment motion by thresholding the flow magnitude, they segment multiple moving cables as a single cable. We will address this limitation in future work.

Fig. 1. One of the untidy cables in the scene is moving. Its motion segmentation by MfnProb is in green.

Fig. 2. Yellow hose, dark green poking stick, blue backdrop. (a) No markers are visible on the hose in white lighting. (b) UV lighting shows the UV fluorescent markers and hides everything else. (c) Detail of the tip of the poking stick.

Fig. 3. Instance segmentation of an image from our dataset.

Fig. 4. (a) Vertical white lights illuminate the blue screen background from the left and right. (b) UV light strips shining with the white light turned on. (c) Only the UV lights turned on.

Fig. 6. (a) Marker center point detection. 1) Fit the minimum area rotated bounding rectangle to the blob. 2) Rectangle (and marker) center line. 3) Scan along lines parallel to the center line. Find the endpoints of the line segments entirely within the blob. 4) Fit a parabola to each set of endpoints using orthogonal distance regression (ODR). 5) Intersect each parabola with the center line to estimate the central segment. 6) The central segment center is the marker center. (b) Interpolating optical flow along a cable backbone (middle curve). The cable segmentation mask is white; its contour lines are red and green. The dots represent marker centers; their colors indicate the optical flow magnitude. Black arrows show the unit normal vectors of the backbone spline.

Fig. 7. Sample ground truth optical and normal flow magnitude in pixels when poking the cable on the right side towards the left, as marked by the white arrow.
b) Dataset splits: We composed the training, validation, and test dataset splits as follows. First, we divided the recorded clips into three mutually exclusive sets, one for each dataset split. The division satisfies the following constraints: (a) The images of each recorded clip are used in only one split. (b) In each split, each cable configuration is represented by at least one moving cable clip. (c) The number of recorded clips of each motion class in each split is specified in Table II. (d) The cable density classes are represented equally.

TABLE II THE DIVISION OF THE RECORDED CLIPS BY MOTION CLASS INTO THE THREE SPLITS (TRAINING, VALIDATION, TEST)

TABLE III THE SIZE OF THE COMPOSED DATASET AND ITS SPLITS

TABLE IV MEAN EVALUATION METRICS ON THE TEST SET

TABLE V MEAN WALL AND PROCESS RUNTIMES REQUIRED TO COMPUTE OPTICAL FLOW FOR A PAIR OF RGB VGA (640 × 480 PIXELS) IMAGES

TABLE VI MEAN IOUS ON THE TEST SET SEPARATELY FOR THREE BACKGROUND TYPES: CLUTTER, DISTRACTOR, AND PLAIN

Table V presents the runtime of each algorithm with batch size one. We obtained these results on a desktop computer with an Intel Core i9-9900K CPU (3.60 GHz) and an NVIDIA GeForce RTX 2080 Ti GPU. The original and the probabilistic MaskFlownet networks are similarly computationally intensive, achieving runtimes around 0.040 s per image pair on the GPU and 2.6 s on the CPU. The CPU process times suggest that both networks demand approximately 48× more CPU computation than Farnebäck's algorithm. FlowFormer++ is less suitable for real-time interactive perception than MfnProb as it is 5.7× slower on a GPU.

We further evaluated the methods separately on clips with different background classes (clutter, distractor, plain), see Table VI. Clutter and distractor backgrounds yield comparably accurate segmentations. Plain backgrounds, however, tend to cause significantly more false positive segmentations by the neural networks in static areas, resulting in lower mean IoU.

TABLE VII STATISTICS OF PER-CLIP MEAN IOUS ON 20 CLIPS WITH VARIOUS SOLID BACKGROUND COLORS

Replacing poorly textured plain backgrounds with texture-free solid colors completely confuses the neural networks, see Table VII. They falsely predict motion in almost the entire image. Fine-tuning MaskFlownet or MfnProb on MovingCables brings negligible improvements. By contrast, plain backgrounds do not affect Farnebäck's performance significantly. We think that the neural networks do not regularize towards the smallest flow at a pixel where many flow vectors have very similar matching costs.