Deformable Linear Objects 3D Shape Estimation and Tracking From Multiple 2D Views

This letter presents DLO3DS, an approach for the 3D shape estimation and tracking of Deformable Linear Objects (DLOs) such as cables, wires or plastic hoses, using a cheap and compact 2D vision sensor mounted on the robot end-effector. DLO3DS can be applied in all scenarios in which the perception and manipulation of DLO-like structures are needed, such as switchgear cabling, wiring harness manufacturing and assembly in the automotive and aerospace industries, or the production of hoses for medical applications. The developed procedure is based on a pipeline that first processes the images coming from the 2D camera, extracting key topological points along the DLOs. These points are then used to model each DLO with a B-spline curve. Finally, the set of splines obtained from all the images is matched by exploiting a multi-view stereo-based algorithm. DLO3DS is validated both on a real scenario and on simulated data obtained by exploiting a rendering engine for photo-realistic images. In this way, reliable ground-truth data are retrieved and used to assess the estimation error achievable by DLO3DS, which on the employed test set is characterized by a mean reconstruction error of 0.82 mm.


I. INTRODUCTION
NOWADAYS, the demand for automating processes involving cables, wires, hoses, wiring harnesses, and in general Deformable Linear Objects (DLOs), is relevant in many industrial manufacturing areas. As an example, in the automotive and aerospace sectors, the assembly of cabling systems is currently an expensive process almost completely based on human work [1]. On the other hand, automating the manipulation of DLOs is not easy. In fact, manufacturing processes involving DLOs pose serious problems at both the manipulation and perception levels [2], since their intrinsic deformability makes modeling their behavior during manipulation complex. Moreover, if connectors or other rigid parts attached to them are missing, the DLOs' lack of features and significant textures makes their detection with vision systems challenging even with state-of-the-art learning-based methods [3], [4].
In this letter, we analyze the problem of the accurate estimation of the 3D shape of DLOs. In this regard, general-purpose consumer 3D cameras like Intel RealSense or CamBoard pico flexx fail in perceiving thin objects like DLOs [5]. This problem is shared across all consumer devices irrespective of the specific 3D depth technology. The only category of 3D active cameras that can reliably detect the shape of very thin cylindrically shaped objects like DLOs, with a diameter as low as 2-3 mm, is the high-end one, consisting of devices like Zivid One+/Two and Photoneo MotionCam3D or short-range laser scanners [5]. These devices can reach sub-millimeter depth accuracy but, on the other hand, they show several limitations in terms of pricing, bulkiness, and working constraints. Thus, they are usually placed at a fixed position and not at the end-effector level, increasing the risk of occlusions and reducing the flexibility of the application.
If semi-transparent materials are taken into account, such as in the case of medical hoses manufacturing, even high-end 3D sensors are not able to correctly detect those materials because of transparency, refraction and internal reflections [5].
In contrast, 2D cameras arranged in a stereo (or multi-view stereo) setup could potentially be more effective in detecting thin DLOs. However, these passive 3D devices have limitations in terms of baseline (which is fixed and optimized for distant objects) and usually struggle with lighting changes and non-textured areas. DLOs, having small dimensions and lacking relevant textures, are difficult objects for passive stereo cameras.
To address the drawbacks of both 3D active and passive cameras, we decided to deploy a single 2D camera mounted on a robotic arm. Utilizing just a 2D camera brings many beneficial effects: these cameras are usually cheaper than 3D devices, more compact and lighter, they have a wide range of resolutions, and the field of view and working distance can be easily adapted to the specific scenario. In addition, placing the 2D sensor on the robotic arm allows for exploiting the high repeatability and accuracy of the latter to avoid occlusions while, at the same time, enabling immense flexibility in terms of baselines and distances from the target.
In this letter, a method to infer the 3D shape of DLOs in static scenes by exploiting DLO instances [4] extracted from multiple images is introduced. For the sake of brevity, the proposed method is referred to as DLO3DS in the following. DLO3DS exploits a multi-view stereo-based approach to reconstruct the 3D DLO shape from multiple images taken at known viewpoints without any prior knowledge of the DLOs or the surrounding scene, independently of the background. DLO3DS extends the preliminary results obtained in [6]. The DLO instances, after being modeled as B-spline curves, are matched by exploiting a triangulation-based method. DLO3DS provides reliable results where standard stereo-matching algorithms [7] fail due to the peculiar characteristics of DLOs previously discussed. Finally, the availability of the robotic arm is exploited by optimizing at run-time the baseline and distance from the target, thus further reducing the estimation error. In Fig. 1 …
- Optimization at run-time of the baseline and distance from the target, exploiting the robotic arm and the camera on the end-effector;
- Extensive analysis of the accuracy and error characteristics of the 3D estimations, with comparisons against several stereo-based baseline methods.
The DLO3DS source code is available at https://github.com/lar-unibo/DLO3DS.

II. RELATED WORK

A. 3D Reconstruction From Multiple Views
The 3D shape reconstruction of objects from 2D images is a complex and extensively analyzed problem in computer vision. In this letter, we develop a multi-view stereo approach for the 3D estimation of DLOs. The goal of multi-view stereo is to reconstruct a complete 3D object model from a collection of images taken from known camera viewpoints [8]. In this section, we review the closest contributions and methods dealing with stereopsis. In order to achieve high accuracy in the 3D reconstruction, both the disparity error and the geometric error [9] should be addressed. The former is related to the correspondence algorithm, while the latter depends on physical parameters like the baseline and the distance from the objects. In the context of correspondence algorithms, stereo approaches are usually classified into local and global methods [10]. The latter are usually slower but more effective than the former in the case of non-textured areas. Among the many existing approaches, Semi-Global Matching (SGM) [7] is the most widely used due to its balance between quality, efficiency and scalability. However, its limitations in the case of non-textured areas are well known [11] and several works have tried to address its weaknesses, such as the execution time with a GPU implementation [12]. With the rise of deep learning, several approaches have been proposed for the computation of correspondences by employing SGM with, for instance, learned parameters [13], a learned matching cost [14], or a complete end-to-end learning approach [15]. Learning methods could potentially solve several challenges of traditional stereo algorithms, although the problems of dataset generation and model deployment in the real world remain to be evaluated.
Concerning the geometric error, adjusting the baseline may not be possible with a fixed stereo setup or with commercial solutions. Thus, only the distance of operation (and possibly the resolution) can be modified. Some works instead exploit multiple 2D cameras mounted with different baselines to combine the advantages of short and wide baseline systems [16], whereas others employ a single 2D camera and a robot to emulate a multi-baseline system [17]. DLO3DS tackles both the disparity and geometric errors. The former is addressed by the reliable processing of the 2D images and the matching of splines. The latter is addressed by exploiting the robotic arm to optimize at run-time the baseline and the distance from the target.

B. 3D Perception in Robotics
Common sensors that can be found in robotics are from the RealSense and PrimeSense families [18] or more expensive ones such as the Ensenso 3D camera [19]. Very recently, new highly accurate 3D active sensors have become available, like the ones from Zivid or Photoneo. Their suitability for reconstructing very thin and small objects emerged from [5]. Alternatively, Linear Laser Scanners mounted in an eye-in-hand configuration can be considered as well, in case an even higher reconstruction accuracy is sought [20]. Despite the abundance of sensors and 3D technologies, some limitations remain when a specific application, like the one presented in this letter, requires using the device very close to the target while also satisfying space constraints [5]. DLO3DS addresses these limitations by employing a compact 2D camera mounted on the robot end-effector.

C. DLOs Detection and Segmentation
Concerning the detection and modeling of DLOs in images, several approaches have been described in the literature [2]. The semantic segmentation of DLOs, specifically electric wires, via learning-based methods has been attempted in [21], where a dataset is made publicly available. Similarly, a weakly supervised dataset generation approach combining synthetic and real images of DLOs has been proposed in [22]. Simpler methods to segment DLOs in images are based on markers [1], background color removal [23], [24], [25], the Frangi filter [26], the Ridge filter [27] or the ELSD algorithm [28].
More complex approaches are [3], [4], [29]. In particular, these methods allow obtaining the individual instances composing the scene. In Ariadne [29], the individual DLOs are segmented from complex backgrounds starting from their endpoints, which are detected by a CNN. Ariadne+ [3] improves Ariadne by building the graph representation directly from the segmentation mask, avoiding the CNN step. In combination with a more efficient path discovery algorithm, better accuracy and a noticeable speedup are achieved. Recently, a 20 Frames-Per-Second (FPS) capable approach named FASTDLO [4] was proposed, further boosting accuracy and throughput. These methods were then exploited in bigger frameworks in order, for instance, to combine the visual sensing of DLOs with tactile sensors [30]. In [6], instead, some preliminary results about the shape estimation of DLOs are provided. Notice that DLO3DS contains several improvements with respect to [6], like being able to deal with multiple DLOs in the scene and to track a target DLO shape. In addition, extensive experimental validations and comparisons are provided.

III. INSTANCE SELECTION AND MODELING
This section reports the details about the processing and estimation of a spline for each captured image. The estimated spline is employed both for computing the 3D shape of the DLO and for aligning the camera with the main direction of the target DLO through Principal Component Analysis (PCA). Indeed, in order to increase the portion of the same DLO visible in every sample, the camera is assumed to be oriented along the DLO main axis and the samples are recorded by sliding orthogonally to it; see Fig. 3 for an example of the sliding direction with respect to the DLO orientation. In Section III-A, the extraction of DLO instances from each image and their modeling via B-spline curves is presented. Section III-B discusses the selection of the target spline among the set of detected ones extracted from a captured sample.
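The PCA step mentioned above can be illustrated with a minimal sketch. It assumes that the DLO is available as an ordered set of 2D pixel points; the function names `main_direction` and `alignment_angle` are hypothetical and are not taken from the DLO3DS code.

```python
import numpy as np

def main_direction(points_2d):
    """Return the unit vector of the dominant direction of a 2D point set."""
    pts = np.asarray(points_2d, dtype=float)
    centered = pts - pts.mean(axis=0)
    # The right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    # Fix the sign so that the result is reproducible.
    if direction[0] < 0 or (direction[0] == 0 and direction[1] < 0):
        direction = -direction
    return direction

def alignment_angle(points_2d):
    """Angle (radians) between the image x axis and the DLO main axis."""
    d = main_direction(points_2d)
    return float(np.arctan2(d[1], d[0]))
```

The returned angle could then be used to rotate the camera about its optical axis; how the rotation is actually commanded to the robot is outside the scope of this sketch.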

A. DLOs Segmentation and B-Spline Modeling
DLO3DS exploits existing approaches for segmenting the DLOs from an image. In this work, the learning-based algorithm named FASTDLO [4] is employed, taking as input the RGB image of the scene and providing as output both an instance mask, where each DLO is denoted with a unique color identifying the assigned ID, and a sequence of 2D coordinates in the image plane for each detected DLO. A cubic B-spline is fitted to these coordinate points, obtaining a continuous representation of the considered DLO. The spline is denoted as $q(u)$, where $u \in [0, 1]$ is the free parameter, i.e. the normalized position along the spline neutral axis. The computed curve is then discretized into a fixed number $n_s$ of points. The use of a learning framework in [4] makes it possible to intrinsically deal with lighting changes and textureless areas, partially solving the limitations discussed in Section I. However, other image processing pipelines can be employed for increased robustness, depending on the application scenario.
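As an illustration of the fitting and discretization steps, the following sketch fits a clamped cubic B-spline to ordered 2D points by least squares and samples it at $n_s$ equally spaced values of $u$. The uniform knot vector, the chord-length parameterization, the number of control points and all function names are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def bspline_basis(u, n_ctrl, degree=3):
    """Basis matrix B (len(u) x n_ctrl) for a clamped uniform B-spline."""
    u = np.asarray(u, dtype=float)
    n_inner = n_ctrl - degree - 1
    inner = np.linspace(0.0, 1.0, n_inner + 2)[1:-1]
    knots = np.concatenate([np.zeros(degree + 1), inner, np.ones(degree + 1)])
    # Degree-0 basis: indicator of each knot span (last span closed at u = 1).
    B = np.zeros((len(u), len(knots) - 1))
    for i in range(len(knots) - 1):
        B[:, i] = (knots[i] <= u) & (u < knots[i + 1])
    B[u == 1.0, n_ctrl - 1] = 1.0
    # Cox-de Boor recursion up to the requested degree.
    for d in range(1, degree + 1):
        B_new = np.zeros((len(u), len(knots) - 1 - d))
        for i in range(len(knots) - 1 - d):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            if left_den > 0:
                B_new[:, i] += (u - knots[i]) / left_den * B[:, i]
            if right_den > 0:
                B_new[:, i] += (knots[i + d + 1] - u) / right_den * B[:, i + 1]
        B = B_new
    return B

def fit_bspline(points, n_ctrl=8, degree=3):
    """Least-squares control points for ordered 2D points along a DLO."""
    pts = np.asarray(points, dtype=float)
    # Chord-length parameterization mapped to [0, 1].
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    u = np.concatenate([[0.0], np.cumsum(seg)]) / seg.sum()
    u[-1] = 1.0
    B = bspline_basis(u, n_ctrl, degree)
    ctrl, *_ = np.linalg.lstsq(B, pts, rcond=None)
    return ctrl

def sample_bspline(ctrl, n_s, degree=3):
    """Evaluate q(u) at n_s equally spaced values of u in [0, 1]."""
    u = np.linspace(0.0, 1.0, n_s)
    return bspline_basis(u, len(ctrl), degree) @ ctrl
```

A typical use would be `samples = sample_bspline(fit_bspline(points), n_s)`, where `points` is the ordered 2D coordinate sequence of one DLO instance.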

B. Spline Selection
The spline selection is performed in case an image contains multiple DLOs. Indeed, all the instances extracted from an image are modeled as described in Section III-A. However, in the following, a single DLO spline per image is expected. Thus, a regression-based distance approach is employed for retrieving the target DLO in sample $i$, based on sample $i - 1$, with $i = 2, \ldots, k$. In particular, the point-to-point distance between the target spline $T$ of sample $i - 1$ and each newly detected spline of sample $i$ is computed, as shown in Fig. 3. Then, a line is regressed for each distance curve and the spline associated with the line having the smallest slope is selected. Indeed, due to the orthogonal sliding direction, the same portion of the DLO is assumed to be visible in each sample; thus, an overall constant distance between the two curves, given by the motion of the baseline step, is expected.
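The selection rule can be sketched as follows. The sketch assumes that all splines are sampled with the same number of points and with a consistent ordering, and it compares the absolute value of the regressed slopes, which is an interpretation of "smallest slope" adopted here; `select_target` is a hypothetical name.

```python
import numpy as np

def select_target(prev_target, candidates):
    """Index of the candidate whose distance to the previous target is flattest."""
    prev = np.asarray(prev_target, dtype=float)
    idx = np.arange(len(prev))
    slopes = []
    for cand in candidates:
        # Point-to-point distance curve between the two sampled splines.
        dist = np.linalg.norm(np.asarray(cand, dtype=float) - prev, axis=1)
        # Regress a line on the distance curve and keep its slope magnitude.
        slope, _ = np.polyfit(idx, dist, 1)
        slopes.append(abs(slope))
    return int(np.argmin(slopes))
```

A candidate that is simply a shifted copy of the previous target yields a nearly flat distance curve and is therefore preferred over a spline that crosses or diverges from it.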

IV. SHAPE ESTIMATION FROM MULTIPLE VIEWS
This section details how the different k splines are matched and exploited for obtaining the final 3D shape of a given DLO, see Fig. 2. In Section IV-A the matching of the splines is discussed, while in Section IV-B the triangulation approach is detailed. In Section IV-C the possibility of employing the reprojection error for evaluating the quality of the estimation is presented. In Section IV-D, the optimization of the baseline and distance from the target is described. Finally, in Section IV-E, the applicability of DLO3DS in a DLO tracking framework is analyzed.
Fig. 4. Scaling process. The shortest spline is selected as the reference and all the others are scaled to match the same DLO portion as closely as possible.

A. Splines Matching
A spline $q_i(u)$ can be sampled by defining a suitable vector $u$ of $n_s$ equally spaced free parameter values in the interval [0, 1]. Thus, $n_s$ 2D pixel points along the DLO for the i-th view are retrieved.
Let us denote with $p_{ij} = [p^x_{ij}\ p^y_{ij}]^T$ the j-th spline sample on the i-th image plane, with $i = 1, \ldots, k$ and $j = 1, \ldots, n_s$. To assess the accurate 3D location of a generic point seen from multiple images at pixel coordinates $p_{ij}$, we need to compute precisely the corresponding points $p_{ij}$. For this purpose, we exploit both the constraints embedded in the case of a normal stereo setup and the availability of the splines modeling the DLO.
The first step consists of sampling all the splines over the same DLO section by defining suitable vectors $u_i$, one for each spline. The length $l_i$ of each spline is measured by summing the distances in pixels between adjacent points. Then, the index $r$ of the shortest spline is taken as a reference:
$$r = \arg\min_i \{ l_i : i \in 1, \ldots, k \}$$
Thus, the splines are re-sampled according to redefined vectors of free parameters $u_i$, computed by means of a function $d(\cdot, \cdot)$ that provides the distance in pixels between two points. As a consequence, the spline samples $q_i(u_i)$ provide a coarse matching across the different views. The $n_s$ spline samples of the shortest spline need to be precisely matched in all the other splines $q_i(u)$, $i \in \{1, \ldots, k\} \setminus \{r\}$. In this regard, the corresponding j-th point on the i-th image plane $p_{ij}$ is searched along the row coordinate of $p_{rj}$, exploiting the basic epipolar constraint of a normal stereo rig, as the intersection point with the spline $q_i(u)$. In the event of multiple matches between the spline curve and the epipolar line, a smoothness constraint is also employed, enforcing the most consistent point based on the past matches.
The aforementioned procedure is depicted in Fig. 4. The spline samples $p_{ij}$ are then used to compute the DLO 3D shape as detailed in Section IV-B.
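The reference selection and the row-wise matching can be sketched as follows. The sketch assumes that each spline is available as a densely sampled polyline of (x, y) pixel coordinates, that the epipolar lines coincide with image rows (constant y), and that the smoothness constraint can be approximated by picking the intersection closest to the previous match. All function names are hypothetical.

```python
import numpy as np

def polyline_length(pts):
    """Length in pixels, summing distances between adjacent samples."""
    pts = np.asarray(pts, dtype=float)
    return float(np.linalg.norm(np.diff(pts, axis=0), axis=1).sum())

def reference_index(splines):
    """Index r of the shortest spline."""
    return int(np.argmin([polyline_length(s) for s in splines]))

def row_intersections(pts, row):
    """Points where a densely sampled spline crosses the given image row."""
    pts = np.asarray(pts, dtype=float)
    hits = []
    for a, b in zip(pts[:-1], pts[1:]):
        # A sign change (or a zero) of y - row marks a crossing of the segment.
        if (a[1] - row) * (b[1] - row) <= 0 and a[1] != b[1]:
            t = (row - a[1]) / (b[1] - a[1])
            hits.append(a + t * (b - a))
    return hits

def match_point(pts, ref_point, previous_match=None):
    """Match a reference sample along its row; ties broken by smoothness."""
    hits = row_intersections(pts, ref_point[1])
    if not hits:
        return None
    anchor = np.asarray(ref_point if previous_match is None else previous_match,
                        dtype=float)
    return min(hits, key=lambda h: np.linalg.norm(h - anchor))
```

When the spline crosses the row more than once, passing the match found for sample j - 1 as `previous_match` keeps the sequence of matches continuous along the DLO.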

B. Multi-View Triangulation
For the sake of simplicity, the discussion is focused first on just one target point, i.e. $j = 1$. Let us consider the case in which a single unknown point $p$ in the Cartesian space, expressed with respect to the world reference frame, is observed from multiple points of view by the camera mounted on the robot. The pose of the camera frame with respect to the world frame at the i-th point of view is described by the rotation matrix ${}^{w}R_{c_i}$ and by the position ${}^{w}t_{c_i}$ of the camera frame origin in world coordinates, both obtained from the kinematics of the robot and the extrinsic parameters of the camera calibration. It is assumed that the point $p$ is seen in the image related to the i-th point of view at $p_i = [p^x_i\ p^y_i]^T$, with $p^x_i$ and $p^y_i$ being the pixel coordinates of the point in the image. A so-called unit ray $v_i$ passing through the image reference frame origin and $p$ can be expressed in the image frame from the pixel coordinates $p_i$, the camera focal distance $f$, and the pixel coordinates $c_x$ and $c_y$ of the image center (assuming the camera frame is centered with respect to the image). Then, $v_i$ can be expressed in the world frame through ${}^{w}R_{c_i}$. Provided that $k$ distinct points of view are available, the estimate $\hat{p}$ of the unknown point $p$ can be obtained by looking for the point having the minimum distance from all the rays. By defining the symmetric matrix $V_i$ providing the semi-norm on the ray distance, the point location estimate $\hat{p}$ is provided by the nearest point search algorithm.
The aforementioned algorithm is thus applied to estimate the DLO segment, employing as input the spline samples $p_{ij} = p_i(u_j)$, $j = 1, \ldots, n_s$, $i = 1, \ldots, k$. The vector of control points $q_v = [q_1 \cdots q_{n_s}]^T$ of the 3D spline $q(u)$ that optimally approximates the set of point estimates $p_{ij}$ is obtained by means of a matrix pseudo-inverse (denoted by #), with $V_{ij}$ being the matrix computed according to (1) for the j-th sample provided by the i-th image.
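A common way to realize the nearest point search is sketched below. It assumes the standard pinhole ray $[p^x - c_x,\ p^y - c_y,\ f]^T$ normalized to unit length and the projector $V = I - v v^T$ as the symmetric matrix measuring the distance from a ray; these forms are assumptions of this example, chosen to be consistent with the description above, and the function names are hypothetical.

```python
import numpy as np

def unit_ray_world(pixel, f, cx, cy, R_wc):
    """Unit ray through a pixel, rotated from the camera to the world frame."""
    v = np.array([pixel[0] - cx, pixel[1] - cy, f], dtype=float)
    return R_wc @ (v / np.linalg.norm(v))

def nearest_point_to_rays(origins, directions):
    """Least-squares point closest to a set of rays (origin + unit direction)."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for t, v in zip(origins, directions):
        # Symmetric projector onto the plane orthogonal to the ray direction.
        V = np.eye(3) - np.outer(v, v)
        A += V
        b += V @ np.asarray(t, dtype=float)
    # Normal equations of the sum of squared point-to-ray distances.
    return np.linalg.solve(A, b)
```

Here `origins` are the camera positions in world coordinates and `directions` the corresponding unit rays; at least two non-parallel rays are needed for the linear system to be solvable.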

C. Evaluation of Estimation Error by Reprojection
To evaluate the estimation error, the 3D DLO B-spline obtained in Section IV-B is reprojected on each image and the difference with respect to the input 2D spline provided by Section III-A is computed. Considering a generic 3D spline sample $q(u_j) = B q_v$, its homogeneous representation is provided by $\tilde{q}(u_j) = [q(u_j)^T\ 1]^T$. The projected coordinates $\hat{p}_{ij} = [\hat{p}^x_{ij}\ \hat{p}^y_{ij}]^T$ of the j-th spline sample on the i-th image plane are obtained from $\tilde{q}(u_j)$ through the pose of the i-th camera and the camera matrix containing the camera intrinsic parameters, such as the focal length $f$ and the center point coordinates $c_x$ and $c_y$. Then, the overall error is provided by collecting in a single vector the errors related to every image, i.e. $e = [\cdots\ e_{ij}\ \cdots]^T$, $j = 1, \ldots, n_s$, $i = 1, \ldots, k$, where $e_{ij} = p_{ij} - \hat{p}_{ij}$ is the difference between the initial spline sample provided by Section III-B and the projection on the image plane of the estimated 3D spline sample. Finally, the mean error norm $e_{n_s k} = \sqrt{e^T e/(n_s k)}$ can be used to evaluate the quality of the estimation result.
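The reprojection check can be sketched as follows, assuming an ideal pinhole camera without lens distortion, camera poses given as a rotation and a position in world coordinates, and an error measure computed as the root of the summed squared pixel errors divided by the number of samples. The function names are hypothetical.

```python
import numpy as np

def project(points_w, R_wc, t_wc, f, cx, cy):
    """Pinhole projection of world points into a camera with pose (R_wc, t_wc)."""
    # Row-wise R_wc^T (p - t): world points expressed in the camera frame.
    pts_c = (np.asarray(points_w, dtype=float) - t_wc) @ R_wc
    u = f * pts_c[:, 0] / pts_c[:, 2] + cx
    v = f * pts_c[:, 1] / pts_c[:, 2] + cy
    return np.stack([u, v], axis=1)

def mean_reprojection_error(points_w, observations, poses, f, cx, cy):
    """Pixel error between observed and reprojected spline samples."""
    sq = 0.0
    count = 0
    for obs, (R_wc, t_wc) in zip(observations, poses):
        e = np.asarray(obs, dtype=float) - project(points_w, R_wc, t_wc, f, cx, cy)
        sq += float((e ** 2).sum())
        count += len(e)
    return float(np.sqrt(sq / count))
```

Here `observations[i]` holds the 2D spline samples of the i-th image and `points_w` the estimated 3D samples, so that a large returned value flags an unreliable estimation.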

D. Online Reconstruction Optimization
In a general stereo setup, the two sensors are fixed and, as a consequence, their baseline cannot be modified. In our setup, instead, the mobility of the robot can be exploited in order to find the baseline and distance from the object corresponding to the minimum depth error. Indeed, both the baseline $b$ and the distance from the target object $z$ are responsible for the overall depth estimation error arising in triangulation methods, with the well-known relationship [9]:
$$\epsilon_z = \frac{z^2}{b f}\, \epsilon_d \qquad (2)$$
where $\epsilon_z$ denotes the depth error, $f$ the focal length of the camera and $\epsilon_d$ the disparity error (assumed to be within one pixel in the following). Thus, given a set of points in the 3D space $p = \{p_i : i \in [1, n_s]\}$, an optimization problem aiming at minimizing the depth error can be formulated, with a cost function that depends on the baseline $b$ and on $\delta_z$, the camera distance increment from the object, a value that can be either positive or negative. This multi-variable optimization problem is subject to a set of bounds and constraints that limit the admissible search space. The bounds (3) and (4) are defined in terms of $p^x_i$, the row pixel value corresponding to the 3D coordinate $p_i$, of $\sigma$, a safe offset in pixel coordinates to avoid regions of the image near the borders, of the image width $w$, and of $z_{min}$, the minimum distance of the camera from the 3D point. The solution to this minimization problem provides the optimal pair of baseline $b$ and camera distance increment $\delta_z$. Notice that (3) and (4) restrict the values of the parameters $b$ and $\delta_z$ such that all the points $p$ are inside the $k$ images taken using the optimal parameters. The optimization routine requires as input an initial guess of the depth values $z_i$. Thus, either a coarse guess is used or an initial execution of DLO3DS with fixed default parameters for $b$ and $\delta_z$ is required for computing the initial guess. Moreover, in the case of tracking of the DLO shape (see Section IV-E), the values estimated for the previous DLO section can be used as an initial guess.
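The trade-off between baseline and distance can be explored with the following sketch, which evaluates the standard depth error relation $z^2/(b f)$ times the disparity error and performs a coarse grid search. The cost (mean depth error over the points), the view layout (k views shifted laterally by multiples of $b$ around the current pose), the form of the visibility bounds and all default values are assumptions made for this example and do not reproduce the actual cost function and bounds of DLO3DS.

```python
import numpy as np

def depth_error(z, b, f, eps_d=1.0):
    """Depth error for distance z, baseline b, focal length f (pixels)."""
    return z ** 2 / (b * f) * eps_d

def optimize_baseline_and_distance(x, z, f, w, cx, k=3, sigma=20.0, z_min=0.1,
                                   b_grid=None, dz_grid=None):
    """Grid search of (b, delta_z) keeping every point inside all k images."""
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    b_grid = np.linspace(0.005, 0.10, 40) if b_grid is None else b_grid
    dz_grid = np.linspace(-0.3, 0.3, 61) if dz_grid is None else dz_grid
    # Lateral camera offsets of the k views, in units of the baseline.
    offsets = np.arange(k) - (k - 1) / 2.0
    best = None
    for b in b_grid:
        for dz in dz_grid:
            z_new = z + dz
            if np.any(z_new < z_min):
                continue  # camera too close to the object
            u = f * (x[None, :] - offsets[:, None] * b) / z_new[None, :] + cx
            if np.any(u < sigma) or np.any(u > w - sigma):
                continue  # some point would leave the safe image region
            cost = float(np.mean(depth_error(z_new, b, f)))
            if best is None or cost < best[0]:
                best = (cost, float(b), float(dz))
    return best
```

The function returns a tuple `(cost, b, delta_z)` or `None` when no admissible pair exists on the grid. Since the error decreases with a larger baseline and a shorter distance, the selected pair typically lies on the boundary defined by the visibility and minimum distance constraints.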

E. Tracking
In order to achieve a precise estimation of the DLO shape, DLO3DS is executed with camera samples captured in the proximity of sections of the DLO, since the depth error grows with $z$, see (2). Thus, if the estimation of a long DLO shape is sought, a different approach is required. In this section, we describe the steps employed for applying DLO3DS in a tracking framework, thus reconstructing the full 3D shape of a DLO by combining individual estimations of small sections. In particular, after the estimation of a given section of the DLO, the camera is moved forward along the DLO principal direction and centered with respect to the estimated points. Thus, based on the overlap parameter $n_o$, a given percentage of the previous points is still visible in the next DLO section and is used for keeping track of the DLO under reconstruction, even in the presence of multiple DLOs in the scene, as shown in Fig. 5.
At the end of the tracking performed along a DLO, the 3D points estimated for each segment of the DLO under analysis are collected in a unique vector in order to then obtain a single spline curve able to represent the overall DLO 3D shape.
Moreover, in order to further improve the estimation, these points are filtered to eliminate outliers and overlaps produced by subsequent acquisitions. To this end, the Locally Weighted Scatterplot Smoothing (Lowess) algorithm [31], a locally weighted regression method that works by defining a window in the sample data, is applied for the final filtering of the points.
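A simplified version of the filtering step is sketched below. It implements a locally weighted linear regression with tricube weights and without the robustness iterations of the full Lowess algorithm, and it works on a single coordinate as a function of an ordering variable; applying it to each 3D coordinate against a parameter along the DLO is an assumption of this example, and the window fraction `frac` is a hypothetical default.

```python
import numpy as np

def lowess(x, y, frac=0.3):
    """Locally weighted linear regression (tricube weights, no robust iterations)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    r = max(int(np.ceil(frac * n)), 3)   # number of samples in each window
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        h = np.sort(d)[r - 1]            # distance to the r-th nearest sample
        w = np.clip(1.0 - (d / max(h, 1e-12)) ** 3, 0.0, 1.0) ** 3
        sw = np.sqrt(w)
        # Weighted least squares for a local line a + b * x.
        A = np.stack([np.ones(n), x], axis=1) * sw[:, None]
        coef, *_ = np.linalg.lstsq(A, y * sw, rcond=None)
        fitted[i] = coef[0] + coef[1] * x[i]
    return fitted
```

Isolated outliers are attenuated because each fitted value is a weighted fit over the neighboring samples, while data that is locally linear is left unchanged.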

V. EXPERIMENTAL VALIDATION
DLO3DS is validated experimentally employing a 7DoF robotic arm, the Panda from Franka Emika, equipped with an eye-in-hand 2D low-cost camera having a resolution of 640 × 480 pixels. The camera is both intrinsically and extrinsically calibrated, as shown in Fig. 6. The experiments are performed both with simulated and real data, in Section V-A and Section V-B respectively. Moreover, in Section V-C, DLO3DS is characterized in terms of processing time.

A. Evaluation in Simulation
To perform a proper evaluation of DLO3DS, ground truth data is needed. Considering that it is quite difficult to obtain an error-free 3D ground truth shape of a real DLO, synthetically generated data [32] is exploited to assess the performance of DLO3DS. Thus, a test set of 10 randomly shaped reference synthetic DLOs, 0.8 m in length, is generated, resembling the shape and appearance of real DLOs. They are accompanied by ground truth data in the form of 3D points describing their center line.

1) Influence of DLO3DS Parameters and DLO Diameter:
The test set is rendered using three different reference diameters φ = 2.0, 3.5, and 5.0 mm to analyze how the DLO thickness may affect the performance of DLO3DS. In addition, we analyze the influence of the number of views, the percentage of overlap during the tracking (Section IV-E), and the contribution of the online optimization approach compared to a fixed stereo parameter setup or a partial optimization. Unless otherwise specified, the default values are: cable diameter 3.5 mm; number of views 3; overlap 50%; optimization of both baseline and distance from the object (b + z). With default settings, the estimation mean error is 0.821 mm whereas the reprojection mean error is 0.731 pixels.
The box plots resulting from this analysis are depicted in Fig. 7. From the plots, it is possible to conclude that the diameter of the DLO does not play a major role in the estimation error. The same can be said for the overlap percentage, with the only remark that in a real estimation a bigger overlap may help to compensate for calibration errors. On the contrary, the slight drop in the error between 2 and 3 views is noticeable. Indeed, we commonly deploy DLO3DS using 3 views since the increase in the algorithm processing time is negligible and can be mostly compensated by its execution in masked time, as detailed in Section V-C.
Ultimately, online optimization does play a major role, bringing the interquartile range of the reconstruction error between 0.5 and 1 mm. The contribution of optimizing just the baseline corresponds to an error drop of 9 % compared to the fixed setup. Instead, the optimization of the camera distance provides a drop of 22 %. The joint optimization makes the error drop by 29 %. We claim that the larger relative improvement of z with respect to b, compared to the fixed setup, is due to the change of the virtual baseline, i.e. the baseline virtually increases when the camera is moved closer to the object. Thus, in the z experiment there is a minor coupling with b, making its result closer to the b + z configuration.
2) Comparison With Baseline Methods: A comparison between DLO3DS, established methods like Semi-Global Matching (SGM) [7], and more recent approaches like SISTER [17] is provided by rendering sample number 1 of the test set with different backgrounds and colors. For the estimation performed by DLO3DS, we used 3 views. Concerning the SGM method, we used just 2 views and we computed the matching cost once via Census Transform (denoted as CENSUS/SGM) and a second time via a learned similarity measure [13], [33] (denoted as MCCNN/SGM). Finally, for SISTER we used 5 views as detailed in [17]. Aiming at a fair evaluation, in all the experiments the baseline was set to 25 mm and the non-optimized fixed setup was employed, see Section V-A1. Fig. 8 shows the computed depth images normalized between the minimum and maximum values of the ground truth one. Both SISTER and SGM provide as output a disparity image, thus we converted it into a depth image given the known baseline and focal length. Instead, DLO3DS provides as output just 3D points describing the DLO center line. In order to compute the depth image, the estimated 3D points and the colored mask of Section III-A are used to first estimate the radius of the DLO in world coordinates. Then, the original center line description is over-sampled and used to reconstruct the DLO surface taking its radius into consideration. The result is a dense depth image of the DLO. For a fair comparison, the methods are evaluated only on the depth values belonging to the DLO, with the ground truth mask used to select those points. The error between each method and the ground truth depth is computed by subtracting the latter from the former, and it is shown using a box plot capturing the error distribution. From the figure, it is clear that DLO3DS provides an overall better estimate of the depth, with wrong estimates only along the DLO boundaries due to prediction errors in the segmentation mask.
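The conversion from disparity to depth and the masked error computation used in this comparison can be sketched as follows, assuming a normal stereo pair with the focal length expressed in pixels and the usual relation $z = f b / d$. Pixels with non-positive disparity are marked as invalid; the function names are hypothetical.

```python
import numpy as np

def disparity_to_depth(disparity, f, b):
    """Depth from disparity for a normal stereo pair: z = f * b / d."""
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full(disparity.shape, np.nan)
    valid = disparity > 0
    depth[valid] = f * b / disparity[valid]
    return depth

def masked_depth_error(estimated, ground_truth, mask):
    """Signed error (ground truth subtracted from the estimate) on DLO pixels only."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(ground_truth, dtype=float)
    return diff[np.asarray(mask, dtype=bool) & np.isfinite(diff)]
```

The returned error values are the samples from which a box plot of the error distribution could be drawn for each method.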

B. Real-World Evaluation
To establish a boundary value of the estimation error in a real application, an experiment is performed using two types of purposely designed gripper fingers that, once closed, provide a hollow circle with a diameter of 6 and 10 mm respectively. First, DLO3DS is applied to estimate the DLO shape, then the center of the circumference is used as the reference frame for the generation of the motion: the robot should successfully follow the DLO without touching it, despite the shape of the DLO and changes in the z values. For the sake of generality, this experiment is performed both with electrical cables having a diameter of 3.5 mm and also with a different type of DLO, a polymeric hose for medical applications with an external diameter of 1.2 mm. The material of this hose is semi-transparent, such that it is almost invisible to commercial 3D sensors (including high-end ones) and laser scanners [5]. In Fig. 9, key-frames from a video sequence showing the experiment are reported. The cable of 3.5 mm is tested with the 10 mm gripper, while the hose with the one of 6 mm. Despite the complexity of the task, the cheap 2D camera used in this work is able to provide a reliable reconstruction of the sample objects, allowing for correct tracking without touching in both experiments.

C. Timings
The execution timings of DLO3DS are affected, other than by the specific computing resources used, by the instance segmentation, modeling and selection performed in Section III, and by the triangulation procedure of Section IV. The timings of the former are mostly correlated to the choice of the image processing algorithm. By employing FASTDLO [4], 20 FPS are guaranteed for processing a single image when deployed on a workstation equipped with an Intel i9-9900K CPU and an Nvidia 2080Ti [4]. The performance of the triangulation procedure of Section IV is affected by the number of points ($n_s$) at which the spline is evaluated. The following values are obtained for some configurations: $n_s$ = 10, 7.5 ± 3.3 ms; $n_s$ = 20, 19.5 ± 9.3 ms; $n_s$ = 40, 27.2 ± 12.1 ms. Overall, DLO3DS provides competitive performance. It is worth mentioning that the data processing on a real setup can be mostly executed in masked time while the robot is moving toward the next pose.
Fig. 9. Key-frames from a video sequence (available as supplementary material) showing the tracking test performed on DLOs of different types and diameters. Tester gripper diameter: black 6 mm, blue 10 mm.

VI. CONCLUSION
DLO3DS utilizes multiple 2D acquisitions for the accurate 3D shape estimation of DLOs. It is a fundamental tool for enabling the manipulation of DLOs by means of a robot without the need for expensive, bulky, and constrained 3D sensors. Thus, DLO3DS can be particularly useful in industrial applications aiming for low-cost and effective solutions to complex manufacturing tasks involving the manipulation of cables, hoses, wires, ropes, and other similar objects.
DLO3DS in its current form deals with static scenes, i.e. the DLOs remain still between image acquisitions, and it can be sensitive to the quality of the extracted splines. Thus, future activities will be devoted to addressing dynamic scenes and increasing robustness in cluttered conditions.