Methodology for a reverse engineering process chain with focus on customized segmentation and iterative closest point algorithms

One-off construction is characterized by a multiplicity of manual manufacturing processes, while at the same time relying on the consistent use of digital models. Since the actual state of construction does not match the digital models without manual updates, the authors propose a method to automatically detect deviations and reposition the model data according to reality. The first essential method is based on the "Segmentation of Unorganized Points and Recognition of Simple Algebraic Surfaces" presented by Marek Vanco (2003). The second method is a customization of the iterative closest point (ICP) algorithm. The authors present the overall structure of the implemented software, which is based on open source, and relate it to the general reverse engineering (RE) framework by Buonamici et al. (2017). Highlights are:

• The general architecture of the software prototype.
• A customized segmentation and clustering of unorganized points and recognition of simple algebraic surfaces.
• The deviation analysis with a customized iterative closest point (CICP) algorithm.

Especially in the field of one-off construction, characterized by small and medium-sized companies, automated assessment of 3D scan data during the design process is still in its infancy. By using an open source environment, progress towards the consistent use of digital models could be accelerated.


Background
We want to reposition given CAD models from an assembly ("as-designed" state) according to underlying 3D scan data ("as-is" state). The use case is an aircraft's interior, where parts like furniture (seats, floor) and other components attached to the aircraft's body are demounted. The scan data is present as a tessellated point cloud. Its quality is relatively low, meaning that the point density is coarse and details smaller than 10 mm are not depicted. Noise artefacts distort the true representation of the geometry. To accomplish this repositioning, we make use of our software framework, which provides state-of-the-art tools of the RE process chain (see [1]). In the scope of this paper, we describe the development and general architecture of this prototype. The whole code is open source based. To meet the special requirements of the given use case, we customized two common methods: first, the segmentation method and second, the iterative closest point method. In the supplementary material, we give further information about the automated process presented in our corresponding paper.

Our framework: Software prototype scangineering
The graphical abstract depicts the architecture and principle of our software prototype. As integrated development environment (IDE), we use Visual Studio Professional 2019. The project is split into a main project and a framework. The main project serves development and testing purposes. It contains a GUI, implemented with Windows MFC. The user is able to import and export data and manually call all data-processing actions. Using the visualizer of PCL, data is displayed in the main window. The framework is a separate project which is added to the main project as a submodule. It includes all of our functionalities to store, manipulate and analyse 3D scan data. The reason for and advantage of separating the main project from the framework is that the framework can be included as an "invisible" back-end in further software solutions (e.g. as the back-end of a plugin in proprietary software). As a further benefit, the back-end does not need to be recompiled each time the GUI is adapted.
The framework's subprojects are compiled as static libraries and therefore perform fast when included in the main project. The framework consists of four C++ subprojects:

• 3D data: The central object is the so-called "PC_Object" (stemming from point cloud object). It contains all datatypes that characterize a 3D object. This may include: point cloud, polygon mesh, point normals, polygon normals, edge graph, area, volume, centroid, principal axes, etc.
• Database: The database holds all the different entities that may be constructed from 3D data: a model itself (mostly representing a single PC_Object), segmentation, registration, correspondences, geometry detection, NURBS surfaces, filters, etc. If any other entity type is needed for individual purposes, this database may easily be extended by adding a corresponding class. A database object that is able to access and query all entities completes this structure.
• Support & tools: In this structure, all manipulations to obtain the different entities are carried out. Here, the true "work" is done, whereas the entities in the database only store and organize the outcome of the tools. Similar to the entity types, objects like segmentation or registration exist. Additionally, a static class called "Tools" contains a variety of data-processing algorithms; it is used throughout all other objects.
• Data analysis: This subproject represents the link to gnuplot. Quantitative data is given as input (e.g. the distribution of surface types and sizes of segmentation results) to generate plots and graphs. This is carried out by a scripting interface using simple gnuplot commands.
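The extensible entity database described above can be sketched in a few lines. This is an illustrative assumption, not the actual Scangineering code: the class names (`Entity`, `SegmentationEntity`, `Database`) and the string-keyed storage are hypothetical, chosen only to show how "adding a corresponding class" extends the database.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical base class: every entity type constructed from 3D data
// derives from it and reports its own type.
struct Entity {
    virtual ~Entity() = default;
    virtual std::string type() const = 0;
};

// Example entity: stores the outcome of a segmentation run as lists of
// point indices. New entity types are added analogously.
struct SegmentationEntity : Entity {
    std::vector<std::vector<int>> segments;
    std::string type() const override { return "segmentation"; }
};

// Minimal database object that stores and queries entities by name.
class Database {
public:
    void add(const std::string& name, std::shared_ptr<Entity> e) {
        entities_[name] = std::move(e);
    }
    std::shared_ptr<Entity> get(const std::string& name) const {
        auto it = entities_.find(name);
        return it == entities_.end() ? nullptr : it->second;
    }
private:
    std::map<std::string, std::shared_ptr<Entity>> entities_;
};
```

The key design point is that the database only stores and serves entities; all computation that produces them lives in the support & tools subproject.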
The external libraries that we use are:

• PCL (Point Cloud Library): framework that delivers a broad basis for working with point clouds and polygon meshes. It can be installed separately or via an all-in-one installer that also includes the third-party libraries VTK, Boost, FLANN and Eigen.
• VTK (Visualization ToolKit): library for 3D computer graphics. The PCL visualizer is based on VTK. Additionally, it provides useful datatypes for polygon meshes and graph structures.
• Boost: support for PCL concerning mathematics and datatypes.

The optional interfaces to external software APIs make it possible to integrate the Scangineering framework into existing software environments. To meet the needs of individual technical applications, diverse implementations are conceivable. Depending on the given conditions, it could be either open source based or implemented in proprietary software, in a suitable programming language. Regarding the given use case as an example, a link to the proprietary CAD software SolidWorks is required, which is the desired front-end for the end user. Therefore, we used its API in C#.

Customized method: Segmentation by Vanco
Marek Vanco [2] presented a segmentation method based on normal vectors that includes a variable reference vector. The aim is to detect planar faces. The algorithm uses the region-growing principle, meaning that, starting from an initial point, a segment is created by iteratively checking the difference of normal vectors of neighbouring candidates. The variable reference vector is the average normal vector of all points already belonging to the segment. It is updated each time a point is added. Thus, two conditions are imposed: first, a condition on local curvature, checking the candidate's normal vector against the neighbouring normal vector in the segment; second, a condition on the global orientation of the candidate's normal vector, checking it against the reference vector. We implemented Vanco's algorithm as it is; additionally, we simplified it: we added the option to fix the reference vector, so that segments with a desired orientation are detected. This turned out to be useful in our use case, since in the global CAD assembly the orientations of surfaces are already known. To detect scan segments that correspond to a certain CAD surface, we simply used the known CAD surface normal vector as reference vector. As initial data, a polygon mesh is preferred for the segmentation. Using a mesh, the information about neighbours is given through edge and vertex connections. This makes a separate neighbour search unnecessary.
In the following, the essential elements of how we implemented the segmentation algorithm of Vanco using the variable reference vector are presented as pseudo code. The algorithm constitutes one region-growing cycle, starting with one random point of the point cloud, called the "seed". To obtain our variant using a fixed reference vector, line 6 becomes v_ref = fixed_vector and line 17 is dropped.
The algorithm is further simplified to use no reference vector at all if all three lines 6, 14 and 17 are dropped and only the local condition is checked in line 15, yielding a more general segmentation method. All variants have their advantages and disadvantages, depending on the actual application. To obtain segmentation results not only for one start point but for the whole model, many or even all points of the model can be used as start points. Another strategy is to choose the next seed from the remaining points after each region-growing cycle. If points already used in other segments are not explicitly excluded, segments may overlap, i.e. have points in common. Ideally, each point on one segmented surface generates the same surface when used as a start point. The occurrence of overlapping patches led to a further customization in the step following segmentation.
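One region-growing cycle in the spirit of Vanco's method can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function name, the cosine thresholds and the breadth-first traversal are assumptions. The `fixed_ref` flag corresponds to the authors' variant that freezes the reference vector to a known CAD surface normal; otherwise the reference vector is the running sum (i.e. unnormalized average) of the segment's normals.

```cpp
#include <array>
#include <cmath>
#include <queue>
#include <vector>

using Vec3 = std::array<double, 3>;

double dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

Vec3 normalized(Vec3 v) {
    double n = std::sqrt(dot(v, v));
    return {v[0] / n, v[1] / n, v[2] / n};
}

// Grow one segment from `seed` over the mesh adjacency. A candidate is
// accepted if its normal agrees with the normal of the neighbour it was
// reached from (local condition) and with the reference vector (global
// condition). Normals are assumed to be unit length.
std::vector<int> growSegment(const std::vector<Vec3>& normals,
                             const std::vector<std::vector<int>>& adjacency,
                             int seed, double cos_local, double cos_global,
                             Vec3 ref, bool fixed_ref) {
    std::vector<int> segment;
    std::vector<bool> visited(normals.size(), false);
    std::queue<int> frontier;
    frontier.push(seed);
    visited[seed] = true;
    if (!fixed_ref) ref = normals[seed];  // start reference at the seed normal
    while (!frontier.empty()) {
        int p = frontier.front();
        frontier.pop();
        segment.push_back(p);
        for (int q : adjacency[p]) {
            if (visited[q]) continue;
            bool local = dot(normals[q], normals[p]) > cos_local;
            bool global = dot(normals[q], normalized(ref)) > cos_global;
            if (local && global) {
                visited[q] = true;
                frontier.push(q);
                if (!fixed_ref) {  // update the variable reference vector
                    ref = {ref[0] + normals[q][0], ref[1] + normals[q][1],
                           ref[2] + normals[q][2]};
                }
            }
        }
    }
    return segment;
}
```

With `fixed_ref = true` and `ref` set to a CAD surface normal, only points whose normals match that orientation are collected, which is the customization described above.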

Variable names and explanations:
As mentioned, the scan data in the use case exhibits many noise artefacts and is very coarse at the same time, leading to very rough representations of surfaces. This harms the segmentation results, since the local criterion is hard to fulfil. The angular threshold for the local condition may be increased, but too large a threshold may lead to regions that erroneously traverse geometry edges. The seeds for the region-growing segmentation are spread randomly over the scan data. The segmentation results are mostly small separate patches that actually belong to one single continuous surface. To counteract this problem, we devised a strategy to fuse segmentation results into larger clusters.
Iteratively, all patches are examined for overlap with others; if they overlap by a certain minimum number of points, they are merged. Fig. 1 shows the principle. Left, the segmentation results without cluster formation are depicted: several small patches lie next to and overlap each other, shown in different colours. Middle, the same patches are merged into larger clusters. This leads to a more proper description of the surface despite the bad scan data quality. Right, the corresponding CAD part is shown for this scan section, where the continuous planar surface faces upwards.

Customized method: Iterative closest point
The Iterative Closest Point (ICP) algorithm is a well-known tool for registration, aiming to minimize the distance between two point clouds by aligning the source point cloud with the target [3]. To this end, for each point in the source cloud, the nearest point in the target is determined, and the summed squared distance of all point pairs is calculated. This value is minimized iteratively by finding suitable transformations (rotations and translations) of the source cloud. Since in our use case we need to align CAD data with scan data, it is natural to use this algorithm. PCL provides a ready-to-use implementation of ICP.
However, as a project requirement, only translations or rotations in specified degrees of freedom (DoF) were allowed. Therefore, we implemented a customized version, labelled "Custom ICP" (CICP), that takes only one single DoF for the transformation into account. As a measure for the distance between source and target, we took the average over all absolute pairwise distances instead of squared distances and termed it the "fitness score":

f = (1/N) · Σ_i ||p_s,i − p_t,i||, i = 1 … N,

where p_s,i and p_t,i denote the i-th corresponding source and target points and N is the number of point pairs. In the use case, the source is the CAD data and the target the 3D clusters resulting from the aforementioned segmentation. One CICP cycle is applied per surface, or rather for all surfaces that have the same orientation. For instance, a surface with normal vector (1, 0, 0) is not suited to approximation along the y or z axes, but only in the x direction. The following figure shows a typical situation on the left: from two surfaces with different orientations, scan sections could be obtained. Separately for each surface, the CICP is applied in the respective direction. On the right, a close-up of the scan sections (of the curved surface, framed red on the left) is depicted, showing the CAD-to-scan fitting before (top) and after (bottom) the CICP process.
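The fitness score described above can be sketched as follows, assuming a brute-force nearest-neighbour search (the real implementation would use an accelerated structure such as a k-d tree; function and type names here are illustrative).

```cpp
#include <array>
#include <cmath>
#include <limits>
#include <vector>

using Point = std::array<double, 3>;

double dist(const Point& a, const Point& b) {
    double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Fitness score: for each source point, find the nearest target point,
// then average the absolute (not squared) pairwise distances.
double fitnessScore(const std::vector<Point>& source,
                    const std::vector<Point>& target) {
    double sum = 0.0;
    for (const Point& s : source) {
        double best = std::numeric_limits<double>::max();
        for (const Point& t : target)
            best = std::min(best, dist(s, t));
        sum += best;
    }
    return sum / static_cast<double>(source.size());
}
```

Using absolute instead of squared distances makes the score directly interpretable as an average gap in the model's length unit and less dominated by outliers.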
We obtained the source cloud as follows: the CAD data is imported as an .stl file, so that we can upsample points on the polygon faces and afterwards downsample them to obtain a homogeneously distributed point cloud with a higher point density than the scan segments. Furthermore, several pre-processing steps were required, which we describe in more detail in our corresponding paper. The scan patches in Fig. 2 are already exempted from erroneous noise and outliers, thus ready for the CICP. The implementation is straightforward and presented as pseudo code in what follows. To obtain a registration that approximates in coarse steps at the beginning and refines the steps when getting closer, we introduced certain refinement levels. Suitable values for n_r, x_r and the initial transformation matrices were found by trial and error. A more intelligent solution that automatically determines them and adapts to the current situation is conceivable and desirable in the future. This CICP cycle depicts the alignment in one DoF. To transform the source along different DoFs, we applied it consecutively and tested different combinations.
Variable names and explanations:

P_s     source points
P_s,n   transformed source points
P_t     target points
M       4 x 4 transformation matrix
n_r     number of amplitude refinements; typically n_r < 5
f_c     current fitness score
f_n     new fitness score
x       factor for the amplitude of the transformation
x_r     refinement factor for x; typically 0.01 < x_r < 0.25
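A one-DoF CICP cycle with refinement levels can be sketched as follows, using the variable names from the table. The concrete step logic (greedy stepping in both directions, shrinking the amplitude by x_r after each level) is an assumption for illustration; the actual pseudo code may differ in detail.

```cpp
#include <array>
#include <cmath>
#include <functional>
#include <vector>

using Pt = std::array<double, 3>;

// Align `source` along a single axis: translate by step `x`, keep every
// step that lowers the fitness score f, then shrink the amplitude by
// the refinement factor x_r, for n_r refinement levels in total.
double cicpOneDof(std::vector<Pt>& source, int axis, double x, double x_r,
                  int n_r,
                  const std::function<double(const std::vector<Pt>&)>& fitness) {
    double f_c = fitness(source);  // current fitness score
    for (int level = 0; level < n_r; ++level) {
        bool improved = true;
        while (improved) {
            improved = false;
            for (double step : {x, -x}) {  // try both directions
                std::vector<Pt> moved = source;
                for (Pt& p : moved) p[axis] += step;
                double f_n = fitness(moved);  // new fitness score
                if (f_n < f_c) {             // keep improving steps only
                    source = moved;
                    f_c = f_n;
                    improved = true;
                }
            }
        }
        x *= x_r;  // refine the amplitude for the next level
    }
    return f_c;
}
```

Consecutive calls with different `axis` values (and analogous rotation steps) correspond to applying the CICP along the allowed DoFs one after another.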