Matching Design-Intent Planar, Curved, and Linear Structural Instances in Point Clouds

The lack of timely progress monitoring and quality control contributes to cost escalation, lower productivity, and broadly poor project performance. This paper addresses the challenge of high-precision structural instance segmentation from point clouds by leveraging as-designed IFC models in Scan-vs-BIM contexts. We propose an automatic method to segment the complete set of points corresponding to an as-designed instance. The workflow contains: 1) instance descriptor generation; 2) PROSAC-based shape detection; 3) DBSCAN-based cluster optimization. The method matches design-intent planar, curved, and linear structural instances in complex scenarios including: 1) noisy as-built point clouds with high occlusions and clutter; 2) deviations between as-built instances and as-designed models in terms of position, orientation, and scale; 3) both Manhattan-World and non-Manhattan-World instances. The experimental results from five diverse real-world datasets showed excellent performance with mPrecision 0.962, mRecall 0.934, and mIoU 0.914. Benchmarking against state-of-the-art methods showed that the proposed method outperforms all of them.


Introduction
This research is about matching design-intent planar, curved, and linear structural objects in point clouds to maintain geometric building Digital Twins (DTs). By matching, we refer here to detecting and segmenting object instances from point clouds into point clusters. By Design Intent (DI), we refer here to the client-approved, final as-designed model used as a benchmark at the construction stage [55]. In this paper, we focus on Industry Foundation Classes (IFC) models that serve as a standardised digital description of buildings [56]. By planar, curved, and linear structural objects, we refer here to the most frequent structural object classes ranked by [1], namely, planar and curved walls, slabs including floors and ceilings, beams, and columns. By point clouds, we refer here to data sets with millions of points made of XYZ coordinates [57]. By maintaining, we refer here to keeping the DT's geometry dynamically updated with the assistance of the DI to reflect the as-is status of a building at different timestamps during the construction stage [1]. By a geometric building DT, we refer here to a product information repository for storing and sharing physical and functional properties of a building over time with all Architectural, Engineering, and Construction (AEC) stakeholders throughout its lifecycle [2]. A DT differs from a Building Information Model (BIM). A BIM only provides product information and can be updated at various timestamps throughout the life cycle of a DT [1]. By Scan-vs-BIM, we refer here to a process that aligns scanned Point Cloud Data (PCD) with as-designed BIM models to compare and recognize object instances to support construction progress monitoring and quality control [3].
The lack of timely progress monitoring and prompt quality control are two of the problems plaguing current building construction projects, leading to poor construction project performance [58]. Over 50% of construction companies have suffered one or more underperforming projects in recent years [63]. Only a quarter of construction projects managed to stay within 10% of their initially planned deadlines [63]. Many construction projects exceed their budget. Specifically, 69% surpass their budget by over 10%, while only 31% stay within 10% of their initial estimate [62]. Executing large projects on time and within budget is typically a challenge for the construction business [4]. The construction industry continues to be one of the least digitised industries in comparison to media, finance, and other industries [59]. The AEC industry can benefit from digital technologies, including BIM and DT, with up to a 50% boost in field efficiency, a 10% acceleration in timelines, and an 80% decrease in modifications [61]. It is essential to digitise and automate the design, construction, operation, and refurbishment of buildings to enhance their efficiency and performance with the help of DTs [58,60].
Using a DT to facilitate building progress monitoring and quality control still lacks automation in the current state of practice, leading to delayed feedback. For example, state-of-the-art commercial products, such as OpenSpace [64] and Buildots [65], use image comparison methods to support progress monitoring. For this to be possible, a worker needs to wear a helmet equipped with a 360° camera and move around a construction site to record a video. Then, this two-dimensional (2D) data stream is compared with the DI to update the status of the project. However, these systems only enable visual inspections and cannot directly update three-dimensional (3D) geometric data. For example, they cannot retrieve the thickness of a wall or a window directly from images. Also, manual effort is still required for quality monitoring and control. In summary, the available commercial products fall short of the demand for a higher degree of automation and level of resolution in the maintenance of a geometric building DT at different timestamps during the construction stage.
Three stages are required for maintaining a building geometric DT [4]: 1) As-designed BIM model to scanned PCD registration ensures that the DI model (e.g., IFC model) is aligned with the as-built PCD in the same coordinate system. 2) Matching DI object instances in the as-built PCD aims to detect and segment instances from the PCD with the help of the DI. 3) 3D representation from the extracted PCD converts the points into information-rich meshes and updates the meshes into the DT. The goal of the first stage is to determine the rigid transformation matrix that aligns the BIM with the PCD, and it has been well studied. Random Sample Consensus (RANSAC)-based [82] methods are commonly used to (semi-)automate coarse registration [5][6][7][8]. Fine registration is always applied after coarse registration to obtain a more accurate result, and most well-established methods are derived from the Iterative Closest Point (ICP) algorithm [9][10][11][12]. Software including Recap and CloudCompare is also capable of registration in practice. However, the second stage is more complex and time-consuming than the first. Current approaches, including mapping points to the model's surface [21][22][23][24][25][26][75] and RANSAC-based shape extraction [28,29,77-81], still have limitations in real and complex environments. A comprehensive literature review of this stage is provided in Section 2 (Background). Finally, the third stage has also been well explored. The mesh reconstruction methods developed by [66,67] are effective in generating detailed representations from PCD. Rashidi and Brilakis [68] also summarised the methods for filling gaps in PCD, which can be used to improve the performance of meshing.
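To make the first stage concrete, the following minimal sketch applies a rigid transformation (a rotation plus a translation) to model coordinates so that they share a coordinate system with the scan. It is an illustration only: the function name `rigid_transform_z`, the restriction to a rotation about the Z axis, and the sample values are assumptions, not part of the registration methods cited above.

```python
import math

def rigid_transform_z(points, yaw_rad, translation):
    """Rotate points about the Z axis by yaw_rad, then translate them.

    A general rigid transformation uses a full 3x3 rotation matrix; the
    Z-only rotation here is a simplifying assumption for illustration.
    """
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz)
            for x, y, z in points]

# Hypothetical model vertex, rotated by 90 degrees and shifted into scan coordinates.
model_points = [(1.0, 0.0, 0.0)]
aligned = rigid_transform_z(model_points, math.pi / 2, (10.0, 0.0, 2.0))
```

Coarse registration estimates such a transformation approximately, and fine registration (for example, ICP) refines it.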
In this paper, we focus on the second stage and propose an innovative, robust method to automatically detect and segment the most frequent building structural objects from PCD with the assistance of the DI in a real, complex context. The selected structural objects are built in planar, curved, and linear shapes. We need to detect and segment as-built instances before creating and assigning as-built meshes to a geometric DT to keep it updated. Keeping the geometry updated can help monitor the progress and control the quality at different timestamps during the construction stage. This paper highlights the following contributions in particular:
1. In the Scan-vs-BIM system, many existing approaches only focus on object detection to monitor the construction progress. However, the proposed method can not only detect but also segment instances from PCD to provide the complete set of points corresponding to each instance, and the whole extracted point cluster can help with quality assurance during the building construction stage.
2. Our proposed method is robust on the real, noisy PCD with significant occlusions and clutter, as opposed to most present methods which are only proven on synthetic or simple datasets that are clean and complete.For instance, temporarily stored building materials or workers moving in front of a wall may be scanned into the PCD as noisy points to occlude the wall; using current approaches may result in inadequate or irrelevant point extraction, which cannot accurately depict the as-built geometry.
3. Our proposed method can be used in more complex and real environments where there are distinct deviations in position, orientation, and scale between the DI geometry and the as-built instances, in contrast to most state-of-the-art Scan-vs-BIM methods that fully depend on the DI to detect or segment object instances from PCD.
Our method supports progress monitoring and quality control in the real world by leveraging the DI model without being fully dependent on it.
4. While most of the current methods only focus on cuboid or cylindrical objects, our proposed method is designed for the most frequent structural objects in various shapes including planar, curved, and linear shapes.
5. While most of the current methods only focus on Manhattan-World buildings, our proposed method can also deal with non-Manhattan-World multi-storey buildings. For example, the proposed method is also robust when walls are not aligned with a horizontal axis, or when walls are curved.
The rest of this paper is organised as follows: the background, including the literature review, gaps in knowledge, and objectives, is reviewed in section 2; the proposed method, including the workflow, the details, and the pseudo-code, is introduced in section 3; the experiments and results are shown in section 4; discussion and conclusions are provided in sections 5 and 6.

Background
In this paper, we focus on the advancements in instance matching in the Scan-vs-BIM context. We aim to detect and segment the structural object instances from PCD by leveraging the DI models. This is a crucial step as the second stage in the maintenance of a building's geometric DT. The state of research, including instance detection and segmentation, data fitting and clustering, IFC schema, and object descriptor, is discussed in detail in the following subsections before the knowledge gaps and research objectives are summarised.

Instance Detection and Segmentation
Object detection refers to identifying the location and class of each object instance, while object segmentation refers to cutting the whole PCD down to the object instance level. They are crucial steps to generate and update a DT in a 3D environment for many applications, including construction project management and heritage building operation. Structural object detection and segmentation focus on walls, slabs, columns, and beams in buildings.
In Scan-vs-BIM, we investigated five types of methods for instance detection and segmentation: point-to-point, Hough transform, point-to-surface, feature-based, and RANSAC-based methods.

Point-to-point matching was initially developed by [17] to detect points corresponding to a DI instance in the scanned PCD. The performance was evaluated by calculating the ratio of retrieved as-designed points to the total number of as-designed points. The threshold ratio was set as 50% to assess the retrieval result on small-scale datasets (4 columns and 1 slab, each within 18,000 points). This method was then adopted to monitor the progress of construction projects by detecting primary and temporary structural objects [18,19,20] and mechanical objects [3]. In [70], the authors also applied the point-to-point method for automatic deviation detection for columns and beams. This method performs well in tracking the existing status of objects when there are few deviations between the as-designed and the as-built. However, since the retrieval result fully depends on the threshold value setting, the number of false negatives increases when the as-built objects have large spatial deviations from the DI. False positives also occur if part of another instance lies at the same location and the covered points exceed the threshold.

The Hough transform [71] maps edge points in image space to parameter space for shape detection. It performs well in line and circle detection with outliers. In [72,73], the authors applied a 2D Hough transform by projecting resampled 3D points into circle slices via the normal orientation to detect cylindrical pipes. However, its application demands an as-built position and dimension consistent with the DI geometry. In [74], the authors enhanced this approach by incorporating point-to-point comparison to detect out-of-place instances and identify the instance completeness. Still, the Hough transform suffers from high storage and computational costs, struggles with high occlusions, and presumes predominant orthogonality in cylindrical object instances, making it less robust in complex environments.

The third type of method, point-to-surface matching, computes the overlapping area between the PCD and the model directly, as demonstrated by [21,22]. It is also used for deviation analysis between the DI and the as-built objects [75]. In [23], the authors improved the method by developing a surface coverage ratio calculation algorithm using alpha shape reconstruction. Other researchers used the Euclidean distance to determine the nearest point to the model's surface for instance detection [24,25,26]. However, this method struggles to recognize all object points in cluttered PCDs or when the PCD and DI geometry deviate significantly. Its validity is compromised if as-built deviations surpass a manually set coverage ratio.

A further approach takes advantage of an object's features, such as position, size, normal, and continuity, for instance detection. In [27], the authors used Lalonde features, orientation, and continuity to identify instances, but the method presupposes that all instances are DI-compliant. In [76], the authors used five features, namely length, size, colour, orientation, and the number of connections with adjacent object instances, to test prefabricated pipes in an environment without any occlusion or clutter. In [54], the authors developed a 3D eigenvector-based shape descriptor using voxels for point cluster matching, but this method requires a PCD with few occlusions and little clutter. In [15], the authors used the eigenvalues and shape histograms of the PCD for cluster matching, yet this method primarily localises different point clusters without yielding precise instance segmentation results. In [22], the authors computed the probability distribution of the PCD and the model's geometric attributes to match objects, but this requires a denoised PCD without any occlusion. All these methods may not work effectively when the PCD deviates from the DI geometry or when the PCD is significantly cluttered.

On the other hand, RANSAC-based methods have been demonstrated to be more effective in instance segmentation in the Scan-vs-BIM context [77]. In [78], the author applied RANSAC to optimise the edge points for quality assurance of full-scale precast concrete slabs. In [79], the authors used a normal-based region growing method with RANSAC to detect cylindrical pipes when the position and orientation of the as-built differ from the DI. In [28] and [29], the authors applied RANSAC and its variant (MLESAC) to segment cuboid-shaped instances and cast-in-place footing. In [80], the authors applied PCA to estimate the normal vector from PCD and RANSAC to estimate planes with different orientations. However, the experimental data is a separate bathroom and an office room with no connected spaces. It is difficult to identify instances after plane extraction for multi-space buildings. In [81], the authors proposed a slicing method with RANSAC for curved façade and window extraction from PCD. RANSAC-based methods primarily detect primitive shapes like cylinders and planes but struggle with complex structures like T-shaped joints or sprinklers. Additionally, the absence of verification or optimization steps can result in inaccuracies in cluttered and occluded environments. Overall, Table 1 summarises the five types of methods and highlights the common limitations of these state-of-the-art (SOTA) methods.
Table 1 Instance detection and segmentation methods and their limitations in Scan-vs-BIM.

Method | References | Common limitations
Point-to-point | [3, 17-20, 70] | Sensitive to the threshold value setting
Hough transform | [72-74] | High storage and computational costs
Point-to-surface | [21-26, 75] | Requires few deviations between PCD and DI
Feature-based | [15, 22, 27, 54, 76] | Requires PCD with few occlusions and clutter
RANSAC-based | [28, 29, 77-81] | Limited to object instances with primitive shapes

In the related area of Scan-to-BIM, deep learning has been widely used for semantic and instance segmentation. Semantic segmentation aims to assign a semantic label to each point in PCD, where points with the same semantic label belong to the same category. Instance segmentation, on the other hand, involves identifying and segmenting individual object instances, each with a unique label assigned. Scan-to-BIM aims to convert PCD into a BIM representation, which is pivotal when a pre-existing BIM of a building is absent. Scan-to-BIM methods are widely employed for retrofitting projects, historic preservation, and facility management. Scan-to-BIM can be a foundational step for Scan-vs-BIM by creating a digital representation of a building. This digital model can then be compared against an updated PCD to support progress monitoring and quality control during construction. Many networks have been developed to solve the semantic segmentation problem in the Scan-to-BIM context without support from DI models. PointNet [30] was the first network proposed to process the points of a point cloud directly. Its key principle is to learn a permutation-invariant function that maps an unordered set of points to a fixed-size feature vector. In [31], the authors developed an end-to-end trainable multi-view aggregation model by merging features from images into 3D points. The method is robust for large-scale indoor or outdoor semantic segmentation on the S3DIS benchmark (Stanford 3D Indoor Scene Dataset). In [32], the authors proposed a new window-normalization method by unifying the point densities in different parts to improve the segmentation performance. As for
instance segmentation, PointCNN [33] was first proposed to segment points by computing a feature transformation matrix based on local geometric information of neighbouring points. Table 2 and Table 3 summarise the performance of the SOTA deep-learning models for semantic and instance segmentation. The SOTA model can only achieve around 77% mean Intersection over Union (mIoU) for semantic segmentation, and around 75% mean Precision (mPrec) and 72% mean Recall (mRec) for instance segmentation. These numbers show that the current SOTA algorithms cannot be directly applied to support construction management due to their relatively low accuracy. Higher precision and recall are necessary to support quality control in construction. High precision in instance segmentation minimises the inclusion of extraneous points, while high recall ensures that the points corresponding to the model are retrieved completely. Consequently, the segmented point cluster can better represent the current state of a given instance. In [34], the authors combined deep learning and void-growing approaches to create digital twins of Manhattan-world buildings with a higher mIoU. Although Scan-to-BIM methods are versatile enough to support progress monitoring and quality control, some additional steps are still required. For example, a new BIM needs to be created regularly from the updated PCD to support comparison between two BIMs at different timestamps. Therefore, Scan-to-BIM methods cannot be directly adopted to support progress monitoring and quality control due to the lack of matching results between DI and PCD. [40]

In conclusion, structural object detection and segmentation in PCD is a rapidly growing field with significant potential for a wide range of applications. However, current SOTA methods for Scan-vs-BIM have common limitations (Table 1) that prevent them from being used in large-scale real-world applications. Despite the limitations and challenges that currently exist, ongoing research aimed at improving the performance and versatility of Scan-vs-BIM methods will likely result in continued advancements in this area.
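The precision, recall, and IoU figures quoted in this subsection can be made concrete with a small sketch that scores one segmented instance against its ground truth and then averages over instances. It assumes that each instance is represented as a set of point indices and that the mean values are plain averages over instances; the function names and the toy indices are illustrative only, and the exact evaluation protocol of the cited works may differ.

```python
def segmentation_scores(predicted, ground_truth):
    """Return (precision, recall, IoU) for one instance given point indices."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    tp = len(predicted & ground_truth)   # points correctly retrieved
    fp = len(predicted - ground_truth)   # extraneous points
    fn = len(ground_truth - predicted)   # missed points
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, iou

def mean_scores(per_instance):
    """Average (precision, recall, IoU) triples over all instances."""
    n = len(per_instance)
    return tuple(sum(s[k] for s in per_instance) / n for k in range(3))

# Toy example: instance A has 2 extraneous and 2 missed points, instance B is perfect.
scores_a = segmentation_scores(range(0, 10), range(2, 12))
scores_b = segmentation_scores(range(20, 30), range(20, 30))
m_prec, m_rec, m_iou = mean_scores([scores_a, scores_b])
```

Precision falls when extraneous points such as clutter are included in the cluster, whereas recall falls when occluded or deviating parts of the instance are missed, which is why both are needed to judge a segmented point cluster.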

Other Data Fitting and Clustering Methods
Besides RANSAC, its derivatives are widely used to estimate model parameters when the underlying dataset contains a substantial number of outliers. These derivatives are applied to fit shapes and detect instances in PCD as model-driven algorithms. M-estimator Sample Consensus (MSAC) [83] minimizes a cost function that exhibits reduced sensitivity to outliers. In [41,84,85], the authors applied MSAC to extract planes for PCD segmentation. MSAC allows for a more seamless transition between inliers and outliers when compared to the rigid threshold in RANSAC, which leads to better performance when there is no clear-cut difference between inliers and outliers. Nevertheless, MSAC is computationally more demanding than RANSAC. Maximum Likelihood Estimation Sample Consensus (MLESAC) [83] estimates the model parameters that maximize the likelihood, assuming that the noise is Gaussian and that the outliers are uniformly distributed. In [43], the authors developed a Prior-MLESAC algorithm to extract both vertical and non-vertical planar and cylindrical structures. In [86], the authors applied MLESAC to fit surface primitives in PCD. MLESAC provides a more statistically rigorous estimation than RANSAC but requires an accurate estimation of the inlier ratio, which is not always available. Also, the assumption of uniformly distributed outliers may not hold in all cases. In [87], the authors proposed an outlier detection method for PCD based on maximum consistency with minimum distance (MCMD). The method is faster and more efficient in detecting planes by estimating consistent normal vectors. Another RANSAC variant named Progressive Sample Consensus (PROSAC) was proposed to exploit the linear ordering defined on the set of correspondences by a similarity function used in establishing tentative correspondences [42]. It capitalizes on the concept of assigning to each data point a likelihood of being an inlier and arranging the points in this order. Samples are then drawn from progressively larger sets of top-ranked points, which can improve the speed of the process and the robustness of the result, especially for datasets with a large amount of noise or many outliers. In [43], the experiments showed that PROSAC is more robust to data with more outliers than RANSAC and MLESAC, while its processing time is less than that of Prior-MLESAC.
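The idea of sampling from a progressively growing pool of top-ranked points can be sketched for plane fitting as follows. This is a simplified, PROSAC-style illustration and not the algorithm of [42]: the pool grows by one point per iteration instead of following the original growth function, there is no early-termination test, and the per-point quality scores, the function names, and the toy data are assumptions.

```python
import random

def plane_from_points(p1, p2, p3):
    """Return (unit normal, offset d) of the plane through three points, or None if collinear."""
    u = [p2[k] - p1[k] for k in range(3)]
    v = [p3[k] - p1[k] for k in range(3)]
    n = [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]
    norm = sum(c * c for c in n) ** 0.5
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    return n, -sum(n[k] * p1[k] for k in range(3))

def point_plane_distance(p, plane):
    n, d = plane
    return abs(sum(n[k] * p[k] for k in range(3)) + d)

def prosac_style_plane(points, scores, threshold, iterations=200, seed=0):
    """Fit a plane, drawing minimal samples from a growing pool of top-ranked points."""
    rng = random.Random(seed)
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    ranked = [points[i] for i in order]          # most likely inliers first
    best_plane, best_inliers = None, []
    pool = 3                                     # minimal sample size for a plane
    for _ in range(iterations):
        if pool < len(ranked):
            # Newest pool member plus two points from the higher-ranked ones.
            sample = [ranked[pool - 1]] + rng.sample(ranked[:pool - 1], 2)
            pool += 1
        else:
            sample = rng.sample(ranked, 3)       # pool exhausted: plain RANSAC sampling
        plane = plane_from_points(*sample)
        if plane is None:
            continue                             # degenerate (collinear) sample
        inliers = [p for p in points if point_plane_distance(p, plane) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers

# Toy scene: a 5 x 5 grid of wall points on the plane x = 0, plus 10 clutter points.
wall = [(0.0, float(y), float(z)) for y in range(5) for z in range(5)]
clutter = [(1.0 + 0.1 * i, 0.37 * i, (0.91 * i) % 3.0) for i in range(10)]
points = wall + clutter
scores = [1.0] * len(wall) + [0.0] * len(clutter)   # hypothetical prior quality scores
plane, inliers = prosac_style_plane(points, scores, threshold=0.05)
```

Because the early samples come only from high-scoring points, a good hypothesis is usually found in the first few iterations, which is the intuition behind the speed-up described above.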
Clustering in PCD refers to the grouping of spatially or geometrically similar points into distinct subsets, aiding in data segmentation and feature extraction. K-means [88] is a partitioning method that divides the points into K clusters, each represented by a centroid. It is simple and efficient but requires the number of clusters to be pre-specified. Mean Shift [89] is a nonparametric, iterative clustering algorithm used primarily for mode seeking and data clustering. It can find clusters of any shape, but all points, including outliers, gravitate toward cluster centres. Agglomerative Clustering [90] is a hierarchical clustering method that starts with each data point as an individual cluster and successively merges the closest clusters until only one cluster remains or a specified stopping criterion is met. However, this method does not explicitly handle outliers and can be computationally expensive for large datasets. On the other hand, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is an unsupervised data clustering algorithm that groups together the points that are close to each other based on a density criterion [44]. The algorithm can identify clusters of arbitrary shapes as well as noise points that do not belong to any cluster. The algorithm works by defining two parameters: a threshold for the number of neighbours, minPts, and a radius, ε, to measure an arbitrary distance. Given a set of data points, DBSCAN begins by randomly selecting a point and examining all other points within a distance of ε from that point. If there are at least minPts points within that radius, a new cluster is formed. In [45], the authors used DBSCAN for boundary detection in PCD. One of the advantages of DBSCAN is that it does not require specifying the number of clusters in advance, unlike many other clustering algorithms such as K-means. Another advantage of DBSCAN is that it can handle clusters of different shapes and sizes, and it is also robust to noise and outliers in the dataset. However, it struggles to separate clusters with varying densities. HDBSCAN [91] is an extension of DBSCAN that can find clusters of varying densities. Instead of working with a single radius, HDBSCAN constructs a hierarchy of clusters and then extracts flat clusters from this hierarchy, but it is more computationally intensive.
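The role of the two DBSCAN parameters can be illustrated with a minimal, brute-force sketch. It is not an optimised implementation: it visits points in index order rather than randomly, counts a point as its own neighbour, and searches neighbours in quadratic time, and the toy coordinates are assumptions chosen for illustration.

```python
def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    NOISE = -1

    def neighbours(i):
        # Indices of all points within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps * eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue                      # already assigned
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE             # may later become a border point
            continue
        cluster += 1                      # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster       # border point: joins but does not expand
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)   # j is also a core point: keep growing
    return labels

# Toy scene: two dense patches 5 m apart and one stray point between them.
patch_a = [(0.0, 0.1 * y, 0.1 * z) for y in range(5) for z in range(5)]
patch_b = [(5.0, 0.1 * y, 0.1 * z) for y in range(3) for z in range(3)]
stray = [(2.5, 2.5, 2.5)]
labels = dbscan(patch_a + patch_b + stray, eps=0.15, min_pts=4)
```

The two patches receive separate cluster labels without the number of clusters being specified, and the stray point is labelled as noise, which is the behaviour that makes DBSCAN attractive for removing clutter from a detected shape.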

IFC and Object Descriptor
Industry Foundation Classes (IFC) is a schema and an open standard used in the AEC industry for representing and exchanging building and construction data among various software applications. It defines a standardized data structure for building information, including information about the building's geometry, spatial relationships, and properties of building elements, such as walls, floors, doors, and windows. Many applications, including quantity take-off [46], model code compliance checking [47], and energy simulations [48], can be supported by IFC models. An IFC model follows a top-down hierarchy to express the properties of a building's structural, mechanical, and electrical objects. Generally, it contains an object's ID label (GUID), dimension (IfcBoundingBox), location (IfcObjectPlacement), material properties (IfcMaterial), connection relationships (IfcRelConnectsPathElements), and space relationships (IfcRelSpaceBoundary), which can be employed for matching DI structural object instances in PCD in our proposed solution.
The concept of an object descriptor is widely used in computer vision-based fields. It refers to a set of attributes or characteristics that can be used to identify or classify an object. In [93], the authors evaluated five SOTA 3D descriptors for object recognition, including local descriptors for instance recognition and global descriptors for classification. Specifically, spin image [94], Signature of Histograms of Orientations (SHOT) [95], and Unique Shape Context (USC) [49] use histograms and point normal vectors to identify local features. Ensemble of Shape Functions (ESF) [13] uses distributions of distances, areas, and angles to identify objects. Principal Axes Descriptor [14] applies principal component analysis and occupancy ratios to identify object types. Similarly, in [54] and [15], the authors used eigenvectors and eigenvalues to generate descriptors for cluster matching. In the Scan-vs-BIM context, existing descriptors can be material-based, including colour, texture, and reflectivity [29], or shape-based, including height, width, radius, and curvature [50,92], or some combination thereof. Some descriptors use heuristic models with human codification to identify or classify objects [16]. However, the specific attributes of an object descriptor restrict this approach to DI-compliant cases for instance detection and segmentation in PCD. Relying solely on the descriptor is insufficient to address challenges in PCD with significant occlusions and clutter. Also, IFC has not yet been leveraged to develop a generic descriptor for instance matching in the Scan-vs-BIM context.

Gaps in Knowledge and Objectives
Structural object detection and segmentation in PCD in the Scan-vs-BIM context presents several gaps in knowledge:
1. At present, there is still a lack of generic methods to accurately segment the entire cluster of valuable points while simultaneously eliminating noisy points in highly cluttered and obstructed environments in the Scan-vs-BIM context. In certain situations, only a portion of an object's surface is visible and can be captured in the PCD, and obstructions such as building materials or workers can further complicate the scanning result. Because of these difficulties, PCD captured in such complex environments may not accurately reflect the status of instances and can make it challenging to achieve high-precision instance segmentation and geometry reconstruction.
2. We do not yet know how to segment the complete point cluster with a high-precision result when there are significant deviations in position, orientation, and scale between the DI geometry and the actual object instances. Existing methods are unable to ensure the extraction of all relevant points corresponding to the object instance in the Scan-vs-BIM context.
3. There is still a lack of generic methods to precisely segment the entire point cluster in non-Manhattan-world buildings in the Scan-vs-BIM context. Most methods focus on Manhattan-world buildings where objects are aligned with the X-Y-Z axes. To the best of our knowledge, few methods can match as-designed diagonally positioned or curved walls in PCD with high precision.
4. We do not yet know how to precisely detect and segment the complete point cluster for building object instances that have non-primitive shapes, such as cross piping joints, sprinklers, terminals, and light fixtures. Most existing methods are only effective for cuboid and cylindrical shapes in the Scan-vs-BIM context.
The objective of this research is to develop an automatic high-precision method to match (detect and segment) as-designed planar, curved, and linear structural object instances in PCD. The matching result of point clusters can be used for progress monitoring and quality control at the construction and operation stages. More specifically, this work addresses gaps 1, 2, 3, and part of gap 4 above for structural objects in buildings.

Scope and Overview
The scope of this research is limited to the most frequent planar, curved, and linear structural instances in a typical building. More specifically, we focus on planar walls, symmetrically curved walls, slabs including floors and ceilings, beams, and columns, since 81.44% of the most frequent structural objects are from these classes [1].
The general idea behind our proposed method is to follow a top-down approach that breaks the whole PCD, as a high-level initial input, into smaller, more manageable clusters in each step. We designed a recursively narrowing-down segmentation algorithm to process clusters from each step to finally reach a high-resolution result of instance-level segmentation. This method can reduce the information loss by clustering points with a necessary but not sufficient condition in each step.
The workflow of the proposed method is illustrated in Figure 1. We use acronyms to present the inputs, intermediate outputs, and final outputs of each step. For example, {B|A} denotes that B is a subset of A. The left part of the figure shows the inputs at the start and the outputs at the end along with the intermediate outputs of the five process steps, while the right part of the figure elaborates on the process of each step within a grey dashed box. We require an IFC model and the coarsely aligned PCD, P0, as the two data inputs at the beginning. More specifically, we determine the instance that we want to monitor and record this designated instance's GUID as an input for the further process in Step 1 (shown in magenta).
The coarsely aligned PCD, on the other hand, is the PCD scanned at the construction stage after coarse alignment with the IFC model. This is a data pre-processing step and will be discussed later. The final output is the point cluster corresponding to the designated wall, column, beam, or slab instance. Each step is illustrated in detail in the following sections.
Figure 1 The workflow of the proposed method.

Generating Object Instance Descriptor (Step 1)
To the best of our knowledge, most existing object descriptors extract features of object classes or instances and were created only for specific research subjects. We therefore propose a new descriptor, named the IFC-based Object Instance Descriptor (OID), that is more generic, standardised, and efficient for solving the matching problem in the Scan-vs-BIM context. Since the input IFC model is a DI benchmark, we take advantage of IFC to calculate and encode a set of necessary variables that support matching instances in the PCD.
We define an OID here as a data structure that encapsulates the properties associated with a specific instance of a building. We propose two sub-descriptors, namely the Geometry Descriptor (GD) and the Relationship Descriptor (RD), to compose a complete OID, as shown in equation (1). GD refers to the geometric attributes that reflect the instance itself, while RD refers to the interaction attributes that reflect the surrounding information of the instance. We propose these two sub-descriptors because both kinds of attributes are needed to segment the PCD when there are deviations in terms of position, orientation, and scale between the DI and as-built instances. The structure of the proposed OID for supporting the matching process of DI planar, curved, and linear structural instances in the PCD is illustrated in Table 4.
In the GD sub-descriptor, we encode 1) the maximum and minimum XYZ coordinates of the axis-aligned bounding box, AABB, which help to set the parameters of an Enlarged Bounding Box (EBB) used to crop the PCD in Step 2; 2) the orientation attribute, O, which serves as a constraint to help determine the points corresponding to the as-designed shape in Step 3; 3) the primitive shape type, S, which also serves as a constraint to help shape model detection in Step 3.
Specifically, we choose the AABB rather than the Oriented Bounding Box (OBB) because the accurate values of $\{X_{min}, X_{max}, Y_{min}, Y_{max}, Z_{min}, Z_{max}\}$ are easier to compute for an AABB without any other conditional inputs, and because the AABB applies more generically to instances with complex shapes. Figure 2 shows the AABBs of two diagonally positioned walls. A diagonally positioned wall is a wall that is still perpendicular to the X-Y plane but whose principal axis is aligned with neither the X nor the Y axis. The orientation attribute, O, refers to different variables in different cases: it represents a normal vector, $\vec{n}$, when the object instance is a wall or slab, and a principal axis when the instance is a beam or column. Finally, we only consider two types of primitive shapes, namely cuboid and cylinder, as they can represent the geometry of the most frequent structural object classes in this research.
In the RD sub-descriptor, we define, compute, and encode three attributes, namely Inner Connection Relation (ICR), Border Connection Relation (BCR), and Hierarchy Relation (HR). As discussed before, RD refers to the interaction attributes that reflect the surrounding information of the instance, which is useful for optimizing the final point clusters in real, complex Scan-vs-BIM contexts. Specifically, ICR and BCR are used for the point cluster optimization of walls, beams, and columns, whereas HR is used for slabs.
Figure 3 illustrates ICR and BCR between a targeted wall and its connected walls. The discrimination of ICR follows equation (2), which compares the XY values of the AABB of the connected wall with the XY values of the AABB of the targeted wall. We add a small tolerance, ∆, to improve the robustness of the result. Figure 3 indicates that the connected wall 2 has an ICR with the targeted wall 0, since none of the vertices of wall 2's AABB falls within the threshold of the vertices of wall 0's AABB. Similarly, the discrimination of BCR follows equation (3), which means that the connected wall 1 has a BCR with wall 0, since at least one vertex of wall 1's AABB is within the threshold of the vertices of wall 0's AABB in Figure 3. We only need the XY values from the top view to determine the connection relationship, without considering Z values, since the scope of this research is limited to walls that are parallel to the Z axis. We can also determine the connection relationship between beams and columns with equations (2) and (3).
On the other hand, we define HR as the affiliation relations between spaces and instances, and we use this relationship to help optimize the point clusters and remove noisy points in the PCD for slabs. We use a backward reasoning method to determine the number of spaces from the related building elements, including doors or walls. For example, in Figure 4, IfcRelSpaceBoundary can be determined by the inverse attribute of IfcElement.
The IfcSpace can then be determined by the attribute "Relating Space" of IfcRelSpaceBoundary. We explain in detail how RD is used to optimize the result in Step 5.
In summary, our proposed OID has four benefits for matching structural instances in the Scan-vs-BIM context: 1) the AABB attribute in GD supports generating and modifying the EBB, which is used to crop the whole PCD into a small-scale, targeted cluster for subsequent object instance matching; 2) the orientation and primitive shape type attributes in GD support detecting and extracting the points corresponding to the related shapes to narrow down the point clusters; 3) the RD helps to select the top clusters to optimize the final segmentation result; 4) the GD and RD can also support estimating artificial points to fill gaps in the point cluster (this last benefit is outside the scope of this research).
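As an illustration only, the OID and the vertex-based ICR/BCR test described above could be sketched in Python as follows. The class and field names, the per-axis tolerance check, and the default tolerance value are assumptions made for this sketch; they are not taken from the authors' implementation, and the walls passed to `connection_type` are assumed to be already known as connected.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class GeometryDescriptor:
    aabb_min: Vec3      # (X_min, Y_min, Z_min) of the axis-aligned bounding box
    aabb_max: Vec3      # (X_max, Y_max, Z_max)
    orientation: Vec3   # normal (wall, slab) or principal axis (beam, column)
    shape: str          # "cuboid" or "cylinder"


@dataclass
class RelationshipDescriptor:
    icr: List[str] = field(default_factory=list)  # GUIDs with an inner connection
    bcr: List[str] = field(default_factory=list)  # GUIDs with a border connection
    hr: List[str] = field(default_factory=list)   # GUIDs of related spaces (slabs)


@dataclass
class ObjectInstanceDescriptor:
    guid: str
    gd: GeometryDescriptor
    rd: RelationshipDescriptor


def xy_corners(gd: GeometryDescriptor):
    """Four AABB vertices seen from the top view (Z is ignored)."""
    (x0, y0), (x1, y1) = gd.aabb_min[:2], gd.aabb_max[:2]
    return [(x0, y0), (x0, y1), (x1, y0), (x1, y1)]


def connection_type(target: GeometryDescriptor,
                    connected: GeometryDescriptor,
                    delta: float = 0.05) -> str:
    """Return "BCR" if any vertex of the connected AABB lies within the
    tolerance delta of a vertex of the target AABB, otherwise "ICR".
    The per-axis comparison is an assumed reading of the threshold."""
    for cx, cy in xy_corners(connected):
        for tx, ty in xy_corners(target):
            if abs(cx - tx) <= delta and abs(cy - ty) <= delta:
                return "BCR"
    return "ICR"
```

For example, a wall that meets the end of the targeted wall shares a corner with it and is classified as a border connection, while a wall that meets it mid-span is classified as an inner connection.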

Narrowing Down PCD with Shape Detection (Step 2&3)
As our proposed solution follows a recursive segmentation logic, we aim to narrow down the input PCD, $P_0$, into smaller clusters step by step until the optimized result is found. In Step 2, we first use an EBB to crop the entire PCD into a smaller cluster. This idea was inspired by real cases in the Scan-vs-BIM context, where there are deviations in terms of position, orientation, and scale between DI and as-built instances.
Using an EBB instead of an AABB to crop the input PCD tolerates these deviations and, at the same time, reduces the processing time caused by the input data size. Figure 5 shows an example of an AABB and an EBB for a cuboid instance. We take the value of the AABB from GD in Step 1 and extend the size of the AABB by around 20% to 50% to generate an EBB. Cropping with an EBB gives enough tolerance to capture the instance in the point cluster, but it also increases the probability of including noisy points.
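A minimal sketch of the EBB cropping in Step 2 is given below, assuming that the enlargement is applied uniformly to every axis about the box centre. The function names and the default ratio are hypothetical; the text only states that the AABB is extended by around 20% to 50%.

```python
def enlarge_aabb(aabb_min, aabb_max, ratio=0.3):
    """Grow every side of the AABB by `ratio` of its length, about the centre.

    ratio=0.3 turns a side of length L into 1.3 * L (assumed interpretation).
    """
    ebb_min, ebb_max = [], []
    for lo, hi in zip(aabb_min, aabb_max):
        pad = 0.5 * ratio * (hi - lo)  # half of the extra length on each side
        ebb_min.append(lo - pad)
        ebb_max.append(hi + pad)
    return tuple(ebb_min), tuple(ebb_max)


def crop(points, box_min, box_max):
    """Keep the points that fall inside the (closed) box."""
    return [
        p for p in points
        if all(lo <= c <= hi for c, lo, hi in zip(p, box_min, box_max))
    ]
```

A larger ratio tolerates larger deviations between the DI and as-built instance but lets more clutter into the cropped cluster, which the later steps then have to remove.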
We then segment the points in the EBB with the support of the IFC model to further narrow down the size of the PCD. Given that the shape of an as-built structural instance is the same as that of the DI model, we can segment the cluster from Step 2 by fitting a primitive shape model. In this step, we aim to extract the representative points in a simple, fast, and easily implemented manner. We therefore chose PROSAC to fit the points with a given shape model, because this algorithm improves the speed of the process while keeping the result robust, especially for datasets with a large amount of noise or many outliers.
PROSAC is an enhanced variant of RANSAC that incorporates prior knowledge in the form of point ranking for robust parameter estimation in the presence of many outliers. The fundamental principle behind PROSAC is that, given a set of data points whose quality ranking is known, the better-ranked data points are statistically more likely to be inliers than the lower-ranked ones. Therefore, PROSAC begins by sampling only the top-ranked data points during its initial iterations and incrementally expands the sampling base as the iterations progress. For shape fitting, the normal consistency between neighbouring points is considered a good metric. Points whose normals are consistent with those of their neighbours are ranked higher and are considered in the initial iterations of the shape estimation, which makes the estimation more efficient.
In Step 3, we investigate two primitive shapes of the most common structural objects for segmenting the PCD based on PROSAC, namely plane and cylinder, since cuboid and cylinder are two common geometries in the structural category. We use the model "plane" for cuboid detection (namely, planar walls, slabs, cuboid columns, and beams), and the model "cylinder" for curved edge or cylinder detection (namely, curved walls, cylindrical columns, and beams). Table 5 lists the type and the number of SAC models used in shape detection.
The number of SAC models depends on the object type. Specifically, two planar surfaces are visible and captured in the scanned PCD for planar walls and slabs, so we need to detect two planes for these types of instances. Similarly, we need to detect four planes in the PCD cluster for cuboid columns and beams. Normal vectors of surfaces are estimated from GD in Step 1 to increase the robustness of the PROSAC algorithm. We only need to detect one cylinder for a cylindrical column or beam, using the radius value recorded in GD. Curved walls are more complex because we cannot use "plane" to detect the shape as we do for planar walls.
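The progressive sampling idea can be sketched for the plane model as follows. This is a simplified, PROSAC-style illustration under stated assumptions and not the PCL implementation used in this work: the sampling pool grows linearly with the iteration count rather than with the growth function of the original PROSAC algorithm, and the per-point `quality` score (for example, a normal-consistency score) is assumed to be computed beforehand.

```python
import random


def plane_from_points(p, q, r):
    """Plane (unit normal n, offset d) with n . x + d = 0, or None if degenerate."""
    ux, uy, uz = (q[i] - p[i] for i in range(3))
    vx, vy, vz = (r[i] - p[i] for i in range(3))
    n = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    norm = sum(c * c for c in n) ** 0.5
    if norm < 1e-12:  # the three points are (nearly) collinear
        return None
    n = tuple(c / norm for c in n)
    return n, -sum(n[i] * p[i] for i in range(3))


def prosac_plane(points, quality, threshold=0.02, iterations=200, seed=0):
    """Fit a plane, sampling from the best-ranked points first."""
    if len(points) < 3:
        return None, []
    rng = random.Random(seed)
    # Indices sorted from the highest to the lowest quality score.
    order = sorted(range(len(points)), key=lambda i: quality[i], reverse=True)
    best_model, best_inliers = None, []
    for t in range(iterations):
        # Simplified schedule: the pool grows linearly from 3 to all points.
        pool = min(len(order), 3 + (t * len(order)) // iterations)
        sample = rng.sample(order[:pool], 3)
        model = plane_from_points(*(points[i] for i in sample))
        if model is None:
            continue
        n, d = model
        inliers = [
            i for i, p in enumerate(points)
            if abs(n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + d) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

In a full implementation, the normal stored in GD could additionally be used to reject hypotheses whose normal deviates too far from the as-designed orientation.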
Since the curved wall is symmetrical, we can detect the two curved surfaces by detecting two cylinders. However, unlike cylindrical columns or beams, whose radius attributes are already recorded in IFC, few BIM models record the curvature of curved walls. For example, we usually do not know the radius and the centre of a curved wall in an IFC model. Therefore, we need to compute the radius of the curved surface for the shape detection of curved walls. The basic idea for estimating the curvature of a curved wall is to treat the wall, seen from the plan view, as a circular arc and to compute the radius and centre of the corresponding circle. Figure 6 (a) shows how a curved wall (in brown) is modelled as a circle from the plan view and how the centre and radius are computed from three points on the circular arc with the help of the AABB. Specifically, we first compute the dimensions of the curved wall's AABB from the IFC model, and then compute the tangent and crossing points between the curved wall and the AABB based on the coordinates of the maximum X, minimum X, maximum Y, or minimum Y. We select only three tangent or crossing points in the X-Y plane to compute the centre and radius of the fitted circle. The centre of the circle is the intersection of the perpendicular bisectors of two chords, each joining two of the tangent or crossing points. Assuming the three tangent or crossing points are $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, the centre $(h, k)$ is:
$$h = \frac{(x_1^2 + y_1^2)(y_2 - y_3) + (x_2^2 + y_2^2)(y_3 - y_1) + (x_3^2 + y_3^2)(y_1 - y_2)}{2\,[x_1(y_2 - y_3) + x_2(y_3 - y_1) + x_3(y_1 - y_2)]}$$
$$k = \frac{(x_1^2 + y_1^2)(x_3 - x_2) + (x_2^2 + y_2^2)(x_1 - x_3) + (x_3^2 + y_3^2)(x_2 - x_1)}{2\,[x_1(y_2 - y_3) + x_2(y_3 - y_1) + x_3(y_1 - y_2)]}$$
The radius is:
$$r = \sqrt{(x_1 - h)^2 + (y_1 - k)^2}$$
In real-world cases, curved walls have various scales and orientations, which lead to different configurations of tangent and crossing points on the AABB. We investigated all five cases from the plan view (X-Y plane), based on the orientation and scale of the curved wall in an IFC model, to infer the radius of the fitted circle. Figure 6 (b) - (f) illustrate the five cases of an IFC curved wall from the plan view. Specifically, Figure 6 (b) shows a curved wall with the shortest arc length, for which we need to add an arbitrary point on the arc besides the two crossing points to compute the radius. The remaining four cases show different orientations and arc lengths, along with the tangent and crossing points determined for computing the radius.
In summary, Steps 2 and 3 aim to narrow down the whole input PCD into a small point cluster and coarsely segment it by shape detection. It should be highlighted that all shapes defined here are infinite. Therefore, the PROSAC-based shape detection can only find the points corresponding to the required geometry in the EBB; it cannot distinguish the points that belong to the selected instance from the points that merely lie on the shape (e.g., noisy points) (Figure 7). To solve this problem, we developed a DBSCAN-based cluster optimisation algorithm in Steps 4 and 5, with the help of the RD, to remove the noisy points and optimise the final segmentation result.
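The three-point construction can be checked numerically with a short Python sketch. It uses the standard circumcircle formula for three non-collinear points in the plan view; the function name is hypothetical and the tangent or crossing points are assumed to have been extracted from the IFC geometry beforehand.

```python
import math


def circle_from_three_points(p1, p2, p3):
    """Centre (h, k) and radius of the circle through three plan-view points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # Twice the signed area term; zero means the points are collinear.
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        raise ValueError("the three points are collinear; no unique circle")
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    h = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    k = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return (h, k), math.hypot(x1 - h, y1 - k)


# Three points on a circle of radius 5 centred at the origin.
centre, radius = circle_from_three_points((5.0, 0.0), (0.0, 5.0), (-5.0, 0.0))
```

When the three points are close together, as in the short-arc case, the denominator becomes small and the estimate is sensitive to small coordinate errors, which is one reason to pick points that are spread along the arc.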

Optimization with Unsupervised Clustering (Step 4&5)
PROSAC can only extract the points corresponding to the defined shape, not the points corresponding to the designated instance, as illustrated in Figure 7. We need to segment the point cluster extracted in Step 3 and remove the noisy points to optimise the result. Furthermore, an instance can be represented by a varying number of point clusters due to occlusions and clutter. We therefore apply DBSCAN to cluster the points into different patches, since DBSCAN is an unsupervised clustering algorithm that can identify clusters of arbitrary shape, as well as noise points, without the number of clusters being set in advance. The key step in obtaining the desired clusters with DBSCAN is setting two parameters: the radius threshold, Eps, and the minimum number of neighbours in a cluster, minPts. Specifically, the points classified into the same cluster should reflect a part of the specific instance, so the radius threshold should be larger than the density of the PCD. Meanwhile, for each object, different clusters exist because inner-connected instances (e.g., walls) or spaces divide the PCD. Therefore, determining the thickness of the connected instances helps to set the radius threshold. In practice, due to scanning errors, the density settings of the scanner, and scale discrepancies between the DI and as-built instances, Eps should be adjusted around the thickness values to achieve the best performance. The minimum number of neighbours in a cluster, minPts, can be set to a number that is small compared with the PCD size of each instance (e.g., 60 - 100) as a default to facilitate clustering and noise elimination. After DBSCAN is applied in Step 4, the point cluster is split into several smaller clusters, and we need to further determine which clusters belong to the instance itself.
One instance may be represented by several different clusters in the PCD because of occlusions and clutter from the connected instances. Therefore, we take advantage of the RD generated in Step 1 to rank, select, and merge the top clusters in Step 5 for result optimization. A Kd-tree is used here to accelerate the computation. More specifically, we use ICR and BCR to rank the clusters for planar and curved walls, and HR to rank the clusters for planar slabs, including floors and ceilings. Since columns and beams are always built independently in a space such as a lobby or a room, we can directly segment their points by fitting models.
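For readers unfamiliar with DBSCAN, the clustering in Step 4 can be sketched in plain Python as below. This is a brute-force illustration with quadratic cost, not the implementation used in the experiments; a Kd-tree or a library implementation would be needed for real point clouds. The parameter names `eps` and `min_pts` correspond to Eps and minPts above.

```python
def dbscan(points, eps, min_pts):
    """Label every point with a cluster index (0, 1, ...) or -1 for noise."""
    eps_sq = eps * eps

    def neighbours(i):
        # Brute-force radius search; the query point itself is included.
        return [
            j for j, q in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps_sq
        ]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
    return labels
```

With `eps` chosen below the thickness of an inner-connected wall, the gap that such a wall leaves on the extracted surface separates the patches on either side into different clusters.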
Planar and Curved Walls. Figure 8 uses the example of a planar wall to show how ICR and BCR help to rank and select the clusters corresponding to the instance as the final extraction result. In Figure 8 (a), the red-edged wall in the IFC model is the targeted planar wall, and the surfaces with white dashed lines are the targeted planes. The targeted wall has two BCRs and one ICR with three adjacent walls. In Figure 8 (b), we crop the PCD with an EBB. Figure 8 (c) shows the result of applying PROSAC with the shape "plane" for surface 1. The extracted plane clearly contains some occlusions and clusters due to an open door, an inner-connected wall, and a border-connected wall. We then apply DBSCAN to divide the points into different clusters (Figure 8 (d)) and rank the clusters by their number of points. We select only the top $(N_{ICR} + 1)$ clusters as the final extracted point cluster, where $N_{ICR}$ refers to the number of ICRs. We apply the same method to curved walls. The optimized result for a wall is shown below:
$$P^{(opt)} = \bigcup_{i=1}^{N_{ICR}+1} C_i \quad (7)$$
Slabs. Space enumeration over a slab is a key point in ranking and selecting clusters after DBSCAN. We take advantage of HR to inversely infer the number of spaces on a slab.
Columns and Beams. Usually, we do not need to select clusters for columns and beams, as they are always built without any ICR or HR. In a few cases, we can apply ICR to select the top clusters for result optimization if columns or beams are connected to other instances at intermediate positions. Nevertheless, there are always some signs and marks on columns and beams in real-world cases, which can cause false positive results after PROSAC is applied in Step 3. For example, Figure 10 (a) shows a fire exit sign on a column, whose points can also be extracted by PROSAC. We therefore propose a size restraint algorithm to remove these noisy points and optimize the result. Figure 10 (b) illustrates the proposed size restraint algorithm. We first compute the two distances (m and n) between the two pairs of parallel planes from the plan view. Then we compute the longest distance from the central point, which serves as the threshold to select candidates and remove noisy points. The proposed size restraint algorithm is robust for real-world datasets since it does not rely on the instance scale of the as-designed model. It should be noted that this algorithm is effective only when none of the four surfaces of the column is completely occluded by clutter. In other words, all four surfaces of the column need to be at least partially exposed to determine the distances between the two pairs of parallel faces from a top-down perspective. On the other hand, we do not need to apply this size restraint to cylindrical columns and beams; we only need to use the ICR to optimize the result, if applicable. The optimized result for a column or beam is shown below:

Pseudo Code and Summary
Overall, the pseudo-code for matching design-intent planar, curved, and linear structural objects in PCD is given in Algorithm 1. At the beginning, for each selected instance in the IFC model, we use its GUID, I, to compute the object instance descriptor, OID. Specifically, if the GUID belongs to the object classes of walls, columns, or beams, we compute the attribute values for GD, ICR, and BCR; otherwise, if the instance is a slab, we compute the attribute values for GD and HR. Afterwards, an EBB is computed from the value of the AABB attribute to crop and reduce the size of the entire PCD. Then, PROSAC shape fitting is conducted by determining the inlier points for the parameter hypothesis of the designated shape, S, under a given deviation value. After that, DBSCAN is applied to divide the inlier point cloud into different clusters, $C_i$, by finding the neighbours of core points. The clusters are ranked by their number of points in descending order. Finally, the top $(N_{ICR} + 1)$ clusters are selected and merged as the final segmentation result for walls, columns, and beams; for slabs, the number of top clusters that are selected and merged is determined from HR.
Our proposed method has several advantages compared with the SOTA methods:
1) An EBB is generated to crop the whole PCD, which reduces the input data to a small-scale cluster and decreases the computational complexity.
2) The bounding box is enlarged to make the method robust when there are distinct deviations in terms of position, orientation, and scale between the DI model and the as-built PCD.
3) The proposed method is robust when analysing raw PCD directly, without any pre-processing or denoising.
4) The proposed method is robust for highly occluded PCD with clutter.
5) Compared with deep learning-based methods, the proposed method avoids the problem of a lack of training datasets and can directly identify the instance ID to facilitate progress monitoring and quality control.
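The final ranking and merging step of the workflow can be sketched as follows. The function name and the `top_k` parameter are hypothetical; for walls, columns, and beams `top_k` would be the number of ICRs plus one, and the labels are assumed to come from a DBSCAN run in which -1 marks noise.

```python
from collections import defaultdict


def select_top_clusters(points, labels, top_k):
    """Merge the top_k largest clusters; points labelled -1 (noise) are dropped."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        if label != -1:
            clusters[label].append(point)
    # Rank clusters by their number of points, largest first.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [point for cluster in ranked[:top_k] for point in cluster]


# A wall with one inner connection (N_ICR = 1) keeps its two largest patches.
n_icr = 1
pts = ["a1", "a2", "a3", "b1", "b2", "c1", "noise"]
labels = [0, 0, 0, 1, 1, 2, -1]
merged = select_top_clusters(pts, labels, top_k=n_icr + 1)
```

Ranking by size reflects the expectation that the patches of the target surface are larger than patches caused by clutter lying on the same infinite shape.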

Data Acquisition and Pre-processing
We acquired the five datasets shown in Table 6 to validate our proposed method. Each dataset contains an as-designed IFC model and an as-built PCD of a real-world building. The IFC models reflect both Manhattan-World and non-Manhattan-World buildings. The characteristics of the PCD from the five selected datasets are summarised in Table 7. Both mobile and terrestrial scanners were used to collect the points. The five datasets contain different numbers of points and various complexities of clutter and occlusions, which makes them suitable for evaluating our proposed method.
The ISPRS (International Society for Photogrammetry and Remote Sensing) WG IV/5 Benchmark on Indoor Modelling contains six pairs of public datasets with different complexities [51,52]. We selected three datasets with different building styles, point scales, and clutter complexities from the ISPRS benchmark. First, the PCD of the TUB2 dataset was captured by Zeb-Revo in a two-floor building at Technische Universität Braunschweig in Germany. The indoor scene comprises a total of 24 rooms on two floors, which are enclosed by walls and ceilings with different thicknesses and heights. It also contains 51 open and closed doors and 21 windows. The building is not furnished. Second, the PCD of the UVigo dataset represents one room and an entrance hall captured at the University of Vigo in Spain.
The scene contains one curtain wall, 20 windows, and 7 open and closed doors. The scene also contains several columns with a circular cross-section and multiple rectangular surfaces. Third, the PCD of the GM dataset was captured in the Grainger Museum. This is a non-Manhattan-World building with many curved and diagonally positioned walls.
Two further complex real-world datasets were also produced. The PCD of Cambridge CEB was captured in the Civil Engineering Building (CEB) at the University of Cambridge with a FARO Focus 3D X330 Terrestrial Laser Scanner. It contains three floors with many furnished spaces. The PCD of ConSLAM was captured on a construction site at Whiteley's building in London. It contains columns with a rectangular cross-section. Overall, geometric deviations of object instances in terms of position, orientation, and scale exist between each IFC model and its corresponding PCD in all five datasets. As can be seen in Table 7, the levels of clutter and occlusions in the PCD of the GM, CEB, and ConSLAM datasets are relatively high.
We need to register the as-built PCD with its corresponding as-designed IFC model before matching DI instances in the PCD. The problem can be described as follows:
$$p_i = R\,q_i + T + N_i \quad (10)$$
where $p_i$ refers to the points in the PCD and $q_i$ refers to the points in the model, $i = 1, 2, 3, \ldots, n$. $R$ is a rotation matrix and $T$ is the 3D translation vector. $N_i$ is the noise vector representing the discrepancy after the coarse alignment by $R$ and $T$. $N_i$ can be considered as a slight movement for registration refinement to minimise the discrepancy between the transformed DI and the PCD. In this research, coarse registration is sufficient because deviations are allowed between the as-designed BIM and the as-built PCD. We propose two efficient methods for IFC-to-PCD coarse registration, summarized in Table 8. The first uses Recap for coordinate system adjustment, while the second calculates the registration matrix in CloudCompare. We applied Method 1 to the TUB2, UVigo, GM, and CEB datasets, and Method 2 to ConSLAM. Figure 11 shows the registration results. Both methods aim to adjust the coordinate system of the PCD, using the IFC as a reference. Method 1 is more convenient when the IFC's coordinate origin and axis directions are easy to find, while Method 2 is more reliable when the IFC is too complex to locate the coordinate origin and axis directions.
The two methods produced similar coarse registration results in this study.
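To make the registration model concrete, the sketch below applies a rotation and a translation to a model point and reports the remaining discrepancy to its scanned counterpart, i.e. the noise term left after coarse alignment. The function names are hypothetical, and the rotation is restricted to a yaw about the Z axis purely to keep the example short.

```python
import math


def yaw_rotation(angle_deg):
    """3x3 rotation matrix for a rotation about the Z axis."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def transform(point, rotation, translation):
    """Rotate the point, then translate it."""
    return tuple(
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


def residual(scan_point, model_point, rotation, translation):
    """Discrepancy left between a scanned point and its transformed model point."""
    moved = transform(model_point, rotation, translation)
    return tuple(s - m for s, m in zip(scan_point, moved))


R = yaw_rotation(90.0)
T = (1.0, 2.0, 0.0)
moved = transform((1.0, 0.0, 0.0), R, T)                      # close to (1, 3, 0)
noise = residual((1.01, 3.0, 0.0), (1.0, 0.0, 0.0), R, T)     # close to (0.01, 0, 0)
```

A small residual of this kind is what the enlarged bounding box is meant to absorb, which is why a coarse alignment is enough for the subsequent matching steps.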

Experimental Results
We conducted experiments on the five datasets to evaluate the proposed method. For each class of the structural category, we selected several representative instances from the five datasets. Specifically, we selected eight planar walls from the TUB2, UVigo, GM, and CEB datasets, of which four were of the Manhattan-World type and four of the non-Manhattan-World type. Each planar wall has different complexities in terms of occlusions and clutter caused by doors, windows, furniture, and sundries. We also selected four curved walls and four slabs from the GM, TUB2, UVigo, CEB, and ConSLAM datasets, including both Manhattan and non-Manhattan types. Besides the occlusions and clutter mentioned before, different curved walls have different constant curvatures, and the slabs have various shapes from the plan view. Finally, we selected five columns, including both cylindrical and cuboid geometry, from the UVigo and ConSLAM datasets. In addition to the numerous occlusions and clutter, the as-designed models are quite different from the as-built columns in terms of dimensions and positions. In summary, we chose a wide range of planar, curved, and linear instances with different complexities to validate the proposed method's efficiency and reliability in automating the matching process.
Having discussed the variety and complexity of the selected instances, we implemented the proposed method in C++ and Python with the Point Cloud Library (PCL) [69] and IfcOpenShell. As Section 3.4 explained the process of setting DBSCAN's parameters for clustering, we computed the minimum thickness of the connected walls in the IFC models for the selected instances and set a value slightly below this as the radius threshold. Specifically, for planar walls and slabs, Eps was set to around 0.1 m for all datasets; for curved walls, Eps was set to around 0.6 m in the GM dataset. The PCD in the GM dataset is sparser and the connected walls are thicker than in the other datasets, so the Eps is larger. All settings are larger than the PCD's density. DBSCAN clustering was not applied to columns since they have no inner connections. The value of minPts was set to 60 to cluster the noisy points.
Figure 12 demonstrates the most representative experimental results. Specifically, the figure shows, from left to right, the as-designed model (IFC instance), the EBB, the extracted point cluster, and the ground truth. The left column of the figure shows the selected IFC instance from DI in red, which is used to compute the OID. The second column shows the cropped PCD within the EBB. The third column shows the point cluster of the instance matching result after applying the proposed method. The right column shows the ground truth of the instance point cluster, which was generated from the PCD manually. Specifically, TUB2-PW2 shows the segmented result when there are two inner connections on one surface of the wall. GM-PW5 shows the result for a non-Manhattan planar wall with significant clutter and occlusions. For GM-CW1, four thin planar walls (in grey) are connected to one surface of this curved wall; they are considered border connections since they are very close to the boundary of CW1. GM-CW4 contains significant clutter and one inner connection on one surface of the wall. TUB2-S1 shows the segmented result for the slab connecting two floors. ConSLAM-RC2 shows the segmented result when there are signs and clutter on the column. Finally, ConSLAM-RC4 shows a successful segmentation result when there is a distinct scale deviation.
Throughout our experiments, we applied our proposed method to a total of 8 planar walls, 4 curved walls, 4 slabs, and 5 columns with different complexities in the PCD. The overall experimental results and visual representations are presented in Appendix Figures 13 - 15. In conclusion, the visualised results show that the method performs well on real-world datasets for matching structural objects in the Scan-vs-BIM environment with significant occlusions, clutter, deviations, and varying scales.

Evaluation Metrics and the Ground Truth
We applied a point-to-point comparison method to evaluate the experimental results against the ground truth (please see the overall experimental results and visual representations in Appendix Figures 13 - 15). We first assigned a unique label to each point, in addition to its X, Y, Z coordinates, before computing the result cluster and generating the ground truth cluster. Then, we matched each pair of corresponding points between these two point clusters to compute Precision, Recall, and IoU. This evaluation method is more accurate than surface-to-surface comparison since it computes the correspondence directly, rather than after transforming the points to surfaces. The metrics are defined as follows:
$$Precision = \frac{TP}{TP + FP} \quad (11)$$
where TP refers to true positive and FP refers to false positive.
$$Recall = \frac{TP}{TP + FN} \quad (12)$$
where FN refers to false negative.
$$IoU = \frac{TP}{TP + FP + FN} \quad (13)$$
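A minimal sketch of the point-to-point evaluation is shown below, assuming that every point carries a unique identifier so that the result cluster and the ground truth cluster can be compared as sets. The function name and the use of integer identifiers are illustrative assumptions.

```python
def segmentation_metrics(predicted_ids, truth_ids):
    """Precision, recall, and IoU from the unique point labels of two clusters."""
    predicted, truth = set(predicted_ids), set(truth_ids)
    tp = len(predicted & truth)   # points in both the result and the ground truth
    fp = len(predicted - truth)   # points extracted but not in the ground truth
    fn = len(truth - predicted)   # ground truth points that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, iou


# Result cluster holds points 1..10, ground truth holds points 3..12.
precision, recall, iou = segmentation_metrics(range(1, 11), range(3, 13))
```

In this toy case 8 points are shared, 2 are false positives, and 2 are false negatives, so precision and recall are both 0.8 and the IoU is 8/12.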
The evaluation of the results against the ground truth is shown in Table 9 and Table 10. The overall mean precision is over 0.962, the mean recall is over 0.934, and the mean IoU is over 0.914. Overall, our proposed solution is robust, with high precision and recall on planar, curved, and linear structural instance segmentation in the Scan-vs-BIM context with various complexities. To the best of our knowledge, the proposed method is a novel workflow, and no similar method has been proposed before.

SOTA Method Comparison
In this section, we compare the performance of our proposed method with the SOTA methods in real-world, complex environments. Previously, we introduced and summarised five types of SOTA methods in the Scan-vs-BIM context in Table 1 of Section 2.1. To improve the capability and robustness of the SOTA methods in different cases, we merged the Point-to-Point and Point-to-Surface methods into the improved SOTA method 1, and the last two methods (Feature-based and RANSAC-based methods) into the improved SOTA method 2. We do not consider the Hough transform in the evaluation since this method applies only to cylinder detection and is sensitive to noisy PCD with high occlusions and clutter.
Specifically, for SOTA method 1, we directly selected the points within a threshold from the surface after calculating the nearest distance between the as-built PCD and the centre of the as-designed model. For SOTA method 2, we used normal vector, shape, and length information together with RANSAC to select points from the as-built PCD. The improved SOTA methods perform more stably across different scenarios in real-world datasets. Hence, the comparison gives a more convincing assessment of the performance, feasibility, and robustness of our proposed method.
We conducted the comparison experiments from two perspectives, instance types and deviation scenarios, because we want to understand the performance of the SOTA methods on the same 21 instances that we used to validate our proposed method, as well as their performance under different deviation scenarios in terms of position, orientation, and scale. Table 11 shows the matching evaluation results for different instance types. Both SOTA methods achieve fair performance when detecting slabs. SOTA method 1 is better for column and beam segmentation in terms of precision, while SOTA method 2 performs better for planar wall and slab segmentation. SOTA method 2 has better overall performance in mIoU. Neither SOTA method can deal with curved walls directly, since the geometry information is incomplete and cannot be extracted directly from the original IFC model.
On the other hand, the performance of the SOTA methods under deviations in scale, orientation, and position also needs to be evaluated. These scenarios often occur in real-world contexts, and our proposed solution performs well in these situations. Table 12 shows the evaluation results for both SOTA methods against the ground truth in four scenarios: 1) the IFC instance model is directly aligned with the PCD; 2) the IFC instance model is scaled down to be around 25% smaller than the as-built instance; 3) the IFC instance model is rotated approximately 20° from the as-built instance; 4) the IFC instance model is moved around 20% from the original position. Finally, Table 13 compares the mean precision, mean recall, and mean IoU of the proposed method and the improved SOTA methods. Our proposed solution performs much better than the SOTA methods in all cases.

Strengths and Limitations
Our proposed method offers several advantages over existing SOTA approaches. First, we apply an EBB to crop the whole PCD, which reduces the data size and computational complexity and enhances the method's robustness against position, orientation, and scale deviations between the DI model and the as-built PCD. Second, the method does not require any pre-processing of the raw PCD, such as denoising; it handles PCDs with significant occlusions and clutter for both Manhattan-World and non-Manhattan-World buildings. Lastly, unlike deep learning techniques, our method does not require extensive training datasets; it can directly segment PCDs at the instance level and identify instance IDs by leveraging IFC models in the Scan-vs-BIM context. With its high-precision instance matching results, the proposed method can be employed for construction progress monitoring and quality control in project management.
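The cropping idea behind the first advantage can be sketched as follows. The sketch assumes the EBB is obtained by growing an axis-aligned bounding box by one uniform margin, which is a simplification for illustration; the function name `crop_to_enlarged_box`, the box corners, and the 0.3 m margin are hypothetical.

```python
def crop_to_enlarged_box(points, box_min, box_max, margin):
    """Keep the points inside an axis-aligned box grown by margin on every side."""
    lo = [m - margin for m in box_min]
    hi = [m + margin for m in box_max]
    return [p for p in points if all(lo[i] <= p[i] <= hi[i] for i in range(3))]


# Hypothetical design box of 1 m x 1 m x 1 m and three as-built points.
cloud = [(0.5, 0.5, 0.5), (1.2, 0.5, 0.5), (3.0, 0.5, 0.5)]
cropped = crop_to_enlarged_box(cloud, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 0.3)
```

The second point lies outside the design box but inside the enlarged one, so a moderately displaced as-built instance is still retained, while distant points are discarded before shape detection.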
Although our method achieves high-precision results, several factors can influence instance segmentation in the Scan-vs-BIM environment. First, overly large intervals between scanned points can cause false negative results when segmenting point clusters. For example, in our cases, points of planar walls are considered noise and removed if the distance between them is larger than around 0.1 m, a value chosen from experience; the right part of the result cluster of TUB2-PW2 in Figure 12 is an example. This situation typically arises when the distance between the scanner and the object is considerable. Because of the limited precision of the scanner and the increased scanning distance, the sampled points become sparse and may be detected as noise by our method. Second, two instances with seamless and smooth surfaces in IFC models are not easily distinguishable in PCDs and thus cause false positive segmentation results. Additionally, for cuboid columns, the size restraint algorithm may not be effective in removing points belonging to signs if one surface of a column is completely occluded and cannot be detected in the PCD. This can lead to false positive results. Finally, the floor junction may also be included during segmentation if the edge of an interior wall does not connect to another wall, again leading to false positive results. A potential solution is to first determine whether the wall's edge connects to another wall; if not, further segmentation methods need to be developed to address this problem.
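The first limitation can be illustrated with a simplified density check. The sketch is not the DBSCAN step of the proposed method: it only flags points that have no other point within a radius `eps`, which density-based clustering with a minimum neighbourhood size of two or more would always label as noise. The function name `isolated_points` and the point spacings are hypothetical; the 0.1 m radius mirrors the approximate value quoted above.

```python
import math


def isolated_points(points, eps):
    """Return indices of points that have no other point within eps."""
    isolated = []
    for i, p in enumerate(points):
        has_neighbor = any(
            j != i and math.dist(p, q) <= eps for j, q in enumerate(points)
        )
        if not has_neighbor:
            isolated.append(i)
    return isolated


# Hypothetical scan line: dense near the scanner (0.05 m spacing),
# sparse far from it (0.15 m spacing).
dense = [(0.05 * k, 0.0, 0.0) for k in range(5)]
sparse = [(0.20 + 0.15 * k, 0.0, 0.0) for k in range(1, 4)]
flagged = isolated_points(dense + sparse, eps=0.1)
```

Only the sparsely sampled points are flagged, even though they lie on the same surface, which is the false negative behaviour described for distant parts of a wall.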

Conclusion
This paper first examines the current SOTA methods, with their strengths and limitations, for detecting and segmenting structural object instances from PCD in the Scan-vs-BIM context. Such insights are helpful for future researchers aiming to improve on these methods. Subsequently, we proposed a novel method that can rapidly, efficiently, and precisely segment common planar, curved, and linear structural instances from PCD in complex, real-world environments. In terms of academic contributions, the proposed method is robust in scenarios where: 1) the input PCD contains numerous occlusions and clutter; 2) the as-built object instances deviate significantly from the as-designed model in position, orientation, and scale; 3) the as-designed or as-built buildings are Manhattan-World or non-Manhattan-World buildings. The proposed method, which follows a top-down idea, advances the current SOTA in computer vision-based instance matching in the Scan-vs-BIM context. It has wide-ranging academic implications for studies in similar or allied domains and can potentially act as a benchmark for upcoming research. In terms of practical contributions, the automation of DI instance matching in PCD can significantly reduce manual checking time, leading to faster project progress monitoring and quality control at the construction stage. As this matching algorithm can be applied at different timestamps during a building's construction, discrepancies between the DI and the as-built status can be detected and reported in a timely manner. This can help avoid costly rectifications in the later stages of construction. Finally, the matching result can aid in updating DTs for infrastructure, which is crucial for modern facility management and predictive maintenance.
In future research, standardising the matching solution for complex mechanical and electrical instances will broaden the method's applicability. Automating the matching of the most frequent DI object classes, including pipe segments, duct segments, pipe joints, terminals, and lighting fixtures, will make the method more generic in supporting the maintenance of geometric DTs. In addition, optimizing or automating the registration of a complex PCD with its respective DI file can enhance the congruence between the physical and digital realms. Lastly, addressing inconsistencies such as gaps and truncation in extracted point clusters will lead to more refined models and streamline the process of updating geometric DTs. In terms of potential applications, the proposed method integrated with the updated DT can aid in monitoring construction progress, identifying discrepancies early, and ensuring adherence to design specifications. Building managers can use the method to maintain up-to-date DTs of facilities, which helps in predictive maintenance and space optimization. Furthermore, updating a geometric DT automatically and dynamically can foster better collaboration between architects, engineers, and construction professionals, ensuring that everyone works from the most accurate and up-to-date data during a building's lifecycle.

Acknowledgement
later in the section of research methodology. The flowchart in blue on the left side of the figure shows the processes of the coarsely aligned PCD with a recursively segmenting algorithm from Step 2 to Step 5, with the support of the outcome of Step 1. More precisely, an object instance descriptor of the designated instance generated from an IFC model is used to support cropping PCD within an enlarged bounding box in Step 2, segmenting the remaining PCD into a fitted shape in Step 3, and optimizing the final output cluster in Steps 4 and 5.

Figure 2 Two examples of axis-aligned bounding boxes of diagonally positioned walls.
indicates one example of ICR between the targeted wall 0 and the connected wall 2, as well as three examples of BCR between the targeted wall 0 and the connected wall 1 (in three cases).Precisely, we define ICR by first computing the maximum and minimum XY values of AABB of two related walls, and then discriminating ICR by the equation (2) shown below:

Figure 5 An example of an Axis-Aligned Bounding Box (AABB) in magenta on the left and an Enlarged Bounding Box (EBB) in yellow on the right, for an object instance of a cuboid.

Figure 6 Curved walls in brown from the plan view (X-Y plane) in IFC models. (a): calculate the centre and the radius of the fitted circle. (b)-(f): five cases of a curved wall in an axis-aligned bounding box.

Figure 7 (a) Narrowing PCD into an EBB for a planar wall; (b) & (c) the points of two surfaces extracted by plane shape detection. The points in the red circle belong to the wall itself, while the points in the blue circle only belong to the shape and are considered noisy points.

Figure 8 An example of how the ICR supports DBSCAN to rank clusters for the final points extraction. (The IFC model and PCD are from an open source [51].)

Figure 9 An example of the HR between spaces and walls (left) / doors (right) in an IFC model. (The model is from an open source [51].)

Figure 10 (a) An example of signs and marks on the column in a real point cloud cluster; (b) the plan view of four surfaces (solid black lines) fitting the point cluster (grey dots) of the as-built column,  is computed by surface distances as size restraint.

Figure 11 The as-designed IFC model (left), the corresponding as-built PCD (middle), and the coarse registration result (right) for the five datasets.

Figure 12 The representative selection of PCD matching results for planar walls (PW), curved walls (CW), slabs (S), and cuboid columns (RC). From left to right: the selected IFC instance in red; the cropped PCD within the enlarged bounding box; the result of extracted point cluster using the proposed method; the manually generated ground truth.
This work is funded by the European Commission's Horizon 2020 for the CBIM (Cloud-based Building Information Modelling) European Training Network under agreement No. 860555.

Appendix: Visualization of all matching results (Figures 13-15).

Figure 13 PCD matching results for planar walls (PW).

Figure 15 PCD matching results for cylindrical columns (CC) and cuboid columns (RC).

Table 2 SOTA Deep-learning semantic segmentation evaluated on S3DIS.

Table 3 SOTA Deep-learning instance segmentation evaluated on S3DIS.

Table 4 The structure of the proposed IFC-based Object Instance Descriptor (OID), including two sub-descriptors: Geometry Descriptor (GD) and Relationship Descriptor (RD).

Table 5 SAC Models used for Geometry Detection in PCD.
*Curved walls are more complex. Details are explained in the body paragraph.

Table 6 Real World Datasets Used to Validate the Proposed Method.
*Although CEB itself is a Manhattan-World building, the coordinate system is not aligned with CEB, making it processed as a non-Manhattan-World building.

Table 7 PCD Characteristics for the Five Datasets Used in This Study.

Table 8 Proposed Solutions for IFC to PCD Coarse Registration

Table 9 Matching Result Evaluation by Instance Code: Proposed Method vs the Ground Truth (PW: planar wall, CW: curved wall, S: slab, CC: cylindrical column, RC: rectangular column)

Table 10 Matching Result Evaluation by Instance Type: Proposed Solution vs Ground Truth

Table 11 Matching Result Evaluation by Instance Type: Improved SOTA Method 1, Improved SOTA

Table 13 Matching Result Evaluation: Improved SOTA Methods vs Our Proposed Method