Enriching thermal point clouds of buildings using semantic 3D building models

Thermal point clouds effectively integrate thermal radiation and laser point clouds. However, the semantic information needed to interpret building thermal point clouds can hardly be inferred precisely. Transferring the semantics encapsulated in 3D building models at LoD3 has the potential to fill this gap. In this work, we propose a workflow that enriches thermal point clouds with the geo-position and semantics of LoD3 building models by exploiting features of both modalities: the proposed method automatically co-registers the point clouds from different sources and enriches the thermal point cloud with facade-level semantics. The enriched thermal point cloud supports thermal analysis and can facilitate the development of deep learning models operating directly on thermal point clouds, which are currently scarce.


Introduction
A thermal point cloud combines synchronized thermal radiation data and a laser point cloud, effectively capturing the thermal characteristics of objects. For buildings, differences in radiation and temperature can be caused by building operation, material variation, aging, and physical damage. Therefore, thermal point clouds of buildings can be applied to hidden structure detection, energy inspection, and heritage protection (Ramón et al., 2022). Interpreting a building's thermal point cloud requires considering both the element types and a comprehensive geometric description. Geometry-induced thermal variations, such as cracks and distortions, can be estimated from dense point clouds. However, directly inferring facade elements, such as doors, windows, roofs, and walls, from thermal point clouds is challenging.
Semantic information can be obtained by manually labeling thermal point clouds or by performing semantic segmentation. Directly projecting TIR images onto a 3D building model to generate a thermal point cloud would render unrelated objects, such as vehicles and pedestrians, onto the facade, leading to misinformation. Therefore, it is crucial to synchronize and align the laser point cloud with the TIR images to ensure an accurate radiometric representation. However, manually labeling laser point clouds is time-consuming and requires extensive familiarity with the study area. Additionally, existing methods for the semantic segmentation of facade elements in point clouds often lack accuracy, especially for categories such as clutter and windows (Su et al., 2022, Matrone et al., 2020). Given these challenges, it is advisable to augment the facade-level semantic information with data from reliable sources.
Worldwide-available semantic 3D city models have the potential to fill this data-scarcity gap. Rich building-related semantics are encapsulated in semantic 3D building models at LoD3, which are characterized by highly detailed, object-wise semantics at the facade level (Gröger et al., 2012, Kolbe and Donaubauer, 2021). Moreover, such LoD3 models possess high absolute georeferencing accuracy, reaching up to the centimeter level (Roschlaub and Batscheider, 2016). Recent trends imply a growing availability of LoD3 building models, since new LoD3 datasets are emerging¹ and novel methods for automatic LoD3 reconstruction are being investigated (Wysocki et al., 2023b, Hoegner and Gleixner, 2022, Huang et al., 2020).
Assuming that substantial structural and morphological changes in urban architecture are infrequent, we believe thermal point clouds can be semantically enriched by fusion with LoD3 models. However, three factors need to be considered. First, the LoD3 model and point clouds are organized in different data formats (Abreu et al., 2023). The structured LoD3 geometry frequently follows the boundary representation (B-Rep), as per the CityGML standard (Gröger et al., 2012), whereas point clouds are commonly represented as a set of unstructured points (x, y, z). Second, the coverage differs. The LoD3 model contains the envelope of the complete building, including all outer-observable details, while the buildings in the point clouds may be incomplete due to occlusions and the limitations of scanning platforms. Besides buildings, point clouds include all objects in the scene, such as vehicles, traffic signs, and pedestrians. Third, the objects' geometric features are represented differently. Although LoD3 models are highly detailed, they represent a generalized building geometry. In contrast, point clouds provide non-generalized raw data representing events and states that occur only at the time of recording, such as open doors and windows with closed blinds, which are also crucial for thermal analysis.
Considering all these factors, we propose a workflow to transfer the semantic information from the LoD3 model to thermal point clouds. We first convert the LoD3 model to a semantic point cloud and then co-register the point clouds. The semantic labels are then transferred to the thermal point clouds according to the registered building model. The enriched thermal point clouds expose the thermal attributes of the different building elements for analysis. The co-registered point clouds, on the other hand, can
• Our experiments validate the information enriched from the model and its application to thermal analysis.
The paper is organized as follows: Section 2 summarizes the related work, and Section 3 presents our proposed method in detail. Section 4 describes the data and experiments, and Section 5 discusses the results. Finally, Section 6 draws conclusions.

Semantic 3D building models
Semantic 3D city models comprehensively describe structures, taxonomies, and aggregations on a city, regional, and even national scale. Internationally, the CityGML standard, established by the Open Geospatial Consortium (OGC) (Kolbe, 2009, Gröger et al., 2012, Kolbe et al., 2021), is used for the representation and management of city models. CityGML facilitates the modeling of urban objects with their 3D geometry, appearance, topology, and semantics at four different LoDs. The latest data model, CityGML 3.0, adheres to the ISO 191xx series of geographic information standards, and CityGML datasets are commonly encoded using either the Geography Markup Language (GML) or CityJSON (Kutzner et al., 2020, Ledoux et al., 2019).
Since urban dwellings are the cornerstone of each city, most existing semantic 3D city models comprise buildings (Biljecki et al., 2015). LoD1 and LoD2 building models are currently widely available, as underscored by the approximately 220 million models available in Germany, Japan, the Netherlands, Switzerland, the United States, and Poland². This broad adoption is mainly owed to robust 3D reconstruction algorithms and available building footprints combined with aerial observations (Roschlaub and Batscheider, 2016, Haala and Kada, 2010). Although LoD1 and LoD2 models possess building semantics, they lack the detailed facade semantics that are pivotal for facade-level point cloud labeling. This gap can be filled by LoD3 building models, which are characterized by descriptive facades composed of objects such as windows, doors, balconies, and even underpasses (Wysocki et al., 2022). Automatic LoD3 reconstruction is currently an active field of research, with various promising methods and input datasets proposed to address the challenge (Wysocki et al., 2023b, Hoegner and Gleixner, 2022, Huang et al., 2020).

Model to Point Clouds Registration
Registration of 3D models and point clouds is typically done by feature matching. Point clouds offer accurate and detailed geometric information about existing structures; therefore, they are widely used as data sources for 3D model reconstruction and manual modeling. Extracted planes are used as features for co-registration due to their relatively simple representation and frequent occurrence in man-made objects. Bosché (2012) proposes a semi-automatic method to register construction sites with Building Information Modeling (BIM) models, in which three corresponding planes must be selected manually for coarse registration. Gruner et al. (2022) also focus on planes but generalize BIM faces as terrestrial laser scanning (TLS) patches with a point and a normal vector. The detected faces and planes are manually organized by their connectivity, and the model faces are then co-registered to the point cloud patches to monitor the construction process. Besides such manual approaches, automatic methods have also been investigated. Sheik et al. (2022) group detected parallel planes as descriptors to register as-built point clouds and as-planned BIM models. Using planes avoids setting control points for registration, but requires a sufficient number of planes forming unique patterns. Another frequently used set of primitives is lines. Kaiser et al. (2022) propose a fully automated method to register photogrammetric point clouds to a building model using lines in indoor scenes, which requires sufficient and well-distributed corresponding line features from images. Chen et al. (2022) perform a coarse-to-fine registration from BIM to the point cloud: raw camera poses are used to coarsely align the model, and an adapted iterative closest point (ICP) algorithm then achieves the fine registration.
Whereas registering point clouds to BIM models targets construction monitoring and change detection, our extensive investigation found only a limited number of research groups addressing co-registration between street-level point clouds and semantic 3D city models. Goebbels et al. (2019) focus primarily on point clouds generated from images with the structure-from-motion (SfM) algorithm and use radiometric features for prefiltering, which may inadvertently eliminate valid building features. Goebbels and Pohle-Fröhlich (2018) detect building footprints from point clouds and an LoD model, and employ a mixed integer linear program to identify correspondences between 2D lines and points. In these cases, sufficient line and point features are necessary to form unique patterns for registration. Lucks et al. (2021) extract only facade points using a random forest and register them to an LoD1 model for trajectory optimization. This approach, however, requires training data and initial transformation information.

Point cloud co-registration
Point cloud registration has long been a research topic. It involves aligning point clouds from different sources, with low overlap, or from metrically inaccurate datasets. Standard registration methods such as ICP (Segal et al., 2009) handle common situations with sufficient overlap and similar point density but struggle in more complex situations. Using control points (targets) can solve this issue and yields high, traceable accuracy (Janßen et al., 2022, Janßen et al., 2023), but it requires manual fieldwork, lacks autonomy, and thus does not scale to large datasets. Therefore, most point cloud registration pipelines follow a two-step workflow: coarse registration to obtain an initial transformation, followed by refinement with denser correspondences (Xu et al., 2023).
The coarse registration uses sparse feature-based correspondences and is crucial for non-georeferenced point clouds. Key points (Barnea and Filin, 2008), lines (Chen and Yu, 2019), and planes (Li et al., 2022) can all serve as registration features. Combinations of features are also used, as in 4PCS (Aiger et al., 2008) and Super4PCS (Mellado et al., 2014). When the initial relative poses are provided by a global navigation satellite system (GNSS) or by manual processing, coarse registration is typically unnecessary. Fine registration adjusts the geometric transformation to achieve better accuracy; it usually updates the transformation matrix iteratively to minimize the point distances over denser correspondences. The ICP method is widely adopted owing to its simplicity and efficiency, and numerous algorithms have stemmed from the ICP framework (e.g., Yang et al., 2015). Recently, deep learning methods (Lu et al., 2019, Zhang et al., 2022) have become increasingly prevalent for point cloud registration. However, they still struggle with large study areas and demand extensive training datasets, which currently limits their application to point cloud registration in expansive geographical contexts.

Method

Point cloud generation
To bridge the cross-domain gap, we homogenize the two distinct representations. Generalizing point cloud features to match the model primitives is challenging, as shown by research on 3D building reconstruction (Wysocki et al., 2023b): the occlusion and incompleteness of laser point clouds lead to inefficiency and false matches. Therefore, the LoD3 geometry is sampled into a set of points that matches the representation of the thermal point cloud.
Thermal point cloud generation
The thermal point clouds are generated by projecting the thermal texture from TIR images onto the mobile laser scanning (MLS) point clouds using position information (Zhu et al., 2023). The MLS point clouds and TIR image sequences are captured with the same platform.
Once the relative poses of the thermal camera are estimated, each point in the point cloud finds its corresponding pixel in the TIR image through the collinearity equation (Zhu et al., 2021):
$$\lambda \begin{bmatrix} u_i \\ 1 \end{bmatrix} = K \, [\, R \mid T \,] \begin{bmatrix} X_i \\ 1 \end{bmatrix} \qquad (1)$$
where u_i = image coordinates, X_i = point cloud coordinates, λ = projective scale factor, K = camera matrix, including the aspect ratio s, focal length f, and principal point (c_x, c_y), R = 3 × 3 rotation matrix, and T = 3 × 1 translation vector. Once K, R, and T are obtained from the pose parameters, the corresponding intensity value of each point is read from the image and rendered onto the point cloud.
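The projection step can be illustrated with a minimal pinhole-camera sketch in Python. It assumes R maps world coordinates into the camera frame, treats the aspect ratio s as the ratio of vertical to horizontal focal length, ignores lens distortion, and uses a nearest-pixel lookup; the function names and toy values are hypothetical, not the implementation of Zhu et al. (2021).

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]

def project(point, K, R, T):
    """Return pixel coordinates (u, v) of a 3D point, or None if it lies behind the camera."""
    cam = [a + b for a, b in zip(mat_vec(R, point), T)]  # world -> camera frame
    if cam[2] <= 0:
        return None
    img = mat_vec(K, cam)                                  # homogeneous image coordinates
    return img[0] / img[2], img[1] / img[2]                # divide by the scale factor lambda

def sample_intensity(point, image, K, R, T):
    """Nearest-pixel lookup of the thermal value for one laser point (None if outside the image)."""
    uv = project(point, K, R, T)
    if uv is None:
        return None
    col, row = round(uv[0]), round(uv[1])
    if 0 <= row < len(image) and 0 <= col < len(image[0]):
        return image[row][col]
    return None

# Hypothetical intrinsics: focal length f in pixels, aspect ratio s, principal point (cx, cy).
f, s, cx, cy = 100.0, 1.0, 2.0, 2.0
K = [[f, 0.0, cx], [0.0, s * f, cy], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity rotation
T = [0.0, 0.0, 0.0]
image = [[10 * r + c for c in range(5)] for r in range(5)]  # toy 5x5 TIR image

print(project([0.0, 0.0, 10.0], K, R, T))                # (2.0, 2.0)
print(sample_intensity([0.1, 0.2, 10.0], image, K, R, T))  # 43
```

A practical implementation would also need a visibility (occlusion) check so that points hidden behind other surfaces do not receive the thermal value of the visible surface; the sketch omits it.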

Model point cloud generation
The semantic 3D building models follow the boundary representation (B-Rep) paradigm, where each modeled object has its geometric, outer-observable surface explicitly described by a set of vertices (Kolbe and Donaubauer, 2021). Moreover, each object in the model has assigned semantics and shall not overlap with other objects.
As illustrated in Figure 2, we leverage these traits to our advantage by performing object-oriented surface point sampling on a regular grid (purple) at the given sampling rate sr; the parameter is chosen according to the expected thermal point cloud density. Each sampled point inherits the semantic class label_i of a leaf object and its associated unique identifier id_i. This results in a point cloud MC whose points MC_i are extended by label_i and id_i, as in Equation 3:
$$MC_i = [\,x_i, y_i, z_i, label_i, id_i\,] \qquad (3)$$
Figure 2. Semantic LoD3 building model hierarchy comprises LoD2 (orange boxes) and LoD3 (green boxes) features, where each object has a unique identifier (id) and class (label). Our object-level sampling approach is shown on a WallSurface in purple, where lines indicate the distance (sr) between surface-sampled points. Adapted from (Wysocki et al., 2023a).
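A minimal sketch of the object-wise sampling described above, assuming each leaf surface is a planar parallelogram given by an origin and two edge vectors; real LoD3 polygons are arbitrary and would first need triangulation or a point-in-polygon test. The dictionary schema, function names, and window dimensions are hypothetical.

```python
import math

def sample_parallelogram(origin, edge_u, edge_v, sr, label, obj_id):
    """Sample the planar surface origin + a*edge_u + b*edge_v (a, b in [0, 1]) on a regular grid.

    Grid nodes are spaced sr apart along both edges, starting at origin. Every sampled
    point inherits the surface's semantic class and identifier: [x, y, z, label, id].
    """
    len_u = math.sqrt(sum(c * c for c in edge_u))
    len_v = math.sqrt(sum(c * c for c in edge_v))
    # The small epsilon keeps floating-point division (e.g. 1.2 / 0.1) from dropping a node.
    n_u = math.floor(len_u / sr + 1e-9) + 1
    n_v = math.floor(len_v / sr + 1e-9) + 1
    points = []
    for i in range(n_u):
        a = min(i * sr / len_u, 1.0)  # clamp so rounding never leaves the surface
        for j in range(n_v):
            b = min(j * sr / len_v, 1.0)
            xyz = [origin[k] + a * edge_u[k] + b * edge_v[k] for k in range(3)]
            points.append(xyz + [label, obj_id])
    return points

def sample_model(surfaces, sr):
    """Object-wise sampling of all leaf surfaces (hypothetical dict schema per surface)."""
    cloud = []
    for s in surfaces:
        cloud.extend(sample_parallelogram(s["origin"], s["u"], s["v"], sr, s["label"], s["id"]))
    return cloud

# Hypothetical 1.2 m x 1.5 m window in the x-z plane, sampled at sr = 0.1 m.
window = {"origin": (0.0, 0.0, 1.0), "u": (1.2, 0.0, 0.0), "v": (0.0, 0.0, 1.5),
          "label": "Window", "id": "win_01"}
cloud = sample_model([window], sr=0.1)
print(len(cloud))  # 13 x 16 grid nodes -> 208
```

Sampling each object separately, rather than the merged building shell, is what lets every point keep an unambiguous label and id, since LoD3 objects do not overlap.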

Co-registration
Owing to the point cloud generation step, we formulate the alignment as a cloud-to-cloud co-registration problem. In our work, only rigid transformations between the point clouds are considered, as shown in eq. 4:
$$MC_i = R \, TC_i + t \qquad (4)$$
where TC_i = coordinates of the thermal point clouds, MC_i = corresponding points in the model point clouds, R = 3 × 3 rotation matrix, and t = translation vector; together, R and t form the rigid transformation matrix T.
Considering the limited overlapping areas, the features and details of building facades are important for co-registration. Unlike LoD1 and LoD2, LoD3 building models comprise 3D facade elements (Kolbe and Donaubauer, 2021). This additional information makes it possible to locate corresponding features in the point cloud facades, especially for highly repetitive patterns. However, windows and doors may not be represented in the same way in the measured point cloud and in the point cloud sampled from the LoD3 model.
One-to-one feature correspondences cannot be guaranteed. Under this condition, we propose a coarse-to-fine registration that combines a feature-based method, Fast Global Registration (FGR) (Zhou et al., 2016), with our adapted version of plane-based ICP (Rusinkiewicz and Levoy, 2001, Wysocki et al., 2021) as fine registration.
FGR uses features to compute correspondences and estimate the transformation matrix. It first calculates the Fast Point Feature Histograms (FPFH) of the point clouds. Initial correspondences are established by nearest-neighbor matching in feature space, and the reciprocity test and tuple test are used to raise the inlier ratio of the correspondence set. Due to the differences in feature representation, noisy correspondences cannot be avoided. The pose is then optimized such that the distances between corresponding points are minimized. The objective for estimating the optimal transformation matrix T is expressed as:
$$E(T) = \sum_{(p_i, q_i) \in \mathcal{K}} \rho\left( \lVert q_i - T p_i \rVert \right) \qquad (5)$$
where p_i = point coordinates in the thermal point clouds, q_i = corresponding point coordinates in the model point clouds, 𝒦 = set of correspondences, and ρ = robust penalty. FGR uses a scaled Geman-McClure estimator as ρ to down-weight outlier correspondences, and the Black-Rangarajan duality is used to optimize eq. 5 with a line process over the correspondences. The objective thereby turns into a weighted least-squares problem, which is solved with the Gauss-Newton method.
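To illustrate how the robust penalty and the line process interact, the following Python sketch evaluates the scaled Geman-McClure estimator, the corresponding optimal line-process weight, and the Black-Rangarajan prior for a few residuals. It is not the FGR reference code; the value of mu and the residuals are hypothetical.

```python
def geman_mcclure(x, mu):
    """Scaled Geman-McClure penalty rho(x) = mu * x^2 / (mu + x^2); bounded above by mu."""
    return mu * x * x / (mu + x * x)

def line_process(x, mu):
    """Optimal line-process weight l = (mu / (mu + x^2))^2 for a residual x."""
    return (mu / (mu + x * x)) ** 2

def prior(l, mu):
    """Black-Rangarajan prior Psi(l) = mu * (sqrt(l) - 1)^2 paired with the Geman-McClure penalty."""
    return mu * (l ** 0.5 - 1.0) ** 2

mu = 0.25                        # hypothetical scale parameter, in m^2
residuals = [0.05, 0.5, 5.0]     # hypothetical distances ||q_i - T p_i|| in metres
for x in residuals:
    l = line_process(x, mu)
    # At the optimal l, l * x^2 + Psi(l) equals rho(x): the robust problem becomes weighted least squares.
    check = l * x * x + prior(l, mu)
    print(f"x={x:4.2f}  weight={l:.4f}  rho={geman_mcclure(x, mu):.4f}  check={check:.4f}")
```

Large residuals receive weights close to zero, so outlier correspondences barely influence the weighted least-squares update. In FGR, mu is additionally decreased over the iterations (graduated non-convexity).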
After the coarse registration, the alignment is refined by point-to-plane ICP. Although FGR provides an initial transformation, it is insufficient for detailed analysis because of false matches caused by the different feature representations of the source and target point clouds. We adopt the point-to-plane ICP variant (Rusinkiewicz and Levoy, 2001) together with model-based rectification (Wysocki et al., 2021) of both the height and the plane center points. Assuming that both the model and the point clouds are related to the ground, the thermal point cloud is lifted to the base height of the model; the same applies to the center points of the planes. The algorithm aligns the two point clouds while minimizing the distances between corresponding points belonging to the target and source planes. Since thermal point clouds capture only building facades, which are planar-like, the main planes of the building are extracted as the basis for registration. We perform the plane extraction in both point clouds using the RANdom SAmple Consensus (RANSAC) algorithm (Schnabel et al., 2007). Our approach assumes that the closest planes have already been detected and aligned by the coarse registration. The algorithm is initialized with the identity matrix, since the source point cloud has already been approximately aligned with the target point cloud by the transformation from the coarse registration. Subsequently, a nearest-neighbor search is conducted for each point in the source cloud to find its closest counterpart in the target cloud, and the transformation is optimized with the point-to-plane objective:
$$E(T) = \sum_{(p_i, q_i) \in \mathcal{K}} \left( (T p_i - q_i) \cdot n_{q_i} \right)^2 \qquad (6)$$
where n_{q_i} = normal of the model point q_i. Only correspondences within the maximum correspondence distance d_max are considered.
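Plane extraction can be illustrated with a basic RANSAC plane fit in Python. This is plain RANSAC, not the efficient RANSAC of Schnabel et al. (2007), which adds localized sampling and further primitive types; the threshold, iteration count, and synthetic facade points are hypothetical.

```python
import math
import random

def plane_from_points(p1, p2, p3):
    """Return (unit normal n, offset d) of the plane n.x + d = 0 through three points, or None if collinear."""
    u = [p2[k] - p1[k] for k in range(3)]
    v = [p3[k] - p1[k] for k in range(3)]
    n = [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    return n, -sum(n[k] * p1[k] for k in range(3))

def ransac_plane(points, dist_thresh, iterations=200, seed=0):
    """Basic RANSAC: keep the plane hypothesis with the most points within dist_thresh."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:          # degenerate (collinear) sample
            continue
        n, d = plane
        inliers = [i for i, p in enumerate(points)
                   if abs(sum(n[k] * p[k] for k in range(3)) + d) <= dist_thresh]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers

# Synthetic facade on the plane y = 0 plus ten clutter points in front of it.
facade = [(float(x), 0.0, float(z)) for x in range(10) for z in range(10)]
clutter = [(float(i), 3.0 + 0.1 * i, 0.5) for i in range(10)]
(normal, d), inliers = ransac_plane(facade + clutter, dist_thresh=0.05)
print(len(inliers), normal)
```

On real facades, the procedure would be repeated after removing the inliers of each detected plane to obtain several main planes.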
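Convergence is governed by two criteria: the RMSE threshold t_rmse and the number of iterations t_it, where the root mean squared error (RMSE) is
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \lVert q_i - T p_i \rVert^2} \qquad (7)$$
and N = number of correspondences.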
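The RMSE and fitness used in the evaluation can be sketched as follows, assuming the Open3D convention: fitness is the share of source points that have a target neighbor within the distance threshold, and the RMSE is computed over those inlier correspondences only. Nearest neighbors are found by brute force for clarity; the function name and toy points are hypothetical.

```python
import math

def evaluate_alignment(source, target, max_dist):
    """Return (fitness, rmse) of an aligned source cloud against a target cloud.

    fitness: fraction of source points whose nearest target point lies within max_dist.
    rmse: root mean squared nearest-neighbor distance over those inlier points only.
    """
    inlier_d = []
    for p in source:
        d = min(math.dist(p, q) for q in target)  # brute-force nearest neighbor
        if d <= max_dist:
            inlier_d.append(d)
    if not inlier_d:
        return 0.0, 0.0
    fitness = len(inlier_d) / len(source)
    rmse = math.sqrt(sum(d * d for d in inlier_d) / len(inlier_d))
    return fitness, rmse

target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
source = [(0.0, 0.0, 0.3), (1.0, 0.0, 0.4), (5.0, 0.0, 0.0)]
fitness, rmse = evaluate_alignment(source, target, max_dist=2.0)
print(f"fitness={fitness:.3f}, rmse={rmse:.3f}")  # fitness=0.667, rmse=0.354
```

Because the RMSE ignores points beyond the threshold, the two numbers should be read together: a low RMSE with low fitness can simply mean that few points were matched.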
Upon meeting these conditions, the algorithm concludes, providing the final alignment result in the form of a transformation matrix.The updated transformation parameters are used to refine the position of the source point cloud.

Semantic enrichment
In Section 3.2, the transformation matrix is calculated to register the thermal point clouds to the model point clouds. Applying the estimated transformation matrix yields georeferenced coordinates for the thermal point clouds, which are then aligned with the model point cloud. Assuming that the building details have not changed between the laser scan and the model, the semantic labels of windows and doors remain valid; therefore, the points in the thermal point clouds should carry the same labels as the corresponding points in the model point clouds. Because the two clouds differ in sampling rate and point locations, a distance threshold is set to minimize false correspondences. For each point in the thermal point clouds, the closest point in the model point cloud is found, and its label is assigned to the thermal point. If no labeled model point lies within the threshold, the thermal point is regarded as noise and labeled "unlabeled". This avoids assigning building labels to other objects, such as trees and pedestrians, while keeping the labels for the buildings.
The test site is around the main campus of the Technical University of Munich (TUM). The thermal point clouds were generated by combining thermal image sequences and MLS point clouds from the TUM-MLS dataset (Zhu et al., 2020). The TUM-MLS dataset was measured using a mobile platform that includes two laser scanners and a thermal camera. The poses of the TIR images were estimated from the GNSS and inertial measurement unit (IMU) system of the integrated platform. The LoD3 model was selected from the TUM2TWIN dataset³. We selected one building from the test dataset close to the main gate, characterized by partial coverage (the so-called building 23), as shown in Figure 3. We set the sampling rate to sr = 0.1 m, which resulted in a uniformly sampled point cloud with 0.1 m point spacing. The generation of thermal point clouds and the semantic labeling were implemented in C++ with the PCL library (1.81) (Rusu and Cousins, 2011). On a machine with 32 GB RAM and an i7-6000 CPU @ 3.4 GHz, labeling takes approximately 346.06 s. FGR was performed using the code from (Zhou et al., 2016). Further experiments were performed using the Feature Manipulation Engine (FME) version 2020.01 and Open3D (Zhou et al., 2018). The implementation is available in a public repository⁴.
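The label transfer of the semantic enrichment step can be sketched as a thresholded nearest-neighbor lookup. The sketch applies a rigid transformation x' = R x + t to the thermal points and then copies the label and id of the nearest model point (rows [x, y, z, label, id] as in Equation 3), or assigns "unlabeled" beyond the threshold. It uses brute-force search instead of a k-d tree, and all names and values are hypothetical.

```python
import math

def apply_rigid(points, R, t):
    """Apply x' = R x + t to each 3D point (R: 3x3 rotation as list of rows, t: 3-vector)."""
    return [[sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3)] for p in points]

def transfer_labels(thermal_xyz, model_points, max_dist):
    """Assign each thermal point the label and id of its nearest model point within max_dist.

    model_points rows are [x, y, z, label, id]; thermal points without a model point
    inside max_dist (e.g. on trees or pedestrians) are labeled "unlabeled".
    """
    result = []
    for p in thermal_xyz:
        best, best_d = None, math.inf
        for m in model_points:            # brute force; a k-d tree would be used at scale
            d = math.dist(p, m[:3])
            if d < best_d:
                best, best_d = m, d
        if best is not None and best_d <= max_dist:
            result.append((p, best[3], best[4]))
        else:
            result.append((p, "unlabeled", None))
    return result

# Hypothetical example: two model points and three thermal points shifted by -5 m in x.
model = [[0.0, 0.0, 0.0, "WallSurface", "wall_1"], [1.0, 0.0, 0.0, "Window", "win_1"]]
thermal = [[-5.0, -0.02, 0.0], [-4.0, 0.03, 0.0], [-5.0, 3.0, 0.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [5.0, 0.0, 0.0]
labelled = transfer_labels(apply_rigid(thermal, R, t), model, max_dist=0.2)
print([label for _, label, _ in labelled])  # ['WallSurface', 'Window', 'unlabeled']
```

The threshold should be on the order of the model sampling rate sr plus the remaining registration error; too small a value leaves facade points unlabeled, too large a value lets nearby clutter inherit building labels.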

Result and Discussion
To evaluate the co-registration, we calculate the RMSE (eq. 7) and the fitness (eq. 8) with a correspondence threshold of 2 m. As Table 1 shows, our fine registration yields a substantial improvement: for the tested sample, the RMSE decreased about 4.4-fold (from 1.46 m to 0.33 m), while the fitness improved by about 63% (from 0.54 to 0.88). To further validate the result, a comparison experiment was conducted by manually selecting corresponding points and estimating the transformation matrix from them. Six corresponding points were manually selected in the model point cloud and the thermal point cloud and transformed with the estimated matrix (Figure 6). The RMSE between the ground truth and the fine registration was 0.4 m. In terms of RMSE and fitness, our proposed method achieved accuracy comparable to the manual reference (0.33 m in both cases) and a slightly higher fitness (0.88 vs. 0.87) (Table 1).
Table 1. The co-registration results for our fine registration approach (↑ indicates the more the better, ↓ otherwise).
Figure 8. Semantic analysis of thermal properties and the result.
The average intensity and standard deviation for each class are calculated.
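The per-class statistics can be computed with a simple group-by over the labeled points. This Python sketch assumes each point is given as a (label, intensity) pair and uses the population standard deviation, which the text does not specify; the intensity values are hypothetical.

```python
import statistics
from collections import defaultdict

def class_statistics(labelled_points):
    """Return {label: (mean intensity, population std)} for (label, intensity) pairs."""
    groups = defaultdict(list)
    for label, intensity in labelled_points:
        groups[label].append(intensity)
    return {label: (statistics.fmean(vals), statistics.pstdev(vals))
            for label, vals in groups.items()}

points = [("Window", 30.0), ("Window", 34.0), ("WallSurface", 20.0), ("WallSurface", 20.0)]
print(class_statistics(points))  # {'Window': (32.0, 2.0), 'WallSurface': (20.0, 0.0)}
```

Points labeled "unlabeled" form their own group here; excluding them before computing the statistics keeps clutter from affecting the building classes.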

Conclusion
This work aims to transfer the semantic information from the LoD3 model to the corresponding thermal point clouds. The general workflow is shown in Figure 1. First, point clouds are generated from the laser point clouds and TIR image sequence, and from the LoD3 model, respectively. Then, the thermal point clouds are aligned to the model point clouds to obtain the transformation matrix. After registration, the semantic information from the model point clouds is transferred to the thermal point clouds for analysis.

Figure 1. The general workflow.

Figure 3. (a) Original MLS point cloud and (b) LoD3 model.
Figure 3(a) shows an example of the TUM-MLS point cloud with intensity, and (b) shows the LoD3 building model.
The generated thermal point cloud and model point cloud are shown in Figure 4. The thermal point clouds (Figure 4(a)) include building facades and other objects in the TIR images, such as traffic lights, pedestrians, and vehicles. They show the geometry of building elements, including windows of different shapes; moldings, balconies, and special decorations are also recorded as they are. However, the rooftop and some corners (e.g., the upper right corner) are missing due to the scanning mode and height limitations. Compared to the raw point clouds, thermal point clouds keep the original geometric features while attaching thermal attributes as point intensities. The different intensities represent temperature differences and can reveal inner structures such as heating pipes. The higher intensities around windows indicate wooden frames and some indoor rooms with higher temperatures. The model point clouds (Figure 4(b)) generated from the LoD3 model encode the semantic classes, shown in different colors for windows, doors, walls, roof, and ground. Unlike laser point clouds, in which the laser penetrates the window glass and leaves empty spaces, the model point clouds fill the window areas with recessed planes. Moreover, all functional elements are labeled, but non-functional decorations are simplified compared to the laser point clouds.

Figure 5. Registration result: (a) transformation after FGR and (b) after fine registration. Target: orange, model point cloud; source: blue, thermal point cloud.

Figure 7. Semantically enriched thermal point clouds.

Method (vs. GT LoD3)        Fitness ↑   RMSE (m) ↓
FGR registration            0.54        1.46
Fine registration (ours)    0.88        0.33
Manual registration         0.87        0.33

In this paper, we propose a feasible workflow to enrich a thermal point cloud with semantic information given an LoD3 model. The proposed method converts the LoD3 model to a point cloud and registers the thermal point cloud to it through point cloud co-registration. With the proposed coarse-to-fine registration, thermal point clouds can be registered to semantic model point clouds despite limited overlap and differences in feature representation. The co-registration results have accuracy comparable to the manually registered reference. Finally, the semantic labels from the LoD3 model are assigned to the thermal point clouds for analysis. The workflow is not limited to thermal point clouds; it also applies to other co-registration tasks between laser point clouds and building models that require semantic label transfer. The enriched results can speed up point cloud labeling by providing prior knowledge and enhance the efficiency of semantic data generation. Thermal point clouds with LoD3 labels can serve as supporting data for further urban studies and algorithm development, such as testing and training deep learning models. Given the image poses, the labels can be back-projected to the TIR images for processing and supporting analysis, as shown in Figure 9. In this work, we use point cloud geometry for thermal anomaly interpretation, but a bi-directional information exchange can also be pursued: the thermal properties might be mapped onto the LoD3 objects, enriching the LoD3 model with radiometric thermal features for visualization and building operation monitoring (Biswanath et al., 2023).

Figure 9. Semantically enriched TIR image. (a) Original TIR image. (b) Semantically enriched TIR image.

Future work should further investigate robust methods for model-to-point-cloud co-registration, especially on large-scale datasets. Although this work addresses co-registration and semantic enrichment for a single building, improving efficiency remains to be explored. How to exploit the features of both the laser point clouds and the LoD3 model while minimizing the effect of their different feature representations is still an open problem. The co-registration results can help enrich and fuse information from different datasets, or localize and compare changes with respect to other models or time stamps.