Semantic Enrichment of Indoor Point Clouds: An Overview of Progress towards Digital Twinning

This paper presents an approach towards the development of a service-oriented platform for semantic enrichment of indoor point clouds. It mainly focuses on integrated methods for the capture of as-is 3D point clouds using commodity mobile hardware, classification of point cloud clusters using a multiview-based method, geometric reconstruction of room boundaries, interactive 3D visualization, sensor data visualization, and tracking of spatial changes and user annotations via a secure ledger. Implementing the methods in a prototypical web-based application, we demonstrate our approach for the semantic enrichment of indoor point clouds and the generation of base data for Digital Twin representation.


INTRODUCTION
One of the key challenges in modern Facility Management (FM) is to digitally represent the current state of the built environment, referred to as as-is (or as-built) versus as-designed (Kensek 2015). While the use of Building Information Modelling (BIM) can address the issue of digital representation, the generation and maintenance of BIM data requires a considerable amount of manual work and domain expertise (Volk et al. 2014). Another key challenge is to enable users to monitor and forecast the state of the built environment, especially if digital documentation is integrated with real-time or historic data that is used to provide feedback and enhance decision making (Clauß et al. 2014). These challenges create a demand for automated generation and updating of digital representations of indoor areas, using methods that suit the modern Internet of Things (IoT) environment. The need for integrated solutions is becoming more pronounced as practices from Industry 4.0 are currently being evaluated and adopted for FM use (Teizer et al. 2017; Lasi et al. 2014).

Concept of Digital Twins
According to (Grieves 2014), a Digital Twin (DT) is a digital duplicate of the physical environment, states, and processes. A DT representation fuses as-designed and as-is physical representations with additional information layers pertaining to the current and forecasted states of an indoor environment or complete buildings (Posada et al. 2018). 3D point clouds can be used to capture the current physical state of the built environment, as they record important and intricate details of the as-is physical environment (Qu and Sun 2015).
Draft - eCAADe 37 / SIGraDi 23
A 3D point cloud consists of unstructured, non-interpreted data, that is, data that is open to visual interpretation but does not have any semantics associated with it. For any further representations and automated assessment, the 3D point cloud needs to be processed in order to generate useful semantics (Bassier et al. 2016). These semantically enriched indoor 3D point clouds are used as base data for DTs, as they capture the physical attributes of the built environment and can be used alongside other data (e.g., sensor data) for further stakeholder engagement, decision making, and forecasting.

Approach and Research Contributions
The current use of BIM within the FM sector still requires a large amount of manual work and domain expertise in Architecture, Engineering and Construction (AEC) topics in order to make use of the full potential of the digitization practices offered by BIM. In contrast, advancements are being made in the field of Digital Twins and Industry 4.0, specifically concerning cyber-physical representations of as-is built environments for domain-specific decision making (Parrot and Lane 2017).
The presented research focuses on the design, development, and experimental testing of the main components of a service-oriented platform used to semantically enrich indoor 3D point clouds. Extending our previous work (Stojanovic et al. 2018b), this paper contributes an overview of current state-of-the-art approaches related to DT representations, and details of approaches and methods for semantic enrichment of indoor point clouds for potential use as DT representations.

Point Cloud Acquisition
The use of commodity mobile devices has become a viable option for the acquisition of point clouds. Photogrammetry-based processing methods are implemented on mobile devices and allow a sequence of captured images to be converted into a point cloud representation. According to (Froehlich et al. 2017), consumer mobile devices with depth sensing cameras provide an affordable, flexible, and effective solution for capturing 3D point clouds of interiors in comparison to more expensive portable 3D scanning devices. Commodity mobile devices can also be used to provide Augmented Reality (AR) representations of indoor environments to enhance decision making amongst stakeholders (Riedlinger et al. 2019).

Point Cloud Processing and 3D Reconstruction
Before semantics can be added to the point cloud, the point cloud dataset has to be segmented in order to divide and mark the homogeneous regions of point clusters, which allows for quicker identification of physical features (Nguyen and Le 2013). After segmentation, the normal vectors for each of the point clusters need to be pre-computed. Normal vectors are important for tasks such as geometric surface reconstruction. Computation of point normals can be accomplished by analyzing the local neighborhood of a point (Mitra and Nguyen 2003), where the normal vector is oriented according to the neighboring points. The normal at a point can then be estimated from the covariance matrix of its k-nearest neighbors (KNN) and the corresponding eigenvectors and eigenvalues (Hoppe et al. 1992).
Most of the current point cloud reconstruction methods for BIM applications are based on a semi-automatic approach. These are able to detect important structural features such as walls, floors, and ceilings (Macher et al. 2017). The visualization of 3D point clouds fulfils the requirement for up-to-date representations of built environments, and can additionally be used to generate floor plans based on room-boundary evaluations (Stojanovic et al. 2019a).
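The covariance-based normal estimation described above can be illustrated with a minimal sketch. It assumes NumPy, a small point set (the neighbor search is brute force), and a hypothetical function name `estimate_normals`; it is not the implementation used in the cited works. The normal is taken as the eigenvector belonging to the smallest eigenvalue of the neighborhood covariance matrix, and its sign is left unresolved.

```python
import numpy as np


def estimate_normals(points: np.ndarray, k: int = 10) -> np.ndarray:
    """Estimate one unit normal per point from the covariance of its k nearest neighbors."""
    n = points.shape[0]
    normals = np.empty((n, 3))
    # Brute-force pairwise squared distances; only suitable for small examples.
    diffs = points[:, None, :] - points[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diffs, diffs)
    for i in range(n):
        # Indices of the k closest points (the point itself is included).
        idx = np.argsort(dist2[i])[:k]
        neighborhood = points[idx]
        centered = neighborhood - neighborhood.mean(axis=0)
        covariance = centered.T @ centered / k
        # eigh returns eigenvalues in ascending order for symmetric matrices.
        _, eigenvectors = np.linalg.eigh(covariance)
        # The direction of least variance approximates the surface normal.
        normals[i] = eigenvectors[:, 0]
    return normals


# Usage: points sampled on the plane z = 0 should yield normals along the z-axis.
rng = np.random.default_rng(0)
plane = np.column_stack([rng.uniform(0, 1, 200), rng.uniform(0, 1, 200), np.zeros(200)])
plane_normals = estimate_normals(plane, k=10)
```

For larger scans, a spatial index (e.g., a k-d tree) would normally replace the brute-force distance matrix, and a consistent normal orientation would have to be propagated separately.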

Semantic Enrichment using Deep Learning
Segmented point clusters of a 3D point cloud can be classified using a Convolutional Neural Network (CNN) (Ioannidou et al. 2017), where the CNN can be used to recognize 2D or 3D spatial and visual features of a given point cluster. Training data used to train a CNN can either be 3D point cloud clusters, geometric approximations of point regions (e.g., voxels), or 2D images of specific 3D objects and real-life photographs of their counterparts. Our approach focuses on classification of 2D raster images of 3D point clusters, known as multi-view classification (Stojanovic et al. 2019b). These generated multiview images can be classified in a service-oriented manner, and the classification results can be streamed back and associated with each point cluster of the indoor 3D point cloud (Stojanovic et al. 2018a). This allows for automated semantic enrichment of indoor point cloud data, which in turn enables the point clouds to be used as base data for as-is BIM or DT representations.

Interactive Visualization Techniques for Facility Management
The use of interactive 3D visualization can benefit FM stakeholder engagement, information sharing, and collaboration by allowing real-time display and analysis of 2D and 3D visual outputs generated from the acquired data sources (Khalid et al. 2017, pp. 93-104). Using modern computer graphics rendering approaches, complex visualizations can be presented in real-time on various commodity hardware configurations.
With the support of the WebGL 3D graphics API in most modern HTML5-compliant web browsers, it is possible to visualize various models and generated outputs for AEC applications in real-time 3D. A number of programming frameworks for web-based 3D visualization provide real-time 3D features such as model loading in different file formats, scene navigation, 3D data structures, and GPU-based rendering (Discher et al. 2018).

Change Tracking using Secure Ledgers
Another important requirement is to enable users to visualize, annotate, and track user-made changes of the DT representation in real-time. (Shen and Pena-Mora 2018) provide a detailed literature review of potential blockchain applications for smart cities. Given the growing popularity of blockchains as immutable and dynamic data structures for Industry 4.0 and built environment applications (Li et al. 2018), our approach applies them as secure ledgers for storing transactional data without proof-of-work.

Indoor Sensor Data Acquisition and Visualization
The use of various hardware sensors for monitoring indoor environments can greatly benefit FM by providing important insight into, e.g., energy consumption, space usage, and health factors of indoor environments. Visualization of such sensor data can help FM stakeholders to determine the appropriate allocation of energy resources for building occupancy comfort and user health. Visualization approaches can make use of thematic color mapping of segmented indoor areas (Patlakas et al. 2017), textual information displayed alongside the 3D model (Pouke et al. 2018), abstracted 3D visual analytics such as 3D bar graphs (Virtanen et al. 2016), and combined 2D data analytics results with spatially corresponding 3D scene markers for linked visualization of energy-related building data (Sihombing and Coors 2018).

Service-Oriented Architecture Implementation
The use of a service-oriented approach allows for the processing of complex visualization scenarios to be performed on a remote server, without relying on the platform capacity of the client device. Some client devices may be of an older generation and may not be able to process complex data in real-time using their native hardware; visual data can therefore be pre-processed and streamed in real-time from a server to commodity hardware clients using a service-oriented architecture (Hildebrandt et al. 2011). (Franz et al. 2018) described the development and application of a service-oriented platform for collaborative reconstruction of indoor point cloud scenes captured using commodity mobile 3D scanning devices, focusing on the use of their approach for crime scene documentation. (Qiao et al. 2019) provide an overview of the state-of-the-art for AR via web-based applications and service-oriented architectures, and they describe how the use of lightweight web-based software components can benefit the application of AR for practical visualization scenarios.

APPROACH AND METHODS
We present an approach for the acquisition, processing, and semantic enrichment of indoor point clouds (Fig. 1). Our approach allows for light-weight and flexible service-oriented system components to be integrated for automated semantic enrichment, visualization, and change tracking. We have implemented the presented approach and methods within a prototypical service-oriented web-application. The implemented components can be used by FM operators for inspection, annotation, and task allocation using the semantically enriched indoor point cloud representations. Each of the implemented components addresses the requirements stated as the eight related work topics: point cloud acquisition, point cloud processing, 3D reconstruction, semantic enrichment, interactive visualization, change tracking, sensor data acquisition and visualization, and service-oriented architecture implementation. We describe how we implemented each of these components within our prototypical service-oriented web-application.

Point Cloud Acquisition
We focus on acquisition of indoor point clouds using commodity mobile devices. For our approach we made use of a mobile phone with a depth sensing camera in order to acquire point cloud scans of indoor spaces. The photogrammetry method used by such devices works by comparing the depth and colour values of sequentially captured image frames. While this approach is practical and affordable in comparison to using professional-grade scanning devices, it is limited in precision, and the visual quality of the capture is affected by the indoor lighting conditions. These limitations often require multiple scans of the same room to be taken during optimal daylight conditions and then merged together during later pre-processing stages.
The resolution and visual quality of the acquired point cloud is suitable for further classification and semantic enrichment, though smaller objects such as light switches and power sockets are not adequately captured due to the coarseness of the scans. This can be mitigated by additional user annotation during the inspection stages of the semantically enriched indoor point cloud.
Additional data such as floor plans, historic and current data (e.g., sensor data), as well as access credentials, can be obtained for the corresponding indoor location at this stage and included with the data acquisition package. All of this data should preferably be uploaded to a secure server which is accessible to the client application.

Point Cloud Pre-Processing
Pre-processing of acquired point clouds is required in order to optimize the point cloud and introduce additional data required for semantic enrichment. It consists of the following steps: (1) spatial alignment/registration, (2) automated and/or manual removal of noise and clutter artifacts, (3) computation of normal vectors, and (4) automated and/or manual segmentation of homogeneous spatial regions.
First, the point cloud needs to be registered. This can be based on indoor GPS longitude and latitude coordinates, or on alignment with an existing floor plan overlay. In either case, the coordinate system needs to be transferred and applied as Cartesian 3D coordinates, which is required for projecting the given point clusters in 3D space. Second, the point cloud should be inspected and cleaned of any noise or undesired clutter that may have been captured during acquisition. Removal of point cloud noise and clutter can be accomplished by manually selecting point cluster regions and deleting them, or by using a statistical method to detect outlier points. Third, the normal vectors are computed for the entire registered and cleaned point cloud. Fourth, the point cloud should be segmented into homogeneous spatial regions (this can be based on the region, color, or distribution of points).
Finally, the approximation of floor plan boundaries can also be evaluated from vertically segmented point clusters (Stojanovic et al. 2019a), using concave boundary line approximation and clustering methods for detection of secondary boundary regions (e.g., smaller closed concave shapes within a larger primary concave shape).
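The statistical outlier detection mentioned for the cleaning step can be sketched as follows. This is one common variant (mean distance to the k nearest neighbors, thresholded at the global mean plus a multiple of the standard deviation), chosen here for illustration; the function name `remove_statistical_outliers` and the parameters `k` and `std_ratio` are assumptions, and the paper does not state which statistical method was used.

```python
import numpy as np


def remove_statistical_outliers(points: np.ndarray, k: int = 8, std_ratio: float = 2.0):
    """Drop points whose mean k-nearest-neighbor distance is unusually large.

    Returns the filtered points and the boolean keep-mask.
    """
    # Brute-force pairwise distances; only suitable for small examples.
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt(np.einsum("ijk,ijk->ij", diffs, diffs))
    # After sorting each row, column 0 is the zero distance to the point itself.
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]
    mean_dist = knn_dist.mean(axis=1)
    # Points far above the global average neighbor distance are treated as noise.
    threshold = mean_dist.mean() + std_ratio * mean_dist.std()
    keep = mean_dist <= threshold
    return points[keep], keep


# Usage: a dense cluster plus three stray points far away from it.
rng = np.random.default_rng(0)
cluster = rng.uniform(0.0, 1.0, size=(200, 3))
strays = np.array([[50.0, 50.0, 50.0], [-40.0, 10.0, 30.0], [0.0, 90.0, 0.0]])
cleaned, mask = remove_statistical_outliers(np.vstack([cluster, strays]))
```

The threshold is global, so the sketch works best when point density is roughly uniform; scans with strongly varying density may need a local criterion or manual inspection, as noted in the text.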

Semantic Enrichment of Indoor Point Clouds
Semantic enrichment of indoor point clouds can be accomplished using image-classification methods (based on deep-learning principles). In practice, this works by generating images of point cloud clusters, classifying them, and associating the image classification result with the corresponding point cluster, thus semantically enriching it. Prior to generating images of point clusters, the given indoor point cloud scene needs to be discretized into spatial partitions. The spatial partitioning can be based on a uniform partitioning method such as octree-based partitioning (Stojanovic et al. 2018a), or on data clustering methods (Stojanovic et al. 2019b). Non-uniform spatial partitioning approaches are particularly useful for classifying indoor point cloud scenes that contain furniture items, which may be spatially cluttered, occluded, or only partially scanned.
The generated images from each spatially partitioned cluster can then be classified using a retrained CNN. Training a CNN usually requires provisioning of hundreds to thousands of reference photographs or pictures of objects for specific categories, which the CNN is trained to recognize. In terms of furniture, we classify common office furniture (e.g., chairs, sofas, and tables).
Training a CNN can be a time-consuming process, thus it can be viable to retrain only the last layer of a network, known as the bottleneck layer, with new object categories. The tuning of retraining parameters largely depends on the amount of image data available for training, as well as the adjustment of hyper-parameters such as the learning rate of a CNN model. Once the CNN model has been retrained, it can be used to classify images of given objects (in our case these are images of RGB point clusters). Classification of image data for 3D objects, rather than using the 3D geometry data itself for classification, has the advantage of dealing only with images in terms of data bandwidth, thus making it more suitable for light-weight service-oriented systems integration (e.g., images can be compressed and original geometry data is not required to be transmitted to the server).
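To make the uniform partitioning step more concrete, the following is a minimal sketch of octree-based partitioning with NumPy. The function name `octree_leaves` and the stopping parameters `max_points` and `max_depth` are assumptions for illustration; the cited approach may use different subdivision criteria. Each returned leaf is an array of point indices, which could then be rendered into images for classification.

```python
import numpy as np


def octree_leaves(points: np.ndarray, max_points: int = 64, max_depth: int = 6):
    """Return a list of index arrays, one per non-empty octree leaf."""
    leaves = []

    def split(idx, lo, hi, depth):
        if idx.size == 0:
            return  # empty octants are not stored
        if idx.size <= max_points or depth == max_depth:
            leaves.append(idx)
            return
        center = (lo + hi) / 2.0
        # Encode the octant of each point as a 3-bit code, one bit per axis.
        above = points[idx] >= center
        codes = above[:, 0] * 1 + above[:, 1] * 2 + above[:, 2] * 4
        for code in range(8):
            bits = np.array([code & 1, (code >> 1) & 1, (code >> 2) & 1], dtype=bool)
            child_lo = np.where(bits, center, lo)
            child_hi = np.where(bits, hi, center)
            split(idx[codes == code], child_lo, child_hi, depth + 1)

    # The root node is the axis-aligned bounding box of the whole scene.
    split(np.arange(points.shape[0]), points.min(axis=0), points.max(axis=0), 0)
    return leaves


# Usage: partition a random scene and inspect how many leaves were produced.
rng = np.random.default_rng(0)
scene = rng.uniform(0.0, 1.0, size=(1000, 3))
partitions = octree_leaves(scene, max_points=64, max_depth=6)
```

Because the subdivision is driven by the bounding box rather than by object boundaries, a single piece of furniture can be split across several leaves, which is one reason the clustering-based alternative is attractive for cluttered scenes.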
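The idea of retraining only the final layer can be sketched without a deep learning framework. In the example below, the feature vectors stand in for the outputs of a frozen, pretrained CNN (they are synthetic here, which is an assumption made purely for illustration), and only a softmax output layer is trained on top of them with gradient descent. The function names, the class list, and the hyper-parameter values are hypothetical and do not reflect the authors' training setup.

```python
import numpy as np


def train_last_layer(features, labels, num_classes, learning_rate=0.1, epochs=200):
    """Train a softmax output layer on fixed feature vectors (the rest of the network stays frozen)."""
    n, d = features.shape
    weights = np.zeros((d, num_classes))
    bias = np.zeros(num_classes)
    one_hot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ weights + bias
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient of the mean cross-entropy loss with respect to the logits.
        grad = (probs - one_hot) / n
        weights -= learning_rate * (features.T @ grad)
        bias -= learning_rate * grad.sum(axis=0)
    return weights, bias


def predict(features, weights, bias):
    """Return the index of the most likely class for each feature vector."""
    return np.argmax(features @ weights + bias, axis=1)


# Usage with synthetic stand-ins for features of three furniture categories.
class_names = ["chair", "sofa", "table"]
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=300)
centers = np.zeros((3, 12))
for c in range(3):
    centers[c, c * 4:(c + 1) * 4] = 2.0  # each class occupies its own feature block
features = centers[labels] + rng.normal(0.0, 0.5, size=(300, 12))
weights, bias = train_last_layer(features, labels, num_classes=3)
accuracy = float((predict(features, weights, bias) == labels).mean())
```

Since the frozen features can be computed once and cached, each retraining run only touches the small weight matrix, which is what makes this kind of retraining fast compared with training the full network.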

Service-Oriented System Architecture Overview
The presented approach is implemented as components within a web-based application, designed using a service-oriented architecture. The web-based application is composed of a web-based client application and a server application responsible for the majority of the computational tasks. The client application is written in JavaScript, while the majority of the server application is also written in JavaScript using Node.js [3], with additional components used for classification and image-processing tasks written in Python. The client/server model used by Node.js is based on the echo-server architecture, and the server is implemented as an express echo server using the Socket.io API [2]. Additionally, the secure ledger and sensor data integration are based on the standard HTTP API (using POST and GET operations).
Multiview Classification. We use two main multiview classification methods, both based on generation of multiple images from spatially partitioned 3D point clusters. The first method is based on octree partitioning of the scene, and the generation of cubemap images of point clusters from each octree node (Stojanovic et al. 2018a). The second method is based on non-uniform cluster evaluation using either k-means or DBSCAN clustering, as well as the selection of optimal viewpoints based on visual entropy for the generation of multiview classification images (Stojanovic et al. 2019b). Both methods are suitable for classifying indoor objects, such as office furniture in indoor point cloud scenes.
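Entropy-based viewpoint selection can be illustrated with a small sketch. It assumes that each candidate view has already been rendered to an 8-bit grayscale image, and it uses the Shannon entropy of the intensity histogram as a stand-in measure; the exact definition of visual entropy in the cited work may differ. The function names `image_entropy` and `select_best_view` are hypothetical.

```python
import numpy as np


def image_entropy(image: np.ndarray) -> float:
    """Shannon entropy (in bits) of the intensity histogram of an 8-bit image."""
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())


def select_best_view(views):
    """Return the index of the view with the highest entropy, plus all scores."""
    scores = [image_entropy(view) for view in views]
    return int(np.argmax(scores)), scores


# Usage: an empty view, a two-tone view, and a detailed view of the same cluster.
rng = np.random.default_rng(0)
blank = np.zeros((64, 64), dtype=np.uint8)
two_tone = np.zeros((64, 64), dtype=np.uint8)
two_tone[:, 32:] = 255
detailed = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
best_index, view_scores = select_best_view([blank, two_tone, detailed])
```

The intuition is that a view showing mostly background carries little information for the classifier, whereas a view exposing more of the cluster's visible structure yields a richer intensity distribution.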
Sensor Data Visualization. Sensor data plays an important role in assessing the current state of the built environment, by providing visual analytics related to temporal changes of data in multiple dimensions. Such data may include occupancy comfort criteria such as room temperature or ambient noise levels, or important energy efficiency data such as power usage and carbon emissions. Affordable sensors can collect room temperature and humidity readings. Such real-time or previously collected sensor data (e.g., historic sensor data) can then be visualized in combination with the as-is digital representation of the indoor environment (e.g., BIM geometry or indoor point cloud models). The sensor data can be transmitted using the same client-server architecture as used for the visualization and classification components. We use filtered JSON data, which is then parsed and visualized for each timepoint. The client visualization system can then map these values using different visualization methods for different assessment scenarios (Fig. 2). This allows for the application of visual analytics to enhance decision making and forecasting using quantifiable attributes of the built environment.
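As an illustration of parsing JSON sensor data and mapping it to colors per timepoint, the sketch below converts temperature readings into a blue-to-red color that could be applied to the point cluster of the corresponding room. The JSON field names (`timestamp`, `room`, `temperature`), the value range, and the function names are assumptions; the actual data schema of the prototype is not given in the paper.

```python
import json


def temperature_to_rgb(value, t_min=15.0, t_max=30.0):
    """Map a temperature to a color between blue (cold) and red (warm)."""
    t = (value - t_min) / (t_max - t_min)
    t = min(1.0, max(0.0, t))  # clamp readings outside the expected range
    return (int(round(255 * t)), 0, int(round(255 * (1.0 - t))))


def colors_per_timepoint(payload: str):
    """Parse a JSON array of readings and return (timestamp, room, rgb) tuples."""
    readings = json.loads(payload)
    return [
        (r["timestamp"], r["room"], temperature_to_rgb(r["temperature"]))
        for r in readings
    ]


# Usage with a hypothetical, already filtered payload for one office.
sample_payload = json.dumps([
    {"timestamp": "2019-03-01T09:00:00", "room": "office_1", "temperature": 15.0},
    {"timestamp": "2019-03-01T12:00:00", "room": "office_1", "temperature": 30.0},
    {"timestamp": "2019-03-01T15:00:00", "room": "office_1", "temperature": 41.5},
])
timeline = colors_per_timepoint(sample_payload)
```

A linear two-color ramp is the simplest possible thematic mapping; other assessment scenarios might call for perceptually uniform color scales or discrete comfort bands.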
User Annotation and Secure Ledger Spatial Tracking of User Changes. Direct annotation of semantically enriched point clouds enables domain expert users to inspect, modify, and add notes to given indoor point cloud scenes (Fig. 3). We have experimented with several annotation features, namely: interactive selection and transformation of point clusters, textual annotation, and spatial measurement. All annotations initiated by the users are recorded in the secure ledger for the purpose of generating immutable digital documentation (retaining user-centric and user-driven decisions). The use of these simple visual analytics enables users to communicate important suggestions and recommended changes to the spatial layouts of indoor spaces represented by semantically enriched point clouds. The annotation visualization component and the associated annotation methods are implemented within the client-side web-application.
Furthermore, the use of secure ledgers allows for tracking of user-initiated changes and annotations to a given 3D point cloud scene. The implemented secure ledger component is based on blockchain technology, except that it does not use any proof-of-work mechanism. Rather, any changes to the 3D point cloud scene, such as spatial changes, are recorded within an immutable block with a unique hash code. The hash code is computed for each block when an annotation action is initiated by the user inspecting the 3D scene. The hash code for a current block is then compared to the hash code of the previous block, ensuring the validity of the blockchain. This way, any agreed changes can be communicated by stakeholders and referenced as immutable documentation.
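A hash-chained ledger without proof-of-work can be sketched in a few lines. The example below is a minimal illustration using SHA-256 from the Python standard library; the class name `AnnotationLedger`, the block fields, and the payload structure are assumptions and do not describe the prototype's actual ledger component. Each block stores the hash of its predecessor, so validation recomputes every hash and checks every link.

```python
import hashlib
import json


def block_hash(index, previous_hash, payload):
    """Compute a SHA-256 hash over the block contents (deterministic via sorted keys)."""
    body = json.dumps(
        {"index": index, "previous_hash": previous_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class AnnotationLedger:
    """Append-only chain of annotation records; no proof-of-work is involved."""

    GENESIS_PREVIOUS = "0" * 64

    def __init__(self):
        self.blocks = []
        self.append({"action": "genesis"})

    def append(self, payload):
        index = len(self.blocks)
        previous_hash = self.blocks[-1]["hash"] if self.blocks else self.GENESIS_PREVIOUS
        block = {
            "index": index,
            "previous_hash": previous_hash,
            "payload": payload,
            "hash": block_hash(index, previous_hash, payload),
        }
        self.blocks.append(block)
        return block

    def is_valid(self):
        for i, block in enumerate(self.blocks):
            expected_previous = self.blocks[i - 1]["hash"] if i > 0 else self.GENESIS_PREVIOUS
            # The stored link must match the hash of the preceding block.
            if block["previous_hash"] != expected_previous:
                return False
            # The stored hash must match the recomputed hash of the contents.
            if block["hash"] != block_hash(block["index"], block["previous_hash"], block["payload"]):
                return False
        return True


# Usage: record two hypothetical annotation actions and validate the chain.
ledger = AnnotationLedger()
ledger.append({"action": "move_cluster", "cluster_id": 7, "translation": [0.5, 0.0, 0.0]})
ledger.append({"action": "text_note", "cluster_id": 7, "text": "Relocate chair"})
chain_ok = ledger.is_valid()
```

Any later modification of a recorded payload changes its recomputed hash and therefore invalidates the chain from that block onward, which is what allows the records to serve as tamper-evident documentation.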

CASE STUDY
The main focus of our case study was space and occupancy management, by means of using indoor point clouds to provide as-is visualization of indoor office environments, automatically classify office furniture, record spatial changes, and visualize sensor data for the corresponding spatial locations. We tested all of the presented components using an acquired indoor 3D point cloud scan of a typical office area (Fig. 4).

The indoor point cloud was acquired using an ASUS ZenFone AR mobile device. The point cloud was then registered using the provided floorplan, cleaned, and segmented (using both manual and automated segmentation methods). We applied our multiview approach to classify the indoor furniture items. We also tested our secure ledger methods for recording of spatial changes, by performing basic annotation tasks within selected indoor area representations of the point cloud (Fig. 3). Finally, one of the offices and the kitchen area included heat, carbon emission, and energy usage sensor data readings that were used to test the sensor data visualization features (Fig. 2). With the combined use of semantically enriched point clouds, sensor data, user annotation, and tracking of scene changes via a secure ledger, a practical platform for indoor digital twins can be realized. While our approach and methods cannot replace the accuracy of complete BIM representations and Building Automation Systems (BASs), they can potentially be used to enhance existing Integrated Workplace Management Systems (IWMSs) by providing as-is digital representations of indoor environments for FM-related applications (e.g., Operations and Maintenance (O&M) decision making).

Figure 4
The office area floorplan (left), and the acquired point cloud used for testing. Only specific rooms with integrated sensors were scanned for the case study.

CONCLUSIONS AND FUTURE RESEARCH
This paper presents an approach and related methods for semantic enrichment of indoor point clouds. The semantically enriched point clouds can be visualized alongside existing digital documentation and sensor data. Our approach thus enables the potential use of semantically enriched point clouds as DTs of indoor environments for FM use.
The case study demonstrated the feasibility of our approach for semantically enriching indoor point clouds acquired using mobile devices. Specific use case approaches are presented as concepts, with the aim of demonstrating the potential benefits of using a service-oriented approach for classification, visualization, and tracking of annotations of indoor 3D point clouds. The main advantage of our approach is that it integrates service-oriented system components as a flexible, robust, and lightweight platform for semantic enrichment of indoor point clouds. We are able to acquire, process, classify, visualize, annotate, and track changes of indoor environments with the use of point clouds, sensor data, floor plans, and domain expertise. For future work we will generate qualitative empirical results through FM domain expert user feedback. We also plan to investigate methods for registration of placeholder 3D models of furniture using locations of classified point clusters.
Figure 1
Figure 2
Temperature data recorded for a specific office represented in the indoor point cloud scan. The color of the point clusters representing the office changes with each new temperature reading. The temperature data was recorded over a period of one month.