IMPLEMENTATION AND FIRST EVALUATION OF AN INDOOR MAPPING APPLICATION USING SMARTPHONES AND AR FRAMEWORKS

ABSTRACT: In this paper, we present the implementation of a smartphone-based indoor mobile mapping application based on an augmented reality (AR) framework and a subsequent performance evaluation in demanding indoor environments. The implementation runs on Android and iOS devices and demonstrates the great potential of smartphone-based 3D mobile mapping. The application includes several functionalities such as device tracking, coordinate and distance measurement, and the capture of georeferenced imagery. We evaluate our prototype system by comparing measured points from the tracked device with ground control points in an indoor environment in two different campaigns. The first campaign consists of an open, one-way trajectory whereas the second campaign incorporates a loop closure. In the second campaign, the underlying AR framework successfully recognized the start location and correctly repositioned the device. Our results show that the absolute 3D accuracy of device tracking with a standard smartphone is around 1% of the travelled distance and that the local 3D accuracy reaches sub-decimetre level.


INTRODUCTION
The demand for capturing accurate 3D information is growing in a wide variety of disciplines such as BIM (Building Information Modelling), facility management or indoor navigation. Until recently, mapping indoor environments was a demanding task, requiring highly specialized multi-sensor systems such as terrestrial or mobile laser scanners. Then, new high-quality indoor mobile mapping systems (MMS) were introduced, such as the BIMAGE backpack (Blaser, Nebiker, & Cavegn, 2017). With such a backpack, 3D point clouds and highly detailed 3D image spaces (Nebiker, Cavegn, & Loesch, 2015) of large buildings, construction sites or tunnels can be captured. However, when it comes to keeping the data up-to-date, using such high-end MMS would be too costly and the system would be restricted to a small group of experts. Hence, there should be a simple and cost-effective solution allowing building owners or facility managers to keep the digital twin of their infrastructure up-to-date.
In recent years, the computing capacity of mobile devices has rapidly increased, which is enabling more and more computing-intensive applications. A typical example is Augmented Reality (AR) applications, which are very demanding with respect to real-time scene tracking and augmentation, tasks which were not possible on mobile devices a decade ago. Since Niantic released Pokémon Go in 2016, the number of AR applications has been increasing rapidly. Although Pokémon Go was a geospatial AR application, the most common AR applications place virtual 3D objects in an arbitrary scene using either a smartphone or AR glasses. These 3D objects can be as simple as a toy figure or as complex as a scaled 3D city model. Most often, these AR applications are restricted to work only in a single room or a small area.
With the introduction of the AR frameworks ARCore (Google, n.d.) and ARKit (Apple Developers, 2019), developing AR applications has been greatly simplified. These AR frameworks support device motion tracking and scene understanding. Visually distinct features from the camera image, called feature points, combined with inertial measurements from the device's IMU are used to calculate the device's pose relative to the environment. Clusters of feature points that lie on horizontal or vertical surfaces such as tables or walls are detected as planar surfaces. Both ARCore and ARKit require mobile devices with calibrated cameras and the generated point cloud is at world scale.
At the Institute of Geomatics at the FHNW, we are developing a new AR mapping application, which shall combine the advantages of the local tracking of an AR framework with referencing the device to a reference image dataset, which has been georeferenced in a geodetic reference system (Nebiker et al., 2015). As a first step, we developed an application, which is able to motion track the device, measure points, localize itself to a specific reference system and capture photographs with absolute orientation. Once it is possible to align the captured photographs to a georeferenced image database, this application is ready for various mapping tasks with high global accuracy.
Our paper is structured as follows: in chapter 2, we discuss the related work. In chapter 3, we describe our development and architecture and in chapters 4 and 5, we outline our accuracy experiments and their results. Finally, in chapter 6 we give a conclusion and an outlook to future developments.

RELATED WORK
There are different types of indoor mapping systems. On the one hand, there are static mapping systems like terrestrial laser scanners, which scan the environment with high precision at the cost of a time-consuming data collection process. On the other hand, indoor mobile mapping systems (MMS) are becoming more popular since the data collection can be performed while driving or walking through the environment. Different indoor MMS have been proposed, which can be categorized either by the platform type or by the sensors used. Generally, there are backpack-based systems, handheld systems and trolley-based systems. The Würzburg backpack (Lauterbach et al., 2015), for example, consists of a 2D laser profiler and a Riegl VZ-400 laser scanner. Another backpack system (Blaser, Cavegn, & (Nebiker et al., 2015; Rettenmund, Fehr, Cavegn, & Nebiker, 2018). This method works in indoor and outdoor environments and does not require ground control points. However, the success of this method depends on ideal conditions such as up-to-date reference images, same lighting, seasonality and similar viewpoint. Since then, new and more robust methods have evolved. DenseSFM proposes a Structure from Motion (SfM) pipeline that uses dense CNN features with keypoint relocalization (Widya, Torii, & Okutomi, 2018). Sarlin et al. (2019) also use learned descriptors to improve localization robustness across large variations of appearance. These approaches are more robust than using classical local features like SIFT and its variants and have the potential to solve the absolute image orientation problem. However, these approaches are computationally heavy and, at the time of writing, only a few of them run in real time. Therefore, we have not yet implemented any of these approaches in our AR mapping framework.

OUR ARCHITECTURE
The main goals of our development were simple operation on a broad range of devices, compatibility with the two most prominent mobile operating systems, Android and iOS, and real-time capability. The minimal functionality should include the possibility to interactively localize the device in an absolute reference frame using control points, to perform point measurements and to capture georeferenced images.

Underlying Frameworks
Our development is based on the widely used game engine Unity. Unity provides a large number of packages, which can be included into a project, and Unity-based applications can be deployed to various operating systems. Our project is developed with Unity's AR Foundation package, which includes built-in multi-platform support for AR applications (Unity, 2018). This makes it possible to develop an application, which can run either Google's ARCore or Apple's ARKit depending on the user's device and operating system.

Design
Since our development is mainly a proof of concept and not a distribution-ready application, we did not focus on UI and UX aspects in our mobile app development. Instead, the focus was placed on implementing our required functionality, which considerably differs from existing applications. Therefore, all graphical interfaces were implemented with default settings, which allow custom individualisations at a later stage, if required within a distributed application.

Device Tracking:
The foundation of our application is the device tracking. The underlying AR frameworks support motion tracking by fusing multiple sensors such as accelerometer, gyroscope, magnetometer and camera. Visually distinct feature points from the camera image combined with inertial measurements are used to estimate the device's pose relative to the environment. Furthermore, the framework estimates horizontal and vertical planes from detected feature points, which are mostly located on walls and on the floor.
Once the AR app is started, device tracking starts immediately.
The origin of the local AR reference frame coincides with the location where the app was initialised, with the heading of the device defining the direction of the forward axis, the up axis pointing vertically upwards and the right axis perpendicular to both, pointing to the right. Since both Google's ARCore and Apple's ARKit run only on calibrated devices and fuse multiple sensors, the AR reference frame is automatically at world scale.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2/W17, 2019 6th International Workshop LowCost 3D -Sensors, Algorithms, Applications, 2-3 December 2019, Strasbourg, France

Measurement Functionality:
After the AR app has been initialised, local measurements can be taken immediately.
The app supports two measurement modes: point and 3D distance measurements (Figure 2). Other modes such as area, volumetric or slope measurements could additionally be implemented. Both point and distance measurements can either be made directly using individual feature points from the device tracking or using the detected planar surfaces. Measuring on detected planar surfaces has the advantage that measurements can be carried out continuously, even when a surface is lacking visual features.
To execute a measurement, a single tap on the screen at the desired location is required. Depending on the measuring mode, a pop-up window with the local or global coordinates or the 3D distance appears. A 2D distance measurement mode (top down, floorplan) could be implemented additionally, if needed. The coordinates of measured points can be saved to a text file with local and, if available, global coordinates for further processing.
For the first version, we realized a 6 degree of freedom (6DoF) transformation using ground control points (GCPs) in order to transfer the local scene into a global reference frame. To start the referencing process, a list of GCPs can be imported from file. After a successful import, at least three points need to be measured with the AR app and each referenced to a GCP by choosing from a dropdown list. Again, measurements can be directly conducted on feature points or on detected planar surfaces. Then, the 6DoF transformation is calculated according to Umeyama (1991) and the residuals are displayed (Figure 3, left). The transformation can easily be evaluated based on the residuals and dynamically adjusted by additional point measurements or by the exclusion of points. Once the transformation calculation is correct, any object in the global reference system can be augmented in the camera feed of the app. For verification purposes, the app overlays the GCPs into the camera feed (Figure 3, right).
With the app, it is also possible to upload the photo with its pose to a web service. For verification purposes, the captured photograph can be displayed in the AR scene at its real location and with its original pose (Figure 4).
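The GCP-based referencing step can be illustrated with a minimal sketch of a least-squares rigid alignment in the spirit of Umeyama (1991). It is not the app's actual code: the function names are hypothetical, the scale is fixed to 1 on the assumption that the AR frame is already at world scale, and both point sets are assumed to be expressed in right-handed frames (any conversion from the game engine's axis convention is not shown).

```python
import numpy as np


def estimate_rigid_transform(local_pts, global_pts):
    """Estimate R, t such that global ~= R @ local + t (least squares, scale = 1).

    Hypothetical helper following the SVD solution of Umeyama (1991).
    """
    src = np.asarray(local_pts, dtype=float)
    dst = np.asarray(global_pts, dtype=float)
    if src.shape != dst.shape or src.ndim != 2 or src.shape[1] != 3:
        raise ValueError("expected two (n, 3) arrays of matching shape")
    if src.shape[0] < 3:
        raise ValueError("at least three point pairs are required")

    src_mean = src.mean(axis=0)
    dst_mean = dst.mean(axis=0)

    # Cross-covariance between the centred target and source points.
    cov = (dst - dst_mean).T @ (src - src_mean) / src.shape[0]
    U, _, Vt = np.linalg.svd(cov)

    # Guard against a reflection so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    t = dst_mean - R @ src_mean
    return R, t


def transform_residuals(local_pts, global_pts, R, t):
    """Per-point residual vectors (GCP minus transformed measurement)."""
    transformed = np.asarray(local_pts, dtype=float) @ R.T + t
    return np.asarray(global_pts, dtype=float) - transformed
```

The residual vectors returned by `transform_residuals` correspond to the kind of values an operator would inspect before accepting the transformation or excluding a point and recomputing it.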

PERFORMANCE EVALUATION
We carried out accuracy evaluations based on 3D point measurements in order to determine the performance and stability of motion tracking and subsequent measuring and mapping accuracy in indoor environments.
In the first experiment, we compare the two 3D measurement methods: feature-point-based and surface-based measurements.
In the following two experiments, we investigate the deviations along multiple trajectories. In the first case, the trajectory describes a route with different start and destination points and in the second case, the trajectory forms a loop. In all of the following experiments, we used common high-end smartphones such as Google Pixel 2 and Samsung Galaxy S9.

Test Site
Our test site is located in the new main building of the FHNW Campus in Muttenz/Basel. It covers the eastern part of the 10th floor, where four large perpendicular corridors form a loop (Figure 5). The indoor environment of this modern building is challenging since there are a lot of repetitive structures and uniform surfaces including large glass facades. Therefore, extracting distinct visual features is demanding. The overall accuracy of the reference system is < 3 mm. The GCPs are exactly defined natural points like door corners and intersections between a wall corner and the floor (Figure 6).

Coordinate Measurement
In the first part of the investigation, we compared different 3D coordinate measurement methods within the AR scene. The AR framework supports two types of coordinate measurements: 1) by getting coordinates directly from the raw feature point cloud or 2) by determining the coordinates by intersecting an image ray with a detected surface. By first investigating the quality of the raw feature point cloud and then comparing the residuals between GCPs and measurements on the point cloud and on the detected surfaces, we determined which method to use for measuring 3D coordinates in the subsequent investigations.
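The second measurement type, intersecting an image ray with a detected surface, reduces to a ray-plane intersection. The following is a minimal geometric sketch; the function name and parameters are illustrative, and how the ray and the plane are obtained from the AR framework is not shown here.

```python
import numpy as np


def intersect_ray_plane(ray_origin, ray_dir, plane_point, plane_normal, eps=1e-9):
    """Return the 3D point where a ray hits a plane, or None if there is no hit.

    The ray is origin + s * direction with s >= 0; the plane is given by any
    point on it and its normal vector. All names are hypothetical.
    """
    o = np.asarray(ray_origin, dtype=float)
    d = np.asarray(ray_dir, dtype=float)
    p0 = np.asarray(plane_point, dtype=float)
    n = np.asarray(plane_normal, dtype=float)

    denom = n @ d
    if abs(denom) < eps:
        return None  # ray runs parallel to the plane

    s = n @ (p0 - o) / denom
    if s < 0:
        return None  # plane lies behind the ray origin

    return o + s * d
```

For example, a ray starting at `(0, 1.5, 0)` and pointing along `(0, 0, 1)` meets a wall plane through `(0, 0, 3)` with normal `(0, 0, -1)` three metres in front of the origin.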

One-way trajectory
In the first of the two mapping experiments, we evaluated the tracking performance using a one-way trajectory. This trajectory was created by walking a path from A via B to C (Figure 5). This path is around 90 meters long and includes three direction changes of 90 degrees. Between B and C, there is a long corridor with repetitive structures and large windows as shown in Figure 7. At the starting location A, seven GCPs on different height levels were measured. Along the trajectory to the final location C, nine additional points were measured as checkpoints. The transformation parameters to the global reference frame were derived using the seven GCP measurements at location A. Once the transformation parameters were estimated, all measured points were transformed into the global reference frame and the residuals to the GCPs were calculated.

Closed loop trajectory
In the second mapping experiment, we measured a trajectory, which forms a closed loop from location A via B, C and D and ending again at location A. In this campaign we measured 20 points in total. At location A we measured the identical GCPs and we additionally measured 13 check points along the loop (see Figure 9, right). The trajectory length is about 140 meters and includes four 90-degree turns. We again used the GCPs to calculate the transformation from the local to the global reference frame and then transformed all points with these parameters. Finally, the residuals were calculated for all measured points.
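The evaluation procedure used in both campaigns (transform all measured points with the parameters estimated at location A, then compare them with their reference coordinates) could be sketched as follows. The function name is hypothetical, and the split into horizontal and vertical components assumes that the third coordinate of the global frame is the height.

```python
import numpy as np


def checkpoint_residuals(local_pts, reference_pts, R, t):
    """Transform measured points into the global frame and split the residuals.

    Returns (horizontal, vertical): the horizontal error magnitude and the
    signed vertical error per point. Assumes the global Z axis is the height.
    """
    measured = np.asarray(local_pts, dtype=float) @ np.asarray(R, dtype=float).T
    measured = measured + np.asarray(t, dtype=float)
    diff = measured - np.asarray(reference_pts, dtype=float)

    horizontal = np.hypot(diff[:, 0], diff[:, 1])
    vertical = diff[:, 2]
    return horizontal, vertical
```

Here `R` and `t` stand for the rotation matrix and translation vector of the 6DoF transformation derived from the GCP measurements at location A.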

RESULTS
In this chapter, we show and discuss the results of our three performance investigations. First, we compared two different 3D measurement approaches supported by our application. Second, we examined the tracking quality by performing 3D measurements along two different trajectories, one open and the other closed.

Coordinate measurements
The quality of the raw feature point cloud is rather low because of the high noise. The detected points particularly vary a lot in the direction of the device's camera. This is not surprising since the 3D points are calculated for every frame and the camera displacement between frames is often only marginal, both in location and orientation. Figure 8 shows the horizontal projection of the resulting point cloud.

Table 1. Differences between measurement approaches (avg. residuals between measured points and GCP coordinates)

These first results show excellent local point measurement accuracies at the sub-decimetre level for both approaches. They also indicate that the surface-based approach improves the horizontal measuring accuracy by about a factor of two, to approx. 2 cm in X and Y. The vertical point measurement accuracy does not significantly differ between the two approaches. Since the planar surface-based approach yields better results and since its handling is simpler, we used this measuring approach for all subsequent investigations.

Trajectories
The accuracy of the trajectories was assessed by comparing the measured check point coordinates along the mapping paths with their reference coordinates. Table 2 lists the RMSE values for the open (A-C) and the closed loop trajectories (A-A). The RMSE values in both horizontal and vertical direction are surprisingly small considering the travelled distances of 90 meters and 140 meters respectively. In the first campaign, the maximum horizontal error is 1.6 meters after 86 meters and the maximum vertical error amounts to 0.8 meters after 89 meters along the trajectory (see Figure 10, left). In the second campaign, the maximum horizontal error is 1.7 meters at 104 meters and the maximum vertical error is 0.4 meters after 87 meters along the trajectory (see Figure 10, right).
As can be seen in Figure 9, both horizontal and vertical drifts increase with the distance travelled from the start location. Interestingly, in the closed-loop trajectory, a high accuracy (<10 cm) was again achieved at the last measured check point, as visible in Figure 10. Since the last point was again very close to the location where the AR application had been initialized, this indicates that the AR device recognized this location from before and was successfully relocated to this position. It is important to be aware of this behaviour, since the current version of the AR framework a) does not signal a loop closure and b) yields a discontinuous trajectory around the loop closure event.
Another interesting phenomenon, which is evident in Figure 9 (left) and especially in Figure 10 (right), is the vertical shift, which happens in the long corridor after around 40 meters. This shift did not happen in the second campaign.
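The two summary quantities used in this chapter, the RMSE of a set of residuals and an error expressed as a percentage of the travelled distance, can be written down in a few lines. This is a minimal sketch with hypothetical function names; it does not reproduce the values of Table 2, which require the original residuals.

```python
import math


def rmse(residuals):
    """Root mean square error of a sequence of residuals (same unit as input)."""
    values = list(residuals)
    if not values:
        raise ValueError("at least one residual is required")
    return math.sqrt(sum(v * v for v in values) / len(values))


def relative_error_percent(error_m, distance_m):
    """Error expressed as a percentage of the distance travelled so far."""
    if distance_m <= 0:
        raise ValueError("travelled distance must be positive")
    return 100.0 * error_m / distance_m
```

For instance, `relative_error_percent(1.6, 100.0)` corresponds to an error of 1.6 m over 100 m, i.e. 1.6% of the travelled distance.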

CONCLUSION AND OUTLOOK
We successfully developed a low-cost image-based indoor mobile mapping prototype application based on current AR frameworks. The application runs on most modern mobile devices, both in the iOS and Android ecosystems. Our application supports absolute geo-referencing via ground control points. Thanks to AR tracking, directly georeferenced images can be captured with our application. In addition, we carried out performance investigations in a challenging indoor environment with two different campaigns. In a first investigation, we obtained local 3D measurement accuracies using the plane-based measuring approach of 2-3 cm horizontally and approx. 5 cm vertically. Our subsequent mapping test campaigns showed that AR tools are surprisingly accurate with a max 3D error of the full circle campaign of 1.6 m or 1.6% over a distance of 100 meters in a very demanding environment (Figure 10 left). The analysis of the difference vectors in Figure 9 indicates that the local accuracy is even higher. All this shows that these tools have a huge potential in accurately tracking mobile devices in indoor environments without specific and expensive hardware.
In summary, we demonstrated that AR frameworks are an interesting alternative to costly high-end mobile mapping systems in certain application areas. In the future, AR mapping apps could provide a low-cost frontend to an ecosystem of image-based mobile mapping and visual localization services. As demonstrated in this paper, consumer devices could be used for carrying out relatively accurate 3D measurements and for updating existing image-based infrastructure services, e.g. by providing accurately georeferenced error or change reports to facility managers. Future work includes the combination of the high local accuracy of an AR tool with GNSS or a visual positioning service as an absolute positioning system. We also plan to extend our accuracy investigations to natural outdoor environments without man-made structures.

Figure 2. Point (left) and distance (right) measurement modes

Global Referencing:
For absolute geo-referencing of captured images and for conducting measurements in a global reference frame, the device needs to be related to a reference frame.

Figure 3. Display of the residuals (left) and verification of the transformation by displaying GCPs as blue spheres (right)

Photo Capture with Pose:
Finally, it is possible to capture geo-referenced images. Every time a user takes a photo, the app stores the local pose and, if available, the global pose (position and orientation).
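How a global pose could be derived from a stored local pose is sketched below, assuming the referencing transformation is available as a rotation matrix `R` and a translation vector `t` and that orientations are represented as 3x3 rotation matrices. The function name and this representation are illustrative assumptions, not the app's actual data model.

```python
import numpy as np


def local_pose_to_global(R_cam_local, p_cam_local, R, t):
    """Map a camera pose from the local AR frame into the global frame.

    R_cam_local: 3x3 camera orientation in the local frame
    p_cam_local: camera position in the local frame
    R, t:        rigid transformation with global = R @ local + t
    """
    R = np.asarray(R, dtype=float)
    R_cam_global = R @ np.asarray(R_cam_local, dtype=float)
    p_cam_global = R @ np.asarray(p_cam_local, dtype=float) + np.asarray(t, dtype=float)
    return R_cam_global, p_cam_global
```

Storing both poses with each photo keeps the local pose usable even when no referencing transformation has been determined yet.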

Figure 4. Photo display with original pose (captured photo in dashed rectangle)

Figure 5. Floorplan of the 10th floor with test area (inside dashed rectangle)

As a reference, we established 137 ground control points (GCPs), which we measured with a multistation Leica MS60.

Figure 6. Ground control points near location A

Figure 7. Repetitive structures in the corridor between B and C

Figure 8. Raw feature point cloud at location A (room 10.O.02)

The resulting coordinate measurement accuracies of the two approaches, feature point-based and planar surface-based, are shown in Table 1. The table lists the average residuals between measured point coordinates and GCP coordinates. For both approaches, the same seven points were measured.

Figure 9. Behaviour of difference vectors (scaled by factor 5) on one-way trajectory (left) and closed-loop trajectory (right)


Table 2. Residuals of the one-way and the closed-loop trajectory