Evaluating the Quality of TLS Point Cloud Colorization

Terrestrial laser scanning (TLS) enables the efficient production of high-density colored 3D point clouds of real-world environments. An increasing number of applications, from visual and automated interpretation to photorealistic 3D visualizations and experiences, rely on accurate and reliable color information. However, insufficient attention has been paid to evaluating the colorization quality of the 3D point clouds produced with TLS. We have developed a method for evaluating the point cloud colorization quality of TLS systems with integrated imaging sensors. Our method assesses the capability of several tested systems to reproduce the colors and details of a scene by measuring objective image quality metrics from 2D images that were rendered from 3D scanned test charts. The results suggest that the detected problems related to color reproduction (i.e., measured differences in color, white balance, and exposure) could be mitigated in data processing, while the issues related to detail reproduction (i.e., measured sharpness and noise) are less in the control of a scanner user. Although the tested scanners are commendable 3D measuring instruments, improving their colorization tools and workflows and their automated image processing pipelines would potentially increase not only the quality and production efficiency but also the applicability of colored 3D point clouds.


Introduction
Terrestrial laser scanning (TLS) enables the efficient and detailed collection of 3D point clouds from the real world for a rapidly increasing number of use cases. Three-dimensional point clouds describe the geometry of the targeted object or environment and are either applied directly or used as a starting point for further processing, modeling, or analysis. In addition to geometry, non-geometric information such as radiometric information about the target is highly relevant and required by many applications. For example, laser scanners usually record the point intensity values and can utilize cameras to derive color values for the 3D point clouds. Color information (most commonly red, green, and blue values of the RGB color model) can be considered one of the most common, useful, and important types of non-geometric information. It typically comprises the radiance captured by an imaging sensor that is either an integrated part of the 3D measuring system or a separate external camera. This radiance is affected by, e.g., the illumination, geometry, and diffuse and specular reflectivity of the target (e.g., [1,2]). Color does not only describe the colorfulness but also the differences in luminance and in color (i.e., contrast) that are then visually perceived as details in the scene. Reliable and accurate color information is crucial in numerous applications that rely on visually interpreting and understanding the data, such as in visually recognizing objects (e.g., color-coded standard pipes in industrial facilities [3]) or their material properties. Color is essential in photorealistic applications that rely on textured 3D models or colored point clouds (e.g., [4]), for example in the creation of textured 3D assets for the video game industry [5]. The importance of accurate color information has been frequently stressed, e.g., in the cultural heritage and archaeology fields where the visual appearance of a model is a key aspect (e.g., [2,6]).
Color information can also be used to improve and automate, e.g., the recognition of traffic signs [7], construction components [8], and building materials [9], the detection of material defects [10], the reconstruction of building facades [11], and more generally to automate various data processing steps such as the registration [12–14] and segmentation [15–17] of 3D point clouds (Figure 1).
Modern TLS systems often rely on integrated cameras to acquire the color information. Many scanner manufacturers such as Leica Geosystems (Hexagon AB, Stockholm, Sweden), Faro (Faro Technologies Inc., Lake Mary, FL, USA), Trimble (Trimble Inc., Sunnyvale, CA, USA), and Zoller & Fröhlich (Zoller & Fröhlich GmbH, Wangen, Germany) have added high dynamic range (HDR) imaging features to their systems (e.g., [18–21]) to increase the flexibility and potentially the imaging quality, especially in challenging illumination conditions. HDR is a technique where several images of different exposure times are combined to produce an image of a greater dynamic range (e.g., [22]). Additionally, instead of relying solely on traditional coaxially mounted cameras, some manufacturers have added more camera components to the scanner frame, thus increasing the speed of the imaging process during data collection (e.g., [18,23]).
A significant amount of research has been published assessing the quality of 3D point clouds produced via TLS, focusing largely on geometric aspects of the resulting data quality. The geometric quality of a point cloud is influenced by the accuracy and precision of the laser scanner (e.g., [24–31]). The level of identifiable detail in the point cloud is affected by the scan resolution and disturbed by unwanted errors caused by e.g., edge effects (e.g., [26,30,32–34]). Furthermore, the target surface reflectivity and geometry clearly have an impact on the geometric quality of the resulting point cloud (e.g., [24,26,28,35–38]). Many of these studies have focused on investigating the effect of the target material and its properties, such as the surface color or texture, on the geometric quality of the resulting point cloud. Furthermore, the scanning geometry (e.g., [39,40]) and environmental
Table notes: (1) information from the manufacturers' product specifications; (2) size of a single exported image frame; (3) estimated from the data.
The X-Rite ColorChecker Classic [72] color reference target (Figure 3a) was used to assess quality factors related to color reproduction, such as color accuracy. A standard size (21.6 × 27.9 cm) ColorChecker chart consists of 24 reference patches representing natural objects, as well as chromatic, primary, and grayscale colors.
The sinusoidally modulated Siemens star [73] chart (Figure 3b) was used to assess quality factors related to detail reproduction such as image sharpness. The Siemens star chart (size of 50.0 × 66.7 cm) consists of 144 pattern bands and is included in the ISO 12233:2017 [74] standard (Annex E). Measuring sharpness via resolution measurements using the sinusoidal Siemens star is considered a reliable approach for all cameras and is less susceptible to image processing (e.g., sharpening) compared to methods that rely on high contrast edges. Furthermore, it allows measuring sharpness from multiple angles at a time [73].
The simplified ISO 15739 digital camera noise test chart (Figure 3c) based on ISO 15739 [75] was used to quantify the amount of noise in the data. The noise test chart (size of 30.5 × 45.7 cm) consists of 15 uniform greyscale patches specially designed for measuring noise.
The test environment was uniformly illuminated using the standard illuminant D65 (for noon daylight and sRGB) as specified in the ISO 11664-2:2007 standard [76]. D65 is highly compatible with the popular sRGB color space [77], which has a white point at a corresponding 6500 K temperature and is considered to be consistent among various devices (e.g., computers, cameras, monitors, and mobile devices), as well as the Internet and common 3D graphics programming interfaces such as Direct3D [78], OpenGL [79], and WebGL [80].

Data Acquisition
In the test environment, scans were obtained from a fixed scanning location using a fixed and leveled tripod height of 1.36 m at a distance of 2 m from the targeted image quality test charts. The tripod height was set approximately to the same level as the wall-mounted test charts. Due to the structural differences (e.g., shape, size, and sensor configuration) in the scanner systems, identical positions for the scanner optical centers were not achieved. The optical center heights were within approximately 7 cm from each other between the tested scanners.
Tested scan and imaging settings were set so that the total data collection time per station was reasonable considering the real-life use of the scanner in the field where a project can easily consist of tens or even hundreds of scan stations. Thus, data collection times above 20 min per station were avoided.
The scans were repeated using two alternative resolution settings to observe the effect of scan resolution: a high-density scan setting with the closest available setting to three millimeters at ten meters (see Table 3) and a medium-density scan setting with the closest available setting to six millimeters at ten meters (see Table 3) as selectable in the scanner settings. These parameters were chosen because they were as equally reproducible as possible across all chosen scanner instruments and because they represented the assumed typical real-life use cases well.
Remote Sens. 2020, 12, 2748
All the tested scanners were capable of capturing high dynamic range (HDR) images. Whenever possible, the tests were repeated using both HDR and low dynamic range (LDR) imaging settings. With LDR we refer to non-HDR imaging where the dynamic range of the image data is that of a single exposure. The Leica RTC360 uses only HDR imaging without any option for LDR. Whenever possible, the imaging settings were set to represent the highest quality setting and as much automation as possible.
Some tested scanners had relevant user-controllable scan and imaging parameters that were tested and set according to the test environment and method. For the Leica ScanStation P40, the image resolution was set to maximum, the exposure time was set to automatic, the white balance adjustment was set to the "cold light" preset mode to match the scene lighting, and the scan sensitivity was set to "normal". For the Faro S 350, the exposure metering mode was set to "even weighted metering", the HDR images were collected with the maximum number of five brackets, and the scan quality setting (which reduces the level of noise in the distance measurement at the cost of increasing the scanning time) was set to "3x". The photographic reference dataset was captured from the same location as the scans in the NEF (Nikon electronic file) format. A summary of the acquired scans and their selected comparable settings for assessing the point cloud colorization quality is given in Table 3.

Developed Method in Brief
After the data acquisition phase, the 3D point cloud data sets were processed and analyzed for the purpose of evaluating the colorization quality. For this goal, a test method was developed to prepare 2D images from the 3D scanned test charts and analyze the resulting image data using selected image quality metrics that describe the capabilities of the scanner system to reproduce colors and details in the scene. To assess the usefulness of our method for benchmarking purposes, the results of these individual quality metrics were summarized into one combined quality score per scan. This proposed method for evaluating the colorization quality is introduced in detail in the next section (Section 3).

The Proposed Method for Evaluating the Colorization Quality of TLS-Derived 3D Point Clouds
To evaluate the point cloud colorization quality of the tested TLS systems, a test method was developed. The key purpose of this proposed method was to process, colorize, and prepare the 3D point clouds of the scanned image quality test charts into 2D images that could be analyzed using image quality measurements. Our test method was split into four stages: (1) point cloud pre-processing and colorization, (2) point cloud preparation for image quality analysis, (3) image quality analysis, and (4) combining metrics to achieve a final quality score. An overview of the developed method is illustrated in Figure 4.

Image Quality
Image quality is the result of a complex combination of quality factors inherited from the imaging sensor, the lens, and the image processing pipeline. Image quality can be assessed objectively by analyzing the image data via various automated image quality measurements, or subjectively by relying on various perceptual methods that involve human subjects [81]. For objective assessment there exists a great amount of literature and numerous standards (e.g., ISO 12233 [74] for resolution and spatial frequency response measurements, and ISO 15739 [75] for noise measurements) for testing various image quality aspects of digital cameras. These standards define a broad range of quality metrics that are typically calculated from a wide variety of different test charts, e.g., Siemens stars or slanted-edge charts for resolution measurements, and grayscale charts for noise. Image quality testing from these charts can be performed with applicable software, e.g., Imatest Master (Imatest LLC, Boulder, CO, USA) [82] or iQ-Analyser (Image Engineering GmbH & Co. KG, Kerpen, Germany) [83].
Sharpness, color reproduction, and noise have been regarded as the most important metrics of imaging quality (e.g., [62]), but no single metric exists that would depict the quality of a camera as a whole. Firstly, sharpness determines the amount of detail the imaging sensor can capture. For example, it can be objectively quantified with resolution measurements that define the sensor's capability to maintain the optical contrast of increasingly finer details in a scene [84]. Secondly, color reproduction determines the ability of the imaging sensor to reproduce colors in the scene. An accurate color description can be understood as a truthful combination of chromatic and luminance components. Finally, noise can be described as unwanted random spatial variation in an image, and objective noise measurements are commonly based on measuring a signal-to-noise ratio from an image.

Point Cloud Pre-Processing and Colorization
The point cloud pre-processing workflow (see Figure 4) consisted of processing the raw scan data and colorizing the point clouds. The raw laser scan data was processed with the scanner manufacturers' own software (Leica Cyclone REGISTER 360 version 1.6.2 [85] and Faro SCENE version 7.5.2.3361 [86]). Both manufacturers' software included various optional features to adjust the image data prior to the point cloud colorization. This included support for exporting the panoramic images for editing with third-party software, tools to semi-automatically tune the white balance, and several alternative tone mapping operators that are used to map the HDR image data to more limited dynamic ranges that are universally supported by typical devices and displays.
For LDR scans, all scans were colorized using the suggested default settings to achieve as straightforward and automated results as possible that would depict the most typical colorized point cloud output from the given scanner.
For HDR scans, two alternative strategies were employed. Firstly, the scans were colorized using the default settings in the software similarly to LDR. Alternatively, the resulting raw equirectangular panoramic photos were tone mapped using a linear operator, and then the exposure and white balance levels were manually set in Darktable (version 3.0.0), an external open-source program for editing raw photographs [87]. The idea of this alternative approach was to process the raw image data with an equal workflow to mitigate the unknown effects of automated image processing in point cloud colorization. In general, the purpose of a tone mapping operator is to compress the luminance range (e.g., from 32-bit into 8-bit per channel) while preserving contrast [88]. There exists a wide range of different tone mapping operators, but validating and comparing their performance is considered difficult [89] and was considered beyond the scope of this work. Thus, a linear tone mapping operator was selected as the most consistent alternative between the tested scanners. Firstly, 32-bit raw panoramic image files were exported (as .exr files from the Leica Cyclone REGISTER 360 and as .hdr files from the Faro SCENE) using linear tone mapping settings. Secondly, the image exposure and white balance were manually edited in Darktable. Finally, the images were exported as 8-bit JPEG image files in sRGB color space for subsequent colorization in the Leica and Faro software.
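As an illustration of what a linear tone mapping step followed by an 8-bit sRGB export involves, the following minimal sketch scales linear HDR pixel values by an exposure gain and per-channel white balance gains, clips them, and applies the standard sRGB transfer function. It is not the processing performed by Cyclone REGISTER 360, SCENE, or Darktable; the function names and the parameters `exposure_ev` and `wb_gains` are hypothetical and chosen only for this example.

```python
def linear_to_srgb8(value):
    """Encode one linear-light value (nominally 0..1) as an 8-bit sRGB code."""
    v = min(max(value, 0.0), 1.0)  # clip: a linear operator does not compress highlights
    if v <= 0.0031308:
        encoded = 12.92 * v
    else:
        encoded = 1.055 * v ** (1.0 / 2.4) - 0.055
    return round(encoded * 255)


def tone_map_linear(pixel, exposure_ev=0.0, wb_gains=(1.0, 1.0, 1.0)):
    """Map a linear HDR RGB pixel to 8-bit sRGB.

    exposure_ev (hypothetical parameter): exposure correction in stops.
    wb_gains (hypothetical parameter): per-channel white balance multipliers.
    """
    gain = 2.0 ** exposure_ev
    return tuple(linear_to_srgb8(c * gain * g) for c, g in zip(pixel, wb_gains))


if __name__ == "__main__":
    # A neutral 18% gray pixel, unchanged and brightened by one stop.
    print(tone_map_linear((0.18, 0.18, 0.18)))
    print(tone_map_linear((0.18, 0.18, 0.18), exposure_ev=1.0))
```

Because the mapping is a plain gain followed by clipping, any scene content brighter than the chosen white level is lost, which is one reason why the exposure level has to be set with care in such a workflow.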
For both the LDR and HDR scans, the resolution of the equirectangular panoramic images was set as large as possible for the maximum level of detail in the point cloud colorization. For Leica ScanStation P40 and RTC360 scanners, the resolution was set to the maximum of 20,480 × 10,240 pixels, and with Faro Focus S 350 the maximum value of 20,288 × 10,144 was used. For the Leica BLK360, the default resolution setting of 8192 × 4096 was selected because the estimated total pixel count per scan was not considered feasible to produce a panoramic photo with the same maximum resolution (of 20,480 × 10,240 pixels) without oversampling and thus potentially producing biased test results.
Finally, the colorized point clouds were exported in the E57 format containing location (XYZ) and color (eight-bit per channel RGB) information in sRGB color space.

Point Cloud Preparation for Image Quality Analysis
Colorized point cloud data for each tested scanner was prepared using CloudCompare (version 2.10-alpha), which is an open-source 3D point cloud and mesh model processing software package [90]. The goal was to prepare and render comparable 2D image files from colorized 3D point cloud data for each test chart for further image quality analysis. This data preparation workflow (illustrated in Figure 4) was as follows:
1. 3D points representing each test chart were manually segmented from the point clouds.
2a. The points representing the ColorChecker were rendered as point clouds using rectangular points with the point size set to the minimum so that there were no visible holes inside the charts. This was done to mitigate any interpolation of color data before the color-related quality measurements.
2b. Alternatively, for the points representing the Siemens star chart and the simplified ISO 15739 digital camera noise test chart, a Delaunay triangulation was performed to create mesh models (with vertex colors) of the charts. This was done to fill all potential gaps in the point cloud and to negate the effect of point size for the detail reproduction-related quality measurements.
3. The segmented test charts were rendered using orthographic projection and an equal zoom level between the charts and exported as 2D image files in PNG format.
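The orthographic rendering step can be pictured with the following minimal sketch, which projects colored 3D points onto a chart plane and bins them into pixels. It is not CloudCompare's renderer: the function name, the assumption that the chart plane is given by an `origin` corner and two in-plane unit vectors, and the averaging of colors that fall into the same pixel are all choices made for this example only.

```python
def render_orthographic(points, colors, origin, u_axis, v_axis, pixel_size):
    """Project colored 3D points orthographically onto a plane and rasterize them.

    points: iterable of (x, y, z); colors: iterable of (r, g, b) in 0..255.
    origin, u_axis, v_axis (hypothetical inputs): one chart corner and two
    orthogonal in-plane unit vectors. Returns a list of pixel rows; empty
    pixels are None.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    cells = {}
    for point, color in zip(points, colors):
        offset = [p - o for p, o in zip(point, origin)]
        col = int(dot(offset, u_axis) // pixel_size)
        row = int(dot(offset, v_axis) // pixel_size)
        if row < 0 or col < 0:
            continue  # point lies outside the chart area defined by the origin
        acc = cells.setdefault((row, col), [0, 0, 0, 0])
        for i in range(3):
            acc[i] += color[i]
        acc[3] += 1

    if not cells:
        return []
    n_rows = max(r for r, _ in cells) + 1
    n_cols = max(c for _, c in cells) + 1
    image = [[None] * n_cols for _ in range(n_rows)]
    for (row, col), (r, g, b, n) in cells.items():
        image[row][col] = (round(r / n), round(g / n), round(b / n))
    return image


if __name__ == "__main__":
    pts = [(0.05, 0.05, 2.0), (0.06, 0.04, 2.01), (0.15, 0.05, 2.0)]
    cols = [(100, 0, 0), (200, 0, 0), (0, 0, 255)]
    print(render_orthographic(pts, cols, (0, 0, 2), (1, 0, 0), (0, 1, 0), 0.1))
```

The depth component along the plane normal is simply discarded, which is what makes the projection orthographic. Pixels left as None correspond to the holes that the point size setting or the triangulation in the workflow above is meant to avoid.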

Image Quality Analysis
The rendered 2D images of the 3D scanned test charts were analyzed using Imatest Master (version 2020.1.0.45711 Alpha), a software package for analyzing image quality factors [82]. The eight-bit (per channel) image files (in sRGB color space) processed from colorized 3D point cloud data per each respective image quality chart were used as input data for image quality analysis (see Figure 4).
Prior to the analysis, the photographic reference image collected with the Nikon D800E was converted in Darktable from the raw NEF format into a 16-bit TIFF file using the AMaZE demosaicing algorithm, and scene-specific exposure and white balance levels were adjusted without any artificial sharpening or denoising.

Color Reproduction
A Color/Tone module [91] in Imatest was used to calculate the color accuracy from the ColorChecker charts for each scan. Algorithms used by Imatest are described in detail in [92]. CIEDE2000 color difference formulas [93], also part of the ISO/CIE 11664-6:2014 standard for colorimetry [94], were used to calculate the mean color difference ∆E00 and chroma difference ∆C00. The CIEDE2000 formula is considered to be the most accurate color differencing equation (e.g., [95,96]). Compared to the total color difference (∆E00), the chroma difference (∆C00) describes color accuracy as an error in colorfulness where the effects of exposure errors (indicated by differences in luminance) are reduced [92]. As additional metrics, the exposure and white balance errors were calculated from the grayscale patches of the ColorChecker chart using the calculation methods described in [92].
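For readers who want to see what the CIEDE2000 computation involves, the following sketch implements the published formula for a pair of CIELAB colors with unit weighting factors. It is an independent illustration, not Imatest's code. The conversion from sRGB to CIELAB is not shown, and the `include_lightness=False` option, which drops the lightness term to obtain a chroma-type difference, is an assumption about how such a metric could be formed rather than a confirmed definition of ∆C00.

```python
import math


def ciede2000(lab1, lab2, include_lightness=True):
    """CIEDE2000 color difference between two CIELAB colors (kL = kC = kH = 1).

    include_lightness=False omits the lightness term (an assumed way to
    express a chroma-only difference).
    """
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    c_bar = (math.hypot(a1, b1) + math.hypot(a2, b2)) / 2.0
    g = 0.5 * (1.0 - math.sqrt(c_bar ** 7 / (c_bar ** 7 + 25.0 ** 7)))
    a1p, a2p = (1.0 + g) * a1, (1.0 + g) * a2
    c1p, c2p = math.hypot(a1p, b1), math.hypot(a2p, b2)
    h1p = math.degrees(math.atan2(b1, a1p)) % 360.0 if c1p else 0.0
    h2p = math.degrees(math.atan2(b2, a2p)) % 360.0 if c2p else 0.0

    d_l = L2 - L1
    d_c = c2p - c1p
    if c1p * c2p == 0.0:
        d_h_angle = 0.0
    else:
        d_h_angle = h2p - h1p
        if d_h_angle > 180.0:
            d_h_angle -= 360.0
        elif d_h_angle < -180.0:
            d_h_angle += 360.0
    d_h = 2.0 * math.sqrt(c1p * c2p) * math.sin(math.radians(d_h_angle / 2.0))

    l_bar = (L1 + L2) / 2.0
    c_bar_p = (c1p + c2p) / 2.0
    if c1p * c2p == 0.0:
        h_bar = h1p + h2p
    elif abs(h1p - h2p) <= 180.0:
        h_bar = (h1p + h2p) / 2.0
    elif h1p + h2p < 360.0:
        h_bar = (h1p + h2p + 360.0) / 2.0
    else:
        h_bar = (h1p + h2p - 360.0) / 2.0

    t = (1.0
         - 0.17 * math.cos(math.radians(h_bar - 30.0))
         + 0.24 * math.cos(math.radians(2.0 * h_bar))
         + 0.32 * math.cos(math.radians(3.0 * h_bar + 6.0))
         - 0.20 * math.cos(math.radians(4.0 * h_bar - 63.0)))
    d_theta = 30.0 * math.exp(-(((h_bar - 275.0) / 25.0) ** 2))
    r_c = 2.0 * math.sqrt(c_bar_p ** 7 / (c_bar_p ** 7 + 25.0 ** 7))
    s_l = 1.0 + 0.015 * (l_bar - 50.0) ** 2 / math.sqrt(20.0 + (l_bar - 50.0) ** 2)
    s_c = 1.0 + 0.045 * c_bar_p
    s_h = 1.0 + 0.015 * c_bar_p * t
    r_t = -math.sin(math.radians(2.0 * d_theta)) * r_c

    l_term = (d_l / s_l) ** 2 if include_lightness else 0.0
    return math.sqrt(l_term + (d_c / s_c) ** 2 + (d_h / s_h) ** 2
                     + r_t * (d_c / s_c) * (d_h / s_h))


if __name__ == "__main__":
    print(ciede2000((50.0, 2.6772, -79.7751), (50.0, 0.0, -82.7485)))
```

A mean ∆E00 per scan, as reported in this work, would then be the average of this value over the 24 ColorChecker patches, each measured color being compared with its reference color.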

Detail Reproduction
A Star Chart module [97] in Imatest was used to measure the instrument's capability to reproduce details in the scene. The sharpness (modulation transfer function, MTF) and Shannon information capacity were calculated from the sinusoidally modulated Siemens star chart for each tested scan.
The sharpness can determine the maximum level of detail that the imaging system is able to capture. As a metric for system sharpness, the MTF was measured as a mean from the star pattern along the radii of a circle in eight segments. MTF is often interchangeably referred to as a spatial frequency response (SFR). For comparison purposes, spatial frequencies of MTF50P and MTF10P were selected as metrics to summarize the MTF curves. MTF50P describes a spatial frequency where the image contrast drops to 50% of its peak value. It is less sensitive to artificial image sharpening and can provide a more stable indication of the system performance than other similarly used metrics [98]. MTF10P describes a spatial frequency where the image contrast drops to 10% of its peak value, and it is considered to correspond to a limiting resolution below which all information can be considered useless. Algorithms for calculating the MTF from a Siemens star in Imatest are based on [73] and the ISO 12233 standard [74].
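The way a summary value such as MTF50P or MTF10P is read off a sampled MTF curve can be sketched as follows. This is a minimal illustration using linear interpolation between samples, not the Imatest implementation; the function name and the example curve are hypothetical.

```python
def mtf_fraction_of_peak(frequencies, mtf, fraction):
    """Return the spatial frequency where the MTF first drops to `fraction`
    of its peak value, searching from the peak towards higher frequencies.

    frequencies and mtf are equally long lists sorted by increasing frequency.
    Returns None if the curve never falls to the requested level.
    """
    peak = max(mtf)
    peak_index = mtf.index(peak)
    target = fraction * peak
    for i in range(peak_index + 1, len(mtf)):
        if mtf[i] <= target:
            f0, f1 = frequencies[i - 1], frequencies[i]
            m0, m1 = mtf[i - 1], mtf[i]
            # Linear interpolation between the two samples around the crossing.
            return f0 + (m0 - target) * (f1 - f0) / (m0 - m1)
    return None


if __name__ == "__main__":
    freqs = [0.0, 0.1, 0.2, 0.3, 0.4]   # e.g., cycles per pixel (made-up values)
    curve = [1.0, 0.8, 0.6, 0.4, 0.2]   # made-up MTF samples
    print("MTF50P:", mtf_fraction_of_peak(freqs, curve, 0.5))
    print("MTF10P:", mtf_fraction_of_peak(freqs, curve, 0.1))
```

Referencing the peak rather than the zero-frequency value is what distinguishes MTF50P from MTF50: a sharpening-induced bump in the curve raises the peak and therefore the threshold, which reduces the reward for oversharpening.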
In addition to assessing the sharpness by measuring the MTF values, the Shannon information capacity [99] was tested as an experimental image quality metric to measure the information capacity in bits per pixel for each scan. It is a novel metric introduced in Imatest and based on the original Shannon information capacity theory [100] with the hypothesis that image sharpness (MTF) and noise correlate to the information capacity that is proportional to the perceived image quality. The sinusoidal Siemens star is a recommended test chart for calculating the Shannon information capacity because it allows the signal (proportional to MTF) and noise to be calculated from the same location. The method used by Imatest is described in detail by [101].
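The underlying idea of an information capacity that grows with both signal bandwidth and signal-to-noise ratio can be illustrated with the textbook Shannon formula integrated over frequency. The sketch below is a one-dimensional simplification with made-up inputs. It is not the Imatest method described in [101], and the function and parameter names are hypothetical.

```python
import math


def shannon_capacity(frequencies, signal_power, noise_power):
    """Integrate log2(1 + S(f)/N(f)) over frequency with the trapezoidal rule.

    frequencies: increasing sample frequencies; signal_power and noise_power:
    power spectra sampled at those frequencies (noise must be non-zero).
    """
    total = 0.0
    for i in range(1, len(frequencies)):
        c0 = math.log2(1.0 + signal_power[i - 1] / noise_power[i - 1])
        c1 = math.log2(1.0 + signal_power[i] / noise_power[i])
        total += 0.5 * (c0 + c1) * (frequencies[i] - frequencies[i - 1])
    return total


if __name__ == "__main__":
    # Flat signal-to-noise power ratio of 3 over a bandwidth of 0.5.
    print(shannon_capacity([0.0, 0.25, 0.5], [3.0, 3.0, 3.0], [1.0, 1.0, 1.0]))
```

In this picture a blurrier system (signal power falling off sooner with frequency) and a noisier system both lower the integral, which is consistent with the hypothesis stated above that sharpness and noise jointly determine the information capacity.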
Noise can be described as undesirable random spatial variation in an image that obscures desired details. To quantify the level of noise a signal-to-noise ratio (SNR) was calculated according to the ISO 15739 [75] standard using the Color/Tone module [91] in Imatest. The SNR was calculated from the uniform gray patches of the simplified ISO 15739 grayscale noise test chart for each scan. The calculation method is described in detail in [102].
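At its simplest, a signal-to-noise ratio for a uniform gray patch is the mean pixel value divided by the standard deviation of the pixel values. The sketch below shows only this basic ratio, expressed in decibels. The ISO 15739 procedure used via Imatest is more elaborate, so this should be read as a simplified illustration with a hypothetical function name rather than the calculation described in [102].

```python
import math
import statistics


def patch_snr_db(pixel_values):
    """Simplified SNR of one uniform patch: 20 * log10(mean / standard deviation)."""
    mean = statistics.fmean(pixel_values)
    noise = statistics.pstdev(pixel_values)
    if noise == 0:
        return math.inf  # a perfectly uniform patch has no measurable noise
    return 20.0 * math.log10(mean / noise)


if __name__ == "__main__":
    patch = [99, 101] * 50  # made-up pixel values of a gray patch
    print(round(patch_snr_db(patch), 2), "dB")
```

Evaluating such a ratio on each of the gray patches of the noise chart gives the SNR as a function of patch brightness, from which a single summary value per scan can be chosen.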

Combining Metrics for a Quality Score
For the purpose of benchmarking, the individual calculated quality metrics were combined to form a comparable quality score per scan. To achieve this, we recognized that geometric means have been described as suitable averaging approaches to combine a wide range of measurements even without normalization [103,104]. For example, [62] used geometric means to combine camera phone image quality and performance metrics into one benchmark score. Similarly, the point cloud colorization quality score per scan was calculated as a geometric mean of the selected quality metrics using Equation (1). The measured results of system sharpness (summarized as MTF50P and MTF10P), Shannon information capacity (C), signal-to-noise ratio (SNR), color difference (∆E00), and white balance error (WBerr) were combined into one benchmarking score. The quality score combines these individual quality metrics that describe the instrument's capability in reproducing color and details from the scene in the form of RGB color values of a 3D point cloud.
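A geometric mean over metrics of mixed direction can be sketched as follows. This is not a reproduction of Equation (1): the treatment of error-type metrics (∆E00 and WBerr) by taking their reciprocals is an assumption made for this example so that a larger score always means better quality, and the function name and all numbers are hypothetical.

```python
import math


def quality_score(higher_is_better, lower_is_better):
    """Geometric mean of positive metrics.

    higher_is_better: e.g., MTF50P, MTF10P, C, SNR.
    lower_is_better: error metrics, e.g., color difference and white balance
    error; they enter as reciprocals (an assumption of this sketch).
    """
    terms = list(higher_is_better) + [1.0 / value for value in lower_is_better]
    if not terms or any(term <= 0 for term in terms):
        raise ValueError("all metrics must be positive")
    return math.exp(sum(math.log(term) for term in terms) / len(terms))


if __name__ == "__main__":
    # Made-up example values, not measurements from this study.
    print(quality_score([0.12, 0.25, 1.8, 30.0], [6.0, 0.05]))
```

A property of the geometric mean that suits benchmarking is that a given relative change in any one metric changes the score by the same factor regardless of that metric's unit or magnitude, which is why no normalization is required beforehand.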

Color Reproduction
To evaluate the scanner systems' ability to reproduce colors, the color accuracy was measured from colorized 3D point clouds representing the ColorChecker chart for each scan. Additional metrics of exposure error and the mean white balance error were also calculated. A table summarizing the results is presented in Table A1 and the analyzed ColorChecker charts for each scan with high-resolution setting (closest to 3 mm @ 10 m) are listed for visual reference in Table A2.
To quantify the total color accuracy, the mean values of absolute color difference (ΔE00) for each scan are presented in Figure 5. Additionally, to assess the color accuracy with the minimum effect of luminance, the mean chroma difference (ΔC00) was calculated for each scan. The color and chroma differences were calculated using the CIEDE2000 formulas [93]. For all of the tested scanners, the total color difference was the largest when using HDR mode with default tone mapping settings. The HDR scans acquired with the Leica RTC360 produced the smallest error when using these default settings. The results from the Faro S 350 were closest to the ColorChecker reference when the linear tone mapping settings were used or no HDR mode was used at all. As expected, the scan resolution setting did not show any significant effect on the color difference. For the chroma difference, using LDR produced generally more accurate chroma than using HDR. Of all tested scans, the Faro S 350 produced the smallest error while the Leica P40 had the largest differences in chroma values.
The measured exposure errors per scan are presented in Figure 6. The f-stop is the ratio between the focal length of the camera lens and the diameter of the entrance pupil. The HDR scans with default settings appeared to suffer from underexposure, except for the Leica RTC360 that produced the most accurate exposure. All the LDR scans suffered from overexposure, except the one collected with the Leica BLK360.
The measured mean white balance errors per scan are presented in Figure 7. The HDR scan with default tone mapping settings scanned with the Leica BLK360 clearly has the largest white balance error, while the Faro S 350 had the most accurate white balance.
A comparison between the measured color and the reference color for each patch of the ColorChecker chart is presented in Figure 8. The comparison showed visually perceivable differences in the chroma, white balance, and luminance values between the different scans and settings.
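As an illustration of the color difference metric used above, the following is a minimal sketch of the CIEDE2000 color difference between two CIELAB colors, with the parametric weights set to 1, and of its mean over a set of chart patches. It is not the analysis software used in this study; the function names `ciede2000` and `mean_delta_e` and the input layout (tuples of L*, a*, b*) are assumptions made for this example.

```python
import math


def ciede2000(lab1, lab2):
    """CIEDE2000 color difference between two CIELAB colors (kL = kC = kH = 1)."""
    l1, a1, b1 = lab1
    l2, a2, b2 = lab2

    # Step 1: rescale a* so that near-neutral colors are weighted correctly.
    c_bar7 = ((math.hypot(a1, b1) + math.hypot(a2, b2)) / 2.0) ** 7
    g = 0.5 * (1.0 - math.sqrt(c_bar7 / (c_bar7 + 25.0 ** 7)))
    a1p, a2p = (1.0 + g) * a1, (1.0 + g) * a2
    c1p, c2p = math.hypot(a1p, b1), math.hypot(a2p, b2)
    h1p = math.degrees(math.atan2(b1, a1p)) % 360.0
    h2p = math.degrees(math.atan2(b2, a2p)) % 360.0

    # Step 2: differences in lightness, chroma, and hue.
    d_l = l2 - l1
    d_c = c2p - c1p
    if c1p * c2p == 0.0:
        d_h_deg = 0.0
        h_bar = h1p + h2p
    else:
        d_h_deg = h2p - h1p
        if d_h_deg > 180.0:
            d_h_deg -= 360.0
        elif d_h_deg < -180.0:
            d_h_deg += 360.0
        if abs(h1p - h2p) <= 180.0:
            h_bar = (h1p + h2p) / 2.0
        elif h1p + h2p < 360.0:
            h_bar = (h1p + h2p + 360.0) / 2.0
        else:
            h_bar = (h1p + h2p - 360.0) / 2.0
    d_h = 2.0 * math.sqrt(c1p * c2p) * math.sin(math.radians(d_h_deg / 2.0))

    # Step 3: weighting functions and the rotation term.
    l_bar = (l1 + l2) / 2.0
    c_bar_p = (c1p + c2p) / 2.0
    t = (1.0
         - 0.17 * math.cos(math.radians(h_bar - 30.0))
         + 0.24 * math.cos(math.radians(2.0 * h_bar))
         + 0.32 * math.cos(math.radians(3.0 * h_bar + 6.0))
         - 0.20 * math.cos(math.radians(4.0 * h_bar - 63.0)))
    d_theta = 30.0 * math.exp(-(((h_bar - 275.0) / 25.0) ** 2))
    r_c = 2.0 * math.sqrt(c_bar_p ** 7 / (c_bar_p ** 7 + 25.0 ** 7))
    s_l = 1.0 + 0.015 * (l_bar - 50.0) ** 2 / math.sqrt(20.0 + (l_bar - 50.0) ** 2)
    s_c = 1.0 + 0.045 * c_bar_p
    s_h = 1.0 + 0.015 * c_bar_p * t
    r_t = -math.sin(math.radians(2.0 * d_theta)) * r_c

    term_l, term_c, term_h = d_l / s_l, d_c / s_c, d_h / s_h
    return math.sqrt(term_l ** 2 + term_c ** 2 + term_h ** 2 + r_t * term_c * term_h)


def mean_delta_e(measured, reference):
    """Mean CIEDE2000 difference over paired lists of CIELAB patch colors."""
    differences = [ciede2000(m, r) for m, r in zip(measured, reference)]
    return sum(differences) / len(differences)
```

In use, `measured` would hold the CIELAB values sampled from the rendered chart patches and `reference` the corresponding chart reference values; how the patch colors are converted to CIELAB is outside the scope of this sketch.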

Detail Reproduction
The selected quality metrics related to system sharpness and information capacity were measured from colorized 3D scans of the Siemens star chart. To estimate the level of noise in the colored point clouds, the signal-to-noise ratio according to ISO 15739 was measured from colorized 3D scans of the simplified ISO 15739 noise test chart for each scan. A summary of the results is presented in Table A3 in the appendices and for visual reference, the analyzed Siemens star charts and the simplified ISO 15739 noise test charts per scan segmented from colorized 3D point clouds with high-resolution settings (the closest to 3 mm @ 10 m) are listed in Tables A4 and A5 in the appendices.

Sharpness
As a metric for system sharpness, MTF curves were measured from the sinusoidal Siemens star charts. As expected, the results (see Figure 10) showed a clear increase in sharpness when the scan resolution was increased, whereas the differences between the tested dynamic range settings were less evident. The mean MTF curves for each scan for all the compared dynamic range settings are presented in Figure 9. When using LDR, the Leica P40 produced the sharpest result of all compared scans with a noticeable improvement in sharpness compared to the HDR scans. As expected, using HDR with the linear dynamic range settings appeared to reduce the sharpness with all scanners. Overall, the Leica BLK360 clearly produced the least sharp results at all tested settings. To summarize the system sharpness, the selected spatial frequencies of MTF50P and MTF10P for all tested scans are presented in Figure 10.
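The summary frequencies MTF50P and MTF10P are commonly understood as the spatial frequencies at which the MTF falls to 50% and 10% of its peak value. A minimal sketch of how such a frequency can be read from a sampled MTF curve by linear interpolation is given below; the function name `mtf_peak_fraction_frequency` and the sample curve in the usage note are hypothetical and do not reproduce the measurement software used in this study.

```python
def mtf_peak_fraction_frequency(freqs, mtf, fraction):
    """Return the first frequency past the MTF peak where the curve drops
    to `fraction` of the peak value (e.g., 0.5 for MTF50P, 0.1 for MTF10P).

    `freqs` and `mtf` are equally long sequences sorted by increasing
    frequency. Returns None if the curve never falls to the threshold.
    """
    if len(freqs) != len(mtf) or len(freqs) < 2:
        raise ValueError("freqs and mtf must have the same length (>= 2)")
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")

    peak_index = max(range(len(mtf)), key=lambda i: mtf[i])
    threshold = fraction * mtf[peak_index]

    for i in range(peak_index + 1, len(mtf)):
        if mtf[i] <= threshold:
            # Linear interpolation between the two samples around the threshold.
            f0, f1 = freqs[i - 1], freqs[i]
            m0, m1 = mtf[i - 1], mtf[i]
            return f0 + (m0 - threshold) * (f1 - f0) / (m0 - m1)
    return None
```

For a made-up curve with frequencies `[0.0, 0.1, 0.2, 0.3, 0.4, 0.5]` and MTF values `[1.0, 1.2, 0.9, 0.6, 0.3, 0.1]`, the peak is 1.2, so the thresholds are 0.6 for MTF50P and 0.12 for MTF10P. Referring to the peak rather than to the zero-frequency value makes the metric less sensitive to strong artificial sharpening, which raises the MTF above its low-frequency level.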

Information Capacity
The calculated Shannon information capacity per scan is described in Figure 11. Overall, the information capacity increased with the scan resolution. Amongst all the tested scans the Faro S 350 produced the highest information capacity, especially at a higher scan resolution. Further, the information capacity of the Leica BLK360 was the lowest of all the comparable tested scans.
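To illustrate why information capacity depends on both sharpness and noise, the sketch below evaluates the textbook Shannon capacity of a channel with frequency-dependent signal and noise power, C = ∫ log2(1 + S(f)/N(f)) df, using trapezoidal integration. This is the generic one-dimensional form and not the chart-specific procedure used to produce Figure 11; the function name `shannon_capacity` and the sample values in the usage note are assumptions.

```python
import math


def shannon_capacity(freqs, signal_power, noise_power):
    """Trapezoidal integral of log2(1 + S(f)/N(f)) over the sampled band.

    With frequencies in cycles/pixel, the result is in bits per pixel.
    """
    if not (len(freqs) == len(signal_power) == len(noise_power)):
        raise ValueError("all inputs must have the same length")
    if len(freqs) < 2:
        raise ValueError("at least two frequency samples are required")

    # Spectral efficiency at each sampled frequency.
    bits = [math.log2(1.0 + s / n) for s, n in zip(signal_power, noise_power)]

    capacity = 0.0
    for i in range(1, len(freqs)):
        capacity += 0.5 * (bits[i - 1] + bits[i]) * (freqs[i] - freqs[i - 1])
    return capacity
```

With a constant signal-to-noise power ratio of 3 over the band from 0 to 0.5 cycles/pixel, the integrand is log2(4) = 2 everywhere, so the capacity is 1 bit per pixel. A sharper system keeps S(f) high at higher frequencies and a less noisy system lowers N(f); both increase the integral.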

Noise
The signal-to-noise ratio according to ISO 15739 was measured from colorized 3D scans of the simplified ISO 15739 noise test chart for each scan, and the results are presented in Figure 12. Linear tone mapping improved SNR in all scans except the ones acquired with the Leica RTC360. The Leica P40 produced better results with default HDR than with LDR. On the other hand, using LDR improved the results with the Faro S 350 and especially with the Leica BLK360.
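For orientation, a signal-to-noise ratio in decibels can be sketched as 20·log10 of the mean pixel value of a uniform patch divided by its standard deviation. This is a simplified illustration only: the ISO 15739 procedure contains additional steps that are not reproduced here, and the function name `snr_db` and the patch values are hypothetical.

```python
import math


def snr_db(patch_values):
    """Simplified SNR (dB) of a nominally uniform patch: 20*log10(mean/std).

    `patch_values` is a flat sequence of pixel values from one patch.
    The sample standard deviation (n - 1 in the denominator) is used.
    """
    n = len(patch_values)
    if n < 2:
        raise ValueError("at least two pixel values are required")
    mean = sum(patch_values) / n
    variance = sum((v - mean) ** 2 for v in patch_values) / (n - 1)
    if variance == 0.0:
        return math.inf  # a perfectly uniform patch has no measurable noise
    return 20.0 * math.log10(mean / math.sqrt(variance))
```

Because interpolation between sparse points smooths the rendered patch, it lowers the standard deviation and raises this value, which is one reason SNR alone should be interpreted with care for low-density point clouds.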

Figure 12. Signal-to-noise ratio (in dB) for each tested scan measured from a simplified ISO 15739 noise chart.

Quality Score
The individual quality metrics for the color difference (ΔE00), white balance error (WBerr), sharpness (MTF50P and MTF10P), Shannon information capacity (C), and signal-to-noise ratio (SNR) were combined into a single quality score using Equation 1 and are presented in Figure 13. The results indicated that the Faro S 350 had the best colorization quality when using LDR or HDR with linear tone mapping. The Leica RTC360 produced the best colorization quality when using HDR with minimal manual processing steps (default tone mapping settings). The Leica BLK360 produced the lowest quality score for all tested settings. Using linear tone mapping appeared to increase the colorization quality for all the scanners except for the Leica RTC360. In addition, the scan resolution increase improved the colorization quality for all tested scanners.
Figure 13. Quality score combining the selected image quality metrics into one 3D point cloud colorization quality score for all tested scans and the photographic reference.

Discussion
We implemented a method to assess the point cloud colorization quality of modern commercial TLS systems with integrated imaging sensors. The capabilities to reproduce colors and details were investigated by applying established image quality assessment methods. The results showed clear differences between the tested scanners in all measured quality aspects and can be supported well with visual observations. These measured and perceived quality inconsistencies in point cloud colorization reduce the reliability and usefulness of the color information and can hinder the ability to produce uniform color information between different scan settings and scanners. With this research, we hope to raise the awareness of the importance of the quality of point cloud colorization and its implications on applications that rely on colored 3D point clouds. Better colorization quality leads to higher quality interpretation and analyses, and potentially increases the level of automation in various data processing and modeling tasks.
The established objective image quality metrics were successfully implemented and appear to be useful in evaluating TLS colorization quality even in the case where the quality of an imaging system is a complex combination of an unknown number of unknown factors related to the quality of the lens, imaging sensor, image processing (e.g., compression, tone mapping, sharpening, noise removal, panoramic stitching, and point cloud colorization) and the test chart. Nevertheless, our method can be considered suitable for comparing all TLS instruments, even those which are essentially "black box" types of systems, or those that rely on an external camera for point cloud colorization. Yet we hope to encourage scanner manufacturers to be more open about the imaging capabilities and specifications of their scanners. This would not only help in fixing the found errors and inconsistencies in the colorization but would also assist in making better-informed decisions on instrument selection. Additionally, at least for many scientific applications, fully controllable measurement parameters would be highly beneficial. The results suggest that the problems in the reproduction of color could be mitigated in data processing, while the issues related to the reproduction of details, such as sharpness or noise, are largely beyond the control of a user.

Color Reproduction
Furthermore, the measured and visually clearly detectable errors (e.g., reported in real-life conditions by [58]) in luminance caused by the inaccurate exposure time and white balance can be corrected using semi-automated or manual adjustments similar to those commonly used in editing photographs. Tools to correct these types of errors are already offered by the scanner manufacturers but their use can be discouraged due to lack of awareness and knowledge caused by vague settings and insufficient documentation, or the significantly increased workload when editing data of larger measuring campaigns from multiple scanning stations. In addition to the luminance, properly correcting the chroma error and verifying the corrections would require the scanner to be color calibrated in the field with a color reference chart like the ColorChecker and applicable tools (similarly to the color corrections done for a photogrammetric data set by [2]). This color calibration would improve the quality of the colored 3D point cloud data, but also the collected panoramic images that could also be used directly e.g., as a data source in virtual tour applications or in illuminating virtual scenes with HDR image-based lighting.
Lighting conditions have a significant impact on the colorization quality. In even and stable lighting conditions, such as in our test environment, using the HDR mode to colorize the scans appears to offer few benefits if the colorization is done with the default settings and without any scene-specific manual adjustments. In fact, scans colorized using LDR image data appeared to produce better results than the default HDR workflows. However, HDR can be expected to be much more beneficial in challenging real-life circumstances with changing and uneven lighting, such as in typical outdoor environments. It is also notable that all the tested point clouds included color information of 8 bits per channel, which does not exploit the full potential of HDR and thus favors LDR. Automatically compressing the luminance range from 32 bits to 8 bits per channel causes inaccuracies in the color data, but 8 bits per channel is arguably the most widely used and supported format for storing the color information in point clouds and displaying and handling it in various applications and viewers.
By processing the HDR data alternatively with linear settings we were able to assess the colorization quality with fewer unknown image processing steps compared to automated tone mapping. The clear improvement in quality between the default and linear tone mapping settings suggests that there is potential to achieve a better colorization quality by improving or optimizing the automated image processing pipeline.
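The difference between a linear mapping and a compressive tone mapping when reducing floating-point HDR luminance to 8 bits per channel can be sketched as follows. The scanners' actual default tone mapping operators are not documented here, so the global Reinhard operator L/(1 + L) is used purely as a generic stand-in; the function names and the `white_point` parameter are assumptions, and display gamma encoding is deliberately left out.

```python
def quantize_8bit(value):
    """Clip a value to the range 0..1 and quantize it to an integer in 0..255."""
    clipped = min(max(value, 0.0), 1.0)
    return int(clipped * 255.0 + 0.5)


def tone_map_linear(luminance, white_point):
    """Linear mapping: scale by a fixed white point and clip everything above it.

    Values keep their proportions below the white point, but highlights
    brighter than the white point are lost.
    """
    return [quantize_8bit(value / white_point) for value in luminance]


def tone_map_reinhard(luminance):
    """Global Reinhard operator L / (1 + L), used here only as an example of
    a compressive curve that trades proportionality for highlight detail.
    """
    return [quantize_8bit(value / (1.0 + value)) for value in luminance]
```

With the relative luminances `[0.0, 0.5, 1.0, 4.0]`, the linear mapping with a white point of 1.0 gives `[0, 128, 255, 255]`, clipping the brightest value, whereas the compressive curve keeps all four values distinct at the cost of altering their ratios. This kind of alteration is one way an automated pipeline can move the stored colors away from the measured radiometric values.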
On the other hand, the goal of these automated processes could be to produce more visually appealing colors than accurate ones. When speaking of color, it is crucial to note the difference between accurate and visually pleasing color. Automatic image processing is typically aimed towards providing pleasing color via processes such as saturation boosting and artificial sharpening. However, in many applications, especially in remote sensing, the truthful and unmodified presentation of radiometric values can be considered more important and useful than visual appearance (e.g., [105]), and even a precondition for some use cases such as imaging luminance measurement (e.g., [106]). Furthermore, it is arguably easier to adjust accurate colors to be visually pleasing than adjusting visually pleasing colors to be accurate.
The challenges in automated and scene-specific processing can be underlined with the observable errors in sharpness, colorfulness, exposure, and white balance between the tested scans as seen in Figure 14.
Figure 14. A comparison of cropped close-ups rendered from the colorized 3D point clouds of a printed photograph in the test environment. The LDR scan by the Faro S 350 produced the most accurate colors while the LDR scan of the Leica P40 produced the sharpest result, but both also suffered from blown-out highlights. The Leica RTC360 had the most accurate automated exposure and was the only HDR scan that was not visibly underexposed, perhaps explained by the fact that it does not rely on exposure metering in the field. Furthermore, in visual assessments, the Leica BLK360 produced the weakest results overall in respect to both detail and color with visible oversaturation and blurriness.

Detail Reproduction
A scanner's capability to reproduce details that transfer into color information is a complex combination of factors affected, e.g., by the scan resolution and the resolution of the images the color information is transferred from. In practice, the smallest detectable detail is governed by the sampling distance of both the laser and the camera sensor and, e.g., interpolation is required if the image resolution is not high enough. Additionally, the overlap required by the panoramic stitching affects the sampling. Thus, compared to color reproduction, assessing the factors related to detail is perhaps even more complex and difficult to measure, and moreover, to correct.
An illustration of how color values are sampled from equirectangular panoramic images into 3D point clouds is presented in Figure 15. Further, the ground sample distance (GSD) of the image data can be estimated using the physical size of the center circle (with a diameter of 23 mm) of the Siemens star chart to give a better picture of the limitations in transferring details from the image data into the 3D point cloud.
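The sampling of a color value for a 3D point from an equirectangular panorama, and the ground sample distance estimate described above, can be sketched as follows. The axis convention (z up, azimuth measured in the x-y plane, image center facing the +x axis), the nearest-pixel lookup, and the function names are assumptions made for this example and do not describe any manufacturer's colorization pipeline.

```python
import math


def point_to_equirect_pixel(x, y, z, width, height):
    """Map a 3D point in scanner-centered coordinates to the (column, row)
    of the nearest pixel in an equirectangular panorama of the given size.
    """
    distance = math.sqrt(x * x + y * y + z * z)
    if distance == 0.0:
        raise ValueError("the scanner origin has no viewing direction")

    azimuth = math.atan2(y, x)            # -pi .. pi, 0 along +x
    elevation = math.asin(z / distance)   # -pi/2 .. pi/2, 0 at the horizon

    # Azimuth spans the image width; elevation spans the height, top = zenith.
    u = (azimuth + math.pi) / (2.0 * math.pi) * width
    v = (math.pi / 2.0 - elevation) / math.pi * height

    column = min(int(u), width - 1)
    row = min(int(v), height - 1)
    return column, row


def ground_sample_distance_mm(feature_size_mm, feature_size_px):
    """Estimate the GSD from a feature of known physical size in the image."""
    if feature_size_px <= 0:
        raise ValueError("feature size in pixels must be positive")
    return feature_size_mm / feature_size_px
```

For example, if the 23 mm center circle of the Siemens star spanned 46 pixels in the panorama (a made-up figure), the GSD would be 0.5 mm per pixel. Points spaced more closely than the GSD would then sample the same or neighboring pixels, so increasing the scan resolution beyond that would not add color detail.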
Increasing the scan resolution unsurprisingly increased the level of measurable details, but the tested dynamic range settings appeared to show variable effects. This can be explained by unknown processing steps, such as sharpening or noise reduction, in the image processing pipeline. The measured sharpness and visual observations suggest that especially the data collected with the Leica P40 was affected by strong artificial sharpening. This can be seen as clearly visible halo effects at the edges and is mitigated in the linearly processed HDR datasets (Figure 16).
The calculated Shannon information capacity indicated the superior quality of the Faro S 350 and the photographic reference dataset. In theory, this novel image quality metric could be the best single quality metric to describe the potential detail reproduction of the whole system since it takes both sharpness (i.e., MTF) and noise into account and measures both from the same location. The measured results appear to be more in line with the visual observations than relying solely on single MTF measurements or a signal-to-noise ratio that on their own seem to be more sensitive to image processing.
All the tested scans were measured to be unsharp (or blurred) and especially the scans by the Leica BLK360 were significantly more blurred compared to the other tested scans. This was caused, at least partially, when some of the details were transferred in the colorization process from pixels into 3D points and then the 3D point data was transformed back into 2D image files where the gaps between the points were filled with interpolated values. This interpolation effect is stronger the lower the point density is. Thus, this data preparation process in effect denoises the analyzed images and to some degree proportionally favors the lowest resolution Leica BLK360 in the SNR measurements.

The Photographic Reference Dataset
As expected, the Nikon D800E DSLR camera produced a better combined quality score than all the scanners and the best overall color reproduction quality. On the other hand, the TLS scanners performed better at reproducing details than had initially been expected and in some cases (see Figure 10) exceeded the measured sharpness of the reference. However, the comparison between the scans and the photographic reference can be considered somewhat unreliable due to the inevitable differences in the data preparation, where a single photograph was analyzed directly instead of an image rendered from a 3D point cloud. Additionally, the selected 14 mm lens (used typically for photogrammetric applications) resulted in a much wider field of view (approximately 81.2° × 104.1°) when compared to the scanners (e.g., 48° × 62° with the Leica RTC360) from identical distances to the target. This favored the scanners, especially in the detail reproduction measurements.
From the imaging perspective, using a handheld camera allows more flexibility in selecting a desirable and optimal point of view, whereas TLS is tied to a more limited number of fixed locations that are typically governed by the requirements and limitations of laser scanning. Yet, the results might further indicate that using an external camera would not inevitably result in better colorization quality than that obtained from integrated camera sensors, especially if scene-specific manual adjustments are made to the images collected by the scanner prior to colorization.

The Effect of Data Collection Speed
When assessing the colorization performance of a terrestrial laser scanner, the speed of the data acquisition cannot be neglected. The speed of a TLS instrument depends on the selected scan and imaging settings (e.g., scan resolution and dynamic range settings such as the number of exposure brackets with HDR) and the performance of the instrument. Long data collection times can not only increase the duration and cost of the real-life projects but also reduce the quality of the collected data via changes in the environment and its lighting such as changing weather conditions or moving objects in the scene.
In this study, the data acquisition times differed strikingly between the scanners, with possible real-life implications. When collecting HDR data at high scan resolutions (closest to 3.0 mm @ 10 m), the fastest tested scanner (the Leica RTC360 at 2 min 42 s) was roughly ten times faster than the slowest scanner (the Faro S 350 at 26 min 44 s) with equivalent settings. To some degree, the scan time of the Faro S 350 was longer because a quality setting of "3x" had to be used to reduce noise in the distance measurement data and to make sure that there were no gaps in the point cloud data. Two of the fastest scanners (the Leica RTC360 and Leica BLK360) use three camera sensors mounted on the scanner body, instead of the more traditional approach where the scene is imaged through a single camera mounted coaxially with the laser. When collecting HDR image data, the imaging time is also affected by the exposure bracketing settings. The Leica P40 collected three brackets, whereas all the other instruments collected five.
Furthermore, in real-life situations in the field, the imaging time could make a difference as to whether to use HDR or LDR mode. For example, the Faro S 350 took 2 min and 7 s to acquire the LDR images, while collecting the HDR data took 10 min and 19 s. Hence, HDR imaging took 8 min and 12 s longer per scan (487% of the LDR imaging time), and this starts to accumulate if multiple scan stations are required. Furthermore, this suggests that the decisions made about imaging have a larger impact on the total data acquisition time than those made about the actual laser scanning, the only exception being the scans collected at the high-resolution setting of the Faro S 350, where the scan duration was significantly longer than with the other tested scanners.
The relation between the quality score and scanning speed is further illustrated in Figure 17, where the scanning speed was calculated as the inverse of the total scan time for each tested scan at a high scanning resolution (closest to 3.0 mm @ 10 m). Like the importance of colorization quality, the importance of the instrument speed also depends on the type of use case and the nature of the project.
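The timing comparisons above can be checked with a few lines of arithmetic. The sketch below uses only the durations reported in this section; the helper names `to_seconds`, `time_ratio`, and `scanning_speed` are introduced for this example.

```python
def to_seconds(minutes, seconds):
    """Convert a duration given as minutes and seconds into seconds."""
    return minutes * 60 + seconds


def time_ratio(slower_s, faster_s):
    """How many times longer the slower duration is than the faster one."""
    return slower_s / faster_s


def scanning_speed(total_scan_time_s):
    """Scanning speed as the inverse of the total scan time (scans per second)."""
    return 1.0 / total_scan_time_s


# Faro S 350 imaging times reported above.
ldr_s = to_seconds(2, 7)         # 127 s
hdr_s = to_seconds(10, 19)       # 619 s
extra_s = hdr_s - ldr_s          # 492 s, i.e., 8 min 12 s
hdr_vs_ldr = time_ratio(hdr_s, ldr_s)        # about 4.87, i.e., 487% of the LDR time
extra_percent = 100.0 * extra_s / ldr_s      # about 387% more than the LDR time

# Total HDR scan times at the high-resolution setting.
rtc360_s = to_seconds(2, 42)     # 162 s
faro_s = to_seconds(26, 44)      # 1604 s
speed_ratio = scanning_speed(rtc360_s) / scanning_speed(faro_s)  # about 9.9
```

The ratio of the two scanning speeds equals the inverse ratio of the scan times, which is why the Leica RTC360 comes out roughly ten times faster than the Faro S 350 at these settings.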

Study Limitations
Our method assesses the colorization quality by analyzing the results of the 3D point cloud colorization process. Thus, it reflects the real-life situation where colored 3D point cloud data is directly visualized or used in any application. The tests did not include raw panoramic pictures. While these would have given a better indication of the potential imaging quality, they would not have reflected the final point cloud colorization quality as well. Furthermore, the raw image data of single image frames could not be taken into account since that was not equally available from the different scanners we tested. Additionally, the effect of color blending that could cause potential artifacts between multiple scan stations was not taken into account.
In practice, testing all possible image quality metrics was not feasible. Therefore, we selected the metrics that we assumed to be most suitable for describing the colorization quality of TLS-based 3D point clouds. Our method combines the individual quality metrics into a comparable quality score, treating all selected metrics as equally important and applying no weighting. The use of a geometric mean was observed to be suitable for this purpose. Finally, since the evaluation was based on the results of a complex chain of processes performed in the scanner and later in the manufacturer's software, the results may change with different software and firmware versions.
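The unweighted combination described above can be sketched in Python as follows. This is an illustration only: it assumes that each metric has already been normalized to a strictly positive score where higher is better, which is an assumption of this sketch rather than a description of the exact normalization used in the study. The function name, metric keys, and values are hypothetical placeholders.

```python
import math


def combined_quality_score(metric_scores: dict) -> float:
    """Unweighted geometric mean of positive, already-normalized metric scores."""
    values = list(metric_scores.values())
    if not values:
        raise ValueError("at least one metric score is required")
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean requires strictly positive scores")
    # Averaging the logarithms is equivalent to taking the n-th root of the product.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical normalized scores (placeholders, not measured results).
example_scores = {
    "color_accuracy": 0.8,
    "sharpness": 0.5,
    "information_capacity": 0.5,
    "signal_to_noise_ratio": 0.8,
}
score = combined_quality_score(example_scores)
```

Compared with an arithmetic mean, the geometric mean penalizes a scan that performs very poorly on a single metric, since one low factor pulls down the whole product. Every metric enters with the same exponent, which corresponds to the equal-importance assumption.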

Future Research Directions
Future research topics include the radiometric calibration of TLS instruments from the perspective of point cloud colorization and the use of a similar approach to evaluate the colorization quality of photogrammetric 3D point clouds or point clouds colorized with an external camera. Further, multispectral or hyperspectral laser scanning holds great potential for removing the need for additional camera sensors in laser scanning altogether. The intensity data of multiple wavelengths could be combined to produce the color information for 3D point clouds directly and actively, as well as to advance existing applications (e.g., automated classification of materials) or create completely new applications (e.g., illumination-independent texturing). Despite its promise, multispectral and hyperspectral laser scanning is still in its early stages, with operational sensors existing mostly as research prototypes.
In general, the application space of 3D point cloud data is expected to grow in the future, with point clouds increasingly used directly without complex meshing or vectorization workflows. As an example, recent advances in point-based rendering are enabling the direct use of colored 3D point clouds in real-time rendered environments such as game engines (e.g., Unreal Engine [107]) or on the web (e.g., Potree [108]).
In luminance measurement, the photography parameters should be fully controllable and should preserve the captured data unchanged [109]. TLS systems could be applied to measure a 3D luminance model to be used by architects, lighting designers, or illumination engineers. However, uncontrollable or automatic exposure makes the data useless for luminance measurement. Furthermore, certain common post-processing practices, such as unsharp masking or compressing the luminance range (bit depth), make the data less suitable for luminance measurement. High fidelity in color capture and controllable imaging would also be essential when TLS is used to capture data for surface material classification, in which both the color and the luminance data are important. As the measurement direction is known, the variation in the luminance data can be used to determine the specularity and diffusivity parameters of the surface.

Conclusions
Despite the widespread application and indisputable usefulness of colored point clouds, insufficient attention has been put into investigating the colorization quality of TLS-derived 3D point clouds. Previous quality studies related to TLS data have focused largely on various aspects of geometric quality or on the quality of the point intensity values.
We successfully developed a test method to evaluate the point cloud colorization quality of modern commercial TLS systems with integrated imaging sensors. Our method assessed the capability of the tested scanner systems to reproduce colors and details of the scene by measuring the objective image quality metrics: color accuracy (ISO/CIE 11664-6), sharpness (ISO 12233), information capacity, and signal-to-noise ratio (ISO 15739) from test charts (X-Rite ColorChecker, sinusoidal Siemens star (ISO 12233), and a simplified noise test chart (ISO 15739)). Furthermore, the individual quality measurements were summarized into one combined quality score to demonstrate the usefulness of our method in benchmarking the colorization quality of any TLS instrument.
Our study found clearly noticeable quality issues in the tested 3D point clouds that reflect considerable differences between the tested terrestrial laser scanners: the Leica ScanStation P40, Faro Focus S 350, Leica RTC360, and Leica BLK360. The Faro Focus S 350 produced the best colorization quality results when using LDR imaging mode or HDR with linear tone mapping. The Leica RTC360 performed the best when using HDR with default tone mapping settings, and the Leica BLK360 produced the lowest quality score among all tested settings. The results were in line with visual observations and suggest that the problems in color reproduction (i.e., measured differences in color, white balance, and exposure) could be mitigated in data processing, while issues related to detail reproduction (i.e., measured sharpness and noise) are less within the control of the scanner user. However, the processes and tools for fixing these problems (e.g., in-field color calibration) are either not yet well established or would be labor-intensive to apply in a real-life project setting. Furthermore, the data acquisition times differed significantly between the tested scans and scanners (e.g., there was a 10× difference between the fastest scanner, the Leica RTC360, and the slowest scanner, the Faro Focus S 350). This has implications for the real-life performance of the instruments, and it is often the decisions made about the imaging settings (e.g., whether to use HDR or LDR) that have the greatest impact on the total data acquisition time spent in the field.
Although TLS systems are increasingly efficient and accessible measuring instruments, there is a need to develop better and more accessible colorization tools and workflows, as well as automated image processing pipelines, that would increase not only the quality but also the production efficiency of 3D point cloud colorization. This focus on colorization quality would increase the direct applicability of colored 3D point clouds in various visually and radiometrically demanding use cases that require reliable object interpretation and recognition, visual analysis, or photorealism. Further, in many remote sensing applications, the truthful and unmodified presentation of radiometric values can be considered more important and useful than visual appearance. This development would be relevant and useful not only in traditional application areas such as engineering, surveying, or cultural heritage but increasingly in emerging application fields such as virtual production in the film industry or 3D content creation for video games and immersive experiences.

Table A5. The analyzed simplified ISO 15739 noise test charts (8-bit/sRGB) for each scan segmented from colorized 3D point clouds with high scan resolution setting (closest to 3 mm @ 10 m).