Automatic extraction of pavement markings on streets from point cloud data of mobile LiDAR

Pavement markings provide an important foundation as they help to keep roads users safe. Accurate and comprehensive information about pavement markings assists the road regulators and is useful in developing driverless technology. Mobile light detection and ranging (LiDAR) systems offer new opportunities to collect and process accurate pavement markings’ information. Mobile LiDAR systems can directly obtain the three-dimensional (3D) coordinates of an object, thus defining spatial data and the intensity of (3D) objects in a fast and efficient way. The RGB attribute information of data points can be obtained based on the panoramic camera in the system. In this paper, we present a novel method process to automatically extract pavement markings using multiple attribute information of the laser scanning point cloud from the mobile LiDAR data. This method process utilizes a differential grayscale of RGB color, laser pulse reflection intensity, and the differential intensity to identify and extract pavement markings. We utilized point cloud density to remove the noise and used morphological operations to eliminate the errors. In the application, we tested our method process on different sections of roads in Beijing, China, and Buffalo, NY, USA. The results indicated that both correctness (p) and completeness (r) were higher than 90%. The method process of this research can be applied to extract pavement markings from huge point cloud data produced by mobile LiDAR.

Pavement markings provide an important foundation as they help to keep roads users safe. Accurate and comprehensive information about pavement markings assists the road regulators and is useful in developing driverless technology. Mobile light detection and ranging (LiDAR) systems offer new opportunities to collect and process accurate pavement markings' information. Mobile LiDAR systems can directly obtain the three-dimensional (3D) coordinates of an object, thus defining spatial data and the intensity of (3D) objects in a fast and efficient way. The RGB attribute information of data points can be obtained based on the panoramic camera in the system. In this paper, we present a novel method process to automatically extract pavement markings using multiple attribute information of the laser scanning point cloud from the mobile LiDAR data. This method process utilizes a differential grayscale of RGB color, laser pulse reflection intensity, and the differential intensity to identify and extract pavement markings. We utilized point cloud density to remove the noise and used morphological operations to eliminate the errors. In the application, we tested our method process on different sections of roads in Beijing, China, and Buffalo, NY, USA. The results indicated that both correctness ( p ) and completeness (r) were higher than 90%. The method process of this research can be applied to extract pavement markings from huge point cloud data produced by mobile LiDAR.
Keywords: laser pulse reflection intensity, mobile LiDAR, automatic extraction, RGB, point density (Some figures may appear in colour only in the online journal) Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Introduction
Road information is one of the most important parts of basic geographic information, and accurate and high-precision road information plays an important role in urban planning, traffic control, and emergency response [1][2][3][4]. At present, the main record of spatial vector road data is two-dimensional (2D) information; however, it has been unable to meet the needs of three-dimensional (3D) navigation and intelligent city modeling [5]. A laser scanning or light detection and ranging (LiDAR) system is one of the most important technologies to capture spatial data in a fast, efficient and highly effective way. It has been widely used in many fields, such as digital city, basic surveying and mapping, urban planning, transportation, and environment protection. It can directly obtain the 3D coordinates of object for 3D object reconstruction and digital elevation model generation [6]. Mobile LiDAR systems have been actively studied and implemented in the past few years, for example the vehicle-borne laser measurement system [7], StreetMapper [8][9][10][11], LYNX [12,13] and FGI Roamer [14,15]. However, when compared with developments in mobile LiDAR systems, automated algorithms from mobile LiDAR point clouds fall rather behind; this is due to the huge data volumes and the complexity of urban pavement markings. To extract pavement markings, mobile LiDAR point clouds need to be classified into different categories, which is a key step in the accurate identification of pavement markings.
A large amount of research has been recently published on vehicle-mounted or mobile laser scan data. These publications can be placed into two categories: (1) point cloud classification [16][17][18][19] or (2) road information extraction [15,16,20]. However, few studies have been published on the extraction of pavement markings and algorithms cannot be applied to different fields well when extracting the pavement markings. Fang and Yang [21] classified three types of roads and built a digital road model with three indexes: height, point cloud density and slope. They extracted structural road information by analyzing the spatial distribution of point cloud and statistical characteristics of the laser scan lines. Boyko and Funkhouser [22] obtained a 2D connection diagram by approximating the road map network, and gradually producing a digital map of roads with established routes; they extracted road information on a large scale. Hernández and Marcotegui [23] determined the road boundary by the elevation change and differences along the cross-section of the road studied. The hypothesis of this study is that the road surface should be on the same elevation and should be relatively even. However, owing to the different objectives of algorithm design and the low accuracy of the extraction results, all of the mentioned research extracted road information rather than pavement markings.
Kumar et al [24] described an automated way of applying a range of dependent thresholding functions to the intensity values to extract road markings. They use binary morphological operations and generic knowledge of the dimensions of road markings to complete their shapes and remove other road surface elements introduced through the use of thresholding. Yang et al [25] conducted the extraction of road markings by filtering the point cloud using point cloud coordinate locations and distinguishing between target points and other points by laser pulse intensity of scanning. Kumar et al [26][27][28] provided a combination of two modified versions of the parametric active contour or snake model in which the parameters are empirically selected and fixed for all of the road sections. After analyzing the relationship between intensity, distance, and angle, Huang et al [29] eliminated the errors and noise of the original data collected, and classified it into 16 levels using laser pulse reflection intensity. Different intensity levels correspond to different targets of geographic features. However, the intensity of laser scanning is easily affected by the surroundings, such as the weather, the environment, the distance of the laser pulse origin, and so on, hence this absolute quantitative classification of laser pulse reflection intensity cannot be used for various applications of laser scanning data processing. This paper is aimed at the problems mentioned above and proposes a process method of extracting city road pavement markings by collecting the attribute information of vehiclemounted point cloud laser scanning data, which can be widely applied. The method process utilizes X, Y, Z coordinate locations, RGB, laser pulse reflection intensity, and point cloud density to identify and extract pavement markings. It uses multiple methods of multi-level layered filtered extraction instead of using single attribute judgment to extract the pavement markings in order to enhance the extraction method of extensive practicability. The process method utilizes car driving recorder data to change the X, Y, Z coordinates according to the direction of the vehicle. The coordinate system is then used to calculate the spatial characteristics of the local point cloud of a plurality of scanning lines. Given full play to the use of synchronous vehicle photos, in the color point cloud it will be converted into the data point attribute RGB gray level calculation to improve the accuracy and efficiency of point cloud target extraction. We put forward point cloud computing to help in vehicle target extraction of the point cloud based on the laser reflection intensity difference process compared with the previous algorithm. Making up for a little bit of cloud strength value unstable defects, an increase in the application range of cloud point extraction can simultaneously extract between the three-lane road lines to provide assistance in the development of unmanned technology.

Extraction of vehicle-borne laser colored point clouds
In this study, we applied the SSW Vehicle-Borne Mobile Model (figure 1); details can be found at www.jx4.com/en/. It was developed by Capital Normal University and Beijing Geo-Vision Technical Company and it includes a 360° laser scanner, IMU, GPS, and CCD camera. This system adopts a laser scanner with a long range, high accuracy, a large field of view (FOV), and an IMU with high accuracy. Compared with similar products, the advantages of this SSW model are its: short initial time span, easy operation procedure, high accuracy, and automatic classification and modeling.
The SSW system was integrated through the mechanical structure. Time synchronization was ensured with GPS time as the main line and the absolute coordinates of the measured target points were obtained through integrated navigation and a mutual structural relationship. Among them, time synchronization achieved a unified time system of the laser scanner, and IMU and camera through the GPS marking format, namely the GPS time system, so as to realize the data consistency of the system at every moment. The integrated navigation was realized by using GPS and IMU to acquire the attitude and position data of the system at every moment combined with the calculation. The system obtained the data by rotating the laser scanning beam in a clockwise direction around the rotating axis at a high speed, and gained the coordinates in the instantaneous scanning coordinate system through post-processing. The laser scanning device, IMU, the GPS receivers, and the vehicle-borne platform were bound in a rigid way, and the position and attitude relationship among the laser scanning coordinate system, GPS, and IMU was obtained through precise stand-alone calibration and system calibration. In the end, the coordinates of the measured object could be converted to the geodetic coordinate system through the formula outlined below. The relevant attribute data could be identified by photos synchronously collected by the camera. The pixels of photos synchronously collected by the camera corresponded to the laser point clouds to generate the laser point clouds that included all kinds of attribute information.
(1) The equation (1) can be expressed as a vector, is the position vector of the laser footprint in the homoeopathic laser beam coordinate system; R W and R G are the coordinate transformation matrix and coordinate rotation matrix related to the current position; R N and R L are the rotation matrices related to the measured or interpolated attitude angle and scanning angle; and R M refers to the rotation matrix of the installation error.
Through the above formula and the relevant calibration parameters, the laser point cloud data were converted to the WGS-84 absolute coordinate system, and then the data were used based on practical needs.

Automatic extraction method process of road pavement markings
Since laser scan point cloud data has the characteristics of large volume, high uncertainty, and noise, it is necessary to conduct a pre-process prior to the computing process of feature extractions. Based on the vehicle laser scanning route, distances between points, and local point cluster representing the different planes of geographic features, the noise and errors can be filtered and eliminated. Both the volume of data and the computational time can be reduced by pre-processes.
The major scanning method of the vehicle-mounted laser scanning system is linear scanning. Different geographic features or objects have different spatial distributions. Aiming at the automatic extraction of road pavement markings, we proposed a novel method process for pavement marking extractions. First, scanning lines were extracted from dispersed laser reflection points according to their recorded GPS time or reflection angle (figure 2). Secondly, road surface information  was extracted by changes in the vertical height and fluctuation of point clouds on the road surface utilizing the spatial distribution characteristics of the laser points on the scanning lines. In essence, the extraction of the target point cluster is accomplished through a series of sequential selection filtering of point clouds by vertical elevation difference, RGB difference of adjacent points, changes of point density, laser pulse reflection intensity, and intensity differences. Finally, a density-based clustering of the extracted pavement marking point clouds was carried out. The density-based spatial clustering of applications with noise (DBSCAN) clustering method [30] was applied to cluster the boundary points on the whole road by setting the clustering search radius R d and forming Min d , the minimum point of the clustering cluster. After the clustering was completed, the point cloud fitting calculation was performed for a single class and the vectorization of each pavement marking was realized based on the traffic direction. Point clusters of clear and accurate pavement markings can be obtained by eliminating noise as per the corresponding attribute information estimation.

Extraction of vehicle-mounted point cloud laser scanning lines
Points recorded by the vehicle-mounted scanning system are sequenced by their returning time. The difference of scanning angles of successive lines of the points recorded in the system is of fixed value, which is usually the angle resolution of the laser scanner. When the FOV is the sky, no laser points are reflected, and an irregular jump is seen between the last scanning point ( p tj ) and the first one ( p tj+1 ) in the following scanning line ( p tj and p tj+1 are the two consecutive points recorded by the system), as is shown in figure 3. Accordingly, an irregular jump in the GPS time gap between p tj and p tj+1 can be retrieved. Therefore, the time gap and angle difference between two adjacent lines of points can be calculated according to the formulas indicated below. In this case, the points in dispersion can be arranged into a series of 2D lines. Every 2D line or scanning line is an approximation of a crosssection of the road [21]: where in these two equations p tj+1 (θ1) and p tj (θ2) are the scanning angle values of the adjacent points; p tj+1 (t − gps) and p tj (t − gps) are their recorded GPS time, respectively; and ∆θ and ∆t are the thresholds of the angle difference and time gap of the scanning line whose values are affected by the angle resolution and pulse frequency of the laser scanner.

Extraction of road features based on point cloud elevation difference
Every single point among the laser scanning data has its own X, Y, Z coordinates. Based on these coordinates, we selected the measurement platform of the laser scanner as the horizontal plane. Therefore, the Z direction could be considered perpendicular to the road surface. Since the height of the measured vehicle was also known, we could basically obtain the approximate location of the road surface relative to the measurement platform; so as to determine the threshold value selection in the Z direction ( figure 4), we deleted all of the point clouds above the height of the road surface and gradually narrowed the selection range of the target point clouds Z pi < Z pe . Results can be achieved through the method of filtering point cloud using the elevation difference of the Z value. In this research, the relative Z pe elevation difference between the center point and the surrounding points was utilized to extract leveled road features. Since the road surface is relatively flat, there will not be sudden changes in the vertical Z direction. Thus, it is believed that if there is a large change in the Z direction, then it means there is a road boundary (the curb), so as    to initially exclude interference other than the point clouds on the road surface.
Point clouds are sequenced by the laser scanning route. A single scanning cross-section is similar to a straight line in 2D spaces, while multiple scanning lines can be grouped into a nearly rectangular array (figure 5). We used a nearly square array to interpret and filter point clouds in this research. Different sizes of nearly square arrays were experimented with, such as square arrays of 3 × 3, 5 × 5, and 7 × 7 ( figure 6). It was found through using multiple data files for testing that the 5 × 5 point array is the most suitable filter for extracting road pavement. Subsequently, the filtering algorithm follows the 5 × 5 point array arrangement, and uses the Z coordinate of the point cluster generated for calculations, where the coordinates of the 5 × 5 array is Z p0 , Z p1 , Z p2 , Z p3 ...... Z p24 , respectively, in this case. Z represents the sum of the absolute value of the difference (∆Z i ) between center point A 0 and each surrounding point. The value of Z generally shows the difference of the distance between the current center point and the surrounding points in predicting the road plane. By limiting the maximum of the threshold value of Z , the road plane can be extracted ( Z < Z t ):

Extracting boundaries of different objects based on grayscale difference
Grayscale is the measurement of the darkness of an object. It utilizes absolute black as the standard to judge the different saturation showing on an image. The gray digital image, also known as black and white, is made up of pixels that contain only the intensity information, from the weakest, black, to the strongest, white [31]. In this research, RGB color images were collected simultaneously with laser scanning. The color imagery element was assigned to each of the laser pulse points using fusion technology. The fusion process was created by matching the X, Y, Z coordinate locations. The original purpose is to add the real scene attribute to the digital models. The relative difference of the grayscale is basically stable if the laser scanning data and imagery data are collected simultaneously. We used the formula (gray = R × 0.299 + G × 0.587 + B × 0.114) to convert RGB into gray scale to conduct the calculation [32,33]. Thus, we used the grayscale difference to identify the boundaries of different objects. The average area ratio can be used to structure the difference plot. We can confirm the changes according to the difference plot divided by the average area ratio [34]. The present study applied Otsu's method to calculate the grayscale variation line of the point clouds on the surface of the road [35,36]. The Otsu algorithm divides the grayscale of an image into two groups with one specific assumed gray value t. When the between-class variance between the two groups reaches its maximum value, this gray value t is the optimal threshold value of image binarization.
Suppose the image has L gray values within the value range of 0 to L − 1, in which the gray value T is chosen to divide the image into two groups, namely G 0 and G 1 . The gray value of the pixel contained in G 0 is within the range of 0-T and that contained in G 1 is within the range of T + 1-L − 1. N represents the total number of the image pixels, and n i is the number of pixels whose gray value is i. p i = n i /N is the probability for each gray value i to appear; assume ω 0 and ω 1 are the proportions occupied by the number of the two groups of pixels, G 0 and G 1 , in the overall image, and the average gray values of the two groups are µ 0 and µ 10 , respectively, then we can gain: The total average gray value of the image: µ = ω 0 µ 0 + ω 1 µ 1 . Between-class variance: The optimal threshold value: T = argmax(g(t)), which refers to the corresponding t value when the between-class variance reaches its maximum value.
To guarantee the accuracy and completeness of the target point clouds, a 5 × 5 point cloud square matrix with the extraction point as the center was used as the target point cloud for subsequent calculations.

Extracting road pavement markings based on the laser pulse reflection intensity of point cloud and intensity change
Laser intensity is a special measurement value of the terrestrial 3D laser scanner, but it has not been fully and effectively applied in current research. Affected by factors such as system error, atmospheric conditions, and scanned geometrical shape, the intensity value is likely to have large deviations, and thus it is hard to truly reflect the target information [37]. However, if it is distinguished only by the reflection intensity of different substances under certain circumstances, a relatively good effect can still be achieved. As shown in figure 7, all of the laser points on a common street were rendered according to their intensity values and the intensity region was 0-2048.
Under normal circumstances, for terrestrial 3D laser scanners, factors such as relatively close scanning distance and atmospheric attenuation can be negligible. Equation (5) expresses the relationship between the laser intensity value I and the target reflectivity ρ; the laser distance measuring value R, the laser incident angle θ, and the transmitting power P E : For the equipment used in the study, θ is completely consistent. Assuming that the transmitting power P E is stable, equation (5) can be simplified as: The general form for the laser radar to act on the distance equation is: Herein, P R is the power of the receiving laser; P E is the power of the emission laser; G E is the transmitting antenna gain; σ is the target scattering section; D is the receiving aperture; R is the laser distance measuring value; η Atm is the one-way atmospheric transmission coefficient; and η Sys is the optical system transmission coefficient of the laser radar. As for the expanded Lambert scattering targets, the LiDAR equation can be simplified as [38]: In the equation, ρ is the average reflection coefficient of the extended targets and θ is the laser incident angle. For close range terrestrial 3D laser scanners, the atmospheric transmission coefficient η Atm can be neglected, namely: If c is constant, then equation (6) can be further simplified as: In the present study, the incident angle θ is fixed and the emission power P E is stable, thus, equation (10) can be further simplified as: In the equation, C = ccosθπP E η Sys /4. It is a coefficient related to the laser incident angle θ, the emission power P E , and the optical system transmission coefficient η Sys . Therefore, the intensity value of the laser point cloud is mainly affected by target reflectivity and distance.   In order to efficiently and accurately utilize the intensity attribute of point clouds, a statistical data analysis of the laser intensity of currently available equipment was carried out. A section of the street was selected as the test area and the data points on the road surface were collected manually. As indicated by the study, with a relatively high reflectivity, the pavement markings can overcome the change of distance within a certain range and directly distinguish the intensity value. As shown in figure 8, within the distance range of seven meters, differences among the reflection strength values of the road surface paint are relatively larger compared with other parts of the road. However, since such differences are not so obvious beyond the distance range of seven meters, we made the judgment based on the strength difference. Figure 8 shows that if the point is closer to the laser scanner, its reflection intensity value is generally greater. The reflection strength of the road pavement marking is usually above 1750, while the reflection strength of ordinary road pavement rarely reached this level of intensity. Therefore, the target object or the road pavement marking in this research is relatively easier to be identified within a distance of seven meters. When the scanning distance is equal to or greater than seven meters, the reflection intensity difference is much smaller. Therefore, it is not possible to establish a numerical   threshold of reflection intensity in order to find the pavement markings directly. In this research, we developed the differential intensity threshold method of adjacent points in order to judge the boundaries of road pavement markings.
This method significantly improved the accuracy and precision of road pavement marking extraction.
The method of differential intensity is similar to the two filtering methods previously indicated in this research. We calculated the differential intensity of points in 3 × 3 point clusters. I is the sum of the absolute difference of intensities between the eight points and the central point. Based on our experiments so far, the outline of pavement markings can be clearly identified and extracted with the differential value ranges being greater than 30.

Erasing the points of noise and error based on point cloud density
The clear outline of the point clusters of pavement markings can be identified after the steps outlined above. However, some sporadic points still exist, which might create problems for tasks, such as building digital solid models. Therefore, we use point cloud density to eliminate those points.
As shown in figure 9, with the vertical road surface direction as the plane, one point was selected as its center of the grid, utilizing the X Y coordinate direction to count the number of points in a square of 0.4 m on each side. Using a threshold number, we can judge whether the targeted central point is a sporadic point or not. If it is, the point would be removed. There is a very slight difference in Z values, which would not affect the results of this method. In the meantime, the number of points in this cluster is very small and we can quickly evaluate each of the points. After computing all of the points in the cluster, the accurate point cluster data of pavement markings will eventually be obtained.

Partial point cluster vectorization
Cluster and feature analyses of the discrete points extracted through the pavement markings need to be carried out to  eliminate fake target points. Meanwhile, a vectorization of the discrete points should be conducted to accurately express the pavement markings. The process is mainly divided into three steps: (1) conduct a density-based clustering of the target point clouds; (2) put forward and combine the clustering clusters based on the feature analysis; and (3) vectorize the clustering cluster of each pavement marking.
The DBSCAN clustering method [31] is used as the density clustering method. By setting the clustering search radius R d and forming Min d , the minimum point of the clustering clusters, the method clusters all pavement markings on the road. A set of continuous points is obtained through density clustering with each cluster as the pavement marking of a section of the road. Feature analysis can be used to determine the shape of the point set and the width of the pavement marking is 10 cm along the direction of the traffic. Thus, if the width of the clustered point set exceeds 10 cm, it will be determined as a fake point set and be eliminated. In the end, linear fitting of each point set gained is carried out to realize vectorization.

Research data
This research used the data collected by the SSW Vehicle-Borne Mobile Model system. The scanning range of data was roughly 2000 m × 300 m, with an approximate point number of 20 million. The data were extracted by scanning objects, such as roads, trees, buildings, and guardrails. The roads were in an urban area, and were relatively level and smooth with clear pavement markings. The data are shown in figures 10-12. Figure 13 shows the results of the original point cloud after being extracted by our proposed algorithm. Then, the point cloud data were vectorized ( figure 14). It is obvious that the extracting work of road pavement markings in this research yielded complete and accurate results. Figures 15 and 16 show the results of the other areas of the point cloud. This suggests that using the combined methods of differential point cloud elevation, differential grayscale, differential laser scan pulse reflection intensity, and point cloud density are effective in extracting road pavement markings. We can extract the target geographic feature clearly and achieve the expected accuracy and goals.

Result assessment
In order to assess the accuracy and precision of the extracted road pavement markings, evaluations on the results were carried out according to the following models.
(1) Correctness: p = √ X 2 +Y 2 √ X 2 +Y 2 + √ ∆X 2 +∆Y 2 . X is the target center X coordinate; Y is the target center Y coordinate; ∆X = X 1 -X 2 ; ∆Y = Y 1 -Y 2 ; X 1 is the target center coordinate; X 2 is the center coordinates of extraction results; Y 1 is the target center coordinate; Y 2 is the center coordinates of extraction results.
(2) Completeness: r = s1 s2 . S 1 = (X MAX -X MIN ) × (Y MAX -Y MIN ); X MAX is the maximum X coordinates after point cloud cluster; X MIN is the minimum X coordinates after point cloud cluster; Y MAX is the maximum Y coordinates after point cloud cluster; Y MIN is the minimum Y coordinates after point cloud cluster; S 2 is the actual measurement area point cloud. Results of each of the three categories are shown in table 1. The evaluation data indicates that two indexes (p, r) are above 90%. The average time for extracting each data set was 22.3 s. The assessment shows this method process of extracting pavement markings is accurate and precise and can be effectively used for large data set processing.

Conclusion
Using point cloud data of a vehicle-mounted laser scanning system, this research proposed a novel automatic extraction method process of road pavement markings, in particular for urban streets using some attribute information of the point cloud data, including location coordinates, grayscale of RGB, and laser pulse reflection intensity. First, this research proposed to apply the combined methods of differential grayscale, differential reflection intensity, and point clouds density identification to extract the target objects. In comparison to the existing method of road pavement markings, the method process developed and applied in this research produces more accurate and more complete results of road pavement markings efficiently. Results of the trial evaluation indicate both the accuracy and the completeness of point cloud extraction of pavement markings are higher than 90%; therefore, this method process can be used for extracting pavement markings, streets, and roads from large LiDAR point cloud data. The algorithm proposed in this research lays the foundation of effective road feature extractions for 3D modeling. The methodology proposed by this research facilitates building and maintaining the 'digital city' or 'smart city' worldwide.