Precise vehicle ego-localization using feature matching of pavement images

Purpose – Precise vehicle localization is a basic and critical technique for various intelligent transportation system (ITS) applications, and it must adapt to complex road environments in real time. The global positioning system and the strap-down inertial navigation system are two common techniques in the field of vehicle localization. However, the localization accuracy, reliability and real-time performance of these two techniques cannot satisfy the requirements of some critical ITS applications such as collision avoidance, vision enhancement and automatic parking. To address these problems, this paper proposes a precise vehicle ego-localization method based on image matching. Design/methodology/approach – This study includes three steps. Step 1, extraction of feature points: the local features in the pavement images are extracted using an improved speeded up robust features algorithm. Step 2, elimination of mismatched points: a random sample consensus algorithm is used to eliminate mismatched points between road images and make the matched point pairs more robust. Step 3, matching of feature points and trajectory generation. Findings – Through the matching and validation of the extracted local feature points, the relative translation and rotation offsets between two consecutive pavement images are calculated and, eventually, the trajectory of the vehicle is generated. Originality/value – The experimental results show that the studied algorithm achieves decimeter-level accuracy and fully meets the demand of lane-level positioning in some critical ITS applications.


Introduction
Precise vehicle localization is one of the basic and urgent problems for most intelligent transportation system (ITS) applications. Through vehicle localization, many parameters associated with the working state of vehicles, such as position, velocity, acceleration and trajectory, can be obtained. These parameters are closely related to many safety-themed applications in ITS. Boukerche et al. (2008) list over 10 applications closely related to localization in ITS, including routing navigation, data dissemination, map localization, adaptive cruise control, cooperative intersection safety, blind crossing, platooning, vehicle collision warning, vision enhancement and automatic parking. They also point out that some applications, such as vehicle collision warning, vision enhancement and automatic parking, need sub-meter resolution. If precise localization information of all vehicles could be obtained in real time, it would bring about revolutionary changes in future traffic management. This would be manifested in five aspects: (1) in the event of a potential collision, for example, the potential risk on separated bicycle paths (Xu C et al., 2016) and the rear-end crash risk of a merging vehicle, an early warning can be issued accurately (Weng et al., 2015); (2) accidents that occurred in the past can be accurately reproduced from the recorded precise location data; (3) some microscopic vehicle behaviors such as lane changing, overtaking and driving in the wrong direction can be identified; (4) a more timely and detailed picture of the road traffic situation can be obtained; and (5) many new intelligent transportation applications can be invented from the accumulated precise vehicle trajectory data.
The current issue and full text archive of this journal is available on Emerald Insight at: https://www.emerald.com/insight/2399-9802.htm
In short, solving the real-time and precise localization problem would speed up the progress of the "internet of vehicles" from theory to application. In this paper, the existing vehicle localization methods are summarized through a literature review.
1.1 Global navigation satellite system localization
GNSS stands for the global navigation satellite system, which refers to all satellite navigation systems, including global, regional and augmentation systems, such as the American global positioning system (GPS), the Russian GLONASS, the European Galileo, China's BeiDou satellite navigation system and related augmentation systems, for example, the American wide-area augmentation system, the European Geostationary Navigation Overlay Service and Japan's multifunctional transport satellite augmentation system (Zaidi and Suddle, 2006). GPS is currently the most common method of GNSS localization, with the advantages of low cost, wide applicability and all-weather operation. However, it also has limitations. On the one hand, the satellite signals are blocked in tunnels, on mountain roads and on city roads surrounded by skyscrapers, where the GPS receiver cannot receive the satellite localization signals. On the other hand, the accuracy of GPS localization is typically between 20-30 m, which cannot achieve lane-level accuracy. To overcome these defects, the differential GPS (DGPS) was designed as an improved method. It calculates a pseudo-range correction for each satellite based on a reference station on the ground, and these corrections compensate for the timing, orbit and atmospheric errors of the GPS satellite signals. In general, the best localization accuracy of the DGPS is approximately 1 m. Unfortunately, when the number of visible satellites is seven or fewer because of buildings or trees, the average errors are more than a meter (Rezaei and Sengupta, 2007). The accuracy and reliability are still not high enough for some safety applications such as collision warning, platooning and automatic parking.

Dead reckoning localization
Dead reckoning (DR) is a classic localization technology that is independent of the GNSS. For an object moving within a two-dimensional space, if its initial position and all the displacements at previous times are known, the current position of the object can be calculated by adding the accumulated displacement vectors to the initial position. DR relies on inertial sensors such as odometers, gyroscopes, accelerometers and electronic compasses to obtain the displacement and heading of a vehicle. The implementation of a DR system has two requirements: first, the initial position of the moving target must be known; second, the distance and direction of the moving target at all moments must be obtained (King et al., 2006). As DR localization is an accumulation process, each estimated position of the target depends on the localization result of the previous moment. Therefore, the measurement and calculation errors accumulate over time, leading to a continuous decline in DR localization accuracy. The DR system features high autonomy, high security, good resistance to radio interference and all-weather operation, and it uses only its own inertial measurement components to deduce position, speed and other navigation parameters. However, its accumulated errors grow rapidly over time, so it is unsuitable for long-term operation. In addition, it needs a long time for initial alignment, especially for position measurement (Bevly and Parkinson, 2007).
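The accumulation of displacement vectors described above can be illustrated with a minimal sketch. The function name `dead_reckon`, the step format and the heading convention (radians, counter-clockwise from the +x axis) are assumptions made for this example and do not come from the paper.

```python
import math

def dead_reckon(start, steps):
    """Accumulate (distance, heading) steps from a known start position.

    start: (x, y) in metres.
    steps: iterable of (distance_m, heading_rad); the heading is measured
           counter-clockwise from the +x axis (an assumed convention).
    Returns the list of visited positions, including the start.
    """
    x, y = start
    track = [(x, y)]
    for distance, heading in steps:
        # Each new position depends on the previous one, so any error in
        # distance or heading is carried into all later positions.
        x += distance * math.cos(heading)
        y += distance * math.sin(heading)
        track.append((x, y))
    return track

# Drive 10 m along +x, then 5 m along +y.
track = dead_reckon((0.0, 0.0), [(10.0, 0.0), (5.0, math.pi / 2)])
```

Adding a small constant bias to every heading in `steps` is a simple way to see how the position error grows with the distance travelled.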
1.3 Integrated localization combining the global positioning system with dead reckoning
As GPS and DR are complementary, the localization precision can be improved by combining the two techniques. As an external input, the GPS information frequently corrects the positioning result of DR while the vehicle is moving, which limits the growth of the accumulated DR error over time. Conversely, the output of DR can bridge short-term GPS problems, such as the loss of the GPS signal and cycle slips in a complex environment, which strengthens the system's anti-interference ability. Combining the information of the two systems lets them complement each other and improves the overall navigation precision and performance of the system (Krakiwsky et al., 1988). The overall performance of the combined system is far better than that of either system alone, which has made it a hotspot in this research field. There are several ways to fuse the localization information from multiple sensors in GPS/DR integrated systems. The fusion approaches generally fall into three types, namely, uncoupled, tightly coupled and loosely coupled. Among them, loosely coupled fusion has the best fault tolerance. It uses local filters to fuse the output of the GPS and DR subsystems and a main filter to fuse the output of the local filters. This approach not only reduces the system dimensions, with the advantages of a small calculation amount and parallel processing, but also decreases the coupling degree of each subsystem, so that a fault in one sensor does not seriously affect the filter equations of the other subsystems. The federated filter proposed by Carlson is a loosely coupled fusion model that has attracted widespread attention because of its flexibility, small calculation amount and good fault tolerance (Carlson, 1996).

Map matching localization
Map matching is a software-based localization correction method. Its basic idea is to associate a vehicle localization trajectory from a GPS receiver with the road information in an electronic map database and thus determine the vehicle position relative to the map (Chausse et al., 2005). Map matching applications are based on two premises: one is that all vehicles always travel on roads, and the other is that the accuracy of the electronic map data is higher than that of the position estimated by the vehicle navigation system. When these conditions are met, the localization trajectory is compared with the road information through an appropriate matching process to determine the road section on which the vehicle is most likely traveling and its most likely position in this section. The map matching algorithm is closely tied to the digital map (Jagadeesh et al., 2005). The electronic map must have the correct network topology and high accuracy for map matching to succeed; otherwise, false matches will result (Deusch et al., 2013).
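As a minimal illustration of the matching idea, the sketch below snaps a position estimate to the nearest road segment. This purely geometric nearest-segment rule is an assumption chosen for brevity; the cited map matching algorithms also use trajectory history and network topology. The names `project_to_segment` and `snap_to_road` are hypothetical.

```python
def project_to_segment(p, a, b):
    """Return the closest point to p on segment a-b and its squared distance."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Parameter of the orthogonal projection, clamped to the segment.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    qx, qy = ax + t * dx, ay + t * dy
    return (qx, qy), (px - qx) ** 2 + (py - qy) ** 2

def snap_to_road(p, segments):
    """Pick the road segment nearest to p; return (segment_index, snapped_point)."""
    best_index, best_point, best_d2 = None, None, float("inf")
    for index, (a, b) in enumerate(segments):
        point, d2 = project_to_segment(p, a, b)
        if d2 < best_d2:
            best_index, best_point, best_d2 = index, point, d2
    return best_index, best_point

# Two parallel roads 20 m apart; the position estimate lies 3 m off the first.
roads = [((0.0, 0.0), (100.0, 0.0)), ((0.0, 20.0), (100.0, 20.0))]
index, snapped = snap_to_road((40.0, 3.0), roads)
```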

Localization based on image/video information
Images, videos and data processing techniques are often used for real-time localization in the fields of autonomous vehicles and mobile robotics. These localization methods can be roughly divided into three categories: (1) passive localization based on video surveillance (Chapuis et al., 2002); (2) ego-localization based on scene matching (Uchiyama et al., 2009); and (3) ego-localization based on visual odometry (VO) (Sakai et al., 2010).
Passive localization based on video surveillance tracks vehicles with cameras mounted on road infrastructure. It detects the target vehicle through background subtraction and calculates the vehicle's actual position through camera calibration. However, it is difficult for the vehicle to obtain its own localization information from the video surveillance system. The localization method based on scene matching calculates the position of the vehicle by searching a pre-recorded image database or street view database in real time for the image most similar to the captured images. The ego-localization method based on VO calculates the relative displacement of the vehicle between two consecutive frames by matching the overlapping areas of the multi-frame images captured by the camera. Because of its relatively simple structure and high reliability, VO has been used in a wide variety of robotic applications, such as the Mars exploration rovers (Maimone et al., 2007).

Radio localization
Radio localization is the process of finding the location of an object through the use of radio waves. Generally, it first measures the transmission parameters of the radio waves that travel from known stationary objects to the moving target object, such as the difference of time or phase and the variation of amplitude or frequency. From these parameters, the distance difference between the known objects and the target object and the moving direction of the target object can be calculated, which can be used to determine or predict the location of the moving target object (Sun et al., 2005). One typical application of radio localization is the American 911 telephone system, which can acquire the location of a person who is calling from a mobile phone. In addition, there are other radio localization methods such as ultra-wideband, wireless fidelity and cooperative localization based on vehicular ad hoc networks (Bahl and Padmanabhan, 2000; Lee and Scholtz, 2002; Cheng et al., 2005; Thangavelu et al., 2007). However, if this method is applied to vehicle localization, it requires a large number of roadside stations and high investment costs, so it is clearly not suitable for long-distance vehicle localization.
This paper presents a precise vehicle ego-localization method using feature matching of pavement images, which are captured by a camera installed at the rear of the car. The local features of the pavement images are extracted based on speeded up robust features (SURF) descriptors, and the relative displacement and rotation angle between two consecutive frames are obtained through image matching. Finally, the trajectory of the vehicle is extracted and precise localization is achieved.
The organization of this paper is as follows: Section 2 provides a brief literature review of the inertial navigation system (INS) assisted by the GPS and the INS assisted by vision.
Section 3 introduces the experimental equipment and explains the flow of the entire algorithm. Section 4 illustrates a matching algorithm based on road image features. Section 5 describes the vehicle trajectory estimation algorithm. The experiment is described in Section 6, and Section 7 gives the conclusion.

Related work
Wu and Ranganathan (2013) propose a vehicle localization method using road markings. In their paper, the road markings (such as arrows, speed limits and zebra crossings) were surveyed beforehand, and the corresponding GPS latitudes and longitudes were stored in a database. The pavement videos were captured by a color camera mounted on the roof of a car. With the developed detection algorithms, the road markings were recognized and matched with those stored in the database. Once the road markings were matched successfully, the position of the vehicle could be calculated from the stored GPS data. The authors indicated that the proposed method can achieve lane-level positioning accuracy. However, it is a global positioning method that requires surveying the positions of all road markings in advance, so it is not suitable for long-distance vehicle localization.
Chen et al. (2017) proposed a sensory-fusion three-dimensional localization scheme for autonomous driving scenes using LIDAR and vision sensors. It efficiently generates three-dimensional candidate boxes from a three-dimensional point cloud and combines region-wise features from multiple views to complete the localization. Experiments show that this approach outperforms the state-of-the-art by around 25 and 30% AP on the tasks of three-dimensional localization and three-dimensional detection, respectively. In addition, for two-dimensional detection, this approach obtains 10.3% higher AP than the state-of-the-art on the hard data among the LIDAR-based methods. However, LIDAR is relatively expensive, so LIDAR-based vehicle positioning methods can be difficult to deploy.
Uchiyama et al. (2009) from Nagoya University present a vehicle ego-localization method using streetscape image sequences. The image sequences of two in-vehicle cameras are matched with a database that contains a sequence of streetscape images and their corresponding positions. A sequential image matching algorithm is developed to search the database for the images most similar to the captured images. Eventually, the vehicle position is calculated by triangulation using the positions stored in the database and the viewing directions of the two cameras. Based on experiments, the authors showed that the positioning accuracy of the proposed method is better than that of the GPS, with a horizontal positioning error of less than 1.5 m. However, this method requires a huge database containing a large number of streetscape images, and it can hardly guarantee real-time operation.
Vu et al. (2012) from the University of California, Riverside present a sensor fusion technique that uses computer vision and differential pseudo-range GPS (DGPS) measurements to aid the INS. The proposed method mainly solves the localization problem in challenging environments where the GPS signal is limited or unreliable. In their paper, traffic lights were surveyed as landmarks and their location data were stored in a database in advance. The localization method uses satellite pseudo-range time-of-arrival measurements, Doppler measurements between the satellites and the GPS antenna, and angle-of-arrival measurements of previously mapped visual landmarks in images taken by a camera to correct the INS. The experimental results show that the combination of the DGPS and a single visual feature measurement at 1 Hz is sufficient to achieve a localization error that is typically less than 1 m. However, this method relies heavily on traffic lights, while there are almost no traffic lights on expressways or rural roads.
Pink et al. (2009) propose a vehicle localization method based on aerial image matching. The method combines ideas from research on VO with a feature map that is automatically generated from aerial images for the visual navigation system. The presented method detects the road markings in the aerial images and extracts their features to create a feature map. Two forward-looking cameras are fixed on the roof of a vehicle to capture the road images, and an image processing algorithm matches the features from the cameras to the previously generated feature map to obtain a precise vehicle localization result.
Turgay and Ahmed (2011) build a framework that uses stereo camera images and freely available satellite and road maps to automatically obtain accurate global vehicle localization. The forward pavement images are captured by two cameras on a car, and the three-dimensional point cloud of the road surface is reconstructed based on stereo vision. From the three-dimensional point cloud, top-view images of the road are generated and matched with the satellite images. As a result, accurate vehicle poses, high-resolution top-view images, map overlays and three-dimensional reconstructions of the road and its surroundings are all obtained.
In addition, Dean et al. (2008) propose a vehicle localization method based on road terrain parameters, including the change in road height, the derivative of height and the change in superelevation. Claus Brenner presents a vehicle localization method using landmarks obtained by a LIDAR mobile mapping system, in which the positions of the vehicle are obtained using associated landmark pairs and an estimation approach. The literature listed above shows that most vision-based vehicle localization methods are global positioning methods, which need a huge database to be built in advance and have difficulty achieving real-time and long-distance localization of vehicles.

System setup and algorithm processing
This section describes the system setup and gives an overview of the proposed algorithm. The smart car, shown in Figure 1, uses a 2-megapixel Basler acA1600-60gc camera (60 frames/s, adjustable) to capture road images on the campus of Chang'an University. Excessive vehicle speed results in blurred images, in which the interest points cannot be detected and matched correctly, so the vehicle speed is maintained in the range of 20-30 km/h. The offline data processing is implemented in MATLAB R2016a.
The general idea of this study is to achieve precise vehicle localization using local feature matching of pavement images, as shown in Figure 2. First, the initial position of the vehicle is obtained from a GPS receiver. Second, the pavement image is corrected to a top view. Third, the SURF operator is used to extract the feature points of two consecutive corrected pavement images, the feature points are matched one by one and a random sample consensus (RANSAC) algorithm is used to eliminate the falsely matched points. Finally, the relative translation and rotation offsets between the two consecutive images are computed from the selected matched points. With the known initial position and the relative offsets between every two consecutive images, the vehicle's position can be obtained in real time. This is an ego-localization method with relatively high robustness and precision. Furthermore, it is independent of landmarks, and there is no need to build a database beforehand. The details of the method are discussed below.

Image matching
4.1 Comparison of the methods for local feature detection
This research mainly addresses the matching of the local features of pavement images, so it is very important to choose an appropriate method that ensures both a sufficient number of feature points and the efficiency of the algorithm. Different local feature extraction algorithms are suited to images with different features such as corners, blocks, spots and edges. The number of detected feature points and the operation time are two key evaluation indicators for local feature extraction algorithms. Moreover, the size of the image is also a critical parameter, which directly determines the computation load of an algorithm. To select an appropriate local feature extraction algorithm and a suitable image size, we conducted the following test: (1) the originally captured pavement images are resized to three different sizes, namely, 720 × 1,280 pixels, 360 × 640 pixels and 180 × 320 pixels, as shown in Figure 3; (2) four different feature detection algorithms (Harris, SUSAN, SIFT and SURF) are used as candidates for detecting the feature points of the above images under the same computing environment (CPU: dual-core Intel 2.50 GHz; memory: 8 GB; platform: MATLAB R2016a); and (3) an efficiency function is defined as follows:
$$P = \frac{N}{T} \quad (1)$$
where N is the number of feature points, T is the processing time of the corresponding algorithm and P is the number of feature points matched in unit time.
The comparison results are shown in Table 1. When the size of the image is 360 × 640 pixels and SURF is chosen as the local feature extraction algorithm, the efficiency function P reaches its peak. Therefore, in this paper, all the originally captured pavement images are resized to 360 × 640 pixels and SURF is selected as the local feature extraction algorithm.
Figure 1 Smart car test platform
Figure 2 The overall flow chart of the algorithm

Detection and matching of the interest points using improved speeded up robust features
According to Table 1, this study uses an improved SURF algorithm to detect the initial interest points of pavement images. To further improve the efficiency of feature point matching, this paper proposes a matching method based on the prejudgment of the dominant direction and a simplified distance formula. In addition, a simplified RANSAC algorithm is used to remove the false correspondence pairs. Finally, the robust matched feature points are acquired.
SURF is an effective algorithm for the extraction and description of local image features, and it is primarily used in the field of image registration and stitching. The SURF algorithm includes three steps: (1) the detection of the feature points; (2) the description of the feature points; and (3) the matching of the feature points.
This study optimizes the SURF algorithm and proposes a rapid and accurate matching algorithm, which makes the extraction of the vehicle trajectory more robust. Figure 4 shows the flow chart of the improved SURF algorithm.

Detection of the feature points
Feature point detection includes three steps: the establishment of the integral image, the construction of the multi-scale space for the specified image using a box-type filter and the localization of the feature points.
The rule that judges whether a point (x, y) is a feature point can be described as follows:
Step 1: for a given threshold, if the determinant of the Hessian matrix of a pixel is greater than the threshold, go to Step 2; otherwise, move on to the next pixel.
Step 2: non-maximum suppression is applied in the 3 × 3 × 3 three-dimensional neighborhood of the point; only a point whose response is greater than all 26 response values in this neighborhood is adopted as a candidate feature point.
Step 3: to get a stable position and scale value for a candidate feature point, interpolation is carried out across the scale space.
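The thresholding and non-maximum suppression rule above can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: it assumes the Hessian-determinant responses have already been computed and stored as a nested list indexed as `responses[scale][row][col]`, skips border cells and omits the final interpolation step. The function name `detect_candidates` is hypothetical.

```python
def detect_candidates(responses, threshold):
    """Return (scale, row, col) indices of candidate feature points.

    responses[s][r][c] is the Hessian-determinant response at scale index s.
    Border cells are skipped because they lack a full 3 x 3 x 3 neighbourhood.
    """
    n_scales = len(responses)
    n_rows = len(responses[0])
    n_cols = len(responses[0][0])
    candidates = []
    for s in range(1, n_scales - 1):
        for r in range(1, n_rows - 1):
            for c in range(1, n_cols - 1):
                value = responses[s][r][c]
                # Threshold test on the Hessian determinant.
                if value <= threshold:
                    continue
                # Non-maximum suppression against the 26 neighbours.
                neighbours = (
                    responses[s + ds][r + dr][c + dc]
                    for ds in (-1, 0, 1)
                    for dr in (-1, 0, 1)
                    for dc in (-1, 0, 1)
                    if (ds, dr, dc) != (0, 0, 0)
                )
                if all(value > n for n in neighbours):
                    candidates.append((s, r, c))
    return candidates

# Toy 3 x 3 x 3 response volume with a single peak in the centre.
volume = [[[0.1] * 3 for _ in range(3)] for _ in range(3)]
volume[1][1][1] = 0.9
peaks = detect_candidates(volume, threshold=0.5)
```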

Description of the feature points
Feature description can be divided into two steps: first, the dominant direction of the feature point is calculated to ensure the rotation invariance of the algorithm; second, the neighborhood of the feature point is rotated to the dominant direction, and the descriptor of the feature point is computed.
After the dominant direction of the feature point is determined, SURF uses wavelet responses in the horizontal and vertical directions to describe a distinctive feature point. A square region is constructed, centered on the feature point and oriented along its dominant direction. The size of this window is 20s × 20s, where s is the scale at which the feature point was detected. This square region is divided into 4 × 4 sub-regions of size 5s × 5s. For each sub-region, a four-dimensional feature vector is established as follows:
$$V = \left( \sum d_x, \; \sum d_y, \; \sum \lvert d_x \rvert, \; \sum \lvert d_y \rvert \right) \quad (2)$$
In equation (2), $d_x$ denotes the Haar wavelet response in the horizontal direction, and $d_y$ denotes the Haar wavelet response in the vertical direction. SURF also extracts the sums of the absolute values of the responses, $\lvert d_x \rvert$ and $\lvert d_y \rvert$, to enhance the robustness of the feature vector. The vectors of the 16 sub-regions then form a 64-dimensional (4 × 16) feature vector. To ensure its brightness and scale invariance, the descriptor must be normalized.
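A minimal sketch of how the 64-dimensional vector is assembled is given below. It assumes the Haar responses have already been sampled on a 20 × 20 grid inside the oriented window (one sample per scale unit), leaves out any weighting of the responses and uses unit-length normalization as one possible normalization. The function name `surf_descriptor` is hypothetical.

```python
import math

def surf_descriptor(dx, dy):
    """Build a 64-D descriptor from 20 x 20 grids of Haar wavelet responses.

    dx[r][c] and dy[r][c] are the horizontal and vertical responses sampled
    in the window already rotated to the dominant direction (assumed input).
    """
    vector = []
    for block_row in range(4):
        for block_col in range(4):
            sum_dx = sum_dy = sum_abs_dx = sum_abs_dy = 0.0
            # Each sub-region covers 5 x 5 samples.
            for r in range(block_row * 5, block_row * 5 + 5):
                for c in range(block_col * 5, block_col * 5 + 5):
                    sum_dx += dx[r][c]
                    sum_dy += dy[r][c]
                    sum_abs_dx += abs(dx[r][c])
                    sum_abs_dy += abs(dy[r][c])
            vector.extend([sum_dx, sum_dy, sum_abs_dx, sum_abs_dy])
    # Normalize to unit length (one possible normalization, assumed here).
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else vector

# Synthetic responses: alternating-sign horizontal response, constant vertical.
dx = [[1.0 if (r + c) % 2 == 0 else -1.0 for c in range(20)] for r in range(20)]
dy = [[0.5] * 20 for _ in range(20)]
descriptor = surf_descriptor(dx, dy)
```

In the synthetic input the signed horizontal sums nearly cancel while the absolute sums do not, which is why the absolute-value terms add information about the texture.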

Matching of the feature points
In this paper, feature point matching consists of three steps. First, the fast index matching of the SURF algorithm is retained for preliminary screening. Second, the absolute distance is used to match the feature points and refine the result of fast index matching. Third, the angle difference of the dominant directions is used to eliminate false correspondence pairs.
During feature point detection, the trace of the Hessian matrix is calculated. If the traces of two feature points have the same sign, the two feature points have the same contrast. Otherwise, they have different contrast and there is no need to measure the similarity between them, which reduces the matching time and computation cost.
The similarity measurement for the matched feature points based on the absolute distance
To describe the similarity of two feature points in two images, the absolute distance is calculated as follows:
$$d_{ij} = \sum_{k=1}^{64} \lvert l_{ik} - l_{jk} \rvert, \quad i = 1, \ldots, N_1, \; j = 1, \ldots, N_2 \quad (3)$$
In equation (3), $l_{ik}$ denotes the k-th element of the i-th SURF feature point of the previous image, and $l_{jk}$ denotes the k-th element of the j-th SURF feature point of the current image. $N_1$ is the number of SURF feature points in the previous image and $N_2$ is the number in the current image. For each feature point in the previous image, its absolute distances to all feature points in the current image are calculated, which constructs a distance set. From the distance set, the minimum distance and the second minimum distance are selected and compared with a threshold T. When the second minimum distance is less than T, the feature point in the previous image is retained, as its corresponding feature point has been found in the current image. Otherwise, the feature point is discarded. The smaller the threshold, the fewer correspondence pairs are retained, while the distinctiveness and robustness of these pairs are higher. Compared with the Euclidean distance, the absolute distance improves the efficiency of the algorithm and shortens the computation time.
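The first two matching steps can be sketched as below. The trace-sign prefilter and the absolute distance follow the text. The acceptance rule, however, is an assumption: the sketch uses the common test on the ratio of the minimum to the second minimum distance, which may differ from the exact way the threshold T is applied in the study. The names `match_features` and `ratio_threshold`, and its default value, are hypothetical.

```python
def l1_distance(a, b):
    """Absolute (city-block) distance between two descriptors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def match_features(prev, curr, ratio_threshold=0.7):
    """Match feature points of the previous image to the current image.

    prev, curr: lists of (trace_sign, descriptor) tuples.
    Returns a list of (index_in_prev, index_in_curr, distance).
    """
    matches = []
    for i, (sign_i, desc_i) in enumerate(prev):
        # Fast index matching: compare only points whose Hessian trace
        # has the same sign, that is, the same contrast.
        distances = sorted(
            (l1_distance(desc_i, desc_j), j)
            for j, (sign_j, desc_j) in enumerate(curr)
            if sign_j == sign_i
        )
        if len(distances) < 2:
            continue  # assumed behaviour: skip points without two candidates
        (best, j), (second, _) = distances[0], distances[1]
        # Assumed acceptance rule: nearest / second-nearest ratio test.
        if second > 0 and best / second < ratio_threshold:
            matches.append((i, j, best))
    return matches

# Toy 4-D descriptors standing in for the 64-D SURF descriptors.
prev = [(+1, [0.0, 0.0, 1.0, 1.0]), (-1, [1.0, 0.0, 0.0, 0.0])]
curr = [
    (+1, [0.0, 0.0, 1.0, 0.9]),
    (+1, [1.0, 1.0, 0.0, 0.0]),
    (-1, [1.0, 0.0, 0.0, 0.1]),
    (-1, [0.0, 1.0, 1.0, 1.0]),
]
matches = match_features(prev, curr)
```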

Elimination of the false correspondence pairs based on the angle difference
Taking image rotation into account, there is a certain angle difference between the dominant directions of the matched points. F1 is a feature point in the previous image with dominant direction $\omega_1$, and F2 is a feature point in the current image with dominant direction $\omega_2$. The angle difference between the two dominant directions is given in equation (4):
$$\Delta\varphi = \lvert \omega_1 - \omega_2 \rvert \quad (4)$$
Image rotation is reflected in the rotation of the feature points' dominant directions. If $\Delta\varphi$ is less than a threshold (T1), the feature points are retained; otherwise, they are eliminated as falsely matched points.
Figure 4 The flow chart of improved SURF algorithm
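A short sketch of this angle-based filter follows. It assumes the dominant directions are given in radians and, as an added assumption not stated in the text, treats directions as periodic so that angles just below 2π and just above 0 count as close. The 10-degree threshold in the usage lines is only a placeholder for T1.

```python
import math

def angle_difference(direction_1, direction_2):
    """Smallest absolute difference between two directions, in radians."""
    diff = abs(direction_1 - direction_2) % (2 * math.pi)
    # Wrap-around handling is an assumption of this sketch.
    return min(diff, 2 * math.pi - diff)

def filter_by_angle(direction_pairs, max_diff):
    """Keep the pairs whose dominant directions differ by less than max_diff."""
    return [
        (a, b) for a, b in direction_pairs if angle_difference(a, b) < max_diff
    ]

# Dominant directions (radians) of three tentative correspondence pairs.
pairs = [(0.10, 0.15), (0.10, 2.00), (6.20, 0.05)]
kept = filter_by_angle(pairs, max_diff=math.radians(10))
```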

Elimination of the false matched points using random sample consensus
The RANSAC algorithm is an iterative method for estimating the parameters of a mathematical model from a set of observed data that contains outliers. In this study, it is used to eliminate the falsely matched points. In each iteration, eight pairs of matched points are randomly selected from all matched points to calculate the fundamental matrix, which is then used to determine whether the remaining points are inliers. The set with the largest number of inliers is taken as the final set of matched points.
The simplified RANSAC algorithm used in this study proceeds as follows:
Step 1: the similarity distances of all correspondence pairs are sorted in descending order.
Step 2: N correspondence pairs with the larger similarity distances are selected as the initial sample space, where N = TotalNum × t, TotalNum is the total number of matching points and t is a proportional factor.
Step 3: the fundamental matrix is calculated from eight correspondence pairs randomly selected from the initial sample space, and the inliers are then detected according to the fundamental matrix.
Step 4: Step 3 is repeated until the number of trials reaches the set number. Finally, the point set with the largest number of inliers is taken as the one containing all the correct correspondence pairs.
In this study, the initial sample space is confined to the correspondence pairs that have larger similarity distances, so the fundamental matrix computed from this sample space has higher compactness. As the sample space is narrowed, the computation time of the proposed RANSAC algorithm is reduced. On the other hand, it also decreases the total number of final matched pairs. Considering both the computation time and the total number of remaining correspondence pairs, the proportional factor t is set to 0.5-0.6 after experimental testing. The feature point matching diagrams before and after using the RANSAC algorithm are shown in Figure 5.
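The restricted-sample-space idea can be sketched as follows. To keep the example short and dependency-free, the sketch deliberately swaps the model: it fits a two-dimensional rigid transform from two pairs instead of the fundamental matrix from eight pairs, so it illustrates the sampling strategy, not the exact estimator used in the study. The sort follows the text literally (descending similarity value); the names `restricted_ransac`, `tol`, `trials` and `seed`, and the toy data, are hypothetical.

```python
import math
import random

def fit_rigid(pair_a, pair_b):
    """Rotation and translation mapping two source points onto two targets."""
    (p1, q1), (p2, q2) = pair_a, pair_b
    theta = (math.atan2(q2[1] - q1[1], q2[0] - q1[0])
             - math.atan2(p2[1] - p1[1], p2[0] - p1[0]))
    c, s = math.cos(theta), math.sin(theta)
    tx = q1[0] - (c * p1[0] - s * p1[1])
    ty = q1[1] - (s * p1[0] + c * p1[1])
    return theta, tx, ty

def residual(model, pair):
    """Distance between the transformed source point and its target."""
    theta, tx, ty = model
    (x, y), (u, v) = pair
    c, s = math.cos(theta), math.sin(theta)
    return math.hypot(c * x - s * y + tx - u, s * x + c * y + ty - v)

def restricted_ransac(pairs, similarities, t=0.5, trials=100, tol=1.0, seed=0):
    """Return the indices of the largest inlier set found.

    pairs: [((x, y), (u, v)), ...]; similarities: one value per pair.
    Samples are drawn only from the top N = len(pairs) * t pairs.
    """
    rng = random.Random(seed)
    order = sorted(range(len(pairs)), key=lambda i: similarities[i], reverse=True)
    sample_space = order[:max(2, int(len(pairs) * t))]
    best = []
    for _ in range(trials):
        i, j = rng.sample(sample_space, 2)
        if pairs[i][0] == pairs[j][0]:
            continue  # degenerate sample: identical source points
        model = fit_rigid(pairs[i], pairs[j])
        inliers = [k for k, pair in enumerate(pairs) if residual(model, pair) < tol]
        if len(inliers) > len(best):
            best = inliers
    return best

# Six pairs that follow a rotation of 0.1 rad plus a shift, and two outliers.
def move(p, theta=0.1, tx=5.0, ty=-3.0):
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

sources = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 2), (3, 8)]
pairs = [(p, move(p)) for p in sources]
pairs += [((2, 2), (50.0, 50.0)), ((7, 1), (-40.0, 10.0))]
similarities = [0.9] * 6 + [0.2] * 2
inliers = restricted_ransac(pairs, similarities)
```

Because samples are drawn only from the top-ranked half of the pairs, fewer trials are wasted on models fitted to poor matches, which mirrors the computation-time argument in the text.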

Extraction of vehicle trajectory
Figures 6(a) and 6(b) are two consecutive images captured at time T n and T n 1 1 , respectively. After completing the matching of the feature points in both images, the offset of vehicle movement during the sampling interval can be calculated through the coordinate transformation of the corresponding pairs. The feature points P 0 , P 1 , P 2 , P 3 , . . ., P k -1 in the n-th image I n and P 0 0 ; P 0 1 ; P 0 2 ; P 0 3 ; . . . ; P 0 kÀ1 in the (n 1 1)-th image I n 1 1 represent the matched points set on the pavement in image coordinate system. When the vehicle moves, if the camera pose relative to the vehicle remains the same, the rigid transformation between the matched feature points in I n and I n 1 1 can be described by equation (5): In this formula, (x 0 , y 0 ) and (x, y) denote the coordinates of the feature points in I n11 and I n , respectively; Du denotes the rotation angle of the vehicle movement; Dx and Dy denote the horizontal and vertical offsets of the vehicle movement in the image coordinate system; M represents the scaling coefficient from the world coordinate to image coordinate. If I n11 is rotated around its center O 0 with an angle of Du in counter-clockwise direction, and translated with an increment of Dx in horizontal and Dy vertical direction, these two images will completely overlap. However, when the vehicle is moving in real-world coordination, the camera pose will constantly change due to vehicle vibration. Hence, the relationship between I n and I n 1 1 becomes a projective transform instead of a strict rigid transformation, and the solution of equation (5) will be not a standard form shown in equation (6), which only includes the parameters regarding the rotation and translation. On the contrary, it is a matrix form shown in equation (7), from which the rotation and movement offsets cannot be obtained directly: In this paper, an approximate method is proposed to estimate the offset of vehicle movement. 
The key idea is to construct two polygons with the same shape by connecting all the feature points in I_n and in I_{n+1} in index order. As shown in Figure 6(c), after a rotation of Δθ and a translation of (Δx, Δy), the polygon P'_0, P'_1, P'_2, P'_3, ..., P'_{k-1} in I_{n+1} will approximately overlap with the polygon P_0, P_1, P_2, P_3, ..., P_{k-1} in Figure 6(d).
The rotation angle Δθ can be estimated through equation (8) by averaging the included angles between the corresponding edges of the two polygons. In addition, Δx and Δy can be estimated from the translation between the gravity centers of the two polygons, as shown in equation (9). Because the offsets Δx and Δy are expressed in the coordinate system XOY of the (n+1)-th image, the offsets Δx' and Δy' in the coordinate system XOY of the n-th image are calculated from the rotation angle between the two images, as defined by equation (10). If θ is the sum of the image rotation angles from the first image to the (n+1)-th, Δx' and Δy' denote the offsets in the coordinate system of the first image. The average offsets of the (n+1)-th image can be calculated from all the final feature points. In this study, the GPS coordinate of the first image is taken as the initial position of the vehicle, and the average offsets above are used to calculate the positions corresponding to the other images. The vehicle track can then be plotted by connecting all the positions. To check this, a track was drawn on an asphalt pavement, as shown in Figure 7(a), and a track was obtained using the above steps, as shown in Figure 7(b). The two tracks basically coincide, which supports the feasibility of the algorithm.
Figure 7 Simulation trajectory
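The polygon-based estimate and the track accumulation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names estimate_motion and accumulate_track are hypothetical, the gravity center is assumed to be the mean of the polygon vertices, angles are assumed to be in radians in a standard counter-clockwise-positive coordinate system, and the sign conventions (mapping points of I_{n+1} onto I_n) are assumptions, because the original equations (8) to (10) are not reproduced here.

```python
import math


def estimate_motion(pts_n, pts_n1):
    """Estimate (dtheta, dx, dy) that maps the points of I_{n+1} onto I_n.

    pts_n, pts_n1: equal-length lists of matched (x, y) points, in the same
    index order, so that connecting them yields two corresponding polygons.
    """
    k = len(pts_n)
    if k < 2 or k != len(pts_n1):
        raise ValueError("need at least two matched point pairs")

    # Rotation: mean signed angle between corresponding polygon edges.
    angles = []
    for i in range(k):
        j = (i + 1) % k  # the closing edge joins the last point to the first
        ax, ay = pts_n[j][0] - pts_n[i][0], pts_n[j][1] - pts_n[i][1]
        bx, by = pts_n1[j][0] - pts_n1[i][0], pts_n1[j][1] - pts_n1[i][1]
        # Signed angle from the edge in I_{n+1} to the edge in I_n.
        angles.append(math.atan2(bx * ay - by * ax, bx * ax + by * ay))
    dtheta = sum(angles) / k

    # Translation: shift between the vertex centroids (assumed gravity centers).
    cx_n = sum(p[0] for p in pts_n) / k
    cy_n = sum(p[1] for p in pts_n) / k
    cx_n1 = sum(p[0] for p in pts_n1) / k
    cy_n1 = sum(p[1] for p in pts_n1) / k
    return dtheta, cx_n - cx_n1, cy_n - cy_n1


def accumulate_track(start, motions):
    """Chain per-frame (dtheta, dx, dy) estimates into a list of positions.

    Each offset is rotated by the accumulated angle so that it is expressed
    in the coordinate system of the first image before being added.
    """
    x, y = start
    theta = 0.0
    track = [(x, y)]
    for dtheta, dx, dy in motions:
        theta += dtheta
        x += dx * math.cos(theta) - dy * math.sin(theta)
        y += dx * math.sin(theta) + dy * math.cos(theta)
        track.append((x, y))
    return track
```

Averaging raw angles is only safe when the per-frame rotation is small, which is plausible for consecutive frames; angles near ±π would need a circular mean instead.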

Experimental results and analysis
To verify the correctness of the algorithm, we conducted field experiments on the campus of Chang'an University, using the car shown in Figure 1 for data acquisition. In addition to the camera shown in the figure, the car was equipped with a DGPS system with a positioning accuracy of 2 m. Two groups of experiments were carried out: the first is the road adaptability test, and the second is the open-environment short-distance test of different maneuvering behaviors.

Road adaptability experiment
Three kinds of pavement were selected to test road adaptability: 1 paving pavement; 2 asphalt pavement; and 3 cement pavement.
For each type of pavement, we collected 7,500 images for the feature point extraction and matching experiments. Figure 8 shows the SURF matching results for the three road surfaces. The results show that, for images with rich road-surface texture, the SURF operator obtains more matched feature point pairs, because SURF is a multi-scale feature point detection algorithm that detects image blobs at different scales.

Short-distance experiment
In the open environment, three different maneuvering trajectories were tested at a vehicle speed of 30 km/h. The three maneuvers are straight, right-turn and meandering. In this experiment, the DGPS positioning data are used as a reference. It can be seen from Figure 9 that the image positioning trajectory is smoother, whereas the GPS data show a certain jitter. The detection accuracy is shown in the table. In the short-distance case, the image positioning achieves lane-level accuracy. Figure 10(a) shows the correction results of two consecutive images. Figure 10(b) shows the matched pairs between the two images, which are used to calculate the offsets and rotation angle between the two images. Table 2 shows the x-axis and y-axis offsets of the 36 pairs of matching points shown in Figure 10.
Figure 8 SURF algorithm matching results for three road surfaces
Figure 9 Short-distance experiment
The average translation of all matched points is used as the image offsets Δx and Δy. In addition, each pair of matching points is used to calculate a rotation angle (Table 2), and the average of all these angles is taken as the image rotation angle Δθ. Because mismatched points may introduce an angle error, the calculated results are corrected using a threshold of 0.05. The actual coordinates of the initial position are (a, b) and the logical coordinates are (0, 0). According to equation (8), the image offsets are calculated in the initial coordinate system.
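One way to read the threshold-based correction is as an outlier filter applied before averaging, sketched below. The text does not state how the 0.05 threshold is applied or in which unit, so this sketch assumes that angles are in radians and that any per-pair angle deviating from the median by more than the threshold is discarded; the function name robust_mean_angle is hypothetical.

```python
from statistics import mean, median_low


def robust_mean_angle(angles, threshold=0.05):
    """Average per-pair rotation angles after discarding likely mismatches.

    Assumption: an angle is treated as an outlier when it differs from the
    (low) median by more than `threshold`. The low median is always one of
    the input values, so at least one angle survives the filter.
    """
    if not angles:
        raise ValueError("no angles given")
    centre = median_low(angles)
    kept = [a for a in angles if abs(a - centre) <= threshold]
    return mean(kept)
```

Using the median as the reference keeps a single grossly wrong match from shifting the value against which the other angles are judged.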
To test the accuracy of the trajectory produced by the algorithm in this study, the three trajectories are compared with the GPS data. The results are shown in Figure 9, and the analysis is given in Table 3. The results indicate that the vehicle trajectory accuracy in this experiment is better than that of GPS.
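A comparison of this kind can be summarized with simple point-wise deviation statistics, as in the sketch below. The metric actually used for Table 3 is not specified in the text, so this is only an assumed example: it presumes that both trajectories are sampled at the same instants and expressed in the same metric coordinate system, and the function name deviation_stats is hypothetical.

```python
import math


def deviation_stats(track, reference):
    """Return (mean, max) Euclidean distance between time-aligned positions.

    track, reference: equal-length lists of (x, y) positions in the same
    coordinate system and units (for example, metres).
    """
    if not track or len(track) != len(reference):
        raise ValueError("trajectories must be non-empty and equally long")
    dists = [math.hypot(x - rx, y - ry)
             for (x, y), (rx, ry) in zip(track, reference)]
    return sum(dists) / len(dists), max(dists)
```

If the two data sources are sampled at different rates, one of them would first have to be interpolated onto the other's timestamps.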

Conclusions
This study presents a precise vehicle ego-localization method using local feature matching of pavement images, which are captured by a reversing camera installed at the rear of a car. By matching two consecutive pavement images, the relative motion of the vehicle is obtained. The following conclusions can be drawn:
1 The proposed method has low complexity and good real-time performance, and it does not require a global database to be established before use;
2 Following the comparison experiment, SURF is suggested for pavement image feature extraction. It extracts many more feature points with a short processing time, which makes it very suitable for pavement images;
3 Experimental results show that the positioning error of the proposed algorithm is less than 0.5 m, which satisfies lane-level intelligent transportation applications;
4 The algorithm was tested under daylight conditions; for night-time conditions, supplemental lighting equipment is needed to enhance the overall image brightness; and
5 The proposed algorithm may produce undesired deviation in the generated vehicle trajectory because of cumulative errors after a long run, so it needs to be combined with GPS, INS and other positioning sensors to ensure long-term stability and reliability.