Freeway Traffic Speed Estimation of Mixed Traffic Using Data from Connected and Autonomous Vehicles with a Low Penetration Rate
Journal of Advanced Transportation

School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China
Key Laboratory for Automotive Transportation Safety Enhancement Technology of the Ministry of Communication, Chang’an University, Xi’an 710084, China
Zachry Department of Civil and Environmental Engineering, Texas A&M University, College Station, TX 77843-3135, USA
Joint Research Institute on Internet of Mobility, Southeast University and University of Wisconsin-Madison, Madison, WI, USA
School of Transportation, Southeast University, Nanjing 211189, China


Introduction
Autonomous vehicle (AV) technology is an active and practical research area. When AVs are equipped to communicate with other vehicles, roadside infrastructure, or traffic control centers, they are defined as connected and autonomous vehicles (CAVs). It is expected that CAVs can respond faster and keep shorter headways, which leads to an increased overall roadway capacity [1]. Other expected benefits of CAVs include improved mobility for people with disabilities, more productive use of travel time, better fuel efficiency, fewer emissions, and flexibility in parking [2,3]. However, it is estimated that the market penetration rate of CAVs might only reach between 24% and 87% by 2045 [4,5]. Therefore, there will be a long period of mixed traffic comprising CAVs and regular human-driven vehicles (RVs).
Much research has been dedicated to analysing the impact of AVs/CAVs in mixed traffic. Some works focused on the impact on traffic efficiency, i.e., capacity and throughput. For instance, Davis explored the contribution of adaptive cruise control (ACC) vehicles to the reduction in jam formation [6]. Shladover et al. showed that the Cooperative Adaptive Cruise Control (CACC) technology has the potential to increase lane throughput from the average 2000 veh/h to approximately 4000 veh/h at high market penetrations [7]. Friedrich found that the traffic volume could be increased to about 3900 veh/h/lane when AVs are in use, compared with the current design capacity of 2200 veh/h per lane [8]. Both Zhou et al. and Xiao et al. found that cooperative control of AVs would improve the traffic efficiency of the merging area [9,10]. Some works focused on modelling the different traffic behaviours of CAVs, such as the fundamental diagram and longitudinal and lateral movements. For example, Baskar et al. demonstrated that RVs and ACC-equipped intelligent vehicles had different fundamental diagrams [11]. Liu et al. changed the lane-changing rules in a cellular automaton to simulate autonomous vehicles [12]. Lu and Aakre proposed a smart driver model to simulate the car-following behaviour of CAVs [13]. Moreover, some works discussed the influence on other aspects, e.g., safety and environment. For example, Morando et al. investigated the safety performance of AVs with varying penetration rates in two different cases, i.e., a roundabout and a signalized intersection [14]. Lu et al. improved the ACC model of CAVs and validated that these CAVs performed better than RVs in fuel economy [15,16].
Clearly, these works assumed that CAVs behave differently from RVs. Most of them expected CAVs to have a faster reaction time, so that CAVs could keep a smaller distance to the front vehicle and be safer.
These works have validated that the application of CAVs is beneficial when CAVs make up a high proportion of traffic, but the impact of CAVs with a low penetration rate is controversial. If the penetration rate of CAVs is high in the mixed traffic, the information from CAVs is sufficient to identify the traffic state. If CAVs only make up a low proportion of mixed traffic, will their information be enough to acquire or estimate the traffic state? Since the penetration rate of CAVs grows slowly, it is meaningful to explore whether CAVs at low penetration rates are a new data source to assist the surveillance of the traffic condition.
Data provided by CAVs resemble the data collected through traditional human-driven probe vehicles, such as global positioning system (GPS)-based data and cellphone-based data. Traffic state estimation based on these probe vehicles is one of the most effective methods because probe vehicles have a wide coverage over space and time [17-21]. There are two common categories of traffic state estimation methods, i.e., the model-based methods and the data-driven methods. The model-based methods are made up of two parts. The first part is the traffic flow model, such as the Lighthill-Whitham-Richards (LWR) model [22], the Payne model [23], and their successors. The second part is a data assimilation method to realize the estimation, such as Kalman filtering (KF) and its extensions [19,24]. The data-driven methods mine the relationship between estimates and observations from historical big data. The commonly used data mining techniques include statistical analysis algorithms for time-series data and artificial intelligence models [20]. However, it should be noted that the traditional probe-based methods were developed under the human-driven mode, where the probe and non-probe vehicles are supposed to have similar driving behaviour. As mentioned before, the driving behaviours of CAVs are expected to be different from those of RVs, so the applicability of the traditional probe-based estimation methods is uncertain. The data-driven methods require a vast amount of historical data.
However, CAVs have not been put on the market officially, so it is hard to obtain sufficient historical CAV data. Given these factors, this study focuses on the model-based estimation method using CAV data. There has been some research using the model-based method. For instance, Wang et al. compared first-order and second-order models to estimate the mixed traffic state with different AV penetration rates [25], but they did not discuss low penetration specifically. Considering the controversy under the low penetration condition, this study aims to further discuss how to use the model-based estimation method with information from a small proportion of CAVs in mixed traffic.
More specifically, this study first sets up a simulation platform. It then explores the sampling characteristics of CAV probes under a low penetration rate, such as their sample size, data-missing rate, and speed difference from the average link speeds. Furthermore, whether this limited information can support traffic state estimation is discussed. Finally, although the KF technique is widely used, this study makes the following adjustments to accommodate CAVs at low penetration rates: a recursive model to fill in the missing parts, and calculation methods for the state and measurement noise. The performance and accuracy of the adjusted method are then evaluated.
Accordingly, the rest of this paper is organized as follows: Section 2 introduces a simulation platform of mixed traffic to generate the data for the following investigations. The different characteristics of traffic with CAVs are discussed in Section 3. Section 4 presents the exploration of KF-based estimation. Finally, Section 5 summarizes the main conclusions and provides some plans for improvement and future study.

Simulation Platform for Mixed Traffic
The highly or fully automated CAVs, referring to Level 4 or Level 5 in the SAE autonomy level definitions [26], are still in development or testing. The simulation method makes it possible to study the mixed traffic condition with CAVs. There are many expected types of highly or fully automated CAVs, and different types of CAVs would influence traffic differently. Therefore, this study made some preceding assumptions to clarify the studied object and situation.
First, CAVs are supposed to behave more assertively than RVs, and thus, they can maintain a shorter distance to the front vehicle.
Second, CAVs have a stronger ability to sense the traffic environment compared with RVs. This ability could be enhanced either by communication with everything (roadside unit, other vehicles, traffic management center, and so on) or by advanced sensing facilities. As a result, this sensing range is supposed to be within ±500 m in this study.
Third, since this study is based on simulation, the latency and packet loss of the communication between CAVs and everything (roadside unit, other vehicles, traffic management center, and so on) are not considered.

Simulation Parameters.
This study uses VISSIM (version 9) to simulate the mixed traffic containing CAVs and RVs. PTV Group has stated that CAV behaviour could be modelled using VISSIM internally or externally [27]. This study implements the internal way, which is to modify the VISSIM default driving behaviour parameters. Comparatively speaking, the internal way is simpler and more convenient to use, whereas the external approach is used when researchers want to define their own driving behaviour models in VISSIM. Since the focus of this study is to estimate speeds from data generated by CAVs with a low penetration rate, the internal way is more suitable and achievable.
PTV Group has given some recommendations for setting the internal model by changing the car-following and lane-changing behaviour parameters for the CAVs [28]. In application, some works have used the internal model in VISSIM to explore the impact of CAVs. Table 1 summarizes their adjusted parameters as well as the corresponding default values in VISSIM 9. It should be noted that both this study and the works in Table 1 use the Wiedemann 99 model as the car-following model for the freeway traffic.
Since no empirical data are available, these applications have indicated, to some extent, the possibility of modelling CAVs in VISSIM internally. Although each study has made different adjustments to the default values, they have something in common. For instance, they let the CAV keep a shorter distance to the front vehicle, react faster and more smoothly, observe more surrounding vehicles, and realize cooperative lane changing. Some differences might be caused by the different versions of VISSIM. For example, the maximum speed difference differs between VISSIM versions above 9 and below 9. Within the thresholds present in these existing studies, this study made the following modifications to the internal models in VISSIM 9, as shown in Table 2. RVs use the default values, while some parameters are adjusted for CAVs. Besides, the desired speed is reset as well, which is 80 km/h for RVs and 90 km/h for CAVs.

Simulation Scenarios.
A simplified freeway is simulated, which contains a 6-km three-lane mainline in one travel direction, a one-lane on-ramp, and a one-lane off-ramp, as shown in Figure 1. The simulation duration is 15300 s with a 900 s warm-up period. Data collected from 900 s to 15300 s are used for analysis.
To analyse the impact of CAV penetration rates, this study proposes six scenarios with different compositions of RVs and CAVs, as shown in Table 3. To represent a traffic condition with a low proportion of CAVs, the largest ratio of CAVs in mixed traffic is set as 10%. In each scenario, the mixed traffic is loaded on the mainline and on-ramp and varies over time, as shown in Table 4. The input traffic is set to approach the designed freeway lane capacity from simulation time 8100 s to 10700 s. Besides, in all scenarios, 15% of mainline traffic is assigned to leave the freeway at the off-ramp.

Discussions on Mixed Traffic
The 6 km mainline is divided into 12 links of 500 m each. These links are labelled from Link 1 to Link 12 along the travel direction, as shown in Figure 1. Data are aggregated at a time interval of one minute. The average ground-truth link speed is calculated as the ratio of the link length to the average travel time of all vehicles. The average speed of CAVs on a link during a time interval is calculated using the positions and timestamps of CAVs. Here are some statistical findings about the simulated mixed traffic.
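The link-speed calculation described here can be illustrated with a minimal Python sketch. It assumes that per-vehicle travel times over one link and one 1-min interval are already available; the function name `link_speed_kmh` and the sample travel times are hypothetical, and applying the same ratio to CAVs alone is an assumption about how the CAV average speed is derived from positions and timestamps.

```python
from statistics import mean

LINK_LENGTH_M = 500.0  # each link is 500 m long, as stated in the text


def link_speed_kmh(travel_times_s):
    """Average link speed = link length / mean travel time, in km/h.

    Returns None when no vehicle traversed the link in the interval,
    which is how a missing CAV observation is represented here.
    """
    if not travel_times_s:
        return None
    return LINK_LENGTH_M / mean(travel_times_s) * 3.6  # m/s -> km/h


# Hypothetical travel times (s) on one link during one 1-min interval.
all_vehicles = [22.5, 25.0, 20.0, 22.5]  # every vehicle on the link
cav_only = [20.0]                        # the single CAV among them

ground_truth = link_speed_kmh(all_vehicles)
cav_speed = link_speed_kmh(cav_only)
print(ground_truth, cav_speed)
```

With these sample values the CAV speed is higher than the ground-truth link speed, which is the kind of gap examined later in the speed-difference analysis.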

Sample Size and Data-Missing Rate.
The boxplot in Figure 2 shows the distribution of the sample size per minute under different CAV penetrations. The median sample size per minute under the penetration rates of 1%, 3%, 5%, 7%, and 10% is 1, 2, 4, 5, and 7, respectively. When the penetration rate is 1%, the sample size per minute mostly falls within [1,2]. Similarly, the most frequent sample size for the penetration rates of 3%, 5%, 7%, and 10% is [1,3], [2,5], [3,7], and [5,10], respectively. Besides, the variation of the sample size grows with the increase of the penetration rate.
Apart from the sample size, another major concern in the discussion of traffic probes with a low penetration rate is the data-missing rate. This study defines the data-missing rate on a link as the ratio of the number of time intervals in which no CAV data are collected to the total number of time intervals. Figure 3 presents the data-missing rate on different links and under different CAV penetrations. Figure 3 shows that if the CAV penetration rate is small, the sample size is very small and there will be a serious data loss. In particular, when the proportion of CAVs is 1%, the data-missing rate almost reaches fifty percent. This requires the estimation method to be capable of filling in the missing parts.
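As a small illustration, the data-missing rate of one link can be computed from its per-minute CAV speed series. This sketch treats the rate as the share of time intervals without any CAV observation, an interpretation consistent with the near-50% loss reported for 1% penetration; the function name and the sample series are hypothetical.

```python
def data_missing_rate(cav_speeds):
    """Share of time intervals with no CAV observation (marked as None)."""
    if not cav_speeds:
        raise ValueError("need at least one time interval")
    missing = sum(1 for v in cav_speeds if v is None)
    return missing / len(cav_speeds)


# Hypothetical per-minute CAV speeds (km/h) on one link; None = no CAV seen.
series = [88.0, None, 91.5, None, None, 86.0, 90.0, None]
rate = data_missing_rate(series)
print(rate)  # 4 of the 8 intervals are missing
```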

Speed Difference.
Afterwards, this study looks into the difference between the average speeds of CAVs and the mixed traffic speeds on a link. This difference is calculated by the following equation:
$$d_{ij} = v_{cij} - v_{ij}, \tag{1}$$
where $d_{ij}$ is the difference between the average speed of CAVs and the average link speed at the $i$th time interval and $j$th link, $v_{cij}$ is the average speed of CAVs at the $i$th time interval and $j$th link, and $v_{ij}$ is the average link speed at the $i$th time interval and $j$th link. Table 5 summarizes the speed differences and their variances. The desired speed of CAVs is higher than that of RVs, so it could be inferred that the average speeds of CAVs would most likely be higher than the average link speeds. This is supported by the average and median speed differences in Table 5. The maximum and minimum differences indicate that the average CAV speeds might either overestimate or underestimate the link speeds. It would be vital to establish an appropriate relationship model between the speeds of CAVs and the link speeds. Accordingly, the variance of the speed differences is calculated as shown in Table 5, which could be applied to calibrate the relationship model. The variance is the expectation of the squared deviation of the speed difference from its mean, and Table 5 indicates that this deviation decreases with the increase in the CAV penetration rate.
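The summary statistics of the speed difference can be sketched as follows, assuming the difference is taken as CAV speed minus link speed and that intervals without CAV data are skipped. Whether the reported variance is the population or the sample variance is not stated, so the population variance used here is an assumption, as are the function name and the sample data.

```python
from statistics import mean, median, pvariance


def speed_differences(cav_speeds, link_speeds):
    """d = v_c - v for every interval in which a CAV speed exists."""
    return [vc - v for vc, v in zip(cav_speeds, link_speeds) if vc is not None]


# Hypothetical speeds (km/h) on one link over four 1-min intervals.
cav = [90.0, None, 84.0, 88.0]
link = [85.0, 83.0, 86.0, 84.0]

d = speed_differences(cav, link)
summary = {
    "mean": mean(d),
    "median": median(d),
    "max": max(d),            # positive: CAV speed overestimates link speed
    "min": min(d),            # negative: CAV speed underestimates link speed
    "variance": pvariance(d)  # population variance (an assumption)
}
print(d, summary)
```

In the Kalman filtering step, a variance of this kind is what serves as the measurement noise.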

Fundamental Diagram of Mixed Flow.
This section aims to discuss the impact of CAVs on the fundamental diagram. Taking Link 7 as an example, Figure 4 shows the speed-flow diagrams under different penetration rates of CAVs. An increase in the CAV penetration rate appears to have only a slight impact on the shape of the speed-flow fundamental diagram. The largest traffic volume (approaching the link capacity) is 7860, 7620, 7980, 7560, 7920, and 8040 when CAVs account for 0%, 1%, 3%, 5%, 7%, and 10%, respectively. This indicates that an increase in the CAV penetration rate would not necessarily increase the traffic flux when the penetration rate is under 10%. Moreover, when the penetration rate increases, the number of scatter dots on the left side reduces. To some extent, this indicates that the increase of CAVs in the mixed flow could relieve traffic congestion.
Besides, the critical speed to identify the free-flow state seems to remain the same at 80 km/h, as shown in Figure 4. Since CAVs at low penetration rates do not have a significant impact on the critical speed and volume, this study assumes that the traditional estimation method (i.e., the Kalman filtering-based estimation method) might be effective when the proportion of CAVs in the mixed flow is low.

Kalman Filtering-Based Estimation Method for Mixed Traffic

Basic Kalman Filtering Algorithm.
The traditional Kalman filtering-based estimation method is applied. For application in this study, the discrete form of the KF in the linear speed model is given by
$$x_t = A_t x_{t-1} + \omega_t, \tag{2}$$
$$y_t = H_t x_t + v_t, \tag{3}$$
where $x_t$ is the average link speed at the $t$th time interval. For simplicity, it is assumed to have a linear relationship with the speed value at the previous time interval. $y_t$ is the collected speed, which is the average speed of CAVs at the $t$th time interval. Similarly, the CAV speed is supposed to have a linear relationship with the average link speed. $\omega_t$ and $v_t$ are the state noise and the measurement noise, with covariances $Q_t$ and $R_t$, respectively, and $E(\omega_s, v_t') = 0$. The state equation (2) shows the behaviour of an $n$-dimensional state vector $x_t$, and the measurement equation (3) describes how the state vector is related to an $m$-dimensional measurement vector $y_t$. In this study, $m$ and $n$ are mostly not equal. In particular, when CAVs account for 1%, $m$ is far less than $n$. In the presence of incomplete data, the following recursive formula is used to solve the previous discrete model in this study:
$$x_t^t = \begin{cases} x_t^{t-1} + K_t \left( y_t - H_t x_t^{t-1} \right), & \text{if } y_t \text{ is collected}, \\ x_t^{t-1}, & \text{if } y_t \text{ is missing}, \end{cases}$$
$$P_t^t = \begin{cases} \left( I - K_t H_t \right) P_t^{t-1}, & \text{if } y_t \text{ is collected}, \\ P_t^{t-1}, & \text{if } y_t \text{ is missing}, \end{cases}$$
where $x_t^{t-1} = A_t x_{t-1}^{t-1}$ is the predicted state, $K_t = P_t^{t-1} H_t' \left( H_t P_t^{t-1} H_t' + R_t \right)^{-1}$ is the Kalman gain, $P_t^t$ is the error covariance matrix of state $x_t$, and $P_t^{t-1} = A_t \cdot P_{t-1}^{t-1} \cdot A_t' + Q_t$. Usually, $A_t$, $H_t$, $Q_t$, and $R_t$ are calibrated using the historical data. According to the small average speed difference in Table 5, this study sets both $A_t$ and $H_t$ as 1. As for the state noise $Q_t$, this study calibrates it separately based on the traffic condition. With a mixed traffic of 1% CAV penetration rate as an example shown in Figure 5, the variation of the state error clearly enlarges when the speed falls below the critical speed of 80 km/h. From the observations among all fundamental diagrams across all penetration rates, the application of CAVs does not have a significant impact on the critical speeds when CAVs have a low penetration rate.
Therefore, the same critical speed is used to identify the free-flow state. The state noise $Q_t$ is calibrated under the free-flow condition and the non-free-flow condition separately. The measurement noise $R_t$ could be calculated using the variance of the difference between CAV speeds and ground-truth speeds, as shown in Table 5. Finally, the initial state values are set as $x_0 = 85$ and $P_0 = Q_t$ under the free-flow condition.
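The recursion can be sketched for a single link in Python as a scalar filter with A = H = 1, a prediction-only step when the CAV speed is missing, and a state noise that switches at the 80 km/h critical speed. This is a minimal sketch under stated assumptions, not the authors' implementation: the parameter names `q_free` and `q_congested`, the noise values in the usage example, and the choice to select the regime from the previous estimate are all assumptions.

```python
CRITICAL_SPEED = 80.0  # km/h, free-flow threshold taken from the text


def kf_estimate(observations, q_free, q_congested, r, x0=85.0, p0=None):
    """Scalar Kalman filter (A = H = 1) that tolerates missing observations.

    observations: CAV speed per interval, None where no CAV was observed.
    q_free, q_congested: state-noise variances of the two traffic regimes
        (hypothetical names). r: measurement-noise variance.
    """
    x = x0
    p = q_free if p0 is None else p0  # P_0 = Q under free flow
    estimates = []
    for y in observations:
        # Regime is chosen from the previous estimate (an assumption).
        q = q_free if x >= CRITICAL_SPEED else q_congested
        x_pred = x       # prediction with A = 1
        p_pred = p + q   # A * P * A' + Q with A = 1
        if y is None:
            # Missing data: carry the prediction forward unchanged.
            x, p = x_pred, p_pred
        else:
            k = p_pred / (p_pred + r)      # Kalman gain with H = 1
            x = x_pred + k * (y - x_pred)  # measurement update
            p = (1.0 - k) * p_pred
        estimates.append(x)
    return estimates


# Hypothetical CAV speeds with one missing interval and illustrative noises.
estimates = kf_estimate([88.0, None, 70.0], q_free=1.0, q_congested=4.0, r=2.0)
print(estimates)
```

During a missing interval the error covariance keeps growing by the state noise, so the next available CAV speed receives a larger gain, which is what lets the filter bridge gaps at very low penetration rates.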
Furthermore, the ground-truth speeds, estimated speeds, and CAV speeds are illustrated in the time-space form, as shown in Figure 7. According to the speed values, the traffic state is divided into three conditions, which are represented by three different colors, i.e., green, yellow, and red. Figure 7 shows that the estimates (i.e., Figures 7(a) and 7(d)) almost copy the ground truths (i.e., Figures 7(b) and 7(e)). If not for the missing data, CAV speeds could almost tell the traffic condition, as shown in Figures 7(c) and 7(f). In particular, 10% CAVs (i.e., Figure 7(f)) seem to be able to visualize the traffic state in a rough three-color map compared with the ground-truth speed map (i.e., Figure 7(e)).

Accuracy.
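This section will further evaluate the estimation accuracy. It is measured by RMSE (Root Mean Squared Error) and MAE (Mean Absolute Error). They can be obtained by
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( \hat{x}_t - x_t \right)^2},$$
$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} \left| \hat{x}_t - x_t \right|,$$
where $\hat{x}_t$ is the estimated speed and $x_t$ is the ground-truth speed at the $t$th time interval, and $N$ is the number of time intervals.
The scenario with 10% CAVs has smaller RMSE and MAE than other penetration scenarios. However, RMSE and MAE of the 3%, 5%, and 7% scenarios are quite close to those of the 10% scenario. Although RMSE and MAE of the 1% scenario are a bit larger than those of other scenarios, their values remain small, i.e., RMSE is less than 7 and MAE is less than 5.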
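Both error measures can be computed with a few lines of Python using their standard definitions. The function names and the sample speeds below are hypothetical and only illustrate the calculation.

```python
import math


def rmse(estimates, truths):
    """Root mean squared error between estimated and ground-truth speeds."""
    if len(estimates) != len(truths) or not truths:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths))


def mae(estimates, truths):
    """Mean absolute error between estimated and ground-truth speeds."""
    if len(estimates) != len(truths) or not truths:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - t) for e, t in zip(estimates, truths)) / len(truths)


# Hypothetical estimated and ground-truth link speeds (km/h).
est = [82.0, 78.0, 60.0, 85.0]
truth = [80.0, 80.0, 64.0, 85.0]

rmse_value = rmse(est, truth)
mae_value = mae(est, truth)
print(rmse_value, mae_value)
```

RMSE penalizes large deviations more strongly than MAE, so reporting both shows whether the error comes from a few large misses or from a uniform offset.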
In general, the estimation method with limited CAV data has a reasonable performance, even when the proportion of CAVs in mixed traffic is only 1%. Moreover, this study would like to compare the accuracy of estimates and CAV speeds. Since there are missing parts in CAV speeds, RMSE and MAE are calculated using the data that eliminates the data-missing time intervals. Taking the data from Link 7 as an example, the accuracy comparison results are shown in Figure 9. It is obvious that estimates reduce the speed error compared with the CAV speeds.

Sensitivity Analysis.
In the application of this KF-based estimation method, some parameters might play an important role in the estimation accuracy. They are the state and measurement noises. As mentioned in the KF-based estimation method, the measurement noise is calibrated by the historical data of the CAV speeds and the ground-truth speeds on each link. In practice, the ground-truth speeds are usually not available on all links. Therefore, this study selects the minimum and maximum $R_t$ from Table 5 and uses these values in the estimation. Their estimation accuracy is compared with that obtained using the calibrated $R_t$, as shown in Figure 10. The proposed method performs best, but if $R_t$ could not be calibrated on each link, a small value of $R_t$ calibrated on other links is suggested.
Another parameter is the state noise $Q_t$. In the proposed method, both the calibration and the application of the state noise are separated based on the traffic condition. If this separation is eliminated, this study finds that it leads to larger estimation errors, as shown by the comparison results in Figure 11. This indicates that the proposed method performs better.

Conclusions and Future Works
It seems to be inevitable that CAVs will come onto the market and travel on regular roads in the near future. It can also be imagined that there will be mixed traffic consisting of CAVs and RVs, and the proportion of CAVs will be low at the beginning stage. This study discussed the application of the limited CAV data to estimate the traffic state at this beginning stage. At first, this study set up a microsimulation platform of the mixed traffic flow using VISSIM. Five simulation scenarios with the CAV penetration increasing from 1% to 10% were set to generate the testing data.
Then, a step-by-step discussion on the characteristics of mixed traffic was conducted based on the simulation data. The sample size distribution under different CAV penetrations was found, and the data-missing rate was calculated, which was especially large when CAVs only account for 1% of mixed traffic. The analysis of the speed difference between CAV speeds and the ground-truth link speeds assisted the subsequent calibration of the proposed estimation model. The speed-flow diagrams of mixed traffic indicated the possibility of applying the traditional estimation method. Accordingly, the simple KF-based estimation method was used and adjusted to accommodate the incomplete CAV data. The estimation results, accuracy evaluations, and sensitivity analysis validated the applicability and precision of the proposed estimation method using limited CAV data.
Since Level 4 and Level 5 CAVs are not ready to enter the market, the simulation method is an alternative way to make these investigations. With the developing technology of CAVs, the driving behaviour model of CAVs needs to be updated accordingly in the future. Besides, the simulated roadway is oversimplified. The complex merging and weaving areas from the on/off-ramps to the mainline could be discussed in detail. Moreover, the fusion of the measurements from other existing sensors, such as loop detectors and GPS probe vehicles, and the testing of other existing estimation methods are also useful for field application.

Data Availability
The data used to support the findings of this study were generated from VISSIM simulation software based on the simulation settings described within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.