Autonomous Vehicle Evaluation: A Comprehensive Survey on Modeling and Simulation Approaches

In recent years, autonomous vehicles (AVs), which observe the driving environment and perform some or all of the driving tasks, have achieved tremendous success. The field of AVs has been developing rapidly and has found many applications. As a safety requirement set by policymakers, these vehicles must be evaluated before their deployment. The evaluation process for AVs is challenging because crashes are rare events, and an AV that passes predefined test scenarios may still perform poorly in real traffic. Therefore, capturing crashes and creating realistic test scenarios should be considered in order to obtain an evaluation approach that represents real-world conditions. One evaluation approach is based on the naturalistic field operational test (N-FOT), in which prototype AVs are driven by volunteers or test engineers on public roads. Unfortunately, this approach is time-consuming and costly because one needs to drive hundreds of thousands of miles to experience a police-reported collision and nearly 100 million miles for a fatal crash. Another approach is the Accelerated Evaluation method, whose core idea is to modify the statistics of naturalistic driving so that safety-critical events are emphasized. This paper presents a brief survey of the advances in the evaluation of partially or fully autonomous vehicles, starting with N-FOTs. The review goes on to cover test matrix evaluation, worst-case scenario evaluation (WCSE), Monte Carlo simulations, and accelerated evaluation (AE). We also present the simulation-based and agent-based modeling approaches that do not follow any of the evaluation protocols listed above. This study provides a scientific analysis of each of the evaluation techniques, focusing on their advantages and disadvantages, inherent restrictions, practicability, and optimality.
The results reveal that the accelerated evaluation approach outperforms N-FOTs, test matrix evaluation, WCSE, and Monte Carlo simulation methods in some of the car-following and lane-change studies when using specific models. Moreover, agent-based modeling and augmented and virtual reality approaches show promising results in AV evaluation. Furthermore, integrating machine learning and deep learning into the available AV evaluation methods can improve their performance and generate encouraging outcomes.


I. INTRODUCTION
Decades-long mobile robot navigation research and more recent advances in artificial intelligence (AI) and wireless communication have created the technological possibilities that make the semi-autonomous road vehicles of today possible and have brought the fully autonomous intelligent transportation systems (ITS) of tomorrow within reach. This body of research in AI also offers excellent potential to substantially increase the efficiency and safety of future transportation. Autonomous vehicles (AVs) can help to save fuel, decrease traffic crashes, reduce traffic congestion, and provide better transportation services to older people and people with disabilities [1]. There are many legal challenges in developing AVs, and efforts are being made to remove them. Since 2012, at least 41 states and D.C. have considered legislation related to AVs [2]. Twenty-nine states have enacted laws authorizing the testing of AVs on public roads [2]. In Arizona, Delaware, Hawaii, Idaho, Illinois, Maine, Massachusetts, Minnesota, Ohio, Washington, and Wisconsin, executive orders have been issued regarding AVs [2], [3], [4], as shown in Figure 1. In Europe, the United Kingdom allowed the testing of AVs on public streets beginning in January 2015 [5]. On February 6, 2019, a formal statement issued by the Department for Transport (DfT) said that the UK is "on track to meet its commitment to have fully self-driving vehicles on UK roads by 2021" [6]. Table 1 shows the AVs Readiness Index for 30 countries [7], [294]. This index shows the level of preparedness for AVs [7]. It is a composite index that combines 28 individual measures from various sources into a single score [7]. The 30 countries have been evaluated on 28 measures grouped into four pillars, namely policy and legislation, technology and innovation, infrastructure, and consumer acceptance [294]. Four measures are used to define the final index score for each country by KPMG International and the ESI ThoughtLab [7]. They use public data such as media reports, press releases, and other materials [7]. One of these measures utilizes a consumer survey conducted by Branded Research in every country on the list [7].
All variables are given equal weight except the mobile connection speed and broadband measures in the infrastructure pillar [294]. These two measures have half the weight of the other measures. Because every variable has a different measurement unit, the collected data are first normalized with the min-max approach before being combined. The normalization maps each variable into a range between zero and one [294]. As a result, the top-ranked country gets one and the bottom country zero. Each pillar has an equal weight in the final score for each country [294]. Figure 2 shows these pillars with the associated variables based on [294].
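The normalization and weighting described above can be illustrated with a minimal sketch. The measure names, the raw values, and the helper functions `min_max_normalize` and `pillar_score` are hypothetical and are not taken from [7] or [294]; only the min-max rule and the half weight for the two connectivity measures follow the text.

```python
def min_max_normalize(values):
    """Rescale raw scores to [0, 1]: the largest value maps to 1, the smallest to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All countries tie on this measure; return 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pillar_score(measures, weights):
    """Weighted mean of already-normalized measures, one score per country.

    measures: dict mapping measure name -> list of normalized scores
    weights:  dict mapping measure name -> relative weight
    """
    total_weight = sum(weights[name] for name in measures)
    n_countries = len(next(iter(measures.values())))
    return [
        sum(weights[name] * measures[name][i] for name in measures) / total_weight
        for i in range(n_countries)
    ]


# Hypothetical raw data for three countries (illustrative values only).
raw = {
    "road_quality": [5.5, 4.0, 6.1],          # survey score
    "mobile_speed_mbps": [40.0, 25.0, 70.0],  # Mbit/s
    "broadband_mbps": [90.0, 30.0, 60.0],     # Mbit/s
}
# Equal weights, except the two connectivity measures, which get half weight.
weights = {"road_quality": 1.0, "mobile_speed_mbps": 0.5, "broadband_mbps": 0.5}

normalized = {name: min_max_normalize(vals) for name, vals in raw.items()}
infrastructure = pillar_score(normalized, weights)
print(infrastructure)
```

Because each measure is rescaled separately, measures expressed in different units become comparable before they are averaged.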
As the pace of AV innovation picks up, cities have become the proving grounds of choice. Tech giants, automakers, and startups alike are focused on cities because that is where future customers live and work [8]-[13]. Many major car companies have begun research and development programs for AVs. Table 2 presents the reported AV production schemes [14], [15]. Three features are considered, namely traffic jam assist (TJA), autonomous parking assist (APA), and automated highway driving (AHD). On October 14, 2015, Tesla activated the Autopilot function on the Model S through an over-the-air software update [16], enabling features such as adaptive cruise control, lane-keeping, auto lane change, autopark, and automatic emergency steering [17].
After 2016, a large number of companies entered the AV industry, such as Amazon, Apple, Microsoft, Nvidia, Tesla, Toyota, Uber, Volvo, Huawei, and many more [18], [19]. Recent developments in the area of advanced driver assistance systems (ADAS) show vast improvements toward the accessibility of autonomous driving. Many companies have raised the level of autonomy over the last few years, and several projects are targeting SAE level 4 or higher. The SAE levels of driving automation are defined in [20] and shown in Figure 3. Advancement in autonomous driving requires high-level algorithms that are efficient enough to handle complicated scenarios, especially urban scenarios such as intersections with multiple pedestrians, pedestrians with unknown intent, traffic lights, cars, and bicycles, which are difficult to predict. These high-level algorithms include pattern recognition (classification) [232], [233], [297]-[299], clustering [236]-[238], decision matrix algorithms [239], [240], [296], pedestrian intention prediction [234], [235], and many more.
Driving in urban environments has been an active area of research due to the high density of vehicles and the many obstacles that must be avoided. There have been several in-depth efforts to study this problem, such as the DARPA Urban Challenge [21], the V-Charge Project [22], and at least three US military applications: urban operations (UO), manned-unmanned teaming (MUM-T), and AGR [23]. The challenge of driving in an urban environment is complex because it involves both increasing the speed of AVs and greater environmental complexity [24]. As the level of automation increases, the evaluation process becomes more challenging because the AV system becomes more complex. An AV may have 100 million lines of code, while the Boeing 787 has only 6.5 million [28] (Figure 4). It is a real challenge for a company, and also for evaluation authorities such as the National Highway Traffic Safety Administration (NHTSA) [30], to check every line of code. Many problems may be uncovered after the product release, which could cost the company a lot of money [31], [32]. It is, therefore, necessary to evaluate the AV system during the design process. In this paper, we focus on the evaluation of AVs at level 3 to level 5. The term "autonomous vehicle" is used in this article instead of "automated vehicle." We have chosen "autonomous" because it is a common term, and the general public is familiar with it. However, the term "automated" implies control or operation by a machine and refers to connected vehicles, while "autonomous" suggests more intelligence and implies that the vehicle is acting independently [295]. The following sections present several AV evaluation methods.
VOLUME 4, 2016

II. NATURALISTIC FIELD OPERATIONAL TESTS
Naturalistic field operational tests (N-FOTs) [33] have been used to evaluate AVs. In such a test, several equipped vehicles are deployed on the road and driven in naturalistic conditions [34]. During the driving time, data are collected for evaluation purposes. An advantage of N-FOT observation is that it allows investigators to directly observe CAVs and AVs in a natural setting. A naturalistic driving study of 100 vehicles was conducted by Virginia Polytechnic Institute and State University (Virginia Tech) to investigate the major contributing factors of crashes. The collected data have been used to inspect many elements, such as driver performance, surrounding environment, driving conditions, and other components related to critical incidents, near collisions, and collisions [35]-[39]. A list of large-scale N-FOT projects carried out in the U.S. is shown in Table 3. Some companies, such as Waymo (formerly the Google self-driving car project), have designed several SAE Level 4 AVs [41] and have evaluated the whole autonomous system on actual roads since 2012. As of January 2020, the Waymo AVs had logged nearly 20 million miles of self-driven operation on public roads in 25 cities and tens of billions of miles in computer simulations, with thousands of scenarios and different individual test tracks [42], [304]. In the N-FOT test, the drivers are trained and know where to drive. Thus, the evaluation process involves non-intrusive driving conditions. The N-FOT approach has many restrictions, such as the time required to conduct the test. In addition, the probability of encountering critical events under naturalistic conditions is very low. For example, in the U.S., vehicles travel a total of 0.53 million miles per police-reported collision and 99 million miles per fatal collision [43]. Therefore, N-FOT projects require many vehicles, a lot of time, and large budgets. In [44], it is noted that an N-FOT "cannot be conducted with less than $10,000,000." A more efficient test approach for AV evaluation is therefore required.
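To make the scale of these mileage figures concrete, the following sketch converts them into fleet driving time. The two mileage rates are the values quoted above from [43]; the fleet size and the annual mileage per vehicle are hypothetical assumptions chosen only for illustration.

```python
# Average miles driven per event, as quoted in the text from [43].
MILES_PER_POLICE_REPORTED_COLLISION = 0.53e6
MILES_PER_FATAL_COLLISION = 99e6


def fleet_years(miles_needed, fleet_size, miles_per_vehicle_per_year):
    """Years a test fleet must drive to accumulate the given mileage."""
    return miles_needed / (fleet_size * miles_per_vehicle_per_year)


# Hypothetical test fleet: 100 vehicles, each driving 12,000 miles per year.
FLEET_SIZE = 100
MILES_PER_YEAR = 12_000

years_per_collision = fleet_years(
    MILES_PER_POLICE_REPORTED_COLLISION, FLEET_SIZE, MILES_PER_YEAR
)
years_per_fatal = fleet_years(MILES_PER_FATAL_COLLISION, FLEET_SIZE, MILES_PER_YEAR)

print(f"Fleet-years per police-reported collision: {years_per_collision:.2f}")
print(f"Fleet-years per fatal collision: {years_per_fatal:.1f}")
```

Under these assumptions, the fleet would on average need decades of driving to observe a single fatal collision, which is why N-FOT data alone yield very few safety-critical events.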

III. TEST MATRIX EVALUATION
A test matrix evaluation consists of a series of test scenarios that are defined at the start of the process. The autonomous vehicles then go through each test and are assessed objectively or subjectively [40]. Figure 5 shows an example of a test matrix evaluation process. In Figure 5, the design cycle starts with specific scenarios or cases. Then, the functional and technical specifications are constructed from these cases. The final design is then verified at a component and function level. In the evaluation cycle, the function description is established using the functional and technical specifications. Then, three types of tests are applied to the constructed function: potential safety tests, human factors tests, and technical tests. At the end of the process, a validation test and a safety impact analysis are carried out.
Test matrix evaluation scenarios can be applied in field tests, hardware-in-the-loop (HIL) tests, driving simulator tests, and computer simulation [40]. Field tests are utilized by all certification authorities [40]. Driving simulator tests and computer simulations have also been used to decrease cost and time. Test matrix evaluations are mainly based on crash databases. The pre-crash scenarios are investigated in many studies [241]-[245]. Figure 6 shows the General Motors 44-crash typology. The United States Department of Transportation designed the pre-crash typology based on the NASS crash databases GES (General Estimates System) [246] and CDS (Crashworthiness Data System) [247]. The Volpe National Transportation Systems Center integrated these two typologies to create the 37 pre-crash scenarios and capture the vehicle movements and dynamics in real-world and pre-crash critical scenarios. The top five scenario groups, namely car-following, lane change, left turn, crossing, and opposite direction, were generated by Volpe using the GES, NMVCCS (National Motor Vehicle Crash Causation Survey) [248], and EDR (Event Data Recorder) [249] databases.

FIGURE 6: Pre-crash scenarios defined by NHTSA [244].
Table 4 presents the major crash databases in the USA and Europe.
For more details about crash analysis, reference [250] presents an extensive review. The test matrix forms scenarios from data acquired from N-FOTs and from technical document analysis [251], [252], [253]. Several programs and research projects have been started to develop evaluation policies using the test matrix technique, such as the collision scenarios designed in the Crash Avoidance Metrics Partnership (CAMP) [252], the critical scenarios created through the classification tree method for ADAS [253], and the scenarios constructed based on ontologies [254]. Table 5 shows all the test matrix projects. The significant advantages of the test matrix technique are that the determined test policy is repeatable, well-grounded, and quick to finish [46]. Nevertheless, several challenges arise, chiefly that all the test scenarios are predetermined. Thus, the AV control system can attain excellent results in these test scenarios, but the results under real-life scenarios may not be satisfying. By analogy, "Having a standard test is akin to holding an SAT exam for students with all problems pre-announced. Students do well in the test, but the score may tell very little about how much they learn" [45].
Besides, the test matrix scenarios are usually chosen from collision databases in which most of the collisions were caused by human-controlled vehicles (HVs). Therefore, the test scenarios and the evaluation processes applied to AVs may not accurately capture the safety-critical events of AVs [40]. Moreover, the results of the CAMP, ADAS, and ontology projects indicate that the test matrix is well suited to autonomous driving system evaluation because of its low cost and the high controllability of its scenarios. However, the generation of test scenarios using the traditional test matrix approach is usually based on only a few influence factors. These factors are generally combined to generate a final scenario [252]. The influence factors are usually taken from the N-FOT database, the WCS database, test standards, and other sources. The influence factors can then be divided into surrounding environment parameters, AV parameters, and road users' parameters [40], [131], [132], [255]. Thus, adding further factors causes the number of scenarios to grow geometrically and, as a result, increases the test cost.
An accident analysis report by Tesla revealed that the faults of a critical system might be triggered by combinations of particular values of some factors. Moreover, test scenarios that integrate these combinations of values can help in the evaluation process by revealing new problems [255]. Therefore, using the traditional test matrix approach that is based on the exhaustive testing of all influence factors is redundant and ineffective.
Furthermore, specific test scenarios should be generated using certain influence factors (elements) to evaluate AV systems. For example, the test scenarios generated to evaluate the Lane Departure Warning (LDW) system take into consideration the traffic environment parameters, the subject vehicle driver's behavior, and the traffic participants' state [263]. Every parameter, such as the weather condition, can take many values, such as sunny, cloudy, rainy, and foggy. When test scenarios are designed according to ISO 17361, there are only eight scenarios in total [8], which is not enough to find system failures. With the test matrix method using the exhaustive testing approach, the total number of test scenarios is 497,664,000 [264]. Assuming an average running time of 30 s for each scenario, a total time of about 473 years is required to finish all scenarios [263]. This is undoubtedly an inefficient and unacceptable testing approach.
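The arithmetic behind the 473-year figure, and the geometric growth of an exhaustive test matrix, can be checked with a short sketch. The scenario count and the 30 s run time are the values quoted above; the small factor list at the end is a hypothetical example, not the factor set used in [263] or [264].

```python
from math import prod

SECONDS_PER_YEAR = 365 * 24 * 3600


def exhaustive_scenarios(levels_per_factor):
    """Size of the full test matrix: the product of the number of values per factor."""
    return prod(levels_per_factor)


def total_years(n_scenarios, seconds_per_scenario):
    """Wall-clock years needed to run every scenario once, back to back."""
    return n_scenarios * seconds_per_scenario / SECONDS_PER_YEAR


# Values quoted in the text: 497,664,000 scenarios at 30 s each.
years = total_years(497_664_000, 30)
print(f"{years:.1f} years")  # about 473.4 years

# Hypothetical factors: 4 weather values, 3 road types, 5 speed levels.
base = exhaustive_scenarios([4, 3, 5])
# Adding one more factor with 4 values multiplies the matrix size by 4.
extended = exhaustive_scenarios([4, 3, 5, 4])
print(base, extended)
```

Each added factor multiplies the matrix size by its number of values, which is the geometric growth mentioned earlier.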
The authors in [263] proposed a Combinatorial Testing Scenario Generation Method Based on Complexity (CTBC) to improve the effectiveness of the traditional test matrix technique. The method aims to decrease the number of test scenarios while increasing their overall complexity. The results revealed that scenarios with high complexity are effective in finding system failures. Moreover, the CTBC method reduces the number of test scenarios and generates more complex scenarios than the traditional test matrix methods. On the other hand, each AV system has unique influence elements, and the coupling relationship between elements from different systems has not yet been investigated. Therefore, the system defects that arise under such coupling have not yet been explored by the CTBC method. Thus, many AV systems, subsystems, and advanced features have not yet been tested and evaluated by either the CTBC method or the traditional test matrix methods.
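As an illustration of how combinatorial testing shrinks a test matrix, the sketch below builds a pairwise (2-way) covering suite with a simple greedy search. This is a generic textbook technique, not the CTBC method of [263], and it does not reproduce the complexity ranking of that work; the factor names and values are hypothetical.

```python
from itertools import combinations, product


def pairwise_suite(factors):
    """Greedy pairwise covering suite.

    factors: dict mapping factor name -> list of values.
    Returns a list of scenarios (dicts) such that every pair of values taken
    from two different factors appears together in at least one scenario.
    """
    names = list(factors)
    # Every (factor_a, value_a, factor_b, value_b) combination that must be covered.
    uncovered = {
        (a, va, b, vb)
        for a, b in combinations(names, 2)
        for va in factors[a]
        for vb in factors[b]
    }
    # Candidate scenarios: the full exhaustive matrix (fine for small examples).
    candidates = [dict(zip(names, combo)) for combo in product(*factors.values())]

    def pairs_of(scenario):
        return {(a, scenario[a], b, scenario[b]) for a, b in combinations(names, 2)}

    suite = []
    while uncovered:
        # Pick the scenario that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda s: len(pairs_of(s) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best)
    return suite


# Hypothetical influence factors for a lane-departure test.
factors = {
    "weather": ["sunny", "cloudy", "rainy", "foggy"],
    "road": ["straight", "curved"],
    "lane_marking": ["solid", "dashed"],
    "speed_kmh": [60, 90, 120],
}
suite = pairwise_suite(factors)
print(len(suite), "pairwise scenarios instead of", 4 * 2 * 2 * 3)
```

The greedy search is not guaranteed to find the smallest possible suite, but it always terminates because every uncovered pair is contained in at least one candidate scenario.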

IV. WORST-CASE SCENARIO EVALUATION
The worst-case scenario evaluation (WCSE) technique has been proposed to identify the most challenging scenarios for any vehicle, with or without active control systems [40]. In [47], [48], Ma applies WCSE to rollover (overturning of a vehicle) and jackknifing of articulated vehicles using dynamic game theory. The term jackknifing refers to the folding of an articulated vehicle so that it resembles the acute angle of a folding pocket knife. This approach treats the control inputs and the disturbance inputs as the two players of a two-player game. In [49], Ungoren proposes another approach, a one-player game that considers the vehicle and its control system as a joint dynamic structure. Then, to solve the WCSE problem numerically, the iterative dynamic programming technique is applied [40]. This technique is used in [50] to assess the integrated chassis control (ICC) system. A mathematical model of the vehicle is established, and the WCSE is defined as a horizon optimization problem that solves for a trajectory (e.g., a sequence of steering inputs) that minimizes or maximizes a cost function (e.g., a rollover index) [50]. Solutions are derived for two classes of systems. For a linear system (SISO linear time-invariant and MIMO systems), the worst bounded inputs are obtained from the convolution of impulse responses [51]. For a nonlinear system (a nonlinear dynamical system for a control problem), the solution of the Hamilton-Jacobi-Bellman equations is obtained by the calculus of variations to solve the optimal trajectory task [52].
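For the linear case, a minimal sketch of the idea is given below for a discrete-time SISO system. It uses the standard result that, for an input bounded by |u| <= u_max, the output at the final step is maximized by matching the sign of each input sample to the impulse-response coefficient that multiplies it. Whether [51] uses exactly this formulation is an assumption, and the impulse response values are hypothetical.

```python
def worst_case_input(h, u_max=1.0):
    """Return (u, peak) for a finite impulse response h.

    The output at the last step is y[n-1] = sum_k h[k] * u[n-1-k], so the
    bounded input that maximizes it sets u[n-1-k] = u_max * sign(h[k]).
    """
    n = len(h)

    def sign(x):
        return 1.0 if x >= 0 else -1.0

    u = [u_max * sign(h[n - 1 - t]) for t in range(n)]
    peak = u_max * sum(abs(x) for x in h)
    return u, peak


def output_at_last_step(h, u):
    """Discrete convolution of h and u, evaluated at the final time step."""
    n = len(h)
    return sum(h[k] * u[n - 1 - k] for k in range(n))


# Hypothetical impulse response of an oscillatory vehicle response (illustrative).
h = [0.0, 0.5, 0.6, 0.2, -0.3, -0.25, 0.05]

u_worst, peak = worst_case_input(h)
step_response = output_at_last_step(h, [1.0] * len(h))

print("worst-case output:", output_at_last_step(h, u_worst))
print("step-input output:", step_response)
```

In this example the worst-case input produces a larger final output than a constant step of the same amplitude, because it switches sign to reinforce the negative lobe of the response.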
Even though the WCSE approach can identify the weaknesses of a vehicle and its control system, it does not take into account the occurrence probability of such worst-case scenarios [53], [54]. Therefore, the WCSE results do not provide sufficient information about critical real-world situations. Furthermore, there are limitations when the control algorithms are complicated or are not available in numerical form. As a result, the WCSE techniques may either face difficulties in finding the worst scenarios or be time-consuming.

V. MONTE CARLO SIMULATIONS
Monte Carlo simulation, or the Monte Carlo method, is a mathematical procedure used to predict the likely results of an uncertain event. It creates a model of potential outcomes by assigning a probability distribution (e.g., uniform or normal) to every variable with uncertainty [300]. In this approach to AV evaluation, N-FOT data are used to construct stochastic models, and Monte Carlo simulation is applied to assess partially or fully autonomous vehicles. Table 6 presents a list of all the papers related to this method, with their objectives, techniques and models, and scenarios.
In addition to Table 6, Table 7 presents all the references with the associated AV tasks.
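The basic procedure can be sketched for a simplified car-following conflict. All distributions, parameter values, and the stopping-distance collision criterion below are hypothetical assumptions chosen for illustration; in the surveyed work the stochastic models are fitted to N-FOT data instead.

```python
import random


def collides(gap_m, speed_mps, reaction_s, lead_decel, follower_decel):
    """Simplified criterion: compare the stopping distances of both vehicles.

    Both vehicles start at the same speed. A collision is declared when the
    follower needs more road than the initial gap plus the lead vehicle's
    stopping distance. Contacts that would occur before both vehicles have
    stopped are ignored in this sketch.
    """
    lead_stop = speed_mps ** 2 / (2 * lead_decel)
    follower_stop = speed_mps * reaction_s + speed_mps ** 2 / (2 * follower_decel)
    return follower_stop > gap_m + lead_stop


def estimate_crash_probability(n_samples, seed=0):
    """Naive Monte Carlo: sample uncertain inputs and count collisions."""
    rng = random.Random(seed)
    crashes = 0
    for _ in range(n_samples):
        gap = rng.uniform(10.0, 60.0)              # initial gap in m
        reaction = max(0.2, rng.gauss(1.0, 0.3))   # reaction time in s
        lead_decel = rng.uniform(2.0, 8.0)         # lead braking in m/s^2
        if collides(gap, 25.0, reaction, lead_decel, follower_decel=6.0):
            crashes += 1
    return crashes / n_samples


p_crash = estimate_crash_probability(10_000)
print(f"Estimated collision probability: {p_crash:.3f}")
```

The estimate is simply the fraction of sampled scenarios that end in a collision, so its accuracy depends on how many samples fall into the safety-critical region.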
In [55], collision avoidance systems are evaluated by establishing an "errorable" driver model to mimic human distraction based on road-departure crash-warning (RDCW) FOT and intelligent cruise control (ICC) FOT naturalistic driving databases. In [56], heavy trucks' collision warning and collision mitigation braking technologies are assessed by building 1.5 million forward-collision test scenarios from naturalistic driving data conflicts. The main advantage of this approach is that naturalistic driving data are used to create all the scenarios/models. Therefore, these scenarios/models represent real-world scenarios. As a result, Monte Carlo simulation models may decrease the assessment cost compared to the field tests. Moreover, this method evaluates data collected from human driving databases without any actual crashes [55], [56]. Therefore, using Monte Carlo simulations directly may result in an inefficient simulation model because of the dominance of the non-safety-critical portions of the naturalistic driving data.
TABLE 6: Monte Carlo simulation studies with their objectives, techniques and models, and scenarios.
Objective: Utilize model predictive control and its optimization function to find a smoother trajectory. Use Monte Carlo simulations as a safety assessment. Techniques and models: Model predictive control and Monte Carlo simulations. Scenarios: Lane-change on the straight road and turning at the intersection.
[277] Objective: Analyze the safety of complex road scenes using a reasoning framework and Monte Carlo integration. Scenarios: A road scene with a parked car and a moving bicycle.
[278] Objective: Present a threat evaluation algorithm for general road scenes. Techniques and models: A dynamic driver model, Monte Carlo sampling, and an extended Kalman filter. Scenarios: Vehicle approaching an obstacle at a high velocity. Two vehicles are moving, with one approaching the other from behind with a higher velocity.
[279] Objective: Present an algorithm to predict the driver intention of other vehicles using a random-forests classifier. Compute possible future trajectories with a sequential Monte Carlo method. Techniques and models: A sequential Monte Carlo method. Scenarios: Lane change scenario.
[284] Objective: Propose a reinforcement learning-based deep-MCTS algorithm for vision-based autonomous driving control. Techniques and models: Deep-MCTS. Scenarios: Race road with four sharp curves with and without obstacles.
[285] Objective: Propose a risk-based framework to evaluate AV systems and subsystems as a black box. Techniques and models: Multilevel splitting method and adaptive importance-sampling methods. Scenarios: A highway scenario consisting of six agents (five are part of the environment).
[286] Objective: Implement a simulation framework to evaluate a fully AV system using adaptive importance-sampling methods. Techniques and models: Adaptive importance-sampling methods. Scenarios: A highway scenario.
[287] Objective: Proposes an Enhanced Driver Model (EDM) that predicts the driver action in an urban environment. Using the EDM model, a Monte Carlo simulation is used to identify the statistical distribution of fuel consumption and travel time. Techniques and models: EDM. Scenarios: A Mixed Urban Route.
In [55], the model coefficients are acquired by fitting a collection of driving data. The errorable model can then be improved in real time to achieve a better level of false positives and false negatives and a better adaptation to the driver. However, according to [55] and [56], tuning the models in open loop usually shows inefficient results when a human or hardware is in the loop. Therefore, when hardware or a human is in the loop, this method may not speed up the evaluation procedure.
The safety problem in the autonomous driving domain, especially in trajectory planning, has been investigated heavily in the literature. It is essential to estimate the state of the surrounding road users and then predict the probabilistic occupancy of each road user to identify any future risk [271]. Accurate estimates help the AV navigate to its final destination with the lowest crash probability. In the literature, it is commonly assumed that all road users keep their initial motion state [272], [273]. Therefore, the actual motion of a road user might differ from the estimated motion because of the uncertainty in road user detection and future intent. There are many estimation techniques based on kinematic models or dynamic models [274]-[276]. The kinematic models are limited because they neglect the forces that influence the road user's movement [274]. The dynamic models, in contrast, consider the variety of forces that affect driving, such as tire forces and air friction. However, dynamic models are excessively complex and require a different model for each vehicle [270]. Therefore, Monte Carlo simulation [55], [56], [265], [266], [268]-[270], [277]-[279], Gaussian distributions [280], [281], and Markov chain abstraction [265], [282], [283] are usually used to tackle the above issue.
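The notion of probabilistic occupancy can be illustrated with a minimal sketch that combines a constant-velocity kinematic model with Monte Carlo sampling. The function name `occupancy`, the Gaussian uncertainty on speed and heading, and the grid cell size are hypothetical assumptions and do not reproduce any specific method from [270]-[279].

```python
import math
import random
from collections import Counter


def occupancy(x, y, speed, heading, horizon_s, n_samples=5000,
              speed_std=1.0, heading_std=0.1, cell_m=2.0, seed=0):
    """Monte Carlo occupancy of one road user after horizon_s seconds.

    Each sample perturbs speed (m/s) and heading (rad), propagates a
    constant-velocity model, and is binned into a square grid cell.
    Returns a dict mapping (cell_x, cell_y) -> estimated probability.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        v = max(0.0, rng.gauss(speed, speed_std))
        theta = rng.gauss(heading, heading_std)
        px = x + v * horizon_s * math.cos(theta)
        py = y + v * horizon_s * math.sin(theta)
        counts[(math.floor(px / cell_m), math.floor(py / cell_m))] += 1
    return {cell: c / n_samples for cell, c in counts.items()}


# Hypothetical road user: at the origin, 10 m/s, heading along +x, 2 s ahead.
grid = occupancy(0.0, 0.0, speed=10.0, heading=0.0, horizon_s=2.0)
most_likely_cell = max(grid, key=grid.get)
print(most_likely_cell, round(grid[most_likely_cell], 3))
```

A planner could then treat cells whose probability exceeds a chosen threshold as risky regions for the AV trajectory; that threshold would itself be a design assumption.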
In [270], the motion prediction of road users is studied. The authors focus on trajectory planning in two typical lane-change scenarios (lane change on a straight road and turning at intersections). The Monte Carlo simulation is used as a safety assessment to estimate the probabilistic path planning of the road users and then produce a map from probability statistics to actual scenarios. However, Monte Carlo simulations are limited by the probabilistic errors that arise from random sampling [265]. According to the literature, more samples are required to achieve accurate outcomes with Monte Carlo simulations, but using more samples increases the computational burden, so accurate results are difficult to obtain within a limited time [265], [270]. The outcomes in [270] revealed that Monte Carlo simulation is inefficient in real-time computation.
The authors in [265] compared Markov chain abstraction and Monte Carlo simulation for the safety evaluation of fully autonomous vehicles. The two methods differ mainly in their error sources. The significant errors in the Markov chain approach are only the systematic errors from the discretization of the state and input space [265]. Moreover, the Markov chain has no probabilistic errors since no random sampling is performed [265]. In the Monte Carlo method, the main errors are probabilistic errors from sampling the initial states and the input sequences [265].
Furthermore, the Monte Carlo simulation has no systematic errors because each simulation uses the system's original dynamical equations [265]. The outcomes show that the probability distributions produced by the Markov chain approach are more accurate and faster to compute than those of the Monte Carlo simulation approach [265]. On the other hand, the Monte Carlo simulation produces better results than the Markov chain method when computing crash probabilities [265]. This advantage in crash probability estimation is attributed to the absence of systematic errors in the Monte Carlo simulation.
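The difference between the two error sources can be illustrated on a toy example. The sketch below propagates a probability distribution through a small Markov chain exactly and then estimates the same distribution by random sampling. The three-state transition matrix is hypothetical, and because both methods operate on the same discretized chain here, only the probabilistic (sampling) error of Monte Carlo is visible; the discretization error discussed in [265] is not modeled.

```python
import random


def propagate(p, P, steps):
    """Exact propagation: p_next[j] = sum_i p[i] * P[i][j], repeated `steps` times."""
    for _ in range(steps):
        p = [sum(p[i] * P[i][j] for i in range(len(p))) for j in range(len(P[0]))]
    return p


def sample_chain(p0, P, steps, n_samples, seed=0):
    """Monte Carlo estimate of the same distribution from sampled trajectories."""
    rng = random.Random(seed)
    states = list(range(len(p0)))
    counts = [0] * len(p0)
    for _ in range(n_samples):
        s = rng.choices(states, weights=p0)[0]
        for _ in range(steps):
            s = rng.choices(states, weights=P[s])[0]
        counts[s] += 1
    return [c / n_samples for c in counts]


# Hypothetical abstraction with three road cells; each row sums to 1.
P = [
    [0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2],
    [0.0, 0.3, 0.7],
]
p0 = [1.0, 0.0, 0.0]  # the road user starts in cell 0

exact = propagate(p0, P, steps=5)
approx = sample_chain(p0, P, steps=5, n_samples=20_000)
sampling_error = max(abs(e - a) for e, a in zip(exact, approx))
print(exact, approx, sampling_error)
```

The exact propagation costs one matrix-vector product per step and returns the same result on every run, whereas the sampled estimate changes with the random seed and the number of samples.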
Koren et al. extended the adaptive stress testing technique, which was originally used to evaluate an aircraft collision avoidance system, to test AVs [266]. Adaptive stress testing (AST) is an approach for finding critical scenarios using a Markov decision process (MDP) [266]. The original AST approach utilizes a Monte Carlo tree search (MCTS) with double progressive widening (DPW) to explore failures of the system [267]. The test scenario in [266] includes an AV with noisy sensors approaching a pedestrian crosswalk. A deep reinforcement learning (DRL) solver is used instead of MCTS to improve the efficiency of AST. The authors claim that the DRL approach is more efficient and can discover more critical events than MCTS [266]. The results reveal that both methods can recognize the failure trajectories in an AV-pedestrian conflict. The two solvers produce many events where the AV hits the pedestrian. A major problem with MCTS is that it has non-zero noise that adds up over time. Because MCTS does not reduce this noise to zero, a considerable probability error accumulates with time. Thus, in the generated scenarios, the AV fails to detect and predict the pedestrian, which results in a critical collision. Table 8 shows the numerical results of the two solvers of adaptive stress testing.
In Table 8, the number of calls to STEP for MCTS is the number of calls needed to find a critical accident in the AV-pedestrian conflict. In other words, it reflects the computational effort the algorithm needs to find a critical conflict. Moreover, this number represents the number of calls required to trust the results. The approach presented in [266] consists of three scenarios, shown in Figure 7. In scenario 1, the events generated by MCTS and DRL send one pedestrian into the scene towards the AV to establish a conflict. The DRL approach produces a straightforward path for the pedestrian towards the vehicle, which is better than the path produced by the MCTS method. In short, according to the results presented in [266], the DRL solver for AST outperformed the MCTS solver, especially in higher-dimensional scenarios.
In [293], three action generators and two reward functions are compared. The outcomes revealed that MCTS performs well and converges to a driving agent in static conditions. Moreover, the results showed that MCTS only succeeds at low speeds in real-time driving [293].
Reference [268] investigates the behavior of eight tracking controllers under extreme situations, uncertain parameters, and sensor noise. Different tests from single and double lane change scenarios are generated to evaluate the tracking controllers. Monte Carlo simulations and Rapidly-exploring Random Trees (RRTs) are utilized to assess the controllers' average and worst-case performance [268]. The authors state that most of the controllers' properties (e.g., stability, noise rejection, robustness to model variations) are not strongly compromised during the turning phase [268]. Moreover, the authors concluded that the outcomes obtained with Monte Carlo simulations and RRTs may not be perfect but can help in choosing suitable controllers.
In [269], univariate Gaussian probability density functions are used to estimate future discrete state transitions, such as the beginning of a turn by other agents. The outcomes are then compared to Monte Carlo simulations. The results showed a remarkable correlation between the proposed prediction distributions and the Monte Carlo simulations, especially over long prediction horizons [269]. Although the outcomes revealed an excellent correlation between the two methods, more investigation and validation are required for this model.
Reference [278] presents a risk evaluation algorithm for public road scenes:
1) The driver behavior is modeled as a probabilistic prior.
2) Monte Carlo sampling is used to approximate the distribution of future scenarios.
3) Different safety measures are computed based on the distribution of future scenarios.
4) A variety of techniques are implemented to increase the performance of this algorithm.
The algorithm was tested on simulated data and sensor data and was able to differentiate between safe and non-safe road scenes. However, the dataset used in this paper is not sufficient to achieve an optimal risk assessment because the data do not have much variation. Moreover, this algorithm removes samples that conflict with other objects and replaces them with non-conflicting samples. Thus, a real-time assessment is required to judge the efficiency of this algorithm.
Reference [279] captures the driver behavior of other vehicles using a random-forest classifier. Then, the likely future trajectories are computed with a sequential Monte Carlo simulation, followed by an assessment of the possible risk. This method is tested by conducting numerical simulations. The simulation results showed that the algorithm was able to recognize the driver's behavior. However, a limitation of this method is that no real-time deployment and evaluation was conducted.
Reference [277] presents a reasoning framework for the future movement of multiple road users. The probability distribution of every vehicle's future motion is generated by Monte Carlo planning. Synthetic data based on a real-world scenario is used to test this approach. The suggested approach shows excellent outcomes but requires more improvements and validation to handle more complex V2V and V2P scenarios.
Reference [284] develops a reinforcement learning-based Monte Carlo Tree Search (deep-MCTS) control technique for an AV vision-based system. Two deep neural networks (DNNs) are utilized to predict action probabilities and then fed to deep-MCTS to reconstruct multiple future trajectories. The deep-MCTS method outperforms existing methods and shows 50.0%, 66.30%, and 59.06% improvement in training efficiency, steering control stability, and driving trajectory stability, respectively [284]. The deep-MCTS is evaluated using the USS and the Torcs simulators.
In [285], the authors present a simulation testing platform to evaluate the whole AV system as a black box. The multilevel splitting method and adaptive importance-sampling methods are used to address the shortcomings of naïve Monte Carlo simulations in estimating rare-event probabilities. The approach in [285] outperforms naïve Monte Carlo for events with a probability lower than $10^{-3}$. Moreover, the variance of the failure probability estimate is decreased by up to 10x [285]. In contrast, naïve Monte Carlo performs well compared with the above method in predicting non-rare events [285].
In [286], O'Kelly et al. implemented an end-to-end AV testing framework using adaptive importance-sampling methods to speed up rare-event probability validation. As a result, the system validation is accelerated by 2-20 times compared to naïve Monte Carlo methods and 10-300P times (where P is the number of processors) over real-world evaluation [286].
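The difficulty that naïve Monte Carlo has with rare events can be illustrated with a short calculation. The sketch below is not taken from [285] or [286]; it only uses the standard result that a binomial proportion estimated from n independent trials has relative standard error sqrt((1 - p) / (n p)). The function name and the 10% target error are assumptions chosen for illustration.

```python
import math


def naive_mc_trials(p: float, rel_err: float = 0.1) -> int:
    """Trials needed for a naive Monte Carlo estimate of probability p
    to reach a relative standard error of at most rel_err.

    The estimator is a binomial proportion, so its relative standard
    error is sqrt((1 - p) / (n * p)); solving for n gives the bound.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.ceil((1.0 - p) / (p * rel_err ** 2))


if __name__ == "__main__":
    # Hypothetical event probabilities, from common to rare.
    for p in (1e-1, 1e-3, 1e-6):
        print(f"p = {p:g}: about {naive_mc_trials(p):,} trials")
```

The required number of trials grows roughly as 1/p, which is why variance-reduction methods such as importance sampling and multilevel splitting become attractive once the event probability drops to the $10^{-3}$ range and below.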
Reference [287] proposes an Enhanced Driver Model (EDM) that predicts the driver action in an urban environment. The effects of Signal Phasing and Timing (SPaT) are considered by presenting the concept of Line-of-Sight (LOS). Signal Phasing and Timing (SPaT) data gives information on signal states by motion [288]. Detailed studies about SPaT and LOS are presented in [288]- [292]. The EDM model is then validated against data collected from equipped vehicles with different drivers. Using the EDM model, a Monte Carlo simulation is used to identify the statistical distribution of fuel consumption and travel time under other states such as traffic conditions, SPaT, and driver behavior. This study evaluates the influence of uncertainties related to real-world driving in fuel consumption in connected vehicles [287].
In short, based on the literature reviews presented in this section, Monte Carlo simulations perform well in evaluating AV in some scenarios and fail to produce outstanding outcomes in other scenarios compared to other methods. Thus, more improvements are required to this method using new techniques such as deep learning and reinforcement learning.

VI. ACCELERATED EVALUATION
Zhao [40] proposed the accelerated evaluation test. The main objective of this test is to establish a method that can speed up the AV evaluation process. Moreover, this method can accurately show AVs' real-life safety benefits. The main idea of the accelerated evaluation approach is to reduce the evaluation time by eliminating the safe parts of daily driving through skewing the statistics of the surrounding vehicles. This process consists of the following steps:
• Collect a massive amount of real-world driving data.
• Extract events that have possible conflicts between AVs and surrounding human-driven vehicles.
• Model the surrounding human-driven vehicle behaviors as the main disturbance to the AV, and model the randomness as a random variable vector x with probability distribution f(x).
• Skew the disturbance statistics to minimize the safe part of daily driving by replacing f(x) with the accelerated distribution f*(x).
• Run Monte Carlo tests with the accelerated probability density function f*(x). The results will provide more intense interactions/collisions between AVs and human-driven vehicles.
• "Skew back" the outcomes of the accelerated tests, using statistical analysis, to understand the performance of AVs under real-life driving scenarios.
Figure 8 shows the accelerated evaluation procedure [40].
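The skew and "skew back" steps listed above can be sketched on a toy problem. The example below is an illustration only and is not the model of [40]: it assumes that the disturbance x follows a standard normal f(x), that the "crash" event is x > 4, and that the accelerated distribution f*(x) is a normal shifted to the threshold. Each sample that triggers the event is weighted by the likelihood ratio f(x)/f*(x), which is the importance-sampling form of skewing back. All names and parameter values are hypothetical.

```python
import math
import random


def tail_prob_exact(threshold: float) -> float:
    """Exact P(X > threshold) for X ~ N(0, 1), used only as a reference."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


def accelerated_estimate(threshold: float, shift: float, n: int,
                         rng: random.Random) -> float:
    """Importance-sampling estimate of P(X > threshold), X ~ N(0, 1).

    Samples come from the accelerated density f*(x) = N(shift, 1).
    Each hit is "skewed back" with the likelihood ratio
    f(x) / f*(x) = exp(-shift * x + shift**2 / 2).
    """
    total = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)          # draw from f*(x)
        if x > threshold:                  # safety-critical event occurred
            total += math.exp(-shift * x + 0.5 * shift * shift)
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    print("accelerated estimate:", accelerated_estimate(4.0, 4.0, 20_000, rng))
    print("exact value         :", tail_prob_exact(4.0))
```

With the same number of samples, drawing directly from f(x) would on average produce fewer than one event, so the naturalistic estimate would be dominated by noise; the shifted distribution produces events in about half of the samples.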
The suggested method can be used in computer simulations, human-in-the-loop tests with driving simulators, hardware-in-the-loop tests, or vehicle tests. Four techniques were established in this approach to form the foundation of the accelerated evaluation idea. The first technique depends on a likelihood analysis of naturalistic driving. A probabilistic model based on time-series driving data was used to build the test scenarios. The assessment is sped up by reducing the relatively safe events that are highly likely to occur. The second technique gives a mathematical basis for the "skewing back" mechanism based on importance sampling theory, such that the statistical equivalence between the accelerated tests and the naturalistic driving tests can be rigorously demonstrated. The third technique is adaptive accelerated evaluation. This technique provides a policy to recursively discover the best way to skew the probability density functions of human-driven vehicles so as to maximally decrease the evaluation duration. Finally, the accelerated evaluation method for analyzing the dynamic interactions between AVs and human-driven vehicles was established based on stochastic optimization procedures.
In [59], three indicators (crash, injury, and conflict rates) are calculated to test the accuracy and performance of the accelerated evaluation. The crash and conflict cases are binary events, and the injury event is modeled as a probabilistic function [59]. Two types of simulations are performed: accelerated evaluation and naturalistic driving simulations (non-accelerated, based on Monte Carlo). Table 9 shows the accelerated rates of crash, injury, and conflict events in car-following scenarios [59]. The $N_{nature}$, $N_{acc}$, and $r_{acc}$ in Table 9 represent the number of naturalistic driving simulations, the number of accelerated tests, and the accelerated rate, respectively. In the crash and injury events, the accelerated approach expedites the evaluation by five orders of magnitude [59]. In the conflict event, the accelerated rate is lower, at about 300 times [59]. Table 10 shows the accelerated rates of the crash, injury, and conflict events in lane change scenarios [57]. The $D_{nature}$, $D_{acc}$, and $r_{acc}$ in Table 10 represent the driving distance needed in the naturalistic test, the driving distance required in the accelerated test, and the accelerated rate, respectively.
The simulation outcomes in car-following and lane change scenarios in [59] and [57] revealed that the accelerated tests could decrease the assessment time of the collision, injury, or conflict events by 300 to 100,000 times. In other words, driving for 1,000 miles can expose the AVs to challenging scenarios that would take 300 thousand to 100 million miles to encounter in the real world [57], [59]. As a result, the development and validation time for AVs will be reduced. Table 11 presents a list of all the papers related to this method, with the objectives, techniques and models, and scenarios.
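The mileage figures quoted above follow directly from the accelerated rate. As a minimal worked example, assuming that the accelerated rate acts as a simple multiplier on the test distance (the function name is hypothetical):

```python
def equivalent_naturalistic_miles(accelerated_miles: float,
                                  accelerated_rate: float) -> float:
    """Naturalistic mileage represented by an accelerated test,
    assuming the accelerated rate is a plain multiplier on distance."""
    return accelerated_miles * accelerated_rate


if __name__ == "__main__":
    # Lower and upper accelerated rates reported in the text.
    for rate in (300, 100_000):
        miles = equivalent_naturalistic_miles(1_000, rate)
        print(f"rate {rate:>7,}: 1,000 accelerated miles ~ {miles:,} naturalistic miles")
```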
Even though the accelerated evaluation methods can produce excellent results and reduce the duration of the evaluation, they have the following limitations:
• The AV-to-AV and human-driven-vehicle-to-human-driven-vehicle interactions are not studied and are used only as a benchmark.
• The AV sensors and controls are assumed to work accurately. Thus, measurement, perception, and control errors are not considered.
• AVs are assumed to look like human-driven vehicles. Therefore, human drivers' reactions to AVs are assumed to be the same as to other human-driven vehicles.
• The secondary impacts of crashes are not considered in these methods.
• The human-driven vehicle model is not accurate enough to mimic real-world scenarios.
• Many real-world scenarios are not investigated and developed in this technique, such as sensing/detection failure scenarios (e.g., fog, snow, low light), perception failure scenarios (e.g., hand gestures, eye contact, blinking lights), vehicle/pedestrian/pedal-cyclist conflict scenarios (e.g., running a red light, cut-in, jaywalking), decision-making scenarios (e.g., low confidence, multiple threats), and so on.
Therefore, more improvements to this method are required to obtain a well-rounded and efficient technique to evaluate AVs.

VII. SIMULATION-BASED MODEL APPROACH
The goal of autonomous driving is mainly to decrease the number of deadly accidents in a highly uncertain environment as well as to provide a high quality of comfort and efficiency and create unprecedented intelligent transportation for individuals within cities. In the interest of getting the AV to navigate safely and dependably in uncertain environments, many challenges need to be considered. Modeling AVs is one of these challenges and is regarded as an essential step toward accurately validating AVs in highly uncertain environments. Then, the interaction between AVs and the surrounding vehicles or vulnerable road users should be investigated and validated in various real-world scenarios. A well-established validation approach is required to fill many gaps in the AVs evaluation process during the design, predeployment, and deployment stages. However, based on the previous AV validation techniques, the real-world data are limited, and many safety-critical scenarios are hard to capture in real life. Therefore, the simulation-based model approach is introduced to tackle these challenges.
In 1934, Greenshields introduced the first traffic model [307]. Since 1934, three major model classes have been presented: microscopic, macroscopic, and mesoscopic models, depending upon the level of detail needed for network analysis [305]. The microscopic models study the behavior and interaction of individual vehicles based on car-following, lane-changing, and gap-acceptance algorithms [305] [306]. The microscopic models are used to model sophisticated urban street networks, intersections, vulnerable road user (VRU) movements, traffic lights, multi-modal systems, and many more. The macroscopic models represent traffic as a continuous flow and are based on the relationships between the flow, speed, and density of the traffic stream [306]. The macroscopic models focus on modeling large-scale traffic networks such as freeways, corridors, surface-street grid networks, and rural highways on a section-by-section basis instead of following individual vehicles [306]. Moreover, the macro simulation-based model approach requires a traffic assignment policy, which can be implemented by utilizing activity-based models [70]- [73] or modified traditional four-step models [74]. Finally, the mesoscopic models integrate the properties of micro and macroscopic simulation models and allow less fidelity than the micro models for individual vehicles [306]. Examples of mesoscopic simulation studies can be found in [310]- [312].
The focus of this section is mainly on the connected and AVs' micro simulation-based model studies that consider longitudinal and lateral dynamics. The micro-simulation-based model produces valuable data for AVs' future development based on the level of detail in the model. Moreover, the interactions of AVs with human-driven vehicles and vulnerable road users (VRU) are presented and discussed. In addition, the agent-based simulation modeling of AVs is presented.

Table 11. Accelerated evaluation studies: objectives, techniques and models, and scenarios. Cells marked "?" could not be read from the source.

| Reference | Objective | Techniques and models | Scenarios |
| ? | ? | Likelihood analysis, importance sampling techniques, adaptive accelerated evaluation, multistep stochastic sampling. | Lane change scenarios and car-following scenarios. |
| [57], [58] | Accelerated evaluation of AV safety. | Importance sampling techniques. | Lane change scenarios. The main crash type is the frontal collision due to unsafe cut-ins. |
| [59], [60] | Accelerated evaluation of AV safety. | Extracted naturalistic driving data are used to build the statistics of the motion of the primary other vehicles (POV). | Car-following scenarios. |
| [61] | Accelerated evaluation of AV safety. | Piecewise mixture models. | Lane-change scenarios. |
| [62] | Accelerated evaluation of AV safety. | Piecewise mixture distribution models. | Lane-change scenarios. |
| [63] | An accelerated testing approach for AVs. | Joint statistical models, accelerated distributions for Gaussian mixture models using importance sampling techniques. | ? |
| [64] | Accelerated evaluation and validation of complicated control systems within AVs. | Kernel methods. | ? |
| ? | ? | Learning-based approach. | Unknown rare-event sets. |
| ? | ? | Statistical learning models. | Rare-event sets. |
| [67] | Evaluation uncertainty in data-driven self-driving testing. | Classical bootstrap method with likelihood ratio-based scheme. | Lane change test scenario. |
| [68] | On-track testing for AVs. | Kriging-based statistical approach. | Lane change scenario. |
| [69] | An accurate, affordable, and safe way to evaluate a design of an AV. | Co-Kriging-based statistical model. | Lane change test scenario. |
| [302] | A rare-event simulation for neural network and random forest predictors is presented. | Importance sampling. Sequential mixed integer programming. Random forest. Neural network. | Rare-event sets. |
| [303] | Present a Deep Probabilistic Accelerated Evaluation (Deep-PrAE) to estimate rare-event probabilities in safety-critical applications. | ? | ? |

A. MICRO-SIMULATION-BASED MODELS
In microscopic simulation models, the AV's behaviors can be modeled by adopting the available human drivers' models or by inventing new intelligent models that consider V2X capabilities [110,313]. Moreover, the car-following model is required in collaboration with the driver models or any new innovative model for V2X communications to represent how the simulated vehicles interact with the vehicle ahead [110]. Therefore, the car-following model is an essential part of modeling the behavior of human-driven vehicles (HVs), connected autonomous vehicles (CAVs), and AVs in microsimulation modeling [110,313]. A car-following model can be designed based on these assumptions by keeping a safe distance between the lead and host vehicles and controlling the vehicles' speed and accelerations [135].
This section presents all the available car-following models for CAVs and AVs in micro-simulation modeling in lateral and longitudinal directions. Then, the AV simulation platforms are taken into the spotlight. Finally, the agent-based modeling studies are covered. Car-following, lane change, and distance headway are the three significant behaviors of any vehicle in micro-simulation modeling. Figure 9 shows an illustration of these major behaviors. In addition, the interaction between vehicles on the road is determined by many factors such as lateral vehicle maneuvers, driver behavior, and the behavior of surrounding vehicles [317]. This section discusses the longitudinal and lateral dynamics for human-driven vehicles (HVs), CAVs, and AVs. In the longitudinal scenario, the following vehicle follows the lead vehicle with a proper distance headway, speed, and intended acceleration or deceleration. In the lateral scenario, a lane-change maneuver is performed, as shown in Figure 9. This section covers all the significant contributions in microscopic analysis of traffic flow and safety evaluation and how diverse traffic flow modeling has been presented and developed from homogeneous microscopic modeling to mimic the real-world environment. In micro-simulation modeling, all vehicle parameters such as the maximum, minimum, and intended acceleration or deceleration and the desired speed are defined using statistical distributions and functions [315]. In 1945, researchers introduced vehicle trajectories for car-following modeling instead of using speeds and the distance between the two vehicles [314]. They proposed a safe driving distance between the lead and following vehicles that the driver of the following vehicle must maintain. The car-following models can then be applied after determining the safe gap between the lead and the following vehicle in every scenario.
The popular car-following models are GHR models, safe distance models, the Intelligent Driver Model (IDM), ACC/CACC models, optimal velocity models, psychophysical models, fuzzy logic models, and cellular automata (CA) models. The car-following models are classified into types or categories depending on the logic they use [316]. For example, in [318], the authors presented five different classes, while other researchers, such as [319], suggested three more types to include the 21st-century models. Figure 10 illustrates the available car-following models.
The following subsections describe each type of car-following model.

GHR Model
In 1958, Chandler introduced the first version of the GHR model, in which the relative velocity between two vehicles is known as the stimulus [324]. In 1961, General Motors (GM) presented a car-following model using the acceleration/deceleration values as a stimulus [328,329]. The GM model used the speed of the leader and follower vehicles to estimate the acceleration/deceleration values. Then, the estimated acceleration/deceleration values are used to calculate the driver's reaction time. The GM model is a simple linear car-following model with a constant sensitivity parameter, and the acceleration of the following vehicle can be approximated [324,330]. In this model, the gap between the leader and follower vehicles affects the stimulus, making the GM linear model impractical [315]. Moreover, this model does not consider the driver's acceleration and deceleration reactions in lane-change maneuvers [325]. Furthermore, the diversity of vehicles on the road is neglected in the reaction time calculation [326,327]. Therefore, the GM and GHR models fail to address the diverse conditions of drivers and vehicles [321]. Another restriction of these models is the lack of acceleration/deceleration limits [320]. Recently, many researchers have suggested a wide range of solutions to expand the original models and overcome some of the GM model's limitations. For example, in [322], the following driver is permitted to accelerate if the relative speed of the lead vehicle increases. Moreover, the authors also introduced various acceleration and deceleration parameters to help the original models make efficient decisions. The authors in [323] also proposed an extension in which the driver reacts faster in deceleration scenarios than in acceleration cases. Furthermore, the authors in [331] suggested a critical headway value to estimate the state of driving behavior. Finally, another extension to the GM model is presented in [329] to consider the non-linear behavior in terms of relative speed and distance between the lead and following vehicles. Despite all the extensions, the GM model retains behavioral limitations: drivers react to random changes in the stimulus, and the actions of the leader vehicle keep affecting the relative speed and the driver of the following vehicle even when the gap between these vehicles is large [315]. Table 12 shows an outline of the GHR model studies.
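The stimulus-response structure described above can be sketched as follows. The functional form used here is the one commonly quoted for the GHR family, in which the follower's acceleration equals a sensitivity constant c times the follower speed raised to a power m, times the relative speed, divided by the gap raised to a power l. The reaction-time delay is omitted for brevity, and the parameter values are illustrative assumptions rather than calibrated values from the cited studies.

```python
def ghr_acceleration(v_follower: float, dv: float, gap: float,
                     c: float = 0.5, m: float = 0.0, l: float = 0.0) -> float:
    """Follower acceleration in a GHR-type stimulus-response model.

    v_follower : follower speed (m/s)
    dv         : leader speed minus follower speed (m/s), the stimulus
    gap        : distance to the leader (m), must be positive
    c, m, l    : sensitivity constant and exponents (illustrative values);
                 m = l = 0 gives the linear model with constant sensitivity.
    """
    if gap <= 0.0:
        raise ValueError("gap must be positive")
    return c * (v_follower ** m) * dv / (gap ** l)


if __name__ == "__main__":
    # Linear case: the response depends on the relative speed only.
    print(ghr_acceleration(20.0, 2.0, 30.0))
    # Non-linear case: the response weakens as the gap grows.
    print(ghr_acceleration(20.0, 2.0, 30.0, m=1.0, l=2.0))
    print(ghr_acceleration(20.0, 2.0, 90.0, m=1.0, l=2.0))
```

With non-zero exponents the sensitivity falls with distance but never reaches zero, which is consistent with the limitation noted above that the leader keeps influencing the follower even at large gaps.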

Safety-Distance or Collision Avoidance Model
The safety-distance model keeps a safe distance between the lead and follower vehicles. This model is based on the fundamental equations of motion [318]. In 1953, Pipes [112] explained the notion of a safety distance as "a good rule for following another vehicle at a safe distance is to allow yourself at least the length of a car (about 15 ft) between you and the vehicle ahead for every 10 miles per hour of speed at which you are traveling". In 1981, Gipps presented the first acceleration model that covers both car-following and non-car-following maneuvers [339]. However, a limitation of the Gipps model is that it requires keeping the safe-distance headway and not exceeding the desired speed. As a result, many researchers have performed extensive extension, modification, and calibration studies of the Gipps model [340]-[342]. Table 13 shows a list of these studies. Nowadays, the Gipps model is widely used in micro-simulation modeling because of its basic calibration assumptions about human driving behavior [315].
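The rule of thumb quoted from Pipes can be written as a short calculation. The sketch below assumes that the rule scales continuously with speed (rather than in 10 mph steps) and uses the 15 ft car length given in the quotation; the function names are hypothetical.

```python
CAR_LENGTH_FT = 15.0               # car length quoted in the rule
MPH_TO_FT_PER_S = 5280.0 / 3600.0  # 1 mph expressed in ft/s


def pipes_min_gap_ft(speed_mph: float,
                     car_length_ft: float = CAR_LENGTH_FT) -> float:
    """Minimum gap (ft): one car length for every 10 mph of speed."""
    if speed_mph < 0.0:
        raise ValueError("speed must be non-negative")
    return car_length_ft * speed_mph / 10.0


def implied_time_gap_s(speed_mph: float) -> float:
    """Time gap (s) that corresponds to the minimum distance gap."""
    if speed_mph <= 0.0:
        raise ValueError("speed must be positive")
    return pipes_min_gap_ft(speed_mph) / (speed_mph * MPH_TO_FT_PER_S)


if __name__ == "__main__":
    for mph in (20.0, 40.0, 60.0):
        print(mph, "mph:", pipes_min_gap_ft(mph), "ft,",
              round(implied_time_gap_s(mph), 3), "s")
```

Because the gap grows linearly with speed, the rule corresponds to a constant time gap of roughly one second at any speed.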

Psychophysical Models
The psychophysical or action point model [318] uses the space headway and relative velocity of the following vehicle as thresholds. Drivers react only when the threshold values of spacing or relative velocity are reached. The psychophysical model also helps capture whether drivers pay attention at small spacings and the associated effect of large spacings on following behavior [330]. In [343], the psychophysical model is implemented in a simulation platform using a framework called "MISSION." Moreover, the interaction between the lead and follower vehicles is investigated by defining four thresholds and regimes, as shown in Figure 11. The drivers' behavior in the psychophysical model is assumed to be normally distributed [343]. That means that each driver has unique driving skills for perception, reaction, and prediction of the surrounding environment. The authors in [343] also proposed that vehicles differ in their basic capabilities, such as maximum velocity and maximum acceleration/deceleration values. The Wiedemann model in [343] presented various ranges of other random parameters to be used to calculate threshold values and driving functions. Examples of these parameters are the desired distance (AX), the desired minimum following distance (ABX), the maximum following distance (SDX), the perception threshold (SDV), and the decreasing and increasing speed differences (CLDV, OPDV) [315]. More improvements to this model are required because the calibration of its parameters is challenging [315,321]. Examples of simulators using psychophysical models include VISSIM and PARAMICS. The VISSIM platform is widely used to model heterogeneous traffic conditions. However, using VISSIM to simulate 2D traffic conditions is not recommended because it produces inefficient outcomes [321].

Optimal Velocity Model (OVM)
The optimal velocity model (OVM) is proposed to model the traffic flow instabilities that cause congestion in public roads. The OVM model is developed to describe the dynamical behavior of traffic flow using the motion equation of each vehicle. The model is based on the relative distance to the lead vehicle. Moreover, the acceleration of the following vehicle is controlled in a way that the final velocity is modified according to the trajectory of the lead vehicle [321]. Several modifications and extensions are proposed to overcome some of the limitations of the OVM approach for modeling the mixed traffic conditions. The significant improvement to the OVM approach is the two-velocity difference model (TVDM) [345]. The TVDM approach is developed to integrate the intelligent transport system (ITS) with the OVM approach. The integration of the two models provides a comprehensive car-following model that incorporates multiple leading vehicles [321]. Table 14 shows a list of all the OVM modification and extension studies.
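The relaxation mechanism of the OVM can be sketched in a few lines. The example below assumes the dimensionless optimal-velocity function commonly attributed to Bando et al., V(h) = tanh(h - 2) + tanh(2), and a single follower behind a leader that drives at constant speed; the sensitivity, time step, and initial conditions are illustrative assumptions, and the integration is a plain explicit Euler scheme.

```python
import math


def optimal_velocity(headway: float) -> float:
    """Dimensionless optimal-velocity function V(h) = tanh(h - 2) + tanh(2)."""
    return math.tanh(headway - 2.0) + math.tanh(2.0)


def simulate_follower(headway: float, speed: float, lead_speed: float,
                      sensitivity: float = 2.0, dt: float = 0.01,
                      steps: int = 5000) -> tuple[float, float]:
    """Integrate dv/dt = sensitivity * (V(headway) - v) with explicit Euler.

    The leader keeps a constant speed, so the headway changes at the
    rate (lead_speed - speed). Returns the final (headway, speed).
    """
    for _ in range(steps):
        accel = sensitivity * (optimal_velocity(headway) - speed)
        headway += (lead_speed - speed) * dt
        speed += accel * dt
    return headway, speed


if __name__ == "__main__":
    lead = optimal_velocity(3.0)   # leader speed whose equilibrium headway is 3
    print(simulate_follower(headway=5.0, speed=0.0, lead_speed=lead))
```

The follower settles at the headway whose optimal velocity equals the leader's speed, which is the sense in which the final velocity is adjusted according to the lead vehicle's trajectory.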

Intelligent Driver Model (IDM), Adaptive Cruise Control (ACC), and Cooperative Adaptive Cruise Control (CACC) Models
These days, with the advancement of the intelligent driving assistance system (IDAS), vehicles have become more intelligent and are expected to perform many driving tasks. The cruise control (CC) system is an early step in connected AVs' intelligent driving assistance system. Further improvements to the CC system have led to more advanced methods such as the adaptive cruise control (ACC) and cooperative adaptive cruise control (CACC) systems. Intelligent driving assistance systems such as CC, ACC, and CACC are essential for supporting acceleration control for longitudinal motions based on the gap distance and the speed difference between the lead and host vehicles. Therefore, many researchers utilize the simulation-based model approach to evaluate and study the impacts of connected and AVs. Furthermore, the simulation-based model analysis provides the flexibility to build safety-critical scenarios and validate the AVs during their development to avoid mistakes before public road deployment. Many of the micro-simulation papers in this section established their own ACC, CACC, AV, or CAV car-following models. Furthermore, each article has implemented a unique method and produced distinct performance indicators. Figure 12 presents the four [...]
The ACC system extends the existing CC system to incorporate a headway sensor that observes the distance between the host vehicle and the vehicle in front of it. The essential function of the CC system is to keep a constant vehicle speed set by the driver. In contrast, the ACC system's principal purpose is to control the vehicle's acceleration based on the distance gap and the speed difference between the lead and host vehicles. Moreover, the ACC system can accelerate or decelerate based on the lead vehicle's speed changes. Figure 13 shows the ACC scenario. Furthermore, communication capabilities are added to the ACC system. The modified ACC system with V2V and V2I communications is called the CACC system. The CACC system shares the acceleration, deceleration, braking capability, and vehicle positions using V2V and V2I communications [75], [102]. The communication capabilities of the CACC allow a shorter headway time compared to the ACC. Figure 14 shows the CACC scenario setup. The ultimate goal of the intelligent driving assistance system (IDAS) is to control vehicles fully. Connected AVs have all the AV functions along with V2V and V2X communications. Figure 15 shows the CAV and AV scenarios. One significant difference between CACC and CAV is the automated lateral movement. The standard car-following motions established for human-driven vehicles are outdated compared with the CC, ACC, CACC, and CAV. Thus, the terms and approaches used across the literature differ slightly and are not rigorously classified. Table 15 presents the simulation-based modeling studies of ACC and CACC and their validation analysis. This table provides the objective of each study, the base model, modeling and validation scenarios, vehicle types, assessment basis, and primary outcomes.

Table 14. OVM modification and extension studies. Cells marked "?" could not be read from the source.

| Reference | Objective | Vehicle type | Year | Parameters | Method | Main outcomes |
| ? | ? | ? | ? | ? | Numerical simulation | Estimating the evolution of traffic congestion. Lane width effects on the car-following model can lower critical headway. The lateral separation effects greatly enhance the realism of car-following models. |
| [346] | Evaluate the vehicle type-dependent car-following heterogeneity from micro- and macro-aspects by using Next Generation Simulation (NGSIM) trajectory data. | ? | ? | ? | Numerical simulation | During the close car-following process with the same speed, the driver generally keeps a larger gap distance with the leader when following or driving a larger vehicle. |
| [347] | Develop a new staggered car-following model taking into consideration lateral separation effects. | Cars | 2012 | Time-to-collision, visual angle variables, lateral separation distance. | Numerical simulation | Incorporating lateral separation effects into the car-following model leads to the suppression of traffic jams and greatly enhances the realism of the model. |
| ? | ? | Cars | 2015 | Lateral gaps, two-sided lateral separation. | Numerical simulation | The two-sided lateral gaps car-following model has a larger stable region compared to a one-sided lateral gap car-following model. |
| ? | ? | Cars | 2016 | Lateral headway, escape corridor. | Numerical simulation | Considering the lateral separation and overtaking expectation in the model can better simulate car-following behavior, especially in some complex driving conditions. |
According to the literature, the IDM and MIXIC models are widely used as benchmark car-following models. Some modifications to the IDM [106,78,79,95] and MIXIC [94,116,133] models have been made to fully capture AVs' longitudinal motion. The car-following model is an essential model that is used widely in simulating AVs. Research efforts have been conducted to establish AV car-following models by improving the traditional car-following models (IDM and MIXIC). The intelligent driver model (IDM) [134] is one of the simple and safe models that produce practical outcomes [135]. The main goal in developing the IDM is to tackle the modeling of mixed traffic conditions. The initial development of the IDM was done by Treiber et al. [134] for a single-lane scenario. Furthermore, the acceleration is described as a function of the gap, velocity, and velocity difference between the lead and following vehicles [321]. The maximum acceleration and minimum headway are considered to maintain the minimum gap and reach the required velocity. Additional extensions to the model are required to cover multilane traffic modeling and consider potential risk elements [135]. The IDM and linear approaches do not support the modeling and validation of 2D traffic scenarios [321]. The modified IDM can be used as the ACC model or as a human-driven vehicle model. Furthermore, the improved version of the IDM can be utilized to simulate connected autonomous vehicles (CAVs) [95].
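As a concrete illustration of how the IDM maps the gap, velocity, and velocity difference to an acceleration, the sketch below uses the commonly published form of the model: a free-road term (v / v0)^delta and an interaction term (s* / s)^2, where the desired gap s* combines a minimum gap, a time headway, and a braking term. The parameter values are illustrative assumptions and are not calibrated values from the studies cited above.

```python
import math


def idm_acceleration(v: float, v_lead: float, gap: float,
                     v0: float = 33.3, T: float = 1.6, a_max: float = 0.73,
                     b: float = 1.67, s0: float = 2.0,
                     delta: float = 4.0) -> float:
    """Acceleration (m/s^2) of the following vehicle in the IDM.

    v, v_lead : follower and leader speeds (m/s)
    gap       : bumper-to-bumper distance to the leader (m), positive
    v0        : desired speed (m/s);   T  : desired time headway (s)
    a_max     : maximum acceleration;  b  : comfortable deceleration (m/s^2)
    s0        : minimum gap (m);       delta : acceleration exponent
    """
    if gap <= 0.0:
        raise ValueError("gap must be positive")
    dv = v - v_lead                                   # approach rate
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


if __name__ == "__main__":
    print(idm_acceleration(0.0, 0.0, 1e6))    # standing start on an empty road
    print(idm_acceleration(20.0, 10.0, 10.0))  # closing fast on a short gap
```

On an empty road the model accelerates at close to the maximum acceleration, while a fast approach at a short gap produces strong braking; between these extremes there is an equilibrium gap at which the acceleration vanishes for equal speeds.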

To fully understand and estimate the impact of AVs using a simulation-based model, the model should be able to examine the performance of the AV under highly uncertain conditions. Moreover, the model should evaluate AV safety, fuel consumption and emissions, noise emission, and traffic performance. The MICroscopic model for simulation of intelligent cruise control (MIXIC) was developed as a stochastic simulation model to address these challenges. The MIXIC is widely used for cooperative AV simulations because this model uses V2V communication and can optimize traffic capacity under real-world conditions. The MIXIC allows interaction between the lead and host vehicles to share the actual speed, acceleration, maximum potential braking, and warnings. In [94], the CAV is established based on the MIXIC model. This technique utilizes a smart-micro automotive radar (UMRR-00 Type 30) (90 m ± 2.5% detection range and ±35° horizontal FOV) as an input for the MIXIC model. The sensor update rate is 50 ms, and the sensor can track up to 64 objects. The AV speed should be low enough to allow a complete stop when a lead vehicle that has come to a full stop is detected. Using the maximum deceleration of the AV (host) and lead vehicle, the AV's maximum safe speed and acceleration and the minimum safe distance can be calculated. In [89], the authors presented a hardware-in-the-loop (HIL) testing system for CAV applications. The results showed the effectiveness of the CACC in absorbing certain disturbances and speed oscillations. Moreover, speed oscillation decreases as the vehicle position in the string increases. In addition, perfect communication/radar contributes to string stability.

Cellular Automata Model
In 1992, the cellular automata (CA) model was presented by Nagel and Schreckenberg [351]. The road segment in this model is divided into cells of the same size, each approximately 7.5 m long [351]. Each cell is either occupied by a single vehicle or empty. The longitudinal dynamics of the vehicles are integrated into the CA model by including the acceleration, braking time, and randomization of vehicle types [321]. The CA model was then extended to include two-lane traffic conditions [352]. As a result, large-scale traffic dynamic modeling is achieved easily using this model. A limitation of this model is the loss of information due to the discretization into cells [321]. Because all cells have the same size, vehicles are required to update their parameters, such as velocity and acceleration/deceleration, in multiple cells. Another limitation is the difficulty of representing all vehicle types within cells. The cell size is an essential factor in the CA model: not all vehicle types can be represented if a large cell is used, whereas the computational workload might increase when a small cell is used. Furthermore, this model cannot capture the variation of the headway distance between the lead and following vehicles with vehicle velocity because of the uniform cell size assumption. Therefore, a wide range of modifications and extensions to the CA model have been made by incorporating many parameters, such as other vehicle types, vehicle size, mechanical properties, lateral arrangement, lateral gaps between vehicles, flow, velocity, occupancy of a vehicle in a cell, cell size, and acceleration and vehicle type [353][354][355][356][357][358][359][360][361][362]. As a result, the improved CA model uses a cell size of 0.5 m in length, a safe gap at the front and back of vehicles, the relationship area of occupancy, interaction rate, and structure of vehicles [361,362].
However, according to outcomes presented in [353][354][355][356][357][358][359][360][361][362], further validation and investigation of the model in various traffic conditions and lateral and longitudinal interactions is required to expand its application.
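The update rule of the original single-lane CA model can be sketched in a few lines. The version below assumes a ring road, a time step of one second, and a maximum speed of five cells per step (about 135 km/h with 7.5 m cells); these values, the slowdown probability, and the function name are illustrative assumptions rather than settings reported in the surveyed papers.

```python
import random


def nasch_step(positions, speeds, road_length, v_max=5, p_slow=0.3, rng=random):
    """Advance all vehicles by one parallel update on a ring of `road_length` cells.

    positions : distinct cell indices, one per vehicle
    speeds    : current speeds in cells per time step
    rng       : object with a random() method (the random module or random.Random)
    """
    n = len(positions)
    order = sorted(range(n), key=lambda i: positions[i])
    new_positions, new_speeds = list(positions), list(speeds)
    for k, i in enumerate(order):
        leader = order[(k + 1) % n]
        # Number of empty cells between this vehicle and the one ahead of it.
        gap = (positions[leader] - positions[i] - 1) % road_length
        v = min(speeds[i] + 1, v_max)         # 1) accelerate
        v = min(v, gap)                       # 2) brake to avoid the leader
        if v > 0 and rng.random() < p_slow:   # 3) random slowdown
            v -= 1
        new_speeds[i] = v
        new_positions[i] = (positions[i] + v) % road_length  # 4) move
    return new_positions, new_speeds
```

Because every vehicle moves at most as many cells as are free in front of it, two vehicles never occupy the same cell. The same property illustrates the discretization limitation noted above: speeds and gaps can only change in whole cells.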

Fuzzy Logic Model
In 1992, Kikuchi et al. incorporated the relative headway distribution, velocity, and acceleration into the fuzzy model [363]. The fuzzy logic model was developed further over time to cover car-following behavior [364][365][366][367]. The generated outcomes show several issues due to the inadequate representation of drivers' perception. Moreover, this model does not include mixed traffic flow in car-following behavior [321]. An advantage of the fuzzy logic model in car-following maneuvers is its ability to determine the lane-shifting behavior of the vehicles [368][369][370].
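As a rough illustration of how a fuzzy car-following controller maps perceived quantities to an acceleration, the sketch below uses two inputs (time headway and relative speed), linear membership functions, a six-rule base with min as the AND operator, and weighted-average defuzzification. The rule base, the membership breakpoints, and the output values are hypothetical and do not reproduce the model of [363].

```python
def ramp_up(x, lo, hi):
    """Membership rising linearly from 0 at lo to 1 at hi."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def fuzzy_acceleration(time_headway, rel_speed):
    """Return a follower acceleration (m/s^2).

    time_headway : gap divided by follower speed (s)
    rel_speed    : leader speed minus follower speed (m/s)
    """
    # Fuzzify the inputs (breakpoints are illustrative assumptions).
    long_gap = ramp_up(time_headway, 1.0, 2.5)
    short_gap = 1.0 - long_gap
    opening = ramp_up(rel_speed, 0.0, 3.0)
    closing = ramp_up(-rel_speed, 0.0, 3.0)
    steady = 1.0 - opening - closing
    # Rule base: (firing strength, consequent acceleration).
    rules = [
        (min(short_gap, closing), -3.0),
        (min(short_gap, steady), -1.0),
        (min(short_gap, opening), 0.0),
        (min(long_gap, closing), -0.5),
        (min(long_gap, steady), 0.5),
        (min(long_gap, opening), 1.5),
    ]
    # Weighted-average defuzzification; the total firing strength is always positive
    # because the headway sets and the relative-speed sets each sum to one.
    total = sum(weight for weight, _ in rules)
    return sum(weight * accel for weight, accel in rules) / total
```

With a short headway and a rapidly closing leader, only the first rule fires and the output is the strongest braking value; intermediate inputs blend neighboring rules smoothly.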

2) Other micro-simulation studies
Lane-changing models are essential elements for modeling HVs, CAVs, and AVs in traffic micro-simulation tools. In 1978, Sparmann proposed a lane-change model that classifies lane-change behavior as slower-to-faster or faster-to-slower lane changes based on driver need [321]. In 1986, Gipps presented the well-known lane-change model for urban driving, which considers the effects of elements such as traffic signals, obstructions, and heavy vehicles in the traffic flow [371]. The main focus of the Gipps model is the critical interaction between vehicles, between vehicles and obstructions, and other real-world driving behaviors [371]. More studies on lane-change modeling in micro-simulation can be found in [372][373].
The intersection is one of the most challenging environments for AVs because of the unpredictable interactions among pedestrians, bicycles, and vehicles, the complicated design of intersections, and the complex behavior of their users. As a result, many studies address intersection scenarios. For example, in [123], a turning vehicle is modeled, and its surrogate safety indicators are investigated at mixed-flow intersections. The authors in [124] validated automated intersection traffic management applications using a vehicle-in-the-loop (VIL) verification environment. In [127], Mohd et al. evaluated CAV and AV applications, fuel consumption, and emissions using a hardware-in-the-loop testbed. The outcomes showed fast data transfer every 200 ms and precise tracking of the optimized engine operating points and the desired vehicle speed. In [126], Shao et al. assessed CAVs and AVs using a hardware-in-the-loop testbed and a living lab, focusing on fuel consumption and emissions. The outcomes revealed that the error between the virtual vehicle and the actual test vehicle is 1%, which supports the use of the HIL testbed to evaluate CAVs in real-world scenarios. Furthermore, in [128], Li et al. presented an advanced intersection control system to support CAV and AV trajectories and validate their safety and performance at intersections. Table 16 shows a list of studies related to the modeling and validation of AVs in different scenarios.
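Surrogate safety indicators quantify how close an interaction came to a collision without requiring a crash to occur. One widely used indicator is the time-to-collision (TTC); the sketch below computes it for a simple following situation. Whether [123] uses this particular indicator is not stated here, and the 1.5 s conflict threshold is an assumed, commonly quoted value.

```python
import math


def time_to_collision(gap, v_follower, v_leader):
    """Time (s) until the follower reaches the leader if both keep their speeds.

    gap is the current distance between the vehicles (m); speeds are in m/s.
    Returns math.inf when the follower is not closing in on the leader.
    """
    closing_speed = v_follower - v_leader
    if closing_speed <= 0:
        return math.inf
    return gap / closing_speed


def is_conflict(gap, v_follower, v_leader, threshold=1.5):
    """Flag an interaction as a conflict when the TTC falls below the threshold (s)."""
    return time_to_collision(gap, v_follower, v_leader) < threshold
```

Counting the simulated interactions for which `is_conflict` is true gives a simple conflict frequency that can be compared across scenarios or penetration rates.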
With the advancement toward fully autonomous vehicles, human drivers will no longer control the vehicle or interpret the surrounding environment. Therefore, AVs should have a social understanding of the interaction between their control systems and road users to ensure a safe driving environment [137]. Interaction in driving involves many tasks, such as identification, behavior analysis, and prediction of future actions, as well as taking the right actions to avoid severe collisions. Behavioral psychology studies have investigated the social aspects of driving and have shown which factors can significantly impact road users' decisions [138]-[140]. These factors are pedestrian demographics [140], road conditions [139], social factors [139], and traffic characteristics [142]. Thus, a deep understanding of pedestrian crossing behavior, the extent of these factors, and how they are connected is required.
In the case of autonomous driving, intent prediction algorithms have been established to estimate the next moves of pedestrians [143] and drivers [144]. A wide range of technologies has been developed to assist AVs in communicating with road users, such as V2V [145] and V2P [146] communications. Moreover, visual intent interfaces such as LED lights [147] or projectors [148] are used. The problem with all of these studies is that they treat the technology as a rigid, active element rather than as part of a social interaction [149]. Pedestrian behavior studies are classified into two categories: classical studies and AV conflict studies. The classical studies focus on pedestrian behavior when interacting with human drivers. A wide range of data-collection methods is used in classical pedestrian behavior studies, such as observation, police reports, video recording, photography, simulation, scripted observation, questionnaires, literature surveys, and interviews. The focus of this section is on the simulation-based method. A study conducted by Caird and Hancock [151], which involved 48 men and women, shows that road users misjudge the vehicle arrival time as the size of the vehicle increases. In [152], Sun et al. study the relationship between pedestrian waiting time before crossing and gap acceptance. The outcomes show that a long pedestrian waiting time results in a low accepted gap. Another study investigates the impact of vehicle size on pedestrian behavior and shows that pedestrians are more careful when interacting with a larger vehicle [153]. Wiedemann [154] shows that pedestrian flow and pedestrian speed have a linear relationship when there is no interaction between pedestrians. Rasouli and Tsotsos [150] classify the factors that impact pedestrian behavior into two groups: pedestrian and environmental factors. Figure 16 shows a list of these factors and how they are connected under classical studies [150].
In contrast, Figure 17 presents a list of factors that impact pedestrian behavior when facing AVs and how they are connected. Various methods are used to collect data for pedestrian behavior studies involving AVs, namely observation, video recording, photography, simulation, scripted observation, questionnaires, literature surveys, interviews, and Wizard of Oz experiments (research experiments in which subjects interact with a computer system that they believe to be autonomous but that is actually run, fully or partially, by a hidden human being).
The simulation data collection methods used in pedestrian intention studies involving AVs are discussed briefly in this section. Beggiato et al. [155] investigate the indirect forms of communication between vehicle and pedestrian, such as vehicle speed and distance. The authors claim that many factors impact the interpretation of the signal, such as vehicle speed, road users' age, and time of day.
Jayaraman et al. [156] found that the availability of traffic signals at crosswalks only slightly influences pedestrian crossing behavior, whereas the AV's driving decisions significantly impact such behavior. In [157], Chang et al. suggest a method for intent display by placing moving eyes at the front part of the vehicles. Based on data collected from 15 participants, the authors concluded that more participants choose to cross when the rolling eyes are available, and that the number of participants who choose to cross increases by 20% if the eyes are looking toward them. Another study by Pillai [158] suggests that pedestrians' crossing decisions depend on the erratic behavior of the vehicle (speed and distance) and that, under specific weather conditions with low visibility, the use of an intent display will be helpful. Finally, Pillai concluded that culture is an essential factor that should be considered when designing any intent display. According to the literature [150], pedestrian behavior under autonomous driving conditions needs more attention so as to include signal, location, road structure, gap acceptance, and social norm factors. Moreover, some elements from classical studies, such as group size, pedestrian speed, and street width, should be evaluated under autonomous driving circumstances. These factors are essential for understanding a pedestrian's intention to cross the road, and a deep understanding and consideration of them will result in safe autonomous driving. In short, V2V, V2P, and V2I communications can provide a safe environment for autonomous driving and road users. However, although these technologies are advantageous, several questions are raised regarding the sharing of pedestrians' data via these technologies [159].

Table 16 (partial). Vehicle emergency braking: the method is tested, and a vehicle that uses V2V and V2P technologies will engage in a severe conflict with a pedestrian when the delay or packet loss rate increases to a certain value; the communication delay and packet loss rate should be reduced to avoid deadly conflicts with pedestrians. CAV penetration: CAVs bring a compelling benefit to road safety, as traffic conflicts are significantly reduced even at relatively low market-penetration rates (12-47%, 50-80%, 82-92%, and 90-94% for 25%, 50%, 75%, and 100% CAV penetration rates, respectively). [123]: models the turning vehicles at mixed-flow intersections and investigates their surrogate safety measures (scenario: turning vehicles at mixed-flow intersections).

3) Autonomous Vehicle Simulation Platforms
Simulation and modeling platforms are well-developed tools for the design and validation of autonomous and non-autonomous vehicles. The V-model is one of the most popular methods used to cover the design and testing of the entire AV development process [164]. In the development process of autonomous vehicles, virtual simulation methods are applied at different stages, and various testing setups are used, such as model-in-the-loop (MIL), software-in-the-loop (SIL), and hardware-in-the-loop (HIL). ISO 26262 is based on the V-model and does not match an agile development process. As a result, there is a wide range of simulation platforms. In [163], Rosique et al. classify the simulation platforms into four approaches that can be considered when selecting a simulator for autonomous vehicles, namely vehicle test simulation, game and physics engines for simulation, robotics simulators, and perception simulators.
Autonomous vehicle development is based on the V-model, which has several phases of development and testing, such as the model-in-the-loop (MIL) [165] approach, software-in-the-loop (SIL) [166] validation, hardware-in-the-loop (HIL) [167], and the vehicle-hardware-in-the-loop (Ve-HIL) approach. The vehicle test simulation approach is based on the V-model criteria. Several factors must be considered when choosing an autonomous vehicle simulator, such as the availability and compatibility of models, the subsystems that can be tested, the availability of real-time simulation communication protocols, and compliance with ISO 26262 [163]. Table 17 shows some of the simulation platforms used to validate and test autonomous vehicles based on the vehicle test simulation approach.
Another common simulation approach is to use an available game engine. A game engine is defined as the software part of a computer game that provides a rendering engine, a physics engine, collision detection and response, sound, scripting, animation, artificial intelligence, networking, streaming, memory control, threading, localization support, and a scene graph. In addition, it might incorporate video support for cinematic and virtual reality (VR) simulation [163]. Game engines provide features that are advantageous for autonomous vehicle and robotics researchers, such as physical fidelity, distributed architecture, cutting-edge graphics, and scriptable environments [163]. The main game engines that are widely used in the development of autonomous vehicle systems or subsystems are Unity 3D [174], Unreal Engine [175], Blender [176], and Cry Engine [177]. The physics engine is an essential component when simulating an autonomous vehicle perception system. This engine provides lower fidelity and operates on the basis of collision detection. Examples of the high-performance physics engines used in AV simulation are Open Dynamics Engine (ODE) [178], Bullet physics [179], NVidia PhysX [180], and PreScan [231].
Robotics simulation platforms are also used in autonomous vehicle simulation. Models of all sensors and actuators should be provided for the effective use of this type of simulation [181][182][183]. Moreover, a realistic environment for testing and validating all kinds of algorithms and subsystems should be provided as well. Many features should be considered when choosing a robotics simulator, such as 3D ... [190]. Table 18 presents a broader list of robotics simulators that integrate simulation data. Table 19 provides a comparison between sensors that are simulated using robotics simulators. Simulation platforms should mimic real-world environments to model and validate any perception algorithm. Therefore, the available simulation platforms tend to have these features: fast prototyping, physics engines for realistic motions, realistic 3D rendering, and dynamics with scripting. Tables 20 and 21 show lists of perception algorithm simulation platforms and their features.

Table 17 (partial). [169], commercial; dSpace GmbH [168], commercial; LabVIEW [171], commercial; CarSim [172], commercial; CAT Vehicle [173], GPL/open source.

B. AGENT-BASED MODELS
Agent-based models integrate activity-based demand generation and dynamic traffic assignment [204]. This approach covers all four steps of the macroscopic procedure, namely demand generation, demand distribution, mode choice, and traffic assignment [203]. Agent-based models (ABMs) also use independent agents with a bottom-up technique to simulate a highly complex system [202]. This type of modeling and simulation is considered superior to other methods in terms of flexibility, hierarchy, intuition, and the ability to deal with complex systems. For example, an AV operating on public roads while interacting with human-driven cars, vulnerable road users, and road networks forms a highly complex system. Within the autonomous vehicle system, all subsystems are interconnected and work simultaneously. Moreover, a wide range of elements, such as the diverse behavior of agents (people and vehicles), is integrated within agent-based modeling. With high-end computers, the agent-based modeling approach is used to build challenging models with more realistic scenarios.
The agent-based modeling studies of autonomous vehicles are diverse. They include the travel and environmental impacts of autonomous vehicles [70], [206], the parking requirements with the arrival of autonomous vehicles [207], [208], the traffic congestion caused by autonomous cars [209], the system performance of the self-driving vehicle [210], [211], [208], [212], [213], and the modal share and travel modes of autonomous vehicles [214], [72], [215]. Many critical variables can impact the system performance of the AV, namely fleet size, demand, strategy, ride-sharing, pricing schemes, configurations of stations, travel mode, vehicle capacity, service area, refuel/recharge time, maximum waiting time, and cruising time [221]. Many researchers consider these variables in the sensitivity analysis and modeling of various simulation scenarios. For example, the literature review contains 27 papers related to the fleet-size research area. In the fleet-size studies, the regular vehicles are replaced by autonomous vehicles (AV) [71], autonomous taxis (ataxi) [216], autonomous mobility on demand (AMOD) [217], autonomous transit on demand (ATOD) [218], shared autonomous vehicles (SAV) [70], or shared autonomous electric vehicles (SAEV) [219].
The fleet size or replacement rate is considered one of the significant outcomes of the agent-based simulation. The replacement rate is used as an indicator of the efficiency of an autonomous vehicle system. Fagnant et al. [220] argue that the travel demand, average speeds, and average trip distances impact AV system performance. Moreover, the replacement rate of the autonomous vehicle is investigated with a case study in Austin. Fagnant et al. [70] argue that one autonomous vehicle can replace ten human-driven cars. In [221], the outcomes show that the replacement rate in [70] is 1:11 with link-level travel time and is 1:9 with constant speed in [220]. In [210], Marczuk et al. show that fleet size relies on many factors, such as service area, average demand, level of service (based on average waiting time and on service and reject rates), routing scheme, relocation plan, and design of the facility. Moreover, in [206], the fleet size can be minimized by ride-sharing. Additionally, the average trip distance data in the survey papers are not precise enough. Furthermore, environmental scenarios, such as an urban area or a highway, can impact the travel distance. In [221], ride-sharing is considered as an indicator for the routing scheme. In short, many major factors can affect the fleet size or replacement rate, namely service area, average demand, average speed, average waiting time, service and reject rates, ride-sharing, relocation plan, and design of the facility. The replacement rate in [70], [220], and [216] is the same, considering that one autonomous taxi can replace ten regular vehicles, excluding relocation and travel demand. The outcomes show that the average waiting time is approximately 2.28 minutes, which is considered too large.

Perception simulation platforms (table fragment). [195], restricted, driving; SIMLidar [196], GPL/open source, C++, LiDAR; Helios [197], GPL/open source, JMonkey Engine, OpenGL, Java, LiDAR; GLIDAR [198], GPL.
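The link between fleet size, waiting time, and replacement rate can be illustrated with a deliberately small discrete-event sketch. It is not an agent-based platform such as MATSim: it ignores the road network, relocation, and ride-sharing, assumes that every trip request would otherwise be served by its own private car, and sizes the fleet greedily rather than optimally. All names and the example trips are hypothetical.

```python
import heapq


def size_fleet(trips, max_wait):
    """Greedily size a shared fleet for a list of trips.

    trips    : list of (request_time, duration) pairs, in minutes
    max_wait : longest acceptable waiting time, in minutes
    Returns (fleet_size, waits), with waits listed in order of request time.
    """
    free_at = []  # min-heap holding the time at which each vehicle becomes free
    waits = []
    for request, duration in sorted(trips):
        if free_at and free_at[0] - request <= max_wait:
            # Reuse the vehicle that becomes free first.
            start = max(request, heapq.heappop(free_at))
        else:
            # No vehicle can arrive in time, so a new one joins the fleet.
            start = request
        waits.append(start - request)
        heapq.heappush(free_at, start + duration)
    return len(free_at), waits


def replacement_rate(n_private_cars, fleet_size):
    """Number of private cars replaced by one shared vehicle."""
    return n_private_cars / fleet_size
```

For the four trips `[(0, 10), (0, 10), (5, 10), (12, 5)]`, the sketch needs three vehicles when no waiting is tolerated and two vehicles when a five-minute wait is accepted, which shows how the accepted level of service drives the replacement rate.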
In [162], the authors present an autonomous intersection management algorithm called AIM-ped, which considers vehicles and pedestrians. The algorithm achieves optimal throughput when combined with max-pressure control. Moreover, the conflict region model is used to conduct a stability analysis of the autonomous intersection management system. The AIM-ped algorithm is implemented by integrating max-pressure control with an existing trajectory optimization algorithm to obtain the optimal vehicle trajectories. As a result, the AIM-ped algorithm can trigger vehicle movements when there is a change in pedestrian demand. The simulation outcomes show that pedestrian and vehicle delays are negatively correlated. In [125], the sequential movements of vehicles at intersections are modeled as multi-agent Markov decision processes (MAMDPs). The outcomes show that the optimal sequential decision from DCL-AIM outperforms all the other control policies. In [205], the authors present a model of the interaction dynamics between drivers and pedestrians in dense traffic areas where pedestrians and/or drivers do not obey the traffic laws and regulations. This approach can be used in the control systems of AVs and in drivers' onboard alert systems [205]. In [92], the performance of many SAV fleets and vehicle sizes serving travelers across France's Rouen Normandie metropolitan area is evaluated. Moreover, the effect of ride-sharing and rebalancing strategies on the service is studied. This study emphasizes that the performance of SAVs is strongly correlated with the fleet size and the shared rides. Table 22 presents a summary of the features of specific agent-based simulation platforms. Most agent-based simulation research papers use the MATSim simulation platform to model all the autonomous vehicle system operations. In the past, MATSim was used to simulate regular vehicles and not autonomous vehicles. With the need to validate AVs, [209] and [222] established the AV toolkit for MATSim. In agent-based simulation, all autonomous vehicle types are simulated in the same way, and the difference between the simulated vehicles is the data source. For example, taxi data are used for an autonomous taxi, and travel surveys are taken into account for other car-sharing services. The simulation of private AVs in the literature is limited. In addition to agent-based modeling, augmented and virtual reality methods have great potential to become essential methods for AV evaluation. Recent studies on these methods are presented in [374][375][376][377]. In short, the agent-based simulation approach for AVs is in its infancy. More focus on the agent-based simulation of private autonomous vehicles is required for this approach to compete with other AV validation methods.

Perception simulation platforms (table fragment). DeepDrive; Udacity; Constellation. Table legend: x = yes, - = no, n/a = unknown or could not be determined.

VIII. DISCUSSION AND FUTURE RESEARCH DIRECTIONS
Two main types of tests are currently used to evaluate CAVs and AVs, namely naturalistic field operational tests (N-FOTs) and virtual tests. The virtual tests include test matrix evaluation, worst-case scenario evaluation (WCSE), Monte Carlo simulations, accelerated evaluation (AE), and simulation-based and agent-based modeling approaches. In some cases, N-FOTs and virtual tests are combined to evaluate CAVs and AVs. In N-FOTs, vehicles are equipped with the required sensors and are driven in naturalistic conditions, which is not the case in virtual tests. N-FOTs allow investigators to observe CAVs and AVs in a natural setting. The data collected from N-FOTs are used to investigate many elements, such as driver performance, the surrounding environment, driving conditions, and other components related to critical incidents, near collisions, and collisions. However, N-FOTs have many restrictions, such as the time required to conduct the test, the need for trained drivers, and the low probability of critical events. Moreover, the test requires many vehicles, a lot of time, and large budgets. Therefore, virtual tests are used as an efficient alternative approach to model and validate CAVs and AVs. Many questions are raised regarding virtual tests and whether they can be reliable enough to replace N-FOTs. For example, urban scenes are essential in virtual tests and inevitably involve pedestrians, vehicles, cyclists, motorcyclists, etc. Simulating traffic congestion, lane-change scenarios, car-following scenarios, pedestrian-vehicle conflicts, vehicle-vehicle conflicts, pedestrian behavior, driver behavior, human-driven vehicle behavior, weather conditions, and many other elements in a large-scale traffic scenario is a complicated multi-layer task. The resulting movements of the objects in the simulation often do not follow physical laws.
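The low probability of critical events is what makes both N-FOTs and crude Monte Carlo simulation slow, and it is the problem that accelerated evaluation addresses by modifying the sampling statistics. One common way to do this is importance sampling, sketched below on a toy problem: a single scenario parameter is assumed to follow a standard normal distribution, and the event is "critical" when the parameter exceeds a threshold. The distribution, the threshold, and the function names are assumptions for illustration and do not represent any specific surveyed study.

```python
import math
import random


def exact_tail(threshold):
    """Exact probability that a standard normal variable exceeds the threshold."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


def crude_monte_carlo(threshold, n, rng):
    """Estimate the critical-event probability by sampling the original distribution."""
    hits = sum(rng.gauss(0.0, 1.0) > threshold for _ in range(n))
    return hits / n


def importance_sampling_estimate(threshold, n, rng, shift=None):
    """Estimate the same probability by sampling from N(shift, 1) and reweighting."""
    shift = threshold if shift is None else shift
    total = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)
        if x > threshold:
            # Likelihood ratio of N(0, 1) to N(shift, 1) evaluated at x.
            total += math.exp(-shift * x + 0.5 * shift ** 2)
    return total / n
```

With a threshold of 4 the true probability is about 3.2e-5, so 20,000 crude samples usually contain no critical event or a single one, whereas the same number of samples drawn around the threshold and reweighted by the likelihood ratio gives an estimate with a relative error of a few percent.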
Moreover, accessing vehicle trajectories and including them in virtual tests or applications in real time is challenging. Furthermore, road network generation and representation is also a fundamental task in traffic simulation and modeling. Several simulation tools support road networks, but the outcomes do not resemble real-world traffic at the street level. Therefore, in virtual testing methods, model verification in terms of the similarity between the simulated traffic model and real-world scenarios is always a concern. In this review paper, we presented the pros and cons of each evaluation method. The papers reviewed in this survey show a clear gap in the research area of CAV and AV evaluation. Every method has its strengths and weaknesses. For example, many techniques focus on car-following and lane-change modeling and evaluation and neglect the remaining traffic conditions and the modeling of severe weather conditions. Furthermore, the V2V, V2P, and V2I technologies are still under investigation and require more validation.

Table 22 (partial). Agent-based simulation platforms.
Platform | Country/Developer | Language | License
[224] | Singapore | C++ | Open-source
Commuter [225] | US/Autodesk | Java | Commercial
Aimsun [226] | Spain/German | Python | Commercial
Mobility Testbed [227] | USA | Java | Open-source
Multi-agent middleware JADE [228] | Italy | Java | Open-source
mobiTopp [229] | n/a | n/a | n/a
Gurobi [230] | USA/Gurobi | Python | Open-source
GAMA | n/a | n/a | n/a
NetLogo | USA | LOGO | Open-source
Table legend: n/a = unknown or could not be determined.
Moreover, CAV and AV modeling and evaluation require integrating many models simultaneously, with a wide range of parameters and variables. Choosing suitable models will produce satisfactory outcomes. Based on our findings, different models related to behavior, car-following, lane-change, vehicle dynamics, etc. are used in every research paper. Therefore, establishing a comparison study is very challenging. For example, some car-following models outperform others, and using a superior car-following model in a research paper with a specific evaluation method will produce promising outcomes for that method. Moreover, several architecture models have been developed, from completely modular to fully end-to-end, each with its limitations. The available algorithms for localization, mapping, and perception still lack accuracy and efficiency. In short, for safe autonomous driving, a high-fidelity driving simulator that includes realistic traffic streams and complicated traffic conditions is necessary. Such a simulator can construct critical training environments in an efficient and reproducible manner. New evaluation methods need to be developed for more scenarios to give a thorough validation of AVs. The community does not yet understand the failure modes of AVs well enough to design a complete list of test scenarios, but possible elements include:
1) Challenges in sensing/detection under severe weather conditions such as heavy snow, rain, fog, etc.
2) Aggression of surrounding vehicles/vulnerable road users, such as running a red light, cutting in, jaywalking, etc.
3) Challenges in making decisions, such as under low confidence, multiple threats at a time, and so on.
4) Challenges due to road types and vehicle types.
Moreover, these simulators and evaluation methods should provide clear answers to the following questions:
1) What are scalable driving policies to control many AVs in mixed traffic comprising human-driven vehicles (HVs), CAVs, AVs, vulnerable road users, etc.?
2) How do we estimate the behaviors of human drivers, pedestrians, and surrounding vehicles (HVs, CAVs, and AVs)?
3) How can we make sure that the modeled behaviors of drivers and pedestrians are accurate and capture real-world behaviors?
4) How should the driving behavior of HVs, CAVs, and AVs be modeled in the environment?
5) How are the interactions between human-driven vehicles (HVs) and AVs characterized?
6) How are the interactions between CAVs and AVs characterized?
7) How are the interactions between pedestrians and AVs characterized?
8) How are the interactions between other vulnerable road users (VRUs) and AVs characterized?
9) How should pedestrian behavior be modeled in the environment?
10) How should severe weather conditions be modeled in the environment?
Many methods showed promising outcomes but did not provide answers to all of these questions.

IX. CONCLUSION
It is critical to evaluate AVs thoroughly before their release and deployment to the general public. However, because most trips in naturalistic driving are not safety-critical, testing AVs on public roads is hugely time-consuming, inefficient, and expensive. In this paper, we surveyed the available evaluation methods, namely naturalistic field operational tests, test matrix evaluation, worst-case scenario evaluation, Monte Carlo simulations, accelerated evaluation, and the simulation-based model approach. This survey has shown that there is a clear gap in the field of AV evaluation. Many factors affect our judgment on what is the best approach to evaluate AVs. These factors include:
1) The AV-to-AV and HV-to-HV interactions are not studied and are used only as a benchmark.
2) The AV sensors and controls are assumed to work accurately in many papers, and the measurements are presumed to be accurate.
3) The drivers' reactions to AVs are assumed to be the same as to HVs.
4) The vehicle models are not accurate enough to mimic real-world scenarios.
5) Many real-world conditions have not been investigated yet.
6) Different models related to behavior, car-following, lane-change, vehicle dynamics, etc. are used in every research paper.
The accelerated evaluation approach outperforms naturalistic field operational tests (N-FOTs), test matrix evaluation, worst-case scenario evaluation (WCSE), and Monte Carlo simulation methods in some of the car-following and lane-change studies when using specific models, in terms of the time needed to assess collision, injury, or conflict events. In addition, some studies show that integrating machine and deep learning techniques into test matrix evaluation, Monte Carlo simulations, and accelerated evaluation can yield significant improvements. Within the simulation-based model approach, agent-based modeling was investigated and shown to be advantageous in AV modeling and validation. However, more work is needed for the agent-based modeling approach to cover a wide range of self-driving vehicle research. Another promising approach for AV evaluation is the use of augmented and virtual reality methods. The development of AVs depends on advancements in scientific disciplines and new technologies. Therefore, AV research has a high impact on AV driving technology by overcoming the weaknesses of the available evaluation methods and by inventing new evaluation methods.

In May 1997, he was promoted to an Associate Professor, with tenure. His current research interests are in the areas of game design, machine learning, optimization, intelligent systems, and wearable sensors. He is also a recipient of several research contracts that address problems in Intelligent Transportation Systems, and serves as a consultant to the industry and the government on these problems as well. In 1996, he founded M-Vision, Inc., a company specializing in the application of computer vision to automotive problems.