Integrating LiDAR Sensor Data into Microsimulation Model Calibration for Proactive Safety Analysis

Studies have shown that vehicle trajectory data are effective for calibrating microsimulation models. Light Detection and Ranging (LiDAR) technology offers high-resolution 3D data, allowing for detailed mapping of the surrounding environment, including road geometry, roadside infrastructure, and moving objects such as vehicles, cyclists, and pedestrians. Unlike traditional methods of trajectory data collection, LiDAR's high-speed data processing, fine angular resolution, high measurement accuracy, and strong performance in adverse weather and low-light conditions make it well suited for applications requiring real-time response, such as autonomous vehicles. This research presents a comprehensive framework for integrating LiDAR sensor data into microsimulation models, together with accurate calibration strategies, for proactive safety analysis. Vehicle trajectory data were extracted from LiDAR point clouds collected at six urban signalized intersections in Lubbock, Texas, USA. Each study intersection was modeled in PTV VISSIM and calibrated to replicate the observed field scenarios. The Directed Brute Force method was used to calibrate two car-following and two lane-change parameters of the Wiedemann 1999 model in VISSIM, resulting in an average accuracy of 92.7%. Rear-end conflicts extracted from the calibrated models, combined with a ten-year historical crash dataset, were fitted to a Negative Binomial (NB) model to estimate the model's parameters. At all six intersections, the rear-end conflict count is a statistically significant predictor (p-value < 0.05) of observed rear-end crash frequency. The outcome of this study provides transportation professionals with a framework for the combined use of LiDAR-based vehicle trajectory data, microsimulation, and surrogate safety assessment tools.
This integration allows for more accurate and proactive safety evaluations, which are essential for designing safer transportation systems, effective traffic control strategies, and predicting future congestion problems.


Introduction
Microsimulation has gained widespread adoption due to its risk-free nature and cost-effectiveness. It is a powerful tool for modeling traffic operations and conducting safety studies. Simulation models replicate individual drivers' behaviors within a virtual environment by providing a detailed representation of car-following and lane-changing behaviors in response to the surrounding traffic [1]. Compared to analytical models, microsimulation models provide individual vehicle movements on a second or sub-second basis [2]. A recent study on path dispersion (the spatial distribution of vehicular paths) [3] introduced a simulation model that describes the paths of left-turning vehicles under the unprotected left-turn phase inside intersections. The model analyzes drivers' behavior to determine vehicle paths endogenously, rather than relying on predefined inputs. Due to the complexity of path dispersion (which may be affected by geometric conditions, signal control, traffic conditions, and driver characteristics), the proposed microscopic traffic flow model is a starting point for future studies. Existing commercial microsimulation models, including VISSIM, AIMSUN, CORSIM, and SUMO, incorporate independent conflicts and predict crash frequencies using a Negative Binomial model, providing a robust method for safety evaluation. Overall, by leveraging the strengths of LiDAR sensor data, this study sets a new standard for the accuracy and realism of microsimulation models, paving the way for more effective transportation safety evaluations and interventions. The rest of this paper is organized as follows: The Materials and Methods section provides details on the workflow for extracting vehicle trajectory data from LiDAR sensor point clouds, VISSIM model creation, the sensitivity analysis method, the calibration approach, and surrogate safety measures. The Results section presents the results of calibration, the surrogate safety measures, and the Negative Binomial model. The Discussion section explains the results and their implications. Finally, the Conclusion section summarizes this paper.

Materials and Methods
This study implements a framework for integrating vehicle trajectory data obtained from LiDAR sensors into VISSIM models, performs a sensitivity analysis-based calibration, and predicts rear-end crashes from surrogate safety assessment models. Three separate simulation models were developed: the Default Model, the Macro Model, and the Micro Model. For the Default Model, no calibration was performed. Sensitivity analysis (SA) was performed to determine the significant parameters necessary for macroscopic and microscopic calibration. Based on the results of the SA, two car-following and two lane-changing parameters of the Macro Models were calibrated using traffic volume and mean speed as Measures of Effectiveness (MOEs). Similarly, parameters of the Micro Models were calibrated using the distributions of lane, headway, speed, and acceleration as MOEs. Time-based (MTTC), deceleration-based (SDI), and severity-based (CSI) surrogate safety indices were calculated to estimate the Rear-End Conflict Index. Figure 1 illustrates the overall workflow of the methodology. Subsequent sections offer detailed explanations of these procedures.

LiDAR Sensor
LiDAR sensors capture data by emitting laser pulses and measuring the time it takes for the reflected light to return. This allows them to create a 3D representation of their surroundings as a point cloud. This 3D representation includes accurate information about objects' shapes, sizes, and positions [16]. In this study, vehicle trajectory data were collected using a Velodyne VLP-32 LiDAR sensor (San Jose, CA, USA). The VLP-32 is a 32-channel sensor, meaning that it emits 32 laser beams from a compact, spinning housing to create a 3D point cloud of the surrounding environment. With a range of 100 m (328 feet) and a rotational frequency set to 10 Hz, it captures 10 frames per second (recording data every 0.1 s), each containing 600,000 3D points. The sensor has a full 360-degree horizontal field of view and a vertical field of view of 30 degrees (ranging from 15 degrees upward to 15 degrees downward). The VLP-32C LiDAR sensor components and their physical measurements are shown in Figure 2. Additionally, Table 1 lists several other specifications and characteristics of the sensor.
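A quick back-of-the-envelope check of the data volume implied by these settings (10 Hz rotation, roughly 600,000 points per frame, a 2 h recording per intersection) can be sketched as follows; the figures are derived only from the specifications quoted above:

```python
# Back-of-the-envelope data volume implied by the sensor settings quoted
# above (10 Hz rotation, ~600,000 points per frame, 2 h per intersection).
ROTATION_HZ = 10            # frames captured per second
POINTS_PER_FRAME = 600_000  # 3D points per frame
SESSION_S = 2 * 3600        # one 2 h evening-peak recording, in seconds

frames = ROTATION_HZ * SESSION_S     # frames per 2 h session
points = frames * POINTS_PER_FRAME   # total 3D points per session

print(f"{frames:,} frames, {points:,} points")
```

At 0.1 s per frame this works out to 72,000 frames per session, which is why per-frame indices for a 2 h collection run from 0 to 71,999.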
The six study intersections are: (i) 34th Street and Indiana Avenue; (ii) 82nd Street and Milwaukee Avenue; (iii) 82nd Street and Slide Road; (iv) 50th Street and Avenue Q; (v) 50th Street and Quaker Avenue; and (vi) 4th Street and Frankford Avenue. The LiDAR sensor was mounted on a traffic signal pole at the corner of each intersection to record data. One of the study intersections is shown in Figure 3a, with the VLP-32 LiDAR sensor mounted on a pole. Data collection for each intersection lasted 2 h and was conducted during the evening peak commute period (4:00 pm to 6:00 pm) on different weekdays in March 2024.

Data Processing, Trajectory Extraction, and Smoothing
VeloView is a user application that provides real-time visualization of 3D LiDAR data from Velodyne LiDAR sensors. It is used to acquire, visualize, save, and replay sensor data. VeloView can also play back pre-recorded data stored in "pcap" (Packet Capture) files. Figure 3b shows the raw 3D point clouds obtained from the LiDAR sensor, visualized in the VeloView graphical user interface.
A generalized four-step algorithm was developed by Zhao et al. (2019) for processing roadside-LiDAR sensor data [17]. The four steps are: (i) Background Filtering; (ii) Object Clustering; (iii) Object Classification; and (iv) Object Tracking. The first step employs a filtering technique based on point density to remove both static and dynamic background points from all frames. Step two involves clustering the remaining points into separate unique objects by means of a modified DBSCAN algorithm with varying parameters depending on their proximity to the LiDAR sensor. In step three, an artificial neural network (ANN) model is developed to classify the clustered objects into their respective road user groups, including vehicles, cyclists, and pedestrians. The parameters of the ANN model include distance to the LiDAR sensor, number of points in the cluster, and direction of the cluster points' distribution. Lastly, the Global Nearest Neighbor (GNN) method is applied to track the same object across multiple data frames. The method proposed by Zhao et al. (2019) [17] was adopted in this study. Figure 3c shows vehicle trajectories extracted from the LiDAR sensor. A vehicle trajectory obtained from this sequence of processing algorithms is raw and frequently contains noise, deviations, and inconsistencies at the microscopic level (0.1 s), which can greatly affect the quality of the trajectory data. Therefore, it is necessary to smooth the trajectory data to reduce noise (x- and y-error correction). A recent study [18] introduced a new method for reconstructing the 2D paths followed by objects, specifically driver behavior; however, trajectories reconstructed with this method may be heavily impacted by outliers. In this study, the low-pass Savitzky-Golay filter was used to denoise the trajectory data and accurately represent the vehicle's intended path of travel. A window length of 11 and a polynomial order of 2 were used to smooth the raw trajectory data. Smoothing also removes false peaks in the velocity and acceleration profiles, thereby providing an accurate representation of both. A detailed explanation of the trajectory data smoothing procedure can be found in the authors' previous work [19]. Figure 3d shows the result of a smoothed trajectory. A sample of the processed trajectory data is presented in Table 1, showing all the variables used in this study.
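The smoothing step described above can be sketched with SciPy's `savgol_filter`, using the stated window length of 11 and polynomial order of 2; the trajectory below is synthetic, not LiDAR data:

```python
import numpy as np
from scipy.signal import savgol_filter

# Sketch of the smoothing step with the stated settings (window length 11,
# polynomial order 2). The trajectory below is synthetic, not LiDAR data.
rng = np.random.default_rng(0)
t = np.arange(0, 30, 0.1)                    # 0.1 s frames, as in the LiDAR data
x_true = 15.0 * t                            # idealized along-road position (m)
x_raw = x_true + rng.normal(0, 0.3, t.size)  # raw positions with measurement noise

# Low-pass Savitzky-Golay filter: fits a local 2nd-order polynomial over an
# 11-frame (1.1 s) window.
x_smooth = savgol_filter(x_raw, window_length=11, polyorder=2)

# Differentiating within the same filter suppresses the false peaks that
# naive frame-to-frame differencing would produce in the speed profile.
v_smooth = savgol_filter(x_raw, 11, 2, deriv=1, delta=0.1)  # m/s
```

Computing the derivative inside the filter (rather than differencing the raw positions) is what removes the spurious spikes in the velocity and acceleration profiles.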

Vissim Simulation Models
PTV's Verkehr In Städten-SIMulationsmodell (VISSIM 2024) was used to develop the simulation models for this study. VISSIM is based on a psycho-physical car-following model that incorporates perception thresholds to model driver behavior. The basic concept of the car-following model is that drivers are sensitive to changes in the distance and speed of slower-moving vehicles in front of them [20]. PTV VISSIM is by far the most used traffic microsimulation software for simulation-based surrogate safety studies [15]. Researchers cite its high flexibility in simulating real-world traffic conditions and the ease of extracting conflict data without resorting to manual observation. Each of the six study intersections was modeled in VISSIM. The geometry includes lanes, links, design speeds, designated turn lanes, vehicle storage lengths, and curb turn radii. These data were obtained from Geographic Information System (GIS) files and Google Maps. The processed vehicle trajectory data were used as vehicle inputs (volume) and static routing decisions (relative flow). Signal timing sheets were obtained from the City of Lubbock for all six intersections. The PM peak day plan was used to code the signal control operation in the VISSIM network model of each study intersection [2]. Three separate simulation models were developed for each study intersection: the Default Model, the Macro Model, and the Micro Model. For the Default Model, no calibration was performed (all the default parameters of the Wiedemann 99 model were retained), and the model was simulated for 2 simulation hours to replicate the duration of field data collection.

Sensitivity Analysis
Sensitivity analysis (SA) helps identify which model parameters have the most significant influence on the output. This information is crucial for calibration because it allows traffic engineers to focus their efforts on the most impactful parameters, making the calibration process more efficient and effective. Variance-based sensitivity analysis quantifies the importance of both individual factors (first-order effects) and interactions (total effects) in complex models. By analyzing both first-order effects and total effects, we gain a deeper understanding of how each factor contributes to the overall output variability. Consider a model with multiple factors affecting a single output, given in the form shown in Equation (1):

Y = f(X_1, X_2, ..., X_k)    (1)

First-order effects represent the influence of a single factor (X_1) on Y, independent of other factors. These are relatively easy to analyze using tools like regression. Interactions, on the other hand, capture the combined, non-additive effects of multiple factors and are more challenging to detect. Sobol [21] initially introduced the variance decomposition method, offering both an analytical derivation and a Monte Carlo implementation. The core concept of Sobol decomposition lies in Sobol indices, which quantify the contribution of individual variables and their interactions to the variance of the output. There are two main types of Sobol indices: (1) the main effect (first-order) index S_i, which measures the individual contribution of a generic factor X_i to the variance of Y, and (2) the total effect index S_T, which captures the main effect of X_i plus its interaction effects with other variables.

The most practically useful refinements of Sobol's method come from Saltelli et al. [22]. The expressions for S_i and S_T are given in Equations (2) and (3):

S_i = V(E[Y | X_i]) / V(Y)    (2)

S_T = 1 - V(E[Y | X_~i]) / V(Y)    (3)

where X_~i denotes all factors except X_i.
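A minimal Monte Carlo sketch of these estimators, in the Saltelli et al. (2008) sample-matrix style, using a stand-in analytic model in place of a VISSIM run (the factor names, bounds, and response surface here are hypothetical):

```python
import numpy as np

# Monte Carlo estimation of Sobol first-order (S_i) and total (S_T) indices
# using the Saltelli/Jansen sample-matrix estimators. `model` is a stand-in
# analytic response; a real application would run VISSIM for each sample row.
rng = np.random.default_rng(42)

def model(x):
    # Hypothetical response: additive in x0 and x1, plus an x0*x2 interaction.
    return x[:, 0] + 2.0 * x[:, 1] + x[:, 0] * x[:, 2]

k, n = 3, 20_000                        # 3 factors, n base samples
A = rng.uniform(0.0, 1.0, (n, k))       # two independent sample matrices
B = rng.uniform(0.0, 1.0, (n, k))

fA, fB = model(A), model(B)
var = np.var(np.concatenate([fA, fB]))  # total output variance V(Y)

S1, ST = [], []
for i in range(k):
    AB = A.copy()
    AB[:, i] = B[:, i]                  # A with column i taken from B
    fAB = model(AB)
    S1.append(np.mean(fB * (fAB - fA)) / var)        # first-order index S_i
    ST.append(0.5 * np.mean((fA - fAB) ** 2) / var)  # total-effect index S_T

print([round(s, 2) for s in S1], [round(s, 2) for s in ST])
```

For this toy response, x1 dominates the variance, while x0's total effect exceeds its first-order effect because of the x0*x2 interaction.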
A multistep approach for sensitivity analysis (SA) of complex traffic simulation models was proposed by Ciuffo and Azevedo (2014) [23]. In the first step, parameters are grouped with respect to the sub-models they are part of, and an SA is carried out considering the different groups rather than the different parameters. The groups most influential on the model outputs are then identified, and a new SA on all the parameters of these groups is carried out.

In this study, a multistep SA was performed to identify the car-following parameters of the Wiedemann 99 model (including lane-changing parameters) with a significant effect on the output variance. Tables 2 and 3 present all the parameters of the Wiedemann 99 model chosen for sensitivity analysis.

The first step of the sensitivity analysis involves grouping parameters that measure the same quantity. SA is performed on each group separately to select the parameter with the most significant influence on the output variance. For instance, in the first step, CC0 and CC2 (m) are grouped because they both measure distance. Similarly, CC1 and CC3 (s) are combined because they both measure time. This process is repeated for all parameter sets measuring the same quantity. After reducing the parameter set, a final SA was performed to identify the subset of final model parameters, which contains only the parameters with the most significant influence on the output variance. Based on the SA results across the six study intersections, the following car-following and lane-change parameters were selected: Standstill Distance (CC0), Gap Time Distribution (CC1), Safety Distance Reduction Factor (SDRF), and Accepted Deceleration-Trailing (ADT). In a study by Buck et al. (2017), it was found that the CC0 parameter influences vehicles near the stop line at the front of the queue, whereas the CC1 parameter has an impact on vehicles located further back in the queue [24]. Before making a lane change, drivers consider the reduced safety distance resulting from such a maneuver. In VISSIM, this reduction is calculated by multiplying the original safety distance by the Safety Distance Reduction Factor (SDRF). The resulting distance represents the minimum acceptable headway a vehicle needs in the neighboring lane to safely complete the lane change. Accepted Deceleration-Trailing (ADT) describes the specific deceleration rate that the vehicle in the new lane (the trailing vehicle) is willing to accept to allow the lane change to occur safely [25].

Parameter  Unit  Description

CC0  m  Standstill distance: the desired standstill distance between two vehicles.

CC1  s  Gap time distribution: the time distribution from which the gap time (in seconds) is drawn that a driver wants to maintain in addition to the standstill distance.

CC2  m  "Following" distance oscillation: the maximum additional distance beyond the desired safety distance accepted by a driver following another vehicle before intentionally moving closer.

CC3  s  Threshold for entering "BrakeBX": the time in seconds before reaching the maximum safety distance to a leading slower vehicle at the beginning of the deceleration process (negative value).

CC4  m/s  Negative speed difference: the lower threshold for relative speed compared to a slower leading vehicle during the following process (negative value).

CC5  m/s  Positive speed difference: the relative speed limit compared to a faster leading vehicle during the following process (positive value).

CC6  1/(m·s)  Distance impact on oscillation: the impact of distance on the limits of relative speed during the following process.

CC7  m/s^2  Oscillation acceleration: acceleration oscillation during the following process.

CC8  m/s^2  Acceleration from standstill: acceleration when starting from standstill.

CC9  m/s^2  Acceleration at 80 km/h: acceleration at 80 km/h, limited by the desired and maximum acceleration functions assigned to the vehicle type.

Element: safety distances considered during a lane change:

• The safety distance of the trailing vehicle on the new lane, for determining whether a lane change will be carried out.

• The safety distance of the lane changer itself.

• The distance to the preceding, slower lane changer.

Maximum cooperative deceleration (CoopDecel): specifies to what extent the trailing vehicle A brakes cooperatively so as to allow a preceding vehicle B to change lanes into its own lane.

Calibration of Simulation Models
Researchers have treated the calibration of simulation models as an optimization problem consisting of a search algorithm and an objective function. The goal is to determine the optimal set of parameters that minimizes the objective function. The Genetic Algorithm (GA) is among the most widely used metaheuristic methods in calibration/validation problems, due to its easy implementation and good performance in calibration and optimization. However, Hale et al. (2015) argued that GA frequently requires thousands of trials to locate an acceptable solution, and traffic simulations cannot process thousands of trial runs in a reasonable time frame [26]. Recently, researchers have also implemented several other nonlinear optimization methods to solve calibration problems, including Simultaneous Perturbation Stochastic Approximation (SPSA) [27], sequential quadratic programming [28], the dynamic time warping algorithm [29,30], artificial neural networks [31], cross-entropy [32], and Directed Brute Force (DBF) [26]. Yu and Fan (2017) cited several goodness-of-fit formulas previously used by researchers to calibrate simulation models [4]. Hale et al. (2015) compared SPSA and the Directed Brute Force (DBF) search method [26]; the results showed that although SPSA was faster than DBF, the results of DBF are more reliable. Multiple random-seed replications are executed for each combination of parameter values to obtain more statistically reliable output. The Macro Models were calibrated based on the method proposed by Hale et al. (2015), using traffic volume and mean speed as Measures of Effectiveness (MOEs) [26]. Similarly, the Micro Models were calibrated using the distributions of lane, headway, speed, and acceleration as MOEs. After calibration, each model was simulated for 2 simulation hours to replicate the duration of field data collection.
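The DBF idea can be sketched as an exhaustive grid search with seed averaging. Everything below is illustrative: `simulate` is a toy stand-in for a VISSIM run, and the field MOEs, parameter grids, and response surface are hypothetical, not the paper's values:

```python
import itertools
import random
import statistics

# Illustrative sketch of the Directed Brute Force idea: exhaustively score a
# coarse grid of candidate values for the four calibrated parameters,
# averaging a normalized-error objective over several random seeds per
# combination. `simulate` is a toy stand-in for a VISSIM run.
FIELD = {"volume": 1200.0, "speed": 38.0}  # hypothetical field-observed MOEs

def simulate(cc0, cc1, sdrf, adt, seed):
    random.seed(seed)
    # Toy response surface with its optimum at (1.5, 0.9, 0.6, -1.0).
    vol = 1200 - 80 * abs(cc0 - 1.5) - 60 * abs(cc1 - 0.9) + random.gauss(0, 2)
    spd = 38 - 4 * abs(sdrf - 0.6) - 3 * abs(adt + 1.0) + random.gauss(0, 0.1)
    return vol, spd

def objective(params, seeds=(1, 2, 3)):
    # Average the normalized error over multiple random-seed replications.
    errs = []
    for s in seeds:
        vol, spd = simulate(*params, seed=s)
        errs.append((((vol - FIELD["volume"]) / FIELD["volume"]) ** 2 +
                     ((spd - FIELD["speed"]) / FIELD["speed"]) ** 2) ** 0.5)
    return statistics.mean(errs)

grid = itertools.product([1.0, 1.5, 2.0],     # CC0 (m)
                         [0.6, 0.9, 1.2],     # CC1 (s)
                         [0.4, 0.6, 0.8],     # SDRF
                         [-1.5, -1.0, -0.5])  # ADT (m/s^2)
best = min(grid, key=objective)
print(best)
```

Averaging over several seeds per combination is what makes the grid ranking statistically reliable despite the stochastic simulator.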

Surrogate Safety Measures (SSMs)
Surrogate safety measures (SSMs) have been extensively used to detect near-crash events. SSMs measure the time and space proximity of a vehicle pair to the point of collision or detect the evasive actions taken to avoid a collision. SSMs serve as a proactive approach to road safety analysis without relying on historical crash data. Recently, there has been significant interest in SSMs due to the causal relationship between near-crash events identified by SSMs and actual crashes from field observations. Traffic conflicts have been considered an effective surrogate safety measure: compared to crashes, conflicts are much more frequent and easier to capture in the real world [33].

The Surrogate Safety Assessment Model (SSAM) is an automated tool that has been widely used by researchers to extract surrogate measures such as Post Encroachment Time (PET), Time to Collision (TTC), and Modified Time to Collision (MTTC) from simulation models [34]. From the initial positions and velocities of a pair of vehicles, SSAM generates 100 paths for each vehicle using combinations of acceleration and orientation independently drawn from two triangular distribution functions and detects all collision points between every pair of projected paths. In this study, SSAM-3.0 was used to extract rear-end conflict data from the VISSIM-simulated trajectory files.
Bhattarai et al. (2023) presented a methodology based on three surrogate safety indices to detect rear-end conflicts at signalized intersections [16]. The authors combined time-based (MTTC), deceleration-based (SDI), and severity-based (CSI) surrogate safety indices to estimate the Rear-End Conflict Index (RECI) at the intersection of Los Altos Parkway and Pyramid Way in Reno, Nevada. The Stopping Distance Index (SDI) is a surrogate measure used to capture rear-end conflicts using Equations (7)-(9). AASHTO's A Policy on Geometric Design of Highways and Streets [35] recommends a perception-reaction time (P_r) of 2.5 s, and a Maximum Available Deceleration Rate (D_max) of 8 m/s^2 was recommended by a previous study [36].

The crash severity index (CSI) is quantified by the expression given in Equation (10) [37]. MTTC values below a predefined threshold are combined with SDI = 1 (unsafe) to estimate the Risk Exposure (RE) index within each temporal window. Similarly, the MTTC values below the threshold are combined with CSI to estimate Risk Severity (RS) index values within the same temporal window. Equations (11) and (12) show their mathematical relationships.
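A sketch of the MTTC and SDI screening step, assuming a common stopping-distance formulation (reaction travel plus braking distance) and constant-acceleration closing kinematics for MTTC; the paper's exact Equations (7)-(10) are not reproduced here:

```python
import math

PR = 2.5     # perception-reaction time (s), per AASHTO
D_MAX = 8.0  # maximum available deceleration rate (m/s^2)

def stopping_distance(v):
    # Common stopping-distance form: reaction travel + braking distance.
    return v * PR + v**2 / (2 * D_MAX)

def sdi(v_lead, v_follow, gap):
    # SDI = 1 (unsafe) when the follower's stopping distance exceeds the
    # current gap plus the leader's stopping distance. This is one common
    # formulation, assumed here in place of the paper's Equations (7)-(9).
    return 1 if stopping_distance(v_follow) > gap + stopping_distance(v_lead) else 0

def mttc(v_lead, v_follow, a_lead, a_follow, gap):
    # MTTC: time until the gap closes under constant accelerations, i.e. the
    # smallest positive root of
    #   0.5*(a_lead - a_follow)*t^2 + (v_lead - v_follow)*t + gap = 0.
    a, b, c = 0.5 * (a_lead - a_follow), v_lead - v_follow, gap
    if abs(a) < 1e-9:
        return -c / b if b < 0 else math.inf
    disc = b * b - 4 * a * c
    if disc < 0:
        return math.inf  # the gap never closes
    roots = [(-b - math.sqrt(disc)) / (2 * a), (-b + math.sqrt(disc)) / (2 * a)]
    positive = [t for t in roots if t > 0]
    return min(positive) if positive else math.inf

# Follower at 15 m/s closing on a 10 m/s leader from 20 m back:
print(mttc(10.0, 15.0, 0.0, 0.0, 20.0), sdi(10.0, 15.0, 20.0))
```

A vehicle pair is then flagged as a conflict instance when MTTC falls below the threshold (2 s here) while SDI = 1.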

RE = U_i / n_i    (11)

where RE = Risk Exposure, U_i = count of unsafe instances (frames with MTTC < 2 s and SDI = 1), and n_i = total count of instances (frames).

RS = CSI_max / CSI_critical    (12)

where RS = Risk Severity, CSI_max = the maximum CSI value for the temporal window, and CSI_critical = the maximum possible CSI value.
As each pair of vehicles interacts at an intersection, the RE and RS indices continually assess their conflict risk in every temporal window. Their combined effect is the Rear-End Conflict Index (RECI), obtained by normalizing the RE and RS indices with min-max scaling using Equations (13)-(15), where RECI = Rear-End Conflict Index, RE_norm = normalized RE, RS_norm = normalized RS, i = current window, and N = total count of windows.
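The per-window indices and their min-max normalization (Equations (11), (12), (14), and (15)) can be sketched as follows; the window counts and CSI values are hypothetical, and the final RECI combination of Equation (13) is not reproduced here:

```python
import numpy as np

# Sketch of the per-window indices and their min-max normalization
# (Equations (11), (12), (14), and (15)). The window counts and CSI values
# below are hypothetical stand-ins.
unsafe = np.array([0, 2, 5, 1, 3, 4, 0, 2, 6, 1])  # U_i per 1 s window
total = np.full(unsafe.size, 10)                   # n_i: 10 frames per window at 10 Hz
RE = unsafe / total                                # Risk Exposure, Eq. (11)

csi_max = np.array([1.2, 3.4, 5.6, 0.8, 2.0, 4.4, 1.6, 2.8, 6.0, 0.4])
CSI_CRITICAL = 8.0                                 # hypothetical maximum possible CSI
RS = csi_max / CSI_CRITICAL                        # Risk Severity, Eq. (12)

def minmax(x):
    # Min-max scaling to [0, 1], Eqs. (14)-(15).
    return (x - x.min()) / (x.max() - x.min())

RE_norm, RS_norm = minmax(RE), minmax(RS)
```

The normalized series put exposure and severity on a common [0, 1] scale before they are combined into the RECI.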

Field Data
Field data were collected at each of the six study intersections in Lubbock, Texas, USA: (i) 34th Street and Indiana Avenue, (ii) 82nd Street and Milwaukee Avenue, (iii) 82nd Street and Slide Road, (iv) 50th Street and Avenue Q, (v) 50th Street and Quaker Avenue, and (vi) 4th Street and Frankford Avenue.The study period covers the evening peak commute period (4:00 pm to 6:00 pm) on six different weekdays in March 2024.

Sensitivity Analysis and Model Calibration Results
The results of the sensitivity analysis (SA) include the main effect (S_i), which measures the individual contribution of each parameter to the output variance, and the total effect (S_T), which captures the main effect plus its interactions with other parameters. S_i and S_T were calculated using Equations (2) and (3), respectively. Based on the results of the SA, the following car-following and lane-change parameters were selected: Standstill Distance (CC0), Gap Time Distribution (CC1), Safety Distance Reduction Factor (SDRF), and Accepted Deceleration-Trailing (ADT). Based on findings from the literature [1,38] and early experimentation in VISSIM, the coefficients considered for the Wiedemann 99 car-following and lane-changing parameters are given below.
The model calibration was treated as an optimization problem consisting of a search algorithm and an objective function. The objective was to minimize the difference between field-observed and simulated measures (e.g., field-observed speed vs. simulated speed) by identifying the optimal combination of coefficients for the selected lane-changing and car-following parameters. For the Macro Models, Root Mean Squared Error (RMSE) was selected as the goodness-of-fit measure between simulated and observed performance measures. To ensure that the error metrics are scale-independent, the RMSE values were normalized to values between 0 and 1 based on the minimum and maximum observed values of volume and mean speed. The objective function for the Micro Models is extended to consider not just volume and mean speed but also lane usage and traffic characteristics (distributions of vehicle headway, speed, and acceleration). A confusion matrix is used to compare how well the simulated lane usage matches the observed lane usage; it shows how many vehicles were assigned to each lane category in both the simulated and observed data. Additionally, the Kolmogorov-Smirnov test was used to compare the distributions of vehicle headway, speed, and acceleration from the simulated and observed data and to calculate p-values; a small p-value indicates that the two distributions are statistically different. Overall, this modified objective function provides a more comprehensive evaluation of the traffic simulation by considering not just volume and speed but also lane usage patterns and the statistical distributions of traffic characteristics, leading to a more robust optimization process that calibrates the simulation to better reflect real-world traffic behavior. The collected dataset was split into 1.5 h (frames 0-53,999) and 0.5 h (frames 54,000-71,999), representing the 75% calibration and 25% validation sets, respectively. After calibrating and validating the model, the default coefficients of the calibrated parameters in VISSIM were replaced with the identified coefficients to produce a simulation of the real-world scenario. The calibrated parameters for the Macro and Micro Models across all six study intersections are shown in Tables 4 and 5, respectively. The experimental study evaluates the accuracy of traffic volume simulations at the study intersections using the three models: the Default Model, the Macro Model, and the Micro Model. Observed volumes (veh/h) are compared with simulated volumes, and an accuracy percentage is calculated for each model. Table 6 presents the results of the observed versus VISSIM-simulated traffic volumes.
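The goodness-of-fit pieces of this objective can be sketched with SciPy and NumPy; the speed samples and hourly MOE values below are illustrative stand-ins, not the paper's data:

```python
import numpy as np
from scipy.stats import ks_2samp

# Sketch of the goodness-of-fit pieces of the calibration objective;
# all values below are illustrative stand-ins.
rng = np.random.default_rng(3)
obs_speed = rng.normal(38.0, 4.0, 1000)  # observed speed sample (mi/h)
sim_speed = rng.normal(38.5, 4.2, 1000)  # simulated speed sample (mi/h)

# Two-sample Kolmogorov-Smirnov test: a small p-value flags a statistically
# significant mismatch between the simulated and observed distributions.
stat, p = ks_2samp(obs_speed, sim_speed)

# Range-normalized RMSE over paired volume MOEs, scaled toward [0, 1] by the
# minimum and maximum observed values.
obs = np.array([1180.0, 1225.0, 1198.0, 1210.0])
sim = np.array([1172.0, 1231.0, 1190.0, 1205.0])
rmse = float(np.sqrt(np.mean((obs - sim) ** 2)))
nrmse = rmse / (obs.max() - obs.min())
```

Range normalization keeps the volume and speed error terms on comparable scales so neither MOE dominates the combined objective.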

• Default Model: The Default Model shows varying accuracy, with the highest accuracy of 99.7% at the 82nd Street and Slide Road intersection and the lowest accuracy of 91.6% at the 4th Street and Frankford Avenue intersection.

• Macro Model: The Macro Model consistently provides high accuracy, ranging from 99.3% to 99.9%, demonstrating its reliability across different intersections.

• Micro Model: The Micro Model also demonstrates high accuracy, with the lowest accuracy being 99.5% at the 50th Street and Quaker Avenue intersection and the highest accuracy of 99.9% at multiple intersections.

Similarly, the study evaluates the accuracy of mean speed simulations at the study intersections using the Default, Macro, and Micro Models. Observed mean speeds (mi/h) are compared with simulated speeds, and an accuracy percentage is calculated for each model. Results of the observed and simulated mean speeds are presented in Table 7. The Default Model shows less consistent accuracy compared to the Macro and Micro Models, indicating room for improvement. The Micro Model consistently provides the highest accuracy for both simulated volume and mean speed, making it the most reliable.
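One plausible form of the accuracy percentage behind these comparisons is the percent agreement between observed and simulated MOEs; the paper does not spell out its exact formula, so this is an assumption:

```python
# Percent agreement between an observed and a simulated MOE. This is one
# plausible form of the accuracy percentage; the paper's exact formula is
# not stated, so this is an assumption.
def accuracy_pct(observed, simulated):
    return 100.0 * (1.0 - abs(observed - simulated) / observed)

print(round(accuracy_pct(1200, 1190), 1))
```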

Conflict Extraction
The Surrogate Safety Assessment Model (SSAM) was used to extract simulated conflict data directly from the trajectory (.trj) files generated by PTV VISSIM. SSAM follows a four-step procedure to extract conflicts from simulated trajectory data. (i) Define the analysis area: first, SSAM reads the header information in the TRJ file to determine the dimensions of the analysis area, and then constructs a zone grid that completely covers this rectangular area. (ii) Project vehicle paths: for each vehicle within the analysis region, SSAM projects its expected location based on its current speed. This projection assumes that the vehicle continues traveling along its future path for a set duration, typically 1.5 s, using trajectory data for the next 10 s to perform the projection. (iii) Identify potential conflicts: SSAM calculates a rectangular perimeter that defines the location and orientation of each vehicle at its projected future position. This rectangle is then overlaid on the zone grid, and SSAM identifies which zones will contain at least part of the vehicle. For each zone containing the projected vehicle, SSAM checks for an overlap between that vehicle's rectangle and the rectangle of any other vehicle already present in the zone. (iv) Refine Time-to-Collision (TTC): for any identified overlapping vehicle pair, SSAM refines the initial TTC value by iteratively shortening the future projection timeline in tenths of a second. For each shortened timeframe, both vehicles are re-projected as before over successively shorter distances, allowing a more accurate TTC calculation for that specific time step. Finally, various surrogate safety measures are calculated and updated based on these refined TTC values [34]. SSAM classifies conflicts into three types: crossing, rear-end, and lane change. Because this study focuses on rear-end crashes at signalized intersections, rear-end conflicts were selected for further analysis. Rear-end conflict pairs extracted from each study intersection are presented in Table 8.
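The iterative TTC refinement in step (iv) can be sketched as follows. This is a simplified one-dimensional illustration: positions and speeds along the travel path, a fixed vehicle length, and a constant-speed projection are all assumptions here, whereas SSAM's actual implementation tests 2-D rectangle overlap on a zone grid.

```python
def refine_ttc(lead_pos, lead_v, foll_pos, foll_v,
               veh_len=4.5, horizon=1.5, step=0.1):
    """Iteratively shorten the projection horizon in 0.1 s decrements,
    re-projecting both vehicles each time, and keep the shortest horizon
    at which the projected footprints still overlap (the refined TTC).
    Positions/speeds are 1-D along the travel path; veh_len is an
    assumed vehicle length, not a SSAM constant."""
    ttc = None
    t = horizon
    while t > 0:
        lead_future = lead_pos + lead_v * t   # leader's projected rear
        foll_future = foll_pos + foll_v * t   # follower's projected rear
        # 1-D overlap test: follower's front reaches the leader's rear
        if foll_future + veh_len >= lead_future:
            ttc = t
        t = round(t - step, 10)
    return ttc  # None: no conflict within the projection horizon
```

Under these assumptions, a follower at 25 m/s closing on a leader at 5 m/s with a 25.5 m front-to-rear gap yields a refined TTC of 1.3 s, while a leader pulling away yields no conflict.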

Surrogate Safety Measures
There are several factors that could explain the differences in conflict counts across the study intersections. First, all six study intersections are four-leg intersections. Based on the signal timing sheets, all six intersections have the same cycle length of 130 s during the PM peak period. 82nd Street/Milwaukee Avenue and 82nd Street/Slide Road have dual left-turn lanes and therefore run protected left-turn phasing, while the others each have a single left-turn lane and run protected-permitted phasing during the PM peak commute period. Finally, the data collection results show that traffic volume varies significantly across the six study intersections, which can also explain the differences in conflict counts. For all rear-end conflict pairs extracted from SSAM, the corresponding MTTC value was used to calculate SDI and CSI. The Risk Exposure (RE) index and Risk Severity (RS) index were calculated using Equations (11) and (12) for every 1 s temporal window. Only the surrogate safety measures from the Micro Models are provided in this paper because they performed much better than the other two models, which is consistent with previous studies [39]. Figure 4 shows the distribution of normalized RS and RE indices obtained using Equations (14) and (15), respectively.
Normalized RE and RS indices are shown in the central plot area, while frequency distributions of the observations are shown in the marginal plots. The frequency distribution of the normalized RE index is concentrated in the range between 0 and 0.6, with the tail capturing the higher exposure indices above 0.8 to 1. In contrast, the distribution of the normalized RS index is concentrated in the lowest range between 0 and 0.45, with the tail capturing the higher severity indices above 0.5 to 1. The 15th percentile from the top of the distributions of normalized RE and normalized RS was used to evaluate the Rear-End Conflict Index (RECI) from Equation (13).
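The normalization and top-percentile screening described above can be sketched as below. The min-max normalization form and the synthetic index values are assumptions, and the combination rule of Equation (13) is not reproduced; only the top-15th-percentile flagging is shown.

```python
import numpy as np

def normalize(x):
    """Min-max normalization to [0, 1] (the form assumed for Eqs. 14-15)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def top_percentile_mask(values, top_pct=15):
    """Flag observations in the top `top_pct` percent of the distribution,
    i.e., at or above the (100 - top_pct)th percentile."""
    values = np.asarray(values, dtype=float)
    threshold = np.percentile(values, 100 - top_pct)
    return values >= threshold

# Synthetic per-window RE and RS indices (illustrative values only)
re_idx = normalize([0.2, 1.4, 0.9, 3.1, 0.5, 2.7, 0.1, 1.9])
rs_idx = normalize([0.1, 0.8, 0.3, 2.5, 0.2, 2.2, 0.05, 1.1])

# Windows extreme in BOTH exposure and severity feed the RECI (Eq. 13)
high_risk = top_percentile_mask(re_idx) & top_percentile_mask(rs_idx)
```

With these synthetic values, only the windows that are simultaneously in the top 15% of both exposure and severity are flagged, mirroring how stopped-but-exposed windows (high RE, low RS) are screened out.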

Rear-End Crash Prediction
Crash records of the past ten years (2014-2023) were collected from the Crash Records Information System (CRIS), the crash database of the Texas Department of Transportation (TxDOT). The annual crash frequencies were extracted for all six intersections within a radius of 328 feet, matching the detection range of the LiDAR sensor. The crash data were filtered by vehicle travel direction to extract the rear-end crashes. The result was further streamlined by hour of day to obtain only the crashes corresponding to the data collection period (4:00 PM to 6:00 PM). A negative binomial (NB) model was developed to estimate the expected number of crashes based on rear-end conflicts at hourly time intervals. The NB model has been used in previous crash-prediction studies because it accounts for the over-dispersion in crash data resulting from unobserved heterogeneity [40]. Negative Binomial models were developed with rear-end conflicts identified at the 95th percentile of the RECI index. The ten years of historic crash data and the rear-end conflicts were fitted into a Negative Binomial regression model to analyze the relationship between identified conflicts and crash frequency. Table 9 presents the results of the Negative Binomial models across the six study intersections.

The rear-end conflict count coefficient represents the effect of rear-end conflicts on the outcome: a positive coefficient indicates that an increase in rear-end conflicts is associated with an increase in crash frequency. The pseudo R-squared (Cox and Snell) measures the proportion of variation in the dependent variable explained by the model; higher values indicate a better fit. The p-value assesses the significance of the rear-end conflict count coefficient; p-values below 0.05 indicate statistically significant effects. A lower AIC indicates a better model fit after accounting for the number of predictors. The log-likelihood measures how well the model fits the data, with higher (less negative) values indicating a better fit, although it is not directly comparable across models due to potential scaling differences. Deviance is another measure of model fit related to the log-likelihood; it represents the difference between the fitted model and a saturated model whose predicted values perfectly match the observed data, and lower deviance values indicate a better fit. Here, all intersections have relatively low deviance values. For the intersections 34th/Indiana, 82nd/Milwaukee, 50th/Ave Q, 50th/Quaker, and 4th/Frankford, the rear-end conflict count is a statistically significant predictor (p-value < 0.05). For 82nd/Slide, the p-value is 0.05, indicating marginal significance. The model explains a substantial proportion of the variance in the dependent variable for all intersections, with pseudo R-squared values ranging from 0.2142 to 0.6515; the best fit is for 34th/Indiana (pseudo R-squared = 0.6515). The impact of rear-end conflicts varies by location: the coefficient is highest for 50th/Quaker (1.5677), indicating a strong effect, and lowest for 82nd/Slide (0.0165). Based on AIC values, the model for 4th/Frankford has the best fit (AIC = 16.669), suggesting that it balances model complexity and goodness of fit most effectively.
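Because the NB model uses a log link, a coefficient translates into a multiplicative effect on the expected crash count. A minimal worked example using the reported 34th/Indiana estimates:

```python
import math

# Reported 34th/Indiana estimates (from the fitted NB model):
# intercept = -2.2513, rear-end conflict coefficient = 0.2214
intercept, beta = -2.2513, 0.2214

def expected_crashes(conflicts):
    """Expected hourly crash count under the fitted log-linear mean."""
    return math.exp(intercept + beta * conflicts)

# Each additional conflict multiplies the expected count by exp(beta)
rate_ratio = math.exp(beta)  # about 1.248, i.e., ~24.8% more crashes
```

This is why a small-looking coefficient such as 0.0165 (82nd/Slide) still implies a positive, if weak, multiplicative effect.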

Discussion
The distribution of the RE indices indicates many instances of high risk exposure (up to 0.6) across the six study intersections, whereas the corresponding RS indices are concentrated in the lowest range between 0 and 0.45. This reflects scenarios at intersection approaches where exposure is high because vehicles decelerate or stop in response to traffic signals, but the severity of such conflicts is very low due to reduced or zero speeds. Therefore, extreme values of RE and RS were selected to capture instances that represent dangerous driving behaviors: the top 15th percentile values of RE and RS were used to estimate the Rear-End Conflict Index (RECI) in each 1 s temporal window. Negative Binomial models were then developed with rear-end conflicts identified at the 95th percentile of the RECI, and the ten years of historic crash data and the rear-end conflicts were fitted into a Negative Binomial regression model to analyze the relationship between identified conflicts and crash frequency.

34th Street and Indiana Avenue
The positive coefficient for the rear-end conflict count (0.2214) suggests that an increase in the rear-end conflict count is associated with an increase in the expected Crash Count: for each unit increase in the rear-end conflict count, the log of the expected Crash Count increases by 0.2214. The intercept value of −2.2513 is the log of the expected Crash Count when the rear-end conflict count is zero. The model's pseudo R-squared value of 0.6515 indicates that the model explains approximately 65.15% of the variance in the Crash Count. The rear-end conflict count is a highly significant predictor of the Crash Count (p-value < 0.001). The AIC value of 28.4242 provides a measure of the model's goodness of fit.

82nd Street and Milwaukee Avenue
The coefficient for the rear-end conflict count (0.0423) is positive: for each unit increase in the rear-end conflict count, the log of the expected Crash Count increases by 0.0423. The intercept value is −2.2524. The model's pseudo R-squared value of 0.5496 indicates that the model explains approximately 54.96% of the variance in the Crash Count. The rear-end conflict count is a highly significant predictor of the Crash Count (p-value < 0.001).

82nd Street and Slide Road
The coefficient for the rear-end conflict count (0.0165) suggests a positive relationship with the Crash Count, but its statistical significance is marginal (p-value = 0.053). The intercept value is −1.0561. The model's pseudo R-squared value of 0.2142 indicates a relatively weak fit compared to the previous models, with only about 21.42% of the variance in the Crash Count explained.

50th Street and Avenue Q
The coefficient for the rear-end conflict count is 0.6979. The intercept value is −1.7797. The model's pseudo R-squared value of 0.3768 suggests that the model explains approximately 37.68% of the variance in the Crash Count. The rear-end conflict count is a highly significant predictor of the Crash Count (p-value < 0.001).

50th Street and Quaker Avenue
The coefficient for the rear-end conflict count (1.5677) suggests that an increase in the rear-end conflict count is associated with a substantial increase in the expected Crash Count. The intercept value is −0.8329. The model's pseudo R-squared value of 0.3540 suggests that the model explains approximately 35.40% of the variance in the Crash Count. The conflict count is a statistically significant predictor of the Crash Count (p-value = 0.018).

4th Street and Frankford Avenue
The coefficient of the rear-end conflict count is 0.4993, with an intercept value of −4.7538. The model's pseudo R-squared value of 0.5834 indicates a strong fit, with the model explaining approximately 58.34% of the variance in the Crash Count. The rear-end conflict count is a statistically significant predictor of the Crash Count (p-value = 0.004).

Conclusions
This study implements a framework for integrating vehicle trajectory data obtained from LiDAR sensors into VISSIM models and surrogate safety assessment models to predict rear-end crashes at urban signalized intersections under heterogeneous traffic conditions. Vehicle trajectory data were extracted from LiDAR point clouds collected at six urban signalized intersections in Lubbock, Texas, in the USA. Each study intersection was modeled with PTV VISSIM and calibrated to replicate the observed field scenarios. Rear-end conflicts extracted from the calibrated models, combined with a ten-year historical crash dataset, were fitted into a Negative Binomial (NB) model to estimate the model's parameters. The model offers valuable insights for implementing traffic control and safety improvement strategies at intersections. The different aspects of the model can be applied as follows:
i. Identifying High-Risk Intersections: By analyzing the coefficient values for each intersection, authorities can prioritize locations with a stronger correlation between rear-end conflicts and crashes. Intersections like 50th/Quaker (high coefficient) warrant immediate attention compared to those with a lower impact (e.g., 82nd/Slide).
ii. Targeting Safety Measures: The model highlights rear-end conflicts as a significant factor in crashes, which allows for targeted safety measures that address these conflicts. Examples include the following:
• Improved Signal Timing: Optimizing traffic light timing can reduce sudden stops and improve following distances, potentially decreasing rear-end conflicts.
• Advanced Warning Signs: Flashing yellow arrows or countdown timers before red lights can warn drivers and encourage them to adjust their speed, reducing rear-end risks.
iii. Data-Driven Decision Making: The model can be used as a tool for the monitoring and evaluation of implemented safety measures. By analyzing changes in rear-end conflict counts after implementing interventions, authorities can assess the effectiveness of the strategies and make necessary adjustments.

Figure 1 .
Figure 1. Workflow of the methodology.

Figure 2 .
Figure 2. Parts and dimensions of a Velodyne VLP-32C LiDAR sensor.

Figure 3 .
Figure 3. (a) A VLP-32 LiDAR sensor mounted on a traffic signal pole. (b) Raw 3D LiDAR sensor data visualized using Veloview. (c) Vehicle trajectories extracted from LiDAR sensor data. (d) A smoothed trajectory after removing noise.
Own (MaxDecelOwn): Maximum deceleration for a vehicle when changing lanes, determined by its route in proximity to the emergency stop position.
Trailing vehicle (MaxDecelTrail): Maximum deceleration for the trailing vehicle on the new lane of a changing vehicle, influenced by its route in close proximity to the emergency stop position.
−1 m/s² per distance:
Own (DecelRedDistOwn): The distance over which the accepted deceleration for a vehicle decreases linearly by 1 m/s² from the maximum deceleration when changing lanes.
Trailing vehicle (DecelRedDistTrail): The distance over which the accepted deceleration for the trailing vehicle in the new lane of a changing vehicle is linearly reduced by 1 m/s² from the maximum deceleration.
Accepted deceleration:
Own (AccDecelOwn): The deceleration accepted at any given time for a vehicle when changing lanes.
Trailing vehicle (AccDecelTrail): The deceleration accepted at any given time for the trailing vehicle in the new lane of a changing vehicle.
Safety distance reduction factor (lane change) (SafeDistRedFact): The safety distance reduction factor is taken into account for each lane change.
where SDI = Stopping Distance Index; SD_l = stopping distance of the leading vehicle; SD_f = stopping distance of the following vehicle; ν_l = velocity of the leading vehicle; ν_f = velocity of the following vehicle; P_r = perception-reaction time; D_max = maximum available deceleration rate; h_lf = headway between the leading vehicle and the following vehicle.
where MTTC = Modified Time to Collision; a_l = acceleration of the leading vehicle; a_f = acceleration of the following vehicle.
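The SDI and MTTC definitions above can be sketched as follows. The stopping-distance form, the SDI sign convention (negative flagging a potential conflict), and the default parameter values are common formulations from the surrogate-safety literature, not values stated in this paper.

```python
import math

def stopping_distance(v, pr=1.5, d_max=3.4):
    """Stopping distance: perception-reaction travel plus braking distance.
    pr (s) and d_max (m/s^2) are assumed defaults, not from the paper."""
    return v * pr + v ** 2 / (2 * d_max)

def sdi(v_lead, v_foll, headway):
    """One common Stopping Distance Index formulation: the leader's
    stopping distance plus the headway, minus the follower's stopping
    distance. Negative values flag a potential rear-end conflict."""
    return stopping_distance(v_lead) + headway - stopping_distance(v_foll)

def mttc(v_lead, v_foll, a_lead, a_foll, gap):
    """Modified Time to Collision: smallest positive root of
    0.5*da*t^2 + dv*t - gap = 0, with dv = v_f - v_l and da = a_f - a_l,
    so that accelerations of both vehicles are accounted for."""
    dv = v_foll - v_lead
    da = a_foll - a_lead
    if abs(da) < 1e-9:                  # constant relative speed
        return gap / dv if dv > 0 else math.inf
    disc = dv ** 2 + 2 * da * gap
    if disc < 0:
        return math.inf                 # the vehicles never meet
    roots = [(-dv + s * math.sqrt(disc)) / da for s in (1, -1)]
    positive = [t for t in roots if t > 0]
    return min(positive) if positive else math.inf
```

With zero accelerations, MTTC reduces to the classic gap-over-closing-speed TTC, which is the consistency check used below.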

Figure 4 .
Figure 4. (a) 34th/Indiana; (b) 82nd/Milwaukee; (c) 82nd/Slide; (d) 50th/Avenue Q; (e) 50th/Quaker; (f) 4th/Frankford. Normalized RE and RS indices are shown in the central plot area, while frequency distributions of the observations are shown in the marginal plots.

Table 1 .
Sample of processed vehicle trajectories.

Table 4 .
Calibrated parameters for Macro Model.

Table 5 .
Calibrated parameters for Micro Model.

Table 6 .
Observed versus simulated traffic volumes for the study intersections.

Table 7 .
Observed mean speeds versus simulated mean speeds.
• Default Model: The Default Model shows moderate accuracy, with the highest accuracy of 88.5% at the 50th Street and Quaker Avenue intersection and the lowest accuracy of 69.2% at the 34th Street and Indiana Avenue intersection.
• Macro Model: The Macro Model has improved accuracy over the Default Model, ranging from 75.4% to 92.3%, indicating better performance across different intersections.
• Micro Model: The Micro Model demonstrates the highest accuracy among the models, with the lowest accuracy being 80.8% at the 50th Street and Avenue Q intersection and the highest accuracy of 97.4% at the 82nd Street and Slide Road intersection.
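The accuracy percentages can be reproduced with a simple relative-error metric. The exact accuracy formula is not stated in the paper, so the form below is an assumption.

```python
def calibration_accuracy(observed, simulated):
    """Percent accuracy as 100 * (1 - |simulated - observed| / observed).
    This relative-error form is an assumed metric for comparing observed
    and simulated mean speeds; the paper does not state its formula."""
    return 100 * (1 - abs(simulated - observed) / observed)

# e.g., an observed mean speed of 40 mph simulated as 37 mph
# gives 100 * (1 - 3/40) = 92.5% accuracy
```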

Table 8 .
Extracted rear-end conflicts at the study intersections.

Table 9 .
Negative Binomial Model results for the study intersections.