A Novel Traffic Speed Prediction for Road Weaving Sections: Incorporating Traffic Flow Characteristics

Weaving sections on roads are crucial areas with high concentrations of mandatory lane changes, which can increase the likelihood of traffic accidents. Speed is a key factor in determining traffic safety, and an accurate speed prediction method is essential for improving safety in weaving sections. While current methods are effective in predicting speeds in straightforward situations, they face challenges in more complex scenarios such as weaving sections. This study presents a refined traffic speed prediction approach specifically designed for weaving sections to tackle this issue. Initially, novel variables were formulated to capture the unique traffic flow attributes of weaving sections, which distinguish them from standard road segments. Subsequently, supplementary empirical variables that are known to impact speed were incorporated. We conducted a variable importance assessment to ascertain the extent and direction of each variable's contribution. Lastly, variables with significant positive effects were chosen as inputs for three machine learning algorithms: Random Forest (RF), Backpropagation Neural Network optimised with Genetic Algorithm (BPNN-GA) and Support Vector Regression (SVR). The method was evaluated using aerial footage from five distinct weaving sections in China, achieving a prediction error of approximately 3 km/h. In addition, the study finds that the speed distribution in weaving sections is negatively correlated with the number of lane changes. Vehicles decelerate at both the on-ramp and the off-ramp, with more significant deceleration at the on-ramp. Speed is significantly affected by short length, number of lanes and proportion of large vehicles. The proposed method can be embedded into intelligent traffic systems to support safe speeds for autonomous vehicles in weaving sections. Reconstructing spatiotemporal patterns of traffic congestion, predicting traffic accidents and implementing active traffic management strategies in weaving sections could be investigated in the future.


INTRODUCTION
The road infrastructure throughout the world continues to expand year by year. By the end of 2022, the total road mileage in the United States had reached 6.6 million kilometres, while China's road network had surpassed 5.5 million kilometres. Alongside this expansion, there has been a notable increase in interchange construction at the crucial intersections of road infrastructure, where interchanges play a vital role in segregating and directing traffic flows. In recent years, cloverleaf configurations have gained increasing popularity in these interchanges, primarily due to land use constraints. This research focuses on weaving sections.
Many methods have been employed for predicting traffic speeds. The most classical approach is the Lighthill-Whitham-Richards (LWR) model, which requires various types of data as inputs, including traffic speed, traffic density and traffic flow data [6]. In recent times, there has been growing interest in methods that rely solely on speed data, which can be categorised into statistical technique methods, data-driven methods and hybrid methods [7,8]. The first category contains methods that utilise small sample data and traffic flow theory to investigate the pattern of traffic speed change. Notable techniques in this category include the Autoregressive Integrated Moving Average (ARIMA) [9,10] and the Hidden Markov Model (HMM) [11]. The second category is data-driven methods, which primarily include machine learning and deep learning methods such as Support Vector Machines (SVM) [12], Recurrent Neural Networks (RNN) [13], Convolutional Neural Networks (CNN) [14] and others. Hybrid approaches often combine data-driven methods with theoretical traffic flow models. Recent approaches include Physics-Informed Neural Networks (PINN) [15] and a hybrid method that combines Stacked Autoencoder (SAE) with Long Short-Term Memory (LSTM) [16].
However, these methods concentrate on data fitting and often lack a deep exploration of the underlying causes of speed changes, resulting in models with limited interpretability. Additionally, these methods are trained on extensive traffic data with spatial variations, which in theory makes them applicable to any road location. Nevertheless, in complex traffic environments, especially weaving sections, they often exhibit limited prediction accuracy [17,18]. To address these limitations, this paper proposes a novel method that leverages the traffic flow characteristics of weaving sections. This method has the following key features:
1) Refined and targeted: Previous studies have often overlooked the influence of spatial heterogeneity in the data on prediction outcomes. As a result, the resulting "generic" prediction methods for traffic speeds are broadly applicable across entire roads but tend to exhibit reduced accuracy in specific complex spatial locations [19,20]. However, these locations play a particularly crucial role in improving traffic safety. To address this, we refine the generalised speed prediction model and construct a specialised model tailored for weaving sections to improve prediction accuracy there.
2) Enhanced interpretability: Previous studies on speed prediction have often relied on data-driven methods, such as machine learning and deep learning, to analyse speed change patterns and trends, with a primary focus on accurately fitting the data rather than uncovering the underlying causes of speed variation. In this study, we take a novel approach by drawing on traffic flow theory and the actual speed distribution within weaving sections. We introduce three distinct types of input variables designed to capture the essence of traffic flow in weaving sections. Furthermore, our research demonstrates the extent to which these variables contribute to variations in traffic speed. This allows us to explain the main factors driving changes in traffic speed across various weaving section configurations.
3) Reliability and wide applicability: We validated the reliability of our method using real aerial video data. Many prior studies have relied on the Next Generation Simulation (NGSIM) dataset, collected by the Federal Highway Administration (FHWA) of the United States. However, this dataset may not accurately reflect the conditions on Chinese roads, given disparities in route design standards, speed limits and driving rules. In our study, we conducted fieldwork to capture aerial videos from four weaving sections situated in Xi'an, China. We then extracted vehicle trajectory data for model validation. Furthermore, we compared our approach with the classical Highway Capacity Manual, 6th edition (HCM6) method and a state-of-the-art method to demonstrate the reliability and applicability of our method for weaving sections in western China.
The paper is structured as follows. Section 2 summarises related work and highlights the contributions of this paper. Section 3 details the proposed traffic speed prediction method for weaving sections. Section 4 validates the effectiveness of the proposed method by comparing the speeds predicted by the proposed method, the classical HCM6 method and the state-of-the-art LSTM, using collected real speed data as a benchmark. Section 5 discusses the findings and offers suggestions for future research.

LITERATURE REVIEW
This section reviews previous traffic speed prediction methods, categorised into three groups: (i) statistical technique methods, (ii) data-driven methods, and (iii) hybrid methods.
Statistical technique methods, such as ARIMA [9,10], HMM [11] and Kalman Filter [21,22], are primarily suited for scenarios with limited data and simple traffic conditions [23]. For example, ARIMA makes the unrealistic assumption that the traffic states follow a stable process with constant mean, variance and autocorrelation values, which does not align with real-world complexities. In addition, the HCM6 method constructs a regression model to predict traffic speed in weaving sections, but its accuracy has been demonstrated to be insufficient [24]. Given this, several studies have attempted to enhance its predictive performance. Xu et al. [25] proposed a new speed prediction method that converges to the basic road section under low weaving flow or substantial weaving lengths, surpassing the HCM6 method. Sun et al. [26] proposed a segmented and modified model for speed prediction in weaving sections, employing a genetic algorithm for parameter calibration. The results showed a total error of less than 10%, with an average prediction error of 9.52% for weaving traffic speed and 6.64% for non-weaving traffic speed. However, the predictive accuracy of statistical technique methods is relatively low, since they struggle to capture non-linear variations in traffic speed.
Data-driven methods, including machine learning and deep learning, have become prevalent in estimating traffic speed, thanks to advancements in Artificial Intelligence (AI) and traffic data science. These methods are well-suited to the wealth of traffic data available and excel at capturing non-linear relationships, effectively addressing the limitations of statistical technique methods. Key data-driven methods for traffic speed prediction include LSTM, CNN and SVM, among others. LSTM has gained popularity for its ability to accurately capture non-linear traffic states and forecast speed based on time series [27,28]. However, it focuses mainly on the temporal dimension of data while ignoring the spatial dimension, which can be taken into account by CNN. CNN has become a common method for predicting the traffic speed in road networks. Using a two-dimensional spatio-temporal matrix, it transforms traffic states into images and leverages its powerful spatial feature extraction capability to predict traffic speeds [29,30]. Ma et al. [31] compared this method against three deep learning methods (SAE, RNN and LSTM) as well as four widely used methods: Ordinary Least Squares (OLS), K-Nearest Neighbor (KNN), Artificial Neural Network (ANN) and RF. Their model outperformed the others, improving the average accuracy by 42.91% while maintaining a reasonable execution time. Wang et al. [32] designed a recursive convolutional neural network structure with error feedback, considering the influence of nearby road sections and uncovering the implicit linkages between road sections. Ke et al. [33] proposed a two-stream multi-channel convolutional neural network (TM-CNN) to account for the influence of traffic flow on speed.
Several studies have shown that hybrid methods can improve prediction accuracy. Hybrid methods either combine statistical techniques with data-driven methods or fuse two data-driven methods [8,34]. For instance, Tan et al. [35] combined ARIMA with ANN for short-term traffic speed prediction, taking advantage of ARIMA's robust linear fitting ability and ANN's powerful nonlinear mapping ability. Tian et al. [16] combined SAE and LSTM for traffic flow prediction, using SAE to capture spatial features and LSTM to extract temporal features of traffic flow. Hybrid methods may demonstrate better prediction accuracy and stability than individual methods [13,36].
However, data-driven and hybrid methods prioritise data fitting at the expense of understanding the true causes behind changes in traffic speeds, resulting in low interpretability. These methods are typically generic, designed for basic road segments [17,18] and road networks [19,20], and use spatio-temporally mixed traffic data (usually the NGSIM dataset) without considering the spatial heterogeneity of the data. Consequently, they tend to perform poorly in specific spatial locations such as weaving sections, which have complex traffic flow characteristics. Furthermore, few studies have examined traffic speed prediction in road weaving sections using data-driven methods.
To address these gaps, this paper proposes a novel traffic speed estimation method tailored for weaving sections. Our work contributes in the following ways. First, a refined model specially designed for speed estimation in road weaving sections is developed. Second, the spatial speed distribution in weaving sections at various geographic locations is analysed. Three types of variables are designed to capture the uniqueness of weaving sections compared to regular road sections. They are designed from the perspectives of lane-changing behaviour, distance of vehicles from the weaving section's on/off-ramps and spatial density distribution of vehicles. The individual contributions of these variables to the estimated speed are quantitatively assessed. Additionally, RF, BPNN-GA and SVR are initially considered as candidate main algorithms for the method. Third, this study collected aerial videos of weaving sections in various Chinese locations, amassing approximately 600,000 high-precision vehicle trajectory data points for method validation. Finally, the method's performance is compared with LSTM, a common choice in recent speed prediction research, and the traditional HCM6 method. The results show that our method outperforms the other methods and adapts effectively to various weaving section configurations and geographical locations while providing insightful explanations for speed variations.

SPEED PREDICTION METHOD FOR WEAVING SECTIONS
The method comprises the following main steps: designing three types of unique variables to characterise weaving section traffic patterns, creating a database of speed-influencing variables, assessing their significance, and selecting a preferred speed prediction regression algorithm. Refer to Figure 1 for a visual representation of the procedure.

1) Designing three types of unique variables for weaving sections
These variables are designed from the following perspectives: lane-changing manoeuvre, vehicle and on/off-ramps positional relativity, and vehicle space distribution. We consider four headways that describe lane-changing manoeuvres, measured between the lane-changing vehicle and the adjacent front and rear vehicles before and after the lane change. We analyse the distance between vehicles and the weaving section on/off-ramps, indicating their positional relativity. We assess longitudinal and transversal vehicle spacings and characterise the spatial distribution using the mean and standard deviation of distances between vehicles along a given direction.

2) Building a speed-influencing variables database
The database for speed-influencing variables in weaving sections is enriched with broader empirical variables, in addition to the three types of uniquely designed variables. These empirical variables, including traffic volumes, weaving volumes, number of lane changes, large vehicle proportions and short length (the distance in feet between the endpoints of any barrier markings (solid white lines) that prohibit or discourage lane changing), are sourced from the classical HCM6 method.

3) Assessing variable significance
To streamline method complexity and enhance operational efficiency, we assess variable significance using the RF algorithm, identifying and retaining those variables in the speed-influencing variables database that contribute most significantly to speed variation.

4) Selecting the optimal algorithm
Three prediction algorithms have been chosen: RF, BPNN-GA and SVR. The best-performing algorithm among them is used as part of this method. Different input data may suit different algorithms, so considering several candidates improves the prediction accuracy of the method. Running time is also considered: no further algorithms were included, to avoid long running times that would compromise the real-time output of results. Therefore, only three algorithms were selected.

Design of three types of unique variables
This research explores the lane-changing manoeuvre, vehicle and on/off-ramp positional relativity, and vehicle space distribution in weaving sections to describe their complex traffic features. The reasons for this exploration are as follows.
Based on theoretical knowledge and extensive video observations of weaving sections, we hypothesise that lane-changing behaviour within these sections influences the variation of spatial average speed. To test this hypothesis, we collected data on the spatial average speed and the frequency of vehicle lane changes within the weaving sections, as illustrated in Figure 2. Furthermore, the analysis was extended to three other geographically distinct weaving sections in addition to the one depicted in Figure 2. The speed and lane-changing distributions within these sections are consistent with the patterns illustrated in Figure 2. In this study, we define the "on-ramp" and "off-ramp" as the longitudinal spatial reference points for each weaving section, dividing the road between the pair of reference points into six segments. Each segment is treated as a spatial unit, and its length is stored as the spatial unit length. Beyond the two reference points, additional segments are created on both sides, each one spatial unit length long. Both the average spatial speed and the number of lane changes shown in Figure 2 are aligned with the spatial units along their longitudinal positions. From the graphical representation, we make an initial qualitative assessment that the spatial distribution pattern of lane changes in a weaving section is nearly opposite to the pattern of spatial speed distribution. The number of lane changes in the first half of the section exceeds that in the second half, with a peak occurring near the first third of the on-ramp. Therefore, the first variable type introduced is termed "Lane-Changing Manoeuvre Variables".
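The spatial-unit segmentation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `unit_index` and `aggregate_by_unit` are hypothetical, positions are assumed to be longitudinal coordinates in metres, and the per-unit speed is taken as a plain arithmetic mean of the samples falling in that unit.

```python
import math
from collections import defaultdict

def unit_index(y, y_on, y_off, n_units=6):
    """Index of the spatial unit containing longitudinal position y.

    Units 0..n_units-1 lie between the on-ramp (y_on) and off-ramp (y_off)
    reference points. Negative indices are upstream of the on-ramp and
    indices >= n_units are downstream of the off-ramp, all with the same
    unit length.
    """
    unit_len = (y_off - y_on) / n_units
    return math.floor((y - y_on) / unit_len)

def aggregate_by_unit(samples, lane_change_positions, y_on, y_off):
    """Mean speed and lane-change count per spatial unit.

    samples: iterable of (y, speed) observations (hypothetical input format)
    lane_change_positions: iterable of y positions where a lane change occurred
    """
    speeds = defaultdict(list)
    for y, v in samples:
        speeds[unit_index(y, y_on, y_off)].append(v)
    changes = defaultdict(int)
    for y in lane_change_positions:
        changes[unit_index(y, y_on, y_off)] += 1
    return {u: (sum(vs) / len(vs), changes[u]) for u, vs in speeds.items()}
```

Plotting the two values of each unit against the unit index gives the kind of side-by-side speed and lane-change profile that the text describes for Figure 2.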
Figure 2 clearly shows that vehicles decelerate at both the on-ramp and off-ramp locations of a weaving section, with more significant deceleration at the on-ramp than at the off-ramp. This observation implies that vehicles exhibit specific acceleration and deceleration patterns as they approach these ramp points. Consequently, the second variable type introduced is termed "Vehicles and On/Off-Ramps Positional Relativity Variables".
The longitudinal cross-section speeds exhibit a distinctive "high-low-high" pattern. Previous research has demonstrated significant variations in the speeds of multi-lane vehicles across the road cross-section [37]. This finding suggests a correlation between speed values and their spatial positions within a weaving section. Consequently, "Vehicle Spatial Distribution Variables" is regarded as the third variable type.

Design of Lane-Changing Manoeuvre Variables
In this paper, headways are employed to illustrate the characteristics of lane-changing manoeuvres. We chose headway over gap because, in our experiments on lane-changing behaviour, headway performed better than the other measures we tried. Additionally, headway contains more information than the gap: it reflects both the gap and the speed of the rear vehicle. These headways are associated with up to five vehicles: the lane-changing vehicle itself, the two vehicles directly ahead of and behind the lane-changing vehicle in the current lane, and the two vehicles in the adjacent lane, one ahead and one behind, after the lane-changing manoeuvre. For each lane-changing manoeuvre, there are therefore up to four headways, one between the lane-changing vehicle and each of the other four vehicles. Theoretically, when there is no front or rear vehicle, the headway should be determined by considering the weaving section's maximum length and minimum speed. However, for this study, a zero value is assigned to these cases to assess the impact of this variable. Traffic risk increases as the headway between vehicles decreases. Lane-changing manoeuvres commence with the emergence of a need to change lanes and conclude with the selection of an appropriate gap for the lane change. These four headways effectively characterise the lane-changing risk for vehicles within weaving sections, describing the "vehicle Lane-Changing Manoeuvre". Consequently, the first variable type is represented by these headways, as depicted in Figure 3. The associated equation is as follows:
$$H_{i,j} = \frac{L_{i,j}}{V_{i,j}}, \quad i \in \{p, t\}, \quad j \in \{flc, rlc\}$$
where $H_{i,j}$ refers to the headway of the $j$ vehicles in the $i$ lane, $i=p$ refers to the present lane, $i=t$ refers to the target lane, $j=flc$ refers to the front vehicle and the lane-changing vehicle, $j=rlc$ refers to the rear vehicle and the lane-changing vehicle, $L_{i,j}$ refers to the distance between the $j$ vehicles in the $i$ lane, and $V_{i,j}$ refers to the speed of the $j$ vehicles in the $i$ lane.
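As a rough illustration of how the four headways might be computed from trajectory data, the sketch below assumes that each headway is a distance divided by a speed and that the relevant speed is that of the following vehicle in each pair (consistent with the remark that headway carries the speed of the rear car). The function names, the dictionary layout and the choice of follower speed are assumptions made for this example; missing neighbours receive a zero value, as stated in the text.

```python
def headway(distance_m, follower_speed_mps):
    """Time headway in seconds; 0.0 when the pair is missing or stationary."""
    if distance_m is None or follower_speed_mps is None or follower_speed_mps <= 0:
        return 0.0
    return distance_m / follower_speed_mps

def lane_change_headways(lc, neighbours):
    """Four headways for one lane-changing manoeuvre.

    lc: dict with 'y' (longitudinal position, m) and 'v' (speed, m/s)
        of the lane-changing vehicle.
    neighbours: dict keyed by (lane, position) with lane in {'p', 't'} and
        position in {'front', 'rear'}; each value is a dict with 'y' and 'v',
        or the key is absent when no such vehicle exists.
    """
    out = {}
    for lane in ("p", "t"):
        front = neighbours.get((lane, "front"))
        rear = neighbours.get((lane, "rear"))
        # Front pair: the lane-changing vehicle follows the front vehicle.
        out[f"H_{lane},flc"] = headway(front["y"] - lc["y"], lc["v"]) if front else 0.0
        # Rear pair: the rear vehicle follows the lane-changing vehicle.
        out[f"H_{lane},rlc"] = headway(lc["y"] - rear["y"], rear["v"]) if rear else 0.0
    return out
```

For example, a lane-changing vehicle at 100 m travelling at 10 m/s with a front vehicle at 130 m in its present lane has a present-lane front headway of 3 s under these assumptions.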

Design of Vehicles and On/Off-Ramps Positional Relativity Variables
As vehicles may accelerate and decelerate when approaching on/off-ramps, vehicle speeds are correlated with the positions of vehicles relative to the on/off-ramps. In this paper, the average distance between all vehicles and the on-ramp characterises the association between vehicles and on-ramps, while the average distance between all vehicles and the off-ramp characterises their association with off-ramps:
$$\bar{L}_{on-ramp} = \frac{1}{n}\sum_{i=1}^{n} L_{i,on-ramp}$$
$$\bar{L}_{off-ramp} = \frac{1}{n}\sum_{i=1}^{n} L_{i,off-ramp}$$
where $\bar{L}_{on-ramp}$ refers to the mean distance between all vehicles and the small nose point of the solid white line on the on-ramp during the same time interval, $\bar{L}_{off-ramp}$ refers to the corresponding mean distance for the off-ramp, $L_{i,on-ramp}$ refers to the distance between the $i$ vehicle and the small nose point of the solid white line on the on-ramp, $L_{i,off-ramp}$ refers to the distance between the $i$ vehicle and the small nose point of the solid white line on the off-ramp, and $n$ refers to the number of vehicles during the same time interval.

Design of Vehicle Spatial Distribution Variables
The spatial distribution of vehicles during the same time interval also impacts traffic speed. Two pairs of variables represent the spatial distribution of vehicles in the longitudinal (y) and lateral (x) directions, as depicted in Figure 3. $\bar{D}_y$ denotes the mean distance between vehicles along the longitudinal (y) direction of the road during the same time interval, and $S_y$ represents the degree of dispersion in the distribution of all vehicles along the y direction during the same time interval. A larger $\bar{D}_y$ indicates greater dispersion of vehicles in the y direction, while a higher $S_y$ signifies a more disorganised vehicle distribution in the y direction. Similarly, the transverse distribution is represented by $\bar{D}_x$ and $S_x$. Figures 4a and 4b show uniform and uneven longitudinal distributions, respectively. Figures 4c and 4d show uniform and uneven transverse distributions, respectively. Formally, $\bar{D}_y$ is the mean distance between adjacent front and rear vehicles after all vehicles are arranged longitudinally during the same time interval, and $S_y$ is the standard deviation of those distances. Similarly, $\bar{D}_x$ is the mean distance between adjacent left and right vehicles after all vehicles are arranged transversely during the same time interval, and $S_x$ is the standard deviation of those distances.
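The definitions above reduce to sorting vehicle positions along one axis and taking the mean and standard deviation of the gaps between neighbours. The following is a minimal sketch under stated assumptions: the function name `gap_stats` is hypothetical, positions are coordinates along a single axis within one time interval, and the sample standard deviation (divisor of gaps minus one) is used because the text does not say which form applies.

```python
import statistics

def gap_stats(positions):
    """Mean and standard deviation of gaps between adjacent vehicles.

    positions: coordinates of all vehicles along one axis (x or y) in the
    same time interval. Returns (mean gap, standard deviation of gaps).
    """
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        return 0.0, 0.0
    mean_gap = statistics.fmean(gaps)
    # Sample standard deviation is an assumption; it needs at least two gaps.
    sd_gap = statistics.stdev(gaps) if len(gaps) > 1 else 0.0
    return mean_gap, sd_gap
```

Calling `gap_stats` on y coordinates gives the longitudinal pair and on x coordinates the lateral pair. Two layouts spanning the same length, such as positions 0, 10, 20, 30 and 0, 2, 4, 30, share the same mean gap but differ in standard deviation, which is the distinction between the uniform and uneven cases in Figure 4.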

Construction of a speed-influencing variable database
The speed-influencing variable pool consists of both empirical variables and designed variables. Empirical variables are derived from HCM6's Weaving Section Model. In HCM6, traffic speeds within the weaving section comprise weaving speeds and non-weaving speeds. Weaving speeds are calculated from two expected speed limits, the short length and the lane change rate. The lane change rate, in turn, depends on the minimum lane change rate, the short length and the number of lanes. The minimum lane change rate is related to the weaving flow and its corresponding minimum lane change time. The non-weaving speed, on the other hand, is affected by the minimum lane change rate, free-flow speed, total flow and number of lanes. Therefore, the empirical variables are determined to be traffic volume, weaving volume, lane change time, number of lanes and short length of the weaving section. Furthermore, a component accounting for the proportion of large vehicles has been introduced to consider the impact of large vehicles on speed. The complete set of variables is presented in Table 1.

Short length, $L_s$: Distance between the solid white lines at the ends of the weaving section that prohibit lane-changing.

Designed variables

Lane-changing manoeuvre
$H_{t,flc}$: Headway of the front vehicle and the lane-changing vehicle in the target lane.
$H_{t,rlc}$: Headway of the rear vehicle and the lane-changing vehicle in the target lane.
$H_{p,flc}$: Headway of the front vehicle and the lane-changing vehicle in the present lane.
$H_{p,rlc}$: Headway of the rear vehicle and the lane-changing vehicle in the present lane.

Vehicles and on/off-ramps positional relativity
$\bar{L}_{on-ramp}$: The average distance between all vehicles and the small nose points of the solid white lines on the on-ramps during the same time interval.
$\bar{L}_{off-ramp}$: The average distance between all vehicles and the small nose points of the solid white lines on the off-ramps during the same time interval.

Vehicle spatial distribution
$\bar{D}_x$: The average lateral (x-axis) gap between adjacent vehicles during the same time interval.
$\bar{D}_y$: The average longitudinal (y-axis) gap between adjacent vehicles during the same time interval.
$S_x$: The standard deviation of the lateral (x-axis) gaps between adjacent vehicles during the same time interval.
$S_y$: The standard deviation of the longitudinal (y-axis) gaps between adjacent vehicles during the same time interval.
Evaluation of variable importance
A dataset frequently contains hundreds or thousands of features. Selecting only those that substantially affect the outcome can reduce model complexity while retaining high prediction accuracy. Typical techniques include RF and Principal Component Analysis, among others. This study uses RF to rank variable significance.
VIM is the variable importance score and GI is the Gini coefficient. Suppose there are $J$ features $X_1, X_2, X_3, \ldots, X_J$, $I$ decision trees and $C$ categories. The Gini-based score $VIM_j^{(Gini)}$ of each feature $X_j$ is the average change in node split impurity attributable to feature $j$ across all decision trees in the random forest.
The formula to calculate the Gini coefficient of node $q$ of the $i$ tree is as follows:
$$GI_q^{(i)} = \sum_{c=1}^{C} \sum_{c' \neq c} p_{qc}^{(i)} p_{qc'}^{(i)} = 1 - \sum_{c=1}^{C} \left(p_{qc}^{(i)}\right)^2$$
where $C$ represents the number of categories and $p_{qc}^{(i)}$ represents the proportion of category $c$ in node $q$. Intuitively, it is the probability that two samples selected at random from node $q$ have different category labels. The importance of feature $X_j$ at node $q$ of tree $i$, namely the change in the Gini coefficient before and after the split at node $q$, is as follows:
$$VIM_{jq}^{(Gini)(i)} = GI_q^{(i)} - GI_l^{(i)} - GI_r^{(i)}$$
where $GI_l^{(i)}$ and $GI_r^{(i)}$ respectively represent the Gini coefficients of the two new nodes after branching. If the set of nodes at which feature $X_j$ appears in decision tree $i$ is $Q$, then the importance of $X_j$ in the $i$ tree is as follows:
$$VIM_j^{(Gini)(i)} = \sum_{q \in Q} VIM_{jq}^{(Gini)(i)}$$
Assuming that there are $I$ trees in the random forest, then:
$$VIM_j^{(Gini)} = \sum_{i=1}^{I} VIM_j^{(Gini)(i)}$$
Finally, the feature importance scores obtained from all trees are normalised:
$$VIM_j = \frac{VIM_j^{(Gini)}}{\sum_{j'=1}^{J} VIM_{j'}^{(Gini)}}$$
Promet - Traffic&Transportation. 2024;36(4):673-689.
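The node-level quantities in this subsection can be checked with a small worked example. The sketch below is illustrative only: the function names are hypothetical, it follows the unweighted impurity change described in the text (parent minus left child minus right child), and it is not the code used in the study. Library implementations such as scikit-learn weight each child's impurity by its share of samples, so their scores can differ from this simplified form.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_c p_c^2 of the category labels at one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())

def split_importance(parent, left, right):
    """Impurity change for one split: GI(parent) - GI(left) - GI(right)."""
    return gini(parent) - gini(left) - gini(right)

def normalise(scores):
    """Scale per-feature importance totals so that they sum to one."""
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}
```

A node holding two samples of each of two categories has impurity 0.5. Splitting it into two pure children removes all impurity, so the split contributes 0.5 to the importance of the feature used. Summing such contributions per feature over all nodes and trees and passing the totals to `normalise` gives the ranking scores.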

Algorithm preference for speed prediction
As its name implies, the random forest grows a forest at random. The forest comprises multiple decision trees that are mutually uncorrelated. The RF algorithm can perform regression, which meets the research goal. In its initial state, before a tree has grown, all samples sit at the root of the decision tree, and the tree's residual sum of squares equals the total sum of squares of the regression. The algorithm's core principle is to choose, at each split, the variable that most reduces the summed residual squares of the two resulting portions. The same rule is applied recursively at the next split node, where the next splitting variable is chosen, until a full tree is produced. The weighted mean of the target variable at the leaf node is the predicted value of the random forest regression tree.
The RF algorithm offers the following benefits. It introduces randomness on top of the decision tree, making it less likely to overfit. It provides an unbiased estimate of the generalisation error while the random forest is being constructed. Mutual interactions between features can be discovered during the training process, and parallelising the algorithm is relatively straightforward.
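The split rule for a single regression tree can be illustrated on one feature. This is a minimal sketch with hypothetical function names, not the study's implementation: it searches for the threshold that minimises the summed residual squares of the two sides. A full random forest would repeat this search over bootstrap samples and random feature subsets, which is omitted here.

```python
def sse(values):
    """Sum of squared residuals around the mean of the values."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Best threshold on one feature for a regression split.

    Returns (threshold, summed SSE of both sides). The threshold is None
    when no split improves on the unsplit node.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # identical feature values cannot be separated
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            # Place the threshold midway between the two neighbouring values.
            best = ((pairs[k - 1][0] + pairs[k][0]) / 2, total)
    return best
```

With feature values 1, 2, 3, 4 and speeds 10, 10, 20, 20, the unsplit node has a residual sum of squares of 100, and the split at 2.5 reduces it to zero.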

Data collection in the weaving sections
Five common A-type weaving sections around Xi'an, China, were selected for data collection. These sections exhibit high weaving volumes, high weaving ratios and frequent lane-changing activity. The weaving phenomenon is pronounced, allowing abundant weaving vehicle data to be extracted for studying weaving section characteristics. Basic information about the five A-type weaving sections is shown in Table 2. An unmanned aerial vehicle (UAV) hovered about 120 metres above these weaving sections, capturing aerial videos on a weekday in July 2022, in cloudy weather at a temperature of 29 °C.

Extraction of highly accurate trajectory data
Aerial video must be processed into vehicle trajectory data before a speed-influencing variable database can be obtained from actual weaving sections. In this study, we extracted high-precision vehicle trajectories from weaving sections using the Data From Sky platform and Data From Sky Viewer software. A total of 610,200 high-precision vehicle trajectory records were obtained. The trajectories have a spatial accuracy of 0.1 px and a time accuracy of 0.0001 s. Figure 5 depicts the vehicle trajectory extraction interface. Table 3 displays a sample of trajectories. Firstly, the trajectory data of lane-changing vehicles must be extracted from the entire dataset to facilitate the analysis of lane-changing manoeuvres in subsequent sections of the paper. Secondly, the lanes occupied by lane-changing vehicles before and after the lane change must be determined. This is achieved by comparing the X-values of trajectory lines with those of lane lines: the intersection point between the trajectory line and the lane line, where their X-values are extremely close, represents the lane-changing location. Due to an inherent distortion of lane lines in the aerial videos, it was essential to fit the lane lines of the weaving sections. The curve estimation function of Statistical Product and Service Solutions (SPSS) was used to determine the model that best fits the data. According to the fitting results, cubic polynomials had slightly smaller residuals than quadratic polynomials, with an R² of 0.997 for both, so the cubic form was chosen. The ANOVA significance value is 0.00001, demonstrating that the fit is significant. Finally, 1,186 lane-changing trajectories were extracted, along with the lane-changing vehicles' IDs, locations, lane-changing frequencies and other information.
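The lane-change detection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure run in SPSS or Data From Sky: the lane line is assumed to be available as cubic coefficients giving X as a function of the longitudinal coordinate, the function names are hypothetical, and the crossing point is found by linear interpolation between consecutive trajectory samples on opposite sides of the lane line.

```python
def lane_line_x(coeffs, y):
    """Evaluate a cubic lane line x = a0 + a1*y + a2*y^2 + a3*y^3."""
    a0, a1, a2, a3 = coeffs
    return a0 + a1 * y + a2 * y ** 2 + a3 * y ** 3

def find_lane_changes(trajectory, coeffs):
    """Points where a trajectory crosses a fitted lane line.

    trajectory: list of (x, y) samples in time order.
    Returns a list of interpolated (x, y) crossing points.
    """
    crossings = []
    prev = None  # last sample that was not exactly on the lane line
    for x, y in trajectory:
        offset = x - lane_line_x(coeffs, y)
        if offset == 0:
            continue  # on the line; wait for the next off-line sample
        if prev is not None:
            px, py, poff = prev
            if poff * offset < 0:  # sign change: the vehicle crossed the line
                t = poff / (poff - offset)
                crossings.append((px + t * (x - px), py + t * (y - py)))
        prev = (x, y, offset)
    return crossings
```

For a straight lane line at x = 3.5 and a trajectory drifting from x = 2.0 to x = 5.0 in 1.0 steps while y advances by 10 per step, the sketch reports a single crossing at (3.5, 15.0).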

Analysis of variable importance results
The variables in the speed-influencing variable database proposed in this paper were ranked according to their importance, as shown in Figure 6. The importance of all variables is positive, which indicates that they all contribute positively to traffic speed. The results partially support the rationale for designing three new types of variables for weaving sections. Among all variables, the vehicle spatial distribution variables made the largest contribution, whereas the lane-changing manoeuvre variables contributed the least. The primary explanation for the limited contribution of lane-changing manoeuvres may be the small sample size, as lane-changing samples represent only about 3% of the total sample. In the experiments, the lane-changing manoeuvre variables were assigned values only for lane-changing samples and not for non-lane-changing samples, which explains their relatively lower overall contribution. Despite the small sample size, their positive impact on the results is significant enough to warrant consideration in subsequent models.
Another reason for considering the headway variable is that it is significantly correlated with the level of safety in the weaving section. The greater the headway between the target vehicle and the leading and following vehicles in the target lane, the safer the target vehicle is after the lane change. Similarly, while the target vehicle remains in its current lane, the greater the headway to the vehicles in front of and behind it, the more freedom the driver has in choosing when to change lanes and the lower the risk of colliding with surrounding vehicles during the manoeuvre.
Among the variables describing the relative location of vehicles and on/off-ramps, the distance from the on-ramp contributes more than the distance from the off-ramp. This shows that, compared with the off-ramp, the on-ramp has a more substantial impact on driver travel speed. Among the empirical variables affecting speed, traffic volume, large vehicle ratio, short length of weaving sections and number of lanes are all significant. The geometry of a weaving section is determined collectively by its short length and number of lanes, implying that the real operational speed condition of weaving sections can be improved from a design perspective. In conclusion, as every variable in the speed-influencing variable dataset positively affects speed, they are all included in the subsequent traffic speed prediction algorithm.
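The importance ranking discussed above can be illustrated with a permutation-based measure. This is only a minimal sketch under stated assumptions: this section does not name the importance measure used, so permutation importance (the mean increase in squared error when one variable is shuffled) is taken here as one common choice, and the function names, the toy data and the toy predictor are all hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted speeds."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each column of X is shuffled in turn.

    A positive score means the model relies on that variable; a score
    near zero means shuffling it does not degrade the prediction.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy data: speed depends on headway (column 0) only;
# column 1 is pure noise that the toy predictor ignores.
data_rng = random.Random(42)
X = [[data_rng.uniform(5, 60), data_rng.uniform(0, 1)] for _ in range(200)]
y = [40 + 0.5 * row[0] for row in X]


def toy_predict(row):
    """Stand-in for a fitted model (assumption, not the paper's RF)."""
    return 40 + 0.5 * row[0]


scores = permutation_importance(toy_predict, X, y)
print(scores)  # first score is positive, second is zero
```

In practice the stand-in predictor would be replaced by the fitted regression model, and the scores would be sorted to obtain a ranking of the kind shown in Figure 6.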

Comparison analysis of three machine learning algorithms
Three candidate algorithms were initially considered for weaving section speed prediction: RF, SVR and BPNN-GA. The best-performing algorithm was then selected for the speed prediction method. To increase processing speed, 12.5% of the total dataset samples were randomly chosen for the study. This yielded a sample of 10,892 items, 321 of which were lane-changing samples. The ratio of the training set to the testing set was set at 4:1. R², RMSE and MAE were chosen as the evaluation metrics to assess the effectiveness of the regression algorithms. The results are shown in Table 4. R² reflects the extent to which the regression algorithm explains the actual values; RMSE indicates the magnitude of the deviation between predicted values and actual values; MAE represents the average of the absolute differences between predicted values and actual values, reflecting the actual error between them. The results on the training set reveal that RF achieves the best regression performance, with a prediction error of approximately two kilometres per hour relative to the real speed. In contrast, the other two algorithms perform similarly to each other, with prediction errors of approximately five kilometres per hour. The results on the test set exhibit a similar pattern to those on the training set, but the prediction accuracy of the RF algorithm decreases by about one kilometre per hour.
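The three evaluation metrics described above follow their standard definitions and can be computed as in the sketch below. The function name and the four example speeds are hypothetical and serve only to make the arithmetic concrete; they are not data from this study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for paired observed and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Hypothetical speeds in km/h, for illustration only.
observed = [60.0, 55.0, 48.0, 52.0]
predicted = [58.0, 56.0, 50.0, 49.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

RMSE and MAE are both expressed in the unit of the target (here km/h), which is why the prediction errors in this section are quoted in kilometres per hour; RMSE penalises large individual errors more heavily than MAE.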
To assess the adaptability of the proposed method across varying data input sizes, the method was tested to determine its capacity to perform well even with limited data inputs. This aligns with a recent research focus on achieving "high-accuracy prediction results with small sample inputs". Thus, further experiments were conducted to investigate whether the data sample size and the ratio of training to testing sets had a significant impact on the results of the aforementioned method. First, samples of 2,000, 4,000, 6,000 and 8,000 items were randomly selected from the 10,892 raw data items for the experiment. Figure 7a displays the evaluation results using RMSE. The findings indicate that algorithm performance generally improves with increasing data volume, particularly when the overall data sample size is around 10,000. The performance remains stable and the algorithms' effectiveness does not change abruptly as the sample varies. These experiments demonstrate that the data sample size does not significantly affect the algorithms' performance. Notably, the RF algorithm consistently outperforms the other two algorithms. Secondly, the study aimed to verify whether the three algorithms delivered stable performance across various test ratios. The experiments were conducted with training set to testing set ratios of 5:1, 4:1, 3:1 and 2:1, respectively. RMSE was used to assess the algorithms' performance and the results are displayed in Figure 7b. The assessment metrics for each algorithm remain consistent across these four test ratios, indicating that the choice of test ratio has minimal impact on the results.
These two further experiments, focusing on data sample size and test ratio, show that the two parameters barely interfere with the model's prediction capacity, indirectly demonstrating the stability of the speed prediction method developed in this paper. The preferred prediction method, based on the RF algorithm, exhibits a speed prediction error of approximately three kilometres per hour. To the best of our knowledge, the speed prediction error reported in existing research is about eight kilometres per hour [13]. These findings indicate that this study contributes to further enhancing the accuracy of spatial average speed prediction in weaving areas.
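The design of the two stability experiments can be sketched as follows. This is a hedged illustration of the sampling and splitting steps only, not the code used in the study: the helper name, the random seed and the choice to run the ratio experiment on the full 10,892 samples are assumptions, and model fitting is omitted.

```python
import random


def subsample_and_split(n_total, sample_size, train_parts, test_parts, seed=0):
    """Randomly draw sample_size indices out of n_total and split them
    into training and testing indices at the ratio train_parts:test_parts."""
    rng = random.Random(seed)
    indices = rng.sample(range(n_total), sample_size)  # random order, no repeats
    n_test = round(sample_size * test_parts / (train_parts + test_parts))
    return indices[n_test:], indices[:n_test]


N_TOTAL = 10892  # sample size reported in this section

# Experiment 1: varying sample size at the fixed 4:1 ratio.
size_experiment = {}
for size in (2000, 4000, 6000, 8000):
    train_idx, test_idx = subsample_and_split(N_TOTAL, size, 4, 1)
    size_experiment[size] = (len(train_idx), len(test_idx))

# Experiment 2: varying train:test ratio (full sample assumed here).
ratio_experiment = {}
for train_parts in (5, 4, 3, 2):
    train_idx, test_idx = subsample_and_split(N_TOTAL, N_TOTAL, train_parts, 1)
    ratio_experiment[f"{train_parts}:1"] = (len(train_idx), len(test_idx))

print(size_experiment)
print(ratio_experiment)
```

Each pair of index lists would then be used to fit the three algorithms on the training indices and to compute RMSE on the testing indices, giving one point per algorithm on curves like those in Figure 7.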

Comparison analysis of previous research methods
The speed prediction method proposed in this study was compared with the HCM6 and LSTM models. We aggregated the extracted trajectory data at two intervals, 5 minutes and 10 minutes, and compared the performance of the three methods for estimating spatially averaged speeds, as shown in Table 5. The LSTM neural network used for comparison consists of an input layer, an LSTM layer, an activation layer, a fully connected layer and an output layer. The Adam optimiser was used, Mini Batch Size was set to 30, Max Epoch was set to 1,000, Initial Learn Rate was set to 1e-2 and Learn Rate Drop Factor was set to 0.5. The HCM methodology is the one specified for weaving sections in HCM6. The DTF-RF algorithm is the method proposed in this paper for weaving sections, where DTF stands for Designing Three kinds of Features. It demonstrates a higher estimation accuracy than both LSTM and HCM6.
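The aggregation of trajectory data into 5-minute and 10-minute intervals can be sketched as below. The record layout (timestamp in seconds, instantaneous speed in km/h), the function name and the example values are assumptions made for illustration. When samples are taken at a fixed frame rate, the arithmetic mean of all speed samples in an interval equals the total distance travelled divided by the total time spent, which is one common definition of spatially averaged speed; the exact definition used in the study is not restated in this section.

```python
from collections import defaultdict


def aggregate_mean_speed(records, window_s):
    """Average speed samples within consecutive time windows.

    records  : iterable of (timestamp_s, speed_kmh) pairs, assumed to be
               sampled at a fixed frame rate across all vehicles
    window_s : window length in seconds (300 for 5 min, 600 for 10 min)
    Returns a dict mapping window index to mean speed in km/h.
    """
    sums = defaultdict(lambda: [0.0, 0])  # window index -> [speed sum, count]
    for timestamp, speed in records:
        key = int(timestamp // window_s)
        sums[key][0] += speed
        sums[key][1] += 1
    return {k: total / count for k, (total, count) in sorted(sums.items())}


# Hypothetical samples, for illustration only.
samples = [(0, 60.0), (120, 50.0), (299, 40.0), (300, 70.0), (599, 50.0), (600, 30.0)]
five_min = aggregate_mean_speed(samples, 300)
ten_min = aggregate_mean_speed(samples, 600)
print(five_min)
print(ten_min)
```

The aggregated series at each interval would then serve as the common target against which the three methods in Table 5 are evaluated.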

CONCLUSION AND FUTURE WORK
This study contributes to speed prediction models for weaving sections by utilising the traffic flow characteristics of weaving sections. The method adopts exploratory causation logic, distinguishing itself from the time series prediction logic commonly used in previous research. Specifically, we conducted an in-depth analysis of the speed distribution pattern in weaving sections and explored the unique speed-influencing factors. Furthermore, we integrated these factors with machine learning algorithms to construct the method. Lastly, the method was validated using aerial video data collected in the field, with a prediction error of approximately three kilometres per hour. The proposed method can provide guidance for the development of traffic control strategies related to safer vehicle speeds in weaving sections. Additionally, this method can be incorporated into the development of HCM6. Several useful findings emerge from this study.
The lane-changing distribution within weaving sections exhibits a pattern that is nearly the inverse of the spatial average speed distribution. Specifically, the number of lane changes in the first half of the weaving section is greater than in the second half, with the peak occurring near the first third of the on-ramp. At the same time, both on-ramps and off-ramps show a sudden drop in speed regardless of the geographical location of the weaving section, with on-ramps showing a notably larger deceleration.
This study finds that lane-changing manoeuvres, the relative positioning of vehicles with respect to on/off-ramps and the spatial distribution of vehicles within the weaving sections significantly influence the spatial average speed. It is worth noting that short length, number of lanes, large vehicle ratio and total traffic volume all affect the spatial average speed of weaving sections, with the degree of impact decreasing in that order. This paper is limited to A-type weaving sections and does not discuss other configurations. Because the duration of aerial video collection was limited, the weaving sections could only be studied over a short period, so the effects of different periods or traffic volumes could not be considered.
Future work can be conducted by collecting all-weather video data to investigate the traffic flow operation of weaving sections during different periods and under varying traffic volumes. In addition, reconstructing spatiotemporal patterns of traffic congestion, predicting traffic accidents and implementing active traffic management strategies in weaving sections could be investigated. By designing well-considered road markings to guide vehicle trajectories and suggesting appropriate speeds, we can assist vehicles in making safer lane changes, ultimately enhancing the safety of vehicles passing through weaving sections.

Figure 2 - Relationship between lane-changing frequency and spatial average speed

Figure 3 - Diagram of three categories of design variables

Figure 4 - Schematic diagram of the disorder degree of vehicles in the longitudinal and transverse directions

Figure 5 - Vehicle trajectory extraction interface in a weaving section

Figure 6 - Variable importance ranking

Figure 7 - RMSEs of the three algorithms under different sample sizes and test ratios: a) RMSEs under different sample sizes; b) RMSEs under different test ratios

Table 1 - Description of empirical variables

Table 2 - Basic description of the studied weaving sections

Table 3 - Sample trajectory data

Table 4 - Regression evaluation indicators of each algorithm

Table 5 - Regression evaluation indicators of each algorithm