Predicting Number of Vehicles Involved in Rural Crashes Using Learning Vector Quantization Algorithm

Abstract: Roads represent very important infrastructure and play a significant role in economic, cultural, and social growth. Therefore, there is a critical need for researchers to model crash injury severity in order to study how safe roads are. When measuring the cost of crashes, the severity of the crash is a critical criterion, and it is classified into various categories. The number of vehicles involved in the crash (NVIC) is a crucial factor in all of these categories. For this purpose, this research examines road safety and provides a prediction model for the number of vehicles involved in a crash. Specifically, learning vector quantization (LVQ 2.1), one of the sub-branches of artificial neural networks (ANNs), is used to build a classification model. The novelty of this study lies in demonstrating LVQ 2.1’s efficacy in categorizing accident data and its ability to improve road safety strategies. The LVQ 2.1 algorithm is particularly suitable for classification tasks and works by adjusting prototype vectors to improve the classification performance. The research emphasizes how urgently better prediction algorithms are needed to handle issues related to road safety. In this study, a dataset of 564 crash records collected between 2017 and 2018 on rural roads in Calabria, a region in southern Italy, was utilized. The study analyzed several key parameters, including daylight, the crash type, day of the week, location, speed limit, average speed, and annual average daily traffic, as input variables to predict the number of vehicles involved in rural crashes. The findings revealed that the “crash type” parameter had the most significant impact, whereas “location” had the least significant impact on the occurrence of rural crashes in the investigated areas.


Introduction
The paramount importance of roadways as critical infrastructure for sustainable development cannot be overstated. As such, a reduction in road-related crashes and the augmentation of road safety are preeminent objectives that transport engineers and researchers endeavor to attain, in consonance with sustainable mobility. Consequently, there is an urgent need for a model that evaluates the gravity of crash injuries, which would help researchers in examining road safety [1,2]. Given the growing global population, both advanced and emerging societies face an upsurge in vehicular volume, which intensifies travel and traffic on thoroughfares and thereby increases the likelihood of vehicular incidents [3,4].
In addition to the more traditional approaches to interpreting accident data (e.g., descriptive or inferential statistics), since the end of the last century, many researchers have resorted to the simulation of transport networks to evaluate road safety and estimate its effects on people and the environment [5,6].
The multifarious nature of road safety has been the focus of invaluable research efforts aimed at bolstering our understanding of the subject. In certain instances, crashes may be impacted by a combination of various risk factors [7,8]. Contributing factors that have been identified encompass, but are not limited to, daylight [9], weather conditions [10], the age of the driver and vehicle [11,12], the speed limit and average speed [13,14], and the annual average daily traffic (AADT) [15].
In general, traffic collisions involve only one or two individuals. Depending on the quantity of automobiles implicated, crashes are categorized as either single-vehicle crashes (SVCs) or multiple-vehicle crashes (MVCs) [16]. The appraisal of accident-related expenditures necessitates the contemplation of crash severity, a vital parameter that is classified into several tiers. The number of vehicles implicated in the crash constitutes a critical variable throughout all of these gradations.
Wang [17] incorporated both environmental and safety considerations to comprehensively capture the multifaceted aspects of sustainable transport. To do so, he developed a unified performance measure using data envelopment analysis (DEA), a nonparametric approach to benchmarking entities with multiple inputs and outputs. This measure was then applied to jointly assess the environmental impacts and safety concerns of road transport for a set of OECD (Organization for Economic Co-operation and Development) countries between 2000 and 2014. Finally, he demonstrated that the unified measures derived from this joint assessment can differ significantly from those obtained by evaluating the environmental impacts and safety separately. McLeod and Carey [18] conducted a literature review on traffic safety, utilizing the established hazard control hierarchy. Their research identified and categorized potential approaches to the successful integration of Vision Zero with broader sustainable accessibility policy objectives. The authors synthesized the literature within the context of the hazard control hierarchy, offering a framework for the more efficacious coordination of professional practices that impact urban safety and sustainability. Ultimately, the authors supplied recommendations for enhancing the integration of Vision Zero and sustainable accessibility policies, with the hazard control hierarchy serving as an organizing principle. Ziakopoulos and George [19] conducted a comprehensive review of the literature on the diverse spatial methodologies that researchers use to analyze the spatial dimension of road safety. Additionally, the authors evaluated studies that concentrated on the spatial analysis of vulnerable road users. The authors also discussed the practical implementation, benefits, and drawbacks of the diverse techniques used in spatial modeling. Drawing upon their critical review, they identified current obstacles and future avenues for research in this field.
Afghari et al. [20] utilized a joint model of crash count and crash severity to identify road segments that pose a high risk of fatal and serious injury crashes. The study employed data from state-controlled roads in Queensland, Australia, and a novel risk score was developed by predicting crash counts by severity and weighting them using the cost ratio of severity levels. The weighted risk score was then employed to pinpoint road segments with a heightened risk of fatal and injury crashes. Their results revealed that the joint model of crash count and crash severity substantially enhanced the prediction accuracy when compared with traditional count models. In another study, Tamakloe and Park [21] utilized fatal crash data from Korea to identify hotspots with increasing (critical) and decreasing (diminishing) temporal trends using a spatio-temporal hotspot analysis tool in a geographic information system (GIS). Additionally, they employed a machine learning technique to investigate the series of factors that influence the number of vehicles and casualties involved in fatal crashes at intersections and midblocks in each hotspot type identified. Based on their findings, they identified groups of factors that could be collectively addressed to enhance road safety and recommended countermeasures to mitigate fatal crashes on the roads. Hossain et al. [22] employed a partial proportional odds model to predict the injury severity of the most severely injured driver in a multi-vehicle crash, using demographic information on all drivers involved. The authors then compared models that incorporated the demographic information and vehicle characteristics of all drivers and vehicles involved in a crash with models that considered only information about the most severely injured driver, evaluating the significance of factors and the prediction accuracy. The results of their study suggested that although young drivers were generally found to have lower levels of injury severity compared to working-age drivers, the severity of injuries increased when the proportion of young drivers involved in a multi-vehicle crash was higher.
Based on a review of the existing literature, it has been established that roads are an essential element of infrastructure and play a critical role in the advancement of society, the economy, and culture. Generally, professionals in transportation engineering and road safety research prioritize two major goals: reducing the occurrence of road crashes and improving overall road safety. These objectives are closely related, making it necessary to develop a model that can accurately predict the severity of injuries resulting from crashes. Such a predictive model is crucial for researchers to assess road safety effectively. The number of vehicles involved in a crash (NVIC) is one of the most important factors in planning for and reducing the severity of road crashes. Hence, the main objective of this research is to examine road safety and create a predictive model that can estimate the NVIC. This model is developed using a technique known as learning vector quantization (LVQ 2.1), which is a subset of artificial neural networks (ANNs) used for classification. The study analyzes the records of 564 crashes that took place on rural roads in southern Italy to construct the models. Based on the literature, the proposed LVQ 2.1 model offers some performance benefits, whereas other predictive models have major limitations. The LVQ 2.1 algorithm is known for its capacity to handle nonlinear relationships in complicated datasets efficiently, which makes it particularly useful for predicting the variables affecting the NVIC. Furthermore, the strong learning ability of the algorithm enables it to create effective prediction models even in the face of noisy or missing data, a typical difficulty in accident reporting. On the other hand, traditional statistical models such as linear or Poisson regression often find it difficult to faithfully represent the nonlinear and diverse character of NVIC data, therefore producing a less-than-ideal prediction performance. Also, many current machine learning techniques, such as decision trees, might not be as effective in handling the complex and high-dimensional interactions seen in NVIC data. This might lead to overfitting or poor generalization. Therefore, the important contributions of this study are summarized as follows:
The rest of this work is organized as follows: Section 2 outlines the LVQ methodology employed in this research. Section 3 offers a concise summary of the case study's features. In Section 4, the developed models are constructed, and the factors that contribute to the number of vehicles involved in a crash are analyzed. Lastly, Section 5 provides concluding remarks and suggestions for future research.

Learning Vector Quantization (LVQ)
The contemporary scientific arena has observed noteworthy advancements in various branches of artificial intelligence (AI), which have led to the development of innovative technologies. Hence, the implementation of artificial intelligence techniques to tackle complex issues across various scientific fields is an unavoidable trajectory [23][24][25][26][27][28]. The Learning Vector Quantization (LVQ) network is a kind of neural network that uses a supervised learning methodology. Pattern recognition and classification problems are where it finds its most frequent use [29][30][31]. LVQ is highly comparable to self-organizing maps (SOM), and it also has many parallels to the k-Nearest Neighbor (kNN) technique of classification. LVQ is a method used in statistical pattern classification that learns prototypes (also called codebook vectors) to represent distinct class regions. The hyperplanes that separate the prototypes define the boundaries of these class regions, creating Voronoi partitions. The LVQ network stands out from other types of ANNs in its own way [32]. To assist network training and data categorization, the LVQ network uses the "winner-takes-all" method, which is based on either the "Hebbian Learning" or "Associate Learning" principles. Kohonen is the originator of LVQ, which has been subject to numerous adaptations and refinements over time, resulting in the emergence of several LVQ variants [33][34][35]. Figure 1 illustrates an example LVQ network. In Figure 1, X represents the input, Y represents the output, and W represents the weight vector, that is, the connections between each neuron in the input layer and the neurons in the output layer. To classify information into a desired category, the LVQ uses the Euclidean distance between the input vectors and the weight vectors. The data are assigned to the class or target with the least distance [36].
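The distance-based assignment described above can be illustrated with a minimal sketch. It assumes that each prototype carries a class label and that an input is given the label of its nearest prototype; the function names and the two-dimensional toy prototypes are hypothetical and are not taken from the study.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def classify(x, prototypes, labels):
    """Return the label of the prototype closest to x (winner-takes-all)."""
    distances = [euclidean(x, p) for p in prototypes]
    winner = min(range(len(prototypes)), key=distances.__getitem__)
    return labels[winner]


# Hypothetical prototypes for the two NVIC classes used in this paper
# (1 = single-vehicle crash, 2 = multiple-vehicle crash).
prototypes = [[0.2, 0.1], [0.8, 0.9]]
labels = [1, 2]

print(classify([0.25, 0.2], prototypes, labels))
print(classify([0.9, 0.7], prototypes, labels))
```

In a trained network the prototypes would come from the learning phase rather than being fixed by hand as they are here.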
One of the most widely recognized early variants introduced by Kohonen is LVQ 2.1, which is described in detail in Kohonen (1990) and Kohonen (1997). LVQ 2.1 was therefore implemented in this study for the purpose of classifying the dataset. LVQ 2.1 stands apart from its preceding versions in that it updates two prototypes (centers) concurrently. This feature yields a considerable improvement in the efficiency of the algorithm, ultimately culminating in faster performance. The first step of the LVQ 2.1 method is to choose the two closest prototypes based on the Euclidean distance, namely $\theta_l$ and $\theta_m$, for every data point $(x, y)$ in the training set $S = \{(x_i, y_i)\}_{i=1}^{N}$. If the prototypes' labels $c_l$ and $c_m$ are distinct, and one of them corresponds to the label $y$ of the data point, then the two closest prototypes are modified based on Equations (1) and (2) [37][38][39]:
$$\theta_l(t+1) = \theta_l(t) + \alpha(t)\,\bigl(x - \theta_l(t)\bigr), \quad \text{if } c_l = y \tag{1}$$
$$\theta_m(t+1) = \theta_m(t) - \alpha(t)\,\bigl(x - \theta_m(t)\bigr), \quad \text{if } c_m \neq y \tag{2}$$
where $\alpha(t)$ denotes the learning rate at iteration $t$. In the event that the labels $c_l$ and $c_m$ are identical or both labels differ from the label $y$ of the data point, no parameter update is executed. The modeling method and the performance indicators used in the modeling are explained in the next sections.
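As an illustration of the update rule just described, the following is a minimal sketch of a single LVQ 2.1 step. It assumes the usual form of the rule, in which the prototype with the correct label is moved toward the sample and the other is moved away by the same learning rate. The function name `lvq21_step`, the constant `alpha`, and the toy values are hypothetical. Kohonen's LVQ 2.1 normally also applies a window condition around the decision boundary; the text does not describe one, so it is omitted here.

```python
import math


def lvq21_step(x, y, prototypes, labels, alpha):
    """Apply one LVQ 2.1 update for sample (x, y).

    Mutates `prototypes` in place and returns True if an update was made.
    """
    # Indices of all prototypes, sorted by Euclidean distance to x.
    order = sorted(range(len(prototypes)),
                   key=lambda j: math.dist(x, prototypes[j]))
    l, m = order[0], order[1]  # the two closest prototypes

    # No update if both labels agree, or if neither matches the sample label.
    if labels[l] == labels[m] or y not in (labels[l], labels[m]):
        return False

    for j in (l, m):
        # Attract the correctly labelled prototype, repel the other one.
        sign = 1.0 if labels[j] == y else -1.0
        prototypes[j] = [p + sign * alpha * (xi - p)
                         for p, xi in zip(prototypes[j], x)]
    return True


# Hypothetical example: two prototypes with different labels.
protos = [[0.0, 0.0], [1.0, 1.0]]
proto_labels = [1, 2]
updated = lvq21_step([0.4, 0.4], 1, protos, proto_labels, alpha=0.1)
print(updated, protos)
```

Training would repeat this step over all samples for several epochs, typically with a learning rate that decreases over time.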

Case Study
In order to test the proposed methodology, the records of 564 accidents that occurred between 2017 and 2018 on rural roads in Cosenza province (Calabria, Italy) were used (Figure 2). The road accident sample was acquired from the ACI-ISTAT database (Automobile Club Italia-National Institute of Statistics), which collects and analyzes data on road accidents in Italy [40]. The information contained in the dataset provides details on the date and place of the accident, the type of road, the pavement conditions, the weather conditions, the type of accident, the type of vehicle involved, the causes of the accident, and the consequences for the people involved (injuries or deaths). However, this dataset does not contain information on Property Damage Only events, because ISTAT, in Italy, identifies and classifies accidents only if they generate at least one injury.
The above information was integrated with other data characterizing the context of the study to enable a more detailed analysis and to implement the proposed method. In particular, the speed limits, the average speed, and the annual average daily traffic (AADT) were acquired to characterize the road elements on which accidents occurred. The speed limits were acquired from the dataset of the national autonomous road company (ANAS).
The average speed was obtained by gathering the available historical traffic statistics of TomTom (TomTom Move) and Octo Telematics (Octo IoT Cloud) for the road sections with the observed accidents. The annual average daily traffic was obtained from the PANAMA system, a traffic monitoring platform provided by ANAS [40].
As detailed in the Classification Modelling section, the above-mentioned data were classified into seven independent variables (i.e., the factors affecting the number of vehicles involved in the crash), including four qualitative variables, namely daylight (DL), the type of crash (TC), day of the week (W), and location (LO), and three quantitative variables, namely the speed limit (SL), average speed (AS), and annual average daily traffic (AADT).

Modelling
The main objective of the current research is to explore the variables that influence the level of road safety in rural regions through the implementation of binary classification modeling techniques. To accomplish this aim, a classification model was developed, and the NVIC was assessed using the LVQ 2.1 approach, as previously stated.
In binary classification modeling, the confusion matrix provides the most useful accuracy and error measurements for evaluating performance [40]. As shown graphically in Figure 3 and mathematically in Equations (3) and (4), the confusion matrix is used to facilitate model comparison. Data normalization is essential in data-driven system modeling methodologies because the investigated parameters have different ranges and measurement scales. Data that have not been normalized may produce inaccuracies in the calculation because variables with larger scales dominate. Therefore, in this study, all data were normalized using the min-max technique before being included in a model to eliminate such scale effects [40].
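The two preprocessing and evaluation ideas above can be sketched briefly. The sketch assumes the common min-max form, (v - min) / (max - min), and the usual definition of accuracy as the share of correct predictions in the confusion matrix. Equations (3) and (4) are not reproduced in this text, so these definitions are assumptions rather than the authors' exact formulas, and the sample values are hypothetical.

```python
def min_max(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of correctly classified records."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical speed limits (km/h) and class labels (1 = SVC, 2 = MVC).
print(min_max([50, 70, 90, 110]))
counts = confusion_counts([1, 1, 2, 2, 1], [1, 2, 2, 2, 1])
print(counts, accuracy(*counts))
```

Normalizing each quantitative input separately keeps a large-valued variable such as AADT from dominating the Euclidean distance.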
AI 2024, 5, FOR PEER REVIEW

Classification Modelling
To commence the modeling process, the initial step involved preparing the dataset. Following a thorough examination of the available data, the seven known parameters were categorized into four distinct data groups. The values and characteristics associated with each collision, which influenced the NVIC, were identified as inputs for modeling (independent variables). These encompassed four qualitative variables, namely daylight (DL), the type of crash (TC), day of the week (W), and location (LO), as well as three quantitative variables, including the speed limit (SL), average speed (AS), and annual average daily traffic (AADT). The aforementioned variables were classified and are presented in Table 1. For the NVIC, crashes in which only one vehicle was involved were assigned to the first class (labeled "1"). Incidents involving multiple vehicles (labeled "2") were assigned to the second class. This categorization was based on the underlying assumption that the minimum NVIC is the most critical factor in determining differences between the classes.
The next phase, after the collection and preparation of the dataset, was to set the algorithm's governing parameters. The effectiveness of the algorithm and the rate of convergence may be greatly improved by adjusting these parameters. In most cases, there are no standard methods for establishing such parameters. Instead, experts rely on their knowledge, experience, and data type to estimate a parameter range [41]. Models with varying degrees of accuracy and error rates are created using these parameters. The strategic fusion of data-driven techniques and expert judgment may result in more trustworthy and effective models.
The modeling process involved creating a mapping between the input and output data, which was then utilized to design and construct an optimal classification model that could accurately identify the appropriate classes. The primary objective of the model was to achieve the highest possible accuracy. The governing parameters and their candidate values were the number of epochs (5, 10, 20, 30, or 50) and the number of neurons in the hidden layer (NNHL; 10, 20, 30, or 40). Furthermore, of the 564 records, 70% (395) were allocated for model training, 10% (56) were used for validation, and the remaining 20% (113) were used for testing the model. The determination of these proportions was influenced by insights derived from prior research in the domain of neural network prediction [42]. Table 2 displays the outcomes of a total of 20 models that were constructed and evaluated. After constructing the models and determining their accuracy scores for both training and testing, a straightforward technique recommended by Zorlu et al. [43] was employed to rank all of the models. The resulting rankings are presented in Table 3.
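The experimental design above amounts to a small grid of 5 x 4 = 20 configurations and a three-way split of the 564 records. The sketch below enumerates both. The numbering of the models (epochs varying slowest, NNHL fastest) is an assumption; it happens to agree with the configurations quoted in the text for Models 10, 11, and 15, but the source does not state the ordering. The helper name `split_sizes` is hypothetical.

```python
from itertools import product

EPOCHS = [5, 10, 20, 30, 50]   # candidate numbers of epochs
NNHL = [10, 20, 30, 40]        # candidate numbers of hidden-layer neurons

# Assumed numbering: model 1 = (5, 10), model 2 = (5, 20), ..., model 20 = (50, 40).
configs = {i: cfg for i, cfg in enumerate(product(EPOCHS, NNHL), start=1)}


def split_sizes(n, train_frac=0.70, val_frac=0.10):
    """Return (train, validation, test) sizes; the test set takes the remainder."""
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    return n_train, n_val, n - n_train - n_val


print(len(configs), configs[15])
print(split_sizes(564))
```

Giving the remainder to the test set guarantees that the three parts always sum to the full dataset, whatever the rounding.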
Table 2 shows that the configuration of an LVQ 2.1 model greatly affects its performance, especially with regard to the number of epochs and the NNHL. The training accuracy falls between 64.3% and 82.5%, and the testing accuracy falls between 61.9% and 82.3%. This variation shows how strongly model efficacy depends on the configuration. Model 15 (30 epochs, 30 NNHL) ranks highest, with a training accuracy of 82.5% and a testing accuracy of 82.3%. Its high accuracy on both the training and testing data points to a well-tuned model that successfully strikes a balance between generalization and complexity. Model 11 (20 epochs, 30 NNHL), with accuracy values of 82.3% for training and 81% for testing, also performs very well and shows strong generalization capabilities. Models 3, 4, 11, 15, and 16 show high and consistent accuracy for both training and testing, indicating that these configurations are less prone to overfitting and generalize well. Models with a high training accuracy but a much lower testing accuracy, such as Model 6 (71.9% training against 61.9% testing), may be overfitting the training data. Achieving 80.8% training and 75% testing accuracy, Model 10 (20 epochs, 20 NNHL) is among the better-performing models. It slightly underperforms compared to the top models but remains effective. Models with higher NNHL values tend to achieve better accuracy, but they also require careful tuning to avoid overfitting, as seen in models with significant accuracy drops between training and testing.
Table 3 shows the rankings of the twenty models based on their training and testing accuracy. Because the rankings account for accuracy in both the training and testing stages, they help determine which models perform best overall. A higher ranking value denotes better performance. This study emphasizes the need to balance model complexity with performance in both the training and testing phases to ensure optimal model selection and implementation.
Additionally, the confusion matrices for the training, validation, testing, and total datasets can be found in Figure 4a-d.
In classification problems, the receiver operating characteristic (ROC) curve is an essential tool for analyzing the outcomes because of its probability-based nature. The performance of the developed binary classification model is also assessed through the area under the curve (AUC), which ranges from 0 to 1. An AUC value of 0.5 or less indicates inadequate performance by the developed model; values greater than 0.5 were observed for the training, testing, and total ROC curves, indicating acceptable model performance. Consequently, the ROC curve was employed to assess the outcomes produced by the 15th model, and the results for training, testing, and all data based on the ROC curve are presented in Figure 5a-d. It is important to note that a threshold of 0.5 was utilized, which is a commonly accepted value in this scenario. The 15th model outperformed the other developed models, and its AUC is notably greater than the AUC values of the other developed models.
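To make the AUC interpretation concrete, the following minimal sketch computes the AUC as the probability that a randomly chosen positive record receives a higher score than a randomly chosen negative one, with ties counted as one half. This rank-based formulation is equivalent to the area under the ROC curve. The scores and labels are hypothetical, and the study does not state how its AUC values were computed.

```python
def auc(scores, labels, positive=1):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, lab in zip(scores, labels) if lab == positive]
    neg = [s for s, lab in zip(scores, labels) if lab != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative record")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for class 1 and the true class labels.
scores = [0.9, 0.8, 0.35, 0.6, 0.2]
labels = [1, 1, 1, 2, 2]
print(auc(scores, labels))
```

A value of 0.5 corresponds to random ranking, which is why values above 0.5 are read as acceptable performance.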

Validation and Discussion
The effects of the various input factors on the NVIC were analyzed using a sensitivity study. The best LVQ model was utilized to predict the output, and the degree of correlation between the input data and the predicted result was then assessed. For this sensitivity analysis, the cosine amplitude approach (Equation (5)) was used. Here, $n$ signifies the total number of data points, $r_{ij}$ stands for the strength of the correlation between an input and the output, $x_{ik}$ denotes the input parameters, and $y_{ij}$ denotes the predicted values.
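A minimal sketch of the cosine amplitude calculation is given below. It assumes the standard form of the method, in which the strength of relation between an input column x and the predicted output y is the sum of their products divided by the square root of the product of their sums of squares. Equation (5) is not reproduced in this text, so this form is an assumption, and the data values and variable names are hypothetical.

```python
import math


def cosine_amplitude(x, y):
    """Strength of relation between two equal-length data columns (0 to 1 for non-negative data)."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator if denominator else 0.0


# Hypothetical normalized inputs and predicted classes for five crashes.
inputs = {
    "TC": [0.9, 0.8, 0.2, 0.7, 0.1],
    "LO": [0.1, 0.9, 0.8, 0.2, 0.9],
}
predicted = [2, 2, 1, 2, 1]

strengths = {name: cosine_amplitude(col, predicted) for name, col in inputs.items()}
for name, r in sorted(strengths.items(), key=lambda item: item[1], reverse=True):
    print(name, round(r, 3))
```

Sorting the resulting strengths from largest to smallest gives the kind of input ranking that the sensitivity analysis reports.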
Based on Equation (5) and the results obtained from the best-developed LVQ 2.1 model (the 15th model), a sensitivity analysis was performed, and its results were compared with those of a previous study. To validate the LVQ 2.1 model, a comparison was made using the results of past studies. The prior investigations used two machine learning techniques, namely GMDH and GOA-SVM. Brief information about these classification models is given here. The performance of GMDH models depends strongly on their design, so the exact determination of the GMDH control parameters is a fundamental problem. The GOA-SVM prediction model combines GOA and SVM: several SVM parameters were optimized using the GOA technique to ensure the best performance of the SVM model. After the modeling process, the best GMDH model had an MNL, MNNL, and SP equal to 20, 50, and 0.5, respectively. Furthermore, the optimum control parameters of the best GOA-SVM model were a grasshopper population of 40, a k-fold value of 3, and a Gamma (γ) of the RBF kernel equal to 6.17. For more information, readers are referred to the study of Guido et al. [41]. The results obtained from this comparison are shown in Figure 6.
Figure 6 shows that all models lead to the same results. Although the values of the degree of correlation differed between models, the resulting rankings were the same. Based on the results, TC (type of crash) and AS (average speed) had the greatest and second-greatest impact, respectively, on the number of vehicles involved in a crash. LO (location) showed the least impact on the NVIC in all three models. This consistency across multiple independent models points to a strong fundamental link between these variables and the NVIC, and it also supports the conclusion concerning LO's small impact on NVIC prediction. Although the models agree on the factor rankings, their degrees of correlation differ. The LVQ 2.1 model, for instance, provides a correlation coefficient of 0.93 for TC, whereas GMDH shows 0.85 and GOA-SVM shows 0.87. Though small, these differences highlight slight variations in the sensitivity captured by each model. The y-axis values in Figure 6, which show the degree of correlation, are notable because they identify the most and least important factors in predicting the number of vehicles involved in crashes. This knowledge is required to validate the model and to guide sensible efforts to improve road safety.
In a second comparison, the performance of the LVQ 2.1 model was compared with that of the models from previous research in terms of accuracy on the training and testing data [41]. The results are shown in Figure 7. The performance of the LVQ 2.1 model is acceptable, and its accuracy differs little from that of the GMDH and GOA-SVM models. One of the most important strengths of this study is that, although the accuracy of the LVQ 2.1 model is similar to that of the other models in the literature, the modeling process was easier and fewer parameters needed to be adjusted, which enables users to develop the model more easily. The LVQ 2.1 model's high training accuracy indicates that it efficiently captures trends in the training data: a training accuracy of 82.5% means that the model correctly predicts the NVIC for 82.5% of the training data. The GMDH and GOA-SVM models also show acceptable training accuracies of 83.2% and 84.6%, respectively. These comparable accuracies suggest that all three models can efficiently learn from the data. The test accuracy is a major indicator of a model's generalizability to new, unseen data. The LVQ 2.1 model has a testing accuracy of 82.3%, which is close to its training accuracy and indicates a high generalizing capacity. The GMDH and GOA-SVM models show similar testing accuracies of 81.6% and 83.4%, indicating that these models, too, generalize well to new data.
The small differences between the training and testing accuracies of every model show that the models do not overfit the training data. Overfitting is a typical problem in which a model performs well on training data but poorly on testing data. The uniformity of the accuracy levels indicates that all three models largely avoid this issue.
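The overfitting check described above amounts to comparing training and testing accuracy for each model. The short sketch below uses the accuracies reported in the text; the function name and the tolerance of 5 percentage points are hypothetical choices made only for illustration, not criteria used in this study.

```python
# Training and testing accuracies (%) as reported in the text.
reported_accuracy = {
    "LVQ 2.1": (82.5, 82.3),
    "GMDH": (83.2, 81.6),
    "GOA-SVM": (84.6, 83.4),
}


def generalization_gap(train_acc, test_acc):
    """Training minus testing accuracy, in percentage points (one decimal)."""
    return round(train_acc - test_acc, 1)


gaps = {
    name: generalization_gap(train_acc, test_acc)
    for name, (train_acc, test_acc) in reported_accuracy.items()
}

# Hypothetical tolerance: flag a model if the gap exceeds 5 percentage points.
MAX_GAP = 5.0
flagged = [name for name, gap in gaps.items() if gap > MAX_GAP]
```

With the reported values, every gap is below 2 percentage points, which is consistent with the statement that none of the three models shows clear signs of overfitting.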
As mentioned before, the evaluation of crash severity is a crucial part of the road safety process in transportation engineering. An increase in crash severity is one of the undesirable effects of an increase in the number of vehicles involved in a crash. Therefore, an accurate prediction of the NVIC can be useful in minimizing the level of crash severity in road transportation. Based on the results, it can be inferred that the TC exerts a significant influence on the NVIC. Various factors, such as inadequate traffic signage and suboptimal road conditions, may contribute to specific categories of crashes. Head-on collisions are often caused by driver inattention to road signs or by insufficient lighting, which results in poor visibility. Likewise, following too closely, driving while distracted, or sudden deceleration caused by unfavorable road conditions may lead to rear-end collisions. The severity of the crash is also related to the number of vehicles involved, with more severe crashes involving a greater number of vehicles. For instance, crashes involving trucks or buses can have a severe impact due to the size and weight of these vehicles, causing damage to multiple vehicles [44–46]. It is also worth mentioning that TC belongs to the crash characteristic category.
After TC, AS and AADT were, in that order, the most influential parameters affecting the NVIC. Both of these factors belong to the traffic flow characteristics category. In summary, it can be inferred that road crashes in the rural area of Cosenza are attributable to a combination of factors, including human behavior, vehicle attributes, and road infrastructure. Reducing the number of crashes on rural routes in Cosenza requires a multifaceted approach that encompasses improvements to road infrastructure, greater public awareness of safe driving practices, and rigorous enforcement of traffic regulations. Through these measures, it is feasible to enhance road safety and reduce the number of crashes in the rural region of Cosenza.
In this research, the fact that the impact of LO (location) is lower than that of the other input parameters shows that geographical location has less influence on the number of vehicles involved in collisions. This might be the result of the particular circumstances on Calabrian rural roads. Knowing the LO conditions might assist in refining models and enable them to concentrate on the most important factors for increasing road safety in future plans for the development of the southern Italian road network.
Although the LVQ 2.1 algorithm can be used for classification analysis and provides a dependable method for predicting the NVIC, it has certain limitations. One of the most significant is the inability of the algorithm to process incomplete datasets. Furthermore, the specific model developed with LVQ 2.1 in this study is not directly transferable to other case studies because of the distinct nature of the structures involved. Therefore, it is suggested that this classification framework be used in future research in other regions, with the input parameters adapted to the data available there, and that the results be compared with those of this research.

Conclusions
Road safety, defined as the absence of crashes that result in injuries or property damage, is an essential part of transportation engineering. The costs and risks of neglecting road safety may be high and long-lasting. Therefore, a solid understanding of road safety is crucial. One of the major parameters for evaluating the severity of a crash is the NVIC. To estimate the number of vehicles that will be involved in a crash, this study used a classification-based approach. From the available data, seven parameters from four data categories were used. The predictive model was then built with the LVQ 2.1 algorithm, using 564 crash records from rural roads in Calabria. The developed LVQ 2.1 model achieved accuracies of about 82.5% and 82.3% in training and testing, respectively, which shows that it can be considered a classification prediction model with acceptable accuracy for issues related to road safety. A sensitivity analysis was also performed on the predicted results and compared with the previous literature. This analysis confirmed the findings of prior research by showing that the TC and LO had the greatest and least influence, respectively, on the number of vehicles involved in crashes. This study's results also backed up previous studies in this field by highlighting the importance of human behavior in crash causation. Therefore, it is recommended that groups concerned with road safety not only work to improve rural road conditions, but also create a complete strategy for raising awareness and assessing drivers' abilities. Subsequent investigations may benefit from deep learning algorithms, which have significant capabilities for building models and analyzing complex datasets. Furthermore, it is recommended that future studies consider a comprehensive dataset comprising diverse variables that could potentially affect the frequency of crashes.

Figure 1. An overview of the LVQ network.

Figure 2. Rural road accident map in the province of Cosenza (Italy) for the years 2017 and 2018.

Figure 3. The simplest possible form of a confusion matrix.
The ranking values clearly show the model performance; higher rankings indicate better results. The top-ranked models are Model 15 and Model 11, which combine high accuracy with good generalization. Model 15 (30 epochs, 30 NNHL) achieves high accuracy in both the training and testing stages, and its design enables it to learn and generalize from the data efficiently. This model correctly classified 81.4% of all data. Model 11 (20 epochs, 30 NNHL), a strong contender for dependable predictions, consistently ranks well in both training and testing. Model 1 (5 epochs, 10 NNHL) performs poorly in both training and testing. Model 2 (5 epochs, 20 NNHL) is similarly low-ranked and performs badly in each phase, suggesting that either more training or a different configuration is required.

Figure 4. The confusion matrix results for the 15th developed model for training (a), validation (b), testing (c), and the total dataset (d).

Figure 5. The ROC curve results for the 15th developed model for training (a), validation (b), testing (c), and the total dataset (d).

AI 2024, 5

Figure 6. A comparison of the LVQ model's sensitivity analysis results with previous studies.

Figure 7. Comparison between the accuracy results of the LVQ 2.1 model and prior research.

Table 1. Quantitative and qualitative factors serving as independent variables.

Table 2. The models' accuracy in training and testing with different control parameters.

Table 3. Ranking of models based on their accuracy in training and testing.