The variability of urban safety performance functions for different road elements: an Italian case study

Urban safety performance functions are used to predict crash frequencies, mostly based on Negative Binomial (NB) count models. They could be differentiated for considering homogeneous subsets of segments/intersections and different predictors. The main research questions concerned: a) finding the best possible subsets for segments and intersections for safety modelling, by discussing the related problems and inquiring into the variability of predictors within the subsets; b) comparing the modelling results with the existing literature to highlight common trends and/or main differences; c) assessing the importance of additional crash predictors, besides traditional variables. In the context of a National research project, traffic volumes, geometric, control and additional variables were collected for road segments and intersections in the City of Bari, Italy, with 1500 fatal+injury related crashes (2012–2016). Six NB models were developed for: one/two-way homogeneous segments, three/four-legged, signalized/unsignalized intersections. Crash predictors greatly vary within the different subsets considered. The effect of vertical signs on minor roads/driveways, critical sight distance, cycle crossings, pavement/markings maintenance was specifically discussed. Some common trends but also differences in both types and effect of crash predictors were found by comparing results with literature. The disaggregation of urban crash prediction models by considering different subsets of segments and intersections helps in revealing the specific influence of some predictors. Local characteristics may influence the relationships between well-established crash predictors and crash frequencies. A significant part of the urban crash frequency variability remains unexplained, thus encouraging research on this topic.


Introduction
The use of Safety Performance Functions (SPFs) is crucial for road safety purposes. Several functions were developed for rural and urban roads [1, 12-14, 23, 26]. Few of these studies were conducted in Europe, especially for urban areas. While functions may be calibrated for being applied in other countries/regions [1,10], their transferability is not without issues [9,24].
Different aspects related to driving behaviour, cultural and geographic variables [1] may affect model transferability. Transferability issues may be addressed by applying a locally derived calibration factor. However, the effect of some variables (e.g., traffic volumes, geometric characteristics) may depend on the geographic context; thus, a single calibration factor may not solve transferability issues [4,10]. In fact, the reliability and variability of calibration factors with geographic and road-related variables should be studied in detail (see e.g., [15]). Another option consists in estimating local SPFs, which may be particularly important in countries where they are scarcely used [5,6] or where the transferability of foreign models has been shown to be questionable (such as transferring HSM SPFs to the Italian urban environment, see [2]). Some European urban predictive models were developed, e.g., for Danish arterial segments and intersections [13], Portuguese intersections [12], and Italian segments and intersections [3,8]. Some other studies focused on specific crashes, such as those involving vulnerable users (e.g., [16]). However, some of these studies are old, limited to specific road elements (e.g., roundabouts, segments, or intersections), and/or considered a limited set of predictors. In parallel, some other recent studies focused on developing macro-level SPFs [18,22], including high-level variables not specifically related to segments and intersections.

Research questions
Given the presented background, this study is based on the following research questions, which are intended to contribute to the existing body of research:
a) systematically explore the crash performances of both urban segments and intersections, with the related influential variables, thus searching for the best subsets of segments and intersections with homogeneous characteristics for modelling purposes, among different possibilities;
b) compare the significant predictors highlighted in the modelling stages with the significant crash predictors retrieved in previous research, to reveal specific local differences which may be of interest for further studies;
c) explore the influence of several other potential crash predictors, which are usually not considered in safety prediction studies, besides the traditional geometric and traffic control variables used in previous research.
Note that the article is not focused on assessing the optimal model and functional form for urban safety predictions, since the research questions reported above are explored in the context of the application of NB count models, which are best practice for urban safety predictions (e.g., [12,23,26]). For this reason, it is important to stress the exploratory and research purpose of this study, specifically with regard to the possible variability of predictors across the different subsets of segments and intersections, by highlighting similarities and differences in the type and the effect of predictors across the different subsets of road elements. This purpose is supported by the scarcity of this type of analysis in previous research, especially in the European context.

Main dataset
In the context of the Pa.S.S.S. (Scientific Park for Road Safety) National research project (main agency: City of Bari, granted by the Italian Ministry of Transport and Infrastructures), the City of Bari (Italy) was chosen for data collection.
Fatal and injury crash data were collected for the period 2012-2016. The crash records include generic information (e.g., date, hour), exact location, information about the vehicles and persons involved, crash type and circumstances, and road-related variables.
Available traffic data from the City of Bari were coupled with crash data on the main interconnected urban network within the considered urban area (see Fig. 1). Afterwards, weekday peak-hour traffic counts were manually conducted (during 2018-2019, then converted into average daily volumes) to fill gaps in the data obtained and to check for inconsistencies due to old traffic volumes and the opening of new roads. Traffic volumes were assumed to be constant over the period 2012-2019, consistently with average traffic volume trends in Southern Italy.

Samples of sites
The selected network was further divided into segments and intersections. Crashes were then linked to each segment and intersection identified for this research. To verify the exact location of crashes on segments and intersections, each crash was preliminarily localized on the map based on both its geographic coordinates and the textual description of the segment or intersection in which it occurred, as reported in the dataset. Afterwards, the following information present in the dataset was further analysed: crash location coded as road segment or intersection; crash type; crash circumstances, such as the regular/illegal manoeuvre that the driver was undertaking. That information was manually matched with the geographic coordinates, the presence of stop/yield lines or zebra crossings and the descriptions of segments and intersections, to determine, one crash at a time, the appropriate location of the crash. This approach was also deemed to reduce the location bias in segments and intersections, particularly related to the distance between the initial crash event and the point where the vehicle comes to rest (e.g., two vehicles hitting each other and one of the vehicles ends up 50 ft. down the road on the sidewalk). However, the issue related to the location during the crash process is more critical for collision types such as run-off-the-road and sideswipe crashes, which were not frequent in the dataset analysed. Manual data explorations were preferred to predefined distance-based thresholds, since the latter could present some arbitrariness and may depend on the specific local context. Traffic volumes were divided into volumes on the main and the secondary entering roads. Segments were divided into "homogeneous" segments (Fig. 2), by considering internal geometric or traffic control differences. Four hundred forty-seven road "sites" were initially investigated: 325 homogeneous segments and 122 intersections.
The sample sizes for the various models were considered good enough for this study (given the study objectives), although the dispersion parameter of the NB models could be mis-estimated [19,21].

Crash predictors
Several crash predictors were considered, most of them derived from ad-hoc inspections and/or online sources. For the sake of comparison with similar European models (taken from a previous literature review [6]), the main variables considered in Greibe [13] and Gomes et al. [12] were used. Lengths, speed limits, paved widths, minor roads/driveways, parking and land use were collected for segments, while the number/width of intersecting road lanes, medians, turning lanes and number of one-way legs were collected for intersections. Other variables were also considered, such as sidewalks [3], vertical signs on minor roads/driveways, maintenance of pavements (poor when showing issues such as those in the examples of Fig. 3, good when those issues are absent) and markings (visually inspected), cycle paths, bus stops, reserved lanes, and critical sight distance at intersections. The "critical" sight distance is considered in this article as the minimum available sight distance measured on all the intersecting legs of a road intersection, considering the obstacles on the roadside. The selected variables and associated descriptive statistics are listed in Table 1.
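The definition of the "critical" sight distance given above can be expressed as a one-line computation. The following is only a minimal sketch: the function name and the example values are hypothetical and are not taken from the dataset.

```python
def critical_sight_distance(leg_sight_distances_m):
    """Return the minimum available sight distance (m) over all intersecting legs."""
    if not leg_sight_distances_m:
        raise ValueError("at least one leg is required")
    return min(leg_sight_distances_m)


# Hypothetical four-legged intersection: available sight distance per leg, in metres
# (invented values, for illustration only).
example_legs_m = [42.0, 35.5, 60.0, 28.0]
print(critical_sight_distance(example_legs_m))  # 28.0
```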

Data analysis techniques
Negative Binomial (NB) count data models were used to link crash frequencies to predictors. These models can account for the over-dispersion of crash data [20,21] and they were used in similar studies (e.g., [12,23,26]). NB models were estimated in R (MASS library [25]). In the model equations, the symbols are defined as follows:
AADT = Annual Average Daily Traffic for segments;
AADT_maj = AADT for the major intersecting road (carrying the highest amount of traffic);
AADT_min = AADT for the minor intersecting road (carrying the lowest amount of traffic). Note that attempts at estimating separate coefficients for the major and minor traffic volumes were made, which however indicated the functional form in Eq. 2 as the most appropriate for the dataset;
L = segment length (m);
X_{i,S} = other predictors for segments (numerical or categorical; categorical variables are transformed into binary dummy variables with modalities 0 and 1, 0 being the reference modality);
X_{i,I} = other predictors for intersections (numerical or categorical; categorical variables are transformed into binary dummy variables with modalities 0 and 1, 0 being the reference modality);
β_{i,S} = estimate of the coefficient associated with each crash predictor for segments through maximum likelihood estimation (β_{0,S} is the estimate for the intercept);
β_{i,I} = estimate of the coefficient associated with each crash predictor for intersections through maximum likelihood estimation (β_{0,I} is the estimate for the intercept).
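To make the role of these symbols concrete, the sketch below evaluates a log-linear safety performance function for a segment and for an intersection, and the NB variance implied by a dispersion parameter theta. The functional forms are an assumption made for illustration (traffic volume and length entering as power terms, the main-to-total AADT ratio and the other predictors entering the exponent); they are not necessarily the exact forms of the equations estimated in the study. All function names and coefficient values are hypothetical.

```python
import math


def predict_segment_crashes(aadt, length_m, b0, b_aadt, b_len, other=()):
    """Expected crash frequency for a segment under an ASSUMED log-linear form.

    `other` is an iterable of (coefficient, value) pairs; categorical
    predictors are passed as 0/1 dummies (0 = reference modality).
    """
    linear = b0 + b_aadt * math.log(aadt) + b_len * math.log(length_m)
    linear += sum(beta * x for beta, x in other)
    return math.exp(linear)


def predict_intersection_crashes(aadt_maj, aadt_min, b0, b_tot, b_ratio, other=()):
    """Expected crash frequency for an intersection (ASSUMED form).

    Uses the total entering volume and the main-to-total AADT ratio.
    """
    total = aadt_maj + aadt_min
    ratio = aadt_maj / total
    linear = b0 + b_tot * math.log(total) + b_ratio * ratio
    linear += sum(beta * x for beta, x in other)
    return math.exp(linear)


def nb_variance(mu, theta):
    """NB variance with the parameterization used by R's MASS::glm.nb."""
    return mu + mu ** 2 / theta


# Hypothetical coefficients, invented for illustration only.
mu = predict_segment_crashes(aadt=8000, length_m=150, b0=-6.0,
                             b_aadt=0.4, b_len=0.5, other=[(0.3, 1)])
print(mu, nb_variance(mu, theta=3.0))
```

With a 0/1 dummy, switching the predictor from the reference modality to 1 multiplies the expected crash frequency by the exponential of its coefficient, which is how the relative factors discussed later in the article can be read.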
One of the research questions concerned the most appropriate way of disaggregating segments and intersections into subsets. Hence: a) preliminary models for the whole datasets of segments and intersections were run; b) two sub-categories for each family of sites (segments and intersections) were selected based on the results from the preliminary models; c) models for each sub-category were run.
Disaggregating the dataset for research purposes results in reducing the initial sample size. The chosen level of significance was then set to p = 0.10, given the exploratory purposes and the limited dataset (similarly to e.g., [12]). Injury severity modelling was not considered due to the scarce number of fatal crashes and the absence of injury scales (e.g., slight/serious/incapacitating) in the dataset. The Akaike Information Criterion (AIC) was used to comparatively assess different models and the Nagelkerke R 2 as a goodness-of-fit measure. In general, for each subset, the model having the least number of all significant variables included among different candidate best fitting models was selected. Results from each model obtained were compared to the corresponding null and full models through likelihood ratio tests.
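The two summary measures mentioned above can be computed directly from model log-likelihoods. The sketch below uses the standard definitions of the AIC and of the Nagelkerke R² (the Cox-Snell R² rescaled by its maximum attainable value); the function names and the numerical inputs are hypothetical and do not reproduce any model of this study.

```python
import math


def aic(log_lik, n_params):
    """Akaike Information Criterion; for an NB model, n_params would
    include the dispersion parameter as well as the coefficients."""
    return 2 * n_params - 2 * log_lik


def nagelkerke_r2(log_lik_null, log_lik_model, n_obs):
    """Nagelkerke R2: Cox-Snell R2 divided by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(2.0 * (log_lik_null - log_lik_model) / n_obs)
    max_cox_snell = 1.0 - math.exp(2.0 * log_lik_null / n_obs)
    return cox_snell / max_cox_snell


# Hypothetical log-likelihoods for a null and a fitted model on 120 sites.
print(aic(-230.0, 6))
print(nagelkerke_r2(-250.0, -230.0, 120))
```

Lower AIC values indicate a better trade-off between fit and number of parameters among candidate models fitted to the same data.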
Notes to Table 1: five segments were discarded from the initial dataset because their length was less than 30 m, which was deemed irrelevant for safety modelling purposes. Two intersections were discarded from the initial dataset due to possible errors in the counts of traffic volumes, leading to unrealistic data. The sample of intersections includes one five-legged intersection.

Results

Predictive models for segments
A model was firstly developed for the whole dataset of segments.
The main interest in this stage was to understand if the initial dataset can be efficiently differentiated into subsets. Among the different attempts performed, the explanatory power of the variable "Type of lanes" seems to be promising for a one-way/two-way classification. On the other hand, the variable "One-lane" (one-lane or multilane segments) is never found to be a statistically significant predictor. Hence, the dataset was divided into one-way and two-way segments, rather than one-lane and multilane segments. The possible classification into undivided/divided segments was not considered, since the segments divided by medians (for all their length) were first divided into two one-way segments (one for each direction), as directional traffic counts were generally available.
The model developed for all the segments is reported in Table 2. Crash frequencies increase with traffic volumes, segment length, two-way segments (with respect to one-way segments), presence of vertical signs on intersecting minor roads, parking on both sides (with respect to no parking). Poor pavement maintenance is associated with a decrease in the crash frequency. However, the included predictors can only explain a limited part of crash frequency, as based on the Nagelkerke R 2 value.
When differentiating into one-way and two-way segments, some predictors are confirmed, while others emerge as well. For one-way segments, crash frequencies increase with traffic volumes, lengths, number of driveways/minor roads, and vertical signs on minor roads/driveways, while they decrease with poor pavement maintenance. For two-way segments, crash frequencies increase with lengths, traffic volume (not significantly), parking (especially on both sides compared to prohibited parking), visible markings (although marginally significant at the 5% level), and vertical signs on minor roads/driveways. Note that the model which includes traffic volume was selected among other possibilities to avoid further worsening the limited model fit (the traffic volume coefficient is significant, p < 0.10, if traffic is the only predictor).

Predictive models for intersections
A model was firstly developed for the whole dataset of intersections.
The main interest in this stage was to understand if the initial dataset can be efficiently differentiated into subsets. In this case, two promising models for the whole set of intersections were selected (see Table 3). The first model indicates the number of legs as an important explanatory variable. However, when trying to exclude all other possible correlated variables (turning lanes, number of legs, intersection control), the variable signalized/unsignalized assumes a notable importance in the alternative model. Hence, based on this, specific models were developed for two pairs of subsets: three-legged and four-legged intersections, signalized and unsignalized intersections. The consideration of the signalized/unsignalized subsets can be important for practical use. Another choice was made between considering the main and the secondary traffic volume (separately) or the total volume and the main-to-total volume ratio. The second alternative has generally led to a better goodness-of-fit.
Based on the overall models for intersections, crash frequencies increase with the total volume, the four-legged configuration, traffic signals, specialized turning lanes/cycle paths (first model in Table 3), and critical sight distance (alternative model in Table 3). Crash frequencies decrease with the main-to-total AADT ratio (thus, the main AADT being equal, the higher the secondary AADT, the higher the crash frequency) and with poor pavement maintenance. The predictors can explain the crash frequency better than in the segments case, based on the Nagelkerke R2 values.
For three-legged intersections, sight distance, turning lanes and cycle path crossing are confirmed as predictors (similar coefficients). The presence of traffic lights does not seem to be influential (except for traffic lights with dedicated turning lights). Moreover, more entering lanes (main road) results in a decrease of crashes (p < 0.10).
For four-legged intersections, sight distance and turning lanes are confirmed as significant predictors, while bicycle crossings are not. Traffic lights do not seem influential, while poor pavement maintenance is associated with a decrease in crashes.
For signalized intersections (highest R2), four-legged intersections are comparatively less safe than three-legged intersections. As the critical sight distance increases, the crash frequency increases (similarly to three/four-legged intersections). Specialized turning lanes and poor pavement maintenance are confirmed, with positive and negative coefficients, respectively. Bus stops close to signalized intersections are associated with a decrease in crashes.
For unsignalized intersections, other predictors are associated with a decrease in crashes, besides those already mentioned: median on the main road (p < 0.10) and sidewalks.

Subsets of road sites and associated predictors
The predictive models for urban segments were stratified into one-way and two-way models, since significant differences between these two conditions were found (overall model in Table 2), differently from Greibe [13], who did not include the variable one-way/two-way. Two-way segments are less safe than one-way segments, other conditions being equal, while significant differences between one-lane and multilane segments were not highlighted. Moreover, since the variable "Type of lanes" was not included in the disaggregated models, the organization of one-way segments in one or more lanes does not seem influential on crash risk. However, on average, the sampled one-way roads are about 10 m wide; thus, they could be practically operated as two-lane roads, even if they are single-lane roads. In the two-way segments model, significant differences between one lane and more lanes per direction were not highlighted either. However, in the overall segments model, the multilane two-way segments seem slightly less safe than two-way two-lane segments (exp(β_Typeoflanes3) / exp(β_Typeoflanes2) = 1.20), consistently with AASHTO [1]. On the other hand, the predictive models for urban intersections were stratified into signalized/unsignalized models and three-legged/four-legged models. In fact, significant differences were found between both categories. As expected from the high number of conflicts, four-legged intersections are comparatively less safe than three-legged intersections, by a factor of e^0.443 = 1.557, which coincides with the results by Khattak et al. [17] for unsignalized intersections. The effect of signals on intersections is less clear, as in Gomes et al. [12], who developed three/four-legged models in which the traffic signal variable was insignificant. However, very disaggregated intersection subsets were considered by Canale et al. [3]: three/four-legged no-control/stop-controlled and four-legged signalized intersections.
In this study, differences in predictors were found by separating unsignalized from signalized intersections. Moreover, contrary to expectations, signalized intersections seem comparatively slightly less safe than unsignalized intersections (factor of e^0.268 = 1.307, alternative model), ceteris paribus. Four-legged intersections are comparatively less safe than three-legged intersections, especially for unsignalized intersections, as expected. Note also that give-way/stop-controlled three-legged intersections seem even less safe than no-control intersections, ceteris paribus. Specialized turning lanes seem consistently negative for safety, consistently with Gomes et al. [12] in the case of right turns on the major road. This could be explained by: a) the turning lanes variable being a surrogate measure for total conflicts, b) aberrant driving behaviours causing additional conflicts. Moreover, Canale et al. [3] found mixed results for left/right turning lanes according to the intersection type.
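The relative safety factors quoted above follow from exponentiating the estimated coefficient of a 0/1 dummy variable in a log-link count model. A minimal sketch of this arithmetic, using the two coefficients reported in the text (the function name is hypothetical):

```python
import math


def relative_factor(coefficient):
    """Multiplicative change in expected crash frequency when a 0/1
    dummy predictor switches from its reference modality (0) to 1."""
    return math.exp(coefficient)


# Coefficients quoted in the text.
print(round(relative_factor(0.443), 3))  # 1.557 (four-legged vs three-legged)
print(round(relative_factor(0.268), 3))  # 1.307 (signalized vs unsignalized)
```

A factor of 1.557 corresponds to an expected crash frequency about 56% higher than the reference modality, other conditions being equal.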
Models for different subsets of urban intersections and segments are extremely useful for identifying predictors which are specifically related to some subsets only. For example, an increasing number of intersecting minor roads on segments is generally associated with an increase in crashes [8,13]. In this study, this variable is significant only for one-way segments, which is an important difference. In fact, one-way roads (especially if wide, as in this dataset) may allow high speeds. Moreover, drivers do not need to pay attention to other vehicles possibly crossing the travel direction from the other lane, as on two-way undivided roads. Another difference relates to parking, which is generally associated with an increase in crashes with respect to rarely used/prohibited parking [13]. In this study, this effect was found only for two-way segments (especially for parking on both sides, as expected due to the increased conflicts). Parking-related conflicts may be even more unexpected than those at known minor roads (which seem less influential on two-way segments), so that drivers may not react in time. The traffic volume coefficient indicates a slower than linear increasing tendency for both subsets of segments; it is insignificant (close to zero) in the case of two-way segments (for which the sample size is very limited). Hence, urban congestion seems more detrimental to the safety of one-way than of two-way segments. Moreover, note that the average segment length lies between 100 m and 200 m. This could explain the slower than linear increasing tendency of crashes with traffic, since several crashes on short segments may be influenced by the presence of intersections in the case of high traffic volumes. Note that speed limits, road width and land use were not included in the segment models, differently from Greibe [13]. However, note that speed limits are almost always equal to 50 km/h and land use is largely homogeneous in the central city area (Fig. 1).
As regards intersections, the coefficients estimated for traffic volumes (and main-to-total ratio) are approximately similar between subsets. The relationship between crash frequency and the total traffic volume is less than linear (all coefficients are less than 0.5, except for the unsignalized intersections, for which it is about 0.75). An almost linear relationship was found instead by Giuffrè et al. [11] in a model developed for Italian four-legged unsignalized intersections, in which however traffic volume ratios between secondary and main roads were not included. Main and secondary volumes were separately considered in the NB models developed by Khattak et al. [17], with estimated coefficients generally less than 0.5. In this study, when the main-to-total AADT ratio increases by 10%, crashes decrease by a factor of about 5.1 and 1.7 for three-legged and four-legged intersections, respectively, ceteris paribus. Considering the models disaggregated according to traffic signals, crashes decrease by a factor of about 3 and 2.2 for signalized and unsignalized intersections, respectively. Hence, especially for three-legged and signalized intersections, as the secondary traffic becomes similar to the main traffic, crashes may increase. This can be clearly explained by the increase in the number of angular conflicts due to similar intersecting traffic flows, and this was highlighted as particularly relevant for three-legged intersections. This is in contrast with the results obtained by Gomes et al. [12], based on which an opposite tendency for three-legged intersections was noted (even if weaker). In another study [17], the increase in the number of lanes on the secondary road was associated with an increase in crash frequency for unsignalized intersections, which is a similar effect to that observed for the main-to-total AADT ratio in this study. The commonly considered predictor "median on the main road" is significant (p < 0.10) and positive for safety, but only for unsignalized intersections. This is in line with previous results specific to three-legged intersections [12], in particular stop-controlled ones [3]. In fact, medians may help in channelizing the traffic flow.
[Table residue: over-dispersion parameter (theta) = 3.08 (std. error: 0.71); over-dispersion parameter (theta) = 8.11 (std. error: 6.14); column header "Interval of continuous predictors".]
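The "less than linear" relationship can be illustrated numerically. Assuming that the total traffic volume enters the model as a power term (volume raised to the estimated coefficient, as in a log-link model with log-transformed volume), the sketch below shows how expected crashes scale when traffic doubles. The exponents 0.5 and 0.75 are the approximate values mentioned in the text; the function name is hypothetical.

```python
def traffic_scaling_factor(volume_multiplier, beta):
    """Multiplicative change in expected crashes when the traffic volume is
    multiplied by `volume_multiplier`, under an ASSUMED power-law volume term."""
    return volume_multiplier ** beta


# Doubling the total entering volume:
print(round(traffic_scaling_factor(2.0, 0.5), 2))   # 1.41, less than linear
print(round(traffic_scaling_factor(2.0, 0.75), 2))  # 1.68, closer to linear
print(traffic_scaling_factor(2.0, 1.0))             # 2.0, exactly linear
```

With an exponent below 1, each additional vehicle adds fewer expected crashes than the previous one, so the crash rate per vehicle decreases as traffic grows.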
Moreover, road markings were previously found as negative for safety for three-legged stop-control intersections [3], such as here for two-way segments. Other predictors such as the lane balance, the number of intersecting one-way legs [12], two-way operated major roads and the intersecting lane widths [3] were not confirmed here. The increase in entering lanes on the main road was weakly associated with a decrease in crash frequency in three-legged intersections only, while an opposite trend was found by Giuffrè et al. [11], even for four-legged intersections.

Assessment of additional variables
Some additional variables, not often considered for safety predictions, were considered in this study and some of them were actually included in the models. However, most of them show some unexpected trends, which may seem surprising at first glance. For example, the presence of vertical signs on driveways/minor roads seems to be detrimental to safety. However, the number of driveways/minor roads provided with vertical signs in the sample is scarce (Table 1). Hence, this variable could be a surrogate measure for the importance of the driveway/minor road (i.e., considering that very-low-volume driveways are unlikely to have vertical signs), indicating that different driveways/minor roads may have variable effects on safety.
Maintenance-related variables (markings/pavement) are also worth mentioning. It seems counterintuitive that deteriorated pavements may be positive for safety (one-way segments, four-legged/signalized intersections), and that well-maintained markings may be negative (two-way segments). This could be explained by drivers being more cautious and driving at lower speeds on poorly maintained pavements. However, a temporal displacement exists between the visual observations (mainly during 2018) and the crash observation period (2012-2016). Hence, pavements (and markings) in good condition in 2018 could have been resurfaced in recent years, and vice versa. Thus, the estimated coefficient could also hide an opposite trend. However, the first explanation (prudent drivers) could be more likely. In fact, an exploratory analysis of the dataset revealed that poor pavement conditions are more frequent, on average, on segments and intersections with higher traffic volumes than on segments and intersections showing good pavement conditions. This relationship is expected, since the presence of cracking, potholes or similar damage is more likely to be observed on high-traffic roads. However, this also means that, even if resurfacing can be more frequent for this road type (i.e., every 5 to 10 years), some pavement damage is more likely to be observed shortly after resurfacing. Hence, it is likely that most roads with poor pavement conditions were not in optimal condition in the crash observation period either. However, this phenomenon should be further considered in dedicated studies on the topic, given the revealed importance of pavement conditions in urban crash prediction.
The computed critical sight distance (the least value among all the intersecting legs) needs cautious interpretations as well. It was included in the models for three, four-legged and signalized intersections with similar positive coefficients. This can be explained by less cautious drivers (e.g., prone to speeding, see [4]) when having more available sight distance, especially at signalized intersections. In fact, while sight distance is an important design pre-requisite; a longer sight distance could have led in these cases to a false sense of increased safety and possible aberrant behaviours, in specific conditions such as running the red light. Manual explorations of crash circumstances at signalized intersections in the dataset seem to confirm this possibility.
Moreover, none of the segments/intersections in the dataset were originally designed with cycle paths/crossings, which were only recently implemented. Hence, the finding that bicycle crossings are associated with an increase in crashes for three-legged and unsignalized intersections may indicate that such conflicts should be mitigated (especially at unsignalized intersections), e.g., by effective traffic calming measures, which are generally not present in the network studied. Moreover, bus stops close to signalized intersections seem to be positive for safety. This could be explained by drivers being forced to slow down by the combined presence of bus stops and intersections.
In general, the additional variables which were included in most of the developed models and which depend on ad-hoc geometric constructions or on-site investigations are the critical sight distance and the pavement conditions assessment. Given that the data preparation process was particularly time-consuming for these variables, an assessment of their importance in prediction was made, which is potentially useful for further research and practice. Hence, additional likelihood ratio tests and other qualitative assessments of the importance of predictors were carried out for these two variables.
In particular, a likelihood ratio test (LRT) was conducted to determine if the inclusion of each of these two variables significantly improves the prediction. All the final models selected for presentation in the article, which include critical sight distance and/or pavement condition were individually compared, step by step, with the same models deprived of one of these two variables, where present. In all cases, the inclusion of pavement and visibility-related variables leads to a prediction improvement at the 5% level of significance (except for the inclusion of sight distance in the signalized intersection model, significant at the 10% level). Moreover, an additional qualitative analysis was carried out, by analysing the decrease in the Nagelkerke R-squared caused by the removal of each variable included in the finally selected model presented in the article. Through this analysis, it was possible to note that the pavement conditions may explain a significant portion of crash variability, in particular for signalized intersections and segments. Instead, the portion of crash variability explained by the critical sight distance seems limited in all models. Hence, both variables could be considered for further safety prediction studies in urban environment, in particular pavement conditions.
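The step-by-step comparison described above can be sketched as a likelihood ratio test between two nested models that differ by a single parameter. This is a minimal illustration, assuming one degree of freedom (one numerical or binary variable removed), for which the chi-square tail probability has a closed form through the complementary error function. The function name and the log-likelihood values are hypothetical and are not results of this study.

```python
import math


def lrt_one_df(log_lik_reduced, log_lik_full):
    """Likelihood ratio test for nested models differing by ONE parameter.

    Returns (statistic, p_value). For a chi-square distribution with
    1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    statistic = max(2.0 * (log_lik_full - log_lik_reduced), 0.0)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical example: a model with and without one predictor.
stat, p = lrt_one_df(log_lik_reduced=-233.1, log_lik_full=-230.0)
print(round(stat, 2), round(p, 4))
```

A p-value below the chosen significance level suggests that including the variable improves the fit beyond what would be expected by chance.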

Conclusions
Safety performance functions for urban segments and intersections were estimated. The research aims of this study were dedicated to: a) explore possible subsets of segments and intersections for crash modelling, considering the predictors variability, b) find common trends and/or significant differences from the relevant literature, c) assess additional predictors often not considered for crash modelling.
The optimal subsets found are: one-way and two-way segments for the homogeneous segments; three-legged, four-legged, unsignalized and signalized intersections for the intersections. The division into three-legged/four-legged intersections seems the most effective, compared to the signalized/unsignalized division. In contrast, significant differences were not highlighted for the number of lanes on one-way and two-way segments. Predictors of intersection crashes share both commonalities and differences with similar studies [3,11,17]. Nevertheless, the segment model is largely different from the relevant reference study analysed [13].
Some additional predictors often not included in prediction models were found as statistically significant. The effect of pavements/markings maintenance, critical sight distance at intersections, vertical signs on driveways/ minor roads, cycle path crossings was discussed in detail. Their influence on crash predictions was demonstrated, even if requiring some additional explanations. These variables may be used in further urban safety studies.
The main aim of this article was to explore and discuss the variability of predictors across the different subsets of segments and intersections. However, the results shown in this study could be used for safety predictions in the same area in which data were collected, and should not be directly used in other jurisdictions. This process should be carefully conducted, especially if the functions estimated for disaggregated subsets are used (mainly developed for explorative purposes and based on relatively small sample sizes). Moreover, the transferability of models to other contexts may be challenging, as for all road safety prediction models, given their high dependency on the local road, environmental and policy context and on local driving behaviour. Some similarities were shown with another European (Portuguese) study [12], providing ground for the possible transferability of the intersection models to other Italian/European cities with similar configurations. However, as stated earlier, calibrating models and/or developing local models should always be preferable.
Besides practical aspects, this study provides new insights into the problems and consequences of dividing urban intersections and segments into possible subsets, and into enlarging the set of candidate crash predictors. Clearly, this study is based on a limited number of segments and intersections, with a small number of crashes for some subsets (especially intersections), which may negatively influence crash predictions [19]. Moreover, the effect of pavement conditions on urban crash frequency in particular needs further investigation, given the issues described in this article. Further data collection is currently in progress within the research project, which could help to enlarge the datasets and to validate models/variables. Roundabouts were not considered, since only a few roundabouts were present during the observation period. In some models, the explained variability of crash frequencies is somewhat small. Thus, there are several other variables which may be considered. A first attempt to enlarge the set of variables was conducted here, but further research is surely needed.