Are collision and crossing course surrogate safety indicators transferable? A probability based approach using extreme value theory

In order to overcome the shortcomings of crash data a number of surrogate measures of safety have been developed and proposed by various researchers. One of the most widely used temporal indicators is time-to-collision (TTC) which requires the road users to be on a collision course. Road users that are strictly speaking not on a collision course actually might behave and take evasive actions as if they were, thus indicating that such near-miss situations might also be relevant for safety analysis. Taking that into account, a more flexible indicator T 2, which does not require the two vehicles to be on a collision course, describes the expected time for the second road user to arrive at the conflict point. Recently extreme value theory (EVT) offering two approaches, block maxima (BM) and Peak over Threshold (POT), has been applied in combination with surrogate indicators to estimate crash probabilities. Most of this research has focused on testing BM and POT as well as validating various surrogate safety indicators by comparing model estimates to actual crash frequencies. The comparison of collision course indicators with indicators including crossing course interactions and their performance using EVT has not been investigated yet. In this study we are seeking answers to under what conditions these indicators perform better and whether they are transferable. Using data gathered at a signalized intersection focusing on left-turning and straight moving vehicle interactions our analysis concluded that the two indicators are transferable with stricter threshold values for T 2 and that POT gives more reasonable results.


Introduction
In this section we provide a literature review on surrogate measures of safety focusing on two particular indicators time-to-collision (TTC) and T 2 , give an introduction to extreme value theory as well as specify the research gap and questions.

Role of surrogate measures of safety
To improve traffic safety and to make sure that it is done in an efficient way, one has to be able to quantify safety in order to support evidence-based policy making. The most plausible way to evaluate safety is investigating the occurrence and severity of crashes using historical data. This approach however has a number of limitations (Tarko et al., 2009): accidents are rare events (Hauer, 1997) associated with the random variation inherent in small numbers (Svensson and Hydén, 2006), at least 3 years of observations are needed (Nicholson, 1985) and thus safety analysis based on crash records is a reactive approach, accident records are prone to underreporting, and finally data quality is not always sufficient.
In order to overcome the above limitations the use of non-crash events has gained a lot of attention, especially due to the rapid improvement of sensing technologies facilitating the collection of trajectory data. Already 30 years ago Hydén (1987) pointed out that the interaction between road users can be described as a continuum of safety related events. In Hydén's safety pyramid crashes as the rarest events are followed by conflicts of different levels of severity (serious, slight and potential). These are critical events that do not result in a crash but are very close to that and can be used as surrogate safety measures (Tarko et al., 2009). Below the conflicts the majority of events are undisturbed passages or normal traffic processes (Laureshyn et al., 2010). This pyramid also shows how few and exceptional the accidents are on which we usually base our safety estimates (Svensson and Hydén, 2006).
With regard to the shape of the severity hierarchy Svensson (1998) made an important suggestion that it is not necessarily a pyramid. She proposed a diamond-shape based on the frequency of pedestrian-vehicle conflicts observed at signalized and unsignalized intersections. The idea behind the diamond shape is that at a particular site the majority of the interactions will be of moderate severity. Tarko (2012) also noted that this was the first evidence proving that there is a heterogeneity in the frequency-severity relationship due to the type of road facility influencing traffic conflicts. Other conditions, such as vehicle type, road users, collision angle and speed (Laureshyn et al., 2010) but also weather may affect this relationship.
It has to be noted, however, that Svensson limited the events in the hierarchy only to interactions with a collision course (Svensson, 1998; Svensson and Hydén, 2006). A very important implication of this is that even low-severity interactions should be utilized because they may carry useful safety information (Tarko, 2012). This statement is highly relevant as interactions with severe conflicts usually come with low frequencies.
In an attempt to apply Svensson's reasoning later on a few researchers tried to adapt it by broadening the concept of traditional approaches. In Canada St-Aubin et al. (2015) for instance developed an approach called probabilistic surrogate measures of safety (PSMS) with a more general framework for safety analysis considering all possible paths that may lead two road users to collide. The novelty of this approach is relaxing the traffic conflict by allowing a non-zero risk of collision for road users who are not on a collision course (Tarko, 2012).
Following the same reasoning Laureshyn et al. (2010) suggested a new indicator called T 2 broadening the concept of the most common nearness-to-collision surrogate measure of safety, time-to-collision (TTC). TTC can be calculated for any moment as long as the road users are on a collision course and defined as "the time until a collision between the vehicles would occur if they continued on their present course at the present rates" (Hayward, 1972). The lowest TTC value during the interaction, abbreviated as TTC min , is the most commonly used indicator. The supplementary indicator proposed by Laureshyn et al. (2010) measures the expected time that it takes for the second road user to arrive at the potential collision point, hence it is called T 2 . The logic behind this indicator is that TTC assumes the two road users to be on a collision course, which however sets a limitation to the situations to be considered in safety analysis. Laureshyn et al. (2017) argued that encounters without a collision course might have crash potential as well due to the possibility of minor changes in the spatial or temporal relationship between road users.
T 2 tells more about safety since the arrival at the potential collision point is the very last necessary condition for a collision to occur and it provides a smooth transfer between the collision course and crossing course situations (Laureshyn et al., 2017). T 2 assumes unchanged speeds and planned trajectories. If the road users are on a collision course, T 2 equals TTC. In the event that the two road users pass the conflict point with a time margin, T 2 reflects the maximum time available to take evasive actions and alleviate the severity of the situation. T 2 is no longer calculated after the first road user has left the conflict zone, since a crash is no longer possible (Laureshyn et al., 2017). T 2 is similar to TTC in the sense that it is also continuous and can therefore be calculated for any time instance. The last possible value is when the first road user leaves the potential conflict area (the same as post-encroachment time, PET). An alternative value is T 2min, which shows the moment when the two vehicles are closest in time. These two values can be different in case of significant speed changes.
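The definition of T 2 can be illustrated with a deliberately simplified sketch. It assumes a point-like conflict location, constant speeds and known remaining distances along the planned paths; the function names and the point-conflict simplification are illustrative assumptions, not the procedure implemented in T-Analyst.

```python
from typing import Optional


def arrival_time(distance_m: float, speed_mps: float) -> Optional[float]:
    """Time (s) to reach the conflict point at constant speed.

    Returns None when the road user is stationary or has already passed
    the conflict point (negative remaining distance).
    """
    if speed_mps <= 0 or distance_m < 0:
        return None
    return distance_m / speed_mps


def t2(dist_a_m: float, speed_a_mps: float,
       dist_b_m: float, speed_b_mps: float) -> Optional[float]:
    """Simplified T2: arrival time of the road user who arrives second.

    Assumption: the conflict zone is reduced to a single point, so vehicle
    dimensions are ignored. Returns None when T2 is undefined.
    """
    ta = arrival_time(dist_a_m, speed_a_mps)
    tb = arrival_time(dist_b_m, speed_b_mps)
    if ta is None or tb is None:
        return None
    return max(ta, tb)
```

With these assumptions, a left-turning vehicle 20 m from the conflict point and an oncoming vehicle 30 m away, both travelling at 10 m/s, give arrival times of 2 s and 3 s, so T 2 is 3 s. T 2min for an interaction would then be the smallest such value over all time instances.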
Researchers testing the validity of traffic conflicts have tried to link historical crash data with conflict frequencies. These analyses led to inconclusive results, as some studies could confirm a relationship and some could not. Tarek and Sany (1999) for instance arrived at the conclusion that there is a statistically significant relationship between crashes and conflicts. They identified a coefficient of determination (R²) in the range of 0.70-0.77 at signalized intersections. Notwithstanding, this approach is still hampered by the fact that accident data are inaccurate, thus finding a good correlation has limited power. Zheng et al. (2014) also emphasized that the application of regression models is limited due to three reasons:
• the incorporation of crash counts suffers from the same quality issues as traditional road safety analysis;
• the stability of crash-to-surrogate ratio is difficult to ensure especially when mixing surrogates of varied severity levels;
• the statistical relationship between counts of crashes and surrogates hardly reflects the physical nature of crash occurrence.
An alternative approach to the traditional regression analysis without using observed crash counts was first proposed by Songchitruksa and Tarko (2006) based on the extreme value theory (EVT). An important feature of the EVT is that it enables the researcher to model the stochastic behavior of unusually large or small processes. This extreme behavior is typically very rare and unobservable within a reasonable data collection time period. It often involves estimating the probability of extreme events over an extended period of time given very short and limited historical data (Songchitruksa and Tarko, 2006). The key assumption of EVT is that the underlying stochastic behavior of the process being modeled is sufficiently smooth to enable extrapolations to unobserved levels (Coles, 2001). A general introduction to EVT can be found in the next subsection.

Extreme value theory
Extreme value theory offers two approaches to sample extreme events, in this case near-crashes, the block maxima (BM) (or minima) using Generalized extreme value distribution (GEV) and the Peak over Threshold (POT) using Generalized Pareto distribution (GPD). In the former case the method divides the sample time into blocks of a certain length and samples the largest value (or r largest values) in each block, whereas in the latter case all peak values are sampled and the values over a certain threshold are used to model the extremes.
EVT models based on the block maxima approach focus on the behavior of
$$M_n = \max\{X_1, \ldots, X_n\}, \tag{1}$$
where $X_1, \ldots, X_n$ is a sequence of independent random variables having a common distribution function $F$, and $M_n$ represents the maximum of the process over $n$ time units of observation. The distribution of $M_n$ can be derived as $\Pr\{M_n \le z\} = \{F(z)\}^n$. Since $F$ is unknown, approximate models for $F^n$ are sought with an approach similar to the central limit theorem, by allowing a linear renormalization of the variable $M_n$ (Eq. (2)):
$$M_n^* = \frac{M_n - b_n}{a_n}, \tag{2}$$
where $\{a_n > 0\}$ and $\{b_n\}$ are sequences of constants for which the appropriate values have to be found. According to the extremal types theorem
$$\Pr\left\{\frac{M_n - b_n}{a_n} \le z\right\} \to G(z) \quad \text{as } n \to \infty, \tag{3}$$
where $G$ belongs to one of the three families: Gumbel, Frechet or Weibull. The rescaled sample maxima $M_n^*$ converge to a variable having a distribution within one of the above three families. All the three types have both a location ($b$) and a scale ($a$) parameter. The Frechet and Weibull distributions also have a shape ($\alpha$) parameter. These distributions can be generalized into a single distribution function (Eq. (4)):
$$G(z) = \exp\left\{-\left[1 + \xi\left(\frac{z - \mu}{\sigma}\right)\right]^{-1/\xi}\right\}, \tag{4}$$
defined on $\{z : 1 + \xi(z - \mu)/\sigma > 0\}$. The three parameters are the location parameter ($\mu$), the scale parameter ($\sigma$), and the shape parameter ($\xi$). The value of the shape parameter determines the type of the distribution: if $\xi > 0$, the model corresponds to a Frechet distribution; if $\xi < 0$, a Weibull distribution; and if $\xi = 0$ (interpreted as the limit $\xi \to 0$), a Gumbel distribution.
A. Borsos, et al. Accident Analysis and Prevention 143 (2020)
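As a small illustration of the generalized distribution referred to as Eq. (4), the following sketch evaluates the GEV distribution function in its standard textbook parameterization (Coles, 2001). The function name and the numerical tolerance for treating ξ as zero are illustrative choices.

```python
import math


def gev_cdf(z: float, mu: float, sigma: float, xi: float) -> float:
    """GEV distribution function G(z) with location mu, scale sigma, shape xi."""
    if sigma <= 0:
        raise ValueError("scale parameter sigma must be positive")
    s = (z - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel case, taken as the limit xi -> 0
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0:
        # Outside the support: below the lower bound (Frechet, xi > 0)
        # or above the upper bound (Weibull, xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))
```

Note that some statistical libraries use the opposite sign convention for the shape parameter, so fitted values should be checked against the convention above before they are compared.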
The block maxima approach is criticized as wasteful because only the maximum value of each block is used, so other, possibly still extreme, values are ignored. Possible solutions are to use the so-called r largest order statistic model (e.g. using the largest 5 observations) or to model threshold excesses. The latter is the Peak over Threshold approach, in which observations over a certain threshold are selected and treated as extremes.
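The three ways of sampling extremes described above can be contrasted on a toy series. This is a minimal sketch with hypothetical helper names and made-up numbers; it only shows which observations each approach retains.

```python
from typing import List, Sequence


def block_maxima(values: Sequence[float], block_size: int) -> List[float]:
    """Largest value in each consecutive block of `block_size` observations."""
    return [max(values[i:i + block_size])
            for i in range(0, len(values), block_size)]


def r_largest(values: Sequence[float], block_size: int, r: int) -> List[List[float]]:
    """The r largest values in each block, in decreasing order."""
    return [sorted(values[i:i + block_size], reverse=True)[:r]
            for i in range(0, len(values), block_size)]


def threshold_excesses(values: Sequence[float], u: float) -> List[float]:
    """Excesses (x - u) of all observations strictly above the threshold u."""
    return [x - u for x in values if x > u]


toy = [1, 5, 2, 8, 3, 7, 4, 6, 9, 0]  # made-up series, two blocks of five
print(block_maxima(toy, 5))
print(r_largest(toy, 5, 2))
print(threshold_excesses(toy, 6))
```

In the toy series the block maxima approach keeps two values, the r largest approach (r = 2) keeps four, and a threshold of 6 keeps three, which illustrates why the block maxima approach is seen as the most wasteful.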
If block maxima are approximately GEV distributed, then for a large enough threshold u the distribution function of (X − u), conditional on X > u, is approximately (Eq. (5))
$$H(x - u) = 1 - \left[1 + \frac{\xi\,(x - u)}{\sigma_u}\right]^{-1/\xi}, \tag{5}$$
where u is a high threshold, x > u, scale parameter σ u > 0 (depending on threshold u), and shape parameter −∞ < ξ < ∞.
The distribution family given in Eq. (5) is called the generalized Pareto family, in other words, threshold excesses have a generalized Pareto distribution (GPD) with two parameters, the shape ξ and the scale σ parameters (using the same notation as in GEV). Just like with GEV, the shape parameter ξ determines the behavior of the GPD. If ξ < 0 the distribution has an upper bound of u − σ/ξ; if ξ > 0 there is no upper limit. If ξ = 0, then Eq. (5) simplifies to an exponential distribution function.

Research gap and questions
Extreme value theory is a promising tool to evaluate safety using surrogate safety measures. Most of the research that has been done so far focused on testing the method and validating various surrogate safety indicators by comparing model estimates to actual crash frequencies (Songchitruksa and Tarko, 2006; Farah and Azevedo, 2015; Zheng et al., 2014; Jonasson and Rootzén, 2014; Cavadas et al., 2017; Åsljung et al., 2016; Wang et al., 2018). However, little or no attention has been paid to the comparison of various conflict indicators and their performance using EVT, especially the comparison of collision course indicators with indicators that also include crossing course interactions. In this research we investigated under what conditions these indicators perform better and whether they are transferable. To that end two research questions were formulated as follows:
1. What difference is there between the two indicators TTC min and T 2min when analyzing safety using EVT and are these indicators transferable?
2. Which EVT approach (BM or POT) under what circumstances performs better for TTC min and T 2min (e.g. sensitivity to sample size)?

Data
In this section data collection is briefly described and descriptive statistics are provided.

Data collection
A regular signalized intersection with two phases in Minsk (Belarus) was analyzed (53°54′39.1″ N; 27°35′44.4″ E). The intersection was recorded for two days (from 6 AM till 9 PM). The video footage of two cameras set on rooftops was then analyzed in the software T-Analyst (2016), which allows the manual tracking of vehicles as well as the calculation of various surrogate measures of safety such as TTC, T 2 and PET. The dataset was provided by Lund University and has already been used in other publications such as Laureshyn et al. (2017).
Accident data were gathered for 11 years (1999-2009) before the video recordings were made. Altogether 32 accidents were recorded, out of which 5 were due to the collision of left turning and straight going vehicles. All the recorded accidents were property damage only. As this severity level is the most prone to underreporting, this historical accident dataset can unfortunately be used for validation only with certain limitations. In the course of video recordings no accidents were observed.

Descriptive statistics
Altogether 2749 interactions were detected. A subset of situations involving a vehicle making a left turn in front of an oncoming vehicle from the opposite direction was created (n = 792). Whenever an indicator cannot be calculated the software indicates −1 as entry (e.g. if the two vehicles are not on a collision course there is no value for TTC min ). These entries were not considered when compiling the descriptive statistics. All the statistical analyses were done in R (R Core Team, 2013). Table 1 shows the descriptive statistics for the two indicators TTC min and T 2min .
Cumulative distribution functions of T 2min and TTC min are shown in Fig. 1 for values smaller than 20 s. This figure reveals that the cumulative distribution function for TTC min is less steep than that of T 2min, showing that the observed TTC min values are more spread out and that the share of observations in the lower range (between 0 and 5 s) is smaller than for T 2min.
The underrepresentation of TTC min compared to T 2min is due to the nature of these indicators, as TTC min can be measured only for collision course interactions, whereas T 2min can be measured for both collision as well as crossing course interactions.

Models and results
In this study both EVT approaches are applied to the dataset presented above. As for the block maxima approach, each interaction can be considered as a block in which the minimum values of T 2 and TTC are used. In the Peak over Threshold approach a threshold has to be selected over which all the values are considered. Since the extreme events of interest are the smallest indicator values, in both cases the negated values of the observations are used, so that the minima become maxima.
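The preparation of the sample can be sketched as follows, assuming the indicator values are available as a plain list in which −1 marks interactions where the indicator could not be calculated, as described for the software output above. The function name and the example values are hypothetical.

```python
from typing import List, Sequence

MISSING = -1.0  # sentinel used when the indicator cannot be calculated


def negated_near_crash_sample(indicator_values: Sequence[float],
                              near_crash_threshold_s: float) -> List[float]:
    """Drop missing entries, keep near-crash events, and negate them.

    After negation the most severe events (smallest indicator values)
    become the largest values, so models for maxima can be applied.
    """
    valid = [v for v in indicator_values if v != MISSING]
    near_crashes = [v for v in valid if v < near_crash_threshold_s]
    return [-v for v in near_crashes]


# Made-up per-interaction minima in seconds
example = [2.0, -1.0, 6.5, 3.4, 0.8]
print(negated_near_crash_sample(example, 3.5))
```

A crash then corresponds to a negated value of zero or more, which is why the crash probability is read off at the level zero.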

Block maxima approach
Since the minimum values are determined per interaction for both indicators, they can also be high and therefore irrelevant occurrences (e.g. a TTC min value of 10 seconds cannot be considered as a near-crash, hence an extreme value). Therefore a preliminary step of selecting the near-crash events is needed, which can be considered as "subsampling of maxima" (Jonasson and Rootzén, 2014). Mahmud et al. (2017) gave an overview of minimum and desirable TTC threshold values from a selection of studies for different conditions. As far as signalized intersections are concerned Mahmud et al. (2017) did not indicate any minimum values; however, they cited two references (Huang et al., 2013; Sayed et al., 2013) where desired values of 1.6 and 3 s were given.
Taking 3 s as a threshold value for near-crashes would only result in 15 observations for TTC min . Based on what the literature suggests and the observed values near-crashes were selected using a threshold value of 3.5 s for the first run. The above problem does not hold for T 2min thanks to its bigger sample size, but for the sake of comparability the same threshold value was applied for the first run. The results of these two model runs were evaluated in detail and followed by several other runs using different threshold values for near-crash situations. Table 2 gives a summary of the model results of the fitted GEV models.
As for a value of TTC min < 3.5 s the 95% confidence interval of the shape parameter does not include zero, thus we can accept the Frechet distribution as the shape parameter is greater than zero. Notwithstanding, a greater accuracy for the confidence intervals is usually attained by the profile likelihood, which yielded similar results.
Substituting the model estimates into the GEV function (Eq. (4)) using a given value for z one can calculate its probability. We are interested in the probability of crash occurrence, that is, when TTC min < 0 (z = 0 in Eq. (4)). As for a near-crash value of TTC min < 3.5 s this calculation yields the probability of 0.0733 (1 − G(z)).
Using a given return level z one can also obtain the return period, which is 1/p. This means that the level z is expected to be exceeded on average once every 1/p blocks. If each block corresponds to one year, then the return period can be interpreted in years; in this particular case each block is an individual near-crash interaction. Using the previously calculated probability of crash occurrence (0.0733) one can calculate the return period, which is 1/0.0733 = 13.65. In other words one out of every 14 near-crash interactions (with a TTC min smaller than 3.5 s) will result in a crash. The analysis revealed that 3.5 s as a threshold value for near-crash situations leads to unsatisfactory model results and an irrationally high crash probability. This is due to the combined effect of practical as well as statistical reasons. The initially small sample size of TTC min is due to the fact that we are looking at left turning and straight moving vehicle interactions, where in many cases TTC cannot be interpreted due to stopped left-turning vehicles waiting for straight moving ones to pass. From a statistical point of view the small sample size results in unreliable extrapolation and large variance.
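The relation between crash probability and return period is simple arithmetic and can be checked directly. The helper names below are illustrative. Using the rounded probability 0.0733 gives about 13.64; the small difference from the reported 13.65 is presumably due to rounding of the probability, which is an assumption on our part.

```python
def return_period(p: float) -> float:
    """Return period (in blocks) for an exceedance probability p per block."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return 1.0 / p


def one_in_n(p: float) -> int:
    """Express the probability as 'one out of N blocks', N rounded."""
    return round(return_period(p))


p_crash = 0.0733  # crash probability reported for TTC min < 3.5 s
print(return_period(p_crash), one_in_n(p_crash))
```

Because each block is one near-crash interaction here, the return period counts interactions rather than years.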
For the above reasons several models were tested using different threshold levels for the pre-selection of near-crash situations. The threshold was gradually increased in 0.5 s increments. Diagnostic plots revealed that the model fit improves gradually as the near-crash threshold increases; Fig. 2 shows the results using 5 s as a threshold for near-crash situations.
With the help of these diagnostic plots (probability, quantile, return level and density plots) one can check the goodness of the model. The probability plot is a comparison of the empirical and fitted distribution functions; in the quantile plot their quantiles are plotted against each other. Both can be checked visually: if in both cases the points are sufficiently close to linearity, the model can be accepted. An example of a good fit is shown for TTC min (near-crash threshold < 5 s) in Fig. 2. The density plot is a comparison of the probability density function of the fitted model with the histogram of the data; in this case this plot also shows consistency. The return level plot (return periods vs. return levels) also shows observed values (dots) as well as the model-based estimation (line) along with confidence bounds. What we are interested in is the return period associated with the return level when a temporal indicator is equal to zero. The shape of the curve also indicates the type of distribution, which is Frechet since ξ > 0.
In order to further investigate the probability plot and to compare the fitted and the empirical distributions, a Kolmogorov-Smirnov test was used, of which the null hypothesis is that the sample is drawn from the fitted distribution. As the p-values are greater than 0.05 we cannot reject the null hypothesis, i.e. there is no evidence that our sample deviates from the fitted GEV distribution.
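The logic of the Kolmogorov-Smirnov test can be sketched without any statistical library. The code below computes the one-sample test statistic against an arbitrary fitted distribution function and an asymptotic p-value with a common small-sample correction; it is an illustrative sketch, not the routine used in the analysis, and the function names are hypothetical.

```python
import math
from typing import Callable, Sequence


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Largest distance between the empirical and the fitted distribution function."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and just before x
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_pvalue_asymptotic(d: float, n: int) -> float:
    """Asymptotic Kolmogorov p-value; only approximate for small samples."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 1e-3:
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, 101))
    return max(0.0, min(1.0, 2.0 * total))
```

One caveat worth keeping in mind: when the distribution parameters are estimated from the same sample that is tested, the standard p-value tends to be too large, so a non-significant result is weaker evidence of a good fit than it may appear.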
From Table 2 it can also be seen that as the near-crash threshold increases (resulting in bigger sample size) the shape parameter converges to zero. With 4.5 s threshold the 95% confidence intervals include zero. Setting the shape parameter to zero the Gumbel distribution can be fitted and an analysis of deviance between the two models can reveal whether it is more appropriate (Penalva et al., 2013). The results obtained for 3.5, 4, and 4.5 s showed significant differences between the two models, however, for 5 s there was no significant difference so the Gumbel model with two parameters is a good choice for modeling these data.
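The analysis of deviance between the nested Gumbel (ξ = 0) and GEV models can be sketched as a likelihood ratio test with one degree of freedom. The log-likelihood values below are made up for illustration; in practice they would come from the two fitted models.

```python
import math
from typing import Tuple


def deviance_test(loglik_gev: float, loglik_gumbel: float) -> Tuple[float, float]:
    """Deviance statistic and p-value for Gumbel (null) against GEV.

    The deviance D = 2 * (l_GEV - l_Gumbel) is compared with a chi-square
    distribution with 1 degree of freedom, whose survival function is
    erfc(sqrt(D / 2)).
    """
    d = max(2.0 * (loglik_gev - loglik_gumbel), 0.0)
    p_value = math.erfc(math.sqrt(d / 2.0))
    return d, p_value


# Hypothetical maximized log-likelihoods of the two models
d, p = deviance_test(loglik_gev=-100.0, loglik_gumbel=-101.9205)
print(d, p)
```

A p-value above 0.05 means the extra shape parameter does not improve the fit significantly, in which case the simpler two-parameter Gumbel model is preferred.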
As for T 2min further steps in model checking are just the opposite as compared to those of TTC min. As previously noted, for a critical value of near-crash situations the literature actually suggests a lower threshold than 3.5 s, as low as 1.5 s. Thus, it is interesting to check how the model fit and output values change as we gradually decrease the near-crash threshold level. In Table 2 the shape of the distribution changes from a Weibull (ξ < 0) to a Frechet (ξ > 0) as the near-crash threshold levels as well as the sample sizes decrease. Crash probability gradually increases as the near-crash threshold decreases. At a near-crash threshold of 2 s a crash probability of 0.0098 is calculated, associated with a return period of 101.96, meaning that one crash would happen out of 102 near-crash interactions. The model fit associated with the 2 s near-crash threshold still gave acceptable results. The Kolmogorov-Smirnov tests gave the same results as for TTC min for all the thresholds.

Peak over Threshold
Fig. 2. Diagnostic plots for GEV fit to TTC min (near-crash threshold < 5 s).
The POT approach offers a different solution to modeling extreme events. It is necessary to choose a threshold over which extreme events are considered. "It is important to choose a sufficiently high threshold in order that the theoretical justification applies thereby reducing bias. However, the higher the threshold, the fewer available data remain. Thus, it is important to choose the threshold low enough in order to reduce the variance of the estimates." (Gilleland and Katz, 2016). There are basically two methods for selecting the appropriate threshold:
• Mean residual plot: this plot shows the mean of the excesses depending on the value of the chosen threshold level u. Above a certain value the GPD provides a valid approximation to the excess distribution (Coles, 2001). Here a threshold has to be selected where the graph is linear within uncertainty bounds. This is, however, not always straightforward, and based on a subjective choice.
• Model estimation: the model is estimated at a range of threshold values with the intention to find stable model parameters. Again, above a certain level of u the GPD is valid if estimates of the shape parameter ξ are constant, while estimates of the scale parameter σ are linear in u. This point can be read from the plot by checking linearity; in other words, estimates will not change much within uncertainty bounds as the threshold increases.
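The quantity behind the mean residual plot is the average excess over each candidate threshold. A minimal sketch with hypothetical function names and made-up negated indicator values:

```python
from typing import List, Optional, Sequence, Tuple


def mean_excess(values: Sequence[float], u: float) -> Optional[float]:
    """Mean of (x - u) over all observations above u; None if there are none."""
    excesses = [x - u for x in values if x > u]
    if not excesses:
        return None
    return sum(excesses) / len(excesses)


def mean_residual_curve(values: Sequence[float],
                        thresholds: Sequence[float]) -> List[Tuple[float, Optional[float]]]:
    """Pairs (u, mean excess) that would be plotted in a mean residual plot."""
    return [(u, mean_excess(values, u)) for u in thresholds]


negated = [-5.0, -4.0, -3.0, -2.0, -1.0]  # made-up negated indicator values (s)
print(mean_residual_curve(negated, [-4.5, -3.5, -2.5]))
```

The threshold is then chosen as the lowest value above which this curve is approximately linear; confidence bounds, which the sketch omits, are needed to judge linearity in practice.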
As for negated TTC min the lowest threshold where the mean residual plot becomes linear within uncertainty bounds is a value around −4 s (Fig. 3). The parameter estimates against thresholds also show relatively stable results for the selected value (Fig. 4). The GPD model results are given in Table 3. The shape parameter ξ is below zero resulting in a convex return level plot with a finite upper bound. To check the goodness of the model the same diagnostic plots as for the block maxima approach (probability, quantile, return level and density plots) as well as the Kolmogorov-Smirnov tests were used.
The probability of crash occurrence, namely when TTC min is smaller than zero, can be calculated using Eq. (6), substituting the model parameters ξ and σ, as well as the threshold u = −4 and x = 0:
$$\Pr\{X > x \mid X > u\} = \left[1 + \frac{\xi\,(x - u)}{\sigma}\right]^{-1/\xi}, \tag{6}$$
where σ is the scale parameter estimated for the threshold u. This calculation yields a crash probability of 0.00017. The return period associated with TTC min < 0 is 5884.8 (1/0.00017), meaning that one out of 5885 near-crash interactions results in a crash.
As for negated T 2min a different threshold was chosen: both the mean residual plot and the plots of parameter estimates against thresholds (these are not presented here) suggest a threshold of −2 s. The crash probability associated with this model (T 2min < 0) is 0.00055 and the return period is 1807.3 (1/0.00055), meaning that one out of 1807 near-crash interactions results in a crash.
The GPD diagnostic plots suggested a reasonable model fit in both cases, and the Kolmogorov-Smirnov tests were not significant, so we could not reject the hypothesis that the samples follow the fitted GPD. However, the return level plot showed that as the return period increases the return level confidence bounds tend to be wider for TTC min than for T 2min, meaning that the prediction of unobserved extreme values comes with less uncertainty for the latter.
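The calculation described above can be sketched as the conditional exceedance probability implied by the GPD of Eq. (5). The parameter values in the example are made up for illustration and are not the estimates of Table 3; the function names are hypothetical.

```python
import math


def gpd_upper_bound(u: float, sigma: float, xi: float) -> float:
    """Upper end point of the fitted GPD: u - sigma/xi if xi < 0, else infinity."""
    return u - sigma / xi if xi < 0 else math.inf


def gpd_exceedance(x: float, u: float, sigma: float, xi: float) -> float:
    """Pr{X > x | X > u} under a GPD with scale sigma and shape xi."""
    if sigma <= 0:
        raise ValueError("scale parameter sigma must be positive")
    y = x - u
    if y <= 0:
        return 1.0
    if abs(xi) < 1e-12:
        return math.exp(-y / sigma)  # exponential limit for xi -> 0
    t = 1.0 + xi * y / sigma
    if t <= 0:
        return 0.0  # x lies beyond the finite upper bound (xi < 0)
    return t ** (-1.0 / xi)


# Hypothetical parameters for a negated indicator with threshold u = -4 s
p_crash = gpd_exceedance(x=0.0, u=-4.0, sigma=2.0, xi=-0.25)
print(p_crash, 1.0 / p_crash, gpd_upper_bound(-4.0, 2.0, -0.25))
```

For a negative shape parameter the crash probability is non-zero only if the upper bound u − σ/ξ lies above zero, so the bound is worth checking before interpreting the return period.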

Discussion and limitations
Modeling results are summarized in Table 4. Applying the POT approach seems to give more reasonable results in terms of crash probabilities and return periods, which were in the hundreds with BM but in the thousands with POT. If we accept a few assumptions we can attempt to validate these probabilities. These assumptions are as follows:
• the number of interactions used in the analysis (194 for TTC min and 792 for T 2min ) was all the interactions observed in the 2-day period between 6 AM and 9 PM, and no interactions were left out;
• the observation period (6 AM-9 PM) is a good representation of the entire day and accidents did not happen outside this time period;
• accident data provided are accurate, namely 5 crashes due to the collision of left turning and straight going vehicles in an 11-year period, which is approximately 800 days/accident occurrence (one accident happened in 800 days on average).
Accepting the above assumptions and comparing the model results we can actually state that indeed the POT results are much closer to the actual crash frequency. The POT model for TTC min (245.20 days/accident) gives the best prediction, especially if we accept the assumption that property damage accidents are in general underreported. Validation results also show that T 2min tends to overestimate crash frequencies: with POT, T 2min estimated one crash in every 28 days, whereas TTC min gave a more realistic value of one crash in every 245 days.
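The arithmetic behind the validation can be sketched as follows. The observed figure follows directly from the accident record (5 crashes in 11 years); the conversion of a return period expressed in interactions into days per accident is one plausible reading of the assumptions listed above, and the inputs used for it are made up.

```python
def observed_days_per_accident(years: float, accidents: int,
                               days_per_year: float = 365.0) -> float:
    """Average number of days between recorded accidents."""
    return years * days_per_year / accidents


def modelled_days_per_accident(return_period_interactions: float,
                               extreme_interactions: int,
                               observation_days: float) -> float:
    """Convert a return period counted in interactions into days.

    Assumption: extreme interactions occur at the same daily rate as
    during the observation period.
    """
    per_day = extreme_interactions / observation_days
    return return_period_interactions / per_day


print(observed_days_per_accident(11, 5))            # about 800 days
print(modelled_days_per_accident(1000.0, 20, 2.0))  # hypothetical inputs
```

The first call gives 803 days per accident, which is the "approximately 800 days" used as the benchmark.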
Fig. 3. Mean residual plot for TTC min.
The above results also illustrate that the near-crash threshold value affecting sample size is a critical issue, especially with the BM approach. Fig. 5 illustrates this by further refining near-crash threshold values. Here the block maxima approach was used and sub-samples were created using near-crash threshold values in 0.05 s increments. As for T 2min 75 models were fitted for near-crash values ranging from 1.3 to 5 s and for TTC min 31 models were fitted for near-crash values ranging from 3.5 to 5 s. Crash probabilities were calculated for all the models. As the sub-sampling near-crash threshold increases, sample sizes become bigger resulting in better model fits, even though with less pragmatic near-crash thresholds. Fig. 5 also illustrates that for different near-crash thresholds different crash probabilities are predicted for the two indicators.
As we are analyzing two indicators of different nature (a collision course (TTC min ) and a crossing course (T 2min ) indicator), it is worth investigating for what near-crash threshold values they would predict similar crash probabilities, in other words whether there is transferability between them. This would provide further insight into the applicability of collision and crossing course indicators. Based on the results shown in Fig. 5 threshold values were selected for which both indicators yielded the most similar crash probability (i.e. the difference between the predicted crash probabilities was marginal). This plot (Fig. 6) actually describes the relationship between these two indicators, showing under what near-crash thresholds we receive almost the same crash probability. There is some fluctuation in the graph, but the pattern clearly shows that for T 2min lower near-crash thresholds than for TTC min would yield the same crash probability (e.g. 5 s for TTC min and 2.5 s for T 2min ). A Pearson correlation test was used to determine how strong the relationship is. This test was highly significant (p-value = 8.96e−14) and indicated a strong correlation (0.93).
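The strength of the relationship between matched near-crash thresholds can be quantified with the Pearson correlation coefficient. The sketch below computes it from scratch; the threshold pairs are made up and lie on a perfect line, so they do not reproduce the reported value of 0.93.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical matched thresholds (s) giving similar crash probabilities
ttc_thresholds = [3.5, 4.0, 4.5, 5.0]
t2_thresholds = [1.75, 2.0, 2.25, 2.5]
print(pearson_r(ttc_thresholds, t2_thresholds))
```

A coefficient close to 1 indicates that one threshold can be mapped onto the other almost linearly, which is the sense in which the two indicators are described as transferable.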
Like any study, this one also comes with certain limitations, the most important of which are summarized below.
The basis of the analysis was a dataset collected in Minsk, Belarus. All the data came from one location, thus cross comparison between different locations was not possible. Using several locations can actually give an added value; Zheng et al. (2018) for instance used 16 merging areas and could also provide a comparison of models across these locations. The observation periods at these locations were quite short (56-88 minutes) and the authors admitted that this short time frame can hardly be claimed to be representative for a five-year period for which accident data were gathered for validation. Nevertheless, similar to Zheng et al. (2018), we argue that "the estimated number of crashes can still provide some reference for model evaluation". In the current research one location with data for two days was used (6 AM-9 PM), which is in some sense more representative, even though estimated results were compared with accident data gathered for an eleven-year period. Besides representativeness in terms of the length of the time period analyzed, it can also be questioned whether the results based on observations in Minsk can be generalized or whether they are location specific. The transferability of these results to other locations depends on many factors, such as geometric features of intersections (e.g. size, channelization) and road user behavior (e.g. priority giving or surrendering attitude, gap acceptance).
The type of interaction analyzed was exclusively left-turning vs. straight moving vehicle-to-vehicle interactions. The results gained in this research are therefore restricted and applicable to this interaction type only.
Probably the most important limitation was the uncertainty of validation, which was possible only under certain assumptions. Even if accident data are available, one has to be cautious about their accuracy, especially if mostly low severity or property damage accidents are present (underreporting).

Conclusions
As for the two EVT approaches, it can be concluded that overall the POT approach seemed to give more reliable and pragmatic results. When applying the block maxima approach, the selection of near-crash situations as a sub-sampling step proved to be a critical issue. For TTC min the question was to which level the near-crash threshold should be increased to obtain a reasonable model fit, whereas for T 2min the question was to which level, more reasonable from a traffic safety point of view, the near-crash threshold can be decreased while still retaining a good model fit.
In the former case, with TTC min , increasing the near-crash threshold resulted in better model fits; however, from a traffic safety point of view events below such high thresholds cannot actually be considered near-crash events. In the latter case, with T 2min , the threshold value could be further decreased at the cost of slightly less well performing models. Obviously there is a trade-off between a good model fit and reasonable threshold values.
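The trade-off between sample size and threshold can be illustrated with a minimal POT sketch. It rests on several assumptions that are not taken from this study: the indicator is negated so that a crash corresponds to the value 0, the generalized Pareto distribution is fitted with method-of-moments estimators (chosen here only to keep the example dependency-free, in place of the maximum likelihood fitting that is common in practice), and the data are synthetic. All function names are hypothetical.

```python
import math
import random
import statistics

def fit_gpd_moments(exceedances):
    """Method-of-moments estimates (shape, scale) of a generalized
    Pareto distribution fitted to positive exceedances."""
    m = statistics.fmean(exceedances)
    v = statistics.variance(exceedances)
    ratio = m * m / v
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * m * (ratio + 1.0)
    return shape, scale

def crash_prob_given_exceedance(shape, scale, threshold_s):
    """GPD survival function at y = threshold_s, i.e. the probability
    that the negated indicator reaches 0 given a threshold exceedance."""
    if abs(shape) < 1e-9:
        return math.exp(-threshold_s / scale)
    base = 1.0 + shape * threshold_s / scale
    if base <= 0.0:
        return 0.0  # fitted upper endpoint lies below the crash level
    return base ** (-1.0 / shape)

def pot_crash_probability(indicator_values, threshold_s):
    """Return (crash probability per interaction, number of exceedances)."""
    # With x = -value and u = -threshold_s, the exceedance is x - u.
    exceedances = [threshold_s - v for v in indicator_values if v < threshold_s]
    if len(exceedances) < 2:
        raise ValueError("too few near-crash events for this threshold")
    shape, scale = fit_gpd_moments(exceedances)
    p_exceed = len(exceedances) / len(indicator_values)
    p_crash = p_exceed * crash_prob_given_exceedance(shape, scale, threshold_s)
    return p_crash, len(exceedances)

# Synthetic indicator values in seconds, for illustration only.
rng = random.Random(42)
synthetic_ttc = [rng.lognormvariate(1.2, 0.5) for _ in range(2000)]
results = {thr: pot_crash_probability(synthetic_ttc, thr) for thr in (1.5, 2.0, 3.0)}
for thr, (p, n) in results.items():
    print(f"threshold {thr} s: {n} exceedances, crash probability {p:.2e}")
```

Raising the threshold admits more events into the fit, which tends to stabilize the estimates, but it also pulls in interactions that are hard to regard as near-crashes.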
Judging which indicator is better could be done by validation against a proper accident dataset, which was unfortunately not at hand for this study and constitutes a limitation. Notwithstanding, it has to be noted that this is not an exceptional case, and even when historical data are available their applicability can be questioned in general (for reasons associated with the drawbacks of accident records outlined previously). A judgment can only be made by considering the plausibility of crash probabilities and return periods and the goodness of fit of the models, and by using the available accident data for validation along with a few assumptions. Overall we found that using the POT approach TTC min yielded more pragmatic results with wider confidence intervals (more accurate but less precise), whereas T 2min showed better model fits but overestimated crash probabilities (more precise but less accurate).
Having compared the estimated crash probabilities for different near-crash thresholds and checked the correlation between them, we can conclude that collision and crossing course indicators are transferable. Crash probabilities calculated using EVT showed that one has to be "stricter" with crossing course interactions: compared to collision course interactions, lower near-crash threshold values yield similar probabilities. The analysis revealed that for straight moving and left-turning vehicle-vehicle interactions, the limitation of the collision course indicator TTC min in comparison with the crossing course indicator T 2min is the smaller sample size that can be gathered in a given time period.
A possible step to refine the models is using motion prediction. As both indicators investigated above assume constant speeds and unchanged paths, which is not realistic, it is worthwhile considering a probabilistic approach to predict trajectories and speeds of interacting vehicles. This approach would result in different values with different probabilities for a single interaction, thus providing an increased sample size for both indicators. St-Aubin et al. (2015) developed an approach called Probabilistic Surrogate Measures of Safety (PSMS) considering all possible paths that may lead two road users to collide. At the time of writing there are also initiatives at Lund University to apply a probabilistic framework.
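How a single interaction can yield a distribution of indicator values rather than one number can be shown with a toy sketch. It is not the PSMS method of St-Aubin et al. (2015): paths are kept fixed, only the speeds are perturbed with an assumed Gaussian spread, and T 2 is taken as the arrival time of the later road user at the conflict point. The function name and all parameters are hypothetical.

```python
import random

def sample_t2(d1, v1, d2, v2, speed_sd, n, rng=None):
    """Draw n values of T2 for one interaction.

    d1, d2: distances of the two road users to the conflict point (m)
    v1, v2: observed speeds (m/s)
    speed_sd: assumed standard deviation of the speed prediction (m/s)
    """
    rng = rng or random.Random(0)
    samples = []
    while len(samples) < n:
        s1 = rng.gauss(v1, speed_sd)
        s2 = rng.gauss(v2, speed_sd)
        if s1 <= 0.0 or s2 <= 0.0:
            continue  # discard draws with non-positive speed
        # T2: time for the later of the two road users to arrive
        samples.append(max(d1 / s1, d2 / s2))
    return samples

# With zero spread the sketch reduces to the constant-speed indicator.
deterministic = sample_t2(20.0, 10.0, 30.0, 10.0, 0.0, 5)
spread = sample_t2(20.0, 10.0, 30.0, 10.0, 1.5, 1000)
print(deterministic[0], min(spread), max(spread))
```

Each sampled value would enter the analysis with its own probability weight, which is how such a framework could enlarge the effective sample for both indicators.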
Another aspect of interest is modeling the severity of conflicts using surrogate safety indicators. Even though the temporal indicators investigated above may be used on their own to capture the severity of an interaction, for instance by applying a threshold value in the case of TTC, they are not sufficient to fully describe the severity of consequences. As Laureshyn et al. (2010) stated, there is a need for the time-based indicators to be complemented with some speed-related indicator. To that end a number of researchers (Zheng et al., 2018, 2019; Jonasson and Rootzén, 2014; Cavadas et al., 2017; Farah and Azevedo, 2017; Wang et al., 2019) have applied bivariate extreme value models. Currently the authors are also working on bivariate models trying to capture the severity dimension of interactions and intend to publish the results in a separate paper.