IDENTIFICATION OF BLACK SPOTS BASED ON RELIABILITY APPROACH

Identifying crash “black-spots”, “hot-spots” or “high-risk” locations is one of the most important and prevalent concerns in traffic safety, and various methods have been devised for this task. In this paper, a new method based on reliability analysis is presented to identify black-spots. Reliability analysis provides an ordered framework for treating the probabilistic nature of engineering problems, so it is well suited to crash occurrence, which is probabilistic by nature. In this study, the application of this new method was compared with the commonly implemented Frequency and Empirical Bayesian methods using simulated data. The results indicate that the traditional methods can lead to inconsistent predictions because they depend only on the mean of the data and disregard the variance of the number of crashes at each site.


INTRODUCTION
Identifying black-spots, ranking them and determining the potential safety improvement for each location among a set of sites are the main purposes of studies in traffic safety. Meanwhile, the correction of these sites in order to reduce the number of accidents demands considerable effort from safety engineers. The first step in the process of road safety improvement is the identification of black-spots. Then, the potential improvement for each location and the effect of remedial measures should be studied. The number of crashes occurring in several time intervals is usually compared in this step.
The simplest comparison is to rank the sites by the mean number of crashes occurring at each location. Various methods presented in this regard are discussed in Section 2. In this paper, a comparison of each site with a reference site is made in order to identify the black-spots. It should be noted that the implemented data are taken from Cheng and Washington [1]. Due to the probabilistic nature of crash occurrence, identifying high-accident sites is a probabilistic problem; thus probabilistic methods should be used to compare the crash counts at each site with those of a reference site. Therefore, in this study, a reliability analysis method is proposed to identify the black-spots. This method has remarkable advantages because the probabilistic nature of crash occurrence at each location is taken into account.
The next section reviews related studies. The theoretical definition of reliability analysis is presented in Section 3. In Section 4, the proposed method is compared with the Frequency and Empirical Bayesian (EB) methods. In the last section, the conclusions of applying this method are presented and directions for future work are provided.

BLACK-SPOTS IDENTIFICATION METHODS
Many researchers have conducted studies on black-spots identification. Some studies have examined the relative performance of the Empirical Bayesian method, while others have used techniques such as statistical quality control or confidence intervals [2, 1]. Elvik assessed the predictive validity of the EB method in road safety by applying five versions of EB estimation [3]. Tarko and Kanodia [4] proposed an index of crash frequency and an index of crash cost for hazard identification. Some researchers have proposed incorporating accident severity or crash costs into the risk measure, as discussed in references [5] and [6].
Also, Milton proposed a mixed logit model incorporating many environmental parameters [7]. Snæbjörnsson et al. [8, 9] applied a reliability approach to assess road vehicle safety in windy environments. Zio and Sansavini applied topological measures of the interconnection of network systems to safety analysis [10].
Some researchers have proposed model-based approaches for black-spots identification. Sayed presented a countermeasure-based approach that considered accident patterns in addition to crash frequency and severity [11]. Brijs proposed a multivariate model to identify and rank sites according to their total cost to society [12]. Also, Zheng and Lin reviewed accident prediction methods and suggested that a combined prediction approach can give better results [13].
One of the simplest ways of ranking black-spots is by raw crash counts (or rates), which produces a large number of misclassifications because of the random variation of traffic accidents from year to year [1].
As mentioned earlier, researchers have been interested in the random variation of traffic accidents; however, the variance of the data has not been considered in their studies. By implementing reliability analysis, the random nature of accidents can be taken into account in black-spots identification. The next section describes the reliability analysis used in this paper.

METHODOLOGY
Most phenomena in the world involve some amount of uncertainty; they cannot be predicted with total certainty. In general, repeated measurements of a physical phenomenon will not give the same outcomes. Reliability analysis therefore uses statistical and probabilistic procedures to determine the probability of occurrence of such phenomena. The first step in a reliability evaluation is to define the performance function of the input random variables. For a problem with two variables, this function can be written as [14]:

Z = g(x1, x2), (1)

where x1 and x2 are the input random variables and g(x1, x2) denotes the performance function (or objective function).
The limit state of this function is defined as Z = 0, which marks the boundary between the safe and unsafe regions and is referred to as the failure surface in the remainder of the paper. The equation of the boundary can be an explicit or implicit function of the basic random variables, and failure occurs when Z < 0. The failure probability Pf, illustrated in Figure 1, is calculated by formula (2) [14]:

Pf = ∫∫_(g(x1,x2)<0) f(x1, x2) dx1 dx2, (2)

where f(x1, x2) is the joint probability density function of the variables x1 and x2, and the integration is performed over the failure region. Note that formula (2) is a general formula for reliability problems.
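As an illustration of formula (2), the failure probability can be estimated by Monte Carlo simulation. The sketch below uses a hypothetical linear performance function g = x1 - x2 with independent normal inputs; all parameter values are illustrative, not taken from the paper:

```python
import numpy as np

# Monte Carlo estimate of P_f = P(g(x1, x2) < 0) for an illustrative
# linear performance function g = x1 - x2; the inputs and their
# parameters are hypothetical, chosen only to demonstrate formula (2).
rng = np.random.default_rng(0)
n = 1_000_000
x1 = rng.normal(10.0, 2.0, n)   # first random variable
x2 = rng.normal(6.0, 1.5, n)    # second random variable
g = x1 - x2                     # performance function Z = g(x1, x2)
p_f = np.mean(g < 0)            # fraction of samples in the failure region
# Exact value for this linear normal case: Phi(-(10-6)/sqrt(2^2 + 1.5^2)) ~ 0.0548
```

Sampling replaces the double integral over the failure region by a frequency count, which works for any limit state but converges slowly for very small failure probabilities, motivating the approximation methods discussed next.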
In practice, whether the joint probability density function of the random variables can be handled easily depends strongly on the limit state function. Performing the integration is not a simple task, so approximation and simulation methods are used instead. Approximation methods are of two types: the First Order Reliability Method (FORM) and the Second Order Reliability Method (SORM).
Figure 1 - Demonstrating the failure function for two variables [14]

In general, the failure surface boundary (the limit state line) may be linear or non-linear in the basic variables. If the failure surface is a linear function of uncorrelated, normally distributed variables, FORM is suitable for the reliability analysis. SORM, on the other hand, calculates the failure probability by a quadratic approximation, which suits a non-linear failure surface. FORM was used in this study to account for the probabilistic nature of black-spots identification. The method is described further in the text.
Haldar and Mahadevan showed that FORM developed from the results of Cornell's studies [14]. For two independent, normally distributed random variables with a linear relationship of the form Z = g(x1, x2) = a0 + a1·x1 + a2·x2, the probability that Z < 0 can be written as:

Pf = P(Z < 0) = Φ(-μZ/σZ) = Φ(-β), (3)

β = μZ/σZ, (4)

where β is the safety index, Φ is the Cumulative Distribution Function (CDF) of the standard normal variable, and μZ and σZ are the mean and standard deviation of Z. In this equation, the failure probability Pf depends on the ratio of the mean to the standard deviation of Z. The relationship is obtained by expanding the first-order Taylor series approximation of the performance function g. Without calculating the exact distribution of Z, and using only the first-order approximation of its mean and variance together with the assumption that Z is normally distributed around its mean, the failure probability is calculated as P(Z < 0). Note that β yields the exact failure probability only when the variables are independent, normally distributed and linearly related to Z; otherwise, the result is an approximation.
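The linear normal case above can be computed in a few lines. The sketch below evaluates the safety index and failure probability for an illustrative Z = R - S with independent normal variables; the parameter values are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

# Safety index for a linear performance function Z = R - S with
# independent normal variables (illustrative values, not the paper's data).
mu_r, sigma_r = 12.0, 2.0     # mean / std of R
mu_s, sigma_s = 8.0, 1.5      # mean / std of S
mu_z = mu_r - mu_s                        # mean of Z
sigma_z = sqrt(sigma_r**2 + sigma_s**2)   # std of Z (independent variables)
beta = mu_z / sigma_z                     # safety index, as in formula (4)
p_f = NormalDist().cdf(-beta)             # P(Z < 0) = Phi(-beta), formula (3)
```

Because Z is itself normal here, this result is exact rather than a first-order approximation.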
Hasofer and Lind [15] and Rackwitz [16] presented methods to improve this approximation. Also, Rackwitz and Fiessler [17] and Chen and Lind [18] presented an algorithm for handling variables that are not normally distributed in the above problem. SORM uses the second-order approximation of the Taylor expansion, which Fiessler, Neumann and Rackwitz [19] carried out for the first time. To date, researchers have proposed several methods for reliability approximation with different levels of accuracy and efficiency [14].
As shown in Figure 1, the safety index β represents the minimum distance from the origin to the limit state equation. The point of minimum distance from the origin to the limit state surface, x_i*, represents the worst combination of the stochastic variables and is called the design point or most probable point of failure. If the limit state equation is linear, the direction cosines vector α_xi is perpendicular to the limit state line at all points. For a non-linear limit state equation, the limit state must be recalculated to find the new design point, and a Newton-Raphson type recursive algorithm may be needed to find the design point [14].
If the input variables have distributions other than the normal distribution, equivalent normal parameters should be calculated for each variable. These parameters are obtained by mapping the distribution of each variable onto an equivalent normal distribution. Rackwitz and Fiessler presented a method to calculate the parameters of the equivalent normal distribution, (μ^N, σ^N) [14]. The superscript N denotes an equivalent normal distribution parameter. The method is based on the condition that the cumulative distribution function and the probability density function of the equivalent normal distribution and of the original distribution of each variable are equal at the design point.
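The matching condition above gives σ^N = φ(Φ⁻¹(F(x*)))/f(x*) and μ^N = x* - Φ⁻¹(F(x*))·σ^N, where F and f are the CDF and PDF of the original variable. A minimal sketch, using a hypothetical lognormal variable (the parameter values are illustrative):

```python
from math import log
from statistics import NormalDist

nd = NormalDist()

def equivalent_normal(x_star, cdf, pdf):
    """Equivalent normal parameters at the design point x_star, obtained by
    matching the CDF and PDF of the original distribution at that point."""
    u = nd.inv_cdf(cdf(x_star))           # Phi^{-1}(F(x*))
    sigma_n = nd.pdf(u) / pdf(x_star)     # sigma^N = phi(u) / f(x*)
    mu_n = x_star - u * sigma_n           # mu^N = x* - u * sigma^N
    return mu_n, sigma_n

# Hypothetical lognormal variable: lam = mean of ln X, zeta = std of ln X.
lam, zeta = 2.0, 0.3
cdf = lambda x: nd.cdf((log(x) - lam) / zeta)
pdf = lambda x: nd.pdf((log(x) - lam) / zeta) / (x * zeta)
mu_n, sigma_n = equivalent_normal(8.0, cdf, pdf)
# For a lognormal this reduces to sigma^N = zeta*x*, mu^N = x*(1 - ln x* + lam).
```

The same `equivalent_normal` helper works for any continuous distribution whose CDF and PDF are available.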
In this paper, the algorithm presented by Ayyub and Haldar [20] was used. This iterative algorithm is given for a performance function g(x1, x2) with variables that are not normally distributed.
Step 1: Define the limit state equation for Z.
Step 2: Assume an initial value for the safety index β. A suitable estimate of β makes the algorithm converge quickly; an initial value between 1 and 3 is reasonable.
Step 3: Assume initial values for the design point. For instance, the mean values of the random variables can be used as the initial estimate.
Step 4: Calculate the parameters of the equivalent normal distributions (μ^N, σ^N) based on Rackwitz and Fiessler [17] (obtained from [14]) at the design point. The design point is recalculated in each iteration of the algorithm and is a point at which the performance function becomes zero.
Step 5: Calculate the partial derivative of the performance function with respect to each variable at the design point, (∂g/∂x_i)*. The asterisk means that the value is evaluated at the design point.
Step 6: Calculate the direction cosines α_xi at the design point:

α_xi = (∂g/∂x_i)* σ_i^N / sqrt( Σ_j ((∂g/∂x_j)* σ_j^N)² ). (5)

This value shows the dependency of the response on each variable: the closer the limit state is to being aligned with an axis in Figure 1, the greater the influence of the corresponding variable on the calculated probability.

Step 7: Calculate the new values of the design point x_i*:

x_i* = μ_i^N - α_xi β σ_i^N. (6)

As mentioned earlier, the design point is the point with the minimum distance from the origin, as shown in Figure 1. The new design point must lie closer to Z = 0 than the previous points.
Repeat steps 4 to 7 with β held constant until the values of x_i* converge to within a tolerance of 0.005.
Step 8: Estimate the new coefficient β by treating it as the unknown in formula (6) while the other quantities are known. This estimation gives a better rate of convergence. The initially assumed value of β is used only for the first iteration; otherwise, step 2 can be eliminated.
Step 9: Repeat steps 3 to 8 until β converges to within an acceptable tolerance of 0.001.
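The steps above can be sketched for an objective function g = R - S with lognormal R and S. This is a simplified HL-RF-type variant of the Ayyub-Haldar scheme, not the authors' exact implementation, and all parameter values are hypothetical:

```python
from math import exp, log, sqrt
from statistics import NormalDist

nd = NormalDist()

def eq_normal_lognormal(x, lam, zeta):
    # Closed-form Rackwitz-Fiessler equivalent normal parameters
    # for a lognormal variable evaluated at point x.
    return x * (1.0 - log(x) + lam), zeta * x   # (mu^N, sigma^N)

def form_r_minus_s(lam_r, zeta_r, lam_s, zeta_s, tol=1e-6, max_iter=100):
    """First-order reliability iteration for g = R - S, R and S lognormal."""
    r, s = exp(lam_r), exp(lam_s)                   # start at the medians
    beta = 0.0
    for _ in range(max_iter):
        mu_r, sd_r = eq_normal_lognormal(r, lam_r, zeta_r)
        mu_s, sd_s = eq_normal_lognormal(s, lam_s, zeta_s)
        u = [(r - mu_r) / sd_r, (s - mu_s) / sd_s]  # reduced variables
        grad = [sd_r, -sd_s]                        # gradient of g in u-space
        g = r - s
        norm2 = grad[0] ** 2 + grad[1] ** 2
        scale = (grad[0] * u[0] + grad[1] * u[1] - g) / norm2
        u_new = [scale * grad[0], scale * grad[1]]  # HL-RF update
        beta_new = sqrt(u_new[0] ** 2 + u_new[1] ** 2)
        r = mu_r + sd_r * u_new[0]                  # new design point
        s = mu_s + sd_s * u_new[1]
        if abs(beta_new - beta) < tol:
            return beta_new, nd.cdf(-beta_new), (r, s)
        beta = beta_new
    return beta, nd.cdf(-beta), (r, s)

beta, p_f, design = form_r_minus_s(2.0, 0.3, 1.5, 0.4)
# Exact answer for this case: beta = (2.0 - 1.5)/sqrt(0.3^2 + 0.4^2) = 1.0
```

Because R - S = 0 is linear in the logs of the variables, the iteration converges here to the exact safety index, with the converged design point lying on the failure surface (r ≈ s).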
In the appendix, the calculation and results of the performed method are given.

COMPUTATIONAL RESULTS AND DISCUSSION
The purpose of this paper is to present a probabilistic method for black-spots identification. For this purpose, an adequate number of crashes and a fitted probability distribution function are needed. It should be noted that only the number of crashes at each site was used as the comparison criterion in this study; other parameters such as intensity were not considered.
Due to the lack of reliable data, and in order to illustrate how this method can be performed, the simulated data from the study of Cheng and Washington [1] were used for the first-order reliability analysis. To compare the performance of black-spots identification methods, those researchers used simulated rather than empirical data. In total, crash occurrences at 30 sites over 16 observation periods were simulated in that paper. The simulation was based on real data, though its details were not presented in their paper, and only the results have been used here. Table 1 shows the parameters of the normal and lognormal distributions for the 30 sites over the 16 observation periods.
Here, μ and σ are the mean and standard deviation of the normal distribution, and λ and ζ are the two parameters of the lognormal distribution. In order to identify black-spots, each site was compared with a reference site. The reference site was determined using the data of all 30 sites over the 16 observation periods; that is, the normal and lognormal distribution functions were fitted to the 480 pooled observations. The resulting values define the reference site and are shown in the last row of Table 1.
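Building the reference site can be sketched as follows. Since the raw data behind Table 1 are not reproduced here, simulated Poisson counts stand in for the 30 sites by 16 periods, and the normal and lognormal parameters are fitted by the method of moments:

```python
import numpy as np

rng = np.random.default_rng(1)
# Placeholder for the pooled 30-site x 16-period crash counts
# (hypothetical stand-in data, not the paper's Table 1).
pooled = rng.poisson(lam=10.0, size=30 * 16).astype(float)
pooled = pooled[pooled > 0]        # the log transform needs positive counts

mu, sigma = pooled.mean(), pooled.std(ddof=1)   # normal parameters
logs = np.log(pooled)
lam, zeta = logs.mean(), logs.std(ddof=1)       # lognormal parameters
```

The same moment fit applied to each site's 16 observations yields the per-site rows of a table like Table 1.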
Miranda-Moreno showed that the lognormal distribution can be used for the crash occurrence of each site [6]. A key difference between the lognormal distribution and the Poisson distribution is that the former is continuous while the latter is discrete.
As the purpose of this paper is to illustrate the performance of the reliability analysis method for black-spots identification, the application of the Poisson distribution and other prevalent distributions was not pursued. In addition, it was assumed that the 30 sites were independent of each other and of the reference site.
In this study, to identify the black-spots, a probabilistic comparison of each site with the reference site was made. If variables R and S are used instead of x1 and x2, the objective function of this problem for each site is g = R - S_i, where R and S_i are the probabilistic variables of the reference site and of site i, respectively. Each site, as well as the reference site, has a specific lognormal distribution. The probability that the crash occurrence at a site exceeds that of the reference site (the index of high-accident occurrence of the site) is expressed in formula (7) and was calculated separately for each site by implementing the reliability analysis, according to formulas (3) and (4):

Pf,i = P(R - S_i < 0), (7)

where Pf,i is the probability that the crash occurrence of site i is higher than that of the reference site. Since the objective function is linear and the parameters are assumed independent, the first-order reliability algorithm proposed by Ayyub and Haldar [20] was applied. Other outputs of this analysis were used as by-products of the black-spots identification and are discussed below.

The values of Pf,i obtained from formula (7) are shown in Figure 2. As Figure 2 shows, the probability that a site experiences more crashes than the reference site generally increases with the mean number of crashes. However, this behaviour cannot be generalized to all sites. For example, in this experiment, Pf,i of site 22 was higher than that of site 28, although the mean number of crashes at site 22 (15.938) was less than at site 28 (16.375); in other words, site 22 was more hazardous than site 28. The reason is that the dispersion of the data at site 28 was larger than at site 22.
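For g = R - S_i with both variables lognormal, R - S_i < 0 is equivalent to ln R < ln S_i, so formula (7) has a closed form that can cross-check the FORM result. The parameter values below are hypothetical, chosen only to reproduce the qualitative site-22-versus-site-28 effect described above:

```python
from math import sqrt
from statistics import NormalDist

nd = NormalDist()

def p_exceed(lam_r, zeta_r, lam_s, zeta_s):
    # P_f,i = P(R - S_i < 0) = Phi(-(lam_r - lam_s)/sqrt(zeta_r^2 + zeta_s^2)),
    # since ln R and ln S_i are normal with these parameters.
    beta = (lam_r - lam_s) / sqrt(zeta_r**2 + zeta_s**2)
    return nd.cdf(-beta)

lam_ref, zeta_ref = 2.3, 0.35                 # hypothetical reference site
sites = {22: (2.70, 0.30), 28: (2.75, 0.45)}  # hypothetical (lam, zeta) pairs
pf = {i: p_exceed(lam_ref, zeta_ref, *p) for i, p in sites.items()}
ranking = sorted(pf, key=pf.get, reverse=True)
# Despite the lower lam, "site 22" ranks above "site 28" here because
# "site 28" has the larger dispersion.
```

This mirrors the observation above: a larger spread in a site's crash counts can lower its exceedance probability even when its mean is higher.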
In order to discuss this subject further and compare the results of this method with those of other methods, the same data were analyzed with the Frequency and EB methods. In both methods, the mean number of crashes was used to represent crash frequency.
In the EB method, the crash frequency of each studied site was adjusted using the mean and variance of the number of crashes of the reference site (shown in Table 1). Formula (8) expresses this adjustment as [21]:

fEB,i = w · frp + (1 - w) · fi,  w = frp / (frp + S²), (8)

where fEB,i is the adjusted crash frequency of site i, fi is the mean crash frequency of the observed site i, frp is the mean crash frequency of the reference site and S² is the variance of the crash frequency of the reference site. Using these two frequency-based methods, the high-accident sites can be identified. Table 2 compares the reliability analysis method with the Frequency and EB methods; in this table, the studied locations are ranked by the three methods according to how high-accident they are.
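The EB adjustment can be sketched as below. Since formula (8) is not reproduced verbatim in this text, the standard method-of-moments weighted form is assumed, and the numbers are hypothetical:

```python
def eb_adjust(f_i, f_rp, s2):
    """Empirical Bayes adjustment: shrink the observed site mean f_i toward
    the reference-population mean f_rp. The weight on f_rp falls as the
    reference variance s2 grows (standard method-of-moments form, assumed
    here rather than taken from the paper)."""
    w = f_rp / (f_rp + s2)
    return w * f_rp + (1.0 - w) * f_i

# Hypothetical values: observed site mean 16, reference mean 10, variance 25.
f_eb = eb_adjust(f_i=16.0, f_rp=10.0, s2=25.0)   # pulled toward the reference
```

The adjusted frequency always lies between the site mean and the reference mean, which is why the EB ranking tracks the raw-frequency ranking so closely.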
As shown, the EB method adjusted the crash frequencies and reduced the range of variation of the crash counts at the studied sites, which can help reduce the misidentification of black-spots. However, this adjustment did not change the site ranking compared with the crash frequency method, because the adjusted frequency remains correlated with the mean number of crashes occurring at each location. As Table 2 shows, based on relative frequency, sites 29 and 30 are selected, as in reference [1], and the Empirical Bayesian method likewise selects sites 29 and 30. In the reliability analysis, however, sites 30 and 29 are selected (a change in the priority of the hazardous sites). In all three methods, site 27 was in third place.
Therefore, by implementing the reliability analysis method, instead of comparing the mean of each site with that of the reference site, which can be misled by the variability in the data, the probability distribution functions themselves were compared in order to represent the realistic conditions of this problem.
Another outcome of this method is an expected number of crashes based on the probabilistic nature of the problem. This is the break-even point whose probability of occurrence was calculated earlier and which, in the presented algorithm, was named the design point (x_i* in formula (6)). The number of predicted accidents can be designated S_i*. Figure 3 shows these points for the 30 sites.
Indeed, the number of crashes shown in Figure 3 is a random quantity that can be used to compare sites with respect to their probabilistic nature. Each such number is the break-even point of two probabilistic variables and can be used for ranking sites. It should also be noted that this point depends on the probability distribution of the reference site as well as on that of each site.