Multiple Classifier of Traffic Accident Based on Matter-Element Analysis

This paper establishes a new multiple classifier for grading traffic accidents, based on the Hard Decision Model (HDM) and the parallel topological structure. Logistic Regression (LR), Decision Tree (DT) and BP Neural Network (BPNN) are selected as the base classifiers, and matter-element analysis is employed as the fusion algorithm to improve the traditional topological structure. By combining the confidence coefficients of the base classification results, matter-element analysis has the potential to make the criterion setting of the multiple classifier more objective and dynamic. The accuracies of the base classifiers, rather than the analytic hierarchy process (AHP) or fuzzy synthetic evaluation (FSE), are used to calculate the weight factors, which avoids the influence of human factors. 200 traffic accident records are selected as a case study to verify the methodology. The results show that, compared with the base classifiers, the multiple classifier based on the confidence coefficient and matter-element analysis clearly strengthens the capacity to identify the class of a traffic accident. Meanwhile, this classifier helps to avoid many weaknesses of a single model, such as overfitting and underfitting.


INTRODUCTION
With the rapid development of cities and the surge in the number of vehicles, urban traffic safety issues have become increasingly prominent. The evaluation of urban traffic safety, especially the classification of traffic accidents, can not only quantify the overall development level of urban transport, but also accurately locate the safety problems in a city. With the further development of machine learning, identification and classification algorithms are becoming more and more mature, and linear regression [1], logistic regression [2], neural networks [3] and decision trees [4] are widely employed in the classification of traffic accidents. However, there is still room to improve the accuracy and stability of the classification achieved by a single model [5]. Therefore, the multiple classifier, which consists of several base classifiers, is playing an increasingly important role in classification problems [6]. This is because, compared with a single base classifier, the multiple classifier can improve the generalization ability of classification and avoid overfitting [8] [9].
Existing multiple classifiers can be divided into two types: the Soft Decision Model (SDM) and the Hard Decision Model (HDM) [10]. The SDM places more emphasis on specialized base classifiers and on curbing their weaknesses, whereas the HDM pays more attention to fusion algorithms in the form of hard decisions and to general base classifiers, such as Decision Tree, Logistic Regression and BP Neural Network [11]. This paper studies the improvement of the multiple classifier based on the HDM. Generally, the training set is used to train the general base classifiers, and a hard-decision fusion algorithm is used to fuse their results. Therefore, apart from the selection of base classifiers, the most important component of an HDM-based multiple classifier is its topological structure, which may be a cascade structure (CS), a parallel structure (PS) or a hybrid structure (HS) [12]. Choosing the right structure can not only improve the accuracy and generalization ability of classification, but also increase time and space efficiency. However, the topological structure of current multiple classifiers can still be improved, because it does not consider the correlations between the base classifiers, which harms accuracy and stability.
ICTETS 2020, IOP Conf. Series: Earth and Environmental Science 587 (2020) 012038, IOP Publishing, doi:10.1088/1755-1315/587/1/012038
In response to the above-mentioned shortcomings, this paper establishes a new multiple classifier for grading traffic accidents, based on the hard decision model and the parallel structure and realized through matter-element analysis. The main contributions of this work lie in three aspects: 1) The confidence coefficient with which a traffic accident belongs to a certain class is used to establish the traffic accident grade recognition model. 2) Matter-element theory is used for the first time to establish a multiple classifier based on the parallel topological structure, because it systematically considers the correlations between general base classifiers; this aims to solve incompatibility problems and to dynamically ensure the implementation of the multiple classifier. 3) The accuracies of the general base classifiers, rather than the analytic hierarchy process (AHP) or fuzzy synthetic evaluation (FSE), are used to calculate the weight factors, which avoids the influence of human factors on the classification results.

Fitting of Base Classifiers
First, the traffic accident data, including the class of traffic accident, the characteristics of the driver, the road conditions for driving and the crossing conditions for pedestrians, are randomly divided into two parts: the training set and the test set. Next, the training set is used to fit all the base classifiers in SPSS 23, in order to estimate all relevant parameters as well as the classification accuracy. These base classifiers are then employed to identify the class of each traffic accident in the test set, which lays the foundation for estimating the confidence coefficient with which the traffic accident belongs to each class.
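The random division into a training set and a test set can be sketched as follows. This is only an illustration in Python, not the SPSS 23 workflow used in the paper; the function name `split_records`, the seed and the integer stand-ins for accident records are assumptions made for the example.

```python
import random


def split_records(records, train_fraction=0.5, seed=0):
    """Randomly split records into (train, test) with no overlap."""
    shuffled = list(records)               # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-ins for 200 accident records (real records would hold
# the accident class plus driver, road and pedestrian-crossing fields).
records = list(range(200))
train, test = split_records(records)
print(len(train), len(test))
```

With `train_fraction=0.5` the 200 records are divided into two halves of 100, matching the sizes reported for the case study.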

Estimation of Confidence Coefficient
In general, the confidence coefficient is the probability that the error between the predicted value and the actual value stays within a given range. For example, if the accuracy of a predicted result is stated as 95%, that percentage is called the confidence coefficient, also known as the confidence level; it expresses how far the predicted result can be trusted. Using the Estimation of Confidence Coefficient and Confidence Interval functions in SPSS 23, we obtain the confidence coefficient corresponding to each class.

Matter-Element Definition of Multiple Classifier
As an important method for qualitative evaluation, matter-element analysis has been successfully used in many fields [13][14]. In matter-element analysis, the class of traffic accident can be divided into several minimum units, which are represented by $M$ [15]. The matter-element of the traffic accident can be represented by the ordered triple $R$ shown in (1):
$$R = (M, c, v) = \begin{bmatrix} M & c_1 & v_1 \\ & c_2 & v_2 \\ & \vdots & \vdots \\ & c_n & v_n \end{bmatrix} \tag{1}$$
where $M$ is the class of traffic accident, $c$ is the base classifier used for assessing $M$, and $v$ is the confidence coefficient with which the traffic accident belongs to a certain class. If the traffic accident is described by $n$ base classifiers, $R$ is expressed as an $n$-dimensional array. In this paper, $c_1$ is LR, $c_2$ is DT and $c_3$ is BPNN.
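The ordered triple described above maps naturally onto a small data structure. The following is a minimal sketch, assuming a Python representation; the class name `MatterElement`, its field names and the confidence values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MatterElement:
    """Ordered triple R = (M, c, v) with one (c_i, v_i) row per base classifier."""
    name: str                      # M: the object being graded
    classifiers: Tuple[str, ...]   # c_1 .. c_n
    values: Tuple[float, ...]      # v_1 .. v_n (confidence coefficients)

    def __post_init__(self):
        if len(self.classifiers) != len(self.values):
            raise ValueError("each classifier needs exactly one value")

    def rows(self):
        """Return the (c_i, v_i) rows of the matter-element."""
        return list(zip(self.classifiers, self.values))


# Hypothetical confidence coefficients, for illustration only.
record = MatterElement("accident record", ("LR", "DT", "BPNN"), (0.62, 0.15, 0.71))
for classifier, value in record.rows():
    print(classifier, value)
```

Keeping the classifiers and their values in parallel tuples mirrors the row layout of the array, so the length check guards against a row with a missing value.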

Classical Domain and Segmented Domain of Matter-Element
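The traffic accident is divided into $m$ classes $M_{cj}$ ($j=1,\dots,m$). For each $M_{cj}$ belonging to class $j$, the probability ranges of all $c_i$ are called the classical domain [16], expressed as $R_{cj}$.
In matter-element $R_c$, the whole probability range $v_{ci}$ of base classifier $c_i$ is divided into $m$ intervals $v_{cji}$, each covering the range $a_{cji}$-$b_{cji}$ ($j=1,2,\dots,m$). A traffic accident is identified as the $j$th class when the confidence coefficient obtained from the $i$th base classifier is close to $v_{cji}$. The matrix of the classical domain $R_{cj}$ is given by (2):
$$R_{cj} = (M_{cj}, c_i, v_{cji}) = \begin{bmatrix} M_{cj} & c_1 & \langle a_{cj1}, b_{cj1} \rangle \\ & c_2 & \langle a_{cj2}, b_{cj2} \rangle \\ & \vdots & \vdots \\ & c_n & \langle a_{cjn}, b_{cjn} \rangle \end{bmatrix} \tag{2}$$
The segmented domain of the matter-element indicates the whole probability range of $c_i$. For $c_i$ in equation (2), the segmented domain is the interval $a_{pi}$-$b_{pi}$, as expressed in (3):
$$R_p = (M_p, c_i, v_{pi}) = \begin{bmatrix} M_p & c_1 & \langle a_{p1}, b_{p1} \rangle \\ & c_2 & \langle a_{p2}, b_{p2} \rangle \\ & \vdots & \vdots \\ & c_n & \langle a_{pn}, b_{pn} \rangle \end{bmatrix} \tag{3}$$
where $R_p$ represents the matter-element of the segmented domain, and $M_p$ and $v_p$ have the same meaning as in (2) but apply to the segmented domain.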

Correlation Degree Calculation
Based on the classical domain and the segmented domain, the correlation degree between each base classifier $c_i$ and each class $j$ is calculated by using (4)-(6). $K_j(v_i)$ is the correlation degree between base classifier $i$ and class $j$, $\rho(v_i, v_{oji})$ is the distance between $v_i$ and the classical-domain interval of class $j$ for $c_i$, and $\rho(v_i, v_{pi})$ is the distance between $v_i$ and the segmented-domain interval $v_{pi}$.
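Equations (4)-(6) are not reproduced in this text. As an assumption, the sketch below uses the distance and correlation function commonly found in the matter-element (extenics) literature: the distance from a point $v$ to an interval $\langle a, b \rangle$ is $|v - (a+b)/2| - (b-a)/2$, and the correlation degree is $-\rho_c/(b-a)$ when $v$ lies inside the classical domain and $\rho_c/(\rho_p - \rho_c)$ otherwise, where $\rho_c$ and $\rho_p$ are the distances to the classical and segmented domains. The function names and the intervals are hypothetical, and the paper's exact formulas may differ.

```python
def distance(v, interval):
    """Distance between a point v and an interval (a, b); negative inside it."""
    a, b = interval
    return abs(v - (a + b) / 2) - (b - a) / 2


def correlation_degree(v, classical, segmented):
    """Correlation degree of v with the class whose classical domain is given.

    Assumes v lies inside the segmented domain, which contains `classical`.
    """
    a, b = classical
    rho_classical = distance(v, classical)
    if a <= v <= b:
        # Inside the classical domain: the degree is non-negative.
        return -rho_classical / (b - a)
    rho_segmented = distance(v, segmented)
    # Outside the classical domain: the degree is negative.
    return rho_classical / (rho_segmented - rho_classical)


segmented = (0.0, 1.0)  # confidence coefficients lie in 0-1
# Hypothetical classical domains for two classes of one base classifier.
print(correlation_degree(0.9, (0.8, 1.0), segmented))  # v inside the interval
print(correlation_degree(0.9, (0.0, 0.2), segmented))  # v outside the interval
```

Under this form, a positive degree means the confidence coefficient falls inside the class interval, and increasingly negative values mean it lies further from that interval.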

Weight Calculation and Class Evaluation
The integrated correlation degree between the traffic accident and each evaluation class is one of the most important components of the multiple classifier. The integrated correlation degree of each matter-element $M_{cj}$ is calculated by the following equation:
$$K_j(M_{cj}) = \sum_{i=1}^{n} \omega_i K_j(v_i)$$
where $K_j(M_{cj})$ is the integrated correlation degree between the traffic accident and class $j$, and $\omega_i$ is the weight of base classifier $c_i$. If $K_l = \max_{j=1,\dots,m} K_j(M_{cj})$, the traffic accident belongs to class $l$.
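The weighted fusion and the maximum rule described above can be sketched as follows. The function names, the weights and the correlation degrees are hypothetical values chosen for illustration; they are not results from the paper.

```python
def integrated_degrees(weights, correlation):
    """Weighted sum over classifiers; correlation[i][j] is K_j(v_i)."""
    n_classes = len(correlation[0])
    return [
        sum(w * row[j] for w, row in zip(weights, correlation))
        for j in range(n_classes)
    ]


def decide_class(weights, correlation):
    """Return (1-based class label, integrated degrees) using the max rule."""
    degrees = integrated_degrees(weights, correlation)
    best = max(range(len(degrees)), key=degrees.__getitem__)
    return best + 1, degrees


# Hypothetical example: three base classifiers, three classes.
weights = [0.5, 0.3, 0.2]
correlation = [
    [0.4, -0.2, -0.6],   # classifier 1 favours class 1
    [-0.1, 0.3, -0.5],   # classifier 2 favours class 2
    [-0.3, -0.4, 0.1],   # classifier 3 favours class 3
]
label, degrees = decide_class(weights, correlation)
print(label, degrees)
```

In this toy input each base classifier prefers a different class, and the fused decision follows the weighted evidence rather than a simple majority vote.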
Instead of deploying traditional methods for calculating the weights, such as the analytic hierarchy process (AHP) and fuzzy synthetic evaluation (FSE), this paper uses the goodness of fit as the weight to reduce subjectivity, as given in (10).

Data Description
In this paper, 200 traffic accident records from the United Kingdom are used to demonstrate the modeling steps and to test the validity of the model. The severity of a traffic accident is divided into five classes, corresponding to $M = \{M_{c1}, M_{c2}, M_{c3}, M_{c4}, M_{c5}\}$, and the other fields (independent variables) are shown in Table Ⅰ.

Modeling for Matter-Element Analysis
Step 1: Model fitting and classifying based on the base models. The data are randomly divided into two parts: one is the training set and the other is the test set. The training set is employed to fit all the base classifiers, in order to obtain all the parameters as well as the goodness of fit.
Tables Ⅱ-Ⅳ show the fitting information for LR, BPNN and DT. According to the goodness of fit and the classification accuracy on the training set (shown in Fig. 1), the performance of DT is very poor. Although LR is better than the others, there is still room for improvement. After that, the classes of the traffic accidents in the test set, together with their confidence coefficients (shown in Fig. 2), are identified by these base classifiers.
Step 2: Building the matter-element of accident severity. Based on the above results, the matter-element model consisting of LR, BPNN and DT is established to identify the accident class of every record in the test set. Taking the first record in the test set as an example, the detailed modeling steps are as follows. Table Ⅴ shows the gradation of accident severity based on the confidence coefficient. This means that the class of a traffic accident can be evaluated by calculating the distance between the confidence coefficient from a given base classifier and the confidence intervals in Table Ⅴ.
Step 3: Building the classical domain and the segmented domain. It is unnecessary to normalize Table Ⅴ, because the probability that the traffic accident belongs to a certain class already lies within 0-1. Following matter-element theory, the classical domain $R_{c1}$ for $M_{c1}$ is built from the confidence intervals in Table Ⅴ, where $c_1$ is LR, $c_2$ is BPNN and $c_3$ is DT. $R_{c2}$, $R_{c3}$, $R_{c4}$ and $R_{c5}$ are obtained in the same way as $R_{c1}$. Because the confidence coefficient $v_i$ has the same scale (0-1) for every base classifier, the segmented domain is expressed as:
$$R_p = (M_p, c_i, v_{pi}) = \begin{bmatrix} M_p & c_1 & \langle 0, 1 \rangle \\ & c_2 & \langle 0, 1 \rangle \\ & c_3 & \langle 0, 1 \rangle \end{bmatrix}$$
Step 4: Weight evaluation based on identification accuracy
In existing theoretical studies, AHP and FSE are widely employed to determine index weights. AHP combines qualitative and quantitative analysis to make a decision [17], but its evaluation steps involve a great deal of subjectivity. This paper instead uses the classification accuracies of the three base classifiers on the training set to calculate the weights, since these values reflect the individual performance of each classifier. The weight $\omega_i$ of each $c_i$ is calculated from $aop_i$, the identification accuracy of $c_i$ on the training set, which gives $\omega$ = (0.336, 0.336, 0.300).
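The weight formula itself is not reproduced in this text. One plausible reading, stated here as an assumption, is that each accuracy is divided by the sum of all accuracies so that the weights add up to one. The sketch below illustrates that reading with hypothetical accuracies; the function name `accuracy_weights` is also hypothetical, and the paper's own formula may differ.

```python
def accuracy_weights(accuracies):
    """Normalise training accuracies so the resulting weights sum to 1."""
    total = sum(accuracies)
    if total <= 0:
        raise ValueError("at least one accuracy must be positive")
    return [a / total for a in accuracies]


# Hypothetical training accuracies for three base classifiers.
weights = accuracy_weights([0.80, 0.75, 0.45])
print(weights)
```

Because the weights come directly from measured accuracies, no expert scoring is involved, which is the motivation the text gives for avoiding AHP and FSE.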
Step 5: Evaluating the traffic accident class of the first record. The confidence coefficients with which the accident severity of the first record in the test set belongs to each class ($M_{c1}$, $M_{c2}$, $M_{c3}$, $M_{c4}$, $M_{c5}$) according to the base classifiers (LR, BPNN, DT) are substituted into (1) to build the matter-element of the traffic accident. By using (5)-(6), the correlation degrees between each index and each class are calculated, as presented in Table Ⅵ. The integrated correlation degrees are calculated as $K_j(M_i) = \omega \cdot M_{rd}$ = (0.028, -0.887, -0.887, -0.908, -0.857). Because the maximum integrated correlation degree is $K_j(M_1) = 0.028$, the accident severity of the first record in the test set is classified as $M_1$, which is the same as the actual value. By contrast, LR, BPNN and DT classify it as $M_2$, $M_1$ and $M_3$ respectively.
Step 6: Evaluation of all the records in the test set and testing the validity. The multiple classifier is employed to identify the classes of traffic accidents for the 100 records in the test set. According to Fig. 3 and Fig. 4, the decision tree performs worst, with an accuracy of only 38%, whereas logistic regression and the neural network identify the class of traffic accident better, with accuracies of 84% and 87% respectively. By combining the different base classifiers into one through matter-element analysis, however, the multiple classifier shows an obvious improvement, with an identification accuracy of 100%. In conclusion, the multiple classifier put forward in this paper is useful and valid, and it provides a new approach to constructing reliable classifiers in other fields.
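The accuracy comparison in this step amounts to counting how many predicted classes match the actual classes. A minimal sketch follows; the labels are hypothetical toy data, not the 100 test records from the case study, and the function name `accuracy` is an assumption.

```python
def accuracy(predicted, actual):
    """Fraction of records whose predicted class equals the actual class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical class labels (1-5) for five test records.
actual = [1, 2, 3, 1, 5]
predictions = {
    "base classifier": [1, 2, 1, 1, 4],
    "multiple classifier": [1, 2, 3, 1, 5],
}
scores = {name: accuracy(labels, actual) for name, labels in predictions.items()}
print(scores)
```

Running the same function over each base classifier and the fused classifier gives directly comparable figures on the same test records.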

CONCLUSION
This paper explores a new multiple classifier based on the hard decision model and the parallel structure, which classifies traffic accidents with better accuracy and stability than a single model.
Matter-element theory is selected to establish a multiple classifier consisting of different base classifiers, namely LR, BPNN and DT, which has the potential to improve the classification accuracy and to support the practical application of a traffic accident classification system. Furthermore, the paper improves the matter-element evaluation method to make the criterion setting of the multiple classifier more objective and dynamic, based on mining the confidence coefficient with which the traffic accident belongs to a certain class.
Traffic accident data from the UK, including the class of traffic accident, the characteristics of the driver, the road conditions for driving and the crossing conditions for pedestrians, are used to demonstrate the modeling steps and to test the validity and classification ability of the model. The 200 traffic accident records are randomly divided into two parts: the training set and the test set.
The 100 accident records in the training set are used to fit the base classifiers and the multiple classifier, while the other 100 records in the test set are used to test accuracy and stability. The results show that, compared with the general base classifiers, the multiple classifier based on the confidence coefficient and matter-element analysis clearly strengthens the capacity to identify the class of a traffic accident, because its classification accuracy is higher than that of LR, DT and BPNN.