A Multi-Label Classification Method for Vehicle Video

In recent years, smartphone use behind the wheel and driver drowsiness have been widely identified as causes of numerous road accidents, which has drawn many scholars' attention to autonomous driving. In such complex scenes, one of the major challenges is comprehensively mining information from the massive features in vehicle video. This paper proposes a multi-label classification method for vehicle video, MCM-VV (Multi-label Classification Method for Vehicle Video), to judge road-condition labels for unmanned systems. MCM-VV consists of a feature extraction process and a multi-label classification process. During feature extraction, grayscale, lane lines and the edges of main objects are extracted after video preprocessing. During multi-label classification, the algorithm DR-ML-KNN (Multi-label K-nearest Neighbor Classification Algorithm based on Dimensionality Reduction) learns a multi-label classifier from the training set, predicts road-condition labels according to the maximum a posteriori principle, then outputs the labels and adds the new instance to the training set to optimize the classifier. Experimental results on five vehicle video datasets show that MCM-VV is effective and efficient: DR-ML-KNN reduces runtime by 50%, lowers time complexity and improves accuracy.


Introduction
With the increase in vehicle crashes, driving threats are rising. The most common use of vehicle video is detecting and identifying one important target in the video (e.g., target vehicles, traffic scene text, pedestrians). However, autonomous driving needs more information, such as pedestrians, lane lines and others. Traditional single-label classification struggles to accurately describe the multiple pieces of information contained in vehicle video. Therefore, multi-label learning [Zhang and Zhou (2014)] has gradually become a research focus. In label classification, each training instance x has a corresponding label set in the label space Y. Let Y = {y1, y2, ..., yq} denote the label space consisting of q class labels. Given a training set T = {(x1, Y1), (x2, Y2), ..., (xm, Ym)}, if |Yi| = 1, each instance corresponds to exactly one label, which is single-label classification; if |Yi| > 1, each instance corresponds to multiple labels, which is multi-label classification [Zhang and Zhou (2014)]. For example, a picture can be labelled "blue sky", "house", "tree" and so on. Likewise, a movie can belong to "action films" and "spy-war films", and may also involve a suspense story, so it can additionally be labelled "suspense films". Multi-label classification differs from multi-class classification: in the former, each instance corresponds to multiple labels in the label space, while in the latter each instance corresponds to only one label among multiple class labels. Essentially, multi-class classification falls into the category of single-label classification. Multi-label learning algorithms are divided into two categories: "problem transformation" and "algorithm adaptation". Problem transformation methods tackle the multi-label learning problem by transforming it into other well-established learning scenarios, such as Binary Relevance [Tanaka and Macedo (2015)].
Algorithm adaptation methods tackle the multi-label learning problem by adapting popular learning techniques to deal with multi-label data directly, such as ML-KNN [Zhang and Zhou (2007)]. This paper aims to identify multiple labels for vehicle video. We treat adverse weather such as rain and snow, pedestrians blocking the road ahead, and unclear lane lines as unusual events. The multi-label classification method can comprehensively mine the valuable information contained in vehicle video and provide auxiliary information for unmanned systems to improve driving safety. In practice, the features of adjacent image frames of vehicle video tend to be highly similar, resulting in a particularly large and redundant feature set. Existing multi-label classification methods cannot handle this problem, so we choose to improve one of the efficient methods, ML-KNN (Multi-label K-nearest Neighbor). Traditional ML-KNN needs to find the K nearest neighbors in the entire data set, which takes a long time to run. The major challenges are high time complexity and poor practical effect. Given the requirements and problems mentioned above, a multi-label classification method, MCM-VV, is proposed in this paper. First, we extract features after video preprocessing. Then the algorithm DR-ML-KNN learns the training set to obtain a multi-label classifier. The algorithm performs PCA [Tan, Ji and Zhao (2017)] on the transposed feature matrix and uses the maximum a posteriori principle to predict the labels of a test instance, finally adding this instance to the original training set for classifier optimization. Experimental results show that the algorithm can predict road-condition labels quickly, improve accuracy and reduce runtime by 50%. In summary, the major contributions of this paper are as follows.
(1) Formulate the problem of multi-label classification for vehicle video.
(2) Devise a feature extraction method suitable for vehicle video, which can accurately extract the effective information in video. Also, we propose a multi-label classification algorithm based on dimensionality reduction, which is beneficial to predict labels quickly and accurately.
(3) Conduct comprehensive experiments on five real video datasets. The experimental results demonstrate the effectiveness and efficiency of our proposals.

Related work
In recent years, academia has carried out numerous research works on multi-label learning. Zhang proposed backpropagation for multi-label learning to classify gene functions and texts [Zhang and Zhou (2006)]. Jiang proposed a multi-label text classification method based on fuzzy similarity measure (FSM) and k-nearest neighbor (KNN) [Jiang, Tsai and Lee (2012)]. Yu proposed a multi-label classification framework MLNRS based on neighborhood rough sets [Yu, Pedrycz and Miao (2013)]. Liu proposed a method of emotion analysis based on multi-label learning [Liu and Chen (2015)]. Ding proposed an algorithm that assesses the cost of the majority classes and the value of the minority classes to handle the multi-label imbalanced data classification problem [Ding, Yang and Lan (2018)]. Han proposed a new approach called multi-label learning with label-specific features using correlation information (LSF-CI), which learns label-specific features for each label while considering both correlation information in label space and correlation information in feature space [Han, Huang, Zhang et al. (2019)]. Thus, multi-label learning has been widely applied in text classification, gene function recognition and emotion analysis, and has achieved good research results. For vehicle video, the following analyses are common: obtaining the driving trajectory of the target vehicle [Tan, Ji and Zhao (2017)] by tracking the profile and characteristics of the target vehicle in the video; locating and recognizing traffic scene text [Zhang, Mao and Chen (2009)]; and detecting pedestrians in front of the vehicle [Pan, Jin and Feng (2015)]. The above research only detects and identifies one important target in vehicle video. In this paper, we propose a multi-label classification method for vehicle video, which can mine and utilize multiple pieces of important information contained in the video.

Problem statement
Let V denote a vehicle video set. Let X = {x1, x2, ..., xm} denote the instance space with m instances, where each xi ∈ X is an n-dimensional feature vector extracted from one frame image of a video in V. Let Y = {y1, y2, ..., yq} denote the label space with q labels, where each label describes one kind of road condition. Given a training set T = {(x1, Y1), (x2, Y2), ..., (xm, Ym)} (xi ∈ X, Yi ⊆ Y), the i-th instance is denoted by (xi, Yi), where Yi is the real label set of xi. The labels used in this paper are shown in Tab. 1. The multi-label classification method MCM-VV extracts features from the vehicle videos in V, converting them into X. The multi-label classification algorithm DR-ML-KNN learns a multi-label classifier h: X → 2^Y which optimizes some specific evaluation metric. In most cases, however, instead of outputting a multi-label classifier, the learning system produces a real-valued function of the form f: X × Y → R. That is, for each instance xi with real label set Yi, a successful learning system tends to output f(xi, y) > f(xi, y') for any y ∈ Yi and y' ∉ Yi. Moreover, the real-valued function f can be transformed into a ranking function. Given a video set and a test video, our proposed method MCM-VV first preprocesses the video to remove noise. Then it extracts features from the video. Each feature of one frame image is characterized by a matrix; all feature matrices are fused to calculate an eigenvector xi. After that, the video set is converted into a feature set, which is combined with the label set as the classifier input. Next, the algorithm DR-ML-KNN produces a real-valued function f. When a test video arrives, DR-ML-KNN finds its neighbors in the training set according to the features, and outputs the labels of the test video by calculating the real value for each label.

Framework
In this section, we present the framework of method MCM-VV, shown in Fig. 1, which illustrates how multiple labels are discovered from a test video. Method MCM-VV is a lazy machine learning method: it predicts multiple labels for the test instance online. As Fig. 1 shows, MCM-VV is divided into three main steps: preprocessing, feature extraction and classification. Each step is described in detail in the next section.

Preprocessing
Between the moment the driving recorder shoots the vehicle video and the moment the video is processed and analyzed, image quality may degrade depending on the type and version of the driving recorder. The same problem exists in the transmission media and storage devices, which tends to introduce various kinds of noise during the formation and transmission of the video.
Our investigation shows that the noise in vehicle video is mainly Gaussian noise, so the Gaussian filter, a smooth linear filter well suited for removing Gaussian noise [Zhang (2014)], is selected in this paper. For each image in the vehicle video, two-dimensional Gaussian filtering is applied, with the filter kernel given by Eq. (1):

G(x, y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²))    (1)

Here, G(x, y) is the Gaussian weight at offset (x, y) from the kernel center, and σ is a parameter controlling the width of the filter. Multiple comparison experiments show that the Gaussian filter has good noise reduction capability.
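As a concrete illustration, the smoothing step can be sketched in Python with NumPy; the kernel size and σ below are illustrative choices, not the paper's settings, and a library routine such as OpenCV's GaussianBlur would be used in practice:

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    # 2-D Gaussian weights G(x, y) ∝ exp(-(x^2 + y^2) / (2*sigma^2)),
    # normalized so they sum to 1 (the 1/(2*pi*sigma^2) factor cancels).
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def gaussian_smooth(img, size=5, sigma=1.0):
    # Naive same-size filtering with edge padding; purely didactic.
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img, dtype=float)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = (padded[i:i + size, j:j + size] * k).sum()
    return out
```

Because the weights sum to 1, a constant image passes through unchanged, while high-frequency noise is averaged away.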

Feature extraction
Each image has its own features, such as grayscale, texture and edges [Chen, Wang and Chen (2015); Wang, Liu and Hai (2017); Zhang and Bao (2002); Zou, Liu and Chen (2019)]. Selecting appropriate features can accurately describe the image.

Extract grayscale
Nowadays, almost all vehicles are equipped with a driving recorder to effectively monitor the driving situation in front of the vehicle. Usually, the video collected by the driving recorder is a color video. Mining information from color video directly would require a large amount of memory and computation time, so we first convert the color video to grayscale video. The popular methods for this are the maximum value method, the average value method, and the weighted average method. In this paper, the weighted average method [Chen, Wang and Chen (2015)] is selected to perform grayscale processing on the vehicle video. The human eye's sensitivity to blue is relatively low and its sensitivity to green is relatively high, so different weighting coefficients are assigned to the three components R, G and B to obtain the most suitable grayscale image, as shown in Eq. (2):

Gray = 0.299R + 0.587G + 0.114B    (2)
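A minimal sketch of the weighted average conversion, using the standard luminance weights 0.299/0.587/0.114 (the exact coefficients in the paper's elided equation are assumed to match these common values):

```python
import numpy as np

def to_grayscale(rgb):
    # Weighted average of the R, G, B channels; green gets the
    # largest weight, matching the eye's higher sensitivity to green.
    # rgb: array whose last axis holds the (R, G, B) components.
    return rgb @ np.array([0.299, 0.587, 0.114])
```

Since the three weights sum to exactly 1.0, a pure white pixel (255, 255, 255) maps to gray level 255.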

Extract edge of main object
The edge of an image refers to the portion where the change in brightness is significant. It carries much of the important information in the image and plays an important role in the analysis and classification of the whole image. Therefore, in feature extraction from vehicle video, edge information [Zhang and Bao (2002)] is also a very important feature. The edge information of an image can be determined from the grayscale gradient magnitude and direction at each pixel, as calculated in Eq. (4):

D(x, y) = sqrt(Dx² + Dy²),  θ = arctan(Dy / Dx)    (4)

Here, D(x, y) is the grayscale gradient magnitude at (x, y), θ is the gradient direction, and Dx and Dy are the partial derivatives of f(x, y). The Canny operator is selected in this paper because it is not susceptible to noise interference and detects true edges. The weight coefficient matrix is shown in Eq. (5).
The weight coefficient matrix is used to compute Dx and Dy, from which the grayscale gradient magnitude and direction of the image are calculated according to Eq. (4). By combining the features mentioned above, a huge feature set of the vehicle video can be obtained [Zhang, Wang and Geng (2015)]. The number of features is particularly large. Extracting features frame by frame yields an accurate label set, but because the difference between adjacent frames is extremely small, the resulting feature data set is highly redundant. Simply reducing the frequency of feature extraction, on the other hand, seriously harms accuracy, as shown in Section 5.3. To solve the redundancy problem while preserving prediction accuracy, we propose the multi-label classification algorithm DR-ML-KNN.
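The gradient computation above can be sketched as follows. The Sobel kernels stand in for the weight coefficient matrix of Eq. (5), whose exact entries are not reproduced in this excerpt and may differ:

```python
import numpy as np

# Sobel kernels: a common choice of weight coefficient matrices for
# estimating the derivatives Dx and Dy (assumed here, not the paper's
# exact Eq. (5)).
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter2d(img, k):
    # Same-size cross-correlation with edge padding (a sign flip
    # versus true convolution, which does not affect the magnitude).
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (p[i:i + 3, j:j + 3] * k).sum()
    return out

def gradient(img):
    dx = filter2d(img, SOBEL_X)
    dy = filter2d(img, SOBEL_Y)
    mag = np.hypot(dx, dy)      # D(x, y) in Eq. (4)
    theta = np.arctan2(dy, dx)  # gradient direction
    return mag, theta
```

On a uniform image the magnitude is zero everywhere, while a vertical step edge produces large responses along the boundary column.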

Multi-label K-nearest neighbor classification algorithm based on dimensionality reduction
We present a Multi-label K-nearest Neighbor Classification Algorithm based on Dimensionality Reduction, named DR-ML-KNN. For high-dimensional data such as vehicle video, we select Principal Component Analysis (PCA) for dimensionality reduction [Han, Huang, Zhang et al. (2019)]. The basic idea of PCA is to map the original n-dimensional features onto k dimensions, that is, to construct k new features from the original n-dimensional ones. The new k features are completely new orthogonal features, i.e., principal components. PCA reduces the dimensionality by removing similar features and extracting principal components. One of the major challenges solved in this paper is removing similar instances, so we apply PCA to the transposed feature matrix. As a result, similar samples are removed and data redundancy is reduced.
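As a rough sketch, this reduction can be implemented with a singular value decomposition; applying it to the transposed feature matrix, as the paper does, merges near-duplicate instances rather than features. The function name and parameters here are our own:

```python
import numpy as np

def pca_reduce(A, k):
    # Center the rows of A, then project onto the top-k right singular
    # vectors (the principal components). Returns shape (A.shape[0], k).
    A = A - A.mean(axis=0)
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    return A @ Vt[:k].T

# The paper reduces the *transposed* feature matrix, e.g.:
# principal_instances = pca_reduce(X.T, k)
```

Because `np.linalg.svd` returns singular values in descending order, keeping the first k rows of `Vt` keeps the directions of greatest variance.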
Step 1: The original training set X is a data set of m feature vectors, each with n dimensions, i.e., a matrix X[m][n] composed of the feature vectors of m frame images. To reduce the amount of calculation, dimension reduction is performed on the transposed matrix X^T[n][m], which is reduced to k dimensions.

Step 2: Decentralization, that is, subtracting the feature mean x̄ from each eigenvalue. At this point, the training set is still a matrix of n rows and m columns.

Step 3: Calculate the covariance matrix Ω of the decentralized transposed matrix:

Ω = X X^T    (8)

Step 4: Solve for the eigenvalues and eigenvectors of the covariance matrix using singular value decomposition:

[U, Σ, V] = SVD(Ω)    (9)

U is an n × n matrix whose orthogonal columns are the left singular vectors; Σ is an n × m matrix in which all elements are zero except those on the principal diagonal, each of which is a singular value; V is an m × m matrix whose orthogonal columns are the right singular vectors. U and V are both unitary matrices. In general, the values on the diagonal of Σ are sorted in descending order, and the eigenvectors corresponding to the first k singular values are selected.

Step 5: The feature set is transformed into the new space constructed by the k eigenvectors, and the original training set X is updated to the transposed reduced matrix.
Step 6: Let y⃗_x be the category vector for instance x, where its l-th component y⃗_x(l) (l ∈ Y) takes the value 1 if l ∈ Y_x and 0 otherwise. Let N(x) denote the set of k nearest neighbors of instance x in the training set X; in general, the similarity between instances is measured by the Euclidean distance. For each class label l, DR-ML-KNN counts C_x(l), the number of neighbors in N(x) that contain label l. Algorithm DR-ML-KNN is described in Algorithm 1.
Step 7: Let H₁ˡ denote the event that instance x contains label l and H₀ˡ the event that it does not, and let Eⱼˡ denote the event that exactly j neighbors of x contain label l. P(H₁ˡ | Eⱼˡ) is the probability that x contains label l given that j of its neighbors do; correspondingly, P(H₀ˡ | Eⱼˡ) is the probability of the opposite case. According to the MAP principle [Zhang and Zhou (2007)], the category vector y⃗_x(l) for an unseen instance x is determined by the following maximum a posteriori rule:

y⃗_x(l) = argmax_{b ∈ {0,1}} P(H_bˡ | E_{C_x(l)}ˡ),  l ∈ Y

Based on the Bayesian rule [Zhang and Zhou (2007)], this is equivalent to:

y⃗_x(l) = argmax_{b ∈ {0,1}} P(H_bˡ) · P(E_{C_x(l)}ˡ | H_bˡ)

In the process of algorithm DR-ML-KNN, we first extract the features from the vehicle video and perform the de-redundancy operation on the transposed feature matrix, obtaining the first k principal instances. This removes similar instances and reduces the cost of finding the neighbors of a test instance. Also, after predicting the labels of the test instance, we add it to the original training set to enrich the training set and optimize the classifier, so that the label sets of future test instances can be predicted more accurately.
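To make the MAP decision concrete, here is a compact, self-contained sketch of the ML-KNN core in the standard Zhang and Zhou (2007) formulation, without the paper's PCA de-redundancy step or online classifier update; the class name and the Laplace smoothing parameter s are our own choices:

```python
import numpy as np

class MLKNN:
    """Compact ML-KNN: priors, neighbor-count posteriors, MAP decision."""

    def __init__(self, k=3, s=1.0):
        self.k, self.s = k, s

    def fit(self, X, Y):
        # X: (m, n) feature matrix; Y: (m, q) binary label matrix.
        m, q = Y.shape
        self.X, self.Y = X, Y
        # Prior probabilities P(H1^l) with Laplace smoothing.
        self.prior1 = (self.s + Y.sum(axis=0)) / (2 * self.s + m)
        # c1[j, l]: training instances WITH label l having exactly j
        # neighbors with label l; c0 likewise for instances WITHOUT it.
        c1 = np.zeros((self.k + 1, q))
        c0 = np.zeros((self.k + 1, q))
        for i in range(m):
            d = np.linalg.norm(X - X[i], axis=1)
            d[i] = np.inf  # exclude the instance itself
            nb = np.argsort(d)[:self.k]
            cnt = Y[nb].sum(axis=0).astype(int)
            for l in range(q):
                (c1 if Y[i, l] == 1 else c0)[cnt[l], l] += 1
        # Posterior probabilities P(E_j^l | H_b^l).
        self.post1 = (self.s + c1) / (self.s * (self.k + 1) + c1.sum(axis=0))
        self.post0 = (self.s + c0) / (self.s * (self.k + 1) + c0.sum(axis=0))
        return self

    def predict(self, x):
        d = np.linalg.norm(self.X - x, axis=1)
        nb = np.argsort(d)[:self.k]
        cnt = self.Y[nb].sum(axis=0).astype(int)  # C_x(l) for each l
        q = self.Y.shape[1]
        y = np.zeros(q, dtype=int)
        for l in range(q):
            p1 = self.prior1[l] * self.post1[cnt[l], l]
            p0 = (1 - self.prior1[l]) * self.post0[cnt[l], l]
            y[l] = int(p1 > p0)  # maximum a posteriori decision
        return y
```

On a toy set with two well-separated clusters carrying disjoint labels, a query near either cluster recovers that cluster's label vector.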

Evaluation metrics
We choose five evaluation criteria for multi-label learning algorithms mentioned in [Zhang and Zhou (2007)]: Hamming Loss, One-error, Coverage, Ranking Loss, Average Precision.
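For reference, two of these metrics are straightforward to compute; a minimal sketch with our own helper names, where `Y_true` is the binary label matrix and `F` holds the real-valued scores f(x, y):

```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    # Fraction of instance-label pairs that are misclassified.
    return float(np.mean(Y_true != Y_pred))

def one_error(Y_true, F):
    # Fraction of instances whose top-ranked label is not in the
    # real label set.
    top = np.argmax(F, axis=1)
    return float(np.mean(Y_true[np.arange(len(F)), top] == 0))
```

Coverage, Ranking Loss and Average Precision are defined analogously over the label ranking induced by F, as in Zhang and Zhou (2007).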

Dataset
In this paper, we choose five vehicle video data sets from the Berkeley Road-Vehicle Dataset (BRVD) to verify the performance and advantages of the algorithm DR-ML-KNN. Our proposed DR-ML-KNN is compared with traditional ML-KNN and with an ML-KNN variant that reduces the feature extraction frequency, denoted V-ML-KNN. To the best of our knowledge, this paper is the first to apply a multi-label classification method to vehicle video label prediction, so there are no other algorithms to compare against.

Effectiveness on DR-ML-KNN
The algorithm is implemented in MATLAB. The experiments are carried out on MATLAB 2016a, running on a PC with an Intel(R) Core 3.2 GHz CPU and 4 GB of memory. DR-ML-KNN is compared with traditional ML-KNN and V-ML-KNN on five real-world multi-label datasets. For Average Precision, the larger the value, the better the performance; for the other four metrics, the smaller the value, the better. As can be seen from Tab. 3, the Hamming Loss of our algorithm is lower than those of ML-KNN and V-ML-KNN, and the similarity between the predicted label set and the real label set is above 80%. From Tabs. 4-6, the One-error, Coverage and Ranking Loss of our algorithm are all lower than those of ML-KNN and V-ML-KNN. From Tab. 7, the Average Precision of our algorithm is higher than those of ML-KNN and V-ML-KNN. In summary, the five tables show that the proposed algorithm works well and improves the accuracy of the classification labels.

Efficiency on DR-ML-KNN
In the next set of experiments, we validate the efficiency of the proposed DR-ML-KNN. Time is measured in seconds and includes both training and prediction time. The runtime comparison is shown in Fig. 2, from which we can see that our algorithm is more efficient than traditional ML-KNN. Although V-ML-KNN has the lowest runtime, its accuracy in Tab. 3 is so low that it is not considered further. In summary, our algorithm can be expected to perform well when applied in the real world.

Conclusion
Traditional multi-label classification methods can only predict labels and cannot extract features from the original video data. Moreover, when applied to vehicle video data, their runtime is high and their classification performance is poor. To address these defects, we propose the multi-label classification method MCM-VV, which reduces time complexity and improves classification accuracy. The experimental results on five vehicle video data sets show that the algorithm achieves better multi-label evaluation metrics with less runtime.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.