Pedestrian Crossing Detection Based on HOG and SVM

Abstract: In recent years, pedestrian detection has been a hot research topic in computer vision and artificial intelligence, and it is widely used in security and pedestrian analysis. However, because traditional pedestrian detection techniques require a large amount of computation, the speed of many pedestrian recognition systems is very limited. In some restricted areas, such as hazardous construction zones, real-time detection of pedestrians and of cross-border behavior is required. To detect pedestrians and cross-border behavior in a restricted area more conveniently and efficiently, this paper proposes a pedestrian cross-border detection method based on HOG (Histogram of Oriented Gradients) and SVM (Support Vector Machine). The method extracts moving targets through GMM (Gaussian Mixture Model) background modeling and then extracts the HOG features of the moving targets. Finally, a trained SVM distinguishes pedestrians from non-pedestrians, which completes the detection, and the targets are labeled. The test results show that extracting HOG features only from the candidate areas greatly reduces the amount of computation and the feature extraction time and eliminates background interference, thereby improving detection efficiency, so the method can be applied where real-time operation is required.


Introduction
With the continuous advancement of computer intelligence technology, it has become possible for monitoring systems to become intelligent. At the same time, computer image processing and information technology, now applied on a large scale, have gradually been combined with video surveillance systems. How to use computer vision technology to process surveillance video effectively and in real time has become particularly important. If a video surveillance system can automatically detect abnormal behaviors and alert people when they occur, it will greatly improve work efficiency and allow abnormalities to be found and dealt with early. This research mainly uses background extraction and pedestrian detection to analyze whether pedestrians in the video enter a dangerous warning area or show cross-border behavior.
Since Dalal et al. [1] proposed HOG in 2005, pedestrian detection technology has entered a stage of rapid development. The HOG feature describes the gradient information of the pixels in an image. It describes the edge information of pedestrians well and is not sensitive to changes in illumination. It is still the most widely used feature operator in the field of pedestrian detection.
The speed of pedestrian detection is easily affected by factors such as the dimension of the feature descriptor and the size of the detected images: the higher the dimensionality and the larger the image, the longer the detection takes [2]. The GMM detects overall human motion well. It is mainly aimed at representing complex background scenes that a single Gaussian model cannot represent effectively, such as shaking leaves and water ripples [3][4][5][6]. GMM is often used for background modeling in complex scenes, and because it can adapt to changes in the background, it performs better than other traditional models, so this article uses GMM to model the background. The moving targets are extracted first by detecting the moving areas in the video [7][8], and pedestrian detection is then performed only on those areas, which greatly reduces the feature detection area and thereby reduces the detection time and improves detection efficiency. In summary, this article proposes a method that detects moving targets through GMM background modeling, which reduces the interference of the background, then uses HOG and SVM to classify the moving targets in the video, and finally marks the pedestrians detected in the forbidden area. The flowchart of this method is shown in Fig. 1.

Moving Target Detection
The main purpose of background modeling is to convert the moving target detection problem for an image sequence into a binary classification problem based on the current background estimate: all pixels of the input sequence are divided into background and moving foreground, and the classification results are then processed further [9][10].
First, process the generated color image and observe the random variable $x$. The pixel sample of the variable at time $t$ is $x_t = \{r_t, g_t, b_t\}$, and the single sampling point $x_t$ taken by it satisfies the mixed Gaussian distribution probability density function, as shown in formula (1).
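As a hedged reference for formula (1), the mixture density commonly used in Gaussian mixture background modeling can be written as below. The symbols $K$ (number of Gaussians), $\omega_{i,t}$ (weights), $\mu_{i,t}$ (means), $\Sigma_{i,t}$ (covariances), $\sigma_{i,t}$ and $n$ (sample dimension, 3 for an RGB pixel) are assumptions introduced here for illustration, and the authors' exact notation may differ. The last line is the form the covariance takes when the channels are treated as independent with equal variance.

```latex
P(x_t) = \sum_{i=1}^{K} \omega_{i,t}\, \eta\!\left(x_t;\ \mu_{i,t},\ \Sigma_{i,t}\right),
\qquad \sum_{i=1}^{K} \omega_{i,t} = 1

\eta(x_t;\ \mu,\ \Sigma) = \frac{1}{(2\pi)^{n/2}\, |\Sigma|^{1/2}}
\exp\!\left(-\tfrac{1}{2}\,(x_t - \mu)^{\top} \Sigma^{-1} (x_t - \mu)\right)

\Sigma_{i,t} = \sigma_{i,t}^{2}\, I
```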
For convenience of calculation, it is generally assumed that the color channels of a pixel are independent of each other and have the same variance, and a Gaussian mixture model is established for each pixel. The effect of moving target detection using Gaussian mixture background modeling is shown in Fig. 2: Fig. 2(a) is a frame of the walking behavior video, and Fig. 2(b) is the moving target detected with Gaussian mixture background modeling. Gaussian mixture background modeling is a background representation method based on the statistical information of pixel samples. It uses statistics such as the probability density of a large number of sample values of each pixel over a long time to represent the background and then uses statistical differences to identify target pixels, which allows complex dynamic backgrounds to be modeled [11].
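The per-pixel idea described above can be sketched in a few lines. This is a simplified illustration for a single grayscale pixel, not the authors' implementation: the constants `K`, `ALPHA`, `MATCH_SIGMAS`, `BG_WEIGHT`, `INIT_VAR` and `MIN_VAR` are assumed values, and the same learning rate is used for the weights and for the mean and variance.

```python
# Simplified Gaussian mixture background model for ONE grayscale pixel.
# All constants below are illustrative assumptions, not values from the paper.
K = 3               # number of Gaussians per pixel
ALPHA = 0.05        # learning rate
MATCH_SIGMAS = 2.5  # a sample matches a Gaussian within 2.5 standard deviations
BG_WEIGHT = 0.7     # cumulative weight of the components treated as background
INIT_VAR = 36.0     # variance given to a newly created component
MIN_VAR = 4.0       # floor that keeps a component from collapsing


class PixelGMM:
    def __init__(self):
        # Each component is [weight, mean, variance].
        self.comps = [[1.0 / K, 0.0, INIT_VAR] for _ in range(K)]

    def update(self, x):
        """Update the mixture with sample x and return True if x is foreground."""
        # Rank components by weight / standard deviation (most reliable first).
        self.comps.sort(key=lambda c: c[0] / c[2] ** 0.5, reverse=True)
        matched = next((i for i, (_, mu, var) in enumerate(self.comps)
                        if abs(x - mu) <= MATCH_SIGMAS * var ** 0.5), None)
        # The top-ranked components whose weights sum to BG_WEIGHT are background.
        cum, n_bg = 0.0, 0
        for w, _, _ in self.comps:
            cum += w
            n_bg += 1
            if cum >= BG_WEIGHT:
                break
        foreground = matched is None or matched >= n_bg
        # Decay all weights; reinforce the matched component.
        for i, c in enumerate(self.comps):
            c[0] = (1 - ALPHA) * c[0] + (ALPHA if i == matched else 0.0)
        if matched is None:
            # Replace the least reliable component with one centered on x.
            self.comps[-1] = [ALPHA, float(x), INIT_VAR]
        else:
            c = self.comps[matched]
            diff = x - c[1]
            c[1] += ALPHA * diff
            c[2] = max(MIN_VAR, c[2] + ALPHA * (diff * diff - c[2]))
        total = sum(c[0] for c in self.comps)
        for c in self.comps:
            c[0] /= total
        return foreground


pixel = PixelGMM()
history = [pixel.update(100) for _ in range(50)]  # a static pixel value
moving = pixel.update(200)                        # a sudden change
```

A value that persists is absorbed into the background after some frames, while a sudden change is flagged as foreground. A full implementation would keep one such mixture per pixel and per color channel.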

Pedestrian Detection Based on HOG Feature Combined with SVM Classifier
This article uses HOG and SVM to detect pedestrians in a video. The flowchart is shown in Fig. 3.

HOG Features
HOG feature descriptors are commonly used in computer vision and image processing for object detection [12]. The feature is built from histograms of gradient directions computed over local areas of the image, and it represents the distribution of local gradient directions and gradient intensities. Even when the exact position of an edge is unknown, the distribution of edge directions can still represent the contour of the pedestrian target.
First, the image is preprocessed: the image to be detected is converted to grayscale, and Gamma correction is used to normalize the color space of the input image, which adjusts the contrast, reduces the impact of local shadows and lighting changes, and suppresses noise [13].
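The preprocessing step can be sketched as follows. This is a minimal illustration on nested lists rather than a real image type. The luma weights are the standard ITU-R BT.601 coefficients, while the exponent `gamma=0.5` (square-root compression) is an assumption, since the text does not state the value used.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples in 0..255 to grayscale values in 0..255."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def gamma_correct(gray_image, gamma=0.5):
    """Apply power-law compression on values scaled to [0, 1], then rescale."""
    return [[255.0 * (v / 255.0) ** gamma for v in row] for row in gray_image]


image = [[(0, 0, 0), (64, 64, 64)],
         [(128, 128, 128), (255, 255, 255)]]
gray = to_gray(image)
corrected = gamma_correct(gray)
```

With an exponent below 1, dark regions are brightened more than bright ones, which is why the step reduces the influence of local shadows.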
Then the edge gradient information is obtained. Gradient information is concentrated mainly at the edges of the image, and gradient statistics describe the appearance and shape of a local target well. The Sobel operator detects edges from the gray-scale weighted differences between the upper and lower, and left and right, neighbors of each pixel, and its response reaches an extreme value at an edge. The Sobel operator smooths noise and provides relatively accurate edge direction information, but its edge localization accuracy is limited. The Sobel convolution kernels are shown in formula (2): the kernel $G_x$ computes the horizontal gradient and detects vertical edges, and the kernel $G_y$ computes the vertical gradient and detects horizontal edges.
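As a minimal sketch of this step, the code below applies the standard 3 × 3 Sobel kernels to a small test image. The sign convention of the kernels and the choice to leave border pixels at zero are assumptions made for illustration; the function name `filter3x3` is hypothetical.

```python
# Standard 3x3 Sobel kernels (one common sign convention).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient, vertical edges
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient, horizontal edges


def filter3x3(image, kernel):
    """Cross-correlate a 2-D list with a 3x3 kernel; border pixels stay 0."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(kernel[j][i] * image[y + j - 1][x + i - 1]
                            for j in range(3) for i in range(3))
    return out


# 5x5 test image with a vertical edge: dark on the left, bright on the right.
image = [[0, 0, 255, 255, 255] for _ in range(5)]
gx = filter3x3(image, SOBEL_X)
gy = filter3x3(image, SOBEL_Y)
```

On this image the horizontal response is large next to the edge and zero in the flat bright region, while the vertical response is zero everywhere because all rows are identical.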
Dalal et al. [1] proposed to use a 3 × 3 Sobel operator to perform the gradient operations, after which the gradient magnitude and direction are calculated by formula (3), where $G_x(x, y)$ and $G_y(x, y)$ denote the horizontal and vertical gradient responses at pixel $(x, y)$:

$$G(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}, \qquad \theta(x, y) = \arctan\frac{G_y(x, y)}{G_x(x, y)} \tag{3}$$

HOG is computed on a dense grid of uniformly spaced cells to improve accuracy. When calculating the HOG features, as shown in Fig. 4, the feature descriptors of all cells in a block are concatenated to obtain the descriptor of the block, and the descriptors of all blocks in the image are concatenated to obtain the HOG feature descriptor of the image, which is the final feature vector used for classification. The relevant parameters of this experiment are set as follows: the cell unit is measured in pixels, the block unit is measured in cells, the sliding window uses a fixed size, and the scan step is 8 pixels.
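The cell and block construction can be sketched as follows. The values `N_BINS = 9`, `CELL = 8` and 2 × 2 cells per block are assumptions (they are common HOG defaults, not values confirmed by this text), and the sketch assigns each pixel to a single bin without the interpolation that full implementations typically use.

```python
import math

N_BINS = 9   # assumed: 9 unsigned orientation bins covering 0..180 degrees
CELL = 8     # assumed cell size in pixels


def cell_histogram(gx, gy, top, left):
    """Magnitude-weighted orientation histogram of one CELL x CELL cell."""
    hist = [0.0] * N_BINS
    for y in range(top, top + CELL):
        for x in range(left, left + CELL):
            magnitude = math.hypot(gx[y][x], gy[y][x])
            angle = math.degrees(math.atan2(gy[y][x], gx[y][x])) % 180.0
            hist[int(angle // (180.0 / N_BINS)) % N_BINS] += magnitude
    return hist


def block_descriptor(gx, gy, top, left, cells_per_side=2):
    """Concatenate the histograms of all cells in one block."""
    descriptor = []
    for cy in range(cells_per_side):
        for cx in range(cells_per_side):
            descriptor += cell_histogram(gx, gy, top + cy * CELL, left + cx * CELL)
    return descriptor


# 16x16 gradient field that points purely along x with unit magnitude.
gx = [[1.0] * 16 for _ in range(16)]
gy = [[0.0] * 16 for _ in range(16)]
descriptor = block_descriptor(gx, gy, 0, 0)
```

With these assumed sizes one block yields 4 × 9 = 36 values, and the image descriptor is the concatenation of such block vectors over all block positions.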
Finally, the contrast is normalized. The density of the histograms within each block is calculated, and each cell in the block is normalized according to this density. After normalization, the descriptor is more robust to lighting changes and shadows.
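A minimal sketch of block normalization is given below. The text does not say which normalization scheme is used, so the L2 norm with a small constant `eps` is an assumption chosen for illustration.

```python
import math


def l2_normalize(block, eps=1e-6):
    """Scale a block descriptor to (approximately) unit L2 length."""
    norm = math.sqrt(sum(v * v for v in block) + eps * eps)
    return [v / norm for v in block]


dim = [3.0, 4.0, 0.0, 0.0]       # gradient pattern under weak lighting
bright = [30.0, 40.0, 0.0, 0.0]  # same pattern with ten times the contrast
dim_normalized = l2_normalize(dim)
bright_normalized = l2_normalize(bright)
```

Both inputs map to nearly the same normalized vector, which illustrates why the normalized descriptor is less sensitive to lighting changes.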
Compared with other feature description methods, HOG has several advantages. First, because it operates on local grid cells, HOG remains largely invariant to geometric and photometric deformations of the image. Second, thanks to coarse spatial sampling, fine orientation sampling, and local photometric normalization, subtle limb movements of pedestrians can be ignored without affecting detection, as long as the pedestrians keep a roughly upright posture. Therefore, the HOG feature is particularly suitable for detecting people in images.

SVM Classifier
SVM is a binary classification model. Its learning strategy is to maximize the margin, which can be formalized as a convex quadratic programming problem and is equivalent to minimizing a regularized hinge loss function. The learning algorithm of SVM is therefore an optimization algorithm for solving convex quadratic programs.
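For reference, the two equivalent formulations mentioned above can be written in their standard soft-margin form. The symbols $w$, $b$, $\xi_i$, $C$, $x_i$, $y_i \in \{-1, +1\}$ and $N$ are assumptions introduced here for illustration, since the text does not define a notation.

```latex
% Convex quadratic program (soft-margin primal)
\min_{w,\, b,\, \xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\quad \text{s.t.}\quad y_i\,(w^{\top} x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1, \dots, N

% Equivalent regularized hinge-loss minimization
\min_{w,\, b}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \max\bigl(0,\ 1 - y_i\,(w^{\top} x_i + b)\bigr)
```

The second form follows from the first because, at the optimum, each slack variable equals $\max(0,\ 1 - y_i (w^{\top} x_i + b))$.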
The basic idea of SVM learning is to find the separating hyperplane that correctly divides the training data set and has the largest geometric margin. For a linearly separable data set there are infinitely many separating hyperplanes, but the one with the largest geometric margin is unique. The goal is therefore to find, when the given sample points are separable, a separating hyperplane that satisfies the classification requirements and lies as far as possible from the sample points [14]. That is, while classification accuracy is ensured, the empty region on both sides of the hyperplane is maximized, because a larger margin generally gives better generalization performance.
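The margin-maximization idea can be illustrated with a small linear SVM trained by batch subgradient descent on the regularized hinge loss. This is a sketch under stated assumptions, not the training procedure used in the paper: the function names, the hyperparameters `c`, `lr` and `epochs`, and the toy data are all hypothetical.

```python
def train_linear_svm(samples, labels, c=1.0, lr=0.01, epochs=200):
    """Minimize 0.5*||w||^2 + c * sum(hinge loss) by batch subgradient descent."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0     # gradient of the regularizer 0.5*||w||^2
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:                # sample lies inside the margin
                grad_w = [g - c * y * xi for g, xi in zip(grad_w, x)]
                grad_b -= c * y
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data: +1 in the upper right, -1 in the lower left.
samples = [[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
           [-2.0, -2.0], [-3.0, -1.5], [-2.5, -3.0]]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)
predictions = [predict(w, b, x) for x in samples]
```

Only samples that violate the margin contribute to the update, which mirrors the role of support vectors in the final solution.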
The SVM classifier is trained in Python. The result of the first training round is not ideal, and the image detection accuracy is mediocre. When human bodies are heavily occluded and the background is complex, false detections and missed detections occur, as shown in Fig. 5(a), because the negative samples are too few and not diverse enough. With the detector that comes with OpenCV, the accuracy in the second round of training improves markedly. Next, the bootstrap method is used to extract hard examples from the false positives and add them to the negative samples [15][16]. The final SVM training result obtained on the INRIA data set is shown in Fig. 5(b).
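The bootstrap step (often called hard negative mining) can be sketched as a retraining loop. To keep the example short and self-contained, a nearest-centroid scorer stands in for the SVM; that stand-in, the function names, and the toy two-dimensional "features" are all assumptions for illustration only.

```python
def train_centroid_classifier(positives, negatives):
    """Stand-in trainer (NOT an SVM): score by distance to the class centroids."""
    def centroid(vectors):
        return [sum(column) / len(vectors) for column in zip(*vectors)]
    cp, cn = centroid(positives), centroid(negatives)

    def score(x):
        dist_p = sum((a - b) ** 2 for a, b in zip(x, cp))
        dist_n = sum((a - b) ** 2 for a, b in zip(x, cn))
        return dist_n - dist_p            # > 0 means "pedestrian"
    return score


def bootstrap(positives, negatives, background_pool, rounds=3):
    """Retrain while adding false positives from pedestrian-free data as negatives."""
    negatives = list(negatives)
    score = train_centroid_classifier(positives, negatives)
    for _ in range(rounds):
        hard = [x for x in background_pool if score(x) > 0 and x not in negatives]
        if not hard:
            break                          # no false positives remain
        negatives += hard
        score = train_centroid_classifier(positives, negatives)
    return score, negatives


positives = [[4.0, 4.0], [5.0, 5.0]]
negatives = [[0.0, 0.0], [-1.0, 0.0]]
background_pool = [[3.0, 2.5], [0.5, 0.2], [2.8, 2.0]]   # contains no pedestrians
score, mined_negatives = bootstrap(positives, negatives, background_pool)
```

In this toy run, two background vectors are initially scored as pedestrians, are added to the negative set, and are rejected after retraining, while the positive samples are still accepted.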

Combination of SVM and HOG Models
This article uses positive and negative samples from the French INRIA pedestrian data set to train the SVM classifier. This data set consists of upright pedestrians taken from images and videos collected by Dalal during the work on HOG pedestrian detection [17][18]. It contains positive and negative samples of pedestrian images. The images are relatively clear and well suited to training.
Detection process: the input frame is scanned with a window of a given size, and the HOG feature of each scan window is extracted. According to the SVM classification algorithm, the distance between the HOG feature of the scan window and the separating hyperplane determined by the support vectors of the positive and negative samples decides the class of the scan window, and the windows classified as positive are reported as detections [19].
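The scanning procedure can be sketched as below. The 8-pixel stride comes from the text; the 64 × 128 window size is an assumption (a common choice for upright pedestrians), and `extract_features`, the weights and the bias are placeholders standing in for real HOG extraction and a trained linear SVM.

```python
def sliding_windows(frame_w, frame_h, win_w=64, win_h=128, stride=8):
    """Yield (x, y, w, h) for every window position that fits inside the frame."""
    for y in range(0, frame_h - win_h + 1, stride):
        for x in range(0, frame_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)


def detect(frame_w, frame_h, extract_features, weights, bias):
    """Return (window, score) pairs whose linear SVM score w.x + b is positive."""
    hits = []
    for window in sliding_windows(frame_w, frame_h):
        features = extract_features(window)
        score = sum(wi * fi for wi, fi in zip(weights, features)) + bias
        if score > 0:
            hits.append((window, score))
    return hits


def toy_features(window):
    """Placeholder for HOG extraction: uses the window position as 'features'."""
    x, y, _, _ = window
    return [float(x), float(y)]


windows = list(sliding_windows(96, 144))
hits = detect(96, 144, toy_features, weights=[1.0, 0.0], bias=-20.0)
```

The sign of the linear score corresponds to the side of the separating hyperplane on which the window's feature vector falls.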
When implementing pedestrian detection, this paper uses the HOG descriptor's SVM-detector setter and detection functions in Python to recognize pedestrians, and applies non-maximum suppression to merge overlapping detection areas.
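Non-maximum suppression can be sketched with a greedy, score-ordered procedure. This is one common variant, shown as an illustration rather than the exact routine used in the paper; the box format `(x1, y1, x2, y2)` and the threshold `iou_threshold=0.5` are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box of each group of strongly overlapping boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return [boxes[i] for i in keep]


boxes = [(10, 10, 74, 138), (14, 12, 78, 140), (200, 20, 264, 148)]
scores = [0.9, 0.6, 0.8]
kept = non_max_suppression(boxes, scores)
```

The first two boxes cover almost the same area, so only the higher-scoring one survives, while the distant third box is kept.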

Experiment Analysis
This experiment uses the INRIA data set for experimental analysis. The INRIA data set has diverse pedestrian poses and complex, changeable backgrounds, and it is highly recognized in the field of pedestrian detection [20]. The training set contains 2416 positive samples and 453 negative samples. The hardware environment is an Intel(R) Core(TM) i5-8265U CPU with 16 GB of memory running the Windows 10 operating system. The software platform is JetBrains PyCharm Community Edition 2019.
When performing pedestrian detection on images in the INRIA data set, the SVM+HOG method in this article achieves good detection results. Some experimental screenshots are shown in Fig. 6. Although there are some false detections and missed detections, the overall results are good. The recognition rates of the Haar feature and the LBP feature [12], each combined with the SVM classifier and also measured on the INRIA data set, are used for comparison, and the proposed method compares favorably. The specific values are shown in Tab. 1. In the process of pedestrian detection on video, some frames may produce results like those shown in Fig. 7. To reduce false detections in the background and the amount of computation, the method in this paper uses GMM to model the video. The modeling effect in different scenes is shown in Fig. 8, and the modeled video is then processed to obtain the video shown in Fig. 9. This processing removes the interference of the background, so the background no longer causes false detections. HOG+SVM is then used for pedestrian detection on the modeled video: a screenshot of the processed video used for detection is shown in Fig. 9, and the recognition result is shown in Fig. 10.
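How a foreground mask narrows the search and how a detection can be tested against a restricted area can be sketched as follows. The text does not describe this step in detail, so everything here is an assumption for illustration: a single bounding box around all foreground pixels (a real system would typically separate connected regions), a rectangular zone, and the hypothetical names `bounding_box`, `overlaps` and `RESTRICTED_ZONE`.

```python
def bounding_box(mask):
    """Bounding box (x1, y1, x2, y2) of all nonzero mask pixels, or None."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not points:
        return None
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def overlaps(box, zone):
    """True if two (x1, y1, x2, y2) rectangles share any area."""
    return (box[0] < zone[2] and zone[0] < box[2]
            and box[1] < zone[3] and zone[1] < box[3])


# Binary foreground mask, as produced by background modeling (1 = moving pixel).
mask = [[0, 0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0, 0],
        [0, 0, 1, 1, 0, 0],
        [0, 0, 1, 0, 0, 0]]
RESTRICTED_ZONE = (3, 0, 6, 4)   # hypothetical restricted area, right half of the frame

candidate = bounding_box(mask)   # only this region needs HOG feature extraction
crossing = overlaps(candidate, RESTRICTED_ZONE)
```

Restricting feature extraction to the candidate box is what reduces the computation, and the overlap test is one simple way to flag a target that has entered the restricted area.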

Conclusion
To address the problem of pedestrian detection in intelligent surveillance systems, this paper proposes a method to detect whether pedestrians in a video are out of bounds. The method first performs Gaussian mixture background modeling on the video frames and then uses HOG+SVM to classify and detect pedestrians. In this way, background interference is eliminated during detection, the false detection rate is reduced, and the detection speed and efficiency are improved.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.