Research on hierarchical pedestrian detection based on SVM classifier with improved kernel function

Pedestrian detection in complex scenes remains an important research problem. To address the high missed-detection rate and poor timeliness of pedestrian detection in such scenes, this paper proposes an improved classification method. First, Haar features are extracted from the images to be detected, and candidate pedestrian regions are determined by an AdaBoost classifier. Then, the traditional SVM classifier is improved by replacing the single kernel function with a combined kernel function, and the optimal proportion of each function in the combined kernel is found with the adaptive particle swarm optimization (APSO) algorithm. Finally, the improved SVM classifier is combined with the fusion feature to re-examine the candidate regions and locate pedestrians accurately. Experimental results show that, compared with the traditional detection framework, the proposed method improves both detection speed and detection accuracy. The method is therefore of practical value for pedestrian detection in complex scenes.


Introduction
The purpose of pedestrian detection is to predict the location of all pedestrians in an image, which provides technical support for downstream tasks such as behavior detection, identity recognition, and path tracking. 1 The most common approach today is based on statistical learning: features are extracted from the image, and a classifier trained on those features performs the detection. Feature extraction uses a computer to extract detailed information from the image to be detected and to determine whether each image point belongs to an image feature. This approach requires a large number of training samples to construct the pedestrian detection classifier. Papageorgiou and Poggio 2 used sliding windows for pedestrian detection. Viola and Jones 3 proposed the integral image method. In 2005, Dalal and Triggs 4 proposed a robust descriptor, the histogram of oriented gradients (HOG) feature, for describing pedestrians. Ojala et al. 5 proposed the local binary pattern (LBP) and reported classification results for one-dimensional feature value distributions of single features and two-dimensional distributions of complementary feature pairs. Wang et al. 6 combined LBP features with HOG features, used a support vector machine (SVM) classifier for classification, and proposed a feature-fusion pedestrian detection algorithm. Felzenszwalb et al. 7 proposed object detection based on mixtures of multi-scale deformable part models, and Felzenszwalb et al. 8 also proposed a cascade deformable part model. Because these features perform well in pedestrian detection, researchers have proposed improved features such as FHOG, LTP, SLBP, ACF, and FCF. After Krizhevsky et al. 9 applied convolutional neural networks to large-scale image classification and achieved remarkable results, more and more researchers turned to deep learning detection methods.
Following the idea of cascading classifiers in the AdaBoost algorithm, Angelova et al. 10 proposed a pedestrian detection algorithm based on cascaded convolutional neural networks, which can quickly eliminate most of the background area in an image. Ouyang and Wang 11 proposed a joint deep learning algorithm that combines HOG features with color self-similarity (CSS) features, uses an SVM classifier as a first-level detector to prefilter the samples, and then uses a convolutional neural network to make the subsequent judgment.
Among the many pedestrian detection algorithms, the HOG-LBP fusion feature has remained a focus of research because of its detection accuracy and its handling of occlusion. HOG features alone, however, cannot describe the spatial characteristics of the gradient well, whereas the binary coding strategy of LBP features makes them more robust to illumination and noise. Moreover, the computational complexity of the traditional kernel SVM classifier is relatively high, and the real-time performance of detection needs to be improved. To address these problems, this paper proposes a pedestrian detection algorithm based on a cascaded candidate classifier, built on the HOG-LBP fusion feature and the SVM classifier framework.
In practice, the environment in which pedestrians must be recognized is complicated and contains many obstacles. 12,13 The shape of the human body is also changeable (standing, squatting, or partially occluded), and individuals differ in height and weight, so feature extraction is relatively difficult. 13 In addition, changes in lighting introduce a large amount of noise into the images, which degrades detection accuracy and timeliness. 14 The first step of pedestrian detection is therefore to preprocess the images to reduce these interference factors. In this paper, the images are converted to grayscale and normalized before detection. 15 Detection is divided into two stages. In the first stage, the region of interest is extracted using head-and-shoulder contours. In the second stage, an improved SVM classifier is trained, and multi-feature fusion is used to re-detect the areas identified as pedestrians in the first stage. This approach greatly reduces the number of candidate areas that must be judged and shortens the detection time.

Materials and methods
In this paper, two-stage detection is used to accurately locate pedestrians in images. In the first stage, a classifier trained with Haar features and the AdaBoost algorithm is used as a detector to extract the region of interest. Feature fusion is then used to further improve detection accuracy: the fused features are reduced in dimension by principal component analysis (PCA), 16 and the improved SVM classifier is trained on them. The region of interest extracted in the first stage is input into the trained improved SVM classifier for second-stage detection, which yields the final pedestrian detection result. The overall detection process is shown in Figure 1.
On the one hand, this scheme makes full use of the high speed of Haar features to quickly select head-and-shoulder candidate areas. 17 On the other hand, it uses the fusion features, which describe pedestrians more precisely, to complete head-and-shoulder detection.
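The two-stage flow described above can be sketched as a simple filter chain. This is only an illustration under stated assumptions: the function `two_stage_detect`, the window format, and the toy acceptance rules `toy_stage1` and `toy_stage2` are hypothetical stand-ins for the Haar + AdaBoost detector and the improved SVM classifier, not the paper's implementation.

```python
from typing import Callable, Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) of a candidate window


def two_stage_detect(
    windows: Iterable[Box],
    stage1_accepts: Callable[[Box], bool],
    stage2_accepts: Callable[[Box], bool],
) -> List[Box]:
    """Return the windows accepted by both stages, in input order."""
    # Stage 1: the fast detector keeps only candidate regions of interest.
    candidates = [w for w in windows if stage1_accepts(w)]
    # Stage 2: the slower, more accurate classifier re-checks each candidate.
    return [w for w in candidates if stage2_accepts(w)]


# Toy stand-ins for the two classifiers (hypothetical rules, illustration only).
def toy_stage1(box: Box) -> bool:
    return box[2] >= 16 and box[3] >= 16  # reject windows that are too small


def toy_stage2(box: Box) -> bool:
    return box[2] == box[3]  # pretend only square windows are true targets


toy_windows = [(0, 0, 8, 8), (4, 4, 32, 32), (10, 2, 32, 20)]
detections = two_stage_detect(toy_windows, toy_stage1, toy_stage2)
```

Because the second stage only sees windows that survived the first, its cost scales with the number of candidates rather than with the number of windows scanned.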

Level 1 Haar + AdaBoost detection scheme
To obtain better detection results, detection can target a part of the pedestrian rather than the whole body. 18 The head and shoulders of a pedestrian resemble an Ω shape and are relatively stable. 19 In this paper, the head and shoulders are therefore used as the detection target in the first-level detection. On the one hand, this effectively reduces the impact of occlusion; on the other hand, it reduces the detection area and improves timeliness. Table 1 compares the detection performance of AdaBoost classifiers trained with Haar-like, LBP, and HOG features; the comparison shows that Haar features perform better. Haar features and the AdaBoost classifier were therefore used to train the first-level detector for pedestrian head-and-shoulder detection. By adding weight coefficients to the traditional arithmetic average, the final prediction function becomes:

$$H(x) = \sum_{i=1}^{M} w_i h_i(x)$$

The constraint is:

$$\text{s.t.}\quad w_i \ge 0,\quad \sum_{i=1}^{M} w_i = 1$$

where $h_i(x)$ is the $i$-th weak classifier, $w_i$ is its weight, and $M$ is the number of weak classifiers.

In pedestrian detection, the head-and-shoulder model is very stable and has distinctive contour features. 19,20 When contours are used to detect targets, the contour features must be intact and the training features must be effective in order to obtain highly accurate detection results. Detecting pedestrians by their head-and-shoulder contours is a stable and reliable way to limit the effect of partial occlusion. In this paper, Haar features are used to train the AdaBoost classifier, the scale of the resulting strong classifier is varied to accommodate images of different sizes, and the position of the head and shoulders in the image is detected. 21-23 A Haar feature is the difference between the sums of the pixels in two rectangular regions. Paul Viola proposed computing Haar features by means of the integral image, which makes the calculation far more efficient. The training process is shown in Figure 2.
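To make the role of the integral image (called the integral graph in some translations) concrete, the following minimal sketch computes a summed-area table and one two-rectangle Haar-like feature on a toy image. The function names and the toy data are illustrative assumptions; a real detector evaluates many such features at many positions and scales.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """Summed-area table with an extra zero row and column.

    ii[y][x] holds the sum of all pixels above and to the left of (x, y),
    so any rectangle sum needs only four lookups.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of pixels in the rectangle with top-left (x, y) and size w x h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_horizontal(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle Haar-like feature: left half minus right half (w even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Toy 2 x 4 image with a dark left half and a bright right half.
toy = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
]
ii = integral_image(toy)
feature = haar_two_rect_horizontal(ii, 0, 0, 4, 2)  # 4 - 20 = -16
```

Once the table is built, every rectangle sum costs four lookups regardless of the rectangle's size, which is what makes Haar features fast enough for the first-stage scan.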
For the $N$ existing training samples, all samples have the same weight at the beginning of training, so the initial weight distribution is:

$$D_1 = (w_{1,1}, w_{1,2}, \ldots, w_{1,N}),\quad w_{1,i} = \frac{1}{N},\quad i = 1, 2, \ldots, N$$

During training, the weights of correctly identified samples are reduced and the weights of misidentified samples are increased, so that later rounds concentrate on the hard samples. A weak classifier $h_m(x): X \to \{-1, +1\}$ is obtained by training on the data set with weight distribution $D_m$, and its error rate on the training data set is:

$$e_m = \sum_{i=1}^{N} w_{m,i}\, I\big(h_m(x_i) \ne y_i\big)$$

where $y_i \in \{-1, +1\}$ is the label of sample $x_i$ and $I(\cdot)$ is the indicator function. The coefficient $\alpha_m$ of $h_m(x)$, that is, the weight of the weak classifier, is calculated as:

$$\alpha_m = \frac{1}{2}\ln\frac{1 - e_m}{e_m} \tag{5}$$

When the strong classifier is assembled, the best-performing weak classifier receives the largest weight. Equation (5) shows that $\alpha_m \ge 0$ when $e_m \le 1/2$ and that $\alpha_m$ increases as $e_m$ decreases, which means that a classifier with better performance and a lower error rate plays a more important role in the final classification.
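One boosting round (weighted error, classifier coefficient, and the sample reweighting that follows) can be sketched as below. The sketch assumes the standard discrete AdaBoost formulas with labels in {-1, +1}; the function name `adaboost_round` and the four-sample toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def adaboost_round(
    weights: Sequence[float],
    labels: Sequence[int],
    predictions: Sequence[int],
) -> Tuple[float, float, List[float]]:
    """One round of discrete AdaBoost for labels and predictions in {-1, +1}.

    Returns (error, alpha, new_weights).
    """
    # Weighted error: total weight of the misclassified samples.
    error = sum(w for w, y, p in zip(weights, labels, predictions) if y != p)
    if not 0.0 < error < 0.5:
        raise ValueError("weak classifier must have weighted error in (0, 0.5)")
    # Coefficient of the weak classifier: larger when the error is smaller.
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Shrink the weights of correct samples, grow those of misclassified ones.
    unnormalised = [
        w * math.exp(-alpha * y * p)
        for w, y, p in zip(weights, labels, predictions)
    ]
    z = sum(unnormalised)  # normalisation constant so the weights sum to 1
    return error, alpha, [u / z for u in unnormalised]


n = 4
initial = [1.0 / n] * n   # uniform initial weights
y_true = [1, 1, -1, -1]
y_pred = [1, 1, -1, 1]    # the last sample is misclassified
err, alpha, updated = adaboost_round(initial, y_true, y_pred)
```

In this toy round the single misclassified sample ends up carrying half of the total weight, so the next weak classifier is pushed to get it right.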
After the weights are updated, the weight distribution is:

$$D_{m+1} = (w_{m+1,1}, w_{m+1,2}, \ldots, w_{m+1,N}) \tag{6}$$

$$w_{m+1,i} = \frac{w_{m,i}}{Z_m}\exp\big(-\alpha_m y_i h_m(x_i)\big),\quad i = 1, 2, \ldots, N$$

where $\alpha_m$ is the coefficient of the weak classifier $h_m(x)$ and $y_i$ is the label of sample $x_i$. In the formula, $Z_m$ is the normalization constant that turns $D_{m+1}$ into a probability distribution:

$$Z_m = \sum_{i=1}^{N} w_{m,i}\exp\big(-\alpha_m y_i h_m(x_i)\big)$$

After $M$ rounds, the final strong classifier is obtained:

$$H(x) = \operatorname{sign}\left(\sum_{m=1}^{M} \alpha_m h_m(x)\right)$$

The first-stage detector is trained with the AdaBoost algorithm on sample features, which requires a large number of positive and negative samples. The detection target in the first stage is the pedestrian's head and shoulders, so the positive samples are images of pedestrian heads and shoulders and the negative samples are images that do not contain a pedestrian's head and shoulders. The positive samples come from two sources: the INRIA pedestrian database, and images of real scenes taken by the laboratory's network camera, from which the heads and shoulders of pedestrians were cropped. A total of 2360 positive samples were obtained.
For the negative samples, a total of 9440 were obtained from the negative samples provided in the data set and from non-head-and-shoulder images captured in the actual scene. All images were normalized to 32 × 32 pixels by calling the resize function in OpenCV and then converted to grayscale.
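The size normalization and graying step can be sketched in plain Python as follows. This is only an illustration: the paper calls OpenCV's resize function, whereas the hypothetical helper `resize_nearest` below uses nearest-neighbour sampling as a dependency-free stand-in, and `to_gray` assumes RGB input and the ITU-R BT.601 luminance weights.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")
Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def resize_nearest(img: List[List[T]], size: int = 32) -> List[List[T]]:
    """Resize an image to size x size by nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [
        [img[y * h // size][x * w // size] for x in range(size)]
        for y in range(size)
    ]


def to_gray(img: List[List[Pixel]]) -> List[List[int]]:
    """Luminance-weighted grayscale conversion (ITU-R BT.601 weights)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in img
    ]


# Toy 48 x 64 image (rows x columns): white left half, black right half.
toy_rgb = [
    [(255, 255, 255) if x < 32 else (0, 0, 0) for x in range(64)]
    for _ in range(48)
]
sample = to_gray(resize_nearest(toy_rgb, 32))  # 32 x 32 grayscale sample
```

Bringing every sample to the same size and a single channel means each training image yields a feature vector of the same length.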
The cascade detector trained with AdaBoost on Haar features is fast. In complex backgrounds, however, other objects may be mistaken for detection targets; backpacks, for example, can be falsely detected as pedestrians. In this paper, a second detection stage is therefore applied to the output of the first stage to further improve the accuracy of the detection results.

Level 2 pedestrian detection scheme
In the second-level detection, features are extracted from the head-and-shoulder areas found in the first stage and fused, and the fused features are then used to train the improved SVM.
To balance the global and local behavior of the SVM classifier, so that it has both learning ability and generalization ability, this paper uses a combined-kernel SVM whose parameters are tuned by particle swarm optimization (PSO) to obtain a pedestrian detection classifier with better classification performance. Based on an analysis of the learning and generalization abilities of the polynomial kernel function and the radial basis kernel function, a linear combination of the two is used as the SVM kernel to construct the pedestrian detection classification model. In addition, the parameters that affect the classification performance of the combined-kernel SVM are analyzed, namely the polynomial kernel parameters, the radial basis kernel parameter, the combination coefficient, and the penalty factor.
Feature fusion processing. Features are usually used independently, with the importance of each feature in the overall representation expressed by a weight. Feature fusion, by contrast, lets features complement one another. 24 Another local feature is introduced to compensate for the weaknesses of a given local feature, so that the local features are combined more effectively, which improves the robustness of the image features and the detection accuracy. 25 The secondary detection is therefore carried out with the fusion feature and the improved SVM classification algorithm. The process of feature fusion is shown in Figure 3.
First, the HOG algorithm is used to extract features of the pedestrian's external contour, and then the LBP algorithm is used to extract features of the pedestrian area. The initial features are preprocessed, and the weighted features are fused to obtain the final fusion feature. The weighting formula is:

$$l = L_1\tilde{a} + L_2\tilde{b} \tag{10}$$

In the formula, $l$ is the fusion feature obtained after weighting, $L_1$ and $L_2$ are the weights of the corresponding features, with $L_1 + L_2 = 1$, and $\tilde{a}$ and $\tilde{b}$ are the pedestrian edge features and texture features obtained after preprocessing.
Dimension reduction of fusion features. The curse of dimensionality is also a major problem for feature fusion. Feature fusion can give better performance, but fusion features of too high a dimension increase the computation time and can also affect the final pedestrian detection result. PCA is a linear feature extraction and dimensionality reduction algorithm based on the Karhunen-Loève (K-L) transform, and it can greatly speed up learning. The variance along one dimension is calculated as:

$$\operatorname{var}(X) = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}$$

The covariance is calculated as follows:

$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

The covariance indicates the degree of correlation between two dimensions. If it is 0, the two dimensions are linearly uncorrelated. Given $m$ samples $x_1, x_2, \ldots, x_m$, each an $n$-dimensional feature vector, the covariance matrix is calculated as:

$$S = \frac{1}{m - 1}\sum_{i=1}^{m}(x_i - \bar{x})(x_i - \bar{x})^{T}$$

where $\bar{x}$ is the sample mean. The covariance matrix $S$ is decomposed to obtain the matrix $P_k = [p_1, p_2, \ldots, p_k]$ composed of the eigenvectors corresponding to the $k$ largest eigenvalues of $S$, and the final feature vector is obtained by the dimensionality reduction calculation:

$$y_i = P_k^{T}(x_i - \bar{x})$$

Improved APSO-SVM model construction

Build composite kernel functions
The kernel functions commonly used in SVM classifiers are as follows: 24

Polynomial function:

$$K(x_i, x_j) = (\delta x_i^{T} x_j + r)^{d},\quad \delta > 0$$

Radial basis function:

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j\rVert^{2}}{2\sigma^{2}}\right)$$

Sigmoid function:

$$K(x_i, x_j) = \tanh(\delta x_i^{T} x_j + r),\quad \delta > 0$$

In the kernel functions, $\delta$, $d$, and $\sigma$ are adjustable parameters, and $r$ is the penalty factor, which is used to measure losses.
A global kernel function has strong generalization ability, while a local kernel function has strong learning ability. Combining the two kinds of kernel function exploits the advantages of both: the combined kernel has good generalization and learning ability, which improves recognition performance to a certain extent. A combined kernel can be formed as a linear combination of the kernel functions above and then used as the kernel function of the SVM. The linear combination can be written as:

$$G_k = \sum_{t=1}^{n} \delta_t K_t$$

where $G_k$ is the combined kernel function, $n$ is the number of kernel functions, $K_t$ is the $t$-th kernel function, and $\delta_t$ is its weight.
Among global kernel functions, the polynomial kernel performs best, and among local kernel functions, the radial basis kernel performs best. 26 In this paper, the two are combined to construct the combined kernel function, which can be expressed as:

$$K(x_i, x_j) = m K_{\text{poly}}(x_i, x_j) + (1 - m) K_{\text{RBF}}(x_i, x_j)$$

where $m$ is the weight coefficient of the combined kernel function, $m \in (0, 1)$. In this paper, the adaptive particle swarm optimization algorithm is used to obtain the optimal parameters.
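A minimal sketch of such a combined kernel is given below. It assumes that the coefficient m weights the polynomial (global) kernel and 1 - m weights the radial basis (local) kernel; the function names and the default parameter values are illustrative only.

```python
import math
from typing import Sequence


def poly_kernel(x: Sequence[float], z: Sequence[float],
                delta: float = 1.0, r: float = 1.0, d: int = 2) -> float:
    """Polynomial (global) kernel: (delta * <x, z> + r) ** d."""
    dot = sum(a * b for a, b in zip(x, z))
    return (delta * dot + r) ** d


def rbf_kernel(x: Sequence[float], z: Sequence[float],
               sigma: float = 1.0) -> float:
    """Radial basis (local) kernel: exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def combined_kernel(x: Sequence[float], z: Sequence[float], m: float = 0.1,
                    delta: float = 1.0, r: float = 1.0, d: int = 2,
                    sigma: float = 1.0) -> float:
    """Convex combination of the two kernels.

    Assumption: m weights the polynomial kernel, 1 - m the RBF kernel.
    """
    if not 0.0 < m < 1.0:
        raise ValueError("m must lie in (0, 1)")
    return (m * poly_kernel(x, z, delta, r, d)
            + (1.0 - m) * rbf_kernel(x, z, sigma))


u, v = [1.0, 0.0], [0.0, 2.0]
k_uu = combined_kernel(u, u)  # 0.1 * (1 + 1)**2 + 0.9 * exp(0) = 1.3
k_uv = combined_kernel(u, v)
k_vu = combined_kernel(v, u)
```

A combination of valid kernels with non-negative weights is itself a valid kernel, so a function of this form can be supplied to any SVM solver that accepts a user-defined kernel.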
Use the APSO algorithm to find the optimal parameters
Particle swarm optimization (PSO) simulates the migration and flocking behavior of birds while foraging. The particles start from random solutions and iterate toward the optimum, finding the global optimal value by following the best value found so far. Through information sharing among the individuals in the group, the whole swarm moves from disorder to order during the search and so reaches the optimal solution. 27 The adaptive variant introduces a mutation operation into the original algorithm, that is, it reinitializes some variables with a certain probability. Mutation lets a particle jump out of the previously found optimum and search a larger space, which maintains the diversity of the population and raises the chance of finding a better value. To solve a problem, an optimization function must be defined, and the fitness value of each particle is calculated with it. Suppose the search space is $D$-dimensional and the population contains $N$ particles. The position of particle $x$ is $D_x = (D_{x1}, D_{x2}, \ldots, D_{xD})$ and its velocity vector is $v_x = (v_{x1}, v_{x2}, \ldots, v_{xD})$. The optimization proceeds iteratively. Equation (15) gives the update of velocity and position from iteration $t$ to iteration $t + 1$:

$$\begin{aligned} v_x^{t+1} &= w v_x^{t} + c_1 r_1 (P_x - D_x^{t}) + c_2 r_2 (G_x - D_x^{t}) \\ D_x^{t+1} &= D_x^{t} + v_x^{t+1} \end{aligned} \tag{15}$$

where $P_x$ is the individual optimal value of the particle, $G_x$ is the global optimal value, $w$ is the inertia weight, $c_1$ and $c_2$ are learning factors in the range [0, 2], and $r_1$ and $r_2$ are random numbers in the range (0, 1). Formula (17) adds a weight to the position update formula of the original PSO algorithm, which gives better control over the swarm's search.
Here $w_d$ is the weight to be determined. As equation (18) shows, the value of the weight decreases steadily as the number of iterations increases.
As the number of iterations increases, the particle swarm moves closer to the region of the optimal solution. 28 At this stage, the proportion of the global kernel function decreases while the proportion of the local kernel function increases. Slowing down the search in this way improves the accuracy and accelerates convergence to the optimal solution. In pedestrian detection, the APSO algorithm is used to obtain the optimal parameters of the combined kernel function, and the pedestrian detection classifier is then constructed. The main steps are shown in Figure 4.
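The search procedure described above can be sketched as a PSO loop with a decreasing inertia weight and a mutation step. This is a hedged illustration, not the paper's exact algorithm: the linear decay schedule (`w_max` to `w_min`), the mutation probability `p_mutation`, the clamping to bounds, and the quadratic `toy_objective` (a stand-in for the classifier's validation error as a function of the kernel parameters) are all assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def apso_minimise(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20,
    n_iters: int = 60,
    c1: float = 2.0,
    c2: float = 2.0,
    w_max: float = 0.9,
    w_min: float = 0.4,
    p_mutation: float = 0.05,
    seed: int = 0,
) -> Tuple[List[float], float, List[float]]:
    """Return (best position, best value, best value after each iteration)."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_position() -> List[float]:
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    pos = [random_position() for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in pos]                    # individual best positions
    p_val = [objective(p) for p in pos]
    g_idx = min(range(n_particles), key=p_val.__getitem__)
    g_best, g_val = p_best[g_idx][:], p_val[g_idx]  # global best
    history = [g_val]

    for t in range(n_iters):
        # Inertia weight shrinks linearly: broad search first, fine search later.
        w = w_max - (w_max - w_min) * t / max(n_iters - 1, 1)
        for i in range(n_particles):
            if rng.random() < p_mutation:
                # Mutation: re-initialise the particle to keep the swarm diverse.
                pos[i] = random_position()
                vel[i] = [0.0] * dim
            else:
                for k, (lo, hi) in enumerate(bounds):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][k] = (w * vel[i][k]
                                 + c1 * r1 * (p_best[i][k] - pos[i][k])
                                 + c2 * r2 * (g_best[k] - pos[i][k]))
                    # Move, then clamp to the search bounds.
                    pos[i][k] = min(max(pos[i][k] + vel[i][k], lo), hi)
            value = objective(pos[i])
            if value < p_val[i]:
                p_best[i], p_val[i] = pos[i][:], value
                if value < g_val:
                    g_best, g_val = pos[i][:], value
        history.append(g_val)
    return g_best, g_val, history


def toy_objective(p: Sequence[float]) -> float:
    """Hypothetical error surface over (m, sigma) with its minimum at (0.1, 1.5)."""
    return (p[0] - 0.1) ** 2 + (p[1] - 1.5) ** 2


best, best_val, trace = apso_minimise(toy_objective, [(0.0, 1.0), (0.1, 5.0)])
```

In an actual parameter search, `toy_objective` would be replaced by a function that trains the combined-kernel SVM with the candidate parameters and returns its validation error.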

Results
During the calculation, APSO uses the individual and group extreme values to optimize step by step and follows the current optimal solution during the search, which gives fast convergence. The influence of the combined kernel function coefficient m on classification performance was then examined. Figure 5 shows that the combined kernel function performs best, and gives the best detection and classification results, when m is 0.1.
The parameter settings show that the parameters affecting classification performance are mainly the optimal parameter of the combined kernel function and the penalty factor. Figure 6 shows how the recognition rate of the different kernel functions changes as the penalty factor r takes different values. The data in Figure 6 show that the recognition rate of the combined kernel function is highest when r is 100. After feature fusion and dimensionality reduction, the fused features are used to train the improved SVM classifier. The resulting classifier is connected in series with the first-stage classifier: the output of the first stage is used as the input of the second stage, which realizes head-and-shoulder detection with the combined structure and yields the final detection result. The detection results are shown in Figure 7.
It can be seen that applying second-level detection to the output of the first-level detection effectively reduces the false detection rate and distinguishes the real targets. Comparative experiments also confirm the feasibility of the method: the secondary detection effectively improves accuracy. Compared with first-level detection alone, the method increases the detection time slightly, but the increase is very small, while the reduction in the false detection rate is significant.
The experiments were developed in Microsoft Visual Studio 2015 and run under the 64-bit Windows 10 operating system on an Intel(R) Core(TM) i5 CPU at 3.3 GHz with 8 GB of memory.
Among traditional methods, some features, such as HOG and LBP, perform well, and some researchers have proposed fusing features for classifier training to obtain better detection results. The method in this paper is compared with these approaches, and the results are presented in Table 2.
More and more researchers now use deep learning for detection, so the method in this paper is also compared with the popular SSD and Faster R-CNN detectors. The experimental results are compared with those of other commonly used methods on the INRIA and Caltech data sets. To keep the comparison fair, all methods use the same configuration. The comparison results are shown in Figures 8 and 9.
The data show that the detection results of the method in this paper are clearly better than those of the traditional methods. Compared with SSD, YOLO, and similar detectors, it also shows certain advantages, especially on the INRIA data set, where the background is relatively simple and there is little occlusion. Overall, among the methods compared, the classification detection method proposed in this paper achieves the best recognition results.

Discussion
To achieve accurate and fast pedestrian detection, this paper proposes a hierarchical detection algorithm. In the first stage, a classifier trained with Haar features and the AdaBoost algorithm is used as the detector to extract the ROI. A combined function is used as the kernel function of the SVM, and the adaptive particle swarm optimization algorithm is used to obtain the optimal weight of the combined kernel function. The improved SVM serves as the classifier of the second stage: a head-and-shoulder detector is trained on the fusion features and applied to the "regions to be detected" passed on by the first stage. The fusion features greatly improve detection performance, and the cascade structure effectively narrows the detection area and provides faster results. However, performance in complex scenes, such as those with occlusion, still needs to be improved: good results are obtained on the simpler INRIA data set, but results on more complex data sets are weaker and require further optimization. Future work will focus on pedestrian occlusion; an effective, lightweight feature extraction network that needs no pretraining is also worth considering.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by the Characteristic Innovation research project of Teachers in Guangdong Colleges and universities (2020DZXX07) and the Quality Engineering project of Education Department of Guangdong Province: Guangdong Higher Education Document no. 29 [2021] and Youth Scientific Research Project of Guangdong Universities (ky202015, ky202103).