Using Deep Learning Models Combined with Crowd Emotion Models to Identify Abnormal Behaviors in Crowds

To address the problem that crowd abnormal behavior is ambiguously defined and difficult to combine with contextual semantics, an algorithm that combines the OCC human emotion model with crowd entropy is proposed. First, the crowd entropy is calculated and checked for abnormality; if it is abnormal, the optical flow (OF) and histogram of oriented gradients (HOG) features are extracted. These features are then projected into two-dimensional vector data and sent to a CNN for local feature extraction, which is combined with the OCC model to describe crowd emotions. Finally, whether an abnormality occurs is predicted according to the judgment factor. Verified on the data sets, this method shows high accuracy.


Introduction
The detection of crowd abnormal behavior is divided into crowd density estimation and crowd behavior recognition, and the latter is the focus and main difficulty of current research. Researchers have proposed various methods for extracting the trajectory state of the crowd directly from video, ranging from low-level image features to high-level structural modeling. For low-density crowds, individual appearance abnormality detection and individual behavior recognition [1] are mostly used, including analysis methods based on behavior feature extraction, on the wavelet transform, and on rigid body kinematics. Of these, the most commonly used is the method based on behavior feature extraction. Commonly used features are the Histogram of Oriented Gradients (HOG), Haar-like, Edgelet, and color features. For medium- and high-density crowds, overall analysis is mostly used, such as judging changes in the crowd's entropy value [2,3], abrupt changes in the crowd's motion energy [4,5], and abnormal trajectories [6,7]. The definition of abnormal crowd behavior has always been a difficult problem. For individuals, a model is trained on the appearance characteristics of normal individuals, and behavior in the test set that differs from the trained model is judged as abnormal. For the movement characteristics of the crowd, abnormality is usually judged by whether parameters such as the entropy value, the CMI peak value [8], and the motion energy deviate from a threshold.

Proposed Algorithm
First, the entropy of the crowd is calculated and compared with the threshold. If the entropy does not exceed the threshold, the crowd is judged to be normal; otherwise, the next stage is performed. For a crowd with an abnormal entropy value, the optical flow (OF) and the gradient histogram from the tensor-based motion descriptor are combined and projected into two-dimensional vector data. A CNN then extracts further features, which are combined with the OCC model to describe the crowd's emotions. The result is an accurate description of crowd characteristics and detection of abnormal crowd behavior.
Contributions: 1) Motion features and emotional-psychological features are combined to fully describe different crowd attributes; 2) the wavelet transform is used to project features into a two-dimensional view, which is then fed into a simple deep learning framework to achieve accurate detection and early warning of crowd behaviors without slowing computation; 3) abnormal behaviors of medium- and high-density crowds can be detected rapidly and accurately.

Image Segmentation and Judgment of Entropy
Only part of the information in a video is useful, so the region of interest needs to be extracted. This paper uses the frame difference method to extract the foreground target area. In medium- and high-density crowds, extracting features from the entire motion area is computationally expensive because the crowd is tightly packed, so the motion area is segmented. The crowd entropy is used for the initial judgment of abnormality. The entropy is calculated as follows:

$$S = -k \sum_{i=1}^{n} p_i \ln p_i$$

where $W$ is the state set composed of all microscopic states of the crowd at a given time, $W = \{w_i;\ i = 1, 2, \dots, n\}$, and $n$ is the total number of microscopic states of the crowd at that time. $p_i$ is the probability of the crowd appearing in state $w_i$. $S$ represents the statistical average over the probabilities of the microscopic states $w_i$, and $k$ is the weight. Theoretically, the maximum value of the entropy is $k \ln n$, and half of this value is used as the threshold. When the entropy exceeds 3/2 of the threshold, an anomaly is considered to have occurred.
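The entropy test can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes the entropy has the weighted Shannon form $S = -k \sum_i p_i \ln p_i$ with weight $k = 1$, and that the microscopic states are given as hypothetical per-state counts (for example, the number of people in each motion-direction bin). The function and variable names are illustrative.

```python
import math

def crowd_entropy(state_counts, k=1.0):
    """Weighted Shannon entropy S = -k * sum(p_i * ln p_i) over microscopic states."""
    total = sum(state_counts)
    if total == 0:
        return 0.0
    # Empty states are skipped, i.e. 0 * ln 0 is taken as 0.
    probs = [c / total for c in state_counts if c > 0]
    return -k * sum(p * math.log(p) for p in probs)

def is_entropy_abnormal(state_counts, k=1.0):
    """Flag an anomaly when S exceeds 3/2 of the threshold (half the maximum entropy)."""
    n = len(state_counts)
    max_entropy = k * math.log(n)   # reached when all n states are equally likely
    threshold = 0.5 * max_entropy
    return crowd_entropy(state_counts, k) > 1.5 * threshold

# Hypothetical counts of crowd members per microscopic state.
ordered = [97, 1, 1, 1, 0, 0, 0, 0]            # almost everyone shares one state
scattered = [12, 13, 12, 13, 12, 13, 12, 13]   # people spread over all states

print(is_entropy_abnormal(ordered), is_entropy_abnormal(scattered))
```

Under these assumptions the ordered crowd stays below the decision level and the scattered crowd exceeds it, so only the second case would be passed on to feature extraction.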

Extraction of Emotion Information

Extraction of OF
In the polynomial-based model, the optical flow vector field is approximated by a linear combination of orthogonal polynomials. The optical flow is defined as $F := (V_1(x_1, x_2), V_2(x_1, x_2))$, where $V_1(x_1, x_2)$ and $V_2(x_1, x_2)$ are the horizontal and vertical displacements of the point $x = (x_1, x_2)$. $V_1$ and $V_2$ are each approximated by projecting them onto each polynomial $P_{i,j}$.
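The polynomial approximation of the flow field can be illustrated with NumPy. This is a sketch under stated assumptions: a two-dimensional Legendre basis stands in for the orthogonal polynomials $P_{i,j}$, the degree is chosen arbitrarily, and a least-squares fit is used in place of the projection because the sampled basis is only approximately orthogonal on a pixel grid. The function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def fit_flow_polynomial(v1, v2, degree=2):
    """Approximate both flow components by 2-D Legendre polynomials up to `degree`.

    v1, v2: arrays of shape (H, W) with horizontal and vertical displacements.
    Returns (coeffs, basis); coeffs has shape (2, (degree + 1) ** 2).
    """
    h, w = v1.shape
    # Map pixel coordinates to [-1, 1], the natural domain of Legendre polynomials.
    x2, x1 = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    basis = legendre.legvander2d(x1.ravel(), x2.ravel(), [degree, degree])
    targets = np.stack([v1.ravel(), v2.ravel()], axis=1)
    coeffs, *_ = np.linalg.lstsq(basis, targets, rcond=None)
    return coeffs.T, basis

def reconstruct_flow(coeffs, basis, shape):
    """Rebuild the approximated field F = (V1, V2) from its coefficients."""
    approx = basis @ coeffs.T
    return approx[:, 0].reshape(shape), approx[:, 1].reshape(shape)

# Hypothetical synthetic flow: horizontal expansion and a slowly varying vertical drift.
gx2, gx1 = np.meshgrid(np.linspace(-1, 1, 12), np.linspace(-1, 1, 16), indexing="ij")
v1 = 0.5 * gx1
v2 = 0.1 - 0.3 * gx2
coeffs, basis = fit_flow_polynomial(v1, v2)
r1, r2 = reconstruct_flow(coeffs, basis, v1.shape)
```

The coefficient vector is a compact description of the whole field: here 18 numbers replace 2 × 12 × 16 displacement values, and a smooth flow such as this one is recovered exactly at degree 2.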

Histogram of Gradient Tensor
Here the gradient is calculated at all $n$ points of image $I_j$ and summarized by the histogram of gradients $h_j = \{h_{k,l}\}$, $k \in [1, nb_\theta]$ and $l \in [1, nb_\psi]$, where $nb_\theta$ and $nb_\psi$ are the numbers of bins for the angles $\theta$ and $\psi$, respectively. Equally divided intervals are used to obtain the $nb_\theta$ and $nb_\psi$ bins. The HOG tensor of frame $j$ is built from the $m$ bins of $h_j$. The histogram is extracted over the entire image, and each cell of the image is represented by a vector. A grid of $n_x$ and $n_y$ blocks divides the frame in the $x$ and $y$ directions, and the tensor of frame $j$ is then calculated over all bins. A global descriptor is calculated from the segmented image and the gradient histograms. Finally, the tensor descriptor of the video data is obtained as the sum of the optical flow approximation and the gradient approximation.
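A gradient histogram over two angles can be sketched as follows. Several choices here are assumptions rather than details taken from the text: the gradient is computed spatio-temporally with `np.gradient` over a stack of frames, $\theta$ is taken as the in-plane angle and $\psi$ as the temporal elevation angle, each point votes with its gradient magnitude, and the frame tensor is formed as the normalized outer product $h_j h_j^{\top}$. The bin counts and function names are illustrative.

```python
import numpy as np

def gradient_histogram(frames, j, nb_theta=8, nb_psi=4):
    """Magnitude-weighted histogram h_j of gradient directions for frame j.

    frames: array of shape (T, H, W); the gradient is taken over t, y and x.
    Returns a flat vector with nb_theta * nb_psi bins.
    """
    gt, gy, gx = np.gradient(frames.astype(float))
    gt, gy, gx = gt[j], gy[j], gx[j]
    magnitude = np.sqrt(gx**2 + gy**2 + gt**2)
    theta = np.arctan2(gy, gx)               # in-plane angle in [-pi, pi]
    psi = np.arctan2(gt, np.hypot(gx, gy))   # temporal elevation in [-pi/2, pi/2]
    hist, _, _ = np.histogram2d(
        theta.ravel(), psi.ravel(),
        bins=[nb_theta, nb_psi],             # equally divided intervals
        range=[[-np.pi, np.pi], [-np.pi / 2, np.pi / 2]],
        weights=magnitude.ravel(),
    )
    return hist.ravel()

def frame_tensor(h):
    """Assumed tensor form: outer product h h^T, scaled to unit Frobenius norm."""
    t = np.outer(h, h)
    norm = np.linalg.norm(t)
    return t / norm if norm > 0 else t

# Hypothetical input: five random 16 x 16 frames.
frames = np.random.default_rng(0).random((5, 16, 16))
h_j = gradient_histogram(frames, j=2)
T_j = frame_tensor(h_j)
```

To follow the block-wise description, the same two functions could be applied to each cell of an $n_x \times n_y$ grid and the resulting tensors accumulated per frame.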

Two-dimensional Projection of Features
The extracted OF and HOG feature vectors are converted into a two-dimensional map using the wavelet transform: the vertical and horizontal projections of the quantized coefficients of selected wavelet sub-bands are calculated and then stacked vertically to obtain a two-dimensional feature map.
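One possible reading of this projection step is sketched below. It is an illustration under assumptions, not the authors' procedure: a one-level Haar transform written directly in NumPy stands in for the unspecified wavelet, all four sub-bands are used, quantization is omitted, and the input is assumed to be a square matrix with even side length so that every projection has the same length.

```python
import numpy as np

def haar_dwt2(a):
    """One-level 2-D Haar transform; `a` must have even height and width."""
    a = np.asarray(a, dtype=float)
    s = np.sqrt(2.0)
    # Filter along each row: combine adjacent columns into low- and high-pass halves.
    lo = (a[:, 0::2] + a[:, 1::2]) / s
    hi = (a[:, 0::2] - a[:, 1::2]) / s
    # Filter along each column: combine adjacent rows of both halves.
    ll = (lo[0::2] + lo[1::2]) / s
    lh = (lo[0::2] - lo[1::2]) / s
    hl = (hi[0::2] + hi[1::2]) / s
    hh = (hi[0::2] - hi[1::2]) / s
    return ll, lh, hl, hh

def projection_map(a):
    """Stack horizontal and vertical projections of each sub-band into one 2-D map."""
    rows = []
    for band in haar_dwt2(a):
        rows.append(band.sum(axis=1))   # horizontal projection: one value per row
        rows.append(band.sum(axis=0))   # vertical projection: one value per column
    return np.vstack(rows)

# Hypothetical 8 x 8 feature matrix, e.g. a small descriptor tensor.
features = np.arange(64.0).reshape(8, 8)
feature_map = projection_map(features)  # shape (8, 4): 4 sub-bands x 2 projections
```

The resulting map has a fixed, small size regardless of how the features were produced, which is what makes it convenient as CNN input.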

Emotional Analysis
The OCC model is based on the concept of appraisal and attributes emotional arousal to the individual's subjective interpretation of the environment. Judging the emotions of the existing crowd can be used for behavior prediction, and detecting these emotions requires the PAD (pleasure-arousal-dominance) model, which characterizes the average emotional state over a representative sample of life situations, as described by Mehrabian [10]. OCC emotions are always related to the PAD state [11]. Using this mapping, the PAD space evaluates emotional tendencies along its three orthogonal scales.
When an individual's emotions exceed a certain threshold, the individual is likely to have the ability to communicate emotions. The threshold of expression ability of individual j is drawn from a normal distribution with mean value 0.5 − 0.5 and standard deviation (0.5 − 0.5 )/10. Individuals above their threshold are classified as a triggerable group, and the type of emotion affecting this group can be used to predict its next behavior. The two models above give a preliminary judgment of the emotional state of the crowd, which is used as one input of the CNN; the projected two-dimensional data is the other input, and the network further extracts and classifies the characteristics of the crowd.
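The selection of the triggerable group can be sketched as follows. The sketch keeps only what the text states unambiguously, namely that each individual's threshold comes from a normal distribution whose standard deviation is one tenth of its mean; the mean itself is passed in as a parameter, and the emotion levels are hypothetical values in [0, 1]. Function names are illustrative.

```python
import random

def sample_expression_threshold(mean, rng=random):
    """Draw one expression threshold from a normal distribution with std = mean / 10."""
    return rng.gauss(mean, mean / 10.0)

def triggerable_group(emotion_levels, thresholds):
    """Indices of individuals whose emotion level exceeds their own threshold."""
    return [j for j, (e, t) in enumerate(zip(emotion_levels, thresholds)) if e > t]

# Hypothetical crowd of five individuals with an assumed threshold mean of 0.4.
rng = random.Random(42)
emotion_levels = [0.9, 0.1, 0.5, 0.35, 0.7]
thresholds = [sample_expression_threshold(0.4, rng) for _ in emotion_levels]
group = triggerable_group(emotion_levels, thresholds)
```

Because the standard deviation scales with the mean, thresholds stay tightly clustered around it, so membership in the group is driven mainly by each individual's emotion level.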

Experiments and Analysis
Simulation experiments were run on a 4.0 GHz CPU under the 64-bit Windows 10 operating system, using MATLAB 2018a, OpenCV, and Python as development tools. To verify the effectiveness of the proposed algorithm, the UCSD dataset and MED [12] are selected as the experimental data.
The results for AUC and equal error rate (EER) on the UCSD dataset, compared with methods from the literature, are shown in Table 1. The proposed method performs well on Ped1 and can effectively detect abnormal frames. However, its AUC and EER values are not the best on Ped2, which may be related to the shooting angle. Table 2 shows a comparison experiment on the MED dataset, in which the compared methods extract low-level features. Our algorithm reaches an accuracy of up to 90.91, indicating that, compared with low-level features alone, combining emotion with motion information gives a more accurate description of the behavior.
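For reference, the two frame-level metrics can be computed from per-frame anomaly scores as sketched below. The labels and scores are made-up illustration values, not results from the experiments, and the EER is taken at the ROC point where the false positive rate is closest to the miss rate, without interpolation.

```python
def roc_points(labels, scores):
    """ROC curve as (fpr, tpr) pairs; labels are 1 for abnormal frames, 0 for normal."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    ranked = sorted(zip(scores, labels), key=lambda s: -s[0])
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last frame of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def eer(points):
    """Equal error rate: ROC point where FPR is closest to the miss rate 1 - TPR."""
    fpr, tpr = min(points, key=lambda p: abs(p[0] - (1 - p[1])))
    return (fpr + (1 - tpr)) / 2

# Hypothetical ground truth and anomaly scores for eight frames.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
curve = roc_points(labels, scores)
print(auc(curve), eer(curve))
```

A higher AUC and a lower EER both indicate better separation of abnormal from normal frames, which is how the comparison on Ped1 and Ped2 should be read.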

Conclusions
This paper uses the entropy value to make an initial judgment of anomalies. The crowd emotion models OCC and PAD describe the crowd's emotional-psychological characteristics and serve as one input of the CNN, while the optical flow and gradient tensor provide the crowd's motion features; together they achieve accurate detection of abnormal behavior. The main advantage of the tensor-based model is its strong scene adaptability: when the situation changes, for example when new moving objects appear, no new descriptors need to be added. The introduction of emotion models enables a semantic interpretation of behaviors and increases the accuracy of the judgment of crowd status.

Acknowledgments
This work was supported by Foundation Research Fund of Engineering University of PAP (WJY201906) and Scientific Research Project of Equipping the Armed Force in 2018 (WJ20182A020020-2).