Classification of human activity detection based on an intelligent regression model in video sequences

Abnormal event detection in public scenes is one of the most critical objectives of security surveillance. This paper presents a scheme for detecting abnormal behaviour in the activities of human groups based on social behaviour analysis. The approach models group activities more efficiently than previous strategies that rely on independent local features. A feature descriptor is proposed that represents movement by encoding the optical flow in a covariance matrix. The multi-RoI (region of interest) covariance matrix is computed over several frames or patches and can represent movement with high accuracy. Normal samples are plentiful in public surveillance videos, whereas abnormal samples are scarce; to address this, the paper uses a hybridised optical flow covariance matrix model. Optical flow (OF) in the temporal domain is measured as a critical feature of video streams, and logistic regression is used to detect abnormal activities in crowded scenes. The behaviour of human crowds is predicted on the benchmark datasets UMN, UCSD and BEHAVE. The experimental results show that the proposed approach can effectively detect abnormal events in surveillance videos.


INTRODUCTION
Human action recognition has drawn increasing attention, with applications in video surveillance and security, video annotation and retrieval, behavioural biometrics, human-computer interaction, and so forth [1]. Vision-based human activity recognition is the process of recognising human activities in video sequences using computer vision techniques. The term activity refers to basic movement patterns that are normally executed by a single individual and typically last for a short time [2]. The objective of action recognition is to analyse ongoing actions in an unknown video automatically [3]. The main obstacles in human action recognition stem from large variations such as high intra-class variance, scaling, occlusion, and clutter. Video-based human activity detection has numerous applications in human-computer interaction, surveillance, video indexing, and retrieval. Activities or movements create patterns that vary in spatio-temporal form across recordings, and these patterns can be used as feature descriptors for the detection of actions [4,5].

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2020 The Authors. IET Image Processing published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology
Earlier work on action recognition relied on video sequences captured by conventional cameras. Spatio-temporal features are widely used for recognising human actions, in contrast to features from conventional images. Depth maps are largely insensitive to changes in lighting conditions and provide 3D data for recognising activities that are hard to characterise with traditional images [6,7]. Space-time based strategies, for example local spatio-temporal features, are popular techniques for video representation and show promising performance in action recognition. Intensity-based video images, however, limit the ability to detect an action [8]. Convolutional Neural Networks (CNNs) are a type of deep model in which trainable filters and local neighbourhood pooling operations are applied alternately to the raw input images, resulting in a hierarchy of increasingly complex features [9]. When trained with suitable regularisation, CNNs can achieve superior performance on visual object recognition tasks. Furthermore, CNNs have been shown to be invariant to certain variations, for example pose, lighting, and surrounding clutter. As a class of deep models for feature construction, CNNs have been applied to 2D images [10,11]. In the last decade, advances in computational capacity and the introduction of effective methods for training deep neural network architectures have led to their wide use in addressing various computer vision challenges [12].
3D CNNs learn convolutional kernels jointly in space and time, as a direct extension of the well-known 2D CNN deep architectures to the 3D spatio-temporal domain [13]. These techniques aim to learn long-range motion through a hierarchy of multiple layers of 3D spatio-temporal convolution kernels, combined by early fusion, late fusion, or slow fusion. The two-stream CNN architecture learns motion patterns using an additional CNN whose input is the optical flow computed from successive frames of the video sequence [14].
To detect abnormalities in crowd behaviour when mostly normal data are available, it is important to make better use of the normal data rather than to rely on recognition alone [15]. Automatically detecting abnormal activities or events in long video sequences is important for intelligent surveillance, behaviour analysis, and security applications. In particular, abnormal behaviour detection in crowded scenes is difficult because of the large number of pedestrians in close proximity, the variability of individual appearance, the frequent partial occlusions they produce, and the irregular motion pattern of the crowd [16,17]. Moreover, crowded conditions carry potentially dangerous events, for example crowd panic, stampedes, and accidents involving a large number of people, which makes automated scene analysis all the more necessary [18]. Among the many video detection tasks, identifying anomalous activities in video streams is of fundamental importance because it relates to several interesting topics in computer vision, for example visual saliency, important behaviour recognition, and interestingness prediction [19].
Unfortunately, anomalous action recognition in video sequences is a difficult task because the definitions of both normality and abnormality are ambiguous. A representative way to address anomaly detection is to extract local features and then treat the problem as an outlier detection task, in which normal activities in video scenes are modelled and abnormal events are identified as those that differ significantly from the model [20]. Detecting abnormal events in public scenes is important for social security. Public scenes are places where common, everyday activities occur, so security surveillance there is closely tied to people's livelihood. Security guards who check surveillance videos are prone to visual and mental fatigue, which makes it hard to react rapidly to unexpected or abnormal events. Because most monitors in public scenes are installed for safety purposes, an automatic alarm system that helps security guards pick out emergency events from a large video database is critical.
The optical flow and its partial derivatives are fused by the covariance matrix descriptor and projected into a lower-dimensional feature space.
In the temporal domain, optical flow (OF) is measured using the Lucas-Kanade method and is treated as a critical feature of video streams. The covariance descriptor, which captures the relationships within the optical flow in the ROI, is classified using logistic regression. Logistic regression is a linear method whose predictions are transformed by the logistic function. Abnormal events are detected using logistic regression with the proposed covariance matrix descriptor. Experiments conducted on standard datasets indicate that the proposed model performs better than the existing methods. This article models the problem of anomalous activity in complicated public scenes and presents techniques for an intelligent security surveillance system. The remainder of the article is organised as follows.
Section 2 reviews the related work. Section 3 explains the proposed method. Section 3.1 describes foreground detection using the Gaussian mixture model (GMM), in which each pixel is modelled as a mixture of Gaussians; the classification is based on the mean and variance of both foreground and background. Section 3.2 presents the Lucas-Kanade approach to optical flow estimation, Section 3.3 presents the motion descriptor that carries the movement details, and Section 3.4 introduces logistic regression analysis for abnormal behaviour detection, with a stochastic gradient descent algorithm for cost optimisation. Section 4 reports the experimental evaluations on the BEHAVE, UMN, and UCSD datasets. Finally, Section 5 concludes with prospects for future work.

RELATED WORKS
Human action recognition has been applied in domains such as human-computer interaction, game control, and smart surveillance. Liu et al. (2017) [21] proposed an enhanced skeleton visualisation strategy for view-invariant human activity recognition.
First, a sequence-based view-invariant transform was applied to eliminate the effect of view variations on the spatio-temporal locations of skeleton joints. Second, the transformed skeletons were visualised as a sequence of colour images. Third, a convolutional neural network model for colour images was adopted to extract robust and discriminative features. The final action class scores were produced by decision-level fusion of the deep features. The hard selection of ten CNNs and the weighted probability fusion was an older strategy with limited flexibility, and the image enhancement needed further improvement. Experiments on standard datasets confirmed the robustness of the scheme to view variations, noisy skeletons, and inter-class and intra-class similarities between skeleton sequences. On the NTU RGB+D dataset, the technique achieved a gain of approximately 10% over state-of-the-art LSTM-based approaches, which demonstrates its effectiveness.
In the study of human behaviour, the aim is to understand the behaviour of a subject over time using motion information. Ijjina et al. (2017) [22] presented a deep learning approach for recognising human actions in RGB-D video based on motion sequence information. The technique does not recognise other human behaviour such as hand movements and group activities. The use of multi-modal information and the noise tolerance of ConvNet features gave the method robustness and adaptability to other recognition tasks. Combining evidence across several classes suggested that the best performance could be achieved by fusing evidence across models that emphasise particular temporal regions. The approach is faster than existing methods owing to the simple computation of the new representation (which could be parallelised) and the correspondingly efficient implementation of ConvNet feature extraction. Its efficiency was verified on the SBU Kinect interaction, MINIA action, NATOPS gesture, and Weizmann datasets.
Skeletons are commonly available as input for human action recognition. Hou et al. (2018) [23] described a method to encode the spatio-temporal information of a skeleton sequence into colour texture images, referred to as skeleton optical spectra, and used Convolutional Neural Networks (ConvNets) to learn discriminative features for action recognition. This spectrum representation made it possible to use a standard ConvNet architecture to learn suitable, powerful features from skeleton sequences without training a huge number of parameters. End-to-end ConvNet-based encoding of the skeleton sequence was not available in this technique. To increase accuracy, late score fusion was applied across the three orthogonal planes, which are complementary to one another. Results were reported on three standard datasets.
Anomalous action recognition is also a challenging research problem in intelligent video surveillance.
Sun et al. (2017) [24] presented Deep One-Class (DOC), an end-to-end model that integrates a Convolutional Neural Network (CNN) with a one-class Support Vector Machine (SVM). Compared with hierarchical models, the proposed model simplifies the overall process while still finding the globally optimal solution. Validation of DOC on a publicly available dataset showed that the model performs well and is effective for anomalous action recognition in surveillance videos. The video anomaly detection system still needed enhancement to achieve better results.
Extracting discriminative and robust features from the video sequence is the first and most fundamental step in human action recognition. Liu et al. (2016) [25] proposed a genetic programming (GP) strategy that evolves a motion feature descriptor from a population of primitive 3D operators. The best solution selected by GP was taken as the optimal action descriptor. The method performs feature learning by letting a computer automatically assemble holistic feature extraction from a pool of primitive operators, which were designed according to the basic principles of feature extraction. Treating feature extraction as an optimisation problem, the GP approach was evaluated on four standard action datasets, namely KTH, HMDB51, UCF YouTube, and Hollywood2. On all four datasets, the experimental results showed that the GP-based feature learning method achieved better recognition performance than state-of-the-art hand-crafted and machine-learned methods.

THE PROPOSED METHOD
This article proposes a new anomaly detection technique for crowded scenes, using a covariance matrix descriptor for local and global abnormal event detection. Figure 1 summarises the proposed approach. First, the foreground objects are extracted using the Gaussian Mixture Model (GMM) algorithm.
The GMM models the distribution of image intensities at each image location, and the foreground pixels in one neighbourhood are grouped into a region of interest (ROI). The optical flow is then measured using the Lucas-Kanade method, which is simple compared with other methods and fast to compute, with accurate time derivatives. Next, a feature descriptor is proposed that represents movement by covariance matrix coding of the optical flow and its partial derivatives over several consecutive frames, or over patches of those frames. The multi-RoI (region of interest) covariance matrix represents the movement with high accuracy. The covariance matrix descriptor fuses the optical flow and its partial derivatives and projects them into a lower-dimensional feature space. Abnormal event detection differs from common binary classification problems in that the training set contains normal samples but few or no abnormal samples. For this reason, the logistic regression method is used in this paper instead of other classification approaches.

Foreground detection by using Gaussian mixture model
Foreground detection extracts the moving objects (the foreground) from the static part of the scene (the background). It is the backbone of multistage computer vision systems and a primary step in many computer vision applications such as video surveillance, so the overall system performance depends on the result of foreground detection. Among the available algorithms, the GMM is one of the best known because of its adaptive nature, good accuracy and low computational cost [26]. In the GMM [27], each pixel is modelled as a mixture of Gaussians and is then classified as foreground or background based on the mean and variance. The probability of observing a specific pixel value $X_t$ is

$$P(X_t) = \sum_{i=1}^{K} \omega_{i,t}\, \eta(X_t, \mu_{i,t}, \sigma_{i,t}^2) \qquad (1)$$

From (1), the probability density function is represented as:

$$\eta(X_t, \mu_{i,t}, \sigma_{i,t}^2) = \frac{1}{\sqrt{2\pi\sigma_{i,t}^2}} \exp\left(-\frac{(X_t - \mu_{i,t})^2}{2\sigma_{i,t}^2}\right) \qquad (2)$$

in which $K$, $\omega_{i,t}$, $\mu_{i,t}$ and $\sigma_{i,t}^2$ are the number of Gaussians, the weight estimate, and the mean and variance of the $i$th Gaussian in the mixture at time 't'. The decision criterion at time 't' for classifying a specific pixel as background or foreground is:

$$|X_t - \mu_{i,t}| \le k\,\sigma_{i,t} \qquad (3)$$

In (3), $k$ is a constant threshold equal to 2.5. If a match is found with one of the K Gaussian components, the pixel is classified as background, and the parameters of the matched component are updated as:

$$\omega_{i,t+1} = (1-\alpha)\,\omega_{i,t} + \alpha$$

$$\mu_{i,t+1} = (1-\rho)\,\mu_{i,t} + \rho\,X_t$$

$$\sigma_{i,t+1}^2 = (1-\rho)\,\sigma_{i,t}^2 + \rho\,(X_t - \mu_{i,t+1})^2$$

where $\alpha$ is the constant learning rate and $\rho$ is the second learning rate. If no match is found with any of the K Gaussians, the pixel is classified as foreground and only the weight is updated:

$$\omega_{i,t+1} = (1-\alpha)\,\omega_{i,t}$$

The Gaussian Mixture Model is applied to extract the foreground based on shape. After extraction, this analysis is used to separate the skeleton shape from persisting objects. The foreground region of interest (ROI) is then passed to the feature extraction process, which is based on optical flow. The optical flow feature set represents the segmented object(s).
The optical flow based feature vectors are calculated along the boundary, and the feature set includes the shape and instantaneous velocity information extracted from the boundaries of the action performers. The extracted optical flow based features are fed to a logistic regression classifier.
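The per-pixel match-and-update rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it handles a single greyscale pixel, takes the second learning rate as the first learning rate times the Gaussian density of the matched component (a common convention that the text does not confirm), and omits weight renormalisation and component replacement. The function name `update_pixel_gmm` and the default values are hypothetical.

```python
import numpy as np

def update_pixel_gmm(x, weights, means, variances, alpha=0.01, k=2.5):
    """Classify intensity x for one pixel and update its K-component mixture.

    Returns (is_foreground, weights, means, variances). Inputs are float arrays of length K.
    """
    weights, means, variances = weights.copy(), means.copy(), variances.copy()
    sigma = np.sqrt(variances)
    matches = np.abs(x - means) <= k * sigma          # match test with threshold k = 2.5
    if matches.any():
        # Pick the closest matching component, measured in standard deviations.
        dist = np.where(matches, np.abs(x - means) / sigma, np.inf)
        i = int(np.argmin(dist))
        # Assumed second learning rate: alpha times the Gaussian density at x.
        density = np.exp(-0.5 * (x - means[i]) ** 2 / variances[i]) / np.sqrt(2 * np.pi * variances[i])
        rho = alpha * density
        weights = (1 - alpha) * weights               # every weight decays
        weights[i] += alpha                           # the matched weight is reinforced
        means[i] = (1 - rho) * means[i] + rho * x
        variances[i] = (1 - rho) * variances[i] + rho * (x - means[i]) ** 2
        return False, weights, means, variances       # matched, so background
    # No component matched: foreground, and only the weights are updated.
    return True, (1 - alpha) * weights, means, variances
```

For example, with two components centred at 100 and 200 (variance 25 each), an intensity of 102 matches the first component and is labelled background, while an intensity of 150 matches neither and is labelled foreground.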

Lucas-Kanade method for optical flow measurement
The Lucas-Kanade (LK) method is a widely used differential method for optical flow estimation in computer vision. It solves the basic optical flow equations for all the pixels in a neighbourhood by the least squares criterion. Because it is a purely local method, it cannot provide flow information in the interior of uniform regions of the image. The flow is assumed to be constant in a local neighbourhood of the pixel under consideration [28]. The optical flow equation can therefore be assumed to hold for every pixel within a window centred at 'P'. The LK method assumes that the displacement of the image contents between two adjacent frames is small and approximately constant within the neighbourhood of the point P [29].
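Let $P_1, P_2, \ldots, P_n$ be the pixels inside the window, and let $I_x(P_i)$, $I_y(P_i)$ and $I_t(P_i)$ be the partial derivatives of the image 'I' with respect to the position (x, y) and time 't', evaluated at the point $P_i$ at the current time. The local flow vector $V = (V_x, V_y)$ must satisfy

$$I_x(P_i)\,V_x + I_y(P_i)\,V_y = -I_t(P_i), \qquad i = 1, \ldots, n$$

In matrix form, these equations can be written as $AV = b$, with

$$A = \begin{bmatrix} I_x(P_1) & I_y(P_1) \\ I_x(P_2) & I_y(P_2) \\ \vdots & \vdots \\ I_x(P_n) & I_y(P_n) \end{bmatrix}, \quad V = \begin{bmatrix} V_x \\ V_y \end{bmatrix}, \quad b = \begin{bmatrix} -I_t(P_1) \\ -I_t(P_2) \\ \vdots \\ -I_t(P_n) \end{bmatrix}$$

The LK solution is obtained by the least squares criterion. The normal equations $A^T A V = A^T b$ give $V = (A^T A)^{-1} A^T b$, where $A^T$ is the transpose of $A$. Written out:

$$\begin{bmatrix} V_x \\ V_y \end{bmatrix} = \begin{bmatrix} \sum_i I_x(P_i)^2 & \sum_i I_x(P_i)\,I_y(P_i) \\ \sum_i I_y(P_i)\,I_x(P_i) & \sum_i I_y(P_i)^2 \end{bmatrix}^{-1} \begin{bmatrix} -\sum_i I_x(P_i)\,I_t(P_i) \\ -\sum_i I_y(P_i)\,I_t(P_i) \end{bmatrix}$$

Here the sums run over $i = 1$ to $n$. The matrix $A^T A$ is the structure tensor of the image at the point 'P'.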
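The least squares solution for a single window can be sketched in a few lines of NumPy. This is an illustrative sketch, assuming the derivative images for the window have already been computed; the function name `lucas_kanade_window` is hypothetical. The matrix A stacks the spatial derivatives of each pixel, b holds the negated temporal derivatives, and the flow is obtained from the 2 × 2 normal equations.

```python
import numpy as np

def lucas_kanade_window(Ix, Iy, It):
    """Estimate the flow (Vx, Vy) for one window from its derivative images.

    Ix, Iy, It are arrays of the same shape holding the partial derivatives
    of the image with respect to x, y and t at every pixel of the window.
    """
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)   # n x 2 matrix of spatial derivatives
    b = -It.ravel()                                  # n-vector of negated temporal derivatives
    ATA = A.T @ A                                    # 2 x 2 structure tensor
    ATb = A.T @ b
    # Solving the normal equations fails if ATA is singular, for example in a
    # uniform region where the flow cannot be determined locally.
    return np.linalg.solve(ATA, ATb)
```

On synthetic derivatives generated from a known constant flow, the function recovers that flow up to floating-point precision, since the brightness constancy equations are then satisfied exactly.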

Weighted window
The plain least squares solution presented above gives the same importance to all n pixels $P_i$ in the window. In practice, it is generally better to give more weight to the pixels that are closer to the central pixel P. For this purpose, the weighted version of the least squares equation is used [30]: $A^T W A V = A^T W b$, which gives $V = (A^T W A)^{-1} A^T W b$.
Here 'W' is an (n × n) diagonal matrix containing the weights $W_{ii} = w_i$ assigned to the equation of pixel $P_i$ when computing V.
The weight $w_i$ is usually set to a Gaussian function of the distance between $P_i$ and P. Lucas and Kanade, among others, applied a weighted least squares (LS) fit of local first-order constraints to a constant model for 'v' in each small spatial neighbourhood $\Omega$ by minimising

$$\sum_{(x,y) \in \Omega} w(x, y)\,\big[\nabla I(x, y, t) \cdot v + I_t(x, y, t)\big]^2 \qquad (17)$$

In (17), w(x, y) is a window function that gives more influence to the constraints at the centre of the neighbourhood.
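The weighted solution can be sketched in the same way as the unweighted one. The sketch below assumes a Gaussian window centred on the patch; the helper names `gaussian_window` and `weighted_lucas_kanade` and the default `sigma` are hypothetical choices for illustration. The diagonal weight matrix W is never formed explicitly; each row of A and each entry of b is simply scaled by its weight.

```python
import numpy as np

def gaussian_window(h, w, sigma=1.5):
    """Gaussian weights that peak at the centre of an h x w window."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def weighted_lucas_kanade(Ix, Iy, It, weights):
    """Solve (A^T W A) V = A^T W b with W = diag(weights)."""
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    w = weights.ravel()
    ATWA = A.T @ (w[:, None] * A)    # scale each row of A by its weight
    ATWb = A.T @ (w * b)
    return np.linalg.solve(ATWA, ATWb)
```

A typical call would be `weighted_lucas_kanade(Ix, Iy, It, gaussian_window(*Ix.shape))`, so that constraints near the centre of the window dominate the fit.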

Motion descriptor
To find anomalous actions in a surveillance video, a suitable descriptor is needed that describes the movement information in a form that can be processed by a machine learning approach. Optical flow is the pattern of apparent motion of objects caused by the relative motion between an observer and a scene. As a basic feature, optical flow is chosen to represent the movement between two image frames. Optical flow methods optimise an objective function that combines the brightness constancy constraint with a spatial term modelling how the flow is expected to vary across the image. A global smoothness constraint can be used to resolve the aperture problem. The optical flow is formulated in Section 3.2. The feature descriptor based on the optical flow covariance matrix [31] fuses the spatio-temporal motion information of several consecutive RoIs (regions of interest). First, the video clip is divided into groups of n frames, and the optical flow is computed for each pair of consecutive frames. The RoIs in every group are numbered from 1 to n. In the k-th group, for example, the pixels of a RoI are sampled at positions (1, 1) to (h, w), where 'h' and 'w' are the height and width of the RoI. The movement information is encoded in a matrix with (2 + l) columns and (h × w) rows, where 'l' is the length of the optical flow feature vector. The feature matrix of one RoI (R) is given in (18):

$$F_R = \begin{bmatrix} 1 & 1 & f_1(1,1) & \cdots & f_l(1,1) \\ \vdots & \vdots & \vdots & & \vdots \\ h & w & f_1(h,w) & \cdots & f_l(h,w) \end{bmatrix} \qquad (18)$$

In a multi-RoI covariance feature, every RoI in a group is labelled with its serial number from 1 to n, and the feature matrix of one group of RoIs is given in Equation (19):

$$F_G = \begin{bmatrix} 1\cdot\mathbf{1} & F_{R_1} \\ 2\cdot\mathbf{1} & F_{R_2} \\ \vdots & \vdots \\ n\cdot\mathbf{1} & F_{R_n} \end{bmatrix} \qquad (19)$$

where $\mathbf{1}$ is a vector whose elements are all 1. The dimension of one group matrix is (n × h × w) × (3 + l).
The first column of the group matrix is the RoI position, and the following columns are the position of the pixel within one RoI. Based on the objective, the

(14 × 14): [y, x, U, V, $U_x$, $U_y$, $V_x$, $V_y$, $U_{xx}$, $U_{yy}$, $V_{xx}$, $V_{yy}$, $U_{xy}$, $V_{xy}$]

FIGURE 2
Calculation of the multi-RoI covariance matrix feature. Arrows represent the optical flow of each RoI. Blue, green and red denote the optical flow features of different RoIs.
The optical flow features used for anomalous event detection are presented in Table 1.
The features selected for abnormal event detection are described in (20), in which U and V denote the horizontal and vertical optical flow and the index i refers to the i-th optical flow feature. 'F', of dimension (h × w) × (3 + l), denotes the feature matrix obtained for a single group of RoIs, as described in Equation (18). From the feature matrix F, the covariance matrix of dimension (3 + l) × (3 + l) is computed as in (20) [32]:

$$C = \frac{1}{M-1} \sum_{i=1}^{M} (Z_i - \mu)(Z_i - \mu)^T \qquad (20)$$

In (20), 'M' denotes the number of sampled pixels, which for a group of RoIs equals n × h × w. 'μ' is the mean of the M feature vectors and '$Z_i$' is the feature vector of the i-th point.
Therefore, the covariance matrix fuses multiple features. The calculation of the proposed feature is shown in Figure 2. For a video stream 'V' with a group 'G' of n frames, the feature matrix 'F' is formed and the multi-frame covariance matrix 'c' is computed. The intra-RoI information is described by the pixel position, the optical flow and its partial derivatives, while the inter-RoI information is marked by the RoI position within the group. Hence, the proposed covariance descriptor fuses the movement properties of the RoIs and captures the spatio-temporal feature. Due to symmetry, based on the covariance matrix 'c' that combines the 'l' features, it has
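The construction of the group feature matrix and its covariance can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' code: it uses the first-order feature set [y, x, U, V, U_x, U_y, V_x, V_y], prepends the RoI index as the first column, and estimates the partial derivatives with `np.gradient`. The function names `roi_feature_matrix` and `group_covariance` are hypothetical. With these choices the descriptor is 9 × 9.

```python
import numpy as np

def roi_feature_matrix(U, V):
    """Per-pixel features [y, x, U, V, Ux, Uy, Vx, Vy] for one h x w RoI."""
    h, w = U.shape
    ys, xs = np.mgrid[0:h, 0:w]
    Uy, Ux = np.gradient(U)          # derivatives along axis 0 (y) and axis 1 (x)
    Vy, Vx = np.gradient(V)
    cols = [ys, xs, U, V, Ux, Uy, Vx, Vy]
    return np.stack([c.ravel() for c in cols], axis=1).astype(float)

def group_covariance(flows):
    """Covariance descriptor for a group of RoIs given as a list of (U, V) pairs."""
    blocks = []
    for idx, (U, V) in enumerate(flows, start=1):
        F = roi_feature_matrix(U, V)
        roi_col = np.full((F.shape[0], 1), float(idx))   # RoI serial number column
        blocks.append(np.hstack([roi_col, F]))
    G = np.vstack(blocks)                                # (n*h*w) x (number of columns)
    Z = G - G.mean(axis=0)                               # subtract the mean feature vector
    return Z.T @ Z / (G.shape[0] - 1)
```

Because the descriptor size depends only on the number of feature columns, RoIs of different sizes and groups of different lengths all map to matrices of the same shape, which is what makes the descriptor convenient as classifier input.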

Logistic regression analysis for abnormal behaviour detection
This section introduces logistic regression analysis for the detection of abnormal behaviour. Logistic regression may be binomial or multinomial. Binary (binomial) logistic regression covers the case in which the observed outcome has only two possible values (e.g. 'alive' vs. 'dead', 'success' vs. 'failure', or 'yes' vs. 'no').
Multinomial (multivariate) logistic regression covers cases in which the outcome has three or more possible values (e.g. 'improved' vs. 'no change' vs. 'poor') [33]. Binary and multinomial logistic regression are commonly applied to classification problems, since the probability of occurrence of an event cannot be computed directly.
In practice, logistic regression (LR) is generally used as a classifier, specifically for probabilistic binary or multi-class classification [34]. The category selected is the one with the highest probability, which is generated by a logistic function. The flow chart of the entire procedure is shown in Figure 3.
LR provides the user with explicit probabilities for classification in addition to the class information. LR [35] is a statistical approach for analysing a dataset in which the dependent variable is dichotomous (binary).
LR is applied to find the relationship between a single dependent variable and one or more independent variables.
Each independent variable is multiplied by its weight and the products are summed. The sigmoid function is then applied to obtain a result between 0 and 1. Values greater than 0.5 are labelled 1, and values less than 0.5 are labelled 0.
It is essential to find the optimal weights, or regression coefficients, and optimisation techniques are used for this purpose.
Logistic regression is defined in Equation (21), where

$A_i \in R^d$ are the d-dimensional independent variables
$B_i \in R$ are the dependent variables. Here, the values of A and B lie between −1 and 1.
The probability that 'B' equals 1 is given in Equation (22):

$$P(B = 1 \mid A, \theta) = \frac{1}{1 + \exp(-\theta^T A)} \qquad (22)$$

The probability that 'B' equals −1 is

$$P(B = -1 \mid A, \theta) = 1 - P(B = 1 \mid A, \theta) = \frac{1}{1 + \exp(\theta^T A)}$$

where $\theta \in R^d$ are the weights. The appropriate weight vector 'θ' is found by minimising the negative log-likelihood error function (35), in which 'λ' is the regularisation term that penalises large weight parameters.
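The class probability and the regularised error function can be sketched as below. The exact form of the regulariser is not recoverable from the text, so this sketch assumes a standard L2 penalty of (λ/2)‖θ‖² added to the negative log-likelihood for labels in {−1, +1}; the function names `prob_positive` and `regularised_nll` and the default `lam` are hypothetical.

```python
import numpy as np

def prob_positive(theta, A):
    """P(B = 1 | A, theta) for each row of A, via the logistic function."""
    return 1.0 / (1.0 + np.exp(-(A @ theta)))

def regularised_nll(theta, A, B, lam=0.1):
    """Negative log-likelihood for labels B in {-1, +1} plus an assumed L2 penalty."""
    margins = B * (A @ theta)
    # log(1 + exp(-m)) evaluated stably as logaddexp(0, -m)
    data_term = np.sum(np.logaddexp(0.0, -margins))
    return data_term + 0.5 * lam * (theta @ theta)
```

With all weights at zero, every sample has probability 0.5 and the error equals n·log 2, which is a convenient sanity check before training.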

Stochastic gradient descent to minimise an error function
Two optimisation methods are generally used to obtain the best regression weights in logistic regression: gradient descent and stochastic gradient descent (SGD). Both algorithms iteratively update a set of parameters to minimise an error function. The gradient descent algorithm uses all the samples in the training set in each iteration to perform a single parameter update.
The SGD algorithm uses only a single training sample from the training set in each iteration to update the parameters.
The gradient descent algorithm is therefore not suitable for big data analytics, and the SGD algorithm is used to update the weights in LR. SGD can be regarded as an online learning algorithm, since it updates the classifier as new data arrive.
The standard gradient descent algorithm with the objective 'F(θ)' updates the parameters 'θ' as

$$\theta = \theta - \alpha\, \nabla_\theta E[F(\theta)] \qquad (25)$$

where α is the learning rate. In (25), the expectation 'E[F(θ)]' is approximated by evaluating the cost and gradient over the full training set. SGD drops the expectation from the update and computes the gradient of the parameters using only a single or a few training samples.
The new update is given by

$$\theta = \theta - \alpha\, \nabla_\theta F(\theta; A^{(i)}, B^{(i)})$$

with a pair $(A^{(i)}, B^{(i)})$ from the training set.
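A single-sample SGD loop for logistic regression can be sketched as follows. The sketch assumes labels in {−1, +1}, an L2 penalty applied at every step, and a fixed learning rate; none of the hyperparameter values come from the paper, and the function name `sgd_logistic` is hypothetical.

```python
import numpy as np

def sgd_logistic(A, B, alpha=0.1, lam=0.01, epochs=50, seed=0):
    """Fit logistic regression weights with one-sample SGD updates.

    A is an n x d feature matrix, B holds labels in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):                 # visit samples in random order
            margin = B[i] * (A[i] @ theta)
            # Gradient of log(1 + exp(-margin)) + (lam / 2) * ||theta||^2
            grad = -B[i] * A[i] / (1.0 + np.exp(margin)) + lam * theta
            theta -= alpha * grad                    # single-sample parameter update
    return theta
```

A sample would then be labelled abnormal or normal according to the sign of `A @ theta`, which is equivalent to thresholding the logistic probability at 0.5.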

RESULTS OF EVENT ANALYSIS
This section evaluates the proposed approach to anomalous event detection experimentally on the standard datasets BEHAVE [36], UMN [37], and UCSD [38]. The multi-RoI covariance feature is built from the RoIs of four consecutive frames. For global abnormal event detection on the UMN dataset, the entire frame is chosen as the RoI. For local abnormal event detection on the UCSD dataset, every 16 × 12 block is chosen as a RoI.

UMN dataset
This dataset was gathered at UMN. It contains lawn, indoor and plaza scenes, and panic situations are treated as the abnormal events. There are 11 video sequences in the dataset. In every scene, the normal and abnormal samples are separated. Frames in which people are walking are used as training samples and as normal test samples. Frames in which people are running are panic scenes and are treated as abnormal samples.
The ROC (receiver operating characteristic) curves of the lawn, indoor and plaza scenes in the UMN dataset are presented in Figures 4, 5, and 6, respectively. Table 2 reports the detection performance.
Table 3 compares the performance with current methods. Trained on the normal frames only, the proposed approach is shown to distinguish global abnormal events in the UMN dataset. Furthermore, Figures 4, 5, and 6 show that the F1 (4 × 4) feature descriptor comes out with the results of a low
Figure 7 shows the results on the UMN dataset. Panels (a), (b) and (c) are normal training samples from the lawn, indoor and plaza scenes, in which people are walking around. Panels (d), (e) and (f) are abnormal test samples, in which the persons are in motion.
Generally, a classification-based detection method can achieve higher accuracy when the covariance matrix fuses more features. However, adding more partial derivatives can also weaken discrimination. Appropriate feature components are therefore essential for the detection of abnormal events.

UCSD dataset
The UCSD dataset is used to detect local abnormal events, just as the UMN dataset was used to detect global abnormal events with the proposed approach. Normal events, which contain only pedestrians, are used for training. Abnormal events, in contrast, correspond to unusual motion patterns of an individual or to the movement of objects such as a motorcycle or a van on the path. Ped2 contains 16 training clips and 12 testing clips. Each clip has 150 to 180 frames with a size of 360 × 240 pixels. Every clip also comes with a manually generated pixel-level binary mask that identifies the regions containing anomalies [46]. To determine the cell size, 20 frames per clip were selected randomly from 10 clips. The appropriate cell size was found to be 30 × 17 pixels.
(i) Frame-level: Under this criterion, a frame is considered anomalous if at least one pixel in the frame is detected as anomalous. The frame-level detections are compared with the frame-level ground truth of each frame, and the numbers of true positives and false positives are counted from the detection results. However, this frame-level measure cannot ensure that the detected anomalous pixels coincide with the actual anomalous location, because a false pixel detection elsewhere in the frame can still produce a true positive frame.
(ii) Pixel-level: Under this criterion, the detections are compared with the pixel-level ground truth of each frame. If 40% or more of the anomalous ground truth pixels are detected, the frame is counted as a correctly detected anomaly. A normal frame, in turn, is counted as a false positive if any normal pixel is detected as anomalous.
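The two criteria can be expressed compactly for a single frame, given a predicted anomaly mask and a ground truth mask. This is an illustrative sketch; the function names `frame_level_detected` and `pixel_level_outcome` and the outcome labels are hypothetical, while the 40% overlap threshold follows the text.

```python
import numpy as np

def frame_level_detected(pred_mask):
    """Frame-level rule: the frame is flagged if any pixel is predicted anomalous."""
    return bool(pred_mask.any())

def pixel_level_outcome(pred_mask, gt_mask, min_overlap=0.4):
    """Pixel-level rule for one frame, using boolean masks of equal shape."""
    gt_pixels = gt_mask.sum()
    if gt_pixels > 0:
        # Anomalous frame: at least 40% of the ground truth pixels must be detected.
        covered = np.logical_and(pred_mask, gt_mask).sum() / gt_pixels
        return "true_positive" if covered >= min_overlap else "miss"
    # Normal frame: any detected pixel makes it a false positive.
    return "false_positive" if pred_mask.any() else "true_negative"
```

A frame whose only detection lies outside the annotated region is a hit under the frame-level rule but a miss under the pixel-level rule, which is why the pixel-level numbers are lower for the same detector.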
The pixel-level criterion is much stricter than the frame-level criterion and concentrates more on the correct localisation of an anomalous event. Both criteria are used to evaluate the experimental performance. The frame-level approach identifies which frames contain abnormal activities, and the detections are compared against the frame-level ground-truth annotations.
In Figure 8, panels (a) and (b) show cycling and cycling of two persons, respectively. EER, RD, and AUC are used for performance evaluation. EER (equal error rate) is the fraction of frames misclassified at the operating point where the false positive rate equals the miss rate under the frame-level criterion. RD (rate of detection) is the detection rate at the equal error point. AUC is the area under the ROC curve. The proposed approach neither extracts objects precisely nor obtains the exact outlines of the moving objects. Every image is divided into local patches of size 16 × 12. For local abnormal event detection, the descriptor based on feature F2 (8 × 8) is extracted from the training samples. In the testing dataset, the feature descriptor of the image at the corresponding position is classified. The patches are shifted with a stride of two pixels along both the height and width dimensions, so each 2 × 2 block is classified several times by logistic regression. Figures 8 and 9 show the performance of local abnormal event detection.
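As an illustration of the three measures described above, the following Python sketch computes AUC, EER, and RD from per-frame anomaly scores. It is a minimal sketch under stated assumptions, not the authors' evaluation code: the function names, the convention that higher scores mean "more anomalous", and the choice of the ROC point closest to the equal-error condition are all assumptions made here. Both classes must be present in `labels`.

```python
import numpy as np

def roc_points(scores, labels):
    """Sweep a threshold over the scores and return (fpr, tpr) arrays."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)   # True = abnormal frame
    # Thresholds from "flag nothing" (inf) down to the lowest score.
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    pos, neg = scores[labels], scores[~labels]
    tpr = np.array([(pos >= t).mean() for t in thresholds])
    fpr = np.array([(neg >= t).mean() for t in thresholds])
    return fpr, tpr

def auc_eer_rd(scores, labels):
    """Return (AUC, EER, RD) for the given scores and ground-truth labels."""
    fpr, tpr = roc_points(scores, labels)
    # Area under the ROC curve by the trapezoidal rule.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    # Equal error point: false positive rate closest to the miss rate.
    miss = 1.0 - tpr
    i = int(np.argmin(np.abs(fpr - miss)))
    eer = float((fpr[i] + miss[i]) / 2.0)
    rd = float(tpr[i])                        # detection rate at that point
    return auc, eer, rd
```

With a finite set of scores the false positive rate and miss rate may never be exactly equal, so the sketch takes the nearest ROC point; interpolating between neighbouring points is an equally reasonable alternative.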
The frame-level approach concentrates on which frames contain an abnormal event, whereas the pixel-level approach concentrates on the location of the abnormal pixels. The results show that a sparse reconstruction method can detect the frames in which an abnormal event occurs, but it does not identify the actual abnormal pixels given in the ground truth. Compared with the sparse reconstruction method, the proposed method achieves higher localisation accuracy.
Table 4 compares the results with those of existing approaches under various evaluation criteria. The proposed anomaly detection approach performs better than the existing algorithms.
Further examples of detected normal/abnormal behaviour in BEHAVE videos are shown in Figure 10. Figure 10(a-f) shows people walking behind the van.

CONCLUSION
In this paper, a logistic regression based approach is presented for detecting abnormal behaviours in crowded scenes. A multi-RoI covariance (MCOV) descriptor is used to combine the optical flow based features of multiple regions of interest. Optical flow (OF) is considered a critical feature of video streams. After training on normal samples, logistic regression transforms the predictions through the logistic function to find abnormal events. To demonstrate the competitiveness of the proposed method, experiments were conducted on the benchmark datasets BEHAVE, UMN, and UCSD. Many existing algorithms address anomaly detection only in highly structured scenes, owing to the scarcity and nearly infinite variety of abnormal actions in real life. However, harnessing the vast quantity of footage captured by the omnipresent CCTV cameras in public places around the world could provide a source of benchmark datasets for use in different contexts of interest. Such datasets would allow researchers to assess how well an abnormality detection algorithm performs in two significant tasks. The first is abnormality detection, for which the dataset should provide the ground truth for each frame. The other is anomaly localisation: in any anomaly detection system, it is vital not only to identify the presence of an abnormality in the scene but also to find where it is taking place. The performance of these approaches also needs to be tested in unstructured situations. More research attention should be devoted to messy real-world environments, which contain many moving objects and activities, and to the development of frameworks that specifically and effectively address the scalability of video analysis. This remains an open challenge.
Building on the current research, future work includes action recognition in multimedia event detection problems. The current approach could also be extended to a multi-camera network system to improve the detection of abnormal behaviour under various conditions.