Intelligent crowd sensing pickpocketing group identification using remote sensing data for secure smart cities

Abstract: As a public infrastructure service, remote sensing data provided by smart cities will go deep into the safety field and realize the comprehensive improvement of urban management and services. However, it is challenging to detect criminal individuals with abnormal features from massive sensing data and to identify groups composed of criminal individuals with similar behavioral characteristics. To address this issue, we study two research aspects: pickpocketing individual detection and pickpocketing group identification. First, we propose an IForest-FD pickpocketing individual detection algorithm. The IForest algorithm filters the abnormal individuals of each feature extracted from ticketing and geographic information data. From the filtered results, the FD algorithm, which is composed of factorization machines (FM) and a deep neural network (DNN), learns the combination relationship between low-order and high-order features to improve the accuracy of identifying pickpockets. Second, we propose a community relationship strength (CRS)-Louvain pickpocketing group identification algorithm. Based on crowdsensing, we measure the similarity of temporal, spatial, social and identity features among pickpocketing individuals. We then use the weighted combination similarity as an edge weight to construct the pickpocketing association graph. Furthermore, the CRS-Louvain algorithm improves the modularity of the Louvain algorithm to overcome the limitation that small-scale communities cannot be identified. The experimental results indicate that the IForest-FD algorithm achieves better Precision, Recall and F1score than similar algorithms. In addition, the normalized mutual information results of the group division obtained by the CRS-Louvain pickpocketing group identification algorithm are better than those of other representative methods.


Introduction
Smart cities can make full use of remote sensing data to analyze and integrate important information from the urban core system, so as to respond intelligently to public safety and transportation services. However, many people pay little attention to their property when taking public transport in smart cities, which provides opportunities for pickpockets [1]. Although considerable manpower and material resources have been allocated, many pickpockets are still not caught, which puts people's property at risk. Therefore, it is very important to intelligently identify pickpockets and their groups in smart cities [2]. Smart cities use the Internet of Things (IoT) to break down the data islands between devices and organizations, and combine technologies such as wireless sensor networks and cloud computing. These technologies can provide various data for pickpocket detection [3,4], which is of great significance for understanding the identity characteristics and travel modes of passengers. Considering the large differences between pickpockets and normal passengers in travel time, visited stations and stay time, it is possible to find potential pickpockets and groups by exploring the ticketing data provided by the IoT [5,6]. However, it is difficult to accurately identify pickpocketing groups within massive data. Crowdsensing can address this problem by taking the real-time data collected by IoT sensors and processing it intelligently with machine learning methods running in the cloud to complete large-scale pickpocketing group identification [7,8].
How can suspicious individuals be detected in a large amount of data to avoid false-positives? There are thousands of passengers traveling every day, and pickpockets are only a tiny part of them. To detect suspicious individuals in such large-scale data, we need to filter the passengers who have completely normal travel modes and are unlikely to be pickpockets [9]. Therefore, we use the collected ticketing data and geographic information data to extract the temporal, spatial and social features of passengers. Through the combined analysis of various features, the normal individuals are filtered out before pickpocketing individual detection, which greatly improves the performance of our algorithm.
How to accurately identify pickpocketing groups among suspicious individuals is a key research topic of this paper [10]. Members of groups often show strong correlations in temporal, spatial, social and identity features. The correlations exist not only in the space-time domain but also in identity: fellow villagers and fellow criminals who are active in the same area are more likely to form groups and commit crimes together. However, existing studies rarely pay attention to multi-factor correlation combinations for pickpocketing group identification [11]. In this paper, we use the weighted combination similarity of multi-factor features to measure the correlation between members. Furthermore, pickpocketing groups are mainly small groups composed of two to three people, while traditional algorithms tend to merge small groups into larger ones.
The contributions of this paper are summarized as follows:
(1) We propose an IForest-FD pickpocketing individual detection algorithm. The IForest algorithm is used to filter the abnormal individuals using temporal, spatial and social features. From the filtered results, the FD algorithm, which is composed of FM [12] and DNN [13], learns the combination relationship between low-order and high-order features to improve the accuracy of identifying pickpockets.
(2) We propose a CRS-Louvain pickpocketing group identification algorithm. Based on the idea of crowdsensing, we measure the similarity of temporal, spatial, social and identity features among the pickpocketing individuals. We use the weighted combination similarity as an edge weight to construct the pickpocketing association graph. Furthermore, the CRS-Louvain algorithm improves the modularity of the Louvain algorithm [14] to overcome the limitation that small-scale communities cannot be identified.

Related works
Pickpocketing individuals and groups can be identified from massive sensing data. The key is to identify the differences in behavioral characteristics between pickpocketing individuals and normal passengers, and the similar behavioral characteristics among members of a group. At present, relevant research mainly covers two aspects: pickpocketing individual detection and pickpocketing group identification.

Pickpocketing individual detection
The purpose of pickpocketing individual detection is to use crime-related data and intelligent analysis technology to find individuals who are significantly different from most individuals, so as to assist law enforcement agencies in detecting criminals. Ramachandran et al. [15] proposed an intelligent automatic system to detect human behavior in public places, such as fighting, pickpocketing and threatening. The behavior of humans was monitored using a convolutional neural network (CNN), and multi-classifier support vector machines (MSVMs) were applied to the various scored features to predict activities. Tsiktsiris et al. [16] used deep learning and unsupervised approaches to detect abnormal events in autonomous vehicles, including pickpocketing, bag snatching and fighting. Selvi et al. [17] introduced an enhanced convolutional neural network (ECNN)-based suspicious activity detection system to detect shooting and stealing. Pascale et al. [18] tested the performance of multiple federated devices encompassing drones, closed-circuit television, smart phone cameras and smart glasses to detect real-case scenarios of potentially malicious activities such as mosh pits and pickpocketing. All of the above are based on the detection of abnormal individual behaviors in surveillance video data. For the detection of abnormal individuals in non-surveillance video data, the traditional methods are mainly based on statistical analysis, which is strongly influenced by expert experience. To find pickpockets intelligently, methods based on machine learning have been widely used. Chen et al. [19] used the logistic regression (LR) algorithm to analyze people with criminal records and found the individuals with a tendency to become repeat offenders. Du et al. [20] used the support vector machine (SVM) algorithm to detect potential pickpockets in public transit systems. Ogunleye et al. [21] applied the XGBoost algorithm to detect abnormal individuals by using various features, and the algorithm has high performance. Gu et al. [22] analyzed the collected data of passengers entering the subway station and found the differences between pickpockets and normal passengers in terms of ride time, ride frequency, boarding and alighting stations, etc. Chun et al. [23] applied a DNN to predict the influence of gender, age, race and previous record on whether individuals will commit crimes in the next few years and the severity of their crimes, and obtained good prediction results. Lu et al. [24] proposed a scheme combining principal component analysis (PCA) and gradient boosting decision tree (GBDT), which identified crimes by analyzing the time of theft and the selected targets of theft cases in Shenzhen over the past seven years.
However, pickpocketing individual detection algorithms based on machine learning are difficult to apply directly in practice: because positive and negative samples are severely imbalanced, the predictions are biased toward the much larger negative class. To overcome this problem, Xue et al. [25] pre-classified the data according to a feature of common abnormal individuals, "in-out in the same station". Pradhan et al. [26] undersampled the large negative class and oversampled the small positive class in their experiment. Du et al. [20] proposed an unsupervised local outlier factor (LOF) algorithm for anomaly detection to reduce data imbalance, and then used decision tree (DT), SVM and other algorithms for classification. This paper reduces data imbalance by using the IForest algorithm [27], chosen for its linear time complexity and its ensemble learning design, to filter the abnormal individuals of each feature extracted from ticketing data. From the filtered results, the FD algorithm learns the combination relationship between low-order and high-order features to improve the accuracy of identifying pickpockets.

Pickpocketing group identification
The key point of pickpocketing group identification is how to explore the similarity between individuals in the group. The relationship between passengers can be quantified as the similarity in temporal, spatial, social and identity features. Previous studies in the field of crime have clearly pointed out that pickpocketing groups are mostly linked by geography or similar criminal experience. Zhang et al. [28] measured the temporal similarity of individuals by drawing individual time distribution histograms and calculating the earth mover's distance (EMD) between histograms. Liu et al. [29] developed a novel measure that simultaneously considers multiple dimensions of travel behavior to quantify intrapersonal variability. Zhao et al. [30] calculated the cumulative cosine similarity distance between sites based on probability. The study of Gravel et al. [31] shows that there is also a convergence effect among individuals in criminal groups, who tend to recruit people sharing the same living space and social environment.
To divide groups by the results of similarity measurement between individuals, an increasing number of scholars use machine learning. Wang et al. [32] proposed a density-based spatial clustering of applications with noise (DBSCAN) approach to recognize irregular travel groups based on cascade clustering, which divided the set of points with sufficiently high density into the same community. Troncoso et al. [33] considered the association between two criminal individuals and adopted the LiRAM algorithm, based on the constrained shortest path length, to classify criminal groups in social networks. Lim et al. [34] applied deep reinforcement learning (DRL) to process metadata such as wiretapping times, arrest warrants and judicial decisions, and constructed the FDR-CNA algorithm. By taking the output of DRL as the edge weight, the algorithm built a network to discover the relationship between criminals, and effectively identified the internal membership relationships of criminal groups. Ma et al. [35] realized the community discovery of multilayer networks by fusing nonnegative matrix factorization and topological structural information. Tayebi et al. [36] applied the Girvan-Newman algorithm based on a greedy strategy to detect offender groups as denser subgraphs of a co-offending network. Zhao et al. [37] applied the Louvain algorithm to find potential pickpocketing groups on buses, and achieved faster convergence and a better division effect when there are more than five individuals. However, these algorithms are not ideal for identifying small-scale groups. Considering that small-scale groups account for a very high proportion of pickpocketing groups, this paper proposes the CRS-Louvain algorithm to improve the modularity of the Louvain algorithm, which overcomes the limitation that small-scale groups cannot be identified.

Our approach
In this section, we introduce and describe the two algorithms proposed in this paper: IForest-FD pickpocketing individual detection algorithm and CRS-Louvain pickpocketing group identification algorithm. We mainly use the collected ticketing data, geographic information data and personnel identity data to analyze the ticket sales time, travel time, departure and arrival stations. Additionally, we extract the temporal, spatial, social features and passengers' identity features in the travel mode of passengers, and construct the model of pickpocketing individual detection and pickpocketing group identification.
This paper includes two research aspects: pickpocketing individual detection and pickpocketing group identification. First, we propose an IForest-FD pickpocketing individual detection algorithm. The IForest algorithm is used to filter the abnormal individuals by temporal, spatial and social features. From the filtered results and the identity feature, the FD algorithm learns the combination relationship between low-order and high-order features to improve the accuracy of identifying pickpockets. Second, we propose a CRS-Louvain pickpocketing group identification algorithm. Based on the idea of crowdsensing, we measure the similarity of temporal, spatial, social and identity features among the pickpocketing individuals, and then use the weighted combination similarity as an edge weight to construct the pickpocketing association graph. Furthermore, the CRS-Louvain algorithm improves the modularity of the Louvain algorithm to overcome the limitation that small-scale communities cannot be identified. The overall flow of the model is shown in Figure 1.

IForest-FD pickpocketing individual detection algorithm
The IForest-FD pickpocketing individual detection algorithm consists of two parts. First, the IForest algorithm is used to filter the abnormal individuals by analyzing the temporal, spatial and social features. Second, the FD algorithm is used for individual pickpocketing detection using the filtered individuals. The FD algorithm learns the combination relationship between low-order and high-order features to improve the accuracy of identifying pickpockets.

IForest algorithm
The IForest algorithm [38] recursively divides the data space until each subspace contains only one sample or the tree reaches the upper limit height. Then, each tree is traversed from root to leaf, the average depth is calculated and the abnormal scores are estimated. The abnormal score of sample $x$ is calculated by the following formula:

$$s(x) = 2^{-\frac{E(h(x))}{c(\varphi)}}$$

where $h(x)$ is the path length of $x$ in one tree, $E(h(x))$ represents the average of $h(x)$ over all trees, and $c(\varphi)$ is the average path length when the number of samples $\varphi$ is given. $c(\varphi)$ is computed as follows:

$$c(\varphi) = 2H(\varphi - 1) - \frac{2(\varphi - 1)}{\varphi}$$

where $H(k) = \ln(k) + \varepsilon$ and $\varepsilon$ is Euler's constant. From the definition of the abnormal score, it can be seen that if the value of $s(x)$ is close to 1, the possibility of abnormal data is higher; if the value of $s(x)$ is close to 0, the possibility of abnormal data is lower.
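To make the scoring rule concrete, the following is a minimal sketch of the abnormal score computation, assuming the standard isolation forest definitions $s(x) = 2^{-E(h(x))/c(\varphi)}$ and $c(\varphi) = 2H(\varphi - 1) - 2(\varphi - 1)/\varphi$. The function names and the special cases for $\varphi \le 2$ are illustrative assumptions, not taken from this paper; the per-tree path lengths are assumed to be given.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, the epsilon in H(k)


def harmonic_approx(k):
    """H(k) = ln(k) + Euler's constant."""
    return math.log(k) + EULER_GAMMA


def average_path_length(phi):
    """c(phi): average path length for a sample of size phi.

    The values for phi <= 2 follow the usual isolation forest
    convention and are an assumption of this sketch.
    """
    if phi <= 1:
        return 0.0
    if phi == 2:
        return 1.0
    return 2.0 * harmonic_approx(phi - 1) - 2.0 * (phi - 1) / phi


def anomaly_score(path_lengths, phi):
    """s(x) = 2 ** (-E(h(x)) / c(phi)), with E(h(x)) the mean depth over trees."""
    mean_depth = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_depth / average_path_length(phi))


# Hypothetical path lengths of one passenger in three trees built on 256 samples.
shallow = anomaly_score([2, 3, 2], phi=256)    # isolated quickly: score nearer 1
deep = anomaly_score([12, 14, 13], phi=256)    # isolated slowly: score nearer 0
```

A sample whose average depth equals $c(\varphi)$ receives a score of exactly 0.5, so scores well above 0.5 indicate samples that are isolated unusually early.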

FD algorithm
The FD algorithm is composed of two parts: Factorization Machines (FM) and DNN, as shown in Figure 2.
The dense embedding layer compresses the input data into low-dimensional dense vectors to solve the problems of data sparsity and excessive dimension. The FM layer obtains the combination relations of first-order and second-order features, and the DNN layer determines the combination relations of high-order features. The FM layer and DNN layer both use the vector features compressed by the dense embedding layer as input and are trained simultaneously. The final output result is:

$$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$$

where $\hat{y} \in (0, 1)$ is the output of the whole algorithm, which is transformed into a binary identifier of the pickpocketing individual (0 or 1). $y_{FM}$ is the output of the FM layer, and $y_{DNN}$ is the output of the DNN layer.

(1) FM algorithm. The FM algorithm solves the problem of feature combination of sparse data [39]. The algorithm can not only obtain the first-order features, but also capture the second-order features better through the inner product of vector features. The formula for this algorithm is shown below:

$$y_{FM} = \langle w, x \rangle + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle v_i, v_j \rangle x_i x_j$$

where $w \in \mathbb{R}^d$ and $v_i \in \mathbb{R}^k$. $d$ is the feature number and $k$ is the vector dimension. $\langle w, x \rangle$ reflects the importance of first-order features, and $\sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle v_i, v_j \rangle x_i x_j$ represents the second-order feature interactions.

(2) DNN algorithm. The DNN algorithm is used to learn the interactions of high-order features. Before entering the hidden layer, the algorithm uses the dense embedding layer to compress the input vector into low-dimensional dense vectors for training [40]. The output of the dense embedding layer is:

$$a^{(0)} = [e_1, e_2, \ldots, e_m]$$

where $e_i$ is the embedding of the $i$-th feature field and $m$ is the number of fields. Then, the value $a^{(0)}$ is input into the DNN algorithm, and its forward process can be expressed as:

$$a^{(h+1)} = \sigma\left(W^{(h)} a^{(h)} + b^{(h)}\right)$$

where $h$ is the layer index and $\sigma$ is the activation function. $a^{(h)}$ is the output of layer $h$. $W^{(h)}$ and $b^{(h)}$ are the weight and bias of the algorithm, respectively. After passing through $H$ hidden layers, the output of the DNN algorithm is as follows:

$$y_{DNN} = \sigma\left(W^{(H+1)} a^{(H)} + b^{(H+1)}\right)$$
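The FM layer can be illustrated with a short sketch. It assumes the usual factorization machine form $y_{FM} = \langle w, x \rangle + \sum_{i<j} \langle v_i, v_j \rangle x_i x_j$ and assumes that the FM and DNN outputs are summed and passed through a sigmoid, as in DeepFM-style models that the FD design resembles. The function names, the toy parameters and the 0.5 decision threshold are hypothetical; the second function uses a well-known algebraic identity that gives the same second-order term in time linear in $d$.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fm_output(x, w, V):
    """First-order term plus the direct double sum over feature pairs i < j."""
    d = len(x)
    second_order = sum(
        dot(V[i], V[j]) * x[i] * x[j]
        for i in range(d)
        for j in range(i + 1, d)
    )
    return dot(w, x) + second_order


def fm_output_fast(x, w, V):
    """Same value via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    k = len(V[0])
    second_order = 0.0
    for f in range(k):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        total = sum(terms)
        second_order += 0.5 * (total * total - sum(t * t for t in terms))
    return dot(w, x) + second_order


def fd_predict(y_fm, y_dnn, threshold=0.5):
    """Fuse both outputs with a sigmoid; the threshold is an assumption."""
    y_hat = 1.0 / (1.0 + math.exp(-(y_fm + y_dnn)))
    return y_hat, int(y_hat >= threshold)


# Toy example with d = 3 features and k = 2 latent dimensions.
x = [1.0, 0.0, 2.0]
w = [0.1, 0.2, 0.3]
V = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]
y_fm = fm_output(x, w, V)
```

Because the second feature is zero, only the pair of the first and third features contributes to the interaction term, which is why sparse inputs remain cheap to evaluate.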

CRS-Louvain pickpocketing group identification algorithm
Through the set of detected pickpocketing individuals, this paper proposes a CRS-Louvain pickpocketing group identification algorithm. Based on the idea of crowdsensing, we measure the similarity of temporal, spatial, social and identity features among the pickpocketing individuals. To represent the network relationships among individuals, we use the weighted combination similarity as the edge weight and the pickpocketing individuals as nodes to construct the pickpocketing association graph. Further, the CRS-Louvain algorithm improves the modularity of the Louvain algorithm to overcome the limitation that small-scale communities cannot be found.
(1) Similarity of the temporal feature tSim(a, b). Using the EMD distance of two frequency distribution histograms [41], we measure the difference between the two distributions and describe the temporal similarity tSim(a, b) between pickpockets a and b.
(2) Similarity of the spatial feature sSim(a, b). We use the weighted cosine similarity [42] to measure the spatial similarity sSim(a, b) of pickpockets a and b. Here $S_a$ and $S_b$ are the sites visited by pickpocketing individuals a and b, and $w_i^a$ and $w_i^b$ are the frequencies with which the site $n_i$ is visited by individuals a and b. If a and b have no commonly visited site, the value of sSim(a, b) is 0; if the visited sites of a and b are the same, the value of sSim(a, b) is 1.
(3) Similarity of the social feature cSim(a, b). We construct the sequences $m_a$ and $m_b$ corresponding to the social features of pickpockets a and b. The negative exponent of the normalized Euclidean distance cDis(a, b) between the two feature sequences is used to measure the similarity of individuals a and b in social features:

$$cSim(a, b) = e^{-cDis(a, b)}$$

where cDis(a, b) is normalized using N(a, b), the number of pickpocketing individual pairs.
(4) Similarity of the identity feature iSim(a, b). As pickpocketing groups are mostly organized as fellow villagers or fellow criminals, this paper uses the Jaccard similarity coefficient [43] to measure the similarity between the household registration and criminal record. The calculation formula of iSim(a, b) is as follows:

$$iSim(a, b) = \frac{|A \cap B|}{|A \cup B|}$$

where $A = \{a_1, a_2\}$, and $a_1$ and $a_2$ are the household registration and criminal record of pickpocketing individual a; $B = \{b_1, b_2\}$ is defined in the same way for individual b.
(5) Weighted combination similarity WSim(a, b). The four similarities are combined with weights derived from their mutual correlations:

$$\omega_n = \frac{\sum_{r} P_{rn}}{\sum_{n} \sum_{r} P_{rn}}, \qquad WSim(a, b) = \sum_{n} \omega_n \, Sim_n(a, b)$$

where $Sim_n$ ranges over tSim, sSim, cSim and iSim, and $P_{rn}$ represents the value of the Pearson correlation coefficient [44] between the similarities $r$ and $n$. $\sum_{r} P_{rn}$ represents the correlation between similarity $n$ and the other indicators. $\sum_{n} \sum_{r} P_{rn}$ is the correlation between any two similarities. In this paper, the association between pickpockets is regarded as an undirected weighted association graph G = (V, E), where V is the node set of the undirected graph, which represents the set of pickpocketing individuals. E is the set of edges, each of which is the association of two pickpockets a and b. The weighted combination similarity WSim(a, b) is the edge weight, which combines the temporal, spatial, social and identity features.
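The similarity measures above can be sketched in a few lines. The following is an illustrative sketch under stated assumptions rather than the exact formulas of this paper: the spatial similarity is taken to be a cosine similarity over visit-frequency vectors with only common sites in the numerator, the identity similarity is a plain Jaccard coefficient over (attribute, value) pairs, and each weight is the sum of Pearson correlations with the other similarities divided by the total over all similarities. All function names and sample values are hypothetical.

```python
import math


def weighted_cosine(freq_a, freq_b):
    """freq_a, freq_b map a site to its visit frequency for one individual."""
    common = set(freq_a) & set(freq_b)
    if not common:
        return 0.0  # no commonly visited site
    num = sum(freq_a[s] * freq_b[s] for s in common)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    return num / (norm_a * norm_b)


def jaccard(attrs_a, attrs_b):
    """Jaccard coefficient of two sets of (attribute, value) pairs."""
    a, b = set(attrs_a), set(attrs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_weights(columns):
    """columns maps a similarity name to its values over the same pairs.

    Assumption: the sum for similarity n runs over the other similarities r != n.
    """
    names = list(columns)
    totals = {
        n: sum(pearson(columns[r], columns[n]) for r in names if r != n)
        for n in names
    }
    z = sum(totals.values())
    if z == 0:
        return {n: 1.0 / len(names) for n in names}  # fallback: equal weights
    return {n: totals[n] / z for n in names}


def weighted_similarity(sims, weights):
    """WSim(a, b) as the weighted sum of the individual similarities."""
    return sum(weights[n] * sims[n] for n in weights)


# Hypothetical similarity values of three measures over four individual pairs.
weights = correlation_weights({
    "tSim": [0.1, 0.4, 0.8, 0.9],
    "sSim": [0.2, 0.5, 0.7, 0.95],
    "cSim": [0.3, 0.2, 0.9, 0.6],
})
```

By construction the weights sum to 1, so WSim(a, b) stays in the same range as the individual similarities. Negative correlations would need extra care in this scheme; the sketch does not handle them specially.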

CRS-Louvain algorithm
This paper proposes a CRS-Louvain algorithm for identifying pickpocketing groups. By improving the modularity used in the traditional Louvain algorithm [45], the CRS-Louvain algorithm effectively overcomes the limitation that small-scale communities cannot be identified. The formula of the traditional modularity $Q$ is as follows:

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(C_i, C_j)$$

where $A_{ij}$ represents the edge weight between nodes $i$ and $j$, and $m = \frac{1}{2} \sum_{i,j} A_{ij}$. $k_i$ represents the degree of node $i$, $k_i = \sum_{j} A_{ij}$. $C_i$ represents the community of node $i$. If $C_i$ is equal to $C_j$, then $\delta(C_i, C_j)$ is 1; otherwise, it is 0. This formula can be simplified as:

$$Q = \sum_{x} \left[ \frac{w_x}{m} - \left( \frac{d_x}{2m} \right)^2 \right]$$

where $w_x$ represents the total edge weight inside community $x$, and $d_x$ is the sum of the edge weights of all nodes of community $x$ (the sum of their degrees $k_i$). This paper introduces a new weighted modularity $Q_\lambda$, which considers the CRS through the community relationship strength $\lambda_x$ of each community $x$:

$$\lambda_x = \frac{2 l_x}{n_x (n_x - 1)} \tag{3.19}$$

where $n_x$ is the total number of nodes in community $x$ and $l_x$ is the total number of edges in community $x$. $\lambda_x$ represents the ratio of the actual number of edges in community $x$ to the number of edges in the ideal community where all nodes are fully connected.
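As an illustration of the two quantities defined above, the sketch below evaluates the traditional modularity in its per-community form, $Q = \sum_x [w_x/m - (d_x/2m)^2]$, and the community relationship strength $\lambda_x = 2 l_x / (n_x (n_x - 1))$. The graph representation, the function names and the value returned for single-node communities are assumptions of this sketch; it does not attempt to reproduce the weighted modularity $Q_\lambda$ itself.

```python
def build_graph(edges):
    """Symmetric adjacency dict from (u, v, weight) triples without self-loops."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def modularity(adj, community):
    """Q = sum_x [ w_x / m - (d_x / (2m))^2 ] for a node -> community mapping."""
    m = sum(w for nbrs in adj.values() for w in nbrs.values()) / 2.0
    inner, degree = {}, {}
    for u, nbrs in adj.items():
        cu = community[u]
        for v, w in nbrs.items():
            degree[cu] = degree.get(cu, 0.0) + w
            if community[v] == cu:
                # every internal edge is visited from both endpoints
                inner[cu] = inner.get(cu, 0.0) + w / 2.0
    return sum(
        inner.get(c, 0.0) / m - (degree[c] / (2.0 * m)) ** 2 for c in degree
    )


def relationship_strength(adj, community):
    """lambda_x = 2 * l_x / (n_x * (n_x - 1)) for every community x."""
    nodes, links = {}, {}
    for u, nbrs in adj.items():
        cu = community[u]
        nodes[cu] = nodes.get(cu, 0) + 1
        for v in nbrs:
            if community[v] == cu:
                links[cu] = links.get(cu, 0.0) + 0.5  # edge seen from both ends
    return {
        # single-node communities have no possible edge; 0.0 is an assumption
        c: (2.0 * links.get(c, 0.0) / (n * (n - 1)) if n > 1 else 0.0)
        for c, n in nodes.items()
    }


# Two triangles joined by one edge, all weights equal to 1.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
adj = build_graph(edges)
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q_split = modularity(adj, split)
strength = relationship_strength(adj, split)
```

In this toy graph both triangles are fully connected, so their relationship strength is 1, while merging all six nodes into one community would lower it to 7/15. This is the kind of signal that favors small, dense groups.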
The CRS-Louvain algorithm for pickpocketing group identification is as follows:
Step 1: Consider each node in the pickpocketing association graph as a community. We traverse all nodes, find for each node the neighbor that maximizes the CRS weighted modularity, and merge the node with that neighbor. This is repeated until the communities of all nodes no longer change, which forms one layer of community division.
Step 2: We compress all nodes in each community into one node. The weights of the edges inside a community are transformed into the self-loop weight of the new node, and the weights of the edges between communities are transformed into the weights of the edges between the new nodes. We iterate Step 1 to obtain a higher level of community division.
Step 3: We iterate Step 2 until the algorithm is stable.
The algorithm outputs the communities $C_i$, $i = 1, 2, \ldots, n$. Each community only contains individuals with similar temporal, spatial, social and identity features. Thus the pickpocketing groups are identified.
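The compression in Step 2 can be sketched as follows. This is a minimal sketch of the standard Louvain aggregation step, assuming a symmetric adjacency dictionary as the graph representation; the function name and the convention that the self-loop entry stores twice the internal edge weight (so that each row still sums to the weighted degree of the community) are choices of this sketch, not details given in the paper.

```python
def aggregate(adj, community):
    """Compress every community into one super-node.

    adj:        dict node -> dict neighbor -> weight (symmetric)
    community:  dict node -> community label
    Returns a new adjacency dict keyed by community labels.
    """
    new_adj = {}
    for u, nbrs in adj.items():
        cu = community[u]
        row = new_adj.setdefault(cu, {})
        for v, w in nbrs.items():
            cv = community[v]
            # internal edges accumulate on the self-loop entry (cu == cv),
            # edges between communities accumulate on the new edge cu - cv
            row[cv] = row.get(cv, 0.0) + w
    return new_adj


# Two triangles joined by the edge 2 - 3, all weights equal to 1.
adj = {
    0: {1: 1.0, 2: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {0: 1.0, 1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0, 5: 1.0},
    4: {3: 1.0, 5: 1.0},
    5: {3: 1.0, 4: 1.0},
}
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
compressed = aggregate(adj, community)
```

The total weight of the graph is preserved by the compression, so the modularity of a division of the compressed graph equals the modularity of the corresponding division of the original graph, which is what allows Step 1 to be repeated on the smaller graph.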

Experiments
In this section, we first preprocess the data, then extract the data features, and conduct experiments according to the proposed model and algorithms. The experiments include two parts: pickpocketing individual detection and pickpocketing group identification. First, we compare the IForest-FD pickpocketing individual detection algorithm proposed in this paper with LR, SVM, XGBoost, FD, IForest + LR, IForest + SVM and IForest + XGBoost. Second, we compare the CRS-Louvain pickpocketing group identification algorithm proposed in this paper with the DBSCAN algorithm, the Girvan-Newman algorithm and the Louvain algorithm.

Data preprocessing
(1) Ticketing data. The data used in this paper are simulated data, constructed from real data in J province according to the ratio of the combination of various features. We simulate the ticketing data of passengers in J province from June 1, 2022 to July 5, 2022. The data set has approximately 6.5 million records, which mainly include name, ID number, ride time, number, etc.; the ID number is used as the identification ID.
(2) Geographic information data. This data mainly includes the network and running mileage between stations in J Province. Furthermore, the data of the network includes the routes, stations and location coordinates.
(3) Personnel identity data. This data mainly includes gender, age, place of residence, education, current address and criminal record. Approximately 1% of the data are incomplete and have missing values, and we zeroed the missing values.

Extraction of data features
(1) Temporal feature. The temporal feature mainly consists of four aspects: travel time, riding frequency, night travel ratio and travel regularity. We use entropy to calculate the travel regularity, and the formula is as follows:

$$H_v(T) = -\sum_{t_i \in T} P_v(T = t_i) \log P_v(T = t_i)$$

where $t_i$ is the time interval in hours, and $P_v(T = t_i)$ represents the probability of passenger $v$ riding within a given time interval $t_i \in T$. Passengers who have a higher night travel ratio and a lower travel regularity are more likely to be pickpockets.
(2) Spatial feature. The spatial feature represents the regularity of stations in passengers' travel behavior. The formula of the regularity of stations is as follows:

$$H_u(L) = -\sum_{l \in l_u} P_u(l) \log P_u(l)$$

where $l_u$ is the set of all stations visited by passenger $u$ within a given time interval, and $P_u(l)$ is the probability of visiting a particular station $l$. To reduce costs and avoid being caught, the proportion of key stations and short trips in a pickpocket's record is high.
(3) Social feature. The social feature represents the average stay time of a passenger at one station. As shown in Figure 3, normal passengers stay for a long time to finish some activity (such as school, travel or work) after arriving at a place, whereas pickpockets urgently need to make their next trip. Therefore, the shorter the stay time is, the higher the abnormality of the passenger.
(4) Identity feature. The identity feature mainly consists of five aspects: gender, age, place of residence, education and criminal record. Pickpockets and normal passengers have significant differences in the above features, which can improve the accuracy of pickpocketing individual detection.
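The entropy-based regularity used for the temporal and spatial features can be sketched as below. It assumes that the probabilities are estimated as relative frequencies of the observed ride hours (or visited stations) and that the natural logarithm is used; the function name and the sample records are hypothetical.

```python
import math
from collections import Counter


def regularity_entropy(observations):
    """Shannon entropy of the empirical distribution of the observations.

    observations: ride hours (temporal feature) or station names
    (spatial feature) of one passenger. Higher entropy means the
    passenger's behavior is spread over more values, i.e., less regular.
    """
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log(c / total) for c in counts.values()
    )


# Hypothetical passengers: a commuter who always rides at 8:00 and
# a passenger whose rides are spread evenly over four different hours.
commuter = regularity_entropy([8, 8, 8, 8])
scattered = regularity_entropy([8, 13, 22, 23])

# The same function applies to visited stations.
stations = regularity_entropy(["S1", "S1", "S2", "S2"])
```

The commuter obtains an entropy of 0, the minimum, while the scattered passenger reaches ln 4, the maximum for four observations.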

Experimental results
After preprocessing the original data, we are left with 4,729,325 records of ticketing data that involve 748,681 passengers. The passengers include 748,513 normal passengers and 168 pickpockets.

Experiment of IForest-FD pickpocketing individual detection
First, we extract temporal, spatial, social and identity features of each passenger, and filter the passengers by the IForest algorithm. Second, we use the LR algorithm [19], SVM algorithm [20], XGBoost algorithm [21] and FD algorithm to detect the pickpockets among the passengers filtered by the IForest algorithm. We split the dataset using 5-fold cross-validation, repeat the procedure 10 times and report the average values as the results.
(1) Evaluation indicators. To evaluate the performances of different methods, we introduce three evaluation indicators: Precision, Recall and F1score [46]. The calculation formulas are as follows:

$$Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}, \qquad F1score = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

where $TP$ and $FP$ are the numbers of samples correctly and incorrectly detected as pickpocketing individuals, respectively. $TN$ and $FN$ are the numbers of samples correctly and incorrectly detected as normal passengers, respectively.
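For reference, the three indicators can be computed from the confusion counts as in the following sketch, which assumes the standard definitions Precision = TP / (TP + FP), Recall = TP / (TP + FN) and F1score as their harmonic mean. The function name and the counts in the example are hypothetical and are not results of this paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (Precision, Recall, F1score) from confusion counts.

    tp: pickpockets correctly detected
    fp: normal passengers wrongly flagged as pickpockets
    fn: pickpockets wrongly classified as normal passengers
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for illustration only.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

These indicators are preferable to plain accuracy on this data set: with 168 pickpockets among 748,681 passengers, a classifier that labels every passenger as normal would reach an accuracy of about 99.98% while detecting no pickpocket at all.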
(2) Result analysis. The experimental results of pickpocketing individual detection are shown in Table 1. We can see that the IForest algorithm significantly improves the accuracy of the pickpocketing individual detection model. We compared the experimental results of LR, SVM, XGBoost and FD with and without the IForest algorithm. The Precision of IForest + LR, IForest + SVM, IForest + XGBoost and IForest + FD improved by 3.32, 7.41, 5.49 and 6.18%, respectively. The Recall improved by 1.82, 2.68, 6.13 and 5.13%, and the F1score improved by 3.62, 4.72, 5.79 and 5.40%, respectively. Furthermore, we compare the experimental results of the four algorithms IForest + LR, IForest + SVM, IForest + XGBoost and IForest + FD. The Precision of IForest + FD is 13.92, 7.82 and 6.42% higher than that of IForest + LR, IForest + SVM and IForest + XGBoost, respectively. The Recall is 12.69, 8.94 and 4.39% higher, and the F1score is 11.19, 9.94 and 2.43% higher, respectively. It can be seen that the FD algorithm better captures the interactive information between temporal, spatial and social features, as well as the internal information of the multiple categorical features in the identity features, which reduces the probability of missing a pickpocketing individual. We also note that the IForest + FD algorithm detected 166 pickpocketing individuals.

Experiment of CRS-Louvain pickpocketing group identification
(1) Weighted combination similarity. By analyzing the data of 168 pickpocketing individuals, we calculate the similarity of the temporal feature tSim(a, b), spatial feature sSim(a, b), social feature cSim(a, b) and identity feature iSim(a, b) between any two pickpocketing individuals. The values of the four similarities and their correlations are shown in Figure 4.
According to the theory of the Pearson correlation coefficient, the weight values corresponding to tSim(a, b), sSim(a, b), cSim(a, b) and iSim(a, b) are calculated as 0.27, 0.31, 0.26 and 0.16, respectively. Furthermore, we delete the pickpocketing pairs with five similarity indices that are less than 0.1, because the probability of such pickpocketing pairs being in the same group is low. Then, we identify 210 pickpocketing pairs and 126 pickpocketing individuals.
(2) Pickpocketing group identification. First, we use the weighted combination similarity WSim(a, b) as the edge weight to construct a pickpocketing association graph, as shown in Figure 5. To compare the CRS-Louvain pickpocketing group identification algorithm with other algorithms, we introduce normalized mutual information (NMI) as an evaluation indicator [47]. NMI(X, Y) normalizes the mutual information I(X, Y) between X and Y by their entropies H(X) and H(Y), where X and Y represent the community labels of nodes in x and y, respectively. The range of NMI(X, Y) is 0 to 1. The closer the value of NMI(X, Y) is to 1, the higher the accuracy of community division. We compare the CRS-Louvain pickpocketing group identification algorithm with the DBSCAN algorithm, the Girvan-Newman algorithm and the Louvain algorithm, and the NMI results are shown in Figure 7.
In Figure 7, the abscissa is the number of members in each pickpocketing group. The ordinate is the NMI value, which represents the difference in distribution between the actual pickpocketing groups and the identified groups. The greater the value of NMI, the more accurately pickpocketing groups are detected. When identifying large pickpocketing groups, all four algorithms show good performance. However, the Girvan-Newman algorithm and the Louvain algorithm do not have strong abilities to identify small groups. When identifying the pickpocketing groups composed of 2 and 3 pickpockets, the NMI values of the CRS-Louvain algorithm are 0.67 and 0.71, respectively, which are better than those of the DBSCAN algorithm, the Girvan-Newman algorithm and the Louvain algorithm.
Figure 7. NMI values of the four algorithms for group division.
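The NMI indicator used in this comparison can be sketched as follows. The sketch assumes one common normalization, $NMI(X, Y) = 2 I(X, Y) / (H(X) + H(Y))$, which may differ from the variant used in [47]; the function names, the handling of the degenerate case with zero entropies and the example labels are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(xs, ys):
    """I(X, Y) from the joint distribution of two label assignments."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def nmi(xs, ys):
    """NMI = 2 * I(X, Y) / (H(X) + H(Y)); 1.0 if both entropies are zero."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 1.0  # both divisions put all nodes in one community
    return 2.0 * mutual_information(xs, ys) / (hx + hy)


# Hypothetical ground-truth groups and two candidate divisions of four nodes.
truth = ["g1", "g1", "g2", "g2"]
same_up_to_renaming = nmi(truth, [7, 7, 3, 3])
unrelated = nmi(truth, [0, 1, 0, 1])
```

The score depends only on how nodes are grouped, not on the label names, so a division that matches the true groups scores 1 even if its communities are numbered differently.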

Conclusions
In recent years, the application of informatization, digitalization and intelligence in smart cities has been continuously improved. IoT technology has become one of the core technologies for collecting, storing, managing, analyzing and sharing massive data, and has been widely used in various information systems. The further integration of the IoT with big data helps law enforcement agencies to accurately judge passenger information despite its wide coverage and massive historical data. Through intelligent analysis and automatic perception, law enforcement agencies can accurately identify pickpocketing groups in their areas, which brings new opportunities for the establishment of an information-led decision-making system.
This paper includes two research aspects: pickpocketing individual detection and pickpocketing group identification. First, we propose an IForest-FD pickpocketing individual detection algorithm. The IForest algorithm is used to filter the abnormal individuals of each feature extracted from ticketing data and geographic information data. Using the filtered results, the FD algorithm learns the combination relationship between low-order and high-order features to improve the accuracy of identifying pickpockets. Second, we propose a CRS-Louvain pickpocketing group identification algorithm. Based on the idea of crowdsensing, we measure the similarity of temporal, spatial, social and identity features among the pickpocketing individuals, and then use the weighted combination similarity as the edge weight to construct the pickpocketing association graph. Furthermore, the CRS-Louvain algorithm improves the modularity of the Louvain algorithm to overcome the limitation that small-scale communities cannot be identified.
The method proposed in this paper is applicable to the identification of active pickpocketing individuals and their groups. However, there are many kinds of pickpockets operating in the city, and the realization of unified modeling and identification of various criminal behaviors will be a key direction in our future research. After law enforcement officers identify the groups through the designed algorithms, their feedback will help us to identify the groups more accurately. Reinforcement learning can then continuously improve the accuracy of the algorithms through "trial and error". Therefore, how to better combine supervised learning with reinforcement learning to imitate expert decision-making in the real world and realize the automatic adjustment of the model will be another future research direction.

Use of AI tools declaration
The authors have not used Artificial Intelligence (AI) tools in the creation of this article.