Graph Representation Learning for Street-Level Crime Prediction

Abstract: In contemporary research, the street network emerges as a prominent and recurring theme in crime prediction studies. Meanwhile, graph representation learning shows considerable success, which motivates us to apply the methodology to crime prediction research. In this article, a graph representation learning approach is utilized to derive topological structure embeddings within the street network. Subsequently, a heterogeneous information network that incorporates both the street network and urban facilities is constructed, and embeddings through link prediction tasks are obtained. Finally, the two types of high-order embeddings, along with other spatio-temporal features, are fed into a deep neural network for street-level crime prediction. The proposed framework is tested using data from Beijing, and the outcomes demonstrate that both types of embeddings have a positive impact on crime prediction, with the second embedding showing a more significant contribution. Comparative experiments indicate that the proposed deep neural network offers superior efficiency in crime prediction.


Introduction
The spatio-temporal prediction of crime can detect or identify the future crime risk within a specific spatial unit. The prediction results can be used to guide law enforcement agencies to deploy prevention, control, and intervention strategies in a scientific way to combat crime [1]. So far, many spatio-temporal prediction models have been proposed, and they primarily employ statistical and machine learning algorithms to integrate the data of historical crime incidents, social and physical environments, and ambient populations to derive the estimated crime risk in a targeted area. Though the proposed models achieved satisfactory prediction results, the input variables (crime incidents and other related features) are mostly aggregated or organized by specifically designated areal units (e.g., uniform grids, administrative districts, and census tracts). Compared to fixed areal units, however, the street network is a unique spatial topology that encodes the connections between street segments and determines the physical links along which crime risk spreads [2]. Subsequently, the street network-based predictive crime mapping tools NTKDE and GLDNet were proposed, and they outperformed grid-based alternatives in property and assault crime forecasting [3,4]. Although these models showed better prediction performance, they did not include the potential spatial variables at multiple scales, which were thought to have a significant impact on the occurrence of crime [5], such as the raw attributes of street segments and the influence of adjacent urban functional facilities.
Representation learning methods have gained significant achievements in recent years, particularly in Natural Language Processing, Computer Vision, and Speech Recognition. The advancements also offer new perspectives for crime prediction research. The primary objective of representation learning is to uncover latent higher-order dependencies from big data. The resulting feature vectors, also known as embeddings, are then used as inputs for machine learning algorithms in downstream tasks, such as classification and prediction. Considering the disadvantages of street network-based crime prediction research, we propose an approach which incorporates several techniques to learn temporal and spatial representation vectors potentially correlated with crime. The effectiveness of these vectors is then evaluated. Specifically, our work comprised five sequential steps. In the first step, the issue of sparse crime data was addressed. A bi-exponential decay smoothing technique was employed to smooth the data, establishing temporal dependencies in historical crime incidents. In the second step, a vector representing the intrinsic attributes of the street segment was generated. In the third step, the embedding of the street network was generated by employing Deepwalk learning. In the fourth step, a heterogeneous information network (HIN) was constructed by integrating the street network with urban functional facilities. After that, the street embedding of the HIN was obtained through link prediction tasks. Lastly, all vectors were fed into our deep neural network to realize crime prediction.
ISPRS Int. J. Geo-Inf. 2024, 13, 229
The contributions of this article are as follows:
It provides a flexible framework for crime prediction. Compared to the current state-of-the-art GLDNet, our framework treats each street segment as an independent unit. The general spatio-temporal feature representation vectors for each street segment are captured and subsequently aggregated to enable crime prediction via deep neural networks. The framework has the advantage of streamlining the integration of newly discovered features.
It utilizes graph representation learning. To capture the topological structure embedding of the street network (Street Network to Vector, SN2V), the street network was converted into a dual graph, and the embedding learning process was conducted using the Deepwalk algorithm. Furthermore, to obtain the functional embedding (Heterogeneous Information Network to Vector, HIN2V) of street segments within an urban built environment, a heterogeneous information network was established.

Related Work

Crime Prediction
Prior research has indicated that the distribution of crime is not uniform [6] but displays a phenomenon of repeat victimization in space and time [7]. This finding inspired subsequent research on crime concentration analysis and crime prediction. The initial models proposed for crime prediction treated historical crime records as the sole input variables and borrowed principles from other fields to predict future crime. Typical models include Prospective Mapping [8], the Contagion Model [9], the Kernel Density Estimation Model [10,11], the Self-Exciting Point Process Model [12], the Neural Network Model [13], etc. With the abundance of data, crime prediction models have started to incorporate environmental variables. These variables help capture the impacts of factors, such as weather and land use, on crime occurrences. Typical models include the Risk Terrain Model [14], Bayesian Spatio-temporal Model [15], Discrete Choice Framework [16], Tensor Decomposition [17], Graph Learning [18], and Flexible Search Window [19]. Moreover, some models include the movement pattern of the population. By analyzing patterns of human movement, such as commuting patterns or the influx of people during specific events, crime hotspots could be better understood and predicted [20][21][22]. However, all of these models share one practice: they aggregate crime incidents into a specific temporal and areal unit and then predict the future crime risk within the unit. Most of the spatial units used are areal units, such as arbitrarily defined grids, census tracts, and communities, but they also cause some problems for crime analysis and prediction. First, grid-based units have non-overlapping boundaries, which might interrupt the continuity of certain characteristics in adjacent units [23]. Second, these methods are applied under the assumption that the crime risk is uniformly distributed within a grid cell, which may be particularly problematic in cases where street segments that are co-located within a grid experience very different risks [3]. Third, the prediction accuracy varies across different spatial scales, which is known as the modifiable areal unit problem (MAUP) [24,25].
Facing the problem of fixed grids, researchers have been seeking to predict crime on street networks. There is growing evidence that crime variability can be attributed to street segments [26], because the shape and structure of the street network are thought to play a crucial role in influencing the distribution of crime [27,28]. More importantly, predicting crime on the street is practical and effective for police patrolling in urban cities [29,30]. The typical network-based predictive crime mapping, named NTKDE, was proposed by Rosser et al. They translated the kernel density estimation (KDE) used in Prospective Crime Mapping (ProMap) into a network space and achieved better prediction performance in property crime forecasting than grid-based alternatives. Subsequently, a deep learning model for network-based predictive mapping, GLDNet, was developed. GLDNet leveraged a graph-based representation of network-structured data and introduced a localized diffusion network to model spatial propagation. The experiment showed that GLDNet outperformed NTKDE [4]. However, there are still some limitations in network-based studies. Firstly, the above-mentioned studies did not fully consider the correlation between the attributes of each street segment and the occurrence of crime [31][32][33][34]. Secondly, previous research input the entire street network into deep learning models to learn spatial dependencies and predict the probability of crime occurrence across all street segments, which requires a huge amount of computational resources. Moreover, the role of urban functional facilities in impacting crime occurrence was proven by previous studies, for example, crime generators and crime attractors [35,36]. However, the above-mentioned studies primarily focused on learning the propagation patterns of crime risk within the network but ignored the correlation between the streets and these functional facilities in the built environment.

Street Network Modeling and Graph Representation Learning
In recent years, the availability of open-source projects like OpenStreetMap (OSM) has facilitated easy access to street network data. Various methods have been employed to model street networks, with the graph structure being the most prevalent due to its ability to represent both the geometric and topological structures of real-world street networks [37]. Two types of graph models are commonly utilized, namely primal graphs and dual graphs. In primal graphs, the nodes denote intersections, and edges denote street segments. In dual graphs, the nodes denote street segments, and edges denote intersections connecting street segments [38].
Representation learning based on street networks is a burgeoning topic in the Intelligent Transportation Systems (ITS) area. Applications of this approach span a range of tasks, including street classification, traffic flow prediction, and travel time estimation. Wang proposed a model called Road Network to Vector (RN2Vec) to jointly learn embeddings of intersections and road segments and evaluated the learned embeddings for node/edge classification and travel time estimation [39]. Zhang proposed a dual graph-based approach that encompasses both a simple graph and a hypergraph, and it is capable of capturing the low-order, high-order, and long-range relationships among roads simultaneously [40]. Zhang proposed a spatial-temporal generative adversarial network (ST-GAN) to disclose the relationship between underlying patterns and citywide traffic dynamics, which improved the prediction accuracy and characterized the structural properties of the traffic evolution process [41]. Gharaee utilized a line graph transformation to learn highly representative road embeddings and proposed a Graph Attention Isomorphism Network to achieve road type classification [42].
Given the remarkable achievements of representation learning in ITS, it is worthwhile to further study its applicability in crime prediction. Analogous to the evaluation of traffic flow and travel time, crime incidents are also constrained by the spatio-temporal elements present in the urban built environment. This realization motivates us to employ representation learning techniques to capture higher-order spatial dependencies within street networks and subsequently examine their impact on crime prediction.

Problem Statement and Method
The objective of this study is to forecast the likelihood of crime occurrence across all street segments in the following week. This prediction is based on the analysis of historical crime risk patterns over time in conjunction with the attributes of street segments, spatial dependencies within the street network, and the influence of various functional facilities in the built environment.
The training data comprises instances (x, y), where x represents a four-field representation originating from various datasets related to a street segment, and y indicates whether a crime occurs on that street segment. If y = 1, it signifies that a crime occurs, and if y = 0, it signifies the absence of a crime. The x variable encompasses various spatio-temporal fields, including time-varying risk, street attributes, street network embedding, and heterogeneous information network embedding. x can be denoted as [x_tvr, x_sa2v, x_sn2v, x_hin2v]. Therefore, the objective of this prediction task is to construct a model, y = CP_model(x), that estimates the probability of crime occurrence on a specific street in a specific time window. Essentially, this task can be framed as a binary classification problem, as demonstrated in Figure 1.
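As a minimal sketch of how one training instance (x, y) could be assembled, the four fields can be concatenated in a fixed order. The helper name `build_instance`, the field keys, the toy values, and the 16-dimensional size used for the HIN embedding are assumptions made for illustration; only the 12-dimensional attribute vector and the 128-dimensional street network embedding are sizes stated in the article.

```python
from typing import Dict, List, Sequence

# Fixed field order so that every instance has the same layout.
FIELD_ORDER = ("tvr", "sa2v", "sn2v", "hin2v")


def build_instance(fields: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate the four fields into one flat feature vector x."""
    missing = [name for name in FIELD_ORDER if name not in fields]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    x: List[float] = []
    for name in FIELD_ORDER:
        x.extend(float(v) for v in fields[name])
    return x


example = {
    "tvr": [0.8, 0.5, 0.3],  # toy three-step risk sequence (hypothetical)
    "sa2v": [0.1] * 12,      # 12-dimensional street attribute vector
    "sn2v": [0.0] * 128,     # 128-dimensional street network embedding
    "hin2v": [0.0] * 16,     # HIN embedding; 16 is a placeholder size
}
x = build_instance(example)
y = 1  # label: a crime occurred on this segment in the target window
```

Keeping each field as a separate named block before concatenation is what makes it straightforward to add or drop a field in ablation experiments.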
Initially, historical crime data were mapped onto street segments using the shortest distance method, and sequential risk data were generated for each street segment through data augmentation techniques. The risk data served as features along the temporal dimension, denoted as time-varying risk (TVR). Then, the fundamental attributes of the street segments were accessed from OpenStreetMap (OSM), and block characteristics were extracted from the Beijing Laboratory. These attributes were normalized and transformed into feature vectors to enhance their compatibility with deep learning models, referred to as Street Attributes to Vector (SA2V and SA2V_B). Next, the street network from OSM was converted into a graph structure, and the Deepwalk algorithm was applied to derive vectors encapsulating the network's structural characteristics. The vectors were named Street Network to Vector (SN2V). Subsequently, a heterogeneous information network was constructed, incorporating the street network and functional facility entities obtained from Baidu. By performing link prediction tasks, advanced semantic features were derived and named Heterogeneous Information Network to Vector (HIN2V). Given the advantages of the DeepFM model, such as its end-to-end processing capability and its ability to capture both low-order and high-order feature interactions [43], it was adopted as the binary classifier to fulfill the prediction task.

Time-Varying Risk (TVR)
Historical crime records have to be linked to specific street segments, a process that involves mapping each crime to its corresponding street segment. In this study, crime incidents were mapped to the nearest street segment using the Euclidean distance between their spatial coordinates. However, an issue that might be encountered is the highly sparse distribution of crime incidents on some street segments. Previous studies have demonstrated that crime risk follows an exponentially decaying trend over time [4,19]. To reflect this phenomenon, a bi-exponential decay model was utilized to construct an augmented data sequence that represents the time-varying risk. The principle of this method is illustrated below [44]:

$$S(t) = \frac{N}{p + r - q}\left[(p - q)\,e^{-(p + r)t} + r\,e^{-qt}\right]$$

where S(t) denotes the risk at time t, t denotes time (hour), N denotes the number of crime incidents, p + r denotes the communicative memory decay rate, q denotes the cultural memory decay rate, r denotes the rate of information flows from communicative memory to cultural memory, and p, q, and r can be estimated by data fitting. This manipulation serves two purposes: firstly, it maintains the consistency of the processed data with the temporal trends observed in the historical data; secondly, it reduces the interference caused by a large number of zero values in subsequent learning tasks. The resulting output captures the temporal changes in crime risk associated with each street segment, and it is then used as an input for the subsequent prediction models, as shown in Figure 2.
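The smoothing step can be sketched in a few lines, assuming the two-compartment form S(t) = N/(p + r - q) [(p - q) e^{-(p + r)t} + r e^{-qt}] implied by the parameter definitions above. The function names, the parameter values, and the choice to sum one unit-sized decay curve per past incident are illustrative assumptions, not the fitted values or the exact augmentation procedure of the study.

```python
import math
from typing import List, Sequence


def biexp_decay(t: float, n: float, p: float, q: float, r: float) -> float:
    """Bi-exponential decay S(t) for n incidents observed at t = 0."""
    denom = p + r - q
    if abs(denom) < 1e-12:
        raise ValueError("p + r must differ from q")
    fast = (p - q) * math.exp(-(p + r) * t)  # communicative-memory term
    slow = r * math.exp(-q * t)              # cultural-memory term
    return n / denom * (fast + slow)


def time_varying_risk(
    event_hours: Sequence[float], horizon: int, p: float, q: float, r: float
) -> List[float]:
    """Risk at hours 0..horizon-1 as the sum of decay curves of past incidents."""
    risk = []
    for hour in range(horizon):
        total = 0.0
        for t0 in event_hours:
            if hour >= t0:  # an incident only contributes after it happens
                total += biexp_decay(hour - t0, 1.0, p, q, r)
        risk.append(total)
    return risk


# One incident at hour 2; illustrative (not fitted) parameters.
risk = time_varying_risk([2.0], horizon=5, p=0.3, q=0.01, r=0.05)
```

A segment with a single incident thus yields a dense, slowly fading sequence instead of one nonzero entry surrounded by zeros, which is the stated purpose of the augmentation.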

Street Attributes to Vector (SA2V)
Statistical analyses have shown that a close relationship exists between street segment attributes and the occurrence of crime [45][46][47]. The attributes encompass street type (e.g., arterial or local and pedestrian friendliness), topological configuration, directional alignment, speed limit, length, and other relevant characteristics. However, the raw attributes are not directly suitable for model training. To integrate them into a unified representational space for training, preprocessing steps including feature selection, scaling, and normalization were performed (see Appendix A). These attributes, categorized as low-order features, were concatenated to form a 12-dimensional vector representative of each street segment.
Following the phenomenon that crime occurrence is correlated with street blocks [48,49], the block attributes were integrated as a distinctive feature set for intersecting streets within our analysis. The one-hot encoding method was used to construct the feature vectors (see Appendix A). To distinguish this feature set from the basic street attributes, denoted as SA2V, the block attributes were referred to as SA2V_B.
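The two preprocessing operations named above can be illustrated with a small sketch. Min-max scaling is assumed here as one common choice for the scaling step, since the exact procedure is given in Appendix A; the segment lengths and block-type categories are hypothetical values.

```python
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale a numeric attribute to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(value: str, categories: Sequence[str]) -> List[int]:
    """Encode a categorical attribute as a 0/1 indicator vector."""
    if value not in categories:
        raise ValueError(f"unknown category: {value}")
    return [1 if value == c else 0 for c in categories]


lengths_m = [50.0, 150.0, 250.0]           # hypothetical segment lengths
scaled = min_max_scale(lengths_m)

block_types = ["grid", "organic", "mixed"]  # hypothetical block forms
encoded = one_hot("organic", block_types)
```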

Street Network to Vector (SN2V)
The street network can be conceptualized as a graph in which the nodes denote street segments and edges denote the intersections between every pair of street segments. This dual graph approach, introduced by Porta, differs from the conventional primal graph [38]. Figure 3 illustrates the process of converting a fictitious urban street network into a dual graph. In this scenario, a local street network is composed of interconnected street segments (labeled by numbers) and intersections. Each street segment is assigned as a node, and the edges represent the possible routes or travel paths between pairs of nodes, which correspond to streets within the graph.
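The primal-to-dual conversion can be sketched with the standard library alone. The function name `to_dual_graph` and the toy network of four segments are assumptions for illustration; on real data the same transformation is available as a line graph in graph libraries such as NetworkX.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def to_dual_graph(segments: Dict[int, Tuple[str, str]]) -> Dict[int, Set[int]]:
    """Map segment id -> set of segment ids sharing an intersection with it.

    `segments` maps each street segment id to its two end intersections.
    """
    by_intersection: Dict[str, List[int]] = defaultdict(list)
    for seg_id, (a, b) in segments.items():
        for node in {a, b}:  # the set avoids double-counting loop segments
            by_intersection[node].append(seg_id)

    dual: Dict[int, Set[int]] = {seg_id: set() for seg_id in segments}
    for seg_ids in by_intersection.values():
        # Every pair of segments meeting at this intersection becomes an edge.
        for s1, s2 in combinations(seg_ids, 2):
            dual[s1].add(s2)
            dual[s2].add(s1)
    return dual


# Toy primal network: intersections A-D, segments 1-4.
segments = {1: ("A", "B"), 2: ("B", "C"), 3: ("B", "D"), 4: ("C", "D")}
dual = to_dual_graph(segments)
```

In this toy case segments 1, 2, and 3 all meet at intersection B, so they form a triangle in the dual graph, while segment 4 attaches to segments 2 and 3 through intersections C and D.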
As a general graph representation method, Deepwalk was utilized to derive embeddings for each street segment. It operates by simulating short-range random walks to learn node embeddings, which are then used to represent the similarity of neighborhoods and the social interaction identity of nodes [50]. The initial step involved converting the street segments into graph nodes. A node representation matrix Φ was initialized, and a binary tree T was constructed from the node set V. Subsequently, a random walk generator was applied to each node, followed by a Skip-Gram procedure [51] that updated the node representations.
Following the experimental comparison, the optimal parameters for our model are as follows: window size (w = 5), embedding size (d = 128), walks per node (γ = 10), and walk length (t = 80). Additionally, five negative samples and 22,789 iterations were used. Upon training the model to convergence, a 128-dimensional vector was obtained for each street segment for application in downstream tasks.
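The walk-generation half of Deepwalk can be sketched as follows, using the reported settings (γ = 10 walks per node, walk length t = 80, window w = 5) on a toy dual graph. The function names and the toy graph are illustrative assumptions, and the Skip-Gram training itself is left to an existing implementation (for example gensim's Word2Vec) rather than reproduced here.

```python
import random
from typing import Dict, List, Set, Tuple


def random_walk(
    graph: Dict[int, Set[int]], start: int, walk_length: int, rng: random.Random
) -> List[int]:
    """Uniform random walk of at most `walk_length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = sorted(graph[walk[-1]])  # sorted for reproducibility
        if not neighbors:
            break  # isolated node: the walk ends early
        walk.append(rng.choice(neighbors))
    return walk


def build_corpus(
    graph: Dict[int, Set[int]], walks_per_node: int, walk_length: int, seed: int = 0
) -> List[List[int]]:
    """Generate `walks_per_node` walks from every node, in shuffled node order."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    corpus = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            corpus.append(random_walk(graph, node, walk_length, rng))
    return corpus


def skipgram_pairs(walk: List[int], window: int) -> List[Tuple[int, int]]:
    """(center, context) pairs within `window` positions, as Skip-Gram uses."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


toy_dual = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
corpus = build_corpus(toy_dual, walks_per_node=10, walk_length=80)
pairs = skipgram_pairs(corpus[0], window=5)
```

Each walk plays the role of a sentence and each street segment the role of a word, which is why a word-embedding trainer can be reused unchanged for the final step.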

Heterogeneous Information Network to Vector (HIN2V)
Prior research has indicated that the urban environment significantly influences crime. However, considering that the urban environment is characterized by heterogeneous data from multiple sources, the data organization and feature extraction directly impact the model's predictive accuracy. In order to effectively model the influence of the urban environment on crime, a heterogeneous information network (HIN) was developed to represent the interconnections of fundamental urban entities.
The HIN includes the administrative districts, business districts, Points of Interest (POIs)/Areas of Interest (AOIs), and streets. Figure 4 illustrates the process of transforming the layout structure of street networks and urban functional facilities into an HIN. The transformation involves converting the physical infrastructure of a city, encompassing streets and various urban amenities, into a unified network that integrates diverse types of information. The POIs were bi-directionally linked with their nearest street segments, utilizing OSRM's nearest point API service. The AOIs were processed in the same manner. The street segments were first divided at segmentation points, and then the directed relationships between them were maintained. This operation was facilitated by NetworkX's line_graph function. For business districts, bi-directional links between the business districts and the street segments located within these areas were established. Lastly, the administrative districts were bi-directionally linked with the intersecting business districts.
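The linking rules described above can be sketched as a list of typed triples (head, relation, tail), which is the usual input format for relational graph models. The relation names, the `_inv` suffix for reverse links, and all node identifiers are hypothetical; nearest-segment lookups (done with OSRM in the study) are assumed to have been resolved beforehand and are passed in as plain dictionaries. AOIs would be handled in the same way as POIs.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head node, relation name, tail node)


def add_bidirectional(triples: List[Triple], a: str, relation: str, b: str) -> None:
    """Append a link in both directions; '_inv' marks the reverse relation."""
    triples.append((a, relation, b))
    triples.append((b, relation + "_inv", a))


def build_hin(
    street_links: List[Tuple[str, str]],
    poi_to_street: Dict[str, str],
    street_to_business: Dict[str, str],
    business_to_admin: Dict[str, str],
) -> List[Triple]:
    """Assemble the heterogeneous network as a flat list of typed edges."""
    triples: List[Triple] = []
    for src, dst in street_links:  # directed street-to-street links
        triples.append((src, "street_connects", dst))
    for poi, street in poi_to_street.items():
        add_bidirectional(triples, poi, "poi_near_street", street)
    for street, district in street_to_business.items():
        add_bidirectional(triples, street, "street_in_business", district)
    for district, admin in business_to_admin.items():
        add_bidirectional(triples, district, "business_in_admin", admin)
    return triples


hin = build_hin(
    street_links=[("street_1", "street_2"), ("street_2", "street_1")],
    poi_to_street={"poi_cafe": "street_1"},
    street_to_business={"street_1": "biz_A", "street_2": "biz_A"},
    business_to_admin={"biz_A": "admin_X"},
)
relations = sorted({r for _, r, _ in hin})
```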
urban environment on crime, a heterogeneous information network (HIN) was developed to represent the interconnections of fundamental urban entities.
HIN includes the administrative districts, business districts, Points of Interest (POIs)/Areas of Interest (AOIs), and streets.Figure 4 illustrates the process of transforming the layout structure of street networks and urban functional facilities into an HIN.The transformation involves converting the physical infrastructure of a city, encompassing streets and various urban amenities, into a unified network that integrates diverse types of information.The POIs were bi-directionally linked with their nearest street segments, utilizing OSRM's nearest point API service.The AOIs were processed in the same manner.The street segments were first divided at segmentation points, and then the directed relationships between them were maintained.This operation was facilitated by the Net-workX's line_graph function.For business districts, bi-directional links between the business districts and the street segments located within these areas were established.Lastly, the administrative districts were bi-directionally linked with the intersecting business districts.In an HIN, each node represents a different entity, and there are various relationships between these entities.For instance, self-loops on street nodes denote the interconnection relationship between streets themselves.Bi-directional edges between a street and POI (AOI, business district) nodes indicate their proximity within a specific distance range.Administrative districts are more macro-level entities and are represented by nodes that connect to business district nodes to capture more comprehensive semantic information.
The HIN can be simplistically viewed as a knowledge graph; thus, the link prediction techniques could be involved.The Relational Graph Convolutional Network (RGCN) algorithm was employed to predict potential connections between nodes in the network and obtain the embeddings for streets.The RGCN is designed to improve the model's sensitivity to various relationships by assigning distinct weight matrices to each type of relationship [52].It assigns scores to possible edges (s, r, o) based on the function f(subject, relation, object).For any node pair s and o in the graph, a presentation pair (hs (L) and ho (L) ) is assigned to each.The prediction score is then calculated by directly taking the inner product of these representations.A four-layer RGCN model was constructed and trained, achieving convergence with an accuracy of 64%.
where  denotes the set of neighbor indices of node i under relation  ∈ , and  , is a normalization constant that can either be learned or predefined.In an HIN, each node represents a different entity, and there are various relationships between these entities.For instance, self-loops on street nodes denote the interconnection relationship between streets themselves.Bi-directional edges between a street and POI (AOI, business district) nodes indicate their proximity within a specific distance range.Administrative districts are more macro-level entities and are represented by nodes that connect to business district nodes to capture more comprehensive semantic information.
The HIN can be simplistically viewed as a knowledge graph; thus, the link prediction techniques could be involved.The Relational Graph Convolutional Network (RGCN) algorithm was employed to predict potential connections between nodes in the network and obtain the embeddings for streets.The RGCN is designed to improve the model's sensitivity to various relationships by assigning distinct weight matrices to each type of relationship [52].It assigns scores to possible edges (s, r, o) based on the function f (subject, relation, object).For any node pair s and o in the graph, a presentation pair (h s (L) and h o (L) ) is assigned to each.The prediction score is then calculated by directly taking the inner product of these representations.A four-layer RGCN model was constructed and trained, achieving convergence with an accuracy of 64%.
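$$h_i^{(l+1)} = \sigma\left( \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} \right)$$
where $\mathcal{N}_i^r$ denotes the set of neighbor indices of node i under relation $r \in \mathcal{R}$, and $c_{i,r}$ is a normalization constant that can either be learned or predefined. $W_r^{(l)}$ is the weight matrix of relation r at layer l, $W_0^{(l)}$ is the weight matrix of the self-connection, and $\sigma$ is an element-wise activation function.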
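To make the propagation and scoring steps concrete, the following is a minimal sketch of a single RGCN layer followed by inner-product link scoring. It is not the authors' implementation: the toy graph, the node and relation names, the weight matrices, the choice of ReLU as the activation, and the choice of $c_{i,r} = |\mathcal{N}_i^r|$ are all assumptions made for illustration.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rgcn_layer(h, neighbors, W_rel, W_self):
    """One RGCN propagation step with a ReLU activation (assumed).

    h         : dict node -> feature vector
    neighbors : dict relation -> dict node -> list of neighbor nodes
    W_rel     : dict relation -> relation-specific weight matrix
    W_self    : weight matrix of the self-connection
    """
    out = {}
    for i, h_i in h.items():
        acc = matvec(W_self, h_i)                # self-connection term
        for r, adj in neighbors.items():
            nbrs = adj.get(i, [])
            if not nbrs:
                continue
            c_ir = len(nbrs)                     # assumed: c_{i,r} = |N_i^r|
            for j in nbrs:
                msg = matvec(W_rel[r], h[j])     # relation-specific message
                acc = [a + m / c_ir for a, m in zip(acc, msg)]
        out[i] = [max(0.0, a) for a in acc]      # ReLU
    return out


def link_score(h, s, o):
    """Inner product of the representations of nodes s and o."""
    return sum(a * b for a, b in zip(h[s], h[o]))


# Hypothetical toy HIN: two streets and one POI, two relation types.
h0 = {"street_a": [1.0, 0.0], "street_b": [0.0, 1.0], "poi_1": [1.0, 1.0]}
neighbors = {
    "street_street": {"street_a": ["street_b"], "street_b": ["street_a"]},
    "near_poi": {"street_a": ["poi_1"], "poi_1": ["street_a"]},
}
identity = [[1.0, 0.0], [0.0, 1.0]]
W_rel = {"street_street": identity, "near_poi": [[0.5, 0.0], [0.0, 0.5]]}

h1 = rgcn_layer(h0, neighbors, W_rel, W_self=identity)
score_ab = link_score(h1, "street_a", "street_b")
```

In a trained model the weight matrices would be learned from the link prediction objective, and several such layers would be stacked before the final representations are scored.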

Data and Experimental Preparation

Data and Representations
Study area. The study area is Beijing, the capital city of China. In 2017, Beijing was home to over twenty million residents. The city is administratively divided into sixteen districts, encompassing a total area of 16,410 square kilometers.
Crime data. To evaluate the predictive performance of our proposed framework, an experiment focusing on two types of property crime in Beijing was conducted: residential burglary and pickpocketing. The residential burglary records span January to December 2017 and comprise 22,478 incidents. The pickpocketing records cover the same period and comprise 11,274 incidents. Both datasets provide reliable temporal and spatial information. The distribution of crime incidents on the street segments is shown in Figures 5 and 6.
Street network. The street network data of Beijing serve as the foundational layer due to the significant correlation between crime and the urban street structure [3,4,19,44,53]. A total of 292,468 street records for urban Beijing were acquired from OSM. The primary attributes of street segments include length, free_speed, allow_uses, lanes, capacity, link_type, and geometry, among other relevant characteristics. In addition, block forms, which play a significant role in shaping urban functionality and human mobility patterns, were considered; 19,860 block-form records were collected from the Beijing Urban Laboratory.
Urban functional facilities. The facilities encompass POIs, AOIs, and business districts in Beijing. POIs can be used to represent the land use characteristics of specific locations. AOIs offer insights into the geographical entities that function as landmarks in local areas. Business districts reflect the spheres of influence of service centers. The data were accessed through the API provided by the Baidu Map Service. In total, 311 business districts, 789,245 POIs, and 31,900 AOIs were retrieved.
ISPRS Int. J. Geo-Inf. 2024, 13, x FOR PEER REVIEW 9 of 18
Representations. The spatio-temporal vectors generated from all representation learning methods are displayed in Table 1.

Baseline Models
GLDNet. A baseline model was constructed following previous studies [4,54]. The model is composed of two primary components. The temporal component is a two-layer Gated Recurrent Unit (GRU) [55] designed to capture the temporal dynamics of event propagation. The spatial component integrates three distinct types of Graph Neural Network (GNN) layers: Graph Attention Network (GAT) [56], Graph Gaussian Neural Network (GGNN) [57], and EdgeConv [58]. These layers are tailored to extract features of event propagation across the spatial dimension, which is constrained by the street network. The model's architecture is shown in Figure 7. Note that the original GLDNet architecture, as presented in the literature, accounts for only two features: the time-varying risk (TVR) and Street Network to Vector (SN2V).
Other machine learning algorithms. To substantiate the efficacy of our proposed prediction framework, a suite of conventional machine learning algorithms was incorporated for a comparative analysis. These algorithms include Logistic Regression (LR) [59], Decision Tree (DT) [60], Random Forest (RF) [61], XGBoost [62], and Support Vector Machines (SVMs) [63]. All of the algorithms utilized the spatio-temporal feature vectors introduced in this study.
For LR, the Stochastic Average Gradient (SAG) optimization algorithm was chosen for its efficiency. For DT and RF, both Gini impurity and entropy were evaluated as splitting criteria, and Gini impurity was selected for its simplicity and effectiveness. For the SVM, the linear and RBF kernels were compared; because the RBF kernel frequently failed to converge, we opted for the linear kernel, which provided more reliable results. For XGBoost, both 'gbtree' and 'gblinear' were tested as booster types, and 'gbtree' was chosen for its performance with tree-based models. For our model, a two-layer deep neural network (DNN) architecture was adopted, and hidden-layer configurations of (128,64), (256,128), and (512,256) were tested. The configuration of (256,128) was found to be the most effective. The ReLU activation function was employed for the DNN layers, and the regularization coefficient was set to $1 \times 10^{-5}$ to prevent overfitting.

Evaluation Metrics
Three standard evaluation metrics were used: the Area Under the Curve (AUC), Mean Squared Error (MSE), and accuracy (ACC). The AUC, representing the area beneath the receiver operating characteristic curve, is advantageous due to its insensitivity to class imbalance, a significant consideration given the infrequency of crime incidents. A higher AUC value indicates superior predictive performance. The MSE, in contrast, was used to assess the regression accuracy of the prediction models; a lower MSE value signifies better performance. The ACC measures the overall correctness of the predictions made by a model and is calculated as the ratio of the number of correct predictions to the total number of predictions. A higher ACC suggests better overall performance. However, it is crucial to consider the full set of metrics together, particularly in the crime prediction domain, where datasets are often imbalanced.
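As an illustration of how the three metrics relate to a model's output, the sketch below computes them for a handful of made-up labels and scores. The 0.5 decision threshold for ACC and the application of the MSE to predicted probabilities are assumptions; the text does not specify either.

```python
def auc_score(y_true, y_score):
    """AUC as the probability that a random positive sample receives a
    higher score than a random negative sample (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mse_score(y_true, y_score):
    """Mean squared difference between labels and predicted scores."""
    return sum((y - s) ** 2 for y, s in zip(y_true, y_score)) / len(y_true)


def acc_score(y_true, y_score, threshold=0.5):
    """Share of samples whose thresholded score matches the label."""
    hits = sum(int(s >= threshold) == y for y, s in zip(y_true, y_score))
    return hits / len(y_true)


# Made-up example: 1 = crime occurred on the segment, 0 = no crime.
y_true = [1, 0, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.6]

auc = auc_score(y_true, y_score)
mse = mse_score(y_true, y_score)
acc = acc_score(y_true, y_score)
```

The pairwise definition of the AUC used here is quadratic in the number of samples; it is chosen for readability, not efficiency.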

Dataset Composition and Division
Because streets where crimes have occurred are far fewer than those where they have not, it is essential to balance the dataset to prevent model bias towards the majority class. Therefore, a down-sampling technique was employed to construct a balanced dataset. First, all street segments with recorded crimes were included as positive samples in the dataset. An equal number of street segments with no crime records was then randomly selected to serve as negative samples.
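The down-sampling step described above can be sketched as follows. The function name, the segment identifiers, the toy crime counts, and the fixed random seed are hypothetical; only the procedure (keep all positives, draw an equal number of crime-free segments at random) follows the text.

```python
import random


def balanced_downsample(crime_counts, seed=0):
    """Build a balanced sample of street segments.

    crime_counts : dict segment id -> number of recorded crimes
    Returns (positives, negatives), two lists of equal length.
    """
    positives = sorted(s for s, n in crime_counts.items() if n > 0)
    candidates = sorted(s for s, n in crime_counts.items() if n == 0)
    rng = random.Random(seed)  # fixed seed for reproducibility (assumed)
    negatives = rng.sample(candidates, k=len(positives))
    return positives, negatives


# Toy data: every fifth segment has one recorded crime.
crime_counts = {f"seg_{i}": (1 if i % 5 == 0 else 0) for i in range(50)}
positives, negatives = balanced_downsample(crime_counts)
```

Sampling without replacement requires at least as many crime-free segments as positive ones, which holds whenever crime is the minority class.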

The ablation experiments revealed that the introduced representation vectors collectively enhanced predictive performance. Although the SVM and LR models showed distinct variations, their subpar predictive capabilities suggest that our proposed feature learning method may not be suitable for these models. Furthermore, the findings suggest that the HIN2V and SA2V_B vectors might significantly influence model performance, while SN2V generates moderate results. However, this ablation experiment provides only an intuitive insight into the contribution of the feature vectors and does not explore potential interaction effects among multiple features. Consequently, a Shapley analysis was employed to investigate the contribution of the newly proposed features to the prediction model.

Feature Vector Interpretation
The effectiveness of machine learning models depends on the quality of feature engineering, as different representations can either obscure or reveal various explanatory factors of the underlying data [64]. The experimental results can provide a rough estimate of whether individual features or feature combinations contribute to the prediction, but it is necessary to have an accurate description of each feature's specific contribution. To address this issue, the Shapley value was used [65,66]. Equation (3) illustrates the working principle of the Shapley value.
$$\phi_j(\mathrm{val}) = \sum_{S \subseteq \{x_1, \ldots, x_p\} \setminus \{x_j\}} \frac{|S|!\,(p - |S| - 1)!}{p!} \left( \mathrm{val}(S \cup \{x_j\}) - \mathrm{val}(S) \right) \qquad (3)$$
where S is a subset of the features used in the model, p is the number of features, $x_j$ denotes the j-th feature, $\mathrm{val}(S)$ is the predicted value for the subset S, and $\phi_j(\mathrm{val})$ is the contribution of the j-th feature to the model. The contribution of each feature is thus its marginal contribution averaged over all subsets of the other features, measured in Shapley values.
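A minimal sketch of an exact Shapley computation over a small set of feature groups is given below. The value function `toy_val` and its numbers are invented for illustration and are not results from this study; in practice val(S) would be a metric such as the AUC of a model trained on the feature subset S.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, val):
    """Exact Shapley value of each feature for the value function val,
    which maps a frozenset of features to a number."""
    p = len(features)
    phi = {}
    for j in features:
        others = [f for f in features if f != j]
        total = 0.0
        for size in range(p):
            # Weight of every subset S of this size that excludes j.
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (val(s | {j}) - val(s))
        phi[j] = total
    return phi


def toy_val(subset):
    """Hypothetical score: additive gains plus one interaction term."""
    score = 0.5
    if "SN2V" in subset:
        score += 0.10
    if "HIN2V" in subset:
        score += 0.20
    if "SN2V" in subset and "HIN2V" in subset:
        score += 0.04  # interaction gain, split equally by Shapley
    return score


features = ["SN2V", "HIN2V", "Block"]
phi = shapley_values(features, toy_val)
```

The values sum to the difference between the score with all features and the score with none, which is what allows each feature's contribution to be read as a share of the total gain. The exact computation enumerates all subsets and is therefore practical only for a small number of feature groups.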
The Shapley values of the AUC and ACC metrics were computed for two crime categories, as illustrated in Figure 10. The results indicate that HIN2V significantly enhances the prediction performance across nearly all models, as evidenced by the noteworthy improvements in both the AUC and ACC metrics for the two crime categories. This enhancement is likely attributable to the inclusion of a heterogeneous information network that encompasses more sophisticated semantic information. Such information represents nuances that conventional spatial feature representations fail to capture.
Moreover, SN2V, representing the topological structure of the street network, consistently demonstrates a notable positive impact across the majority of predictive models. This result further validates the significant influence of street network structural traits in various domains, including Intelligent Transportation Systems, as previously discussed. Thus, its favorable involvement in crime prediction is not unexpected.
Next, the block features exhibit variable performance across different models: their effects are predominantly positive, though occasionally negative. Within our proposed framework, they maintain a positive influence. This finding reaffirms earlier research, which indicates that neighborhood characteristics can affect individuals' mobility and daily routines, potentially contributing to the occurrence of crime.

Conclusions
In this article, we first applied a general graph representation learning approach to derive the topological structure embedding of the street network. Subsequently, we constructed a heterogeneous information network incorporating both the street network and urban functional facilities. Through a link prediction task, we obtained the embeddings of street segments within the urban built environment. Finally, these two high-order embeddings, combined with other spatio-temporal features, were fed into a deep neural network to enable street-level crime prediction. The predictive outcomes demonstrate the positive impact of both embeddings, with a particular emphasis on the significant contribution of HIN2V. When employing our model to predict burglary crimes, the Shapley value calculations reveal that HIN2V contributes 45.06% and 44.23% to the AUC and ACC metrics, respectively. For pickpocketing crime prediction, HIN2V's performance is even more pronounced, with contributions to the AUC and ACC metrics reaching 63.8% and 59.9%, respectively. Comparatively, SN2V performs better in predicting burglary than pickpocketing crimes, contributing 44.29% and 45.76% to the AUC and ACC for the former, and 19.17% and 27.51% for the latter.
Comparative experiments have conclusively shown that our neural network outperforms other baseline models in terms of efficacy, exhibiting a 6.3% enhancement in the AUC and a 4.12% increase in ACC compared to the next best model.
There remains room for improvement in our research. Firstly, while there are various graph representation learning methods based on street networks, our study only utilized DeepWalk. In subsequent work, we intend to explore additional methods to uncover more effective embeddings. Secondly, the heterogeneous information network (HIN) we constructed is still a static network and does not yet consider the impact of dynamic factors such as population movement on crime prediction. Addressing this limitation is also a focus of our future research.

Figure 1. Crime prediction framework based on representation learning.

Figure 2. The crime risk following the application of bi-exponential decay smoothing on a street segment (the blue bars represent the "time of occurrence of the incidents").

Figure 3. The dual graph transformation of a street network: (a) a fictive street network; (b) a dual graph after transformation.

Figure 4. The HIN schema transformation: (a) a fictive urban built environment including an administrative district, business district, AOI, POI, and street network; (b) the schema of the HIN after transformation.

Figure 5. The distribution of burglary incidents on the street network.

Figure 6. The distribution of pickpocketing incidents on the street network.

Figure 7. The structure of the baseline model.