Modelling interactions among offenders: A latent space approach for interdependent ego-networks

Abstract. Illegal markets are notoriously difficult to study. Police data offer an increasingly exploited source of evidence. However, their secondary nature poses challenges for researchers. A key issue is that researchers often have to deal with two sets of actors: targeted and non-targeted. This work develops a latent space model for interdependent ego-networks purposely created to deal with the targeted nature of police evidence. By treating targeted offenders as egos and their contacts as alters, the model (a) leverages the full information available and (b) mirrors the specificity of the data collection strategy. The paper then applies this approach to analyse a real-world example of illegal markets, namely the smuggling of migrants. To this end, we utilise a novel dataset of 21,555 phone conversations wiretapped by the police to study interactions among offenders.

[...] alters and $X$ be the $N \times M$ incidence matrix encoding presence or absence of an edge between egos and alters, with entries $x_{ik} = 1$ if there is a tie between ego $i$ and alter $k$ and $x_{ik} = 0$ otherwise. Egos $i$ and $j$ are connected to the same alter $l$ if $x_{il} = x_{jl} = 1$.

[...] proposed to use the squared Euclidean distance for undirected networks instead of the commonly used Euclidean distance (Hoff et al., 2002) for two main reasons. Firstly, it allows one to visualise more clearly the presence of nodal clusters by giving a higher probability of a link between two close nodes in the latent space and lower probabilities to two nodes lying far away from each other. Secondly, it makes the model need fewer approximation steps for the variational estimation procedure, which provides a very fast estimation method for large networks (see Section 2.3 for more details). This is particularly helpful when dealing with large-scale police evidence.

The relational structure of the ego-ego network $Y$ can be captured by a latent space model with squared Euclidean distance: [...] where the density parameter and the latent positions are respectively $\alpha \sim N(\xi_\alpha, \psi^2_\alpha)$ and $z_i \sim N(0, \sigma^2 I_D)$, and $\xi_\alpha$, $\psi^2_\alpha$, $\sigma^2$ are fixed parameters.
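To illustrate the first of the two reasons, the sketch below compares link probabilities under squared and plain Euclidean distance. The model equation is not shown in the text, so the sketch assumes the usual logistic form, in which the log-odds of a tie equal $\alpha$ minus the distance. The function name `link_probability` and the value of `alpha` are hypothetical.

```python
import numpy as np

def link_probability(alpha, z_i, z_j, squared=True):
    """Tie probability under an ASSUMED logistic latent space model.

    log-odds = alpha - d(z_i, z_j), where d is the squared Euclidean
    distance (squared=True) or the Euclidean distance (squared=False).
    """
    diff = np.asarray(z_i, dtype=float) - np.asarray(z_j, dtype=float)
    dist = float(diff @ diff)            # squared Euclidean distance
    if not squared:
        dist = float(np.sqrt(dist))      # plain Euclidean distance
    eta = alpha - dist
    return 1.0 / (1.0 + np.exp(-eta))

alpha = 1.0                              # hypothetical density parameter
origin = [0.0, 0.0]
for d in (0.5, 1.0, 3.0):
    other = [d, 0.0]
    p_sq = link_probability(alpha, origin, other, squared=True)
    p_eu = link_probability(alpha, origin, other, squared=False)
    print(f"distance {d}: squared={p_sq:.4f}, euclidean={p_eu:.4f}")
```

Under this assumed form, the two distances coincide at a distance of one unit. Below that, the squared version gives the higher tie probability. Above it, the squared version gives the lower one, which sharpens the visual separation between clusters.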
[...] where $z_i$ and $w_k$ are the latent positions of ego $i$ and alter $k$ respectively. Assuming conditional dyadic independence given the latent positions, we have that the overall probability of observing the incidence matrix $X$ can be written as [...] respectively the latent positions of egos and alters in the latent space, and $\xi_\beta$, $\psi^2_\beta$, $\sigma^2$ are fixed parameters.
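Under conditional dyadic independence, the log-probability of $X$ is a sum over ego-alter pairs. The sketch below shows that sum, assuming each tie follows a logistic model with log-odds $\beta - \lVert z_i - w_k \rVert^2$. That form is an assumption, since the equation is not shown in the text, and the function name is hypothetical.

```python
import numpy as np

def incidence_log_likelihood(X, Z, W, beta):
    """log p(X | Z, W, beta) under an ASSUMED logistic model.

    X: (N, M) binary incidence matrix (egos by alters)
    Z: (N, D) ego latent positions; W: (M, D) alter latent positions
    """
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    W = np.asarray(W, dtype=float)
    # Squared Euclidean distance between every ego i and alter k: shape (N, M)
    diff = Z[:, None, :] - W[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    eta = beta - sq_dist                  # log-odds of each ego-alter tie
    # Dyads are independent given the positions, so the log-terms add up:
    # log p(x_ik) = x_ik * eta_ik - log(1 + exp(eta_ik))
    return float(np.sum(X * eta - np.logaddexp(0.0, eta)))
```

`np.logaddexp(0.0, eta)` evaluates `log(1 + exp(eta))` without overflow when `eta` is large.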

Due to the large size of the data we analyse in this paper, variational methods repre[...]

[...] where the distributions of $p(Z)$, $p(W)$, $p(\beta)$, and $p(\alpha)$ are defined in Sections 2.1 and 2.2.

We propose the following variational approximation to the target distribution: [...]

An expectation-maximisation (EM) algorithm used to carry out parameter inference at each $(t + 1)$ iteration consists of the following steps: [...] where $KL(\cdot)$ is the Kullback-Leibler divergence measure and $\Theta_\alpha = (\tilde{\xi}_\alpha, \tilde{\psi}^2_\alpha)$, [...]

• M-Step:

- Estimate $\tilde{\xi}_\alpha$ and $\tilde{\psi}^2_\alpha$ by evaluating: [...]
- Estimate $\tilde{\xi}_\beta$ and $\tilde{\psi}^2_\beta$ by evaluating: [...]

To minimise the risk of estimating local maxima, the algorithm needs to be run several [...]

Figure 3 shows the adjacency matrix $Y$ of the data set and the ego degree for each ego. Figure 4 shows the incidence matrix $X$ and the alter degree for each ego. Figure 5 shows the degree distribution for the alters.

[...] changed. An edge between any two actors is present if an interaction between them has been recorded. In this paper, we do not consider the direction of the call and the number of calls exchanged, hence we work with an undirected and unweighted graph. However, our modelling framework can be adapted for directed and weighted graphs.

We now include in our model the information about the interactions between egos (targeted smugglers) and alters (non-targeted individuals) to gain a complete picture of the behaviour of the 28 targeted smugglers. To do so, we estimate the model for interdependent ego-networks by including the relational information of the incidence matrix $X$.
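The construction of the undirected, unweighted ego-ego matrix $Y$ and the incidence matrix $X$ from call records can be sketched as follows. The record format (caller and receiver pairs), the identifiers, and the function name `build_networks` are hypothetical. Dropping calls between two non-targeted individuals is also an assumption of this sketch.

```python
import numpy as np

def build_networks(calls, egos):
    """Build binary Y (ego-ego) and X (ego-alter) from call records.

    calls: iterable of (caller, receiver) identifiers (hypothetical format)
    egos:  list of identifiers of the targeted actors
    """
    calls = list(calls)
    ego_index = {e: i for i, e in enumerate(egos)}
    # Every non-targeted participant in a recorded call is treated as an alter
    alters = sorted({p for call in calls for p in call if p not in ego_index})
    alter_index = {a: k for k, a in enumerate(alters)}

    Y = np.zeros((len(egos), len(egos)), dtype=int)
    X = np.zeros((len(egos), len(alters)), dtype=int)
    for a, b in calls:
        if a == b:
            continue                          # no self-ties
        if a in ego_index and b in ego_index:
            i, j = ego_index[a], ego_index[b]
            Y[i, j] = Y[j, i] = 1             # direction ignored: symmetric
        elif a in ego_index:
            X[ego_index[a], alter_index[b]] = 1
        elif b in ego_index:
            X[ego_index[b], alter_index[a]] = 1
        # Assumption: alter-alter calls are dropped in this sketch
    return Y, X, alters

calls = [("E1", "E2"), ("E2", "E1"), ("E1", "A1"), ("A2", "E2"), ("A1", "A2")]
Y, X, alters = build_networks(calls, egos=["E1", "E2"])
```

Repeated calls between the same pair set the same entry to 1 again, so the number of calls is ignored, as in the text.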

Finally, there are a number of players whose positions almost overlap, e.g., E7/E3 and E4/E20. This suggests a high degree of equivalence from a market perspective, and [...]

When comparing Figure 8 to Figure 6, we can notice that most of the structure has remained unchanged. This means that the ego-alter relational structure broadly reflects the ego-ego relational structure. However, some changes did appear when using the full information available. Notably, E28, who was previously close to E15, now shows a much higher criminal distance from E15 (and a full switch from the right-hand side to the left-hand side of the picture).

Figure 9 shows the graphical goodness of fit diagnostics for the estimated model [...] The lower triangle of the matrix displayed in Figure 10 shows [...]

In this paper, we looked specifically at the effect of the sampling strategy adopted by law enforcement agencies on the data structure. We started from the observation that [...] investigations as well as data extracted from police records using a targeted extraction approach.

We have posited that our tailored model can be fruitfully used to study interactions among offenders and, more generally, the structure of illegal markets. By modelling a market as a latent space, researchers can identify central actors and clusters of close interactions ("criminal proximity"), as well as gauge the reverse behaviour, which we [...]

The ego-ego network latent space model is defined as: [...]

We assume the following distributions for the model unknowns, where $p(\alpha) = N(\xi_\alpha, \psi^2_\alpha)$, $p(z_i) \overset{iid}{=} N(0, \sigma^2 I_D)$ and $\sigma^2$, $\xi_\alpha$, $\psi^2_\alpha$ are fixed parameters, and the squared Euclidean distance between ego $i$ and ego $j$ is $\lVert z_i - z_j \rVert^2 = \sum_{d=1}^{D} (z_{id} - z_{jd})^2$.

The ego-alter network latent space model is defined as: [...]

We assume the following distributions for the model unknowns, where $p(\beta) = N(\xi_\beta, \psi^2_\beta)$, $p(w_k) \overset{iid}{=} N(0, \sigma^2 I_D)$ and $\sigma^2$, $\xi_\beta$, $\psi^2_\beta$ are fixed parameters, and the squared Euclidean distance between ego $i$ and alter $k$ is $\lVert z_i - w_k \rVert^2 = \sum_{d=1}^{D} (z_{id} - w_{kd})^2$.

The posterior probability of the unknown $(Z, \alpha)$ is of the form: [...]

We define the variational posterior $q(Z, W, \alpha, \beta \mid Y, X)$ introducing the variational parameters $\tilde{\Theta}_z = (\tilde{\xi}_\alpha, \tilde{\psi}^2_\alpha)$, $\tilde{z}_i$, $\tilde{\Sigma}_z$, $\tilde{\Theta}_w = (\tilde{\xi}_\beta, \tilde{\psi}^2_\beta)$, $\tilde{w}_k$, $\tilde{\Sigma}_w$: [...] where $q(\alpha) = N(\tilde{\xi}_\alpha, \tilde{\psi}^2_\alpha)$, $q(z_i) = N(\tilde{z}_i, \tilde{\Sigma}_z)$, $q(\beta) = N(\tilde{\xi}_\beta, \tilde{\psi}^2_\beta)$ and $q(w_k) = N(\tilde{w}_k, \tilde{\Sigma}_w)$.
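Because both the prior and the variational distribution of the density parameter are univariate Gaussians, the Kullback-Leibler divergence between them has a closed form. The sketch below evaluates it in the direction KL(q || p), which is the usual choice in variational inference and is assumed here. The function and argument names are hypothetical.

```python
import math

def kl_univariate_gaussians(xi_q, psi2_q, xi_p, psi2_p):
    """KL( N(xi_q, psi2_q) || N(xi_p, psi2_p) ).

    psi2_q and psi2_p are variances, not standard deviations.
    """
    if psi2_q <= 0 or psi2_p <= 0:
        raise ValueError("variances must be positive")
    return 0.5 * (
        math.log(psi2_p / psi2_q)                   # log-ratio of variances
        + (psi2_q + (xi_q - xi_p) ** 2) / psi2_p    # spread plus mean shift
        - 1.0
    )

# Example: variational q(alpha) = N(1, 1) against prior p(alpha) = N(0, 1)
print(kl_univariate_gaussians(1.0, 1.0, 0.0, 1.0))  # 0.5
```

The same function applies to $q(\beta)$ and $p(\beta)$ by passing the $\beta$ parameters instead.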

First-order Taylor series expansion of [...]

We have: [...]

Second-order Taylor series expansion of [...] where: [...]

$KL \approx \tilde{\psi}^2$ [...]

Second-order Taylor series expansion approximation of [...] Therefore, [...]

Let's find the gradient $G$ and the Hessian matrix $H$ of $f$ evaluated at $\tilde{w}_k = \tilde{w}_k^{o}$.
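For orientation, the generic role of $G$ and $H$ in a second-order expansion is as follows. The definition of $f$ is not shown in the text, so $f$ stands here for any twice-differentiable function of $\tilde{w}_k$. The update on the second line is the stationary point of the quadratic approximation, assuming $H$ is invertible. The label $\tilde{w}_k^{\mathrm{new}}$ is hypothetical.

```latex
f(\tilde{w}_k) \approx f(\tilde{w}_k^{o})
  + G^{\top} (\tilde{w}_k - \tilde{w}_k^{o})
  + \tfrac{1}{2} (\tilde{w}_k - \tilde{w}_k^{o})^{\top} H \, (\tilde{w}_k - \tilde{w}_k^{o}),
\qquad
\tilde{w}_k^{\mathrm{new}} = \tilde{w}_k^{o} - H^{-1} G .
```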