1 Introduction and Related Work

A huge amount of data is generated every day in networked systems such as social networks [4, 5], biological networks, internet peer-to-peer networks [13], and other technological networks [3]. These data can be modelled using graph theory, with nodes representing users or objects and edges representing the relationships between pairs of nodes. All such networks undergo change, with nodes and edges appearing or disappearing as the system evolves. Therefore, static graphs are not adequate to model these kinds of network structures.

It is known that naive anonymization of a graph can lead to disclosure, because adversaries can use information that they possess to infer private information from the structure of the graph. Several types of such attacks have been developed: see, e.g., de-anonymization attacks [18], degree attacks [14], 1-neighborhood attacks [23], and sub-graph attacks [11]. Proper privacy models have been developed for static graphs; these models can be broadly classified into those following k-anonymity and those following differential privacy. We focus here on the differential privacy model [8].

The definition of “adjacent graphs” is the key to extending differential privacy to social networks [12]. Several definitions have been proposed so far: node privacy [10], edge privacy [19], out-link privacy, and partition privacy [19, 20]. The most commonly used are node and edge differential privacy. Node privacy provides strong protection, but makes it impractical to deliver high utility (precise network analysis). Edge privacy shields users from attackers trying to learn about precise relationships between them, and it has been more widely adopted since it offers effective privacy protection in many practical applications.

Data on dynamic networks can take many different forms, but the most popular form and the one we consider in this paper is a collection of successively obtained, typically (but not necessarily) equally spaced snapshots of the network topology [22]. We restrict ourselves to networks based on a constant set of nodes. That is, nodes do not change but only edges do. This is not a limitation, because as we consider a finite set of snapshots all known a priori, the set of nodes appearing in at least one snapshot is known beforehand.

Keeping all of this in mind, we propose dynamic graph privacy models and two novel edge-differentially private mechanisms for dynamic graphs. The closest related work is the set of differentially private algorithms for counting-based problems in [9]. However, those algorithms are based on sensitivity, and hence the edge randomization cannot be carried out locally, as in the present work.

1.1 Contributions and Paper Structure

In this work, our contributions are:

  • the extension of the definition of local differential privacy for edges to dynamic graphs;

  • the privacy mechanisms for providing graphs compliant with edge-local differential privacy for dynamic graphs. This is achieved by applying the noise-graph mechanism;

  • an empirical analysis of such privacy mechanisms. We show that the community structure in dynamic graphs can be preserved while still protecting the edges with local differential privacy.

The remainder of the paper is arranged as follows. We conclude this section by presenting basic definitions related to graph protection (Subsect. 1.2). In Sect. 2 we propose two differentially private algorithms for dynamic graph protection. In Sect. 3 we implement the algorithms and describe how they work on real datasets. Lastly, in Sect. 4 we draw conclusions and sketch future work.

1.2 Basic Definitions

For graph randomization, we consider adding noise-graphs as in [16], that is, a simplification of the original definition in [21], assuming that the original graph and the noise graph have the same set of nodes.

We denote by G(V, E) the graph with the set of nodes V and set of edges E.

Definition 1

Let \(G_1(V, E_1)\) and \(G_2(V, E_2)\) be two graphs with the same nodes V; then the addition of \(G_1\) and \(G_2\) is the graph \(G=(V,E)\) where:

$$E = (E_1 \setminus E_2) \cup (E_2 \setminus E_1).$$

We denote G as

$$G = G_1 \oplus G_2.$$
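In code, the addition of Definition 1 is just the symmetric difference of the two edge sets over a common node set. A minimal sketch (our own illustration, not from the paper; a graph is represented by its set of edges, each edge a pair of node labels):

```python
def graph_addition(edges1, edges2):
    """Edge set of G1 (+) G2: the symmetric difference of E1 and E2.
    Edges present in both graphs cancel out."""
    return (edges1 - edges2) | (edges2 - edges1)
```

Note that \(G \oplus G' \oplus G' = G\): adding the same noise graph twice recovers the original graph, since the symmetric difference is an involution.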

We will add noise using the Gilbert model, denoted by \(\mathcal{G}(n,p)\): there are n nodes and each edge is chosen independently with probability p. The Gilbert and the Erdős-Rényi random graph models are the most common and general in the literature, and it has been proved in [1] that they are asymptotically equivalent. So, to add noise to a graph G, we will draw a random graph \(G'\) from the Gilbert model (i.e., \(G' \in \mathcal{G}(n,p)\)) and add it to G, to obtain \(\tilde{G} = G\oplus G'\).
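This noise-addition step can be sketched as follows (our own illustration under the same edge-set representation as above; node labels are assumed to be \(0, \ldots, n-1\)):

```python
import random
from itertools import combinations

def gilbert(n, p, rng):
    """Sample a graph from the Gilbert model G(n, p): each of the
    n*(n-1)/2 possible edges is included independently with probability p."""
    return {(u, v) for u, v in combinations(range(n), 2) if rng.random() < p}

def perturb(edges, n, p, rng):
    """G~ = G (+) G' with G' drawn from G(n, p), via symmetric difference."""
    noise = gilbert(n, p, rng)
    return (edges - noise) | (noise - edges)
```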

Now, we can define the general noise-graph mechanism [15] that we will use.

Definition 2

(Noise-graph mechanism). For any graph G with n nodes, and two probabilities \(p_0\) and \(p_1\), we define the following noise-graph mechanism:

$$\begin{aligned} \mathcal {A}_{p_0, p_1}(G) = G \oplus G_0 \oplus G_1, \end{aligned}$$
(1)

where \(G_0\) and \(G_1\) are such that:

$$\begin{aligned}&G_0 = G' \setminus G \text{ for } G'\in \mathcal{G}(n, 1-p_0)\\&G_1 = G''\cap G \text{ for } G'' \in \mathcal{G}(n,1-p_1). \end{aligned}$$
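Unfolding Definition 2 edge by edge: an existing edge of G survives with probability \(p_1\) (it is removed via \(G_1\) with probability \(1-p_1\)), and a non-edge stays absent with probability \(p_0\) (it is added via \(G_0\) with probability \(1-p_0\)). A per-pair sketch of the mechanism (our own illustration, with graphs as edge sets over nodes \(0, \ldots, n-1\)):

```python
import random
from itertools import combinations

def noise_graph_mechanism(edges, n, p0, p1, rng):
    """A_{p0,p1}(G): keep each existing edge with probability p1, and keep
    each non-edge absent with probability p0 (flip it with prob. 1 - p0)."""
    out = set()
    for pair in combinations(range(n), 2):
        if pair in edges:
            if rng.random() < p1:       # existing edge is kept
                out.add(pair)
        elif rng.random() >= p0:        # non-edge flips to an edge
            out.add(pair)
    return out
```

With \(p_0 = p_1 = 1\) the mechanism is the identity, and with \(p_0 = p_1 = 0\) it returns the complement graph.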

Definition 3

(Stochastic matrix associated to the noise graph). The probabilities of randomization of an edge or a non-edge in a graph G after applying the noise-graph mechanism \(\mathcal {A}_{p_0, p_1}\) are represented by the following stochastic matrix:

$$\begin{aligned} P = P(\mathcal {A}_{p_0, p_1})= \begin{pmatrix} p_0 &{} 1 - p_0 \\ 1 - p_1 &{} p_1 \end{pmatrix} \end{aligned}$$
(2)

2 Dynamic Graphs

Considering that the relations in a dynamic network may persist or disappear over time, a basic model that accounts for the rates of appearance and disappearance of edges in a graph over a period of time was proposed in [22].

Formally, the network is observed at an initial state \(G_0\) at time \(t = 0\), and for every snapshot \(G_t\) each node pair not connected by an edge at the previous snapshot gains an edge with probability \(\alpha \), or not with probability \(1 - \alpha \). Similarly each existing edge disappears with probability \(\beta \) or not with probability \(1 - \beta \), from one snapshot to the next.

We denote the set of all dynamic graphs with n nodes V and T timestamps as \(\mathcal {G} = \{(G_0, G_1 \ldots G_T): G_t = G_t(V, E_t) \text{ for } t=0, \ldots T\}\).

Thus, we can formally define a dynamic network graph model G iteratively by considering the initial state \(G_0\) and applying the noise-graph mechanism \(\mathcal {A}_{p_0, p_1}\) to \(G_{i-1}\) to obtain \( G_i\), where \(p_0 = 1- \alpha \) and \(p_1 = 1 - \beta \), for \(i = 1, \ldots T\).

Definition 4

(Dynamic-network-graph-model). The dynamic network graph model consists of an initial state \(G_0\) and states \(G_t\), for \(t = 1, \ldots T\), defined by:

$$ G_t = \mathcal {A}_{1-\alpha , 1-\beta }(G_{t-1})$$

We will denote it as: \(G(G_0, T, \alpha , \beta )\).
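The evolution rule behind Definition 4 can be simulated directly: at each step, every non-edge gains an edge with probability \(\alpha\) and every existing edge disappears with probability \(\beta\). A self-contained sketch (our own illustration; the edge-set representation over nodes \(0, \ldots, n-1\) is our assumption):

```python
import random
from itertools import combinations

def evolve(edges, n, alpha, beta, rng):
    """One step of the basic dynamic model: non-edges appear with
    probability alpha, existing edges disappear with probability beta."""
    nxt = set()
    for pair in combinations(range(n), 2):
        if pair in edges:
            if rng.random() >= beta:    # edge survives with prob. 1 - beta
                nxt.add(pair)
        elif rng.random() < alpha:      # non-edge gains an edge
            nxt.add(pair)
    return nxt

def dynamic_network_model(g0, n, T, alpha, beta, rng):
    """G(G0, T, alpha, beta): the T+1 snapshots G_0, ..., G_T."""
    snaps = [set(g0)]
    for _ in range(T):
        snaps.append(evolve(snaps[-1], n, alpha, beta, rng))
    return snaps
```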

Conversely, if we have a series of snapshots \(G_0, G_1, \dots , G_T\) of a graph that evolves with time, and we know that it follows the basic model of dynamic graphs, then we can estimate \(\alpha \) and \(\beta \) from these snapshots. The expressions to compute the parameters from the adjacency matrices are given in [22].
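The exact closed-form expressions are in [22]; the natural counting estimators behind them look as follows (our own sketch): \(\hat{\alpha}\) is the fraction of non-edges that become edges in the next snapshot, and \(\hat{\beta}\) the fraction of edges that disappear.

```python
from itertools import combinations

def estimate_rates(snaps, n):
    """Estimate (alpha, beta) from consecutive snapshot pairs by counting
    0 -> 1 transitions among non-edges and 1 -> 0 transitions among edges."""
    gains = losses = non_edges = edges = 0
    for prev, curr in zip(snaps, snaps[1:]):
        for pair in combinations(range(n), 2):
            if pair in prev:
                edges += 1
                losses += pair not in curr
            else:
                non_edges += 1
                gains += pair in curr
    alpha_hat = gains / non_edges if non_edges else 0.0
    beta_hat = losses / edges if edges else 0.0
    return alpha_hat, beta_hat
```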

2.1 Differential Privacy for Dynamic Graphs

We adapt the definition of local differential privacy from [6] to apply specifically to edges in a graph. Edge differential privacy concerns the presence or absence of any single edge, while local differential privacy concerns the local randomization of each individual output. We combine both definitions for dynamic graphs so as to consider the edges in any of the graphs (snapshots) of the dynamic graph.

Definition 5

(Local differential privacy). [6] A randomized algorithm \(\pi \), satisfies \(\varepsilon \)-local differential privacy if for all inputs \(x, x'\) and all outputs \(y \in Range(\pi )\):

$$\begin{aligned} P(\pi (x) = y) \le e^{\varepsilon }P(\pi (x') = y) \end{aligned}$$
(3)

We denote by \(\mathbb {1}_{uv(t)}\) the indicator function of edge uv in \(G_t\), that is \(\mathbb {1}_{uv(t)} = 1\) if \(uv \in E_t\), and \(\mathbb {1}_{uv(t)} = 0\) otherwise. Similarly, \(\mathbb {1}_{\mathcal {A}(uv(t))}\) is the indicator function of edge uv in \(\mathcal {A}(G_t)\), the randomized graph.

Definition 6

(Edge-local differential privacy for dynamic graphs). An edge randomization algorithm \(\mathcal {A}: \mathcal {G} \rightarrow \mathcal {G}\), satisfies \(\varepsilon \)-edge local differential privacy if for every pair of nodes \(u, v \in V\), any timestamp \(t \in \{1, \ldots T\}\) and \(x,x',y \in \{0, 1\}\):

$$\begin{aligned} P ( \mathbb {1}_{\mathcal {A}(uv(t))} = y \mid \mathbb {1}_{uv(t)} = x ) \le e^{\varepsilon } P( \mathbb {1}_{\mathcal {A}(uv(t))} = y \mid \mathbb {1}_{uv(t)} = x' ) , \end{aligned}$$
(4)

we say that \(\mathcal {A}\) is \(\varepsilon \)-edge locally differentially private (\(\varepsilon \)-eLDP).

Observe that local differential privacy at the edge level requires the probability of presence or absence of every edge in the protected graph to be nearly the same (within a factor of \(e^{\varepsilon }\)) regardless of whether the edge was present in the original graph.

Thus, Definition 6 can be obtained from Definition 5, considering that the inputs \(x, x'\) represent whether any edge uv is present in the snapshot-graph \(G_t\) or not, and output y is the presence or absence of the same edge in the randomized graph.

2.2 Protection Mechanisms for Dynamic Graphs

Considering that a dynamic graph can be modelled with the dynamic-network-graph-model from Definition 4, it is natural to use it as a first approach to protect dynamic graphs. Thus, we show that the dynamic random graph model \(G(G_0, T, p_0, p_1)\) is edge-differentially private for specific parameters \(p_0\) and \(p_1\).

Additionally, we define the parallel protection mechanism that adds noise to each snapshot of the dynamic graph, and therefore it may have a better utility.

Definition 7

(Dynamic-network-mechanism). Let \(G = (G_0, G_1 \ldots G_T)\) be a dynamic graph. We define the protected dynamic graph \(G' = (G'_0, G'_1 \ldots G'_T)\) by letting:

$$G'_0 = \mathcal {A}_{p_0, p_1}(G_0) \text{ and } G'_i = \mathcal {A}^{i+1}_{p_0, p_1}(G_0), \text{ for } i = 1, \ldots , T.$$

That is, the dynamic-network-mechanism is:

$$\mathcal {D}_{p_0, p_1}(G) = G(G'_0, T, 1-p_0, 1-p_1).$$

Remark 1

(Randomization probabilities matrix). The probabilities of randomization for the dynamic network mechanism \(\mathcal {D}_{p_0, p_1}(G)\) are calculated by the powers \(t=1, \ldots , T+1\) of the stochastic matrix P in (2); we denote them as:

$$\begin{aligned} P^t = \begin{pmatrix} p_{0} &{} 1 - p_{0} \\ 1 - p_{1} &{} p_{1} \end{pmatrix}^t = \begin{pmatrix} p_{00}[t] &{} p_{01}[t] \\ p_{10}[t] &{} p_{11}[t] \end{pmatrix} \end{aligned}$$
(5)

Note that \(p_{xy}[t]\) corresponds to: \(P( \mathbb {1}_{\mathcal {A}(uv(t))} = y \mid \mathbb {1}_{uv(t)} = x )\), with \(x,y \in \{0,1\}\).
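The entries \(p_{xy}[t]\) can be computed by iterated 2x2 multiplication; a small sketch of Remark 1 (our own illustration, in plain Python without a linear-algebra library):

```python
def randomization_probs(p0, p1, t):
    """Return P^t as a 2x2 list of lists; entry [x][y] is p_xy[t]."""
    P = [[p0, 1.0 - p0], [1.0 - p1, p1]]
    M = [[1.0, 0.0], [0.0, 1.0]]            # identity, i.e. P^0
    for _ in range(t):
        M = [[sum(M[i][k] * P[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
    return M
```

Since P is stochastic, every power \(P^t\) is stochastic as well: each row of the result sums to 1.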

Theorem 1

The mechanism \(\mathcal {D}_{p_0, p_1}\) is \(\varepsilon \)-eLDP if

$$\begin{aligned} e^{\varepsilon } \ge \max _{t=1,\ldots T+1} \left\{ \frac{p_{10}[t]}{p_{00}[t]}, \frac{p_{11}[t]}{p_{01}[t]}, \frac{p_{00}[t]}{p_{10}[t]}, \frac{p_{01}[t]}{p_{11}[t]} \right\} \end{aligned}$$
(6)

See proof on page 13.
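The check in Theorem 1 can be sketched as follows (our own illustration, assuming \(0< p_0, p_1 < 1\) so all matrix entries are positive): scan the powers \(P^t\) for \(t = 1, \ldots, T+1\), take the largest of the four ratios, and the mechanism is \(\varepsilon \)-eLDP for any \(\varepsilon \) at least the logarithm of that maximum.

```python
import math

def eps_dynamic(p0, p1, T):
    """Smallest epsilon for which inequality (6) holds for D_{p0,p1}.
    Assumes 0 < p0, p1 < 1, so all entries of P^t are positive."""
    P = [[p0, 1.0 - p0], [1.0 - p1, p1]]
    M = [[1.0, 0.0], [0.0, 1.0]]
    worst = 1.0
    for _ in range(T + 1):                  # powers t = 1, ..., T+1
        M = [[sum(M[i][k] * P[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
        for y in range(2):
            worst = max(worst, M[0][y] / M[1][y], M[1][y] / M[0][y])
    return math.log(worst)
```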

Lemma 1

Assume that the following inequality holds:

$$\begin{aligned} e^{\varepsilon } \ge \max \left\{ \frac{p_{10}}{p_{00}}, \frac{p_{11}}{p_{01}}, \frac{p_{00}}{p_{10}}, \frac{p_{01}}{p_{11}} \right\} \end{aligned}$$
(7)

Then (6) holds, that is:

$$\begin{aligned} e^{\varepsilon } \ge \max _{t=1,\ldots T+1} \left\{ \frac{p_{10}[t]}{p_{00}[t]}, \frac{p_{11}[t]}{p_{01}[t]}, \frac{p_{00}[t]}{p_{10}[t]}, \frac{p_{01}[t]}{p_{11}[t]} \right\} \end{aligned}$$

See proof on page 14.

Corollary 1

The mechanism \(\mathcal {D}_{p_0, p_1}\) is \(\varepsilon \)-eLDP if

$$\begin{aligned} e^{\varepsilon } \ge \max \left\{ \frac{p_{10}}{p_{00}}, \frac{p_{11}}{p_{01}}, \frac{p_{00}}{p_{10}}, \frac{p_{01}}{p_{11}} \right\} \end{aligned}$$
(8)

Theorem 1 from [17] provides a complete characterization of the values of the probabilities \((p_{00}, p_{11})\) for which this inequality holds; hence \((p_{00}, p_{11})\) can be parameterized depending on the \(\varepsilon \) required for protection.
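We do not reproduce the full characterization from [17] here, but one simple symmetric choice satisfying (8) is the randomized-response setting \(p_{00} = p_{11} = e^{\varepsilon }/(1+e^{\varepsilon })\), for which the largest ratio is exactly \(e^{\varepsilon }\) (our own illustration):

```python
import math

def symmetric_probs(eps):
    """Randomized-response style choice p00 = p11 meeting bound (8)
    with equality: p / (1 - p) = e^eps."""
    p = math.exp(eps) / (1.0 + math.exp(eps))
    return p, p
```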

Let us now consider an alternative protection mechanism. We call it the parallel protection of a dynamic graph. We define it as follows.

Definition 8

(Parallel protection mechanism). Let \(G=G_0, G_1, \dots , G_T\) be a dynamic graph. Let \(\mathcal{A}_{p_0, p_1}\) denote the noise-graph mechanism. Then, we define the parallel protection of the dynamic graph with parameters \(p_0\) and \(p_1\) as the protection process that provides \(\tilde{G}=\tilde{G}_0, \tilde{G}_1, \dots , \tilde{G}_T\) with \(\tilde{G}_i=\mathcal{A}_{p_0,p_1}(G_i)\) for \(i=0, \dots , T\).

We denote the parallel protection of a dynamic graph G with parameters \(p_0\) and \(p_1\) as \(\mathcal{A}^{||}_{p_0,p_1}(G)\).
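The parallel mechanism simply randomizes every snapshot independently with the same \((p_0, p_1)\). A self-contained sketch, with each snapshot as an edge set and the per-pair unfolding of the noise-graph mechanism (our representation, not the paper's):

```python
import random
from itertools import combinations

def noise_graph(edges, n, p0, p1, rng):
    """A_{p0,p1}: keep each edge with prob. p1; flip each non-edge with prob. 1 - p0."""
    out = set()
    for pair in combinations(range(n), 2):
        if pair in edges:
            if rng.random() < p1:
                out.add(pair)
        elif rng.random() >= p0:
            out.add(pair)
    return out

def parallel_protection(snaps, n, p0, p1, rng):
    """A^{||}_{p0,p1}: apply the noise-graph mechanism to every snapshot."""
    return [noise_graph(s, n, p0, p1, rng) for s in snaps]
```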

Equivalently to Corollary 1, the following can be proven.

Proposition 1

The parallel protection mechanism \(\mathcal{A}^{||}_{p_0,p_1}\) satisfies \(\varepsilon \)-local differential privacy when

$$\begin{aligned} e^{\varepsilon } \ge \max \left\{ \frac{p_{10}}{p_{00}}, \frac{p_{11}}{p_{01}}, \frac{p_{00}}{p_{10}}, \frac{p_{01}}{p_{11}} \right\} \end{aligned}$$

See proof on page 15.

3 Application to Community Detection Algorithms

This section presents an experimental analysis of the previous theoretical claims in an application of the proposed privacy algorithms. We base our utility analysis on community detection algorithms, compared through normalised mutual information.

3.1 Experiment Description

We used two real-life datasets to evaluate the application of the proposed privacy protection algorithms. They are: CAIDA-AS relationship and DBLP datasets. We provide their basic statistics in Table 1.

One dataset is the CAIDA-AS relationship dataset [2], describing Autonomous Systems (AS), which roughly correspond to Internet Service Providers (ISP), and their relationships. We consider the p2p links, which connect two ISPs that have agreed to exchange traffic on a quid pro quo basis. From these data we took the 1-month snapshot graphs for each of the 12 months in 2015.

The computer science bibliography DBLP provides its whole dataset of bibliographical entries in XML format, under the terms of the Open Data Commons Attribution License (ODC-BY 1.0). We use the coauthorship graph from [7], which has 1,482,029 unique authors and 10,615,809 timestamped co-authorship edges between authors. We preprocess the DBLP dataset, considering only the authors that published a paper in each of the years between 2005 and 2013.

Table 1. Preprocessed datasets statistics

The experiments are divided into five parts, that we summarize as follows:

  1. We divide the data into snapshots such that the same vertices appear in every snapshot. In the case of DBLP this is the set of authors that have published at least one paper in each year of the period from 2005 to 2013.

  2. We fix the values of \(p_0, p_1\) as in Table 2. We choose smaller values of \(p_1\) for smaller \(\varepsilon \); otherwise the data would have a huge number of edges. For larger \(\varepsilon \) values we may choose larger \(p_1\), which also yields better utility. Note that the same \(\varepsilon \) can be obtained from several pairs of values \(p_0, p_1\), cf. [17].

  3. We protect the data with our two proposed protection algorithms: the dynamic-network and the parallel mechanisms. We apply each of them five times to obtain the average and confidence intervals of the utility measures.

  4. We detect the community structure on each of the snapshot graphs and compare it to the original community structure without privacy protection. We report the averages and \(95\%\) confidence intervals in the figures.

  5. We measure the density of each snapshot graph and compare them.

Table 2. Values of \(p_{0}\) and \(p_{1}\) to obtain the \(\varepsilon \) in the experiments.

Utility Measures: We use community detection algorithms to assess the partitioning or clustering of nodes, as well as their propensity to stick together or disintegrate. Communities make it possible to map a network at a wide scale because they operate as meta-nodes of the network that facilitate analysis. The prediction of missing connections and the detection of fake links in the network are two of the most significant applications of community detection in network research.

NMI, or Normalised Mutual Information, is a metric used to assess how well community detection methods perform network partitioning. It is frequently used due to its clear interpretation and its ability to compare two partitions even when they have different numbers of clusters.

Finally, the graph density is defined as the ratio of the number of edges to the maximum possible number of edges.
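For concreteness, both measures can be sketched as follows (our own illustration; for NMI we use the geometric-mean normalization, one common convention — libraries differ on this choice):

```python
import math
from collections import Counter

def graph_density(num_edges, n):
    """Ratio of edges present to the n*(n-1)/2 possible edges."""
    return num_edges / (n * (n - 1) / 2)

def nmi(labels_a, labels_b):
    """Normalized mutual information between two node partitions,
    given as equal-length lists of community labels."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Mutual information of the joint label distribution.
    mi = sum(c / n * math.log(c * n / (ca[a] * cb[b]))
             for (a, b), c in joint.items())
    # Entropies of each partition.
    ha = -sum(c / n * math.log(c / n) for c in ca.values())
    hb = -sum(c / n * math.log(c / n) for c in cb.values())
    if ha == 0.0 or hb == 0.0:              # degenerate one-cluster case
        return 1.0 if ha == hb else 0.0
    return mi / math.sqrt(ha * hb)          # geometric-mean normalization
```

NMI is 1 when the two partitions coincide up to relabeling, and 0 when they are independent.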

3.2 CAIDA Dataset

In this section, we compare the effects of the dynamic-network and the parallel mechanisms for small and large \(\varepsilon \) values on the NMI and density measures of the CAIDA-AS dataset in Figs. 1 and 2.

Fig. 1.

Normalized mutual information between the communities detected on the CAIDA-AS data and the data protected with the dynamic and parallel mechanisms for several \(\varepsilon \) values.

In Fig. 1, we notice that for \(\varepsilon = 20\) the NMI of both algorithms is almost 1, which means that the communities discovered over the protected data are almost the same as the original ones. Additionally, the NMI in the dynamic mechanism tends to decrease as the timestamp increases, whereas the parallel mechanism does not have this effect. For smaller values of \(\varepsilon \), as the protection is stronger, the difference between the NMI values of the two mechanisms is small. This may be explained by the larger densities obtained for small \(\varepsilon \) values in Fig. 2 (a) and (b). In contrast, in Fig. 2 (c) and (d), we note that for large \(\varepsilon \) values the densities of the protected snapshot graphs are similar to the original density, and tend to it as the \(\varepsilon \) value increases.

Fig. 2.

Densities for the snapshot-graphs obtained by applying the dynamic and parallel mechanisms to CAIDA-AS.

3.3 DBLP Dataset

We compare the effects of the dynamic-network and the parallel mechanisms for small and large \(\varepsilon \) values on the NMI and density measures of the DBLP dataset in Figs. 3 and 4. Fig. 3 shows that for larger \(\varepsilon \) the communities detected on the protected graph are similar to the original communities, since the NMI is around 0.9. It can also be noted that the parallel mechanism achieves better NMI than the dynamic-network mechanism for large \(\varepsilon \) values. Additionally, the dynamic-network mechanism’s NMI decreases over time, whereas the parallel mechanism’s does not. For smaller \(\varepsilon \) values the performance of both mechanisms is similar.

Fig. 3.

Normalized mutual information between the communities detected on the DBLP data and the data protected with the dynamic and parallel mechanisms for several \(\varepsilon \) values.

In Fig. 4 (a) and (b), we show the effect of both mechanisms on the densities of the graph for small \(\varepsilon \) values. We notice that for \(\varepsilon = 2\) the density of the protected snapshots is near 0.007, which means that they have around 4,529,999 edges; this is 100 times the original average snapshot density of 0.00007. In Fig. 4 (c) and (d), we show the effect of both mechanisms on the densities of the graph for large \(\varepsilon \) values. An increase in density means that more noise edges have been created than real edges have been erased. Again, the parallel mechanism incurs a lower increase in density than the dynamic one. Moreover, the increase in density for \(\varepsilon = 10\) is steeper for the dynamic mechanism than for the rest of the \(\varepsilon \) values.

Fig. 4.

Densities for the snapshot-graphs obtained by applying the dynamic and parallel mechanisms to DBLP.

4 Conclusions and Future Scope

We proposed two protection methods for dynamic graphs: the dynamic-network and the parallel protection mechanisms. We extended the definition of local differential privacy for edges to dynamic graphs. We showed that both our proposed methods are \(\varepsilon \)-edge locally differentially private for specific values of the randomization probabilities \(p_0\) and \(p_1\) of the noise-graph mechanism. We performed an empirical analysis of these algorithms, showing that they keep the community structure of the dynamic graphs while protecting their edges with local differential privacy.

In this work we focused only on edge privacy with a fixed set of nodes, which extends the \(\varepsilon \)-edge local differential privacy notion. There is still room to consider changing nodes, as well as simultaneous changes of edges and nodes. We would also like to extend this notion of privacy towards graph neural networks and federated learning. We plan to extend the empirical analysis to other graph utility metrics and other definitions of dynamic graphs.