Distribution and Dependence of Extremes in Network Sampling Processes

We explore the dependence structure in the sampled sequences of complex networks. We consider randomized algorithms to sample the nodes and study extremal properties of any associated stationary sequence of characteristics of interest, such as node degrees, number of followers, or income of the nodes in online social networks, that satisfies two mixing conditions. Several useful extremes of the sampled sequence, such as the kth largest value, clusters of exceedances over a threshold, and the first hitting time of a large value, are investigated. We abstract the dependence and the statistics of extremes into a single parameter that appears in Extreme Value Theory, called the extremal index (EI). In this work, we derive this parameter analytically and also estimate it empirically. We propose the use of EI as a parameter to compare different sampling procedures. As a specific example, degree correlations between neighboring nodes are studied in detail with three prominent random walks as sampling techniques.


Introduction
Data from real networks show that correlations exist in various forms, for instance through social relationships and shared interests in social networks. Degree correlations between neighbors, correlations in income, numbers of followers of users, and numbers of likes of specific pages in social networks are some examples, to name a few. These kinds of correlations have several implications for network structure; for example, degree-degree correlations manifest themselves in the assortativity or disassortativity of the network [4].
We consider very large networks where it is impractical to have a complete picture a priori. Crawling or sampling techniques are used in practice to explore such networks by making use of API calls or HTML scraping. We look into randomized sampling techniques which generate stationary samples. As an example, random walk based algorithms are used in many cases because of the several advantages they offer [3,7].
We focus on the extremal properties of the correlated and stationary sequence of characteristics of interest X_1, . . ., X_n, which is a function of the node sequence, the one actually generated by the sampling algorithms. The characteristics of interest can be, for instance, node degrees, node income, or the number of followers of a node in an OSN. Among these properties, clusters of exceedances of such sequences over high thresholds are studied in particular. A cluster of exceedances is determined as the consecutive exceedances of {X_n} over the threshold {u_n} between two consecutive non-exceedances [10,17]. It is important to investigate the stochastic nature of extremes, since it allows us to disseminate advertisements or collect opinions more effectively within the clusters.
The dependence structure of the sampled sequence exceeding sufficiently high thresholds is measured using a parameter of Extreme Value Theory called the extremal index (EI), θ. It is defined as follows.
Definition 1. [13, p. 53] The stationary sequence {X_n}_{n≥1}, with M_n = max{X_1, . . ., X_n} and F as the marginal distribution function, is said to have the extremal index θ ∈ [0, 1] if for each 0 < τ < ∞ there is a sequence of real numbers (thresholds) u_n = u_n(τ) such that

lim_{n→∞} n(1 − F(u_n)) = τ and lim_{n→∞} P(M_n ≤ u_n) = e^{−θτ}.    (1)

The maxima M_n is related to EI more explicitly as [5, p. 381]

P(M_n ≤ u_n) = F^{nθ}(u_n) + o(1).    (2)

When {X_n}_{n≥1} is i.i.d. (for instance, under uniform independent node sampling), θ = 1 and the point process of exceedances over the threshold u_n converges weakly to a homogeneous Poisson process [5, Chapter 5]. But when 0 ≤ θ < 1, the point process of exceedances converges weakly to a compound Poisson process, which implies that exceedances of high threshold values u_n tend to occur in clusters for dependent data [5, Chapter 10]. EI has many useful interpretations and applications, such as:

• Finding the distribution of order statistics of the sampled sequence. These can be used to find quantiles and to predict the kth largest value which arises with a certain probability. Specifically, for the distribution of the maxima, (2) is available, and the quantile of the maxima depends monotonically on EI. Hence, for samples with lower EI, lower values of the maxima can be expected. When the sampled sequence is the sequence of node degrees, this yields many useful results.
• Close relation of extremal index to the distribution and expectation of the size of clusters of exceedances.
• The first hitting time of the sampled sequence to (u_n, ∞) is related to EI. Thus, in applications where the aim is to detect large values of samples quickly, knowing only the joint and marginal distributions of the sampled sequence, and without actually employing sampling (which might be very costly), we can compare different sampling procedures by EI: a smaller EI leads to a longer search for the first hitting time.
These interpretations are explained later in the paper.
The main contributions of this work are as follows. We study the extremal and clustering properties of sampling processes due to correlations in large graphs. In order to facilitate a painless future study of correlations and clusters of samples in large networks, we propose to abstract the extremal properties into a single, handy parameter, EI. Analytical procedures are derived to calculate EI for any general stationary sampled sequence satisfying two mixing conditions. Degree correlations are treated in detail with a random graph model in which joint degree correlations exist between neighbor nodes. Three different random walk based algorithms that are widely discussed in the literature (see [3] and the references therein) are then revised for the degree state space, and EI is calculated when the joint degree correlation is bivariate Pareto distributed. We establish a general lower bound for EI in PageRank processes, irrespective of the degree correlation model. Finally, two estimation techniques for EI are provided, and EI is numerically computed for a synthetic graph with correlated neighbor degrees and for two real networks (Enron email network and DBLP network). Several useful applications of EI for analyzing large graphs, known only through sampled sequences, are proposed.
The paper is organized as follows. In Section 2, methods to derive EI are presented. Section 3 considers the case of degree correlations. In Section 3.1, the graph model and the correlated graph generation technique are presented. Section 3.2 explains the different types of random walks studied and derives the associated transition kernels and joint degree distributions. EI is calculated for the different sampling techniques in Section 3.3. In Section 4, we provide several applications of the extremal index in graph sampling techniques. In Section 5, we estimate the extremal index and perform numerical comparisons. Finally, Section 6 concludes the paper.

Calculation of Extremal Index (EI)
We consider networks represented by an undirected graph G with N vertices and M edges. Since the networks under consideration are huge, we assume that it is impossible to describe them completely, i.e., that no adjacency matrix is available beforehand. We assume that some randomized sampling procedure is employed, and we let the sampled sequence {X_i} be any general sequence derived from it.
This section explains a way to calculate extremal index from the bivariate distribution if the sampled sequence admits two mixing conditions.
Condition 1 (D(u_n)). For any integers p, q and indices 1 ≤ i_1 < . . . < i_p < j_1 < . . . < j_q ≤ n such that j_1 − i_p ≥ l_n, we have

|P(X_{i_1} ≤ u_n, . . ., X_{i_p} ≤ u_n, X_{j_1} ≤ u_n, . . ., X_{j_q} ≤ u_n) − P(X_{i_1} ≤ u_n, . . ., X_{i_p} ≤ u_n) P(X_{j_1} ≤ u_n, . . ., X_{j_q} ≤ u_n)| ≤ α_{n,l_n},

where α_{n,l_n} → 0 for some sequence l_n = o(n) as n → ∞.

Condition 2 (D''(u_n)).

lim_{n→∞} n Σ_{j=3}^{r_n} P(X_1 > u_n ≥ X_2, X_j > u_n) = 0,

where (n/r_n) α_{n,l_n} → 0 and l_n/r_n → 0, with α_{n,l_n}, l_n as in Condition 1 and r_n as o(n).
Proposition 1. If the sampled sequence is stationary and satisfies conditions D(u_n) and D''(u_n), then the extremal index is given by

θ = lim_{n→∞} P(X_2 ≤ u_n | X_1 > u_n),    (3)

and 0 ≤ θ ≤ 1.
Proof. From [14], for the stationary sequence {X_n} satisfying conditions D(u_n) and D''(u_n), the extremal index equals

θ = lim_{n→∞} P(X_2 ≤ u_n | X_1 > u_n).

The existence of EI in [0, 1] is evident from the definition used in this proof.

Remark 1. The condition D''(u_n) can be weakened to the condition D^{(k)}(u_n) presented in [8], where r_n is defined as in D''(u_n). If D^{(k)}(u_n) is satisfied for some k ≥ 2 along with D(u_n), then, following the proof of Proposition 1, EI can be derived as

θ = lim_{n→∞} P(X_2 ≤ u_n, . . ., X_k ≤ u_n | X_1 > u_n).

In some cases it is easier to work with the joint tail distribution. The survival copula Ĉ(·, ·) corresponds to the joint tail via P(X_1 > x, X_2 > y) = Ĉ(F̄(x), F̄(y)), with F̄ = 1 − F. The lower tail dependence function of the survival copula is defined as [21]

λ(u_1, u_2) = lim_{t→0} Ĉ(t u_1, t u_2)/t.

Hence, by (3), θ = 1 − Ĉ'(0, 0), where the derivative is taken along the diagonal, and Ĉ'(0, 0) = λ(1, 1). λ can be calculated for different copula families. In particular, if Ĉ is a bivariate Archimedean copula, then it can be represented as

Ĉ(u_1, u_2) = ψ^{-1}(ψ(u_1) + ψ(u_2)),

where ψ is the generator function. If ψ^{-1} is regularly varying with index −β, then λ(u_1, u_2) = (u_1^{−1/β} + u_2^{−1/β})^{−β} and (X_1, X_2) has a multivariate regularly varying distribution [21]. Therefore, for the Archimedean copula family, EI is given by

θ = 1 − λ(1, 1) = 1 − 2^{−β}.

As an example, for a bivariate Pareto distribution of the form P(X_1 > x, X_2 > y) = (1 + x + y)^{−γ}, x, y ≥ 0, the survival copula is Archimedean with ψ^{-1}(s) = (1 + s)^{−γ}, and hence

θ = 1 − 2^{−γ}.    (4)

Check of conditions D(u_n) and D''(u_n)

If the sampling technique is based on a Markov chain and the sampled sequence consists of measurable functions of stationary Markov samples, then such a sequence is stationary, and [20] proved that another mixing condition, AIM(u_n), which implies D(u_n), is satisfied. Condition D''(u_n) allows clusters with consecutive exceedances and eliminates the possibility of clusters with upcrossings of the threshold u_n (X_i ≤ u_n < X_{i+1}). Hence, in those cases where it is tedious to check the condition D''(u_n) theoretically, we can use numerical procedures that measure the ratio of the number of consecutive exceedances to the number of exceedances, and the ratio of the number of upcrossings to the number of consecutive exceedances, in small intervals. Such an example is provided in Section 3.3.
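For the bivariate Pareto example above, the copula-based formula can be sanity-checked numerically. The following sketch (our illustration, not part of the original derivation) evaluates the survival copula Ĉ(u, v) = (u^{−1/γ} + v^{−1/γ} − 1)^{−γ} near the origin along the diagonal:

```python
def surv_copula(u, v, gamma):
    # Survival copula of the bivariate Pareto tail P(X1 > x, X2 > y) = (1 + x + y)^(-gamma)
    return (u ** (-1.0 / gamma) + v ** (-1.0 / gamma) - 1.0) ** (-gamma)

def extremal_index(gamma, t=1e-9):
    # theta = 1 - lambda(1, 1); lambda(1, 1) is approximated by C_hat(t, t) / t for small t
    return 1.0 - surv_copula(t, t, gamma) / t
```

For gamma = 1.2 this returns a value close to 1 − 2^{−1.2} ≈ 0.565, the theoretical EI used for the synthetic graph in Section 5.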
Remark 2. The EI derived in [9] leads to the same expression as in (3). But [9] assumes that {X_n} is sampled from a first-order Markov chain. This assumption is much stricter than D(u_n) and D''(u_n), which we use to derive (3). For instance, the degrees of node samples obtained from Markov chain based sampling mostly do not form a Markov chain, since the node-degree relation is not one-to-one, while D(u_n) holds in such a case and D''(u_n) can also be satisfied; see Section 3.3 for an example.

Degree correlations
The techniques established in Section 2 are very general: they apply to any sampling technique and any sequence of samples satisfying the stated conditions. In this section, we illustrate the calculation of the extremal index for correlations among degrees. We introduce different sampling techniques throughout this section, though they can be used for any general correlations. We denote the sampled sequence {X_i} by {D_i} in this section.

Description of the model
We take into account the correlation in degrees between neighbor nodes. The dependence structure in the graph is described by the joint degree-degree probability density function f(d_1, d_2), with d_1 and d_2 indicating the degrees of adjacent nodes, or equivalently by the corresponding tail distribution function, with D_1 and D_2 representing the corresponding degree random variables (see, e.g., [4,6,12]).
The probability that a randomly chosen edge has end vertices with degrees around d_1 and d_2, with d_1 ≠ d_2, is approximately 2 f(d_1, d_2) dd_1 dd_2. The multiplying factor 2 appears in the above expression when d_1 ≠ d_2, due to the undirected nature of the underlying graph and the fact that both f(d_1, d_2) and f(d_2, d_1) contribute to the edge probability under consideration.
The degree density f_d(d_1) can be calculated from the marginal of f(d_1, d_2) as

f_d(d_1) ≈ (E[D]/d_1) f(d_1),    (5)

where E[D] denotes the mean node degree, and f(d_1) = ∫ f(d_1, d_2) dd_2 can be interpreted as the degree density of a vertex reached by following a randomly chosen edge. The approximation in (5) is obtained as follows: rewriting it as f(d_1) ≈ d_1 f_d(d_1) N/(E[D] N), roughly, d_1 f_d(d_1) N is the number of half edges from nodes with degree around d_1 and E[D] N is the total number of half edges.
From the above description, it can be noted that the knowledge of f(d_1, d_2) is sufficient to describe this random graph model and for its generation.
Most of the results in this paper are derived assuming continuous probability distributions for f(d_1, d_2) and f_d(d_1), because an easy and unique way to calculate the extremal index exists for continuous distributions in our setup (more details in Section 2). Also, the extremal index might not exist for many discrete-valued distributions [13].

Random graph generation
A random graph with a given bivariate joint degree-degree distribution can be generated as follows [19].
1. A degree sequence is generated according to the degree distribution f_d.
2. An uncorrelated random graph is generated with the generated degree sequence using the configuration model [4].
3. Metropolis dynamics is now applied to the generated graph: choose two edges randomly (denoted by the vertex pairs (v_1, w_1) and (v_2, w_2)) and measure the degrees (j_1, k_1) and (j_2, k_2) corresponding to these vertex pairs. Generate a random number y according to the uniform distribution on [0, 1]. If y ≤ min(1, f(j_1, j_2) f(k_1, k_2)/(f(j_1, k_1) f(j_2, k_2))), then remove the selected edges and construct new ones as (v_1, v_2) and (w_1, w_2). Otherwise, keep the selected edges intact. This dynamics will generate the required joint degree-degree distribution. Run the Metropolis dynamics long enough to mix the network.
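A single step of the Metropolis dynamics described above can be sketched as follows (our illustration: target_f is an assumed symmetric joint degree-degree density, and checks against self-loops and multi-edges are omitted for brevity):

```python
import random

def metropolis_rewire_step(edges, degree, target_f):
    # One Metropolis step toward a target joint degree-degree density target_f.
    # edges: list of (u, v) tuples; degree: dict node -> degree.
    (v1, w1), (v2, w2) = random.sample(edges, 2)
    j1, k1 = degree[v1], degree[w1]
    j2, k2 = degree[v2], degree[w2]
    old = target_f(j1, k1) * target_f(j2, k2)
    new = target_f(j1, j2) * target_f(k1, k2)
    if old > 0 and random.random() < min(1.0, new / old):
        # Accept: rewire (v1, w1), (v2, w2) into (v1, v2), (w1, w2)
        edges.remove((v1, w1))
        edges.remove((v2, w2))
        edges.append((v1, v2))
        edges.append((w1, w2))
    return edges
```

Degrees are invariant under the swap, so repeated steps change only the degree-degree correlations, as required.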

Description of random walks
In this section, we explain three different random walk based algorithms for exploring the network. They have been extensively studied in previous works [3,7,16], where they are formulated with the vertex set as the state space of the underlying Markov chain on the graph. The walker in these algorithms, after reaching each node, moves to another node randomly by following the transition kernel of the Markov chain. But since the interest in the present work is in the degree sequence, rather than the node sequence, and its extremal properties, we take the degree set as the state space and find appropriate transition kernels. We use f_X and P_X to represent the probability density function and probability measure under the algorithm X, with the exception that f_d represents the probability density function of degrees.

Random Walk (RW)
In a random walk, the next node to visit is chosen uniformly among the neighbors of the current node. From (5), we approximate the standard random walk on the degree state space by the following transition kernel, the conditional density that the next node has degree d_{t+1} given that the present node has degree d_t:

f_RW(d_{t+1} | d_t) ≈ (1/d_t) · (2 M f(d_t, d_{t+1}))/(N f_d(d_t)) = f(d_t, d_{t+1})/f(d_t).

This approximation is obtained as follows: given that the present node has degree d_t, 1/d_t is the probability of selecting a neighbor uniformly, and the rest of the terms on the R.H.S. represent the mean number of neighbors with degree around d_{t+1}. When d_t ≠ d_{t+1}, 2 M f(d_t, d_{t+1}) is the mean number of edges between degrees about d_t and d_{t+1}, and f_d(d_t) N is the mean number of nodes with degree about d_t; thus their ratio represents the mean number of such edges per node with degree about d_t, i.e., the mean number of neighbors with degree about d_{t+1}. The probability of the other case, d_t = d_{t+1}, is zero, as the degrees are assumed to follow a continuous distribution.
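In practice, the degree sequence studied here is obtained by running the walk on the node set and recording degrees. A minimal sketch (our naming; the adjacency structure is a dict of neighbor lists):

```python
import random

def rw_degree_samples(adj, start, n):
    # Standard random walk: record the current node's degree, then move to a
    # uniformly chosen neighbor; returns the sampled degree sequence D_1..D_n.
    node, out = start, []
    for _ in range(n):
        out.append(len(adj[node]))
        node = random.choice(adj[node])
    return out
```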
If the standard random walk on the vertex set is in the stationary regime, its stationary distribution (the probability of staying at a particular vertex i) is proportional to the degree (see, e.g., [16]) and is given by d_i/(2M). Then, for the standard random walk on the degree set, the stationary probability of staying at any node with degree around d_1 can be approximated as

f_RW(d_1) ≈ (d_1/(2M)) N f_d(d_1) = (d_1/E[D]) f_d(d_1) = f(d_1).

Then, the joint density of the standard random walk is

f_RW(d_1, d_2) = f_RW(d_1) f_RW(d_2 | d_1) = f(d_1, d_2).

Check of the approximation
We provide a comparison of simulated values and theoretical values of the transition kernel of RW in Figure 1. The bivariate Pareto model is assumed for the joint degree-degree tail function of the graph,

P(D_1 > d_1, D_2 > d_2) = (1 + d_1/σ + d_2/σ)^{−γ},    (7)

with scale parameter σ > 0 and tail index γ > 0. (The copula of this model, and hence the EI, does not depend on σ.)

PageRank (PR)

PageRank is a modification of the random walk which, with a fixed probability 1 − c, samples a random node with uniform distribution and, with probability c, follows the random walk transition [7]. Its evolution on the degree state space can be described as follows:

f_PR(d_{t+1} | d_t) = c f_RW(d_{t+1} | d_t) + (1 − c) (1/N) N f_d(d_{t+1}).    (8)

Here 1/N corresponds to uniform sampling on the vertex set, and (1/N) N f_d(d_{t+1}) indicates the net probability of jumping to any of the nodes with degree around d_{t+1}.

Check of the approximation
We provide a consistency check of the approximation derived for the transition kernel by studying the tail behavior of the degree distribution and the PageRank distribution. It is known that, under some strict conditions, for a directed graph, PageRank and In-degree have the same tail exponents [15]. In our formulation in terms of degrees, for an uncorrelated and undirected graph, the PageRank for a given degree d, PR(d), can be approximated from the basic definition as

PR(d) ≈ (1 − c) f_d(d) + c (d/E[D]) f_d(d).

This is a deterministic quantity. We are interested in the distribution of the random variable PR(D), the PageRank of a randomly chosen degree class D. The PageRank PR(d) is also the long-term proportion of time, or the probability, that the PageRank process ends in a degree class with degree d. This can be scaled suitably to provide rank-type information. Its tail distribution is P(PR(D) > x), where D ∼ f_d(·). The PageRank of any vertex inside the degree class d is PR(d)/(N f_d(d)). The distribution of the PageRank of a randomly chosen vertex i, P(PR(i) > x), after appropriate scaling for comparison with the degree distribution, is P(N · PR(i) > d), where d = N x. Now

P(N · PR(i) > d) = P((1 − c) + c D/E[D] > d) = P(D > (E[D]/c)(d − 1 + c)).

This is of the form P(D > A d + B), with A and B appropriate constants, and hence it has the same exponent as the degree distribution tail when the graph is uncorrelated.
There is no convenient expression for the stationary distribution of PageRank, to the best of our knowledge, and it is difficult to come up with an easy-to-handle expression for the joint distribution. Therefore, along with other advantages, we consider another modification of the standard random walk.

Random Walk with Jumps (RWJ)
In this algorithm, we follow a random walk on a modified graph which is a superposition of the given graph and the complete graph on the same vertex set, with weight α/N on each edge of the complete graph, α ∈ [0, ∞] being a design parameter [3]. The algorithm can be shown to be equivalent to choosing, in the PageRank algorithm, a degree-dependent jump probability 1 − c = α/(d_t + α), where d_t is the degree of the present node. The larger the node's degree, the less likely is the artificial jump of the process. This modification makes the underlying Markov chain time reversible, significantly reduces the mixing time, improves the estimation error, and leads to a closed-form expression for the stationary distribution.
The transition kernel on the degree set, following the PageRank kernel with 1 − c = α/(d_t + α), is

f_RWJ(d_{t+1} | d_t) = (d_t/(d_t + α)) f_RW(d_{t+1} | d_t) + (α/(d_t + α)) f_d(d_{t+1}).

The stationary distribution for node i (on the vertex set) is (d_i + α)/(2M + N α), and the equivalent stationary probability density function on the degree set, obtained by collecting all the nodes with the same degree, is

f_RWJ(d_1) = ((d_1 + α)/(E[D] + α)) f_d(d_1),

since 2M/N = E[D]. The stationarity of f_RWJ(d_1) can be verified by plugging the obtained expression into the stationarity condition of the Markov chain and applying (5). Then, the joint density function for the random walk with jumps has the following form:

f_RWJ(d_1, d_2) = (E[D] f(d_1, d_2) + α f_d(d_1) f_d(d_2))/(E[D] + α).
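The RWJ dynamics can be sketched directly from this description: from a node of degree d, jump to a uniformly chosen node with probability α/(d + α); otherwise take a standard random walk step (our illustration; for α = 0 this reduces to RW, and for large α it approaches independent uniform node sampling):

```python
import random

def rwj_degree_samples(adj, nodes, start, n, alpha=1.0):
    # Random walk with jumps: the larger the current degree d, the less likely
    # the artificial jump (probability alpha / (d + alpha)).
    node, out = start, []
    for _ in range(n):
        d = len(adj[node])
        out.append(d)
        if d == 0 or random.random() < alpha / (d + alpha):
            node = random.choice(nodes)      # uniform jump
        else:
            node = random.choice(adj[node])  # neighbor step
    return out
```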

Moreover, the associated tail distribution has a simple form,

F̄_RWJ(d_1, d_2) = (E[D] F̄(d_1, d_2) + α F̄_d(d_1) F̄_d(d_2))/(E[D] + α),

where F̄ and F̄_d denote the joint and marginal tail functions.

Remark 3. Characterizing Markov chain based sampling in terms of degree transitions has some advantages:

• In the different random walk algorithms considered on the vertex set, all the nodes with the same degree have the same stationary distribution. This also implies that it is more natural to formulate the random walk transition in terms of degrees.
• Degree uncorrelatedness in the underlying graph is directly reflected in the joint distribution of the studied sampling techniques. For uncorrelated networks, f(d_1, d_2) = f(d_1) f(d_2), and the joint densities derived above factorize accordingly.

Extremal Index for Degree Correlations among Neighbors
As explained in the Introduction, the extremal index is an important parameter characterizing the dependence and extremal properties of a stationary sequence. We assume that we have waited sufficiently long so that the underlying Markov chains of the three different graph sampling algorithms are in the stationary regime. We use the expression for EI given in Proposition 1. Condition D(u_n) is satisfied as explained in Section 2.1. In order to check D''(u_n), we use the model in (7) and in Figure 1. We collect samples for each of the techniques RW, PR, and RWJ. Intervals of duration 5, 10, 15, and 20 time samples are taken. The ratio of the number of upcrossings to the number of exceedances in these intervals, averaged over 2000 occurrences of each of these intervals and over all interval classes, is 2-7% for all sampling procedures. The averaged ratio of the number of consecutive exceedances to the number of exceedances is 85-90%. These statistics strongly support the validity of condition D''(u_n).

Example-EI for Random Walk sampling
As f_RW(x, y) is the same as f(x, y), for the bivariate Pareto model (7) we have

θ_RW = 1 − 2^{−γ}.

In this case, we can also use the expression given in (4).

Example-EI for Random Walk with Jumps sampling
For the random walk with jumps, we have

F̄_RWJ(x, y) = (E[D] F̄(x, y) + α F̄_d(x) F̄_d(y))/(E[D] + α).

After differentiation and solving, for α ∈ [0, ∞), the extremal index is again

θ_RWJ = 1 − 2^{−γ}.

When α = ∞ (independent uniform node sampling), θ_RWJ = 1. Moreover, this result can be verified using an approximate analysis of the joint tail as x → ∞. The EI is thus calculated to be the same for RW and RWJ for all α ∈ [0, ∞). This interestingly hints, at least for this particular graph model, that the jumps or restarts in RWJ act as perturbations along the RW behavior and do not affect the extremal properties of the sampled sequence. Among the random walk based algorithms considered, RW seems to best reflect the extremal properties of the underlying network, but it faces many practical issues, such as the possibility of getting stuck in a disconnected component, biased estimators, etc. RWJ overcomes such problems and, at the same time, achieves the extremal behavior of RW.

Lower bound of EI of the PageRank
We obtain the following lower bound for EI in the PageRank processes.
Proposition 2. For the PageRank process on the degree state space, irrespective of the degree correlation structure in the underlying graph, the extremal index satisfies θ ≥ 1 − c.

Proof. From [20], the following representation of EI holds for the degree sequence,

θ = lim_{n→∞} P(M_{1,p_n} ≤ u_n | D_1 > u_n),    (11)

where {p_n} is an increasing sequence of positive integers, p_n = o(n) as n → ∞, and M_{1,p_n} = max{D_2, . . ., D_{p_n}}. Let A be the event that the node corresponding to D_2 is selected uniformly among all the nodes, not by following the random walk from the node corresponding to D_1. Then P_PR(A) = 1 − c. Now, with (8),

P_PR(M_{1,p_n} ≤ u_n | D_1 > u_n) ≥ P_PR(M_{1,p_n} ≤ u_n, A | D_1 > u_n)
  (a) = (1 − c) P_PR(M_{1,p_n} ≤ u_n)
  (b) ≈ (1 − c) F^{(p_n − 1)θ}(u_n)
  (c) ≈ (1 − c) (1 − τ/n)^{(p_n − 1)θ},    (12)

where {p_n} is the same sequence as in (11), (a) follows mainly from the observation that, conditioned on A, {M_{1,p_n} ≤ u_n} is independent of {D_1 > u_n}, and (b) and (c) result from the approximations in (2) and (1), respectively. Taking p_n − 1 = n^{1/2}, we have (1 − τ/n)^{n^{1/2} θ} → 1 as n → ∞, and combining (11) and (12) yields θ ≥ 1 − c.

The PageRank transition kernel (8) on the degree state space does not depend on the random graph model in Section 3.1. Hence the derived lower bound of EI holds for any degree correlation model.

Applications of Extremal Index in Network Sampling Processes
This section provides several uses of EI in inferring properties of the sampled sequence. This emphasizes that the analytical calculation and the estimation of EI are practically relevant. The limit of the point process of exceedances, N_n(·), which counts the times, normalized by n, at which {X_i}_{i=1}^n exceeds a threshold u_n, provides many applications of the extremal index. A cluster is considered to be formed by the exceedances in a block of size r_n (r_n = o(n)) in n, with cluster size ξ_n = Σ_{i=1}^{r_n} 1(X_i > u_n), when there is at least one exceedance within r_n. The point process N_n converges weakly to a compound Poisson process (CP) with rate θτ and i.i.d. cluster sizes following the limiting cluster size distribution, under condition (1) and a mixing condition, and the points of exceedances in CP correspond to the clusters [5, Section 10.3]. We call clusters of this kind blocks of exceedances.
The applications below require a choice of the threshold sequence {u_n} satisfying (1). For practical purposes, if a single threshold u is demanded for the sampling budget B, we can fix u = max{u_1, . . ., u_B}.
The applications in this section are explained under the assumption that the sampled sequence is the sequence of node degrees. But the following techniques are very general and can be extended to any sampled sequence satisfying conditions D(u_n) and D''(u_n).

Order statistics of the sampled degrees
The order statistic X_{n−k,n}, the (n − k)th maximum, is related to N_n(·), and thus to θ, through

P(X_{n−k,n} ≤ u_n) = P(N_n((0, 1]) ≤ k),

where on the right-hand side we apply the result of the convergence of N_n to CP [5, Section 10.3.1].

Distribution of Maxima
The distribution of the maxima of the sampled degree sequence can be approximated by (2) as n → ∞. Hence, if the extremal index of the underlying process is known, then from (2) one can approximate the (1 − η)th quantile x_η of the maximal degree M_n as

x_η ≈ F^{-1}((1 − η)^{1/(nθ)}).    (13)

In other words, quantiles can be used to find the maxima of the degree sequence attained with a certain probability.
For a fixed certainty η, x_η increases with θ. Hence, if sampling procedures have the same marginal distribution, the calculation of EI makes it possible to predict how large the achieved values can be: a lower EI indicates a lower value of x_η, and a higher EI a higher x_η. For the random walk example in Section 3.3.1 with the degree correlation model, using (13), we obtain an explicit expression for the (1 − η)th quantile of the maxima M_n. The following example demonstrates the effect of neglecting correlations on the prediction of the largest degree node. The largest degree, under the assumption of a Pareto distribution for the degrees, can be approximated as K N^{1/γ}, with K ≈ 1, N the number of nodes, and γ the tail index of the complementary distribution function of the degrees [2]. For the Twitter graph (recorded in 2012), γ = 1.124 for the out-degree distribution and N = 537,523,432 [11]. This gives the largest-degree prediction 59,453,030, whereas the actual largest out-degree is 22,717,037. This difference arises because the analysis in [2] assumes i.i.d. samples and does not take into account the degree correlations. With the knowledge of EI, the correlation can be taken into account as in (2).
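The i.i.d. prediction K N^{1/γ} used in this example is a one-line computation (our sketch, with K = 1):

```python
def predicted_max_degree(n_nodes, gamma, k=1.0):
    # Largest-degree estimate K * N^(1/gamma) under i.i.d. Pareto-tailed degrees
    return k * n_nodes ** (1.0 / gamma)
```

For N = 537,523,432 and γ = 1.124 this gives a prediction on the order of 6 × 10^7, far above the observed largest out-degree, illustrating the bias of ignoring correlations.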

Relation to first hitting time and interpretations
The extremal index also gives information about the first time {X_n} hits (u_n, ∞). Let T_n be this time epoch. As N_n converges to a compound Poisson process, it can be observed that T_n/n is asymptotically an exponential random variable with rate θτ, i.e., lim_{n→∞} P(T_n/n > x) = exp(−θτ x). Therefore, lim_{n→∞} E[T_n/n] = 1/(θτ). Thus, the smaller the EI, the longer it takes to hit the extreme levels compared to independent sampling. This property can be used to compare different sampling procedures.
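This exponential limit is easy to illustrate by simulation in the i.i.d. case (θ = 1), where the first exceedance time of a level u_n with n(1 − F(u_n)) = τ is geometric with mean n/τ (our sketch):

```python
import random

def mean_scaled_hitting_time(n, tau, trials=5000, seed=0):
    # Empirical E[T_n / n] for i.i.d. uniform samples and threshold u_n = 1 - tau/n;
    # the compound Poisson limit predicts 1 / (theta * tau), with theta = 1 here.
    rng = random.Random(seed)
    u = 1.0 - tau / n
    total = 0
    for _ in range(trials):
        t = 1
        while rng.random() <= u:  # no exceedance yet
            t += 1
        total += t
    return total / (trials * n)
```

With n = 1000 and τ = 2 the result is close to 1/(θτ) = 0.5; a sampling procedure with θ < 1 stretches this mean by the factor 1/θ.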

Relation to mean cluster size
If condition D''(u_n) is satisfied along with D(u_n), then asymptotically a run of consecutive exceedances following an upcrossing is observed; i.e., {X_n} crosses the threshold u_n at some time epoch, stays above u_n for some time before crossing u_n downwards, and then stays below it until the next upcrossing of u_n happens. This is called a cluster of exceedances and is more practically relevant than the blocks of exceedances defined at the beginning of this section; it is shown in [14] that the two definitions of clusters are asymptotically equivalent, resulting in similar cluster size distributions. The expected size of a cluster of exceedances converges to the inverse of the extremal index [5, p. 384], i.e.,

lim_{n→∞} Σ_{j≥1} j π_n(j) = θ^{−1},

where {π_n(j), j ≥ 1} is the distribution of the size of the cluster of exceedances with n samples. More details about the cluster size distribution and its mean can be found in [17].

Relation to Assortativity Coefficient
The assortativity coefficient ρ captures degree-degree dependencies in networks [4]. Its value lies in [−1, 1]. For uncorrelated networks it is zero; when there is high correlation between high-degree nodes or between low-degree nodes it is close to 1; and in the case of strong dependency in the other direction, low degree with high degree, it is close to −1. From our observations, we found that EI is related to 1 − |ρ|: EI represents the correlation structure, while ρ reflects not just the correlation but also its direction. The use of ρ is limited, however, as it is mainly introduced for degree correlations, whereas EI is more general and more useful, since it has many more interpretations than merely measuring degree correlation.

Estimation of Extremal Index and Numerical results
This section introduces two estimators for EI. Two types of networks are considered: a synthetic correlated graph and real networks (Enron email network and DBLP network). For the synthetic graph, we compare the estimated EI to its theoretical value. For the real networks, we calculate EI using the two estimators. We take {X_i} as the degree sequence and use the random walk as the sampling technique. The methods mentioned in the following are general and are not specific to the degree sequence or to the random walk technique.

Empirical Copula based estimator
We have tried different estimators of EI available in the literature [5,9] and found that the idea of estimating the copula and then finding the value of its derivative at (1, 1) works better, without the need to choose and optimize the several parameters found in other estimators. We assume that {X_i} satisfies D(u_n) and D''(u_n), and we use (3) for the calculation of EI. The copula C(u, v) is estimated empirically by

Ĉ_k(u, v) = (1/k) Σ_{i=1}^{k} 1(R_i^X/k ≤ u, R_i^Y/k ≤ v),

where R_i^X and R_i^Y denote the ranks of X_{i_k} and X_{i_k+1} among the respective coordinates of the k selected consecutive pairs. The sequence {X_{i_k}} is chosen from the original sequence {X_i} in such a way that X_{i_k} and X_{i_k+1} are sufficiently far apart to make the pairs independent to a certain extent. Now, to get θ, we use linear least squares fitting to find the slope at (1, 1), or cubic spline interpolation for better results.
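A sketch of this estimator (our implementation): ranks of the paired samples give the empirical copula, and a finite difference along the diagonal near (1, 1) recovers θ through the relation between EI and the copula diagonal discussed in Section 2. For simplicity, a fixed finite t replaces the least-squares or spline fit mentioned above:

```python
import bisect

def empirical_copula_diag(pairs, t):
    # Empirical copula evaluated at (1 - t, 1 - t), using within-sample ranks.
    k = len(pairs)
    xs = sorted(x for x, _ in pairs)
    ys = sorted(y for _, y in pairs)
    lim = (1.0 - t) * k
    return sum(1 for x, y in pairs
               if bisect.bisect_right(xs, x) <= lim
               and bisect.bisect_right(ys, y) <= lim) / k

def ei_copula_estimate(pairs, t=0.05):
    # Finite-difference version of the diagonal copula slope at (1, 1):
    # theta ~ (1 - C(1 - t, 1 - t)) / t - 1 (approaches 1 for independent pairs)
    return (1.0 - empirical_copula_diag(pairs, t)) / t - 1.0
```

For independent pairs the estimate approaches 1, as expected for i.i.d. sampling.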

Intervals Estimator
This estimator does not assume any conditions on {X_i}, but it has the parameter u to choose appropriately. Let N = Σ_{i=1}^n 1(X_i > u) be the number of exceedances of u at time epochs 1 ≤ S_1 < . . . < S_N ≤ n, and let the interexceedance times be T_i = S_{i+1} − S_i. Then the intervals estimator is defined as [5]

θ̂_n(u) = min(1, 2(Σ_{i=1}^{N−1} T_i)^2 / ((N − 1) Σ_{i=1}^{N−1} T_i^2)),   if max{T_i} ≤ 2,

θ̂_n(u) = min(1, 2(Σ_{i=1}^{N−1} (T_i − 1))^2 / ((N − 1) Σ_{i=1}^{N−1} (T_i − 1)(T_i − 2))),   if max{T_i} > 2.

We choose u as the δ-percentage quantile threshold, i.e., δ percent of {X_i, 1 ≤ i ≤ n} falls below u.
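The intervals estimator can be sketched as follows (our implementation of the Ferro-Segers intervals estimator; the formula has two branches depending on whether some interexceedance time exceeds 2):

```python
def intervals_estimator(x, u):
    # Intervals estimator of the extremal index from exceedance times of level u.
    s = [i for i, v in enumerate(x) if v > u]
    t = [b - a for a, b in zip(s, s[1:])]  # interexceedance times T_i
    m = len(t)
    if m == 0:
        return 1.0
    if max(t) <= 2:
        num = 2.0 * sum(t) ** 2
        den = m * sum(ti * ti for ti in t)
    else:
        num = 2.0 * sum(ti - 1 for ti in t) ** 2
        den = m * sum((ti - 1) * (ti - 2) for ti in t)
    return min(1.0, num / den)
```

On an i.i.d. sequence the estimate is close to 1, while for a two-step moving maximum (each large value exceeds the threshold twice in a row, θ = 1/2) it is close to 0.5.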

Synthetic graph
The simulations in this section follow the model and parameters in Section 3.2.1. Figure 2 shows the estimated copula and the theoretical copula based on the continuous distribution in (7), given by

C(u, v) = u + v − 1 + ((1 − u)^{−1/γ} + (1 − v)^{−1/γ} − 1)^{−γ}.

Though we take quantized values for the degree sequence, the estimated copula is found to match the theoretical copula. The value of EI is then obtained after cubic interpolation and numerical differentiation at the point (1, 1). For the theoretical copula, EI is 1 − 1/2^γ, where γ = 1.2.

Real network
We consider two real-world networks: the Enron email network and the DBLP network. The data is collected from [1]. Both networks satisfy the check of condition D''(u_n) (Section 2.1) reasonably well. Figure 4 shows the estimated bivariate copulas and the corresponding EI values. The intervals estimator results are presented in Figure 5. After observing plateaus in the plots, we take EI as 0.25 and 0.2 for the DBLP and Enron email graphs, respectively.

Conclusions
In this work, we have associated the Extreme Value Theory of stationary sequences with the sampling of large graphs. We show that, for any general stationary samples (functions of node samples) meeting two mixing conditions, the knowledge of the bivariate distribution or the bivariate copula is sufficient to derive many of their extremal properties. The parameter extremal index (EI) encapsulates this relation. We relate EI to many relevant extremes in networks, such as order statistics, first hitting times, and mean cluster sizes. In particular, we model the correlation in degrees of adjacent nodes and examine samples from random walks on the degree state space. Finally, we have obtained estimates of EI for a synthetic graph with degree correlations and found a good match with the theory. We also calculate EI for two real-world networks.

Figure 2: Empirical and theoretical copulas for the synthetic graph.

Figure 3 displays the comparison between theoretical value of EI and Intervals estimate.

Figure 3: Intervals estimate and theoretical value θ = 0.56 of the synthetic graph against the quantile level δ.

Figure 4: Empirical copulas for email-Enron graph and DBLP graph.