Statistically validated mobile communication networks: Evolution of motifs in European and Chinese data

Big data open up unprecedented opportunities to investigate complex systems including the society. In particular, communication data serve as major sources for computational social sciences but they have to be cleaned and filtered as they may contain spurious information due to recording errors as well as interactions, like commercial and marketing activities, not directly related to the social network. The network constructed from communication data can only be considered as a proxy for the network of social relationships. Here we apply a systematic method, based on multiple hypothesis testing, to statistically validate the links and then construct the corresponding Bonferroni network, generalized to the directed case. We study two large datasets of mobile phone records, one from Europe and the other from China. For both datasets we compare the raw data networks with the corresponding Bonferroni networks and point out significant differences in the structures and in the basic network measures. We show evidence that the Bonferroni network provides a better proxy for the network of social interactions than the original one. By using the filtered networks we investigated the statistics and temporal evolution of small directed 3-motifs and conclude that closed communication triads have a formation time-scale, which is quite fast and typically intraday. We also find that open communication triads preferentially evolve to other open triads with a higher fraction of reciprocated calls. These stylized facts were observed for both datasets.


Introduction
The data deluge has its origin in the development of information communication technology, which in turn has revolutionized the scientific research into social systems. The 'digital footprints' that we leave behind in almost all of our activities enable unprecedented investigations, in both depth and sample size. A new discipline, called 'computational social science' [1], has emerged to join the efforts of social scientists, computer scientists, physicists, and mathematicians in a truly interdisciplinary approach with the aim of better understanding the laws of human society both at an individual and at a collective level.
Mobile call records (MCRs) play a special role in the studies of human societies, as the mobile phone coverage is close to 100% in the adult population in much of the world and mobile phones are our companions in almost all of our activities. Accordingly, MCRs are well suited for mapping out the structure of social networks [2,3], including the dynamics [4] and hierarchical structure [5] of the communities, for studying dynamic aspects of human behavior like mobility characteristics [6][7][8] or communication patterns [9,10], and temporal motifs [11]. The origin of mobile call records shows an increasing socio-economic and cultural variety, ranging across different European [2,5], American [7], Asian [12] and African [13] sources. Although no systematic comparison has yet been made, some universal features seem to emerge. These include the Granovetterian structure [2,14] of the network, the bursty character of communication [15], and the strong inhomogeneity both in the number of contacts and in the strength of the activities [16].
MCRs do indeed provide detailed information about human interactions: data for millions of users as regards who called whom, when, how long the conversation took and the whereabouts of the callers serve as a goldmine for understanding the structure of the society and the dynamic laws of communication and mobility of individuals. Moreover, if so-called metadata are at the disposal of the researchers, deeper insight into gender and age related behavioral patterns can be mapped out [17,18].
Mesoscopic structures like network motifs [19] are of particular interest for the understanding of the structure and function of the society. A motif is a set of isomorphic subgraphs and it is generally assumed that it represents a functionally important group of nodes if its cardinality significantly exceeds the expected number of such subgraphs in a reference system, which is usually the configuration model. With this concept, it was possible to identify classes of networks, where within one class (e.g., networks of genetic transcription in different species or networks of different languages) there is similar over-representation of motifs. On the basis of weight intensity, the concept of motifs has also been generalized to weighted networks [20]. Recently, there has been a growing interest in the dynamic patterns. Dynamic motifs [11,18] are classes of similar event sequences, where the similarity refers not only to the topology but also to the temporal order of the events (e.g., phone calls). Using the information about the locality of the calls, mobility motifs were defined and the classification of human mobility patterns was enabled [21].
In temporal networks [25], where links are present only temporarily for an interaction event, static motifs defined on the aggregate networks present a time evolution as a function of the aggregation window. So far this has not been investigated, although this dynamics contains interesting information about the system. Time stamped mobile phone data are particularly suitable for such a study. One then asks: what are the typical motifs that are over-expressed in the network of MCRs? How do they emerge as a function of time? What is the characteristic time needed for the evolution of the motifs? These are the questions that we will focus on in this paper.
In social science there is a long tradition of studying triads, i.e. subgraphs of three nodes connected by directed links [22]. Recently, triads have been investigated in many other classes of complex networks ranging from biological to economic and financial networks [23,24]. Following the terminology introduced by Alon and collaborators, triads are called 3-motifs. Motifs of higher order (typically of order 4 or 5) are also sometimes considered. However, the most developed theories and the largest number of empirical studies about motifs concern 3-motifs. In fact the number of different motifs explodes when motifs of higher order are considered and the investigation of 4-motifs or 5-motifs in large networks is extremely demanding from a computational point of view. In social sciences a large number of studies have focused on the 3-motif statistics and dynamics with the aim of using this information to detect global properties of the social system investigated. So far, the most prominent property investigated in social networks is triadic closure. Triadic closure is observed when a triad with only two relationships detected among the three social actors evolves to another triad with all the pairwise relationships present to some degree [22].
In the present study, we investigate the 3-motifs observed in two large databases of MCRs. Specifically, we investigate MCRs of two different mobile companies, one operating in Europe and the other in China. In this way we are studying the effectiveness of our approach and the validity of our investigation of communication links of social origin in two datasets that are different in various respects, e.g. as regards the telecom company (with its specific commercial policy), the geographical location and the recording time period. We have chosen to focus our investigation on the directed 3-motifs for two main reasons: (i) because they are extremely informative from a social point of view and (ii) because a reliable empirical estimation requires a series of strict conditions in the processing of very large samples.
A major problem with big data is that they have to be cleaned and filtered, as they contain spurious information due to recording errors and interactions, like commercial and marketing activities, not directly related to the study at hand. Usually MCRs are not collected for scientific purposes and, even if the companies attempt to provide the relevant data, there could be serious problems. One example is that for studying social relationships, private communication is needed; however, experience tells us that sometimes phones registered as private are used for professional purposes, like in call centers or marketing and information campaigns. In fact, the presence of large spurious communication hubs, e.g. large call centers, significantly alters the statistics of 3-motifs (and, more generally, of any class of motifs). Dialing wrong numbers is another possible source of false links. In addition, the usual corruption arising during coding, transferring and processing data can also take place. Unless data are cleaned, spurious links could be misinterpreted as real social relationships. This problem is part of the general topic of information filtering in complex networks with strong inhomogeneities [26]. In fact, human related systems usually show properties changing over many orders of magnitude, and this is so for communication networks also: the distributions of degrees or activities are fat tailed [16].
A somewhat arbitrary way of filtering data was introduced by Onnela et al [2]. Three measures were taken: (i) only mutual connections were considered as links, i.e., both individuals had to initiate calls during the period of observation; (ii) links with total call duration of less than 10 s during the period of 18 weeks examined were ignored; and (iii) the nodes with less than 60 s of total call activity were filtered out. In fact, in this way spurious nodes with enormous ($10^4$) numbers of unidirectional connections and sometimes more than 24 hours/day (!) activity were eliminated. The 10 s cutoff served to filter out the calls to wrong numbers. However, this method unintentionally distorts the results, as there can be many socially relevant unidirectional links and even short duration links that may carry social interaction.
Another, more systematic way of filtering was proposed by Serrano et al [27]. The idea is to statistically validate the links by deciding locally which of the links carry a disproportionate fraction of the weights adjacent to a given node. Comparing the empirical observations with a null model that takes into account the inhomogeneities of the system can reveal significant overrepresentation of links, thus indicating their relevance. Carrying out the procedure node by node results in what is called the 'multiscale backbone' of the system. This method indicates important aspects of filtering, including the necessity of statistical validation and the relevance of the selection of an appropriate null model. However, it is asymmetric for the nodes of the links and it has some restrictions upon the degree k of the nodes (isolated links between two k = 1 nodes can never be validated, irrespective of the weight of the link), and it handles the local network topology independently of the rest of the network.
The problem of pruning and/or filtering of a network has also been encountered in the construction and analysis of similarity based networks. In fact, the construction of similarity based networks can be seen as a filtering procedure selecting informative structures of the underlying system, such as minimum spanning trees [28], partial correlation interdependence [29], subgraphs of arbitrary genus [30], planar graphs [31], etc.
Recently, a method for filtering out statistically validated links in bipartite complex networks [33] was proposed. As the mobile call network can be considered as a bipartite one, where one set of nodes corresponds to the mobile phone users and the other one to the calls that they perform, the method can be straightforwardly applied to our problem. As it is based on multiple-hypothesis testing, global information is built in, and thus the above-mentioned problems can be avoided. This method has already been applied successfully to a number of systems, including the networks of simple organisms, financial stocks and the Internet Movie Database [33], classification of investor strategies [34] and of the specializations of criminal suspects [35].
In this paper, we adapt and apply the method introduced in [33] to mobile phone communication networks obtained from MCRs of two different regions, which are a European country and the province level municipality of Shanghai (China). We first construct the communication networks from the raw MCRs (for short, called by us 'original networks') and then the networks of the statistically validated links, also called Bonferroni networks. We keep the directed character of the links, as it carries important information about the underlying social relationships. The comparison between the original and the Bonferroni networks shows significant differences in the basic statistical properties. For example, our filtering removes the extremely large hubs (which would contradict the social brain hypothesis [36]) but keeps a large number of unidirectional contacts.
The Bonferroni filtering of the original network allows us to perform a detailed analysis of so-called 3-motifs. We show that the study of 3-motif statistics and dynamics is unreliable unless we perform the Bonferroni filtering on the original network. This is due to the fact that the empirical estimation of 3-motifs is strongly affected by the presence of huge communication hubs that have no social origin but only some socio-technical motivation, as in the case of call centers. We study in the Bonferroni network the time evolution of the communication 3-motifs. Our results show that communication 3-motifs are typically characterized by triadic closure at an intraday time scale. In fact, 3-motifs with links detected between only two pairs of subscribers primarily evolve into other 3-motifs characterized by a higher number of reciprocated calls. Triadic closure is preferentially observed in communication 3-motifs only after the calls of the open 3-motif are reciprocated.
The paper is structured as follows. In the next section, we discuss the application of the Bonferroni network method to the MCRs. In section 3, we describe our results for the directed 3-motifs. Then the temporal evolution of the motifs is discussed. Finally we present the conclusions.

The Bonferroni network of mobile call records
In order to analyze communication data for constructing MCR based networks, one has to first decide whether the entries in the records serve as good proxies for real social interactions in a probabilistic sense. This is a multiple-hypothesis test validation problem, which we approach by adapting and applying a directional version [32] of the recently introduced method of Bonferroni networks [33].

Data
We investigate two sets of data: one from a Chinese mobile phone service provider and another one from a European service provider. The Chinese data contain time stamped data of all (hashed) subscribers of the service within the time periods from 28 June to 24 July 2010, and from 1 October to 31 December 2010. In the second period, the calls recorded on 12 October, 5, 6, 13, 21, and 27 November, and 6, 8, 21, and 22 December contain missing records, and these days are removed from our analysis. Thus we have in total 109 days of calls recorded for the Chinese data. This dataset consists of 4 031 090 subscribers and 1 091 695 590 calls (made to both subscribers and non-subscribers of the service provider). When we select calls occurring only among subscribers, the number of calls reduces to 128 410 897, i.e., 88.24% of the calls go to non-subscribers. The set of mobile phone users including subscribers and non-subscribers exceeds nine million users.
The data from the European provider contain all records of its 7 387 034 subscribers during 212 days between 1 January 2007 and 31 July 2007. This includes 3 969 043 426 calls, 682 124 009 of which occurred between subscribers of the given provider, i.e., 81.54% of all calls connect subscribers with non-subscribers. The whole set of subscribers and non-subscribers exceeds 91 million users.
As the primary focus of our investigations is on the evolution of 3-motifs, in the present study we perform our investigations on the calling networks of subscribers only. In fact, including non-subscribers would alter the 3-motif statistics because calling data between two non-subscribers are not recorded in our datasets.

Statistically validated networks
In our directed network, nodes are mobile phone subscribers and a directional link is set from subscriber A to subscriber B if A makes a call to B in a selected time window, the weight w of the link being the number of calls in the time window investigated. For each link in the network, we perform a statistical test to check whether a link is statistically validated against a null hypothesis assuming heterogeneous calling profiles of the subscribers. The method is a directional variant of the method introduced in [33]. The statistical test is implemented as follows. We define N as the total number of calls among the subscribers in the system, and focus on two subscribers, i and j, to check whether the number of calls from i to j is overexpressed with respect to a null hypothesis taking into account heterogeneity in the number of calls performed. Let us call $n_i^c$ the number of calls made by subscriber i as a caller, and $n_j^r$ the number of calls that subscriber j receives. By labeling the number of times subscriber i calls j with X, the probability of observing X calls between them is given by the hypergeometric distribution
$$H(X \mid N, n_i^c, n_j^r) = \frac{\binom{n_i^c}{X}\binom{N - n_i^c}{n_j^r - X}}{\binom{N}{n_j^r}}.$$
We can therefore associate a p-value with the observed number $X = n_{ij}^{cr}$ of calls from subscriber i to subscriber j as follows:
$$p(n_{ij}^{cr}) = 1 - \sum_{X=0}^{n_{ij}^{cr}-1} H(X \mid N, n_i^c, n_j^r).$$
Calculating the p-value for all the directed edges, which are $N_E$ in our network, implies that we run $N_E$ statistical tests for obtaining the network. When a large number of statistical tests are performed simultaneously the effectiveness of the statistical test can be decreased by a large number of false positives unless a multiple-hypothesis test correction is used. In the present study, we use the Bonferroni correction, which is the strictest multiple-hypothesis test correction controlling the familywise error rate when either dependent or independent multiple hypotheses are tested.
That means that the univariate level of statistical significance $p_u = 0.01$ must be replaced by a multivariate level, to be set as
$$p_m = \frac{p_u}{N_E} = \frac{0.01}{N_E}.$$
If the estimated $p(n_{ij}^{cr})$ is less than $p_m$, we conclude that the link from subscriber i calling subscriber j is not due to the high heterogeneity of the subscribers and most probably reflects a social interaction between the subscribers. Accordingly, we set a link from i to j in the filtered network that is named the Bonferroni network.
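The validation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the null model is the hypergeometric distribution used in [33], and the function names (`link_p_value`, `bonferroni_network`) and the toy call list are hypothetical. The upper tail is summed directly, which is adequate for a sketch but would need a faster routine for nodes with very many calls.

```python
import math
from collections import Counter

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def hypergeom_pmf(x, n_total, n_c, n_r):
    """H(x | N, n_c, n_r): probability of x calls from i to j under the null."""
    if x < max(0, n_c + n_r - n_total) or x > min(n_c, n_r):
        return 0.0
    return math.exp(log_comb(n_c, x) + log_comb(n_total - n_c, n_r - x)
                    - log_comb(n_total, n_r))

def link_p_value(w, n_total, n_c, n_r):
    """P(X >= w), summed over the upper tail to avoid cancellation in 1 - CDF."""
    return sum(hypergeom_pmf(x, n_total, n_c, n_r)
               for x in range(w, min(n_c, n_r) + 1))

def bonferroni_network(calls, p_u=0.01):
    """calls: iterable of (caller, callee) pairs, one entry per call.

    Returns the directed links (with weights) whose p-value is below
    the Bonferroni threshold p_u / N_E.
    """
    weights = Counter(calls)              # w_ij = number of calls i -> j
    n_total = sum(weights.values())       # N
    made, received = Counter(), Counter() # n_i^c and n_j^r
    for (i, j), w in weights.items():
        made[i] += w
        received[j] += w
    p_m = p_u / len(weights)              # N_E = number of directed links
    return {(i, j): w for (i, j), w in weights.items()
            if link_p_value(w, n_total, made[i], received[j]) < p_m}

# Hypothetical toy data: a hub calling 200 users once each, plus one
# pair of users exchanging five calls in one direction.
calls = [("hub", f"u{k}") for k in range(200)] + [("a", "b")] * 5
validated = bonferroni_network(calls)
print(validated)
```

In this toy network only the repeated link from `a` to `b` survives, while every single-call link of the hub is filtered out, which mirrors the qualitative behavior described in the text.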
We show in figure 1 a series of histograms of the number of links characterized by a certain p-value for the Chinese and the European datasets respectively. Different histograms (characterized by different colors) are obtained by grouping the various links in terms of the number of calls characterizing them, i.e., in terms of their weights. The time interval used to build the original network is the entire time period available, that is 109 days and 212 days for the Chinese and European datasets respectively. Figure 1 shows that in both datasets the links with just one call (weight equal to 1) are characterized by a p-value which is larger than the Bonferroni threshold (indicated as a vertical line). The links that are filtered out from the original network comprise essentially all the links with weight 1 and some of the links with weight up to 5. Under the conditions of our analysis, both for the Chinese and for the European datasets, when the weight is larger than 5 the links are always included in the Bonferroni network (see the case of w = 10 in figure 1).
The fact that links with unit weight are not present in the Bonferroni network is due to the procedure of statistical validation of the links. In fact, for weight 1, the above defined p-value reads
$$p(1) = 1 - H(0 \mid N, n_i^c, n_j^r) \geq \frac{1}{N} > \frac{0.01}{N_E} = p_m,$$
where the lower bound $1/N$ is attained for $n_i^c = n_j^r = 1$, and the last inequality holds true in our system, because the average number of calls per directed link, $N/N_E$, is much smaller than 100. As mentioned above, with our statistical validation procedure also some of the links with higher weights do not get validated in the Bonferroni network. However, the absence of validation is not a direct consequence of the small average number of phone calls per link. For example, let us consider a simple case in which $n_i^c = n_j^r = n_{ij}^{cr} = 2$. In this case, the p-value is
$$p(2) = H(2 \mid N, 2, 2) = \frac{1}{\binom{N}{2}} = \frac{2}{N(N-1)}.$$
This p-value would be statistically significant if it were $< 0.01/N_E$, and such a condition is easily attained, even in a sparse system like the present one:
$$\frac{2}{N(N-1)} < \frac{0.01}{N_E} \iff \frac{N}{N_E} > \frac{200}{N-1}.$$
Indeed, the latter inequality says that, to validate the link from i to j, it is sufficient if just the average number of calls per link is larger than $200/(N-1)$, which is a quantity smaller than 1 in any setting that includes more than 201 phone calls.
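As a quick numerical check of this argument, the following sketch plugs in the European figures quoted in the text. It assumes that N is the number of calls between subscribers (682 124 009), that N_E is the number of directed links implied by the stated threshold 0.01/49 029 577, and that the null model is hypergeometric, so that the smallest possible p-value of a weight-1 link is 1/N and the p-value of an isolated weight-2 link is 2/(N(N-1)). The variable names are illustrative only.

```python
# Figures taken from the text (European dataset); their pairing is an assumption.
N = 682_124_009        # calls between subscribers
N_E = 49_029_577       # directed links in the original network
p_m = 0.01 / N_E       # Bonferroni threshold

calls_per_link = N / N_E               # average number of calls per directed link
p_min_weight_1 = 1 / N                 # best case for a link with a single call
p_weight_2 = 2 / (N * (N - 1))         # case n_i^c = n_j^r = n_ij^cr = 2

weight_1_can_validate = p_min_weight_1 < p_m
weight_2_can_validate = p_weight_2 < p_m

print(f"calls per link: {calls_per_link:.2f}")
print(f"threshold p_m:  {p_m:.3e}")
print(f"weight 1: p >= {p_min_weight_1:.3e}, validated: {weight_1_can_validate}")
print(f"weight 2: p  = {p_weight_2:.3e}, validated: {weight_2_can_validate}")
```

With these numbers the average number of calls per link is well below 100, so no weight-1 link can pass the threshold, whereas an isolated pair with two calls passes it easily.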
In the setting of the Bonferroni threshold there is a margin of arbitrariness. In other words, which is the most appropriate threshold to be used when we obtain distinct daily networks that we wish to compare? We believe that the answer to this question depends on the type of comparison that one aims to perform on the networks obtained. Therefore there might be more and less restrictive choices in the setting of the Bonferroni threshold. To minimize the number of false positive links, in the present study we set the Bonferroni threshold to $0.01/N_E$, where $N_E$ is the number of edges observed in the overall periods investigated (we have a single period investigated for the European data [footnote 9] and two distinct periods investigated in the Chinese data [footnote 10]). In this way, the Bonferroni threshold is rather conservative for the networks computed at short time intervals, e.g., at daily and weekly time intervals.
Footnote 9: For the European dataset the Bonferroni threshold is set to 0.01/49 029 577.
Footnote 10: For the Chinese dataset, we have two periods of data. In the first period (the first month) 6 441 490 links are present in the original network between 2 309 619 subscribers and in the second period (the last three months) 13 616 634 links are present between 3 492 116 subscribers. We consider the two time periods as separate time periods and we set two Bonferroni thresholds as 0.01/6 441 490 and 0.01/13 616 634 when performing the construction of the Bonferroni networks. For the original daily networks, we have about $6.53 \times 10^5$ nodes and $6.94 \times 10^5$ links on average. By using the statistical test, 52.17% of the nodes and 65.87% of the links are removed in daily networks on average.
It should be noted that the choice of the Bonferroni threshold within broad limits up to some orders of magnitude does not crucially affect the composition of the networks obtained for the different time intervals. For instance, if the Bonferroni threshold used to construct daily Bonferroni networks for the Chinese data is increased by one order of magnitude, then the number of statistically validated links increases by only 4.8% on average. This behavior is specific to the daily networks. The percentage of nodes in the largest connected component increases when the period of time used to detect the network increases. We show in table 1 the average percentage and the standard deviation of the nodes which are present in the largest connected component of the original and Bonferroni networks obtained for different time periods for the Chinese and European datasets. From the table we see that for weekly networks the percentage of nodes of the largest connected component is already almost 49% and almost 35% for the Chinese and European datasets respectively. These values further increase for the monthly networks when the largest connected components of the Bonferroni networks are 73% and 81%, i.e., values not too different from the ones (81% and 94%) observed in the original networks for the Chinese and European datasets respectively. We interpret these results as an indication that our filtering methodology is able to detect a progressively increasing fraction of the weak ties that provide the interconnections building the largest connected component. This observation is in agreement with the detected role of weak ties discussed in [2].
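The percentage of nodes in the largest connected component can be computed with a simple union-find pass over the link list. The sketch below is only illustrative: it assumes that connectivity is evaluated ignoring link direction (weak connectivity), which the text does not state explicitly, and that only nodes with at least one link are counted. The function name is hypothetical.

```python
from collections import Counter

def largest_component_fraction(edges):
    """Fraction of nodes in the largest weakly connected component.

    edges: iterable of (caller, callee) links; direction is ignored.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    if not parent:
        return 0.0
    sizes = Counter(find(x) for x in list(parent))
    return max(sizes.values()) / len(parent)

# Hypothetical toy network: one component of three nodes, one of two.
fraction = largest_component_fraction([(1, 2), (2, 3), (4, 5)])
print(fraction)
```

Applying the same function to the original and to the filtered link lists for daily, weekly and monthly windows would give the kind of comparison reported in table 1.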

Basic metrics and the degree distribution
The bottom panel of figure 2 shows the time evolution of the number of 3-motifs. In the original network this number fluctuates and presents a huge spike at day 79. In the case of the Bonferroni network, the time evolution fluctuates less and no spike is present. In summary, our statistical validation procedure selects a network characterized by properties that are much more stable than those of the original network. We hypothesize that the Bonferroni network is able to retain links whose social motivations are typically more pronounced than those of the ones left out from the original network; thus the Bonferroni network is a better proxy for the underlying social network than the original one. Our hypothesis is supported by the results that we obtain for some important network metrics and for the census of the 3-motifs and their dynamics.
We show in figure 3 the cumulative in-degree and out-degree distributions for the Chinese dataset (subscribers only) for the entire period (109 days). The cumulative distributions are shown for the original network (panel (a)) and for the Bonferroni network (panel (b)). We observe a series of interesting differences in the cumulative distributions between the original and the Bonferroni networks. We observe in the original network subscribers with very large out-degree (of the order of 3000) and in-degree (of the order of 700). We also note that the tails of the in-degree and out-degree distributions are very pronounced and quite different from each other (with the out-degree distribution significantly more pronounced than the in-degree distribution). In the case of the Bonferroni network the in-degree and out-degree distributions still show pronounced tails, but the largest degree is of the order of 200. We also note that the tails of the in-degree and out-degree distributions are similar in the Bonferroni case, with the in-degree being only slightly more pronounced than the out-degree for very large degrees.
A similar pattern is observed also in the degree distributions of European data (see panels (c) and (d) of figure 3). However, we also note differences between the European and the Chinese distributions. Specifically, for the original network (panel (c) of figure 3) the more pronounced tail is observed for the in-degree distribution in the European case, whereas the opposite is observed in the Chinese case. We also note that in the European case the distributions of the Bonferroni networks show slightly different tails: the tail of the in-degree distribution is more pronounced than that of the out-degree case. It is worth noting that, similarly to what we observe for the Chinese dataset, the maximal in-degree is close to 250 and the maximal out-degree is close to 150 if we do not consider an in-degree outlier characterized by a degree of 1131. The tails of the cumulative distributions of the in-degree and out-degree in the European case are well described by a power law decay with an exponent equal to 3.85 and 6.25 respectively [footnote 11].
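For readers who wish to reproduce curves of the kind shown in figure 3, the cumulative in-degree and out-degree distributions can be obtained from a list of directed links as sketched below. This is an illustrative sketch under stated assumptions: the cumulative distribution is taken as P(K >= k) over nodes with non-zero degree, and the function name and toy links are hypothetical.

```python
from collections import Counter

def cumulative_degree_distributions(edges):
    """Return (in_ccdf, out_ccdf), each mapping degree k to P(K >= k).

    edges: iterable of directed links (caller, callee); repeated links
    are counted once, so degrees count distinct neighbors.
    """
    edges = set(edges)
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)

    def ccdf(degrees):
        n_nodes = len(degrees)
        histogram = Counter(degrees.values())
        result, remaining = {}, n_nodes
        for k in sorted(histogram):
            result[k] = remaining / n_nodes   # nodes with degree >= k
            remaining -= histogram[k]
        return result

    return ccdf(in_deg), ccdf(out_deg)

# Hypothetical toy links.
in_ccdf, out_ccdf = cumulative_degree_distributions(
    [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
)
print(in_ccdf, out_ccdf)
```

Plotting the two dictionaries on doubly logarithmic axes for the original and the Bonferroni link lists would make the differences in the tails directly visible.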

3-motifs
Footnote 11: The exponent for the out-degree distribution is quite large. The distribution decays so fast that it is difficult to distinguish between power law and exponential decay.
We show in figure 4 the 13 different kinds of 3-motifs that can be observed in a network. There are different ways to code the identities of these motifs. In the present paper, we use the labeling of Milo et al [19]. Here, we investigate the properties of communication 3-motifs of networks obtained from MCR data. A direct inspection of figures 5 and 6 shows that the estimation of the fraction of daily 3-motifs for the original network presents seasonalities of various frequencies and huge spikes localized at specific weeks. The seasonality is extremely pronounced for 3-motifs presenting only two of the three possible relationships (see the panels of figure 5). On the other hand, the pattern observed in the Bonferroni networks is more stable and shows only a weekly seasonality and a small deviation occurring for some special days (days with labels 5 and 105, which are most probably related to big holidays). In the Bonferroni network, the weekly pattern is quite evident for the 3-motifs with two pair relationships (figure 5), whereas for triads with a triangle structure the weekly pattern is less evident, especially in some cases, for example for the 3-motif labeled as 98 (see figure 6).
We interpret these empirical results as supporting our hypothesis that the Bonferroni network is sampling relationships characterized by a strong social interaction, whereas the original network also includes kinds of calls that are related to commercial or technical activities such as the ones typically performed by call centers. The presence of these activities can substantially alter counts of the triads because a node with a very large in-degree or out-degree participates in a large number of triads. This kind of spurious effect is clearly not observed in the Bonferroni network.
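The numerical labels used for the 3-motifs (6, 12, 14, 36, 38, 46, 74, 78, 98, 102, 108, 110 and 238) can be reproduced by a simple encoding, sketched below as an illustration. The sketch assumes that a label is obtained by reading the 3 × 3 adjacency matrix row by row as a 9-bit binary number and taking the smallest value over all relabelings of the three nodes, with rows indexing the caller. The row orientation is an assumption made here (the transposed convention would swap some labels, e.g. 6 and 36), and the function names are hypothetical.

```python
from itertools import combinations, permutations

def motif_id(nodes, edges):
    """Label of the directed triad on `nodes` (three labels) given `edges`.

    The adjacency matrix is read row by row as a 9-bit number and the
    minimum over all node orderings is returned. Rows index the source
    of a link (assumed convention).
    """
    best = None
    for perm in permutations(nodes):
        index = {v: k for k, v in enumerate(perm)}
        code = 0
        for u, v in edges:
            if u != v and u in index and v in index:
                code |= 1 << (8 - (3 * index[u] + index[v]))
        best = code if best is None else min(best, code)
    return best

def all_three_node_ids():
    """Enumerate the labels of all connected directed triads."""
    nodes = ("a", "b", "c")
    pairs = [(u, v) for u in nodes for v in nodes if u != v]
    ids = set()
    for r in range(2, len(pairs) + 1):
        for chosen in combinations(pairs, r):
            linked_pairs = {frozenset(e) for e in chosen}
            if len(linked_pairs) >= 2:      # all three nodes are connected
                ids.add(motif_id(nodes, chosen))
    return sorted(ids)

ids = all_three_node_ids()
cycle = motif_id(("x", "y", "z"), [("x", "y"), ("y", "z"), ("z", "x")])
print(ids)
print(cycle)
```

Under this encoding the enumeration yields exactly thirteen labels, and the circular triad receives the label 98, in agreement with the description of that motif in the text.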
We present in tables 2 and 3 summary statistics for the fractions of 3-motifs observed in the daily, weekly and monthly original and Bonferroni networks for the Chinese and European datasets respectively. For each 3-motif and for each network we report the average value observed for real data, $\mu$, and the average value $\mu_{\mathrm{rnd}}$ observed by randomly shuffling the network a large number of times while keeping the in-degree and out-degree of each node fixed and keeping the number of bidirectional relationships constant. The counting of the 3-motifs and the shuffling procedures were performed by using the FANMOD algorithm [37]. We also report in the tables the standard deviation observed for real data, $\sigma$, and for shuffled data, $\sigma_{\mathrm{rnd}}$.
For each 3-motif we evaluate a z-score defined as $z = (\mu - \mu_{\mathrm{rnd}})/\sigma$. This variable indicates the deviation of the observed average value from the average value obtained by random shuffling of the network in units of the standard deviation. We have decided to use this definition of the z-score instead of another possible one, namely $(\mu - \mu_{\mathrm{rnd}})/\sigma_{\mathrm{rnd}}$, because our definition is the most conservative one in the present case. We highlight, in the tables, the average percentage of a 3-motif in boldface when its associated z-score is larger than 3 (a character (+) follows the average percentage value in this case) or smaller than −3 (a character (−) follows the average percentage value in this case). Tables 2 and 3 show that for the first set of 3-motifs, the average percentage of the 3-motifs is close to the value expected for random connections (6, 12, and 36) or less than expected for the 3-motifs 14, 74, and 78. On the other hand, for the second set of motifs (38, 46, 98, 102, 108, 110 and 238), all the 3-motifs present an average percentage which is higher than the value expected for randomly driven communications. In other words, the underlying social structure and the communication style of the social actors over-express the 3-motifs characterized by triadic closure. This behavior is observed at daily, weekly and monthly time scales (with a pattern more pronounced when the time period used to build the network is longer) and it is observed both for the Chinese and the European datasets.
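The flagging rule used in the tables can be written compactly. The sketch below assumes the z-score is the difference between the real and shuffled averages divided by the standard deviation of the real data, as the wording of the text suggests. The function names and the numerical inputs are hypothetical and are not taken from tables 2 and 3.

```python
def z_score(mu, mu_rnd, sigma):
    """Deviation of the observed average from the shuffled average,
    in units of the standard deviation of the real data (assumed definition)."""
    return (mu - mu_rnd) / sigma

def significance_flag(mu, mu_rnd, sigma, threshold=3.0):
    """Return '+' for over-expression, '-' for under-expression, '' otherwise."""
    z = z_score(mu, mu_rnd, sigma)
    if z > threshold:
        return "+"
    if z < -threshold:
        return "-"
    return ""

# Hypothetical percentages, for illustration only.
over = significance_flag(mu=12.0, mu_rnd=3.0, sigma=2.0)
under = significance_flag(mu=3.0, mu_rnd=12.0, sigma=2.0)
neutral = significance_flag(mu=5.0, mu_rnd=4.0, sigma=2.0)
print(over, under, repr(neutral))
```

Dividing by the standard deviation of the real data rather than that of the shuffled data gives a smaller absolute z-score whenever the real data fluctuate more, which is presumably the sense in which the text calls this choice conservative.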
The above cited results are qualitatively observed both for the original and for the Bonferroni networks. However, original and Bonferroni networks present values of the average percentages of the 3-motifs which are quite different, especially for weekly and monthly time periods. The difference is quite pronounced for 3-motifs of the first set (see, for example, the average percentage of 3-motif number 6 for the monthly networks). Our analysis of the time dependence of the average percentage summarized in figures 5 and 6 indicates that the results obtained for the Bonferroni networks are more robust and reliable than the results obtained for the original network, and this allows for a more detailed investigation of the process of formation and disappearance of these communication structures. In the next section, we will analyze the process of formation of the communication 3-motifs in the most reliable setting, which is the setting of the Bonferroni networks.

Temporal evolution of communication 3-motifs
Communication 3-motifs are continuously forming and disappearing over time. Here we primarily focus on the dynamics of the 3-motif formation observed at a daily time scale. Specifically, we detect the Bonferroni network at day k and at the two-day time interval beginning at day k, and we count and identify all the 3-motifs present in each network. The identification of each 3-motif is carried out by considering the identity of the three social actors composing it. In other words, we keep a memory of the fact that, for example, one 3-motif of type 6 is observed among subscribers with identities i, j and k. This is done to follow the evolution of each 3-motif as the monitoring time interval increases, which primarily produces a network expansion (see footnote 12).
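The identification step can be sketched as follows. The sketch assumes that the motif labels follow the common convention in which the 3×3 adjacency matrix of the triad is read row by row as a 9-bit integer and the minimum over the six node relabelings is taken as the label. This convention reproduces the thirteen labels used in the text, but it is our assumption about the labeling. The function names and the choice of a frozenset as the identity key are hypothetical.

```python
from itertools import permutations

def motif_id(edges, nodes):
    """Canonical label of the directed triad spanned by `nodes`.

    edges: set of (caller, callee) pairs.
    nodes: tuple with the three subscriber identities.
    Assumption: bit 3*row + col of the label is set when the node in
    position `row` calls the node in position `col`; the label is the
    minimum over all orderings of the three nodes.
    """
    best = None
    for order in permutations(nodes):
        position = {node: idx for idx, node in enumerate(order)}
        code = 0
        for src, dst in edges:
            if src in position and dst in position and src != dst:
                code |= 1 << (3 * position[src] + position[dst])
        best = code if best is None else min(best, code)
    return best

def is_closed(edges, nodes):
    """True when every pair of the triad is linked in at least one direction."""
    a, b, c = nodes
    pairs = [(a, b), (a, c), (b, c)]
    return all((u, v) in edges or (v, u) in edges for u, v in pairs)

def triad_key(nodes):
    """Order-independent identity of a triad, used to follow it over time."""
    return frozenset(nodes)

# A circular flux of calls among three hypothetical subscribers.
cycle = {("i", "j"), ("j", "k"), ("k", "i")}
print(motif_id(cycle, ("i", "j", "k")), is_closed(cycle, ("i", "j", "k")))
```

Storing the label under `triad_key` for each network makes it possible to look up the same three subscribers in a later or longer time window.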
We show in tables 4 and 5 the conditional probabilities for 3-motifs during a one-day expansion of the time period used to determine the Bonferroni network. The starting network is computed for day k, whereas the target network is computed for a two-day time interval including the previous one (days k and k+1), for the Chinese (table 4) and European (table 5) datasets. The networks are the Bonferroni networks obtained from the records of subscribers. We highlight in boldface the entries with conditional probability higher than 0.05. On inspecting tables 4 and 5, we note that the conditional probability […] for the Chinese and European data, respectively. It is worth noting that the least stable 3-motif is the one labeled 98, which is characterized by a circular flux of information among the three social actors. The second-lowest value of the conditional probability, $P(38\,|\,38)$, is observed for the other 3-motif that is a triad of unidirectional links.
We also observe that the second-largest value in each row of conditional probabilities is associated with a 3-motif pair in which a unidirectional link of the original 3-motif becomes a bidirectional one in the arrival 3-motif. This observation suggests that the underlying communication process governing the 3-motif dynamics is primarily related to the probability of observing return calls between two social actors (see $P(14\,|\,6)$, $P(14\,|\,12)$, $P(74\,|\,12)$, $P(78\,|\,14)$, etc).
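The conditional probabilities discussed above can be estimated with a short sketch. It assumes that each network has already been reduced to a mapping from a triad identity to its motif label, and that P(arrival | origin) is the fraction of triads with label `origin` in the starting network that carry label `arrival` in the target network. Whether the rows of tables 4 and 5 include triads that stop forming a 3-motif is not stated here; the sketch counts them under the hypothetical label 0. All names and the toy data are illustrative.

```python
from collections import Counter, defaultdict

NO_MOTIF = 0  # hypothetical label for triads that no longer form a 3-motif

def transition_probabilities(start, target):
    """Estimate P(arrival | origin) from two labeled networks.

    start, target: dict mapping a triad identity (e.g. a frozenset of
    three subscriber ids) to its motif label.
    Returns a nested dict probs[origin][arrival].
    """
    counts = defaultdict(Counter)
    for triad, origin in start.items():
        counts[origin][target.get(triad, NO_MOTIF)] += 1
    probs = {}
    for origin, row in counts.items():
        total = sum(row.values())
        probs[origin] = {arrival: n / total for arrival, n in row.items()}
    return probs

def relevant_transitions(probs, threshold=0.05):
    """List (origin, arrival, probability) entries above the threshold."""
    return [
        (origin, arrival, p)
        for origin, row in probs.items()
        for arrival, p in row.items()
        if p > threshold
    ]

# Toy example: three triads of type 6 at day k, one of which gains a
# return call (6 -> 14) in the two-day network.
day_k = {frozenset("abc"): 6, frozenset("abd"): 6, frozenset("acd"): 6}
two_days = {frozenset("abc"): 6, frozenset("abd"): 6, frozenset("acd"): 14}
print(transition_probabilities(day_k, two_days))
```

Keying both mappings by the identity of the three subscribers is what allows a given triad to be matched between the starting and the target network.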
We provide in figure 7 a schematic representation of the most relevant conditional probabilities among the different 3-motifs. We draw a line in the panels of the figure when the conditional probability from the originating to the arrival 3-motif exceeds 5%. For both the Bonferroni networks obtained from the one-day expansion of the Chinese and European datasets, we observe that the typical path of a 3-motif communication does not preferentially […] We interpret this observation as a manifestation of the fact that closed communication 3-motifs typically form at an intraday time scale. This interpretation is also supported by the results reported in figure 8, where we look at the details of the formation and evolution of closed 3-motifs from one day to the next, without varying the time window and without distinguishing among the different closed 3-motifs and the different open 3-motifs.
Footnote 12. Indeed, when the time interval used to obtain the Bonferroni network is extended, some links existing in the first Bonferroni network might disappear, because a link validated in the first period may fail to be validated in the second, extended period of detection. The probability of disappearance of a link is fairly small, but in a few cases such events occur.
Table 2. Statistics of 3-motifs for the Chinese data. Subscribers only. The networks investigated are the original network and the Bonferroni network. We show the average value $\mu$ observed for real data and the average value $\mu_{\mathrm{rnd}}$ observed by randomly shuffling the network. We also report the standard deviation observed for real data, $\sigma$, and for shuffled data, $\sigma_{\mathrm{rnd}}$. Values are given as percentages. Daily (d), weekly (w) and monthly (m) time periods are shown. Values in boldface indicate positive (+) or negative (−) z-score values larger than 3 in absolute value. The z-score is computed as $z = (\mu - \mu_{\mathrm{rnd}})/\sigma$.
On average, more than 2/3 of the closed 3-motifs observed in the Bonferroni network at a given day come from unconnected triplets of nodes in the Bonferroni network of the previous day, and evolve to unconnected triplets of nodes in the Bonferroni network of the following day (red rectangles in figure 8). On the other hand, the closed 3-motifs originating from (evolving to) open 3-motifs in the Bonferroni network of the previous (following) day amount to about 1/4 of the total (blue triangles in figure 8). Such an erratic pattern suggests that a large fraction of the closed 3-motifs that appear in a daily Bonferroni network occur due to contingent reasons of communication that develop at an intraday time scale; for example, the peak of closed 3-motifs observed on Friday may be due to the need for people to coordinate their social activities.
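The day-to-day bookkeeping behind these fractions can be sketched as follows. The set of closed labels is taken from the second set of 3-motifs listed in the text. The representation of each daily network as a mapping from triad identity to motif label, the function name, and the toy data are assumptions made for illustration.

```python
# Labels of the 3-motifs characterized by triadic closure (second set in the text).
CLOSED = {38, 46, 98, 102, 108, 110, 238}

def closed_motif_fates(today, next_day):
    """Count what the closed 3-motifs of `today` become in `next_day`.

    today, next_day: dict mapping a triad identity to its motif label.
    Returns counts for three outcomes: the triplet no longer forms a
    3-motif ('none'), it forms an open 3-motif ('open'), or it forms a
    closed 3-motif ('closed').
    """
    fates = {"none": 0, "open": 0, "closed": 0}
    for triad, label in today.items():
        if label not in CLOSED:
            continue  # only closed 3-motifs are followed
        following = next_day.get(triad)
        if following is None:
            fates["none"] += 1
        elif following in CLOSED:
            fates["closed"] += 1
        else:
            fates["open"] += 1
    return fates

# Toy example with four closed triads and one open triad on the first day.
monday = {"t1": 98, "t2": 38, "t3": 238, "t4": 46, "t5": 6}
tuesday = {"t2": 14, "t3": 238, "t5": 6}
fates = closed_motif_fates(monday, tuesday)
total = sum(fates.values())
print({fate: count / total for fate, count in fates.items()})
```

Swapping the two arguments gives the complementary view, namely what the closed 3-motifs of a given day were in the network of the day before.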

Conclusions
In this paper, we have adapted and applied a filtering procedure to a directed communication network. This filtering procedure is based on a statistical validation performed by using multiple-hypothesis test correction. We hypothesize that the links detected in the directed Bonferroni communication networks describe relevant ties of the underlying social structure originating the communication. We test our hypothesis by comparing basic statistics of the original network and the Bonferroni network, and conclude that the latter is much more realistic as it removes spurious links related to non-social interactions. Furthermore, we investigate the relative frequency of 3-motifs in two large sets of mobile communication data recorded in two different countries of two distinct continents. In both cases, we verify that the frequency profile of the 3-motifs of the directed Bonferroni communication networks is much more stable over time than the frequency profile of the original network. We believe that this empirical observation supports the hypothesis of Bonferroni networks being good proxies of strong ties of social origin.
Figure 8. Top panels: evolution of closed 3-motifs for Chinese (left) and European (right) data across a week. Black circles indicate the total count of closed 3-motifs in the daily Bonferroni networks for the first six days of a week. Red rectangles indicate the number of these motifs evolving, the day after, to node triplets that do not determine a 3-motif. Blue triangles (green diamonds) indicate the number of closed 3-motifs evolving, the day after, to open 3-motifs (closed 3-motifs). Bottom panels: formation of closed 3-motifs for Chinese (left) and European (right) data across a week. Black circles indicate the total count of closed 3-motifs in the daily Bonferroni networks for the last six days of a week. Red rectangles indicate the number of these motifs emerging from node triplets that did not determine a 3-motif in the Bonferroni network of the day before. Blue triangles (green diamonds) indicate the number of closed 3-motifs that were open 3-motifs (closed 3-motifs) in the Bonferroni network of the day before.
After having verified the robustness and reliability of our statistical filtering procedure, we […] We interpret these results as evidence that correctly sampled mobile call records reflect rapid communication interactions of an underlying social structure that forms and dissolves over a longer time scale. In other words, the time scales of the communication network and of the social network are quite distinct, with the first usually lasting less than one day and the second requiring months or years. Under this interpretation, we conclude that the triadic closure process is governed by distinct rules in communication and in social networks.