Predicting the SARS-CoV-2 effective reproduction number using bulk contact data from mobile phones

Significance
Numerous COVID-19 studies used mobile phone data but with limited power in accounting for infection numbers. We have evaluated deidentified Global Positioning System (GPS) data from over 1 million devices in Germany and inferred contacts from coproximity of devices. By calculating the contact graphs, we derived a contact index (CX) that exhibits a high correlation with the incidence-based reproduction number R, with changes in CX preceding those in R by more than 2 wk. CX is thus an early indicator for outbreaks and can be used to guide social-distancing policies. Further questions, including those on the efficacy of vaccination, can be addressed with our method. We discuss limitations, e.g., transmission on international travel, and the relation to superspreading.


[Figure: time series from March to June of the mean contact number k (rolling average) and of R.]
Fig. S1. The initial surge in March 2020 in Germany ended with a strong decline of the mean contact number k and of the R value during March. The evolution does not fit the naive expectation that a change in k is followed by a change in R with a time lag of at least a few days. In addition, there is a noticeable increase in k from April to June that is not reflected in the evolution of R.

B. Distribution of degrees.
Evaluating the number of contacts per device, we find that the degree of nodes, i.e., the number of contacts of each person, is broadly distributed with a long tail before the lockdown (Fig. S2, blue line), while at later dates after the first wave the distribution has a much shorter tail and is highly concentrated around few contacts (compare brown and green lines). Thus initially there are many individuals with large numbers of contacts who would be potential 'super-spreaders' (called hubs in network theory), but the lockdown clearly led to a reduction of the number of such individuals in later weeks.
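As an illustration of how such a degree distribution can be tabulated, the following minimal Python sketch counts contacts per device from a list of matched device pairs. The pair list and the device labels are hypothetical, since the data format is not described here, and devices without any match do not appear in the result.

```python
from collections import Counter

def degree_histogram(pairs):
    """Map each degree to the number of devices having that many distinct contacts."""
    # Treat contacts as undirected and ignore duplicates and self-pairs.
    unique = {tuple(sorted(pair)) for pair in pairs if pair[0] != pair[1]}
    degree = Counter()
    for a, b in unique:
        degree[a] += 1
        degree[b] += 1
    return dict(Counter(degree.values()))

# Hypothetical matched pairs: device "a" is a small hub with three contacts.
pairs = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "a")]
histogram = degree_histogram(pairs)
print(histogram)
```

A long tail in this histogram corresponds to the hubs discussed above.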

The same pattern is observed before and after the second wave and lockdown (red and purple dots).

Fig. S3. The CX can be calculated locally by evaluating only cell phones located mostly in a specific county (while keeping all contacts of those cell phones for the calculation, including contacts outside the county). The map shows the CX on March 1, 2020, a Sunday. Several professional soccer matches were held on that day. The circles show cities that took part in the games of the two top tiers.

D. Vaccination campaign.
In light of the network hubs that are present in the contact graph for March 12-19, we can discuss implications of our findings for vaccination strategies. This is particularly interesting because, at least in the initial stages of a vaccination campaign, only a small share of the population can be immunized. It is widely assumed that for herd immunity a share of 60 % or more of the population must be vaccinated (1). However, this threshold assumes "well-mixedness" of the population, while for outbreaks on graphs the heterogeneity of contacts must be taken into account.

In Fig. S5 we analyse a random model. We base the calculation on the distribution of degrees in our sampled network for the week March 12-19, 2020. For the random strategy we uniformly sample and remove (i.e., vaccinate) nodes of all degrees k with the same probability. We judge the strategy using our threshold of CXcrit = 38, which corresponds to an R of 1.0. In order to reduce CX below the critical value, starting from the social situation in early March, we need a vaccination fraction of about 95 %. It is conceivable that by targeting nodes with large numbers of connections (i.e., hubs) one can reduce the fraction of nodes that need to be immunized.
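A random vaccination experiment of this kind can be sketched in Python as follows. This is only an illustration under stated assumptions, not the calculation behind Fig. S5: the function names are hypothetical, CX is stood in for by the ratio ⟨k²⟩/⟨k⟩ of the degree sequence, each contact of a remaining node is assumed to be vaccinated independently with the same probability, and the degree sequence is synthetic. It is therefore not expected to reproduce the 95 % figure.

```python
import random

def contact_index(degrees):
    """<k^2>/<k> of a degree sequence (assumed stand-in for CX)."""
    total = sum(degrees)
    return sum(k * k for k in degrees) / total if total else 0.0

def vaccinate_randomly(degrees, fraction, rng):
    """Remove each node with probability `fraction` and thin the degrees of the rest."""
    keep = 1.0 - fraction
    remaining = []
    for k in degrees:
        if rng.random() < keep:  # the node itself stays unvaccinated
            # Assumption: each of its k contacts stays unvaccinated independently.
            remaining.append(sum(rng.random() < keep for _ in range(k)))
    return remaining

rng = random.Random(0)
# Hypothetical heavy-tailed degree sequence, capped at 500 contacts.
degrees = [min(500, int(rng.paretovariate(1.5))) for _ in range(20_000)]
for fraction in (0.0, 0.5, 0.9):
    cx = contact_index(vaccinate_randomly(degrees, fraction, rng))
    print(f"vaccinated {fraction:.0%}: CX = {cx:.1f}")
```

Comparing the printed values with a chosen critical value gives the vaccination fraction at which the index drops below threshold for this synthetic sequence.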

The calculation of CX for the network with immunization proceeds in the same way as for the sampled networks considered before. In the synthetic experiment, nodes are removed randomly from the sequence of degrees of the sampled network. Then

March 12-19. We judge the efficacy of the campaign by comparing the CX to the critical value (horizontal line). Herd immunity is achieved at a level of 95 % vaccination. This behavior is expected for networks with a strong heterogeneity in the contact numbers (2).
"contact" and use it as a proxy for a human physical contact.

Our data allow estimating the number of real-world contacts for the entire population of Germany. However, a large part of these real contacts is missing from our cell phone sampling for two main reasons: (A) We only cover a fraction of devices. (B) We only cover times when the cell phone is sending a ping. In more detail regarding (A), we cover about 800,000 GPS-enabled devices per day, so that the majority of contacts for an individual goes undetected. As there are about 83 million people in Germany, we can expect to cover about 1% of the population. Regarding (B), a typical cell phone sends about 200 pings per day. In order to cover the entire day, one ping every two minutes, i.e., 720 pings per day, is needed. Thus, only about 28% of the time of the day is covered for the average device. Assuming for simplification that the ping times are independent for different devices, an estimate of the probability that a contact between two devices was observed is 0.28 × 0.28 ≈ 0.08, which we round to 0.1. So, according to this rough calculation, we can expect to track roughly 0.01² × 0.1 = 0.001% of all contacts between any two people living in Germany.
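The rough coverage estimate above can be retraced with a few lines of Python. The input numbers are those quoted in the text; the variable names are chosen here for illustration, and the calculation keeps the same simplifying assumption of independent ping times.

```python
DEVICES_PER_DAY = 800_000           # GPS-enabled devices covered per day (from the text)
POPULATION = 83_000_000             # people in Germany (from the text)
PINGS_PER_DAY = 200                 # typical device (from the text)
PINGS_FOR_FULL_DAY = 24 * 60 // 2   # one ping every two minutes

device_coverage = DEVICES_PER_DAY / POPULATION       # about 1 % of the population
time_coverage = PINGS_PER_DAY / PINGS_FOR_FULL_DAY   # about 28 % of the day
both_pinging = time_coverage ** 2                    # both devices ping at the same time
tracked_fraction = device_coverage ** 2 * both_pinging

print(f"device coverage:   {device_coverage:.2%}")
print(f"time coverage:     {time_coverage:.1%}")
print(f"tracked contacts:  {tracked_fraction:.4%}")
```

Without intermediate rounding the result is slightly below the 0.001% obtained with the rounded factors 0.01 and 0.1, but of the same order of magnitude.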

The average number of cell phones registered during a day was about 800,000. Per day we find between 20,000 and 160,000 devices that had at least one match. The total number of matched pairs varied between 150,000 (before lockdown) and 12,000 (during lockdown) per day.

(14).

In the following, let G denote the full network or graph of all cell phones and let M denote the maximal degree of a node in G. As a reminder, the degree of a node (i.e., person) equals the number of contacts of this person. Following Zhang et al. (3), we let N denote the vector containing the degree counts of the nodes (an alternative way of deriving our results is based

having k links (contacts) to other cell phones.

In the sampling of phones according to (A), we assume that each phone is sampled from G with the same probability p, resulting in the sampled graph G*. This situation is also described as induced network sampling in network theory (3). The induced network G* includes all sampled nodes as well as all links from G that connect the sampled nodes in G*.
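A minimal Python sketch of induced network sampling, assuming the graph is given as a node list and an edge list (a hypothetical representation chosen for illustration):

```python
import random

def induced_sample(nodes, edges, p, rng):
    """Keep each node with probability p and every edge whose two ends were both kept."""
    sampled = {v for v in nodes if rng.random() < p}
    kept_edges = [(a, b) for a, b in edges if a in sampled and b in sampled]
    return sampled, kept_edges

# Hypothetical toy graph G: a hub (node 0) plus a short chain.
nodes = list(range(6))
edges = [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5)]
sampled_nodes, sampled_edges = induced_sample(nodes, edges, p=0.5, rng=random.Random(42))
print(sorted(sampled_nodes), sampled_edges)
```

An edge survives only if both of its end nodes are sampled, which is why the degrees observed in G* are systematically smaller than those in G.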

The vector of the expected values of the degree counts of the sampled network, N*, is E(N*) = P N. Here, P is a matrix with entries P(k, k′) that describe the probability that a node of degree k′ in G is selected and has degree k in G*. For induced sampling, P is:

In the following we assume that the particular sampling given by our mobile phone records gives rise to an N*_ind, which can be approximated by E(N*) for large networks, from which we can calculate the degree moments for the original network.
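To make the relation E(N*) = P N concrete, the following Python sketch builds P for a toy degree-count vector and recovers the ratio ⟨k²⟩/⟨k⟩ of the original graph from the sampled moments. It rests on assumptions that are not taken from the text: the standard binomial form P(k, k′) = p · C(k′, k) · p^k · (1 − p)^(k′ − k) for k ≤ k′ (and 0 otherwise), and the moment relations ⟨k⟩_ind = p⟨k⟩ and ⟨k²⟩_ind = p²⟨k²⟩ + p(1 − p)⟨k⟩ that follow from it. Function names and the toy vector are hypothetical.

```python
from math import comb

def induced_sampling_matrix(max_degree, p):
    """P[k][k2]: a node of degree k2 in G is sampled and has degree k in G* (assumed form)."""
    return [[p * comb(k2, k) * p**k * (1 - p)**(k2 - k) if k <= k2 else 0.0
             for k2 in range(max_degree + 1)]
            for k in range(max_degree + 1)]

def moments(counts):
    """First and second degree moments of a degree-count vector counts[k]."""
    n = sum(counts)
    m1 = sum(k * c for k, c in enumerate(counts)) / n
    m2 = sum(k * k * c for k, c in enumerate(counts)) / n
    return m1, m2

def recover_ratio(m1_ind, m2_ind, p):
    """<k^2>/<k> of the original graph from the sampled moments."""
    m1 = m1_ind / p
    m2 = (m2_ind - (1 - p) * m1_ind) / p**2
    return m2 / m1

# Hypothetical degree counts: N[k] = number of nodes with degree k in G.
N = [0, 50, 30, 10, 5, 3, 1, 0, 0, 0, 1]
p = 0.3
P = induced_sampling_matrix(len(N) - 1, p)
N_star = [sum(P[k][k2] * N[k2] for k2 in range(len(N))) for k in range(len(N))]

m1, m2 = moments(N)
m1_ind, m2_ind = moments(N_star)
print(m2 / m1, recover_ratio(m1_ind, m2_ind, p))
```

For the expected sampled counts the two printed values agree up to floating-point error; for a single realized sample they agree only approximately, which is where the large-network approximation enters.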
The equality of (4) and (5) follows since

Here, (7) is the definition of the second moment, (10) follows from (9) by the second moment of the binomial distribution, and (11) follows from (10) because of the definitions of the first and second moments of N(k′). Finally, we describe how the ratio ⟨k²⟩/⟨k⟩ of the original graph can be obtained from the sampled graph via ⟨k⟩_ind and ⟨k²⟩_ind:

phones during March to July 2020. The legal conditions for the processing of the data were described in a report by A. Böken

proximity. This effect becomes more likely with increasing distance between GPS coordinates.

For the evaluation of this algorithm, we used a numerical approach: we set a point p1 at the origin and a point p2 at random coordinates at distance d from p1. We then create a tile with a random position that includes p1. If p2 is also within the tile, the contact at distance d was successfully detected. By repeating this procedure, a curve for the probability of contact detection as a function of the distance between GPS positions can be calculated (Fig. S6A). To calculate a similar graph for the real distances between individuals, the inaccuracy of GPS has to be taken into consideration. In order to do this, we

The result can be seen in Fig. S6B.

less than 10⁻⁴ respectively. This was further confirmed by a visual inspection of the residuals of the model. Figure S10 shows the residuals plotted as a time series, which resembles a white noise process, except for the time period corresponding to the meat factory outbreak. A plot of the PDF estimate of the residuals is shown in Figure S11; it resembles a zero-centered normal distribution. The two visible outlier "bumps" are located around −1.3 and 1.5. These correspond to the residual values of the meat factory outbreak seen in Figure S10. The ACF and PACF plots for the residuals are shown in Figure S9.
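Returning to the numerical evaluation of the tile-based contact detection described at the start of this section (points p1 and p2 at distance d), the procedure can be sketched as a small Monte Carlo simulation in Python. The sketch assumes an axis-aligned square tile whose position is drawn uniformly among all positions containing p1; the tile size, the number of trials and the function name are hypothetical, and GPS inaccuracy is not modeled.

```python
import math
import random

def detection_probability(d, tile_size, trials, rng):
    """Estimate the chance that p2, at distance d from p1 = (0, 0), shares a tile with p1."""
    hits = 0
    for _ in range(trials):
        angle = rng.uniform(0.0, 2.0 * math.pi)      # random direction of p2
        x2, y2 = d * math.cos(angle), d * math.sin(angle)
        # Lower-left tile corner, drawn uniformly so that the tile contains p1.
        x0 = rng.uniform(-tile_size, 0.0)
        y0 = rng.uniform(-tile_size, 0.0)
        if x0 <= x2 <= x0 + tile_size and y0 <= y2 <= y0 + tile_size:
            hits += 1
    return hits / trials

rng = random.Random(0)
tile_size = 1.0  # hypothetical unit; distances are given in multiples of the tile size
for d in (0.0, 0.25, 0.5, 1.0, 1.5):
    print(d, detection_probability(d, tile_size, 20_000, rng))
```

Under these assumptions the detection probability falls from 1 at d = 0 to 0 once d exceeds the tile diagonal, which gives a curve of the kind described for Fig. S6A.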