Contact Tracing: a game of big numbers in the time of COVID-19

One of the more widely advocated solutions for slowing down the spread of COVID-19 has been automated contact tracing. Since proximity data can be collected by personal mobile devices, the natural proposal has been to use this for contact tracing, as this provides a major gain over a manual implementation. In this work, we study the characteristics of automated contact tracing and its effectiveness for mapping the spread of a pandemic caused by SARS-CoV-2. We highlight the infrastructure and social structures required for automated contact tracing to work for the current pandemic. We show the vulnerability of the strategy to inadequate sampling of the population, which results in the inability to sufficiently determine significant contact with infected individuals. Of crucial importance will be the participation of a significant fraction of the population, for which we derive a minimum threshold. We conclude that a strong reliance on contact tracing to contain the spread of the SARS-CoV-2 pandemic can lead to the potential danger of allowing the pandemic to spread unchecked. A carefully thought out strategy for controlling the spread of the pandemic along with automated contact tracing can lead to an optimal solution.


Introduction
A relentless and damaging battle is being fought against the spread of COVID-19. While several countries have managed to significantly slow down its spread, severe measures have had to be taken to do so and at great cost to the economic and social well-being of the nations. It is still not certain when a significant control over the spread of SARS-CoV-2 can be attained. Recent projections propose surveillance for the next few years [1], with several measures that will need to be put in place to minimize the cost of the pandemic to humankind. Automated contact tracing is one of these measures.
Contact tracing has been observed to be effective in previous pandemics (or epidemics) like the Ebola virus outbreak in 2014-2015 [2]. This preemptive method allows for the containment of the pathogen by isolating potentially infected individuals that have been traced. Extensive studies of manual contact tracing were done during the previous outbreak of the Ebola virus [3][4][5], SARS-CoV and MERS-CoV [6]. More recently, mathematical models have been formulated to study contact tracing assuming the disease spread to be quantifiable by the SIR model [7]. However, there are reasons for us to believe that the effectiveness of automated contact tracing during the SARS-CoV-2 pandemic requires a more detailed examination given the distinct difference in the prevalence of this pandemic from the ones in the recent past and the different modes of transmission of the pathogen.
Contact tracing is not very effective against pathogens that spread like the influenza virus but is more effective for containing smallpox and SARS-CoV and partially effective in containing foot-and-mouth disease [8]. The viral shedding patterns of SARS-CoV and MERS-CoV are similar [9,10] and show almost no pre-symptomatic transmission [11]¹, while Ebola is known to be transmitted through the bodily fluids of infected individuals after the onset of symptoms [13]. On the other hand, influenza shows a significant rate of viral shedding in the pre-symptomatic stage [14]. The important transmission characteristics of SARS-CoV-2 that set it apart from other HCoV pathogens like SARS-CoV and MERS-CoV and from Ebola are:
• SARS-CoV-2 transmission is driven by pre-symptomatic spreading like the influenza virus [15][16][17].
• The pathogen can be transmitted through the air in high contamination regions and through contaminated dry surfaces for several days [15,18,19] leading to its high transmission rates. This brings about additional challenges when the disease cannot be contained within an isolated envelope of a healthcare system. While a similar spreading pattern is seen in SARS-CoV and MERS-CoV, this makes SARS-CoV-2 more easily transmittable than Ebola.
Similar to SARS-CoV, the reproductive number, $R_0$, for SARS-CoV-2 is estimated to be 2.2 to 2.7 [23][24][25][26][27]². The dispersion parameter is estimated to also be similar to that of SARS-CoV (close to 0.1), which could be causing superspreading [26,29,30,31]. While, theoretically, contact tracing can be shown to be an effective means of containing SARS-CoV-2 [30], factors such as long delays from symptom onset to isolation, fewer cases ascertained by contact tracing, and increasing pre-symptomatic transmission can significantly impact how effective contact tracing will be in practice. Defining significant contact as being within 2 meters and lasting for at least 15 minutes can result in the detection of more than 4 out of 5 infected cases but at the cost of tracing 36 contacts per individual (95th percentile: 0 to 182 contacts per individual) [32]. Changes to the definitions of the parameters can reduce the numbers traced. If the threshold of minimum contact time is increased to at least 4 hours of contact, the spread of the pathogen cannot be controlled by automated contact tracing since many potentially infected contacts will escape detection. Detailed modeling of SARS-CoV-2 transmission shows that the pandemic can be sustained just by pre-symptomatic transmission and that contact tracing can be used to contain the spread of the pathogen if there are no significant delays to identifying and isolating infected individuals and their contacts [33].
Considering all the factors that make contact tracing a different game for SARS-CoV-2, in this paper we will examine in detail how much data and participation from the population will be needed to make automated contact tracing effective. This will give an estimate of the necessary scale of implementation of contact tracing and whether it will be feasible.

The game of big numbers
To begin with, we consider a disease that spreads only in the symptomatic stage. The infected individuals can spread the disease to their contacts before they are isolated and to medical workers after they are isolated, with varying probabilities. Of significance here is that after the initial period of ignorance of the population about a rising pandemic, infected individuals will be isolated with higher efficiency (even with manual contact tracing), resulting in the curtailment of the spread of the pathogen. Why is contact tracing more effective for such diseases? Since the mobility of the infected individual usually declines after the onset of symptoms, the number of contacts at risk becomes limited to nearest neighbours and possibly next-to-nearest neighbours in the contact space. This allows the implementation of a manual contact tracing algorithm that identifies these neighbours and isolates or tests them as suggested by [8]. This was seen to be effective during the Ebola, MERS-CoV and SARS-CoV outbreaks.
However, the spreading of SARS-CoV-2 follows a very different pattern. With the prevalence of spreading of infection through pre-symptomatic and subclinical hosts, the number of individuals that might need to be traced can be very large. This has led to the belief that automated contact tracing in a wider gamut should be implemented. Most of the proposed solutions [30,32,33] require the use of historical proximity data to trace contacts. In the context of COVID-19, there are some obvious pitfalls in the algorithm:
• It is estimated that about 86% (95% CI: [82%-90%]) of the infected cases in China were undocumented prior to the travel ban on the 23rd of January 2020, generating 79% of the documented infections [23]. A large number of these undocumented cases experienced mild, limited or no symptoms and can hence go unrecognized. Similar results were reported by other studies [34,35]. It is not possible to trace all the contacts of these individuals since they will be partially reported, leading to incomplete coverage of contact tracing.
• While it is assumed that SARS-CoV-2 spreads within a proximity radius of $r_0$ (assumed to be 2 meters), not much is known about the probability of transmission, $p_t$, when two individuals come within this domain of contact for a minimum contact time $t_0$. Assuming $p_t$ to be large will lead to an unreasonably large estimate of the number of potential contacts in crowded places like supermarkets, which remain open during the period of social distancing. On the other hand, assuming $p_t$ to be small will underestimate the number of infected contacts, especially because there might be other modes of transmission of SARS-CoV-2 that are not being considered. By definition $p_t$ depends on the dynamics of disease transmission when a healthy individual comes in significant contact with a sick individual. Moreover, $p_t$ is not constant over $r_0$ and also varies with the stage of infection the infected individual is at [17]. Several other factors contribute to the value of $p_t$ in addition to the contagiousness of the disease including, but not limited to, measures such as the use of PPE, public awareness of the disease spread etc.

² Much higher reproductive rates have also been estimated with data from Wuhan, China [28].

medRxiv preprint, https://doi.org/10.1101/2020.04.22.20071043, this version posted April 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Figure 1: Interactions occur from $t = 0$ to $t = t_0 + \epsilon$ where $\epsilon \ll t_0$. A will be confirmed as COVID-19 positive in the future and C will be notified of having come in contact with A. E might be notified if E stays in contact with A for a time period greater than $t_0$.
The first pitfall can be alleviated by increasing the testing rate of individuals for viral RNA in the hope that a larger fraction of the asymptomatic or mildly symptomatic carriers can be traced. Increasing awareness can also help. The second pitfall can be alleviated when more detailed knowledge of the spread of SARS-CoV-2 is available and with the help of simulation of the spread of the disease in a population. For the rest of the work, we will assume $p_t$ to be a variable and $r_0$ to be fixed to 2 meters [32].
The real-world applicability of automated contact tracing requires the examination of the effects of finite sampling of the population. The assumption that we are working with is that enrollment in contact tracing will be voluntary and individuals remain free to do one of the following:
• Choose not to enroll in the program by either not using the application or the devices needed for tracing, including discontinuity in participation.
• Choose not to report on their health condition, which is assumed to be voluntary.
Both types of occurrences have an effect of reducing the efficiency of automated contact tracing but in slightly different manners. In the first case, not subscribing to the service would not only remove an individual from the pool that is being notified, but it also removes them from the pool of individuals that are reporting. In the second case only the latter happens. To understand this better we build a toy model that derives concepts from particle interactions.
To understand how automated contact tracing works we describe every individual by a circle in a 2D plane with a radius of $r_0/2$, which we shall call the cross-section of the individual. The cross-section is chosen such that any overlap between two cross-sections can be taken as a significant contact between the two respective individuals. Temporally, the cross-sections have to overlap for a time $t_0$, which is the threshold interaction time that is critical for an individual to infect another by proximity. For the sake of simplicity and without any loss of generality of our argument we can assume that the probability of getting infected, $p_t$, is independent of the degree of overlap of the cross-sections and of the contact duration for any time $t > t_0$. Figure 1 gives a depiction of what the individuals in this model might look like. In the left-most panel, B and C are in contact with A at $t = 0$ but not with each other. D is isolated from all of them. After a period of time $t < t_0$, B is isolated but C stays in contact with A. Then at time $t = t_0 + \epsilon$, where $\epsilon \ll t_0$, we see that C is still in contact with A, B remains isolated and E has come in contact with both A and C. Using the methods of contact tracing, if A reports sick at a future time, C will be deemed as having had significant contact with A. E might also be deemed as such depending on how long E maintains proximity with A, but the proximity of E with C need not be counted even if E spends $t > t_0$ in contact with C (if only primary contacts are traced) unless C reports sick too.
This method of automated contact tracing will work as long as A and C (and possibly E) are enrolled in the service even if B and D are not. However, D is completely isolated and by remaining so for a long time is observing social distancing from any other individual. B is representative of an individual who observes partial social distancing. Hence, for D this service is not necessary and for B it is of limited value. If C is not enrolled in the service C will never get notified if A gets sick. C might fall sick or become an asymptomatic carrier and continue contaminating others. If A does not enroll in this service then C never gets notified leading to the same conclusions but E might get notified if C declares sick and E is enrolled in the service. This is how automated contact tracing will work if only neighbours (not next-to-neighbours) of anyone reporting sick are informed.
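The toy model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: positions are sampled once per minute, $t_0$ is taken to be 15 minutes (the value quoted from [32]), the overlap must be consecutive, and the function names, track format and coordinates are hypothetical.

```python
import math
from itertools import combinations

R0 = 2.0   # proximity radius r_0 in meters (value used in the text)
T0 = 15    # threshold contact time t_0 in minutes (assumed, following [32])


def significant_contacts(tracks, r0=R0, t0=T0):
    """Return pairs whose cross-sections overlap for at least t0 consecutive minutes.

    tracks maps a name to a list of (x, y) positions, one per minute.
    Two circles of radius r0/2 overlap when their centres are closer than r0.
    """
    contacts = set()
    for a, b in combinations(sorted(tracks), 2):
        run = 0  # length of the current uninterrupted overlap, in minutes
        for (xa, ya), (xb, yb) in zip(tracks[a], tracks[b]):
            run = run + 1 if math.hypot(xa - xb, ya - yb) < r0 else 0
            if run >= t0:
                contacts.add((a, b))
                break
    return contacts


def notified(tracks, reporter, enrolled):
    """Primary contacts of `reporter` that can be notified (both sides must be enrolled)."""
    if reporter not in enrolled:
        return set()
    out = set()
    for a, b in significant_contacts(tracks):
        if reporter in (a, b):
            other = b if a == reporter else a
            if other in enrolled:
                out.add(other)
    return out


# Hypothetical 20-minute tracks loosely mirroring Figure 1: C stays next to A,
# B walks away after 5 minutes, and D keeps far away from everyone.
minutes = 20
tracks = {
    "A": [(0.0, 0.0)] * minutes,
    "B": [(1.5, 0.0)] * 5 + [(10.0, 0.0)] * (minutes - 5),
    "C": [(0.0, 1.5)] * minutes,
    "D": [(50.0, 50.0)] * minutes,
}
print(notified(tracks, "A", enrolled={"A", "B", "C", "D"}))  # {'C'}
print(notified(tracks, "A", enrolled={"A", "B", "D"}))        # set()
```

The second call shows the failure mode described in the text: once C is not enrolled, A's report reaches nobody, even though the significant contact took place.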

An estimated 45% of virus transmission occurs in the pre-symptomatic phase of an infected individual [33]. Combining this with the results of Hellewell et al. [30], it is seen that more than 80% of the contacts need to be traced and quarantined to contain the outbreak. Prevalence of subclinical infections of SARS-CoV-2 further reduces the effectiveness of contact tracing. With contact tracing using a definition of $r_0$ = 2 meters and $t_0$ = 15 minutes, more than 80% of the cases can be traced [32] if every infected case is reported. In what follows we create a simplified model of contact tracing to deduce the minimum fraction of the population that needs to enroll in the program for it to be effective.
• Let $N$ be the number of individuals in a population and $f_i$ the fraction of the population that is infected, regardless of whether they know it or not. Therefore, the true number of infected individuals is $f_i N$.
• If testing is conducted only when mild or severe symptoms are seen (i.e. excluding testing of asymptomatic cases), the number of confirmed cases is $r_c f_i N$, with $r_c$ being the fraction of the infected that will be confirmed as infected by testing.
• We define $f_e$ as the fraction of the population that is enrolled for automated contact tracing and $f_c$ as the fraction of the users that will confirm that they have been diagnosed positive. Hence, the number of individuals that have tested positive, are using contact tracing and will confirm that they are sick is $f_c f_e r_c f_i N$.
• We define $a_c$ as the average number of contacts per person in the period of time $t_0$ who are at risk of being infected due to proximity with a sick individual; it is assumed to be greater than 0.³

Since only a fraction $f_e$ of contacts are using the service, we can estimate the number of individuals that can be traced as $f_c f_e r_c f_i N \times a_c \times f_e$. To compute the number of individuals that need to be quarantined or isolated since they are now at risk of being infected from coming in contact with a sick person, we define the following.
• Since $p_t$ is defined as the probability of transmission of infection within the proximity radius $r_0$ when exposed for a time greater than $t_0$, the number of individuals at risk is, at most, $p_t f_i N a_c$.
• Finally, we define $f_T$ as the fraction of the individuals at risk of being infected that needs to be quarantined or isolated to quell the spread of the pathogen.
Therefore, the number of individuals that should be quarantined is $f_T p_t f_i N a_c$. For contact tracing to work effectively, we have,

$$ f_c f_e^2 r_c f_i N a_c \geq f_T p_t f_i N a_c. \tag{1} $$

Eq. (1) simply states that the number of individuals that can be notified by contact tracing (on the left-hand side) has to be greater than or equal to the number of individuals who need to be notified (on the right-hand side). Note that $a_c$, the average number of contacts, drops out of the inequality and hence, the inequality is independent of the population density of the region. Since the right-hand side is the minimum fraction of the population that needs to be traced we arrive at the relation:

$$ f_e^{\min} = \sqrt{\frac{f_T\, p_t}{f_c\, r_c}}. \tag{2} $$

The fraction $f_e^{\min}$ is the minimum fraction of the population that needs to be enrolled in contact tracing for it to be effective as a means of slowing down the spread of the pandemic. In eq. 2, $p_t$ depends on the spreading dynamics of the pathogen determined by individual-to-individual interactions and, therefore, also depends on the mitigating measures taken at both the population level and the individual level. The parameter $f_T$ depends on the disease spreading dynamics and can be estimated from modeling the disease spreading amongst a population [33]. The parameter $r_c$ is governed by the ability to identify infected individuals through testing and depends on the protocols of the testing program and its coverage. On the other hand, $f_c$ is determined solely by the degree to which individuals are willing and able to confirm that they have been tested positive.
Lastly, we define the effectiveness of the contact tracing, $\eta$, as the ratio of the actual number of individuals that will be notified ($f_e^2 f_c r_c f_i N a_c$) to the minimum number of individuals that should be notified to quell the spread of the disease ($(f_e^{\min})^2 f_c r_c f_i N a_c$) and get:

$$ \eta = \left(\frac{f_e}{f_e^{\min}}\right)^2. \tag{3} $$

³ Here we make a simplifying assumption that the disease has spread to only a small fraction of the population and the probability of a single healthy person meeting two sick individuals within the proximity radius in a period of 14 days and being in contact with them for $t \geq t_0$ is negligibly small in general. There will be outliers depending on the habits of individuals but we can neglect them for this analysis.
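The two quantities defined above are simple enough to evaluate directly. The following is a minimal sketch, assuming the threshold takes the form $f_e^{\min} = \sqrt{f_T p_t / (f_c r_c)}$ implied by the counting argument and that $\eta = (f_e / f_e^{\min})^2$; the function names and the parameter values in the usage lines are illustrative choices, not taken from the paper.

```python
import math


def min_enrollment(f_T, p_t, f_c, r_c):
    """Minimum enrolled fraction, sqrt(f_T * p_t / (f_c * r_c)).

    All arguments are fractions in (0, 1]. A result above 1 means that the
    threshold cannot be met even if the whole population enrolls.
    """
    if f_c <= 0 or r_c <= 0:
        raise ValueError("f_c and r_c must be positive")
    return math.sqrt(f_T * p_t / (f_c * r_c))


def effectiveness(f_e, f_e_min):
    """Effectiveness eta: notified individuals relative to the minimum required."""
    return (f_e / f_e_min) ** 2


# Illustrative (hypothetical) parameter choice.
f_min = min_enrollment(f_T=0.75, p_t=0.20, f_c=0.80, r_c=0.85)
print(f"minimum enrollment: {f_min:.1%}")
print(f"eta at 80% of the minimum: {effectiveness(0.8 * f_min, f_min):.0%}")
print(f"eta at 50% of the minimum: {effectiveness(0.5 * f_min, f_min):.0%}")
```

Leaving the result of `min_enrollment` uncapped is a deliberate choice in this sketch, so that infeasible parameter combinations show up as values above 100% rather than being silently clipped.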

Figure 2: [...] In this plot the solid and dashed lines represent $p_t$ = 20%, 15% respectively. For both panels the percentage of the truly infected individuals that will be confirmed as sick, $r_c$, is varied between 75% and 95%. Three cases for the minimum fraction of the individuals at risk that need to be traced are considered with $f_T$ = 60%, 75%, 90% in orange, green and red respectively.

In the left panel of figure 2, we show the minimum percentage of the population that needs to be enrolled in contact tracing, $f_e^{\min}$ (in %), versus the transmission probability $p_t$. We consider two values, $f_c$ = 80%, 100%, for the fraction of individuals who test positive and will confirm their symptoms to trigger contact tracing, shown by the solid and dashed lines respectively. In the right panel the variation of $f_e^{\min}$ with $f_c$ is shown. The solid and dashed lines represent $p_t$ = 15%, 25% respectively. The bands are generated by varying the fraction of sick individuals that can be confirmed as sick by testing, $r_c$, between 75% and 90%. For both the plots the minimum fraction of the individuals who are at risk that need to be traced, $f_T$, is represented by red, green and orange for $f_T$ = 60%, 75%, 90% respectively.
If we take a closer look at eq. 1 and the left panel of figure 2 we see that even with a modest probability of transmission $p_t$ (e.g. about 30%) quite a large fraction of the population (about 40% to 60%) needs to be enrolled in contact tracing even when we assume almost all of them will be actively participating in confirming when they get infected. Assuming all the traced contacts within radius $r_0$ lasting for more than a period of time $t_0$ are going to be infected is equivalent to stating $p_t$ = 100%. From the panel on the right, we can see how a fall in the fraction of individuals that confirm that they are sick, $f_c$, can increase $f_e^{\min}$. Even with quite low values of $p_t$ nearly half the population needs to be enrolled in contact tracing.
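As a rough check of the quoted range, and assuming the threshold has the form $f_e^{\min} = \sqrt{f_T p_t / (f_c r_c)}$ that follows from the counting argument, one can evaluate it at $p_t$ = 30% and $f_c$ = 100% for the most and least favourable corners of the $f_T$ and $r_c$ ranges given in the figure caption. The choice of corner values is ours and is meant only as an illustration.

```latex
\begin{align*}
f_e^{\min}\big|_{f_T = 0.60,\ r_c = 0.95}
  &= \sqrt{\frac{0.60 \times 0.30}{1.00 \times 0.95}} \approx 0.44, \\
f_e^{\min}\big|_{f_T = 0.90,\ r_c = 0.75}
  &= \sqrt{\frac{0.90 \times 0.30}{1.00 \times 0.75}} = 0.60 .
\end{align*}
```

Under the same assumption, setting $p_t$ = 100% with the favourable corner ($f_T$ = 0.60, $r_c$ = 0.95) already gives $\sqrt{0.60 / 0.95} \approx 0.79$.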
Let us try to understand why the effectiveness of automated contact tracing seems to drop so drastically with the enrollment fraction $f_e$ when it seems to have worked in some countries like China, South Korea, Singapore etc. From the left-hand side of eq. 1 we see that the effectiveness of contact tracing drops as $f_e^2$. We see that $\eta$ drops to 64% when $f_e = 0.8 f_e^{\min}$ and 25% when $f_e = 0.5 f_e^{\min}$. This non-linearity exists because $f_e$ not only reduces the number of sick individuals who can report their status but also the number of individuals who can receive a notification that they have come in contact with a sick person. This is the key difference between manual contact tracing, where all contacts are looked up and traced⁴, and automated contact tracing, where the process is driven by voluntary participation. Furthermore, as seen in figure 2, when the percentage of sick individuals who report that they are sick, $f_c$, is lower than 100%, contact tracing becomes even less effective. In addition, the percentage of cases that can actually be detected, $r_c$, will realistically be less than 100% for SARS-CoV-2 because of the prevalence of subclinical cases that will escape detection.
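The quadratic dependence can be illustrated with a small Monte Carlo sketch. It assumes that the sick person and the contact each enroll independently with probability $f_e$, which is the simplification behind the $f_e^2$ factor; the function name, sample size and seed are arbitrary choices made for this illustration.

```python
import random


def traced_fraction(f_e, n_pairs=200_000, seed=0):
    """Fraction of (sick person, contact) pairs in which both sides are enrolled.

    Each side enrolls independently with probability f_e (an assumption of
    this sketch); a notification is only possible when both are enrolled.
    """
    rng = random.Random(seed)
    both = sum(
        1
        for _ in range(n_pairs)
        if rng.random() < f_e and rng.random() < f_e
    )
    return both / n_pairs


for f_e in (1.0, 0.8, 0.5):
    simulated = traced_fraction(f_e)
    print(f"f_e = {f_e:.1f}: simulated {simulated:.3f}, f_e^2 = {f_e ** 2:.3f}")
```

Here the fractions are measured relative to full enrollment rather than relative to the minimum enrollment, but the scaling is the same: reducing enrollment to 80% or 50% of a reference value leaves roughly 64% or 25% of the traceable pairs.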
In our analysis, we have inclined towards an optimistic picture of the spread of SARS-CoV-2. We have considered only spreading due to proximity and not considered other means of spreading like contaminated surfaces and aerosols that are common for SARS-CoV-2 [15,18,19] and can increase $p_t$. In figure 2 we have taken a minimum $r_c$ of 75% when this can be even lower if widespread testing is not conducted to identify subclinical cases that can go undetected. We have also neglected the requirement for tracing secondary or tertiary contacts. In addition, we have also ignored events where a large number of individuals are infected in a very crowded location, like public events, for which thresholds like $r_0$ and $t_0$ need to be modified.

⁴ This effectively makes $f_e$ close to 100% for both those who have been diagnosed as infected and their contacts.

Despite this optimistic picture, our analysis shows that a majority of the population has to enroll and actively participate in contact tracing for the measure to work in the absence of active social distancing measures.
We have not addressed the sociological aspect of selection bias in the enrollment process. Diversity in socio-economic conditions, awareness of technology and willingness to participate in a community effort will create variation in representations amongst the population. This can lead to the most vulnerable in society getting the least benefit from the implementation of automated contact tracing. Addressing the challenges of implementing automated contact tracing in developing nations where the necessary technologies might not be accessible to a large proportion of the population lies beyond the scope of our work.

Assisted Contact Tracing
The necessary scale of implementation of automated contact tracing appears to be too large for it to be considered an effective measure to slow down the ongoing pandemic. For contact tracing to be a viable option, $f_e^{\min}$ has to be as low as possible. To achieve this either the product $f_T p_t$ needs to be decreased or the product $f_c r_c$ needs to be increased as seen from eq. 2.
• Both $f_T$ and $p_t$ depend on the dynamics of the disease spreading amongst humans. The fraction of traced cases that need to be quarantined to stop the spread of the disease, $f_T$, can be reduced by extensive monitoring of the disease to make sure sick cases are isolated as soon as possible and their contacts are traced. A delay of even a day or two can increase $f_T$, making contact tracing ineffective [33].
• Variations in $p_t$ can be caused by several factors, some of which are controllable. Since $p_t$ depends on the contagiousness of the disease and any protective measures taken against the spread of the infection, $p_t$ can be reduced by measures of limited social distancing, the use of PPE and raising public awareness about the contagiousness of COVID-19. This can pose a significant challenge in densely populated regions and regions with poor living conditions and might lead to the breakdown of the applicability of automated contact tracing.
• $f_c$ is somewhat more difficult to control, assuming the reporting of those who are confirmed sick is voluntary. It can only be increased by increasing the population's willingness to contribute to contact tracing.
• $r_c$ is the parameter that is least under control since, without very large-scale testing, asymptomatic and mildly symptomatic cases will be difficult to find. This is especially true if the infection can spread by means other than proximity alone, as might be the case for SARS-CoV-2 [15,18,19].
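The relative leverage of the four parameters listed above can be compared with a short sketch. It assumes the threshold $f_e^{\min} = \sqrt{f_T p_t / (f_c r_c)}$; the baseline values and the size of each improvement are hypothetical and chosen only to illustrate the comparison.

```python
import math


def min_enrollment(f_T, p_t, f_c, r_c):
    """Minimum enrolled fraction, assuming sqrt(f_T * p_t / (f_c * r_c))."""
    return math.sqrt(f_T * p_t / (f_c * r_c))


# Hypothetical baseline, not taken from the paper.
baseline = {"f_T": 0.75, "p_t": 0.30, "f_c": 0.80, "r_c": 0.75}

# Each scenario changes exactly one parameter relative to the baseline.
scenarios = {
    "baseline": {},
    "faster isolation (f_T 0.75 -> 0.60)": {"f_T": 0.60},
    "distancing and PPE (p_t 0.30 -> 0.15)": {"p_t": 0.15},
    "more reporting (f_c 0.80 -> 1.00)": {"f_c": 1.00},
    "wider testing (r_c 0.75 -> 0.95)": {"r_c": 0.95},
}

for name, change in scenarios.items():
    params = {**baseline, **change}
    print(f"{name}: minimum enrollment {min_enrollment(**params):.0%}")
```

Because of the square root, halving $p_t$ lowers the threshold by a factor of $\sqrt{2}$ rather than 2, so in this sketch no single measure brings the required enrollment down dramatically, which is in line with the argument for combining several measures.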
Thus we see that a combination of several measures along with a large participation of the population in contact tracing would be the optimal solution for avoiding extensive population-wide social distancing measures, reducing the cost to the economy and well-being of a nation, and allowing for greater freedom of movement during a pandemic.

Discussion
We have shown that in real-world scenarios, automated contact tracing alone cannot contain a pandemic driven by a pathogen like SARS-CoV-2. Advocating it as such can exacerbate the spread of the pathogen. The primary reasons why such a strategy will not work as effectively as projected for SARS-CoV-2 are the large degree of spreading from pre-symptomatic and subclinical hosts, and the rapidity with which the virus spreads through proximity alone if no additional measures are taken to mitigate the spread. All of these, combined with the vulnerability of automated contact tracing to insufficient sampling due to limited participation amongst the population and possibly incomplete reporting of sick cases, will prevent contact tracing from being effective enough. Even when only a small fraction of the population is infected with SARS-CoV-2, automated contact tracing can quickly require the participation of a majority of the population.
We have put together all the factors of concern and shown that they follow a simple relationship. We have further discussed how factors like the transmission probability $p_t$ should be reduced and the fraction of infected individuals that test positive, $r_c$, should be increased to assist in reducing the burden on automated contact tracing while keeping the entire process voluntary. While our focus in this paper was to address the feasibility of contact tracing for containing the spread of SARS-CoV-2, eq. 2 can be applied to the use of contact tracing for containing other pathogens too. Our analysis is also independent of the methods of implementation of automated contact tracing and the definitions of $r_0$ and $t_0$. Therefore, our approach is quite general.
The trust in contact tracing stems from the effectiveness with which it was used to contain pathogens like Ebola, SARS-CoV and MERS-CoV. However, the dynamics of the spread of SARS-CoV-2 are very different from those of these pathogens. Hence, the effectiveness of contact tracing in stopping the spread of these pathogens should not be seen as a validation of the effectiveness of contact tracing for SARS-CoV-2. To make contact tracing work in this case, a majority of the population has to enroll for this service and actively participate in it. If this cannot be established then other measures of mitigating the spread of SARS-CoV-2 should be implemented in addition. As can be seen from the success of several nations in containing the spread of COVID-19, only a judicious combination of contact tracing with measures such as partial social distancing, wide use of PPE and dissemination of information about the disease can prove to be effective in slowing down the spread of the ongoing pandemic.
