In the shadow of privacy: Overlooked ethical concerns in COVID-19 digital epidemiology

The COVID-19 pandemic witnessed a surge in the use of health data to combat the public health threat. As a result, the use of digital technologies for epidemic surveillance showed great potential to collect vast volumes of data, and thereby respond more effectively to the healthcare challenges. However, the deployment of these technologies raised legitimate concerns over risks to individual privacy. While the ethical and governance debate focused primarily on these concerns, other relevant issues remained in the shadows. Leveraging examples from the COVID-19 pandemic, this perspective article aims to investigate these overlooked issues and their ethical implications. Accordingly, we explore the problem of the digital divide, the role played by tech companies in the public health domain and their power dynamics with the government and public research sector, and the re-use of personal data, especially in the absence of adequate public involvement. Even if individual privacy is ensured, failure to properly engage with these other issues will result in digital epidemiology tools that undermine equity, fairness, public trust, just distribution of benefits, autonomy, and minimization of group harm. Conversely, a better understanding of these issues, a broader ethical and data governance approach, and meaningful public engagement will encourage adoption of these technologies and the use of personal data for public health research, thus increasing their power to tackle epidemics.


Introduction: COVID-19 as the first digital pandemic
The COVID-19 pandemic, caused by the SARS-CoV-2 virus, reinforced the role of data as an indispensable resource for fighting public health threats. For the first time in the history of epidemiology, researchers had real-time access to large volumes of health data (Johnson, 2020). Health authorities worldwide have relied on information from digital diagnostics, computer vision tools (such as temperature-sensing cameras), epidemiological surveillance of social media, and online searches, among other sources. When combined with powerful artificial intelligence (AI) algorithms and machine learning (ML)-based computational models, these datasets have provided valuable insights into regional infection rates and the evolution of the epidemic. Hence, the approach to epidemic response changed during COVID-19. We shifted from a reactive model, chasing pandemic developments and attempting to mitigate their consequences, to a dynamic one that anticipates developments and responds to the epidemic in real time. Such a shift had the potential to inform policymakers' timely responses to the health crisis (e.g., decisions about which mitigation measures to adopt).
Among other examples, the successful case of the Global.health data repository exemplifies the power of health data to curb COVID-19 (Maxmen, 2021). During 2020 and 2021, this platform stored not aggregate data but anonymized individual-level data (date of positive test, coronavirus variant, disease symptoms, hospitalization, travel history) from over 150 countries in a single open-access database. Epidemiologists worldwide could access this pool of highly granular data, compare findings across projects, and formulate more accurate hypotheses about how the virus was spreading (Kraemer et al. 2021). The success of this initiative is largely due to the involvement of the big tech industry (specifically Google), which invested human and financial resources to curate and standardize the various datasets, making them interoperable.
However, collaboration with such high-profile private partners is not always a viable option in biomedical and epidemiological research. Furthermore, the COVID-19 pandemic shed light on other issues that can undermine access to and use of health data in such contexts. One of these is the scarcity of big data repositories at the level of national research institutes and ministries of health. Even where they exist, the data are often not complete, up to date, or sufficiently granular (Oderkirk, 2021). Lack of appropriate infrastructure and technology to share these data, especially across national borders, presents a further challenge. Data governance regulations vary greatly across jurisdictions, hindering access to and exchange of data between countries for research purposes (OECD, 2021). During COVID-19, researchers in Europe may have favored a risk-averse approach to data sharing to avoid violating the General Data Protection Regulation (GDPR), which sets demanding standards in order to safeguard individual autonomy and privacy (McLennan et al., 2020). Consequently, there were calls to strengthen the open data science approach, to meet the rising demand for health data while ensuring safe data use (Gardner et al. 2021).
To address the hunger for data prompted by the race against COVID-19, a variety of digital technologies were leveraged for epidemic surveillance (Budd et al. 2020), including direct-to-consumer tools (mobile apps, social media, online search engines, wearable trackers), as well as computer vision devices and infrared cameras (Davis and Matsoso, 2020). While the discipline of digital epidemiology was established long before the COVID-19 pandemic (Salathe et al. 2012), over the past three years we have witnessed its full potential to harvest, analyze, and interpret data that were not originally collected for public health purposes. Notably, these data helped to detect COVID-19 infection and related symptoms at an early stage, monitor compliance with social distancing and quarantine obligations, track infected contacts, and provide insights into citizen attitudes about vaccination campaigns (Whitelaw et al. 2020; Mahmood et al. 2020). As epidemic surveillance went digital, actors outside the healthcare environment, such as tech giants and telecommunication companies, emerged as new stakeholders in the data ecosystem (Robinson, 2022). However, such private sector stakeholders often have interests that diverge from those of national governments, health agencies, or biomedical researchers, and apply different ethical standards to data management and use (Thomason, 2021). This increasingly complex ecosystem of stakeholders, data types, interests, and standards brought to the forefront an issue that, though historically discussed in epidemiological surveillance (Mariner, 2007), received amplified attention during the COVID-19 pandemic: the problem of privacy.

Privacy in the spotlight
The COVID-19 pandemic affirmed the paramount value of privacy in the age of digital surveillance, particularly regarding the debate on the development and adoption of digital contact tracing. At a time when vaccines were not yet available, ministries of health and policymakers around the world endorsed the use of these technologies, and thus the processing of personal data, to contain the spread of disease (European Commission, 2020; Sust et al. 2020). Some Asian countries mandated that their citizens download mobile applications that relied on GPS or geolocation from cellular towers, storing the data in centralized government archives (Blasimme, Ferretti, and Vayena, 2021). The public health emergency seemed to justify such intrusive intervention by appealing to exceptional reasons of public interest and security.
In Europe (and other Western countries), by contrast, the adoption of digital contact tracing was not flawless either. Beyond concerns about the reliability of the technology, there was debate over the risk of harm to citizens through privacy violations. Some scholars have noted that data on one's health status could be re-used by third parties to discriminate against individuals and restrict their rights (including the right to free movement, to study, or to work) (Gasser et al. 2020). Others have pointed out that institutional and government access to personal data (e.g., geospatial data) could inhibit the exercise of basic freedoms if individuals feel watched as to what they do or with whom they spend time (Gasser et al. 2020).
Because of these potential risks, and despite the urgency of finding quick and effective solutions to curb the pandemic, European advisory committees and governing bodies emphasized the need to safeguard citizens' privacy and data. This precautionary approach aligned with GDPR provisions, as well as with the opinion of the European Data Protection Board (eHealth Network, 2020; EDBP 2020b). European regulators and policymakers recommended minimal processing of personal data and adherence to technical precautions to avoid data leakage and cyber-attacks (EDBP 2020a). As a result, many European countries opted for privacy-preserving systems based on voluntariness, transparency, exchange of unidentifiable Bluetooth data, and decentralized data storage, built on the application programming interface (API) created by Google and Apple. While these efforts reflected the intention of democratic societies to protect one of their core values (i.e., privacy), they also framed the conversation on digital contact tracing, and on digital tools for epidemic monitoring more generally, in binary terms (Vayena, 2021). The tension between individual privacy and public health was reflected in an increasingly polarized, privacy-focused debate. Whereas some scholars have criticized the European approach to digital contact tracing for prioritizing privacy over public health and the safeguarding of human life (O'Connell and O'Keeffe 2021), others worry that these tools will become normalized and support a state of surveillance even in a post-pandemic world (Seberger and Patil, 2021).
While the issue of privacy dominated ethical, technical, and governance debates about digital surveillance during the COVID-19 pandemic, this did not translate into widespread adoption of digital contact tracing technologies. On the contrary, uptake of these technologies was quite modest. Among the empirical studies conducted to date to assess public perceptions and motivations regarding this phenomenon, one from the United Kingdom (UK) has suggested that lack of trust in digital health surveillance resulted from distrust in the government, and was further exacerbated by scandals involving big data corporations (Samuel et al., 2021). The fact that the discussion on digital epidemiology overlooked other relevant issues and ethical concerns may also account for this distrust. The next section explores what lay in the shadow of privacy, and considers how neglecting other ethical concerns may not only negatively affect public acceptance of digital technologies, but also undermine their power to manage public health threats.

Unaddressed issues and ethical concerns
Individual and collective harm can result from failing to harness the potential of digital epidemiology to stop epidemics. However, the use of data for epidemic surveillance and control can also cause personal and societal harm. While various stakeholders made visible efforts to address risks to individual privacy during COVID-19, such efforts cannot be considered a panacea for data ethics. Data ethics extends far beyond protecting data, ensuring control over one's information, and applying privacy-by-design technological choices (Blasimme and Vayena, 2020). Indeed, further ethical issues exist in relation to how (i.e., in which ways and by whom) and why (i.e., for which purposes) personal data are collected and used in digital epidemiology. If not promptly addressed, concerns about equity, accountability, trust, transparency, risk of group harm, and autonomy will persist and will disproportionately affect the most vulnerable, even when individual privacy is assured.

The digital divide
The COVID-19 pandemic shed light on the presence of two opposing but coexisting forces: on the one hand the abundance of data that characterized the battle against the pandemic, and on the other hand the scarce and poor quality data from certain population groups (Ibrahim et al. 2021). This duality can be understood in light of the digital divide that affected both advanced and emerging economies during the pandemic. The digital divide was evident in relation to three aspects: access to technology, ability/willingness to use technology, and variety of technologies.
First, digital technologies and communication infrastructure do not reach everyone equally. Economic, cultural, and political barriers stand between technological potential and the opportunity to exploit it (Naudé and Vinuesa, 2021). This problem is certainly prevalent in low- and middle-income countries, but during the COVID-19 pandemic it also emerged in high-income economies (Eruchalu et al. 2021; Pagliari, 2020). In Europe, for instance, because only newer smartphone models supported the Google-Apple API system, people with older devices could not use these apps (Reader, 2020). Second, even when access to technology is feasible, some people might not be online, either because of other limitations or by choice (Giansanti and Velcro, 2021). For instance, older people may lack sufficient digital literacy and skills to take advantage of digital tools. Others may be unwilling to engage in such digital endeavors (e.g., due to fear of stigma and third-party data access, or due to lack of interest) (Nachega et al. 2021). Finally, the wide array of technologies rolled out during a health crisis is matched by inconsistencies in the quality of the data they collect. The reliability of self-reported or social media datasets about disease symptoms, for example, may be lower and more difficult to corroborate than that of traditional epidemiological datasets (Campos-Castillo and Laestadius, 2020). In addition, inadequate technology validation, driven by the pressure to address an urgent health threat, only increases the likelihood of errors in datasets (Crawford and Serhal, 2020).
Although inequalities in data quantity and quality are not exclusive to the field of digital epidemiology, their impact in this context can be severe. The digital divide, data poverty, and biased datasets can impose economic, social, and health burdens on many lives, while potentially exacerbating existing health inequalities amid a public health emergency. In the COVID-19 pandemic, missing or skewed data from vulnerable groups (e.g., elderly people, those in lower-income households) resulted in undetected infections and inadequate care while the virus continued to circulate (Blom et al. 2021). Conversely, when areas were incorrectly flagged as high-risk for infection, the closure of schools and businesses affected entire communities (Mello and Wang, 2020). Because the stakes are so high, proper testing of datasets and validation of ML models, alongside technologies designed to be compatible with existing infrastructure and levels of digital literacy (Veinot, Mitchell, and Ancker, 2018), are necessary to ensure a fair distribution of the benefits of digital epidemiology. In a pandemic setting, access to these technologies becomes even more crucial. In this regard, some researchers have recently recommended including digital transformation among the determinants of health, lest the most vulnerable be the most negatively affected by health digitization (Kickbusch et al. 2021).

The role of big tech
The roles and distribution of accountability between national governments and the private sector in the biomedical and health sectors require timely clarification. This urgency arises as the asymmetry of power between these two stakeholders is growing, and during the COVID-19 pandemic sovereign states appeared to be losing ground.
As commercial technology and telecommunications companies increasingly enter the spaces of digital epidemiology, health research, medical services, and healthcare infrastructure, they control an ever-increasing amount of data (Tagmatarchi et al., 2021). Although these data bear on a public good (i.e., health), the private sector decides whether or not to make them available for research and public benefit (Kostkova et al. 2021). Unlike national governments, private companies are not bound by democratic and transparent decision-making processes, and are not held to the same standard of public accountability. For example, pharmaceutical and health insurance companies acquire and control large amounts of health data but keep their strategies for using and managing these data confidential through non-disclosure agreements. Despite efforts to standardize and update data governance within and across countries, some uses of data may still fall outside the purview of existing oversight mechanisms, particularly when it comes to publicly available data and data generated by the private sector (Ferretti et al. 2020). Moreover, the question of the liability and moral standing of the private sector when it exploits sick people for corporate profit must be considered carefully, as it affects trust in government, in corporations, and ultimately in health research (Levine, 2021). In this regard, scholars have investigated the limits of informed consent in digital health research, as well as the negative impact of insufficient public engagement and unfair distribution of benefits to data subjects (Amann et al., 2021; Paterson and McDonagh, 2018; Banks, 2020).
This aside, the heart of the matter of the "Googlisation" of health is that private companies have become indispensable players in various sectors of society, capable of providing platforms that connect the sphere of health to those of communication, marketing, education, transportation, and others (Sharon, 2022). Thanks to this methodological advantage and pervasive network, these companies offer services that not even governments can resist. A case in point is the U.S. government's recent collaboration with numerous dating apps to promote the COVID-19 vaccination campaign among young people (Judd, 2021), which illustrates the growing one-way reliance of lay citizens and governments on the private sector to address (public) health needs.
Strengthened by this power, commercial companies advocate positions that often go beyond their technological expertise. These positions may not only influence the focus of biomedical research, but can also drive societal change (Sharon, 2021). Google and Apple's prompt offer of digital support through their privacy-preserving API system in the fight against COVID-19 is one example. This case illustrates the power of corporations to set technical standards that can hardly be negotiated. Google and Apple prioritized privacy over the use of sensitive data (e.g., location data), and in doing so determined the balance between privacy protection and data access (Kahn, 2020). What remains to be seen is whether the private sector will assume the responsibility that accompanies such determinations, and how this will affect its power dynamics with national governments and public trust.

Data (re-)uses without public engagement
During the COVID-19 pandemic, many people were willing to share their data and health information to improve the public health situation. However, government authorities perhaps at times misunderstood this proactive attitude, interpreting it as a free pass to re-use these data as they wished, as long as it was for a "common good". In this regard, we witnessed a series of incidents in which law enforcement agencies accessed health data for investigations without seeking public agreement, even though these data had been explicitly collected to fight COVID-19.
A first case was reported in early 2021 in relation to the Singaporean government's app TraceTogether, a Bluetooth-based contact tracing tool relying on a centralized system. Although the app was praised for its effectiveness in monitoring the spread of the virus and enforcing COVID-19 restrictions, some criticized it harshly for failing to adequately protect citizens' privacy. As law enforcement agencies were granted access to citizens' data, the government revised the app's original privacy statement and amended legislation to justify the use of these data for serious criminal investigations (Ikeda, 2021). A similar scandal emerged in fall 2021, when Australian police accessed QR code check-in data for criminal investigations on at least six occasions, even though the data had originally been collected by digital epidemiology tools for outbreak monitoring (Galloway, 2021). Finally, the recent "Luca app" case caused a stir in the public debate, as German police successfully petitioned local health authorities to release location data collected via this check-in app, which was used to trace restaurant guests and shop customers (Pannett, 2022). By misusing data originally gathered to protect against infection, the police and prosecutors violated German data protection law. Each of these cases attracted considerable media attention.
There have also been reports of individuals being tracked, privacy being breached, and minorities being stigmatized. Some authors have noted that the misuse of data by law enforcement agencies is likely to exacerbate profiling, policing, discrimination, and criminalization of vulnerable groups and minorities (Sundquist, 2021; Spektor, 2021). Facial recognition software, thermal imaging, and other digital epidemiology tools can lead to human rights infringements, besides being relatively ineffective and inaccurate at detecting communicable diseases like COVID-19 (Roussi, 2020; Hendl, Chung, and Wild, 2020).
The limits of data access should be a matter of public engagement and deliberation (i.e., seeking public opinion through means such as referendums, polls, and co-creation opportunities), during which concerns can be voiced and benefits and risks can be understood and balanced. Lack of engagement can undermine public support and lead to the de-legitimization of public health measures, even if governments have laudable intentions (such as promoting the public good or catching criminals). If data are collected in non-transparent ways, without informing the public or obtaining permission, people may feel spied upon and betrayed, diminishing trust in institutions (Zhao et al. 2021). By accessing data for purposes of which the public does not approve, authorities erode this trust.
As an example, the Canadian government recently procured aggregate location data from a telecommunications company in order to monitor the prevalence of the pandemic in certain areas (Berendt, 2021). Despite the authorities' good intentions and the fact that such aggregates may sufficiently protect individual privacy, this decision sparked a public response and raised questions about non-transparent public-private partnerships in digital epidemiology and health research. Similarly, in the UK, concerns were raised about the use of wastewater data to forecast COVID-19 transmission, despite the absence of individual privacy violations (Tubb, 2022). These cases show once again how, regardless of the urgency and exceptional circumstances of a health emergency, compromising on transparent communication and adequate involvement of the population can hinder positive outcomes (Gable et al., 2020). Indeed, inadequate communication and misinformation may increase mistrust in health authorities and, consequently, negatively affect the adoption of public health measures.
End users may reject digital epidemiology interventions as unacceptable if their opinions and perspectives were not included in the development of those interventions. Notably, research shows that people find tools more acceptable when these involve their active participation at each design phase and align with their preferences and expectations (e.g., about data uses) (Perski and Short 2021; Westerlund et al., 2021). These preferences are influenced by context, sociocultural norms, and individual needs, suggesting that a one-size-fits-all approach to the development and implementation of these technologies may be inappropriate. The risk lies not only in the potential rejection of public health measures, but, more dramatically, in discrimination against those individuals or groups whose voices have not been heard and whose needs have not been addressed (Crawford and Serhal, 2020).
Although this is a daunting task in a health crisis, it is nonetheless crucial to promote open dialogue with stakeholders, co-design of technologies, careful assessment of the enabling context, and meaningful involvement of vulnerable individuals and marginalized groups. As a recent analysis suggested, these strategies may encourage public agency and data sharing for research purposes, while ensuring social acceptance and greater trust in public health technologies (Erikainen et al. 2021).

Conclusion: The path towards a more comprehensive ethical approach to digital epidemiology
Our experience with COVID-19 has shown that data for epidemic surveillance must be protected. Data privacy regulation and privacy-by-design certainly help to limit the frequency of data abuse. In this regard, stakeholders seem to be increasingly aware of privacy issues, as evidenced by efforts to avoid data misuse (Sharon, 2022). Yet critical lessons must still be learned and acted upon to guarantee a more ethically aligned use of digital epidemiology.
A first lesson is that, beyond privacy, there are still unresolved issues that must be critically addressed. We need to rethink what it means to use and rely upon digital epidemiology, as even guaranteed data security does not necessarily translate into fair, transparent, and correct use of data. We must redefine the ethical rationale that justifies the implementation of these technologies and the use of personal data. Such ethical appraisal and reflection must be integrated into the process of developing a technology and reiterated at various stages, from conceptualization to deployment.
A second lesson is the need to clarify the relationships and the roles of the public and private sectors in public health research and services. The definition of mechanisms to hold governments, private companies, researchers, and technology developers accountable is of paramount importance to ensure the ethical use of digital health technologies. In this regard, harvesting data for public good might not be reason enough to justify data re-use, notwithstanding individual privacy safeguards.
A third lesson is that public trust and adequate social license for data usage serve to legitimize digital surveillance interventions. Despite claims of seeking to engage with underrepresented voices and integrate their perspectives into data governance and digital technology development, this has yet to happen in practice (Agrawal, 2021). Hence the World Health Organization, among other institutions, has called for the adoption of community data oversight (WHO, 2021): both private and public sectors should seek meaningful social engagement when deploying digital health tools and using personal data for health research.
While these issues have been raised since the early days of digital epidemiology (Vayena et al. 2015), we have yet to effectively address them. The pandemic experience should serve as an opportunity to now promote a more ethically aligned use of surveillance technology against health threats in order to unlock its full potential.

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.