1 Introduction

At the turn of the twenty-first century, public health officials began to notice an increase in the number of human infections with highly pathogenic avian H5N1 influenza virus acquired from birds living in close proximity to people. The possibility of influenza pandemics is always a concern, but this zoonotic jump seemed particularly worrisome because the case fatality rate (CFR) was approximately 60%. Fortunately, the virus did not acquire mammalian transmissibility, and there were no confirmed examples of sustained human-to-human spread. Nevertheless, there was great concern that if the virus did acquire human transmissibility and maintained such a high CFR, the world could face a public health emergency of unprecedented danger. For comparison, the CFR during the 1918 influenza pandemic, which many believe to be the worst influenza pandemic in history, was approximately 2.5%.

Another development in the mid-2000s was the creation in the USA of the National Science Advisory Board for Biosecurity (NSABB), whose charge was to advise the US government on the so-called “dual-use” problem in biomedical research: research performed with beneficial goals in mind, but whose results could be misused for nefarious purposes. The NSABB delved into the dual-use controversy early in its tenure when the US government asked it to review the paper describing the reconstruction of the influenza virus strain responsible for the 1918 pandemic [1]. Although the NSABB voted to recommend publication, the editor of Science made it clear that the journal would have published the article irrespective of the NSABB vote unless the paper was classified [2]. For the next half decade, the NSABB struggled with the problem of how to deal with dual-use research in the biological sciences and proposed identifying a small subset of science, known as “dual-use research of concern,” or DURC, as the domain on which to focus efforts.

While the NSABB was formulating DURC definitions and devising recommendations, two laboratories set out to test experimentally whether H5N1 virus could become transmissible in what is thought to be the best animal model for such studies, the ferret. Research groups led by Yoshi Kawaoka in the USA and Ron Fouchier in the Netherlands took similar approaches, albeit with different starting strains, first engineering the virus to bind human receptors and then serially passaging it through ferrets. Both obtained the same answer: the H5N1 virus could attain the ability to be transmitted via respiratory droplets. They wrote manuscripts and submitted them for publication in Nature and Science, respectively [3, 4]. In late 2011, the NSABB had the opportunity to evaluate its policies and recommendations when the US government learned of the two submitted manuscripts describing these highly pathogenic avian influenza viruses (HPAIV) that had been made mammalian transmissible: it asked the NSABB to advise it on whether publication was wise given potential biosecurity concerns. The deliberations of the NSABB and its ultimate decision to recommend publication have been described in detail elsewhere, but briefly, the NSABB determined that the benefits of the research outweighed the biosecurity risks [5]. Hence, the first round of the controversy focused primarily on biosecurity issues.

After the 2012 decision on the two influenza papers, the situation quieted for a couple of years until a series of biosafety lapses at US government laboratories at the CDC and the NIH rekindled interest in the problem. These laboratory incidents received a high degree of public attention, spearheaded by the reporting of Alison Young at USA Today. This conjunction of laboratory accidents and additional follow-up publications [e.g., 6] put the H5N1 story in a new light, raising the questions of whether similar experiments in which new phenotypes are added to pathogens, gain-of-function (GOF) studies, could be conducted safely and whether they should be pursued at all. Confronted with a public outcry combined with a serious scientific debate on the benefits and risks of GOF-type experiments, in 2014 the NIH, which has administrative responsibility for the NSABB, imposed a moratorium on US-funded GOF experiments with “pathogens of pandemic potential (PPP),” those being influenza virus, severe acute respiratory syndrome (SARS) coronavirus (SARS-CoV), and Middle East respiratory syndrome (MERS) coronavirus (MERS-CoV). While SARS-CoV had disappeared from the human population thanks to a highly successful public health containment effort, the MERS-CoV outbreak had emerged in the Middle East and was ongoing at the time (and remains so). The US government also charged the NSABB with making recommendations about the future of GOF research (https://www.phe.gov/s3/dualuse/Documents/gain-of-function.pdf). We have summarized the events of 2014 in a series of papers, and those details will not be recounted here [7, 8]. Instead, our focus will be to make the case for the scientific and moral value of GOF-type research, provided that it can be conducted safely. Hence, the second round of the controversy focused less on biosecurity and intentional release and almost entirely on biosafety.

2 Benefits of Gain-of-Function Studies

We have previously argued that the H5N1 experiments, and similar GOF studies, provide benefits at many levels, from the practical to the epistemological [9, 10]. Some tangible benefits of GOF studies involving influenza virus include:

1. The results of GOF experiments can be definitive. For example, the 2012 GOF studies showing that HPAIV could acquire the capacity for mammalian transmissibility established that these influenza viruses had the biological capacity to emerge as contagious human pathogens. Before these studies, it was not clear whether the absence of human transmissibility in the isolated cases of HPAIV infection reflected a biological limitation on the ability of the virus to be mammalian transmissible or just a stochastic effect, i.e., that the necessary mutations had not yet occurred. The 2012 GOF studies unequivocally established the capacity of these HPAIV for mammalian transmission, which in turn implies pandemic potential. To our knowledge, no other experiment or analysis could have provided such definitive information. Hence, these experiments provide a warning to humanity of the dangers posed by these HPAIV strains and suggest that similar dangers lurk in other influenza strains. In this regard, similar experiments have shown that H7N1 has the capacity for mammalian transmissibility [11], extending the threat horizon to H7 viruses.

2. The results of GOF experiments can inform important biological questions. For example, GOF experiments yielded mutants whose analysis showed that higher and lower pH optima for the hemagglutinin were associated with enhanced virulence in birds and mammals, respectively [12]. This information is important for understanding how influenza viruses jump from birds to humans, a critical step in the emergence of new pandemic strains.

3. GOF experiments can be used to produce new viral strains that improve vaccine production. One of the hurdles in vaccine preparation is adapting vaccine strains to grow efficiently in eggs. GOF-type experiments can be used to identify mutations that facilitate replication of influenza strains in eggs, and this information could streamline vaccine production [13].

Are there other claimed benefits that have been derived from the H5N1 studies? At the time of publication, the authors and others argued that the results would inform surveillance efforts and vaccine development. It is unclear to us how much surveillance has benefitted, although the mutation information has been used in monitoring [14, 15]. As we and others have pointed out, there is a danger in focusing on the exact mutations discovered in the ferret experiments, because there may be other genetic routes to human transmissibility [10]. Some have noted that, given how slowly such mutations can be identified with current technology relative to how quickly influenza virus spreads, by the time they are detected in avian populations it might be too late to stop an outbreak [16, 17]. Increased awareness of the threat of H5N1, however, must certainly have led to better isolation of patients who have been exposed to the virus. In this regard, it seems reasonable to assume that any onward transmission of an avian strain would occur largely in the healthcare setting, as has been the case with SARS-CoV and MERS-CoV, and that rapid intervention could prevent a widespread outbreak.

While monitoring the mutations discovered in laboratory GOF experiments is not sufficient to predict dangers from environmental isolates, it is important to note that, although the two groups obtained viruses with different mutations, both achieved a similar phenotype of enhanced stability of the HA protein. As our ability to predict phenotype from genotype improves, this observation becomes increasingly important. A major caveat, however, is the contribution of epistasis to any given phenotype. For example, the same mutations identified by the Fouchier group, when introduced into a different H5N1 background, yielded a different HA phenotype [18]. This implies that insights gained from one set of mutations in one strain are unlikely to be generalizable to other strains, given sequence differences. Hence, with the hindsight of half a decade of work since the original controversy in 2012, it appears that GOF-type experiments are very informative with regard to big questions, such as whether mammalian virulence and transmissibility potential exists in HPAIV, but may be less useful for making fine-scale molecular predictions.

3 Developments Since 2014

As mentioned above, in 2014 the US government mandated a moratorium on GOF-type experiments involving PPP such as influenza virus, MERS-CoV, and SARS-CoV. To the best of our knowledge, US-supported experiments to examine changes in transmission of avian influenza viruses have largely stopped. The Fouchier lab did publish a follow-up report to their original Science manuscript in which they narrowed down the exact mutations that enabled transmission in the ferret model, but based on the publication date, it appears that this work was completed before the moratorium [19]. In the MERS-CoV field, the development of a small animal model that faithfully reproduces human transmission and disease has been slowed significantly by the moratorium. MERS-CoV uses the dipeptidyl peptidase 4 (DPP4) protein as a receptor, and the human and murine proteins differ enough that the virus cannot use the mouse molecule. Transgenic mice that ubiquitously express human DPP4 experience a broader set of symptoms than do humans when infected with wild-type MERS-CoV [20, 21, 22, 23]. This problem was recently overcome by developing a transgenic mouse expressing a mutant DPP4 in which two key amino acids were changed from the mouse allele to the human allele. However, the authors still needed to passage the virus through these transgenic mice to derive a GOF variant that recapitulated human disease [24]. This GOF virus was subsequently used to show the efficacy of a promising nucleotide prodrug, originally developed for Ebola virus, for treating MERS [25]. Continuing efforts to produce a mouse-adapted MERS-CoV in wild-type mice have been prohibited [26; R. Baric, personal communication].

The question remains: how does society at large, and the scientific community and regulatory agencies in particular, weigh the risks and benefits? The difficulty lies largely in trying to apply quantitative risk assessment measures to the problem. The benefits of all biological research, not just GOF research, often do not become apparent until years or even decades after the experiments are performed. The risks, on the other hand, even when theoretical, manifest themselves in the present, when the experiments are done. Fortunately, we have not had real examples of laboratory accidents leading to significant morbidity or mortality. The 1977 reintroduction of H1N1 into human circulation has been attributed to a laboratory mishap, although not all agree, since other explanations are possible [27, 28, 29]. As discussed below, some authorities have presented worst-case scenarios based on hypothetical calculations, whereas others have used similar types of data to arrive at numbers at the opposite extreme of the spectrum, arguing that the likelihood of such events is extremely low.

Perhaps the greatest benefit of this controversy has been increased attention to biosafety when working with dangerous pathogens. In 2014, the NIH proposed a Biosafety Stewardship Month as a means to promote safer laboratory practices, and many institutions took advantage of this opportunity. Indeed, while it can be misleading to draw strong conclusions from the absence of events, the types of incidents that found their way into the news media earlier in this decade seem to have decreased significantly. Another benefit of the controversy is that it has stimulated efforts to gain comparable information in systems that do not pose the same biosafety or biosecurity risks as working with the wild-type virus. For example, a recent study reporting that only three mutations were needed to switch H7N9 tropism to human cells was performed using an attenuated virus [30].

4 The Moral Dimension

GOF research, especially that involving a PPP, has become the focus of intense ethical debate in the wake of the 2014 moratorium. The ethical concerns have been formulated along two main axes of criticism.

4.1 Misuse of GOF Research

The first concern is the potential for bad actors to misuse any information or product generated by GOF research. The worry here is that, once the results of GOF studies are published, bad actors could replicate the work for nefarious purposes, such as a terrorist attack. Such concerns were expressed, for example, in connection with the aforementioned 2012 publication of studies of engineered avian influenza virus transmissibility in ferret models [3, 4, 5]. As noted above, the question of whether to permit publication was considered by the NSABB, which eventually recommended publication, judging that the risk of misuse was outweighed by the potential benefits.

This is neither the first nor the last occasion on which questions have been raised about the risks of dual-use research, with examples ranging far beyond biomedical research to include such technologies as unmanned aerial vehicles being used as weapons and the hacking tools behind the Stuxnet cyberattack. Cyberwarfare tools can enhance national security by disarming opponents while at the same time finding employment in crime, as occurred when leaked National Security Agency code was used in the spring of 2017 for global ransomware attacks.

Such misuse of GOF research is certainly possible. But one serious problem in an ethical evaluation of possible misuse is the difficulty of estimating the likelihood and potential impact of misuse. One can imagine numerous different scenarios involving everyone from major state actors and non-state terrorist organizations to freelance mischief-makers working in a home basement lab. There is no way to quantify over so many possibilities, let alone to assay systematically the effects that such a diverse array of actors might achieve.

The misuse concern is further complicated by the fact that the cost and complexity of using relevant tools, such as CRISPR/Cas9, continue to decline, making it ever easier to replicate even unpublished research. Indeed, the risks of the democratization of CRISPR/Cas9 technology go well beyond GOF research, and there has been a call for a much more concerted, international, public debate about monitoring and possibly regulating access to some of these tools [see 31].

Given these difficulties, what is the responsible way for GOF researchers to proceed with respect to potential misuse? This is a place where prudence might be a better guide than cost-benefit analysis. Thus, one notes that, for the researcher’s own safety, GOF research with especially dangerous pathogens must be done in labs with very high biosafety standards, at least biosafety level (BSL)-3. That means that, except for the rare suicidal individual, such research will be feasible only in countries with the necessary biosafety facilities and protocols. The list of such facilities today includes mostly nations with proven records of responsible research conduct, nations that also have the technical expertise to conduct such research entirely on their own. Of course, one could imagine a nation like North Korea aspiring to a bioweapon capability, and as its successes in offensive cyber operations, ballistic missiles, and nuclear weapons demonstrate, there is no lack of technical talent there.

For all of these reasons, the risk that published GOF research will be misused seems to have receded from the forefront of concern. In addition, the scientific publishing community is much more aware of these issues, and many journals have instituted internal reviews for papers that include DURC [e.g., 32, 33]. The more prominent worry today is about accidentally unleashing the very kind of global pandemic that one was seeking to prevent.

4.2 Accidental Release of Highly Virulent Pathogens

A number of critics have argued that the risk of inadvertently creating a global pandemic through the accidental release of an engineered, human-transmissible pathogen with high virulence and case fatality rate vastly outweighs any benefits that might be obtained from such research [e.g., 34]. Thus, one source estimates that each year of research in a BSL-3 lab carries a 0.01% to 0.1% probability of an accidental release of highly transmissible influenza virus, an event that would kill between 200,000 and 16 million people [35, 36]. If this is a reliable estimate, that's a scary prospect. On the other hand, one of the authors of the original H5N1 studies has calculated the risks to be much, much lower [37].
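To see what these figures imply on an expected-value basis, one can simply multiply the quoted probabilities by the quoted death tolls. The following is back-of-the-envelope arithmetic using only the numbers above, not an endorsement of either input:

\[
\mathbb{E}[\text{deaths per lab-year}] = p_{\text{release}} \times N_{\text{deaths}},
\]

which, for the quoted ranges, runs from

\[
10^{-4} \times 2 \times 10^{5} = 20 \quad \text{to} \quad 10^{-3} \times 1.6 \times 10^{7} = 16{,}000
\]

expected deaths per laboratory-year. The nearly thousandfold spread between these bounds is itself a measure of how uncertain the inputs are, a difficulty taken up in the paragraphs that follow.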

Given a claimed risk on that high a scale, how shall we think about the balance between benefit and risk? First, we must ask some tough questions about the risk analysis itself. Many factors go into such an analysis, including historical data on accidental exposure in BSL-3 and BSL-4 labs and simulation studies of disease spread after accidental exposure. We consider each of these factors separately.

To date, only one reasonably sophisticated simulation study of the spread of a potentially pandemic influenza virus has been reported [38]. This is the study cited by Lipsitch and Inglesby [35, 36]. The researchers who conducted the simulation study had to make many assumptions about such variables as virulence, case fatality rate, latency time, demographics, geography, the response capabilities of public health authorities, and the monitoring of lab personnel for symptoms of exposure. The simulation model was robust against variations in many of these parameters, but some variations, such as early detection of the initial exposure, produced considerable differences in outcome. Still, on the whole, the numbers are grim. With the right combination of factors, it is theoretically possible for an accidental release to wreak havoc. That brings us to the question of how probable such a catastrophic event is and to the science of risk-benefit analysis.
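To make concrete how such simulation studies estimate outbreak probabilities, here is a deliberately simplified Monte Carlo branching-process sketch in Python. It is emphatically not the model of the cited study [38]; every parameter (the reproduction number, the early-detection probability, the containment threshold) is an assumption chosen for illustration only.

```python
"""Toy Monte Carlo branching-process model of spread after a lab exposure.

This is NOT the model of the cited simulation study [38]. It is a minimal
sketch of how such analyses are typically structured; every parameter
value below is an illustrative assumption, not a published estimate.
"""
import numpy as np

rng = np.random.default_rng(seed=42)

R0 = 1.5                # mean secondary infections per case (assumed)
P_EARLY_DETECT = 0.5    # chance the exposed worker is caught and isolated (assumed)
ESCAPE_THRESHOLD = 100  # cumulative cases we treat as "containment failed" (assumed)
N_TRIALS = 100_000      # number of simulated accidental exposures

def containment_fails() -> bool:
    """Simulate one accidental exposure; return True if cases exceed the threshold."""
    if rng.random() < P_EARLY_DETECT:
        return False  # early detection: index case isolated, no onward transmission
    total = active = 1
    while active and total < ESCAPE_THRESHOLD:
        # Each active case infects a Poisson(R0) number of new people.
        offspring = int(rng.poisson(R0, size=active).sum())
        total += offspring
        active = offspring
    return total >= ESCAPE_THRESHOLD

failures = sum(containment_fails() for _ in range(N_TRIALS))
print(f"P(uncontained spread | one exposure) ~ {failures / N_TRIALS:.3f}")
```

In this toy model the escape probability scales linearly with the chance that early detection fails, which echoes the cited study's observation that early detection of the initial exposure is among the variables to which outcomes are most sensitive.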

The other major component of the risk calculation, the chance of accidental exposure or release, is generally estimated on the basis of historical data, which, with its foibles, is far less robust. The spate of accidents in US BSL-3 labs a few years ago rightly aroused concern among experts, but it also generated in the public mind a perception of risk that may not accurately represent the current state of affairs, now that more stringent monitoring and review procedures have been implemented. The annual rate of accidental exposure today could easily be ten to one hundred times smaller than it appeared to be in 2013. Of course, a potential toll of 200,000 to 16 million deaths is still reason for worry, however small the annual probability. But then there are other factors to consider. The reported accidental exposures aggregate data on all organisms that require high levels of biosafety containment. None of those cases involved GOF research, and, given the attention being paid to GOF by comparison with research on, say, natural pathogens, it is not at all clear that we can reasonably extrapolate from the historical data on accidental exposure to the risk of accidental exposure or release in GOF research.

There are still deeper problems with a cost-benefit approach to assessing the ethics of GOF research. First, the risk of accidental release and an attendant global pandemic is only one small part of a more comprehensive cost-benefit analysis, which must always compare the costs and benefits of alternative courses of action. Proponents of GOF research argue that it can play a crucial role in preventing or lessening the effects of a global pandemic by enabling mitigation measures such as early detection and the rapid development of vaccines. One obvious additional risk, then, is the risk of a global pandemic that might have been prevented or mitigated by continuing GOF research. That risk is even more difficult to quantify than the risk of a pandemic through accidental release. Still, it must be part of a comprehensive analysis, and since its major consequence, namely a global pandemic, is just as severe as a pandemic caused by accidental release, such a scenario would loom just as large as the accidental release scenario in a thorough cost-benefit analysis.
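One schematic way to see this symmetry, with every symbol hypothetical: let $D$ be the harm of a global pandemic, $p_{\text{acc}}$ the probability that continued GOF research causes one by accidental release, $p_{\text{nat}}$ the probability of a naturally emerging pandemic, and $\Delta m$ the fraction of natural-pandemic harm that GOF-derived knowledge would avert. A comprehensive analysis must then weigh

\[
\underbrace{p_{\text{acc}}\, D}_{\text{added risk of continuing}} \qquad \text{against} \qquad \underbrace{p_{\text{nat}}\, \Delta m\, D}_{\text{added risk of halting}},
\]

and because both sides scale with the same catastrophic harm $D$, neither scenario can be dismissed a priori; everything turns on the poorly known quantities $p_{\text{acc}}$, $p_{\text{nat}}$, and $\Delta m$.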

Another deep problem with conventional cost-benefit analyses is that they usually do not include the intrinsic benefit of new knowledge [9]. One might think that “knowledge” is something too ephemeral to be quantified, or one might argue that, if knowledge counts as a “benefit,” it does so only through the consequences of that knowledge for human flourishing, which means that it enters our calculations indirectly through measures of the costs and benefits of human health and human suffering. One does not want to indulge uncritically in the mantra of knowledge for knowledge’s own sake, a trope that has too often been used to justify research, but the fact remains that, on the whole, increased knowledge brings increased ability to promote the good and to mitigate suffering. New knowledge can always be used for evil ends, but how we use that knowledge is a moral choice, and if we do not have the knowledge in the first place, then we cannot use it for good. Exactly how new knowledge will contribute to future human flourishing in a complex and rapidly changing technical and social world is usually unforeseeable at the time the knowledge is first generated. Technology ethicist Shannon Vallor dubs this the problem of “technosocial opacity” [39]. It would thus seem impossible to include the long-term effects of new knowledge production in a cost-benefit analysis of the research generating that knowledge. The only way to fairly assess the benefits of such research is to put a premium on knowledge production per se. How to do that is the hard problem, and that difficulty must be added to the list of reasons suggesting that cost-benefit analysis is a seriously limited tool for policymaking about scientific research.

Perhaps the deepest problem with the cost-benefit approach to policymaking is what we have termed elsewhere the “apocalyptic fallacy” [40]. Imagining an unintentional global pandemic as one possible outcome of GOF research is tantamount to assigning infinite negative utility to that outcome in a cost-benefit analysis. Thus, it makes no difference how low the probability of such an outcome might be. Infinite negative utility, multiplied by any finite probability, totally swamps every other term in a cost-benefit analysis, meaning that no imagined benefits of alternative courses of action, however great (excepting eternal salvation for all of humankind), can make a difference in the calculation. What this means is that cost-benefit analysis is, effectively, useless in such settings, at least with regard to the moral assessment of something like GOF research. If cost-benefit analysis is the epitome of reason and rationality in policymaking, then reason fails us in cases where the risks include the extermination of a significant fraction of all human life.
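The structure of the fallacy can be restated in one line of expected-utility arithmetic (a schematic rendering of the argument above, not an addition to it). If the pandemic outcome is assigned utility $-\infty$ and occurs with any probability $p > 0$, then

\[
\mathbb{E}[U] = p \cdot (-\infty) + (1 - p)\, U_{\text{other}} = -\infty,
\]

no matter how small $p$ is and no matter how large the finite benefits collected in $U_{\text{other}}$ may be. The calculation is decided before any evidence about $p$ or $U_{\text{other}}$ is consulted, which is precisely why it cannot guide policy.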

Nevertheless, there is still value in risk analysis. As the above-cited simulation study of disease spread after accidental exposure and release amply demonstrates, a careful risk analysis can point us toward those factors that are critical for minimizing risk, such as enhanced biosafety protocols and rapid public health responses. But when promoted to the public and policymakers as forecasting a doomsday scenario, such analyses risk inducing panic that overwhelms practical reason.

Similarly, comparisons of the risks of GOF research to the events leading to the development of the Nuremberg Code do not serve a useful purpose in the discussion [34]. The word “Nuremberg” connotes an association with war crimes that is simply inappropriate in a rational discussion about experiments ostensibly being done by well-meaning scientists trying to prepare humanity to confront a potential pandemic. Although we are well aware that the Nuremberg Code and the war crimes of the upper echelons of the Nazi hierarchy are very different things, the problem lies in the symbolism of the word and how it may be perceived by the public. We urge that it not be used; indeed, it is possible to discuss these important principles without invoking the name of a city with such historical baggage.

If cost-benefit analysis is not the optimal tool for policymaking with respect to GOF research, what is? Earlier, we mentioned the virtue of simple prudence in thinking about dual-use research and technology. Perhaps that is the answer here as well. Technology ethicists have given the name “precautionary principle”—policymakers prefer sophisticated names like this—to the homespun idea of prudence. But prudence is really a simple idea, familiar to us all. It means, “think before you act” and “do not act if you are not sufficiently secure in thinking that you will do more good than harm.” Thinking before acting should include risk analysis in settings such as GOF research. But a risk analysis is not the end of clear thinking; it is only the beginning.

5 Conclusions

Where does this leave us? One major conclusion is this: for almost every form of human activity, there is a nonvanishing probability of catastrophic outcomes. Any casual individual act could set in motion a chain of events leading to the next world war. This is highly improbable but possible, and the consequences would be cataclysmic, outweighing any imaginable good that might derive from that action. Therefore, the precautionary principle would suggest that we do nothing. But doing nothing can equally well engender a sequence of events leading to human extinction. We conclude, therefore, that one should act. This dilemma highlights a fatal contradiction lurking within any attempt to use cost-benefit analysis to rationally assess actions that might entrain consequences of infinite negative utility.

How otherwise should we proceed? This is not a hard question. Think about both risks and benefits, take obvious precautions, and then make the prudent choice. With enhanced biosafety protocols and improvements in the public health response, we should not ban GOF research but monitor it. The relevant research communities should insist upon stringent norms for the conduct of the research and for its safety protocols. Provided that these conditions are met, there is no obvious reason why GOF-type experimentation should not go forward.