The growing availability of microbial genomes sequenced for health care rather than research raises the question of whether such data should be included in an individual’s electronic health records (EHR). While integrating human genome data into EHR has been widely discussed1, microbial genomic data bring unique and important challenges. One challenge is that the ownership of microbial genomic data remains ambiguous. Genomics service providers, public health agencies that fund such services, and patients all consider themselves stakeholders in genomic data governance. While human DNA defines our identity, this cannot be said about the genomes of coronaviruses, Salmonella or other microorganisms. These pathogens are temporary residents, and their genomes can remain the same when sampled from different humans, especially those within a transmission chain2.

Another challenge is that microbial genomics is a highly specialized field, with data from a growing and increasingly complex spectrum of pathogens and commensal flora that includes viruses, bacteria, medically relevant fungi and parasites. Each of these pathogens represents a different disease with unique pathology and epidemiology and comes with specific terminology standards3.

Nevertheless, the utility of microbial genomics in controlling communicable diseases is clearly established4, as demonstrated during the COVID-19 pandemic5. The inclusion of microbial genomic sequences within EHR together with comprehensive standards for data interoperability will enhance surveillance and disease prevention and can be guided by five principles (Box 1).

Population-level data

The first principle for including microbial sequencing data in EHR is that such data should be linked to population-level data, as this is critical for disease prevention and control analyses. This ensures that important public health and population data are available within the EHR of individual patients. Data linkage between healthcare records can improve the effectiveness of public health surveillance and interventions4,5,6. Multi-jurisdictional outbreaks and the cross-border spread of diseases have been understood and controlled through the sharing of genomic data. The similarity between the genomes of pathogens recovered from clinical cases of food poisoning and from contaminated foods has been instrumental in identifying sources and implementing preventative public health actions3,4.
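The genome comparison described above can be illustrated with a minimal sketch. It assumes that consensus sequences have already been aligned to the same length, and it links clinical isolates to food isolates when they differ at no more than a chosen number of positions. The function names, the toy sequences and the threshold of 2 differences are hypothetical; real thresholds are pathogen-specific and are not given in this article.

```python
from itertools import product


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count differing positions between two aligned, equal-length sequences.

    Positions where either sequence has an ambiguous base ('N') or a gap ('-')
    are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"N", "-"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in skip and b not in skip
    )


def link_cases_to_sources(clinical: dict, food: dict, max_snps: int = 2) -> list:
    """Return (case_id, food_id, distance) for every pair within max_snps."""
    links = []
    for (case_id, case_seq), (food_id, food_seq) in product(
        clinical.items(), food.items()
    ):
        distance = snp_distance(case_seq, food_seq)
        if distance <= max_snps:
            links.append((case_id, food_id, distance))
    return links


# Toy data (hypothetical): three clinical isolates and two food isolates.
clinical = {
    "case_1": "ACGTACGTAC",
    "case_2": "ACGTTCGTAC",
    "case_3": "TTGTACCTAA",
}
food = {
    "food_A": "ACGTACGTAC",
    "food_B": "GGGTACGTCC",
}

links = link_cases_to_sources(clinical, food)
for case_id, food_id, distance in links:
    print(f"{case_id} <-> {food_id}: {distance} difference(s)")
```

In this toy example, case_1 and case_2 are linked to food_A, whereas case_3 is not linked to either food isolate. Production pipelines typically add quality filtering and phylogenetic context, which this sketch omits.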

The value of such shared data is a function of dataset size. The larger the sets of shared genome sequences, the more representative they can be of current disease activity and the easier it is to identify clusters of infections with a common source, which can then be traced and acted upon. A modelling study predicted that for each additional 1,000 genomes of foodborne bacteria added to a database, there are ~6 fewer cases per pathogen per year7. As genotyping of drug-resistant bacteria and pathogens with epidemic potential becomes the standard-of-care in infectious disease control, it will permit fine-grained observational analysis of treatments and outcomes, shaping best practice recommendations and identifying malpractice. Table 1 describes the main benefits and risks of microbial genomics data linkage in health. To minimize risks such as privacy breaches, only minimal metadata (such as outbreak location) are shared along with the microbial genomes, especially in publicly accessible databases.
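To make the scale of the modelling estimate concrete, the figure of ~6 fewer cases per pathogen per year for each additional 1,000 genomes can be turned into a short worked calculation. The sketch below assumes that the relationship extrapolates linearly, which is a simplification introduced here and not a claim of the cited study; the function name and parameters are hypothetical.

```python
def estimated_cases_averted(
    genomes_added: float,
    n_pathogens: int = 1,
    cases_per_1000_genomes: float = 6.0,
) -> float:
    """Estimate annual cases averted, assuming a linear effect.

    The default of 6 cases per 1,000 genomes is the approximate figure quoted
    in the text; linear scaling is an assumption made for illustration only.
    """
    if genomes_added < 0 or n_pathogens < 0:
        raise ValueError("inputs must be non-negative")
    return genomes_added / 1000 * cases_per_1000_genomes * n_pathogens


# 5,000 additional genomes for one pathogen
print(estimated_cases_averted(5000))                  # 30.0
# 2,500 additional genomes for each of three pathogens
print(estimated_cases_averted(2500, n_pathogens=3))   # 45.0
```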

Table 1 Benefits and risks of microbial genome data linkage in EHR

Patient care

The second principle for including microbial sequencing data in EHR is that the data must have a specific and essential role in managing a patient’s health and in shaping decisions about their care. The value of microbial genome sequencing lies in the recognition and tracking of transmission events, as well as the identification of antimicrobial resistance and of clinically relevant co-infections with distinct variants of the same pathogen, potentially improving risk assessment and the selection of treatment8,9,10. Three use cases in Table 2 provide examples of the added value of genomic data in patient management and infection control. In these examples, microbial genomic data offer high-resolution diagnostic information, enable the detection of genomic markers of drug resistance in difficult-to-treat and high-consequence diseases and allow risk stratification for healthcare-associated infections.

Table 2 Clinical case studies using microbial genome sequencing

Genomics data are usually available as raw sequences produced by next-generation sequencing, consensus sequences inferred by bioinformatics pipelines or text files reporting laboratory interpretations. Additionally, metadata may contain information about a patient’s demographics, data about specimen collection and exposure history, or the results of phenotypic tests on the sequenced pathogen. While many clinical applications would be well served by laboratory reports, some require less-processed data. Consensus sequence data may be needed to provide a high-resolution diagnosis in cases of severe disease, and raw sequence data may be expected to support the accurate detection of drug resistance in an actionable timeframe or to guide infection control measures for hospital-acquired infections (Table 2).

Interpretable data

Microbial genomic data must be interpretable by a clinician. Typically, diagnostic laboratories report processed rather than raw data to requesting clinicians. Results of molecular tests such as PCR are often provided with a statement such as ‘nucleic acid of a pathogen was detected’ but not the primary data, like melting curves from the PCR. The specialized nature of genomic data creates significant complexity for microbial genome analysis outside of genomic laboratories. There is a risk of misinterpreting specialized genomic data or incidental findings after the initial laboratory report, which could impact patient safety and privacy, with medico-legal implications.

However, advances in genomics and data analytics challenge this approach of withholding information to avoid confusion or misinterpretation by clinicians and encourage the full sharing of sequencing data as well as the provision of support for clinicians when interpreting data. Historically, clinicians have become adept at interpreting signal data such as electrocardiograms or radiological images, either as part of comprehensive training or for specific clinical issues. There is no reason to expect that infectious disease specialists will not gradually upskill in genomics. If microbial sequence data are to be retained in EHR, competence in understanding these data should be a requirement and part of medical training, and the interpretation of genomic data should be aided by computerized decision support.

Decision making

Microbial sequence data must be presented in such a way that they can be directly and meaningfully interrogated by a clinician or informatics specialist or be integrated into software that provides the requisite decision support1. The representation of sequence data in decision support software must permit computational access and manipulation, for example, through the creation of summary visualizations, the identification of features of interest or annotation with relevant and up-to-date external information, given that knowledge about pathogen genomics, therapeutic options and links to known local outbreaks evolves rapidly. Currently, the integration of structured microbial genomic information into EHR to support patient care remains limited11. Microbial genomics reports are generally still presented as text files, and more structured delivery will require embracing information standards. The research community needs to invest in the implementation of standards for microbial genomics metadata to ensure that these data are interoperable, findable, searchable and reusable12.
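As an illustration of what a structured, machine-readable report could look like in place of a free-text file, the following sketch defines a small record type and serializes it to JSON. The schema, field names and example values are hypothetical and do not follow any particular information standard; a real implementation would map these fields to whichever standard the health system adopts.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Levels of processing mentioned in the text: raw reads, consensus sequences,
# and laboratory interpretations. The labels themselves are hypothetical.
DATA_LEVELS = {"raw_reads", "consensus", "interpretation"}


@dataclass
class ResistanceMarker:
    gene: str
    mutation: str
    drug: str
    interpretation: str  # e.g. "resistant" or "susceptible"


@dataclass
class MicrobialGenomeReport:
    report_id: str
    organism: str
    specimen_type: str
    collection_date: str  # ISO 8601 date string
    data_level: str
    lineage: Optional[str] = None
    cluster_id: Optional[str] = None
    resistance_markers: List[ResistanceMarker] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.data_level not in DATA_LEVELS:
            raise ValueError(f"unknown data_level: {self.data_level}")

    def resistant_drugs(self) -> List[str]:
        """Drugs with at least one marker interpreted as resistant."""
        return sorted(
            {m.drug for m in self.resistance_markers
             if m.interpretation == "resistant"}
        )

    def to_json(self) -> str:
        """Serialize the report so that other software can query it."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Illustrative example record (identifiers and values are made up).
report = MicrobialGenomeReport(
    report_id="example-0001",
    organism="Mycobacterium tuberculosis",
    specimen_type="sputum",
    collection_date="2022-03-14",
    data_level="interpretation",
    resistance_markers=[
        ResistanceMarker("rpoB", "S450L", "rifampicin", "resistant"),
        ResistanceMarker("katG", "S315T", "isoniazid", "resistant"),
    ],
)

print(report.resistant_drugs())
print(report.to_json())
```

Because every field is named and typed, decision support software can filter, summarize or annotate such records directly, which is difficult to do reliably with free-text reports.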

Privacy and access

Microbial genomics data that have been generated as part of healthcare delivery have often also been stored in health databases and registries focused on notifiable conditions. The value of sharing genomics data is shaped by its timeliness. Withholding human genome sequence data is commonplace among researchers and companies, but delays in pathogen data sharing can reduce the likelihood that transmission events are recognized in an actionable timeframe, and the negative effects of data hoarding on disease control have long been recognized13. Microbial genomic data stored in EHR serve different purposes to the data captured in public health or open databases or biobanks. Sequencing data in EHR can support prescribing decisions, diagnostic testing and prognosis. By contrast, data in public health information systems underpin public health investigations of disease clusters and non-pharmaceutical interventions. Biobanks collate data and the associated samples from healthcare providers and researchers to facilitate reanalysis of genomic data. We would thus expect that different types of genomic data and different types of data governance arrangements would be needed to support clinical management, population health and translational research.

While open databases can support open science and crowd-sourced discovery, genomic data linked to individual and public health records can support clinical trials and can be used to measure the outcomes of treatments and population health interventions. Sharing microbial genomic data and metadata associated with diseases affecting humans and animals and involving cases across different jurisdictions requires collaboration between sectors, including human and animal health, food and environment, and relevant government, commercial and not-for-profit stakeholders3,14.

Genomic data providers must reduce the risks to individuals of privacy-invasive or reputation-damaging inferences that could be drawn from microbial genomics data, and they must protect the legally recognized rights of individuals to access personal data and contest interpretations of their data.

Precision public health

One benefit of genomic data sharing is more efficient public health surveillance, which will enable better-targeted and more nuanced interventions, a model referred to as precision public health. Furthermore, local data can be compared with global context data (such as genomes reported as circulating in other countries), providing opportunities for research and development (Table 2). Indeed, the World Health Organization has made genomic surveillance a global health priority15. The risks of data sharing relate to patient privacy, confidentiality and data security, and the onus to reduce them is largely borne by data donors. The risk of re-identification of patients from publicly shared microbial genomic data is considered to be low, but the acceptable level of risk also depends on the sensitivity of the data. For example, a re-identification risk of 10% may be considered acceptable for SARS-CoV-2 sequence data, but for data associated with sexually transmitted infections, some guidelines recommend a risk threshold of 5%, as there is a greater potential harm16.
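The risk thresholds quoted above can be applied in a simple pre-release check. The sketch below assumes one common way of estimating re-identification risk, namely 1 divided by the size of the smallest group of records sharing the same combination of metadata values. That estimator, the field names and the toy records are assumptions introduced for illustration; the article does not specify how the risk is calculated.

```python
from collections import Counter

# Thresholds taken from the text; the dictionary keys are hypothetical labels.
RISK_THRESHOLDS = {
    "SARS-CoV-2": 0.10,
    "sexually transmitted infection": 0.05,
}


def max_reidentification_risk(records: list, quasi_identifiers: list) -> float:
    """Worst-case risk, estimated as 1 / size of the smallest metadata group."""
    if not records:
        raise ValueError("no records to assess")
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return 1 / min(groups.values())


def acceptable_to_share(
    records: list, quasi_identifiers: list, threshold: float
) -> bool:
    """True if the estimated worst-case risk does not exceed the threshold."""
    return max_reidentification_risk(records, quasi_identifiers) <= threshold


# Toy metadata: 20 genomes from one region and month, in two age bands of 10.
records = [
    {
        "location": "Region A",
        "month": "2021-07",
        "age_band": "20-39" if i < 10 else "40-59",
    }
    for i in range(20)
]

minimal = ["location", "month"]
detailed = ["location", "month", "age_band"]

print(max_reidentification_risk(records, minimal))   # 0.05
print(max_reidentification_risk(records, detailed))  # 0.1
print(acceptable_to_share(records, detailed, RISK_THRESHOLDS["SARS-CoV-2"]))
print(acceptable_to_share(
    records, detailed, RISK_THRESHOLDS["sexually transmitted infection"]
))
```

In this toy case, releasing the age band alongside location and month keeps the estimated risk within the 10% threshold but exceeds the 5% threshold, which illustrates why more sensitive data call for sparser metadata.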

Genomics service providers are sequencing-data donors, while clinicians and epidemiologists are data recipients, and the two have different responsibilities and expectations of this data sharing arrangement. Laboratory data donors are increasingly concerned about the legal and ethical implications of incidental findings made by data recipients, misuse of data and profiteering from re-using data with limited attribution of intellectual property, as well as the lack of accountability of data recipients due to distributed and unpredictable data re-use.

Genomic data have been treated as an emerging asset, with privately run DNA marketplaces paying individuals for their genomic and personal data either with money or with in-kind payments. As personalized medicine becomes the norm, sequence data from individual people and their pathogens may become more valuable17. Some countries assert that microorganisms located within or isolated from their territories are their sovereign property and should be protected as elements of their natural diversity, a concept referred to as microbial sovereignty18.

Microbial genomic testing has shifted from being a reasonable step to prevent the spread of infectious diseases to the standard of care for many of them. Microbial genomics data can improve infection control and prevent hospital-acquired infections. This genomic evidence on preventable transmission events in healthcare settings can also be used by parties outside healthcare systems for claims about malpractice. The importance of the responsible analysis and reporting of disease clusters using genomics data cannot be overstated. If accurate, such discoveries should improve health care delivery, but the economic and reputational repercussions of making invalid inferences from genomic data can be significant for healthcare systems, industries and nations.

New ways of managing microbial genomic data are required to maximize the benefits of such data for patients and society. There is value both in integrating microbial sequencing data into EHR for patient care and population health and in sharing genomics data, which in turn should expand the secondary uses of data from EHR. The inclusion of microbial genomes in EHR should be based on the principle that such data are shared responsibly and in a timely manner. Data processing tools are needed to integrate and contextualize the clinical analyses of microbial sequences, which will reduce the complexity of sequencing data interpretation for healthcare providers and patients.