SARS-CoV-2 Genomic Epidemiology Dashboards: A Review of Functionality and Technological Frameworks for the Public Health Response

During the coronavirus disease 2019 (COVID-19) pandemic, the number and types of dashboards produced increased to convey complex information using digestible visualizations. The pandemic saw a notable increase in genomic surveillance data, which genomic epidemiology dashboards presented in an easily interpretable manner. These dashboards have the potential to increase the transparency between the scientists producing pathogen genomic data and policymakers, public health stakeholders, and the public. This scoping review discusses the data presented, functional and visual features, and the computational architecture of six publicly available SARS-CoV-2 genomic epidemiology dashboards. We found three main types of genomic epidemiology dashboards: phylogenetic, genomic surveillance, and mutational. We found that data were sourced from different databases, such as GISAID, GenBank, and specific country databases, and these dashboards were produced for specific geographic locations. The key performance indicators and visualization used were specific to the type of genomic epidemiology dashboard. The computational architecture of the dashboards was created according to the needs of the end user. The genomic surveillance of pathogens is set to become a more common tool used to track ongoing and future outbreaks, and genomic epidemiology dashboards are powerful and adaptable resources that can be used in the public health response.


Introduction
Infectious disease surveillance and public health communication play an important role in the prevention and control of epidemics [1,2]. Disease surveillance data must be shared promptly and in an easily interpretable way to effect changes in the public health response [3]. Public Health Informatics (PHI) is an emerging field that plays a major role in ensuring the timely dissemination of information, health promotion, and infectious disease prevention. This is achieved through the integration of information science, technology, and engineering [3][4][5]. The broad scope of this emerging field covers data translation and communication infrastructure, from conceptualization to refinement and maintenance [4][5][6]. An important tool utilized for communication in PHI is the computational dashboard [7].
Genes 2024, 15, 876
Dashboards can be described as a dynamic mode in which information and data are presented visually [8]. They provide the most pertinent information in a concise manner using data mining technologies, and this allows for an increase in decision-making power among stakeholders and policymakers [9]. In epidemiological surveillance, the usefulness of a dashboard can be seen in the accountability and transparency created between researchers, public health stakeholders, and the public [7].
Coronavirus disease 2019 (COVID-19) is caused by the virus SARS-CoV-2 [10][11][12]. This virus emerged in December 2019 in the city of Wuhan, China, and began as a clustered outbreak of viral pneumonia in a live animal market in the city [13]. By 1 March 2024, over 774 million global COVID-19 cases and over seven million COVID-19-associated deaths had been reported [14]. Genomic surveillance played an important role in understanding the COVID-19 pandemic [11,15,16]. Real-time sharing of and access to genomic data during the COVID-19 pandemic enabled global monitoring of the virus's spread and evolution [15].
During the pandemic, there was a significant increase in the number of sequences produced [15], with 57,845 sequences being deposited to the Global Initiative on Sharing Avian Influenza Data database (GISAID) in the first six months of 2020. By the end of the first year, there was a 20% increase in genomic sequences, with 314,310 sequences deposited. As of 1 March 2024, there were just over 16.5 million sequences deposited on GISAID [17]. Global genomic surveillance data, combined with epidemiological data of the SARS-CoV-2 virus, allowed for patterns of viral transmission to be elucidated and for Variants of Concern (VOCs) to be identified and tracked. For example, South Africa played a leading role in identifying VOCs and variants of interest (VOIs), which informed public health policies (such as what vaccines to administer) [11,16,18].
In the past, genomic surveillance data were not released to the public immediately. Following the outbreaks of the Ebola and Zika viruses (in 2013 and 2015, respectively), the importance and advantage of open and immediate sharing of genomic data became apparent [19]. Sharing genomic epidemiological data publicly, in a timely fashion, is particularly useful for reconstructing precise epidemic transmission dynamics and subsequently allowing informed decision-making about interventions and control measures [20,21]. However, it can be difficult for the public and for professionals who do not specialize in this field to interpret these data [22]. Therefore, presenting this information in an easily interpretable manner on a dashboard would be effective for situational awareness, public health communication, and influencing policy [22,23]. During the SARS-CoV-2 pandemic, genomic epidemiology dashboards were used in data-driven public health decisions.
This review aims to provide users of genomic epidemiology dashboards with an understanding of the functionality and technologies of public health dashboards, in the context of publicly accessible SARS-CoV-2 dashboards. Here, we evaluate six dashboards to provide a better understanding of the data presented, visual and functional features, and software used. We conclude this review by highlighting how dashboards can be adapted to novel, emerging, or re-emerging pathogens, such as influenza A virus, dengue virus, Ebola virus, and Vibrio cholerae.

SARS-CoV-2 Genomic Epidemiology Dashboards
There was an exponential increase in the number and types of dashboards utilized during the COVID-19 pandemic [8]. Examples of these included epidemiological dashboards which presented case, death, hospitalization, and vaccination data, such as the Johns Hopkins [13] and World Health Organization (WHO) dashboards [14]. Another important epidemiological dashboard that was used extensively during the pandemic was Our World in Data's Coronavirus Pandemic dashboard. This dashboard enabled important comparisons of key epidemiological indicators between countries [24]. Mathematical modeling dashboards such as epiMOX, which also contained a predictive tool to understand the progression of the epidemic, were also produced [25]. A hospital management dashboard [26] was created by the New York Presbyterian-Columbia University Irving Medical Center to assist with the rapid reorganization and planning of nephrology services during the COVID-19 pandemic. Various open-source tools were developed for this dashboard to facilitate the management of resources and the supply chain for the hospital [26]. Dashboards created during the pandemic had a significant effect on how difficult data and metrics were translated into information for the public, media, and policymakers [15].
Genomic epidemiology dashboards were also created to communicate information generated from ongoing genomic surveillance systems, given the large-scale expansion of genomic surveillance globally during the SARS-CoV-2 pandemic [21]. Figure 1 shows the general process undertaken to produce genomic epidemiology dashboards. Data (sequence data and associated metadata) are typically ingested from genomic databases. The data are then processed through three general steps: cleaning, filtering, and formatting. The processed data can then be visualized in different ways using a variety of data visualization tools. Examples of the different visualizations that are produced for genomic epidemiology dashboards include maps, phylogenetic trees, and charts. The visualizations are then integrated with the computational architecture of the dashboard. The computational architecture typically includes a front end and a back end.
Genomic epidemiology dashboards can be broadly classified into those that display phylogenetic analyses, genomic surveillance data, and mutational data (Figure 1, Panel 5). These dashboards integrate specialized genomic analyses but present them in an interactive and easy-to-interpret way. Phylogenetic analyses are useful on a genomic epidemiology dashboard, as they depict the evolutionary history of the virus. The information produced from phylogenetic analyses is important when monitoring the genetic changes in emerging variants of a virus, which affect vaccine efficacy, rate of transmission, and viral antigenicity [27]. This information can also help characterize the levels of community transmission in a particular location or viral movements between locations. Genomic surveillance data on a dashboard are important, as they present temporal, spatial, and variant proportion and prevalence information about the genomes that are sequenced. This can provide a succinct perspective of the spread of genomic variants and can inform further surveillance and intervention efforts [28]. Mutational data of viruses are useful to present on dashboards, as they provide insight into viral virulence, infectivity, and transmissibility [29]. This is achieved through a variety of methods, such as Single Nucleotide Polymorphism (SNP) calling, relative growth advantage, and machine learning models, to predict mutations [29,30].
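To make the idea of SNP calling concrete, the following minimal Python sketch compares a sample consensus sequence against a reference. It assumes the two sequences are already aligned and of equal length; the function name call_snps and the toy sequences are hypothetical and are not taken from any of the reviewed dashboards. Production pipelines typically call variants from read alignments with quality filtering, which this sketch does not attempt.

```python
def call_snps(reference, sample, ignore="N-"):
    """Return (position, ref_base, alt_base) for each mismatch.

    Positions are 1-based. Sites where either sequence carries an
    ambiguous base (N) or an alignment gap (-) are skipped rather
    than reported as substitutions.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")

    snps = []
    pairs = zip(reference.upper(), sample.upper())
    for position, (ref_base, alt_base) in enumerate(pairs, start=1):
        if ref_base in ignore or alt_base in ignore:
            continue  # uninformative site, not a substitution
        if ref_base != alt_base:
            snps.append((position, ref_base, alt_base))
    return snps


# Toy example: one substitution at position 4, one masked site at position 7.
TOY_REFERENCE = "ACGTACGT"
TOY_SAMPLE = "ACGAACNT"
TOY_SNPS = call_snps(TOY_REFERENCE, TOY_SAMPLE)
```

A mutational dashboard would aggregate such per-genome calls across many sequences before displaying them, for example as the frequency of each substitution over time.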
In this review, we selected a total of six publicly available SARS-CoV-2 dashboards that fell into the different genomic epidemiology dashboard categories. We prioritized choosing dashboards with diverse characteristics to achieve a well-rounded understanding of the utility of genomic epidemiology dashboards. The phylogenetic dashboard category included Nextstrain's and Cluster-Tracker's dashboards. Nextstrain's Genomic Epidemiology of SARS-CoV-2 was selected as it displays global phylogenetic trees, nucleotide diversity, and frequencies for SARS-CoV-2. Cluster-Tracker was selected as a contrasting dashboard, as it displays information from a phylogenetically informed summary heuristic for the United States of America (USA) only. The genomic surveillance dashboards included the Microreact, Wellcome Sanger Institute, and SARS-CoV-2 Africa dashboards. Microreact's Global SARS-CoV-2 dashboard is a simple, one-page dashboard that displays global genomic information on the virus's variants geographically and temporally. The Wellcome Sanger Institute's COVID-19 Genomic Surveillance dashboard was selected as it displays genomic and estimated case information for the United Kingdom (UK) only. The SARS-CoV-2 Africa dashboard was selected as it focuses on SARS-CoV-2 genomic epidemiology for Africa only, from a temporal and geographical view. The mutational dashboard included in this review is the CoV-Spectrum dashboard. CoV-Spectrum is a detailed dashboard focusing on the globally known SARS-CoV-2 variants and their amino acid and nucleotide substitutions, deletions, and insertions.
We evaluated three major properties of genomic epidemiology dashboards, with a closer look at specific elements within these properties. These properties included the data, visual and functional features, and computational architecture of the genomic epidemiology dashboards. The six selected SARS-CoV-2 dashboards were drawn on as examples when discussing the specific elements within the overarching properties, and a comprehensive summary is provided in Table 1.

Data Sources
The quality of data determines the reliability of a dashboard. Presenting reliable data in an easy-to-interpret manner allows the dashboard to be used as a data-driven tool in policy decision-making. This would alleviate information asymmetry between multiple stakeholders [7,31].
The two most common databases used to provide sequence data and support genomic analyses on the selected genomic epidemiology dashboards are GenBank and GISAID (Table 1) [32,33]. These databases have different approaches to how data can be reshared, with principles ranging from ensuring open science and data sharing to restricting redistribution to protect the data submitters [34]. GenBank adopts open data principles as implemented by the International Nucleotide Sequence Database Collaboration (INSDC). INSDC advocates for open science and data sharing of published or unpublished results [35]. However, the immediate sharing of SARS-CoV-2 genomic data could potentially leave data submitters vulnerable to "scooping", if analyses and publications are not produced timeously [34]. Thus, databases sharing genomic data can choose to restrict the distribution of the available data. For example, GISAID encourages submitters to share unpublished data, as the database prohibits users from redistributing the metadata or sequence data [34]. This enables genomic epidemiology dashboards to present unpublished data in near-real time, but the data cannot be downloaded and shared.
Nextstrain's dashboard is an example that utilizes both GenBank and GISAID data. The Nextstrain team has gone a step further and curated open datasets from GenBank. This gives users of these data a preprocessed starting point from which other analyses can be conducted [34]. CoV-Spectrum is an example of a dashboard that utilizes the Nextstrain datasets as a data source [20]. The SARS-CoV-2 Africa dashboard is an example that sources data from GISAID only [36]. Microreact's and the Wellcome Sanger Institute's dashboards present data sourced from the COVID-19 Genomics UK Consortium (COG-UK) [37,38]. These data are stated to be only a subset of what is made available by COG-UK and do not represent a complete record of what is available on public databases. Cluster-Tracker is a dashboard that incorporates sequence data from GenBank, GISAID, and COG-UK [39]. The genomic epidemiology dashboards that present data from GenBank, Nextstrain, or COG-UK provide the option for downloadable metadata.

Data Processing
How source data are processed is another factor that affects the quality of information presented on a genomic epidemiology dashboard. It is important to ensure the source data are processed appropriately; otherwise, users may arrive at incorrect conclusions [7,40]. Panel 1 in Figure 1 outlines the general steps that take place during data processing, which is often carried out using RStudio and the Python programming language. It is beneficial for dashboards to describe the data processing steps taken, whether as a disclaimer on a web page or in a publication. Providing this information enables users to produce datasets of the same standard for similar analyses. For example, Nextstrain's genomic epidemiology dashboard provides detailed documentation on its web page of the workflow and tools used to process the source data. On the other hand, the Cluster-Tracker and SARS-CoV-2 Africa dashboards detail the data processing steps in a publication [36,39].
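The three general processing steps named in this review (cleaning, filtering, and formatting) can be sketched in Python as follows. This is an illustrative outline only: the column names, the strain identifiers, and the specific rules (dropping duplicate strains, requiring a complete collection date and an assigned lineage) are assumptions made for the example, not the documented workflow of any reviewed dashboard.

```python
import csv
import io
from datetime import datetime

# Hypothetical metadata export; column names and values are invented.
RAW_METADATA = """strain,country,collection_date,lineage
hCoV-19/example/1/2021,South Africa,2021-11-20,B.1.1.529
hCoV-19/example/2/2021, Kenya ,2021-11,AY.4
hCoV-19/example/3/2021,Nigeria,2021-12-02,
hCoV-19/example/1/2021,South Africa,2021-11-20,B.1.1.529
hCoV-19/example/4/2021,Kenya,2021-12-05,AY.4
"""


def clean(rows):
    """Cleaning: trim stray whitespace and drop repeated strain names."""
    seen = set()
    for row in rows:
        row = {key: (value or "").strip() for key, value in row.items()}
        if row["strain"] in seen:
            continue
        seen.add(row["strain"])
        yield row


def has_complete_date(row):
    """True only for full YYYY-MM-DD dates (month-only dates fail)."""
    try:
        datetime.strptime(row["collection_date"], "%Y-%m-%d")
    except ValueError:
        return False
    return True


def filter_rows(rows):
    """Filtering: keep records with a complete date and a lineage."""
    return (row for row in rows if has_complete_date(row) and row["lineage"])


def format_rows(rows):
    """Formatting: reshape each record into what the charts will consume."""
    for row in rows:
        collected = datetime.strptime(row["collection_date"], "%Y-%m-%d")
        yield {
            "strain": row["strain"],
            "country": row["country"],
            "month": collected.strftime("%Y-%m"),
            "lineage": row["lineage"],
        }


def process(text):
    """Run the three steps in order on a CSV string."""
    reader = csv.DictReader(io.StringIO(text))
    return list(format_rows(filter_rows(clean(reader))))
```

Calling process(RAW_METADATA) keeps two of the five rows in this toy input: the duplicate, the record with a month-only date, and the record without a lineage are removed. Documenting rules like these is what allows other users to reproduce a dataset of the same standard.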

Geographic Level
Dashboards can be designed for various geographic levels depending on the overall purpose of the dashboard, the users it was created for, and the level of data granularity (Table 1). For example, Wellcome Sanger Institute's dashboard presents genomic surveillance data for each local authority in the UK [38]. This level of data granularity is useful for users, as it provides a more detailed perspective on the key performance indicators (KPIs) of the dashboard. Another example of a dashboard displaying genomic surveillance data for a different geographical level is the SARS-CoV-2 Africa dashboard. This dashboard is useful, as it provides a dynamic perspective of genomic surveillance on the African continent [36].

Key Performance Indicators
Dashboards present data in a specific set of metrics, known as KPIs, which allow users to utilize the information in decision-making processes [41]. In the context of genomic epidemiology dashboards, KPIs are key metrics that clearly define the outbreak or epidemic situation [42]. These include, for example, the number of genomes produced, the proportion of variants over time, and the mapping of mutations within the genome.
An important part of the dashboard design process is determining the number of KPIs to include. A study by Nijkamp and Kourtit (2022) refers to the "magical number seven" [31]. This principle is based on experimental research and holds that any decision that considers more than seven objectives will be flawed and inconsistent. Thus, genomic epidemiology dashboards that have more than seven KPIs can confuse users and prevent them from making informed decisions [43].
The types of KPIs presented vary across the categories of genomic epidemiology dashboards, as shown in Figure 2. Genomic surveillance dashboards typically include KPIs such as the number of genomes produced, the proportion of variants, and the number of variants per location (region, country, state, etc.). Phylogenetic dashboards include KPIs such as the nucleotide diversity of the genome, best potential origins, and best origin regional indices. Mutational dashboards present KPIs such as relative growth advantage, estimated cases, reproduction number, and specific information about the mutations over time. It should be noted that a genomic epidemiology dashboard can also present KPIs from across the different categories.
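As an illustration of how two of the genomic surveillance KPIs mentioned here (the number of genomes produced and the proportion of variants over time) might be derived from processed records, consider the following Python sketch. The input format of (month, variant) pairs and the toy counts are assumptions for the example and do not reflect real surveillance data.

```python
from collections import Counter, defaultdict

# Hypothetical processed records: one (month, variant) pair per genome.
SEQUENCES = [
    ("2021-11", "Delta"), ("2021-11", "Delta"),
    ("2021-11", "Delta"), ("2021-11", "Omicron"),
    ("2021-12", "Delta"), ("2021-12", "Omicron"),
    ("2021-12", "Omicron"), ("2021-12", "Omicron"),
]


def genomes_per_month(records):
    """KPI: number of genomes produced in each month."""
    return dict(Counter(month for month, _variant in records))


def variant_proportions(records):
    """KPI: share of each variant among the genomes of each month."""
    counts = defaultdict(Counter)
    for month, variant in records:
        counts[month][variant] += 1

    proportions = {}
    for month, per_variant in counts.items():
        total = sum(per_variant.values())
        proportions[month] = {
            variant: n / total for variant, n in per_variant.items()
        }
    return proportions
```

In this toy input, the proportion of the variant labeled Omicron rises from 0.25 in the first month to 0.75 in the second, which is the kind of shift a stacked proportional bar chart would make visible.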

Visual and Functional Features
The visual features on a dashboard refer to how information is presented, and the functional features refer to how the dashboard carries out its tasks. A user needs to consider the types of analyses and filtering options of a genomic epidemiology dashboard. This will affect the user's dashboard choice when employing it in different situations.

Analyses
There are a variety of analyses that can be presented within each category of genomic epidemiology dashboards. The types of analyses and visualizations are summarized in Table 1.
Phylogenetic trees are the most common visual feature found on genomic epidemiology dashboards to present information from phylogenetic analyses. Nextstrain's dashboard contains an interactive and customizable phylogenetic tree, which can be seen in Panel B in Figure S1. The subsampling for the phylogenetic tree can be changed in Panel A under the "Dataset" section. The subsampling period determines the time frame from which the global sequences are selected. These sequences form the focus of the phylogenetic tree, alongside contextual sequences that accurately represent the pandemic before the subsampling period. An interactive phylogenetic tree would be useful to various users, depending on the type of information taken from it. For example, evolutionary virologists would benefit as it provides in-depth genetic history information, whereas an epidemiologist would be interested in the emerging variants to inform outbreak management [44].
Cluster-Tracker is an example of a genomic epidemiology dashboard that displays phylogenetic information without using a phylogenetic tree. Cluster-Tracker is an open-source dashboard that visualizes the results of a phylogenetically informed summary heuristic for the USA. This dashboard prioritizes communicating the results of this heuristic model, which are clusters that represent the number of introductions of the SARS-CoV-2 virus into a state; this allows for major transmission clusters of that state to be represented. Cluster-Tracker contains one visualization: an interactive map of North America, with emphasis and coloring only present in the USA (Figure S2). The map is colored according to a color ramp indicating the number of clusters. The period of this interactive visualization can be controlled by the options in the top left-hand corner. The user can extract a large amount of information for each state on the map. This includes the log-fold enrichment of introductions into the selected state, and the number of introductions from other states into the selected state (Figure S2) [39]. Although the phylogenetically informed summary heuristic is complex, the visualization allows users to track the introductions of SARS-CoV-2 into each state without any specialized knowledge.
An example of the types of analyses a user can find on a genomic surveillance dashboard can be seen on the SARS-CoV-2 Africa dashboard. This dashboard presents information produced by SARS-CoV-2 genomic surveillance programs on the African continent. This dashboard aims to produce an interactive perspective of the SARS-CoV-2 variants circulating on the African continent [36]. Panel B in Figure S3 is a map of Africa denoting the number of genomes each country has produced, through a gradient color ramp. This map is animated and can be controlled with the play, pause, and stop buttons located beneath it. There is also a sliding time window that allows the user to toggle the date range of the data displayed. Panel C is a stacked proportional bar chart depicting the proportion of circulating lineages and variants. Panel D is a strip chart displaying the genomic sequence data available for individual African countries. This figure is a jittered scatter plot that displays the number of genomes as colored dots, where the color represents the variant of the genome; each dot represents a variable number of genomes (Figure S3) [36].
CoV-Spectrum is a dashboard that presents mutational information about the SARS-CoV-2 genome and identifies novel VOCs. This is achieved by providing a summary of the raw data and presenting subsequent statistical analyses. Figure S4 shows the three main interactive panels (B-D) that can be found on the CoV-Spectrum dashboard. Panel B, titled "Known Variants", presents a grid of possible variants that can be explored individually. After the selection has been made, the user is redirected to another dashboard page. This page is where a user can access a vast amount of mutational data about a specific variant in a summarized format. A few examples of the data provided on each variant include the estimated cases from that variant, relative growth advantage with three different mechanisms, reproduction number, mutations over time, nucleotide entropy, insertions, substitutions, and deletions. These analyses are useful to genomic specialists looking to gain a deeper understanding of each monitored variant of SARS-CoV-2 [20].

Filtering Options
There is a wealth of information produced by genomic surveillance systems, and this is displayed on genomic epidemiology dashboards in an aggregated fashion. However, the dashboard must provide users with the option to further drill down into the data [40]. This enables the user to interact with the data and increases the transparency between the researcher and various stakeholders [7,31]. Advanced filtering options on genomic epidemiology dashboards ensure there are drill-down features for aggregated information that can be explored by the user.
There are different filtering options for data across the categories of genomic epidemiology dashboards (Table 1). Filtering options on phylogenetic dashboards can include the customization of a phylogenetic tree. This can be seen on Nextstrain's dashboard, where the user can navigate to the "Tree Options" found in Panel A (Figure S1). The customization options include changing the layout, branch length, branch labels, and tip labels of the tree. Customizing the tree can provide the user with valuable information that can be used to inform various decisions. For example, changing the tip colors to represent emerging variants can show the user which emerging variants are found in the global sample. Public health laboratory managers could use this information, for example, to request that more samples be sequenced; this could assist in determining whether the emerging variant in the global dataset correlates with the variants found in a specific country [44].
The filtering options found on genomic surveillance dashboards typically involve temporal, geographical, and variant filter options. For example, the Wellcome Sanger Institute's dashboard contains three main filters; these can be seen in Panels A-C (Figure S5). The first filter (A) controls the timescale of the data that are presented. The second filter (B) allows the user to select a local authority or postcode from England. The third filter (C) allows for the lineages of SARS-CoV-2 to be selected. There is a lineage option map that appears in Panel D of the dashboard once the "Lineages" settings button in Panel C is selected (Figure S5). This option is useful, as it enables users to drill down into temporal, geographical, and estimated case data for a specific variant.
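The combination of temporal, geographical, and variant filters described above can be sketched as a single function in which every filter is optional. This is a hedged illustration rather than the implementation of any reviewed dashboard: the Genome record, the apply_filters function, and the toy regions and lineages are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Genome:
    """One sequenced genome with the three fields the filters act on."""
    collected: date
    region: str
    lineage: str


def apply_filters(genomes, start=None, end=None, region=None, lineages=None):
    """Return the genomes that pass every filter that has been set.

    A filter left as None is treated as "no restriction", mirroring a
    dashboard control that the user has not touched.
    """
    selected = []
    for genome in genomes:
        if start is not None and genome.collected < start:
            continue  # before the chosen time window
        if end is not None and genome.collected > end:
            continue  # after the chosen time window
        if region is not None and genome.region != region:
            continue  # outside the chosen location
        if lineages is not None and genome.lineage not in lineages:
            continue  # not one of the chosen lineages
        selected.append(genome)
    return selected


# Toy data for illustration only.
GENOMES = [
    Genome(date(2021, 11, 20), "Camden", "B.1.617.2"),
    Genome(date(2021, 12, 5), "Camden", "B.1.1.529"),
    Genome(date(2021, 12, 9), "Leeds", "B.1.1.529"),
    Genome(date(2022, 1, 3), "Camden", "BA.2"),
]
```

Treating each unset filter as "no restriction" lets the same function serve both the aggregated view (no filters) and a drill-down view (several filters combined), for example apply_filters(GENOMES, region="Camden", lineages={"B.1.1.529"}).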
Mutation dashboards also contain filtering options to enable users to interact with and subset the data presented on the dashboard. Figure S6 shows the advanced filtering option page of CoV-Spectrum. These advanced filters deal with hosts, submission dates of sequences, and various sequence quality scores (overall, missing data, mixed sites, private mutations, SNP clusters, frameshifts, stop codons, and coverage). CoV-Spectrum has a comparative analysis option, which enables comparisons of data for different variants or of multiple variants against a baseline [45]. This is particularly useful for mutation dashboards, as the volume of information presented can be overwhelming. This provides users with the option to explore specific mutation data or to control which variants are compared [45].

Computational Architecture
The programming language used to develop each type of genomic epidemiology dashboard varies and is usually chosen at the discretion of the developers (Table 1). There is no standard programming language; rather, the choice depends on what the dashboard aims to achieve. The programming languages used can also differ between the user interface (front end) and the processing layer (back end) of the dashboard. This allows developers to utilize different libraries on each end to achieve the desired result (Table 1). Panel 3 in Figure 1 presents examples of libraries and software that can be used to create the visualizations on the dashboard. Panel 4 in Figure 1 displays the various options available to developers when creating the front and back ends of genomic epidemiology dashboards. An example of this can be seen in Cluster-Tracker [39]. The back end of the dashboard is coded in Python, whereas the front end is coded in Vanilla JavaScript, which can be described as plain JavaScript that does not contain external frameworks or libraries [39,46]. The interactive visualization on the dashboard uses the JavaScript library Leaflet. The interactive table found beneath the map visualization is enabled through DataTables, which is a plug-in for the JavaScript library jQuery. The input files required to run these two interactive analyses are a phylogenetic tree (a Newick file), sample identification labels, and geographic information in a GeoJSON file [39]. The SARS-CoV-2 Africa dashboard is another relevant example, which utilizes Python for both the front and back ends. In this dashboard, the web interface and its components were coded using Streamlit, which is a Python-based framework for web applications. The back end is a combination of pure Python and Pandas, used to process and format data coming from GISAID. The visualizations on the dashboard were generated using Plotly [36].
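One way a Python back end could hand map data to a JavaScript front end is to attach per-region counts to the properties of each GeoJSON feature, so that a mapping library such as Leaflet can color regions from a single file. The sketch below illustrates this division of labor under stated assumptions: the property keys "name" and "cluster_count", the simplified square geometries, and the counts are all hypothetical, and Cluster-Tracker's actual back end may be organized differently.

```python
import copy
import json

# Hypothetical, heavily simplified GeoJSON FeatureCollection.
REGIONS = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "State A"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
            },
        },
        {
            "type": "Feature",
            "properties": {"name": "State B"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]]],
            },
        },
    ],
}


def attach_counts(feature_collection, counts,
                  name_key="name", count_key="cluster_count"):
    """Return a copy of the GeoJSON with a count added to every feature.

    Regions missing from `counts` receive 0, so the front end never has
    to handle an absent property.
    """
    enriched = copy.deepcopy(feature_collection)
    for feature in enriched["features"]:
        region = feature["properties"].get(name_key)
        feature["properties"][count_key] = counts.get(region, 0)
    return enriched


def to_geojson_text(feature_collection):
    """Serialize the enriched collection for the front end to load."""
    return json.dumps(feature_collection)


# Invented counts for illustration only.
ENRICHED = attach_counts(REGIONS, {"State A": 42})
```

Keeping the computation in the back end and shipping a plain GeoJSON file means the front end needs no framework beyond the mapping library itself, which is consistent with the plain JavaScript approach described above.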
Genomic epidemiology dashboards can also be supported by custom software. An example of this is Nextstrain's dashboard, which is maintained by the ncov workflow. This workflow utilizes Augur subcommands, required for the bioinformatic pipeline, and Auspice to produce the dashboard [36]. The ncov workflow is managed by the Snakemake workflow management software [47]. Augur is a bioinformatics toolkit that packages various subcommands into one command-line interface tool, allowing for the phylogenetic analysis of human pathogens [48]. Certain Augur subcommands integrate pre-existing bioinformatics tools such as FastTree [49], RAxML [50], MAFFT [51], TreeTime [52], and IQ-TREE [53]. Finally, Auspice is used to view the annotated phylogenies in a Graphical User Interface (GUI) such as a Nextstrain dashboard [48]. The ncov workflow is open-source, allowing users such as bioinformaticians to provide it with their own dataset and view it in a Nextstrain dashboard. Thus, phylogenetic analyses can be carried out with greater ease, as all the bioinformatics tools (pipeline and visualization) are encompassed in one workflow. Augur and Auspice make it possible for any bioinformatician to create a phylogenetic workflow for any pathogen, analyze the data, and visualize them in a dashboard.
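As a rough illustration of how Augur subcommands chain into a dataset that Auspice can display, the sketch below assembles the command lines for a minimal align, tree, refine, and export sequence. It is a simplification and not the ncov workflow itself, which defines many more steps as Snakemake rules. The file paths and the helper name build_steps are hypothetical, and the commands are only constructed here, not executed.

```python
from pathlib import Path


def build_steps(sequences, reference, metadata, workdir="results"):
    """Return command lines for a minimal align -> tree -> refine -> export chain."""
    out = Path(workdir)
    aligned = out / "aligned.fasta"
    raw_tree = out / "tree_raw.nwk"
    tree = out / "tree.nwk"
    branch_lengths = out / "branch_lengths.json"
    auspice_json = Path("auspice") / "dataset.json"
    return [
        # Align the input sequences to a reference genome.
        ["augur", "align", "--sequences", str(sequences),
         "--reference-sequence", str(reference), "--output", str(aligned)],
        # Build a phylogeny from the alignment.
        ["augur", "tree", "--alignment", str(aligned), "--output", str(raw_tree)],
        # Infer a time-resolved tree using sampling dates from the metadata.
        ["augur", "refine", "--tree", str(raw_tree), "--alignment", str(aligned),
         "--metadata", str(metadata), "--output-tree", str(tree),
         "--output-node-data", str(branch_lengths), "--timetree"],
        # Export a JSON file that Auspice can visualize.
        ["augur", "export", "v2", "--tree", str(tree), "--metadata", str(metadata),
         "--node-data", str(branch_lengths), "--output", str(auspice_json)],
    ]


# Hypothetical input paths, used only to show the shape of the commands.
steps = build_steps("data/sequences.fasta", "data/reference.fasta", "data/metadata.tsv")
for step in steps:
    print(" ".join(step))
```

Each step consumes the output of the previous one, which is the dependency structure that a workflow manager such as Snakemake tracks so that only outdated steps are re-run.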

Data Storage and Management
Genomic epidemiology dashboards can employ various ways to store and manage the data that are presented. The data can be retrieved from various storage and management systems through Application Programming Interfaces (APIs), which allow the dashboard to be updated regularly.
Firstly, the simplest way to store and provide data to a dashboard is through traditional file systems. The retrieved and processed data can be converted into data files (such as Excel, comma-separated values, text, and JSON) and then stored in local directories. Dashboard applications can then read the data files using native or third-party libraries. The SARS-CoV-2 Africa dashboard is an example that employs this system. The dashboard utilizes an API agreement with GISAID that allows for daily data retrieval [36].
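A file-based approach of this kind can be sketched with the Python standard library alone. The example below is an assumption-laden illustration rather than the SARS-CoV-2 Africa dashboard's code: the column names (strain, country, lineage), the file names, and the placeholder rows are hypothetical, and the retrieval step from the upstream source is omitted.

```python
import csv
import json
import tempfile
from collections import Counter
from pathlib import Path


def summarize_metadata(csv_path, json_path):
    """Count sequences per (country, lineage) in a CSV and write a JSON summary."""
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            counts[(row["country"], row["lineage"])] += 1
    summary = [
        {"country": country, "lineage": lineage, "sequences": n}
        for (country, lineage), n in sorted(counts.items())
    ]
    Path(json_path).write_text(json.dumps(summary, indent=2), encoding="utf-8")
    return summary


def load_summary(json_path):
    """What the dashboard layer could call on each page load or refresh."""
    return json.loads(Path(json_path).read_text(encoding="utf-8"))


# Demonstration with placeholder rows written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    csv_file = Path(tmp) / "metadata.csv"
    json_file = Path(tmp) / "summary.json"
    csv_file.write_text(
        "strain,country,lineage\n"
        "s1,Country A,Lineage X\n"
        "s2,Country A,Lineage X\n"
        "s3,Country B,Lineage Y\n",
        encoding="utf-8",
    )
    summarize_metadata(csv_file, json_file)
    loaded = load_summary(json_file)
```

Separating the scheduled processing step from the read step keeps the dashboard itself simple: it only ever reads a small, pre-computed file.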
Secondly, data can be stored and managed using databases. Integrating a database system into a genomic epidemiology dashboard is effective for storing and managing large amounts of data. This removes the need for the dashboard to use files to store, read, process, and write data [54]. There are various types of databases used by genomic epidemiology dashboards (Table 1). The first type of database employed is the traditional Relational Database Management System (RDBMS). For example, CoV-Spectrum utilizes a PostgreSQL database [20]. This database is open-source and does not require a license. This type of database follows a server/client model, which allows for a lighter client library and ensures that clients are not affected by changes in the database engine [55]. Additionally, to retrieve information from the PostgreSQL database, the CoV-Spectrum dashboard uses two REST APIs, one for sequence data and one for non-sequence data. The sequence data are queried using the Lightweight API for Sequences (LAPIS). The non-sequence data are retrieved from a server application managed by CoV-Spectrum. These servers are coded using the Spring Boot framework in Java and Kotlin [20].
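The relational approach can be illustrated with a small sketch in which the database, rather than the dashboard code, performs the filtering and counting. Python's built-in sqlite3 module is used here purely as a stand-in so that the example is self-contained; unlike PostgreSQL, SQLite is an embedded engine rather than a server/client system. The table name, columns, and rows are hypothetical and do not reflect CoV-Spectrum's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sequence_metadata (
        strain   TEXT PRIMARY KEY,
        country  TEXT NOT NULL,
        lineage  TEXT NOT NULL,
        sampled  TEXT NOT NULL  -- ISO 8601 date string
    )
    """
)
# Placeholder rows, not real sequences.
conn.executemany(
    "INSERT INTO sequence_metadata VALUES (?, ?, ?, ?)",
    [
        ("s1", "Country A", "Lineage X", "2022-01-05"),
        ("s2", "Country A", "Lineage X", "2022-01-20"),
        ("s3", "Country A", "Lineage Y", "2022-02-02"),
        ("s4", "Country B", "Lineage Y", "2022-02-10"),
    ],
)


def lineage_counts(connection, country, date_from, date_to):
    """Parameterized aggregate query; the database does the filtering and counting."""
    rows = connection.execute(
        """
        SELECT lineage, COUNT(*) AS n
        FROM sequence_metadata
        WHERE country = ? AND sampled BETWEEN ? AND ?
        GROUP BY lineage
        ORDER BY lineage
        """,
        (country, date_from, date_to),
    )
    return rows.fetchall()


result = lineage_counts(conn, "Country A", "2022-01-01", "2022-01-31")
```

In a deployed dashboard, a query like this would typically sit behind an API endpoint, so that the front-end requests only the aggregated rows it needs to draw a chart.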
The second type of database used in genomic epidemiology dashboards is the non-relational database. For example, Microreact employs a MongoDB database to store the dashboard's data [37]. MongoDB stores data as documents, which are comparable to JavaScript Object Notation (JSON) objects [56]. As a non-relational database, MongoDB offers some advantages over traditional relational databases. These include the ability to store data with no schema enforcement, which makes it more flexible, lets it mirror different data types, and can benefit performance [56]. Records from both relational and non-relational databases can be used in a variety of programming languages. However, the flexibility of the non-relational database makes it better suited for use cases where the data structure is dynamic and may be represented in different ways.
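The practical meaning of "no schema enforcement" can be shown with plain Python dictionaries standing in for JSON-like documents. This sketch does not use MongoDB or its query API; the helper functions find and with_field, the field names, and the two example documents are hypothetical and exist only to illustrate the document model.

```python
import json

# Two documents in the same collection with different shapes: no shared schema.
collection = [
    {"_id": 1, "project": "Project A", "samples": [{"id": "s1", "lat": 1.0, "lon": 2.0}]},
    {"_id": 2, "project": "Project B", "tree": "(s1,s2);", "tags": ["outbreak"]},
]


def find(docs, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def with_field(docs, field):
    """Return documents that contain the given top-level field."""
    return [d for d in docs if field in d]


# Adding a new field to one document needs no migration of the others.
collection[0]["timeline"] = {"start": "2022-01-01"}

by_project = find(collection, project="Project B")
has_tree = with_field(collection, "tree")
round_trip = json.loads(json.dumps(collection))  # documents serialize to JSON as-is
```

The trade-off is that the application, not the database, becomes responsible for checking that a document has the fields a given visualization expects.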

Open-Source and Adaptability
Software can be defined as open-source if its source code is publicly available. This enables users and developers to produce their own dashboards by adapting, customizing, or improving the original code [20,39,44]. The SARS-CoV-2 Africa dashboard is an example of a dashboard that became open-source software when its code was released as the Genomic Dash framework. This is useful, as users can download the Genomic Dash GitHub repository (https://github.com/BIAlab/genomic-dash, accessed on 12 March 2024), supply their pathogen data in the set format, and create their own pathogen dashboard with the same analyses and visualizations as the SARS-CoV-2 Africa dashboard [36].
Genomic epidemiology dashboards can also be closed source, which prevents public access to their source code. This could be attributed to various concerns, such as privacy issues regarding sensitive information. Another concern is the ownership of genomic data when heterogeneous data sources are utilized to curate the dashboard dataset [57]. Microreact is an example of a dashboard that is not open source but allows users to upload their pathogen data to create a replica of the Microreact dashboard. This option would be useful for users with limited computational skills who need to visualize data [37]. A drawback of this option is that users cannot customize the analyses displayed on the dashboard.

Discussion
SARS-CoV-2 genomic epidemiology dashboards are diverse, but they can be broadly classified into three categories: phylogenetic, genomic surveillance, and mutational dashboards. Understanding the information presented in each dashboard category allows users to make informed decisions as to which category to use to obtain the desired information. Phylogenetic dashboards would be used by genomic specialists, such as medical virologists or epidemiologists, who want to understand the evolutionary history and transmission dynamics of the virus. Genomic surveillance dashboards are designed for lay users (the public, public health officials, stakeholders, and local or national ministers of health) who want to view a visual summary of the landscape of the epidemic or outbreak. This type of dashboard can also be used by genomic specialists who want a spatiotemporal overview of the SARS-CoV-2 variants. Lastly, mutational dashboards can be utilized by molecular biologists or virologists who can interpret the large amount of mutational variation in each variant. Other users include laboratory staff, bioinformaticians, geneticists, pathologists, clinicians, and infectious disease specialists.
The genomic epidemiology dashboards evaluated in this review utilize reliable data sources and present diverse genomic information. The data found on genomic epidemiology dashboards are sourced from the three most used genomic databases: GenBank, GISAID, and COG-UK [58]. Genomic epidemiology dashboards in the various categories present different KPIs, analyses, and filtering options. These attributes may be similar in dashboards of the same category. Users benefit from understanding these attributes in each category so that an informed decision can be made about which would be the most useful in a given scenario. Other users may want to develop their own dashboards to present different pathogen data. Therefore, it would be beneficial for the user to understand the computational architecture, data storage options, and adaptability of current genomic epidemiology dashboards. These genomic epidemiology dashboards can provide information to a variety of audiences, ranging from laymen to scientific specialists. This diversity of information provides a well-rounded understanding of the genomic epidemiology of the virus.
Genomic epidemiology is a powerful resource that can be used to inform, understand, and respond to future outbreaks and epidemics [59,60]. The conclusions drawn from these analyses enable the early detection of emerging pathogenic variants, elucidate the spatiotemporal patterns of pathogenic variants, inform genomic surveillance programs, and initiate epidemiological investigations [61,62]. As genomic epidemiology is continually used to track outbreaks and epidemics, the success of the surveillance programs and analyses can be directly linked to the timeliness with which the data are released [15]. Thus, dashboards that characterize the genomic epidemiology of ongoing, emerging, and re-emerging pathogens are useful tools that can influence public health decisions, aiding in the response to future outbreaks and epidemics.
An important strength of genomic epidemiology dashboards lies in their adaptability to present information on novel, emerging, or re-emerging pathogens. Nextstrain and Microreact are the best examples of genomic epidemiology dashboards that have adapted their computational architecture to communicate surveillance data of different pathogens in near-real time. Nextstrain was produced to serve as a model for sharing public data. It initially shared phylogenetic analyses of dengue, seasonal and avian influenza, Zika, and Ebola [63]. Owing to its robust computational architecture, the Nextstrain team was then able to quickly pivot to novel and emerging viruses, such as SARS-CoV-2 and Monkeypox. Nextstrain has also created an option for users to create dashboards with their own genomic surveillance data. The Nextstrain team provides extensive documentation that enables users to create their own pathogen bioinformatic workflows, resulting in the production of a genomic epidemiology dashboard [63].
Microreact was produced as a tool for researchers to share open genomic epidemiology data visualizations. This enables public sequencing projects of pathogens to be worked on in a collaborative and interdisciplinary effort [37]. This framework and tool were well established before the SARS-CoV-2 pandemic began, so Microreact was able to swiftly and successfully adapt them to present SARS-CoV-2 genomic data [64]. In addition, there are currently 11 published dataset dashboards that are openly available on the Microreact web application, such as the West African Ebola epidemic [65], Zika virus in the Americas [66], and V. cholerae [67].
Genomic epidemiology dashboards can be used to respond to future outbreaks and epidemics by presenting data in near-real time. For example, the dengue virus continues to cause ongoing epidemics globally [68]. The London School of Hygiene and Tropical Medicine is currently working on the first real-time database to communicate dengue outbreak forecasting information [69]. Another pathogen that caused rapid global spread in 2022 was the mpox virus [70]. Nextstrain was successful in producing a genomic epidemiology dashboard to monitor the ongoing outbreaks (https://nextstrain.org/mpox/clade-IIb, accessed on 30 March 2024) [63].
The usefulness of genomic epidemiology dashboards is apparent; however, some limitations exist (Table 1). Firstly, dashboards are rendered irrelevant if they are not updated and maintained. For example, Microreact and the COVID-19 Genomic Surveillance dashboard by the Wellcome Sanger Institute were last updated in February 2022 and February 2023, respectively (Table 1). Thus, the usefulness of genomic epidemiology dashboards can be short-lived and only relevant during an outbreak or epidemic. Secondly, dashboards do not allow for the data to be downloaded if they are sourced from GISAID. This prevents users from accessing and interacting with the data being presented. It is important to

Figure 1. Process involved in the design and production of a genomic epidemiology dashboard. 1-Data collection, 2-processing of the source data, 3-analyzing and visualizing the data, 4-computational architecture development of the front- and back-end, and 5-different categories of genomic epidemiology dashboards.


Figure 2. Key performance indicators (KPIs) of the different types of genomic epidemiology dashboards.


Table 1. Summary of the data, visual features, functional features, and technologies behind six SARS-CoV-2 genomic epidemiology dashboards.