Introduction

Vector competence is an arthropod vector’s ability to transmit a pathogen after exposure to that pathogen1,2,3. It combines the pathogen’s intrinsic potential to enter and replicate within the vector, disseminate to and replicate within the vector’s salivary glands, and be released into the saliva at a sufficiently high concentration to initiate infection in the next vertebrate host. Quantifying this process at each step within the vector is fundamental to understanding and predicting vector-borne disease transmission.

Due to the inherent complexity of arboviral transmission, experimental studies of vector competence are necessarily complex as well, and may report many types of data. Experimental settings impose further constraints: controlled laboratory conditions are themselves complex, and vector competence is highly sensitive to some of these conditions (e.g., the temperature at which experiments take place). While the complexity and requisite scientific skills make these experiments challenging, their importance and value – particularly in response to vector-borne disease outbreaks of international concern – cannot be overstated, and have led to increasing numbers of these experiments. However, the complexity of the experiments, and the variety of conditions under which they are conducted, make it difficult to meticulously share (and synthesize) all relevant metadata, especially with terminology consistent enough to compare results across studies4. Because primary data are not reported in a standardized manner, opportunities to advance science and public health are being lost.

Here, we propose a minimum data standard for reporting the results of vector competence experiments. The motivation to create and disseminate data standards for reporting is part of a broad effort across scientific disciplines to preserve data for future use, recover existing data that may be unsearchable for many reasons, and establish open principles for harmonizing those data to better leverage the effort of the larger research community5,6,7,8,9,10. In particular, the FAIR (Findability, Accessibility, Interoperability, and Reusability) guiding principles11,12 were created to improve the infrastructure supporting the reuse of scholarly data, including public data archiving13. These principles aim to maximize the value of research investments and digital publishing, and have been adopted both into efforts to synthesize and populate databases for use by the scientific community and into the language of a growing number of funders’ reporting requirements. Tailoring FAIR principles to different subfields of scientific research requires consideration of the specific kinds of data that are regularly generated, and how they would best be reported. For example, the recently published minimum data standard MIReAD (Minimum Information for Reusable Arthropod abundance Data) aims to improve the transparency and reusability of arthropod abundance data14, thereby improving the benefits reaped from data sharing, and reducing the cost of obtaining research results. Importantly, these data standards do not aim to provide guidance on how experiments are conducted, nor to guide research, but to provide a reporting standard flexible enough to accommodate the outputs of most of these experiments.

In this paper, we characterize the key steps of vector competence experiments, and the data generated at each stage, as a means to establish common guidelines for data reporting that follow FAIR principles. Due to the long history of experimental work with mosquito vectors (and the incomparable role it plays in efforts to decrease the global burden of vector-borne disease), we propose a minimum data standard focused on capturing results from studies that test pairs of mosquitoes and arboviruses. However, we intentionally aimed to make these standards flexible, extendable, and adaptable, and therefore applicable to additional systems (e.g., experiments with ticks and other vectors, or mechanical transmission components of Chagas disease by triatomine insects).

Methods

Tables 1–4 provide a standard checklist for data that arise from, and metadata about, vector competence experiments (a blank Excel file with these columns is available as Supplementary File 1, for researchers to use directly as a template when reporting primary data alongside publications). We have designed these standards with a particular focus on applicability to mosquito-borne arboviruses, and on capturing aspects of experimental design that are known confounders (e.g., rearing and experimental temperature, or inoculation route and dose)15,16,17,18,19. While reviewing the literature to design the standard, we found that many of the rates reported (e.g., transmission rate) are derived from discrete and detailed experimental information, yet the original raw data may never be reported, and are often impossible to reconstruct from the bar or line charts provided. Moreover, the derived quantities often follow different calculations, with different (and usually intentional) biological meanings (e.g., the difference between ‘dissemination rate’ and ‘disseminated infection rate’, which are often used interchangeably) (Fig. 1)20. Given these choices, it may be misleading to compare derived rates directly across studies. To avoid this problem, we suggest that reporting the raw numbers of vectors tested and found positive for each basic metric may prevent confusion across study terminology, while still allowing derived rates to be calculated and reported in publications.
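To make the distinction between derived quantities concrete, the sketch below computes two commonly conflated rates from raw counts. The counts and field names are hypothetical, invented only for illustration; they are not part of the standard or drawn from any study:

```python
# Hypothetical raw counts for one experimental group, following the
# recommendation to report tested and positive counts at each stage.
group = {
    "bodies_tested": 50, "bodies_positive": 40,  # infection
    "legs_tested": 40, "legs_positive": 30,      # dissemination
    "saliva_tested": 30, "saliva_positive": 15,  # transmission
}

def rate(positive, tested):
    """Return a proportion, guarding against a zero denominator."""
    return positive / tested if tested else float("nan")

# Two derived quantities that are often used interchangeably in the
# literature, despite different denominators and biological meanings:
# 'dissemination rate' (positive among infected mosquitoes) vs.
# 'disseminated infection rate' (positive among all tested mosquitoes).
infection_rate = rate(group["bodies_positive"], group["bodies_tested"])            # 0.8
dissemination_rate = rate(group["legs_positive"], group["bodies_positive"])        # 0.75
disseminated_infection_rate = rate(group["legs_positive"], group["bodies_tested"]) # 0.6
```

Because the raw counts are preserved, a reader can recompute either definition (or any other derived rate) without ambiguity.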

Table 1 A minimum standard for vector metadata.
Table 2 A minimum standard for virus metadata.
Table 3 A minimum standard for experimental metadata.
Table 4 A minimum standard for experimental outcome data.
Fig. 1

(A) For mosquito-borne viruses, vector competence experiments follow a relatively standardized format. Mosquitoes are inoculated with a virus through intrathoracic inoculation or by feeding on a live host or a prepared blood meal; infection and dissemination are measured by testing different mosquito tissues; and transmission is measured either by testing saliva or salivary glands, or by allowing mosquitoes to feed on a susceptible host and infect it. (B) The results are best understood as rates, but each rate might be reported in several formats; this is further complicated if only a subset of mosquitoes is tested at each stage (e.g., if some mosquitoes die between stages of the experiment). As a result, reporting rates without their underlying counts leaves much to be desired. Instead, as our data standard reflects, the clearest presentation of raw data is to report total counts of tested and positive mosquitoes at each stage (“+” indicates how many mosquitoes test positive out of the total sample). Created with BioRender.com.

Finally, we note that our goal here is only to provide a minimum standard for even the most basic experiment; more specialized designs may require additional columns, and bespoke solutions to those problems may, as they are developed, become future standardized templates. For example, experimental designs focused on coinfection with additional microbes (e.g., Wolbachia, or insect-specific viruses (ISVs)) will likely need to reproduce many of the “Virus metadata” variables as a set of “Coinfection metadata” variables, and might also require additional fields. Once standardized, this could be incorporated into a future version of the base template, encouraging more researchers to assay and report the microbes present in laboratory populations, and thereby reducing unquantified heterogeneity among experimental designs.
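As a sketch of how such an extension might look, the “Virus metadata” block could be reproduced under a coinfection prefix. The field names below are illustrative placeholders, not the standard’s exact columns:

```python
# Illustrative virus-metadata field names (hypothetical, for sketch only).
virus_fields = ["virus_species", "virus_strain", "passage_history", "titer"]

# A "Coinfection metadata" block can reuse the same structure under a
# distinct prefix, keeping the two microbes' metadata unambiguous.
coinfection_fields = [
    f.replace("virus_", "coinfection_") if f.startswith("virus_")
    else f"coinfection_{f}"
    for f in virus_fields
]
```

Mirroring the existing block in this way keeps the template flat and machine-readable, rather than nesting one microbe’s metadata inside another’s.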

Results

To illustrate the data standard in practice, we revisit a study by Calvez et al.21 of vector competence for Zika virus in Aedes mosquitoes relevant to Pacific islands. Unlike many studies, which report results in a mix of summary tables and bar or line graphs, Calvez et al. provided very detailed summaries of raw data in their supplementary tables (Table 5). Because they report results in a structured format, with detailed data on the experimental results, later efforts have been able to gather their findings alongside those of other studies (e.g., Table 6). However, these aggregate datasets often lack important dimensions of metadata. To illustrate how researchers might report primary results in the future, we present a metadata-complete version of the results from Calvez et al. that meets the minimum data standard we propose, as interpreted from both their Methods and Supplementary Table 1 (Fig. 2)22. In rare cases where information was unavailable (e.g., detailed locality information on the origin of mosquitoes), we use “none” to indicate that no data were provided.
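A metadata-complete record of this kind is straightforward to emit as a flat table. The sketch below writes one row to CSV; the column names loosely mirror the four categories of the standard (vector, virus, experimental, and outcome data) but are illustrative rather than the standard’s exact fields, and every value is invented, not taken from Calvez et al.:

```python
import csv
from io import StringIO

# Illustrative columns spanning vector, virus, experimental, and outcome
# data. These are placeholders, not the standard's verbatim field names.
columns = [
    "experiment_id", "vector_species", "vector_origin",
    "virus_species", "virus_strain",
    "inoculation_route", "temperature_c", "days_post_infection",
    "bodies_tested", "bodies_positive",
    "legs_tested", "legs_positive",
    "saliva_tested", "saliva_positive",
]

# One hypothetical experimental group; "none" marks unreported metadata,
# matching the convention used in the worked example.
row = {
    "experiment_id": "1", "vector_species": "Aedes aegypti",
    "vector_origin": "none",  # locality not reported
    "virus_species": "Zika virus", "virus_strain": "none",
    "inoculation_route": "blood meal",
    "temperature_c": "28", "days_post_infection": "14",
    "bodies_tested": "50", "bodies_positive": "40",
    "legs_tested": "40", "legs_positive": "30",
    "saliva_tested": "30", "saliva_positive": "15",
}

buffer = StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns)
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Keeping one row per experimental group, with explicit tested/positive pairs, is what allows downstream syntheses to recompute any derived rate from the file alone.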

Table 5 An example dataset from a set of vector competence experiments with Aedes aegypti mosquitoes and Zika virus, as reported in Supplementary Table 1 of Calvez et al.21. Additional details on experimental protocols are provided in the methods section, and the study reports an additional set of experiments with Aedes polynesiensis mosquitoes as well (not shown).
Table 6 An example of how the same data (Table 5) could currently be reported in a synthetic format, reproduced in the same format from a table of the results of Aedes aegypti vector competence experiments from several studies, assembled by Souza-Neto et al.27.
Fig. 2

The same dataset (Table 5) in a metadata-complete format with standardized columns, reporting (a) IDs for each experimental group, and vector species and vector metadata; (b) virus species and viral metadata; (c) experimental protocols; and (d) the standard results on infection/dissemination/transmission, with clear data on diagnostics and denominators.

Discussion

Vector competence experiments can have urgent, real-world applications, informing how health decision-makers assess risks like “Are temperate vectors permissive to a tropical outbreak spreading north?”23,24 or “Is an ongoing epizootic likely to spill over into humans?”25,26 However, a lack of standardized data reporting is a barrier to reuse and synthesis in this growing field4. In turn, current efforts largely remain disconnected from one another, without any central repository that immortalizes these studies’ findings. Some studies have begun to bridge this gap: one compiled a table of results from several dozen studies of Aedes aegypti and various arboviruses (see Table 6)27. More recently, another compiled a dataset of 68 experimental studies that tested 111 combinations of Australian mosquitoes and arboviruses, and analyzed biological signals in the aggregated data28. These types of efforts are painstaking, requiring substantial manual curation of metadata; hundreds more experiments are reported in the literature, yet remain unsynthesized due to this barrier.

Going forward, adopting a data reporting standard might make it easier for researchers to share data in reusable formats, and – in doing so – would support the creation of a database following this format. This could also help explain or resolve inconsistencies in results across studies, especially between historical and newer studies, the latter of which often use newer and more sensitive techniques (e.g., qRT-PCR as compared to PFU). Storing these data in aggregate would facilitate formal meta-analysis and create new opportunities for quantitative modeling. It would also have practical benefits for researchers, assisting them in disseminating their findings, and potentially reducing duplication of research. To that point, a recent synthetic study found that while some combinations (e.g., Ae. aegypti and Zika virus) are extremely well studied, over 90% of mosquito-virus pairs may never have been tested experimentally. Standardizing data more broadly might help researchers identify and fill these gaps, simultaneously supporting infectious disease preparedness and fundamental research into the science of the host-virus network29.