Virome genomics: a tool for defining the human virome

Highlights
• High-throughput sequencing can be used to study virus–host interactions.
• Sequencing can be used for virus discovery and detection of emerging pathogens.
• Sequencing can identify viruses associated with diseases of unknown etiology.
• Components of the microbiome interact dynamically.
• Sequencing has potential utility for diagnosis of viral infections.


Introduction
The dynamic interactions between viruses and their hosts during infection are complex. Viruses manipulate the cellular environment and subvert host immune responses in order to replicate, and the host counters the virus's maneuvers in order to control the infection. The interplay between virus and host can be studied in many ways: natural infections; model systems (either animal or cell culture); manipulation of virus-host interactions; identification of the proteins involved in virus-host interactions; and studies of protein functions. High-throughput, deep sequencing is a powerful tool for gaining insights into virus-host interactions. Sequencing assays can detect novel viruses and describe the genomes of novel and known viruses. Genomic information can be used to discover viral proteins that can then be characterized, to identify host genes that are important in controlling infections, and to evaluate gene expression of viruses and hosts during infection. Sequencing can also assess variation and evolution of viruses during replication and transmission. This review recounts some of the major advances in the study of virus-host interactions from the last two years and discusses the uses (or potential uses) of sequencing technologies in these studies (Figure 1).

Virus discovery and emerging pathogens
In order to understand how viruses interact with their hosts and how they affect human health, we must understand the scope of viral diversity and be able to detect the viruses present in clinical samples. High-throughput, deep sequencing has proven to be an effective tool for this purpose. The relatively unbiased approach it offers for screening clinical samples enables virus discovery without preconceptions about which viruses might be present in the samples. After 10 years of applying this technology to virus detection, eukaryotic virus discovery continues to be robust. A recent example of this is the novel rhabdovirus, Bas-Congo virus, which is an emerging pathogen associated with acute hemorrhagic fever [1], notable for being the first instance of a rhabdovirus being implicated as a cause of hemorrhagic fever. This virus was characterized in the context of a small outbreak, and the presence of antibodies in an asymptomatic caregiver suggested that person-to-person transmission had occurred. Another emerging pathogen, human coronavirus EMC (HCoV-EMC), was recently identified and characterized following an outbreak in the Middle East [2]. This betacoronavirus causes symptoms resembling those of its sister species, SARS coronavirus, including respiratory symptoms and acute renal failure, although HCoV-EMC is most closely related taxonomically to bat coronaviruses. Using modern technologies, the genome of HCoV-EMC was completely sequenced, and assays have been developed to monitor its presence. This virus is particularly interesting because coronaviruses are typically highly restricted to a specific host, but HCoV-EMC can infect cells from primates (human and monkey), swine, and bats (four families) in culture, suggesting that this virus may utilize a receptor shared among these host groups and may be readily transmitted between hosts [3]. 
These and similar studies demonstrate that continued viral discovery is needed in order to identify and prepare for the effects of emerging viral pathogens on human health. The techniques and technologies are in place for identifying pathogens with similarities to known viruses (even remote similarities, see Table 1). The availability of samples and the funding required for the experiments are currently the bottlenecks in virus discovery efforts.
Testing samples from affected individuals during outbreaks of diseases of unknown etiology is important for surveillance of pathogens, but it is also important to identify viruses that produce mild or subclinical symptoms, because infection with these viruses may nevertheless have long-term implications for human health. For example, polyomaviruses and papillomaviruses may establish chronic infections. Some of these viruses, including many alpha papillomaviruses and Merkel cell polyomavirus, are associated with cancer [4][5]. In light of their capacity to transform cells, identifying the full range of polyomaviruses and monitoring their presence could ultimately provide insight into the development of some cancers.
Potentially emerging pathogens are of great concern, and influenza pandemics are of particular interest because transmissions between animal reservoirs and humans are observed and the emergence of pandemic strains is expected. Current molecular technologies allow us to screen for transmission of influenza between animal species and between humans, and to evaluate mutations and quasispecies. In two highly publicized studies, researchers identified mutations that correlated with airborne mammal-to-mammal transmission of the virus, a trait critical for the development of a pandemic [6,7]. Researchers used either an H5N1 influenza virus that originated in birds but had infected humans or a reassortant virus with the avian subtype H5 hemagglutinin and the other seven segments from a 2009 pandemic H1N1 virus. In these studies, viruses were passaged in ferrets, and isolates that had acquired mutations allowing for airborne transmission between ferrets were identified. The strains were sequenced and the mutations that correlated with ferret-to-ferret transmission were identified. The mutations were in the host receptor binding protein hemagglutinin and in the polymerase complex protein basic polymerase 2, and the studies showed that surprisingly few mutations would be necessary for the virus to achieve the capacity for airborne transmission between ferrets. Furthermore, a separate study showed that two of the mutations were already circulating among H5N1 strains in birds [8]. These studies provide valuable insight into the plausibility of emergence of an H5N1 pandemic strain for which surveillance could be maintained to anticipate potential outbreaks. It is important to note that the effects of any mutations on the transmission and pathogenesis of influenza in humans depend on the genomic context, and the other influenza genes present may play important roles in promoting or limiting the spread of a virus in nature. The controversy generated by this research stemmed from concerns over accidental release from the laboratory or application of the knowledge by bioterrorists. These studies sparked heated debate worldwide, causing some countries to amend their laws and restrictions for research on pathogens with potential as agents of bioterrorism [9]. Some argue that tighter restrictions will slow research in these areas, potentially leaving us less protected from coming pandemics.

Figure 1. The major topics of this review are summarized. Areas of virology research are noted in the blue boxes, and the text in the gray boxes describes how genomics can be used as a tool in that area of research. (Current Opinion in Microbiology)
• Virus discovery and emerging pathogens: High-throughput genomic sequencing is an ideal method for identifying novel organisms and mutations/reassortants in clinical and survey samples without culturing or prior knowledge about the virus genome.
• Viruses associated with diseases of unknown etiology: Many diseases are linked with potential viral causes (Table 2), and genomics has great potential for identifying pathogens that associate with clinical symptoms, again without the need for culturing or the necessity for prior knowledge of the genome.
• Components of the microbiome interact with and affect other microbes: It is important to consider the dynamics and complexity of the microbiome, and genomics is a powerful tool for characterizing complete microbial communities to initiate and aid these kinds of studies.
• Genome characterization, gene discovery, and the future of diagnostic tests: Detailed genomic analysis can be used to identify novel genes for further characterization and study. Faster sequencing and analysis bring us closer to clinical applications of genomic sequencing.

Viruses associated with diseases of unknown etiology
There are many diseases of unknown etiology, including Kawasaki disease and chronic fatigue syndrome, that have attracted efforts to discover a microbial cause (e.g. see Table 2). XMRV is a retrovirus that was previously associated with chronic fatigue syndrome and prostate cancer. It was discovered in prostate cancer samples, and its complete genome was sequenced [10]. Expanded studies of this virus suggested that it was present in prostate cancer tissue of 40% of patients with a mutation in the RNase L gene, which encodes an antiviral function, compared to only 1.5% of prostate cancer patients without the mutation [10]. A separate study also showed an association of XMRV with prostate cancers in patients who did not have an RNase L mutation [11]. Subsequently, XMRV was also associated with chronic fatigue syndrome [12]. However, these observations were highly controversial, because other studies found no correlation of XMRV with prostate cancer, chronic fatigue syndrome, or a plethora of other diseases of unknown etiology. The story of this virus has changed dramatically in the last year, as additional studies showed that XMRV was a laboratory contaminant that arose from recombination between two proviruses during the passage of prostate cancer tumor cells through mice during the development of cell lines [13] and ruled out a causal relationship between XMRV and prostate cancer [14]. This story illustrates the caution that must be exercised in the process of virus discovery. Highly sensitive molecular assays may uncover low-level contaminants as well as legitimate pathogens, so carefully constructed controls are critical for accurate interpretation of the data.

Components of the microbiota interact with and affect other microbes
Interactions of viruses with hosts are affected not only by the host genotype but may also be influenced by the bacterial microbiome. This is an area where research is only beginning, but already interesting observations are being made. For example, the retrovirus MMTV and poliovirus have enhanced ability to infect mice in the presence of gut microbes [15,16]. In the case of MMTV, the virus uses bacterially derived LPS to subvert the antiviral response by a mechanism involving the TLR-4-dependent induction of the immunosuppressive cytokine IL-10 [15]. In the case of poliovirus, gut bacteria appear to enhance infectivity by promoting virus attachment to host cells [16]. Likewise, depletion of commensal bacteria in the airways of mice by antibiotic treatment dampened the immune response to influenza virus, likely due to the role of commensal bacteria in developing and regulating the immune system [17]. These studies suggest that shifts in bacterial communities may be important considerations when trying to understand the implications of viral infections for human health. In each of these cases, mouse models were used, enabling the controlled comparison of viral infection in genetically identical mice, which were germ-free, antibiotic-treated, or colonized with commensal bacteria. Similar studies correlating viral infections in humans with bacterial communities are important, but much more complicated. Nonetheless, metagenomic sequencing analyses have the potential to broadly characterize all of the microbial organisms present in a clinical sample, a capability that can be especially powerful when coupled with patient information. Ultimately, we may be able to identify bacteria and viruses that frequently co-occur or are mutually exclusive and correlate these patterns with clinical symptoms and outcomes. This could yield insights into pathogenesis as well as better diagnostic tools and treatments.

Virome genomics Wylie, Weinstock and Storch 481

Table 1. Computational tools used to identify viral sequences, including those with remote sequence similarity to known viruses
• IDBA-UD, MetaVelvet: These programs assemble shorter sequences into longer contiguous sequences. Longer sequences can help identification of more remote sequence similarities. [28,29]
• BLASTX and TBLASTX: A translated nucleotide sequence is queried against a database of protein sequences or translated nucleotide sequences. This approach is not as sensitive as the searches involving profile Hidden Markov Models (HMMs) and iterative searches (described below), but it is sufficient to identify many viral sequences, including those from many recently discovered novel viruses. [30]
• Realtime Genomics mapx, MulticoreWare mblastx, Rapsearch2: These three programs are like BLASTX but accelerated to accommodate the large amount of data generated from high-throughput sequencing. http://www.realtimegenomics.com, http://www.multicorewareinc.com, [31]
• HMMER3: A protein query is used to search a profile Hidden Markov Model (HMM) database. Jackhmmer is an iterative alignment program that is part of this package (similar to PSI-BLAST, described below). [32]
• HHblits: A profile HMM is constructed for the query sequence and then used to query a database of profile HMMs. [33]
• PSI-BLAST: Relatively closely related proteins are identified and conserved amino acid positions are represented in a profile. The profile is used to search a protein database to identify proteins with more remote homologies. The new proteins are added to the profile and the search is repeated iteratively. [30]
• PHI-BLAST: A protein and specific pattern within the protein are used to query a protein database. Proteins that contain the pattern and are similar to the input protein around the pattern are retrieved. CSI-BLAST performs the search iteratively. [34]
• DELTA-BLAST: Searches a database of profile HMMs and identifies the HMM most closely related to the query. The resulting HMM is then used to search a protein database.
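The translated searches listed in Table 1 (BLASTX and TBLASTX) rest on one simple step: a nucleotide query is translated in all six reading frames before it is compared with protein sequences. The sketch below illustrates only that translation step, using the standard genetic code. It is not how BLAST itself is implemented, and the function names are hypothetical.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate from the first base; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to the translated peptide strings."""
    seq = seq.upper()
    frames = {}
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(template[offset:])
    return frames


if __name__ == "__main__":
    for frame, peptide in six_frame_translations("ATGGCCTAA").items():
        print(frame, peptide)
```

Each of the six peptide strings would then be compared against a protein database. Because protein sequences diverge more slowly than nucleotide sequences, translated searches can recover more distant viral relatives than nucleotide-level searches.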
The implications of co-infections are important when evaluating virus-host interactions. For example, the flavivirus GBV-C has been shown to slow the progression of HIV to AIDS [18,19], and herpes simplex virus 1 infection enhances the risk for infection by HIV [20]. Recently, disease progression of pathogenic SIV infection was shown to be associated with a major expansion of the enteric virome [21]. These initial observations were made using high-throughput metagenomic sequencing, which identified at least 32 novel enteric viruses in nonhuman primates. Virus infections may compromise the integrity of the intestinal epithelial lining, allowing the translocation of immunostimulatory molecules from the gut into the bloodstream with subsequent systemic immune activation. This study suggests that viruses that co-infect with SIV may have important roles in the pathogenesis of the disease. Metagenomic analyses of samples from people with HIV, sepsis, immunosuppression from transplants, or even more routine illnesses like rhinovirus infections, may reveal co-infections that correlate with disease development or symptoms. Recent PCR-based studies of respiratory samples have found co-infections with multiple viruses in 12-21.7% of children with acute respiratory illness [22][23][24]. In another study, co-infections were found to be common in patients presenting to the emergency room with fever without a source, based on metagenomic sequencing [25] and PCR assays [26], with as many as five viral genera detected in samples from some subjects [25]. These studies and others indicate that viral co-infections are common and worthy of further investigation.
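One simple way to look for the co-occurrence and mutual-exclusion patterns discussed in this section is to tabulate presence and absence of each pair of taxa across samples. The sketch below is a minimal illustration, assuming that each sample has already been reduced to a set of detected taxa. The sample identifiers and taxon names are hypothetical placeholders, not data from any study cited here, and the Jaccard index is one of several possible summary measures.

```python
from itertools import combinations


def pair_counts(samples, taxon_a, taxon_b):
    """Count samples with both taxa, only the first, only the second, or neither."""
    both = only_a = only_b = neither = 0
    for detected in samples.values():
        has_a, has_b = taxon_a in detected, taxon_b in detected
        if has_a and has_b:
            both += 1
        elif has_a:
            only_a += 1
        elif has_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def jaccard(both, only_a, only_b):
    """Shared samples divided by samples containing either taxon."""
    union = both + only_a + only_b
    return both / union if union else 0.0


def summarize_pairs(samples):
    """Return {(taxon_a, taxon_b): (counts, jaccard_index)} for every taxon pair."""
    taxa = sorted(set().union(*samples.values()))
    summary = {}
    for taxon_a, taxon_b in combinations(taxa, 2):
        counts = pair_counts(samples, taxon_a, taxon_b)
        summary[(taxon_a, taxon_b)] = (counts, jaccard(*counts[:3]))
    return summary


if __name__ == "__main__":
    # Hypothetical presence/absence calls, for illustration only.
    toy_samples = {
        "s1": {"virus_A", "bacterium_X"},
        "s2": {"virus_A", "bacterium_X"},
        "s3": {"virus_B"},
        "s4": {"virus_B", "bacterium_X"},
    }
    for pair, (counts, index) in summarize_pairs(toy_samples).items():
        print(pair, counts, round(index, 2))
```

A Jaccard index near 1 suggests that two taxa tend to be found together, and an index of 0 for taxa that are each detected in some samples suggests mutual exclusion. With real cohorts, such counts would need a formal statistical test and correction for multiple comparisons before being correlated with clinical symptoms or outcomes.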

482 Host-microbe interactions: viruses

Genome characterization, gene discovery, and the future of diagnostic tests
Genomic sequences of viral strains can be used to predict viral features that are important for viral replication or pathogenesis. For example, based on alignment of influenza A virus segment 3 (PA) from >1000 strains, researchers recently identified a conserved region that leads to a pause and ribosomal frame shift during translation [27]. This results in a fusion protein called PA-X consisting of the N-terminus of the RNA-dependent RNA polymerase and the newly discovered peptide. This protein appears to have nuclease activity that dampens the host antiviral response. This study illustrates that comparative genomics of well-studied viruses can reveal exciting, novel genomic features with implications for virus replication, pathogenicity, and interactions with the hosts. In addition to these kinds of discoveries, the characterization of specific viral genotypes or viral variants that associate with virulence can lead to enhanced diagnostic assays that provide information that was not previously available to clinicians. In a new generation of diagnostic tests, rapid sequencing assays could be used not only to detect viruses in patient samples but also to provide information about the viral subtype and the presence of virulence genes, which could be used to predict disease severity and outcome. Sequence-based diagnostic tests could also be used to identify the presence of antiviral drug resistance alleles, which would inform the management of some viral infections, including influenza A and cytomegalovirus. Furthermore, ultra deep sequencing (resulting in >100-1000 sequences at each base position of the viral gene or genome) could be used to assess viral variation and quasispecies, which could be important for monitoring emerging pathogens or for developing effective vaccines.
Sequencing assays aimed at the human transcriptome also have the potential to be developed into clinical tests to measure the host response to viral infection, which may help both with diagnosis and selection of treatment. An additional advantage of diagnostic tests based on high-throughput sequencing is that they may be less vulnerable than PCR-based assays to sequence variation in primer-binding or probe-binding regions.
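The ultra deep sequencing described in this section can be illustrated with a small sketch that counts bases at each position of a viral gene and reports variants present in a minority of reads. It assumes that reads have already been aligned to a reference and padded to equal length, with "-" marking positions a read does not cover. The function names and the depth and frequency thresholds are illustrative assumptions, not established cutoffs.

```python
from collections import Counter


def site_profiles(aligned_reads):
    """Return a list of (depth, base_counts) tuples, one per alignment column."""
    if not aligned_reads:
        return []
    profiles = []
    for position in range(len(aligned_reads[0])):
        # Only unambiguous bases count toward depth; '-' and 'N' are skipped.
        counts = Counter(
            read[position] for read in aligned_reads if read[position] in "ACGT"
        )
        profiles.append((sum(counts.values()), counts))
    return profiles


def minor_variants(aligned_reads, min_depth=100, min_freq=0.01):
    """List (position, major_base, minor_base, frequency) with 1-based positions.

    min_depth and min_freq are placeholder thresholds chosen for illustration.
    """
    variants = []
    for index, (depth, counts) in enumerate(site_profiles(aligned_reads)):
        if depth == 0 or depth < min_depth:
            continue  # too few reads to estimate frequencies at this site
        major_base = counts.most_common(1)[0][0]
        for base, count in sorted(counts.items()):
            frequency = count / depth
            if base != major_base and frequency >= min_freq:
                variants.append((index + 1, major_base, base, frequency))
    return variants


if __name__ == "__main__":
    # Toy pileup: 95 reads match the consensus, 5 carry G->A at position 3.
    toy_reads = ["ACGT"] * 95 + ["ACAT"] * 5
    print(minor_variants(toy_reads))
```

In practice, sequencing error rates set a floor on the variant frequency that can be trusted, so any threshold would need to be calibrated against controls before such counts are used to flag drug resistance alleles or to describe quasispecies.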

Conclusion
Studies of virus-host interactions are firmly supported by the use of genome sequencing technologies. A number of applications and advances from the last two years are described here, and as sequencing becomes faster and less expensive, its applications to research and diagnostics will expand further. Exploratory sequencing analyses can be powerful tools to carry out targeted, hypothesis-driven studies that lead us to better understand the pathogenic effects of viruses.

References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as:
• of special interest
•• of outstanding interest