Metagenomic study of the viruses of African straw-coloured fruit bats: Detection of a chiropteran poxvirus and isolation of a novel adenovirus

Viral emergence as a result of zoonotic transmission constitutes a continuous public health threat. Emerging viruses such as SARS coronavirus, hantaviruses and henipaviruses have wildlife reservoirs. Characterising the viruses of candidate reservoir species in geographical hot spots for viral emergence is a sensible approach to develop tools to predict, prevent, or contain emergence events. Here, we explore the viruses of Eidolon helvum, an Old World fruit bat species widely distributed in Africa that lives in close proximity to humans. We identified a great abundance and diversity of novel herpes and papillomaviruses, described the isolation of a novel adenovirus, and detected, for the first time, sequences of a chiropteran poxvirus closely related with Molluscum contagiosum. In sum, E. helvum display a wide variety of mammalian viruses, some of them genetically similar to known human pathogens, highlighting the possibility of zoonotic transmission.


Introduction
Zoonoses caused by unknown agents represent a significant proportion of the challenge of emerging infectious diseases (EIDs) (Morens et al., 2004). Viruses account for approximately 25-44% of all EIDs (Jones et al., 2008;Taylor et al., 2001) and studies suggest they are the pathogen class most likely to emerge (Cleaveland et al., 2007;Dobson and Foufopoulos, 2001). Hantaviruses, henipaviruses, SARS coronaviruses and filoviruses are all viruses of zoonotic origin. Nearly 80% of zoonotic EIDs originate from wildlife, and the overall contribution of wildlife pathogens to human EID events is increasing and represents an ongoing threat to global health (Cleaveland et al., 2007;Jones et al., 2008). For example, a novel coronavirus associated with acute respiratory disease was recently diagnosed in pneumonia patients in Saudi Arabia and London (Bermingham et al., 2012;Zaki et al., 2012). Analysis of the novel coronavirus genome suggests a possible bat origin (Bermingham et al., 2012).
South and East Asia, Eastern Europe, Latin America and tropical Africa constitute areas of increased relative risk for zoonotic emergence from wildlife (Jones et al., 2008;Morens et al., 2004). Numerous studies have successfully combined metagenomics with next generation sequencing to explore the viruses of different animal species, including: domestic pigs and turkeys (Day et al., 2010;Shan et al., 2011); Californian sea lions ; and  (Phan et al., 2011). Characterising the viruses of candidate reservoir species in high-risk geographical areas is an important step toward better understanding viral emergence.
Bats are the primary reservoirs for many viral zoonoses, including henipaviruses, filoviruses, some lyssaviruses and SARS-like coronaviruses (Halpin et al., 2000;Kuzmin et al., 2008;Li et al., 2005;Luby et al., 2009;Towner et al., 2007). Indeed, seminal work has been recently published on the role of bats as natural reservoirs of paramyxoviruses (Drexler et al., 2012). Detailed studies of the viruses of insectivorous bat species in both North America and China have been conducted (Donaldson et al., 2010;Ge et al., 2012;Li et al., 2010a;Wu et al., 2012). These studies found large numbers of insect and plant viruses, which were thought to reflect dietary inputs, as well as phage sequences and mammalian viruses. The majority of the mammalian viruses identified in those studies were those previously identified in bats (often with high diversity being reported in individual populations) and include: Adenoviridae (Li et al., 2010c); Parvoviridae (Li et al., 2010b); Circoviridae (Ge et al., 2011); Coronaviridae (Tang et al., 2006;Woo et al., 2006) and Astroviridae (Chu et al., 2008). Papillomaviridae and Herpesviridae sequences were also commonly found (Donaldson et al., 2010;Ge et al., 2012;Wu et al., 2012) and some studies also reported Picornaviridae, Flaviviridae and Retroviridae (Li et al., 2010a;Wu et al., 2012).
If we consider that the ∼1200 bat species constitute approximately 20% of the class Mammalia and that they are near-globally distributed, the benefits of expanding our knowledge of bat viruses on geographic and taxonomic levels become evident.
Here we conducted a metagenomic study to detect viruses of E. helvum, a frugivorous African bat species that is widely distributed and migratory throughout much of sub-Saharan Africa. The species is eaten as bushmeat, and the populations studied have ample opportunities for human contact, including a roost directly over a hospital in Accra, Ghana (Hayman et al., 2012).

Performance comparison among assemblers
Here, we show important performance differences among four different assemblers (Velvet, ABySS, MetaIDBA and MetaCortex) and three sample types.
The assemblers generated different numbers of contigs, with MetaCortex and ABySS producing more sequences than Velvet and MetaIDBA for each sample type (Fig. 1A). Sample type also affected the number of contigs, with increasing cellularity resulting in more contigs (Table 1), except with MetaIDBA, where the sample allowed to iterate over a kmer size-range generated the most contigs. Contig length parameters also varied with assembler and sample type. Velvet generated contigs with the longest average length across sample types, while ABySS typically produced the longest contigs. Regarding sample type, the longest contigs were generated from the throat sample for each assembler (Table 1).
As well as generating contigs of differing length and number, the nucleotide composition of contigs varied among assemblers. Base composition of the total assembled contigs revealed important differences (Supp. Fig. 2A). Velvet and MetaCortex contigs had similar base compositions, while ABySS contigs incorporated non-ATCG notations (e.g. N, R, Y, Supp. Fig. 2A), and MetaIDBA contigs were primarily composed of adenine (though this was not true for MetaIDBA contigs included in final analyses (Supp. Fig. 2B)).

Consolidation of contigs combines the strengths of assembler approaches and reduces complexity
De novo assemblers have different contig-construction methods, resulting in strengths and weaknesses in different situations. By consolidating contigs from multiple assemblers, we combined the strengths, while reducing the computational complexity of analyzing contigs from assemblers separately. The proportion of contigs retained after consolidation differed among assemblers. For example, ≤22% of ABySS contigs but ≥94% of MetaIDBA contigs were retained into the final consolidated set for each sample type (Table 2). The consolidation resulted in the discard of approximately 30% of the total assembled contigs (∼4.4 million sequences). Consistent with the observed assembler differences in contig generation, a variable proportion of sequences were also length-excluded per assembler, but these proportions were approximately equal among sample types (Fig. 1B, Table 2).

Eidolon helvum samples contain large numbers of viral sequences
To identify viral sequences in the consolidated contigs, we used multiple algorithms, which had different efficacies. While BLASTn identified 258 suspect-viral sequences among the sample types, BLASTx and tBLASTx identified a further 6448 and 2563 viral sequences, respectively. Manual exclusion and curation of these 9269 suspect-viral sequences was used to further focus analysis on viral sequences of interest (Fig. 1B). Here, we aimed to explore the viruses that likely infect E. helvum (for which they probably constitute a natural reservoir) not the viruses infecting their dietary inputs or bacterial flora. Consequently, 8095 suspect-viral sequences related to viral families not known to infect vertebrates were excluded from further analysis. Subsequent to close inspection of the remaining 1174 suspect-viral sequences, a further 11 sequences were removed due to incorrect classification in the database (not shown). This resulted in 1363 viral sequences related to eight mammalian-infecting viral families being identified. While the majority (77%) were related to viruses with double stranded DNA (dsDNA) genomes, 21% were related to retroviruses (classified separately as sequences may have derived from exogenous-RNA or proviral-DNA forms) and single-stranded DNA and positive-sense single-stranded RNA viruses were also present ( Table 3, Fig. 1C).
All sample types, assembly algorithms and BLAST comparison algorithms identified viral sequences (Table 3, Fig. 1C). By using multiple assembly and identification algorithms we generated more contigs and identified more viral sequences.
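The genome-type proportions quoted above can be checked against the per-family contig counts given in the family-level results of this paper. The short sketch below is only a bookkeeping aid, not part of the authors' pipeline; the dictionary layout and the genome-type labels are illustrative choices, and the single polyomavirus count assumes that "a sequence" in the Polyomaviridae section means one contig.

```python
# Per-family contig counts as reported in the family-level results.
# The grouping labels are an illustrative convention, not the authors' code.
FAMILY_COUNTS = {
    "Herpesviridae": ("dsDNA", 539),
    "Papillomaviridae": ("dsDNA", 408),
    "Adenoviridae": ("dsDNA", 68),
    "Poxviridae": ("dsDNA", 38),
    "Polyomaviridae": ("dsDNA", 1),   # assumption: "a sequence" = 1 contig
    "Retroviridae": ("retro", 292),
    "Parvoviridae": ("ssDNA", 10),
    "Picornaviridae": ("+ssRNA", 7),
}


def proportions_by_genome_type(counts):
    """Return (total, {genome_type: fraction of all contigs})."""
    total = sum(n for _, n in counts.values())
    by_type = {}
    for genome_type, n in counts.values():
        by_type[genome_type] = by_type.get(genome_type, 0) + n
    return total, {g: n / total for g, n in by_type.items()}


total, shares = proportions_by_genome_type(FAMILY_COUNTS)
```

Under these counts the total and the rounded dsDNA and retroviral shares agree with the figures stated in the text.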

Analysis of viral sequences by family

Herpesviridae
We identified 539 sequences related to herpesviruses, mostly from the throat sample (Table 3). Sequences related to a wide range of genes and proteins involved in diverse functions including gene regulation, nucleotide metabolism, DNA replication as well as envelope glycoproteins and other structural proteins. Most sequences related to members of the betaherpesvirinae (n = 366) and gammaherpesvirinae (n = 171), and only two sequences were most closely related with alphaherpesvirinae (Supp. Table 1). Phylogenetic analysis of a region of the DNA polymerase showed the presence of distinct herpesviruses in the throat sample, including some related to other bat betaherpesviruses and a novel gammaherpesvirus (Fig. 2). The presence of contig th_687866 in the throat sample was confirmed by PCR and sequencing (not shown).

Papillomaviridae
Most papillomavirus sequences derived from the throat sample (405 of 408), but the other sample types also contained papillomavirus sequences (Table 3). Sequences related to both early (E) and late (L) genes of viral replication in proportions approximate to gene length (Fig. 3A). The sequences related to members of genetically diverse genera within the Papillomaviridae (Supp. Table 1, Fig. 3B). Phylogenetic analysis of overlapping fragments showed two related sequences of novel papillomavirus(es), one (th_683255) with 76% aa ID with an incomplete, unpublished 'Eidolon helvum papillomavirus 1' and another (th_NODE_12326) with 64% aa ID with Rousettus aegyptiacus papillomavirus 1. We confirmed the presence of contig th_679786 by PCR and sequencing (not shown).

Fig. 1. (A) Contigs assembled from sequencing reads by different de novo assemblers (starred, denoted by colours) were consolidated by sequential comparisons (numbered, curved green arrows) and removal (red arrows) of duplicate sequences. The size of charts is proportional to the number of contigs. (B) Consolidated sequences subject to sequential BLAST comparison with automated taxonomic classification to identify suspect-viral sequences. Then sequences were excluded manually on the basis of related viral family and curation to identify a final set of viral sequences. Proportions of sequences assembled by each algorithm (coloured as in A) and from each sample type (coloured as in the inset key) before and after length-exclusion are shown in the stacked charts. (C) Proportions of 1,363 mammalian-virus related contigs by assembly algorithm, sample-type and identification algorithm (shaded as in B) are shown in the stacked chart. Proportions by viral family are shown in the chart on the right. Number of contigs related to members of each family are shown in parentheses after the viral family names, which are grouped by genome type. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Adenoviridae
Sequences related to adenoviruses were present in all three sample types (Table 3). All sequences related to members of the mammalian-infecting genus Mastadenovirus. When aligned with the prototypic member of this genus (Human adenovirus C, NC_001405), the 68 adenovirus-like contigs displayed 65-100% aa identity with twelve proteins involved in capsid morphogenesis, DNA replication and encapsidation, and apoptosis (Supp. Table 1). We isolated an adenovirus (referred to as E. helvum adenovirus 1) from a urine sample obtained from this bat population (Fig. 4A and B), and although phylogenetic analysis identified it as a mastadenovirus, it was distinct from those previously described in bats (Fig. 4C). Notably, E. helvum adenovirus 1 clustered with human adenoviruses. Contigs from the throat (th_NODE_10144) and urine samples (ur_NODE_27579) shared 77% and 90% aa identity with the isolated virus' hexon protein over distances of 58 and 63 aa, respectively.

Table 3. Identification of viral sequences (divided by viral family) according to sample type, assembler and identification algorithm (column headings: sequence origin; related viral family).

Poxviridae
We detected 38 contigs related to poxviruses, all derived from the throat sample and related to chordate-infecting poxviruses. Most (n = 25) were related to Molluscum contagiosum (MC), a human contagion (Supp. Table 1). We compared all poxvirus contigs against MC reference proteins (NC_001731) using BLASTx. The sequences shared 29-74% aa identity with 23 different MC proteins and had e-values of 3.45e−111 to 8.03e−4, showing that all sequences had significant similarity to MC. Sequences were related to proteins in the outer (variable) as well as the inner (core) regions of the genome (Supp. Fig. 3), but no proteins unique to Molluscipox were detected. This relationship was exemplified by phylogenetic analysis of contig th_node1036_0_0_38518, related to the major core protein (Fig. 5). We confirmed the presence of this contig in individual throat swabs by PCR. Five of the forty throat swabs (13% of samples, 95% CI 6-26%) contained this sequence. To further confirm the presence of poxvirus sequences in E. helvum, we aligned the MetaCortex contigs (of any length, from all three sample types) against the MC genome using BLAT (which reports alignment blocks of over 95% identity) and increased the number of poxviral related contigs to 12,845 (Supp. Fig. 3).
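The prevalence estimate for the poxvirus contig (five positive swabs out of forty) can be approximated with a standard binomial confidence interval. The text does not say which interval method was used, so the sketch below assumes a Wilson score interval; the function name and the choice of z = 1.96 are illustrative, and other methods (for example exact or continuity-corrected intervals) would give slightly different bounds.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    Assumption: the interval method is not stated in the text; Wilson is
    used here purely as a common default.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Five PCR-positive swabs out of forty, as reported above.
low, high = wilson_interval(5, 40)
```

With these inputs the point estimate is 12.5% and the Wilson bounds fall close to the reported interval, which suggests the reported figures come from this or a closely related method.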

Polyomaviridae
We assembled a sequence from the throat sample that was related to the VP1 capsid protein of polyomaviruses. This phylogenetically clustered with primate polyomaviruses with low confidence, likely due to the short length of the sequence (Supp. Fig. 4).

Retroviridae
There were 292 sequences related to retroviruses, primarily derived from the lung sample, though sequences were also present in the urine and throat samples (Table 3). Retroviral sequences related primarily to gamma, beta and unclassified retroviruses (Fig. 6A). The sequences related to all three canonical genes of retroviruses in proportions approximate to gene length (Fig. 6B). Translations of many retrovirus contigs contained stop codons within the region of BLAST alignment, suggesting that they derived from non-functional, endogenous retroviruses. The longest ORF of a partial polymerase protein sequence (th_NODE_62045) was phylogenetically related to, but distinct from, both avian and mammalian viruses (Fig. 6C).
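The stop-codon observation above rests on scanning the reading frame implied by each BLAST alignment. A minimal sketch of such a scan is given below; it assumes the standard genetic code and a forward-strand frame offset supplied by the caller, and the function name is hypothetical rather than taken from the authors' workflow.

```python
# Stop codons of the standard genetic code (assumption: no alternative code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(seq, frame=0):
    """Return 0-based nucleotide positions of stop codons in a forward frame.

    `frame` is the offset (0, 1 or 2) of the first codon within `seq`.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    seq = seq.upper().replace("U", "T")
    return [
        i for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]
```

A contig whose aligned region returns a non-empty list in the aligned frame would be a candidate for a disrupted, possibly endogenous, sequence; sequencing or assembly errors can produce the same signal, so this is only a screening step.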

Parvoviridae
Ten sequences derived from the throat (n = 8) and urine (n = 2) related to members of the Parvoviridae, from both the mammalian-infecting Parvovirinae subfamily and the invertebrate-infecting Densovirinae subfamily (Supp. Fig. 5). The analysed Parvovirinae-like sequence (th_node7292_0_0_7345) related to members of different genera (e.g. Erythrovirus and Betaparvovirus) and was distinct from a known E. helvum parvovirus (Supp. Fig. 5B).

Picornaviridae
Contigs from the urine (n = 6) and lung (n = 1) samples related to picornaviruses. Urine contigs related to the polyprotein of members of the genus Kobuvirus and the longest sequence, ur_181630, phylogenetically clustered with human and canine kobuviruses (Supp. Fig. 6). Short sequence lengths precluded useful phylogenetic comparison of these Kobuvirus sequences with those detected in North American insectivorous bats (Li et al., 2010a); however, there was 50% identity over a 90 aa overlapping region. The lung picornavirus sequence related to members of the genus Enterovirus, but was too short (79 bp) for useful phylogenetic analysis.

Discussion
Here we described the first detailed study of metagenomic viral sequences from a megachiropteran species. E. helvum have a wide geographical distribution and live in close contact with human populations. As such, this bat species is an ideal candidate reservoir host, also being a source of bushmeat in Ghana and likely being infected with henipaviruses, Lagos bat virus (lyssavirus) and Ebola virus (

The impact of bioinformatics tools on metagenomic studies
Differences in the assembler efficacy manifested as differences in number, length-parameters and base-composition of the generated contigs. ABySS and Velvet are consensus assemblers designed to assemble a single genome from sequence reads. Contrastingly, MetaCortex and MetaIDBA are meta-assemblers specifically designed to address situations where multiple genomes would be expected. Generally, consensus assemblers adopt more stringent algorithms for error removal in order to build longer contigs, while meta-assemblers preserve sample variation. The consolidation of contigs from multiple assemblers generated a more robust contig set and reduced the number of sequences by one third, facilitating downstream processing.
The BLAST algorithm used also impacted the number of contigs identified as suspect-viral. BLASTn identified fewer sequences than tBLASTx against the same database with identical retention criteria. Similarly, BLASTx identified more suspect-viral sequences than BLASTn (although a different database was used), supporting the observation that protein-based comparisons are more effective than nucleotide-based comparisons where divergent sequences are expected (Kunin et al., 2008). The use of multiple identification algorithms here enabled the detection of more viral sequences.

Viral sequences
The relative identification success for mammalian viral sequences in this metagenomic study compared with others, as well as among sample types and analytical tools, provides guidance on how best to approach such studies. Using the Illumina platform, and working with a frugivorous bat species, we found that 28% of viral sequences identified were of mammalian-origin, more than the ≤10% previously identified in insectivorous bat species (Donaldson et al., 2010;Ge et al., 2012;Li et al., 2010a;Wu et al., 2012). The sample type also affected the level of detection, with most viral sequences being derived from the throat sample (though differences attributable to colony differences from which the samples were collected cannot be ruled out). These discrepancies in detection show that the quantitative and qualitative success of viral metagenomic studies is determined partly by the study species, sample type, and molecular and bioinformatic tools used.
Here we aimed to identify viruses circulating in E. helvum that might have zoonotic potential. We detected novel, often diverse, viruses in many viral families, in samples collected over a short time frame from a small number of bats. The viral sequences were often distinct from those previously described in bats and often we saw diversity within the viral family (e.g. herpesvirus and retroviral sequences from different subfamilies, and at least six phylogenetically-distinct papillomaviruses from different genera). Clearly, a wide range of diverse and previously uncharacterized viruses circulate in E. helvum.
Given the proximity of this species with humans over a large geographical area, it is important to consider the zoonotic potential of these viruses. We detected poxvirus, adenovirus and polyomavirus sequences closely related with those from humans and primates (in the latter two cases, more closely related than with those viruses previously described in bats). We also isolated an adenovirus from urine collected directly underneath the colony, a sample type to which humans are regularly exposed. The relatedness of these viruses to human pathogens may indicate that these viruses are more likely to emerge (Antia et al., 2003). Given the relationship of these viruses with human pathogens and E. helvum's high rate of human contact, more extensive active surveillance such as molecular and serological studies of humans in contact with relevant bat populations seems appropriate.
Our study expands the known chiropteran viral profile. While many viral families detected here have previously been reported in bat populations, here we report the detection of poxvirus sequences in bats. Although Li et al. reported pox-related sequences derived from a circovirus (Li et al., 2010a), the viral sequences reported here are likely to derive from a true poxvirus. We detected sequences with a high degree of relatedness to 23 different proteins throughout the genome of M. contagiosum. The presence of this virus in 13% of the throat swabs (five of forty) suggests a high prevalence of poxvirus infection in this bat population. Given the relatedness of the virus to MC, the possibility of zoonotic transmission of poxviruses from bats to humans should be considered further, and in other geographical areas.
The commonalities between our findings and those in other metagenomic studies of bats provide insight on the relationship that bats may have with their natural pathogens. Of the eight viral families detected here, six have previously been detected during metagenomic studies of insectivorous bats (Donaldson et al., 2010;Ge et al., 2012;Li et al., 2010a;Wu et al., 2012) and one (the Polyomaviridae) was detected in Myotis spp. using consensus PCR (Misra et al., 2009). We worked with a host from a separate taxonomic suborder to those studies and still found similar viruses, suggesting that a common viral footprint may be present in all chiropteran species. Continued description of viral profiles of disparate host species and geographical locations will deepen our understanding of host-pathogen relationships in these important zoonotic reservoir species.
While these results represent an interesting development in the study of these reservoir hosts, the limitations of metagenomic methods should be acknowledged. Only partial sequence information was generated for each viral family under study, limiting analysis. Additionally, although some viral sequences were confirmed by PCR and virus isolation, making them likely-derived from true viruses, the same cannot be said of the retroviral sequences. The high proportion of retroviral sequences in the lung, as well as the presence of stop codons in a number of sequences, indicates that they likely derive from endogenous proviruses. Furthermore, in common with other studies of this nature, most viral sequences here were derived from those with dsDNA genomes. Drexler et al. (2012) showed that metagenomic methods could not detect paramyxoviral RNA where consensus PCR was successful. Similarly, this population of E. helvum has been demonstrated to harbour a high prevalence and diversity of paramyxoviruses yet none were detected here (Baker et al., 2012). Although some progress is being made toward validating the scope of viral metagenomic studies (Sachsenroder et al., 2012), further work is needed to conclude whether this repeatedly observed bias is biological (perhaps consequent to the long, sometimes latent, infection periods of dsDNA viruses) or if it constitutes a laboratory artifact. Due to these methodological limitations, it is likely that the viral sequences described here are not an exhaustive representation of the viruses of E. helvum.

Ethics declaration
This study was approved by the Zoological Society of London's animal ethics committee.

Populations under study
Two colonial populations (250,000-1,000,000 bats each) of E. helvum in Ghana were sampled: one in Accra and one in Tano Sacred Grove (TSG, approx. 400 km North, Supp. Fig. 1). The Accra population is urban, roosting in trees over a city center hospital. The TSG population is rural, roosting in a protected forest area. The two populations comprise part of a metapopulation (Peel, 2012). Interspecies co-roosting of these populations was not observed in five years of field study.

Sample collection
Urine for metagenomic analysis was collected twice from beneath the Accra roost, in January and March 2009. Sterile cotton swabs were saturated with urine on plastic sheeting placed beneath the roost, and placed in 1 ml of virus transport medium (VTM: Hank's Balanced Salt Solution, 1% BSA [w/v], gentamicin 100 mg/ml and amphotericin B 2 mg/ml). Urine samples for virus isolation were collected in 2010, as previously described (Baker et al., 2012). Throat swabs were collected from individual, manually-restrained bats caught from TSG in March 2009. Swabs were placed in 1 ml of VTM. Lung tissue was collected from healthy, adult bats euthanized by anaesthetic overdose (ketamine/medetomidine), captured in Accra in March 2009. An individual piece of tissue (approx. 5 mm²) from each bat was used. Samples were frozen at −80 °C until further processing.

Next generation sequencing
This work was carried out at the Wellcome Trust Sanger Institute. The DNA library was sheared (Covaris AFA, Covaris) to 200-300 bp and purified (QIAquick spin columns, Qiagen), blunt-end repaired, and ligated to sequencing primers. Ligation products were purified (Agencourt Ampure SPRI beads, Beckman Coulter Genomics) and the library was 200 bp size-selected by agarose gel electrophoresis. Following purification (Gel extraction kit, Qiagen), the library was PCR amplified for 10 cycles (Phusion DNA polymerase) in triplicate using Illumina adaptor-specific primers. The primers were removed (Agencourt Ampure SPRI beads, Beckman Coulter Genomics), and libraries quantified by qPCR were diluted to 40 nM for cluster generation. Libraries were sequenced on an Illumina GAII (Illumina Inc) for 76 bp paired-end reads. Following data QA, QC and computational primer removal there were 5,218,132 sequence reads from the urine sample, 15,809,698 from the throat sample and 22,530,774 from the lung tissue sample.

Generation of consolidated contiguous sequences (contigs)
Sequences from each sample type were processed individually. Reads were de novo assembled using four assembly algorithms: Velvet 1.1.04 (Zerbino and Birney, 2008), ABySS 1.2.7 (Simpson et al., 2009), MetaIDBA 0.19  and MetaCortex (Leggett and Caccamo, personal communication), a recently-developed variant of Cortex (Iqbal et al., 2012). These assemblers are based on de Bruijn graphs, which are constructed by dividing reads into smaller, overlapping sequences called kmers. For ABySS, Velvet and MetaCortex, a range of kmer sizes (21, 31, 41, 51, 61 and 71) were evaluated, with 31 proving optimal (providing the largest number of contigs ≥100 nt and the largest number of viral matches to the NCBI nt database) for Velvet and ABySS, and 61 being optimal for MetaCortex. MetaIDBA iterates over a kmer size range, and the throat sample was iterated over 21-71 nt. When assembling the urine and lung samples, MetaIDBA was unable to reach completion when allowed to iterate, so was fixed to 71 nt. The four contig sets for each sample type then underwent a consolidation process comprised of sequential BLAT alignments (Kent, 2002), followed by removal of shorter contigs that were 95% identical over a 95% length fraction (Fig. 1A). First, contigs generated by Velvet and ABySS were compared (Comparison 1). The retained contigs were then compared with MetaIDBA contigs (Comparison 2), and then those still retained were compared with MetaCortex contigs (Comparison 3, Fig. 1A). Consequently, sequences retained after Comparison 3 were a consolidated contig set with reduced redundancy.
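The redundancy rule described above (drop the shorter contig when two contigs are 95% identical over a 95% length fraction) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Alignment` record is a hypothetical stand-in for parsed BLAT output, and the length fraction is taken here relative to the shorter contig, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical summary of one pairwise contig alignment."""
    query: str      # contig identifier
    target: str     # contig identifier
    matches: int    # identical aligned bases
    aligned: int    # total aligned bases


def consolidate(lengths, alignments, min_identity=0.95, min_fraction=0.95):
    """Return contig ids kept after removing the shorter of each redundant pair.

    `lengths` maps contig id -> length in nt. The fraction is measured
    against the shorter contig (an assumption).
    """
    dropped = set()
    for aln in alignments:
        if aln.query == aln.target or aln.aligned == 0:
            continue
        shorter = min((aln.query, aln.target), key=lambda c: lengths[c])
        identity = aln.matches / aln.aligned
        fraction = aln.aligned / lengths[shorter]
        if identity >= min_identity and fraction >= min_fraction:
            dropped.add(shorter)
    return [c for c in lengths if c not in dropped]
```

Applying the function first to the Velvet and ABySS sets, then to the survivors plus MetaIDBA contigs, then to those survivors plus MetaCortex contigs would mirror the three sequential comparisons described in the text.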

Identification of suspect-viral sequences
Contigs ≤76 nt in length (considered potentially-derived from single sequencing reads) were not analysed further (Fig. 1B). Remaining sequences underwent sequential BLAST comparison (Altschul et al., 1990) against NCBI databases (at November 25, 2011) and taxonomic classification queried from the NCBI taxonomy web service. Contigs underwent BLASTn comparison with the NCBI nt database. The taxonomic classification of the source organism of the reference sequence with which the contig 'best' aligned (i.e. the alignment with the lowest expect value [e-value]) was retrieved. If the reference sequence taxonomy was viral and aligned with the contig with an e-value of ≤0.0001, the sequence was flagged as suspect-viral and retained for further analysis. Sequences not flagged as suspect-viral then proceeded to BLASTx comparison with the NCBI nr database, with the same retention criteria. Sequences still not retained underwent tBLASTx comparison with the NCBI nt database, and were similarly retained. Those not retained in this final comparison round were discarded.
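The sequential identification procedure above amounts to a length filter followed by a cascade of searches, each applied only to contigs the previous search did not flag. The sketch below shows that control flow only. The search callables are hypothetical stand-ins for BLASTn, BLASTx and tBLASTx runs plus a taxonomy lookup; they are not real BLAST bindings, and the names are illustrative.

```python
from typing import Callable, Optional, Sequence, Tuple

# A search returns its best hit as (e_value, is_viral), or None if no hit.
# These callables are placeholders for real BLAST runs (assumption).
Search = Callable[[str], Optional[Tuple[float, bool]]]

MIN_LENGTH = 76     # contigs of this length or shorter are not analysed
MAX_EVALUE = 1e-4   # retention threshold quoted in the text


def classify(contig: str, searches: Sequence[Tuple[str, Search]]) -> Optional[str]:
    """Return the name of the first search that flags the contig as
    suspect-viral, or None if the contig is skipped or never flagged."""
    if len(contig) <= MIN_LENGTH:
        return None
    for name, search in searches:
        hit = search(contig)
        if hit is None:
            continue
        e_value, is_viral = hit
        if is_viral and e_value <= MAX_EVALUE:
            return name
    return None
```

Passing the searches in the order BLASTn, BLASTx, tBLASTx reproduces the order described in the text; recording which search flagged each contig also allows per-algorithm tallies of the kind reported in the results.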

Classification and curation of viral sequences
Suspect-viral sequences related to viral families not known to infect vertebrates were excluded. Remaining suspect-viral sequences were manually curated by examination of the region of database sequence that matched the contig in BLAST alignment. Where the database sequence providing the alignment with the lowest e-value appeared to be classified into the incorrect taxonomic group, all BLAST hits for the contig were examined and the majority taxon was determined to be the likely origin of the sequence (Fig. 1B). Remaining sequences and BLAST results were then grouped by viral family.
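The majority rule used during curation can be expressed in a few lines. The curation in this study was manual, so the sketch below is only an illustration of the rule; the function name and the input format (one taxon label per BLAST hit) are assumptions.

```python
from collections import Counter


def majority_taxon(hit_taxa):
    """Return the most frequent taxon label among a contig's BLAST hits.

    Returns None for an empty hit list. Ties are resolved in favour of the
    label encountered first, which is an arbitrary choice.
    """
    if not hit_taxa:
        return None
    taxon, _count = Counter(hit_taxa).most_common(1)[0]
    return taxon
```

In practice a tie or a narrow majority would probably warrant manual inspection rather than an automatic call.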

Analysis of viral sequences
Reference sequences were downloaded from NCBI, and global alignments with contigs generated using Clustal X (Version 2 (Thompson et al., 1994)) and Muscle 3.8.31 (Edgar, 2004). Gap-stripped alignments (columns with >50% gaps were removed) were then used to infer phylogenetic trees using MrBayes (Ronquist and Huelsenbeck, 2003), as previously described (Baker et al., 2012). Local BLAST comparisons and pairwise identities were performed using Genomics workbench (Vs 5.1, CLC Bio).
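Gap-stripping of an alignment before tree inference can be sketched as below. The sketch assumes that the threshold is "more than 50% gaps per column", that the alignment is supplied as equal-length strings, and that "-" is the gap character; the tool actually used for this step is not named in the text.

```python
def strip_gappy_columns(alignment, max_gap_fraction=0.5, gap="-"):
    """Remove alignment columns whose gap fraction exceeds max_gap_fraction.

    `alignment` is a list of equal-length aligned sequences (strings).
    Columns with exactly max_gap_fraction gaps are kept.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_rows = len(alignment)
    keep = [
        col for col in range(width)
        if sum(row[col] == gap for row in alignment) / n_rows <= max_gap_fraction
    ]
    return ["".join(row[col] for col in keep) for row in alignment]
```

For example, in a four-sequence alignment a column with two gaps is retained and a column with three gaps is removed.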

Amplification of viral sequences by PCR
Primers (sequences available on request) were designed to detect poxvirus, herpesvirus and papillomavirus contigs in the throat sample nucleic acids submitted for sequencing. Poxvirus PCR was also performed on nucleic acids extracted from the individual throat swabs. PCR (DreamTaqGreenPCR Mastermix, Fermentas) products were visualized by gel electrophoresis, purified (Gel extraction kit, QIAGEN) and sequenced.

Adenovirus isolation and characterization
An adenovirus causing cytopathic effect in Pteropus alecto primary kidney cells (Crameri et al., 2009) was isolated from sample U69 (Baker et al., 2012). Negative contrast electron microscopy (EM) was used to examine 6 day post-infection culture supernatant. Supernatant was adsorbed onto parlodion-filmed copper grids coated with carbon and stained with nano-W stain (Nanoprobes, Yaphank, NY, USA). Thin section EM was used to examine cells 5 days post-infection (as in (Weir et al., 2012) except using Sorenson's phosphate buffer (300 mosM/kg, pH 7.2)). The full-length hexon gene of this isolate was sequenced, as in .

Nucleotide sequences
Viral sequences discussed here were deposited in GenBank (JX885594-JX885611), except for the too-short polyomavirus sequence (Supp. Fig. 4). Sequencing data were deposited in the European Nucleotide Archive (ERP001979). Supplementary fasta files of all viral and suspect-viral sequences and comparison outputs are available online, and assembled contigs are available on request.