Simultaneous Identification of DNA and RNA Viruses Present in Pig Faeces Using Process-Controlled Deep Sequencing

Background Animal faeces comprise a community of many different microorganisms including bacteria and viruses. Only scarce information is available about the diversity of viruses present in the faeces of pigs. Here we describe a protocol, which was optimized for the purification of the total fraction of viral particles from pig faeces. The genomes of the purified DNA and RNA viruses were simultaneously amplified by PCR and subjected to deep sequencing followed by bioinformatic analyses. The efficiency of the method was monitored using a process control consisting of three bacteriophages (T4, M13 and MS2) with different morphology and genome types. Defined amounts of the bacteriophages were added to the sample and their abundance was assessed by quantitative PCR during the preparation procedure. Results The procedure was applied to a pooled faecal sample of five pigs. From this sample, 69,613 sequence reads were generated. All of the added bacteriophages were identified by sequence analysis of the reads. In total, 7.7% of the reads showed significant sequence identities with published viral sequences. They mainly originated from bacteriophages (73.9%) and mammalian viruses (23.9%); 0.8% of the sequences showed identities to plant viruses. The most abundant detected porcine viruses were kobuvirus, rotavirus C, astrovirus, enterovirus B, sapovirus and picobirnavirus. In addition, sequences with identities to the chimpanzee stool-associated circular ssDNA virus were identified. Whole genome analysis indicates that this virus, tentatively designated as pig stool-associated circular ssDNA virus (PigSCV), represents a novel pig virus. Conclusion The established protocol enables the simultaneous detection of DNA and RNA viruses in pig faeces including the identification of so far unknown viruses. It may be applied in studies investigating aetiology, epidemiology and ecology of diseases. The implemented process control serves as quality control, ensures comparability of the method and may be used for further method optimization.


Introduction
About 191 million pigs are kept in Europe as farm animals [1]. As they provide a crucial source of food worldwide, viruses directly affecting the pig health have a great impact on food production in general. In addition, pigs are a natural habitat for zoonotic viruses, which can cause infectious diseases in humans, such as influenza virus [2], rotavirus [3] and hepatitis E virus [4]. Other viruses present in pigs such as bacteriophages or viruses causing subclinical infections may affect the pig health indirectly, e.g. by modification of the bacterial population or by modulation of the immune system as also shown for other hosts [5,6]. In order to understand the complex interaction between the different viruses, their host cells and the immune system, a simultaneous analysis of a broad range of virus species in a distinct compartment is desirable. For a long time, such an analysis was hampered by technical limitations [7]. Recently, this problem was solved by the development of novel molecular approaches utilizing deep sequencing techniques, e.g. 454 pyrosequencing [8]. This technique enables the simultaneous analysis of thousands of sequences present in a DNA sample. Deep sequencing techniques have been frequently used to study the composition of microbial communities in different kinds of environmental samples including faeces of humans and animals [9][10][11]. They have also been applied to the detection of single known and unknown viruses in various kinds of samples [12][13][14][15][16].
Some studies have applied deep sequencing methods to the analysis of the composition of viral communities in human faeces [17][18][19][20][21], but only one study focussed on pigs [22]. In most cases, the majority of detected viruses were bacteriophages. Among the RNA viruses, transitory plant viruses, which most probably originated from feed, were often identified. In addition, several human/animal viruses were detected.
The described protocols used for the analysis of the composition of the intestinal viral community generally consist of four basic steps: (i) purification and concentration of the virus particles present in the faeces, (ii) extraction of nucleic acids, (iii) deep sequencing of the nucleic acid and (iv) bioinformatic analysis of the sequence data. Despite this common backbone, the protocols differ from each other in several details. For example, different methods are used for purification and concentration of virus particles. As the virus particles are very heterogeneous in shape and size [23], the applied filtration method will strongly influence the result of the analysis. Moreover, the genomes of viruses consist of either DNA or RNA, which additionally may appear in different topologies [23]. Most of the protocols analyse DNA and RNA separately [13,17,21,22], which makes the comparison between both groups of viruses difficult. In many of the protocols, various amplification steps, which may affect the distribution of the different genome types, are implemented. In addition, either primary sequence reads or contigs assembled from these sequence reads were used for analysis. All of these variations in the applied protocols may lead to different results. Indeed, the reported composition of the human viral gut community varies remarkably, which -beside other factors -may also be caused by application of different protocols for its analysis [17][18][19]21].
Here, a protocol was established for the simultaneous analysis of DNA and RNA viruses present in pig faeces. A process control consisting of a mixture of bacteriophages with different morphology and genome type was added to the sample and used to assess the efficiency of the method. The use of a process control may enable the optimization of the method as well as a comparison to other published protocols. The optimized protocol was thereafter tested with a pooled pig faecal sample in order to analyse the composition of the viruses present in pig faeces.

Establishment of a process control
To monitor the performance of the method and to optimize the distinct steps of the purification/concentration method, we used three bacteriophages (T4, M13, MS2) with different morphology and genome type [23] as process control (Tab. 1). Bacteriophages were chosen as they are easy to propagate and do not need extensive safety containment. T4 belongs to the family Myoviridae and has a large, double-stranded DNA. M13 is a filamentous phage and contains a small genome of single-stranded DNA. MS2 has a small icosahedral capsid containing a genome of singlestranded RNA. Using these three bacteriophages, the most common genome types were covered by the process control. All phages were added to the pig faecal suspension at a defined concentration. Quantitative real-time RT-PCR (qRT-PCR) protocols were established for these bacteriophages (Tab. 2), which enabled detection and quantification of the viruses at each step of purification. By comparison of the determined genome copy number at the end of the purification process with the number at the beginning, the efficacy of the purification protocol was calculated.
Optimization of the protocol for the detection of the viral community in pig faeces Using the established process control, the method for purification of virus particles from pig faeces was gradually optimized. Several techniques were tested and various conditions were compared to each other, e.g. magnetic stirring vs. stomaching of the faecal sample, filtration vs. centrifugation for removal of larger debris, classical filtration vs. tangential flow filtration (TFF) for removal of smaller debris and bacteria, TFF vs. ultrafiltration for concentration of virus particles and caesium chloride gradient ultracentrifugation vs. sucrose gradient centrifugation for final purification and concentration of virus particles (data not shown). For example, a prolonged centrifugation time for removal of larger debris resulted in a marked loss of bacteriophage T4, whereas bacteriophages M13 and MS2 were less affected. Figure 1 shows the original qRT-PCR plots of the three bacteriophages from this purification step before and after centrifugation at 17,0006 g for 5 hours. However, further processing of the sample by TFF without prior centrifugation was not practicable because of the appearance of filter occlusions. Therefore, a shorter centrifugation of three hours was used, although this might result in some loss of virus particles with high densities.
After the purification step had been optimized, several parameters were tested for efficient removal of free nucleic acids from the purified virus particle suspension. While DNase I digestion had only little effects on the control bacteriophages, treatment with RNase A led to a significant decrease in the detectable genome number of bacteriophage MS2, which may be explained by the presence of residual RNase activity in the final nucleic acid preparation. Therefore, treatment with RNase was omitted in the final protocol. Tests with naked RNA prepared from bacteriophage MS2 that was added to pig faecal samples showed that the RNA was no longer detectable by qRT-PCR after 30 minutes of incubation at room temperature, suggesting that RNase treatment is not necessary at later steps. Although nucleic acid amplification should be omitted to avoid errors caused by PCR, the optimized method contains an amplification step at the end of the procedure. This step had to be introduced, because the yield of nucleic acids was too low for direct use in deep sequencing. In order to minimize the amplification steps, aliquots were taken from different PCR cycles to determine the minimum amplification needed. A general scheme of the working steps in the optimized protocol is presented in Fig. 2. Applying this protocol, recovery rates of 4%, 125% and 105% were obtained for the genomes of T4, M13 and MS2, respectively, by comparing qRT-PCR results after addition of the phages with those from the final nucleic acid preparation.
Recovery rates higher than 100% can be explained by concentration of the virus particles and by amplification of their genomes during the last step of the protocol. The low recovery rate of T4 may be explained by loss of this high density virus during clearing centrifugation as shown above.

Application of the optimized method to a pooled pig faecal sample
The faeces of five 35 day-old male pigs were pooled and subjected to purification of virus particles using the optimized protocol. By deep sequencing of the extracted nucleic acid, 69,613 reads with an average sequence length of 250 nucleotides were generated. After trimming of the sequence ends and exclusion of reads shorter than 50 nucleotides, 66,129 reads were included in the analyses. From these reads, 1,482 contigs could be assembled with a length between 100 bp and 1998 bp. The primary sequencing data are summarized in Table 3.
As a quality control of the method, the sequence reads were analysed for the presence of nucleotide sequences originating from the added bacteriophages T4, M13 and MS2. Sequences of all three phages could be identified. In detail, 175 reads originated Figure 1. Optimization of the purification protocol using qRT-PCR detection of bacteriophages used as process control. The amplification curves determined by qRT-PCR for the phages T4 (blue), M13 (red) and MS2 (green) were determined before and after centrifugation for 17,0006 g for 5 hours. The shift between the curves, which corresponds to the degree of virus loss, is indicated by arrows coloured according to the bacteriophage type. doi:10.1371/journal.pone.0034631.g001 from MS2, 137 reads from M13 and 29 reads from T4 (Tab. 1), thus correlating with the numbers of the bacteriophage genomes detected by qRT-PCR in the final preparation. These reads were excluded from the following data analysis.

Data analysis
Primary reads and contigs were checked for sequence identities with a viral genome database including the calculation of tBLASTx E scores. Only sequences with a tBLASTx score E , = 10 24 were suspected to represent sequences of known and closely related viruses. By this, 5,366 primary reads (7.7%) and 315 contigs (21.3%) showed significant identities to known viral sequences and were therefore included in the further data analysis.
Out of the selected 5,366 primary reads, 3,965 reads (73.9%) showed significant sequence identities to bacteriophages, 1,282 reads (23.9%) were similar to mammalian viruses, 43 reads (0.8%) were identified as plant virus sequences and 56 reads (1.4%) could not be classified into one of these groups (Fig. 3A). An assignment of the primary reads to different types of viral genomes identified 49.8% double-stranded (ds) DNA viruses, 26.4% single-stranded (ss) DNA viruses, 6.0% dsRNA viruses, 13.5% ssRNA viruses and 4.2% unknown genome types (Tab. 4, Fig. 3B). By analysis of the contigs, 39.0% dsDNA viruses, 36.2% ssDNA viruses, 6.3% dsRNA viruses, 14.3% ssRNA viruses and 4.1% unknown genome types were detected. An assignment of the sequences to known viral families was performed by application three different calculations (Tab. 5). Using the primary reads only, the most abundant virus families were Siphoviridae (30.1%), Microviridae (21.7 %) and Myoviridae (11.3%), all of them representing bacteriophages. Using the contigs only, three bacteriophage families were again most prominent, but with different percentages: 33.1% Siphoviridae, 14.9% Myoviridae and 9.1% Podoviridae. The maximum identities between the deduced amino acid sequences of the contigs with the lowest tBLASTx E scores and that of known members of the respective virus families ranged from 35% to 100% (Tab. 5). The lengths of the viral genomes are remarkably different between different virus families [23] and it can be assumed that longer genomes are represented by more primary reads per genome than smaller ones. Therefore, a correction factor for the genome length was introduced to give estimation for the particle number. After this correction, the most abundant viral families were Microviridae (bacteriophages, 45.9%) followed by the Chimpanzee stool associated circular ssDNA virus (ChiSCV, see below, 12.0%) and Picornaviridae (mammalian viruses, 10.8%) (Fig. 3C). The Shannon index was used to assess the diversity of the sequences detected in the sample. Shannon indexes of 4.7777 and 4.252 were calculated for the primary reads and contigs, respectively.
A closer inspection of the sequences revealed that a total of 15 different viruses with sequence identities to mammalian viruses were identified (Tab. 6). These included the RNA viruses: kobuvirus, rotavirus group A and C, astrovirus, enterovirus B, sapovirus, picobirnavirus, teschovirus, picornavirus, and the

Whole genome analysis of a novel circular ssDNA virus
The whole genome of the suspected ChiSCV-related virus was amplified from the nucleic acid preparation in two fragments using PCR. Primers used in PCR were designed on the basis of the sequences derived from the deep sequencing analysis. After sequencing of the overlapping PCR products, the whole genome sequence was assembled and ORFs were identified. The genome of this virus has a length of 2,459 nucleotides and a circular topology (Fig. 4A). A total of 6 ORFs encoding proteins consisting of more than 100 amino acids were identified. The protein encoded by ORF1 shows the highest amino acid sequence identity (27.0%) to the replicase protein (Rep) of ChiSCV (Gen-Bank Acc.-No. ABD24829.1). ORF2 shows the highest amino acid sequence identity (49.2%) to the capsid protein (Cap) of ChiSCV (Gen-Bank Acc.-No. ADB24798.1). No significant similarities to known proteins could be determined when the amino acid sequences encoded by the other ORFs were analysed. The genome organization of PigSCV is different from that of ChiSCV: while Rep and Cap are encoded by the same DNA strand in ChiSCV, the respective genes are oriented in opposite direction in the PigSCV genome (Fig. 4A). A search for repetitive sequences revealed a short stemloop sequence between nucleotide positions 743 and 759 on the PigSCV genome (Fig. 4B). This stemloop also contains a conserved sequence with identitiy to stemloop sequences of several ChiSCV isolates. A phylogenetic analysis of the amino acid sequence of Rep together with related viruses from animals and plants revealed the closest relationship of PigSCV to ChiSCV (Fig. 4C). However, within this group, the PigSCV sequence forms a separate branch indicating a membership of this virus to a separate species.
In order to analyze PigSCV infection in the individual pigs, stored faecal and serum samples from the animals were tested by PCR for the presence of the PigSCV genome. As evident from Table 7, specific PCR products were detected in faecal samples of each of the tested piglets; however, at different time-points and with different intensity (Tab. 7). The highest detection rate was found in the pigs at 33 and 35 days of age. The PigSCV genome was not detected in serum samples derived from the pigs at 42 days of age.

Discussion
The viral community present in animal faeces is complex in function and composition. So far, only little is known about the diversity and distribution of viruses in this ecosystem. The analysis of the composition of viruses in faeces was hampered for a long time by the lack of suitable methods. However, the recent development of deep sequencing techniques enabled the analysis of the viral community in faeces from different species like human [18,19,21], horse [24], turkey [25], sea lion [26] and rodents [27]. For pigs, only one study was published very recently [22]. The applied protocols are diverse and not standardized; therefore, comparison of the generated data is difficult.
To overcome the problem of comparability, we here suggest the use of a process control for monitoring the efficiency of the  protocol applied for the analysis. The process control consists of a set of previously characterized viruses, of which a defined number is added to the sample and then followed during the process. Similar process controls are well established in other analytical applications, such as PCR analysis of food and environmental samples [28][29][30]. For the application in analysis of viral communities, the process control should reflect the variety of viruses with different morphologies and genome types, which can be suspected to be present in the sample. The three selected bacteriophages used here are diverse in these aspects and represent a large proportion of known viruses. However, some morphologies, e.g. enveloped spherical viruses, as well as some genome types, e.g. double-stranded RNA, are not present so far and may be added in future in order to improve the process control. As a proof of principle, we could demonstrate that all of the bacteriophages added to the sample could be identified in the sequences generated by deep sequencing. This indicates that the bacteriophages could be monitored throughout the whole process until its end, which includes sequence analysis. Using the recovery rates or sequence read numbers of the control bacteriophages, a comparison of results from studies performed with different techniques should be possible in the future. Although in principle the three used bacteriophages seem to be appropriate for monitoring the efficiency of the process, the low recovery rate of bacteriophage T4 demand further developments for a robust dsDNA virus control. Since the majority of viral sequences identified in our analysis of the pig faecal sample originate from dsDNA viruses, the bacteriophage T4 control may not sufficiently reflect the behaviour of many dsDNA viruses. Therefore, changes in the preparation protocol for bacteriophage T4 (e.g. by adding steps preventing aggregation of virus particles) or a replacement by another virus should be considered in the future. The process control was also used as a read-out for optimization of the distinct steps of the method. By this, it could be confirmed that some of the techniques applied in former studies, e.g. tangential flow filtration [31,32], are more effective than others. Besides minor variations of parameters in order to assess the optimal conditions for each step, some more basic decisions had to be made during the optimization process. One of them refers to the use of an RNase digestion step to remove contaminating free RNA from the virus particle suspension. As high amounts of RNase are difficult to inactivate or to remove from the preparation, residual RNase activity can destroy the viral RNA in the final nucleic acid preparation [32]. We show that free bacteriophage RNA is rapidly degraded in the faecal suspension within the first 30 minutes of the applied protocol; therefore, RNase treatment seems not necessary at later steps. A second decision refers to the use of amplification steps after preparation of the nucleic acids. It is known that some techniques induce changes in the relative distribution of viral genome types, e.g. several isothermal amplification methods preferentially amplify small circular DNA molecules [33]. To avoid extensive amplification steps, a relatively high amount of starting material (100 g faeces) was used here. However, the yield of purified virus nucleic acid was too low for direct use in deep sequencing. Therefore, amplification was implemented in the protocol; however, the cycles were limited to the lowest possible number. The identification of all known viral genome types in the primary sequence reads of the pig faecal sample analyzed here indicate that the optimized protocol is suitable for simultaneous identification of DNA viruses and RNA viruses in pig faeces. By application of the optimized protocol to the pooled pig faecal sample, we detected 7.7% sequences with a high sequence identity to known virus genomes. This value is similar to that of recently published viral metagenome studies, which range from 2% [18] to 13% [22]. The slight difference between our study and that of Shan et al. [22], which both analyzed pig faecal samples, may be explained by the use of different databases and different sequence identity cut-offs. In other aspects, the results of both studies are similar. For example, Shan et al. identified 99% of the mammalian viruses as RNA viruses, which is similar to the value of 96% obtained in our study. Kobuvirus, enterovirus, sapovirus, teschovirus, astrovirus and bocavirus were identified in both studies. The similarity of these results is quite surprising taking into consideration that the samples analyzed in both studies originated from pigs from different continents. However, some differences still exist as coronavirus was only found by Shan et al. [22] and rotavirus was only detected in our study. The detection of rotaviruses, which are known as causative agents of gastroenteritis in pigs [34,35], correlates with the fact that some of the pigs included in our study suffered from watery diarrhoea. In the study of Shan et al. [22], a correlation was found between the detection of bocavirus or coronavirus and the appearance of diarrhoea. Additionally, we detected a higher number of different virus families, compared to the study of Shan et al. [22], which may be explained by the use of a higher amount of faeces as starting material enabling detection of less represented viruses.
An additional factor with marked influence on the results of metagenomic analyses is the applied method for data analysis. First, a reliable cut-off for selection of sequences with significant sequence identities to known virus sequences has to be defined. Here, a tBLASTx E score , = 10 24 was used, according to other studies applying tBLASTx E scores between 10 23 [20,21] and 10 25 [16,18]. However, using this cut-off, many sequences with relative low deduced amino acid sequence identities were included in the analysis (Tab. 5). In addition, it has to be considered that the analysis of deduced amino acid sequences may not be appropriate in the case of sequences representing non-coding regions of viral genomes. Second, it should be decided, whether primary reads or contigs are used. In our study, the overall distribution of viral genome types or virus families was rather similar when using primary reads or contigs for analysis. However, analysis of the contigs only may lead to a reduction of detected virus diversity, as the rare sequences of less represented viruses cannot be assembled into contigs. Indeed, by using the primary reads we could identify 447 different virus species, compared to 121 viruses identified by the use of contigs. The Shannon index of diversity determined using the primary reads was 4.777 compared to 4.252 using contigs. As additionally contig assembly is strongly dependent on the used assembly algorithm [36,37], it may be generally advisable to use primary reads for metagenomic data analysis. A third factor influencing results of data analysis refers to the highly diverse viral genome length. As a larger genome type is likely to be represented by more primary reads per genome than a smaller one, correction factors should be developed and included. We could show that the use of such correction factors can dramatically change the calculated distribution of viral families leading to a much higher abundance of mammalian viruses (with small genomes) than calculated without consideration of genome length. By application of the optimized protocol and subsequent data analysis, sequences with similarities to a chimpanzee virus have been identified in the pig faeces. Subsequent amplification and analysis of the whole genome sequence indicate that this virus is closely related to the chimpanzee virus [38]. The two largest ORFs clearly show sequence similarities to the encoded replicase and capsid proteins of the chimpanzee virus, but their orientation in the genome is different and the percentages of the sequence identity are relatively low. As such virus sequences have not yet been described in pigs, it is concluded that it represents a novel pig virus. Analysis of samples from the pigs used in our experiment indicated that all of the piglets were infected with this virus, but the amounts and time-points of virus excretion in the faeces differed between the individuals. The different pattern of virus excretion in pigs receiving the same feed argues against an origin of the virus from infected plants that were ingested and excreted after intestinal passage, as originally suspected as one possibility for ChiSCV [38], although the feed could not be directly tested. No PigSCV genome was detected in the serum of the pigs; however, only one time-point could be analyzed because of the limited availability of the sera. Further studies are necessary in order to assess the origin, dissemination, organ distribution and clinical significance of this novel virus in pigs. PigSCV belongs to the  single-stranded circular DNA viruses, which are increasingly detected in animals, plants and environmental samples [39][40][41]. The reason for this increased detection is unknown so far; however, it may be explained by the increased use of specific amplification methods in the detection protocols, which are known to amplify circular DNA molecules more efficiently than others [33]. Shan et al. [22] also found several single-stranded circular DNA virus sequences in pig faeces, though, with identities to circoand cycloviruses. Although the detection of the novel virus indicates the suitability of the protocol for detection of so far unknown viruses, it has to be clearly stated, that only those viruses can be identified, for which sequence data of related viruses are present in the databases. For the identification of members of totally new virus families, novel analytical tools have to be developed using ''deep bioinformatics'', e.g. functional predictions of putative proteins.
In summary, the results show that the established protocol enables the efficient analysis of the virus content in pig faeces including the identification of new viruses. The high percentage of bacteriophages detected in the faecal sample suggests an important function of these viruses in the modulation of the bacterial intestinal community, which should be analyzed in more detail in future. In addition, the clinical significance of the detected pig viruses -separate and in combination with each other -should be investigated further. The method may generally be useful in studies investigating aetiology, epidemiology and ecology of diseases. The implemented process control serves as quality control ensuring comparability of the method and may be used for further method optimization in future.

Ethics statement
The animal study was approved by the local ethic commitee of the state Berlin (Landesamt für Gesundheit und Soziales, Berlin, Germany) under the accession number G 0350/09.

Faecal samples
Faecal samples were derived from pigs kept in the experimental animal facility of the Federal Institute for Risk Assessment (Berlin, Germany) and used in preliminary studies for establishment and optimization of the methods. For final testing of the established protocol, a fresh pooled faecal sample (approximately 100 g) was derived from five 35 day-old male pigs housed in one pen of the experimental facility. The animals were in good overall clinical condition. However, two of the five pigs suffered from watery diarrhoea at the time of the collection of the faeces. After collection, the faeces were stored at 4uC for 2 days until analysis. Individual rectal faecal samples and serum samples were collected at different time-points and stored at 280uC.

Bacteriophages
The bacteriophages T4 (German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany), M13 (kindly provided by E. Lanka, Max Planck Institute for Molecular Genetics, Berlin, Germany) and MS2 (kindly provided by J. Dreier, Ruhr University of Bochum, Bad Oeynhausen, Germany) were used as process control to assess the efficiency of the method. Table 1 summarizes the properties of the phages. M13 and T4 were propagated on E. coli JM110 (New England BioLabs GmbH, Frankfurt, Germany), MS2 on E. coli strain Top10 F+ (Invitrogen, Leek, Netherlands). For quantification, the phage titer of lysates was determined as previously described [42]. For the detection of the bacteriophage genomes at distinct steps of purification and amplification, quantitative real-time RT-PCR (qRT-PCR) assays were established. Primers and probes used for the detection of the phages are listed in Table 2. All qRT-PCRs were performed using the QIAamp probe RT-PCR kit (Qiagen, Hilden, Germany) in an ABI 7500 cycler (Applied Biosystems; Darmstadt, Germany). After reverse transcription at 50uC for 30 min and activation of the polymerase at 95uC for 15 min, 45 cycles were performed, each consisting of 94uC for 15 sec, 56uC for 1 min and 72uC for 1 min. Although the bacteriophages T4 and M13 have DNA genomes, their detection was also carried out using this RT-PCR protocol in order to enable parallel analysis of all three bacteriophage genomes using the same cycling conditions. Ten-fold diluted nucleic acid preparations of the titrated phages were used as standards for quantification.

Purification and concentration of virus particles
Virus particles were purified and concentrated by a combination of tangential flow filtration (TFF), ultrafiltration and caesium chloride (CsCl) density gradient ultracentrifugation [17,21,31,32,43]. Briefly, approximately 100 g of pig faeces were resuspended in 1,000 ml SM-buffer by magnetic stirring. Thereafter, 1610 7 plaque-forming units of each bacteriophage were added to the suspension. Aliquots of the faecal suspension were taken before and after adding of phages and tested at a later date by qRT-PCR. The sample was centrifuged at 17,0006 g for 30 min to remove the large particulate debris and the supernatant was collected. The procedure was repeated by a three hour centrifugation to remove smaller particular structures. Afterwards, a first TFF was performed using a 0.22 mm filter (PALL Corporation, Middleton; MA, USA) to remove bacterial and eukaryotic cells and debris. The remaining filtrate was subjected to a second TFF with a 50 kDa filter (PALL Corporation, Middleton; MA, USA) to concentrate the virus particles. This viral preparation was further concentrated by centrifugation through Vivaspin 50,000 MWCO concentrators (Sartorius Stedim Biotech GmbH, Goetting, Germany) at 1,5006 g resulting in a final volume of 36 ml. Volumes of 18 ml each were loaded onto a stepwise CsCl density gradient with density layers of 1.7, 1.5, 1.35 and 1.2 g ml 21 (each 5 ml) and centrifuged at 65,0006 g for 14 hours at 10uC. The 1.35-1.5 g ml -1 layers were collected from the gradients using a syringe and pooled resulting in a final volume of 9 ml.

Nucleic acid preparation and deep sequencing
To eliminate free DNA present in the virus concentrate, 500 ml CsCl purified virus suspension were treated with 25 units DNase I (2,000 U/mg, bovine pancreas grad II; Roche Diagnostics GmbH, Mannheim, Germany) for 45 min at 37uC, followed by heat inactivation for 10 min at 65uC. Thereafter, DNA and RNA were simultaneously extracted from 200 ml using the RTP H Pathogen Kit (Invitek; Berlin, Germany). The extracted nucleic acids (75 ng per reaction) were randomly primed for cDNA synthesis using the TransPlex H Complete Whole Transcriptom Amplification Kit (WTA2, Sigma-Aldrich, St. Louis, MO, USA) according to the protocol recommended by the supplier; however, the annealing temperature was decreased to 40uC (cycles 1 and 2) and 45uC (cycles 3 and 4) to enable the simultaneous amplification of DNA and RNA. Aliquots of 75 ml each were removed from the WTA reaction at different cycle numbers, and purified as well as size-selected using MobiSpin S-400 Columns (MoBiTec, Goetting, Germany). The DNA concentration of the preparations was measured using a nanodrop spectrometer (PEQLAB Biotechnologie GMBH; Erlangen Germany) and the preparation derived from a minimum of amplification cycles with a DNA concentration higher than 50 ng/ ml was used for deep sequencing. A total of 1 mg DNA was applied to deep sequencing on a 1/8 plate of the GS-FLX sequencer 454 Titanium (GS Titanium SV emPCR Kit (Lib-L) v2; GS Titanium PicoTiterPlate Kit 70675; GS Titanium Sequencing Kit XLR70t; Life Sciences, Roche, Branford, USA) according to the manufacturer's protocol.

Data analysis
Primary sequence analysis was applied to raw sequence reads, which were subjected to amplification primer/adaptor trimming using SeqMan (DNASTAR, Lasergene, Madison, USA) and selection for a minimum length of 50 nt. In parallel, all primary reads were subjected to de novo contig assembly using the 454 Newbler Assembler [44] software (http://www.my454.com/), with criteria of 90% minimum overlap identity and a minimum overlap length of 40 nt. Homology searches for primary reads and assembled contigs were performed with tBLASTx [45] and CLC Main Workbench 6.2 (http://www.clcbio.com/index.php?id = 532) against a local database. This local database was created using EXCEL (2003). It included the viral genome non-redundant reference sequence nucleotide database (RefSeq, NCBI, ftp://ftp. ncbi.nih.gov/refseq/release/viral/ 30.08.2011 download) and additional sequences from recently discovered viruses, which had been manually added. BLAST results with an E-value , = 10 24 were selected and used for further grouping analysis, which was performed using a script manually written with R 2.13 [46]. This analysis included counting of detected species and determination of their taxonomy, which was also used to determine the virus hosts. Correction of the different genome length of the virus families was done by using the formula: formal size factor = read number/ genome size (the genome size for the respective virus species was derived from the above mentioned local database, which included these data from the RefSeq database, NCBI). The Shannon index [47] was calculated to compare the diversity of the species identified by primary reads and by assembled contigs.

Whole genome analysis of PigSCV
Two overlapping fragments covering the whole genome sequence of PigSCV were amplified by PCR using nucleic acid derived from the concentrated virus particles as template. The Long range PCR kit (Qiagen, Hilden, Germany) was used with primers PigSCV-1s (59-CCA ATC AGA TTC ACG CTT ACC G-39) and PigSCV-1as (5 9AAC ATC GTC AAC CGT ATC ATG G-39) or primers PigSCV-2s (59-GGG CCA CGC ATG AAC CTT CC-39) and PigSCV-2as (59-ACC ATT GAA ATC ATC TGG GAT G-39) for amplification of overlapping fragments with lengths of 1,849 bp or 1,123 bp, respectively. The PCR products were directly sequenced in an ABI 3730 DNA Analyser (Applied Biosystems). The whole genome sequence was assembled from the sequenced fragments, submitted to the GenBank database under the accession number JQ023166. ORFs were predicted using the SeqBuilder module of the DNASTAR software package (Lasergene, Madison, USA). Sequence distances were calculated and phylogenetic trees were constructed with the MegAlign module implemented in the DNASTAR software package.

Detection of PigSCV genome in pig samples
Faecal samples derived from the rectum of the pigs at d29, d33, d35 and d42 and serum samples of d42 were stored at 280uC. The nucleic acid was extracted from 200 ml faecal suspension or 100 ml serum using NucliSens Magnetic Extraction in a NucliSens EasyMag device (bioMerieux Deutschland GmbH, Nürtingen, Germany). PigSCV was detected by PCR using the TaKaRa ExTaq PCR Kit (TaKaRa Bio Europe, Göttingen, Germany) with primers PigSCV-2s and PigSCV-1as amplifying a 361 bp fragment of the PigSCV genome. PCR was performed by an initial denaturation step at 95uC for 5 min followed by 40 cycles each including 95uC for 30 sec, 60uC for 30 sec and 72uC for 45 sec, and a final elongation at 72uC for 5 min. PCR products were analysed by electrophoresis on ethidium bromide-stained agarose gels.