Identification and Distribution of Novel Cressdnaviruses and Circular Molecules in Four Penguin Species in South Georgia and the Antarctic Peninsula

There is growing interest in uncovering the viral diversity present in wild animal species. The remote Antarctic region is home to a wealth of uncovered microbial diversity, some of which is associated with its megafauna, including penguin species, the dominant avian biota. Penguins interface with a number of other biota in their roles as marine mesopredators and several species overlap in their ranges and habitats. To characterize the circular single-stranded viruses related to those in the phylum Cressdnaviricota from these environmental sentinel species, cloacal swabs (n = 95) were obtained from King Penguins in South Georgia, and congeneric Adélie Penguins, Chinstrap Penguins, and Gentoo Penguins across the South Shetland Islands and Antarctic Peninsula. Using a combination of high-throughput sequencing, abutting primers-based PCR recovery of circular genomic elements, cloning, and Sanger sequencing, we detected 97 novel sequences comprising 40 ssDNA viral genomes and 57 viral-like circular molecules from 45 individual penguins. We present their detection patterns, with Chinstrap Penguins harboring the highest number of new sequences. The novel Antarctic viruses identified appear to be host-specific, while one circular molecule was shared between sympatric Chinstrap and Gentoo Penguins. We also report viral genotype sharing between three adult-chick pairs, one in each Pygoscelid species. Sequence similarity network approaches coupled with Maximum likelihood phylogenies of the clusters indicate the 40 novel viral genomes do not fall within any known viral families and likely fall within the recently established phylum Cressdnaviricota based on their replication-associated protein sequences. Similarly, 83 capsid protein sequences encoded by the viruses or viral-like circular molecules identified in this study do not cluster with any of those encoded by classified viral groups. 
Further research is warranted to expand knowledge of the Antarctic virome and would help elucidate the importance of viral-like molecules in vertebrate host evolution.


Introduction
Our knowledge of global viral diversity remains limited, with only a small fraction of viruses known to affect humans and wildlife having been properly described or studied. In the case of Antarctic-associated viruses, only a speck of the proverbial iceberg of viral diversity has surfaced in modern research. The Antarctic Polar Front, the marine boundary between the Antarctic and sub-Antarctic waters formed in the Eocene, has played a part in the evolutionary divergence and high levels of endemism present in the Southern Ocean [1,2]. Penguins (Order: Sphenisciformes) are diving seabird mesopredators distributed across the islands and continental landmass of the Southern Ocean that form an important part of the avian biomass in the region.
Our understanding of diversity in the vertebrate hosts of the Antarctic region remains limited, with early research using serological techniques typically focusing on well-known avian and poultry pathogenic viruses associated with influenza, Newcastle Disease, and infectious bursal disease [3][4][5][6] or distemper viruses likely transferred from sled dogs to Antarctic pinnipeds [7,8]. A small number of symptomatic events linked to viruses, such as avian pox and "puffinosis-like" disease, have also been recorded in case reports [9][10][11]. Nonetheless, our knowledge of existing Antarctic vertebrate viruses beyond these reports remained constrained until the advent of high-throughput sequencing techniques, uncovering a growing number of viral genomes in the last decade, most recently reviewed by Smeele et al. [12] and complemented with the identification of RNA viruses in penguins and their ectoparasites [13].
King Penguins are primarily sub-Antarctic seabirds with a circumpolar distribution on islands near the Polar Front. Adélie Penguins are considered true Antarctic, ice-loving birds, breeding along the entire Antarctic coast and southern islands of the Scotia Arc. Chinstrap Penguins, on the other hand, are restricted to South Georgia and the South Sandwich Islands, South Orkney Islands, South Shetland Islands, and Western Antarctic Peninsula. The widest-ranging penguin species is the Gentoo Penguin, with a circumpolar distribution that spans 46-66° S, breeding north and south of the Polar Front on Sub-Antarctic Islands and throughout the entire Scotia Arc and down the Western Antarctic Peninsula. Some of the four species, therefore, overlap in portions of their range and provide an opportunity to investigate circular single-stranded DNA viruses related to those in the phylum Cressdnaviricota [27] across sites and in sympatric breeders. We also evaluated whether viral sequences are shared between parent-chick pairs of Pygoscelid penguins.

Field Sampling
Between 22 December 2015 and 17 January 2016, a total of 95 cloacal swab samples were obtained from penguins at the 8 sites on Deception Island, the South Shetland Islands, South Georgia, and the Antarctic Peninsula (Supplementary Table S1, Supplementary Data 1) as part of a gastrointestinal microbial analysis. Because this archived sample set was available, we used it to determine the associated DNA viruses; none of the animals had any obvious signs of illness or disease. Sampling sites are shown in Figure 1, and details are found in Supplementary Table S1. [...] Review Board. All samples were transported in accordance with applicable export permits and United Kingdom Department for Environment, Food, and Rural Affairs import permits.

Viral Extraction and High-Throughput Sequencing Analysis
Viral DNA was extracted from 200 µL of the swab suspension after vortexing, using the High Pure Viral Nucleic Acid Kit (Roche Diagnostics, Indianapolis, IN, USA). Following viral DNA extraction, circular DNA was preferentially amplified by rolling circle amplification (RCA) using the TempliPhi™ 100 Amplification Kit (GE Healthcare, Chicago, IL, USA). An aliquot of the resulting RCA-amplified DNA (5 µL) was pooled based on animal species per site and used to generate Illumina sequencing libraries using the Nextera DNA Flex Library Prep Kit (Illumina Inc, San Diego, CA, USA). The libraries were sequenced on an Illumina 4000 sequencer (2 × 100 bp library) and the resulting paired-end raw reads were trimmed using Trimmomatic [28] with default settings. The trimmed reads were de novo assembled using metaSPAdes v 3.12.0 [29]. All assembled contigs >500 nts were analyzed against a viral RefSeq [30] protein database using BLASTx [31]. Circular contigs were determined based on terminal redundancy. A summary of the read counts and contigs > 500 nts per sample sequenced is provided in Supplementary Table S2. The raw reads have been deposited in SRA (SRX9081836-SRX9081846).
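As a rough illustration of the two contig-screening steps described above (keeping contigs >500 nts and flagging circular contigs by terminal redundancy), the following minimal sketch checks whether the start of a contig is repeated at its end. The function names and the overlap bounds (20-200 nts) are assumptions made for this example; the text does not state the parameters actually used.

```python
def terminal_overlap(seq, min_overlap=20, max_overlap=200):
    """Return the length of the longest prefix that is repeated as a suffix.

    A non-zero value suggests terminal redundancy, i.e. a putatively
    circular contig. The overlap bounds are illustrative assumptions.
    """
    seq = seq.upper()
    upper = min(max_overlap, len(seq) // 2)
    for k in range(upper, min_overlap - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def candidate_contigs(contigs, min_len=500):
    """Keep contigs longer than min_len and report their terminal overlap."""
    return {
        name: terminal_overlap(seq)
        for name, seq in contigs.items()
        if len(seq) > min_len
    }
```

A contig whose reported overlap is greater than zero would be treated as a circular candidate; a value of zero indicates no detectable terminal repeat within the assumed bounds.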

Recovery and Sequencing of Viral Genomes and Viral-Like Circular Elements
There were 28 cressdnavirus-like contigs identified, and these were used to design abutting primer pairs (Supplementary Table S3) for the recovery of complete viral genomes by PCR. The specific primer pairs, together with KAPA HiFi HotStart DNA Polymerase (Kapa Biosystems, Wilmington, MA, USA), were used to screen and amplify the virus genomes and viral-like elements from each sample, using the thermal cycling protocol recommended by the manufacturer with an annealing temperature of 60 °C and 0.5 µL of the RCA product. The resulting amplicons were resolved on a 0.7% agarose gel, excised, purified, and cloned into pJET1.2 plasmid (ThermoFisher, Waltham, MA, USA). The recombinant plasmids were Sanger-sequenced by primer walking at Macrogen Inc. (Seoul, Korea), and contigs were assembled using Geneious Prime [32].
A dataset of the replication-associated proteins (Rep) encoded by the members of the recently established phylum Cressdnaviricota [27], i.e., alphasatellites, circoviruses, geminiviruses, genomoviruses, nanoviruses, redondoviruses, and smacoviruses was assembled with sequences available in GenBank. This dataset was clustered with a 0.9 sequence identity cut-off using CD-HIT [34] and a representative from each cluster was used to build a Rep-CRESS dataset together with all the Rep sequences of unclassified CRESS DNA viruses and plasmids analyzed in Kazlauskas et al. [35]. The Rep-CRESS dataset, together with 54 Reps from this study, was used to generate a sequence similarity network (SSN) analysis using EST-EFI [36,37] with a minimum similarity score of 60. The SSN was visualized in an organic layout with Cytoscape V3.7.1 [38].
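The clustering idea behind a sequence similarity network can be sketched as follows: sequences are nodes, an edge is drawn when a pairwise score meets the minimum threshold, and clusters are the connected components. This is only an illustrative sketch, not the EST-EFI implementation; the pairwise scores are assumed to have been computed beforehand, and the sequence names and scores in the usage note are hypothetical.

```python
from collections import defaultdict


def ssn_clusters(nodes, pair_scores, min_score):
    """Group nodes into connected components of a thresholded network.

    pair_scores maps (node_a, node_b) tuples to a similarity score;
    only pairs scoring at least min_score become edges.
    """
    adjacency = defaultdict(set)
    for (a, b), score in pair_scores.items():
        if score >= min_score:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, clusters = set(), []
    for node in nodes:
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:  # depth-first traversal of one component
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        clusters.append(sorted(component))
    # Largest clusters first, ties broken alphabetically.
    return sorted(clusters, key=lambda c: (-len(c), c))
```

For example, with hypothetical scores `{("RepA", "RepB"): 75, ("RepB", "RepC"): 62, ("RepC", "RepD"): 40}`, a threshold of 60 leaves `RepD` as a singleton, whereas a lower threshold of 10 merges all four sequences into one cluster. This is why a stricter threshold yields tighter, roughly family-level groupings.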
Similarly, a capsid protein (CP) dataset of the CPs encoded by circoviruses, geminiviruses, genomoviruses, nanoviruses, redondoviruses, and smacoviruses was assembled and clustered with a 0.9 sequence identity cut-off using CD-HIT [34]. Representative sequences from each cluster were assembled together with the RNA virus CP sequences (since they have similarities to some CRESS DNA virus CPs) of tombusviruses (n = 54; derived from RefSeq), albetoviruses (n = 3; derived from RefSeq), all CPs of unclassified CRESS DNA viruses, and 84 CPs from this study to build a CP-CRESS-albeto-tombus dataset. The CP-CRESS-albeto-tombus dataset was used to generate a sequence similarity network (SSN) analysis using EST-EFI [36,37] with a minimum similarity score of 10. The SSN was visualized in an organic layout with Cytoscape V3.7.1 [38]. Sequences within a network cluster that included sequences from this study were extracted and aligned using MAFFT [39,40]. The alignments were trimmed using TrimAl [41] with a gappy option, and the resulting alignments were used to determine the best amino acid substitution model using ProtTest [42] and infer Maximum likelihood (ML) phylogenetic trees with aLRT branch support. All trees were mid-point rooted, and branches with <0.8 aLRT support were collapsed using TreeGraph2 [43]. The trees were visualized using iTOL v4 [44].
Species-level sequence alignments of Antarctic viruses (AntVs) and Antarctic circular molecules (AntCMs) identified in this study were used to identify any evidence of recombination using RDP4 v.4.97 [45] with default settings. Only events that were detected by 3 or more of the recombination detection methods implemented in RDP4 v.4.97 with p-values < 0.05 were accepted as credible.
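The acceptance rule for recombination events (support from 3 or more detection methods at p < 0.05) can be expressed as a simple filter. The sketch below assumes the per-method p-values have already been exported into a dictionary per event; that data layout, the event identifiers, and the p-values in the test are hypothetical and are not RDP4 output.

```python
def credible_events(events, min_methods=3, alpha=0.05):
    """Return ids of events supported by at least min_methods methods.

    Each event is a dict with an "id" and a "p_values" mapping of
    method name -> p-value (None when a method did not detect the event).
    """
    accepted = []
    for event in events:
        supporting = [
            method
            for method, p in event["p_values"].items()
            if p is not None and p < alpha
        ]
        if len(supporting) >= min_methods:
            accepted.append(event["id"])
    return accepted
```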

Identification of Viruses and Viral-Like Molecules from Cloacal Swabs
We identified 28 cressdnavirus-like contigs >500 nts from the de novo assembled sequences from pooled samples per species per site that had similarity to viral sequences. In addition, we identified 24 bacteriophage-like contigs (8 inovirus-like, 11 microvirus-like, 1 myovirus-like, 1 podovirus-like, and 3 siphovirus-like). For the purpose of this study, we focused on the likely eukaryotic viruses that were most closely related to those in the phylum Cressdnaviricota. Using abutting primers designed from the de novo assembled contigs, we amplified, cloned, and Sanger-sequenced 40 viral genomes (with detectable Rep and CP-coding ORFs) and 57 viral-like elements. The 40 viral genomes (2322-2729 nts; GenBank accession numbers MT196222-MT196223, MT196247-MT196250, MT196252-MT196253, MT196261, MT196279, MT196289-MT196318) did not fall within any known viral families, and based on their Rep sequences, would be part of the expanding unclassified circular Rep-encoding single-stranded (CRESS) DNA viruses within the recently established phylum Cressdnaviricota [27]. The phylum Cressdnaviricota includes seven viral families (Bacilladnaviridae, Circoviridae, Geminiviridae, Genomoviridae, Nanoviridae, Redondoviridae and Smacoviridae), the Reps of the satellite nucleic acids in the family Alphasatellitidae, and a suite of viral groups that have been loosely labeled CRESS DNA viruses. Within these named viral families in the phylum Cressdnaviricota, only two Penguin circovirus genomes (family Circoviridae) had previously been recovered from this same sample set, found in a Chinstrap Penguin and an Adélie Penguin [46]. CRESS DNA viruses that cannot be classified into the eight families above have been identified primarily through viral metagenomic approaches from a variety of samples, including animal tissues, as highlighted in the recent report by Tisza et al. [47], fecal samples [48][49][50] and environmental samples including plant leaves [51], soil [52], wastewater [53][54][55], seawater [56], freshwater [57], and sea spray [58]. In Antarctica, CRESS DNA viruses have been identified from cryoconite samples from the dry valley [26], Adélie and Chinstrap penguin fecal and cloacal swab samples [46,59], lake samples [17], and a pond sample [22].
Of the 57 circular viral-like elements identified in this study, one viral-like element (1982 nts; MT196283) encodes a CP and a partial Rep (missing the N terminus, which is the endonuclease domain) and two additional unknown proteins, while eight [...]
For the purpose of this study, we adopted an 80% genome-wide pairwise identity cutoff (similar to other CRESS DNA viruses) as a putative species demarcation cutoff and 98% identity for genotypes (Supplementary Data 2). Thus, the 40 viral genomes represent 6 tentative species (labeled as Antarctic viruses, AntV 1-6) and 12 genotypes (Table 1). The circular molecules (labeled as Antarctic circular molecules, AntCM 1-12) can be split into 12 species and 15 genotypes (Table 1).
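To make the demarcation thresholds concrete, the sketch below computes a pairwise identity from two aligned sequences and classifies the pair against the 80% (species) and 98% (genotype) cutoffs. How identity is computed here (identical positions over columns where neither sequence has a gap) and the treatment of values exactly at a cutoff are assumptions of this example, not a description of the tool used in the study.

```python
def pairwise_identity(aln_a, aln_b):
    """Percent identity over aligned columns where neither sequence is gapped."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def classify_pair(identity_pct, species_cutoff=80.0, genotype_cutoff=98.0):
    """Classify a pair of genomes by genome-wide pairwise identity."""
    if identity_pct >= genotype_cutoff:
        return "same genotype"
    if identity_pct >= species_cutoff:
        return "same species, different genotype"
    return "different species"
```

Under these cutoffs, a pair sharing 94.5% genome-wide identity would fall within one species but in different genotypes, while a pair sharing more than 99.9% identity would belong to the same genotype.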
Viruses 2020, 12, 1029 6 of 21
A preliminary BLASTn-based analysis revealed that only AntCM3, -4, -5, -6, -7, -9, -10 and -12 have hits with >20% sequence coverage to viral-like sequences identified in sea water samples from Saanich Inlet (BC, Canada) [56] and in fish and abalone tissue [47]. The results are summarized in Table 2. JX904537 does not have annotated ORFs and thus is not included in the coding sequence analysis in this study.

Distribution of Viruses and Viral-Like Molecules across Sampling Sites and Penguin Species
Of the 95 swab samples tested in this study, 45 individuals (2/17 Adélie Penguins, 24/32 Chinstrap Penguins, 6/26 Gentoo Penguins, and 13/20 King Penguins) carried at least one AntV or AntCM, and the 97 viral genomes and circular molecules were recovered from these individuals (Table 1; Figure 2).
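The per-species counts above can be turned into detection percentages and checked against the stated totals (45 positive individuals out of 95 sampled). The following is a small arithmetic sketch using only the counts given in the text; the function and variable names are chosen for illustration.

```python
# (positive individuals, individuals sampled) per species, as reported above.
detections = {
    "Adélie": (2, 17),
    "Chinstrap": (24, 32),
    "Gentoo": (6, 26),
    "King": (13, 20),
}


def detection_summary(counts):
    """Return per-species percentages plus the overall totals and percentage."""
    per_species = {
        species: round(100 * positive / sampled, 1)
        for species, (positive, sampled) in counts.items()
    }
    total_positive = sum(positive for positive, _ in counts.values())
    total_sampled = sum(sampled for _, sampled in counts.values())
    overall = round(100 * total_positive / total_sampled, 1)
    return per_species, total_positive, total_sampled, overall
```

The totals recovered this way (45 of 95) agree with the figures in the text, and the percentages make the contrast between Chinstrap and Adélie Penguins easier to see.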
The 40 AntVs were recovered across the four penguin species (Table 1). From the 57 AntCMs recovered, 8 sequences that encode Reps were recovered only from Chinstrap Penguins while the other 49 sequences were recovered from Chinstrap Penguins and one Gentoo Penguin (Table 1; Figure 2). AntCMs (12 species/15 genotypes) were found in 19/32 individual Chinstrap Penguins sampled and in a single Gentoo Penguin (Table 1; Figure 2). Seven AntCM genotypes were found in multiple individuals, with three genotypes shared across the two Chinstrap colonies sampled in the South Shetland Islands. AntCV6 genotype I was found in 100% (n = 10) of Chinstrap Penguins, shared between Chinstrap Penguin parents and their chicks, at Baily Head, Deception Island. One set of AntCM genotypes with a bacterial rep (AntCM8 genotype I and AntCM8 genotype II) at the edge of the species threshold (78.8-79.1% genome-wide identity) was found at two sites 290 km apart (Booth Island, Western Antarctic Peninsula and Deception Island, South Shetland Islands). AntCM8 genotype II was unique in this study as the only genotype found in two distinct host species: an adult Gentoo Penguin and two adult Chinstrap Penguins nesting at the same colony on Booth Island, with >99.9% shared genome-wide identity. One of these adult Chinstrap Penguins was recently found also to carry a penguin circovirus [46]. Three species (each as a single genotype) of Antarctic circular molecules that encode Reps (AntCM10-12) were identified in this sample set, all from a single Chinstrap colony in the South Shetland Islands, where two genotypes, those which only contain a rep gene (rep: 963-1335 nts), were found in multiple individuals. The other genotype, AntCM11, found in a single Chinstrap Penguin, contained a cp gene (960 nts), an unknown ORF, and a partial rep (459 nts).
Among the AntVs (6 species/12 genotypes), each genotype was only detected in a single host species, with five only found in Chinstrap Penguins, one in Adélie Penguins, one in Gentoo Penguins, and five in King Penguins. Each Antarctic virus sequence was only found at a single site, though 8/12 genotypes were found in multiple individuals at those sites (Figure 2). AntV3 has five genotypes, two of which (genotype III and IV) were only found in King Penguins from South Georgia, sharing 82.5-94.5% genome-wide identity with the three genotypes (genotype I, II and V) found only in Chinstrap Penguins from Deception Island (Supplementary Data 1), sites separated by approximately 1700 km of the Southern Ocean (Figure 1, Supplementary Table S1). This suggests that this virus is broadly distributed across the Scotia Arc.
Cases of parent-chick viral genotype sharing were found in each Pygoscelid species. AntV1 genotype II was found in one Chinstrap Penguin adult/chick pair at Georges Point (Western Antarctic Peninsula), whereas AntV6 was shared in one Adélie Penguin adult/chick pair on Booth Island (Western Antarctic Peninsula). AntV2 was identified in five Gentoo Penguin individuals from Yankee Harbor in the South Shetland Islands, including an adult/chick pair (Figure 2, Table 1, and Supplementary Data 1). King Penguin chicks had already fledged during sampling, thus parentage could not be determined in this sample set. In those cases where penguin parents and their chicks share the same viruses, future studies are needed to determine whether these were vertically or horizontally transmitted.
Co-occurrence of these novel AntVs and AntCMs was noted in 14/32 Chinstrap Penguins ([...] Table S1). At this time, we are unable to discern whether the presence or co-occurrence of one or more AntVs or AntCMs indicates that these infect penguins, cause decreased fitness in the hosts, or are penguin-associated through trophic transfer.

Analyses of the Replication-Associated Protein
The hallmarks of the Rep protein are the endonuclease and helicase domains that are involved in the initiation of replication in ssDNA viruses, and to some extent, in bacterial plasmids (reviewed in Kazlauskas et al. [35] and Rosario et al. [60]). The Reps encoded by the AntVs (n = 40) have the rolling circle replication (RCR) endonuclease domain and the super family 3 (SF3) helicase motifs (Table 3). Thirteen of the 14 Rep-encoding AntCMs also have the RCR domain and SF3 motifs. AntCM11, which has a partial Rep, is missing the RCR domain and the Arg finger domain (Table 3).
Since we identified six AntCMs with Reps that are most closely related to those of plasmids, we included a suite of plasmids grouped by Kazlauskas et al. [35] for clustering the Reps using a sequence similarity network (SSN) based approach. We have found in the past [26,61,62] that an SSN analysis using EST-EFI [36,37] with a network threshold of 60 allows for reasonable family-level clusters for families in the phylum Cressdnaviricota [27]. Based on the sequence similarity network analysis of the Rep amino acid sequences (Figure 3), none of the AntV Reps cluster with known viral sequences. A vast majority of the AntV Reps (n = 33) are part of a large cluster of Reps (n = 213). The 33 AntV Reps within this cluster are distributed within 3 clades in the ML phylogenetic tree (RtREV+I+G+F amino acid substitution model; Figure 3). One of the clades (Clade I in Figure 3) includes Reps of AntCM10. The second clade (Clade II in Figure 3) includes Reps of AntV6. The Reps of AntV3, -4 and -5 are represented in Clade III (Figure 3), where they form a well-supported clade and share >72% amino acid identity. The Reps of AntV3, -4 and -5 share <60% amino acid identity with other Reps in this cluster. AntV2 Reps, which share 100% amino acid identity, cluster with a Rep of Copepod LaCopCV (JF912805) sampled in Tampa Bay (Florida, USA) [63], sharing 46.8% amino acid identity. AntV1 Reps cluster together: those that are genotype I share 94.4-94.7% amino acid identity with genotype II (Figure 3). AntCM12 Reps cluster together, sharing >97.3% amino acid identity. The AntCM11 with a partial Rep is a singleton sharing 44.1% amino acid identity with a Rep of a virus (MN928918) from a cloacal swab of an Indian blue peafowl (Pavo cristatus).
The plasmid-like Reps of AntCM1, -2 and -8 do not cluster with the plasmids (pCRESS) grouped by Kazlauskas et al. [35] and instead form two clusters: one of two Reps (AntCM1 and -2) sharing 44% pairwise amino acid identity, and one of four Reps in which the AntCM8 genotype I Reps share 100% identity with each other and collectively share 59.6% identity with that of AntCM8 genotype II (Figure 3).

Analyses of the Capsid Protein
In general, for viruses in the phylum Cressdnaviricota, the CPs are more diverse than the Reps. Furthermore, the CPs of geminiviruses share homology with those of albetoviruses (linear ssRNA(+) genomes) [65], and a group of unclassified cressdnaviruses have CPs that are homologous to those of tombusviruses (linear ssRNA(+) genomes) [64,66-68].
The SSN analysis of the CPs reveals that AntCM11 CP clusters with those of smacoviruses and smacovirus-like CPs (n = 175) sharing < 33% identity (Figure 4). In a ML phylogenetic tree (amino acid substitution model LG+I+G+F), AntCM11 CP is part of a well-supported clade (Clade I, Figure 4). The remaining 83 CPs identified in this study cluster with none of the classified viral groups.
The CPs of AntCM3, -4 and -9 are part of a cluster of 57 CPs sharing 46-57% amino acid identity. In the ML phylogenetic tree (amino acid substitution model RtREV+I+G+F) these CPs are part of a well-supported clade (clade II; Figure 4). The CP of AntCM3 is most closely related (sharing 76-79% amino acid identity) to that of MH648984 identified in a seabass tissue [47]. AntCM4 CP shares ~49% amino acid identity with the CP of MH649117 from a seabass tissue, whereas AntCM9 shares 69-77% amino acid identity with MH648943 (seabass tissue), MH616644 (red snapper tissue) and KX246262 (sea cucumber sample) [47,69].
The CPs of AntV3, -4, -5 and -6 are part of a cluster of 77 CPs, but the CPs of AntV6 (Clade III) are part of a distinct clade in the ML phylogenetic tree (amino acid substitution model LG+I+G+F) that is different from the one of AntV3, -4 and -5 (Figure 4). AntV6 CPs share ~40% amino acid identity with the CP of MH616953 from rainbow trout tissue [47]. The CPs of AntV3, -4, and -5 share >61% amino acid identity and less than 40% identity with other CPs within the cluster.
The CPs of AntCM5, -6 and -7 are part of the same cluster composed of 108 CPs and form two distinct clades in the ML phylogenetic tree (amino acid substitution model LG+I+G+F) with AntCM5 and -6 CPs being part of the same clade (clade V; Figure 4) sharing ~40% amino acid identity. AntCM5 CP shares the highest amino acid identity (~73%) with the CPs of MH510272 and MH617705 both from seabass tissue [47]. On the other hand, the CPs of AntCM6 share the highest amino acid identity (~46%) with the CPs of MH617236 and MH617624 identified in abalone and red snapper tissue, respectively [47]. When compared to other CPs in this cluster, CPs of AntCM7 are part of a smaller clade (clade VI; Figure 4), sharing the highest amino acid identity of 77.6% with the CP of MH616694 and MH617624 of a viral sequence identified in abalone tissue [47].
AntV1 and -2 CPs form distinct clusters, each sharing 99.3-99.8 and 100% pairwise identity, respectively (Figure 4). The CPs of AntV2 (from translation using the ciliate translation table) share the highest identity (35%) with CP-like element MK012509 from crucian carp tissue, whereas those of AntV1 share ~27% with a CP-like element of MH648877 from seabass tissue [47]. Although the CPs from this study have a high degree of sequence diversity, it is interesting to note they are more closely related to other CPs of viruses identified in the marine ecosystem in which penguins play important roles.

Recombination Analysis
The AntVs and AntCMs were aligned using MAFFT [39] at a species level, and these were used to identify evidence of recombination using RDP4 v.4.97 [45]. We identified evidence of recombination in genomes of AntV3 and -4, and AntCM3 (Figure 5). For AntV3, two recombination events were identified, one spanning ~46% of the genome, with genotype I being the recombinant, with the recombinant region derived from genotypes III and IV. The AntV3 genotype III has a 198 nts recombinant region from the non-coding region of an unknown/unsampled viral sequence. Five variants of AntV4 genotype II have a ~290 nts recombinant region in the non-coding region derived from AntV4 genotype I. AntCM3 genotype II has two recombinant regions, both derived from genotype III, and in total they account for 46% of the sequence. The second, smaller recombinant region of ~180 nts is found in two AntCM3 genotype I variants as well (Figure 5). Evidence of recombination between these viruses and circular molecules indicates host commonality.

Concluding Remarks
The overall knowledge of penguin-associated viruses is limited, and thus it is not surprising that penguin disease understanding also remains limited to observations of mass mortality events, case reports, and veterinary knowledge obtained from captive animals, reviewed in Smeele et al. [12], Barbosa and Palacios [70], Clarke and Kerry [71], Woods et al. [72], and Grimaldi et al. [73]. Advances in high-throughput sequencing have shown that penguins are associated with a number of (pathogenic or non-pathogenic) viruses not previously detected through serological means [59,74-81]. Studies such as this one, which directly detect viral nucleic acids in animal hosts, add to our baseline knowledge, which remains constrained by sampling effort.
In this study, the viruses and circular molecules that we identify are novel, and thus we are unable to distinguish whether penguins serve as the hosts for replication of these viruses, as they may enter the penguin gastrointestinal tract through prey consumption, indirectly through the consumption of phytoplankton or zooplankton by penguin prey, or through ingestion of ocean water. Based on the CP network analysis, it does appear that some of the CPs identified in this study have homologs in sequences derived from various marine fish. Further investigation into viral presence in other mesopredators, such as flying seabirds and pinnipeds, along with cetaceans, may help elucidate whether relationships exist among these molecules. Coupled with long-term monitoring studies of individual animals and sites in the rapidly changing Antarctic, detection of potential disease-causing agents will inform conservation management and biosecurity.