Reconstruction of the complete mitogenomes of predator and prey from a faecal metagenomic dataset

The application of faecal DNA in genetic studies of wild populations minimises disturbances to their normal behaviours and body integrity. Here, I present an analysis of a metagenomic dataset generated from the faecal DNA of several specimens of the estuarine pipefish, Syngnathus watermeyeri, to simultaneously assemble the mitogenomes of the predator and its main prey species, the copepod Pseudodiaptomus hessei. The mitogenomes of the pipefish and the copepod were successfully reconstructed using a combination of short seed extension and denovo metagenomic assembly. Nucleotide blast searches of the circular contigs, mitogenome annotations, and Bayesian phylogenetic analyses confirm the completeness and correct taxonomic placements of the two mitogenomes. In addition, heteroplasmy detection and Pool-Seq variant calling quantified the level of genetic diversity in the sequences that formed these assemblies. These can be used as a first step to non-invasively survey genetic diversity in these populations.


Value of the Data
• The analysed metagenomic dataset is the first to simultaneously assemble the mitogenomes of a predator and its major prey species from non-invasively sampled faecal DNA.
• It provides researchers with a new method to acquire genetic data from wild species without the need to immobilise or even directly observe them in the field.It can also be used as a first step in non-invasively assessing genetic diversity in wild populations.• It indicates that a large amount of unexplored mitogenomic sequencing data that are already available in public genomic databases can potentially contribute to the development of more extensive reference databases for genetic studies.

Background
Genetic studies on wild populations have traditionally relied on samples that were collected intrusively, e.g., through lethal methods or by taking tissue samples from animals that were immobilised physically or chemically [1] .
Even when non-lethal methods are used, survival is not guaranteed because the target animals are subjected to significant stress, and the small injuries resulting from the acquisition of tissue samples may result in excessive bleeding and serious infection [ 2 , 3 ].The application of intrusive sampling methods in the study of small-bodied, rare, and threatened species is thus potentially problematic.
With the recent development of more sensitive biochemical assays and sequencing technologies, coupled with high-throughput computational resources, it is now possible to obtain a comparable level of information by sequencing the trace amounts of DNA that animals leave behind in their surrounding environments (environmental DNA, or eDNA), and thus minimise the level of stress or injury to which the subjects are exposed [4] .
Molecular reconstruction of diet by high-throughput sequencing of DNA retrieved from body excretions, such as faeces or regurgitates, provides valuable information about a species' dietary requirements across a broader taxonomic range and on a shorter time scale than morphological identification of hard part contents of gut or faecal samples [5] .However, exposure to digestion enzymes in the alimentary canal fragments DNA into shorter pieces, making the sequencing of the relatively long regions that are typically targeted by PCR-based approaches a challenging task [6] .This has necessitated the use of diagnostic short fragments of gene regions from the mitochondrial genome to identify prey species [7] .While these markers are useful for distinguishing between different taxa at higher taxonomic levels, longer DNA fragments or sequencing the complete mitogenome are usually needed to conclusively resolve taxonomic and phylogenetic uncertainties [8] .
As the output of sequencing platforms increases, the application of faecal metagenomic DNA for simultaneous assembly not only of the complete mitogenomes of a predator but also of the prey species it has consumed has become feasible, but it remains largely unexplored [ 8 , 9 ].
The Critically Endangered South African estuarine pipefish, Syngnathus watermeyeri Smith, 1963 is one of the rarest fish species on Earth.In 1994, it was declared extinct by the IUCN until a small population was rediscovered in 2006 [10] .Its current distribution range is restricted to narrow strips of submerged macrophytes (mainly Zostera capensis and Ruppia cirrhosa ) in the Kariega and Bushmans estuaries on the south-eastern coast of South Africa.Its catastrophic population decline has been attributed to poorly managed coastal development plans that resulted in the degradation of its estuarine habitats [11] .Similar to other species of pipefish, S. watermeyeri is characterised by low mobility, small home ranges and low fecundity, which can negatively impact the successful implementation of conservation plans [11] .Thus, genomic resources for this species are needed to establish a multifaceted effective conservation plan to protect its last remaining populations.
In this analysed dataset, I evaluated whether it is possible to use random shotgun sequencing of faecal metagenomic DNA to simultaneously assemble the complete mitogenome of Syngnathus watermeyeri and that of its most important prey species, the copepod Pseudodiaptomus hessei (Mrázek, 1894).

Data Description
An Illumina sequencing run yielded a total of 35,293,150 paired-end sequences.The NOVO-Plasty pipeline successfully retrieved the estuarine pipefish initial cytb seed from pooled faecal DNA originating from several pipefishes, and a circular contig 16,449 bp in length was assembled ( Fig. 1 a).A blast search against the NCBI database identified the only published mitogenome of Syngnathus watermeyeri , which was assembled from DNA extracted from a fin clip biopsy (NCBI accession number OR496150), as the closest match, with 99.9 % identity and 100 % coverage.When this assembly was used as the guiding reference template to re-assemble the complete mitogenome denovo in the GetOrganelle toolkit, an identical circular contig was retrieved, confirming the completeness of the assembly.
The MitoZ pipeline assembled a circular contig 14,714 bp long, with 75.69 % identity to the mitogenome of the copepod Phyllodiaptomus tunguidus (NCBI accession number MW9714 4 4.1), a species that does not occur in South Africa ( Fig. 1 b).However, when the annotated COI sequence from this contig was searched against the nucleotide database, it was 99.52 % identical to the COI gene sequence from Pseudodiaptomus hessei (NCBI accession number OM747860.1), a species that was recently identified as the major prey item of the estuarine pipefish based on COI metabarcoding [12] .In addition, a neighbour-joining phylogenetic tree that was constructed based on the annotated COI sequence of the assembled mitogenome, COI sequences from previously published records of the same species in the same geographic region, and other closely related calanoid copepods confirmed that sequences from the assembled mitogenome clustered with other COI sequences generated from Pseudodiaptomus hessei tissue samples with 100 % bootstrap support (Supplementary Information, Fig. 1 ).
A total of 37 mitogenomic features, including 13 protein-coding genes, 22 tRNAs and 2 rRNAs, were annotated on two assembled contigs ( Tables 1 and 2 ).Comparative Bayesian phylogenetic analysis ( Table 3 ) shows that the pipefish mitogenome that was assembled from faecal metagenomic DNA clusters with that assembled from tissue sample with a posterior probability of 1.These two almost identical mitogenomes form a monophyletic group with another southern African pipefish species, Syngnathus temminckii , and this clade of southern African pipefishes has a sister taxon relationship with a clade of northern hemisphere pipefishes that includes S. acus, S. rostellatus, and S. typhle ( Fig. 2 ).The mitogenome of P. hessei is the first to be assembled for this species.In the phylogenetic tree ( Fig. 3 ), it clusters with other members of the copepod order Calanoida with a posterior probability of 1, and their phylogenetic placement relative to the Notostraca ( Triops longicaudatus, Lepidurus arcticus ), Artemidae, Streptocephalidae, Limnadidae and Daphniidae was correctly recovered.
The successful assembly of two circular contigs, complete annotation of 37 mitogenomic features on both mitogenomes, and the correct phylogenetic placement of the two species confirmed that the complete mitogenomes of the predator and its main prey item were successfully assembled from the faecal metagenomic dataset.Heteroplasmy detection identified a total of 62 putative heteroplasmic variable sites in the pipefish mitogenome ( Fig. 4 ).Similarly, when the raw sequences were mapped against the assembled copepod mitogenome, a total of 211 SNPs with an average allele count of 2.08 were assigned ( Fig. 5 ).These results reflect the level of mitochondrial genetic diversity in the faecal samples for the predator and its major prey, and the putative number of alleles in each assembly can serve as preliminary estimates of minimum genetic diversity in the populations of both the predator and its prey ( Fig. 5 ).

Experimental Design, Materials and Methods
Five pipefish were caught with a seine net in the Bushmans Estuary on the southeast coast of South Africa (-33.682215south, 26.640596 east).They were immediately transferred to a wellaerated aquarium containing estuarine water from the collection site, which was partly replaced every 30 min.The pipefish were closely monitored for three hours before releasing them back to the site in the estuary where they had been captured.All faecal pellets that they had dropped in the aquarium during this time were then collected using sterile tweezers and transferred into an Eppendorf tube containing 99 % ethanol.Faecal DNA was extracted using the standard CTAB pro- tocol [13] , DNeasy Power Soil Pro (Qiagen, Hilden, Germany) and NucleoSpin (Macherey-Nagel, Dueren, Germany).Based on agarose gel visualisation, the three extraction methods produced metagenomic DNA of similar quality, and these extractions were pooled before genomic library preparation.To construct a genomic library, the extracted DNA was first fragmented using a Diagenode Bioruptor (Diagenode, New Jersy, United States), end-repaired, and adapter-ligated as explained in the NEB 's kit instruction (NEB, Massachusetts, USA) and only DNA fragments ∼350 bp in length were selected for the amplification step.The quality of the genomic library was inspected using Qubit (Thermo Fisher Scientific, Waltham, USA), qPCR, and the DNA NGS 3K assay (PerkinElmer, Waltham, USA).The generated library was sequenced on the NovaSeq 60 0 0 SP platform (Illumina, San Diego, USA) using paired-end 150 bp chemistry.The quality of the resulting sequences and the presence of adapter contaminations were checked in FastQC v0.12 ( https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ ), and potential adapter contaminations and low-quality sequences, which were defined as sequences that contain 4 consecutive base pairs with an average Phred score quality below 20 were removed using Trimmomatic v.0.39 [14] .
I followed two different methods to assemble mitogenomes: a seed extension and a k-mer based denovo cross-validation approach for the pipefish and a k-mer based denovo metagenomic assembly for its prey.The DNA of a predator is significantly more abundant in the faeces than that of the prey it has consumed [ 15 , 16 ].This suggests that while mitochondrial seed initiation is more feasible for the predator, denovo assembly is preferable for the prey.The pipefish mitogenome was assembled by the extension of a cytb seed sequence from the same species (NCBI accession number JX228139.1) in NOVOPlasty v4.3 [17] .Then, this assembled circular contig was used as the guiding reference template in GetOrganelle v.1.7 [18] to verify whether the denovo assembly method can retrieve the same circular contig from metagenomics DNA.In both packages, the assembly parameters were set to their default values.
To reconstruct the mitogenome of the prey, all metagenomic sequences were assembled denovo in the megahit v.1.2assembler [19] since this method is particularly suitable for the denovo assembly of complex metagenomic DNA.An implementation of the megahit assembler in MitoZ v.3.4 [20] enables users to filter assemble sequences based on the taxonomic group.The MitoZ assembly pipeline was run with default settings in terms of the k-mer size and other parameters, and arthropods were selected as the target taxonomic group based on previous knowledge of the importance of copepods in the diet of the pipefish [12] .
To identify the taxonomic origin and quality of the assembled mitogenomes, the resulting circularised assemblies were blast-searched against the NCBI nucleotide database.These contigs were then annotated using a combination of the MITOS online server [21] and the MitoZ "annotate" command.The phylogenetic placement of the assembled mitogenomes was determined using Bayesian phylogenetic analysis in Beast2 [22] .To this end, mitogenome sequences from 12 closely related pipefish species and four seahorses were identified based on the percentage identity of the blast results and retrieved from the NCBI database.Similarly, the mitogenome of nine closely related copepod species that covered all major taxa within this class were downloaded.A consensus Bayesian phylogenetic tree was constructed for each assembly based on the DNA alignment of 13 protein-coding sequences.The Bayesian phylogenetic site model av- eraging package bModelTest [18] was used to select the best nucleotide substitution model for each gene.The remaining parameters were set to their default values.Beast2 was run for ten independent replicates, each 100 million iterations long with an initial 33 million burn-in steps.A consensus phylogenetic tree was reconstructed in TreeAnnotator v.1.4,which is part of the Beast2 package and visualised using Figtree v.1.4( https://github.com/rambaut/figtree).
Since the mitogenomes of the pipefish and its copepod prey were assembled from mitochondrial sequences of an unknown number of predators in the tank, and an unknown number of copepods in their digestive tracts, I tested two methods to quantify the level of genetic diversity in the sequences that formed each assembly.I used the NOVOPlasty heteroplasmy detection pipeline for the pipefish, which identifies the presence of different mitochondria in cells.This method uses a two-step pipeline.In the first step, the circular genome is assembled in the regular way.Then, this assembled contig is used as both the initiation seed and reference template to identify putative variant sites above a user-defined allele frequency threshold.
To identify variable sites in the copepod mitogenome, the raw sequences were first mapped against the assembled mitogenome using BWA v.0.7 [23] .The resulting Sequence Alignment Map (SAM) was converted to its binary BAM format and subsequently sorted in SAMtools v.1.16[24] .All putative variable sites were identified using Bayesian haplotype variant caller FreeBayes v.1.3[25] in "pooled-continuous" mode, which is suitable for variant calling in Pool-Seq data when the number of individuals contributing to each pool is unknown.All variant calling parameters were set to their recommended default setting ( https://github.com/freebayes/freebayes).

Limitations
A major limitation of assembling prey mitogenomes from faecal metagenomics datasets is that the number of individuals that contributed to the assembly is unknown.Moreover, when direct observation of the predator and immediate collection of faecal samples are not feasible due to small pellet sizes or logistical constraints, predator faeces may also be a mixture of excretions from an unknown number of individuals, as was the case in this dataset.Further, territorial animals often scent-mark their territories with body excretions near those left behind by other individuals, so faecal samples collected in the field could be a mix of DNA from  multiple individuals.As a result, mitogenome assemblies like the ones reported in this dataset can only be considered consensus mitochondrial genomes.However, this level of information is typically adequate for species identification and preliminary surveys of population-level genetic diversity in wild species since the focus of these studies is on monitoring the overall genetic diversity of predator and prey populations and assigning haplotypes to individuals is of less importance.

Ethics Statement
The research permit for this study (RES2020/101) was granted by the Department of Environment, Forestry and Fisheries of the Republic of South Africa in accordance with IUCN requirements.The animal ethics clearance for this study was approved by the Faculty of Science Ethics Committee at the University of Johannesburg (Ethics Reference Number: 2020-02-06/Teske_Weiss).

CRediT Author Statement
Not Applicable to a single-authored data note.

Fig. 1 .
Fig. 1.Circular representations of the mitogenomes of (a) the pipefish Syngnathus watermeyeri and (b) the copepod Pseudodiaptomus hessei assembled from pipefish faecal metagenomic DNA.The positions of the protein-coding genes, tRNAs, and rRNAs are shown in different colours.The interior blue bars show the GC content.

Fig. 2 .
Fig. 2. Bayesian phylogenetic tree showing the phylogenetic placement of the mitogenomes of Syngnathus watermeyeri among other species of pipefish.The numbers next to nodes represent posterior probabilities.The branches leading to the two assemblies are shown in red, with "Syngnathus watermeyeri _eDNA" representing the mitogenome reconstructed in the present study and "Syngnathus watermeyeri " representing a previously reconstructed mitogenome using a fin clip biopsy.The scale bar is proportional to the number of substitutions in each branch.

Fig. 3 .
Fig. 3. Bayesian phylogenetic tree showing the phylogenetic placement of the mitogenome of Pseudodiaptomus hessei among other crustaceans.The numbers next to nodes represent posterior probabilities.The branches leading to the two species are shown in red.The scale bar is proportional to the number of substitutions in each branch.

Fig. 4 .
Fig. 4. Number single nucleotide polymorphic sites (SNPs) identified from metagenomic sequences that were used to assemble the mitochondrial genome of Syngnathus watermeyeri .

Fig. 5 .
Fig. 5. Number single nucleotide polymorphic sites (SNPs) identified from metagenomic sequences that were used to assemble the mitochondrial genome of the copepod Pseudodiaptomus hessei.

Table 1 A
description of the 37 annotated mitogenomic features in the mitogenome of Syngnathus watermeyeri.

Table 2 A
description of the 37 annotated mitogenomic features in the mitogenome of Pseudodiaptomus hessei assembled from faecal metagenomic DNA of the pipefish Syngnathus watermeyeri .

Table 3
Names and NCBI accession numbers of the species used to reconstruct Bayesian phylogenetic trees for Syngnathus watermeyeri and Pseudodiaptomus hessei .