AMDB: a database of animal gut microbial communities with manually curated metadata

Abstract Variations in gut microbiota can be explained by animal host characteristics, including host phylogeny and diet. However, there are currently no databases that allow for easy exploration of the relationship between gut microbiota and diverse animal hosts. The Animal Microbiome Database (AMDB) is the first database to provide taxonomic profiles of the gut microbiota in various animal species. AMDB contains 2530 amplicon data from 34 projects with manually curated metadata. The total data represent 467 animal species and contain 10 478 bacterial taxa. This novel database provides information regarding gut microbiota structures and the distribution of gut bacteria in animals, with an easy-to-use interface. Interactive visualizations are also available, enabling effective investigation of the relationship between the gut microbiota and animal hosts. AMDB will contribute to a better understanding of the gut microbiota of animals. AMDB is publicly available without login requirements at http://leb.snu.ac.kr/amdb.


INTRODUCTION
Animal gut microbiota is a diverse microbial community that lives in the intestine of the host and consists predominantly of bacteria, as well as some archaea, fungi, protozoa and viruses (1). The gut microbiota has received widespread attention due to its potential to influence host physiology (2), immunity (3) and development (4). The gut microbiota has also been hypothesized to contribute to host evolution (5).
The gut microbiota and the host display a bidirectional interaction. Various studies have shown that variations in gut microbiota can be explained by differences in host characteristics (6)(7)(8). In particular, host phylogeny and diet largely account for the gut microbiota variations (7).
A recent analysis of samples from wild baboons found widespread gut microbiome heritability (9). This vertical transmission may be one of the drivers of phylosymbiosis (10). Phylosymbiosis is defined as 'microbial community relationships that recapitulate the phylogeny of their host' (11). Patterns of phylosymbiosis have been reported in many studies (12)(13)(14). The host diet may also affect the gut microbiota, with several studies reporting that host diet can lead to the convergence of gut microbes in the host species (10)(15)(16)(17)(18).
Despite the importance of the relationship between gut microbiota and host characteristics, specifically host phylogeny and diet, there is currently no database available that enables easy exploration of the gut microbiota of various animal hosts. Most curated databases focus only on humans (GIMICA (19), GMrepo (20) and HPMCD (21)) or mice (MMDB (22)). Several databases contain microbiota data from various animal hosts, including IMNGS (23), MGnify (24), MG-RAST (25) and Qiita (26). However, these databases also contain data from sources other than animal hosts, making it difficult to identify the relationship between the gut microbiota and animal hosts.
Here, we present the Animal Microbiome Database (AMDB), which overcomes these limitations. AMDB includes bacterial 16S ribosomal RNA (rRNA) gene profiles from various animal species to enable the assessment of the relationship between gut microbiota and animal hosts. AMDB currently incorporates 10 478 bacterial taxa and 2530 samples from 34 projects, representing 467 animal species with manually curated metadata. This novel database (i) supports searches by the bacterial taxon of interest, (ii) provides a taxonomic composition of each sample, (iii) incorporates summary information for each project and host and (iv) includes interactive visualizations. Therefore, AMDB will help scientists to quickly access animal gut microbiota data through a user-friendly interface.

Data collection and curation process
We manually selected candidate data for AMDB from the NCBI Sequence Read Archive (SRA) based on the following criteria: (i) samples had to include fecal or intestinal contents from individual healthy animal hosts, (ii) the PCR primers had to target the V4 hypervariable region of the 16S rRNA gene, (iii) amplicons had to be sequenced on Illumina instruments and (iv) samples had to be linked to research articles. For longitudinal data, only one sample was selected as follows: only one adult sample was included when the samples were from multiple life stages, and one sample from an earlier time point was selected for a given life stage. Samples that were duplicates of those previously included in AMDB were not included.
Amplicon data from different hypervariable regions of the 16S rRNA gene cannot be directly compared due to differences in binding affinity and resolution (27,28). We therefore used only amplicon data from the V4 hypervariable region to ensure comparability. Illumina data were used because we processed the data with Deblur, which was designed for Illumina data (29). To ensure that samples were of high quality, we only selected samples linked to research articles.
We checked the suitability of samples by reading the materials and methods of each publication, and we collected metadata, including the accession numbers and the host information. We extracted information on host diets from the MammalDIET (30) and EltonTraits (31) databases.
A total of 4633 samples were obtained from 51 projects (Figure 1). Figure 1 summarizes all of the data processing steps. The entire analysis was performed using QIIME 2 (Version 2021.2) (32). Paired-end reads were merged using VSEARCH with default parameters (33). The total number of sequencing reads was 434 900 445.
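The rule for longitudinal data described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the authors' curation code: the field names (`host_id`, `life_stage`, `time_point`, `sample_id`) are hypothetical, 'an earlier time point' is interpreted here as the earliest available one, and the fallback for hosts without any adult sample is an assumption.

```python
from collections import defaultdict


def select_one_sample_per_host(samples):
    """Keep one sample per host individual (illustrative sketch).

    Each sample is a dict with hypothetical keys: 'sample_id', 'host_id',
    'life_stage' and 'time_point' (smaller value = earlier).
    """
    by_host = defaultdict(list)
    for sample in samples:
        by_host[sample["host_id"]].append(sample)

    selected = []
    for group in by_host.values():
        # Prefer adult samples when several life stages are present.
        adults = [s for s in group if s["life_stage"] == "adult"]
        # Assumption: if a host has no adult sample, consider all its samples.
        candidates = adults if adults else group
        # Assumption: "an earlier time point" is read as the earliest one.
        selected.append(min(candidates, key=lambda s: s["time_point"]))
    return selected


example = [
    {"sample_id": "S1", "host_id": "H1", "life_stage": "juvenile", "time_point": 0},
    {"sample_id": "S2", "host_id": "H1", "life_stage": "adult", "time_point": 2},
    {"sample_id": "S3", "host_id": "H1", "life_stage": "adult", "time_point": 1},
    {"sample_id": "S4", "host_id": "H2", "life_stage": "adult", "time_point": 5},
]
chosen_ids = [s["sample_id"] for s in select_one_sample_per_host(example)]
```

In this toy input, host H1 has one juvenile and two adult samples, so the earlier adult sample is kept, and host H2 keeps its only sample.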
The sequencing reads were quality filtered as follows: reads were truncated at any site containing >3 consecutive low-quality bases (Phred score < 4), and a read was retained only if the fraction of consecutive high-quality bases was at least 75% of the length of the input sequence and it contained no uncalled bases (Ns) (34). The total number of sequencing reads after quality filtering was 432 039 098. Deblur was used for denoising and chimera removal to obtain amplicon sequence variants (ASVs) using a trim length of 250 bases (29,35). The resulting ASVs from all samples were combined into a BIOM table (36). After Deblur processing, a total of 81 701 877 reads were obtained from 2601 samples (34 projects), with an average of 31 412 reads per sample (a minimum of 2 reads and a maximum of 205 611 reads). Only samples with at least 1000 reads after denoising and chimera removal were retained, leaving a total of 2530 samples from 34 projects (81 669 682 sequencing reads in total).
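The quality-filtering rule stated above can be illustrated with a small pure-Python function. It is a sketch of the rule as described in the text, not the QIIME 2 implementation; the function name and parameter names are chosen for illustration, and checking for uncalled bases after truncation is an assumption.

```python
def quality_filter(seq, quals, min_quality=4, quality_window=3,
                   min_length_fraction=0.75, max_ambiguous=0):
    """Return the (possibly truncated) read, or None if it is discarded.

    seq   : read sequence as a string
    quals : list of integer Phred scores, one per base
    """
    cut = len(seq)
    run_start, run_length = 0, 0
    for i, q in enumerate(quals):
        if q < min_quality:
            if run_length == 0:
                run_start = i          # a new low-quality run begins here
            run_length += 1
            if run_length > quality_window:
                cut = run_start        # truncate where the long run began
                break
        else:
            run_length = 0

    truncated = seq[:cut]
    # Discard reads that lost too much of their original length.
    if len(truncated) < min_length_fraction * len(seq):
        return None
    # Assumption: uncalled bases are counted in the truncated read.
    if truncated.upper().count("N") > max_ambiguous:
        return None
    return truncated


read = "ACGT" * 5  # 20 bases
kept = quality_filter(read, [30] * 16 + [2] * 4)       # 4 low-quality bases at the end
dropped = quality_filter(read, [30] * 10 + [2] * 10)   # too short after truncation
```

Here `kept` is the first 16 bases of the read (80% of its length), whereas `dropped` is `None` because only 50% of the read would remain.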
For taxonomic analysis, taxonomy was assigned to ASVs using the classify-consensus-vsearch method of q2-feature-classifier (33,40,52) against the EzBioCloud database (53). All matches with an identity of 0.97 (97%) or higher were kept. We only used bacterial 16S rRNA gene sequences from EzBioCloud. Multi-layered pie charts representing the taxonomic composition were visualized with Krona (54), and network graphs representing the associations between bacteria and hosts were visualized with Flourish (https://flourish.studio/).
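The idea behind consensus-based assignment with an identity cut-off can be sketched as follows. This is a simplified illustration and not the q2-feature-classifier code: the function name and input format are hypothetical, and the consensus fraction of 0.51 is an assumed value that is not stated in the text above.

```python
from collections import Counter


def consensus_taxonomy(hits, min_identity=0.97, min_consensus=0.51):
    """Assign a lineage from reference hits (illustrative sketch).

    hits : list of (identity, lineage) tuples, where identity is a
           fraction (0-1) and lineage is a list of rank names.
    """
    # Keep only matches at or above the identity threshold.
    kept = [lineage for identity, lineage in hits if identity >= min_identity]
    if not kept:
        return ["Unassigned"]

    consensus = []
    for rank in range(max(len(lineage) for lineage in kept)):
        # Count lineage prefixes down to the current rank.
        counts = Counter(tuple(l[:rank + 1]) for l in kept if len(l) > rank)
        best, n = counts.most_common(1)[0]
        # Stop at the first rank where agreement falls below the threshold.
        if n / len(kept) < min_consensus:
            break
        consensus = list(best)
    return consensus


hits = [
    (0.99, ["Bacteria", "Firmicutes", "Clostridia"]),
    (0.98, ["Bacteria", "Firmicutes", "Bacilli"]),
    (0.975, ["Bacteria", "Firmicutes", "Clostridia"]),
    (0.90, ["Bacteria", "Proteobacteria", "Gammaproteobacteria"]),  # below 0.97
]
assigned = consensus_taxonomy(hits)
```

In this example the hit at 0.90 identity is discarded, and two of the three remaining hits agree down to the class level, so the full lineage ending in Clostridia is assigned.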

Database content and usage
AMDB can be divided into four main parts, namely 'Taxa', 'Samples', 'Projects/Hosts' and 'Visualization'. 'Taxa' shows samples enriched with the bacterial taxon of interest. 'Samples' provides the gut microbiota composition of the sample of interest. 'Projects' and 'Hosts' give users summary information on the project and the host, respectively. 'Visualization' visually presents valuable information related to the relationship between the host and the gut microbiota. 'Taxa' allows users to search for the taxon of interest (Supplementary Figure S1). 'Taxa' provides taxon informa-
D732 Nucleic Acids Research, 2022, Vol. 50, Database issue

Other functionalities
To better guide users, the 'Help' page provides an overview of AMDB with simple examples. Users can also propose candidate data for AMDB using the submission form on the 'Contact' page. Our team will manually check new user-submitted information, and AMDB will be updated on an ongoing basis.

DISCUSSION
AMDB is a database for exploring the gut microbiota of various animal species. AMDB provides a search capability for the various components related to gut microbiota. For example, one may be interested in Bilophila wadsworthia, which is known to be related to animal-based diets in humans (56). Samples rich in this taxon can be identified in the AMDB search results. Additionally, AMDB allows users to search for samples based on metadata, including host taxonomy and diet types. Youngblut et al. found that hosts from the same species showed similar relative abundances of microbial phyla (7). This can be confirmed by comparing the microbial taxonomic compositions of samples taken from the same species. In addition, AMDB provides summary information about related projects and hosts. Users can thus compare the mouse information held within AMDB to the core microbiota of the mouse gut identified in multiple studies (57)(58)(59). Interactive visualizations are also available in AMDB. Host phylogeny and diet can explain variations in the gut microbiota (7), which can be confirmed using a PCoA plot within AMDB. The phylum Proteobacteria was identified as the dominant phylum in the samples from Actinopterygii (60), which can be seen in the network graph.
The number of available amplicon data in the NCBI SRA is continually increasing. AMDB will also be continuously updated to add additional data related to new and existing animal species. We will include new data collected by our team, as well as data based on the user-submitted information after manual curation.
Investigation of the relationship between gut microbiota and the host is a rapidly growing area of research (61). AMDB is the first database enabling easier exploration of this relationship. AMDB comprehensively addresses the taxonomic composition of animal gut microbiota with manually curated metadata, thus contributing to a better understanding of the gut microbiota of animals.

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.