Published October 31, 2018 | Version 0.2
Dataset Open

EukZoo, an aquatic protistan protein database for meta-omics studies.

  • 1. University of Southern California

Description

This database contains protein sequences of aquatic microbial eukaryotes, or protists. Its purpose is to provide a database of reasonable quality that serves as a resource for both taxonomic and functional interpretation of metagenomic and metatranscriptomic studies of protists. The sequences come mainly from the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP), supplemented with various genomes and transcriptomes of organisms that were not part of MMETSP.

To use this database, it helps to understand the role of each of the three files described below.

(1) The protein sequences are stored in the .faa file. You can build an alignment/search database from it and search your meta-omics sequences against it. Each sequence in the FASTA file has an ID that always consists of two parts, like this: "MMETSP0004_1234567". The text before the first underscore is the source ID of that sequence.

(2) Taxonomy information for each source ID is stored in "EukZoo_taxonomy_table_v_0.2.tsv". It can be combined with database search results to assign taxonomy to sequences.

(3) The KEGG annotation of each sequence is stored in "EukZoo_KEGG_annotation_v_0.2.tsv". It can be combined with database search results to assign KEGG functional annotation (KO ID) to sequences.
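The way the three files fit together can be sketched in a few lines of Python. This is only an illustration, not one of the EukZoo scripts: it assumes that both .tsv files are tab-separated with the lookup key (source ID or sequence ID) in the first column and the annotation in the second, which should be checked against the actual files. The function names, the placeholder lineage, and the placeholder KO value are hypothetical.

```python
import csv
import io


def source_id(sequence_id):
    """Return the text before the first underscore, e.g. 'MMETSP0004'."""
    return sequence_id.split("_", 1)[0]


def load_two_column_tsv(handle):
    """Map column 1 to column 2 of a tab-separated file.

    ASSUMPTION: key in the first column, annotation in the second.
    The real EukZoo tables may have more columns or a header row;
    a header row would simply end up as one extra, unused entry.
    """
    table = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            table[row[0]] = row[1]
    return table


def annotate_hits(hits, taxonomy, kegg):
    """Attach taxonomy and KO to (query_id, subject_id) search hits.

    Taxonomy is looked up by source ID, KEGG annotation by the full
    sequence ID. Missing entries are returned as None.
    """
    annotated = []
    for query_id, subject_id in hits:
        lineage = taxonomy.get(source_id(subject_id))
        ko = kegg.get(subject_id)
        annotated.append((query_id, lineage, ko))
    return annotated


if __name__ == "__main__":
    # Toy in-memory stand-ins for the two .tsv files (values are placeholders).
    taxonomy = load_two_column_tsv(io.StringIO("MMETSP0004\tplaceholder lineage\n"))
    kegg = load_two_column_tsv(io.StringIO("MMETSP0004_1234567\tKO_PLACEHOLDER\n"))
    hits = [("my_contig_1", "MMETSP0004_1234567")]
    for record in annotate_hits(hits, taxonomy, kegg):
        print(record)
```

With real data, the two tables would be opened from "EukZoo_taxonomy_table_v_0.2.tsv" and "EukZoo_KEGG_annotation_v_0.2.tsv", and the hits would come from the query and subject columns of whatever search tool was run against the .faa file.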

I also provide scripts to assign taxonomy and KEGG annotation from database search results. The scripts, along with explanations of how to use them, are available on the EukZoo GitHub page, which also describes how the database was created and curated.

Please contact me at zhenfeng.liu1@gmail.com if you have any questions or requests. Thank you for your interest in EukZoo.

Files

Files (3.9 GB)

Checksum  Size
md5:58d2cd88847d378a6f7ed2c190382d72  529.3 kB
md5:d898b9061517fd3b6c6dbe5a1bf4266e  100.0 MB
md5:91821d18415b9021751cc5ae95820f98  107.2 kB
md5:4a753abebff09f700927d039a4ba4d1c  3.8 GB
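
Because the largest file is several gigabytes, it can be worth checking a download against the md5 values listed above. The following is a minimal sketch using Python's standard hashlib module; the function names are hypothetical, and the file is read in chunks so that it does not have to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:58d2cd88...'.

    The optional 'md5:' prefix is stripped before comparing.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Each downloaded file should be paired with the checksum shown next to it on the record page, for example `matches_listed_checksum(path_to_download, "md5:...")` with the value copied from the listing.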