Published September 13, 2021 | Version v1
Dataset Open

Supporting data for the manuscript "Nerpa: a tool for discovering biosynthetic gene clusters of nonribosomal peptides"

  • 1. St. Petersburg State University, Sirius University of Science and Technology
  • 2. St. Petersburg State University, Sirius University of Science and Technology, St. Petersburg Electrotechnical University "LETI"
  • 3. University of California San Diego
  • 4. Carnegie Mellon University

Description

This record contains the preprocessed structures of nonribosomal peptides (NRPs) and the genomic sequences (reference and representative genomes, biosynthetic gene clusters [BGCs]) used in the benchmark experiments of the Nerpa paper.

File descriptions

  • mibig_nrp_bacteria_preprocessed.tar.gz contains the preprocessed dataset of 194 bacterial NRP BGCs from the MIBiG database.
  • mibig_nrp_bacteria_summary.tsv contains metadata for the MIBiG-NRP dataset.
  • bacterial_ref_and_repr_genomes_20210604_preprocessed.tar.gz contains the preprocessed dataset of 13,399 reference and representative bacterial genomes from the NCBI RefSeq database (retrieved on 2021/06/04).
  • bacterial_ref_and_repr_genomes_20210604_summary.txt contains metadata for the RefSeq dataset.
  • pnrpdb_preprocessed.info contains the Nerpa-preprocessed pNRPdb, a database of 8,368 known and putative NRP structures.
  • pnrpdb_summary.tsv contains metadata for the pNRPdb database.
     

Files

Files (62.0 MB)

Checksum Size
md5:9461f60dda8308ee93aa027b214882b2 53.9 MB
md5:63d40bac7d9670880299266e67945bd4 4.7 MB
md5:ceca8f21627352076bdc7ec36b0f8769 305.0 kB
md5:94bc9f4dfea1bf0d285391de5dd42a89 39.0 kB
md5:452a3ccafe5d5517467604cbfba701da 879.5 kB
md5:e1ff8badab5167d59129580efe5300e3 2.1 MB
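
The MD5 values in the file listing can be used to check that a download arrived intact. The following is a minimal sketch in Python using only the standard library. The helper names `md5_of_file` and `matches_listing` are illustrative assumptions and not part of Nerpa or of this record. Because the listing does not say which checksum belongs to which file name, the caller supplies both the local path and the expected value.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file's MD5 with a listed value, with or without the 'md5:' prefix."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_listing(local_path, "md5:9461f60dda8308ee93aa027b214882b2")` returns `True` only if the file at `local_path` (a placeholder for wherever the download was saved) has that digest. MD5 is suitable here for detecting transfer corruption, not for protecting against deliberate tampering.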