Published April 3, 2024 | Version v2
Dataset Open

Hybracter v0.7.0 Benchmarking Output

Description

This dataset contains:

  1. The subsampled FASTQ files used to benchmark Hybracter (https://github.com/gbouras13/hybracter).
  2. Benchmarking output for Hybracter v0.7.0 vs Unicycler v0.5.0 vs Dragonflye v1.1.2 on these files.

The full benchmarking code and explanation are available at https://github.com/gbouras13/hybracter_benchmarking.

The `hybracter_benchmarking_fastqs.tar.gz` tarball contains subsampled FASTQs (gzipped) of the first 20 samples used to benchmark `hybracter`. These are the JKD6159, Lerminiaux, Chitale and super-accuracy model basecalled simplex ATCC FASTQs.

The `PRJNA1087001_ATCC_SUP_Duplex_FAST_Simplex_fastqs.tar.gz` tarball contains subsampled FASTQs (gzipped) of the 10 samples added in v2 of the preprint and used to benchmark `hybracter`. These are the fast model basecalled simplex ATCC FASTQs and the super-accuracy model basecalled duplex ATCC FASTQs.

The other 4 tarballs (`hybracter_benchmarking_results_v0.7.0.tar.gz`, `hybracter_benchmarking_results_fast.tar.gz`, `hybracter_benchmarking_results_duplex.tar.gz` and `hybracter_depth_Lerminiaux_isolateB_benchmarking_results.tar.gz`) contain benchmarking outputs for the first 20 samples, the 5 fast model basecalled simplex ATCC samples, the 5 super-accuracy model basecalled duplex ATCC samples and the depth analysis for Lerminiaux isolate B, respectively.

When untarred, each tarball will contain:

  • `BENCHMARKS` - contains the benchmarking metrics (time etc.) for each run (sample x tool)
  • `DNADIFF` - contains raw chromosome Dnadiff results for each run (sample x tool)
  • `DNADIFF_PARSED_OUTPUT` - contains parsed chromosome Dnadiff results for each sample
  • `DNADIFF_PLASMIDS` - contains plasmid Dnadiff results for each run (sample x tool)
  • `DNADIFF_PARSED_OUTPUT_PLASMID` - contains parsed plasmid Dnadiff results for each sample
  • `REAL` - contains all the actual output for each assembler. The following 5 directories contain all the raw output, with subdirectories for each sample:
    • `HYBRACTER_HYBRID_OUTPUT`
    • `HYBRACTER_LONG_OUTPUT`
    • `DRAGONFLYE_HYBRID_OUTPUT`
    • `DRAGONFLYE_LONG_OUTPUT`
    • `UNICYCLER_OUTPUT`
    • Additionally, `hybracter_benchmarking_results_v0.7.0.tar.gz` has `HYBRACTER_HYBRID_OUTPUT_REAL_BULK` - this contains the output for the 12 Lerminiaux et al. isolates assembled using `hybracter hybrid` with the modified config file `bulk_assemble_lerminiaux_config.yaml`.
  • It also contains a number of other subdirectories (`_SUMMARIES`, `_PLASMIDS`, `_CHROMOSOMES`) with parsed summary outputs and the extracted plasmid and chromosome assemblies for Unicycler and Dragonflye (this made the assessment much easier to automate).
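
After extraction, it can be useful to confirm that the directories listed above are present. The following is a minimal Python sketch, not part of the benchmarking code. It assumes nothing about where each directory sits in the tree and simply searches for the names anywhere under an extraction root. The function name `find_missing_dirs` is hypothetical, and not every tarball necessarily contains every directory, so a non-empty result is a prompt to look closer rather than proof of a problem.

```python
import os

# Directory names taken from the list above.
EXPECTED_DIRS = [
    "BENCHMARKS",
    "DNADIFF",
    "DNADIFF_PARSED_OUTPUT",
    "DNADIFF_PLASMIDS",
    "DNADIFF_PARSED_OUTPUT_PLASMID",
    "REAL",
    "HYBRACTER_HYBRID_OUTPUT",
    "HYBRACTER_LONG_OUTPUT",
    "DRAGONFLYE_HYBRID_OUTPUT",
    "DRAGONFLYE_LONG_OUTPUT",
    "UNICYCLER_OUTPUT",
]


def find_missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected directory names not found anywhere under root."""
    found = set()
    # os.walk visits every directory below root; collect all directory names.
    for _dirpath, dirnames, _filenames in os.walk(root):
        found.update(dirnames)
    return [name for name in expected if name not in found]
```

For example, `find_missing_dirs("path/to/extracted_tarball")` returns an empty list when every listed directory exists somewhere below that path (the path shown is a placeholder).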

To untar a tarball, e.g.:

`tar -xzf hybracter_benchmarking_results_v0.7.0.tar.gz`
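
Because the tarballs are large, it may help to inspect one before writing its contents to disk. The sketch below uses Python's standard `tarfile` module to list the top-level entries of an archive. The function name `top_level_entries` is hypothetical. The gzip stream still has to be read from start to finish, so this saves disk space rather than time.

```python
import tarfile


def top_level_entries(tarball_path):
    """Return the sorted first path components of all members in a tar archive."""
    tops = set()
    # "r:*" lets tarfile detect the compression (gzip here) automatically.
    with tarfile.open(tarball_path, "r:*") as tar:
        for member in tar:
            # Ignore empty components and a leading "./" if present.
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            if parts:
                tops.add(parts[0])
    return sorted(tops)
```

Usage would look like `print(top_level_entries("hybracter_benchmarking_results_v0.7.0.tar.gz"))`, run from the directory holding the downloaded file.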

Files (35.2 GB)

  • md5:5b4105760fd2d12c499ed8c794af35f4 - 13.7 GB
  • md5:0c25c64dd756198b9078ea30c85d0344 - 1.4 GB
  • md5:6e44c2c3269783429588249f9406460d - 1.5 GB
  • md5:1cf95c03b2ca2f6c2e9621fb813e0394 - 8.3 GB
  • md5:4ff5b952b95708c4fb856486b16e1916 - 7.3 GB
  • md5:8d96b424c071e371515d70ed8ccaa2e4 - 3.1 GB
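
The MD5 checksums above can be used to confirm that a download completed intact. The following is a minimal Python sketch using the standard `hashlib` module. The function names `md5_of_file` and `matches_listed_md5` are hypothetical. The listing here does not say which checksum belongs to which tarball, so that pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so multi-gigabyte tarballs never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file's MD5 with a listed value such as 'md5:<hex>' or a bare hex string."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as `matches_listed_md5("some_tarball.tar.gz", "md5:<hex digest from the list>")` returns `True` only when the file on disk hashes to the listed value (both arguments here are placeholders).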

Additional details

Dates

Created
2023-11-20