Revised MD17 dataset

Anders Christensen¹*, O. Anatole von Lilienfeld¹*

¹ Department of Chemistry, University of Basel, Switzerland

* Corresponding authors' emails: anders.christensen@unibas.ch, anatole.vonlilienfeld@unibas.ch
DOI: 10.24435/materialscloud:wy-kn [version v1]

Publication date: Jul 23, 2020

How to cite this record

Anders Christensen, O. Anatole von Lilienfeld, Revised MD17 dataset, Materials Cloud Archive 2020.82 (2020), https://doi.org/10.24435/materialscloud:wy-kn

Description

The original MD17 dataset (http://quantum-machine.org/datasets/#md-datasets) [Chmiela et al., Sci. Adv. 3(5), e1603015, 2017] contains numerical noise, so benchmark numbers reported on that data are likely flawed. Here, we present a new dataset with negligible numerical noise for benchmarking force and energy predictions for molecular dynamics simulations. Because the structures are taken from a molecular dynamics simulation (i.e. time series data), they are not guaranteed to be independent samples. This is readily evident from the autocorrelation function of the original MD17 dataset. In short: DO NOT train a model on more than 1000 samples from the revised dataset, and do not train models on more than 50 samples from the original MD17 dataset. Results already published with 50K samples on the original MD17 dataset should be considered meaningless, both for this reason and because of the noise in the original data.
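The two points above (correlated samples and a cap on training-set size) can be illustrated with a minimal sketch. It assumes the energies of a trajectory are available as a one-dimensional NumPy array; here a synthetic correlated series stands in for real data, and the names `autocorrelation`, `capped_split` and `MAX_TRAIN` are hypothetical helpers, not part of the dataset or of the authors' tooling. The random split is one possible choice, not necessarily the procedure used by the authors.

```python
import numpy as np

MAX_TRAIN = 1000  # training-set cap stated in the description for the revised dataset


def autocorrelation(series, max_lag):
    """Normalized autocorrelation of a 1-D series for lags 0..max_lag.

    Assumes the series is not constant (non-zero variance).
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    norm = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - lag], x[lag:]) / norm for lag in range(max_lag + 1)]
    )


def capped_split(n_samples, n_train, n_test, seed=0):
    """Draw disjoint random train/test index sets, refusing oversized training sets."""
    if n_train > MAX_TRAIN:
        raise ValueError(f"n_train={n_train} exceeds the cap of {MAX_TRAIN}")
    if n_train + n_test > n_samples:
        raise ValueError("not enough samples for the requested split")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    return perm[:n_train], perm[n_train : n_train + n_test]


# Synthetic stand-in for an energy time series (assumption: NOT real MD17 data).
rng = np.random.default_rng(0)
noise = rng.normal(size=5000)
series = np.empty(5000)
series[0] = noise[0]
for t in range(1, len(series)):
    # Each value depends strongly on the previous one, as in a time series.
    series[t] = 0.95 * series[t - 1] + noise[t]

acf = autocorrelation(series, max_lag=50)
train_idx, test_idx = capped_split(len(series), n_train=1000, n_test=1000)
print("autocorrelation at lags 1, 10, 50:", acf[1], acf[10], acf[50])
print("train/test sizes:", len(train_idx), len(test_idx))
```

A slowly decaying autocorrelation indicates that neighbouring frames carry largely redundant information, which is the reason for keeping the training set small.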

Materials Cloud sections using this data

No Explore or Discover sections associated with this archive record.

Files

File name: rmd17.tar.bz2
Size: 1016.9 MiB
MD5: cb1a927628d96f2e966025da4fb63d18
Description: Tarfile containing the data in NPZ and CSV format

File name: readme.txt
Size: 2.0 KiB
MD5: 29e6f250bb2d1c461363e24955b5be1e
Description: Readme file
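The MD5 checksums listed for these files can be used to check a download for corruption. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `verify` are hypothetical, and the local path passed to them is an assumption about where the files were saved.

```python
import hashlib

# Checksums as listed in this record.
EXPECTED_MD5 = {
    "rmd17.tar.bz2": "cb1a927628d96f2e966025da4fb63d18",
    "readme.txt": "29e6f250bb2d1c461363e24955b5be1e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, name):
    """Compare the digest of the local file at `path` with the listed checksum for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `verify("rmd17.tar.bz2", "rmd17.tar.bz2")` would return `True` for an intact download saved under that name in the current directory.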

License

Files and data are licensed under the terms of the following license: Creative Commons Zero v1.0 Universal.
Metadata, except for email addresses, are licensed under the Creative Commons Attribution Share-Alike 4.0 International license.

External references

Journal reference (Paper in which data is presented)
Preprint (Preprint in which data is presented)

Keywords

Chemistry, Machine Learning, Noise, Forces, Energies, Molecules

Version history:

2020.82 (version v1) [This version], Jul 23, 2020, DOI: 10.24435/materialscloud:wy-kn