12859_2022_4927_MOESM2_ESM.xlsx (9.94 kB)
Additional file 2 of Genomic data integration and user-defined sample-set extraction for population variant analysis
dataset
posted on 2022-09-30, 06:15 authored by Tommaso Alfonsi, Anna Bernasconi, Arif Canakoglu, Marco Masseroli

Additional file 2. Example of transformed metadata: In this .xlsx (MS Excel) file, we list all the output metadata categories generated for each sample from the transformation of the 1KGP input datasets. The output metadata include information collected from all four 1KGP metadata files considered. Some categories are not reported in the source metadata files; they are identified by the label manually_curated__... and were added by the developed pipeline to store technical details (e.g., download date, the md5 hash of the source file, file size) and information derived from knowledge of the source, such as the species, the processing pipeline used in the source, and the health status. For every information category, the table reports a possible value. The third column (cardinality > 1) tells whether the same key can appear multiple times in the output GDM metadata file. This is used to represent multi-valued metadata categories; for example, in a GDM metadata file, the key manually_curated__chromosome appears once for every chromosome mutated by the variants of the sample.
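To illustrate what a cardinality greater than 1 means in practice, the following is a minimal Python sketch of reading a metadata file with repeated keys. It assumes the metadata are stored as one tab-separated key/value pair per line; this layout, the function names, the key hypothetical_single_valued_key, and all sample values are assumptions made for illustration and are not taken from the additional file. Only the key manually_curated__chromosome comes from the description above.

```python
from collections import defaultdict


def parse_metadata(text):
    """Group key/value lines into a dict mapping each key to a list of values.

    Assumes one tab-separated key/value pair per line (an assumption about
    the file layout, not something stated in the additional file).
    """
    grouped = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition("\t")
        grouped[key].append(value)
    return dict(grouped)


def multi_valued_keys(grouped):
    """Return, in sorted order, the keys that appear more than once."""
    return sorted(key for key, values in grouped.items() if len(values) > 1)


# Hypothetical sample content: the values are invented for illustration.
SAMPLE = (
    "hypothetical_single_valued_key\tsome value\n"
    "manually_curated__chromosome\tchr1\n"
    "manually_curated__chromosome\tchr2\n"
)

meta = parse_metadata(SAMPLE)
print(meta["manually_curated__chromosome"])
print(multi_valued_keys(meta))
```

Keeping every value in a list, even for keys that occur only once, avoids silently overwriting earlier values when a key repeats, which is the situation the third column of the table flags.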