
Improving gene function predictions using independent transcriptional components - Raw Figure Data

Version 2 2021-02-18, 18:23
Version 1 2020-11-20, 11:52
dataset
posted on 2021-02-18, 18:23 authored by Carlos Urzúa-Traslaviña, Vincent Leeuwenburgh, Rudolf Fehrmann, Stefan Loipfinger
The files are described below; the same descriptions are available in README.txt:

>*****_medians_file
gene_set_name : Name of gene set
size : Number of genes with prediction scores in that gene set
gene_set_db : Name of host gene set collection
Method : Method used to calculate prediction scores
Subset : Whether the median was calculated using member genes (n = size) or genes that are never members of any gene set in that collection
median_prediction_score : Median prediction score (NA for gene sets with fewer than 10 or more than 500 genes)
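
As an illustration of how these columns might be used, the following Python sketch contrasts member and non-member medians per gene set and skips NA entries. The tab delimiter, the Subset labels "member" and "non_member", and the sample rows are assumptions made for this example; check them against the actual file before reusing it.

```python
import csv
import io

# Hypothetical excerpt: delimiter, Subset labels and values are assumed,
# not taken from the real medians files.
SAMPLE = (
    "gene_set_name\tsize\tgene_set_db\tMethod\tSubset\tmedian_prediction_score\n"
    "SET_A\t25\tHallmark\tICA\tmember\t3.2\n"
    "SET_A\t25\tHallmark\tICA\tnon_member\t0.1\n"
    "SET_B\t5\tHallmark\tICA\tmember\tNA\n"
    "SET_B\t5\tHallmark\tICA\tnon_member\tNA\n"
)


def median_gap(handle):
    """Return {(gene_set_name, Method): member median - non-member median}."""
    medians = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        score = row["median_prediction_score"]
        if score in ("NA", ""):
            # Gene sets outside the 10-500 size range carry no median.
            continue
        key = (row["gene_set_name"], row["Method"])
        medians.setdefault(key, {})[row["Subset"]] = float(score)
    return {
        key: by_subset["member"] - by_subset["non_member"]
        for key, by_subset in medians.items()
        if "member" in by_subset and "non_member" in by_subset
    }


gaps = median_gap(io.StringIO(SAMPLE))
```

To run this on a downloaded file, replace the StringIO object with an open file handle.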

>******_multifunctonality_file
gene_set_name : Name of gene set
multifunctionality_score_correlation : Distance correlation between member prediction scores and the multifunctionality score calculated using the host gene set collection (empty value = gene set size was outside the 10-500 range)
gene_set_db : Name of host gene set collection
method : Method used to calculate prediction scores
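
For readers unfamiliar with distance correlation, the sketch below computes the standard sample estimate from double-centered pairwise distance matrices. It is a generic textbook formulation, not necessarily the implementation used to produce this file, and the input vectors are made-up values.

```python
from math import sqrt


def centered_distances(values):
    """Pairwise absolute distances, double-centered by row, column and grand mean."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_mean = [sum(row) / n for row in dist]
    grand_mean = sum(row_mean) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [dist[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation between two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    a = centered_distances(x)
    b = centered_distances(y)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / n ** 2

    dcov2 = mean_product(a, b)                         # squared distance covariance
    dvar2 = mean_product(a, a) * mean_product(b, b)    # product of squared distance variances
    if dvar2 == 0:
        return 0.0  # one input is constant, so no dependence can be measured
    return sqrt(max(dcov2, 0.0) / sqrt(dvar2))


# Made-up values: prediction scores and multifunctionality scores of five member genes.
member_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
multifunctionality = [3.0, 5.0, 7.0, 9.0, 11.0]
dcor = distance_correlation(member_scores, multifunctionality)
```

Because the two example vectors are related by an exact linear function, the result is 1 up to floating-point rounding. Unlike Pearson correlation, the measure ranges from 0 to 1 and also responds to non-linear dependence.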

>*****_old_version_comparison_file
gene_set_name : Name of gene set
size : Number of genes that were added between v3.0 and v6.2
gene_set_db : Name of host gene set collection
type : version of gene set used to calculate prediction scores of these genes
method : Method used to calculate prediction scores
Median prediction score : median prediction score of the subset of genes added between v3.0 and v6.2

>unbiased_clustering
File mapping probes and Entrez IDs to clusters
GROUP : cluster number
LABEL : Affymetrix probe
ENTREZID : Corresponding entrez number
SYMBOL : Corresponding symbol
uncharacterized : 1 if the gene is an ORF or LOC gene, 0 otherwise

>cluster_predictability
File with cluster metrics
GROUP : cluster number
size : size of cluster
median_max_prediction : Cluster median of each gene's maximum prediction score across the 16 collections
density : density metric
median_multifunctionality : Median distance correlation with multifunctionality, calculated using all gene sets from all collections
ORF or LOC : number of ORF or LOC genes in the cluster

>durocher_comparison_figure_data
Comparison data for Olivieri et al hit genes
entrez_id : entrez number
variable : Comparison is GO_DNA_REPAIR to GO_DNA_REPAIR
pca_prediction_scores : PCA based prediction score
i.variable : Comparison is GO_DNA_REPAIR to GO_DNA_REPAIR
ica_prediction_scores : ICA based prediction score
known_link_to_DDR : 1 if the gene was called a known link in Olivieri et al., 0 otherwise

>57_lps_network_all_genes
ICA-TC based prediction scores for Immunological Signatures gene set collection, of the subset of genes identified in a CRISPR-Cas9 screen that have a high co-functionality.
entrez_number: Entrez ID of gene
gene_name: HGNC Gene Symbol
gene_set: Immunological Signatures gene set
value: Z-score of gene for gene set

>HALLMARK_ICA_ZTpvalues_CORFsandLOCS_wardClustered
Matrix containing ICA-TC based prediction scores for all Corf and LOC genes, hierarchically clustered using Ward's method with 1-cor(dist) as the distance function.
Columns correspond to hallmark gene sets, rows to genes.

>HALLMARK_ICA_ZTpvalues_CORFsandLOCS_wardClustered_cutoff_0.8
Cluster membership of the 835 Corf and LOC genes at a dendrogram cutoff height of 0.8
GROUP: Cluster number
LABEL: HGNC Gene Symbol

>HALLMARK_ICAvPCA_ZTpvalues_CORFsandLOCS
Comparison between ICA-TC based and PCA-TC based prediction scores of Hallmark gene sets for all Corf and LOC genes.
gene: HGNC Gene Symbol
variable: gene set
i.value: ICA-TC based prediction score
value: PCA-TC based prediction score
category: logical; 1 if i.value>value, 0 if value>i.value
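
The category rule can be reproduced with a few lines of Python. This is a minimal sketch on made-up rows. The description does not say what happens when the two scores are equal, so returning None for ties is an assumption of this example.

```python
def category_flag(i_value, value):
    """1 if i.value > value, 0 if value > i.value, None for ties (tie handling assumed)."""
    if i_value > value:
        return 1
    if value > i_value:
        return 0
    return None


# Made-up rows using the column names described above.
rows = [
    {"gene": "GENE_A", "variable": "SET_1", "i.value": 2.5, "value": 1.0},
    {"gene": "GENE_B", "variable": "SET_1", "i.value": 0.3, "value": 0.9},
    {"gene": "GENE_C", "variable": "SET_1", "i.value": 1.2, "value": 1.2},
]
for row in rows:
    row["category"] = category_flag(row["i.value"], row["value"])
```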

>ICAvPCA_GO_negviralregulation.txt
Comparison between ICA-TC based and PCA-TC based prediction scores of Hallmark gene sets for all Corf and LOC genes.
entrez_id: Entrez ID of gene
variable: gene set
i.value: ICA-TC based prediction score
value: PCA-TC based prediction score
known_link: one of three strings: "yes" if the gene is a member of the gene set, "no" if it is not, or "screen" if the gene is one of the hits of the investigated CRISPR-Cas screen

>ICAvPCA_KEGG_Lysosome
Comparison between ICA-TC based and PCA-TC based prediction scores of Hallmark gene sets for all Corf and LOC genes.
entrez_id: Entrez ID of gene
variable: gene set
i.value: ICA-TC based prediction score
value: PCA-TC based prediction score
known_link: one of three strings: "yes" if the gene is a member of the gene set, "no" if it is not, or "screen" if the gene is one of the hits of the investigated CRISPR-Cas screen

>Mouse_v_Human_barcode_spearman_correlations.txt
Spearman correlations of mouse gene barcodes with ortholog human gene barcodes for each of the 16 gene set collections.
mouse_gene: Mouse Entrez ID
assoc_human: Entrez ID of corresponding human ortholog
spearman_r: Spearman correlation coefficient
collection: number corresponding to gene set collection
name: gene set collection name
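
The spearman_r column can be understood as a Pearson correlation computed on ranks. The sketch below shows that calculation in plain Python, assuming a barcode is a vector of prediction scores over the gene sets of one collection. The example values are invented, and averaging the ranks of tied values is the usual convention rather than something the description states.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Invented barcodes for one mouse gene and its human ortholog.
mouse_barcode = [0.9, 0.1, 0.5, 0.7]
human_barcode = [0.8, 0.2, 0.4, 0.9]
rho = spearman(mouse_barcode, human_barcode)
```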
