Published January 9, 2023 | Version v2
Journal article | Open access

Sharing GWAS summary statistics results in more citations

  • Author 1 affiliations: (1) Cambridge Institute of Therapeutic Immunology and Infectious Disease (CITIID), University of Cambridge, Cambridge, United Kingdom; (2) Department of Medicine, University of Cambridge, Cambridge, United Kingdom.
  • Author 2 affiliations: (1) Cambridge Institute of Therapeutic Immunology and Infectious Disease (CITIID), University of Cambridge, Cambridge, United Kingdom; (2) Department of Medicine, University of Cambridge, Cambridge, United Kingdom; (3) MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom.

Description

Data and code supporting "Sharing GWAS summary statistics results in more citations"

This Zenodo repository stores the data and code needed to reproduce the results of the paper "Sharing GWAS summary statistics results in more citations", currently accepted at Communications Biology and available as a preprint on bioRxiv.

The main scripts and datasets used in this work are described below.

Note that this repository contains the main analysis only; see here for the code and data used in the extended search for shared files.


The `code/` directory contains the following (updated) scripts:

  • 01_Preparing_data.R: Script containing all the steps we followed to clean the data before analysis. These include matching journal names across datasets to combine citation and impact factor data, fetching per-year citations for individual papers, adding per-year SJR (SCImago Journal Rank), and updating the GWAS sharing classification according to our extended search. Check this file for links to the original data sources and for how the files used in the analyses were generated.
  • 02_Analysis.R: Script containing all the analyses and figures in this work.
  • 03_PG_NG_nonsharers_selection.R: Script used to randomly select 50% of the articles that the GWAS Catalog classifies as non-sharers and that were published in two of the journals with the most published GWAS: Nature Genetics and PLoS Genetics.
  • 04_response_to_reviewers_v2.R: Script used in our response to referees during the first round of peer review.
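As a rough illustration of the kind of selection performed in 03_PG_NG_nonsharers_selection.R, the Python sketch below draws a reproducible random half of the non-sharer articles from the two journals. It is not the authors' R code: the record layout (a list of dicts with hypothetical `journal` and `shared` fields), the fixed seed, and rounding down for odd counts are all assumptions made for illustration.

```python
import random

# Hypothetical field values; the real R script may use different column names.
TARGET_JOURNALS = {"Nature Genetics", "PLoS Genetics"}


def select_nonsharers(articles, fraction=0.5, seed=1):
    """Return a random subset of non-sharer articles from the target journals.

    `articles` is assumed to be a list of dicts with 'journal' and 'shared' keys.
    The subset size is the pool size times `fraction`, rounded down.
    """
    # Keep only non-sharers published in the two journals of interest.
    pool = [a for a in articles
            if a["journal"] in TARGET_JOURNALS and not a["shared"]]
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    k = int(len(pool) * fraction)
    # Sample without replacement, so no article is selected twice.
    return rng.sample(pool, k)
```

Fixing the seed keeps the selected subset identical across reruns, so the same articles can be re-examined later.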


The `data/` directory contains the downloaded and generated datasets used in the analysis.

Downloaded datasets:

Generated datasets (see `code/01_Preparing_data.R` for the code used to generate them):

  • R1_Main_data_20221207.tsv: Latest dataset used in our analysis. Other files with similar names but different dates correspond to previous versions of the data. See the scripts in `code/` to learn more about how they were generated and used.
  • Full_scimago_info.tsv: Merged SCImagoJR dataset with journal impact factors for 2005-2021, created by merging the scimagojr_*.tsv files.
  • NLM_journals.tsv: NLM catalog with journal names, ISSNs, and abbreviations.
  • Citations_per_year_20220611.tsv: Number of citations received by each article in each year since publication.
  • PG_NG_nonsharers.tsv: List of 353 randomly selected papers published in PLoS Genetics or Nature Genetics and classified by the GWAS Catalog as non-sharers, used for our initial sharing-status check. The results of this check are shown in Supplementary Table 2.
  • Non_sharers_all.tsv: List of articles classified as non-sharers in the GWAS Catalog, used as input for our extended sharing search.
  • data_found.txt: List of articles reclassified as sharers by our extended sharing search.
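The merge that produces Full_scimago_info.tsv can be pictured as stacking one table per year. The Python sketch below does this with the standard library only. It assumes hypothetical file names of the form scimagojr_<year>.tsv, tab-separated content with a header row, and identical columns in every yearly file, and it takes the year from the file name. The actual merge is done in R in code/01_Preparing_data.R and may differ.

```python
import csv
import glob
import os
import re

# Assumed naming pattern: scimagojr_<four-digit year>.tsv (hypothetical).
YEAR_PATTERN = re.compile(r"scimagojr_(\d{4})\.tsv$")


def merge_scimago(directory):
    """Stack yearly scimagojr_*.tsv files into one list of rows with a 'year' field."""
    merged = []
    for path in sorted(glob.glob(os.path.join(directory, "scimagojr_*.tsv"))):
        match = YEAR_PATTERN.search(os.path.basename(path))
        if match is None:
            continue  # skip files that do not follow the assumed naming pattern
        year = int(match.group(1))
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                row["year"] = year  # record which yearly file the row came from
                merged.append(row)
    return merged


def write_tsv(rows, path):
    """Write merged rows as TSV; assumes all rows share the first row's columns."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]), delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
```

Adding the year as an explicit column keeps per-year values (such as SJR) distinguishable after the tables are stacked.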

We also provide the datasets used to generate each figure as separate tab-separated files (i.e., Figure1.tsv, Figure2.tsv, Figure3a.tsv, and Figure3b.tsv).

Guillermo Reales
2023/01/09
 

Files

  • gwas-sharing-zenodo-v2.zip (111.7 MB, md5:f5cbd81bb536fc9a2927666a1798ff87)

Additional details

Funding

  • Wellcome Trust: Engineering genomic features to better understand the molecular basis of immune mediated diseases and their treatment (220788)
  • UK Research and Innovation: Statistical ‘omics approaches to understanding autoimmune diseases (MC_UU_00002/4)

References

  • Reales and Wallace (2022). Sharing GWAS summary statistics results in more citations: evidence from the GWAS catalog. bioRxiv. doi:10.1101/2022.09.27.509657