Published December 31, 2016 | Version v1

Musings on the Open Science Prize

  • OHSU

Description

As I was thinking about casting my vote for the Open Science Prize, I realized that I would in fact need a rubric for choosing. I was concerned that the public vote would tend towards popularity, familiarity, or bling, rather than the quality of the open science. But what does it mean to be “quality open science?” What should be the most important criteria? The semi-finalist projects are all very different: they address different topics, are at varying degrees of maturity (some pre-date the competition and some do not), and target different audiences. If successful, each will have different societal impacts. I applaud them all.

Recently, we evaluated over eighty manuscripts from PLOS to determine which ones were most significant, impactful, or otherwise representative, in order to form the core of the new PLOS Open Data Collection. In this context, we created a rubric for evaluation and then scored each manuscript objectively. For each manuscript: What was the impact on policy change? Were ethical issues considered? Did the science advance our abilities to share data or use shared data? Did the project utilize shared data (the noble discipline of “data parasitism”)? Was the community involved? How sexy were the figures? How much did the work foster cross-pollination of ideas and approaches across disciplines? And of course, what did people think about the work? I needed a similar rubric here, but for knowledgebases rather than manuscripts.

Knowledge is our collective insight, captured by experts and able to provide an explanatory framework for evaluating new observations. A knowledgebase makes that knowledge findable and computable. A recent NIH RFI, “Metrics to Assess Value of Biomedical Digital Repositories,” highlighted the ineffectiveness of current knowledgebase evaluation: traditional citations and impact factors are inadequate measures of success or value. For example, almost everyone in biomedicine relies on PubMed, but almost no one ever cites or mentions it in their publications. In response to this RFI, our group (consisting of the NCATS Data Translator and the Monarch Initiative) developed a rubric arranged according to the commonly cited FAIR principles (Findable, Accessible, Interoperable, and Reusable), extended with three additional principles: Traceable, Licensed, and Connected. These latter three extensions are, in my opinion, fundamental to “quality open science”: without them, you have no computability, no legal ability to reuse the data/knowledge, and no ability to navigate the fabric of the data landscape.
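To make this concrete, below is a minimal, purely illustrative sketch (in Python) of how a knowledgebase might be scored against the seven FAIR+ principles. The 0-5 scale, the equal weighting, and the example scores are my own assumptions for illustration; they are not the rubric from our RFI response.

# Illustrative sketch only: scores each FAIR+ principle on an assumed 0-5 scale
# and averages them with equal weight. Not the actual rubric from the RFI response.

FAIR_PLUS = [
    "Findable", "Accessible", "Interoperable", "Reusable",
    "Traceable", "Licensed", "Connected",
]

def score_project(scores):
    """Average the per-principle scores; a missing principle counts as 0."""
    return sum(scores.get(p, 0) for p in FAIR_PLUS) / len(FAIR_PLUS)

# Hypothetical project scored on each principle (0 = absent, 5 = exemplary).
example = {
    "Findable": 4, "Accessible": 5, "Interoperable": 3, "Reusable": 4,
    "Traceable": 2, "Licensed": 5, "Connected": 3,
}
print(round(score_project(example), 2))  # 3.71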

Therefore, for the evaluation of the open science projects, I applied the rubric we described in our response to the RFI, with additional considerations throughout drawn from the PLOS Open Data Collection curation, and I tried to take into account advances made since each Open Science Prize project began (some projects were preexisting and backed by other funds/projects, whereas others were brand new). I note that this is as much an evaluation of the rubric as it is of the projects themselves.

I purposefully did not watch any of the videos explaining the projects on the Open Science Prize website before performing the evaluation. I wanted to determine how well the projects themselves communicated their goals, content, and functionality. As a potential user of the data, I aimed to evaluate directly how easy it was to navigate, access, and reuse the data. Most importantly, I wanted to avoid a bias in which the real distinctions between projects are obscured by video production quality rather than revealed by each project’s genuine merits and differences. It would be all too easy to create a great video about a great idea, and then not implement a quality platform based on strong open science principles, such as open code and data access, or the FAIR+ principles: Findable, Accessible, Interoperable, Reusable, Traceable, Licensed, and Connected.

One might ask, why bother? The first reason is that I wanted to determine how well the preliminary rubric we laid out in our response to the RFI might work in the real world, as we plan to write a more thorough proposal for knowledgebase/data repository evaluation in the future. The second reason is that I simply wanted the evaluation of these projects to inform the future development of the open science projects I work on most, such as the Monarch Initiative (genotype-phenotype data aggregation across species for diagnostics and mechanism discovery), Phenopackets (a new standard for exchanging computable phenotype data for any species in any context), and OpenRIF (computable representation of scholarly outputs and contribution roles to better credit non-traditional scientists). How can we all do better and learn from the Open Science Prize competition? In other words, such a competition shouldn’t just be about the six finalists; rather, it should inform how we all go about practicing open science in general.

So now you are probably wondering: which project(s) did I vote for? Well, that is for you to infer. As you review the musings below, consider your own values for what constitutes robust open science. The full text of my review is available at https://www.force11.org/blog/musings-about-open-science-prize. Comments and corrections are entirely welcome on the FORCE11 page, or tweet them to @ontowonka.

Notes

More readable full text available at https://www.force11.org/blog/musings-about-open-science-prize

Files (133.4 kB)

Musings about the Open Science Prize _ FORCE11.pdf