Published March 4, 2023 | Version 1.0
Dataset Open

First Steps towards a Risk of Bias Corpus of Randomized Controlled Trials

  • 1. University of Geneva
  • 2. IUFRS, University of Lausanne, Lausanne, Switzerland
  • 3. School of Health Sciences, HES-SO Valais-Wallis, Leukerbad, Switzerland
  • 4. SIB Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland

Description

Abstract

Risk of bias (RoB) assessment of randomized clinical trials (RCTs) is vital to conducting systematic reviews. Manual RoB assessment for hundreds of RCTs is a cognitively demanding, lengthy process and is prone to subjective judgment. Supervised machine learning (ML) can help to accelerate this process but requires a hand-labelled corpus. There are currently no RoB annotation guidelines for randomized clinical trials or annotated corpora. In this pilot project, we test the practicality of directly using the revised Cochrane RoB 2.0 guidelines for developing an RoB annotated corpus using a novel multi-level annotation scheme. We report inter-annotator agreement among four annotators who used Cochrane RoB 2.0 guidelines. The agreement ranges between 0% for some bias classes and 76% for others. Finally, we discuss the shortcomings of this direct translation of annotation guidelines and scheme and suggest approaches to improve them to obtain an RoB annotated corpus suitable for ML.

 

Methods

The upload contains two zip files and a .json file.

  • plain.html.zip

Original corpus (n = 10) in .html format. The corpus was generated using the methodology described in the paper. Each .html file can be opened in any text editor or web browser on any operating system. Each file contains the full text of a document, divided into several annotatable text parts.
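As a rough illustration of reading these files programmatically, the sketch below collects the text of each part using only the Python standard library. It assumes, without confirmation from the dataset, that every annotatable text part is an element carrying an `id` attribute; the class name `PartExtractor`, the function `extract_parts`, and the sample ids are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PartExtractor(HTMLParser):
    """Collect text per element that has an `id` attribute (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.parts = {}    # id -> accumulated text
        self._stack = []   # one entry per open element: its id, or None

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._stack.append(dict(attrs).get("id"))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute the text to the innermost enclosing element with an id.
        for part_id in reversed(self._stack):
            if part_id is not None:
                self.parts[part_id] = self.parts.get(part_id, "") + data
                break


def extract_parts(html_text):
    """Return a dict mapping each part id to its text."""
    parser = PartExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.parts


sample = ('<html><head><meta charset="utf-8"></head><body>'
          '<p id="s1p1">Hello <b>world</b></p>'
          '<p id="s1p2">Second part</p></body></html>')
parts = extract_parts(sample)
```

If the files mark their parts differently, only the lookup in `handle_starttag` needs to change.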

 

  • ann.json.zip

The .zip contains the RoB annotations made by the authors (R.H., M.S., K.G., R.C.). The annotation files are in .json format. Each .json file contains two JSON objects and three JSON arrays:

  1. annotatable (object): parts of the full-text document, corresponding to the text parts in the plain .html files.
  2. metas (object): the label of the full-text document.
  3. entities (array): the labelled entities. Each entity records the part of the full text it is linked to.
  4. relations (array)
  5. sources (array)
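The layout listed above can be checked with a short script before any further processing. This is a minimal sketch, assuming that the five names are the literal top-level keys of each annotation file; the helper names `check_annotation` and `load_annotation` are hypothetical, and nothing is assumed about the fields inside each entity.

```python
import json

# Expected top-level layout, taken from the list above.
EXPECTED_TYPES = {
    "annotatable": dict,
    "metas": dict,
    "entities": list,
    "relations": list,
    "sources": list,
}


def check_annotation(doc):
    """Return a list of layout problems; an empty list means the layout matches."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in doc:
            problems.append(f"missing key: {key}")
        elif not isinstance(doc[key], expected):
            found = type(doc[key]).__name__
            problems.append(f"{key}: expected {expected.__name__}, got {found}")
    return problems


def load_annotation(path):
    """Read one annotation file from disk (path supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Small in-memory example standing in for a real file.
example = json.loads(
    '{"annotatable": {}, "metas": {}, "entities": [], '
    '"relations": [], "sources": []}'
)
problems = check_annotation(example)
```

For a real file, `check_annotation(load_annotation(path))` should return an empty list if the assumed layout holds.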

 

  • annotations-legend.json

This .json file is the legend that relates the codes used for entities and entity labels to their text form. For example, the entity class label "1_2_Yes_Good" is encoded as "e_113".
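A lookup in both directions could be sketched as follows. The sketch assumes that the legend is a flat JSON object whose keys are the codes (such as "e_113") and whose values are the text labels; the actual file may be organised differently, and the function names `decode` and `encode` are hypothetical.

```python
import json


def decode(legend, code):
    """Return the text label for a code, or the code itself if it is unknown."""
    return legend.get(code, code)


def encode(legend, label):
    """Return the code for a text label, or None if the label is not in the legend."""
    reverse = {text: code for code, text in legend.items()}
    return reverse.get(label)


# One-entry legend built from the example in the description above.
legend = json.loads('{"e_113": "1_2_Yes_Good"}')
label = decode(legend, "e_113")
code = encode(legend, "1_2_Yes_Good")
```

Returning the unknown code unchanged in `decode` keeps unexpected codes visible in downstream output rather than silently dropping them.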

 

Resources

The code to parse annotations can be found on GitHub.

 

Funding

HES-SO Valais-Wallis, Sierre, Switzerland

Files

Files (318.3 kB)

Checksum                               Size
md5:fee748084d8a92f1b8d5e513c9e4846f   130.2 kB
md5:61d97cbbaa750d6c4e887ec64d6ce54a   3.6 kB
md5:b731bf02cd33eccd92b848cb6f274eca   184.4 kB

Additional details

References

  • Dhrangadhariya A et al. First Steps towards a Risk of Bias Corpus of Randomized Controlled Trials. In Proceedings Medical Informatics Europe 2023. Gothenburg, Sweden.