First Steps towards a Risk of Bias Corpus of Randomized Controlled Trials
Creators
- 1. University of Geneva
- 2. IUFRS, University of Lausanne, Lausanne, Switzerland
- 3. School of Health Sciences, HES-SO Valais-Wallis, Leukerbad, Switzerland
- 4. SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
Description
Abstract
Risk of bias (RoB) assessment of randomized clinical trials (RCTs) is vital to conducting systematic reviews. Manual RoB assessment for hundreds of RCTs is a cognitively demanding, lengthy process and is prone to subjective judgment. Supervised machine learning (ML) can help to accelerate this process but requires a hand-labelled corpus. There are currently no RoB annotation guidelines or annotated corpora for randomized clinical trials. In this pilot project, we test the practicality of directly using the revised Cochrane RoB 2.0 guidelines to develop an RoB annotated corpus with a novel multi-level annotation scheme. We report inter-annotator agreement among four annotators who used the Cochrane RoB 2.0 guidelines. The agreement ranges from 0% for some bias classes to 76% for others. Finally, we discuss the shortcomings of this direct translation of annotation guidelines and scheme, and suggest approaches to improve them to obtain an RoB annotated corpus suitable for ML.
Methods
The upload contains two zip files and a .json file.
- plain.html.zip
Original corpus (n = 10) in .html format. The corpus was generated using the methodology described in the paper. Each .html file can be opened in any web browser or default text editor on any operating system. Each .html file contains the full text of one document, divided into several annotatable text parts.
- ann.json.zip
The .zip contains the RoB annotations made by the authors (R.H., M.S., K.G., R.C.). The annotation files are in .json format. Each .json file contains two JSON objects and three JSON arrays:
- annotatable (object): parts of the full-text document, corresponding to the text parts in the plain .html files.
- metas (object): full-text document label
- entities (array): labelled entities. Each entity records the part of the full text it belongs to.
- relations (array)
- sources (array)
- annotations-legend.json
This .json file contains the legend that links the encoded entity identifiers to their text labels. For example, the entity class label "1_2_Yes_Good" is encoded as "e_113".
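The file layout described above can be checked with a few lines of Python. The following is only a minimal sketch, not the authors' parsing code: it relies on the five top-level keys listed above, and it assumes (this is not stated in the record) that each entity stores its encoded class under a "classId" field and that the legend maps codes such as "e_113" to labels such as "1_2_Yes_Good". The example document is hand-made for illustration and is not taken from the corpus.

```python
import json
from collections import Counter

# Top-level keys named in the description of the annotation files.
EXPECTED_KEYS = {"annotatable", "metas", "entities", "relations", "sources"}


def check_structure(doc):
    """Return the expected top-level keys that are missing from a document."""
    return sorted(EXPECTED_KEYS - set(doc))


def count_labels(doc, legend):
    """Count entities per readable label.

    Assumptions (hypothetical, not confirmed by the record): each entity
    holds its encoded class under "classId", and the legend maps codes to
    labels. Codes that are absent from the legend are kept unchanged.
    """
    counts = Counter()
    for entity in doc["entities"]:
        code = entity.get("classId", "<missing>")
        counts[legend.get(code, code)] += 1
    return counts


# Hand-made stand-in for one annotation file (illustrative values only).
example_doc = {
    "annotatable": {"parts": ["s1p1", "s2p1", "s2p3"]},
    "metas": {},
    "entities": [
        {"classId": "e_113", "part": "s1p1"},
        {"classId": "e_113", "part": "s2p1"},
        {"classId": "e_999", "part": "s2p3"},
    ],
    "relations": [],
    "sources": [],
}
example_legend = {"e_113": "1_2_Yes_Good"}

print(check_structure(example_doc))
print(count_labels(example_doc, example_legend))
```

For a real file, the document would be read with json.load from an extracted .json file in ann.json.zip, and the legend from annotations-legend.json. If the field names differ from the assumptions above, only count_labels needs to be adapted.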
Resources
The code to parse annotations can be found on GitHub.
Funding
HES-SO Valais-Wallis, Sierre, Switzerland
Files
ann.json.zip
Additional details
References
- Dhrangadhariya A et al. First Steps towards a Risk of Bias Corpus of Randomized Controlled Trials. In: Proceedings of Medical Informatics Europe 2023. Gothenburg, Sweden.