Cohort selection for clinical trials using multiple instance learning

https://doi.org/10.1016/j.jbi.2020.103438Get rights and content
Under an Elsevier user license
open archive

Highlights

  • Cohort selection requires the information across all of the patient’s health records.

  • MILs under the bag- and the embedded-space paradigms are generally better.

  • MILs generally perform better than their single instance counterpart approaches.

  • MIL approach is more effective in criteria requiring overall comprehension.

  • Other approaches perform well for criteria with unambiguous definitions.

Abstract

Identifying patients eligible for clinical trials using electronic health records (EHRs) is a challenging task usually requiring a comprehensive analysis of information stored in multiple EHRs of a patient. The goal of this study is to investigate different methods and their effectiveness in identifying patients that meet specific eligibility selection criteria based on patients’ longitudinal records. An unstructured dataset released by the n2c2 cohort selection for clinical trials track was used, each of which included 2–5 records manually annotated to thirteen pre-defined selection criteria. Unlike the other studies, we formulated the problem as a multiple instance learning (MIL) task and compared the performance with that of the rule-based and the single instance-based classifiers. Our official best run achieved an average micro-F score of 0.8765 which was ranked as one of the top ten results in the track. Further experiments demonstrated that the performance of the MIL-based classifiers consistently yield better performance than their single-instance counterparts in the criteria that require the overall comprehension of the information distributed among all of the patient’s EHRs. Rule-based and single instance learning approaches exhibited better performance in criteria that don’t require a consideration of several factors across records. This study demonstrated that cohort selection using longitudinal patient records can be formulated as a MIL problem. Our results exhibit that the MIL-based classifiers supplement the rule-based methods and provide better results in comparison to the single instance learning approaches.

Keywords

Cohort selection
Multiple instance learning
Clinical natural language processing
Electronic health records

Cited by (0)