A hybrid approach to automatic de-identification of psychiatric notes

https://doi.org/10.1016/j.jbi.2017.06.006
Under an Elsevier user license
open archive

Highlights

  • A hybrid approach for automatic de-identification of psychiatric notes is proposed.

  • The rule-based components exploit the structure of the psychiatric notes.

  • The machine learning components integrate additional data into the training set.

  • The system achieved the second-best performance in Track 1 of the CEGS N-GRID shared task.

Abstract

De-identification, the identification and removal of protected health information (PHI) from clinical data, is a critical step in making clinical data available for clinical applications and research. This paper presents a natural language processing system for automatic de-identification of psychiatric notes, which was designed to participate in Track 1 of the 2016 CEGS N-GRID shared task. The system has a hybrid structure that combines machine learning techniques and rule-based approaches. The rule-based components exploit the structure of the psychiatric notes as well as characteristic surface patterns of PHI mentions. The machine learning components use supervised learning with rich features. In addition, system performance was boosted by integrating additional data into the training set through domain adaptation. The hybrid system achieved an overall micro-averaged F-score of 90.74 on the test set, the second-best result among all participants in the CEGS N-GRID task.
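
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a micro-averaged F-score is commonly computed: true positives, false positives, and false negatives are pooled across all PHI categories before precision and recall are taken. The function name `micro_f1`, the category labels, and the counts are hypothetical illustrations, not figures from the paper, and the shared task's official scorer may differ in details such as span-matching criteria.

```python
from typing import Dict, Tuple


def micro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Micro-averaged F-score from per-category (tp, fp, fn) counts.

    Counts are summed over all categories first, so frequent categories
    weigh more heavily than rare ones.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    if tp == 0:
        # No correct predictions (or no data): define the score as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, invented for illustration only.
example_counts = {
    "NAME": (90, 10, 5),
    "DATE": (180, 5, 15),
    "LOCATION": (30, 10, 20),
}
score = micro_f1(example_counts)
print(f"micro-averaged F-score: {100 * score:.2f}")
```

Because the counts are pooled, the result equals 2·TP / (2·TP + FP + FN) over the whole test set, which is why a single overall figure can be reported across heterogeneous PHI categories.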

Keywords

De-identification
Psychiatric notes
Natural language processing
