Methods Inf Med 1998; 37(04/05): 334-344
DOI: 10.1055/s-0038-1634566
Original Article
Schattauer GmbH

Evaluating Natural Language Processors in the Clinical Domain

C. Friedman
1   Department of Computer Science, Queens College CUNY, New York
2   Department of Medical Informatics, Columbia University, New York, USA
G. Hripcsak
2   Department of Medical Informatics, Columbia University, New York, USA

Publication History

Publication Date:
15 February 2018 (online)

Abstract

Evaluating natural language processing (NLP) systems in the clinical domain is a difficult task which is important for advancement of the field. A number of NLP systems have been reported that extract information from free-text clinical reports, but not many of the systems have been evaluated. Those that were evaluated noted good performance measures but the results were often weakened by ineffective evaluation methods. In this paper we describe a set of criteria aimed at improving the quality of NLP evaluation studies. We present an overview of NLP evaluations in the clinical domain and also discuss the Message Understanding Conferences (MUC) [1-4]. Although these conferences constitute a series of NLP evaluation studies performed outside of the clinical domain, some of the results are relevant within medicine. In addition, we discuss a number of factors which contribute to the complexity that is inherent in the task of evaluating natural language systems.

 
References

  • 1 Sundheim B. Proceedings of the Third Message Understanding Conference (MUC-3). San Mateo, CA: Morgan Kaufmann; 1991
  • 2 Sundheim B. Proceedings of the Fourth Message Understanding Conference (MUC-4). San Mateo, CA: Morgan Kaufmann; 1992
  • 3 Sundheim B. Proceedings of the Fifth Message Understanding Conference (MUC-5). San Mateo, CA: Morgan Kaufmann; 1994
  • 4 Grishman R, Sundheim B. Design of the MUC-6 Evaluation. In: Sundheim B. ed Proceedings of the Sixth Message Understanding Conference (MUC-6). San Mateo, CA: Morgan Kaufmann; 1996: 1-11.
  • 5 Sager N, Lyman M, Bucknall C, Nhan N, Tick L. Natural language processing and the representation of clinical data. JAMIA 1994; 1 (2) 142-60.
  • 6 Haug P, Ranum D, Frederick P. Computerized extraction of coded findings from free-text radiologic reports. Radiology 1990; 174: 543-8.
  • 7 Friedman C, Hripcsak G, DuMouchel W, Johnson S, Clayton P. Natural language processing in an operational clinical information system. J of Nat Lang Eng 1995; 1 (1) 83-108.
  • 8 Zingmond D, Lenert L. Monitoring free-text data using medical language processing. Computers and Biomedical Research 1993; 26: 467-81.
  • 9 Lenert L, Tovar M. Automated linkage of free-text descriptions of patients with a practice guideline. In: Ozbolt J. ed Proceedings of the Eighteenth Annual Symposium on Computer Applications in Medical Care. 1994: 274-8.
  • 10 Baud R, Rassinoux A, Scherrer J. Natural language processing and semantical representation of medical texts. Meth Inform Med 1992; 31 (2) 117-25.
  • 11 Rassinoux A, Wagner J, Lovis C, Baud R, Scherrer J. Analysis of medical texts based on a sound medical model. In: Gardner R. ed Proceedings of the Nineteenth Annual Symposium on Computer Applications in Medical Care. Philadelphia: Hanley & Belfus; 1995: 27-31.
  • 12 Zweigenbaum P, Bachimont B, Bouaud J, Charlet J, Boisvieux J. A multi-lingual architecture for building a normalized conceptual representation from medical language. In: Gardner R. ed Proceedings of the Nineteenth Annual Symposium on Computer Applications in Medical Care. Philadelphia: Hanley & Belfus; 1995: 357-61.
  • 13 Hripcsak G, Friedman C, Alderson P, DuMouchel W, Johnson S, Clayton P. Unlocking clinical data from narrative reports. Ann Intern Med 1995; 122 (9) 681-8.
  • 14 Jain N, Knirsch C, Friedman C, Hripcsak G. Identification of suspected tuberculosis patients based on natural language processing of chest radiograph reports. In: Cimino JJ. ed Proceedings of the 1996 AMIA Annual Fall Symposium. Philadelphia: Hanley & Belfus; 1996: 542-6.
  • 15 Knirsch C, Jain NL, Palos-Mendez A, Friedman C, Hripcsak G. Respiratory isolation of tuberculosis patients using clinical guidelines and an automated clinical decision support system. Infection Control and Hospital Epidemiology. (in press).
  • 16 Gundersen M, Haug P, Pryor T, van Bree R, Koehler S, Bauer K, Clemons B. Development and evaluation of a computerized admission diagnoses encoding system. Computers and Biomedical Research 1996; 29: 351-72.
  • 17 Lovis C, Gaspoz J, Baud R, Michel P, Scherrer J. A semi-automatic ICD encoder. In: Cimino JJ. ed Proceedings of the 1996 AMIA Annual Fall Symposium. Philadelphia: Hanley & Belfus; 1996: 937.
  • 18 Lovis C, Gaspoz J, Baud R, Michel P, Scherrer J. Natural language processing and clinical support to improve the quality of reimbursement claim databases. In: Cimino JJ. ed Proceedings of the 1996 AMIA Annual Fall Symposium. Philadelphia: Hanley & Belfus; 1996: 899.
  • 19 Moore G, Berman J. Automatic SNOMED coding. In: Ozbolt J. ed Proceedings of the Eighteenth Annual Symposium on Computer Applications in Medical Care. Philadelphia: Hanley & Belfus; 1994
  • 20 Evans D, Hersh W, Monarch I, Lefferts R, Handerson S. Automatic indexing of abstracts via natural language processing using a simple thesaurus. Med Decis Making 1991; 11: 108-15.
  • 21 Sneiderman C, Rindflesch T, Aronson A. Finding the findings: identification of findings in medical literature using restricted natural language processing. In: Cimino JJ. ed Proceedings of the 1996 AMIA Annual Fall Symposium. Philadelphia: Hanley & Belfus; 1996: 239-43.
  • 22 Rindflesch T, Aronson A. Semantic processing in information retrieval. Proceedings of the Seventeenth Annual Symposium on Computer Applications in Medical Care. Philadelphia: Hanley & Belfus; 1993: 611-5.
  • 23 Hersh W, Campbell E, Evans D, Brownlow N. Empirical, automated vocabulary discovery using large text corpora and advanced natural language processing tools. In: Cimino JJ. ed Proceedings of the 1996 AMIA Annual Fall Symposium. Philadelphia: Hanley & Belfus; 1996: 159-63.
  • 24 McCray A, Srinivasan S, Browne A. Lexical methods for managing variation in biomedical terminologies. In: Ozbolt J. ed Proceedings of the Eighteenth Annual Symposium on Computer Applications in Medical Care. Philadelphia: Hanley & Belfus; 1994: 235-9.
  • 25 DuMouchel W, Friedman C, Hripcsak G, et al. Two applications of statistical modelling to natural language processing. In: Fisher D, Lenz H. eds AI and Statistics. NY: Springer Verlag; 1996: 413-21.
  • 26 Spyns P. Natural language processing in medicine: an overview. Meth Inform Med 1996; 35: 285-301.
  • 27 Orthner H. Computers and Medicine. NY: Springer; 1996
  • 28 Rind D, Yeh J, Safran C. Using an electronic medical record to perform clinical research on mitral valve prolapse and panic/anxiety disorder. In: Gardner R. ed Proceedings of the Nineteenth Annual Symposium on Computer Applications in Medical Care. Philadelphia: Hanley & Belfus; 1995: 961.
  • 29 Wyatt J, Spiegelhalter D. Evaluating medical expert systems: what to test and how? Med Inf (Lond) 1990; 15: 205-17.
  • 30 Hayes-Roth F, Waterman D, Lenat D. Building expert systems. Reading, MA: Addison-Wesley; 1983
  • 31 Anderson J, Aydin C, Jay S. Evaluating health care information systems. Thousand Oaks, CA: Sage; 1994
  • 32 Rind D, Davis R, Safran C. Designing studies of computer-based alerts and reminders. MD Computing 1995; 12 (2) 122-6.
  • 33 Will C. Comparing human and machine performance for natural language information extraction. In: Sundheim B. ed Proceedings of the Fifth Message Understanding Conference (MUC-5). San Mateo, CA: Morgan Kaufmann; 1994: 53-68.
  • 34 Yerushalmy J. The statistical assessment of the variability in observer perception and description of roentgenographic pulmonary shadows. Radiologic Clinics of North America 1969; VII (3) 380-92.
  • 35 Friedman C, Alderson P, Austin J, Cimino J, Johnson S. A general natural language text processor for clinical radiology. JAMIA 1994; 1 (2) 161-74.
  • 36 Jollis J, Ancukiewicz M, DeLong E, Pryor D, Muhlbaier L, Mark D. Discordance of databases designed for claims payment versus clinical information systems. Ann Intern Med 1993; 119: 844-50.
  • 37 Sager N, Friedman C, Lyman M, et al. Medical language processing: computer management of narrative data. Reading, MA: Addison-Wesley; 1987
  • 38 Lyman M, Sager N, Tick L, Nhan N, Borst F, Scherrer J. The application of natural-language processing to healthcare quality assessment. Med Decis Making 1991; 11 (suppl) S65-8.
  • 39 Lin R, Lenert L, Middleton B, Shiffman S. A free-text processing system to capture physical findings: Canonical Phrase Identification System (CAPIS). In: Clayton P. ed Proceedings of the Fifteenth Annual Symposium on Computer Applications in Medical Care. NY: McGraw-Hill; 1992: 843-7.