ABSTRACT
This work proposes a case-based classifier to tackle the gene/protein mention problem in biomedical literature. The so-called gene mention problem consists of recognizing gene and protein entities in scientific texts. For each word in the text, a classification process decides whether the term is a gene mention. The decision is based on selecting the best, or most similar, case from a base of known and unknown cases. The approach was evaluated on several datasets covering different organisms, and the results show its suitability for the gene mention problem.
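The per-word classification described above can be illustrated with a minimal sketch. It is not the authors' implementation: the features (lowercased word, previous word, digit and uppercase flags), the overlap-count similarity, and the names `Case`, `token_features`, `build_case_base`, and `tag` are all assumptions made for illustration, and the distinction between known and unknown cases is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """One stored case: a feature tuple plus its gene/non-gene label."""
    features: tuple
    is_gene: bool


def token_features(tokens, i):
    """Hypothetical features for the token at position i (an assumption)."""
    word = tokens[i]
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return (
        word.lower(),                    # the word itself
        prev,                            # the preceding word
        any(c.isdigit() for c in word),  # contains a digit
        any(c.isupper() for c in word),  # contains an uppercase letter
    )


def similarity(a, b):
    """Count the feature positions on which two cases agree."""
    return sum(x == y for x, y in zip(a, b))


def build_case_base(tokens, labels):
    """Turn a labelled token sequence into a list of cases."""
    return [Case(token_features(tokens, i), labels[i]) for i in range(len(tokens))]


def tag(tokens, case_base):
    """Label each token with the label of its most similar stored case."""
    result = []
    for i in range(len(tokens)):
        feats = token_features(tokens, i)
        best = max(case_base, key=lambda c: similarity(feats, c.features))
        result.append(best.is_gene)
    return result


# Toy data, invented for this example.
case_base = build_case_base(
    ["The", "BRCA1", "gene", "is", "mutated"],
    [False, True, False, False, False],
)
predicted = tag(["The", "TP53", "gene"], case_base)
```

Here `predicted` holds one boolean per token. On ties, `max` keeps the first case encountered, so the order of the case base matters in this sketch; a real system would need a principled tie-breaking rule and a richer feature set.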
Index Terms
- CBR-Tagger: a case-based reasoning approach to the gene/protein mention problem