Conditional Random Fields for Biomedical Named Entity Recognition Revisited

Named Entity Recognition (NER) is a key task that automatically extracts Named Entities (NEs) from text. Names of persons, places, dates and times are examples of NEs. We apply Conditional Random Fields (CRFs) to NER in the biomedical domain, where examples of NEs include genes and proteins. We used a minimal set of features to train the CRF algorithm and obtained good results on biomedical texts.


Introduction
Although Biomedical NER (BioNER) has received attention from many researchers, the task remains very challenging due to ambiguity, synonymy, and nested and multi-word BioNEs [1]. To tackle these challenges, researchers have put considerable effort into developing efficient BioNER systems.
Approaches to BioNER include the dictionary-based approach, the rule-based approach, the Machine Learning (ML) approach and the hybrid approach. The ML approach uses algorithms such as Support Vector Machines (SVM) [2], Conditional Random Fields (CRF) [3], Maximum Entropy (ME) [4] and Hidden Markov Models (HMM) [5] to build a model that detects and classifies BioNEs. This approach depends on large training data and a feature engineering process, which comprises extracting and selecting the features used to learn the parameters of the ML algorithm. The hybrid approach combines two or more different approaches to BioNER [6].
Classical ML classification algorithms have been applied to different NLP tasks such as Author Profiling [7,8], including multilingual settings [9,10,11]. Applying such algorithms requires hand-crafted feature engineering, and the performance of the model depends largely on the experience of the feature designer.

Related Work
Nayel and Shashirekha [6] proposed an ensemble-based approach for BioNER using CRF and SVM classifiers with different segment representation schemes. A multi-task learning approach has been applied to BioNER using a neural network architecture [12]; the best f-score it achieved on the NCBI dataset is 80.73%. Leaman and Lu [13] used semi-Markov models to build TaggerOne, a tool for BioNER, which reported a highly competitive f-score of 82.9% on the NCBI dataset. Nayel and Shashirekha [14] integrated dictionary-based features with a deep model for disease NEs. Sunil and Ashish [15] designed a Recurrent Neural Network model for BioNER, utilizing a Convolutional Neural Network (CNN) to represent character embeddings and a bidirectional LSTM for word embeddings; evaluated on the NCBI dataset, their system achieved an f-score of 79.13%. Nayel et al. [16] proposed a deep learning based approach to improve multi-word BioNE extraction. Wei et al. [17] developed an ensemble-based system for Disease-NER: at the base level, they built a CRF with a rule-based post-processing system and a Bi-RNN based system; at the top level, they used an SVM classifier to combine the results of the base systems.
In this paper, we design a CRF-based model for BioNER, using a minimal set of features to train the classification algorithm. The rest of the paper is organized as follows: Section 2 reviews related work; Section 3 describes the methodology; Section 4 presents the experiments; and Section 5 concludes the paper.

Conditional Random Fields
CRFs [18] are undirected graphical models, a special case of which corresponds to conditionally trained probabilistic finite-state automata. Being conditionally trained, CRFs can easily incorporate a large number of arbitrary, non-independent features while retaining efficient procedures for non-greedy finite-state inference and training. A CRF computes the conditional probability of values on designated output nodes given values on designated input nodes. When implementing a CRF for the NER task, the sequence of words in a sentence is taken as the observation sequence and the corresponding tag sequence as the state sequence. The context window plays an important role in building the classification model, because the CRF treats neighbouring observations (words) as dependent. This property makes the CRF a suitable classification model for NER.
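The conditional probability a linear-chain CRF assigns to a tag sequence can be illustrated on a toy example. The sketch below is not the paper's trained model: the tags, vocabulary and weights are invented for illustration, and real systems compute the partition function with the forward algorithm rather than by enumeration.

```python
import math
from itertools import product

# Toy linear-chain CRF: P(y|x) = exp(score(x, y)) / Z(x), where the score
# sums emission (word-tag) and transition (tag-tag) feature weights.
# All weights below are illustrative assumptions, not learned values.
TAGS = ["O", "B-DISEASE"]
emission = {("cancer", "B-DISEASE"): 2.0, ("cancer", "O"): 0.1,
            ("the", "O"): 1.5, ("the", "B-DISEASE"): 0.0}
transition = {("O", "O"): 0.5, ("O", "B-DISEASE"): 0.2,
              ("B-DISEASE", "O"): 0.3, ("B-DISEASE", "B-DISEASE"): -0.5}

def score(words, tags):
    """Unnormalized score: sum of emission and transition weights."""
    s = sum(emission.get((w, t), 0.0) for w, t in zip(words, tags))
    s += sum(transition.get((a, b), 0.0) for a, b in zip(tags, tags[1:]))
    return s

def probability(words, tags):
    """P(tags | words): normalize by summing over every tag sequence.
    (Exhaustive enumeration; feasible only for this tiny example.)"""
    z = sum(math.exp(score(words, y))
            for y in product(TAGS, repeat=len(words)))
    return math.exp(score(words, tags)) / z

words = ["the", "cancer"]
print(probability(words, ("O", "B-DISEASE")))
```

Because the normalizer Z(x) sums over all tag sequences, the probabilities of the four possible taggings of this two-word sentence sum to one, and the sequence with the highest score receives the highest probability.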

Performance Evaluation
The performance of our system is reported in terms of f-score [6]. The f-score is the harmonic mean of Precision (P) and Recall (R) and is calculated as follows:

F1 = (2 × P × R) / (P + R)
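As a concrete illustration, the metric can be computed from entity-level true positives, false positives and false negatives; the counts below are made-up numbers, not results from the paper:

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts."""
    # Precision: fraction of predicted entities that are correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: fraction of gold entities that were found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 80 correct entities, 20 spurious, 25 missed.
print(round(f1_score(80, 20, 25), 4))
```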

Training
To train the model, the following set of features has been extracted:
• Word length
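A per-token feature extractor for such a minimal set might look like the sketch below. Only the word-length feature comes from the list above; the lowercased surface form and the neighbouring-word features are illustrative assumptions, motivated by the context window discussed in the CRF section:

```python
def word2features(sentence, i):
    """Build a feature dict for the i-th token of a tokenized sentence.
    word.length is the feature named in the paper; the rest are assumed."""
    word = sentence[i]
    features = {
        "word.lower": word.lower(),
        "word.length": len(word),  # the minimal feature listed above
    }
    # Context window (assumption): previous and next tokens, since the
    # CRF models dependencies between neighbouring observations.
    features["prev.word"] = sentence[i - 1].lower() if i > 0 else "<BOS>"
    features["next.word"] = (sentence[i + 1].lower()
                             if i < len(sentence) - 1 else "<EOS>")
    return features

sent = ["BRCA1", "mutations", "cause", "cancer"]
print(word2features(sent, 0))
```

Each sentence then becomes a list of such dicts (one per token), the usual input format for CRF toolkits.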

Results and Discussion
The detailed results in terms of Precision (P), Recall (R) and f1-measure for all datasets are given in Table 1. These results illustrate that CRF classifiers perform well with minimal features. The main point is that the resources required for CRF training are minimized in terms of time, space and computation. The results still leave room for improvement, for example by applying the CRF with embeddings as the input space.

Conclusion
While deep learning models perform better in many cases, classical ML approaches remain attractive for many tasks, especially those where deep models demand greater resources such as pretrained word representations, large memory and long training times. In this paper, we used a classical ML approach with a minimal feature set.