ABSTRACT
The scientific literature is a rich source of biological knowledge. However, because research articles are unstructured text, that knowledge is difficult to access with computer-aided systems. Text mining is one solution: it transforms unstructured information in text into database content, and most approaches rely on machine learning models. Because these approaches require high-dimensional features, model performance depends heavily on feature selection. Choosing good features, however, is usually difficult and labor-intensive, because feature extraction requires the prior knowledge and ingenuity of human experts. Here, we propose a novel framework for extracting biological relations from text using hierarchical text features that improve the effectiveness of the relation extraction model.
The proposed framework consists of two parts, node detection and edge detection, both built on deep belief networks. Each part is based on hierarchical text features learned by a Gaussian-Bernoulli restricted Boltzmann machine (GBRBM). In this work, we performed a gene-cancer relation extraction task as a pilot study. The classification model was trained on both the GE09 corpus from the BioNLP'09 Shared Task and the CoMAGC corpus. The results show that our model outperformed other approaches based on handcrafted features. The evaluation results suggest that deep belief networks offer optimized and generalized hierarchical text features for large-scale text mining.
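As a rough illustration of how a Gaussian-Bernoulli RBM can learn features from real-valued inputs, the sketch below trains a toy GBRBM with one step of contrastive divergence (CD-1). It is not the authors' implementation: the unit-variance visible units, the CD-1 update, the learning rate, the class and method names, and the toy input vectors are all assumptions made for this example. Stacking such layers into a deep belief network, and the node and edge classifiers themselves, are not shown.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class GaussianBernoulliRBM:
    """Toy GBRBM: Gaussian visible units (unit variance assumed), binary hidden units."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W_ij)
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_mean(self, h):
        # E[v_i | h] = b_i + sum_j W_ij * h_j
        return [self.b[i] + sum(self.W[i][j] * h[j] for j in range(len(h)))
                for i in range(len(self.b))]

    def cd1_step(self, v0, lr=0.01):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        # Negative phase: one Gibbs step, using the mean reconstruction.
        v1 = self.visible_mean(h0_sample)
        h1 = self.hidden_probs(v1)
        # Update parameters with the difference of the two phases.
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.c[j] += lr * (h0[j] - h1[j])
        # Squared reconstruction error for this example.
        return sum((a - r) ** 2 for a, r in zip(v0, v1))


# Made-up real-valued vectors standing in for text feature vectors.
data = [[1.0, 0.9, -1.0, -0.8], [-1.0, -1.1, 0.9, 1.0], [0.8, 1.0, -0.9, -1.0]]
rbm = GaussianBernoulliRBM(n_visible=4, n_hidden=3)
for epoch in range(50):
    for v in data:
        rbm.cd1_step(v)

# Hidden-unit probabilities serve as the learned feature representation.
features = rbm.hidden_probs(data[0])
```

In a deep belief network, the hidden probabilities of one trained layer would typically become the input to the next layer, which is one way the "hierarchical" features described above could arise.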
Index Terms
- Building Text-mining Framework for Gene-Phenotype Relation Extraction using Deep Leaning