ABSTRACT
Text mining and natural language processing can be extended to knowledge acquisition and management for biomedical applications. In this paper, we describe how we applied natural language processing and text mining techniques to transcribed verbal descriptions of biomedical disease features provided by retinal experts. The resulting feature-attribute pairs were then incorporated into the user interface of a collaborative ontology development tool. This tool, IDOCS, is being used in the biomedical domain to help retinal specialists reach consensus on a common ontology for describing age-related macular degeneration (AMD). We compare traditional text mining and natural language processing techniques with a retinal specialist's analysis and discuss how these techniques might be integrated into future biomedical ontology and user interface development.