Abstract
As biomedical science progresses, ontologies play an increasingly important role in making biomedical information easier to understand. Although much research, such as Gene Ontology annotation, uses ontologies for this purpose, most of it does not focus on capturing gene-related terms and their relationships within biomedical document collections. Understanding the key gene-related terms and their semantic relationships is essential for comprehending the conceptual structure of a biomedical document collection and for sparing users information overload. To address this issue, we propose a novel approach called ‘GOClonto’ that automatically generates ontologies for the conceptualization of biomedical document collections. Based on GO (Gene Ontology), GOClonto extracts gene-related terms from biomedical text, applies latent semantic analysis to identify the key gene-related terms, allocates documents to those key terms, and uses GO to automatically generate a corpus-related gene ontology. The experimental results show that GOClonto is able to identify key gene-related terms. On a test biomedical document collection, GOClonto outperforms other clustering algorithms in terms of F-measure. Moreover, the ontology generated by GOClonto exhibits a notably informative conceptual structure.
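The abstract compares clusterings by F-measure but does not say which variant is used. As a rough illustration only, the sketch below assumes the class-weighted clustering F-measure that is common in document clustering work: for each gold class, take the best F1 score over all clusters, then average those scores weighted by class size. The function name `clustering_f_measure` and the toy labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-weighted F-measure of a flat clustering against gold labels.

    Assumed definition (not confirmed by the abstract): for class i and
    cluster j, precision = n_ij / n_j and recall = n_ij / n_i; each class
    takes its best F1 over all clusters, weighted by class size.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("inputs must have the same length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("need at least one document")

    class_sizes = Counter(true_labels)                 # n_i
    cluster_sizes = Counter(cluster_ids)               # n_j
    overlap = Counter(zip(true_labels, cluster_ids))   # n_ij

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap[(cls, clu)]
            if n_ij == 0:
                continue  # no shared documents, so F1 is 0
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total


# Hypothetical toy data: six documents, two gold topics, two clusters.
gold = ["apoptosis", "apoptosis", "apoptosis",
        "transport", "transport", "transport"]
pred = [0, 0, 1, 1, 1, 1]
score = clustering_f_measure(gold, pred)
print(round(score, 3))  # 0.829
```

A perfect clustering scores 1.0 under this definition, and the score drops as clusters mix documents from different gold classes or split one class across clusters.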
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zheng, HT., Borchert, C., Kim, HG. (2008). Exploiting Gene Ontology to Conceptualize Biomedical Document Collections. In: Domingue, J., Anutariya, C. (eds) The Semantic Web. ASWC 2008. Lecture Notes in Computer Science, vol 5367. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89704-0_26
DOI: https://doi.org/10.1007/978-3-540-89704-0_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89703-3
Online ISBN: 978-3-540-89704-0
eBook Packages: Computer Science (R0)