DOI: 10.1145/1244002.1244031
Article

On-the-fly data integration models for biological databases

Published: 11 March 2007

ABSTRACT

The web is a universal repository of information and offers an excellent opportunity to integrate online biological resources for knowledge discovery. A major challenge is supporting the effective flow of information among web sources and services, and connecting them with legacy systems designed to operate on traditional relational databases. One possible strategy is to combine information from disparate data sources and present it to the user in a single integrated framework, without having to populate local databases. This is called online, or on-the-fly, data integration. BioXBase is a user-centric biological query system that retrieves the information a user requests from multiple biological sources over the internet and, after the data is cleaned, processed and integrated, organizes it into a homogeneous, unified view. BioXBase improved the results retrieved by approximately 30% compared with a system that has only a local database. Combining the results of BioMap (a local database) and BioXBase (the on-the-fly system) improved them by a further 20%, making the results more significant in the biological domain. The results were validated using statistical measures such as precision and recall, and by power-law degree distribution analysis.
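The idea in the abstract can be illustrated with a minimal sketch: records from a local store and from a remote source are merged into one unified view at query time, and the retrieved set is then scored with precision and recall. This is not the BioXBase implementation. The function names (`integrate`, `precision_recall`), the record layout, the merge rule, and all sample data are assumptions made up for illustration, and the numbers do not reproduce the paper's results.

```python
from typing import Dict, Iterable, Set, Tuple

Record = Dict[str, str]


def integrate(sources: Iterable[Iterable[Record]], key: str = "id") -> Dict[str, Record]:
    """Merge records from several sources into one view, keyed by a shared identifier.

    Hypothetical merge rule: the first source to supply a field wins.
    """
    unified: Dict[str, Record] = {}
    for source in sources:
        for record in source:
            merged = unified.setdefault(record[key], {})
            for field, value in record.items():
                merged.setdefault(field, value)
    return unified


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up sample data standing in for a local database and a remote source.
local_db = [{"id": "P1", "name": "protein A"}, {"id": "P2", "name": "protein B"}]
remote = [{"id": "P2", "pathway": "path X"}, {"id": "P3", "name": "protein C"}]

# Made-up ground truth: the identifiers a domain expert would consider relevant.
relevant = {"P1", "P2", "P3", "P4"}

local_only = integrate([local_db])
unified = integrate([local_db, remote])

print(precision_recall(set(local_only), relevant))
print(precision_recall(set(unified), relevant))
```

In this toy setup the remote source both adds a record the local store lacks and enriches an existing record with an extra field, which is the kind of gain the abstract attributes to combining local and on-the-fly results.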


Published in

SAC '07: Proceedings of the 2007 ACM symposium on Applied computing
March 2007
1688 pages
ISBN: 1595934804
DOI: 10.1145/1244002

Copyright © 2007 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
