eVOC: A Controlled Vocabulary for Unifying Gene Expression Data

  1. Janet Kelso (1),
  2. Johann Visagie (2),
  3. Gregory Theiler (3),
  4. Alan Christoffels (1, 6),
  5. Soraya Bardien (1),
  6. Damian Smedley (4),
  7. Darren Otgaar (2),
  8. Gary Greyling (2),
  9. C. Victor Jongeneel (3),
  10. Mark I. McCarthy (4, 5),
  11. Tania Hide (2), and
  12. Winston Hide (1, 7)
  1. South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
  2. Electric Genetics PTY Ltd., Bellville, South Africa
  3. Office of Information Technology, Ludwig Institute for Cancer Research and Swiss Institute of Bioinformatics, Lausanne, Switzerland
  4. Genetics and Genomics Research Institute, Imperial College Faculty of Medicine, Hammersmith Hospital, London, W12 0NN, UK
  5. Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK

Abstract

Expression data contribute significantly to the biological value of the sequenced human genome, providing extensive information about gene structure and the pattern of gene expression. ESTs, together with SAGE libraries and microarray experiment information, provide a broad and rich view of the transcriptome. However, it is difficult to perform large-scale expression mining of the data generated by these diverse experimental approaches. Not only are the data stored in disparate locations, but there is frequent ambiguity in the meaning of terms used to describe the source of the material used in the experiment. Untangling semantic differences between the data provided by different resources is therefore largely reliant on the domain knowledge of a human expert. We present here eVOC, a system that associates labelled target cDNAs for microarray experiments, or cDNA libraries and their associated transcripts, with controlled terms in a set of hierarchical vocabularies. eVOC consists of four orthogonal controlled vocabularies suitable for describing the domains of human gene expression data: Anatomical System, Cell Type, Pathology, and Developmental Stage. We have curated and annotated 7016 cDNA libraries represented in dbEST, as well as 104 SAGE libraries, with expression information, and provide this as an integrated, public resource that allows the linking of transcripts and libraries with expression terms. Both the vocabularies and the vocabulary-annotated libraries can be retrieved from http://www.sanbi.ac.za/evoc/. Several groups are involved in developing this resource with the aim of unifying transcript expression information.
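The idea of linking libraries to terms in hierarchical vocabularies can be illustrated with a minimal sketch. It is not the eVOC implementation or its file format: the class names, the `libraries_for` helper, the library identifiers, and the example terms are all hypothetical, chosen only to show how a query on a broad term could also retrieve libraries annotated with more specific child terms.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional

# The four vocabularies named in the abstract.
ONTOLOGIES = ("Anatomical System", "Cell Type", "Pathology", "Developmental Stage")


@dataclass
class Term:
    """A controlled term with at most one parent (a simple tree hierarchy)."""
    name: str
    parent: Optional["Term"] = None

    def ancestors(self) -> Iterator["Term"]:
        """Yield this term, then each ancestor up to the root."""
        node: Optional[Term] = self
        while node is not None:
            yield node
            node = node.parent


@dataclass
class Library:
    """A cDNA or SAGE library annotated with one term per vocabulary."""
    library_id: str
    annotations: Dict[str, Term] = field(default_factory=dict)


def libraries_for(libraries: List[Library], ontology: str, term_name: str) -> List[Library]:
    """Return libraries annotated with term_name or any term beneath it."""
    return [
        lib for lib in libraries
        if ontology in lib.annotations
        and any(t.name == term_name for t in lib.annotations[ontology].ancestors())
    ]


# Hypothetical terms and library identifiers, for illustration only.
nervous = Term("nervous system")
brain = Term("brain", parent=nervous)
cerebellum = Term("cerebellum", parent=brain)
liver = Term("liver")

libs = [
    Library("LIB_A", {"Anatomical System": cerebellum}),
    Library("LIB_B", {"Anatomical System": brain}),
    Library("LIB_C", {"Anatomical System": liver}),
]

brain_ids = [lib.library_id for lib in libraries_for(libs, "Anatomical System", "brain")]
print(brain_ids)
```

Because each library stores one term per vocabulary, the four vocabularies stay independent of one another, and a query in one of them does not constrain the others.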

Footnotes

  • [Supplemental material is available online at www.genome.org.]

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.985203.

  • 6 Present address: Molecular Genetics/Fugu informatics, Institute of Molecular and Cell Biology, Singapore.

  • 7 Corresponding author. E-MAIL winhide{at}sanbi.ac.za; FAX 27-21-959-2512.

    • Accepted February 25, 2003.
    • Received November 12, 2002.