Over the last year, the National Center for Biotechnology Information (NCBI) of NIH has taken a series of direct steps to ensure that microarray technologies specifically, and gene expression quantifying technologies in general, have a full measure of bioinformatics support.

Essential steps in this support are taking place in two NCBI projects that collect, curate, annotate, organize and publish a host of gene-based information. The Reference Sequence project, RefSeq (http://www.ncbi.nlm.nih.gov/LocusLink/refseq.html), provides reference sequence standards for naturally occurring transcription- and translation-related polymers, including DNA, mRNA and proteins. These standards will provide stable reference points not only for mutation analyses and polymorphism discovery, but also for gene expression studies. LocusLink (http://www.ncbi.nlm.nih.gov/LocusLink/), which is closely related to RefSeq, provides a single query interface to curated sequence and descriptive information about genetic loci. This information includes official nomenclature, aliases, sequence accession numbers, phenotypes, EC numbers, MIM numbers, UniGene clusters, map information and relevant web sites.

NCBI has been working actively with gene expression data through its storage and analysis of expressed sequence tag (EST) and serial analysis of gene expression (SAGE) data. For several years, NCBI has acted as a public repository for EST data through the dbEST database (http://www.ncbi.nlm.nih.gov/dbEST/), and has annotated this sequence information with gene, source and frequency information through the UniGene project (http://www.ncbi.nlm.nih.gov/UniGene) and through participation in the Cancer Genome Anatomy Project (http://www.ncbi.nlm.nih.gov/CGAP). More recently, NCBI has developed a SAGE repository and several on-line tools for the exploration of that gene expression information (http://www.ncbi.nlm.nih.gov/SAGE).

As a culmination of its efforts in the gene expression arena, NCBI has been exploring the feasibility of a gene expression resource of wider scope. It is anticipated that this resource would accept gene expression data from various sources, including microarray, high-density array (DNA chip) and SAGE technologies, and would provide a number of precomputed analyses and on-line tools for the exploration of these data.