Mapping the C. elegans noncoding transcriptome with a whole-genome tiling microarray

  1. Housheng He [1,2,8]
  2. Jie Wang [1,2,8]
  3. Tao Liu [1,2,8]
  4. X. Shirley Liu [3,4]
  5. Tiantian Li [1,2]
  6. Yunfei Wang [1,2]
  7. Zuwei Qian [5]
  8. Haixia Zheng [1,2]
  9. Xiaopeng Zhu [1,2]
  10. Tao Wu [1,2]
  11. Baochen Shi [1,2]
  12. Wei Deng [1]
  13. Wei Zhou [5]
  14. Geir Skogerbø [1,9]
  15. Runsheng Chen [1,6,7,9]

  1. Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China;
  2. Graduate School of the Chinese Academy of Science, Beijing 100080, China;
  3. Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts 02115, USA;
  4. Harvard School of Public Health, Boston, Massachusetts 02115, USA;
  5. Affymetrix, Inc., Santa Clara, California 95051, USA;
  6. Bioinformatics Research Group, Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Science, Beijing 100080, China;
  7. Chinese National Human Genome Center, Beijing 100176, China
  8. These authors contributed equally to this work.

Abstract

The number of annotated protein-coding genes in the genome of Caenorhabditis elegans is similar to that of other animals, but the extent of its non-protein-coding transcriptome remains unknown. Expression profiling on whole-genome tiling microarrays applied to a mixed-stage C. elegans population verified the expression of 71% of all annotated exons. Only a small fraction (11%) of the polyadenylated transcription is non-annotated and appears to consist of ∼3200 missed or alternative exons and 7800 small transcripts of unknown function (TUFs). Almost half (44%) of the detected transcriptional output is non-polyadenylated and probably not protein coding, and of this, 70% overlaps the boundaries of protein-coding genes in a complex manner. Specific analysis of small non-polyadenylated transcripts verified 97% of all annotated small ncRNAs and suggested that the transcriptome contains ∼1200 small (<500 nt) unannotated noncoding loci. After combining overlapping transcripts, we estimate that at least 70% of the total C. elegans genome is transcribed.

Footnotes

  • 9 Corresponding authors.

    E-mail crs{at}sun5.ibp.ac.cn; fax 86-10-64889892.

    E-mail zgb{at}moon.ibp.ac.cn; fax 86-10-64889892.

  • [Supplemental material is available online at www.genome.org.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6611807

    • Received April 13, 2007.
    • Accepted July 12, 2007.