Targeting a Complex Transcriptome: The Construction of the Mouse Full-Length cDNA Encyclopedia

  1. Piero Carninci1,2,
  2. Kazunori Waki1,
  3. Toshiyuki Shiraki1,
  4. Hideaki Konno1,
  5. Kazuhiro Shibata2,
  6. Masayoshi Itoh2,
  7. Katsunori Aizawa1,
  8. Takahiro Arakawa1,
  9. Yoshiyuki Ishii1,
  10. Daisuke Sasaki1,
  11. Hidemasa Bono1,
  12. Shinji Kondo1,
  13. Yuichi Sugahara1,
  14. Rintaro Saito1,
  15. Naoki Osato1,
  16. Shiro Fukuda1,
  17. Kenjiro Sato2,3,
  18. Akira Watahiki2,3,
  19. Tomoko Hirozane-Kishikawa1,
  20. Mari Nakamura1,
  21. Yuko Shibata2,6,
  22. Ayako Yasunishi1,
  23. Noriko Kikuchi2,
  24. Atsushi Yoshiki5,
  25. Moriaki Kusakabe5,7,
  26. Stefano Gustincich8,
  27. Kirk Beisel9,
  28. William Pavan10,
  29. Vassilis Aidinis11,
  30. Akira Nakagawara12,
  31. William A. Held13,
  32. Hiroo Iwata14,
  33. Tomohiro Kono15,
  34. Hiromitsu Nakauchi16,
  35. Paul Lyons17,
  36. Christine Wells18,
  37. David A. Hume18,
  38. Michela Fagiolini19,
  39. Takao K. Hensch19,
  40. Michelle Brinkmeier20,
  41. Sally Camper20,
  42. Junji Hirota21,
  43. Peter Mombaerts21,
  44. Masami Muramatsu1,2,3,
  45. Yasushi Okazaki1,2,
  46. Jun Kawai1,2, and
  47. Yoshihide Hayashizaki1,2,3,4,22
  1. 1Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
  2. 2Genome Science Laboratory, RIKEN, Hirosawa, Wako, Saitama 351-0198, Japan
  3. 3Institute of Basic Medical Sciences, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan
  4. 4Japan Division of Genomic Information Resources, Science of Biological Supramolecular Systems, Graduate School of Integrated Science, Yokohama City University, Tsurumi-Ku, Yokohama 230-0045, Japan
  5. 5Experimental Animal Research Division, Biogenic Resources Center, RIKEN Tsukuba Institute, Tsukuba, Ibaraki 305-0074, Japan
  6. 6Dnaform International, Inc., Ami Town, Inashiki District, Ibaraki 300-0332, Japan
  7. 7Aloka Co., LTD, Kasumigaura-cho, Niihari-gun, Ibaraki 300-0134 Japan
  8. 8Department of Neurobiology, Harvard Medical School, Boston, Massachusetts 02115, USA
  9. 9Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
  10. 10National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
  11. 11Institute of Immunology, Biomedical Sciences Research Center Al. Fleming, 16672 Vari, Greece
  12. 12Chiba Cancer Center Research Institute, Division of Biochemistry, Chuo-ku, Chiba 260-8717, Japan
  13. 13Roswell Park Cancer Institute, Buffalo, New York 14263, USA
  14. 14Department of Reparative Materials Field of Tissue Engineering, Institute for Frontier Medical Sciences, Kyoto University, Sakyo-ku, Kyoto 606-8507, Japan
  15. 15Faculty of Applied Bioscience, Department of BioScience, Tokyo University of Agriculture, Setagaya-ku, Tokyo 156-8502, Japan
  16. 16Laboratory of Stem Cell Therapy, Center for Experimental Medicine, Institute of Medical Science, University of Tokyo, Minato-ku, Tokyo 108-8639, Japan
  17. 17DRF/WT Diabetes and Inflammation Laboratory Cambridge Institute for Medical Research, Cambridge CB2 2XY, UK
  18. 18The Institute for Molecular Biosciences, The University of QLD, St. Lucia Brisbane, QLD 4072 Australia
  19. 19Neuronal Function Research, Lab for Neuronal Circuit Development, RIKEN Brain Science Institute (BSI), Wako-shi, Saitama 300-0198, Japan
  20. 20University of Michigan Medical, Ann Arbor, Michigan 48109, USA
  21. 21Developmental Biology and Neurogenetics, The Rockefeller University, New York, New York 10021, USA

Abstract

We report the construction of the mouse full-length cDNA encyclopedia, the most extensive view of a complex transcriptome, on the basis of preparing and sequencing 246 libraries. Before cloning, cDNAs were enriched in full-length by Cap-Trapper, and in most cases, aggressively subtracted/normalized. We have produced 1,442,236 successful 3′-end sequences clustered into 171,144 groups, from which 60,770 clones were fully sequenced cDNAs annotated in the FANTOM-2 annotation. We have also produced 547,149 5′-end reads, which clustered into 124,258 groups. Altogether, these cDNAs were further grouped in 70,000 transcriptional units (TU), which represent the best coverage of a transcriptome so far. By monitoring the extent of normalization/subtraction, we define the tentative equivalent coverage (TEC), which was estimated to be equivalent to >12,000,000 ESTs derived from standard libraries. High coverage explains discrepancies between the very large numbers of clusters (and TUs) of this project, which also include non-protein-coding RNAs, and the lower gene number estimation of genome annotations. Altogether, 5′-end clusters identify regions that are potential promoters for 8637 known genes, and 5′-end clusters suggest the presence of almost 63,000 transcriptional starting points. An estimate of the frequency of polyadenylation signals suggests that at least half of the singletons in the EST set represent real mRNAs. Clones accounting for about half of the predicted TUs await further sequencing. The continued high-discovery rate suggests that the task of transcriptome discovery is not yet complete.

Footnotes

  • [Supplemental material available online at www.genome.org.]

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1119703.

  • 22 Corresponding author. E-MAIL: rgscerg{at}gsc.riken.go.jp; FAX 8145 503 9216.

    • Accepted March 21, 2003.
    • Received December 2, 2002.