Systematic analysis of genetic alterations in tumors using Cancer Genome WorkBench (CGWB)

Jinghui Zhang (1,3), Richard P. Finney (1), William Rowe (1), Michael Edmonson (1), Sei Hoon Yang (1,2), Tatiana Dracheva (1), Jin Jen (1), Jeffery P. Struewing (1), and Kenneth H. Buetow (1)

(1) Laboratory of Population Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
(2) Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Wonkwang University Hospital, Cheonbuk 570-749, Korea

Abstract

Systematic investigations of genetic changes in tumors are expected to lead to greatly improved understanding of cancer etiology. To meet the analytical challenges presented by such studies, we developed the Cancer Genome WorkBench (http://cgwb.nci.nih.gov), the first computational platform to integrate clinical tumor mutation profiles with the reference human genome. A novel heuristic algorithm, IndelDetector, was developed to automatically identify insertion/deletion (indel) polymorphisms as well as indel somatic mutations with high sensitivity and accuracy. It was incorporated into an automated pipeline that detects genetic alterations and annotates their effects on protein coding and 3D structure. The ability of the system to facilitate identifying genetic alterations is illustrated in three projects with publicly accessible data. Mutagenesis in tumor DNA replication leading to complex genetic changes in the EGFR kinase domain is suggested by a novel deletion–insertion combination observed in paired tumor–normal lung cancer resequencing data. Automated analysis of 152 genes resequenced by the SeattleSNPs group was able to identify 91% of the 1251 indel polymorphisms discovered by SeattleSNPs. In addition, our system discovered 518 novel indels in this data set, 451 of which were found to be valid by manual inspection of sequence traces. Our experience demonstrates that CGWB not only greatly improves the productivity and the accuracy of mutation identification, but also, through its data integration and visualization capabilities, facilitates identification of underlying genetic etiology.
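
The SeattleSNPs figures quoted above can be turned into approximate counts and rates with a few lines of arithmetic. The sketch below uses only the numbers stated in the abstract; the variable names are illustrative, and because the 91% figure is itself rounded, the recovered count is an estimate rather than a reported value.

```python
# Figures quoted in the abstract (SeattleSNPs resequencing of 152 genes).
known_indels = 1251         # indel polymorphisms discovered by SeattleSNPs
recovered_fraction = 0.91   # share of those identified by the automated analysis (rounded)
novel_calls = 518           # additional indels reported by the system
novel_valid = 451           # novel calls judged valid by manual trace inspection

# Approximate number of known indels recovered; an estimate, since 91% is rounded.
approx_recovered = round(known_indels * recovered_fraction)

# Share of novel calls confirmed on manual inspection, and the count not confirmed.
novel_validation_rate = novel_valid / novel_calls
novel_rejected = novel_calls - novel_valid

print(f"Known indels recovered (approx.): {approx_recovered} of {known_indels}")
print(f"Novel calls validated: {novel_valid} of {novel_calls} ({novel_validation_rate:.1%})")
print(f"Novel calls not validated: {novel_rejected}")
```

Read this way, roughly 87% of the novel indel calls held up under manual review, which is the basis for the accuracy claim in the abstract.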

Footnotes

  • 3 Corresponding author. E-mail jinghuiz@mail.nih.gov; fax (301) 402-9325.

  • [Supplemental material is available online at www.genome.org.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.5963407

    • Received September 18, 2006.
    • Accepted March 21, 2007.