1 Introduction

Severe acute respiratory syndrome-coronavirus (SARS-CoV), the etiologic agent of SARS (1–3), was unknown before the SARS epidemic. The concerted efforts of researchers promptly elucidated its genome sequence. The genome of SARS-CoV is a 29,727-nucleotide, polyadenylated RNA. The genomic organization is typical of coronaviruses, with the characteristic gene order (5′-polymerase [Orf1ab], spike [S], envelope [E], membrane [M], and nucleocapsid [N]-3′) and short untranslated regions at both termini (4, 5).

With this sequence information, rapid PCR-based molecular diagnostic tests for SARS-CoV infection were designed (1, 6–10). Besides offering molecular diagnosis and quantitative measurement of viral load, PCR-based technologies have also been used to amplify genomic fragments of SARS-CoV for sequence analysis. The high sensitivity and specificity of PCR have made genomic sequence analysis possible even for uncultured clinical specimens. Unlike conventional microbiological methods, PCR-based technologies need not involve viral culture, which could introduce culture-derived artifacts into the genomic sequence. Specific PCR primers selectively amplify SARS-CoV sequences from the background of other nucleic acid sequences contributed by the patient or other microbes. Moreover, the PCR-based method is versatile with respect to the type of clinical specimen. We have successfully analyzed the SARS-CoV genome directly from uncultured samples of serum, nasopharyngeal aspirate, and stool (11). This removes any concern about poor or unsuccessful viral culture of precious clinical specimens, and it avoids the risk of handling large volumes of hazardous viral culture.

Genomic sequence variations were observed in the SARS-CoV obtained from different patients in this epidemic. Based on these sequence variations, most of the isolates fall into two groups: isolates obtained from patients who were epidemiologically linked to the Metropole Hotel in Hong Kong, and isolates from patients who were not (3, 12, 13). For example, seven sequence variations distinguish isolate CUHK-Su10, which is linked to the Metropole Hotel, from isolate CUHK-W1, which is not linked to this hotel case cluster (Table 1). Among them, four variations at nucleotide positions 17564, 21721, 22222, and 27827 (numbered according to the Tor2 sequence in GenBank, accession no. AY274119 [5]) were suggested by the Chinese SARS Molecular Epidemiology Consortium (14) as part of a haplotype configuration that marks the different phases of a tri-phasic SARS epidemic in Guangdong Province of China. CUHK-W1 carried a haplotype G:A:C:C that typified the middle phase. Notably, the same haplotype was observed in CUHK-L2, which was one of the earliest confirmed cases of SARS in Hong Kong, having been documented even before any report of the hotel case cluster (15). CUHK-Su10 carried a haplotype T:T:T:T that typified the late phase, marked by the hotel case cluster that spread the virus to many other parts of the world.

Genomic sequence variations in SARS-CoV have also revealed the route of infection within communities and across cities. For instance, compared with isolate CUHK-Su10, two mutations, T3852C and C11493T, first appeared in isolates CUHK-AG01, CUHK-AG02, and CUHK-AG03 (GenBank accession numbers AY345986, AY345987, AY345988) obtained from patients involved in the Amoy Gardens outbreak in Hong Kong (11). Later, these two genetic fingerprints appeared in 10 completely sequenced Taiwanese isolates (16). Interestingly, toward the end of the epidemic, another type of fingerprint was found by a PCR-based method. A variant of SARS-CoV with a 386-nucleotide deletion was reported in a cluster of patients who seem to be epidemiologically related (17). Most of the cases were part of a documented outbreak in the North District Hospital in Hong Kong.

These examples illustrate that sequence variations among different isolates correlate remarkably well with epidemiology. Thus, PCR amplification followed by sequencing is a powerful tool for tracing the route of transmission, and the sequence information may provide objective support to epidemiological investigations. Moreover, in the event that SARS re-emerges, one could quickly gain important insight into the origin and evolutionary status of the SARS-CoV simply by sequencing the critical variable positions of the genome, as exemplified here. However, to extract this wealth of information from limited primary clinical SARS specimens, we need very sensitive and efficient protocols to amplify fragments of the SARS-CoV genome for analysis.
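
The haplotype calls described above can be sketched in a few lines of code. This is only an illustration, not the authors' or the consortium's software: it assumes a genome sequence that has already been placed in Tor2 coordinates (AY274119) without insertions or deletions upstream of the marker positions, it knows only the two haplotypes named in the text, and the function and variable names are hypothetical.

```python
# Marker positions (1-based, Tor2 coordinates) as listed in the text.
MARKER_POSITIONS = (17564, 21721, 22222, 27827)

# Only the two haplotypes named in the text are known to this sketch;
# any other configuration is reported as "unclassified".
KNOWN_HAPLOTYPES = {
    "G:A:C:C": "middle phase",
    "T:T:T:T": "late phase",
}


def call_haplotype(aligned_genome: str) -> str:
    """Return the four-marker haplotype, e.g. 'T:T:T:T'.

    Assumption: aligned_genome is indexed in Tor2 coordinates.
    """
    if len(aligned_genome) < max(MARKER_POSITIONS):
        raise ValueError("sequence does not reach the last marker position")
    # Convert 1-based genome positions to 0-based string indices.
    bases = [aligned_genome[pos - 1].upper() for pos in MARKER_POSITIONS]
    return ":".join(bases)


def classify_phase(haplotype: str) -> str:
    """Map a haplotype string to the epidemic phase named in the text."""
    return KNOWN_HAPLOTYPES.get(haplotype, "unclassified")
```

Given a Tor2-aligned sequence, `classify_phase(call_haplotype(seq))` returns the phase label for the two haplotypes discussed here and "unclassified" for anything else, which is a prompt to inspect the sequence by hand.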

Table 1 Comparison of the Sequences of Two Strains of Severe Acute Respiratory Syndrome (SARS)-Coronavirus Isolated From Patients in Hong Kong at the Beginning of the Epidemic

2 Materials

2.1 RNA Extraction

  1. QIAamp viral RNA Mini Kit (Qiagen, Hilden, Germany).

  2. Absolute ethanol.

2.2 Reverse-Transcription

  1. SuperScript III RNase H- reverse transcriptase (Invitrogen, Carlsbad, CA).

  2. Random hexamers (Applied Biosystems, Foster City, CA).

  3. 5X First-strand synthesis buffer: 250 mM Tris-HCl, pH 8.3, 375 mM KCl, 15 mM MgCl2 (Invitrogen).

  4. RNasin RNase inhibitor (Promega, Madison, WI).

  5. dNTP (Invitrogen).

  6. 0.1 M Dithiothreitol (DTT).

  7. RNase-free water (Promega).

2.3 PCR Amplification

  1. Advantage cDNA Polymerase mix and buffer (BD Biosciences Clontech, Palo Alto, CA).

  2. dNTP.

  3. PCR primers.

  4. Distilled water.

2.4 Genomic Sequencing

2.5 Sequence Analysis and Comparison

SeqScape software (Applied Biosystems).

3 Methods

3.1 Precautions Against Potential Contamination

Genomic sequencing involves PCR amplification, which produces numerous copies of the target DNA, and cycle sequencing, which requires the pipetting and manipulation of PCR products. These steps could easily contaminate the laboratory environment with amplified products. Such contamination problems would affect the interpretation of sequencing results, and adversely affect the performance of diagnostic tests designed to detect the same viral sequences. Hence, extreme care should be taken to avoid contamination. We suggest the following precautions:

  1. Perform RNA extraction, PCR amplification, and genome sequencing in different laboratories, or at least in separate and dedicated compartments of the same laboratory.

  2. Transfer reagents and samples only with aerosol-resistant pipet tips.

  3. Prepare the PCR reagent master mix in a hood dedicated to this purpose. Wear a set of clean gloves and a dedicated lab gown in this area. Illuminate the hood with ultraviolet light before and after use.

  4. Perform any steps that involve the handling of cDNA or of primary and secondary PCR products (including addition of DNA templates when assembling the PCR), electrophoresis, and cycle sequencing in a dedicated area far away from any PCR reagents. Wear a separate lab gown and set of gloves in this area.

  5. Discard all pipet tips that have contacted DNA with extreme care. Use a double bag for disposal.

  6. Include multiple negative PCR controls in each amplification to monitor for environmental contamination.

3.2 RNA Extraction

  1. Prepare AVL lysis buffer and AW1 and AW2 wash buffers according to the manufacturer's (Qiagen) instructions (see Note 1).

  2. In a biosafety level 2 (or above) containment laboratory, lyse 0.28 mL (1 vol) of viral culture by adding 1.12 mL (4 vol) of AVL buffer, mixing, and incubating at room temperature for 10 min. Direct clinical samples, e.g., serum, nasopharyngeal aspirates, and stools, can also be used (see Note 2).

  3. Add 1.12 mL of absolute ethanol to the mixture. Pulse-vortex for 15 s.

  4. Load the mixture onto the QIAamp spin column and wash the column according to the manufacturer's instructions.

  5. Add 60 μL of RNase-free water onto the membrane and incubate for 1 min at room temperature. Centrifuge the spin column for 1 min at 6000g.

  6. Quantify the viral RNA yield in a small aliquot of the extract by real-time quantitative reverse-transcription (RT)-PCR (9) (see Note 3).

  7. Store the extracted RNA at -80°C.
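
The lysis volumes in steps 2 and 3 follow a 1:4:4 ratio of sample to AVL buffer to ethanol (the ethanol ratio is inferred from the 1.12 mL quoted in step 3). The sketch below scales those volumes to another sample size. It is a convenience calculation under stated assumptions, not part of the kit protocol: the function name is hypothetical, and the default of 630 μL per column load is an assumed value that should be checked against the kit handbook.

```python
def lysis_volumes_ul(sample_ul: int, column_load_ul: int = 630) -> dict:
    """Scale AVL buffer and ethanol to the sample volume (1:4:4 by volume).

    Volumes are whole microliters to avoid floating-point rounding.
    column_load_ul is an ASSUMED per-load column capacity, not taken from
    the protocol text above.
    """
    if sample_ul <= 0 or column_load_ul <= 0:
        raise ValueError("volumes must be positive")
    avl_ul = 4 * sample_ul        # 4 vol AVL lysis buffer (step 2)
    ethanol_ul = 4 * sample_ul    # 4 vol absolute ethanol (step 3)
    total_ul = sample_ul + avl_ul + ethanol_ul
    # Ceiling division: number of sequential loads onto one spin column.
    loads = -(-total_ul // column_load_ul)
    return {
        "avl_ul": avl_ul,
        "ethanol_ul": ethanol_ul,
        "total_ul": total_ul,
        "column_loads": loads,
    }
```

For the 0.28 mL sample in step 2, `lysis_volumes_ul(280)` reproduces the 1120 μL of AVL buffer and 1120 μL of ethanol given in the protocol.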

3.3 Reverse-Transcription

  1. Prewarm two thermocycler blocks with heated lids at 72 and 25°C, respectively.

  2. Mix 1 μL (50 pmol) of random hexamer with 10 μL of RNA in a 0.5-mL tube. Denature at 72°C for 10 min (see Note 4).

  3. During this period, assemble the reaction mix in another tube on ice according to Table 2 using SuperScript III RNase H- Reverse Transcriptase (see Note 5).

  4. After denaturation, snap-cool the RNA-primer mixture on ice for 1 min. Briefly spin the tubes. Add the reaction mix prepared in step 3 to the RNA-primer mixture to make up a total reaction volume of 20 μL. Mix by pipetting gently up and down.

  5. Immediately transfer the tube from ice to the prewarmed 25°C thermocycler block for a 5-min incubation. Prewarm the other thermocycler block to 55°C.

  6. Transfer the tube to the prewarmed 55°C thermocycler block for a 1-h incubation.

  7. Heat-inactivate at 72°C for 15 min.

  8. Add 1 μL (2 U) of RNase H and incubate at 37°C for 20 min to remove RNA complementary to the cDNA.

  9. Dilute the product two- to fivefold with distilled water. Store at -20°C before use.

Table 2 Composition of Reaction Mix for Reverse-Transcription of Severe Acute Respiratory Syndrome-Coronavirus RNA

3.4 Primary PCR Amplification

  1. Inside a hood dedicated to setting up PCR, assemble the PCR master mix for the 50 reactions according to Table 3 with cDNA polymerase mix (see Note 6) in a final reaction volume of 25 μL. Dispense 50 aliquots of 23 μL into a 96-well PCR microplate.

  2. Add 5 pmol each of the forward (PCR-F) and reverse (PCR-R) series of primers to each of the 50 reactions, which amplify the overlapping amplicons that cover the whole SARS-CoV genome (see Note 7). The primer sequences are shown in Table 4.

  3. In an area separate from the hood dedicated to PCR, add 1 μL of diluted reverse-transcribed product to each reaction.

  4. Run the PCR in a thermocycler with initial denaturation at 95°C for 1 min and 35 cycles of 95°C for 0.5 min, 55°C for 0.5 min, 68°C for 1.5 min, and a final extension at 68°C for 10 min.
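
The quantities in the steps above imply a few numbers that are worth checking before setting up a plate. The following sketch works them out; it is illustrative arithmetic only. The reaction volume, aliquot volume, primer amount, and cycling times come from the protocol, whereas the 10% pipetting overage and all function names are assumptions introduced here.

```python
def primer_final_concentration_um(primer_pmol: float, reaction_ul: float) -> float:
    """Final primer concentration; pmol per microliter equals micromolar."""
    return primer_pmol / reaction_ul


def master_mix_volume_ul(n_reactions: int, aliquot_ul: float,
                         overage_fraction: float = 0.10) -> float:
    """Master mix to prepare, including an ASSUMED pipetting overage."""
    return n_reactions * aliquot_ul * (1.0 + overage_fraction)


def cycling_minutes(initial_min: float = 1.0, cycles: int = 35,
                    step_minutes: tuple = (0.5, 0.5, 1.5),
                    final_min: float = 10.0) -> float:
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return initial_min + cycles * sum(step_minutes) + final_min


if __name__ == "__main__":
    print(primer_final_concentration_um(5.0, 25.0))   # each primer, in µM
    print(master_mix_volume_ul(50, 23.0, 0.0))        # no overage, in µL
    print(cycling_minutes())                          # minutes of hold time
```

With the protocol values, 5 pmol in 25 μL corresponds to 0.2 μM of each primer, the 50 aliquots consume 1150 μL of master mix before any overage, and the programmed hold times add up to 98.5 min. Note also that 23 μL of master mix plus 1 μL of template leaves 1 μL for the primer pair, assuming nothing else is added to the well.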

Table 3 Composition of Reaction Mix for Polymerase Chain Reaction Amplification
Table 4 Primer Sequences

3.5 Secondary PCR Amplification

  1. Inside a hood dedicated to setting up PCR, assemble the PCR master mix for the 50 reactions according to Table 3 in a final reaction volume of 25 μL. Dispense 50 aliquots of 23 μL into a new 96-well PCR microplate.

  2. Add 5 pmol each of the forward (PCR-F) and reverse (BSEQ-R) series of primers to each of the 50 semi-nested PCR reactions. The primer sequences are shown in Table 4.

  3. In an area separate from the hood dedicated to PCR, add 1 μL of the corresponding primary PCR product to each reaction.

  4. Run the PCR in a thermocycler with initial denaturation at 95°C for 1 min and 35 cycles of 95°C for 0.5 min, 55°C for 0.5 min, 68°C for 1.5 min, and a final extension at 68°C for 10 min.

  5. Electrophorese 5 μL of the secondary PCR product in a 2% agarose gel to verify the success of the PCR amplification. Estimate the amount of PCR product by comparison with a DNA marker. Use only products that give a single band for sequencing.

3.6 Cycle Sequencing

Perform the sequencing reaction by the dideoxy dye terminator method, according to the manufacturer's instructions:

  1. In an area separate from the hood dedicated to PCR, assemble the cycle sequencing reaction with the ASEQ-F, BSEQ-F, ASEQ-R, and BSEQ-R series of oligonucleotides as sequencing primers for each amplicon, and with 2-5 ng of secondary PCR product as sequencing template (see Note 8).

  2. Run the cycle sequencing reaction in a thermocycler.

  3. Purify the extension products by either spin column purification or ethanol precipitation. Mix or resuspend the DNA in formamide solution according to the manufacturer's instructions.

  4. Denature the purified extension products at 95°C for 5 min, snap-cool on ice, and load onto the automated capillary DNA sequencer for injection.
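
Step 1 calls for 2-5 ng of secondary PCR product per sequencing reaction, and the amount of product is estimated from the gel. A small helper like the one below converts a concentration estimate into a pipetting range. It is a sketch only: the function name is hypothetical, and the concentration passed in is whatever the user estimates from the DNA marker comparison.

```python
def template_volume_range_ul(conc_ng_per_ul: float,
                             min_ng: float = 2.0,
                             max_ng: float = 5.0) -> tuple:
    """Return (min_ul, max_ul) of PCR product that delivers min_ng to max_ng.

    conc_ng_per_ul is the user's own estimate from the agarose gel; the
    2-5 ng window is the template amount quoted in the protocol.
    """
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    if min_ng > max_ng:
        raise ValueError("min_ng must not exceed max_ng")
    return (min_ng / conc_ng_per_ul, max_ng / conc_ng_per_ul)
```

If the computed volumes are too small to pipet accurately, diluting the product first and recalculating with the diluted concentration is the obvious remedy; as Note 8 points out, the optimal input still has to be tuned for the sequencing system in use.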

3.7 Sequence Comparison

  1. Edit, align, and compare sequences against the Tor2 strain (GenBank accession number AY274119) as a reference, using software designed for this purpose, for example SeqScape (see Note 9).

  2. Re-sequence regions that reveal nucleotide substitutions using a combination of different primer sets to ensure the quality of the sequencing data (see Note 10).
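
The comparison logic, together with the rule in Note 10 that only variations shared by at least two isolates are accepted, can be sketched as follows. This is not a substitute for SeqScape and is not the authors' pipeline: it assumes that every isolate sequence is already aligned to the reference and has the same length, it handles substitutions only, and all names are hypothetical.

```python
from collections import defaultdict


def substitutions(reference: str, isolate: str):
    """Yield (position, ref_base, alt_base) with 1-based positions.

    Positions where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(reference) != len(isolate):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(reference.upper(), isolate.upper())
    for pos, (ref, alt) in enumerate(pairs, start=1):
        if ref in "ACGT" and alt in "ACGT" and ref != alt:
            yield (pos, ref, alt)


def shared_variants(reference: str, isolates: dict, min_isolates: int = 2) -> dict:
    """Keep substitutions carried by at least min_isolates isolates.

    isolates maps an isolate name to its aligned sequence. The result maps
    each retained (position, ref, alt) to the sorted list of carrier names.
    """
    carriers = defaultdict(set)
    for name, sequence in isolates.items():
        for variant in substitutions(reference, sequence):
            carriers[variant].add(name)
    return {
        variant: sorted(names)
        for variant, names in carriers.items()
        if len(names) >= min_isolates
    }
```

A substitution seen in a single isolate is dropped by this filter, which mirrors the caution that such a change may be a sequencing artifact until it is confirmed by resequencing.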

4 Notes

  1. For RNA extraction, carrier poly(A) RNA is added to the lysis buffer to increase the yield. Because the PCR primers are specific to the SARS-CoV genome, the subsequent amplification is not affected. However, if one wants to perform 3′ rapid amplification of cDNA ends (3′ RACE) or a similar cloning operation that depends on oligo(dT) priming of the poly(A) tail of the viral RNA, then the carrier poly(A) RNA should be omitted.

  2. Our previous studies have shown that, with two rounds of PCR, even direct clinical samples can be sequenced. This obviates the need for viral culture, which may pose a health hazard if not handled properly. It also minimizes the possible generation of viral mutants through culturing. However, direct clinical samples of high viral titer and a sensitive PCR amplification are required.

  3. Yields of viral RNA should be determined by quantitative RT-PCR, because spectrophotometric determination is prone to error as a result of the low RNA quantity and interference by the carrier poly(A) RNA, which accounts for most of the RNA present.

  4. A prolonged denaturation step is used to remove secondary RNA structures in the SARS-CoV genome that impede reverse-transcription. The use of random hexamers ensures an even representation of the whole RNA genome and allows more sequence information to be obtained from a limited amount of viral RNA.

  5. We recommend the use of a reverse transcriptase with increased thermal stability, which allows reverse-transcription at a higher temperature (55°C) than normal (42°C). This unfolds some of the secondary RNA structures, and thus produces longer cDNA at higher yields.

  6. We recommend the simultaneous use of two different DNA polymerases in the PCR amplification. For example, the cDNA polymerase mix that we use contains KlenTaq-1 DNA polymerase and a second DNA polymerase with 3′ to 5′ proofreading activity. The inclusion of a minor amount of a proofreading polymerase results in an error rate that is significantly lower than that for Taq alone (18). This advantage matters when one is concerned with genomic sequence variations between different viral strains. The use of a two-polymerase system also increases the efficiency and yield, and hence the sensitivity, which is important when the viral titer is suboptimal.

  7. The carryover of unused PCR primers into the sequencing reaction would lead to poor sequencing results. Like the sequencing primers, these unused PCR primers would bind to the sequencing template in the cycle sequencing reaction and, hence, generate noisy sequencing traces that overshadow the intended traces. Purification of the PCR products is, thus, usually recommended prior to their use as sequencing templates. However, purification methods are labor-intensive and pose an extra contamination risk, as they involve additional steps of opening and handling PCR products. Notably, we have suggested an optimized PCR protocol for direct sequencing of PCR products without PCR product purification. With the low PCR primer concentrations and the optimal number of cycles, most of the PCR primers are consumed by the end of the PCR. Furthermore, a nested sequencing primer selectively extends the specific PCR product in the cycle sequencing reaction, which suppresses extension of any nonspecific PCR product. The combined effect is a clean sequencing trace.

  8. The amount of PCR product used for the sequencing reaction must be optimized carefully for each sequencing system. Although more PCR product input usually gives higher signal intensities, it may also give shorter read lengths and oversaturated signals.

  9. The PCR primers target 50 amplicons of 700 bp that overlap one another along the SARS-CoV genome. The sequencing primers are designed in such a way that any sequence masked by the PCR primer binding sites and the sequencing primer peak on one amplicon is reliably backed up by the homologous sequence in the overlapping amplicon.

  10. We advocate rigorous validation of any genomic sequence variation by resequencing the region with different combinations of primers and sequencing chemistries. Because variation seen in a single viral isolate could potentially be a result of sequencing artifacts, we consider only the genomic sequence variations that are shared by at least two SARS-CoV isolates.
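
The tiling geometry mentioned in Note 9 can be explored with a short calculation. The sketch below spaces 50 amplicons of 700 bp evenly across the 29,727-nucleotide genome and reports the overlaps between neighbors. It is an idealized model only: the real amplicon coordinates are fixed by the primers in Table 4, which are not reproduced here, and the function names are hypothetical.

```python
GENOME_LENGTH = 29727   # nucleotides, from the Introduction
N_AMPLICONS = 50        # from Note 9
AMPLICON_LENGTH = 700   # bp, from Note 9


def even_tiling(genome_length: int, n_amplicons: int, amplicon_length: int) -> list:
    """Return 1-based inclusive (start, end) pairs for evenly spaced amplicons.

    The first amplicon starts at position 1 and the last one ends at the
    final nucleotide. This is an IDEALIZED layout, not the Table 4 design.
    """
    if n_amplicons < 2:
        raise ValueError("need at least two amplicons")
    if n_amplicons * amplicon_length < genome_length:
        raise ValueError("amplicons are too few or too short to cover the genome")
    last_start = genome_length - amplicon_length + 1
    starts = [
        1 + round(i * (last_start - 1) / (n_amplicons - 1))
        for i in range(n_amplicons)
    ]
    return [(start, start + amplicon_length - 1) for start in starts]


def neighbor_overlaps(tiles: list) -> list:
    """Number of bases shared by each pair of adjacent amplicons."""
    return [
        prev_end - next_start + 1
        for (_, prev_end), (next_start, _) in zip(tiles, tiles[1:])
    ]
```

Under this even-spacing assumption, the 50 amplicons total 35,000 bp against a 29,727-nucleotide genome, so the 49 junctions share 5273 bases in all, or roughly 107 to 108 bp of overlap per junction. That margin is what allows sequence masked at the ends of one amplicon to be read from its neighbor.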