The terminal redundancy of the retrovirus genome facilitates chain elongation by reverse transcriptase.

Transcription of DNA from the RNA genome of retroviruses by reverse transcriptase involves an unusual translocation of the growing chain from the 5' end to the 3' end of the RNA template. In order to elucidate the mechanism by which this translocation occurs, we have used chain termination to analyze nascent viral DNA synthesized in vitro by avian sarcoma virus, and we have determined the nucleotide sequence of appropriate regions of viral DNA isolated from infected cells and cloned into prokaryotic vectors. Our results provide direct experimental evidence for a previously proposed model in which a short terminal redundancy in viral RNA, and a DNA copy of the redundant sequence, are used to allow the growing DNA chain to move from the 5' to the 3' end of the template. Transcription of avian sarcoma virus RNA with purified reverse transcriptase also generates an anomalous product, a hairpin DNA that arises when the initial DNA transcript folds back on itself to continue synthesis. The foldback is mediated by an inverted repeat of 5 nucleotides in the sequence of nascent DNA. Anomalous hairpin DNA is not produced by detergent-activated virions. Thus, constituents of the virions or the configuration of encapsidated viral RNA must facilitate correct transcription.

The replication of retroviruses is mediated by a virus-specific DNA intermediate (1), synthesized during the early hours of infection and subsequently integrated into the chromosomal DNA of the host cell. Viral DNA is transcribed from the single-stranded RNA genome of retroviruses by "reverse transcriptase" (2), an RNA-directed DNA polymerase encoded in a viral gene and encapsidated in virus particles. The products of viral DNA synthesis in cells infected with retroviruses include both linear and circular duplex molecules (3-6). The linear form of viral DNA is bounded by a direct terminal redundancy composed of nucleotide sequences representing domains of both the 3' and 5' ends of the viral RNA genome (7,8); some of the circular molecules contain both copies of the redundant domains, others contain only one copy (7-10).
Our knowledge of how these forms of retrovirus DNA are generated from a linear single-stranded template remains incomplete. In particular, we need to account for the replication of the ends of the linear RNA template by primer-dependent polymerization (11), the genesis of a terminal redundancy in the linear DNA product, and the circularization of a portion of the products. It has been argued previously that mechanisms for the first and second of these events are suggested, at least in part, by events that occur during the course of viral DNA synthesis in vitro or in vivo (for a review, see Ref. 12). The present communication confirms and extends this view by providing both a detailed analysis of early events during the transcription of DNA from the genome of avian sarcoma virus in vitro and a correlative analysis of DNA produced in infected cells.
Transcription of the ASV¹ genome initiates on a tRNA(Trp) primer located about 100 nucleotides from the 5' end of the RNA (13), proceeds to the end of the template, and then moves to the vicinity of the 3'-terminus of the template (14-16). As a result, both ends of the RNA template are copied in tandem, shortly after initiation of DNA synthesis. Fig. 1 illustrates a model of how these events might occur (14, 17-21). The model derives from the finding that the RNA genome of ASV (and of closely related viruses) possesses a direct terminal redundancy, each copy composed of 16 to 21 nucleotides (DTR21) (17-19, 22) and included in the terminal redundancy in viral DNA described above (7,8). In the model, RNA at the 5'-terminus of the viral genome is removed from the complementary DNA transcript (perhaps by the action of RNase H activity associated with reverse transcriptase (2)), and the DNA base-pairs with the DTR21 sequence at the 3' end of the template. As a consequence, the nascent DNA is in position for continued transcription from the full length of the template. In addition, the joining of sequences from the 5' and 3' ends of the RNA template in the DNA creates the sequence organization as it appears at one end of the final DNA product.
* This work was supported by United States Public Health Service Grants CA 12705, CA 19287, and Training Grant T32 CA 09043, and American Cancer Society Grant VC-70. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
¹ The abbreviations used are: ASV, avian sarcoma virus; DTR21, the 16- to 21-nucleotide-long terminal redundancy in the ASV genome.
On the basis of this model, we can make two predictions concerning the structure of the initial DNA transcript that joins the 5' and 3' domains of the template. First, the DNA will contain only one copy of DTR21, whereas the template had two copies at the outset. Second, transcription from the 3' domain of the template will begin with the nucleotide immediately adjacent to the 5' boundary of DTR21. Data consistent with the first prediction were obtained previously by studying the pyrimidine tracts of DNA synthesized in vitro by murine leukemia virus (23), and more recently by nucleotide sequence analysis of cloned murine leukemia virus cDNA (24). However, in these studies the precise site of transcription from the 3' region of the RNA template was not determined. We now demonstrate that the nucleotide sequence of ASV DNA synthesized either in vitro or in vivo fulfills both predictions. In addition, we show that transcription of the ASV genome by purified reverse transcriptase (reconstructed reaction) frequently copies nucleotide sequences at the 5' end of the viral genome into a hairpin structure, and we describe the structural basis of this phenomenon. We have not observed this hairpin species during synthesis by the holoenzyme in detergent-disrupted virions (endogenous reaction). Our data conform to previous indications that the synthesis of retrovirus DNA proceeds by identical mechanisms in vitro and in vivo, although the use of purified reverse transcriptase may introduce artifacts not encountered in either DNA synthesized by detergent-disrupted virions or viral DNA synthesis in infected cells.
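The two predictions can be made concrete with a small simulation. The following is a minimal sketch only: the template, the repeat, and the segment names (`REPEAT`, `FIVE_PRIME_UNIQUE`, `BODY`, `THREE_PRIME_UNIQUE`) are hypothetical toy values chosen for illustration and are not ASV sequences, and the function simply encodes the steps of the model as stated above rather than any enzymatic detail.

```python
# Toy illustration of the template-switch ("jump") model. Every sequence
# below is hypothetical; none is taken from the ASV genome.

RNA_TO_DNA = {"A": "T", "C": "G", "G": "C", "U": "A"}


def copy_to_dna(rna: str) -> str:
    """Return the DNA complement of an RNA segment, written 5'->3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))


def transcript_after_jump(template: str, repeat_len: int,
                          primer_site: int, extend: int) -> str:
    """Nascent DNA (5'->3') after the jump and `extend` more nucleotides.

    template    -- RNA genome, 5'->3', with the same repeat at both ends
    repeat_len  -- length of the terminal redundancy
    primer_site -- number of 5'-terminal template nucleotides copied
                   before the 5' end of the template is reached
    """
    if template[:repeat_len] != template[-repeat_len:]:
        raise ValueError("template lacks a direct terminal redundancy")
    # Step 1: copy from the primer to the 5' end of the template.
    strong_stop = copy_to_dna(template[:primer_site])
    # Step 2: the 3' end of the DNA (the copy of the 5' repeat) pairs with
    # the repeat at the 3' end of the template.
    boundary = len(template) - repeat_len  # 5' boundary of the 3' repeat
    # Step 3: synthesis resumes with the nucleotide next to that boundary.
    upstream = template[boundary - extend:boundary]
    return strong_stop + copy_to_dna(upstream)


REPEAT = "GCCAUUG"             # hypothetical terminal redundancy
FIVE_PRIME_UNIQUE = "ACGGAC"   # hypothetical segment between repeat and primer
BODY = "UGGCGAAAACCC"          # hypothetical remainder of the genome
THREE_PRIME_UNIQUE = "CUAGGA"  # hypothetical segment just upstream of 3' repeat
TEMPLATE = REPEAT + FIVE_PRIME_UNIQUE + BODY + THREE_PRIME_UNIQUE + REPEAT

dna = transcript_after_jump(TEMPLATE, len(REPEAT),
                            len(REPEAT + FIVE_PRIME_UNIQUE),
                            len(THREE_PRIME_UNIQUE))
print(dna)
print(dna.count(copy_to_dna(REPEAT)))  # copies of the repeat in the DNA
```

In this toy case the product is the complement of the 3' unique segment, one copy of the repeat, and the 5' unique segment joined in that order, which is the arrangement the two predictions describe.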

RESULTS
Early Events in the Synthesis of ASV DNA-We used the dideoxy chain terminator technique for DNA sequencing (25) to examine early events during the synthesis of ASV DNA. This sequencing technique was adapted to allow the use of reverse transcriptase, rather than Escherichia coli polymerase I, so that nascent DNA transcripts from viral RNA could be examined. We determined the structure of viral DNA transcripts under five different circumstances (listed below), including the endogenous reaction, reconstructed reaction, and DNA synthesized in vivo, and compared the sequences obtained to the known sequences at the 5' (18,22) and 3' (17) ends of viral RNA. In each instance the data provided support for the model described above.
(i) "Endogenous reactions" with detergent-disrupted virions. Examination of the sequence of nascent DNA transcripts synthesized in the endogenous reaction shows that only one copy of the DTR21 sequence is present (Fig. 2). The sequence of the first two nucleotides transcribed beyond the 5' end of the template could not be determined due to experimental artifact (see miniprint, Fig. 3A). The third nucleotide beyond the 5' end of the template represents transcription from the third position upstream of the DTR21 at the 3' end of the template. We assume that the two obscured positions represent synthesis from the two nucleotides immediately adjacent to the 3'-DTR21 sequence, an assumption confirmed by sequencing DNA synthesized in vivo (see below).
(ii) The sequence of DNA synthesized in the endogenous reaction in the presence of actinomycin D, an inhibitor of DNA-dependent DNA synthesis (26), was identical with the sequence obtained in the absence of the drug, indicating that RNA is serving as the template for the transcripts extended beyond the 5' end of the template (Fig. 2). We observed a substantial decrease in the frequency of transcripts extended beyond the 5' end of the template in the presence of actinomycin D (see miniprint, Figs. 3B and 5). This is probably due to the drug binding to the DNA complement of the DTR21 sequence and inhibiting base-pairing with the 3'-DTR21 sequence (27).
(iii) DNA transcripts synthesized in the reconstructed reaction have a sequence similar to the sequence determined in the endogenous reaction (Fig. 2), although the situation is complicated by the presence of a second type of transcript. Beyond the 5' end of the template, transcription occurs from two templates, giving rise to a double sequence (see miniprint, Fig. 4). The double sequence can be divided into extended transcription from the 3' end of the template (Fig. 2), as with the endogenous reaction, and a foldback transcript, giving rise to a hairpin product (see below).
[Figure legend fragment: gag-pol, env, and src refer to the known viral genes (29).]
(iv) Our interpretation of the double sequence is supported by experiments with viral RNA lacking the 3' end; this template supports the synthesis of only the hairpin DNA (see miniprint, Figs. 5 and 6). This last result also supports the model; removal of the 3' end of the template blocks the synthesis of the extended transcripts that are observed in the endogenous reaction and predicted by the model. The viral polymerase and viral 70 S RNA appear to be the only virion components required to accurately carry out the initial steps of DNA synthesis.
(v) We examined the sequence of viral DNA which had been synthesized in vivo and amplified by molecular cloning (28). The structure of the cloned viral DNA also supports the model; the region of this DNA analogous to the region studied in vitro contained one copy of the DTR21 sequence, and the upstream sequence represented the nucleotides immediately upstream from the DTR21 sequence at the 3' end of the RNA template (Fig. 2). Thus, in each of the settings studied all of the sequences present at the 5' and 3' ends of the template are present in the nascent DNA transcript, fused by one copy of the DTR21 sequence.
[Figure legend fragment, opening missing: ... (Fig. 3A). DNA: endogenous reaction with actinomycin D, DNA sequence determined from the endogenous reaction containing actinomycin D at 100 µg/ml (Fig. 3B). DNA: reconstructed reaction, DNA sequence determined from the reconstructed reaction using purified reverse transcriptase and 70 S viral RNA as template (Fig. 4). DNA: in vivo, DNA sequence determined from the region of cloned ASV DNA as described in the legend to Fig. 7. The nucleotide sequence of DNA synthesized in vitro is extended beyond the sequence shown in Figs. 3 and 4 to facilitate comparison of the entire DTR21 sequence.]

DISCUSSION
Early Events in the Synthesis of ASV DNA-Our findings substantiate and extend previous accounts of the early events in the synthesis of retrovirus DNA (29). In particular, we conclude that DNA transcribed from the 5' end of the viral genome subsequently base-pairs with a complementary nucleotide sequence at the 3' end of the genome. In this way primer-dependent replication of the template ends is accomplished, and nascent DNA is in position for continued transcription from the full length of the template. These conclusions provide a function for the terminal redundancy in the viral genome and, therefore, account for conservation and cosegregation of the redundant nucleotide sequences during genetic recombination among retroviruses (30,31). A similar conclusion has been reached after analyzing pyrimidine tracts (23), and more recently a cDNA clone (24), of DNA transcripts produced in the endogenous reaction of murine leukemia virus.
As can be seen in Fig. 1, the early intermediate we have studied has the sequence organization found at the right end of the linear viral DNA (7, 8). The generation of these sequences is accomplished with the apparent sacrifice of one copy of the DTR21 sequence, since two copies of the DTR21 sequence in the template are used to generate one copy in the early DNA intermediate. The loss of information is rectified during the generation of the terminal redundancy in the DNA (7,8,32). Our own sequencing studies support this view, showing that each copy of the redundant domains in the DNA contains one copy of the DTR21 sequence.
Since the genome of retroviruses is diploid (12), two templates for viral DNA synthesis are available in each virion. Does each of these templates give rise to a haploid unit of viral DNA, or do they collaborate in the production of a single molecule of viral DNA? The mechanism illustrated in Fig. 1 permits either possibility and, therefore, leaves the puzzle unsolved. It has been suggested previously that both genomes of a heterozygous retrovirus particle are represented in the progeny of a single infectious event (33, 34). This suggestion can be correct only if each haploid subunit of a retrovirus genome is completely transcribed into biologically active DNA.
Viral DNA Synthesis in Vitro and in Vivo-The synthesis of retrovirus DNA has been studied in three settings: the infected cell, "endogenous reactions" with detergent-disrupted virions, and "reconstructed reactions" with purified reverse transcriptase and viral RNA. Viral DNA synthesis in the first and second of these settings may differ by very little; previous reports indicate that endogenous reactions with murine leukemia/sarcoma virus can produce linear duplexes that are identical with those synthesized in infected cells (35, 36), and our present findings indicate that the early events during viral DNA synthesis in vivo are probably identical with those in the ASV endogenous reactions.
Reconstructed reactions, while capable of carrying out the initial steps of DNA synthesis (see miniprint, Fig. 4), usually fail to produce mature forms of viral DNA (29). We have shown here that when purified 70 S RNA is used as template, perhaps one-half of the viral DNA products extended beyond the 5' end of the template are anomalous, consisting of hairpin copies of the sequences near the 5' end of the RNA. It has previously been shown that when ASV 70 S RNA is used as template, a significant amount of hairpin DNA is generated (37); under certain conditions with 38 S subunit RNA as template, hairpin DNA is the only identifiable DNA product longer than 101 nucleotides (38). It appears possible that constituents of the virion or the configuration of encapsidated viral RNA facilitate correct transcription.
Anomalous Viral DNA Synthesis in Vitro-We attribute the synthesis of hairpin ASV DNA to a 5-nucleotide-long inverted redundancy in the nucleotide sequence of nascent DNA that allows the chain to fold back upon itself (see miniprint, Fig. 6C). Chain propagation can then continue, using DNA as template. This scheme is supported by the fact that actinomycin D inhibits the genesis of hairpin DNA and by the sequence of the DNA itself (38, and Figs. 4, 5, and 6A in miniprint).
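To illustrate what an inverted redundancy means at the sequence level, the sketch below scans a single DNA strand for a segment whose reverse complement recurs further downstream, which is the arrangement that would let the strand pair with itself. The helper names and the 18-nucleotide example strand are hypothetical and chosen only for illustration; the strand is not the ASV sequence, and the search says nothing about the stability of the resulting hairpin.

```python
# Locate inverted repeats in a single DNA strand. The example strand is a
# hypothetical toy sequence, not ASV DNA.

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna[::-1].translate(DNA_COMPLEMENT)


def find_inverted_repeats(dna: str, arm_len: int) -> list[tuple[int, int]]:
    """Return (i, j) pairs where dna[j:j+arm_len] is the reverse complement
    of dna[i:i+arm_len] and lies entirely downstream of it."""
    hits = []
    for i in range(len(dna) - arm_len + 1):
        partner = reverse_complement(dna[i:i + arm_len])
        j = dna.find(partner, i + arm_len)
        while j != -1:
            hits.append((i, j))
            j = dna.find(partner, j + 1)
    return hits


toy_strand = "CCGCTCATTTTTGAGCAA"  # hypothetical: GCTCA ... TGAGC arms
print(find_inverted_repeats(toy_strand, 5))
```

In the toy strand the two 5-nucleotide arms are separated by a short loop, so the downstream arm can base-pair with the upstream one and leave a 3' end positioned to be extended along the strand's own sequence.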
Two arguments suggest that RNase H plays an obligatory role in the genesis of the hairpin DNA we observed. First, the pairing of the 5-nucleotide-long inverted repeat would appear to require that the initial DNA transcript be single-stranded from position 101 to position 56. Second, since the polymerase is not known to carry out strand displacement synthesis in vitro (39), the nascent DNA is probably single-stranded through position 1, allowing transcription to proceed to the primer binding site (Fig. 6). We attribute the complete removal of the RNA template from the nascent DNA transcript to the RNase H activity associated with the polymerase (2). Our initial results with an inhibitor of RNase H activity, sodium fluoride (40), support this view. The synthesis of the foldback species is much more sensitive to inhibition by this drug than is synthesis of the doublet termination product (strong stop DNA) at positions 102 and 103.
The hairpin DNA characterized here is probably identical with that described previously by Collett and Faras (38). These authors suggested that foldback in the nascent DNA might be the means by which synthesis of the second strand of ASV DNA is initiated. We doubt that this suggestion is correct: hairpin DNA has not been found in either infected cells or endogenous reactions (1); the foldback described above shortens duplex viral DNA so that it no longer represents the full extent of viral RNA (see Fig. 6C); and formation of the hairpin DNA would preclude genesis of the terminal redundancy in the linear viral DNA.
Foldback synthesis has been observed frequently in the reverse transcription of cellular mRNAs and has been exploited to generate double-stranded DNA for subsequent cloning into prokaryotic vectors (41). We presume that foldback synthesis in these instances is a fortuitous event, akin to the process described here.
Transcription from retrovirus RNA in vitro frequently pauses or terminates at the 5' end of the viral genome. The resulting DNA has been known variously as "short stop DNA," "strong stop DNA," and cDNA5' (42, 43); its length provides a measure of the distance from the point of initiation to the 5'-terminus of the viral genome (13,44). We and others have previously determined the nucleotide sequence of ASV "strong stop DNA" and have reported its length as 101 nucleotides (18,22). In the present study, however, we found two apparent species of "strong stop DNA," both longer than expected (102 and 103 nucleotides; Figs. 3 and 4 in miniprint). What is the origin of this discrepancy? We can envision several possible explanations, none of which appears satisfactory. (i) Heterogeneity in the virus stock. We discount this explanation; we used the same virus as in the previous study, and beyond position 104, we obtained a single nucleotide sequence in phase with the alleged end of the "strong stop DNAs." (ii) An error in the previous analyses. The agreement among results of several laboratories makes this explanation unlikely, although the previous work was performed in a manner that makes identification of the penultimate and ultimate 3' residues in "strong stop DNA" less than certain. (iii) Transcription from the "cap" nucleotide in the RNA template. This explanation would not account for the existence of two species of "strong stop DNA," both longer than the uncapped template. In any event, we have not yet identified the nucleotides at positions 102 and 103 and, hence, cannot anticipate the composition of their template(s).
Acknowledgments-We thank J. Majors and P. Czernilofsky for helpful discussions, L. Levintow, R. Parker, and J. Majors for reading the manuscript, and B. Cook for excellent stenographic assistance. We also thank W. DeLorbe and H. Parker for making cloned ASV DNA available, and D. Schwartz for communication of results prior to publication.