Repeated Nucleotide Sequence Arrays in Balbiani Ring 1 of Chironomus tentans Contain Internally Nonrepeating and Subrepeating Elements*

Balbiani rings in Chironomus are large puffs on salivary gland polytene chromosomes that contain functionally related, but nonidentical genes that code for tissue-specific secretory polypeptides. In situ hybridization was used to select a recombinant plasmid (pCtBR1-1) that contained an insert of Chironomus tentans genomic DNA that originated from Balbiani ring 1. Mapping with restriction endonucleases demonstrated that the insert was 385 bp (basepair) and it contained duplicate clusters of certain cleavage sites about 250 bp apart. This repeat was shown to be part of tandem sequence arrays in the genome by hybridization of radioactive pCtBR1-1 to nitrocellulose blots containing limit and partial restriction endonuclease digests of nuclear DNA. Subsequent sequence analysis of the cloned DNA confirmed the presence of one complete copy of a 246-bp repeat comprised of a 114-bp internally nonrepeating segment and a 132-bp segment containing four 33-bp subrepeats. The subrepeats apparently evolved from a simple 9-bp sequence encoding a consensus tripeptide (Lys-Pro-Ser) in which the first two codons (AAA-CCA) were highly conserved at the nucleotide level. Comparisons between intragenic and interspecific (BRb in Chironomus thummi) copies of corresponding sequences revealed that, during the evolution of these tandemly repeated protein-coding sequences, internally nonrepeated segments were highly conserved and most likely became interspersed by variable segments containing subrepeats that arose from reduplication and divergence of 9-bp repeats.
* This work was supported by the National Institutes of Health Research Grant GM 26362 from the Institute of General Medical Sciences and a PROPHET computer terminal from the Biotechnology Resources Program, Division of Research Resources. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
‡ To whom all correspondence should be sent. Recipient of a Dean's Medical Student Research Stipend.
Two types of sequence repetitions have been described that occur within protein-coding genes. In one group, a short nucleotide sequence is tandemly duplicated within the coding portion of a gene and is reflected by a repeating amino acid sequence of functional significance (1,2). An alternative repetition pattern is found among members of multigene families (3,4) where short intergenic sequence homologies exist between domains of certain functionally related but nonidentical genes (5-9). Nucleotide and amino acid sequence data reveal that an underlying evolutionary feature of these genes is that they appear to have evolved by reduplication, divergence, and even translocation of a small ancestral sequence that ultimately yields discrete genes that may possess overlapping phenotypic functions.
A group of functionally related, but nonidentical genes codes for discrete polypeptides that make up the saliva secreted from the salivary glands of Chironomus larvae (10, 11). Cytologically, two of these genes can be observed as large chromosomal puffs, known as Balbiani rings 1 and 2 (BR1 and BR2) in Chironomus tentans, on salivary gland polytene chromosomes. BR1 and BR2 transcripts appear similar in that they are 75 S RNA molecules (12, 13) with internal sequence repetition (14-16). However, in situ hybridization of BR1 versus BR2 RNAs detects little, if any, sequence homology (14), and their rates and patterns of reassociation to genomic DNA are distinctly different (16). Both transcripts emerge from the nucleus as 37-kb cytoplasmic RNAs (17) that can direct in vitro synthesis of incomplete secretory polypeptides (18). Secretory polypeptides are unusually large (19,20), yet they have a simple amino acid composition (21) and tryptic peptide pattern (18), indicative of short, repeated amino acid sequences. These repeated sequences presumably permit overlapping alignment of individual polypeptides during polymerization of a long, highly elastic, salivary fiber that is used by larvae to spin a protective cylindrical tube (11).
Because BR genes afford an opportunity to examine the evolution of tandemly repeated protein-coding sequences at the intragenic, intergenic, and interspecific levels, it is desirable to obtain nucleotide sequence data for their basic repeating units. It is becoming increasingly apparent that the organization of repeated nucleotide sequences in this system will have to be derived from a series of small cloned DNA fragments since larger fragments are remarkably unstable (22-24). Nucleotide sequence data are currently available for BR1 (24,25) and BR2 (26) in C. tentans as well as their homologous genes, BRb (27) and BRc (23,28) in Chironomus thummi, and BR2 in Chironomus pallidivittatus (29). In each instance, the existence of tandem arrays within a BR gene has been indicated by hybridizing cloned DNA to genomic Southern blots (30). The emerging pattern of BR gene structure is one of tandem arrays which contain an alternating pattern of internally repeating and nonrepeating nucleotide sequences (23, 25-29).
We report here the nucleotide sequence of a 348-bp genomic fragment of BR1 in C. tentans. The cloned fragment contained a complete copy of a 246-bp repeated sequence that is apparently organized as tandem arrays in the genome. The 246-bp repeat was divided into a conserved internally nonrepeating segment and a somewhat more divergent segment that contained subrepeats which may have evolved from simple 9-bp repeats. Our data for BR1 support a recent model (26) for the evolution of 37-kb BR2 genes from a 110 to 120-bp primordial sequence.

EXPERIMENTAL PROCEDURES
Construction, Selection, and Purification of pCtBR1-1-Details have been published regarding the construction of a partial genomic library from which pCtBR1-1 was selected (22). Briefly, BR1 and BR2 sequences were found to reside on 32 to 40-kb fragments after limit digestion of C. tentans nuclear DNA with EcoRI. Such fragments were enriched by gradient sedimentation, randomly sheared, tailed with oligo(dC), and annealed to oligo(dG)-tailed PstI-cleaved pBR322. Recombinants were screened by colony hybridization with radioactive salivary gland 75 S RNA that simultaneously detected BR1 + BR2 sequences. Plasmid DNAs used in this study were purified from detergent lysates of host bacterial cells by two rounds of buoyant density centrifugation in CsCl gradients containing ethidium bromide (31). Recombinant DNA experiments were conducted under containment conditions prescribed by National Institutes of Health "Guidelines for Research Involving Recombinant DNA Molecules."
Radiolabeling of Nucleic Acids-Purified (17) salivary gland 75 S RNA was partially hydrolyzed and labeled with [γ-32P]ATP using polynucleotide kinase (32). Purified plasmid DNAs used for hybridization probes were labeled by the nick translation reaction (33,34) using either [3H]dTTP or [α-32P]dCTP. Phage or plasmid DNAs used as molecular weight markers on gels were labeled by nick translation without the addition of exogenous DNase. DNA fragments cleaved by restriction endonucleases were end-labeled with [γ-32P]ATP in an exchange reaction (35). Radioisotopes were purchased from New England Nuclear.
In Situ Hybridization-Squashed preparations (36) of salivary gland polytene chromosomes were used for in situ hybridization of 3H-labeled plasmid DNAs. Hybridizations were done in 4 × SET (1 × SET is 0.15 M NaCl, 30 mM Tris-HCl, pH 8.0, and 2 mM EDTA), 30% formamide at 37 °C. Posthybridization treatment included several rinses in excess hybridization media at 37 °C and two rinses in 1 × SET at 65 °C. In some instances, a stringent rinse was performed in 0.1 × SET at 65 °C. Hybridized chromosome preparations were subjected to autoradiography and stained with Giemsa prior to being photographed (37).
Restriction Endonuclease Mapping and Gel Blotting-Restriction endonucleases were purchased from New England Biolabs, Beverly, MA, or Bethesda Research Laboratories. Endonuclease cleavage sites within the insert were initially identified by comparing the cleavage patterns of pCtBR1-1 and pBR322 by electrophoresis in 1% agarose gels (38). Subsequent mapping was done by comparing autoradiograms made from polyacrylamide gels (39) containing appropriate single and double enzymatic digests of randomly labeled (nick-translated) versus end-labeled (kinased) samples of the 385-bp insert (excised by PstI) or an internal 250-bp MboII fragment (see Fig. 2A). These data were confirmed by partial digests of end-labeled fragments, according to published procedures (40). Limit or partial endonuclease digests of C. tentans nuclear DNA were fractionated on agarose gels and transferred to nitrocellulose blots (30). 32P-labeled pCtBR1-1 was hybridized to blots in 4 × SET, 30% formamide, 10 × Denhardt's media (41), 0.1% sodium dodecyl sulfate and 0.1% sodium pyrophosphate at 37 °C. Posthybridization washes were the same as described for in situ hybridization. All gels contained internal radioactive molecular weight markers made by cleaving λ DNA with HindIII (42) or pBR322 DNA with HinfI, HaeIII, or HpaII (43).
DNA Sequencing-The pCtBR1-1 insert and appropriate fragments were subcloned into the replicative form of bacteriophage M13mp9. The complete insert was subcloned as a PstI fragment. The leftward, central, and rightward RsaI fragments (and adjacent vector sequences, refer to Fig. 2A) were separated by polyacrylamide gel electrophoresis and repurified (44), and synthetic EcoRI linkers were added by blunt end ligation (45). The central RsaI fragment was then cleaved with EcoRI to trim linker oligomers and cloned into the EcoRI site of M13mp9 DNA. The leftward and rightward fragments were cleaved with EcoRI + PstI and cloned into phage DNA cleaved by the same enzymes. Recombinant phage containing complementary strands of inserted restriction fragments were sorted by DNA hybridization of phage lysates (46). Transcribed versus nontranscribed strands were identified by their ability to form RNase-resistant hybrids with 32P-labeled 75 S RNA. DNA sequencing was performed by using recombinant phage as templates for DNA synthesis in the presence of a primer (47) and dideoxyribonucleoside triphosphate chain terminators (48). Short repeated nucleotide sequence homologies were detected by using Align and Makedotgraph procedures which are based upon a previously published procedure (49) and available on the PROPHET computer network.
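The kind of dot-matrix self-comparison used to detect short repeated sequences can be illustrated with a minimal sketch. This is not the PROPHET Align or Makedotgraph procedure, whose details are not given here; the function names, the exact-match window, and the toy sequence are all assumptions made for illustration. Each window of a sequence is compared with every other window, and matches are tallied by their offset, so that a tandem repeat shows up as recurring offsets at multiples of its period.

```python
def dot_matrix(seq_a, seq_b, window=3):
    """Return (i, j) pairs where the windows starting at i and j match exactly."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            if seq_a[i:i + window] == seq_b[j:j + window]:
                hits.append((i, j))
    return hits


def repeat_offsets(seq, window=3):
    """Count off-diagonal self-matches by offset (j - i).

    Offsets with many matches correspond to diagonals in a dot plot and
    suggest the period of a tandem repeat.
    """
    counts = {}
    for i, j in dot_matrix(seq, seq, window):
        if j > i:
            counts[j - i] = counts.get(j - i, 0) + 1
    return counts


# Hypothetical toy sequence: a 9-bp unit repeated four times.
# It is NOT the pCtBR1-1 sequence shown in Fig. 4.
toy = "AAACCAAGC" * 4
offsets = repeat_offsets(toy, window=9)
print(sorted(offsets))  # offsets at which the sequence matches itself
```

With a perfect repeat, the smallest offset equals the repeat period; with diverged repeats, a real procedure would need to score inexact matches, which this sketch does not attempt.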

RESULTS AND DISCUSSION
Balbiani Ring Origin of pCtBR1-1-pCtBR1-1 is comprised of a fragment of C. tentans nuclear DNA inserted into the PstI recognition site of pBR322 via oligo(dG):oligo(dC) homopolymeric tails. The recombinant was originally selected as a potential BR clone by colony hybridization using radioactive 75 S RNA that contained both BR1 and BR2 sequences. To determine which BR the insert originated from, purified pCtBR1-1 DNA was radiolabeled in vitro with [3H]dTTP by nick translation and hybridized in situ to squashed preparations of salivary gland polytene chromosomes. Photomicrographs showed that autoradiographic exposure of silver grains occurred over chromosome IV, especially in the region of BR1 (Fig. 1A). Peripheral grains were often seen over BR2, similar to a previous observation (24), but control experiments indicated that these could be greatly reduced by a stringent posthybridization rinse (Fig. 1B). In parallel experiments it was shown that 3H-labeled pBR322 did not hybridize to polytene chromosomes. We, therefore, conclude that the genomic insert contained within pCtBR1-1 originates from BR1. Hybridization to BR2 was subsequently explained when a partial sequence homology was identified between this recombinant and a cloned BR2 sequence (26).
Fig. 1. In situ hybridization of 3H-labeled pCtBR1-1 to salivary gland polytene chromosome IV. Each photomicrograph shows an autoradiogram of a salivary gland fourth chromosome that contained radioactively exposed silver grains over BR1 and, to a lesser extent, over BR2. After hybridization, the sample shown in A was rinsed in 1 × SET at 65 °C, whereas the sample in B was rinsed in 0.1 × SET at 65 °C. The bar equals 20 μm.
Endonuclease Cleavage Map and Genomic Organization of the pCtBR1-1 Insert-Restriction endonuclease cleavage mapping of the recombinant (Fig. 2A) indicated that the size of the insert was 385 bp. The central portion of the insert contained four, approximately evenly spaced Sau3A (MboI) sites flanked by two identical clusters of AluI, MboII, HinfI, and RsaI recognition sites about 250 bp apart. These results first suggested that the pCtBR1-1 insert contained a repeated nucleotide sequence.
To determine the organization of the pCtBR1-1 insert within the C. tentans genome, Southern blots (30) containing restriction endonuclease-cleaved fragments of nuclear DNA were probed with 32P-labeled pCtBR1-1. Autoradiograms of blots made from limit MboII digests revealed a single band of hybridization at 250 bp (Fig. 3A), while partial MboII digests showed a ladder of bands with a 250-bp interval ranging from 250 bp to at least 2.5 kb (Fig. 3B). Identical results were obtained with HinfI and RsaI (data not shown). These results indicated that the 250-bp MboII fragment of the pCtBR1-1 insert might be organized within BR1 as tandem arrays consisting of at least 10 copies.
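The reasoning from ladder to copy number can be made explicit with a short sketch. It assumes an idealized array in which every repeat carries exactly one MboII site, so that partial digestion releases fragments that are whole multiples of the repeat length; the function names are illustrative only. The largest resolved band then sets a lower bound on the number of adjacent copies.

```python
REPEAT_BP = 250  # approximate MboII repeat length reported in the text


def partial_digest_ladder(repeat_bp, max_copies):
    """Expected band sizes (bp) when 1..max_copies adjacent repeats stay uncut."""
    return [repeat_bp * n for n in range(1, max_copies + 1)]


def min_copies_from_largest_band(largest_band_bp, repeat_bp):
    """Lower bound on tandem copy number implied by the largest ladder band."""
    return largest_band_bp // repeat_bp


ladder = partial_digest_ladder(REPEAT_BP, 10)
copies = min_copies_from_largest_band(2500, REPEAT_BP)
print(ladder)
print(copies)
```

The estimate is a minimum because larger multimers may exist but fall outside the size range resolved on the gel.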
In an attempt to estimate the maximum number of tandem sequence arrays in BR1, blots containing nuclear DNA cleaved with TaqI or HaeIII were hybridized with 32P-labeled pCtBR1-1. Neither enzyme cleaved the pCtBR1-1 insert, yet each enzyme cleaved most of C. tentans DNA to fragments of less than 4 kb. TaqI [...] sequences. Nonetheless, these results raised the possibility that a substantial portion of a 37-kb BR1 gene may be comprised of tandem arrays of the 250-bp repeated sequence contained within the recombinant. We subsequently found that the faint 28-kb bands obtained in both digests could be eliminated by the same stringent rinse used after in situ hybridization. Thus, we conclude that these bands also result from a sequence homology with BR2.
Nucleotide and Amino Acid Sequence-The DNA sequencing strategy employed in this study involved subcloning the entire 385-bp insert and appropriate subfragments (Fig. 2A) into the replicative form (double-stranded DNA) of bacteriophage M13mp9. This procedure allowed us to unambiguously identify recombinants containing transcribed versus nontranscribed strands by filter hybridization with 32P-labeled 75 S RNA (Fig. 2B). All segments were sequenced at least twice and 95% of the sequence was confirmed by independently sequencing complementary strands of the original insert. Due to the presence of a long homopolymeric sequence at the 3'-end of the insert (Fig. 2A), it was not possible to obtain confirmation of the 10-bp sequence between the nearest RsaI and AluI sites on the nontranscribed strand (Fig. 2B).
The nucleotide and encoded amino acid sequence of the nontranscribed strand of the pCtBR1-1 insert is presented in Fig. 4. The insert contained 348 bp of C. tentans DNA and homopolymeric stretches of oligo(dG) and oligo(dC) located at the transcriptional 5'- and 3'-ends, respectively. Direct evidence was found for the repeated sequence previously suggested by endonuclease cleavage mapping (Fig. 2A) and genomic blotting experiments (Fig. 3) [...] Table I). These same four amino acids have been identified as the most abundant residues in hydrolysates of the putative BR1 polypeptide. Ser codons frequently occurred within sequences that were compatible with reported phosphorylation sites in other proteins (51). We have found that greater than 80% of 32P incorporated into the putative BR1 product in vivo can be recovered as phosphoserine.
The 246-bp Repeat Contains an Internally Nonrepeating Segment and a Segment with Subrepeats-While this manuscript was in preparation, models of BR gene structure were published which predicted that segments of highly repeated sequences alternated with nonrepeated sequence segments (23, 25-29). The 246-bp repeat within the pCtBR1-1 insert could similarly be divided into two regions consisting of distinctive nucleotide sequence patterns. These regions were designated as INR (internally nonrepeating) and a segment containing SR (subrepeats) based upon a computer-assisted search for internally repeated nucleotide sequences. Examination of the complete nucleotide sequence indicated that the INR and SR segments alternate with one another and that two SR/INR junctions occurred within the pCtBR1-1 insert (Figs. 2C and 4).
The INR1 segment (Fig. 2C) contained 114 bp extending from nucleotide position 10 through 123, inclusive (Fig. 4). No obvious patterns of internal sequence repetition were found within INR1, and the clustered AluI, MboII, HinfI, and RsaI recognition sites were located within this segment (compare Figs. [...]).
The 132-bp SR segment was characterized by four direct repeats (designated SR1, SR2, SR3, and SR4) of a 33-bp sequence (Figs. 2C and 4). Although these direct repeats had considerable sequence homology, more sequence heterogeneity exists between them than between INR segments. SR2 and SR4 had complete homology in their DNA sequence and contained the apparent consensus nucleotide and amino acid sequence for subrepeats within pCtBR1-1. SR3 was 97% (32/33 bp) homologous with the consensus nucleotide sequence. At position 215, Ado was substituted for Guo, resulting in a Lys codon instead of Arg. SR1 exhibited the greatest divergence from the subrepeat consensus sequence with 91% (30/33 bp) homology. The first three codons contained nucleotide substitutions: Guo instead of Ado at position 124, Thd instead of Ado at position 129, and Cyd instead of Thd at position 132. Only the first substitution resulted in a different amino acid (Glu replaced Lys at the first codon), while the latter two were silent mutations. Finally, the sequence from nucleotide positions 1 through 9 was completely homologous with the last 9 bp of the subrepeat consensus sequence. This implied that it represented the distal three codons at the 3'-end of the last subrepeat preceding INR1 (Fig. 2C).
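The coordinate bookkeeping in this description can be checked with a small sketch. It uses only numbers stated in the text (INR1 at positions 10 through 123, four 33-bp subrepeats, and the match counts 32/33 and 30/33) and assumes that the four subrepeats are contiguous, begin immediately after INR1, and each begin at the first base of a codon. The function names are illustrative, not part of the original analysis.

```python
# Coordinates are 1-based and inclusive, as in the text.
INR1 = (10, 123)
SUBREPEAT_BP = 33
N_SUBREPEATS = 4
SR_START = INR1[1] + 1  # assumed: the SR segment starts right after INR1


def span_length(span):
    """Length in bp of an inclusive (start, end) span."""
    start, end = span
    return end - start + 1


def subrepeat_span(k):
    """Span of subrepeat SRk (k = 1..4), assuming contiguous 33-bp units."""
    start = SR_START + (k - 1) * SUBREPEAT_BP
    return (start, start + SUBREPEAT_BP - 1)


def locate(position):
    """Return (subrepeat name, codon number, base within codon), all 1-based."""
    for k in range(1, N_SUBREPEATS + 1):
        start, end = subrepeat_span(k)
        if start <= position <= end:
            offset = position - start
            return (f"SR{k}", offset // 3 + 1, offset % 3 + 1)
    raise ValueError("position lies outside the SR segment")


def percent_identity(matches, total):
    """Percent identity rounded to the nearest whole number."""
    return round(100 * matches / total)


print(span_length(INR1))                        # length of INR1
print([subrepeat_span(k) for k in (1, 2, 3, 4)])
for pos in (124, 129, 132, 215):
    print(pos, locate(pos))
print(percent_identity(32, 33), percent_identity(30, 33))
```

Under these assumptions, positions 124, 129, and 132 fall in the first three codons of SR1, and position 215 falls in SR3, which agrees with the substitutions described above; the third-base locations of 129 and 132 are also consistent with the two substitutions being silent.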
Distribution of Amino Acids and Codons between INR and SR Segments-Certain amino acids and codons were not randomly distributed between INR and SR segments of the pCtBR1-1 insert. The INR1 segment was characterized by containing 70% (7/10) of the Arg residues found within the complete (INR1 + SR) 246-bp repeat (Table I). Less frequently occurring amino acids (Cys, Asn, Met, Phe, and Thr), that individually accounted for less than 7% of the residues, were also sequestered in INR1. The 4 Cys residues were [...] (Table I).
The SR segment contained the majority of Pro (12/13), Lys (12/16), Glu (5/7), and Ser (8/13) residues coded for by the 246-bp repeat (Table I). As a rule, Pro residues were always preceded by a charged amino acid and most frequently found within the tripeptides Lys-Pro-Ser or Arg-Pro-Glu. We noticed that the only Pro residue within INR1 (Table I) [...]
Table I. [...] from Balbiani ring 1 in C. tentans. Frequency of occurrence (residues/100) [...]

[...] Sequence-The abundant pattern of Lys-Pro-Ser and Arg-Pro-Glu tripeptides implied that the 33-bp subrepeats may have evolved by duplication and divergence of a smaller repeated nucleotide sequence. A computer-assisted alignment [...] prior to the INR1/SR1 junction and encompassing the adjacent SR segment (nucleotide positions 106 through 255). A 3-bp insertion was required, between the eighth and ninth codon [...] codon during the evolution of these repeats (Fig. 5). This essentially converted each subrepeat into a block of four 9-bp repeats. The 9-bp repeats exhibited polarity with regard to conservation in nucleotide selection at each position (Fig. 5). That is, bases were more conserved at the 5'-end of each 9-bp repeat than the 3'-end. In fact, no consensus nucleotide sequence was found for the last three nucleotides of the 9-bp [...]
Fig. 5. [...] within the complete sequence (Fig. 4). The sequence reads from left to right, top to bottom. The proposed locations of deleted codons (Δ) are indicated. Each row is designated by its origin in INR or SR segments of the insert (Fig. [...]), then consecutively numbered and lettered as indicated. The consensus nucleotide for each of the first [...] occurrence. Divergent nucleotides are circled. For the last three columns, related codons are shown along with their frequency of occurrence.
Fig. 6. [...] (Fig. 4). INR3 was assigned to a homologous portion of a previously published (24) 150-bp sequence from BR1 in C. tentans. INR4 and INR5 were assigned to internally nonrepeated segments of a BR1 cDNA clone (25). BRb indicates a homologous portion of a sequence reported (27) [...]
[...] 97% DNA sequence homology existed between these five INR segments. INR3 was the most divergent segment with four base changes resulting in three amino acid substitutions. None of the amino acid substitutions resulted in deviations from the distinctive features of INR segments described above.
Evidence was also found for interspecific conservation of INR segments in a homologous gene, BRb in C. thummi (27). The sequence of a 242-bp clone contained subrepeats and a region with a 50-bp overlap with INR segments from BR1 (Fig. 6). Although this represented only 44% (50/114 bp) of an INR segment, the overlapping region had 76% nucleotide sequence homology and the first 21 bp were identical. The most diverged portion of the BRb INR was near the 3'-end of the overlapped region; however, the characteristic Lys-Cys dipeptide in this region (codons 13 and 14 in INR1) was conserved despite a base substitution.
BR1 subrepeats also exhibited intragenic and interspecific sequence conservation. The published (24) sequence surrounding INR3 (in Fig. 6) directly corresponds to subrepeats in BR1. The 20 nucleotides preceding INR3 were identical with the consensus sequence for a BR1 subrepeat obtained in this report. The 16 nucleotides following INR3 were also identical with the 5'-end of SR1 and duplicated an INR1/SR1 junction. Similarly, subrepeats were found (25) on both sides of INR5 whose sequence varied by not more than three nucleotides from the 132-bp SR segment reported here. Most of the published BRb sequence (27) was comprised of six inexact copies of 8 to 11-codon subrepeats that contained the consensus tripeptide Pro-Ser-Lys. Remnants of the consensus sequence for a BR1 subrepeat were also present in BRb. For example, the 11 codons preceding the BRb/INR1 overlap (nucleotides 160 to 192 in Ref. 27) exhibited 79% nucleotide and 100% amino acid sequence homology with the SR3 sequence described here (Fig. 4). Lys-Pro and Arg-Pro dipeptides were abundant throughout BRb subrepeats. We noted that they were most frequently encoded (11/16 dipeptides) by the sequence AAA-CCA. This sequence occurred not only as the conserved hexanucleotide within the consensus tripeptide of BR1 subrepeats (Fig. 5), but also as part of the consensus sequence AGC-AAA-CCA found for subrepeats in BR2 (26).
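Why recurrent use of AAA-CCA indicates conservation at the nucleotide level, and not merely at the amino acid level, can be seen by counting the synonymous alternatives. The sketch below builds the standard genetic code, translates the hexanucleotide, and enumerates every hexanucleotide that would encode the same Lys-Pro dipeptide. The helper names are illustrative assumptions; one-letter amino acid codes are used (K = Lys, P = Pro).

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def codons_for(amino_acid):
    """All codons that encode the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def hexanucleotides(first_aa, second_aa):
    """All 6-bp sequences that encode the dipeptide first_aa-second_aa."""
    return [a + b for a in codons_for(first_aa) for b in codons_for(second_aa)]


print(translate("AAACCA"))             # dipeptide encoded by AAA-CCA
lys_pro = hexanucleotides("K", "P")
print(len(lys_pro), sorted(lys_pro))   # synonymous ways to encode Lys-Pro
```

Because several different hexanucleotides would yield the same dipeptide, repeated use of a single one points to constraint on the nucleotide sequence itself.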
It was recently postulated (26) that 215-bp tandem arrays in BR2 in C. tentans evolved from an ancestral 105-bp sequence that remained conserved while it became interspersed as a result of reduplication of a terminal 9-bp sequence that eventually resulted in six adjacent and divergent 18-bp repeats. The BR1 sequence organization found in this study (i.e. 246-bp arrays comprised of INR segments flanked by tandem 9-bp repeats) supported such a model and suggested that the ancestral sequence for BR1 in C. tentans was also about 105 to 114 bp in length and may be represented by segment INR1. We have noted that INR1 shares about 70% sequence homology with a similar segment in BR2. A detailed intergenic (BR1 versus BR2) comparison of INR segments will be made in conjunction with the description of a variant BR2 sequence (56).
The fundamental functioning unit within secretory polypeptides in Chironomus may be reflected by 200 to 300-bp repeats within BR genes. In turn, conserved and variable domains within these units may be represented by alternating INR and SR segments, respectively. Variable domains have tolerated limited divergence in amino acid composition and length due to variations in short tandemly repeated sequences. However, certain key characteristics were retained, reflected by the occurrence of prevalent di- and tripeptides. Homologous but unequal crossing over between tandemly repeated sequences can lead to variations in nucleotide sequence and length (52) that can presumably lead to expansion or contraction of gene size (53). To better understand the evolution of BR genes it is desirable to identify major sequence variants that might occur, as well as obtain sequence data from other functionally related genes that code for salivary polypeptides (54, 55).
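How unequal crossing over could expand or contract a tandem array can be sketched with a deliberately simple model. It assumes two arrays with the same number of identical 246-bp repeats that pair out of register by a whole number of repeats and then recombine once; the copy numbers used are hypothetical, and the function names are illustrative.

```python
REPEAT_BP = 246  # repeat length reported for the pCtBR1-1 insert


def unequal_crossover(n_copies, shift):
    """Copy numbers of the two products of one out-of-register exchange.

    Two arrays of n_copies repeats pair with a misalignment of `shift`
    repeats; a single crossover yields one longer and one shorter array.
    """
    if not 0 <= shift < n_copies:
        raise ValueError("shift must be between 0 and n_copies - 1")
    return n_copies + shift, n_copies - shift


def array_length_bp(n_copies, repeat_bp=REPEAT_BP):
    """Length of a tandem array in bp, assuming repeats of equal length."""
    return n_copies * repeat_bp


# Hypothetical example: 10-copy arrays misaligned by 2 repeats.
longer, shorter = unequal_crossover(10, 2)
print(longer, array_length_bp(longer))
print(shorter, array_length_bp(shorter))
```

The model ignores sequence divergence between repeats and exchanges that fall within a repeat, both of which would also alter nucleotide sequence as well as length.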