Identification of a Novel 9-kDa Polypeptide from Nuclear Extracts: DNA BINDING PROPERTIES, PRIMARY STRUCTURE, AND IN VITRO EXPRESSION*

Using a modified DNA mobility shift assay, we have identified and purified a novel 9-kDa polypeptide (designated p9) from plasmacytoma nuclear extracts which forms salt-stable DNA-protein complexes without apparent sequence specificity. In competition studies, p9 bound exclusively with naturally occurring DNA relative to RNA and preferentially with polydeoxypyrimidines among 10 homopolynucleotides tested. Nuclear extracts prepared from various murine and human cell lines contained a common factor which, when cross-linked with photoreactive [32P]DNA, co-migrated with covalent [32P]DNA-p9 complexes on polyacrylamide gels. An oligonucleotide, constructed on the basis of the N-terminal 29 amino acid residues of p9, was employed to isolate a 700-base pair cDNA clone encoding the complete polypeptide. On Northern blots, p9 cDNA hybridized with a mRNA species of comparable size from both mouse and human cell lines, suggesting a significant degree of interspecies sequence conservation. Amino acid and cDNA sequence analyses demonstrated that p9 derives from the 77-residue C-terminal domain of a 14-kDa polypeptide comprised of 127 amino acids. DNA binding activity was exhibited by peptides synthesized in vitro from run-off RNA transcripts corresponding to the truncated 9-kDa C-terminal domain, but not to the 14-kDa precursor, implicating proteolysis as a post-translational mechanism required for

* This work was supported by the Howard Hughes Medical Institute. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

503750.

‡ To whom correspondence should be addressed: Howard Hughes Medical Inst., Yale University School of Medicine, 310 Cedar St., New Haven, CT 06510.

Multiple cellular proteins which bind single-stranded DNA or RNA in a sequence-independent fashion in vitro, referred to as single-stranded binding proteins, have been identified in both prokaryotic and eukaryotic organisms (reviewed in Ref. 1). Nucleic acid polymers immobilized on solid chromatographic supports have commonly been employed in biochemical approaches to identify such proteins (2, 3), which may play physiological roles in various aspects of nucleic acid metabolism (1, 3). Particularly in the case of higher eukaryotes, for which genetic analyses are limited, efforts to define proteins characteristic of this group in functional and structural contexts have been complicated by several factors (1), including (i) apparent adventitious nucleic acid binding by proteins of known, presumably unrelated function (4) and (ii) heterogeneity resulting from in vivo and/or in vitro proteolysis (2, 5, 6). For example, the complete primary structure of single-stranded DNA-binding protein UP1 from calf thymus has recently been shown to be 98% homologous to the N-terminal portion of hnRNA-associated protein A1 from HeLa cells (7, 8), suggesting that some members bear structural precursor/product relationships. However, amino acid and DNA sequence information necessary for more extensive homology comparisons is limited at present (1).
The altered mobility of DNA-protein complexes during polyacrylamide gel electrophoresis under nondenaturing conditions (9, 10) has recently been exploited in a powerful methodology for identification of sequence-specific regulatory DNA-binding proteins in a number of eukaryotic gene systems (11-13). Our own experience with the DNA mobility shift assay in the study of a sequence-specific DNA-binding protein from murine plasmacytoma cells (14) prompted us to examine its use, in modified form, as a method for detection of proteins in crude nuclear extracts which tenaciously bind DNAs of unrelated sequence. Presented here are functional and structural properties of a 9-kDa DNA-binding protein (designated p9) initially identified using these criteria. The complete nucleotide sequence of an isolated cDNA encoding p9 revealed that p9 is indeed a proteolytic fragment of a larger 14-kDa polypeptide (designated p14). Results from UV cross-linking and nucleic acid hybridization studies strongly suggest that p9 is an evolutionarily conserved protein expressed in both mouse and human tissues. The primary structure of p9 (and p14) is unique relative to any known DNA-binding protein.

RESULTS
To initially characterize DNA-binding proteins in nuclear extracts which have no stringent sequence specificity, we investigated the applicability of the DNA mobility shift assay originally described by Fried and Crothers (10). DNA binding reactions were carried out by incubating nuclear extracts with [32P]DNA probes of limited size (≤30 base pairs) (13) at high salt concentrations (0.4 M KCl) in the absence of any competitor DNA.
[32P]DNA-protein complexes formed under these conditions were then conventionally detected by virtue of their altered mobilities relative to that of noncomplexed [32P]DNA on polyacrylamide gels (9, 10). As shown in Fig. 1A, analysis of crude nuclear extracts (lane CNE) from a murine plasmacytoma (J558L) by this method revealed two major [32P]DNA-protein complexes with reduced electrophoretic mobility, designated C1 and C2. Complex C1 was similarly detectable (i) in reactions carried out at 1 M KCl, (ii) with various unrelated [32P]DNA probes of similar size from polylinker sequences of plasmid pUC19 (19), and (iii) using filtrates derived by passage of nuclear extracts through membranes possessing a 30-kDa cutoff, indicative of a salt-stable DNA-binding protein of relatively low molecular weight (data not shown).

"Experimental Procedures" are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are included in the microfilm edition of the Journal that is available from Waverly Press.

FIG. 1. ... isolated from 15% polyacrylamide-SDS gels (10 ng; lane RN) were mixed with [32P]DNA in standard binding reactions. A, relative mobilities of [32P]DNA-protein complexes (C1 and C2) and free [32P]DNA (F) revealed by electrophoresis on 4% polyacrylamide gels and autoradiography. B, 15% polyacrylamide-SDS gel autoradiogram of reaction mixtures subjected to UV cross-linking and nuclease digestion. Purified p9 (5 µg) was run in an adjacent lane as a marker (lane MK) and stained with Coomassie Blue to illustrate relative mobility differences. C, 15% polyacrylamide-SDS gel electrophoretic patterns of protein fractions (2-50 µg) visualized by Coomassie Blue staining (the relevant 7-kDa species stains very poorly with silver). Relative mobilities and sizes (in kilodaltons) of protein standards (lane STD) are given to the left of B and C.
A possible candidate for such a protein was identified by DNA-protein cross-linking experiments. Nuclear extracts were incubated with bromodeoxyuridine-substituted [32P]DNA, covalently cross-linked by exposure to UV light, and subjected to 15% polyacrylamide-SDS gel electrophoresis after exhaustive nuclease digestion to estimate the size of the protein component. Autoradiographs of the gel-fractionated products revealed a major 8-10-kDa species (Fig. 1B, lane CNE). Covalently linked [32P]DNA protected by protein from nuclease digestion, when isolated by protease treatment and phenol extraction, co-migrated with DNA standards two to four nucleotides in length on 15% polyacrylamide-urea sequencing gels (data not shown). From these data, the polypeptide component could therefore migrate independently as a significantly smaller species.
In order to identify and purify the factor(s) responsible for DNA-protein complex C1, we fractionated crude nuclear extracts using three column chromatography steps and followed the DNA binding activity by mobility shift and UV cross-linking analyses. Extracts were first passed over DE 52 cellulose, which effectively removed contaminating DNA and all detectable activity corresponding to [32P]DNA-protein complex C2. Complex C1 activity in the flow-through fraction was completely adsorbed by passage over Cibacron blue-agarose and eluted between 1 and 2 M KCl. DNA-binding proteins in these eluates were adsorbed to single-stranded DNA-agarose and then fractionated on the basis of their differential DNA-protein complex stabilities as a function of ionic strength. Fractions eluted with 1 M KCl (following a 0.6 M KCl wash) were highly enriched in a 7-kDa polypeptide (referred to hereafter as p9 for reasons described below) and contained all DNA binding activity corresponding to mobility shift complex C1 and the 8-10-kDa cross-link species (Fig. 1, A-C, lane DS). We roughly estimate that p9 represents at least 0.01% of nuclear extract protein by weight. Similar activity profiles were obtained with homogeneous p9 samples obtained by preparative 15% polyacrylamide-SDS gel electrophoresis, elution, and renaturation (Fig. 1, A-C, lane RN), verifying that p9 was solely responsible for both DNA binding activities. From these mobility shift assays, we estimate a maximum KD value for p9 to be M for double-stranded DNA. Nuclear extracts from a panel of mouse and human cell lines all contained a factor detectable among UV cross-linked products with a comparable electrophoretic mobility relative to the 8-10-kDa complex formed by p9 of J558L origin (Fig. 2).
Establishment of p9 as the polypeptide which bound DNA to effect formation of mobility shift complex C1 facilitated further fundamental studies of p9 nucleic acid binding properties. In competition experiments, natural and synthetic nucleic acids at comparable concentrations were included in p9 DNA binding reactions and then evaluated by mobility shift analysis for their ability to block formation of complex C1. As shown in Fig. 3, the results indicated that p9 exhibited preferential binding with phage single-stranded DNA relative to either its duplex counterpart (15-fold) (panels 1 and 2) or single-stranded RNA from two sources (>60-fold) (panels 1, 9, and 10). Inhibitory capacities of various synthetic polynucleotides tested suggested a marked effect of base composition on relative p9 binding affinities and, excluding poly(G), indicated an absolute preference for pyrimidine deoxyribohomopolymers. Of all nucleic acids tested, poly(me5dC) effected inhibition of [32P]DNA-p9 complex formation at the lowest input concentrations (panel 8). Subsequent experiments carried out at lower concentrations demonstrated a 10-20-fold greater affinity for poly(me5dC) over poly(dC). Results obtained with homogeneous p9 preparations renatured from 15% polyacrylamide-SDS gels were consistent with these patterns (see Fig. 8; data not shown). Considering the polydisperse nature of these synthetic substrates and the lack of information concerning p9 binding stoichiometry and cooperativity, further studies will be required for more quantitative measurements of these differences.
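One way to put fold preferences such as those quoted above on a common footing is to compare, for each competitor, the concentration that halves formation of complex C1. The sketch below is illustrative only: the half-inhibition criterion, the log-scale interpolation, the function names, and the titration values are all assumptions made for the example, not the procedure or the data of this study.

```python
import math


def half_inhibition_concentration(concentrations, fraction_complex):
    """Interpolate, on a log scale, the competitor concentration at which
    the complex signal falls to 50% of its uncompeted value.

    Both arguments are hypothetical titration data: competitor
    concentrations (any consistent unit, all > 0) and the matching
    fraction of complex remaining (1.0 means no inhibition).
    """
    points = sorted(zip(concentrations, fraction_complex))
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        # Find the pair of neighbouring points that brackets 50%.
        if f_lo >= 0.5 >= f_hi and f_lo != f_hi:
            t = (f_lo - 0.5) / (f_lo - f_hi)
            log_c = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("titration does not cross 50% inhibition")


def fold_preference(c50_preferred, c50_other):
    """How many times more of the weaker competitor is needed."""
    if c50_preferred <= 0 or c50_other <= 0:
        raise ValueError("concentrations must be positive")
    return c50_other / c50_preferred


# Made-up titrations, chosen only to illustrate the calculation.
ss_dna = half_inhibition_concentration([1, 10, 100], [0.9, 0.5, 0.1])
ds_dna = half_inhibition_concentration([15, 150, 1500], [0.9, 0.5, 0.1])
fold = fold_preference(ss_dna, ds_dna)
print(f"fold preference for the first competitor: {fold:.1f}")
```

A ratio obtained this way is only a relative measure; as noted above, polydisperse substrates and unknown stoichiometry and cooperativity limit how far such numbers can be interpreted.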
FIG. 2. ... incubated with photoreactive [32P]DNA, exposed to UV light, treated with nuclease, and subjected to electrophoresis on a 15% polyacrylamide-SDS gel. Reaction mixtures contained 10 µg of crude nuclear extract with the following exceptions: lane 1, 5 ng of purified p9; lane 2, 2 µg of bovine serum albumin (BSA). Experimental controls not shown included (i) protein alone, (ii) probe alone, (iii) UV irradiation omitted, and (iv) proteinase K treatment (21). In each case, no labeled species was detected, demonstrating the absence of stable [32P]DNA-protein adducts. Cell line sources not identified elsewhere (see Fig. 6 legend) were: WEHI-231, murine B lymphoid; MethA, murine fibrosarcoma; and COS-1, simian kidney. Relative mobilities and sizes (in kilodaltons) of protein standards are given to the left. Autoradiographic exposure time was 5 days. Whereas relative band intensities for a given sample varied somewhat, the 8-10-kDa cross-link species was always detectable, including experiments carried out with an unrelated photoreactive probe (data not shown).

In order to design an oligonucleotide probe for isolation of a cDNA encoding p9, highly purified p9 samples were analyzed by gas-phase sequencing, yielding the following N-terminal amino acid sequence: SSKQSSSSRDDNMFQIGKMRYVSVRDF(K)(G)XI (where X denotes an unassignable residue and parentheses denote weak phenylthiohydantoin-derivative signals). An 86-base anti-sense oligonucleotide probe (5'-CCCTTGAAGTCCCGCACAGACACATACCGCATCTTGCCAATCTGGAACATGTTGTCATCCCGGGAGGAGGAGGACTGCTTGGAGGA-3') was synthesized, based on the first 29 amino acid residues and the most probable codon usage (29), and used to screen a J558L cDNA library in λgt10. Six independent cDNA clones which hybridized strongly with the probe (the homology was found to be 79%) were subcloned into pUC19. Restriction mapping established that these overlapping clones fell into two size classes of approximately 700 and 3,000 base pairs, as typified by clones 86-2 and 86-9 diagramed in Fig. 4. Complete nucleotide sequences were determined for both of these overlapping clones, a composite of which is given in ... With the addition of a typical poly(A) tail of 100 or more residues (40), 86-2 (698 base pairs including the 31 A residues at its 3' terminus) is sufficiently long to account for the size of the major mRNA transcript (approximately 800 bases) detected on Northern blots (described below).
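The relationship between the anti-sense probe and the N-terminal sequence can be checked by reverse-complementing the 86 bases and translating the result with the standard genetic code. The sketch below does this; the helper names are illustrative, and the probe string is the sequence quoted above with its line-break hyphens removed.

```python
# The 86-base anti-sense probe as quoted in the text (5' to 3').
PROBE = (
    "CCCTTGAAGTCCCGCACAGACACATACCGCATCTTG"
    "CCAATCTGGAACATGTTGTCATCCCGGGAGGAGGA"
    "GGACTGCTTGGAGGA"
)

# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from the first base; ignore a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


sense = reverse_complement(PROBE)
peptide = translate(sense)
print(len(PROBE), peptide, sense[-(len(sense) % 3):])
```

The sense strand reads through 28 complete codons that match the reported sequence from Ser1 to the weakly assigned Lys28, and ends with the first two bases (GG) of a further codon, which is consistent with the tentative Gly at position 29.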
Excluding Trp59 through Met96, all codon assignments given for p9 (Fig. 5) were based on both the cDNA sequence and amino acid sequence data (N-terminal and cyanogen bromide fragments), which were in complete agreement. The calculated molecular masses of p9 and the 127-residue precursor (designated p14) were 9,018 and 14,427 Da, respectively. The amino acid sequence determined for the C-terminal cyanogen bromide peptide (Glu... through Leu127) confirmed that Leu127 is the ultimate residue of p9, in agreement with the cDNA sequence. This finding further precludes the possibility that noted discrepancies in p9 size determined from 15% polyacrylamide gels (7 kDa) and from cDNA sequence translation (9 kDa) are due to proteolysis at the p9 C terminus.

The UV cross-linking data presented in Fig. 2 indicated that DNA-binding proteins similar in size to p9 were expressed in all cells tested. These data fail to establish absolute identity of the relevant cross-linked species with p9, as other DNA-binding proteins of this size could account for the signal. To obtain additional evidence for the prevalence of p9, Northern blot analyses of total cellular RNA from a variety of cell lines were carried out using the 700-base pair cDNA (86-2) as a probe (Fig. 6). These studies identified a major mRNA species of approximately 800 bases in all murine cell lines tested, which is therefore apparently ubiquitous in its tissue-type distribution. This mRNA species was also readily detectable in a number of human cell lines tested under fairly stringent wash conditions (0.04 M Na+, 65 °C), suggesting a significant degree of sequence conservation at least among mammals. Another minor mRNA species of approximately 3.4 kilobases was also detected in all cell lines examined, but varied considerably in abundance. This transcript very likely corresponds to the 2.9-kilobase cDNA clone (86-9) which shares a sequence overlap with the shorter 86-2 cDNA probe (Fig. 4) and presumably arises from transcriptional readthrough at the major polyadenylation site (Fig. 5). A third mRNA species (1.5 kilobases) has so far been detected only in human cell lines.
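As a quick consistency check on the sizes reported in this section (p9 spanning Ser51 through Leu127, calculated masses of 9,018 and 14,427 Da, and a 698-base pair cDNA set against an mRNA of roughly 800 bases), the arithmetic can be laid out as below. This is a sketch under stated assumptions: the poly(A) tail is taken at the lower bound of "100 or more" residues, the 31 A residues in clone 86-2 are assumed to be part of that tail, and the variable names are illustrative.

```python
# Values quoted in the text.
P9_FIRST_RESIDUE = 51        # Ser51, the p9 N terminus
P14_LENGTH = 127             # Leu127 is the last residue of p14 and p9
P9_MASS_DA = 9_018
P14_MASS_DA = 14_427
CDNA_86_2_BP = 698           # includes the 31 A residues at the 3' terminus
A_RESIDUES_IN_CLONE = 31

# Assumption: lower bound of the "100 or more" residue poly(A) tail.
ASSUMED_POLY_A_TAIL = 100

# Ser51 through Leu127 should give the 77-residue C-terminal domain.
p9_length = P14_LENGTH - P9_FIRST_RESIDUE + 1

# Mean mass per residue, a rough plausibility check on the calculated masses.
mean_residue_mass_p9 = P9_MASS_DA / p9_length
mean_residue_mass_p14 = P14_MASS_DA / P14_LENGTH

# Minimum transcript length if the cloned A run is replaced by a full tail.
min_transcript_length = CDNA_86_2_BP - A_RESIDUES_IN_CLONE + ASSUMED_POLY_A_TAIL

print(f"p9 length: {p9_length} residues")
print(f"mean residue mass: p9 {mean_residue_mass_p9:.1f} Da, p14 {mean_residue_mass_p14:.1f} Da")
print(f"minimum transcript length: {min_transcript_length} bases")
```

Under these assumptions the transcript would be at least 767 bases, which is in line with the approximately 800-base mRNA seen on Northern blots.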
These results implicating interspecies conservation prompted an examination of p9-related sequences at the genomic level. When murine (J558L) and human (HeLa) DNAs were compared by Southern blot analysis (Fig. 7), a complex banding pattern emerged. Hybridization of the 86-2 probe with murine DNA produced multiple bands with apparent sequence similarity to this cDNA. Analysis of human DNA revealed a similar number of hybridizing bands, but for any given digest, the specific restriction pattern bore no resemblance to the murine counterpart. Subsequent probing of murine DNA with the 2.9-kilobase cDNA (86-9) revealed a primary band in each digest which hybridized most intensely and thus probably identifies a genomic segment which corresponds to that cloned message (data not shown).

[Figure residue: partial nucleotide sequence from the composite cDNA sequence, ending at position 2786.]
To examine the functional relationship between p9 and the p14 precursor, full-length and C-terminal peptides were synthesized in vitro using run-off RNA generated from SP65/cDNA templates (36). Chimeric DNA templates for p9 synthesis (designated SP65-p9) were constructed by fusion of an ATG translational start codon to codon 48 of the cDNA, resulting in an additional 3 amino acid residues (Met-Leu-Ala) relative to the p9 N-terminal residue (Ser51). For p14 synthesis, the complete cDNA was subcloned into SP65 (designated SP65-p14) utilizing the authentic Met initiator at codon 1. Both plasmid constructs were linearized at an RsaI site 21 nucleotides downstream of the TAA termination codon, and run-off transcripts generated were translated in rabbit reticulocyte lysates using [35S]methionine for product analysis. As shown in Fig. 8A, the major 35S-labeled product derived from SP65-p14 migrated on SDS-polyacrylamide gels as an 18-kDa species, markedly larger than that predicted from the cDNA sequence (14 kDa). Such discrepancies in molecular mass estimates based on these two criteria have been noted previously for other DNA-binding proteins (39, 41). Removal of coding sequences corresponding to the N-terminal 48 amino acids resulted in synthesis of two truncated C-terminal peptides (7 and 4.5 kDa), the larger of which co-migrated with purified p9 from nuclear extracts. Surprisingly, both C-terminal peptides were also evident, albeit at substantially reduced levels (<10%), among 35S-labeled products synthesized from full-length SP65-p14 templates (Fig. 8A, lane 2), suggesting the possibility of limited p14 proteolysis. Similar results were obtained with addition of protease inhibitors (soybean trypsin inhibitor and ovomucoid) to the translation lysates (not shown).
In initial attempts to evaluate DNA binding activities of p14 and p9 synthetic peptides by mobility shift assay, results from mixing experiments indicated that certain factors of reticulocyte lysate origin interfered with detection of DNA binding activity by renatured p9 samples (Fig. 8B). These factors, as well as all intrinsic DNA binding activity detectable under our conditions, could be quantitatively precipitated by addition of saturated ammonium sulfate to 60% (Fig. 8B). Supernatants obtained from this lysate fractionation step, which contained >85% of trichloroacetic acid-insoluble protein labeled with [35S]methionine, were then tested for DNA binding activity. As shown in Fig. 8C, translation products corresponding to the C-terminal (p9) domain of p14 (lanes 10-12) contained DNA-binding factors which exhibited [32P]DNA mobility shift properties and synthetic homopolymer specificity characteristic of p9 derived from nuclear extracts (lanes 1-3). In contrast, synthetic p14 peptides displayed minimal DNA binding activity when tested at molar concentrations equivalent to that of synthetic p9 (lanes 7-9). Results from control experiments carried out under identical conditions with "mock" translation products generated from SP65 vector transcripts (lanes 4-6) verified that the observed [32P]DNA-protein complexes involved peptides specifically derived from p9 coding sequences.

DISCUSSION
In this report, we have determined the complete primary structure of a novel 9-kDa polypeptide isolated from nuclear extracts. Amino acid and cDNA sequence analyses demonstrated that p9 is a C-terminal proteolytic fragment of a 14-kDa precursor. The biological role of p9, which binds DNA in vitro at relatively high salt concentrations in a sequence-independent fashion characteristic of eukaryotic single-stranded binding proteins (see Introduction and Ref. 1), remains unknown. Furthermore, although our p9 fractionation scheme employed high salt extraction of nuclei, perturbation of the nuclear membrane could have effected significant bidirectional passive diffusion of relatively small polypeptides.
FIG. 8. ... Equimolar amounts of synthetic p14 and p9 were used (see A). Under our conditions (i.e. no fluorographic enhancement), 35S autoradiographic signals were not detected.
We currently have no direct evidence for either DNA binding or nuclear localization of p9 in vivo. Collectively, data obtained from analysis of p9 at the protein, RNA, and DNA levels strongly suggest a remarkable structural conservation and ubiquitous expression of this protein in disparate cell types of mammalian species as diverse as mouse and human. The number of bands hybridizing to p9 cDNA sequences on Southern blots and the amount of genomic DNA encompassed could reflect cross-hybridization with multiple related genes. Resolution of this issue awaits genomic cloning and characterization of the sequences involved.
A computer search of the Protein Data Base of the National Biomedical Research Foundation failed to identify any entry with significant primary structural homology to p14. Based on amino acid composition and molecular mass, p9 appears to be an extremely hydrophilic peptide with limited globular character, possessing a relatively high content of basic (18%) and acidic (17%) residues. Both p9 and p14 are devoid of cysteine residues, precluding intra- and interchain disulfide bond formation. Two serine-rich clusters located at the N-terminal portions of p14 (residues 4-19) and p9 (residues 51-58), which represent only 19% of the complete sequence, contain 15 of the 21 serine residues present in p14 (Fig. 5). Kumar et al. (8) have recently noted a single serine-rich cluster in the C-terminal domains of two other nucleic acid-binding proteins, namely the T4 gene 32 protein and human ribonucleoprotein-associated protein A1. The functional implications of these clusters are unclear; and we as yet have no evidence for serine phosphorylation or other post-translational modifications, excluding proteolysis, which are required for p9 DNA binding activity. In fact, preliminary studies have indicated that p9 prepared by automated chemical synthesis and subsequently renatured appears to possess in vitro DNA binding activity comparable to that of its cellular counterpart (data not shown).
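The serine-cluster figures quoted above can be reproduced from the residue ranges alone. The short sketch below recomputes the share of the p14 sequence covered by the two clusters and, as an added illustration that is not part of the original analysis, the enrichment of serine inside the clusters relative to the whole protein; the variable names are illustrative.

```python
# Values quoted in the text.
P14_LENGTH = 127
SERINE_CLUSTERS = [(4, 19), (51, 58)]   # inclusive residue ranges
SERINES_IN_CLUSTERS = 15
SERINES_TOTAL = 21

# Number of residues covered by the two clusters (ranges are inclusive).
cluster_residues = sum(end - start + 1 for start, end in SERINE_CLUSTERS)

# Share of the complete sequence that lies in the clusters.
cluster_fraction = cluster_residues / P14_LENGTH

# Share of all serines that fall inside the clusters.
serine_share = SERINES_IN_CLUSTERS / SERINES_TOTAL

# Illustrative extra: serine density in clusters versus the whole protein.
enrichment = (SERINES_IN_CLUSTERS / cluster_residues) / (SERINES_TOTAL / P14_LENGTH)

print(f"cluster residues: {cluster_residues} ({cluster_fraction:.0%} of p14)")
print(f"serines in clusters: {serine_share:.0%} of all serines")
print(f"serine enrichment in clusters: {enrichment:.1f}-fold")
```

The two clusters cover 24 of 127 residues, which rounds to the 19% stated in the text.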
Apparent proteolytic sensitivity and modulatory effects of the N-terminal domain on C-terminal DNA binding activity suggested from our in vitro translation/expression studies (Fig. 8) are properties characteristic in principle of other eukaryotic nucleic acid-binding proteins, such as hnRNA-associated protein A1 and high mobility group 1 protein. Limited in vitro proteolysis of either protein has been shown to yield a protease-resistant N-terminal fragment with enhanced helix-destabilizing activity (8, 42). In contrast, our data indicate that the 14-kDa precursor of p9 does not detectably interact with DNA, suggesting that the p9 C-terminal DNA-binding site is sequestered. It is possible that p9 DNA binding activity may be expressed in vivo as a consequence of controlled proteolysis of p14 between Ala50 and Ser51 by a mechanism under specific regulation, as has been suggested for other eukaryotic DNA-binding proteins (1, 8).