Mammalian Heterogeneous Nuclear Ribonucleoprotein Complex Protein A1: LARGE-SCALE OVERPRODUCTION IN ESCHERICHIA COLI AND COOPERATIVE BINDING TO SINGLE-STRANDED NUCLEIC ACIDS*

Characterization of mammalian heterogeneous nuclear ribonucleoprotein complex protein A1 is reported after large-scale overproduction of the protein in Escherichia coli and purification to homogeneity. A1 is a single-stranded nucleic acid binding protein of 320 amino acids and 34,214 Da. The protein has two domains. The NH2-terminal domain is globular, whereas the COOH-terminal domain of about 120 amino acids has a low probability of α-helix structure and is glycine-rich. Nucleic acid binding properties of recombinant A1 were compared with those of recombinant and natural proteins corresponding to the NH2-terminal domain. A1 bound to single-stranded DNA-cellulose with higher affinity than the NH2-terminal domain peptides. Protein-induced fluorescence enhancement was used to measure equilibrium binding properties of the proteins. A1 binding to poly(ethenoadenylate) was cooperative, with an intrinsic association constant of 1.6 × 10⁵ M⁻¹ at 0.4 M NaCl and a cooperativity parameter of 30. The NH2-terminal domain peptides bound noncooperatively and with a much lower association constant. With these peptides and with intact A1, binding was fully reversed by increasing [NaCl]; yet A1 binding was much less salt-sensitive than binding by the NH2-terminal domain peptides. A synthetic polypeptide analog of the COOH-terminal domain was prepared and was found to bind tightly to poly(ethenoadenylate). The results are consistent with the idea that the COOH-terminal domain contributes to A1 binding through both cooperative protein-protein interaction and direct interaction with the nucleic acid.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
‡ Recipient of a long-term European Molecular Biology Organization fellowship and on leave of absence from the Istituto di Genetica Biochimica ed Evoluzionistica, Consiglio Nazionale delle Ricerche, Pavia, Italy.
‖ Supported by National Institutes of Health Grant GM32370 and acknowledges a North Atlantic Treaty Organization grant for international cooperation in science.
** Supported by National Institutes of Health Grant GM31539.
Single-stranded DNA binding proteins have been purified from a variety of mammalian sources and characterized extensively (1-3). These proteins bind to single-stranded conformations of both RNA and DNA, and they can depress the Tm of both nucleic acids. Hence, these proteins have been termed nucleic acid helix-destabilizing proteins (HDPs)¹ (4, 5). HDPs are capable of in vitro stimulation of DNA polymerase α (1, 5, 6), and they have long been considered to be involved in DNA replication (1, 5) by analogy with procaryotic counterparts, such as bacteriophage T4 gene 32 protein (7). Genetic and biochemical studies indicate that gene 32 protein is involved in T4 DNA metabolism, and in addition, the gene 32 protein complexes with its own mRNA and regulates its own expression at the translation level (8-10).
In mammalian systems, HDPs are routinely purified as a family of species differing in size (in the range of 20-33 kDa) but closely related in primary structure (1, 2, 5). The predominant species have Mr = 24,000 and 27,000.
Experiments involving antibody probing of Western blots of unfractionated extracts of mammalian cells suggested that the native species of HDP in the cell is approximately Mr 35,000 (11). Since the purified protein is smaller, these results suggested proteolytic removal of a portion of the native HDP molecule during purification. Subsequent work further showed that HDP and the Mr 34,000 30 S hnRNP complex protein A1 were immunologically related and that the typical Mr 24,000 HDP could be prepared from purified A1 by limited in vitro proteolysis. Cobianchi et al. (13) recently isolated and sequenced a full-length cDNA for rat HDP. This cDNA contained an open reading frame for a 34-kDa protein (13). The precise sequence match of this cDNA-deduced protein and the sequence of purified hnRNP complex protein A1 (14) confirmed that HDP and A1 are identical. Recent cDNA and protein sequencing indicates that the A1/HDP proteins from human, calf, and rat are nearly identical in primary structure (14, 15). Thus, like histones H3 and H4, α-actin, and somatostatin-28, the primary structure of A1 has been strictly conserved over the 80 × 10⁶ years since divergence of mammalian orders (16).
We undertook the expression of the rat A1 cDNA in Escherichia coli in order to obtain amounts of the unfused 34-kDa protein sufficient for detailed studies of structure-function relationships. In this paper, we report the overexpression and rapid purification of milligram quantities of A1 protein from E. coli harboring a recombinant expression vector carrying the A1 cDNA coding sequence. Characterization of the purified recombinant protein revealed interesting details about the mechanism of its strong binding to single-stranded nucleic acids.
¹ The abbreviations used are: HDP, helix-destabilizing protein; hnRNP, heterogeneous nuclear ribonucleoprotein; ssDNA, single-stranded DNA; poly(εA), poly(ethenoadenylate); Lys-C, endoproteinase Lys-C.

RESULTS
We chose to overexpress the A1 protein from the open reading frame of a full-length cDNA (13) using the λ PL promoter-based bacterial expression system pRC23 (20). Placement of the cDNA in the vector positioned the ribosome binding site of the vector 8 bases upstream of the initiation codon in the cDNA. Cells transformed with the expression plasmid had large quantities of a new 34-kDa protein, and immunoblotting experiments with anti-A1 antibody revealed that the 34-kDa protein contained a reactive epitope. Using the purification procedure described here, we routinely obtained 50 mg of 34-kDa A1 protein from the soluble extract of 10 g of pelleted cells in 10-12 h. The final preparation of A1 is homogeneous by amino acid sequencing and by sodium dodecyl sulfate-polyacrylamide gel electrophoresis, and as expected, the purified protein itself is immunoreactive with anti-A1 antibody.
The amino acid composition of purified recombinant A1 is in agreement with that predicted from the nucleotide sequence of the cDNA, and sequencing of the protein demonstrated that residues 1-25 are in exact agreement with residues 2-26 of the sequence predicted from the cDNA; the purified protein does not contain the cDNA-predicted NH2-terminal methionine (13). Enzymatic cleavage of the protein was conducted with endoproteinase Lys-C (EC 3.4.99.30). The sequence at the COOH-terminal end of the protein was determined by sequencing of the only endoproteinase Lys-C peptide that did not contain lysine. The sequence matches the COOH-terminal sequence predicted from the cDNA nucleotide sequence. Several additional endoproteinase Lys-C peptides also were sequenced. These peptides, together with the NH2- and COOH-terminal sequences described above, account for ~40% of the complete A1 sequence. Finally, the amino acid compositions of the other Lys-C peptides were determined and were found to match peptides predicted from the cDNA sequence. Taken together, the results indicate a perfect match between the recombinant A1 protein sequence and the sequence predicted from the cDNA at 90% of the residues (Table I).

STRUCTURE-FUNCTION STUDIES
A1 Domain Structure and ssDNA-Cellulose Binding-A1 is a two-domain protein exhibiting a sharp transition at residues 195-200 between the globular NH2-terminal domain and the randomly structured COOH-terminal domain (13). We obtained three preparations of the NH2-terminal domain peptide for comparison with intact A1. These preparations were 1) the NH2-terminal domain isolated from calf thymus, termed UP1; 2) the NH2-terminal domain obtained by in vitro trypsinization of recombinant A1, termed P24*; and 3) a recombinant UP1 obtained by subcloning a truncated A1 cDNA into an appropriate expression vector.
Our initial structure-function experiments with recombinant A1 involved examination of the ssDNA-cellulose column chromatographic behavior of the intact protein and of products of limited proteolysis. We found that light trypsinization of A1 converted part of the protein to smaller species, all of which had the same NH2 terminus on sequencing. This indicated that conversion to the smaller species was due to truncation at the COOH-terminal end only. This mixture of A1 and truncated species then was chromatographed on an ssDNA-cellulose column. Elution was with a linear gradient of buffer C containing 0.4-1 M NaCl (Fig. 1). The two smaller species, 24 and 27 kDa, emerged from the column just after the start of the gradient. The 30-, 32-, and 34-kDa species emerged successively as the concentration of NaCl in the gradient increased.
In additional experiments, the 24-kDa species was produced by trypsin digestion of recombinant A1 as described by Kumar et al. (14) for the HeLa A1 protein. This truncated protein, termed P24*, lacked the COOH-terminal 124 amino acids of A1 and was analogous to UP1 isolated from calf thymus. The behavior of P24* during ssDNA-cellulose chromatography was identical to that of the 24-kDa species shown in Fig. 1. Finally, similar results also were obtained with the recombinant UP1. These results indicated that A1 binds much more tightly to ssDNA than do truncated species lacking either all or part of the COOH-terminal domain.
Quantitative Aspects of A1-Nucleic Acid Binding-A1-nucleic acid binding was evaluated spectrofluorimetrically using the fluorescent receptor poly(ethenoadenylate) (poly(εA)). Binding to proteins enhances poly(εA) fluorescence such that the amount of fluorescence increase is directly proportional to the amount of protein bound. Hence, titration curves of A1 with a fixed level of poly(εA) can reveal the equilibrium concentrations, in a binding mixture, of both free A1 and A1-poly(εA) complex.
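The proportionality between fluorescence enhancement and bound protein can be turned into bound and free concentrations with simple arithmetic. The following is a minimal sketch, not the procedure used in this study: the function name, its arguments, and the sample numbers are all hypothetical, and it assumes that a saturating fluorescence level and the site size n have been measured separately.

```python
def bound_and_free(f_obs, f_zero, f_sat, protein_total, nucleotide_total, n):
    """Split total protein into bound and free pools from one fluorescence reading.

    Assumes the fluorescence increase is directly proportional to the amount
    of protein bound, and that f_sat is the signal when every n-nucleotide
    site on the polynucleotide is occupied. Concentrations are molar.
    All names are illustrative assumptions.
    """
    if f_sat <= f_zero:
        raise ValueError("f_sat must exceed f_zero")
    # Fraction of the maximal enhancement reached, clipped to the range 0..1.
    theta = (f_obs - f_zero) / (f_sat - f_zero)
    theta = min(max(theta, 0.0), 1.0)
    # At saturation, nucleotide_total / n protein molecules are bound.
    bound = min(theta * nucleotide_total / n, protein_total)
    free = protein_total - bound
    # Binding density: protein bound per nucleotide residue.
    nu = bound / nucleotide_total
    return bound, free, nu


# Illustrative numbers only (not data from this study).
bound, free, nu = bound_and_free(1.5, 1.0, 2.0, 1.0e-6, 12.0e-6, 12)
print(bound, free, nu)
```

Pairs of binding density and free protein concentration obtained this way are the inputs needed for a Scatchard-type analysis.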
Under conditions of low ionic strength, 0.01 M NaCl, the overall association constant is sufficiently high that all of the A1 added to the mixture is complexed with poly(εA) at subsaturating ratios of protein to polynucleotide (Fig. 2). Saturation of the polynucleotide with protein corresponded to 12 nucleotide residues/protein monomer (n). In order to determine the intrinsic association constant and degree of binding cooperativity, a titration was conducted in the presence of 0.4 M NaCl, conditions where each binding mixture contained equilibrium concentrations of both A1-poly(εA) complex and free A1. The shape of the titration curve suggested positive cooperativity, and this was confirmed using a modified Scatchard analysis, as described by McGhee and von Hippel (29) (Fig. 2b). With n fixed at 12, a nonlinear least squares curve fitting procedure gave a best fit with the cooperativity parameter, ω, of 30 and the intrinsic association constant, K, of 1.45 × 10⁵ M⁻¹. For comparison, two other curves are shown where n was fixed at 12, ω was fixed at 10 or 50, and K was varied for best fit (Fig. 2b). The value of ω was examined with two curves (not shown) where n and K were fixed at 12 and 1.45 × 10⁵ M⁻¹, respectively, and ω was either 20 or 40. These curves have roughly the same shape as curves 1 and 2 in Fig. 2b but do not fit the data; at ν of 0.04, ω of 20 gave ν/L of 15.8 × 10⁴ and ω of 40 gave ν/L of 27.1 × 10⁴, whereas ω of 30 gave 21.1 × 10⁴.
[Fig. 2b legend, partial: Three computer-derived theoretical curves (29) are shown, each with n fixed at 12. The cooperativity parameter (ω) and intrinsic association constant (K) were varied as shown. A curve assuming K = 1.5 × 10⁵ M⁻¹ and ω = 30 fit the data points closely.]
A series of similar poly(εA) binding experiments with UP1 from calf thymus were conducted (Fig. 3). In 0.01 M NaCl, saturation of the polynucleotide with UP1 corresponded to between 5 and 7 nucleotide residues/protein monomer. Curve fitting of data obtained in the presence of 0.05 M NaCl with n fixed at 7 gave a best fit with K of 9 × 10⁵ M⁻¹ and ω of 1. Thus, this NH2-terminal domain protein does not exhibit cooperativity.
The original level of fluorescence with A1 and UP1 could be restored by addition of NaCl (Fig. 4a); the point of one-half reversal was ~0.4 and 0.08 M NaCl, respectively, for A1 and UP1. The extent of binding at each NaCl concentration along the NaCl reversal curve can be used to calculate the overall association constant, Kω (4, 30). A plot of log Kω versus log[Na+] was linear for both A1 and UP1 (Fig. 4b). The slopes of the plots for A1 and UP1 were −1 and −4, respectively, illustrating that A1 binding was much less sensitive to salt reversal than UP1 binding. In additional experiments not shown, P24* was examined. We obtained results virtually identical to those with calf thymus UP1 shown in Figs. 3 and 4. The results of these poly(εA) binding comparisons are summarized in Table II. In the presence of 0.4 M NaCl, A1 binding was about 10,000 times stronger than UP1 binding. This difference was not fully accounted for by the cooperativity parameter.
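The modified Scatchard analysis of McGhee and von Hippel (29) relates ν/L to the binding density ν through n, K, and ω. As a hedged illustration, the sketch below evaluates the standard closed-form expressions of that model (cooperative and noncooperative). The function name and code layout are assumptions for illustration, and this is not the curve-fitting program used in this work. With n = 12, K = 1.45 × 10⁵ M⁻¹, and ω = 30 it gives roughly 2.1 × 10⁵ M⁻¹ at ν = 0.04.

```python
import math


def mcghee_von_hippel(nu, K, n, omega):
    """Return nu/L (M^-1) for a ligand covering n lattice residues.

    nu    : binding density (ligand bound per lattice residue)
    K     : intrinsic association constant (M^-1)
    n     : site size in residues
    omega : cooperativity parameter (1 means noncooperative)
    """
    if not 0.0 <= nu < 1.0 / n:
        raise ValueError("nu must lie in [0, 1/n)")
    gap = 1.0 - n * nu  # fraction of lattice residues left uncovered
    if omega == 1.0:
        # Noncooperative form of the model.
        return K * gap * (gap / (1.0 - (n - 1) * nu)) ** (n - 1)
    R = math.sqrt((1.0 - (n + 1) * nu) ** 2 + 4.0 * omega * nu * gap)
    term1 = ((2.0 * omega - 1.0) * gap + nu - R) / (2.0 * (omega - 1.0) * gap)
    term2 = (1.0 - (n + 1) * nu + R) / (2.0 * gap)
    return K * gap * term1 ** (n - 1) * term2 ** 2


# Compare cooperativity values at a fixed binding density.
for omega in (20, 30, 40):
    print(omega, mcghee_von_hippel(0.04, 1.45e5, 12, omega))
```

Holding n and K fixed while varying ω, as in this loop, shows how strongly ν/L at a given binding density depends on the cooperativity parameter.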
Cooperative Binding of Recombinant A1 to ssDNA-To further document cooperative binding by A1, gel retardation analysis (31) using fd DNA as the single-stranded nucleic acid ligand was conducted. In this analysis binding mixtures were incubated at 25 °C and then subjected to agarose gel electrophoresis, where nucleoprotein complexes migrate more slowly than free fd DNA. Typical results are shown in Fig. 5a. About 0.4 nmol of A1 (lane 7) was sufficient to complex all of the fd DNA in the mixture (3.2 nmol of nucleotide residues), and the nucleoprotein complexes in this mixture migrated in a relatively sharp band. This complex probably corresponded to fd DNA molecules fully saturated with A1. With a smaller amount of A1 (lane 3), some of the fd DNA molecules remained free whereas other molecules were complexed. Thus, even though the protein amount was insufficient to fully saturate all of the fd DNA in the binding mixture, those complexes formed appeared to represent many A1 monomers/DNA molecule. These results agree well with the cooperativity and estimate of nucleotides covered (n = 12) in the poly(εA) binding studies.
The fd DNA gel retardation pattern observed with recombinant UP1 (Fig. 5b) was different from that with A1. At each level of UP1, all of the fd DNA was complexed, and the extent of retardation of the nucleoprotein complex was proportional to the amount of protein in the binding mixture. This is consistent with noncooperative binding and agrees with our results on UP1 binding to poly(εA).
Synthetic Analog of the COOH-terminal Domain Peptide and Poly(εA) Binding-The results described above indicated that the presence of the COOH-terminal domain in A1 causes the protein to bind to poly(εA) or fd DNA with higher affinity and positive cooperativity. To gain insight into the mechanism of the COOH-terminal domain effect, we prepared a synthetic peptide resembling the native domain and studied its binding to poly(εA). Efforts to express the COOH-terminal domain peptide itself in E. coli from the appropriate cDNA segment were not successful, and similarly, attempts to isolate the domain peptide after mild trypsin digestion failed.
The ~120-amino acid native COOH-terminal domain is composed of repeated units of about eight amino acids with a consensus sequence of GN(F/Y)GG(G/S)RG. We prepared a 16-residue synthetic oligopeptide representing two of the repeats, GNFGGGRGGNYGGSRG. The oligopeptide then was polymerized, and the products were size-selected to obtain higher Mr molecules, about 80% of which were Mr = 12,000. Hence, this synthetic polypeptide was approximately the same size as the native COOH-terminal domain and had about the same repeat interval of aromatic residues and arginine residues. Poly(εA) binding analysis with the synthetic polypeptide revealed a strong fluorescence enhancement to the same saturation level seen with A1 (Fig. 6). One-half reversal by NaCl was not observed until concentrations >0.6 M NaCl were reached, indicating that binding depends critically upon nonionic interactions.
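The size comparison between the polymerized oligopeptide and the native domain can be checked with a rough mass calculation. The sketch below is illustrative only. The average residue masses are standard textbook values supplied here, not numbers from this paper. It also assumes head-to-tail polymerization, so that each 16-residue unit contributes the sum of its residue masses and end groups are neglected.

```python
# Average residue masses in Da (standard values; an assumption of this sketch).
RESIDUE_MASS = {
    "G": 57.0519,
    "N": 114.1038,
    "F": 147.1766,
    "Y": 163.1760,
    "S": 87.0782,
    "R": 156.1875,
}

unit = "GNFGGGRGGNYGGSRG"  # the 16-residue oligopeptide from the text

# Mass contributed by one unit within a polymer chain.
unit_mass = sum(RESIDUE_MASS[aa] for aa in unit)

# Number of units, and residues, in a chain of Mr = 12,000.
units_per_chain = 12000.0 / unit_mass
residues_per_chain = units_per_chain * len(unit)

# Positions (0-based) of aromatic and arginine residues within the unit.
aromatic_positions = [i for i, aa in enumerate(unit) if aa in "FY"]
arginine_positions = [i for i, aa in enumerate(unit) if aa == "R"]

print(round(unit_mass, 1), round(units_per_chain, 1), round(residues_per_chain))
print(aromatic_positions, arginine_positions)
```

Under these assumptions a chain of Mr = 12,000 holds about eight units, or roughly 130 residues, with aromatic and arginine residues each recurring every eight positions.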

DISCUSSION
The strategy for obtaining large amounts of a protein by cDNA-based production in E. coli is now well established. Yet in most cases, it remains a formidable experimental achievement to obtain appropriate modification and subcloning of a coding sequence, overexpression, and purification of large amounts of undegraded protein. In the present case, with the full-length A1 cDNA, we obtained the desired subcloning and overexpression in a straightforward fashion. However, the complete purification of the recombinant A1 protein in undegraded form proved difficult. Initially, we found that protein samples purified by standard methods were partially degraded, and upon storage at 4 or 25 °C, there was progressive degradation due to contaminating protease activity. This protease activity could be removed by the method described here, in which a key step is washing the ssDNA-cellulose column with 5 column volumes of 0.4 M NaCl (in buffer D) prior to gradient elution. The A1 protein in the final fraction is stable during long term storage at either 4 or 25 °C. The purification procedure described also has been adapted to the preparation of pure recombinant protein from a large amount (50 g) of cell paste.
The recombinant rat A1 protein may be identical to the A1 proteins isolated from calf thymus or HeLa cells, except that the latter two proteins have blocked NH2 termini and a methylated arginine at position 194; the corresponding arginine in the recombinant rat A1 protein is not methylated (Table V). The complete sequence of the A1 proteins is known only for the rat protein, but the sequence of the first 195 residues of the calf thymus protein (32, 33) and the last 196 residues of the human protein (15) are known. In these overlapping regions, the sequences of the three proteins are identical.
Various applications of controlled proteolysis have been used to probe structure-function relationships of single-stranded nucleic acid binding proteins (34). These proteins generally have one relatively large trypsin-resistant domain that retains partial nucleic acid binding activity, and this is also true for A1 (14). HeLa A1, for example, yields a fragment of Mr = 24,000 corresponding to the NH2-terminal domain that binds specifically and tightly to single-stranded polynucleotides; this trypsin-resistant fragment is identical to the UP1 protein isolated directly from cells (14). The data presented here show that trypsinization of recombinant A1 also yields an Mr = 24,000 fragment equivalent to UP1 and that with lighter proteolytic digestion three other fragments of somewhat higher Mr are obtained. Sequencing revealed that all of these fragments had the same NH2 terminus, and thus, the fragments contained different amounts of the glycine-rich COOH-terminal domain. We found that each of these fragments bound to an ssDNA-cellulose column in 0.4 M NaCl but was eluted at lower NaCl concentrations than the intact protein in the order 24 < 30 < 32 kDa (Fig. 1). Hence, we concluded that the COOH-terminal domain exerts an effect on binding affinity of A1 to ssDNA.
Next, we made use of fluorescence properties of poly(ethenoadenylate) to obtain equilibrium binding data for A1 and NH2-terminal domain peptides. The binding is expressed as Kω, where K is the intrinsic association constant and ω is the cooperativity parameter. We found that A1 covers 12 nucleotide residues upon binding to poly(εA) and that in 0.4 M NaCl, K and ω are 1.5 × 10⁵ M⁻¹ and 30, respectively. In contrast, UP1 covers 5-7 nucleotides, is noncooperative, and K is 0.6 × 10³ M⁻¹. This finding of positive cooperativity for A1 binding and noncooperativity for UP1 binding was corroborated, in a qualitative sense, by gel retardation analysis (Fig. 5) using fd DNA as the single-stranded nucleic acid.
The association constants for A1 and UP1 binding to poly(εA) over a range of [NaCl] were obtained from salt-reversal data (Fig. 4). Plots of log Kω versus log[Na+] illustrated that A1 binding is much less salt-sensitive than UP1 binding. The slope of the log Kω versus log[Na+] plot is interpreted (30) in terms of the number of ion pairs involved in the binding, and this can be assigned as ≤1 and 3 for A1 and UP1, respectively. Further, the extrapolated value of Kω at 1 M NaCl represents the affinity due to nonelectrostatic interactions (30), and this is equal to 1 × 10⁶ and 30 for A1 and UP1, respectively. The plot also facilitates comparison of A1 binding with that of other single-stranded nucleic acid binding proteins (35-37). The intrinsic association constant for T4 gene 32 protein binding to poly(εA) is 1.2 × 10⁶ M⁻¹ in 0.2 M NaCl, and the interpolated value for A1 binding to poly(εA) is 3 × 10⁵ M⁻¹. In 0.02 M NaCl, the mouse helix-destabilizing protein has a noncooperative affinity of 4 × 10⁵ M⁻¹ for single-stranded DNA (2); the interpolated value for the A1-poly(εA) interaction is 4.5 × 10⁷ M⁻¹.
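A linear log-log relationship of this kind can be summarized by a slope and by the value extrapolated to 1 M salt, where log[Na+] = 0. The sketch below fits such a line by ordinary least squares. It is an illustration under stated assumptions, not the analysis used here: the function name is hypothetical, and the data points are synthetic, generated from an arbitrary slope of -3 and an intercept of 1.5.

```python
import math


def fit_log_log(salt_molar, k_obs):
    """Least-squares line through (log10 [Na+], log10 K-omega) points.

    Returns (slope, intercept); the intercept is log10 of the association
    constant extrapolated to 1 M salt, because log10(1) = 0.
    """
    if len(salt_molar) != len(k_obs) or len(salt_molar) < 2:
        raise ValueError("need at least two matched data points")
    xs = [math.log10(s) for s in salt_molar]
    ys = [math.log10(k) for k in k_obs]
    count = len(xs)
    x_mean = sum(xs) / count
    y_mean = sum(ys) / count
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Synthetic points (not measured data): slope -3, log10 K-omega at 1 M = 1.5.
salt = [0.05, 0.1, 0.2, 0.4]
k_values = [10 ** (1.5 - 3.0 * math.log10(s)) for s in salt]

slope, intercept = fit_log_log(salt, k_values)
k_at_1_molar = 10 ** intercept
print(slope, intercept, k_at_1_molar)
```

The same fitted line can be used to interpolate an association constant at any salt concentration within the measured range, which is how values at different [NaCl] can be compared across proteins.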
Taken together, our comparisons of intact A1 with the NH2-terminal domain peptides reveal that the COOH-terminal domain strongly influences nucleic acid binding by A1. The mechanism of this effect involves nonelectrostatic interactions and positive cooperativity. But, even without the cooperativity parameter, A1 binding to poly(εA) is at least 100-fold stronger than UP1 binding. This could be due to direct interaction between the COOH-terminal domain and the polynucleotide or to an allosteric effect of the COOH-terminal domain upon binding by the NH2-terminal domain. Our studies with the synthetic COOH-terminal domain analog demonstrate that a peptide with features similar to those of the native domain is indeed capable of very strong nucleic acid binding, and structural modeling of the COOH-terminal domain of A1 (38) indicates that the domain is capable of direct interfacing with nucleic acids. This modeling was based upon theoretical considerations of protein secondary structure, the regular spacing of most of the phenylalanine or tyrosine residues, and hydrophobic interaction of these residues with nucleic acid base residues. Therefore, in addition to providing protein-protein contact, we suggest that the mechanism of the A1 binding involves direct interaction of the COOH-terminal domain and the nucleic acid. Clearly, much more study is required to understand the mechanism of A1 binding, and the fact that large amounts of the recombinant protein can now be obtained points to particularly powerful approaches, such as photochemical cross-linking, NMR, and crystallography. These studies of the A1 protein binding mechanism will have implications for other eukaryotic RNA binding proteins, since these proteins share certain structural features.
The COOH-terminal domain motif of regularly spaced aromatic residues separated by α-helix-free regions of high flexibility is present in nucleolin, hnRNP complex protein A2, Drosophila P9, and RNA polymerase large subunit (38-46).³ The NH2-terminal domain of A1 contains four oligopeptide sequences conserved among several RNA binding proteins (44-46). Based upon the results presented here and the photochemical cross-linking result of Merrill et al.,³ it seems clear that all of these conserved regions are involved in binding.