Molecular Cloning of a Novel Human Leukemia-associated Gene: Evidence of Conservation in Animal Species*

We have recently described an 18-kilodalton polypeptide (p18) that is present in much greater abundance in acute leukemic blast cells (myeloid and lymphoid) than in resting or proliferating nonleukemic lymphoid cells or chronic lymphoid and myeloid leukemic cells. In this report we describe the cloning of two full-length cDNAs of different sizes that code for p18. The two cDNAs differ in their 3'-noncoding regions as a result of alternative polyadenylation. Analysis of the complete nucleotide sequence and the corresponding amino acid sequence did not reveal significant homology to any previously described sequences. We show evidence that this gene is highly conserved in several animal species, and low stringency hybridization studies suggest that the p18 gene may be a member of a family of partially homologous genes in the human genome.

From the Veterans Administration Medical Center and the Departments of Internal Medicine, Pediatrics, and Pathology, University of Michigan, Ann Arbor, Michigan 48109, and the Max-Planck-Institut für Biochemie, D-8033 Martinsried, Federal Republic of Germany
It is generally believed that more than one event is required for the transformation of a normal cell into a malignant cell. The application of molecular biology to the study of cancer pathogenesis has led to the discovery of many oncogenes that may be involved in malignant transformation. Despite the identification of more than 40 different oncogenes, it has been exceedingly difficult to define the relationship between mutations affecting oncogenes and the establishment of the malignant phenotype (1). The role of oncogene activation in human leukemia has been the subject of numerous investigations. It has been shown that the most common mutations affecting oncogenes in human acute leukemia involve members of the ras family of oncogenes (2). In other leukemic disorders, it has been demonstrated that oncogenes may also be activated by chromosomal translocations (3,4), gene amplification (5), or retroviral insertions (6). In the majority of cases of human acute leukemia, however, no evidence of oncogene activation has been found.
We have used a different approach to investigate the molecular alterations in human acute leukemia. The approach is based on direct analysis of the polypeptide constituents of leukemic cells using two-dimensional polyacrylamide gel electrophoresis (PAGE) to identify polypeptide alterations in leukemic cells relative to nonleukemic cells. This approach led to the identification of an 18-kDa polypeptide (p18) that is present in much greater abundance in cells from patients with acute leukemia of different subtypes than in normal peripheral blood lymphocytes, nonleukemic proliferating lymphoid cells, bone marrow cells, or cells from patients with chronic lymphoid or myeloid leukemia (7). In a previous study, p18 was isolated from two-dimensional polyacrylamide gels, and the sequence of a tryptic decapeptide was derived by gas-phase microsequencing (7). In this paper, we report the successful cloning of the cDNA that codes for p18. The availability of this cDNA allowed us to deduce the primary structure of the protein and to study its conservation in different species.

* This work was supported in part by grants from the Veterans Administration and by National Institutes of Health Grants HL4291901, CA32146, and P60AR20557-10A.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) 50499 I.
Fellow of the Cooley's Anemia Foundation.
To whom correspondence should be addressed: VA Medical Center (111D), 800 Poly Place, Brooklyn, NY 11209.

MATERIALS AND METHODS
Two-dimensional Electrophoresis and Amino Acid Sequence Analysis-A human T cell leukemic cell line (HSB-2) was grown in tissue culture. The cell pellet was solubilized and analyzed by two-dimensional PAGE as previously described (7). The protein of interest was digested with trypsin within the gel matrix, and the sequence of eluted tryptic fragments was determined as described (8).
cDNA Cloning-Poly(A)+ RNA was isolated from the leukemic T cell line HSB-2 and used as template for the synthesis of double-stranded cDNA as described by Gubler and Hoffman (9). The CDM7 plasmid vector used in the construction of the cDNA library was the generous gift of Brian Seed (Massachusetts General Hospital) (10,11). CDM7 was digested with BstXI and ligated to double-stranded cDNA to which BstXI adaptors had been added (10). The resulting recombinant DNA was used to transform competent MC1061/p3 cells to generate the cDNA library. A total of 2 × 10⁵ independent recombinants from a library of 4 × 10⁶ clones were screened by colony hybridization (12) in tetramethylammonium chloride (13) using a mixture of 32 different synthetic oligonucleotides. The degenerate sequence CT(T or C)CT(T or C)TT(A or G)TT(A or G)TT(A or G)AA accounts for all possible codons for Glu-Glu-Asn-Asn-Asn-Phe, which is part of the decapeptide sequence we reported earlier (7). Two positive clones were identified after primary and secondary screening, and their inserts were subjected to DNA sequence analysis as described below.
DNA Sequence Analysis-After a limited restriction map of the two cDNA clones had been generated, DNA sequence analysis was performed by the dideoxy chain termination method of Sanger et al. [...] HeLa cells, and monkey kidney (COS) cells (16) as described by Favaloro et al. (17). Northern blot analysis was performed using glyoxal-denatured RNA (18). Primer extension was performed on 100 μg of K562 RNA as described (19,20) using a 5'-end-labeled primer that extends from position 102 to position 203 of the cloned cDNA.
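The probe design can be checked mechanically. The sketch below expands the printed degenerate sequence into its concrete 17-mers and confirms that there are 2⁵ = 32 of them. It reads the printed sequence as the antisense strand written 3' to 5' (an inference from the codons, not stated in the text): its base-by-base complement then spells GA(A/G) GA(A/G) AA(T/C) AA(T/C) AA(T/C) TT, that is, Glu-Glu-Asn-Asn-Asn followed by the first two bases of the Phe codon. The helper names are illustrative only.

```python
from itertools import product

# Degenerate probe as printed in the text; each "(X or Y)" is a two-way choice.
PRINTED = "CT(T or C)CT(T or C)TT(A or G)TT(A or G)TT(A or G)AA"

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Only the codons needed for Glu-Glu-Asn-Asn-Asn (the Phe codon is cut to "TT").
CODONS = {"GAA": "E", "GAG": "E", "AAT": "N", "AAC": "N"}


def parse_degenerate(pattern: str) -> list[list[str]]:
    """Split a pattern like 'CT(T or C)' into per-position lists of allowed bases."""
    positions = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "(":
            end = pattern.index(")", i)
            positions.append([b.strip() for b in pattern[i + 1:end].split(" or ")])
            i = end + 1
        else:
            positions.append([pattern[i]])
            i += 1
    return positions


def expand(pattern: str) -> list[str]:
    """Enumerate every concrete oligonucleotide encoded by the degenerate pattern."""
    return ["".join(bases) for bases in product(*parse_degenerate(pattern))]


def sense_from_printed(oligo: str) -> str:
    """Assumption: the printed strand is antisense written 3'->5', so its
    base-wise complement (no reversal) reads as the coding strand 5'->3'."""
    return "".join(COMPLEMENT[b] for b in oligo)


def translate(seq: str) -> str:
    """Translate complete codons only, using the small table above."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


if __name__ == "__main__":
    oligos = expand(PRINTED)
    print(len(oligos), {len(o) for o in oligos})  # 32 {17}
    print({translate(sense_from_printed(o)) for o in oligos})  # {'EENNN'}
```

Dropping the wobble base of the last codon keeps the mixture at 32 species instead of 64.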
Southern Blotting-High molecular weight DNA was isolated from cells grown in culture or from tissue derived from different animal species using the method of Blin and Stafford (21). This DNA was digested with different restriction enzymes, electrophoresed on agarose gels, and transferred to Nytran filters (Schleicher & Schuell) (22). The filters were hybridized to a nick-translated probe that extends from position 1 to position 1300 of the cDNA clone. High stringency Southern blotting included a final 1-h wash at 65 °C in 0.1× SSC (1× SSC is 150 mM NaCl, 15 mM sodium citrate), while the final wash in the low stringency blotting was performed for 45 min at 60 °C in 2× SSC.

RESULTS
Fig. 2 shows the two-dimensional PAGE separation of polypeptides from the HSB-2 T cell leukemia cell line that was used as the source of RNA for the construction of the cDNA library. One labeled arrow points to the p18 polypeptide and another points to actin for comparison. We synthesized a mixture of 32 oligonucleotides (17-mers) that take into account all possible codons for six internal amino acids of the p18 polypeptide (7). This oligonucleotide mixture was labeled to high specific activity using polynucleotide kinase (23) and used to screen the HSB-2 cDNA library. Two independent clones that hybridized strongly to this probe mixture were identified and characterized as described below.
One of the recombinant clones contained a 1.5-kb cDNA insert and the other a 1-kb insert. Both inserts were sequenced in their entirety using the strategy outlined in Fig. 1. The nucleotide sequence is shown in Fig. 3. The two cDNAs were identical in their 5'-untranslated and coding regions but differed in their 3'-untranslated regions. Both cDNAs code for the same 149-amino acid polypeptide, which starts with an ATG at position 103 of both transcripts and ends with a TAA terminator at position 551. The translated amino acid sequence contained a perfect match for three tryptic peptides whose sequences were determined by gas-phase microsequencing: the original decapeptide described earlier (7) and two additional peptides that have been sequenced since the initial publication. The three peptides are (1) Ala-Ile-Glu-Glu-Asn-Asn-Asn-Phe-Ser-Lys, (2) Lys-Leu-Glu-Ala-Ala-Glu-Glu-Arg, and (3) Asp-Leu-Ser-Leu-Glu-Glu-Ile-Gln-Lys. The calculated molecular weight of the translated polypeptide is 17,302, with a predicted isoelectric point of 5.64. These values are in close agreement with the measured molecular weight of 18,000 and pI of 5.7 (7).
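The calculated molecular weight and pI above come from the deduced sequence; the text does not say which program or constants were used. The sketch below shows one standard way to compute both: summing average residue masses plus one water, and locating the pH of zero net charge by bisection on the Henderson-Hasselbalch equation. The pKa table is an assumed, commonly used set, and the example is applied to the tryptic decapeptide because the full sequence (Fig. 3) is not reproduced here. A different pKa table shifts the predicted pI slightly.

```python
# Average residue masses (Da), i.e. free amino acid mass minus one water.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

# Assumed pKa set (EMBOSS-style values); other tools use slightly different ones.
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def average_mass(seq: str) -> float:
    """Sum of residue masses plus one water for the free N and C termini."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of the chain at the given pH."""
    groups_pos = ["Nterm"] + [aa for aa in seq if aa in PKA_POS]
    groups_neg = ["Cterm"] + [aa for aa in seq if aa in PKA_NEG]
    pos = sum(1.0 / (1.0 + 10 ** (ph - PKA_POS[g])) for g in groups_pos)
    neg = sum(1.0 / (1.0 + 10 ** (PKA_NEG[g] - ph)) for g in groups_neg)
    return pos - neg


def isoelectric_point(seq: str, tol: float = 1e-3) -> float:
    """Bisection on pH: net charge falls monotonically as pH rises."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive, so the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    peptide = "AIEENNNFSK"  # Ala-Ile-Glu-Glu-Asn-Asn-Asn-Phe-Ser-Lys
    print(round(average_mass(peptide), 1))  # 1165.2
    print(round(isoelectric_point(peptide), 2))
```

Applied to the full 149-residue sequence, the same two functions would give values directly comparable to the 17,302 and 5.64 reported above.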

[...] (lane 2), and K562 cells (lane 3). The filter was hybridized to a probe from the common 5' region of the two p18 cDNA clones. A 1.5- and a 1-kb band were seen in RNA from human cells (lanes 1 and 3), while a single 1-kb band was seen in RNA from monkey cells (lane 2).
To map the site of transcription initiation more accurately and to establish that the cDNA clones are full-length copies of p18 mRNA, we performed primer extension analysis. A DNA fragment that extends from position 103 to position 204 was 5'-end-labeled and used to prime cDNA synthesis with total RNA from K562 cells as template. An extension product of 205 nucleotides was observed (data not shown). This corresponds very closely to the 5' end of the p18 cDNA and suggests that both clones are full-length cDNAs.
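The full-length argument rests on simple coordinate arithmetic. Assuming that the labeled primer's 5' end pairs with the downstream end of the primer fragment, and that the extension product runs from there to the transcription start site, the start site lies at (primer 5' position) - (product length) + 1. The sketch below applies this to the 205-nucleotide product with both primer coordinates that appear in the text (position 203 under Materials and Methods, position 204 in this section). The function names are illustrative.

```python
def inferred_start(primer_5prime_pos: int, product_length: int) -> int:
    """Map the 5' end of a primer-extension product onto cDNA coordinates.

    Assumes the antisense primer's 5' end pairs with mRNA position
    `primer_5prime_pos` and reverse transcriptase copies back to the mRNA
    5' end, so the product spans [start, primer_5prime_pos] inclusive.
    Returns a 1-based cDNA coordinate; values <= 0 lie upstream of the
    first cloned nucleotide.
    """
    return primer_5prime_pos - product_length + 1


def upstream_offset(start: int) -> int:
    """Number of nucleotides by which the inferred start lies 5' of position 1."""
    return max(0, 1 - start)


if __name__ == "__main__":
    for primer_end in (203, 204):
        start = inferred_start(primer_end, 205)
        print(primer_end, start, upstream_offset(start))
    # 203 -1 2
    # 204 0 1
```

With either coordinate, transcription is inferred to begin only one or two nucleotides upstream of the first cloned base, which is the basis for calling both clones essentially full length.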
To study the organization of the p18 gene in the human genome, we performed the Southern analysis shown in Fig. 5. Human genomic DNA was digested with seven different restriction enzymes and probed with a p18 cDNA probe. All lanes show either one or two hybridizing fragments, which suggests a gene of limited size and complexity. We then performed Southern blot analysis using DNA isolated from human, monkey, chimpanzee, dog, cow, pig, duck, hamster, and mouse. When the filter was probed with the human p18 cDNA and washed at low stringency to detect partial homologies, we were able to detect hybridizing bands in every animal DNA tested (Fig. 6A).
When we compared the pattern of hybridizing fragments in EcoRI-digested human DNA at high stringency (Fig. 5, lane 3) and at low stringency (Fig. 6A, lane 1), we noted two additional bands in the low stringency blot that were not seen at high stringency. This raised the possibility that another gene with partial homology to the p18 gene exists in the human genome. To confirm that the additional fragments represent partially homologous sequences rather than products of partial EcoRI digestion, we rehybridized the same filter shown in Fig. 6A to p18 cDNA and washed it at high stringency. The autoradiograph in Fig. 6B shows that the two additional bands disappear. This suggests the existence of sequences in the human genome that have partial homology to the p18 gene.
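The disappearance of the extra bands can be put in rough quantitative terms. The sketch below applies the widely quoted empirical melting-temperature approximation for long DNA hybrids, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) - 0.61(%formamide) - 500/L, to the two final washes given under Materials and Methods (0.1× SSC at 65 °C and 2× SSC at 60 °C) for the 1300-bp probe. The 50% GC content is a hypothetical value, not one reported in the paper, and reading the Tm margin as a tolerated percent mismatch relies on the rough rule of about 1 °C per 1% mismatch.

```python
import math

# Na+ molarity of 1x SSC: 150 mM NaCl plus 15 mM trisodium citrate (3 Na+ each).
NA_PER_1X_SSC = 0.150 + 3 * 0.015


def tm_long_probe(na_molar: float, gc_percent: float, length_bp: int,
                  formamide_percent: float = 0.0) -> float:
    """Approximate Tm (deg C) of a perfectly matched long DNA-DNA hybrid:
    Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 0.61*(%formamide) - 500/L.
    """
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent
            - 0.61 * formamide_percent - 500.0 / length_bp)


def wash_margin(ssc_x: float, wash_temp_c: float, gc_percent: float,
                length_bp: int) -> float:
    """Degrees by which a wash sits below the Tm of a perfect hybrid.

    Rough rule (assumption): each 1% base mismatch lowers Tm by ~1 deg C,
    so the margin approximates the % divergence a hybrid can survive.
    """
    tm = tm_long_probe(ssc_x * NA_PER_1X_SSC, gc_percent, length_bp)
    return tm - wash_temp_c


if __name__ == "__main__":
    gc = 50.0  # hypothetical GC content; not reported in the text
    high = wash_margin(0.1, 65.0, gc, 1300)
    low = wash_margin(2.0, 60.0, gc, 1300)
    print(f"high stringency: {high:.1f} C below Tm; low stringency: {low:.1f} C below Tm")
    # high stringency: 8.2 C below Tm; low stringency: 34.8 C below Tm
```

Under these assumptions the high stringency wash removes hybrids diverging by more than roughly 8%, whereas the low stringency wash retains hybrids diverging by up to about a third, consistent with partially homologous fragments appearing only at low stringency.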

DISCUSSION
We have recently described the use of a high-resolution protein separation technique (two-dimensional PAGE) for the identification of polypeptides that are aberrantly expressed in leukemic cells (7,24). This led to the identification of an 18-kDa polypeptide that is present in significantly greater amounts in leukemic cells than in nonleukemic white blood cells (7). Our preliminary studies suggested that the increased amount of p18 in leukemic cells is not related to a specific cell lineage, differentiation stage, or cell proliferation (7). In this report, we describe the cloning of the gene that codes for this polypeptide and the complete structure of the cDNA and its translated protein product.
Several aspects of the cloned gene deserve further comment. It appears that the two cDNAs that were isolated are derivatives of alternatively polyadenylated mRNAs transcribed [...] not generate diversity at the protein level, which is the case in several alternatively polyadenylated genes (25,26). It is conceivable that alternative polyadenylation serves a regulatory function by generating mRNAs of different stabilities. In HeLa and K562 cells, the majority of the p18 mRNA is a product of proximal polyadenylation, whereas in monkey (COS) cells all of the p18 mRNA is a product of proximal polyadenylation. Further studies are needed to investigate the significance of these observations and to explore a possible regulatory role for the use of different polyadenylation sites. When the derived DNA and protein sequences were compared with sequences in the GenBank, EMBL, and PIR databases, no significant homology to known sequences was detected. Analysis of the translated amino acid sequence did not reveal a signal peptide, which suggests that p18 is not a secreted protein. Our previous studies suggested a cytosolic localization of p18, based on crude separation of nuclear and cytoplasmic fractions (7); these studies, however, do not exclude a minor nuclear fraction. Analysis of the amino acid sequence of p18 did not show any of the well-known features of transcriptional regulators (leucine zipper (27), zinc fingers (28), or homeobox sequences (29)). These findings, along with the small size of the protein, argue against a regulatory function mediated by DNA binding to promoter or enhancer sequences.
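A screen for DNA-binding motifs of the kind described above can be approximated with consensus-pattern matching. The sketch below encodes two PROSITE-style patterns, the leucine zipper (PS00029) and the C2H2 zinc finger (PS00028), as Python regular expressions and reports every match position; a homeobox pattern is omitted for brevity. The paper does not say how its analysis was performed, so this is an illustrative technique rather than the authors' method, and a hit on such a short pattern is at most a hint, since the leucine heptad alone also occurs in many proteins that do not bind DNA. The test sequences are synthetic, not p18.

```python
import re

# PROSITE-style patterns rewritten as regular expressions.
# PS00029 leucine zipper:  L-x(6)-L-x(6)-L-x(6)-L
# PS00028 C2H2 zinc finger: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
MOTIF_PATTERNS = {
    "leucine_zipper": r"L.{6}L.{6}L.{6}L",
    "zinc_finger_C2H2": r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H",
}

# Zero-width lookaheads so that overlapping occurrences are all reported.
MOTIFS = {name: re.compile(f"(?=({p}))") for name, p in MOTIF_PATTERNS.items()}


def scan(seq: str) -> dict[str, list[int]]:
    """Return the 0-based start position of every match, per motif."""
    return {name: [m.start() for m in rx.finditer(seq)]
            for name, rx in MOTIFS.items()}


if __name__ == "__main__":
    zipper_like = "M" + "LKEAEQR" * 4  # synthetic heptad repeat, not p18
    print(scan(zipper_like))   # {'leucine_zipper': [1], 'zinc_finger_C2H2': []}
    print(scan("AIEENNNFSK"))  # {'leucine_zipper': [], 'zinc_finger_C2H2': []}
```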
Fig. 6. [...] (lane 8), and pig (lane 9) were digested with EcoRI and probed with p18 cDNA. The filter was washed at low stringency as described under "Materials and Methods." B, the same filter as in A was boiled and rehybridized to p18 cDNA at high stringency as described in the text. Several of the bands that were seen at low stringency were not seen at high stringency.
In previous studies, we observed a moderate increase in p18 expression in stimulated lymphocytes relative to resting lymphocytes (7). We recently examined the effect of lymphoid stimulation on the phosphorylation of the p18 gene product by ³²P-labeling of lymphoid cells. Our data suggest that at least two phosphorylated forms of p18 increase in amount following lymphoid activation.² Interestingly, Feuerstein et al. (30) identified an abundant cytosolic phosphoprotein (pp17; Mr = 17,000, pI 5.5) in HL-60 promyelocytic leukemia cells whose phosphorylation increases after exposure to phorbol esters in vitro. They suggested that this protein may play a role in the intracellular propagation of growth-regulatory signals and proposed to call it "prosolin" (31). Pasmantier et al. (32) described a group of phosphoproteins, which they called p19 (Mr = 19,000, pI 5.9, 5.7, and 5.4), whose phosphorylation is stimulated in endocrine tumor cell lines by a variety of secretagogues. The same group later identified and purified a similar protein from bovine brain (33). An antibody they raised to the purified protein reacted with similar polypeptides in many different species from man to mouse and showed that the protein is present at high levels in HL-60 promyelocytic leukemia cells (34). [...] phosphoproteins which they called "stathmin" (Mr = 19,000, pI 5.8-6) that are very abundant in rat brain. They also noted that the phosphorylation of these proteins is regulated by a variety of extracellular effectors that induce different target cellular responses.
They pointed out the similarities between the proteins they described and those described by Feuerstein et al. (30) and Pasmantier et al. (32) and suggested that these proteins may provide a relay between extracellular signals and intracellular substrates. They speculated that these proteins may be involved in regulating the proliferation, differentiation, and/or functions of the many different cell types in which they were discovered (35).
The protein we describe here shares many properties with the proteins described above. It has similar migration properties on two-dimensional PAGE (Mr = 18,000, pI 5.6) and is phosphorylated upon stimulation of lymphoid cells. It is a major cytosolic protein that is highly conserved in evolution (Fig. 6A) and may be a member of a family of related genes (Fig. 6B). Northern blot analysis showed a very high level of expression of p18 in human brain.³ All these data suggest that p18, p19, prosolin, and stathmin may be different names for the same protein. Further studies are needed to explore the possible regulatory roles such proteins may play in the different cellular processes in which they are involved.