Isolation of two genomic sequences encoding the Mr = 14,000 subunit of rat prostatein.

Complementary DNAs to rat ventral prostate poly(A) RNA were cloned into pBR322 by the "dG-dC tailing" procedure. Clones containing cDNAs to the mRNAs coding for each of the three subunits of a major secretory protein (prostatein) were identified by hybrid-arrested translation. A 457-base pair cDNA (E45) and a portion of a 365-base pair cDNA (E85) were analyzed to determine the composite complete DNA coding sequence for the Mr = 14,000 (C3) subunit of prostatein. A 12-nucleotide sequence (TTTGCTGCTATG) in the signal peptide of C3 was noted to be homologous to signal peptide nucleotide sequences reported in cDNAs coding for the other two prostatein subunits, Mr = 6,000 (C1) and 10,000 (C2). Complementary DNA coding for the C3 subunit was used as a hybridization probe to screen an EcoRI rat genomic DNA library. Two unique 12-kilobase genomic clones, each containing mRNA coding sequences within 2.5-3-kilobase fragments, were identified by restriction enzyme mapping and Southern blot analysis. Restriction enzyme sites within the coding regions of both genes were analogous to those of the cDNA. Differences in restriction enzyme sites in regions of intervening sequences and flanking DNA established the uniqueness of the two genes. It is suggested that both genes may be transcribed in vivo.

Differences in the size of the subunits of prostatein synthesized in vitro and in vivo have been observed (8, 15-20). For example, a Mr = 10,000 peptide synthesized by cell-free translation of poly(A) RNA in vitro contains an NH2-terminal signal peptide which is cleaved in vivo to produce the Mr = 6,000 subunit (C1). Similarly, a Mr = 14,000 peptide synthesized in vitro contains a signal peptide which is cleaved in vivo to produce the Mr = 10,000 subunit (C2). In addition to signal peptide cleavage, the Mr = 11,000 peptide synthesized in vitro is glycosylated (2, 8, 9, 11) to generate the Mr = 14,000 subunit (C3).
The androgen dependence and abundance of prostatein (25% of total cytosol protein) (15-23) make the rat ventral prostate an excellent model system for studies on androgen regulation of gene transcription. Cloning and structure analysis of androgen-dependent genes is a first step towards the identification of transcriptional controls. This report describes the isolation of two unique genes for the C3 subunit of prostatein.

EXPERIMENTAL PROCEDURES AND RESULTS
Two cDNA clones (E45 and E85) encoding the C3 subunit were isolated and sequenced. Combining these sequences allowed us to read a complete cDNA sequence of the C3 subunit (Fig. 4). There are 30 bases of 5'-noncoding sequence, 54 bases encoding the signal peptide, 231 bases encoding the amino acid sequence, and 142 bases of 3'-noncoding sequence. The DNA sequence representing the coding region of the gene is in alignment with the amino acid sequence except for amino acid 61. The codon for this amino acid represents a serine residue, whereas the amino acid was reported to be glycine (12). Within the signal peptide, a sequence of 12 base pairs (TTTGCTGCTATG) (Fig. 4, shown in box) is 92% homologous with signal peptide sequence in cDNAs for C1 and C2 subunits (53). A possible poly(A) addition signal (TATAAA) is located at base 420 (Fig. 4); however, it differs slightly from the putative signal (TAATAAA), and its location at a minimum of 37 bases from the 3' end of the cDNA is farther upstream than most poly(A) addition signals (54).
Fig. 4 (legend). The predicted amino acid sequence is shown below the nucleotide sequence. Using cDNA clones E45 and E85 (see Fig. 3), a composite coding sequence was constructed. The predicted amino acid sequence differs from the experimentally determined sequence (12) only at residue 61. The cDNA sequence predicts serine, while the protein sequence indicated glycine, implying a G for A substitution at base 265, the first base of the codon. An 18-amino acid signal peptide precedes the NH2 terminus and contains a 12-base sequence homologous with sequences found in an analogous region of C1 and C2 subunit cDNAs (53). The conserved sequence is enclosed in a box. Base 60, the center nucleotide of a BstEII restriction enzyme site, was difficult to decipher from the sequencing gels and is represented by an X. Carbohydrate attachment is at amino acid residue 17 (Asn). cDNA: AAC AGA ACA ACC CAC AGG GAC TGC CTC AAC ATG AAG CTG GTG TTT CTA TTC TTG TTG ...
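The base positions quoted above follow from the four segment lengths, and the arithmetic can be checked with a short sketch. This is only an illustration: the function names are hypothetical, and it assumes bases are numbered from 1 at the 5' end of the composite cDNA and that the four segments are contiguous with no gaps.

```python
# Segment lengths (in bases) as reported in the text for the composite C3 cDNA.
FIVE_PRIME_NONCODING = 30
SIGNAL_PEPTIDE = 54
MATURE_CODING = 231
THREE_PRIME_NONCODING = 142


def cdna_length() -> int:
    """Total length of the composite cDNA, assuming contiguous segments."""
    return (FIVE_PRIME_NONCODING + SIGNAL_PEPTIDE
            + MATURE_CODING + THREE_PRIME_NONCODING)


def signal_peptide_residues() -> int:
    """Number of amino acids encoded by the signal peptide region."""
    return SIGNAL_PEPTIDE // 3


def codon_start(residue: int) -> int:
    """1-based cDNA position of the first base of the codon for a given
    residue of the processed subunit (residue 1 = processed NH2 terminus)."""
    if residue < 1:
        raise ValueError("residue numbering starts at 1")
    return FIVE_PRIME_NONCODING + SIGNAL_PEPTIDE + 3 * (residue - 1) + 1


def bases_to_3prime_end(position: int) -> int:
    """Distance in bases from a 1-based position to the last base of the cDNA."""
    return cdna_length() - position


if __name__ == "__main__":
    print("cDNA length:", cdna_length())
    print("signal peptide residues:", signal_peptide_residues())
    print("first base of codon 61:", codon_start(61))
    print("bases from position 420 to 3' end:", bases_to_3prime_end(420))
```

Under these assumptions the segment lengths sum to the 457 bases of the composite sequence, the signal peptide is 18 residues, the codon for residue 61 starts at base 265, and base 420 lies 37 bases from the 3' end, in agreement with the values given in the text and in the legend to Fig. 4.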
A restriction enzyme map constructed from the DNA sequence of the C3 subunit cDNA is shown in Fig. 5. All sites were verified experimentally except for MboII and HinfI.
The E45 cDNA insert was used as a probe in screening a rat genomic Charon 4A bacteriophage library. From a screen of approximately 750,000 plaque-forming units, three major and three minor hybridizing plaques were identified. Two gene clones, designated C3.22.14 and C3.22.7, were found to have unique restriction enzyme maps (Fig. 6). Both genes span 2.5-3.0 kilobases and are contained within EcoRI fragments approximately 12 kilobases in length. The 5' to 3' orientations of the genes were determined by hybridizing a ... Locations of the two genes within the EcoRI genomic fragments are similar. Flanking DNA extends 7.5-8 kilobases from the 5' ends of the genes to upstream EcoRI sites and 1.5-2 kilobases from the 3' ends to downstream EcoRI sites. While restriction enzyme patterns are conserved within the coding regions of these genes, restriction sites differ in flanking regions and in regions containing intervening sequences. Evidence for the location of intervening sequences was obtained by comparing distances between restriction enzyme sites in the cDNAs and natural genes and by Southern blot hybridizations. For example, there are approximately 1500 bases of intervening sequence located within the 43-base BstEII/PstI segment of the cDNA (base 57 to base 100 in Fig. 4). Within this intervening sequence, a restriction enzyme site for SstI is present in the C3.22.14 gene but not in the C3.22.7 gene. A second intervening sequence of 175 bases is found in both genes between the XbaI sites located in the cDNA at bases 105 and 231 (Figs. 4 and 6). A third intervening sequence of approximately 330 bases is located within the XbaI (base 231)/HincII (base 301) fragment of the cDNA. KpnI and HindIII restriction enzyme sites appear to be located within this fragment of C3.22.7; however, these sites are not represented in the C3.22.14 gene. Additional intervening sequences have not been characterized.
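The comparison of restriction-site spacing in the cDNA and in the genomic clones can be illustrated with a small bookkeeping sketch. The site positions and approximate intron lengths are the ones quoted above; the class and function names are hypothetical, the genomic distances are derived here rather than measured, and the 457-base cDNA length is taken from the composite sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterveningSequence:
    """An intron bracketed by two restriction sites mapped on the cDNA."""
    left_site: str
    left_base: int      # 1-based cDNA position of the left site
    right_site: str
    right_base: int     # 1-based cDNA position of the right site
    intron_length: int  # approximate length reported in the text (bases)

    @property
    def cdna_distance(self) -> int:
        """Spacing of the two sites in the cDNA."""
        return self.right_base - self.left_base

    @property
    def expected_genomic_distance(self) -> int:
        """Spacing expected in the gene if this is the only intron between
        the two sites (an assumption of this sketch)."""
        return self.cdna_distance + self.intron_length


INTRONS = [
    InterveningSequence("BstEII", 57, "PstI", 100, 1500),
    InterveningSequence("XbaI", 105, "XbaI", 231, 175),
    InterveningSequence("XbaI", 231, "HincII", 301, 330),
]

CDNA_LENGTH = 457  # composite cDNA length, in bases


def estimated_gene_span() -> int:
    """Lower-bound estimate: cDNA length plus the characterized introns."""
    return CDNA_LENGTH + sum(i.intron_length for i in INTRONS)


if __name__ == "__main__":
    for i in INTRONS:
        print(f"{i.left_site}/{i.right_site}: cDNA {i.cdna_distance} bases, "
              f"gene ~{i.expected_genomic_distance} bases")
    print("estimated gene span:", estimated_gene_span(), "bases")
```

With these figures the three characterized intervening sequences add roughly 2.0 kilobases to the cDNA length, giving an estimate of about 2.46 kilobases. This is close to the lower end of the 2.5-3.0 kilobase span reported for both genes, which is compatible with the statement that additional intervening sequences have not been characterized.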

DISCUSSION
We have cloned cDNAs for the three peptides that constitute prostatein, the major androgen-dependent secretory protein of rat ventral prostate. In this report, we present the nucleotide sequence of coding and noncoding regions for C3, the Mr = 14,000 subunit of prostatein. A cDNA probe was used to identify two different genomic clones for the C3 subunit.
The base sequence of C3 cDNA bears little resemblance to the sequences of C1 and C2 reported recently by Parker et al. (53), except in a region encoding the signal peptide, where there is an interesting homology. There is a block of 12 base pairs that differs by only one base pair (C for T at the fifth base). This conserved sequence, TTTG(C/T)TGCTATG, is found 40 bases downstream from the proposed methionine start codon (base 71 to base 82 in Fig. 4) in the C3 subunit cDNA. Likewise, this sequence is located 46 bases downstream from the proposed methionine start codon in C1 and 37 bases downstream in C2 (53). The corresponding amino acid sequence Cys-Cys-Tyr (-TGC TGC TAT-) is located 4 amino acid residues upstream from the processed NH2 terminus in the C3 subunit and 7 amino acid residues upstream from the processed NH2 termini of the C1 and C2 subunits. Selection pressure may be responsible for stabilization of these sequences. One possibility is that they play a role in coordinate translation or processing of prostatein subunits. Mous et al. (55) have microinjected ventral prostate mRNA into Xenopus laevis oocytes and found that all translated prostatein subunits are assembled correctly into the final secreted product. Addition of the oligosaccharide side chain is not necessary for complete assembly (55). The homologous sequence in each prostatein subunit messenger RNA may be essential for stoichiometric translation and subsequent assembly of the subunits. Comparative analysis of the genomic sequences for C1 (56), C2 (56), and C3 subunit genes may reveal whether these sequences evolved in a convergent manner or were retained despite high evolutionary divergence of duplicated ancestral genes.
Two unique genes have been identified for the C3 subunit of prostatein by screening a library of EcoRI restriction fragments in Charon 4A bacteriophage.
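The one-base difference in the conserved signal-peptide block discussed above can be checked with a minimal sketch. The C3 block is the sequence given in the abstract; the C1/C2 form is an assumption, obtained by placing T at the fifth base as the text describes, and is not quoted from reference 53. The helper names and the two-entry codon table are hypothetical and cover only the codons needed here.

```python
C3_BLOCK = "TTTGCTGCTATG"      # C3 signal-peptide block, as given in the abstract
C1_C2_BLOCK = "TTTGTTGCTATG"   # assumed C1/C2 form: T instead of C at base 5

# Only the two codons needed for this example (standard genetic code).
CODON_TABLE = {"TGC": "Cys", "TAT": "Tyr"}


def mismatches(a: str, b: str) -> list[int]:
    """1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions that are identical between two sequences."""
    return 100.0 * (len(a) - len(mismatches(a, b))) / len(a)


def translate(seq: str, offset: int) -> list[str]:
    """Translate complete codons starting at a 0-based offset; codons
    missing from the small table above are reported as '?'."""
    codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
    return [CODON_TABLE.get(codon, "?") for codon in codons]


if __name__ == "__main__":
    print("mismatch positions:", mismatches(C3_BLOCK, C1_C2_BLOCK))
    print("identity (%):", round(percent_identity(C3_BLOCK, C1_C2_BLOCK)))
    print("C3 reading frame:", "-".join(translate(C3_BLOCK, 2)))
```

One mismatch in 12 positions corresponds to 11/12 identity, which rounds to the 92% homology quoted in the Results. Reading the C3 block from its third base gives the codons TGC TGC TAT, that is, Cys-Cys-Tyr.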
Evidence for the location of intervening sequences indicates that differences in these genes are confined to intron regions. The conservation of restriction sites within the coding region of each gene argues against the possibility that one of the genes is a pseudogene. Evolution rates of nonexpressed genes are greater than the rates of expressed genes; thus, a drift in the nucleotide sequence of coding regions in a nontranscribed pseudogene might be expected to alter restriction enzyme sites. Retention of analogous cDNA restriction sites in the two genes suggests that both genes are expressed and code for identical C3 subunits. This is supported by the observation that only one C3 subunit amino acid sequence has been observed. C3.22.14 and C3.22.7 genes may be nonallelic duplicated genes which have maintained identical C3 subunit coding sequences. In a similar case, Parker et al. (56) have recently reported that the C2 gene of prostatein may be ...
Restriction enzyme digests were analyzed ... ultraviolet light. DNA fragments were transferred from gels to nitrocellulose paper (46) and hybridized with 32P nick-translated C3 subunit cDNA (45) to construct restriction enzyme maps of the natural genes.
Complementary DNAs to total poly(A) RNA were cloned, and identification of cDNA clones for the three subunits of prostatein was carried out by hybrid-arrested translation (HART). Figures 2A and 2B display HARTs for the 14,000 Mr peptide (10,000 Mr C2 subunit), the 10,000 Mr peptide (6,000 Mr C1 subunit), and the 11,000 Mr peptide (14,000 Mr C3 subunit).
DNA sequence of clone E45 (Fig. 3A) differed in the region encoding the N-terminus of the subunit as compared with our amino acid sequence (8) and from the more complete sequence of Peeters et al. (12). Another clone, E85, was partially sequenced around a BstEII restriction site located in the N-terminus coding region (Fig. 3B).
The 5' sequence of E85 aligned with the known N-terminal amino acid sequence, and the region from nucleotide base 48 to base 105 was represented in the E45 clone 5' sequence as an inverted sequence in the opposite strand. The inverted sequence was represented in ... E45 from base 1 to base 105, and the nucleotide sequence in the E45 inversion ... artifact of cloning has been observed with other cDNAs and appears to be a