Exons of the Human Pancreatic Polypeptide Gene Define Functional Domains of the Precursor *

Pancreatic polypeptide is a 36-amino acid peptide which inhibits pancreatic exocrine function. We have previously determined from the nucleotide sequence of a cDNA that pancreatic polypeptide is derived from a 95-amino acid precursor, prepropancreatic polypeptide. Pulse-chase studies have suggested that the precursor is cleaved to produce three peptides: pancreatic polypeptide, an icosapeptide, and a smaller peptide. In the present study, we have used the cloned cDNA as a hybridization probe to isolate the pancreatic polypeptide gene from a human bacteriophage genomic library. The nucleotide sequence of 2.8 kilobases of DNA representing the entire human pancreatic polypeptide gene was determined. The gene contains four exons and three introns. Exon 1 encodes the 5'-untranslated region of the mRNA, exon 2 encodes the signal sequence and the sequence of pancreatic polypeptide, exon 3 encodes the icosapeptide, and exon 4 encodes a carboxyl-terminal heptapeptide and the 3'-untranslated region of the mRNA. By Southern blot analysis, the gene detected in a pancreatic polypeptide-producing islet cell tumor was indistinguishable from that in normal human leukocytes. The structure of the human pancreatic polypeptide gene is consistent with the hypothesis that prepropancreatic polypeptide generates three distinct peptides, each encoded by a separate exon. Increased expression of pancreatic polypeptide in the islet cell tumor does not appear to be correlated with major alterations in pancreatic polypeptide gene structure.

Pancreatic polypeptide is a 36-amino acid hormone secreted by cells within the endocrine and exocrine pancreas. Its secretion is stimulated by a variety of nutritive, metabolic, and neurogenic factors including fasting, protein meals, and hypoglycemia (1). Many of these stimuli appear to be mediated by cholinergic pathways. Pancreatic polypeptide inhibits secretion of enzymes and bicarbonate from the exocrine pancreas and causes relaxation of the gall bladder, but its effects on the endocrine pancreas are unclear (2). The limited effect of pancreatic polypeptide on insulin, somatostatin, and glucagon secretion, despite the proximity of PP cells to A, B, and D cells within the islets, has led some investigators to hypothesize that products of PP cells other than pancreatic polypeptide itself may have effects on islet hormone secretion.
* This work was supported by National Institutes of Health Grants T32AM07024, F32AM07448, and CA37370 and Center for Gastroenterology Research on Absorptive and Secretory Processes Grant 1P30AM39428. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Like other peptide hormones, pancreatic polypeptide is synthesized as a larger precursor. Schwartz et al. (3,4) have demonstrated by pulse-chase analysis that the prohormonal form of canine pancreatic polypeptide is post-translationally processed to produce pancreatic polypeptide, an icosapeptide, and a smaller third peptide derived from the carboxyl terminus of the precursor. The canine icosapeptide as well as its human counterpart have been isolated and sequenced (5), but to date no effect of this peptide on insulin, somatostatin, or glucagon secretion has been reported.
The nucleotide sequences of cDNAs representing the human pancreatic polypeptide precursor (6-8) have confirmed the studies of Schwartz et al. (3,4). These sequences have indicated that pancreatic polypeptide is synthesized as a 95-amino acid precursor. Following the co-translational removal of a 29-amino acid signal sequence, a 66-amino acid prohormonal form is generated that contains the sequence of pancreatic polypeptide at its amino terminus (6). This molecule is further cleaved to produce the 36-amino acid pancreatic polypeptide. The fate of the carboxyl-terminal 27 amino acids of the precursor is uncertain. Possibly, this 27-amino acid peptide is cleaved at an arginine residue to produce the icosapeptide isolated by Schwartz et al. (5) and the additional carboxyl-terminal peptide characterized in the pulse-chase experiments. Alternatively, generation of the smaller peptide may occur during peptide isolation.
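The residue counts in this paragraph can be checked with a short calculation. The following Python sketch uses only lengths stated in the text (a 29-residue signal sequence, the 36-residue pancreatic polypeptide, and a 27-residue carboxyl-terminal region). The three residues left over between pancreatic polypeptide and the carboxyl-terminal region are assumed here to be the Gly-Lys-Arg cleavage-amidation signal mentioned in the Discussion; that placement is an inference from the arithmetic, and the segment names and 1-based coordinates are illustrative only.

```python
# Residue counts taken from the text; segment names are illustrative labels.
# Assumption: the 3 residues between pancreatic polypeptide and the
# carboxyl-terminal region are the Gly-Lys-Arg signal (inferred, not stated here).
SEGMENTS = [
    ("signal_sequence", 29),
    ("pancreatic_polypeptide", 36),
    ("Gly-Lys-Arg", 3),
    ("carboxyl_terminal_region", 27),
]


def segment_coordinates(segments):
    """Return {name: (start, end)} in 1-based, inclusive residue numbering."""
    coords = {}
    position = 1
    for name, length in segments:
        coords[name] = (position, position + length - 1)
        position += length
    return coords


coords = segment_coordinates(SEGMENTS)
precursor_length = sum(length for _, length in SEGMENTS)
# The prohormone is what remains after removal of the signal sequence.
prohormone_length = precursor_length - SEGMENTS[0][1]

print("precursor length:", precursor_length)
print("prohormone length:", prohormone_length)
for name, (start, end) in coords.items():
    print(f"{name}: residues {start}-{end}")
```

Under these assumptions the segment lengths sum to the 95-residue precursor and leave the 66-residue prohormone described above.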
Genes encoding peptide hormones that undergo post-translational proteolytic cleavages are frequently divided into regions that correspond to functional domains in the precursor. Exon-intron junctions occur near the proteolytic cleavage sites in a number of hormone genes including gastrin (9), glucagon (10, 11), parathyroid hormone (12), cholecystokinin (13), and calcitonin (14). In order to better understand the biosynthesis of pancreatic polypeptide, we have characterized the structure of the gene and report its nucleotide sequence. The results of this study indicate that the intervening sequences within the pancreatic polypeptide gene occur at the putative cleavage sites in the precursor. These findings are consistent with the hypothesis that prepropancreatic polypeptide generates three distinct peptides.
EXPERIMENTAL PROCEDURES
Isolation of Recombinant Phage Containing the Human Pancreatic Polypeptide Gene-A human genomic library prepared from human liver DNA (15) was screened by the method of Benton and Davis (16) using a 350-base pair pancreatic polypeptide cDNA (6) as a hybridization probe. The cDNA was labeled to a specific activity of 10⁹ cpm/μg by nick translation (17). Two bacteriophage recombinants, λPP2A and λPP4A, contained DNA that hybridized to the probe. DNA was isolated from the recombinant phage by the method of Blattner et al. (18).
Restriction Endonuclease Analysis of the Purified Recombinant Bacteriophage DNA-Each phage DNA preparation was digested with several restriction enzymes, and the fragments were separated by agarose gel electrophoresis. Those fragments containing the pancreatic polypeptide gene were identified by Southern blot hybridization (19). A single 4.5-kb¹ EcoRI restriction fragment of λPP4A and a 1.9-kb EcoRI fragment of λPP2A hybridized to the probe. The hybridizing EcoRI fragment from each recombinant phage was isolated from an agarose gel and was ligated to EcoRI-digested pUC12 using T4 DNA ligase. The ligated DNA was transfected into Escherichia coli strain MC1061. DNA from one recombinant plasmid, pPP4A, was amplified in bacteria and isolated in sufficient quantity for nucleotide sequence analysis.
Nucleotide Sequence Analysis-Restriction fragments were labeled either at the 5' end with [γ-³²P]ATP and T4 polynucleotide kinase or at the 3' end with the large fragment of DNA polymerase I and an α-³²P-labeled deoxyribonucleotide and were sequenced by the method of Maxam and Gilbert (20).
Mapping of Transcriptional Initiation Site-RNA was prepared from a human pancreatic polypeptide-producing tumor by the method of Chirgwin et al. (21) and was enriched for the polyadenylated fraction by affinity chromatography on oligo(dT)-cellulose. The 5' end of the mRNA was mapped with S1 nuclease. A hybridization probe was prepared by digesting pPP4A with the restriction enzyme HinfI and end labeling the fragments with T4 polynucleotide kinase. The labeled DNA was electrophoresed on a 5% acrylamide gel and a 100-base pair fragment was recovered from the gel by electroelution. The fragment was denatured by boiling and allowed to hybridize to the tumor RNA overnight at 57 °C in 80% formamide. The samples were diluted and digested with 65 units of S1 nuclease (Bethesda Research Laboratories) at 30 °C for 30 min (22). The sizes of the S1 nuclease-protected fragments were determined by electrophoresis alongside a sequencing ladder on an 8% denaturing acrylamide gel.
Southern Blot Analysis-Five to 10 μg of genomic DNA from human leukocytes or from the islet cell tumor were digested with various restriction enzymes and were electrophoresed on 0.7-1.0% agarose gels at 30-50 V. The fragments were transferred to nitrocellulose paper and allowed to prehybridize for 2 h in 1× SET (1× SET = 0.15 M NaCl, 0.03 M Tris-HCl, pH 8.0, 0.001 M EDTA) containing 50 μg/ml heparin and 0.5% SDS. The blot was then hybridized for 16 h at 60 °C in 1× SET containing 0.5% SDS, 500 μg/ml heparin, and 10% (w/v) polyethylene glycol 8000 using a 1950-base pair XhoI-HindIII fragment extending from -300 to +1650 as a hybridization probe. Blots were washed at 60 °C in 1× SET containing 0.5, 0.25, 0.125, and 0.036% SDS for 30 min each and in 0.2× SET containing 0.1% SDS at 65 °C for 1 h. Following a final 30-min wash in 3 mM Tris base at 25 °C, the filter was exposed to x-ray film and the hybridizing bands were detected by autoradiography.

RESULTS
Isolation of a Bacteriophage Clone Encoding Human Pancreatic Polypeptide-Of 150,000 recombinant phage plaques screened, two contained DNA sequences that hybridized to the pancreatic polypeptide cDNA. The hybridizing phage were plaque-purified, and the phage DNA was digested with the restriction enzyme EcoRI. Southern blot hybridization of the digests from phage λPP4A revealed a single 4.5-kb fragment that hybridized to the pancreatic polypeptide cDNA probe. This fragment was subcloned and mapped with a variety of restriction enzymes. The restriction map is depicted in Fig. 1.
Comparison of Cloned and Genomic DNA by Restriction Analysis-To verify that the cloned DNA was an authentic representation of the pancreatic polypeptide gene, we digested DNA from normal human leukocytes with the restriction enzymes PvuII, SphI, and NcoI and analyzed the fragments by Southern blotting using a cloned pancreatic polypeptide gene fragment as a hybridization probe (Fig. 2). All of the restriction fragments predicted from the map of the cloned bacteriophage gene were present in the digests of genomic DNA. Additional lightly hybridizing bands were observed in each digest. These additional bands were not observed with a cDNA hybridization probe (data not shown), suggesting that the additional hybridization arose from DNA sequences in one of the intervening sequences or the 5'-flanking region. We conclude that λPP4A contains a normal non-rearranged copy of the pancreatic polypeptide gene.
¹ The abbreviations used are: kb, kilobase; SDS, sodium dodecyl sulfate.
In addition, Southern blots of DNA from the pancreatic polypeptide-producing tumor were indistinguishable from those of normal leukocyte DNA. Increased expression of pancreatic polypeptide in this tumor does not appear to be associated with major alterations in gene structure.
FIG. 3. Nucleotide sequence of the human pancreatic polypeptide gene. Capital letters indicate transcribed regions, and lower-case letters are used for introns and flanking sequences. Numbering (+1) begins at the cap site. Negative numbers indicate 5'-flanking sequence. Peptides within the precursor are identified above their respective amino acid sequences. The Goldberg-Hogness promoter (TATAAA) sequence at -30 is boxed, as is the polyadenylation recognition signal AATAAA. Putative core enhancer-like sequences and possible regions of Z-DNA structure are underlined.

Nucleotide Sequence Analysis of the Cloned Pancreatic Polypeptide Gene-Approximately 2.8 kilobases of the 4.5-kb subcloned gene were sequenced. Comparison of the gene sequence to the sequence of the cDNAs revealed that the coding region is divided into four exons separated by three introns of 750, 245, and 190 bases (Figs. 1 and 3). Exon 1 encodes the 5'-untranslated region, extending from the transcriptional origin to the translational initiator methionine. The signal peptide and pancreatic polypeptide sequence are contained in exon 2. Exon 3 encodes the icosapeptide which flanks the sequence of pancreatic polypeptide. The carboxyl-terminal heptapeptide as well as the 3'-untranslated region of the mRNA are contained in exon 4. The nucleotide sequence of the coding regions of the cloned gene is identical to the sequence of two of the cDNAs previously reported (6, 7). The single base substitution in the coding sequence reported by Takeuchi and Yamada (8) was not present in the cloned gene and may represent a polymorphism. Characteristic GT/AG donor-acceptor sites are present at the ends of all three intervening sequences. A region of alternating purine and pyrimidine nucleotides was found between positions +242 and +296 within the first intron. Similar sequences have been found to adopt a Z-DNA structure (23).
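The donor-acceptor rule noted above (each intervening sequence begins with GT and ends with AG) is simple to check once intron coordinates are known. The following Python sketch illustrates the check and the joining of exons on a short toy sequence. The toy sequence and its coordinates are invented for illustration and are not taken from the pancreatic polypeptide gene; the function names are likewise hypothetical.

```python
def check_gt_ag(sequence, introns):
    """Check that each intron starts with GT and ends with AG.

    `introns` is a list of (start, end) index pairs, 0-based and end-exclusive.
    Returns a list of booleans, one per intron.
    """
    seq = sequence.upper()
    results = []
    for start, end in introns:
        intron = seq[start:end]
        results.append(intron.startswith("GT") and intron.endswith("AG"))
    return results


def splice(sequence, introns):
    """Remove the introns and return the joined exon sequence."""
    exons = []
    previous_end = 0
    for start, end in sorted(introns):
        exons.append(sequence[previous_end:start])
        previous_end = end
    exons.append(sequence[previous_end:])
    return "".join(exons)


# Toy sequence (NOT the pancreatic polypeptide gene):
# exons in upper case, introns in lower case.
toy_gene = "ATGGCC" + "gtaagtccttag" + "GCTGCA" + "gtgagtctgcag" + "CGCTGA"
toy_introns = [(6, 18), (24, 36)]

print(check_gt_ag(toy_gene, toy_introns))
print(splice(toy_gene, toy_introns))
```

Applied to a real gene, the intron coordinates would come from aligning the cDNA to the genomic sequence, as was done in the comparison described above.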
Promoter, Transcriptional Initiation Site, and 5'-Flanking Sequences of the Human Pancreatic Polypeptide Gene-The transcriptional initiation site was determined by S1 nuclease mapping of mRNA from a human islet cell tumor (Fig. 4). Two DNA fragments differing in length by a single nucleotide were resistant to S1 nuclease digestion. These studies localized the transcriptional initiation point to a site 60-61 base pairs upstream from the translational start codon. The degeneracy of the transcriptional origin deduced by S1 mapping could arise from one of several causes including degeneracy of the cap site, mRNA degradation, or instability of mRNA-DNA duplexes close to the cap. The transcriptional initiation site was 24-25 bases downstream from a canonical sequence, TATAAA. The nucleotide sequence of an additional 900 bases upstream from the promoter was also determined and is shown in Fig. 3. Potential consensus core enhancer-like sequences were noted at -682 and -814 base pairs.
FIG. 4. Localization of the transcriptional initiation site of the human pancreatic polypeptide gene by S1 nuclease mapping. Messenger RNA from a human pancreatic tumor was hybridized to a denatured 5' end-labeled HinfI restriction fragment of the pPP4A subclone. The hybrid was digested with S1 nuclease, and the S1-protected fragments (arrowhead) were analyzed by electrophoresis on an 8% acrylamide-urea gel alongside sequencing ladders prepared from the same DNA fragment. G, guanine; G+A, guanine + adenine; A>C, adenine greater than cytosine; C+T, cytosine + thymine; C, cytosine.

Repetitive DNA Sequences-Repetitive DNA sequences were initially identified by Southern blot hybridizations of subcloned DNA digested with the restriction enzyme MstII. The blots were probed with nick-translated human genomic DNA. These studies revealed hybridization to a 1.6-kb fragment which extended from intron 3 to 1.4 kb beyond the polyadenylation site. The repetitive element was further localized by blot hybridizations of subcloned DNA digested with SphI and EcoRI or HindIII and BamHI. This analysis localized the repetitive element to a sequence between 700 and 1500 bases downstream from the polyadenylation site (see Fig. 1).

DISCUSSION
In the present study, we report the nucleotide sequence of the human prepropancreatic polypeptide gene isolated from a bacteriophage genomic library. Restriction endonuclease analysis of total human genomic DNA indicates that the cloned DNA fragment is an authentic representation of the gene. Characteristic features of the pancreatic polypeptide gene include a Goldberg-Hogness promoter consensus sequence separated by 24 or 25 bases from the cap site, three intervening sequences, and a repetitive DNA element between 700 and 1500 base pairs downstream from the polyadenylation site. The nucleotide sequence of the gene indicates that the four coding regions correspond to four discrete functional domains. The first intron occurs precisely at the translational origin. The second intron occurs five nucleotides 5' to the Gly-Lys-Arg cleavage-amidation signal at the carboxyl terminus of pancreatic polypeptide. The presence of a third intron located precisely at the arginine codon at the carboxyl terminus of the icosapeptide suggests that this arginine residue may represent another proteolytic processing site. Although single basic residues are not typical cleavage sites, other hormone precursors, including cholecystokinin (13), prosomatostatin (24), and gastrin (9), have also been found to be cleaved at single arginine residues.
We have searched the 5'-flanking region of the pancreatic polypeptide gene for potential core enhancer sequences, GTGG(A/T)(A/T)(A/T)G (25). Sequences were found at positions -814 and -682, which share seven of the eight nucleotides. Functional assays of expression will be required to determine the importance of these sequences for gene transcription.
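A search of this kind amounts to sliding an eight-nucleotide window along the flanking sequence and counting mismatches against the consensus. The Python sketch below shows one way to do this, allowing one mismatch so that seven-of-eight matches are reported. The consensus is written here as GTGG followed by three A-or-T positions and a final G, which is an assumed reading of the core enhancer consensus; the toy sequence and the function name are hypothetical and are not taken from the gene.

```python
# Assumed consensus: G T G G, then three positions that may be A or T, then G.
# Each entry lists the bases accepted at that position.
CONSENSUS = ["G", "T", "G", "G", "AT", "AT", "AT", "G"]


def scan_core_enhancer(sequence, max_mismatches=1):
    """Return (offset, window, mismatches) for every window that differs from
    the consensus at no more than `max_mismatches` positions.

    Offsets are 0-based from the start of `sequence`.
    """
    seq = sequence.upper()
    width = len(CONSENSUS)
    hits = []
    for offset in range(len(seq) - width + 1):
        window = seq[offset:offset + width]
        mismatches = sum(
            base not in allowed for base, allowed in zip(window, CONSENSUS)
        )
        if mismatches <= max_mismatches:
            hits.append((offset, window, mismatches))
    return hits


# Toy flanking sequence (NOT the pancreatic polypeptide gene).
toy_flank = "CCCGTGGAAAGCCCCCGTGCTTAGCCC"
for hit in scan_core_enhancer(toy_flank):
    print(hit)
```

A sequence match of this kind only nominates candidate elements; as the text notes, functional assays are needed to establish whether they affect transcription.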
Strikingly increased secretion of pancreatic polypeptide is frequently observed in human pancreatic islet tumors, in particular tumors that produce vasoactive intestinal polypeptide and glucagon (26). No similarities were noted among the promoter regions of the pancreatic polypeptide, vasoactive intestinal polypeptide (27), and glucagon (10, 11) genes, however. Southern blots of genomic DNA isolated from such a pancreatic polypeptide-producing tumor were indistinguishable from those obtained using normal human leukocyte DNA. Increased expression of pancreatic polypeptide in the tumor therefore does not appear to be related to major alterations in gene structure.
Pancreatic polypeptide is part of a larger family of peptides that includes neuropeptide Y (28) and peptide YY (29). Southern blot hybridizations of human genomic DNA using pancreatic polypeptide cDNA probes detected only those fragments identified in the cloned gene. Additional weakly hybridizing fragments were only detected using a genomic probe, suggesting that the hybridization arose from DNA sequences in one of the introns or in the 5'-flanking region. Comparison of the nucleotide sequence of a neuropeptide Y cDNA (30) to that of pancreatic polypeptide reveals that the greatest degree of homology (approximately 60%) occurs within a limited region of the two precursors. It is unlikely therefore that neuropeptide Y sequences would be detected in the Southern blot hybridizations. The nucleotide sequence of the peptide YY mRNA has not been determined. The amino acid sequence of peptide YY is more similar to neuropeptide Y than it is to pancreatic polypeptide, however.
The coding regions of most hormone genes are interrupted by intervening sequences, as shown schematically in Fig. 5 (27, 31). In some instances, such as in the somatostatin (32) or pro-opiomelanocortin (33) genes, the intervening sequences do not appear to divide the hormone precursor into structural or functional domains. The prepropancreatic polypeptide gene, on the other hand, is clearly divided into functional domains by its intervening sequences.
Like many of the other hormone genes depicted in Fig. 5, the pancreatic polypeptide gene contains an intervening sequence near the translational initiation site. In the case of pancreatic polypeptide, this intron occurs precisely at the initiator AUG codon, thereby separating the ribosomal binding site from the protein coding region. Furthermore, the structure of the prepropancreatic polypeptide gene indicates that the signal region of the precursor is not specifically demarcated by intervening sequences. As shown in Fig. 5, the signal regions of most hormone precursors also do not appear to be encoded by discrete structural units of the respective genes.
The division of hormone coding regions into structural and/or functional domains by intervening sequences may serve several functions in generating the diversity of peptide hormones. Separating the coding regions for discrete structural domains within a polyprotein may permit conservation of one domain while allowing divergence of another (34). Such a process may have occurred during the evolution of preproglucagon (10, 11), for example. Separation of functional regions in individual exons may also provide a mechanism to increase diversity through alternative RNA splicing. This mechanism appears to be used by the calcitonin (14) and tachykinin (35) genes. Because the biological functions of the icosapeptide and heptapeptide derived from the pancreatic polypeptide precursor are unknown, it is unclear whether the introns divide the gene into functional, in addition to structural, domains. As mentioned above, the interruption of hormone genes by introns may allow independent accumulation of changes within portions of a precursor. Therefore, in the case of propancreatic polypeptide, it might be expected that some portions of the precursor would be more highly conserved than others. The sequences of human, ovine, and canine pancreatic polypeptide differ in only four positions out of 36. The human, ovine, and canine icosapeptides vary at 12 of the 20 positions (5). The poor conservation of the icosapeptide suggests that this peptide may not have an essential biological function or that its function may differ among species. No data are available at present regarding conservation of the heptapeptides at the carboxyl terminus of the precursor. A complete understanding of the physiological significance of the pancreatic polypeptide gene will require the elucidation of the functional role of both the icosapeptide and heptapeptide.
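The conservation comparison above can be expressed as the fraction of positions that are invariant across the three species. The Python sketch below computes that fraction from the counts given in the text (4 variable positions of 36 for pancreatic polypeptide, 12 of 20 for the icosapeptide) and shows how variable positions could be counted from aligned sequences. The aligned peptides in the example are invented placeholders, not the human, ovine, or canine sequences, and the function names are hypothetical.

```python
def fraction_conserved(variable_positions, length):
    """Fraction of positions that are identical in all sequences compared."""
    return (length - variable_positions) / length


def count_variable_positions(aligned_sequences):
    """Count alignment columns in which the sequences are not all identical.

    Assumes the sequences are already aligned and of equal length.
    """
    return sum(len(set(column)) > 1 for column in zip(*aligned_sequences))


# Counts taken from the text.
pp_conserved = fraction_conserved(4, 36)
icosapeptide_conserved = fraction_conserved(12, 20)
print(f"pancreatic polypeptide: {pp_conserved:.1%} of positions invariant")
print(f"icosapeptide: {icosapeptide_conserved:.1%} of positions invariant")

# Invented placeholder peptides, used only to show the column-wise count.
toy_alignment = ["ACDEF", "ACDQF", "ACEEF"]
print(count_variable_positions(toy_alignment))
```

By this measure roughly nine-tenths of the pancreatic polypeptide positions are invariant, compared with two-fifths of the icosapeptide positions.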