Structural Organization of the Gene for Human Prolidase (Peptidase D) and Demonstration of a Partial Gene Deletion in a Patient with Prolidase Deficiency*



Prolidase (peptidase D) catalyzes hydrolysis of di- and tripeptides with carboxyl-terminal proline and plays an important role in recycling proline in various cells and tissues. By using human prolidase cDNA as a probe, a chromosomal gene related to prolidase was isolated from human gene libraries. The human prolidase gene is over 130 kilobases long and is split into 15 exons. All of the splice donor and acceptor sites conform to the GT/AG rule. The transcription initiation site was determined by nuclease S1 mapping and primer extension and was located 131 bases upstream from the initiation codon. A "CAAT" box-like sequence was present 67 bases upstream from the cap site, but there was no "TATA" box-like sequence. There were seven sets of sequences resembling the transcription factor Sp1 binding sites. Four were upstream from the cap site, and three were downstream. We also analyzed DNA from patients with prolidase deficiency for major gene rearrangements. A deletion of several hundred bases, including the 14th exon, was identified in one patient. Knowledge of the gene structure of human prolidase will facilitate further studies on the expression and regulation of this gene and provide necessary information for analyses of mutations in patients with this deficiency.
Prolidase (peptidase D, iminodipeptidase, EC 3.4.13.9) is a ubiquitous enzyme which splits dipeptides with a prolyl residue in the carboxyl-terminal position. The enzyme is expressed in all tissues and cells in humans. The induction or hormonal control of the enzyme activity under various conditions is not well understood. The enzyme has been isolated from various sources, including humans (1-3). The human enzyme is a homodimer with a subunit of 56,000 daltons (3,4), and it is activated by manganese ions.
We isolated cDNA clones corresponding to human prolidase (5) and determined the primary structure of the subunit polypeptide (6). The subunit protein is composed of 492 amino acid residues and is rich in glutamic acid residues (6). The amino-terminal alanine residue is acylated, and there are two putative glycosylation sites (6). We also determined chromosomal localization of prolidase (6).

*This work was supported in part by a Grant-in-Aid from the Ministry of Education of Japan and the Ministry of Health and Welfare of Japan and by a grant for research from IBM Japan. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Deficiency of the enzyme results in abnormalities of the skin and other collagenous tissues (7). The affected subjects excrete massive amounts of iminopeptides into the urine, and it is these peptides which function as substrates for prolidase (8). This rare genetic prolidase deficiency is inherited as an autosomal recessive trait (9). In previous work, we defined the polypeptide and RNA phenotypes of cells obtained from patients with prolidase deficiency (10). These analyses revealed that the genetic defects in these patients are heterogeneous.
To better understand the structure-function relationships, gene organization, and biosynthetic regulatory mechanisms, we isolated and characterized the gene for human prolidase. This gene is over 130 kilobases (kb) long and consists of 15 exons. The absence of the 14th exon was noted in a Japanese girl, who was a product of consanguineous mating and was deficient in prolidase.

Isolation and Characterization of Phage Clones Containing the Human Prolidase Gene
Two independently constructed human genomic DNA libraries were screened for clones carrying the prolidase gene using the plaque hybridization technique (13). The first was constructed from EcoRI partial digests of human liver DNA (a kind gift from Dr. A. Hata and Dr. K. Shimada, Kumamoto University), and the second was constructed from Sau3A partial digests of human leukocyte DNA (a kind gift from Drs. M. Takiguchi, Y. Haraguchi, and M. Mori, Kumamoto University) (14). Approximately 1 × 10^6 EMBL4 phages of each total DNA library were screened, using the human prolidase cDNA insert of PL21 as a probe (6). For chromosome walking, we screened the Sau3A partial library, using appropriate DNA fragments, free of highly repetitive sequences, derived from some of the isolated clones. Phage DNAs of positive clones were

Isolation and Characterization of the Prolidase Gene
Two phage libraries constructed from human liver and human leukocytes were screened for the prolidase gene. Approximately 42 independent clones were isolated and analyzed by restriction enzyme digestion and partial sequencing (Fig. 1). These clones overlapped, except for one region, and spanned over 140 kb. Since a part of intron 9 was not cloned completely, the gene must be larger than indicated in Fig. 1. To define positions and boundaries of the exon blocks, the restriction fragments identified by Southern hybridization were subcloned and their sequences determined (Fig. 1, Table I). The gene was divided into 15 exons, which ranged from 45 bases (exon 7) to 528 bases (exon 15), and the 14 introns ranged in size from 1.0 kb (intron 3) to more than 50 kb (intron 9). Since the exons total about 1.9 kb, 98% of the gene is occupied by introns. We attempted to clone the entire intron 9 by chromosome walking, using appropriate DNA fragments derived from some of the isolated clones. We were not able to elucidate the structure of intron 9, even when additional human libraries (23) were screened.
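The intron share quoted above follows from simple arithmetic on the reported sizes. The short sketch below is only an illustration of that calculation; the function name `intron_fraction` is hypothetical, and the 130-kb figure is the lower bound given in the text, so the result is itself a lower bound on the true intron share.

```python
def intron_fraction(gene_kb: float, exon_total_kb: float) -> float:
    """Fraction of a gene occupied by introns, given total and exonic length in kb."""
    if gene_kb <= 0 or not 0 <= exon_total_kb <= gene_kb:
        raise ValueError("sizes must satisfy 0 <= exon_total_kb <= gene_kb, gene_kb > 0")
    return 1.0 - exon_total_kb / gene_kb


# Values taken from the text: gene "over 130 kb" (lower bound), exons about 1.9 kb.
GENE_KB = 130.0
EXON_TOTAL_KB = 1.9

if __name__ == "__main__":
    frac = intron_fraction(GENE_KB, EXON_TOTAL_KB)
    print(f"Introns occupy at least {frac:.1%} of the gene")
```

Because intron 9 was not fully cloned, any larger gene size only pushes this fraction closer to 100%.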
All of the splice donor and acceptor sites conform to the GT/AG rule (24) for nucleotides immediately flanking exon borders (Table I). We found six nucleotide substitutions between the cDNA sequence (6) and the genomic sequence: G to A at nucleotide position 793, T to G at 1063, A to T at 1101, G to A at 1294, C to T at 1874, and G to C at 1887. Four of these were in the coding region and two in the 3'-untranslated region. All four nucleotide substitutions in the coding region caused amino acid changes.

Characterization of the 5' and 3' Ends of the Prolidase Gene
The nucleotide sequence around the 5' and 3' ends of the prolidase gene is shown in Fig. 2. The 5' end of the mRNA was determined by nuclease S1 mapping and primer extension analysis (Fig. 3) and is numbered +1. The DNA fragment labeled at position 2 downstream from the first nucleotide of the initiation triplet was used as a probe for the S1 mapping. We noted a protected gene fragment of 133 bases for placental poly(A)+ RNA. A 133-base-long fragment, starting from the primer labeled at the same position as the S1 probe, was also detected by primer extension analysis. From these results, the 5' end of the prolidase mRNA was assigned to a position 131 bases upstream from the first nucleotide of the initiation triplet. The assigned 5' end was the residue A, the generally preferred cap site (24,25).

The sequence CAAT (-67 to -64), resembling the canonical "CAAT" box, is situated at the usual location, that is, around 70-80 bases upstream from the cap site (25) (Fig. 2A). A sequence resembling the canonical "TATA" box was not observed. Seven sets of potential binding sites for the cellular transcription factor Sp1, CCGCCC (26), an inverted form of GGGCGG, were present at positions -52 to -47, -38 to -33, -21 to -16, -11 to -6, +39 to +44, +51 to +56, and +62 to +67 (Fig. 2A). The 3'-untranslated region of the prolidase gene contains 393 nucleotides (Fig. 2B).
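A search for Sp1 hexanucleotides of the kind reported above can be sketched as a plain motif scan. This is not the authors' procedure, only a minimal illustration: the function `find_gc_boxes` and the toy sequence are hypothetical, and coordinates are assumed to follow the convention used in the text (cap site = +1, first upstream base = -1, no position 0).

```python
import re

# GC box and its inverted form, as named in the text.
GC_BOX = "GGGCGG"
GC_BOX_INVERTED = "CCGCCC"

# A lookahead lets finditer report overlapping matches as well.
_PATTERN = re.compile(f"(?=({GC_BOX}|{GC_BOX_INVERTED}))")


def to_coordinate(index: int, cap_index: int) -> int:
    """Convert a 0-based string index to cap-relative numbering (no position 0)."""
    offset = index - cap_index
    return offset + 1 if offset >= 0 else offset


def find_gc_boxes(seq: str, cap_index: int) -> list[tuple[int, int, str]]:
    """Return (start, end, motif) for every GC box, in cap-relative coordinates."""
    hits = []
    for match in _PATTERN.finditer(seq.upper()):
        start = match.start()
        end = start + len(match.group(1)) - 1
        hits.append((to_coordinate(start, cap_index),
                     to_coordinate(end, cap_index),
                     match.group(1)))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence, NOT the prolidase promoter:
    # 12 upstream bases followed by the transcribed region starting at index 12.
    toy = "TTCCGCCCTTAA" + "ACGTGGGCGGTT"
    for start, end, motif in find_gc_boxes(toy, cap_index=12):
        print(f"{motif}: {start:+d} to {end:+d}")
```

Each hit spans six bases, matching the six-base intervals listed in the text (for example, -52 to -47).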
The site of the polyadenylation signal was inferred from the cDNA (6). A typical poly(A)+ addition signal sequence, AATAAA, is located 21 bases upstream from the poly(A)+ tail.

Southern Blot Analysis of Prolidase Deficiency
We first analyzed DNA from three patients with prolidase deficiency by Southern blot analysis after TaqI or BamHI digestion, using the cDNA insert of PL21 as a probe. No rearrangement such as a major deletion or insertion was evident (data not shown). We then reexamined all exons, except for exon 6, using specific genomic DNA fragments containing each exon. We were not able to examine exon 6 by Southern blots when a 10-kb BamHI fragment containing exon 6 was used as a probe; this fragment may have contained repetitive sequences in the intron. Apart from exon 6, which could not be examined, no rearrangements were found in exons 1 to 13 (data not shown).

Detection of Restriction Fragment Length Polymorphism
As shown in Fig. 4, a 4.0-kb BamHI-EcoRI fragment of genomic DNA (pG14), which contained exons 14 and 15, hybridized with 4.4- and 1.5-kb fragments generated by TaqI digestion in the control (lane 1). On the other hand, a 1.3-kb fragment (instead of the 1.5-kb fragment) appeared when the DNA sample from a patient (lane 2) was analyzed. When the same samples were digested with BamHI and subjected to Southern hybridization with the same probe, a fragment of approximately 8.7 kb was visualized in the lanes containing DNA from the control and the patient (lanes 3 and 4, respectively). PstI-digested DNA from the patient had a normal 1.4-kb fragment and a 1.1-kb fragment but lacked the normal 0.8-kb fragment (lane 6). Control experiments suggested these changes were not due to RFLPs. The patient was a product of consanguineous mating, and her sister had a similar enzyme defect, as described elsewhere (10). Thus, the patient seemed to be homozygous. These findings revealed a partial gene deletion of several hundred bases, eliminating exon 14, in this particular patient. A major abnormality in gene structure was not evident in the other patients studied.

[Fig. 2 legend, fragment: The sequence on both strands was determined. A, the underline (+1 to +148) denotes the first exon. The initiation codon is in squares. The boxed areas with symbols resemble the following sequences: "CAAT," CAAT box (25)]

DISCUSSION
We determined the structural organization of the human chromosomal gene for prolidase by analyzing the overlapping genomic clones obtained from two different human gene libraries. This 130-kb-long gene is among the larger genes characterized thus far (the dystrophin gene, 2000 kb (27); the cystic fibrosis gene, 280 kb (28); the factor VIII gene, 186 kb (29); the phenylalanine hydroxylase gene, 90 kb (30); the thyroglobulin gene, 100 kb (31); the insulin receptor gene, 120 kb (32); and the ornithine transcarbamylase gene, 73 kb (33)).
Since the exons total 1.9 kb, they occupy only 2% of the entire gene, a value which is one of the lowest among genes heretofore reported. Although genes containing large introns are common, there does not appear to be a direct correlation between mRNA size and gene structure. For example, the human factor VIII (29) and thyroglobulin (31) genes exceed 100 kb, although these genes code for mRNA transcripts of 7.8 kb. In contrast, the vitellogenin gene codes for a 6.6-kb mRNA product and contains 25 introns (mean size 940 bp) spanning only 23 kb (34). The 15 exons contained in the prolidase gene range in size from 45 to 528 bp (mean size 135 bp), consistent with the documented distributions of exon size for 20 proteins (35).

Fourteen introns of highly variable size separate the structure sequences into 15 exons. Intron 9 was over 50 kb; hence, we could not completely analyze it even after three rounds of chromosome walking and the use of other human libraries. As unclonable regions are estimated to constitute 5% of the human genome (28,36,37), the 5' side of intron 9, which we could not analyze, may lie within such a region.

The 5' end of the prolidase mRNA assigned here is 131 bases upstream from the initiation codon. The 5'-flanking region of the prolidase gene resembles that of other so-called house-keeping genes (38). It has a typical CAAT box and seven sets of a hexanucleotide Sp1 binding GC box (inverted [...] (39) and was shown to enhance transcription by RNA polymerase II 10-50-fold from several viral and cellular promoters that contained at least one properly positioned Sp1 binding site (39,40). Transcription of the human prolidase gene may be responsive to the Sp1 factor. Multiple initiation sites for transcription have been noted in several house-keeping genes lacking the TATA box (41-46). However, the initiation of prolidase mRNA transcription seems to occur at one site (Fig. 3).
RFLPs in the human prolidase gene were detected with the enzymes KpnI, BamHI, and EcoRV. The first two were detected with the cDNA insert of PL21 as a probe, and the third with a genomic fragment of pG14. The frequencies of the minor alleles of these three RFLPs are 0.2, 0.25, and 0.3, respectively (Table II). The locus of the prolidase gene was reported to be close to the myotonic dystrophy locus, as determined by linkage analysis (47), and the prolidase gene was mapped to the short arm of chromosome 19 (19p13.2), using in situ hybridization techniques (6). With these RFLPs, DNA diagnosis in pedigrees with myotonic dystrophy may be feasible.
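How useful such markers are for pedigree analysis depends on how often an individual is heterozygous at the site. As a rough guide, and assuming two alleles per RFLP and Hardy-Weinberg proportions (neither assumption is stated in the text), the expected heterozygosity is 2q(1 - q) for a minor-allele frequency q. The sketch below applies this to the three reported frequencies; the function name is hypothetical.

```python
def expected_heterozygosity(minor_freq: float) -> float:
    """Expected heterozygote frequency 2q(1 - q) for a two-allele marker.

    Assumes Hardy-Weinberg proportions; this is an illustrative assumption,
    not something established in the text.
    """
    if not 0.0 <= minor_freq <= 0.5:
        raise ValueError("minor-allele frequency must lie in [0, 0.5]")
    return 2.0 * minor_freq * (1.0 - minor_freq)


# Minor-allele frequencies reported in the text (Table II), in the order given there.
RFLP_MINOR_FREQ = {"KpnI": 0.2, "BamHI": 0.25, "EcoRV": 0.3}

if __name__ == "__main__":
    for enzyme, q in RFLP_MINOR_FREQ.items():
        print(f"{enzyme}: q = {q:.2f}, expected heterozygosity = "
              f"{expected_heterozygosity(q):.3f}")
```

Under these assumptions, roughly a third to two-fifths of individuals would be heterozygous at any one site, so combining the three markers would be more informative than using one alone.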
We also found six nucleotide substitutions by comparing the sequence of cDNA (6) with that of genomic DNA. All of the substitutions in the coding region were accompanied by amino acid substitutions (glycine to glutamic acid at residue 221, leucine to arginine at residue 311, aspartic acid to valine at residue 324, and arginine to histidine at residue 388). The nucleotide variations found between the gene and cDNA seem to be cloning and/or sequencing artifacts.
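The residue numbers above can be cross-checked against the nucleotide positions reported for the four coding-region substitutions (793, 1063, 1101, and 1294). The sketch below asks which cDNA position for the first base of codon 1 would be consistent with all four pairs. It is only a consistency check: the numbering convention of the cDNA in ref. 6 is not given here and is treated as unknown, and the function name `consistent_starts` is hypothetical.

```python
# (cDNA nucleotide position, amino acid residue number) pairs taken from the text.
SUBSTITUTIONS = [(793, 221), (1063, 311), (1101, 324), (1294, 388)]


def consistent_starts(pairs: list[tuple[int, int]]) -> list[int]:
    """Positions of the first base of codon 1 compatible with every pair.

    A nucleotide at position nt lies in codon res exactly when
    start + 3*(res - 1) <= nt <= start + 3*(res - 1) + 2.
    """
    lowest = max(nt - 3 * (res - 1) - 2 for nt, res in pairs)
    highest = min(nt - 3 * (res - 1) for nt, res in pairs)
    return list(range(lowest, highest + 1))


if __name__ == "__main__":
    starts = consistent_starts(SUBSTITUTIONS)
    print("Codon-1 start positions consistent with all four substitutions:", starts)
```

A non-empty result means the reported positions and residue numbers agree with a single reading frame; an empty list would point to a transcription or numbering error in one of the pairs.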
We reported the polypeptide and RNA phenotypes of cells obtained from patients with prolidase deficiency (10), and we analyzed DNAs from three patients lacking enzyme activity and cross-reacting materials. We detected no rearrangement of the gene when TaqI-, BamHI-, or EcoRI-digested DNAs were used and the cDNA insert of PL21 served as the probe. A partial gene deletion of several hundred bases, which eliminates exon 14, was detected in a homozygous patient using a genomic fragment (pG14) containing exons 14 and 15 as a probe. By using the cDNA as a probe, exon 14 could not be visualized after TaqI digestion of control DNA, since there were two TaqI sites within exon 14. This deletion mutation was not detected by Southern hybridization after EcoRI or BamHI digestion, since the generated DNA fragments were large (approximately 10.0 and 8.7 kb, respectively).
Only when the 4.0-kb BamHI-EcoRI fragment (pG14) containing exons 14 and 15 was used as a probe and the DNA samples were digested with TaqI or PstI did we clearly note differences between DNAs from healthy persons and the patient. We examined other patients with prolidase deficiency by Southern blots using the same probe; however, the hybridization patterns were similar to those seen in the control. Thus, prolidase deficiency is genetically heterogeneous, and the other patients seem to have different mutation(s).
Having characterized the normal gene for the human prolidase, further studies on the expression and regulation of this gene are underway. These studies will provide pertinent information required to analyze mutations and their effects in patients with prolidase deficiency.