Characterization of a Novel Dentin Matrix Acidic Phosphoprotein IMPLICATIONS FOR INDUCTION OF BIOMINERALIZATION*

Acidic phosphorylated proteins have been shown to be prominent constituents of the extracellular matrix of bone and dentin. The acidic phosphoproteins of bone contain more glutamic acid than aspartic acid and a lower serine content than either. On the other hand, the major dentin acidic phosphoproteins, phosphophoryns, have been defined as aspartic acid- and serine-rich proteins, with a lesser content of glutamic acid. Both sets of phosphoproteins have been implicated as key participants in regulating mineralization, but it has been difficult to unify their mechanisms of action. We have now identified, by cDNA cloning, a new serine-rich acidic protein of the dentin matrix, AG1, with a composition intermediate between the bone acidic proteins and dentin phosphophoryns. AG1 has numerous acidic consensus sites for phosphorylation by both casein kinases I and II. Immunochemical and biosynthetic studies show that AG1 is present in phosphorylated form in dentin. If fully phosphorylated, AG1 would have a net charge of -176/molecule of 473

Biomineralization, the biogenic formation of mineral deposits, is one of the most widespread processes in nature (1). All aspects of this mineral deposition are important, but our attention has been focused on the processes in eukaryotes, where the mineral deposition at extracellular sites is mediated and regulated by the proteins and other components that the cells secrete to form the matrix. Although the specific components of the extracellular matrix may differ, it is generally considered that matrix-mediated mineralization follows a basic strategic plan in which the cells first form the structural components of the extracellular matrix and then deliver regulatory macromolecules that modify the properties of the matrix and lead to the location and stereospecific induction or nucleation of mineral deposition (2). Many candidates have been proposed as the regulatory nucleating proteins in various systems, but no definitive data have yet verified this role for any one protein.

*This work was supported by National Institute of Dental Research Grant DE-01374 and National Institute of Arthritis, Musculoskeletal, and Skin Diseases Grant AR-13921 from the National Institutes of Health. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Bone and dentin are the principal mineralized tissues of vertebrates. In these tissues, type I collagen fibrils serve as the extracellular matrix structural component, and a carbonate apatite constitutes the mineral phase. Several acidic proteins have been considered as the potential mineral inductive regulatory macromolecules (3). We have studied dentin for many years as a relatively simple model system and have determined that it contains a set of acidic phosphoproteins, called phosphophoryns, one or more of which could be involved in regulating the nucleation and crystal growth processes in dentin (4). The phosphophoryns, characterized by a very high content of aspartic acid and phosphorylated serine residues (5,6), have proven to be extremely difficult to purify and sequence by standard protein chemical means (7). However, knowing the amino acid sequence of any of these molecules was considered essential to further understanding their specific roles in the dentin extracellular matrix. Thus, we have taken the approach of molecular biology and begun to clone the phosphophoryn genes to determine their sequences. We report here the complete deduced amino acid sequence of the first of the unique dentin acidic phosphoproteins to be cloned and our deductions concerning phosphorylation sites and conformation, all of which have direct relevance to the potential role of this group of phosphoproteins in the mineralization process. We also demonstrate that this protein is present, uniquely, in dentin and that it is phosphorylated in vivo.

EXPERIMENTAL PROCEDURES
Extraction of RNA-Total RNA was isolated from the odontoblast-pulp fibroblast complex of incisors of 3-week-old Sprague-Dawley rats using the guanidine isothiocyanate method of Chirgwin et al. (8) as modified by Han et al. (9). Freshly extracted incisors, including the intact pulps, were cleaned of all adherent periodontal tissues and frozen in liquid nitrogen. Less than 2 min elapsed between tooth extraction and freezing. The frozen incisors were then pulverized in 3-g batches in a liquid nitrogen freezer-mill (Spex Industries Inc.). Each batch of pulverized incisors was extracted in 20 ml of cold 5 M guanidine isothiocyanate by homogenization for 1 min in a Polytron homogenizer. The homogenate was centrifuged at 12,000 rpm for 5 min in a Beckman JA-20 rotor at 4 °C. The pellet, containing the nucleic acids, was resuspended in 10 ml of guanidine isothiocyanate, sheared again for 10 s in a Polytron homogenizer, and recentrifuged. The intact RNA and sheared DNA fragments were in the supernatant. The RNA was precipitated by the addition of 250 µl of 1 M acetic acid and 7.5 ml of ethanol (-20 °C). After incubation at -20 °C for 3 h, the RNA precipitate was collected by centrifugation at 6000 rpm for 10 min at -10 °C. The pellet was dissolved in 10 ml of cold guanidine hydrochloride [...] with ethanol as described above. The reagents, tubes, and other equipment used in all of the procedures described above were treated with diethyl pyrocarbonate to inhibit ribonuclease activity.
FIG. 1. 1, sequencing strategy for cDNA clone pAG1. Arrows with boxes indicate the use of synthetic oligonucleotides as a primer. 2 and 3, 5'- and 3'-untranslated regions, respectively, obtained using the polymerase chain reaction-based RACE technique. 4, an 800-bp fragment of pAG1 used as a probe in Northern blot analysis.
Isolation of mRNA and Preparation of Double-stranded DNA-Poly(A+) RNA was obtained by chromatography on an oligo(dT)-cellulose column (9). The cDNA was synthesized from poly(A+) RNA using a cDNA synthesis kit following the manufacturer's protocol (Amersham Corp.). The cDNA was methylated with EcoRI methylase and ligated to EcoRI linkers. The resultant cDNA was cut with EcoRI and passed over a Sephadex G-50 column. The fractions without free linkers were selected.
Cloning of Inserts into λgt11-The selected Sephadex G-50 fractions were ligated into a λgt11 vector that had been cut with EcoRI and dephosphorylated. After ligation, the clones were packaged, and Escherichia coli strain Y1090 cells were infected. The library was subsequently amplified to yield a titer of 1 × 10^9 plaque-forming units/ml.
Screening of Rat Incisor cDNA Library-The infected E. coli cells were grown for 3.5 h at 42 °C on 14-cm LB plates. These were overlaid with nitrocellulose membranes (Schleicher & Schuell) that had been soaked in 10 mM isopropyl-1-thio-β-D-galactopyranoside and air-dried. The E. coli cells were grown overnight at 37 °C. The membranes were washed with 0.05% Tween/Tris-buffered saline and blocked for 8 h with 3% bovine serum albumin and 1% normal goat serum at 4 °C. The library was screened with a 1:1000 dilution of a polyclonal antibody to the mixture of purified rat incisor phosphophoryns (anti-riPP) prepared, in rabbits, as described by Rahima et al. (10). The membranes were washed with 0.05% Tween/Tris-buffered saline followed by 3% bovine serum albumin and 1% normal goat serum for 1 h and were then incubated with goat anti-rabbit second antibody (Sigma) for 6 h at room temperature. After a further wash with 0.05% Tween/Tris-buffered saline, the color was developed with 4-chloro-1-naphthol (Bethesda Research Laboratories). A number of positive plaques were obtained. These were purified by repeated plating and selection with the polyclonal anti-riPP antibody. After three rounds of plaque purification, a surprisingly large number of positive plaques remained.
To simplify the selection problem, the antibody-positive plaques were rescreened with an antisense "wobble" poly-Asp probe (5'-TC(G/A)TC(G/A)TC(G/A)TC(G/A)TC(G/A)-3'; kindly prepared for us by Dr. Joel Rosenbloom) since direct amino acid sequencing of riPP had shown the presence of poly-Asp sequences (7). Again, a number of nucleotide-positive clones were detected. These were cross-screened by Southern analysis and grouped into several related sets.
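The logic of the antisense "wobble" probe can be illustrated with a minimal sketch. It expands the degenerate oligonucleotide quoted above into its concrete variants and checks that the strand complementary to each variant carries a run of in-frame Asp codons (GAT or GAC). The helper names are hypothetical, and the use of the IUPAC symbol R for the (G/A) positions is a notational assumption, not part of the original protocol.

```python
from itertools import product

# Probe from the text, written 5'->3'; R stands for the degenerate (G/A) position.
PROBE = "TCR" * 5
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "GA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
ASP_CODONS = {"GAT", "GAC"}


def expand(degenerate):
    """Return every concrete oligonucleotide encoded by a degenerate sequence."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in degenerate))]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def targets_poly_asp(oligo):
    """True if the sense strand paired with the oligo holds four in-frame Asp codons."""
    sense = reverse_complement(oligo)
    # The Asp reading frame starts one base into the 15-mer (offsets 1, 4, 7, 10).
    codons = [sense[i:i + 3] for i in range(1, 13, 3)]
    return all(codon in ASP_CODONS for codon in codons)


variants = expand(PROBE)
```

Under these assumptions the 15-mer is a mixture of 32 sequences, each complementary to four consecutive Asp codons plus flanking bases.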
Northern Blot Analyses of mRNA-Representatives from these sets were then used as probes for Northern analysis of the initial poly(A+) RNA to confirm that they did indeed correspond to some components of the tissue mRNA. Poly(A+) RNA (500 ng) components were separated by electrophoresis on formaldehyde gels and transferred to nitrocellulose membranes. These membranes were prehybridized and hybridized by standard protocols. The blots were probed with fragments cut from the cDNA clones with EcoRI and SacI.
The abbreviations used are: riPP, rat incisor phosphophoryn(s); bp, base pair(s); RACE, rapid amplification of cDNA ends.
Sequencing of Cloned cDNA-The phage were cut with EcoRI. The longest cloned cDNA insert (2496 bp) was subcloned into the EcoRI site of the M13mp18 vector and sequenced by the dideoxynucleotide chain termination method (11) using modified T7 DNA polymerase, Sequenase (12), and [35S]thio-dATP. Oligonucleotide primers were synthesized to sequence the entire clone, as indicated in Fig. 1.
Compressions resulting from secondary structures in the DNA were eliminated by replacing dGTP with dITP (13).
Isolation of 5'- and 3'-Ends by Polymerase Chain Reaction-The 2496-bp cDNA isolated from the library did not contain parts of the 5'- and 3'-untranslated regions in the mRNA. To obtain the full-length clone, polymerase chain reaction amplification was performed using GIBCO BRL 3'- and 5'-RACE kits. Rapid amplification of cDNA ends (RACE) is a procedure for amplification of nucleic acid sequences from a messenger RNA template between a defined internal site and either the 3'- or 5'-end of the mRNA (14). In both procedures, mRNAs are first copied using reverse transcriptase, and the specific first-strand cDNA products are then amplified by the polymerase chain reaction. For 5'-RACE, an intermediate terminal deoxynucleotidyltransferase tailing reaction is required prior to amplification. The gene-specific primer synthesized for the 5'-RACE system had the sequence 3'-CCCCCGACAGGACACGAGAGG-5', and the primer for the 3'-RACE system was 5'-GGAGGGTCACATAGGCT-3'. The resulting 2770-bp cDNA (see Fig. 1) is designated pAG1.
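Because the 5'-RACE primer is quoted 3'-to-5' while the 3'-RACE primer is quoted 5'-to-3', a small sketch may help when comparing them. It rewrites both in the conventional 5'-to-3' orientation and reports length and G+C fraction. The function names are hypothetical and the calculation is illustrative only; it is not part of the published protocol.

```python
def to_5_to_3(seq, written_3_to_5=False):
    """Return the sequence in 5'->3' orientation, reversing it if it was written 3'->5'."""
    return seq[::-1] if written_3_to_5 else seq


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


# Primer sequences as quoted in the text.
race5_primer = to_5_to_3("CCCCCGACAGGACACGAGAGG", written_3_to_5=True)
race3_primer = to_5_to_3("GGAGGGTCACATAGGCT")

for name, primer in (("5'-RACE", race5_primer), ("3'-RACE", race3_primer)):
    print(f"{name}: 5'-{primer}-3'  length={len(primer)}  GC={gc_fraction(primer):.2f}")
```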
Organ Culture and Biosynthesis Studies-Maxillary and mandibular incisors from 3-week-old male Sprague-Dawley rats were collected fresh and cut transversely so that the pulp was exposed at both ends, but was otherwise surrounded by dentin and the intact layer of odontoblasts. These were placed in culture in minimum essential medium supplemented with 10% fetal calf serum and 100 µg/ml ascorbic acid and in the presence of a 1% antibiotic and antimycotic penicillin/streptomycin solution (15). After a 1-h preincubation at 37 °C in 5% CO2, 95% air, 150 µCi of 32PO4 was added. Incubation was continued for 24-72 h. At the end of the labeling period, the teeth were cooled to 4 °C, washed in 15% NaCl, and split open to expose the pulp. The tooth shards were sonicated and then thoroughly extracted with 4.0 M guanidine hydrochloride to remove the constituents of the cellular and other nonmineralized tissue components. The proteins of the mineralized matrix were then extracted during demineralization with 0.5 M EDTA at pH 7.5 in the presence of a protease inhibitor mixture (50 mM ε-amino-n-caproic acid, 5 mM benzamidine, 2 mM phenylmethylsulfonyl fluoride, 0.5 mM N-ethylmaleimide). The phosphophoryns were isolated and purified by chromatography on DEAE-cellulose columns (6,7). The solubilized proteins were analyzed by gel electrophoresis followed by Stains-all staining or autoradiography, depending on the information desired.

RESULTS
[...] units in the initial screen with the polyclonal anti-riPP antibody, and these were verified by plaque purification. On the assumption that riPP contained some sequences of contiguous Asp residues, these plaques were screened with the poly-Asp nucleotide probe described above. This screening yielded a few very strongly positive plaques, which were then the focus of further study. It is important to note that there were other positive plaques, but we have not yet explored these.
The selected strongly positive clones were cross-screened and found to fit into two groups with related (but different sized) inserts.

The first group of related clones to be sequenced proved to be virtually identical to rat bone osteopontin. There was a 95% sequence identity to nucleotides 1364-1457 of the bone protein according to the National Institutes of Health and EMBL nucleic acid data bases. The expressed protein osteopontin has not been shown to be a major constituent of mineralized dentin, but there is no doubt that the mRNA is present in the cells of the odontoblast-pulp complex (16). Further studies have not been carried out.
The second set of antibody- and oligonucleotide-reactive clones was selected for investigation. The clone providing the longest insert, pAG1, was sequenced following the strategy shown in Fig. 1. The complete nucleotide sequence of 2770 bases of pAG1 and the deduced amino acid sequence of 489 residues are shown in Fig. 2. The arrow at residue 16 marks the consensus signal peptide cleavage site (17). The amino acid composition is shown in Table I. Excluding the signal sequence, the deduced "secreted" form of AG1 contains a total of 134 Asp and Glu residues as compared with a total of 47 Arg, Lys, and His residues, making this intrinsically a very acidic protein. Typical of the presently known specific dentin matrix proteins, the 107 Ser residues are the predominant constituent. In the secreted protein, there is a single Cys residue, only 4 residues removed from the carboxyl terminus. The 3'-untranslated region consists of 1000 nucleotides and contains a single polyadenylation signal (AATAAA). As calculated from the composition data, the molecular weight of the secreted AG1 is 53,000.
To examine the tissue distribution of AG1 mRNA and to verify its presence in the odontoblast-pulp cells, an 800-bp nucleotide sequence from the 5'-end of the AG1 cDNA (Fig. 1) was used to carry out Northern analyses of mRNA extracted from rat skin, liver, brain, calvaria, tibia, and dentin. In this case, the mRNA from the odontoblasts was separated from the pulp cells by carefully extracting the pulp with an endodontic file. This leaves a clean, albeit incomplete, layer of odontoblasts attached to dentin by their odontoblastic processes. Other pulp cells do not adhere to dentin. On the other hand, some odontoblasts break off at the process and remain attached to the pulp so that the pulp cell fraction is "contaminated" with odontoblasts. Fig. 3 shows that the odontoblasts contain the AG1 mRNA, whereas with equivalent poly(A+) RNA loading, no AG1 mRNA was detected in skin, liver, brain, or calvaria. Trace amounts of AG1 mRNA were detectable (data not shown) in tibia mRNA following prolonged exposure of the blot, but AG1 was certainly not prominent. These data are unequivocal in showing that AG1 mRNA is essentially odontoblast-specific.
Two experiments were carried out to demonstrate the presence of the expressed protein in the tissue. First, pAG1 was expressed at high density in the λgt11/E. coli system and blotted onto nitrocellulose. The filters with bound recombinant AG1 were then exposed to the polyclonal anti-riPP antibody used for its initial detection. The filters were extensively washed in binding buffer to eliminate nonbound components of the mixture of antibodies. The filters were then treated as equivalent to affinity columns, and the affinity-bound antibody was eluted and concentrated. An acrylamide gel electrophoresis run was carried out with the total DEAE-purified phosphophoryn extracted from fresh rat incisors in standard fashion. It is important to note that as in the organ culture dentin demineralization, the teeth were washed extensively with guanidine hydrochloride to remove nonmineral phase proteins prior to demineralization. Western analysis was carried out (Fig. 4, lane 3), with the result that a single band at Mr = 61,000 was detected by the filter-bound antibody, monospecific to recombinant AG1. The initial polyclonal anti-riPP antibody revealed several bands (lane 4). In addition to binding to the principal α- and β-riPPs, as expected, a band corresponding to the putative AG1 was evident. However, to see AG1 with the polyclonal anti-riPP antibody, total riPP had to be loaded on the gel at a 10-fold higher concentration than usual, and this revealed a number of stained components not usually observed (lane 4). In contrast to the other bands, only the monospecific anti-recombinant AG1 antibody yielded a single sharp band. These data show that a single protein reactive to the antibody specific to recombinant AG1 is present in the mineralized compartment of the dentin matrix. The apparent molecular weight of organ culture-produced AG1 (Mr = 61,000) is higher than the calculated core protein molecular weight. As indicated below, this is attributed to post-translational modification.
[...] II is indicated in boldface type. The criteria for selecting these sites were the embedding of the serine in an acidic sequence NH2- or COOH-terminal to the residue and the presence of Asp or Glu at position -3 or +3 from the Ser residue. A residue was not considered a substrate if there was an intervening Arg, Lys, or His residue (19,20). Although a number of the Thr residues are likely candidates for phosphorylation, these have not been designated here.
[...] In this lane, the amount of phosphophoryns loaded was >75 µg. At lower concentrations, the AG1 band could not be seen. The origin of the intense band at ~70 kDa is not known, but is seen under these high loading concentrations. Lane 5, autoradiograph of the purified phosphophoryn components extracted from rat incisors cultured in vitro and labeled with 32P for 72 h following the procedure of DiMuzio et al. (15). The incisors were washed with 4.0 M guanidine HCl to remove cellular components prior to demineralization of dentin and phosphophoryn extraction. Protease inhibitors were included during extraction (7). A band corresponding to AG1 is prominent in the immunoblots and autoradiograph.
We have recently reported that the dentin phosphophoryns are substrates for phosphorylation by casein kinases I and II and that membrane-bound forms of these kinases are present in the endoplasmic reticulum/Golgi compartments of osteoblast-like cells (18). Using the most conservative rules for predicting substrate sequences for these kinases (19,20) (i.e. for casein kinase II, an acidic cluster COOH-terminal to the Ser/Thr with an obligatory Glu, Asp, or Ser(P) at residue +3 from the substrate Ser/Thr; or for casein kinase I, an acidic cluster with Ser(P), Asp, or Glu at residue -3 from the substrate Ser), an examination of the sequence in Fig. 2 shows that at a minimum, 55 of the 106 seryl residues should be likely candidates for phosphorylation. (Intervening Arg, Lys, and His residues were taken to exclude phosphorylation.) This observation suggested that a second way of revealing the presence of AG1 might be to label incisor dentin in culture with 32PO4. After 72 h in organ culture, the mineralized matrix-associated proteins were extracted, and the phosphophoryn fraction was isolated and purified by DEAE chromatography. The 32P-labeled cultures yielded matrix-associated, mineral phase-bound radiolabeled phosphophoryns. Gel electrophoresis of the protein showed the presence of a radiolabeled band corresponding precisely to the position of the band detected by the specific anti-AG1 antibody (Fig. 4, lane 5). In fact, in the organ culture system, AG1 was more heavily labeled than the higher molecular weight phosphophoryns. As noted above, the apparent molecular weight of this band is 61,000. The addition of the minimum 55 phosphate groups would increase core mass by 4,350 from 53,000, a value in excellent agreement with the observed data. These data substantiate that pAG1 represents and encodes a real protein present in dentin in a phosphorylated state.
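The conservative site-selection rules described above can be expressed as a short scanning sketch. It is a simplified reading of those rules, not the procedure actually used to annotate Fig. 2: it considers only Ser residues, accepts only Asp or Glu (not Ser(P)) as the determinant at +3 or -3, and rejects a site if Arg, Lys, or His lies between the Ser and the determinant. The function names are hypothetical, and the demonstration peptide DEDSSSQE is the stretch at positions 393-400 quoted in the Discussion.

```python
ACIDIC = set("DE")
BASIC = set("RKH")


def ck2_sites(seq):
    """0-based Ser positions with Asp/Glu at +3 and no basic residue in between."""
    sites = []
    for i, residue in enumerate(seq):
        if residue != "S" or i + 3 >= len(seq):
            continue
        if seq[i + 3] in ACIDIC and not BASIC & set(seq[i + 1:i + 3]):
            sites.append(i)
    return sites


def ck1_sites(seq):
    """0-based Ser positions with Asp/Glu at -3 and no basic residue in between."""
    sites = []
    for i, residue in enumerate(seq):
        if residue != "S" or i - 3 < 0:
            continue
        if seq[i - 3] in ACIDIC and not BASIC & set(seq[i - 2:i]):
            sites.append(i)
    return sites


peptide = "DEDSSSQE"
candidates = sorted(set(ck1_sites(peptide)) | set(ck2_sites(peptide)))
```

For this peptide, the sketch flags all three Ser residues once both kinases are taken into account, in line with the description of this stretch in the Discussion.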

DISCUSSION
The dentin extracellular matrix consists of ~90% collagen, with the rest composed of phosphoproteins and acidic glycoproteins (21,22). The principal phosphoproteins discussed in the literature are the serine- and aspartic acid-rich phosphophoryns (5). All of the above data indicate that rat dentin contains another previously unreported unique, acidic, and potentially highly phosphorylated extracellular matrix protein, designated here as AG1. Relative to the phosphophoryns, AG1 has a high content of Glu residues and some similarities in amino acid composition to osteopontin and bone sialoprotein. However, AG1 also has features that have been proposed for phosphophoryn and seems to be exceptionally suited to play some regulatory role within the matrix of mineralized bone and dentin. AG1 is remarkably hydrophilic. As shown in Fig. 5, a plot of the hydrophobic/hydrophilic balance using the hydropathy measure of Kyte and Doolittle (23) shows that virtually only the putative signal peptide domain is hydrophobic. The carboxyl-terminal region, which contains the single cysteine residue, is also hydrophilic. Conformational analysis algorithms developed by the University of Wisconsin Genetics Computer Group program using Chou-Fasman methods predict that the AG1 backbone (i.e. before any potential posttranslational modification such as phosphorylation or glycosylation) might have a few β-turn regions, but there are no regions of ordered long-range β-sheet or α-helix (Fig. 6).
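A hydropathy plot of the kind referred to above is a sliding-window average of per-residue Kyte-Doolittle values. The following is a minimal sketch of that calculation; the window length of 7 is an assumption (the window used for Fig. 5 is not stated), the function name is hypothetical, and the test peptide is the 13-residue stretch from positions 442 to 454 quoted later in the Discussion.

```python
# Kyte-Doolittle hydropathy index (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=7):
    """Mean hydropathy of every full window of the given length along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Ser/Thr-rich acidic stretch quoted in the text (residues 442-454).
profile = hydropathy_profile("SNSTGSTSSSEED", window=7)
is_hydrophilic_throughout = all(value < 0 for value in profile)
```

Every window in this stretch averages below zero, which is the pattern expected for a hydrophilic region; a full-length profile would require the complete sequence of Fig. 2.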
In spite of the fact that the cDNA was detected as reactive with the putative poly-Asp oligonucleotide probe, no extended poly-Asp sequence is present. There are, however, acidic patches of Asp and Glu, such as at residues 84-89 (EEDEDD) and 254-257 (EEDD). Furthermore, there are several regions with 3 consecutive acidic residues and many more with 2. Table II shows that exactly half of the Glu residues appear as consecutive EE sequences, and except at positions 105-106, these are in Ser- and Thr-rich domains. Only two of the EE pairs are near or adjacent to a basic residue (positions 281-282 and 441-442). Similarly, many of the Asp residues are in Ser-, Thr-, and Glu-rich sequences. The acidic nature of the molecule is markedly enhanced when the potential for phosphorylation of strategically placed Ser and Thr is considered. We have shown that casein kinase II and membrane-associated casein kinase II-like kinases in osteoblasts can readily phosphorylate riPP (18) and that membrane-associated odontoblast casein kinase II-like kinases can rephosphorylate dephosphorylated riPP (C. Sfeir and A. Veis, unpublished data). Thus, it is likely that endoplasmic reticulum/Golgi-associated casein kinase II could have access to nascent AG1 during its chain synthesis and secretion. As noted above, Ser or Thr residues in acidic sequences with Glu, Asp, or Ser(P) at position +3 from the carboxyl-terminal side are preferred casein kinase II substrates. Additional upstream Asp or Glu at position +2 may also be sufficient to direct Ser or Thr phosphorylation by casein kinase II in model peptides and physiological protein substrates (25,26). Substrate recognition sites for casein kinase I include Ser(P), Asp, or Glu at position -3 from the amino-terminal direction from the substrate Thr and Ser. Ser(P) is the most active phosphorylation determinant in model peptides (27). The more acidic the neighboring environment, the better the substrate sequence. Using these minimal criteria and the deduced sequences of Fig. 2, it appears that 55 of the 107 serine residues in the secreted AG1 could be readily phosphorylated. If this were the case and with the Ser(P) acid dissociation constant at pKa 6.8 (28), AG1 would carry a charge of -88 from the phosphate groups alone and a net charge of -175/molecule at physiological pH. Such a molecule should have a very high capacity for binding divalent cations such as calcium, as already determined for the phosphophoryns (29).
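The charge estimate can be retraced with a brief sketch based on the Henderson-Hasselbalch relation. Several simplifications are assumed here and are not stated in the text: a pH of 7.0 stands in for "physiological pH", the first phosphate ionization is treated as complete, every Asp and Glu counts as -1, every Arg, Lys, and His counts as +1, and the chain termini are ignored. The names are hypothetical; the residue counts are those given for the secreted protein.

```python
ACIDIC_RESIDUES = 134   # Asp + Glu in secreted AG1 (from the text)
BASIC_RESIDUES = 47     # Arg + Lys + His in secreted AG1 (from the text)
PHOSPHATES = 55         # minimum number of predicted Ser(P) sites (from the text)


def phosphate_charge(ph, pk2=6.8):
    """Mean charge of one Ser(P) phosphate: -1 plus the ionized fraction of the second proton."""
    second_ionization = 1.0 / (1.0 + 10 ** (pk2 - ph))
    return -(1.0 + second_ionization)


def net_charge(ph=7.0):
    """Approximate net charge of the phosphorylated secreted protein (assumed pH)."""
    core = -(ACIDIC_RESIDUES - BASIC_RESIDUES)
    return core + PHOSPHATES * phosphate_charge(ph)


per_phosphate = phosphate_charge(7.0)
total = net_charge(7.0)
```

With these assumptions each phosphate contributes about -1.6, and the total lands close to the value of -175 quoted above; the result shifts with the pH chosen.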
Inspection of the amino acid sequence determined for recombinant AG1 shows several sequence domains that contain an abundance of phosphorylation consensus sites for casein kinases I and II. Consider the sequence from residues 442 to 454 (SNSTGSTSSSEED). Each S and T residue in the sequence is a potential successive substrate for casein kinase II phosphorylation, assuming that each of the 3 COOH-terminal S residues is targeted for phosphorylation by the EED sequence. As the phosphate groups are added, the NH2-terminal S and T residues become autocatalytic substrates. This could potentially yield a domain of 11 acidic residues in a sequence of 13 amino acids and a net charge of -15.8. Similarly, the 3 contiguous S residues in the sequence from positions 393 to 400 (DEDSSSQE) can be acted upon by both casein kinases I and II to become fully phosphorylated and to create another hypercharged domain. Further analysis of this type is not appropriate until specific studies of the recombinant protein sequences can be undertaken. It is probable that the phosphorylated molecule will have a much higher charge density in the carboxyl-terminal region and a higher degree of phosphorylation than in the amino-terminal two-thirds of the molecule. It is especially interesting that the sequence -SSSES-, present as a single sequence in β-casein (25) and in some nuclear phosphorylated proteins (30) as phosphorylation sites and calcium ion-binding regions, is so prominent.
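The successive phosphorylation proposed for residues 442 to 454 can be traced with a small iterative sketch. It applies a single rule repeatedly: a Ser or Thr becomes a substrate when the residue at +3 is Asp, Glu, or an already phosphorylated residue. Treating Thr(P) as equivalent to Ser(P) at +3 is an assumption made here so that every S and T in the stretch can be reached; the text names only Glu, Asp, and Ser(P) explicitly. The function name is hypothetical, and basic-residue exclusion is omitted because this peptide contains no Arg, Lys, or His.

```python
def hierarchical_ck2(seq):
    """Return successive rounds of 0-based Ser/Thr positions that gain a +3 determinant."""
    acidic = {i for i, aa in enumerate(seq) if aa in "DE"}
    phosphorylated = set()
    rounds = []
    while True:
        determinants = acidic | phosphorylated
        newly_modified = [
            i for i, aa in enumerate(seq)
            if aa in "ST" and i not in phosphorylated and (i + 3) in determinants
        ]
        if not newly_modified:
            return rounds
        phosphorylated.update(newly_modified)
        rounds.append(newly_modified)


peptide = "SNSTGSTSSSEED"
rounds = hierarchical_ck2(peptide)
n_phospho = sum(len(r) for r in rounds)
n_acidic_total = n_phospho + sum(aa in "DE" for aa in peptide)
```

Under this rule the three Ser residues next to EED are modified first, and phosphorylation then propagates toward the NH2-terminal end in further rounds until all 8 Ser and Thr residues are modified, giving 11 acidic positions out of 13.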
There are a few other sequence regions worthy of special note. There is a single Asn-Thr-Ser sequence (positions 340-342) in the protein, representing the consensus attachment site (-Asn-X-Ser-) for an N-glycosidically linked oligosaccharide; and this is in a sequence that is also not likely to become a site for phosphorylation. Furthermore, as indicated in Fig. 2, there is a nearby RGD sequence. This might indicate that AG1 has integrin binding properties. There are several Ser-X-Glu tripeptide sequences that might provide a signal for the synthesis of O-glycosidically linked oligosaccharides. However, since these same sites are also targets for phosphorylation, one cannot predict which of these reactions would take precedence. Conformational and hydrophobicity analyses indicate that the entire molecule, from the signal peptide cleavage point to the carboxyl-terminal region, is hydrophilic and has no predicted preferred regular conformation; but it is likely that in the physiological milieu, the highly phosphorylated domains would bind multivalent cations such as calcium and that some regular conformations might be assumed. The AG1 sequence, with its several domains of 5 or more consecutive acidic residues, after phosphorylation, presents a marked contrast to the known bone matrix acidic proteins. These are not as highly phosphorylated, and their long acidic sequences do not contain phosphorylation sites.
There are obviously many interesting avenues to explore; but with the cloned recombinant protein, it should now be possible to examine the interactions of the AG1 protein with both collagen and mineral for assessing its potential role in the regulation of mineralization of dentin as well as to determine its localization within the matrix and other aspects of its function and production.