Crystal structure and biochemical characterization of human kallikrein 6 reveals a trypsin-like kallikrein is expressed in the central nervous system

The human kallikreins are a large multigene family of closely related serine-type proteases. In this regard, they are similar to the multigene kallikrein families characterized in mice and rats. Considerably more is known about the function of the mouse and rat kallikreins than about the human kallikreins. Human kallikrein 6 has been proposed as the homologue to rat myelencephalon-specific protease, an arginine-specific degradative-type protease abundantly expressed in the central nervous system and implicated in demyelinating disease. We present the x-ray crystal structure of mature, active recombinant human kallikrein 6 at 1.75-Å resolution. This high resolution model provides the first three-dimensional view of one of the human kallikreins and one of only a few structures of serine proteases predominantly expressed in the central nervous system. Enzymatic data are presented that support the identification of human kallikrein 6 as the functional homologue of rat myelencephalon-specific protease and are corroborated by a molecular phylogenetic analysis. Furthermore, the x-ray data provide support for the characterization of human kallikrein 6 as a degradative protease with structural features more similar to trypsin than to the regulatory kallikreins.

Myelencephalon-specific protease (MSP) is a member of the rat kallikrein gene family that is abundantly expressed in the rodent CNS and has been shown to be up-regulated in response to glutamate receptor-mediated excitotoxic injury (18). Potential human homologues to rat MSP

Experimental Procedures
Expression, crystallization, and data collection. Mature active hK6 was expressed and purified from a baculovirus/insect cell line system essentially as described for rat myelencephalon-specific protease (MSP) (27), using a synthetic (Asp)4Lys pro-sequence and activation by enterokinase. Purified active hK6 was concentrated to 20 mg/ml in 40 mM sodium acetate, 100 mM NaCl, and 20 mM benzamidine, pH 4.5. Crystallization conditions were identified using a hanging-drop sparse-matrix screen (31) of precipitants, salts, and pH conditions (Hampton Research, Laguna Niguel, CA). Diffraction-quality crystals grew from 30% (w/v) PEG 4000, 0.2 M magnesium chloride hexahydrate, and 0.1 M Tris hydrochloride, pH 8.5, after two weeks of incubation at 4 °C.
X-ray intensity data were collected at 103 K from a single crystal (0.5 × 0.2 × 0.05 mm) with a Rigaku R-Axis IIc imaging plate area detector using Cu-Kα radiation. Data were processed and scaled using DENZO and SCALEPACK (32,33). The crystal diffracted to at least 1.75 Å. The space group was tentatively identified as orthorhombic P2 1  group of the correctly rotated model resulted in a single peak 4σ above the noise level. The Rcryst was 47.3% after rigid body refinement of this initial solution.
A 3 Å 2Fobs-Fcalc SIGMAA-weighted composite annealed omit map (5% of data omitted) was calculated, and the structure was built and refined through alternating cycles using the graphics program O (36) and CNS. All refinements were performed by simulated annealing using a maximum likelihood target, and this cyclic procedure was repeated several times with a gradual increase of the resolution to 1.75 Å. A random selection of 3% of the data was assigned for calculation of Rfree and was not included in the refinement. Solvent molecules were added at the last stage of refinement at stereochemically reasonable positions.
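The residuals quoted here follow the standard crystallographic definition, R = Σ| |Fobs| − k|Fcalc| | / Σ|Fobs|, with Rfree evaluated on reflections withheld from refinement. The minimal Python sketch below illustrates that bookkeeping on synthetic amplitudes. The function names, the single linear scale factor, and the data are assumptions for illustration only; this is not the CNS implementation.

```python
import random

def r_factor(f_obs, f_calc):
    """Crystallographic residual with one linear scale factor k (assumed scaling)."""
    k = sum(f_obs) / sum(f_calc)
    return sum(abs(o - k * c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)

def split_free_set(n_reflections, fraction=0.03, seed=0):
    """Randomly withhold a fraction of reflection indices as the free set."""
    rng = random.Random(seed)
    n_free = max(1, round(n_reflections * fraction))
    free = set(rng.sample(range(n_reflections), n_free))
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)

# Synthetic amplitudes (hypothetical, not the hK6 data set).
rng = random.Random(1)
f_obs = [rng.uniform(10.0, 100.0) for _ in range(1000)]
f_calc = [f * rng.uniform(0.8, 1.2) for f in f_obs]

work, free = split_free_set(len(f_obs), fraction=0.03)
r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}")
```

Because the free reflections never enter the refinement target, a gap between the two residuals gives an independent check against overfitting of the model.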
Autolysis of hK6. Autolysis of hK6 was evaluated using 16.5% Tricine sodium dodecyl sulfate polyacrylamide gel electrophoresis (Tricine SDS-PAGE) (37) and activity assays with L-benzoylarginine para-nitroanilide (L-BAPNA). Mature hK6 in PBS, pH 7.31, was incubated at 37 °C and
Digestion of myelin basic protein and extracellular matrix proteins by hK6. Rat myelin basic protein (MBP) isolated from spinal cord was added to hK6 at a 1000:1 mass ratio in 50 mM Tris and 100 mM NaCl, pH 8.0. This mixture was incubated at 37 °C and time points were taken at 10, 30, 60, 120, and 240 min. The MBP and degradative fragments were resolved using Tricine SDS-PAGE (16.5%). Laminin from basement membrane of Engelbreth-Holm-Swarm mouse sarcoma (Sigma Chemical Co., St. Louis, MO) was diluted in TBS, pH 7.5, to a concentration of 1 mg/ml. Active hK6 was added to a concentration of 4.2 µM (10:1 w/w ratio of laminin:hK6).
The sample was incubated at 37 °C, and aliquots of the digestion mix were taken at 0, 1, and 24 h, resolved on 7.5% SDS-PAGE, and visualized by Coomassie blue staining. Mouse fibronectin (Life Technologies, Rockville, MD) was used as provided as a stock solution of 1.0 mg/ml in 2.7 mM potassium chloride and 10% glycerol, pH 7.3. Mature hK6 was added to a final concentration of 4.2 µM (10:1 w/w ratio of fibronectin:hK6). The sample was incubated at 37 °C, and aliquots were taken and analyzed in a manner identical to that of the laminin digestion.
Phylogenetic analyses. A dataset of hK6-related proteins was collected and assembled from protein sequence databases (as of Sept. 2001) using FastA (38) and LookUp (39) within the Genetics Computer Group's Wisconsin Package SeqLab interface (GCG, 2001). An expectation value of 10⁻⁴ was used as a list cut-off, and all entries other than human, rat, and mouse were excluded. Redundancies, splicing variants, and other isoforms were then removed, leaving a dataset of thirty-three protein sequences (Table I). PileUp (40) with the BLOSUM30 matrix (41) was used to initially align the sequences, followed by considerable regional realignment and manual adjustment. The final aligned amino acid sequence dataset is available from the authors by request.
GCG's ToFastA and Don Gilbert's ReadSeq (1993) were used to create a PHYLIP (42) format dataset from the alignment, where columns of excessive homoplasy, as judged by similarity less than 15%, were excluded. Three phylogenetic inference methods were used on the resultant data matrix: 1) The maximum likelihood, quartet-puzzling program Tree-Puzzle (43), run with the JTT amino acid substitution model (44) and 1000 steps, produced a maximum likelihood tree estimate with branch lengths and node support values. 2) Pair-wise distances were estimated with PHYLIP's ProtDist PAM model (45) and least-squares fitted to an optimal globally rearranged tree by the PHYLIP Fitch algorithm with ten random additions. 3) The data matrix was bootstrapped 100 times by PHYLIP's SeqBoot, ProtDist generated 100 PAM-based pair-wise distance matrices, and then PHYLIP's Neighbor neighbor-joining algorithm and Consense program provided bootstrap node support values. Majority rule, that is, wherever two or more of the three estimates agreed, provided the resolved clades on the final tree presented in Fig. 1. Final node support values were calculated as the average of the Tree-Puzzle and bootstrapped neighbor-joining results wherever they agreed on a particular node; all values greater than 50% were printed at their respective nodes.
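The majority-rule and support-averaging step described above can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' procedure: clades are represented as frozensets of taxon labels, the Fitch tree is assumed to contribute topology only, and the taxa and support values are invented toy data.

```python
def consensus_clades(puzzle, fitch, nj, min_votes=2):
    """Keep clades found by at least min_votes of the three estimates.

    puzzle, nj: dict mapping clade (frozenset of taxa) -> support in percent.
    fitch: set of clades (topology only, no support values assumed).
    Returns dict clade -> averaged support, or None when Tree-Puzzle and
    neighbor-joining do not both recover the clade.
    """
    resolved = {}
    for clade in set(puzzle) | set(fitch) | set(nj):
        votes = (clade in puzzle) + (clade in fitch) + (clade in nj)
        if votes < min_votes:
            continue
        if clade in puzzle and clade in nj:
            resolved[clade] = (puzzle[clade] + nj[clade]) / 2
        else:
            resolved[clade] = None
    return resolved

def printable_supports(resolved, threshold=50.0):
    """Select the support values that would be printed on the tree."""
    return {c: s for c, s in resolved.items() if s is not None and s > threshold}

# Toy example with hypothetical taxa A-E.
ab, cd = frozenset("AB"), frozenset("CD")
abc, de = frozenset("ABC"), frozenset("DE")
puzzle = {ab: 90.0, cd: 40.0}
fitch = {ab, cd, abc}
nj = {ab: 80.0, abc: 60.0, de: 70.0}

resolved = consensus_clades(puzzle, fitch, nj)
labels = printable_supports(resolved)
print(sorted((sorted(c), s) for c, s in labels.items()))
```

In this toy case the clade {A, B} is recovered by all three estimates and is labelled with the mean of its two support values, whereas {D, E} appears in only one estimate and is left unresolved.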

Results
Recombinant hK6 protein. The homogeneity of purified hK6 was evaluated using amino-terminal sequencing and MALDI-TOF mass spectrometry. Mass spectrometry revealed that the hK6 samples used for crystallization contained intact, glycosylated enzyme (Fig. 2). The major peak had a mass of 25,866 Da, which is a difference of +1366 Da from the mass calculated from the protein sequence. This extra mass corresponds to approximately six N-acetylglucosamine molecules. Furthermore, peaks corresponding to six different glycosylated forms were visible in the mass spectrum, with the average difference in mass between successive forms corresponding to the mass of one hexose unit. Amino-terminal sequencing analysis yielded a single sequence of Leu-Val-His-Gly, representing the correct amino-terminal sequence for mature hK6.
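The mass bookkeeping in this paragraph can be checked with simple arithmetic. The sketch below assumes an average mass of 221.21 Da for free N-acetylglucosamine (C8H15NO6); that figure is not stated in the text, and a glycosidically linked residue would be about 18 Da lighter, so the result is only a rough consistency check.

```python
# Assumed average mass of free N-acetylglucosamine (not given in the text).
GLCNAC_FREE_DA = 221.21

def sequence_mass(observed_da, excess_da):
    """Mass implied for the unmodified polypeptide."""
    return observed_da - excess_da

def monosaccharide_equivalents(excess_da, unit_da=GLCNAC_FREE_DA):
    """Number of sugar units that would account for the excess mass."""
    return excess_da / unit_da

calc_mass = sequence_mass(25866.0, 1366.0)
equivalents = monosaccharide_equivalents(1366.0)
print(f"sequence mass = {calc_mass:.0f} Da, GlcNAc equivalents = {equivalents:.2f}")
```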
X-ray structure refinement. A total of 140 solvent molecules were added to the refined hK6 structure. One tentatively assigned solvent molecule exhibited octahedral coordination geometry with adjacent solvent molecules and short (~2.0 Å) contact distances with these groups. This solvent was therefore assigned as a Mg2+ ion (46). Unambiguous density was also visible within the active site region, indicating the presence of a bound benzamidine inhibitor with terminal amine groups clearly defined. In the final refined structure, 227 of the 229 amino acid residues are defined in the electron density map. The observed electron density is in full agreement with the amino acid sequence deduced from the cDNA sequence (20). The peptide backbone of hK6 could be traced unambiguously from its amino-terminal Ile16 to Gln243 (using the chymotrypsinogen numbering scheme (47)). C-terminal residues Ala244 and Lys245 lacked adequate electron density and were not built into the model. The side chains of residues Lys24, Arg110, Gln239, and Gln243 are undefined in the electron density map, and these residues were therefore modeled as Ala. Asp150 was modeled in multiple rotamer conformations. Some of the loop regions, in particular the region from Trp215 to Pro225, required extensive rebuilding due to large differences from the search model. The model refined to acceptable values of stereochemistry and crystallographic residual (Table II).
Digestion of myelin-related and extracellular matrix proteins by hK6. Rat myelin basic protein (MBP) was extensively and rapidly degraded by hK6 (Fig. 3). Extended incubation resulted in a characteristic pattern of four lower molecular mass fragments. Rat plasma fibronectin was rapidly degraded by hK6 to yield a polypeptide of lower apparent molecular mass (Fig. 3). This polypeptide was subsequently degraded to numerous smaller fragments after extended incubation with hK6. Mouse laminin was likewise rapidly degraded by hK6, yielding an initial polypeptide of lower mass and numerous smaller peptide fragments (Fig. 3).
Determination of steady-state kinetic constants. Active hK6 exhibited characteristic Michaelis-Menten kinetics with all substrates. Kinetic constants for the hydrolysis of Tos-Gly-Pro-Arg-AMC and Tos-Gly-Pro-Lys-AMC are listed in Table III. When compared to rMSP, hK6 has a somewhat reduced activity towards these substrates, and exhibits a general preference in kcat for Arg in the substrate P1 position relative to Lys.
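For readers less familiar with the steady-state formalism, the Michaelis-Menten relation v = kcat[E][S] / (Km + [S]) and the specificity constant kcat/Km can be sketched as follows. The numerical constants are hypothetical placeholders chosen for illustration; they are not the Table III values.

```python
def michaelis_menten_rate(substrate_m, enzyme_m, kcat_per_s, km_m):
    """Initial velocity (M/s) from the Michaelis-Menten equation."""
    return kcat_per_s * enzyme_m * substrate_m / (km_m + substrate_m)

def catalytic_efficiency(kcat_per_s, km_m):
    """Specificity constant kcat/Km (per M per s)."""
    return kcat_per_s / km_m

# Hypothetical constants for a P1-Arg and a P1-Lys substrate (not measured data).
ARG = {"kcat_per_s": 10.0, "km_m": 4.0e-4}
LYS = {"kcat_per_s": 1.0, "km_m": 4.0e-4}
ENZYME_M = 1.0e-8

v_max_arg = ARG["kcat_per_s"] * ENZYME_M
# At [S] = Km the velocity is half of Vmax.
v_half = michaelis_menten_rate(ARG["km_m"], ENZYME_M, **ARG)
arg_over_lys = catalytic_efficiency(**ARG) / catalytic_efficiency(**LYS)
print(f"v at [S]=Km: {v_half:.2e} M/s; Arg/Lys efficiency ratio: {arg_over_lys:.1f}")
```

With equal Km values, as assumed here, a preference for Arg expressed in kcat carries over directly into the kcat/Km ratio.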
Autolysis of hK6. Tricine SDS-PAGE revealed that hK6 undergoes autolysis (Fig. 4). Amino-terminal sequencing of the Tricine SDS-PAGE-resolved autolysis fragments identified a peptide sequence corresponding to a single cleavage site between residues Arg76 and Glu77. Activity assays against L-BAPNA indicate that the autolytic event results in a corresponding loss of
Phylogenetic analyses. The three phylogenetic inference estimations consistently grouped certain clades, yet the resolution at the base of the tree remained obscure. Importantly, every analysis specifically associated human hK6 with the rodent MSPs, clearly indicating their orthologous relationship. This particular node on the tree had almost as much support as that grouping the rat and mouse MSPs with each other (83.5% and 89.5%, respectively). Other orthologues between the human and rodent genes in the tree were as expected and ranged in support value from below 50% for the hK4 human and mouse homologues to 98% in the human and mouse hK7 system. Paralogous hK relationships in the tree, where they were resolved, had quite low support values, ranging from below 50% for those nodes associating hK7 with hK5 and hK4, to 62% between hK2 and hK13, up to 73.5% between hK9 and hK11. Conversely, the support values for most of the classical trypsin homologues were quite high, although the complement factor D (CFAD) system is only weakly supported, at 53%, as being trypsin's nearest paralogue. hK10 appeared to have diverged the furthest from the common ancestor of all the hKs, although hK15 and hK4 were almost as divergent. All of the trypsins and CFADs diverged as much as or more than any of the hKs from the common ancestor of all the sequences on the tree. In fact, the human CFAD had almost 0.7 substitutions per site along its length in its divergence from the last common ancestor of the dataset.

Discussion
Human kallikrein 6 is functionally related to rat MSP. Previously reported northern blot analyses of rat MSP and KLK6 demonstrated similarly abundant expression in the brain in comparison to peripheral tissues (48). These studies also demonstrated tissue-specific expression in the spinal cord and medulla oblongata, and showed that the pattern of expression of MSP differed from that of tissue plasminogen activator. Of the human kallikreins, hK6 exhibits the highest amino acid identity (69.1%) with rat MSP; KLK6 has therefore been proposed as the human homologue of MSP (27). The present phylogenetic analysis strongly corroborates this assertion. Although basal resolution is so poor that the ancestral paralogous branching order of the kallikreins cannot be determined with any confidence, the orthology of hK6 and the rodent MSPs is obvious (with a node support value near 85%). Work on these basal relationships is continuing with a DNA alignment that corresponds to our aligned protein dataset. Much more sophisticated models of evolution are available for DNA than for protein datasets, especially as implemented in the maximum likelihood method of PAUP* (49). These sophisticated models may provide a greater evolutionary look-back time than the present study achieved and allow some order to be teased out of the original gene duplications that led to this large, complicated, and important gene family.
Rat MSP is characterized as a degradative protease, with greater catalytic efficiency for The natural pro-peptide sequence of hK6 is Glu-Glu-Gln-Asn-Lys (19), and cleavage after the Lys residue produces mature active hK6. Rat MSP has a similar activation pro-peptide sequence of Glu-Asp-Gln-Asp-Lys (48) and is not activated by autolytic digestion (27). This inability of rat MSP to self-activate has led to the proposal that a distinct, Lys-specific protease is responsible for activation of rat MSP in vivo (27). Similarly, the preference of hK6 for cleavage after Arg rather than Lys residues in the P1 position suggests that a distinct Lys-specific protease activates pro-hK6 in vivo.
Overall structural relationship of hK6 with other serine proteases. The secondary structure of hK6 is composed of thirteen β-strands, two α-helices, two 3₁₀-helices, and eight identifiable loop regions. These loop regions have varying functions that, based upon the structures of related serine proteases, include defining substrate specificity (54-57) and autolytic regulation (27,58,59). In addition, these loops can provide sites for N-glycosylation that may serve to regulate activity in this class of enzyme (60).
The overall structure of hK6 is more similar to that of bovine trypsin than the mouse hK6 has no inserted residues in this region and thus lacks the classical kallikrein loop. This loop in hK6 is the same length as in the degradative proteases trypsin and chymotrypsin, and shorter than that seen in mouse kallikreins or other regulatory-type proteases (Fig. 6). Although the amino acid sequences within this region differ between hK6 and trypsin, the structures are essentially identical (Figs. 5 and 6).
The short surface loop comprising residue positions 172-178 is identical in length for the different proteases compared in Fig. 6. The amino acid sequence for hK6 within this region is identical to that of bovine trypsin with the exception of position 178 (Fig. 6), and the loop adopts essentially the same structure as in bovine trypsin (Fig. 5). This short loop is oriented away from the active site, and contrasts with the homologous region in mK13, which is oriented towards the active site (Fig. 5).
The loop region 141-152 in hK6 is shorter than that in trypsin (Fig. 6) and adopts a conformation that orients this loop away from the active site in comparison to trypsin (Fig. 5). In the comparison with other proteases (Fig. 6), the broad-specificity degradative proteases generally have a shorter loop in this region, whereas the regulatory proteases have longer loops that afford more extensive structural determinants of the substrate binding site.
The structural data for the variable surface loop regions that border the active site of hK6 describe loops that are both short and generally oriented away from the substrate binding site. Thus, their contribution to formation of the S2 and S3 sites within the protease appears limited. This is a characteristic feature of the degradative-type proteases, exemplified by the digestive enzymes trypsin and chymotrypsin (61). Thus, the original hypothesis (18) (Table III). This apparently weak binding affinity may reflect limited interactions within the S2 and S3 sites, as is suggested by the general structural data of the active site.
Thus, hK6 may function effectively only with larger peptide substrates with the potential for extended contact interactions beyond the S2 and S3 sites. The rapid digestion of myelin basic protein is consistent with this hypothesis.
S1 site structural features. Residues 189-195, 214-220, and 224-228, in addition to the catalytic triad, define the S1 binding pocket. The presence of a bound benzamidine inhibitor in the x-ray structure of hK6 permits an evaluation of how the guanidino group of a substrate P1 Arg side chain might fit within the active site. In trypsin, each of the nitrogen groups of the bound benzamidine inhibitor hydrogen bonds to an oxygen moiety of Asp189 in the "bottom" of the S1 binding pocket (Fig. 7). In porcine kallikrein (an available kallikrein structure with a bound benzamidine inhibitor), the Oγ moiety of the Ser side chain at position 226 displaces one of the benzamidine amide groups and forces a rotation of the benzamidine ring of approximately 60° away from the Ser side chain (61). Similar to trypsin, hK6 has a Gly residue at position 226, and the interaction of benzamidine with the Asp189 side chain is virtually indistinguishable from that of trypsin (1CE5) and distinctly different from the orientation in porcine kallikrein (2PKA) (Fig. 7).
Further structural similarity of the S1 site between hK6 and trypsin is achieved due to structural changes within the local region 215-220. This region in trypsin adopts a conformation that results in a hydrogen bonding interaction between the main chain carbonyl of residue Gly218 and a benzamidine nitrogen group (Fig. 7). Although region 215-220 in hK6 has an amino acid insertion in comparison to the same region in trypsin, it adopts a conformation that positions the main chain carbonyl of residue Asn217 in an almost identical location to that of Gly218 in trypsin (Fig. 7). Although region 215-220 in porcine kallikrein has the same length as in hK6, there are slight conformational changes, presumably in response to the Ser226 residue. These conformational changes position the main chain carbonyl of residue 217 further away from the bound benzamidine and permit a hydrogen bonding interaction with the alternatively oriented benzamidine nitrogen (Fig. 7). These structural features in hK6 suggest a generally optimized fit for a P1 guanidino group within the active site that translates into a much higher catalytic efficiency towards substrates with an Arg versus Lys residue in this position.
Site of glycosylation. It has been reported that N-linked oligosaccharides within the "kallikrein loop" of neuropsin (the apparent mouse homologue of KLK8) affect the size of the S2 pocket and that mutations in this region result in a significant decrease in both kcat and Km (while maintaining the overall kcat/Km) (60). As previously mentioned, hK6 lacks the equivalent "kallikrein loop" characteristic of the regulatory proteases, including the N-linked Asn residue at position 95 (Fig. 6). However, mass spectrometry data suggest there is a potential N-linked glycosylation site (sequence Asn-Xxx-Thr) at position Asn132 that is not present in any of the other known kallikrein structures.
In contrast to the N-glycosylation site found on the kallikrein loop in other kallikreins, residue 132 is quite distant from the active site and lies at the "rear" of the enzyme. There is electron density present in this region that is indicative of possible sugar residues, but the density is not sufficient for accurate modeling. The function of this site of glycosylation has yet to be determined, but due to its distal location from the active site it is hypothesized not to significantly affect enzyme specificity or function.